# Data Analysis with Machine Learners
## Table of Contents
- Overview
- Import dataset
- Exploratory Data Analysis
- Conclusions

## Overview
In this notebook I analyze the [2021 Kaggle Machine Learning & Data Science Survey dataset](https://www.kaggle.com/c/kaggle-survey-2021) to discover insights into the machine learning industry. The analysis addresses the following questions:
* What are the basic demographics of Machine Learners?
* What tools and skills do Machine Learners use or pay attention to?
* Where do Machine Learners start their machine learning journey?
* On which platforms do Machine Learners share notebooks and ideas?
* What do they do in their jobs?

## Import dataset

In [None]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("/kaggle/input/kaggle-survey-2021/kaggle_survey_2021_responses.csv")
# The first row holds the full question text rather than a response.
questions = data.iloc[0]
columns = questions.index
question_values = list(questions)
# Drop the question row so that only survey responses remain.
data = data[1:]
data.head()
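
The cell above separates the question text from the responses. The following is a minimal sketch of the same idea on a tiny in-memory CSV; the two columns and their values are made up for illustration, and the names `sample`, `question_text`, and `responses` are not part of the notebook.

```python
import io

import pandas as pd

# Tiny stand-in for the survey file: a header row of column codes, one row of
# question text, then the responses. All values here are invented.
raw_csv = io.StringIO(
    "Q1,Q2\n"
    "What is your age?,What is your gender?\n"
    "25-29,Man\n"
    "30-34,Woman\n"
)
sample = pd.read_csv(raw_csv)

# Row 0 holds the question text, so keep it as a lookup table...
question_text = sample.iloc[0].to_dict()
# ...and drop it from the responses, renumbering the remaining rows from 0.
responses = sample.iloc[1:].reset_index(drop=True)

print(question_text["Q1"])
print(responses.shape)
```

Keeping the question text in a dictionary makes it easy to look up what a column such as `Q1` asks without scrolling back to the raw file.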

In [None]:
def visualize(data, key, kind="bar", max_count=None):
    """Plot the answer counts of a single-choice question column."""
    value_counts = data[key].value_counts()
    # Drop parenthetical explanations from the answer labels.
    keys = [str(item).split("(")[0].strip() for item in value_counts.index]
    df = pd.DataFrame({key: keys, "count": list(value_counts)})
    if max_count is not None and len(df) > max_count:
        # Keep the top max_count answers and fold the rest into "Other".
        df_others = df[max_count:]
        df = df[:max_count].copy()
        df_others_sum = df_others["count"].sum()
        if "Other" in list(df[key]):
            # "Other" is already among the top answers: add the remainder to it.
            df.loc[df[key] == "Other", "count"] += df_others_sum
        else:
            # DataFrame.append was removed in pandas 2.0, so use pd.concat.
            other_row = pd.DataFrame({key: ["Other"], "count": [df_others_sum]})
            df = pd.concat([df, other_row], ignore_index=True)
        df.sort_values(by="count", ascending=False, inplace=True)
    if kind == "bar":
        sns.barplot(x="count", y=key, data=df, orient="h")
    elif kind == "pie":
        plt.pie(df["count"], labels=df[key], startangle=90, counterclock=False)
        plt.legend(loc="upper right", bbox_to_anchor=(1.2, 0.2))
    plt.show()
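
The "keep the top answers and fold the rest into Other" step can be checked without plotting. Below is a minimal sketch of that step on a plain `pd.Series` of counts; `fold_into_other` is a hypothetical helper written for this illustration, and the counts are made up.

```python
import pandas as pd


def fold_into_other(counts, max_count):
    """Keep the max_count largest entries and add the rest to "Other"."""
    counts = counts.sort_values(ascending=False)
    if len(counts) <= max_count:
        return counts
    top = counts.iloc[:max_count].copy()
    remainder = counts.iloc[max_count:].sum()
    # Add the remainder to an existing "Other" entry, or create one.
    top["Other"] = top.get("Other", 0) + remainder
    return top.sort_values(ascending=False)


# Made-up counts for illustration only.
toy_counts = pd.Series({"A": 50, "B": 30, "Other": 5, "C": 4, "D": 1})
folded = fold_into_other(toy_counts, max_count=2)
print(folded)
```

A useful sanity check for any such folding is that the total count stays the same before and after.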

In [None]:
def visualize_user_counts(key, begin, end, title=""):
    """Plot how many respondents selected each option of a multiple-choice question.

    Columns begin..end-1 of the global `data` each hold one answer option.
    """
    keys = []
    user_counts = []
    for i in range(begin, end):
        column = data.iloc[:, i]
        # An option column contains the option label where selected, NaN elsewhere.
        labels = [value for value in column.unique() if isinstance(value, str)]
        if not labels:
            continue  # nobody selected this option
        keys.append(labels[-1])
        user_counts.append(int((column == labels[-1]).sum()))
    # Drop parenthetical explanations from the option labels.
    keys = [item.split("(")[0].strip() for item in keys]
    stats_info = pd.DataFrame({key: keys, "user_count": user_counts})
    # Share of all respondents who selected each option, in percent.
    stats_info["percentage"] = stats_info["user_count"] / data.shape[0] * 100
    stats_info.sort_values(ascending=False, by="user_count", inplace=True)
    sns.barplot(x="user_count", y=key, data=stats_info, orient="h")
    if title:
        plt.title(title)
    plt.show()
    print(stats_info[[key, "percentage"]])
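
The calls later in this notebook pass hard-coded column positions, which break silently if the file layout shifts by one column. As a hedged alternative, the sketch below selects option columns by name instead. It assumes the option columns are named like `Q7_Part_1` and `Q7_OTHER`; `option_columns` and `count_selections` are hypothetical helpers, and the toy data is made up.

```python
import pandas as pd


def option_columns(df, question):
    """Return the columns that belong to one multiple-choice question.

    Assumes option columns are named "<question>_Part_<n>" or
    "<question>_OTHER"; adjust the prefix test if the file differs.
    """
    prefix = question + "_"
    return [column for column in df.columns if column.startswith(prefix)]


def count_selections(df, question):
    """Count the non-empty cells in each option column of a question."""
    counts = {}
    for column in option_columns(df, question):
        selected = df[column].dropna()
        if selected.empty:
            continue  # nobody picked this option
        counts[selected.iloc[0].strip()] = len(selected)
    return pd.Series(counts).sort_values(ascending=False)


# Made-up responses: one column per option, None where it was not selected.
toy = pd.DataFrame({
    "Q7_Part_1": ["Python", "Python", None],
    "Q7_Part_2": [None, "R", "R"],
    "Q7_OTHER": [None, None, None],
    "Q8": ["Python", "R", "Python"],
})
print(count_selections(toy, "Q7"))
```

Because the prefix includes the trailing underscore, `"Q7_"` does not accidentally match columns of a question such as `Q27_A`.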

## Exploratory Data Analysis

### Age Distribution of Machine Learners

In [None]:
visualize(data, "Q1")

### Gender Distribution of Machine Learners

In [None]:
visualize(data, "Q2")

### Where are Machine Learners from?
The top 10 countries are India, the US, Japan, China, Brazil, Russia, Nigeria, the UK, Pakistan, and Egypt.

In [None]:
visualize(data, "Q3", max_count=20)

### What are the Education Backgrounds of Machine Learners?
Most Machine Learners hold a Master's or Bachelor's degree. About 12% hold a doctoral or professional doctorate degree, and about 7% have no college degree. A PhD or Master's degree is beneficial, but you can also pursue machine learning with a Bachelor's degree or even without a degree.

In [None]:
visualize(data, "Q4")
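
Percentages like the ones quoted above can be read directly from normalized value counts rather than estimated from the bar chart. This is a minimal sketch on made-up answers; in the notebook the input would be `data["Q4"]`, and the labels below are shortened placeholders rather than the survey's exact wording.

```python
import pandas as pd

# Made-up answers for illustration only.
toy_degrees = pd.Series(
    ["Master", "Master", "Bachelor", "Bachelor", "Bachelor",
     "Doctoral", "No degree", "Master", "Bachelor", "Master"]
)

# normalize=True returns fractions that sum to 1; scale them to percentages.
shares = toy_degrees.value_counts(normalize=True).mul(100).round(1)
print(shares)
```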

### What are the Job Titles of Machine Learners?

In [None]:
visualize(data, "Q5")

### How much Coding Experience do Machine Learners have?

In [None]:
visualize(data, "Q6")

### What Programming Languages do Machine Learners use?
Besides data science languages such as Python, SQL, R, and MATLAB, many people also know C, C++, Java, or JavaScript. This is not surprising: many people from other stacks may want to move into a data career, and deploying AI solutions requires backend skills.

In [None]:
visualize_user_counts("Language", 7, 20, "User counts of Programming Languages")
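
One way to probe how often languages are used together is to count, among respondents who selected Python, the share who also selected each other language. The sketch below does this on made-up data with one column per language; the column names are illustrative and do not match the survey's column names.

```python
import pandas as pd

# Made-up responses: each column is one language option, None means "not selected".
toy = pd.DataFrame({
    "Python": ["Python", "Python", "Python", None],
    "SQL": ["SQL", None, "SQL", None],
    "C++": [None, "C++", None, "C++"],
})

# Boolean table: one row per respondent, one column per language.
selected = toy.notna()
python_users = selected[selected["Python"]]
# Share of Python users who also selected each other language.
overlap = python_users.drop(columns="Python").mean()
print(overlap)
```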

### What are the recommended Data Science skills?
The top 3 recommended data science skills are Python, R, and SQL. C, C++, and Java are also worth considering, since they are used in deploying AI solutions.

In [None]:
visualize(data, "Q8")

### What IDEs do Machine Learners use?
Most Machine Learners use Jupyter Notebook / JupyterLab. Many also use VS Code, PyCharm, or RStudio.

In [None]:
visualize_user_counts("IDE", 21, 34, "Number of Users on IDE")

### What Notebooks do Machine Learners use?

In [None]:
visualize_user_counts("Notebook", 34, 51, "Number of Users on Notebooks")

### What Computer Platform do Machine Learners use?
Most people use laptops and desktops, some use cloud computing platforms, and a few even use deep learning workstations.

In [None]:
visualize(data, "Q11")

### What Specialized Hardware do Machine Learners use?

In [None]:
visualize_user_counts("Specialized Hardware", 52, 58, "Number of Users on Specialized Hardware")

### What Data Visualization Libraries do Machine Learners use?

In [None]:
visualize_user_counts("Data Visualization Libraries", 59, 71, "Number of Users on Data Visualization Libraries")

### What Frameworks do Machine Learners use?
The most widely used frameworks are scikit-learn, TensorFlow, Keras, PyTorch, and XGBoost.

In [None]:
visualize_user_counts("Machine Learning Frameworks", 72, 90, "Number of Users on Machine Learning Frameworks")

### What ML algorithms do Machine Learners use?

In [None]:
visualize_user_counts("ML algorithms", 90, 102, "Number of Users on ML algorithms")

### What Computer Vision Methods do Machine Learners use?
The top 3 are image classification, image segmentation, and object detection. Many people also use general-purpose image/video tools and generative networks.

In [None]:
visualize_user_counts("Computer Vision Methods", 102, 109, "Number of Users on Computer Vision Methods")

### What NLP methods do Machine Learners use?
The top 3 NLP methods are word embeddings, transformer language models, and encoder-decoder models.

In [None]:
visualize_user_counts("NLP methods", 109, 115, "Number of Users on NLP methods")

### What Industries are Machine Learners from?

In [None]:
visualize(data, data.columns[115])

### What is the Company Size of Machine Learners' Employers?
The most common company size is fewer than 50 employees, but more than 3,000 respondents work at large companies with 10,000 or more employees.

In [None]:
visualize(data, data.columns[116])

### How many individuals are responsible for data science workloads at Machine Learners' company?

In [None]:
visualize(data, data.columns[117])

### Do Employers use Machine Learning methods in Business?

In [None]:
visualize(data, data.columns[118])

### Activities that make up an important part of Machine Learners' role at work

In [None]:
visualize_user_counts("Activity", 119, 127, "Activities of Machine Learners")

### How much money have Machine Learners spent on Machine Learning?

In [None]:
visualize(data, data.columns[128])

### What Cloud Computing Platforms do Machine Learners use?

In [None]:
visualize_user_counts("Cloud Computing Platforms", 129, 141, "Number of Users on Cloud Computing Platforms")

### Which Cloud Platform do Machine Learners enjoy most or use most often?

In [None]:
visualize(data, data.columns[141])

### What Cloud Computing Products do Machine Learners use?

In [None]:
visualize_user_counts("Cloud Computing Products", 142, 147, "Number of Users on Cloud Computing Products")

### What Data Storage Products do Machine Learners use?

In [None]:
visualize_user_counts("Data Storage Product", 147, 155, "Number of Users on Data Storage Products")

### What Managed Machine Learning Products do Machine Learners use?
Most people don't use managed machine learning products; among those who do, popular products are Amazon SageMaker, Azure Machine Learning Studio, and Databricks.

In [None]:
visualize_user_counts("Managed Machine Learning Products", 155, 165, "Number of Users on Managed Machine Learning Products")

### What Big Data Products do Machine Learners use?
Most are popular SQL tools; some are file storage products from different cloud platforms.

In [None]:
visualize_user_counts("Big Data Products", 165, 186, "Number of Users on Big Data Products")

### What Big Data Products do Machine Learners use most often? 

In [None]:
visualize(data, data.columns[186])

### What Business Intelligence Tools do Machine Learners use?

In [None]:
visualize_user_counts("Business Intelligence Tools", 187, 204, "Number of Users on Business Intelligence Tools")

### What Business Intelligence Tools do Machine Learners use most often?

In [None]:
visualize(data, data.columns[204])

### What Machine Learning Tools do Machine Learners use?

In [None]:
visualize_user_counts("Machine Learning Tools", 205, 213, "Number of Users on Machine Learning Tools")

### What Specific AutoML Tools do Machine Learners use?

In [None]:
visualize_user_counts("Specific AutoML Tools", 213, 221, "Number of Users on Specific AutoML Tools")

### What ML Experiment Management tools do Machine Learners use?

In [None]:
visualize_user_counts("ML Experiment Management tools", 222, 234, "Number of Users on ML Experiment Management tools")

### Where do Machine Learners share their work?
Most people share their data analysis and machine learning applications on Coursera; many also share them on GitHub, Kaggle, or Colab, and many do not share their code at all.

In [None]:
visualize_user_counts("Sharing Platform", 234, 243, "Number of Users on Sharing Platform")

### Learning Platform of Machine Learners

In [None]:
visualize_user_counts("Learning Platform", 243, 255, "Number of Users on Learning Platform")

### What Data Analysis Tools do Machine Learners use?

In [None]:
visualize(data, data.columns[255])

### What Social Media do Machine Learners use for Machine Learning?

In [None]:
visualize_user_counts("Social Media", 256, 268, "Number of Users on Social Media")

### Which Cloud Computing Platforms are Machine Learners likely to learn?
Most people are likely to learn about AWS/GCP/Azure in the future.

In [None]:
visualize_user_counts("Cloud Computing Platform", 268, 280, "Number of Potential Users on Cloud Computing Platform")

### Which Cloud Computing Products are Machine Learners likely to learn?
Many Machine Learners are likely to learn a new cloud computing product in the future.

In [None]:
visualize_user_counts("Cloud Computing Products", 280, 285, "Number of Potential Users on Cloud Computing Products")

### Which Managed Machine Learning products are Machine Learners likely to learn?

In [None]:
visualize_user_counts("Managed Machine Learning products", 293, 303, "Number of Potential Users on Managed Machine Learning products")

### Which Big Data products are Machine Learners likely to learn?

In [None]:
visualize_user_counts("Big Data products", 303, 324, "Number of Potential Users on Big Data products")

### Which Business Intelligence Tools are Machine Learners likely to learn?

In [None]:
visualize_user_counts("Business Intelligence Tools", 324, 341, "Number of Potential Users on Business Intelligence Tools")

### Which AutoML tools are Machine Learners likely to learn?

In [None]:
visualize_user_counts("AutoML Tools", 341, 349, "Number of Potential Users on AutoML Tools")

### Which specific AutoML tools are Machine Learners likely to learn?

In [None]:
visualize_user_counts("Specific AutoML Tools", 349, 357, "Number of Potential Users on Specific AutoML Tools")

### Which ML Experiment Management tools are Machine Learners likely to learn?

In [None]:
visualize_user_counts("ML experiments Managing Tools", 357, 369, "Number of Potential Users on ML experiments Managing Tools")

## Work in Progress......
