# Data Visualization about Japanese Kagglers

* On Kaggle, there are many Japanese kagglers and some have won titles such as <span style="color:coral">Master</span> and <span style="color:#c1ab05">Grandmaster</span>.





* In this notebook, we explore the survey responses from people in Japan using Plotly.
* If you like, feel free to give an **upvote!**
* This notebook may be corrected or extended with more elements later.

# Import libraries and read csv file

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [None]:
survey_responses = pd.read_csv('../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv')

# Extracting responses by kagglers in Japan

In [None]:
# Row 0 holds the question text, so skip it before filtering on country.
survey_japanese = survey_responses.iloc[1:].loc[lambda df: df["Q3"] == "Japan"]

In [None]:
survey_japanese

In [None]:
print(f"The number of responses from Japan is {len(survey_japanese)}")
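
The extraction step above skips the first row because, in this dataset, that row holds the question text rather than an answer. A minimal sketch of the same filtering on a small made-up frame (the `toy` values below are hypothetical, not survey data):

```python
import pandas as pd

# Hypothetical miniature of the survey file: row 0 holds the question text.
toy = pd.DataFrame({
    "Q1": ["What is your age (# years)?", "25-29", "30-34", "22-24"],
    "Q3": ["In which country do you currently reside?", "Japan", "India", "Japan"],
})

answers = toy.iloc[1:]                          # drop the question-text row
toy_japan = answers[answers["Q3"] == "Japan"]   # mask built from the same frame

print(len(toy_japan))
```

Building the boolean mask from the already-sliced frame keeps the mask and the data aligned, which avoids pandas reindexing warnings.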

# Q1: What is your age?

In [None]:
Ages = survey_japanese["Q1"].value_counts()

In [None]:
fig = px.bar(Ages,labels={"index":"age","value":"Number of people"},text=Ages,title="Age (Q1)") 
fig.show()

# Q2: What is your gender?

In [None]:
gender = survey_japanese["Q2"].value_counts()

In [None]:
fig = px.bar(gender,labels={"index":"Gender","value":"Number of people"},title="Gender (Q2)"
             ,text=gender)
fig.show()

* Men far outnumber women.

# Q4: What is your highest level of formal education?

In [None]:
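edu = survey_japanese["Q4"].value_counts()
# Take labels and counts from the same Series so they stay aligned
# (unique() on the raw column returns labels in order of appearance, not by count).
edu_un = edu.index
edu_figure = edu.values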

In [None]:
q4 = pd.DataFrame(edu_figure,columns=["Number"],index=edu_un)
q4

In [None]:
fig = px.pie(q4,values="Number",names=q4.index, title='Formal education level (Q4)',width=900,height=1000)
fig.update_traces(textposition='inside')
fig.show()

In [None]:
fig = px.bar(edu,labels={"index":"Education","value":"Number of people"},
             text=edu,title="Formal education level (Q4)",color_discrete_sequence =['green']*len(edu))
fig.show()

# Q5: Select the title most similar to your current role

In [None]:
roles = survey_japanese["Q5"].value_counts()

In [None]:
fig = px.bar(roles,labels={"index":"Role","value":"Number of people"},text=roles,
             title="Role (Q5)",color_discrete_sequence =['blue']*len(roles))
fig.update_xaxes(tickangle=45)
fig.show()

* There are many students on Kaggle, but most respondents appear to take part in Kaggle while working.

# Q6: For how many years have you been writing code and/or programming?

In [None]:
code = survey_japanese["Q6"].value_counts()

In [None]:
fig = px.bar(code,labels={"index":"Coding Experience","value":"Number of people"}
             ,text=code,title="Coding Experience (Q6)",color_discrete_sequence =['orange']*len(code))
fig.show()

* About 15% of Japanese kagglers started coding in the last year.
* 19 kagglers have never written code; I guess they are about to start learning.
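
Percentages like the one quoted above can be read directly from `value_counts(normalize=True)`. The sketch below uses a hypothetical series `toy_q6` in place of `survey_japanese["Q6"]`; the answers are made up for illustration.

```python
import pandas as pd

# Hypothetical answers; the real notebook would use survey_japanese["Q6"].
toy_q6 = pd.Series(["< 1 years", "1-2 years", "< 1 years", "3-5 years", None])

# normalize=True returns fractions of the non-missing answers instead of counts.
share = toy_q6.value_counts(normalize=True).mul(100).round(1)
print(share)
```

Missing answers are dropped by default, so the shares are relative to the people who answered the question.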

# Q7: What programming languages do you use on a regular basis?

In [None]:
languages_value = []
for i in range(1,13):
    languages_value.append(np.array(survey_japanese[f"Q7_Part_{i}"].value_counts()))
languages_value.append(np.array(survey_japanese["Q7_OTHER"].value_counts()))

In [None]:
q7 = pd.DataFrame(data=languages_value,columns=['Number'],index=["Python","R","SQL","C","C++","Java"
                                                ,"Javascript","Julia","Swift","Bash","MATLAB","None","Other"])

In [None]:
q7

In [None]:
q7 = q7.sort_values('Number',ascending=True)
fig = px.bar(q7, y = q7.index,x = "Number",labels={"index":"Languages","value":"Number of people"}
             ,title="Popular Languages (Q7)",color_continuous_scale = "sunset",color="Number")
fig.update_xaxes(tickangle=45)
fig.show()

* Python is extremely popular.
* Many kagglers also use SQL because it is suitable for working with data.

# Q8: What programming language would you recommend an aspiring data scientist to learn first? 

In [None]:
recommended_lang = survey_japanese["Q8"].value_counts()

In [None]:
# Use positional access explicitly; the Series is indexed by language name.
num = [recommended_lang.iloc[i] for i in range(12)]

In [None]:
recommended_languages = pd.DataFrame(data=num,columns=["Numbers"],index=["Python","R","C","SQL","C++",
                                                    "Other","MATLAB","None","Java","Javascript","Swift","Julia"])

In [None]:
recommended_languages

In [None]:
fig = px.pie(recommended_languages,values="Numbers",names=recommended_languages.index, title='Recommended languages for beginner (Q8)')
fig.update_traces(textposition='inside')
fig.show()

* Most Japanese kagglers recommend Python, probably because it is easy for beginners to learn and well suited to machine learning.

# Q9: Which of the following integrated development environments (IDE's) do you use on a regular basis? 

In [None]:
ide_value = []
for i in range(1,12):
    ide_value.append(np.array(survey_japanese[f"Q9_Part_{i}"].value_counts()))
ide_value.append(np.array(survey_japanese["Q9_OTHER"].value_counts()))

In [None]:
q9 = pd.DataFrame(data=ide_value,columns=['Number'],index=["Jupyter","R Studio","Visual Studio","VSCode","PyCharm","Spyder"
                                                ,"Notepad++","Sublime Text","Vim / Emacs","MATLAB","None","Other"])

In [None]:
q9

In [None]:
fig = px.pie(q9,values="Number",names=q9.index, title='Popular IDEs (Q9)')
fig.update_traces(textposition='inside')
fig.show()

# Q10: Which of the following hosted notebook products do you use on a regular basis?

In [None]:
notebook_value = []
for i in range(1,14):
    notebook_value.append(np.array(survey_japanese[f"Q10_Part_{i}"].value_counts()))
notebook_value.append(np.array(survey_japanese["Q10_OTHER"].value_counts()))

In [None]:
q10 = pd.DataFrame(data=notebook_value,columns=['Number'],index=["Kaggle Notebooks","Colab Notebook","Azure Notebooks","Paperspace/Gradient","Binder/JupyterHub","CodeOcean"
                                                ,"IBM Watson Studio","Amazon Sagemaker Studio","Amazon EMR Notebooks","Google Cloud AI Platform Notebooks","Google Cloud Datalab Notebooks",
                                                           "Databricks Collaborative Notebooks","None","Other"])

In [None]:
fig = px.pie(q10,values="Number",names=q10.index, title='Popular Notebook Products (Q10)',
            color_discrete_sequence=px.colors.qualitative.Bold)
fig.update_traces(textposition='inside')
fig.show()

# Q11: What type of computing platform do you use most often for your data science projects?

In [None]:
computing_platforms = survey_japanese["Q11"].value_counts()

In [None]:
fig = px.bar(computing_platforms,labels={"index":"Computing Platforms","value":"Number of people"},title="Popular Computing Platforms (Q11)")
fig.show()

# Q12: Which types of specialized hardware do you use on a regular basis?

In [None]:
accelerator_value = []
for i in range(1,4):
    accelerator_value.append(np.array(survey_japanese[f"Q12_Part_{i}"].value_counts()))
accelerator_value.append(np.array(survey_japanese["Q12_OTHER"].value_counts()))

In [None]:
q12 = pd.DataFrame(data=accelerator_value,columns=['Number'],index=["GPUs","TPUs","None","Other"])

In [None]:
q12

In [None]:
fig = px.pie(q12,values="Number",names=q12.index, title='Popular Specialized Hardware (Q12)',
            color_discrete_sequence=px.colors.qualitative.Alphabet)
fig.update_traces(textposition='inside')
fig.show()

# Q13: How many times have you used TPUs?

In [None]:
tpus = survey_japanese["Q13"].value_counts()
# Take labels and counts from the same Series so they stay aligned.
tpu_un = tpus.index
tpu_figure = tpus.values

In [None]:
q13 = pd.DataFrame(tpu_figure,columns=["Number"],index=tpu_un)
q13

In [None]:
fig = px.bar(q13,x="Number",y=q13.index,labels={"index":"Times","value":"Number of people"},title="TPU Usage Experiences (Q13)",
            color_continuous_scale = "sunset",color="Number") 
fig.show()

# Q14: What data visualization libraries or tools do you use on a regular basis?

In [None]:
visualize_value = []
for i in range(1,7):
    visualize_value.append(np.array(survey_japanese[f"Q14_Part_{i}"].value_counts()))
for i in range(8,12):
    visualize_value.append(np.array(survey_japanese[f"Q14_Part_{i}"].value_counts()))
visualize_value.append(np.array(survey_japanese["Q14_OTHER"].value_counts()))

In [None]:
q14 = pd.DataFrame(data=visualize_value,columns=['Number'],index=["Matplotlib","Seaborn","Plotly","Ggplot/ggplot2","Shiny",
                                                                 "D3 js","Bokeh","Geoplotlib","Leaflet/Folium","None","Other"])

In [None]:
q14

* Q14_Part_7 (Altair) is excluded because no kaggler in Japan selected it.

In [None]:
fig = px.pie(q14,values="Number",names=q14.index, title='Popular Data Visualization Libraries (Q14)')
fig.update_traces(textposition='inside')
fig.show()
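
Altair had to be left out above because `value_counts()` returns nothing for a column with no answers. One possible alternative, sketched here on a hypothetical frame `toy_q14` (made-up values, not survey data), is to count non-missing cells per column, which reports 0 for an unselected option:

```python
import pandas as pd

# Hypothetical miniature of three Q14 columns; nobody picked the middle option.
toy_q14 = pd.DataFrame({
    "Q14_Part_6": ["D3 js", None, "D3 js"],
    "Q14_Part_7": [None, None, None],
    "Q14_Part_8": [None, "Bokeh", None],
})

# DataFrame.count() counts non-missing cells per column, so an option
# that nobody selected yields 0 rather than an empty result.
toy_counts = toy_q14.count().rename("Number").to_frame()
toy_counts.index = ["D3 js", "Altair", "Bokeh"]
print(toy_counts)
```

With this approach the loop would not need a gap at Part 7, at the cost of showing a zero-sized slice in the chart.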

# Q15: For how many years have you used machine learning methods?

In [None]:
ml = survey_japanese["Q15"].value_counts()

In [None]:
fig = px.bar(ml,labels={"index":"ML methods experience","value":"Number of people"},title="ML Experience (Q15)")
fig.show()

* About a quarter of Japanese kagglers started learning ML in the last year.
* This suggests that Kaggle is popular among beginners as well as intermediate and advanced users.

# Q16: Which of the following machine learning frameworks do you use on a regular basis?

In [None]:
ml_framework_value = []
for i in range(1,16):
    ml_framework_value.append(np.array(survey_japanese[f"Q16_Part_{i}"].value_counts()))
ml_framework_value.append(np.array(survey_japanese["Q16_OTHER"].value_counts()))

In [None]:
q16 = pd.DataFrame(data=ml_framework_value,columns=['Number'],index=["Scikit-Learn","TensorFlow","Keras","PyTorch","Fast.ai","MXNet",
    "XGBoost","LightGBM","CatBoost","Prophet","H2O 3","Caret","Tidymodels","JAX","None","Other"])

In [None]:
q16

In [None]:
fig = px.pie(q16,values="Number",names=q16.index, title='Popular ML Frameworks (Q16)',
            color_discrete_sequence=px.colors.qualitative.G10)
fig.update_traces(textposition='inside')
fig.show()

* Frameworks such as scikit-learn, Keras, and TensorFlow are very popular.
* At the same time, many kagglers also use XGBoost or LightGBM.

# Q17: Which of the following ML algorithms do you use on a regular basis?

In [None]:
ml_algorithms_value = []
for i in range(1,11):
    ml_algorithms_value.append(np.array(survey_japanese[f"Q17_Part_{i}"].value_counts()))
ml_algorithms_value.append(np.array(survey_japanese["Q17_OTHER"].value_counts()))

In [None]:
q17 = pd.DataFrame(data=ml_algorithms_value,columns=['Number'],index=["Linear or Logistic Regression","Decision Trees or Random Forests","GB Machines","Bayesian Approaches","Evolutionary Approaches","Dense NN",
    "CNN","GAN","RNN","Transformer","Other"])

In [None]:
q17

In [None]:
fig = px.pie(q17,values="Number",names=q17.index, title='Popular ML Algorithms (Q17)',
            color_discrete_sequence=px.colors.qualitative.Set3)
fig.update_traces(textposition='inside')
fig.show()

# Q18: Which categories of computer vision methods do you use on a regular basis? 

In [None]:
cv_methods_value = []
for i in range(1,7):
    cv_methods_value.append(np.array(survey_japanese[f"Q18_Part_{i}"].value_counts()))
cv_methods_value.append(np.array(survey_japanese["Q18_OTHER"].value_counts()))

In [None]:
q18 = pd.DataFrame(data=cv_methods_value,columns=["Number"],index=["General purpose image/video tools","Image segmentation methods",
                                                                  "Object detection methods","Image classification and other general purpose networks","Generative Networks",
                                                                  "None","Other"])

In [None]:
q18

In [None]:
fig = px.pie(q18,values="Number",names=q18.index, title='Popular Computer Vision Methods (Q18)',
            color_discrete_sequence=px.colors.qualitative.Light24,width=900,height=800)
fig.update_traces(textposition='inside')
fig.show()

# Q19: Which of the following natural language processing (NLP) methods do you use on a regular basis?

In [None]:
nlp_value = []
for i in range(1,5):
    nlp_value.append(np.array(survey_japanese[f"Q19_Part_{i}"].value_counts()))
nlp_value.append(np.array(survey_japanese["Q19_OTHER"].value_counts()))

In [None]:
q19 = pd.DataFrame(data=nlp_value,columns=["Number"],index=["Word embeddings/vectors","Encoder-decoder models",
                                                             "Contextualized embeddings","Transformer language models","Other"])

In [None]:
q19

In [None]:
fig = px.pie(q19,values="Number",names=q19.index, title='Popular NLP Methods (Q19)')
fig.update_traces(textposition='inside')
fig.show()

# Q20: What is the size of the company where you are employed?

In [None]:
company_size = survey_japanese["Q20"].value_counts()

In [None]:
fig = px.bar(company_size,labels={"index":"Company Size","value":"Number of people"},title="Company size (Q20)")
fig.show()

# Q24: What is your current yearly compensation (approximate USD)?

In [None]:
compensation = survey_japanese["Q24"].value_counts().to_dict()
comp_un = survey_japanese["Q24"].dropna().unique()
comp_figure = compensation.values()

In [None]:
q24 = pd.DataFrame(list(comp_figure),columns=["Number"],index=compensation.keys())
q24

In [None]:
fig = px.bar(q24,x=q24.index,y="Number",labels={"index":"Compensation(USD)","value":"Number of people"},title="Compensation (Q24)"
            ,color="Number", color_continuous_scale='sunset')
fig.update_xaxes(tickangle=45)
fig.show()

* Some kagglers earn over 100,000 dollars a year.
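
The chart above is ordered by count, which makes the high-income brackets hard to locate. A minimal sketch of ordering the brackets by their lower bound instead; the helper `lower_bound`, the series `toy_q24`, and the label formats are assumptions for illustration and should be checked against the actual Q24 values:

```python
import pandas as pd

def lower_bound(label: str) -> int:
    """Extract the lower bound of a bracket such as '10,000-14,999' or '> $500,000'."""
    cleaned = label.replace("$", "").replace(">", "").replace(",", "").strip()
    return int(cleaned.split("-")[0])

# Hypothetical counts, not survey results.
toy_q24 = pd.Series({"10,000-14,999": 3, "$0-999": 7, "> $500,000": 1, "1,000-1,999": 2})

# Reorder the index by the numeric lower bound of each bracket.
ordered = toy_q24.loc[sorted(toy_q24.index, key=lower_bound)]
print(list(ordered.index))
```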

# Q25: How much money did you spend on ML? (in the past 5 years)

In [None]:
money = survey_japanese["Q25"].value_counts()

In [None]:
fig = px.bar(money,labels={"index":"Money Spent on ML (USD)","value":"Number of people"},
             title="How much money did you spend on ML? (USD) (Q25)")
fig.show()

* Some kagglers enjoy ML without spending any money.
* Kaggle is great because we can use it for free!

# Q26: Which of the following cloud computing platforms do you use on a regular basis?

In [None]:
com_value = []
for i in range(1,12):
    com_value.append(np.array(survey_japanese[f"Q26_A_Part_{i}"].value_counts()))
com_value.append(np.array(survey_japanese["Q26_A_OTHER"].value_counts()))

In [None]:
q26 = pd.DataFrame(data=com_value,columns=["Number"],index=["AWS","Microsoft Azure","Google Cloud Platform",
                                                           "IBM Cloud/Red Hat","Oracle Cloud","SAP Cloud","Salesforce Cloud",
                                                           "VMware Cloud","Alibaba Cloud","Tencent Cloud","None","Other"])

In [None]:
q26

In [None]:
fig = px.pie(q26,values="Number",names=q26.index, title='Popular Cloud Computing Platforms (Q26)')
fig.update_traces(textposition='inside')
fig.show()

* More than half of Japanese kagglers use GCP or AWS.
* Meanwhile, 23% of kagglers do not use any cloud computing platform.
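
For a multiple-choice question, a pie chart shows each option's share of all selections, which can differ from the share of respondents who picked it. The sketch below compares the two on a hypothetical frame `toy_q26`; the values are made up and the short column labels are placeholders for the real `Q26_A_Part_*` columns.

```python
import pandas as pd

# Hypothetical multi-select answers from four respondents.
toy_q26 = pd.DataFrame({
    "AWS":  ["AWS", None, "AWS", None],
    "GCP":  ["GCP", "GCP", None, None],
    "None": [None, None, None, "None"],
})

selected = toy_q26.notna()                 # True where an option was ticked
answered = selected.any(axis=1).sum()      # respondents with at least one tick

share_of_respondents = selected.sum() / answered * 100
share_of_selections = selected.sum() / selected.sum().sum() * 100

print(share_of_respondents)
print(share_of_selections)
```

In this toy data AWS is chosen by half of the respondents but accounts for a smaller share of all selections, so the two percentages should not be quoted interchangeably.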

# Q30: Which of the following big data products do you use most often?

In [None]:
bg_products = survey_japanese["Q30"].value_counts()

In [None]:
fig = px.bar(bg_products,labels={"index":"Big Data Products","value":"Number of people"},
             title="Popular Big Data Products (Q30)")
fig.update_xaxes(tickangle=45)
fig.show()

# Q32: Which of the following business intelligence tools do you use most often?

In [None]:
bi_tools = survey_japanese["Q32"].value_counts()

In [None]:
fig = px.bar(bi_tools,labels={"index":"Business Intelligence Tools","value":"Number of people"},
             title="Popular Business Intelligence Tools (Q32)")
fig.show()

* Microsoft Power BI and Tableau are especially popular.

# Q37: On which platforms have you begun or completed DS courses?

In [None]:
course_value = []
for i in range(1,11):
    course_value.append(np.array(survey_japanese[f"Q37_Part_{i}"].value_counts()))
course_value.append(np.array(survey_japanese["Q37_OTHER"].value_counts()))

In [None]:
q37 = pd.DataFrame(data=course_value,columns=["Number"],index=["Coursera","edX","Kaggle Learn Courses",
                                                              "DataCamp","Fast.ai","Udacity","Udemy","LinkedIn Learning",
                                                              "Cloud-certification programs","University Courses","Other"])

In [None]:
q37

In [None]:
fig = px.pie(q37,values="Number",names=q37.index, title='Popular Platforms for Learning DS (Q37)')
fig.update_traces(textposition='inside')
fig.show()

* Kaggle Learn Courses and Coursera are very popular.

# Q39: Who/what are your favorite media sources that report on data science topics?

In [None]:
source_value = []
for i in range(1,12):
    source_value.append(np.array(survey_japanese[f"Q39_Part_{i}"].value_counts()))
source_value.append(np.array(survey_japanese["Q39_OTHER"].value_counts()))

In [None]:
q39 = pd.DataFrame(data=source_value,columns=["Number"],index=["Twitter","Email newsletters","Reddit","Kaggle",
                                                              "Course Forums","YouTube","Podcasts","Blogs","Journal Publications","Slack Communities","None","Other"])

In [None]:
q39

In [None]:
fig = px.pie(q39,values="Number",names=q39.index, title='Popular Media Sources for DS Topics (Q39)')
fig.update_traces(textposition='inside')
fig.show()

* According to the graph above, Japanese kagglers often use Kaggle, YouTube, and Twitter to get information.
* In addition, Slack communities are also popular, probably because there is a large Slack community for Japanese kagglers.

This notebook ends here. <br> 
# Thank you for reading! Feel free to upvote!