## 2019 Kaggle ML & DS Survey - The most comprehensive dataset available on the state of ML and data science
[Crislânio Macêdo](https://medium.com/sapere-aude-tech) - December 31, 2019

----------

<p><font size="6" color="Green">An analysis of the 2019 Kaggle ML and DS Survey - The story of Brazilian Kagglers</font></p>


![](https://www.incimages.com/uploaded_files/image/970x450/getty_660952912_363647.jpg)

<p><font size="5" color="Green">Objective</font></p>

<p><font size="3" color="Blue">
The objective of this notebook is to analyze the survey data to answer an important question: how can Brazilian Kagglers help reduce social exclusion in Brazil and contribute to the country's development?
</font></p>
<p><font size="5" color="Green">2019 Kaggle ML & DS Survey:</font></p>

<p><font size="3" color="Blue">
Tell a data story about a subset of the data science community represented in this survey, through a combination of both narrative text and data exploration. A “story” could be defined any number of ways, and that’s deliberate. The challenge is to deeply explore (through data) the impact, priorities, or concerns of a specific group of data science and machine learning practitioners. That group can be defined in the macro (for example: anyone who does most of their coding in Python) or the micro (for example: female data science students studying machine learning in masters programs). This is an opportunity to be creative and tell the story of a community you identify with or are passionate about!
</font></p>





<p><font size="5" color="Green">The World Needs Kaggle</font></p>


<p><font size="3" color="Blue">    
The digital revolution has transformed our lives and societies with unprecedented speed and scale, delivering
immense opportunities as well as daunting challenges.
   
</font></p>

<p><font size="3" color="Blue">     
    
Whenever I am on Kaggle, one of the things I always try to notice is where my fellow Kagglers are from.
How can I learn from the most experienced people on the best data science learning platform in the world to build my data science skills? How can I show what I have learned to the people I know, including those who don't know much about technology?

</font></p>

<p><font size="6" color="Green">The big Nation</font></p>

<p><font size="5" color="Green">Internet access reproduces social and economic inequality in Brazil</font></p>
![](https://conteudo.imguol.com.br/c/noticias/6e/2019/08/16/internet-das-pessoas-1565988763033_v2_1170x540.jpg)
Source: [link](https://www.instagram.com/tilt_uol/?utm_source=ig_embed)


<p><font size="3" color="Blue">   
Brazil is the largest country in South America, the fifth largest in the world by area, and one of the most populous. It is also one of the most multicultural and ethnically diverse nations, thanks to strong immigration from many parts of the world. In this great country, digital exclusion has gained prominence in recent years, and attention is converging on an issue already seen as both a cause and a consequence of exclusion in our society. Brazil's many inequalities create a demand for income-transfer and income-generation policies. However, making the means available is not enough: it is important to show people how technology can contribute to their tasks and activities, bringing knowledge and opportunities.
</font></p> 


In [None]:
# Suppress warnings 
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

from IPython.display import HTML

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


import plotly.offline as py
import plotly.express as px
from plotly.offline import iplot, init_notebook_mode
import plotly.graph_objects as go
from plotly import tools
init_notebook_mode(connected=True)


%matplotlib inline
pd.set_option("display.max_rows",500)
pd.set_option("display.max_columns",200)

##### Plotting helpers adapted from [ranjeetjain3](https://www.kaggle.com/ranjeetjain3/aws-vs-gcp-vs-azure)


In [None]:
def compute_percentage(df, col):
    """Return the share (%) of each answer in `col`, ignoring missing values."""
    return df[col].value_counts(normalize=True) * 100

def bi_variant_chart(col1, col2, x_title, y_title, mcr_brazil):
    """Grouped bar chart: counts of `col1` answers, one trace per answer of `col2`."""
    layout = go.Layout()
    trace = []
    # Skip missing answers so no empty "nan" trace is created
    for y_axis in mcr_brazil[col2].dropna().unique():
        # Count once so the x labels and y values stay aligned (even on ties)
        counts = mcr_brazil.loc[mcr_brazil[col2] == y_axis, col1].value_counts()
        trace.append(go.Bar(x=counts.index, y=counts.values,
                            opacity=0.6, name=str(y_axis)))
    fig = go.Figure(data=trace, layout=layout)
    fig.update_layout(
        title=x_title,
        yaxis=dict(title=y_title),
        legend=dict(bgcolor='rgba(255, 255, 255, 0)', bordercolor='rgba(255, 255, 255, 0)'),
        bargap=0.15, bargroupgap=0.1, legend_orientation="h")
    fig.show()

def bar_graph(col, type_of_graph, mcr_brazil):
    """Plot the percentage distribution of `col` as a horizontal bar or a pie chart."""
    data_frame = compute_percentage(mcr_brazil, col)
    layout = go.Layout()

    if type_of_graph == 'bar':
        data = [go.Bar(
            x=data_frame.values,
            y=data_frame.index,
            opacity=0.6,
            orientation='h',
            marker=dict(color=data_frame.values, colorscale='portland'))]
    elif type_of_graph == 'pie':
        data = [go.Pie(
            labels=data_frame.index,
            values=data_frame.values,
            textfont=dict(size=20))]
    else:
        raise ValueError("type_of_graph must be 'bar' or 'pie'")
    fig = go.Figure(data=data, layout=layout)
    py.iplot(fig)

In [None]:
mcr = pd.read_csv('/kaggle/input/kaggle-survey-2019/multiple_choice_responses.csv')
qs = pd.read_csv('/kaggle/input/kaggle-survey-2019/questions_only.csv')

<p><font size="3" color="Green">There were 34 questions asked in the survey
</font></p>
<p><font size="3" color="Green">This notebook is limited to Brazilian respondents
</font></p>
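
The filtering below can be sketched as a small helper. This sketch assumes that, as in the 2019 `multiple_choice_responses.csv`, the first data row repeats the question wording rather than holding an answer. `country_subset` is a hypothetical helper and is not part of the original notebook. It drops that row so that the percentage is computed over real respondents only. The toy frame is illustrative and does not contain real survey answers.

```python
import pandas as pd

def country_subset(mcr: pd.DataFrame, country: str):
    """Return (subset, percentage) for respondents whose Q3 answer equals `country`.

    Assumption: row 0 holds the question text, not a respondent's answer.
    """
    responses = mcr.iloc[1:]  # drop the question-text row
    subset = responses[responses["Q3"] == country].reset_index(drop=True)
    share = 100 * len(subset) / len(responses)
    return subset, share

# Toy data for illustration only (not real survey answers)
toy = pd.DataFrame({
    "Q3": ["In which country do you currently reside?", "Brazil", "India", "Brazil", "Chile"],
})
subset, share = country_subset(toy, "Brazil")
print(len(subset), f"{share:.1f}%")  # prints: 2 50.0%
```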



![](https://media.gettyimages.com/videos/brazil-flag-loopable-video-id538144802?s=640x640)

In [None]:
mcr_brazil = mcr[mcr['Q3']=='Brazil']

In [None]:
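# The first row of mcr holds the question text, so it is left out of the total
n_respondents = mcr.shape[0] - 1
print(f"There are {mcr_brazil.shape[0]} Brazilian respondents "
      f"({100 * mcr_brazil.shape[0] / n_respondents:.2f}% of all respondents)")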

In [None]:
mcr_brazil = mcr_brazil.reset_index(drop=True)
mcr_brazil.shape

<p><font size="3" color="Green">Male respondents
</font></p>



In [None]:
mcr_brazil_male = mcr_brazil[mcr_brazil.Q2 == 'Male'].reset_index(drop=True)
mcr_brazil_male.shape

<p><font size="3" color="Green">Female respondents
</font></p>



In [None]:
mcr_brazil_female = mcr_brazil[mcr_brazil.Q2 == 'Female'].reset_index(drop=True)
mcr_brazil_female.shape

# <p><font size="5" color="Green">Insights</font></p>


In [None]:
mcr_brazil.head()

<p><font size="5" color="Green">Q1  :  What is your age (# years)?</font></p>
<p><font size="4" color="Blue">Most respondents are in the 25-29 and 30-34 age groups.</font></p>



In [None]:
mcr_brazil.Q1.value_counts()
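
Converting the counts into percentages is a way to check the claim that most respondents fall in the 25-29 and 30-34 groups. The following is a small sketch. `top_k_share` is a hypothetical helper, and the toy answers only stand in for `mcr_brazil.Q1`.

```python
import pandas as pd

def top_k_share(answers: pd.Series, k: int = 2) -> pd.Series:
    """Percentage of non-missing answers held by each of the k most frequent categories."""
    pct = answers.value_counts(normalize=True) * 100  # missing values are dropped
    return pct.head(k)

# Toy age answers for illustration only
ages = pd.Series(["25-29", "30-34", "25-29", "22-24", "30-34", "25-29", None])
top = top_k_share(ages, k=2)
print(top.round(1).to_dict())
print(f"Top two groups combined: {top.sum():.1f}%")
```

On the real data, `top_k_share(mcr_brazil.Q1)` would return the percentages that the sentence above summarizes.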

<p><font size="5" color="Green">Q2  :  What is your gender? </font></p>
<p><font size="4" color="Blue">Most respondents are male.</font></p>

In [None]:
mcr_brazil.Q2.value_counts()

<p><font size="5" color="Green">Q4  :  What is the highest level of formal education that you have attained or plan to attain within the next 2 years?</font></p>
<p><font size="4" color="Blue">Most respondents hold, or plan to attain within two years, a Master’s degree.</font></p>


In [None]:
mcr_brazil.Q4.value_counts()

<p><font size="5" color="Green">Q5  :  Select the title most similar to your current role (or most recent title if retired): 
</font></p>
<p><font size="4" color="Blue">The most common titles among respondents are Data Scientist and Software Engineer.</font></p>




In [None]:
mcr_brazil.Q5.value_counts()

<p><font size="5" color="Green">Q6  :  What is the size of the company where you are employed?
</font></p>
<p><font size="4" color="Blue">Most respondents work at companies with 0-49 employees.</font></p>


In [None]:
mcr_brazil.Q6.value_counts()
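
`value_counts()` orders company sizes by frequency, which hides their natural smallest-to-largest order. The following is a sketch that reindexes the counts by size. It assumes the 2019 answer labels listed in `SIZE_ORDER`, so check them against `mcr_brazil.Q6.unique()` before relying on that list. The answers in `sizes` are toy data.

```python
import pandas as pd

# Assumed Q6 answer labels, smallest to largest (verify against the real data)
SIZE_ORDER = [
    "0-49 employees",
    "50-249 employees",
    "250-999 employees",
    "1000-9,999 employees",
    "> 10,000 employees",
]

def counts_in_order(answers: pd.Series, order: list) -> pd.Series:
    """Count answers and list them in `order`; categories with no answers get 0."""
    return answers.value_counts().reindex(order, fill_value=0)

# Toy answers for illustration only
sizes = pd.Series(["0-49 employees", "> 10,000 employees", "0-49 employees",
                   "250-999 employees"])
print(counts_in_order(sizes, SIZE_ORDER).tolist())  # [2, 0, 1, 0, 1]
```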

<p><font size="5" color="Green">Q7  :  Approximately how many individuals are responsible for data science workloads at your place of business?
</font></p>


In [None]:
mcr_brazil.Q7.value_counts()

<p><font size="5" color="Green">Q8  :  Does your current employer incorporate machine learning methods into their business?
</font></p>


In [None]:
mcr_brazil.Q8.value_counts()

<p><font size="5" color="Green">Q11  :  Approximately how much money have you spent on machine learning and/or cloud computing
</font></p>



In [None]:
mcr_brazil.Q11.value_counts()

<p><font size="5" color="Green">Q14  :  What is the primary tool that you use at work or school to analyze data? (Include text response) 
</font></p>



In [None]:
mcr_brazil.Q14.value_counts()

<p><font size="5" color="Green">Q15  :  How long have you been writing code to analyze data (at work or at school)?
</font></p>




In [None]:
mcr_brazil.Q15.value_counts()

<p><font size="5" color="Green">Q19  :  What programming language would you recommend an aspiring data scientist to learn first? 
</font></p>

In [None]:
mcr_brazil.Q19.value_counts()

<p><font size="5" color="Green">Q22  :  Have you ever used a TPU (tensor processing unit)?
</font></p>



In [None]:
mcr_brazil.Q22.value_counts()

<p><font size="5" color="Green">Q23  :  For how many years have you used machine learning methods?
</font></p>

In [None]:
mcr_brazil.Q23.value_counts()

# <p><font size="5" color="Blue"> Gender distribution in percentage</font></p>


In [None]:
bar_graph("Q2","bar",mcr_brazil)

# <p><font size="5" color="Blue"> What happens in companies of different sizes</font></p>


In [None]:
bi_variant_chart("Q6","Q1","Company size VS age group","Count",mcr_brazil)

Insights

- In every company size, most employees fall in the 25-29 and **30-34** age groups.
- No company size has employees aged **70** or older.
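
The table of counts behind the grouped bar chart can back these insights. `pd.crosstab` builds that table in a single call. The following sketch uses toy answers in place of the `Q6` and `Q1` columns.

```python
import pandas as pd

# Toy respondents for illustration only
df = pd.DataFrame({
    "Q6": ["0-49 employees", "0-49 employees", "> 10,000 employees", "0-49 employees"],
    "Q1": ["25-29", "30-34", "25-29", "25-29"],
})
# Rows: company size, columns: age group, cells: number of respondents
table = pd.crosstab(df["Q6"], df["Q1"])
print(table)
```

On the real data, `pd.crosstab(mcr_brazil.Q6, mcr_brazil.Q1)` would give the exact count behind each bar. With that table, a statement such as the absence of respondents aged 70 or older is easy to confirm.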


In [None]:
bi_variant_chart("Q6","Q5","Company size VS Designation","Count",mcr_brazil)

Insights

* **Data scientists** are the most common title in companies of every size.
* Startups have more **Software Engineers** than **Data Analysts**.
* Large companies have more **Data Analysts** than **Software Engineers**.
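
Raw counts favor whichever company size has the most respondents. Normalizing each row instead compares titles *within* a company size. The following sketch uses toy data in place of `Q6` and `Q5`.

```python
import pandas as pd

# Toy respondents for illustration only
df = pd.DataFrame({
    "Q6": ["0-49 employees"] * 4 + ["> 10,000 employees"] * 2,
    "Q5": ["Data Scientist", "Software Engineer", "Software Engineer", "Data Analyst",
           "Data Scientist", "Data Analyst"],
})
# normalize="index" divides each row by its total, so every row sums to 100
shares = pd.crosstab(df["Q6"], df["Q5"], normalize="index") * 100
print(shares.round(1))
```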


In [None]:
bi_variant_chart("Q6","Q10","Company size VS Salary","",mcr_brazil)

Insights

- 23 startup employees report a yearly compensation of **$0-999**.
- 15 startup employees report a yearly compensation of **$30,000-39,999**.
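
`Q10` answers are text ranges such as `$0-999` and `30,000-39,999`, so they sort alphabetically rather than by amount. `lower_bound` is a hypothetical helper that extracts the first number of each bracket to give a numeric sort key. The labels below are illustrative, so confirm them against `mcr_brazil.Q10.unique()`.

```python
import re

import pandas as pd

def lower_bound(label: str) -> int:
    """Return the first number in a salary bracket label, e.g. '30,000-39,999' -> 30000."""
    match = re.search(r"\d[\d,]*", label)
    if match is None:
        raise ValueError(f"no number in {label!r}")
    return int(match.group().replace(",", ""))

# Illustrative bracket labels, deliberately out of order
brackets = pd.Series(["30,000-39,999", "$0-999", "> $500,000", "5,000-7,499"])
ordered = sorted(brackets.unique(), key=lower_bound)
print(ordered)  # ['$0-999', '5,000-7,499', '30,000-39,999', '> $500,000']
```

The resulting list could then be handed to Plotly to fix the axis order, for example with `fig.update_xaxes(categoryorder="array", categoryarray=ordered)`.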


# <p><font size="10" color="Green">Women's Representation in Machine Learning and Data Science - Brazil</font></p>



<p><font size="5" color="Blue">What happens in companies of different sizes</font></p>


In [None]:
bi_variant_chart("Q6","Q1","Company size VS age group","Count",mcr_brazil_female)

In [None]:
bi_variant_chart("Q6","Q4","Company size VS highest level of formal education","Count",mcr_brazil_female)

In [None]:
bi_variant_chart("Q6","Q5","Company size VS Designation","Count", mcr_brazil_female)



# <p><font size="5" color="Blue">Salary Range of Female respondents</font></p>


In [None]:
bi_variant_chart("Q6","Q10","Company size VS Salary","", mcr_brazil_female)
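
The women's charts above show counts only. The female share within each company size adds context. That share has to be computed from the full `mcr_brazil` frame rather than the pre-filtered female subset. The following sketch uses a hypothetical helper, `female_share_by`, with toy data standing in for `Q6` and `Q2`.

```python
import pandas as pd

def female_share_by(df: pd.DataFrame, group_col: str, gender_col: str = "Q2") -> pd.Series:
    """Percentage of respondents answering 'Female' within each group of `group_col`.

    Respondents with a missing gender answer still count in the denominator.
    """
    is_female = df[gender_col].eq("Female")
    return is_female.groupby(df[group_col]).mean() * 100

# Toy respondents for illustration only
df = pd.DataFrame({
    "Q6": ["0-49 employees", "0-49 employees", "0-49 employees", "0-49 employees",
           "> 10,000 employees", "> 10,000 employees"],
    "Q2": ["Male", "Female", "Male", "Male", "Female", "Male"],
})
print(female_share_by(df, "Q6"))
```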

<p><font size="3" color="Blue">    


Nowadays, people who are not part of the computerized society find it harder, or even impossible, to perform tasks that new technology-based services have made simpler. Brazilian Kagglers can help these people through study kernels, conference talks, and experience reports.

</font></p>

<p><font size="10" color="Blue">
Teaching programming is the new literacy
</font></p>

In [None]:
HTML('<iframe width="1280" height="720" src="https://www.youtube.com/embed/zBqPg80l7xA" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')


<p><font size="10" color="Blue">Invest in the future</font></p>


<p><font size="6" color="Green">Challenges of educating in the digital age</font></p>
In **Brazil**, **75%** of parents do not accompany their children in the digital world claiming “lack of time to track activities,” according to research commissioned by McAfee and conducted by Abaco Marketing Research.

About **65%** of children now in elementary school will work in professions that do not exist yet (World Economic Forum). Learning programming, the most used language in the world, is a competitive advantage.

# Brazil has great Kagglers:
[Giba](https://www.kaggle.com/titericz)

[Mario Filho](https://www.kaggle.com/mariofilho)

[Leonardo Pereira](https://www.kaggle.com/kabure)

[Paulo Pinto](https://www.kaggle.com/paulorzp)

[Felipe Mello](https://www.kaggle.com/felipemello)

[Rafael Rui](https://www.kaggle.com/rafarui)

[Bruno G. do Amaral](https://www.kaggle.com/bguberfain)

[Marília Prata](https://www.kaggle.com/mpwolke)

[Henrique Mendonça](https://www.kaggle.com/hmendonca)
# You can make a difference to many who are starting out in the data world!


<p><font size="5" color="Green">This kernel is still a work in progress.</font></p>
<p><font size="3" color="Blue">If you like it, please consider upvoting it.</font></p>
<p><font size="2" color="Green">Don't hesitate to leave your suggestions in the comments section.</font></p>


# Final