# Introduction


In this notebook we analyse the Kaggle Machine Learning and Data Science Survey data to understand how job profiles vary between women and men for the years 2018, 2019 and 2020.

I was surprised to learn that **more female statisticians than male statisticians participated in the survey**, which contrasts with the myth that there are fewer women in STEM fields.

Hope you enjoy the analysis and article.

Please do upvote if you like it.


# Imports

In [None]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Helper Functions

In [None]:
def load_csv(base_dir,file_name):
    """Loads a CSV file into a Pandas DataFrame"""
    file_path = os.path.join(base_dir,file_name)
    df = pd.read_csv(file_path,low_memory=False)
    return df



In [None]:
def count_then_return_percent(dataframe,column_name):
    '''
    A helper function to return value counts as percentages.
    
    It has the following dependencies:
    numpy: 1.18.5; pandas: 1.1.3

    import numpy as np 
    import pandas as pd 
    '''
    
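    # Missing answers are counted as their own category (dropna=False),
    # while the denominator is the number of non-missing answers.
    counts = dataframe[column_name].value_counts(dropna=False)
    percentages = round(counts*100/(dataframe[column_name].count()),1)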
    return percentages

def count_then_return_percent_for_multiple_column_questions(dataframe,list_of_columns_for_a_single_question,dictionary_of_counts_for_a_single_question):
    '''
    A helper function to convert counts to percentages.
    
    It has the following dependencies:
    numpy: 1.18.5; pandas: 1.1.3

    import numpy as np 
    import pandas as pd 
    '''
    
    df = dataframe
    subset = list_of_columns_for_a_single_question
    df = df[subset]
    df = df.dropna(how='all')
    total_count = len(df) 
    dictionary = dictionary_of_counts_for_a_single_question
    for i in dictionary:
        dictionary[i] = round(float(dictionary[i]*100/total_count),1)
    return dictionary 

def create_dataframe_of_counts(dataframe,column,rename_index,rename_column,return_percentages=False):
    '''
    A helper function to create a dataframe of either counts 
    or percentages, for a single multiple choice question.
    
    It has the following dependencies: 
    numpy: 1.18.5; pandas: 1.1.3
    
    import numpy as np 
    import pandas as pd  
    '''
    df = dataframe[column].value_counts().reset_index() 
    if return_percentages==True:
        df[column] = (df[column]*100)/(df[column].sum())
    df = pd.DataFrame(df) 
    # Rename the counted column itself rather than a hard-coded 'Q3'
    df = df.rename({'index':rename_index, column:rename_column}, axis='columns')
    return df

def sort_dictionary_by_percent(dataframe,list_of_columns_for_a_single_question,dictionary_of_counts_for_a_single_question): 
    ''' 
    A helper function that can be used to sort a dictionary.
    
    It is an adaptation of a similar function
    from https://www.kaggle.com/sonmou/what-topics-from-where-to-learn-data-science.
    
    It has the following dependencies:
    numpy: 1.18.5; pandas: 1.1.3

    import numpy as np 
    import pandas as pd 
    '''
    dictionary = count_then_return_percent_for_multiple_column_questions(dataframe,
                                                                list_of_columns_for_a_single_question,
                                                                dictionary_of_counts_for_a_single_question)
    # Sort by percentage (ascending) without dropping entries that share a value
    dictionary = dict(sorted(dictionary.items(), key=lambda item: item[1]))
    return dictionary
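
As a cross-check on `count_then_return_percent`, pandas can also produce shares directly through `value_counts(normalize=True)`. The sketch below is only an illustration: `toy_df` and its values are made up and are not survey data.

```python
import pandas as pd

# Toy responses for illustration only; these are not survey values.
toy_df = pd.DataFrame({"Q1": ["Male", "Male", "Female", "Male", None]})

# normalize=True divides by the number of non-missing answers,
# the same denominator as dataframe[column_name].count().
toy_percentages = (toy_df["Q1"].value_counts(normalize=True) * 100).round(1)
print(toy_percentages)
```

Unlike the helper, this version leaves missing answers out of the result because `value_counts` drops them by default.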

In [None]:
# Draws bar-chart from python Dictionary
def draw_bar_chart(count_dict, title, orientation='v', color='blue'):
    
    count_series = pd.Series(count_dict)
    
    fig = go.Figure()
    
    if (orientation =='h'):
        angle = 0 
        ys= count_series.index
        xs= count_series.values
    elif (orientation =='v'):
        angle=-45
        xs= count_series.index
        ys= count_series.values
        
    trace = go.Bar(
        x=xs,
        y=ys,
        text=count_series.values,
        textposition='auto',
        marker=dict(
            color=color,
            ),
        orientation=orientation
    )

    fig.add_trace(trace)
    
    
    # Set layout properties for title, axis_tick_angle and background color
    fig.update_layout(
        autosize=False,
        width=800,
        height=500,
        xaxis_tickangle=angle,
        
        plot_bgcolor = 'White',
        
        title=dict(
            text=title,
            y=0.9,
            x=0.5,
            xanchor= 'center',
            yanchor= 'top'
        ),
        
        font=dict(
            family="Arial",
            size=14,
            color="#7f7f7f"
        ),
     )

    fig.show()

def draw_pie_chart(count_series, title, hole=0):
    '''
    This function creates a pie chart (a donut chart when hole > 0).
    
    It has the following dependencies:
    plotly graph objects
    
    import plotly.graph_objects as go
    '''
    labels = count_series.index
    sizes = count_series.values

    trace = go.Pie(labels=labels, values=sizes, hole=hole)

    layout = go.Layout(
        title=dict(
            text=title,
            y=0.5,
            x=0.5,
            xanchor= 'center',
            yanchor= 'top'
        ),
        
        font=dict(
            family="Arial",
            size=14,
            color="#7f7f7f"
        ),
    )

    data = [trace]

    fig = go.Figure(data=data, layout=layout)

    fig.show()
    
def plotly_bar_chart(response_counts,title,y_axis_title,orientation):
    '''
    This function creates a bar chart.
    
    It has the following dependencies:
    plotly express: 0.4.1
    
    import plotly.express as px
    '''
    response_counts_series = pd.Series(response_counts)
    fig = px.bar(response_counts_series,
             labels={"index": '',"value": y_axis_title},
             text=response_counts_series.values,
             orientation=orientation,)
    fig.update_layout(showlegend=False,
                      title={'text': title,
                             'y':0.95,
                             'x':0.5,})
    fig.show()

def plotly_choropleth_map(df, column, title, max_value):
    '''
    This function creates a choropleth map.
    
    It has the following dependencies:
    plotly express: 0.4.1
    
    import plotly.express as px
    '''
    fig = px.choropleth(df, 
                    locations = 'country',  
                    color = column,
                    locationmode = 'country names', 
                    color_continuous_scale = 'viridis',
                    title = title,
                    range_color = [0, max_value])
    fig.update(layout=dict(title=dict(x=0.5)))
    fig.show()


In [None]:
def draw_trend(x_data, y_data, labels = ['Male', 'Female'], colors = ['slateblue', 'mediumvioletred'], mode_size=[12, 12], line_size=[4, 4],  mode='lines+markers',title='Trend'):
    fig = go.Figure()
    
    for i in range(len(y_data)):
        fig.add_trace(go.Scatter(x=x_data[i], y=y_data[i], mode=mode,
            name=labels[i],
            line=dict(color=colors[i], width=line_size[i]),
            connectgaps=True,
        ))
    
    fig.update_layout(
        xaxis=dict(
            showline=True,
            showgrid=False,
            linecolor='rgb(204, 204, 204)',
            linewidth=2,
            ticks='outside',
            tickfont=dict(
                family='Arial',
                size=12,
                color='rgb(82, 82, 82)',
            ),
         ),
        yaxis=dict(
            showgrid=False,
            showline=False,
            showticklabels=False,
        ),
        showlegend=False,
        plot_bgcolor='white'
    )


    annotations = []

    # Adding labels
    for y_trace, label, color in zip(y_data, labels, colors):
        # labeling the left_side of the plot
        annotations.append(dict(xref='paper', x=0.05, y=y_trace[0],
                                  xanchor='right', yanchor='middle',
                                  text=label + ' {}%'.format(y_trace[0]),
                                  font=dict(family='Arial',
                                            size=16),
                                  showarrow=False))
    
        # labeling the right_side of the plot
        annotations.append(dict(xref='paper', x=0.95, y=y_trace[-1],
                                  xanchor='left', yanchor='middle',
                                  text='{}%'.format(y_trace[-1]),
                                  font=dict(family='Arial',
                                            size=16),
                                  showarrow=False))


    # Title
    annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.05,
                              xanchor='left', yanchor='bottom',
                              text=title,
                              font=dict(family='Arial',
                                        size=24,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
    # X-axis label
    annotations.append(dict(xref='paper', yref='paper', x=0.5, y=-0.1,
                              xanchor='center', yanchor='top',
                              text='Year',
                              font=dict(family='Arial',
                                        size=14,
                                        color='rgb(150,150,150)'),
                              showarrow=False))

    fig.update_layout(annotations=annotations)

    fig.show()





In [None]:
from plotly.subplots import make_subplots

def draw_2_bar_chart(count_dict1,count_dict2, title, orientation='v', color1='mediumvioletred',color2='slateblue'):
    
    # Creating two subplots
    fig = make_subplots(rows=1, cols=2)

    # trace1
    count_series1 = pd.Series(count_dict1)
    
    if (orientation =='h'):
        angle = 0 
        ys= count_series1.index
        xs= count_series1.values
    elif (orientation =='v'):
        angle=-45
        xs= count_series1.index
        ys= count_series1.values
        
    trace1 = go.Bar(
        x=xs,
        y=ys,
        text=count_series1.values,
        textposition='auto',
        marker=dict(
        color='rgba(247, 121, 72, 0.6)',
        line=dict(color='rgba(247, 121, 72, 1.0)', width=2)
    ),
        orientation=orientation
    )

    fig.add_trace(trace1,1,1)
    
    # trace2
    count_series2 = pd.Series(count_dict2)
    
       
    if (orientation =='h'):
        angle = 0 
        ys= count_series2.index
        xs= count_series2.values
    elif (orientation =='v'):
        angle=-45
        xs= count_series2.index
        ys= count_series2.values
        
    trace2 = go.Bar(
        x=xs,
        y=ys,
        text=count_series2.values,
        textposition='auto',
        marker=dict(
        color='rgba(89, 72, 247, 0.6)',
        line=dict(color='rgba(89, 72, 247, 1.0)', width=2)
    ),
        orientation=orientation
    )

    fig.add_trace(trace2,1,2)
    
    
    

    # Set layout properties for title, axis_tick_angle and background color
    fig.update_layout(
        autosize=False,
        width=870,
        height=700,
        xaxis_tickangle=angle,
        
        plot_bgcolor = 'White',
        
        title=dict(
            text=title,
            y=0.9,
            x=0.5,
            xanchor= 'center',
            yanchor= 'top'
        ),
        
        font=dict(
            family="Arial",
            size=14,
            color="#7f7f7f"
        ),
        showlegend=False,
     )

    fig.show()


In [None]:
from plotly.subplots import make_subplots

def draw_side_by_side_bar_chart(count_dict1,count_dict2, title, orientation='v', color1='mediumvioletred',color2='slateblue'):
    
    fig = make_subplots(rows=1, cols=1)
    
    # trace1
    count_series1 = pd.Series(count_dict1)
    
    if (orientation =='h'):
        angle = 0 
        ys= count_series1.index
        xs= count_series1.values
    elif (orientation =='v'):
        angle=-45
        xs= count_series1.index
        ys= count_series1.values
        
    trace1 = go.Bar(
        x=xs,
        y=ys,
        text=count_series1.values,
        textposition='auto',
        marker=dict(
        color='rgba(247, 121, 72, 0.6)',
        line=dict(color='rgba(247, 121, 72, 1.0)', width=2)
    ),
        orientation=orientation
    )

    fig.add_trace(trace1,1,1)
    
    # trace2
    count_series2 = pd.Series(count_dict2)
    
       
    if (orientation =='h'):
        angle = 0 
        ys= count_series2.index
        xs= count_series2.values
    elif (orientation =='v'):
        angle=-45
        xs= count_series2.index
        ys= count_series2.values
        
    trace2 = go.Bar(
        x=xs,
        y=ys,
        text=count_series2.values,
        textposition='auto',
        marker=dict(
        color='rgba(89, 72, 247, 0.6)',
        line=dict(color='rgba(89, 72, 247, 1.0)', width=2)
    ),
        orientation=orientation
    )

    fig.add_trace(trace2,1,1)
    
    
    

    # Set layout properties for title, axis_tick_angle and background color
    fig.update_layout(
        autosize=False,
        width=900,
        height=900,
        xaxis_tickangle=angle,
        
        plot_bgcolor = 'White',
        
        title=dict(
            text=title,
            y=0.9,
            x=0.5,
            xanchor= 'center',
            yanchor= 'top'
        ),
        
        font=dict(
            family="Arial",
            size=14,
            color="#7f7f7f"
        ),
        showlegend=False,
     )

    fig.show()


In [None]:
def draw_2_trend_charts(x_data_1, y_data_1, x_data_2, y_data_2, labels = ['Male', 'Female'], colors = ['slateblue', 'mediumvioletred'],
                 mode_size=[12, 12], line_size=[4, 4],  mode='lines+markers',title='Trend', height = 500):
    
    # Creating two subplots
    fig = make_subplots(rows=1, cols=2, subplot_titles=('Females', 'Males'), shared_yaxes=True,
                    horizontal_spacing=0.03)
    
    for i in range(0, len(y_data_1)):
        fig.add_trace(go.Scatter(x=x_data_1[i], y=y_data_1[i], mode=mode,
            name=labels[i],
            line=dict(color=colors[i], width=line_size[i]),
            connectgaps=True,
        ),1,1)
        
       
        
    for i in range(0, len(y_data_2)):
        fig.add_trace(go.Scatter(x=x_data_2[i], y=y_data_2[i], mode=mode,
            name=labels[i],
            line=dict(color=colors[i], width=line_size[i]),
            connectgaps=True,
        ),1,2)
        
    
    # Update xaxis properties
    fig.update_xaxes(title_text="Year", row=1, col=1, showgrid=False, showline=True, linecolor='rgb(204, 204, 204)', linewidth=2,
                    ticks='outside', tickfont=dict(family='Arial',size=12, color='rgb(82, 82, 82)',),)
    fig.update_xaxes(title_text="Year", row=1, col=2, showgrid=False, showline=True, linecolor='rgb(204, 204, 204)', linewidth=2,
                    ticks='outside', tickfont=dict(family='Arial',size=12, color='rgb(82, 82, 82)',))
    
    # Update yaxis properties
    fig.update_yaxes(title_text="% of respondents", row=1, col=1, showgrid=False, showline=False,)
    fig.update_yaxes(title_text="% of respondents", row=1, col=2, showgrid=False, showline=False,)

    fig.update_layout(
        #showlegend=False,
        plot_bgcolor='white',
        title_text = title,
        height=height
    )
    
    fig.show()

# 2018 Kaggle Survey Data

In [None]:
# Load 2018 data


base_dir = '/kaggle/input/kaggle-survey-2018/'
file_name = 'multipleChoiceResponses.csv'
survey_2018_df = load_csv(base_dir,file_name)
responses_2018_df = survey_2018_df[1:]

# Preview 2018 data

print('Total Number of Responses: ',responses_2018_df.shape[0])
print('\nPreview of the data:')
responses_2018_df.head(2)


# 2019 Kaggle Survey Data

In [None]:
# Load 2019 data


base_dir = '/kaggle/input/kaggle-survey-2019/'
file_name = 'multiple_choice_responses.csv'
survey_2019_df = load_csv(base_dir,file_name)
responses_2019_df = survey_2019_df[1:]

# Preview 2019 data

print('Total Number of Responses: ',responses_2019_df.shape[0])
print('\nPreview of the data:')
responses_2019_df.head(2)

# 2020 Kaggle Survey Data

In [None]:
# Load 2020 data

base_dir = '/kaggle/input/kaggle-survey-2020'
file_name = 'kaggle_survey_2020_responses.csv'
survey_2020_df = load_csv(base_dir,file_name)
responses_2020_df = survey_2020_df[1:]


# Preview 2020 data

print('Total Number of Responses: ',responses_2020_df.shape[0])
print('\nPreview of the data:')
responses_2020_df.head(2)
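
Each loading cell above drops the first row with `[1:]`. This is presumably because that row holds the question text rather than a respondent's answers (an assumption worth confirming with `survey_2020_df.head(1)`). The sketch below shows the idea on a made-up two-column file; `toy_csv` and its contents are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file that mimics the assumed layout: a header row of
# question ids, one row of question text, then the actual responses.
toy_csv = io.StringIO(
    "Q1,Q2\n"
    "What is your gender?,What is your age?\n"
    "Male,25-29\n"
    "Female,30-34\n"
)
toy_survey_df = pd.read_csv(toy_csv)

# Keep the question text for reference, then slice it off the responses.
toy_questions = toy_survey_df.iloc[0]
toy_responses_df = toy_survey_df.iloc[1:].reset_index(drop=True)
print(len(toy_responses_df))
```

Keeping `toy_questions` around makes it easy to look up what a column such as `Q1` actually asked.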



# Get Sorted Data for Females and Males

In [None]:
question_name = 'Q1'#Gender
sorted_percentages_2018 = count_then_return_percent(responses_2018_df,question_name)
sorted_percentages_2018

In [None]:
question_name = 'Q2'#Gender
sorted_percentages_2019 = count_then_return_percent(responses_2019_df,question_name)
sorted_percentages_2019

In [None]:
question_name = 'Q2'#Gender
sorted_percentages_2020 = count_then_return_percent(responses_2020_df,question_name)
sorted_percentages_2020

In [None]:

y_data = np.array([
    [ sorted_percentages_2018['Male'], sorted_percentages_2019['Male'], sorted_percentages_2020['Man']],
    [ sorted_percentages_2018['Female'], sorted_percentages_2019['Female'], sorted_percentages_2020['Woman']],
    
])

x_data = np.vstack((np.array(['2018', '2019', '2020'], dtype='datetime64'),) * len(y_data))

title = 'Trend of Number of Females and Males'

labels = ['Male', 'Female']

colors = ['rgba(89, 72, 247, 1.0)', 'rgba(247, 121, 72, 1.0)']

draw_trend(x_data, y_data, labels, colors, title=title)


### Observations

There is clearly a very **high percentage** (around 80%) of **male respondents** and a small share (around 20%) of female respondents.

The trend is as follows.

In 2019, compared with 2018, there was an **increase** in the % of **male Kagglers** and a decrease in the % of female Kagglers.

But in 2020, compared with 2019, there was an **increase** in the % of **female Kagglers** and a decrease in the % of male Kagglers.
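
The gender labels differ between years ('Male'/'Female' in 2018 and 2019, 'Man'/'Woman' in 2020), so a year-over-year comparison needs them mapped onto one vocabulary first. The following is a minimal sketch of that step; `GENDER_LABELS`, `harmonised_gender_share` and the toy columns are illustrative names and made-up values, not part of the survey.

```python
import pandas as pd

# Map each year's wording onto one shared label.
GENDER_LABELS = {"Male": "Male", "Man": "Male", "Female": "Female", "Woman": "Female"}

def harmonised_gender_share(gender_column):
    """Return the percentage of Male and Female answers in one year's column."""
    mapped = gender_column.map(GENDER_LABELS)  # any other answer becomes NaN
    counts = mapped.value_counts()
    return (counts * 100 / gender_column.count()).round(1)

# Toy columns for illustration only; these are not survey values.
toy_2019 = pd.Series(["Male", "Male", "Male", "Female", "Prefer not to say"])
toy_2020 = pd.Series(["Man", "Man", "Woman", "Woman", "Nonbinary"])

# Year-over-year change in percentage points, aligned on the shared labels.
change = harmonised_gender_share(toy_2020) - harmonised_gender_share(toy_2019)
print(change)
```

A positive value means that group's share grew from one year to the next.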

In [None]:
female_2018 = responses_2018_df[responses_2018_df['Q1']=='Female']
male_2018 = responses_2018_df[responses_2018_df['Q1']=='Male']


question_name = 'Q6'#Job Titles

sorted_percentages_female_2018 = count_then_return_percent(female_2018,question_name).iloc[::-1]
sorted_percentages_male_2018 = count_then_return_percent(male_2018,question_name).iloc[::-1]


title_for_chart = 'Most Common Job Titles for Females and Males in 2018'
title_for_y_axis = '% of respondents'
orientation_for_chart = 'h'

draw_2_bar_chart(sorted_percentages_female_2018, sorted_percentages_male_2018,
                 title=title_for_chart,
                 #y_axis_title=title_for_y_axis,
                 orientation=orientation_for_chart,
                 color1 = colors[1],color2=colors[0]) 

draw_side_by_side_bar_chart(sorted_percentages_female_2018, sorted_percentages_male_2018,
                 title=title_for_chart,
                 #y_axis_title=title_for_y_axis,
                 orientation='h',
                 color1 = 'mediumvioletred',color2='slateblue',
                ) 




In [None]:
female_2019 = responses_2019_df[responses_2019_df['Q2']=='Female']
male_2019 = responses_2019_df[responses_2019_df['Q2']=='Male']


question_name = 'Q5'#Job Titles

sorted_percentages_female_2019 = count_then_return_percent(female_2019,question_name).iloc[::-1]
sorted_percentages_male_2019 = count_then_return_percent(male_2019,question_name).iloc[::-1]


title_for_chart = 'Most Common Job Titles for Females and Males in 2019'
title_for_y_axis = '% of respondents'
orientation_for_chart = 'h'

draw_2_bar_chart(sorted_percentages_female_2019, sorted_percentages_male_2019,
                 title=title_for_chart,
                 #y_axis_title=title_for_y_axis,
                 orientation=orientation_for_chart,
                 color1 = colors[1],color2=colors[0]) 

draw_side_by_side_bar_chart(sorted_percentages_female_2019, sorted_percentages_male_2019,
                 title=title_for_chart,
                 #y_axis_title=title_for_y_axis,
                 orientation='h',
                 color1 = 'mediumvioletred',color2='slateblue',
                ) 


In [None]:
# Split datasets genderwise

male_2020 = responses_2020_df[responses_2020_df['Q2']=='Man']
female_2020 = responses_2020_df[responses_2020_df['Q2']=='Woman']

question_name = 'Q5'#Job Titles

sorted_percentages_female_2020 = count_then_return_percent(female_2020,question_name).iloc[::-1]
sorted_percentages_male_2020 = count_then_return_percent(male_2020,question_name).iloc[::-1]


title_for_chart = 'Most Common Job Titles for Females and Males in 2020'
title_for_y_axis = '% of respondents'
orientation_for_chart = 'h'

draw_2_bar_chart(sorted_percentages_female_2020, sorted_percentages_male_2020,
                 title=title_for_chart,
                 #y_axis_title=title_for_y_axis,
                 orientation=orientation_for_chart,
                 color1 = colors[1],color2=colors[0]) 


draw_side_by_side_bar_chart(sorted_percentages_female_2020, sorted_percentages_male_2020,
                 title=title_for_chart,
                 #y_axis_title=title_for_y_axis,
                 orientation='h',
                 color1 = 'mediumvioletred',color2='slateblue',
                ) 


In [None]:
y_data_1 = np.array([
    [sorted_percentages_female_2018['Student'], sorted_percentages_female_2019['Student'], sorted_percentages_female_2020['Student']],
    [sorted_percentages_female_2018['Not employed'], sorted_percentages_female_2019['Not employed'], sorted_percentages_female_2020['Currently not employed']],
    
])

x_data_1 = np.vstack((np.array(['2018', '2019', '2020'], dtype='datetime64'),) * len(y_data_1))


y_data_2 = np.array([
    [sorted_percentages_male_2018['Student'], sorted_percentages_male_2019['Student'], sorted_percentages_male_2020['Student']],
    [sorted_percentages_male_2018['Not employed'], sorted_percentages_male_2019['Not employed'], sorted_percentages_male_2020['Currently not employed']],
    
])

x_data_2 = np.vstack((np.array(['2018', '2019', '2020'], dtype='datetime64'),) * len(y_data_2))

title = 'Trend of Students and Not Employed Respondents'

colors = ['rgba(89, 72, 247, 1.0)', 'rgba(247, 121, 72, 1.0)']

labels = ['Student', 'Not employed']

draw_2_trend_charts(x_data_1, y_data_1, x_data_2, y_data_2, labels, colors, title=title)

In [None]:


y_data_1 = np.array([
    [sorted_percentages_female_2018['Data Scientist'], sorted_percentages_female_2019['Data Scientist'], sorted_percentages_female_2020['Data Scientist']],
    [sorted_percentages_female_2018['Software Engineer'], sorted_percentages_female_2019['Software Engineer'], sorted_percentages_female_2020['Software Engineer']],
    [sorted_percentages_female_2018['Data Analyst'], sorted_percentages_female_2019['Data Analyst'], sorted_percentages_female_2020['Data Analyst']],
    [sorted_percentages_female_2018['Business Analyst'], sorted_percentages_female_2019['Business Analyst'], sorted_percentages_female_2020['Business Analyst']],
    [sorted_percentages_female_2018['Research Scientist'], sorted_percentages_female_2019['Research Scientist'], sorted_percentages_female_2020['Research Scientist']],
    ])


x_data_1 = np.vstack((np.array(['2018', '2019', '2020'], dtype='datetime64'),) * len(y_data_1))

y_data_2 = np.array([
    [sorted_percentages_male_2018['Data Scientist'], sorted_percentages_male_2019['Data Scientist'], sorted_percentages_male_2020['Data Scientist']],
    [sorted_percentages_male_2018['Software Engineer'], sorted_percentages_male_2019['Software Engineer'], sorted_percentages_male_2020['Software Engineer']],
    [sorted_percentages_male_2018['Data Analyst'], sorted_percentages_male_2019['Data Analyst'], sorted_percentages_male_2020['Data Analyst']],
    [sorted_percentages_male_2018['Business Analyst'], sorted_percentages_male_2019['Business Analyst'], sorted_percentages_male_2020['Business Analyst']],
    [sorted_percentages_male_2018['Research Scientist'], sorted_percentages_male_2019['Research Scientist'], sorted_percentages_male_2020['Research Scientist']],
    ])


x_data_2 = np.vstack((np.array(['2018', '2019', '2020'], dtype='datetime64'),) * len(y_data_2))


title = 'Trend of More Common Job Titles'

colors = [ 'green','steelblue','deeppink','mediumvioletred', 'maroon']

labels = ['Data Scientist', 'Software Engineer','Data Analyst','Business Analyst','Research Scientist']

mode_size=[12] * len(y_data_2)

line_size=[3] * len(y_data_2)

draw_2_trend_charts(x_data_1, y_data_1, x_data_2, y_data_2, labels, colors, mode_size, line_size, title=title)

### Observations

#### *More common Job Profiles*

The trend remains the same from year to year.

###### Females

The order of job profiles among females is 
Data Scientist, *Data Analyst, Software Engineer*, Research Scientist, Business Analyst.

###### Males
The order of job profiles among males is 
Data Scientist, *Software Engineer, Data Analyst*, Research Scientist, Business Analyst.



The second line (pink) and third line (blue) of the left chart (females) are interchanged in the right chart (males). 

Among female respondents, there are more Data Analysts than Software Engineers. 
In contrast, among male respondents, there are more Software Engineers than Data Analysts.

Other than that, the order of job profiles is the same for females and males in 2018, 2019 and 2020.
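
The ordering comparison above can also be done in code rather than read off the chart. This is a sketch under stated assumptions: `titles_by_share` is a hypothetical helper, and the toy percentages are invented for illustration and are not survey results.

```python
import pandas as pd

def titles_by_share(percentages, titles):
    """Return the given titles ordered from most to least common."""
    return list(percentages[titles].sort_values(ascending=False).index)

# Toy percentages for illustration only; these are not survey values.
toy_female = pd.Series({"Data Scientist": 20.0, "Data Analyst": 12.0, "Software Engineer": 10.0})
toy_male = pd.Series({"Data Scientist": 18.0, "Data Analyst": 8.0, "Software Engineer": 14.0})
titles = ["Data Scientist", "Software Engineer", "Data Analyst"]

female_order = titles_by_share(toy_female, titles)
male_order = titles_by_share(toy_male, titles)

# Positions (1-based) at which the two orderings disagree.
swapped = [i + 1 for i, (f, m) in enumerate(zip(female_order, male_order)) if f != m]
print(female_order, male_order, swapped)
```

With the real data, the same call could be made on `sorted_percentages_female_2020` and `sorted_percentages_male_2020`.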






In [None]:
y_data_1 = np.array([
    [sorted_percentages_female_2018['Data Engineer'], sorted_percentages_female_2019['Data Engineer'], sorted_percentages_female_2020['Data Engineer']],
    [sorted_percentages_female_2018['DBA/Database Engineer'], sorted_percentages_female_2019['DBA/Database Engineer'], sorted_percentages_female_2020['DBA/Database Engineer']],
    [sorted_percentages_female_2018['Statistician'], sorted_percentages_female_2019['Statistician'], sorted_percentages_female_2020['Statistician']],
    [sorted_percentages_female_2018['Product/Project Manager'], sorted_percentages_female_2019['Product/Project Manager'], sorted_percentages_female_2020['Product/Project Manager']],
])


x_data_1 = np.vstack((np.array(['2018', '2019', '2020'], dtype='datetime64'),) * len(y_data_1))

y_data_2 = np.array([
    [sorted_percentages_male_2018['Data Engineer'], sorted_percentages_male_2019['Data Engineer'], sorted_percentages_male_2020['Data Engineer']],
    [sorted_percentages_male_2018['DBA/Database Engineer'], sorted_percentages_male_2019['DBA/Database Engineer'], sorted_percentages_male_2020['DBA/Database Engineer']],
    [sorted_percentages_male_2018['Statistician'], sorted_percentages_male_2019['Statistician'], sorted_percentages_male_2020['Statistician']],
    [sorted_percentages_male_2018['Product/Project Manager'], sorted_percentages_male_2019['Product/Project Manager'], sorted_percentages_male_2020['Product/Project Manager']],
])


x_data_2 = np.vstack((np.array(['2018', '2019', '2020'], dtype='datetime64'),) * len(y_data_2))


title = 'Trend of Less Common Job Titles'

colors = [ 'mediumblue', 'tomato',  'orange' ,'peru']

labels = ['Data Engineer', 'DBA/Database Engineer','Statistician','Product/Project Manager']

mode_size=[12] * len(y_data_2)

line_size=[3] * len(y_data_2)

draw_2_trend_charts(x_data_1, y_data_1, x_data_2, y_data_2, labels, colors, mode_size, line_size, title=title)

### Observations
#### Less common Job Profiles 
The trend varies from year to year.

###### Females

###### For year 2018

The order from most common to least common job profile is Data Engineer, Statistician, Product/Project Manager, DBA/Database Engineer.

###### For year 2019
There are more Product/Project Manager respondents and fewer Data Engineer respondents.

The brown line (Product/Project Manager) went up, crossing the blue line (Data Engineer) and the orange line (Statistician), as seen in the left (female) chart.

So the order from most common to least common job profile is Product/Project Manager, Statistician, Data Engineer, DBA/Database Engineer.

###### For year 2020

The blue line (Data Engineer) came down even further.
So the order from most common to least common job profile is Product/Project Manager, Data Engineer, Statistician, DBA/Database Engineer.


###### Males 

###### For year 2018

The order from most common to least common job profile is Data Engineer, Product/Project Manager, Statistician, DBA/Database Engineer.

###### For years 2019 and 2020
There are more Product/Project Manager respondents and fewer Data Engineer respondents.

The brown line went up and the blue line came down, as seen in the right (male) chart.

So the order from most common to least common job profile is Product/Project Manager, Data Engineer, Statistician, DBA/Database Engineer.
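
Line crossings like the ones described above correspond to rank changes between years, which can be tabulated directly. The sketch below assumes the yearly percentages are collected into one DataFrame; `toy_shares` and its numbers are made up for illustration and are not survey values.

```python
import pandas as pd

# Toy percentages for illustration only; these are not survey values.
toy_shares = pd.DataFrame(
    {
        "2018": {"Data Engineer": 3.0, "Statistician": 2.5, "Product/Project Manager": 2.0},
        "2019": {"Data Engineer": 2.0, "Statistician": 2.4, "Product/Project Manager": 3.1},
    }
)

# Rank 1 is the most common title within each year (each column).
toy_ranks = toy_shares.rank(ascending=False, method="min").astype(int)

# Titles whose rank differs between the two years.
moved = toy_ranks.index[toy_ranks["2018"] != toy_ranks["2019"]].tolist()
print(toy_ranks)
print(moved)
```

A table of ranks makes year-to-year changes in order easier to verify than reading colours off the chart.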








# Conclusion

##### More Common Job Profiles

1) For both female and male respondents:

   **Data Scientist** is in FIRST place.

   **Data Analyst and Software Engineer** are in SECOND and THIRD place (interchanged between females and males).

   **Research Scientist and Business Analyst** are in FOURTH and FIFTH place for both females and males.

2) Among **female respondents**, there are more **Data Analysts** than Software Engineers. 

3) In contrast, among **male respondents**, there are more **Software Engineers** than Data Analysts.


##### Less Common Job Profiles

1) For both female and male respondents:

   **Product/Project Manager** is at the TOP.

   **DBA/Database Engineer** is at the BOTTOM. 

   **Statistician and Data Engineer** are IN BETWEEN the above two.

2) Among **female respondents**, there is a **higher percentage of Statisticians** than Data Engineers.

3) Among **male respondents**, there is a **higher percentage of Data Engineers** than Statisticians.

Thanks for Reading.

If you liked it, kindly upvote.