![How-to-Learn-Computer-Programming-Languages-1160x591.png](attachment:bf5b5655-b291-433b-9b80-fb10f606e14b.png)

Table of Contents:
1. [**Introduction**](#section-zero)
2. [**Popularity**](#section-one)
3. [**Annual Salaries**](#section-two)
4. [**Country/Location**](#section-three)
5. [**Conclusion**](#section-four)
6. [**Sources**](#section-five)

<a id="section-zero"></a>
<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #46ABFF;
           background-color:#0065B9;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:center; color:white;"><b>Introduction</b></h1>
    <p style="line-height:120%; color:white;">&nbsp; &nbsp; &nbsp;In this article we analyze Kaggle's surveys from 2019, 2020 and 2021 to find the most appropriate programming language for your field, or for the area you intend to join, by looking at the aspects that matter most to data scientists, analysts and similar roles. Beginners usually wonder which programming language to learn first to start a successful journey in DS. Most people tell beginners to start with Python, but a few tell them to start with R because of its power in statistics and data visualization, and some R users recommend it because, they say, it leads to higher salaries than Python. Is that true, or is it a bias towards R? R users also say that R is popular (popular in jobs), while Pythonistas (Python users) say that Python is the most popular language compared to R or any other programming language.</p>
    <p style="line-height:120%; color:white;">&nbsp; &nbsp; &nbsp;We use bar graphs for our analysis because they are simple, explain the idea clearly, and make it easy to track how each programming language changes over the years. All of our plots (or at least most of them, since we might update the article) are drawn with the <a href="https://plotly.com/python-api-reference/index.html" style="color:white;"><u>Plotly</u></a> library, so you can zoom in and out and download each graph. The aspects we will address are:</p>
    <ul>
        <li style="font-size:15px">Popularity</li>
        <li style="font-size:15px">Annual salaries</li>
        <li style="font-size:15px">Country</li>
        <li style="font-size:15px">Job title</li>
    </ul>
    
</div>

In [None]:
# Data Analysis
import numpy as np
import pandas as pd
import random

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

# Remove warnings
import warnings
warnings.filterwarnings('ignore')


# Data
survey19 = pd.read_csv('../input/surveys-2020-2019/kaggle-survey-2019/multiple_choice_responses.csv')
survey20 = pd.read_csv('../input/surveys-2020-2019/kaggle-survey-2020/kaggle_survey_2020_responses.csv')
survey21 = pd.read_csv('../input/kaggle-survey-2021/kaggle_survey_2021_responses.csv')

# Shorten long country names (assign the result back instead of relying on inplace=True on a column view)
survey21['Q3'] = survey21['Q3'].replace({'United Kingdom of Great Britain and Northern Ireland':'UK',
                                         'Iran, Islamic Republic of...':'Iran',
                                         'United Arab Emirates':'UAE',
                                         'United States of America':'USA',
                                         'Viet Nam':'Vietnam'})

<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #A52A2A;
           background-color: #CC3300;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <p style="line-height:110%; color:white;"><b>Warning</b>: we exclude complete beginners in DS (those who don't use a programming language).</p>
</div>

In [None]:
# Drop the first row, which holds the question text rather than a response
survey19 = survey19[1:]
survey20 = survey20[1:]
survey21 = survey21[1:]

# To specify the languages
prog_list = ['Python', 'R', 'SQL', 'Java', 'MATLAB', 'C', 'Bash', 'Javascript', 'C++']
prog_langs = pd.Series(prog_list)

# Excluding None and null in programming languages
survey19 = survey19[survey19['Q19'].isin(prog_langs)]
survey20 = survey20[survey20['Q8'].isin(prog_langs)]
survey21 = survey21[survey21['Q8'].isin(prog_langs)]

<a id='section-one'></a>
<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #46ABFF;
           background-color:#0065B9;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:center; color:white;"><b>Popularity</b></h1>
</div>

&nbsp; &nbsp; &nbsp; We will first look at DS in general and then at specific fields such as Data Analysis, Data Science and ML. R users are more likely to use R only for data analysis or statistics, so the general picture may hide differences between fields: R may look like nothing compared to Python overall, so be careful and don't let a single graph fool you.

In [None]:
def plot_in_field(df19, df20, df21, langs=('Python', 'R', 'SQL'), field_name='DS in General'):
    langs = list(langs)

    # Percentage of respondents per language in each survey year
    # (the language question is Q19 in 2019 and Q8 in 2020 and 2021)
    pct19 = df19['Q19'].value_counts(normalize=True) * 100
    pct20 = df20['Q8'].value_counts(normalize=True) * 100
    pct21 = df21['Q8'].value_counts(normalize=True) * 100

    # reindex fills languages missing from a subset with 0 instead of raising KeyError
    fig = go.Figure(data=[
        go.Bar(name='2019', x=langs, y=pct19.reindex(langs, fill_value=0)),
        go.Bar(name='2020', x=langs, y=pct20.reindex(langs, fill_value=0)),
        go.Bar(name='2021', x=langs, y=pct21.reindex(langs, fill_value=0))
    ])
    # Change the bar mode
    fig.update_layout(barmode='group',
                      xaxis={'title':{'text':'Language'}},
                      yaxis={'title':{'text':'Percentage'}},
                      title={'text':f'The state of programming languages over the years in {field_name}'})
    fig.show()

plot_in_field(survey19, survey20, survey21, langs=prog_list)

So in general:

- Python leads every other language by a wide margin, and its share generally increases.
- R usage decreases over the years.
- SQL is fairly stable over the years.
- Languages other than Python, R and SQL have too small a share to be serious contenders in DS.
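
The percentages behind these bars can be sketched on a small example. This is only an illustration: the `answers` series is made-up toy data (not survey results), and `langs` is a hypothetical list chosen to show how a language with no answers is handled.

```python
import pandas as pd

# Toy answers to the language question (hypothetical data, not survey results)
answers = pd.Series(['Python', 'Python', 'R', 'SQL', 'Python', None, 'R', 'Python'])

# value_counts(normalize=True) ignores missing answers and returns fractions that sum to 1
shares = answers.value_counts(normalize=True) * 100

# reindex keeps a fixed language order and fills absent languages with 0
langs = ['Python', 'R', 'SQL', 'Java']
shares = shares.reindex(langs, fill_value=0)

print(shares.round(1))
```

Because every year is converted to percentages of its own respondents, years with different numbers of respondents can be compared on the same axis.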

<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #146575;
           background-color: #3CB0B0;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <p style="line-height:120%; color:white;"><b>Expectation</b>: if this trend continues, R will be of little use by 2024, so learn Python if you haven't already!</p>
</div>

Now let's focus on specific fields, restricted to three languages: Python, R and SQL.

In [None]:
survey19_field = survey19[survey19['Q5'] == 'Data Analyst']
survey20_field = survey20[survey20['Q5'] == 'Data Analyst']
survey21_field = survey21[survey21['Q5'] == 'Data Analyst']

plot_in_field(survey19_field, survey20_field, survey21_field, field_name='Data Analysis')

The result is the same as before, except that R and SQL have a bigger share than in general, and SQL is used more than R.

In [None]:
survey19_field = survey19[survey19['Q5'] == 'Data Scientist']
survey20_field = survey20[survey20['Q5'] == 'Data Scientist']
survey21_field = survey21[survey21['Q5'] == 'Data Scientist']

plot_in_field(survey19_field, survey20_field, survey21_field, field_name='Data Science')

This looks the same as before, except that SQL has the smallest share.

In [None]:
survey19_field = survey19[survey19['Q5'] == 'Data Engineer']
survey20_field = survey20[survey20['Q5'] == 'Data Engineer']
survey21_field = survey21[survey21['Q5'] == 'Data Engineer']

plot_in_field(survey19_field, survey20_field, survey21_field, field_name='Data Engineering')

In Data Engineering, SQL is used more than R.

In [None]:
survey19_field = survey19[survey19['Q5'] == 'DBA/Database Engineer']
survey20_field = survey20[survey20['Q5'] == 'DBA/Database Engineer']
survey21_field = survey21[survey21['Q5'] == 'DBA/Database Engineer']

plot_in_field(survey19_field, survey20_field, survey21_field, field_name='DBA/Database Engineering')

This is similar to the previous graphs, except that SQL drops sharply from 20% to 10%, while over the same years Python jumps from 60% to 79%.
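
The per-field comparisons above can also be produced in a single table. The following is a minimal sketch on made-up data: the `toy` frame is hypothetical and only reuses the column names `Q5` (job title) and `Q8` (language) from the 2020 and 2021 surveys.

```python
import pandas as pd

# Hypothetical respondents (toy data): Q5 is the job title, Q8 the language
toy = pd.DataFrame({
    'Q5': ['Data Analyst', 'Data Analyst', 'Data Scientist', 'Data Scientist',
           'Data Engineer', 'Data Engineer', 'Data Analyst', 'Data Scientist'],
    'Q8': ['Python', 'SQL', 'Python', 'R', 'SQL', 'Python', 'R', 'Python'],
})

# normalize='index' makes every row (job title) sum to 1, so fields of
# different sizes become directly comparable
share_by_field = pd.crosstab(toy['Q5'], toy['Q8'], normalize='index') * 100

print(share_by_field.round(1))
```

Each row is one job title and each column one language, so a single table replaces one filtered DataFrame per field.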

<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #46ABFF;
           background-color:#0065B9;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:center; color:white;"><b>Popularity Summary</b></h1>
    <ul>
        <li style="font-size:15px">Python is the most used programming language, and its percentage increases over the years.</li>
        <li style="font-size:15px">R usage decreases over the years in all fields, and at that rate of decrease it will eventually disappear.</li>
        <li style="font-size:15px">SQL is stable over the years and is used more than R in all fields except for the Data Scientist job title, but it is still nothing compared to Python, which also overtook it and took some of its share in DBA/Database Engineering in 2020.</li>
        <li style="font-size:15px">Languages other than Python, R and SQL aren't used seriously in DS.</li>
    </ul>
</div>

<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #006000;
           background-color: #3CB03C;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <p style="line-height:120%; color:white;"><b>Suggestion</b>: Kaggle Team, please add a follow-up question asking those who choose a language other than Python, R or SQL why they use it.</p>
</div>

&nbsp; &nbsp; &nbsp; So far we fixed our three languages and changed the specialization. What about fixing the specialization (to Data in general, or to two or three specializations) and changing the salary instead? Maybe R users choose R because it pays more than Python or SQL: their share is not negligible, unlike that of the other programming languages.

<a id='section-two'></a>
<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #46ABFF;
           background-color:#0065B9;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:center; color:white;"><b>Annual Salary</b></h1>
</div>

In [None]:
salaries = ['$0-999', '1,000-1,999', '2,000-2,999', '3,000-3,999', '4,000-4,999', '5,000-7,499', '7,500-9,999', '10,000-14,999',
         '15,000-19,999', '20,000-24,999', '25,000-29,999', '30,000-39,999', '40,000-49,999', '50,000-59,999', '60,000-69,999',
         '70,000-79,999', '80,000-89,999', '90,000-99,999', '100,000-124,999', '125,000-149,999', '150,000-199,999', '200,000-249,999',
         '250,000-299,999', '300,000-499,999', '$500,000-999,999', '>$1,000,000']

R_users = survey21[survey21['Q8'] == 'R']
Python_users = survey21[survey21['Q8'] == 'Python']

# R
R_count = R_users['Q25'].value_counts()
R_count = R_count*100 / R_count.sum()
dR = {salary:count for salary, count in zip(R_count.index, R_count)}

# Python
Python_count = Python_users['Q25'].value_counts()
Python_count = Python_count*100 / Python_count.sum()
dPython = {salary:count for salary, count in zip(Python_count.index, Python_count)}

# Layout
layout = go.Layout(
    autosize=False,
    width=1000,
    height=1000)

# Figure setting
fig = go.Figure(data=[
        # .get(salary, 0) avoids a KeyError when no respondent falls in a bracket
        go.Bar(name='R_users', y=salaries, x=[dR.get(salary, 0) for salary in salaries], orientation='h'),
        go.Bar(name='Python_users', y=salaries, x=[dPython.get(salary, 0) for salary in salaries], orientation='h')
    ], layout=layout)
    
    
# Divide ranges
fig.add_hline(y=8.5, line_dash='dash', annotation_text="First Range", annotation_position="bottom right",
              annotation=dict(font_size=20, font_family="Times New Roman"),
              fillcolor="green")

fig.add_hline(y=22.5, line_dash='dash', annotation_text="Second Range", annotation_position="bottom right",
              annotation=dict(font_size=20, font_family="Times New Roman"),
              fillcolor="green")

fig.add_hline(y=26.5, line_dash='dash', annotation_text="Third Range", annotation_position="bottom right",
              annotation=dict(font_size=20, font_family="Times New Roman"),
              fillcolor="green", line_width=0)

# Change the bar mode
fig.update_layout(barmode='group',
                  xaxis={'title':{'text':'Percentage'}},
                  yaxis={'title':{'text':'Salaries'}},
                  title={'text':f'Python Vs R Annual Salaries in 2021 in general'})
fig.show()

We separate the salaries into three ranges: [0, 19,999], [20,000, 299,999] and [300,000 and above]:

- In the first range in general, Python beats R.
- In the second range in general, R beats Python.
- In the last range they appear to be equal.

So R users in general have a better chance of getting a high salary.
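
The three ranges can also be summed numerically instead of being read off the chart. The sketch below is only an illustration: the percentages in `bracket_pct` are made-up toy numbers, and `lower_bound` and `salary_range` are hypothetical helpers that are not part of the notebook.

```python
import pandas as pd

# Hypothetical percentages per salary bracket for one language (toy numbers)
bracket_pct = pd.Series({
    '$0-999': 30.0,
    '10,000-14,999': 20.0,
    '20,000-24,999': 15.0,
    '100,000-124,999': 25.0,
    '300,000-499,999': 6.0,
    '>$1,000,000': 4.0,
})

def lower_bound(label):
    """Parse the lower bound of a label such as '10,000-14,999' or '>$1,000,000'."""
    first = label.replace('$', '').replace('>', '').split('-')[0]
    return int(first.replace(',', ''))

def salary_range(label):
    """Map a bracket label to one of the three ranges used in the text."""
    low = lower_bound(label)
    if low < 20_000:
        return 'First (0 to 19,999)'
    if low < 300_000:
        return 'Second (20,000 to 299,999)'
    return 'Third (300,000+)'

# Passing a function to groupby applies it to every index label
range_pct = bracket_pct.groupby(salary_range).sum()

print(range_pct)
```

Running the same aggregation once for R users and once for Python users would give three numbers per language that can be compared directly.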

Note: this isn't necessarily true. Python users far outnumber R users and include many beginners, which may be why R beats Python in the higher salaries.
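
One way to check this caveat is to compare salaries within groups of similar experience. The sketch below uses entirely made-up respondents: the column names `language`, `experience` and `high_salary` are hypothetical, and the numbers are chosen only to show how a large share of beginners can change the overall picture.

```python
import pandas as pd

# Toy respondents (hypothetical): language, coding experience, and whether the
# salary is 20,000 or more
toy = pd.DataFrame({
    'language': ['Python'] * 6 + ['R'] * 4,
    'experience': ['< 1 year', '< 1 year', '< 1 year', '< 1 year', '5+ years', '5+ years',
                   '< 1 year', '5+ years', '5+ years', '5+ years'],
    'high_salary': [False, False, False, True, True, True,
                    False, True, True, True],
})

# Overall share of high salaries per language
overall = toy.groupby('language')['high_salary'].mean() * 100

# The same share, computed separately inside each experience group
by_experience = toy.groupby(['experience', 'language'])['high_salary'].mean().unstack() * 100

print(overall)
print(by_experience)
```

In this toy data R looks better overall (75% against 50%), yet inside each experience group Python is equal or ahead, simply because most of the Python respondents are beginners. Whether the real survey behaves this way would have to be checked against its experience question.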

In [None]:
def comparison_field(R_users, Python_users, job_title, field_name, salaries, divide=False):

    # Field
    R_users_field = R_users[R_users['Q5'] == job_title]
    Python_users_field = Python_users[Python_users['Q5'] == job_title]

    # R
    R_count = R_users_field['Q25'].value_counts()
    R_count = R_count*100 / R_count.sum()
    dR = {salary:count for salary, count in zip(R_count.index, R_count)}

    # Python
    Python_count = Python_users_field['Q25'].value_counts()
    Python_count = Python_count*100 / Python_count.sum()
    dPython = {salary:count for salary, count in zip(Python_count.index, Python_count)}

    # Layout
    layout = go.Layout(
        autosize=False,
        width=1000,
        height=1000)
    

    # Figure setting
    fig = go.Figure(data=[
            go.Bar(name='R_users', y=salaries, x=[dR[salary] if salary in dR else 0 for salary in salaries], orientation='h'),
            go.Bar(name='Python_users', y=salaries, x=[dPython[salary] if salary in dPython else 0 for salary in salaries], orientation='h')
        ], layout=layout)

    if divide:
        # Divide ranges
        fig.add_hline(y=8.5, line_dash='dash', annotation_text="First Range", annotation_position="bottom right",
                      annotation=dict(font_size=20, font_family="Times New Roman"),
                      fillcolor="green")

        fig.add_hline(y=22.5, line_dash='dash', annotation_text="Second Range", annotation_position="bottom right",
                      annotation=dict(font_size=20, font_family="Times New Roman"),
                      fillcolor="green")

        fig.add_hline(y=26.5, line_dash='dash', annotation_text="Third Range", annotation_position="bottom right",
                      annotation=dict(font_size=20, font_family="Times New Roman"),
                      fillcolor="green", line_width=0)
    
    
    # Change the bar mode
    fig.update_layout(barmode='group',
                      xaxis={'title':{'text':'Percentage'}},
                      yaxis={'title':{'text':'Salaries'}},
                      title={'text':f'Python Vs R Annual Salaries in 2021 in {field_name}'})
    fig.show()
    
comparison_field(R_users, Python_users, 'Data Analyst', 'Data Analysis', salaries, divide=True)

Here we get roughly the same information as for Data in general, except for the third range: <mark>in Data Analysis, the share of Python or R users in the third range ($300,000+) is very small.</mark>

In [None]:
comparison_field(R_users, Python_users, 'Data Scientist', 'Data Science', salaries, divide=True)

<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #46ABFF;
           background-color:#0065B9;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:center; color:white;"><b>Annual Salary Summary</b></h1>
    <ul>
        <li style="font-size:15px">R beats Python in general, and that isn't exclusive to DS in general: it also happens in Data Analysis and Data Science.</li>
        <li style="font-size:15px">Annual salaries above $300,000 are much rarer than any other salary range, and in Data Analysis in particular they are almost non-existent.</li>
    </ul>
</div>

<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #006000;
           background-color: #3CB03C;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h2 style="line-height:120%; color:white;"><b>Suggestions</b>:</h2>
    <ul>
        <li style="font-size:15px">Pythonistas (Python users), if you wish to earn more than 20,000 dollars, it's better to learn R as well (this does not apply if you are aiming for 300,000+ dollars).</li>
        <li style="font-size:15px">Data Analysts, if you wish to earn more than 300,000 dollars a year, it's better to learn some Data Science skills and work as a Data Scientist, because the share of respondents with that salary is much lower for the Data Analyst job title than for the Data Scientist job title.</li>
    </ul>
</div>

&nbsp; &nbsp; &nbsp; Now we turn to the last aspect, and it's an important one: **Location**. Even if you are comfortable with the previous three aspects (salary, popularity and specialization), you will be in trouble if this one doesn't fit. You may be satisfied with your dream salary and field, but when you finish your learning journey and look for a job in your country, you may not find one because employers there use another language (or, in the worst case, you will spend an additional two or three months learning their language), so please take this aspect seriously.

<a id="section-three"></a>
<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #46ABFF;
           background-color:#0065B9;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:center; color:white;"><b>Location</b></h1>
</div>

In [None]:
# R
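# Compute percentages over all R users first, then keep the 60 largest countries,
# so the R and Python percentages share the same kind of denominator
R_count = R_users['Q3'].value_counts(normalize=True) * 100
R_count = R_count.head(60).sort_values(ascending=True)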
dR = {country:count for country, count in zip(R_count.index, R_count)}

# Python
Python_count = Python_users['Q3'].value_counts().sort_values(ascending=True)
Python_count = Python_count*100 / Python_count.sum()
dPython = {country:count for country, count in zip(Python_count.index, Python_count)}

# Layout
layout = go.Layout(
    autosize=False,
    width=1000,
    height=2000)

countries = R_count.index

# Figure setting
fig = go.Figure(data=[
        go.Bar(name='R_users', y=countries, x=[dR[country] for country in countries], orientation='h'),
        # .get(country, 0) avoids a KeyError for a country that has R users but no Python users
        go.Bar(name='Python_users', y=countries, x=[dPython.get(country, 0) for country in countries], orientation='h')
    ], layout=layout)
    

# Change the bar mode
fig.update_layout(barmode='group',
                  xaxis={'title':{'text':'Percentage'}},
                  yaxis={'title':{'text':'countries'}},
                  title={'text':f'Python Vs R Where community exists in 2021'})
fig.show()

That graph tells us where each language's community is located, but it doesn't tell us which language is the most used in each country.
To see that, we draw the same graph using the number of users instead of the percentage.
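
The difference between the two views can be sketched with a cross-tabulation. This is only an illustration on made-up data: the countries `A`, `B` and `C` are hypothetical, and only the column names `Q3` (country) and `Q8` (language) are taken from the 2021 survey.

```python
import pandas as pd

# Toy respondents (hypothetical countries and answers)
toy = pd.DataFrame({
    'Q3': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Q8': ['Python', 'Python', 'Python', 'R', 'Python', 'R', 'R', 'Python', 'Python'],
})

# Raw counts: how many respondents in each country chose each language
counts = pd.crosstab(toy['Q3'], toy['Q8'])

# Within-country share: each row sums to 100
share = pd.crosstab(toy['Q3'], toy['Q8'], normalize='index') * 100

# Language with the largest count in each country
top_language = counts.idxmax(axis=1)

print(counts)
print(top_language)
```

Counts answer "which language has more users in this country", while row-normalized shares make small and large countries comparable.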

In [None]:
# R
R_count = R_users['Q3'].value_counts()[:60].sort_values(ascending=True)
dR = {country:count for country, count in zip(R_count.index, R_count)}

# Python
Python_count = Python_users['Q3'].value_counts().sort_values(ascending=True)
dPython = {country:count for country, count in zip(Python_count.index, Python_count)}

# Layout
layout = go.Layout(
    autosize=False,
    width=1000,
    height=2000)

countries = R_count.index

# Figure setting
fig = go.Figure(data=[
        go.Bar(name='R_users', y=countries, x=[dR[country] for country in countries], orientation='h'),
        # .get(country, 0) avoids a KeyError for a country that has R users but no Python users
        go.Bar(name='Python_users', y=countries, x=[dPython.get(country, 0) for country in countries], orientation='h')
    ], layout=layout)
    

# Change the bar mode
fig.update_layout(barmode='group',
                  xaxis={'title':{'text':'Number of users'}},
                  yaxis={'title':{'text':'countries'}},
                  title={'text':f'Python Vs R countries in 2021 in general'})
fig.show()

So there is no contest: learn Python first, then see if you need to learn R.

<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #46ABFF;
           background-color:#0065B9;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:center; color:white;"><b>Location Summary</b></h1>
    <ul>
        <li style="font-size:15px">Python and R communities exist in many countries, especially in the USA and India (the biggest shares).</li>
        <li style="font-size:15px">Python is the most used in all countries.</li>
    </ul>
</div>

<a id="section-four"></a>
<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #2D2D2D;
           background-color:#696969;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:left; color:white;"><b>Conclusion</b></h1>
    <ul>
        <li style="font-size:15px">Give your location first priority when you choose your programming language if you don't plan to work outside your country.</li>
        <li style="font-size:15px">Keep in mind that Python is the most popular, so most DS jobs in any country will require it first, and its community is very big, so you can easily get answers to your questions.</li>
        <li style="font-size:15px">R will be the best choice if you want a salary above 20,000 dollars.</li>
        <li style="font-size:15px"><b>My main recommendation is to learn Python first, because the community will readily help you as a beginner and you can do with Python everything you want to do with R. Once you are proficient in it, you can learn R as a valuable addition or for those jobs that require both.</b></li>
    </ul>
</div>

<a id="section-five"></a>
<div style="color:white;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color: #2D2D2D;
           background-color:#696969;
           font-size:20px;
           font-family:Verdana;
           padding:20px">
    <h1 style="text-align:left; color:white;"><b>Sources</b></h1>
    <ul>
        <li style="font-size:15px">color codes: <a href="https://www.computerhope.com/htmcolor.htm" style="color:white"><u>Link</u></a> </li>
        <li style="font-size:15px">Plotly Documentation: <a href="https://plotly.com/python-api-reference/index.html" style="color:white"><u>Link</u></a> </li>
        <li style="font-size:15px">Organizing the article: <a href="https://www.kaggle.com/mostafaalaa123/education-level-affects-data-analysis/edit" style="color:white"><u>Link</u></a> </li>
        <li style="font-size:15px">Additional Comparison: <a href="https://www.coursera.org/articles/python-or-r-for-data-analysis" style="color:white"><u>Link</u></a> </li>
    </ul>
</div>