# RobbieD2R2's Analysis of In-Demand Tech Skills (based on the Stack Overflow 2020 Developer Survey)


### Thanks to Stack Overflow's annual survey of professional developers, here is my curated slice of the data, focused on which skills are currently most in demand, in the hope of better understanding how to align tech education with labor market trends.  Kudos to user @razamh, from whom I initially forked.

This analysis is based on over 12,000 survey responses from developers working in the United States.

In [None]:
# Let's get rolling!

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
survey_raw_df = pd.read_csv('/kaggle/input/stack-overflow-developer-survey-2020/developer_survey_2020/survey_results_public.csv')
schema_fname =  '/kaggle/input/stack-overflow-developer-survey-2020/developer_survey_2020/survey_results_schema.csv'
schema_df = pd.read_csv(schema_fname, index_col='Column')
schema_raw = schema_df.QuestionText

# Narrow the dataset to only certain columns for this analysis

selected_columns = [
    # Demographics
    'Country',
    'EdLevel',
    'UndergradMajor',
    # Programming experience
    'LanguageWorkedWith',
    'LanguageDesireNextYear',
    'NEWLearn',
    'NEWStuck',
    # Employment
    'Employment',
    'DevType',
    'JobSat',
    'JobFactors',
    'NEWOvertime',
    'NEWEdImpt',
    'ConvertedComp'
]

survey_df = survey_raw_df[selected_columns].copy()
schema = schema_raw[selected_columns]

# Basic data cleanse: blank out any row whose Country holds several ';'-separated values,
# then keep only responses from the United States

survey_df.where(~(survey_df.Country.str.contains(';', na=False)), np.nan, inplace=True)
survey = survey_df[survey_df['Country'] == 'United States']

# <b>Education Level</b>

## The overwhelming majority of American professional developers have completed some level of college/university study.

In [None]:
import matplotlib
sns.countplot(y=survey.EdLevel)
plt.xticks(rotation=75);
plt.title(schema['EdLevel'])
plt.ylabel(None);

## Very few professional developers majored in something other than Computer Science or Engineering

In [None]:
undergrad_pct = survey.UndergradMajor.value_counts() * 100 / survey.UndergradMajor.count()

sns.barplot(x=undergrad_pct, y=undergrad_pct.index)

plt.title(schema.UndergradMajor)
plt.ylabel(None);
plt.xlabel('Percentage');
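
A quick aside on the percentage math: the cell above divides `value_counts()` by `count()`, while the Employment cell uses `value_counts(normalize=True)`. As far as I can tell these are two spellings of the same thing, since both skip missing answers. Here is a minimal sketch on a made-up toy column (`toy_major` is hypothetical, not survey data):

```python
import pandas as pd

# Hypothetical toy column with one missing answer (not survey data)
toy_major = pd.Series(['CS', 'CS', 'Math', None, 'CS'])

# Spelling 1: counts divided by the number of non-missing answers
manual_pct = toy_major.value_counts() * 100 / toy_major.count()

# Spelling 2: let pandas normalize the counts, then scale to percent
normalized_pct = toy_major.value_counts(normalize=True) * 100

print(manual_pct)
print(normalized_pct)
```

Either way, the percentages are shares of respondents who answered the question, not of everyone in `survey`.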

# <b>Employment</b>

## Only 3% of professional developers were unemployed and looking for work.

In [None]:
(survey.Employment.value_counts(normalize=True, ascending=True)*100).plot(kind='barh', color='g')
plt.title(schema.Employment)
plt.xlabel('Percentage');

# <b>Job Roles</b>

In [None]:
def split_multicolumn(col_series):
    result_df = col_series.to_frame()
    options = []
    # Iterate over the column
    for idx, value in col_series[col_series.notnull()].items():
        # Break each value into list of options
        for option in value.split(';'):
            # Add the option as a column to result
            if option not in result_df.columns:
                options.append(option)
                result_df[option] = False
            # Mark the value in the option column as True
            result_df.at[idx, option] = True
    return result_df[options]

dev_type_df = split_multicolumn(survey.DevType)

dev_type_totals = dev_type_df.sum().sort_values(ascending=False)
dev_type_totals
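
For what it's worth, pandas also ships a built-in that seems to do much the same job as `split_multicolumn`: `Series.str.get_dummies`. Below is a minimal sketch on a made-up toy column (`toy_dev_type` is hypothetical, not survey data). Two differences I'm aware of: the columns come back sorted alphabetically rather than in order of first appearance, and the raw result holds 0/1 integers, hence the `astype(bool)`.

```python
import pandas as pd

# Hypothetical stand-in for a semicolon-delimited survey column
toy_dev_type = pd.Series([
    'Developer, back-end;Developer, full-stack',
    None,  # respondent skipped the question
    'Data scientist or machine learning specialist;Developer, back-end',
])

# One indicator column per option; a missing answer becomes an all-False row
toy_flags = toy_dev_type.str.get_dummies(sep=';').astype(bool)

# Same follow-up step as above: count how often each role was selected
toy_totals = toy_flags.sum().sort_values(ascending=False)
print(toy_totals)
```

I haven't timed it against the loop on the full survey, so treat it as an alternative to try rather than a drop-in guarantee.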

### We can also visualize the different job roles using a word cloud  :-)

In [None]:
from wordcloud import WordCloud

words = ' '.join((job for job in survey.DevType.dropna().str.replace(';', ' ').str.replace(',', ' ')))

wc = WordCloud(collocation_threshold=int(1e6), width=800, height=400, background_color='white').generate(words)

plt.figure(figsize=(16,8))
plt.axis("off")
plt.grid(False)
plt.imshow(wc);



# <b>In-demand skills</b>

### <b>Preferred Programming Languages</b> - Languages relating to Web Dev top the list, followed by SQL (for databases), and then "classic" programming languages.  Notably, TypeScript has been moving up the list quickly over the past few years.

Note: Survey respondents could select multiple languages

In [None]:
languages_worked_df = split_multicolumn(survey.LanguageWorkedWith)
languages_worked_percentages = languages_worked_df.mean().sort_values(ascending=False) * 100
languages_worked_percentages
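
One thing to keep in mind when reading these percentages: as far as I can tell, `split_multicolumn` keeps a row for every respondent, including those who skipped the question, so `.mean()` divides by everyone rather than by those who answered. A minimal sketch of the difference on a made-up toy column (`toy_langs` is hypothetical, and `str.get_dummies` stands in for `split_multicolumn` here):

```python
import pandas as pd

# Hypothetical toy answers; the second respondent skipped the question
toy_langs = pd.Series(['Python;SQL', None, 'Python', 'JavaScript;SQL'])

# Boolean indicator frame, one row per respondent (skipped -> all False)
flags = toy_langs.str.get_dummies(sep=';').astype(bool)

# Denominator = all rows, which is what .mean() on the full frame gives
pct_all_rows = flags.mean() * 100

# Denominator = only respondents who answered the question
answered = toy_langs.notna()
pct_answered = flags[answered].mean() * 100

print(pct_all_rows)
print(pct_answered)
```

Neither denominator is wrong. It just helps to know which one a chart is using before comparing it with other published figures.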

In [None]:
plt.figure(figsize=(12, 12))
sns.barplot(x=languages_worked_percentages, y=languages_worked_percentages.index)
plt.title("Languages used in the past year (as percentage)");
plt.xlabel('Percentage');

### <b>Languages Developers are Interested in Going Forward</b> - A possible indicator of what will be in demand over the next few years. In this list, Web Dev languages drop off a bit relative to "classic" programming languages and databases.

In [None]:
languages_interested_df = split_multicolumn(survey.LanguageDesireNextYear)
languages_interested_percentages = languages_interested_df.mean().sort_values(ascending=False) * 100
languages_interested_percentages

In [None]:
plt.figure(figsize=(12, 12))
sns.barplot(x=languages_interested_percentages, y=languages_interested_percentages.index)
plt.title("Languages people are interested in learning over the next year");
plt.xlabel('Percentage');
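
To put a number on how interest differs from current usage, one option is to subtract the two percentage Series. The sketch below uses made-up toy percentages (`worked_pct` and `desired_pct` are hypothetical and are NOT survey results). With the real data you would presumably plug in `languages_worked_percentages` and `languages_interested_percentages` instead.

```python
import pandas as pd

# Made-up toy percentages for illustration only (not survey results)
worked_pct = pd.Series({'JavaScript': 60.0, 'SQL': 50.0, 'Python': 40.0, 'Rust': 5.0})
desired_pct = pd.Series({'JavaScript': 45.0, 'SQL': 40.0, 'Python': 45.0, 'Rust': 15.0})

# Positive gap: more people want to use the language than currently do.
# Subtraction aligns on the language names in the index.
gap = (desired_pct - worked_pct).sort_values(ascending=False)
print(gap)
```

A positive gap is only a rough signal of rising interest, since the two questions were answered by the same people about different time frames.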

### It's not surprising that Python is the language most in demand.  Its versatility enables it to be used in a variety of domains: application development, numerical computing, data analysis, machine learning, big data, cloud automation, web scraping, scripting, etc. Case in point, Python is being used in this very analysis  :-)

