# How to Become a Rich Developer

This kernel analyzes which kinds of developers earn the most money.

Created by Adhika Setya Pramudita (14/365240/TK/42058)  
For the final assignment of Big Data and Analytics (TIF522)

This report was created with Kaggle and can be accessed at https://www.kaggle.com/adhikasp/how-to-become-a-rich-developer







In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_csv('../input/survey_results_public.csv')

## 1. About the Stack Overflow Developer Survey

Quoted from the Stack Overflow Developer Survey page (https://insights.stackoverflow.com/survey/2017):

> Each year since 2011, Stack Overflow has asked developers about their favorite technologies, coding habits, and work preferences, as well as how they learn, share, and level up. This year represents the largest group of respondents in our history: 64,000 developers took our annual survey in January.

> As the world’s largest and most trusted community of software developers, we run this survey and share these results to improve developers’ lives: We want to empower developers by providing them with rich information about themselves, their industry, and their peers. And we want to use this information to educate employers about who developers are and what they need.

> We learn something new every time we run our survey. This year is no exception:

> - A common misconception about developers is that they've all been programming since childhood. In fact, we see a wide range of experience levels. Among professional developers, 11.3% got their first coding jobs within a year of first learning how to program. A further 36.9% learned to program between one and four years before beginning their careers as developers.

> - Only 13.1% of developers are actively looking for a job. But 75.2% of developers are interested in hearing about new job opportunities.

> - When we asked respondents what they valued most when considering a new job, 53.3% said remote options were a top priority. A majority of developers, 63.9%, reported working remotely at least one day a month, and 11.1% say they’re full-time remote or almost all the time.

> - A majority of developers said they were underpaid. Developers who work in government and non-profits feel the most underpaid, while those who work in finance feel the most overpaid.


## 2. Cleaning the Data

### 2.1. Removing Invalid Data

Rows with a missing (`NaN`) salary cannot be used in this analysis, so we first inspect the `Salary` column and then drop those rows.

In [2]:
data['Salary'].unique()

In [3]:
data = data[data['Salary'].notnull()]
np.sort(data['Salary'].unique())

### 2.2. Defining Salary Threshold

Some of the salaries in the data are suspiciously low (for example \$0 or \$0.015), so we filter those values out. As the threshold, we use the lowest minimum wage among the countries that participated in this survey.

In [4]:
data['Country'].unique()

Cross-checking against the [List of minimum wages by country on Wikipedia](https://en.wikipedia.org/wiki/List_of_minimum_wages_by_country), Uganda has the lowest minimum wage, at \$21 USD. We will use that value to filter our data.

In [5]:
data = data.query('Salary>21')
salary = data['Salary'].unique()
print('Salary range (USD): ' + str(min(salary)) + ' - ' + str(max(salary)))

## 3. Exploring the Data

### 3.1. Descriptive Analysis

Now we can start to look inside our data. Let's see the basic characteristics of the salary data.

In [6]:
data['Salary'].describe()

In [12]:
data.Salary.plot.hist(figsize=(15,10))

We can see that the worldwide average salary for a developer is \$56,456, which is almost double the [average income of a US citizen](https://en.wikipedia.org/wiki/Personal_income_in_the_United_States) (\$31,099). This means that becoming a developer could potentially make you better off than most people.
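
As a quick sanity check of the "almost double" comparison, the ratio of the two quoted figures can be computed directly. This is only a small sketch: the variable names are illustrative, and the values are copied from the text rather than recomputed from the survey data.

```python
# Figures copied from the paragraph above (USD per year); names are illustrative.
developer_mean_salary = 56456
us_average_income = 31099

# How many times larger is the developer figure?
ratio = developer_mean_salary / us_average_income
print(f"Developer salary is {ratio:.2f}x the quoted US figure")
```

The ratio comes out a little above 1.8, which is consistent with "almost double".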



### 3.2. Predictive Analysis

#### 3.2.1. Highest-paid job type

Next, let's break down which kinds of jobs pay the most, based on their average income.

In [8]:
# This code is taken and modified from 
# https://www.kaggle.com/m2skills/simple-exploratory-analysis-visualizations-more

# Get list of unique developer types
developerType = set()
salary_avg_jobs = data['DeveloperType'].drop(data.loc[data['DeveloperType'].isnull()].index)
for datum in salary_avg_jobs:
    for types in datum.split(';'):
        developerType.add(types.strip())
developerType = sorted(list(developerType))

# Clean data, make sure no DeveloperType is null
salary_avg_jobs = data[data['DeveloperType'].notnull()]

# Prepare developer type index to organize the salary
devDict = {}
for index, dev in enumerate(developerType):
    devDict[dev] = index

# Organize the salary based on its job type
devSalaries = [[] for i in range(len(developerType))]
for index, datum in salary_avg_jobs.iterrows():
    devlist = datum['DeveloperType'].split(";")
    for d in devlist:
        devSalaries[devDict[d.strip()]].append(datum['Salary'])

# Calculate the average salary for each job type
Salaries = []
for sal in devSalaries:
    Salaries.append(np.mean(sal))

# Construct the data frame
devSalaries = pd.DataFrame()
devSalaries["DeveloperType"] = developerType
devSalaries["AverageSalary"] = Salaries

# Plot
plt.subplots(figsize=(15,7))
sns.set_style("whitegrid")
sal = sns.barplot(x=devSalaries.DeveloperType, y=devSalaries.AverageSalary, orient="v")
sal.set_xticklabels(devSalaries.DeveloperType, rotation=90)

From the graph, we can see that the "Other" job type has the highest average income, followed by "Machine learning specialist" and "DevOps specialist". My guess is that "Other" either contains outliers or covers fields that require special expertise, which raises compensation. Another notable point is that relatively new and popular fields such as machine learning, DevOps, and data science also pay more than the rest.
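
One way to probe the outlier guess is to look at the median and the number of respondents next to the mean for each job type. The sketch below does this on a small invented table, not the survey data, and uses pandas `str.split` with `explode` as an alternative to the explicit loops above. The `toy` frame and its values are assumptions made purely for illustration.

```python
import pandas as pd

# Toy stand-in for the survey data; the rows are invented for illustration only.
toy = pd.DataFrame({
    "DeveloperType": ["Web developer; Other", "Web developer", "Other", None],
    "Salary": [50000.0, 40000.0, 200000.0, 30000.0],
})

# Split the semicolon-separated field into one row per (respondent, type).
per_type = (
    toy.dropna(subset=["DeveloperType"])
       .assign(DeveloperType=lambda df: df["DeveloperType"].str.split(";"))
       .explode("DeveloperType")
)
per_type["DeveloperType"] = per_type["DeveloperType"].str.strip()

# The mean is sensitive to outliers; median and count help judge how robust it is.
summary = (
    per_type.groupby("DeveloperType")["Salary"]
            .agg(["mean", "median", "count"])
            .sort_values("mean", ascending=False)
)
print(summary)
```

If a job type's mean sits far above its median, or its count is small, a few very high salaries are probably driving the average.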

#### 3.2.2. Highest-paid language

In [13]:
from functools import reduce

# Compile list of languages found in the survey, skipping missing answers
# (otherwise NaN would be counted as a language named 'nan')
language = map((lambda x: x.split('; ')),
               data['HaveWorkedLanguage'].dropna())
# Flatten the list
language = reduce((lambda x, y: x + y), language)
# Remove duplicates and sort for a stable order
language = sorted(set(language))

# Count the users of each language
languageUser={}
for i in language:
    languageUser[i] = data['HaveWorkedLanguage'].apply(
        lambda x: i in str(x).split('; ')).sum()

# Start plotting
lang = pd.DataFrame(list(languageUser.items()), columns=['Language', 'Count'])
lang.set_index('Language', inplace=True)
lang.sort_values('Count', inplace=True)
lang.plot.barh(width=0.8, color='#005544', figsize=(15,25))
plt.show()

In [14]:
# Clean data, make sure no HaveWorkedLanguage is null
salary_avg_lang = data[data['HaveWorkedLanguage'].notnull()]

# Prepare language index to organize the salary
devDict = {}
for index, dev in enumerate(language):
    devDict[dev] = index

# Organize the salary based on its language
devSalaries = [[] for i in range(len(language))]
for index, datum in salary_avg_lang.iterrows():
    devlist = datum['HaveWorkedLanguage'].split("; ")
    for d in devlist:
        devSalaries[devDict[d.strip()]].append(datum['Salary'])

# Calculate the average salary for each language
Salaries = list(map(lambda x: np.mean(x), devSalaries))

# Construct the data frame
lang = pd.DataFrame()
lang["Language"] = language
lang["Salary"] = Salaries
lang.set_index('Language', inplace=True)
lang.sort_values('Salary', inplace=True)
lang.plot.barh(width=0.8, color='#005544', figsize=(15,25))
plt.show()

At a quick glance, the more developers can program in a language, the lower the average salary for that language tends to be. This could be attributed to supply and demand in the job market.
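
The visual impression could be checked with a rank correlation between the number of users and the average salary per language. The following is a minimal sketch on hypothetical numbers: the `lang_stats` table and its values are invented, and in the notebook they would be replaced by the counts and averages computed above. The Spearman correlation is computed here as the Pearson correlation of the ranks.

```python
import pandas as pd

# Hypothetical per-language summary; these values are invented for illustration.
lang_stats = pd.DataFrame({
    "Language": ["A", "B", "C", "D"],
    "Count": [9000, 4000, 500, 100],
    "AverageSalary": [52000.0, 58000.0, 70000.0, 75000.0],
}).set_index("Language")

# Spearman rank correlation, computed as the Pearson correlation of the ranks.
rho = lang_stats["Count"].rank().corr(lang_stats["AverageSalary"].rank())
print(f"Rank correlation between user count and average salary: {rho:.2f}")
```

A value close to -1 would support the supply and demand reading, while a value near 0 would suggest the pattern in the chart is weaker than it looks.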

### 3.3. Prescriptive Analysis

On average, software development is one of the job fields that pays higher salaries than others. To increase your expected salary, you could learn a high-paying programming language such as Clojure, Smalltalk, or Rust. Alternatively, you could try to land a job in a trending field such as machine learning, DevOps, or data science.