# 1. Business Understanding
Stack Overflow's 2020 Annual Developer Survey examines all aspects of the developer experience, from career satisfaction and job search to education and opinions on open source software.

- *survey_results_public.csv*: CSV file with the main survey results, one respondent per row and one column per answer
- *survey_results_schema.csv*: CSV file with the survey schema, i.e., the questions that correspond to each column name

Data source: https://insights.stackoverflow.com/survey

With this data, we can answer our questions about working as a developer, which helps us understand the industry better and plan our careers.

Based on the data, we can ask the following questions:

#### Question 1: Generally speaking, what factors do people pay the most attention to when choosing between jobs with the same compensation, benefits, and location?
#### Question 2: What are the most common programming languages used by data scientists?
#### Question 3: Which occupations work overtime most often?

In the following steps, we start by processing the data and then use descriptive or inferential statistics to answer these questions.

# 2. Data Understanding

### Gather Data

In [1]:
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df_public = pd.read_csv("survey_results_public.csv")
df_scheme = pd.read_csv("survey_results_schema.csv")

In [2]:
df_public.head()

| | Respondent | MainBranch | Hobbyist | Age | Age1stCode | CompFreq | CompTotal | ConvertedComp | Country | CurrencyDesc | ... | SurveyEase | SurveyLength | Trans | UndergradMajor | WebframeDesireNextYear | WebframeWorkedWith | WelcomeChange | WorkWeekHrs | YearsCode | YearsCodePro |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | I am a developer by profession | Yes | | 13 | Monthly | | | Germany | European Euro | ... | Neither easy nor difficult | Appropriate in length | No | Computer science, computer engineering, or sof... | ASP.NET Core | ASP.NET;ASP.NET Core | Just as welcome now as I felt last year | 50.0 | 36 | 27.0 |
| 1 | 2 | I am a developer by profession | No | | 19 | | | | United Kingdom | Pound sterling | ... | | | | Computer science, computer engineering, or sof... | | | Somewhat more welcome now than last year | | 7 | 4.0 |
| 2 | 3 | I code primarily as a hobby | Yes | | 15 | | | | Russian Federation | | ... | Neither easy nor difficult | Appropriate in length | | | | | Somewhat more welcome now than last year | | 4 | |
| 3 | 4 | I am a developer by profession | Yes | 25.0 | 18 | | | | Albania | Albanian lek | ... | | | No | Computer science, computer engineering, or sof... | | | Somewhat less welcome now than last year | 40.0 | 7 | 4.0 |
| 4 | 5 | I used to be a developer by profession, but no... | Yes | 31.0 | 16 | | | | United States | | ... | Easy | Too short | No | Computer science, computer engineering, or sof... | Django;Ruby on Rails | Ruby on Rails | Just as welcome now as I felt last year | | 15 | 8.0 |


In [7]:
df_scheme.head(10)

| | Column | QuestionText |
|---|---|---|
| 0 | Respondent | Randomized respondent ID number (not in order ... |
| 1 | MainBranch | Which of the following options best describes ... |
| 2 | Hobbyist | Do you code as a hobby? |
| 3 | Age | What is your age (in years)? If you prefer not... |
| 4 | Age1stCode | At what age did you write your first line of c... |
| 5 | CompFreq | Is that compensation weekly, monthly, or yearly? |
| 6 | CompTotal | What is your current total compensation (salar... |
| 7 | ConvertedComp | Salary converted to annual USD salaries using ... |
| 8 | Country | Where do you live? |
| 9 | CurrencyDesc | Which currency do you use day-to-day? If your ... |


### Assess

In [15]:
# Question concerning job factors in the survey
df_scheme[df_scheme['Column']=='JobFactors']['QuestionText'].iloc[0]

'Imagine that you are deciding between two job offers with the same compensation, benefits, and location. Of the following factors, which 3 are MOST important to you?'

In [4]:
# Data for question 1
jobFactors = df_public["JobFactors"].dropna()
jobFactors

0        Languages, frameworks, and other technologies ...
3        Flex time or a flexible schedule;Office enviro...
5        Diversity of the company or organization;Langu...
7        Remote work options;Opportunities for professi...
8        Diversity of the company or organization;Remot...
                               ...                        
64146    Specific department or team I’d be working on;...
64148    Industry that I’d be working in;Languages, fra...
64150    Flex time or a flexible schedule;Languages, fr...
64152    Flex time or a flexible schedule;Languages, fr...
64153    Languages, frameworks, and other technologies ...
Name: JobFactors, Length: 49349, dtype: object

Each answer to this question can contain several options, separated by semicolons. To count how often each individual option was chosen, the column must first be cleaned by splitting every answer into its separate options.
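One possible way to do this cleaning is sketched below. It is only an illustration, not necessarily the method used later in this notebook: the helper name `count_options` and the toy `sample` series are hypothetical, and the sample reuses a few option labels visible in the output above instead of the real `df_public["JobFactors"]` column.

```python
import pandas as pd


def count_options(series: pd.Series, sep: str = ";") -> pd.Series:
    """Count how often each option appears in a multi-select column.

    Hypothetical helper: drops missing answers, splits each answer on
    `sep`, puts every option on its own row, and counts the options.
    """
    return (
        series.dropna()
        .str.split(sep)
        .explode()
        .str.strip()
        .value_counts()
    )


# Toy data standing in for df_public["JobFactors"] (assumption, not real survey rows)
sample = pd.Series([
    "Remote work options;Flex time or a flexible schedule",
    "Flex time or a flexible schedule;Diversity of the company or organization",
    None,
    "Remote work options",
])

option_counts = count_options(sample)
print(option_counts)
```

On the real data, the call would presumably be `count_options(df_public["JobFactors"])`. Dividing the result by the number of non-missing answers would give the share of respondents who picked each option.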

In [16]:
# Question concerning the type of job position in the survey
df_scheme[df_scheme['Column']=='DevType']['QuestionText'].iloc[0]

'Which of the following describe you? Please select all that apply.'

In [17]:
# Question concerning overtime in the survey
df_scheme[df_scheme['Column']=='NEWOvertime']['QuestionText'].iloc[0]

'How often do you work overtime or beyond the formal time expectation of your job?'

In [18]:
# Data for question 3
df_job_overtime = df_public[["DevType","NEWOvertime"]].dropna().reset_index(drop=True)
df_job_overtime.head()

| | DevType | NEWOvertime |
|---|---|---|
| 0 | Developer, desktop or enterprise applications;... | Often: 1-2 days per week or more |
| 1 | Designer;Developer, front-end;Developer, mobile | Never |
| 2 | Developer, back-end;Developer, front-end;Devel... | Sometimes: 1-2 days per month but less than we... |
| 3 | Developer, back-end;Developer, desktop or ente... | Occasionally: 1-2 days per quarter but less th... |
| 4 | Developer, full-stack | Occasionally: 1-2 days per quarter but less th... |


As with the data for Q1, answers to the job type question (`DevType`) can contain several options, while the overtime question (`NEWOvertime`) is single choice. Cleaning this data follows the same idea as for Q1, but the details differ because the Q3 data spans two columns.
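A minimal sketch of how the two-column case might be handled: split `DevType` into one row per job type while keeping the matching `NEWOvertime` answer, then cross-tabulate. The `toy` frame and the names `long_form` and `overtime_share` are assumptions made for illustration; the real input would be `df_job_overtime`.

```python
import pandas as pd

# Toy rows standing in for df_job_overtime (assumption, not real survey rows)
toy = pd.DataFrame({
    "DevType": [
        "Designer;Developer, front-end",
        "Developer, full-stack",
        "Developer, front-end;Developer, full-stack",
    ],
    "NEWOvertime": [
        "Never",
        "Never",
        "Often: 1-2 days per week or more",
    ],
})

# One row per (job type, overtime answer): split the multi-select column,
# then explode it so each job type carries the respondent's overtime answer.
long_form = (
    toy.assign(DevType=toy["DevType"].str.split(";"))
    .explode("DevType")
    .reset_index(drop=True)
)

# Share of each overtime answer within every job type (each row sums to 1)
overtime_share = pd.crosstab(
    long_form["DevType"], long_form["NEWOvertime"], normalize="index"
)
print(overtime_share)
```

Normalizing by row makes job types with very different respondent counts comparable. Note that a respondent who selected several job types is counted once for each of them.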