# Data Jobs

***Data-related jobs are considered the most popular career of 2021. Because demand for data specialists keeps growing, many professionals expect very high compensation for work in this field. We therefore decided to evaluate how different variables affect the expected salary, so that we can adjust our expectations to the current international market. This dataset is processed from the Stack Overflow Annual Developer Survey.***

## Content: There are 2 files
***processed_data.csv***
This is the data pre-processed from the original dataset, with each developer type converted into a dummy variable:

- Data Scientist or Machine Learning Specialist
- Database Administrator
- Data Analyst or Business Analyst
- Data Engineer

It also keeps other variables such as Country, Education level, Employment, Job satisfaction, Organization size, Undergraduate major, and Years of professional coding experience.
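The dummy-variable conversion described above can be sketched as follows. This is a minimal illustration, not the actual pre-processing script: it assumes the developer types sit in a single semicolon-separated `DevType` column, and the `sample` frame is hypothetical data made up for the example.

```python
import pandas as pd

# Hypothetical miniature sample; the real DevType values live in survey_final.csv.
sample = pd.DataFrame({
    "Year": [2020, 2020, 2020],
    "DevType": [
        "Data scientist or machine learning specialist;Database administrator",
        "Engineer, data",
        "Data or business analyst;Engineer, data",
    ],
})

# Build one 0/1 column per distinct role found in the semicolon-separated strings.
dummies = sample["DevType"].str.get_dummies(sep=";")

# Swap the raw DevType column for the dummy columns.
encoded = pd.concat([sample.drop(columns="DevType"), dummies], axis=1)
print(encoded)
```

A respondent who reports several roles gets a 1 in each matching column, which is why a single row of `processed_data.csv` can be flagged for more than one developer type.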

***survey_final.csv***
The original dataset comes from the Stack Overflow Annual Developer Survey. For this dataset we consider only the respondents who identified themselves as already working in a data-related job (Data Scientist, Machine Learning Specialist, Database Administrator, Data Analyst, Business Analyst, or Data Engineer).
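Selecting only respondents in data-related jobs might look like the sketch below. The role labels in `DATA_ROLES` are an assumption taken from the column names of `processed_data.csv`; the exact strings in the raw `DevType` column can differ between survey years, and the `survey` frame is hypothetical sample data.

```python
import re
import pandas as pd

# Assumed role labels, borrowed from the processed_data.csv column names.
DATA_ROLES = [
    "Data scientist or machine learning specialist",
    "Database administrator",
    "Data or business analyst",
    "Engineer, data",
]

# Hypothetical miniature sample standing in for survey_final.csv.
survey = pd.DataFrame({
    "Country": ["Germany", "United Kingdom", "Albania", "United States"],
    "DevType": [
        "Developer, full-stack;Database administrator",
        "Developer, full-stack;Developer, mobile",
        None,
        "Engineer, data;Data or business analyst",
    ],
})

# Match any of the role labels anywhere in the DevType string.
pattern = "|".join(re.escape(role) for role in DATA_ROLES)

# na=False treats a missing DevType as "not a data role".
is_data_job = survey["DevType"].str.contains(pattern, regex=True, na=False)
data_jobs = survey[is_data_job].reset_index(drop=True)
print(data_jobs)
```

Rows with no `DevType` answer are dropped here, since there is no way to tell whether those respondents hold a data-related job.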

## Acknowledgements
This dataset is processed from the Stack Overflow Annual Developer Survey results from 2017 to 2020.
The original datasets can be found at https://insights.stackoverflow.com/survey and on Kaggle.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
# load data
df = pd.read_csv('../data/processed_data.csv')
df

Unnamed: 0,Year,Hobbyist,ConvertedComp,Country,EdLevel,Employment,JobSat,OrgSize,UndergradMajor,YearsCodePro,Data scientist or machine learning specialist,Database administrator,Data or business analyst,"Engineer, data"
0,2017,"Yes, both",43750.00000,United Kingdom,Bachelor's degree,Employed full-time,4.0,2 to 9 employees,Computer science,2.0,1,1,,
1,2017,"Yes, I program as a hobby",51282.05128,Denmark,Some college/university study without earning ...,Employed part-time,10.0,100 to 499 employees,Computer science,3.0,1,0,,
2,2017,No,25000.00000,Israel,Some college/university study without earning ...,Employed full-time,6.0,"5,000 to 9,999 employees",Computer science,4.0,1,0,,
3,2017,"Yes, I program as a hobby",100000.00000,United States,Some college/university study without earning ...,Employed full-time,5.0,20 to 99 employees,Computer science,15.0,0,1,,
4,2017,"Yes, both",27000.00000,Ukraine,Master's degree,Employed full-time,7.0,100 to 499 employees,Computer science,5.0,0,1,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33596,2020,Yes,225000.00000,United States,Some college/university study without earning ...,Employed full-time,8.0,"1,000 to 4,999 employees",Mathematics or statistics,15.0,0,0,0.0,1.0
33597,2020,Yes,369.00000,"Venezuela, Bolivarian Republic of...",Some college/university study without earning ...,Employed full-time,2.0,"1,000 to 4,999 employees",Computer science,27.0,0,1,0.0,1.0
33598,2020,No,38484.00000,Hungary,Master's degree,"Independent contractor, freelancer, or self-em...",8.0,"Just me - I am a freelancer, sole proprietor, ...",Humanities,12.0,0,0,1.0,0.0
33599,2020,Yes,140000.00000,United States,Doctoral degree,Employed full-time,8.0,"10,000 or more employees",Another engineering discipline,12.0,1,0,1.0,1.0


In [6]:
# load the original survey data
df1 = pd.read_csv('../data/survey_final.csv')
df1

Unnamed: 0,Year,Hobbyist,ConvertedComp,Country,DatabaseDesireNextYear,DatabaseWorkedWith,DevType,EdLevel,Employment,JobSat,LanguageDesireNextYear,LanguageWorkedWith,OrgSize,UndergradMajor,YearsCodePro
0,2020,Yes,,Germany,Microsoft SQL Server,Elasticsearch;Microsoft SQL Server;Oracle,"Developer, desktop or enterprise applications;...","Master's degree (M.A., M.S., M.Eng., MBA, etc.)","Independent contractor, freelancer, or self-em...",Slightly satisfied,C#;HTML/CSS;JavaScript,C#;HTML/CSS;JavaScript,2 to 9 employees,"Computer science, computer engineering, or sof...",27
1,2020,No,,United Kingdom,,,"Developer, full-stack;Developer, mobile","Bachelor's degree (B.A., B.S., B.Eng., etc.)",Employed full-time,Very dissatisfied,Python;Swift,JavaScript;Swift,"1,000 to 4,999 employees","Computer science, computer engineering, or sof...",4
2,2020,Yes,,Russian Federation,,,,,,,Objective-C;Python;Swift,Objective-C;Python;Swift,,,
3,2020,Yes,,Albania,,,,"Master's degree (M.A., M.S., M.Eng., MBA, etc.)",,Slightly dissatisfied,,,20 to 99 employees,"Computer science, computer engineering, or sof...",4
4,2020,Yes,,United States,MySQL;PostgreSQL,MySQL;PostgreSQL;Redis;SQLite,,"Bachelor's degree (B.A., B.S., B.Eng., etc.)",Employed full-time,,Java;Ruby;Scala,HTML/CSS;Ruby;SQL,,"Computer science, computer engineering, or sof...",8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
303586,2017,"Yes, I program as a hobby",58000.0,United States,SQL Server,MongoDB; SQL Server; MySQL,Web developer; Developer with a statistics or ...,Bachelor's degree,Employed full-time,3,C#; JavaScript; Python; SQL; VB.NET; VBA,C#; Python; R; Ruby; Rust; Scala; TypeScript; ...,100 to 499 employees,A social science,1 to 2 years
303587,2017,No,,Venezuela,MySQL; PostgreSQL,SQL Server; PostgreSQL,,Master's degree,Employed full-time,,Java; SQL,C#; Java; JavaScript; PHP; Python; Ruby; SQL; ...,100 to 499 employees,Computer programming or Web development,
303588,2017,"Yes, I program as a hobby",,Canada,,,Web developer; Systems administrator,Some college/university study without earning ...,Employed full-time,10,,,10 to 19 employees,"Information technology, networking, or system ...",Less than a year
303589,2017,"Yes, I program as a hobby",40000.0,United States,MySQL,,Web developer; Mobile developer,Bachelor's degree,Employed full-time,7,JavaScript; PHP; Swift,Clojure; Erlang; Haskell,Fewer than 10 employees,Computer science or software engineering,3 to 4 years
