# Jobs Dataset Programming Language Analysis

## Problem Statement 1:
Among Java, C++, and Python, which language has the most job openings in India for bachelor's degree holders?

Note: We use the `BASIC QUALIFICATIONS` feature for both checks. A job is treated as requiring a bachelor's degree if this feature contains any of the keywords 'Bachelor', 'BS', or 'BA', and as requiring a language if it contains the keyword 'Java', 'C++', or 'Python'.
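Short keywords such as 'BS' and 'BA' also occur inside longer words, so a plain substring search can over-count. The sketch below uses hypothetical qualification strings (not rows from the dataset) to compare the substring search with a stricter word-boundary variant. The stricter pattern is an assumption shown for comparison only; the counts reported in this notebook come from the plain keyword search.

```python
import pandas as pd

# Hypothetical qualification strings, for illustration only.
quals = pd.Series([
    "Bachelor's degree in Computer Science",
    "BS or BA in a technical field",
    "MBA preferred; strong BASIC communication skills",
    None,
])

# Plain substring alternation: "BA" also matches inside "MBA" and "BASIC".
loose = quals.str.contains("Bachelor|BS|BA", na=False)

# Word boundaries (\b) restrict BS and BA to standalone tokens.
strict = quals.str.contains(r"\bBachelor|\bBS\b|\bBA\b", na=False)

print(loose.tolist())
print(strict.tolist())
```

On these four strings the two patterns disagree only on the third one, which mentions 'MBA' and 'BASIC' but no bachelor's degree.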


In [1]:
# Import required packages
import pandas as pd
import numpy as np
import re

In [2]:
# Import the Amazon Jobs Dataset
data = pd.read_csv('data/amazon_jobs_dataset.csv')

# Inspecting the data
print(data.shape)
data.head(2)

(3493, 7)


,Unnamed: 0,Title,location,Posting_date,DESCRIPTION,BASIC QUALIFICATIONS,PREFERRED QUALIFICATIONS
0,0,Software Development Manager,"US, WA, Seattle","March 1, 2018",You are an experienced hands-on manager with a...,· Proven track record of hiring and managing h...,· Experience building extremely high volume an...
1,1,Software Development Engineer,"IN, KA, Bangalore","March 1, 2018",Amazon is driven by being “the world’s most cu...,· Bachelor’s Degree in Computer Science or rel...,· Experience building complex software systems...


In [3]:
# Drop the redundant first column (`Unnamed: 0`) from the dataset
data.drop(columns=data.columns[0], inplace=True)
data.head(2)

,Title,location,Posting_date,DESCRIPTION,BASIC QUALIFICATIONS,PREFERRED QUALIFICATIONS
0,Software Development Manager,"US, WA, Seattle","March 1, 2018",You are an experienced hands-on manager with a...,· Proven track record of hiring and managing h...,· Experience building extremely high volume an...
1,Software Development Engineer,"IN, KA, Bangalore","March 1, 2018",Amazon is driven by being “the world’s most cu...,· Bachelor’s Degree in Computer Science or rel...,· Experience building complex software systems...


In [4]:
# Derive `Country`, `State` and `City` from the comma-separated `location` column,
# stripping the space that follows each comma.
data[['Country', 'State', 'City']] = (
    data['location'].str.split(',', expand=True).apply(lambda col: col.str.strip())
)
india_data = data[data['Country'] == 'IN']
india_data.head(2)

,Title,location,Posting_date,DESCRIPTION,BASIC QUALIFICATIONS,PREFERRED QUALIFICATIONS,Country,State,City
1,Software Development Engineer,"IN, KA, Bangalore","March 1, 2018",Amazon is driven by being “the world’s most cu...,· Bachelor’s Degree in Computer Science or rel...,· Experience building complex software systems...,IN,KA,Bangalore
2,Software Development Engineer,"IN, KA, Bangalore","March 1, 2018",Amazon is driven by being “the world’s most cu...,· Bachelor’s Degree in Computer Science or rel...,· Experience building complex software systems...,IN,KA,Bangalore


In [5]:
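# Subset the Indian openings whose basic qualifications mention a bachelor's degree.
# Note: 'BS' and 'BA' are matched as plain substrings, so strings such as 'MBA' also match.
india_data_undergrad = india_data[india_data['BASIC QUALIFICATIONS'].str.contains("Bachelor|BS|BA", na=False)]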
india_data_undergrad.head(2)

,Title,location,Posting_date,DESCRIPTION,BASIC QUALIFICATIONS,PREFERRED QUALIFICATIONS,Country,State,City
1,Software Development Engineer,"IN, KA, Bangalore","March 1, 2018",Amazon is driven by being “the world’s most cu...,· Bachelor’s Degree in Computer Science or rel...,· Experience building complex software systems...,IN,KA,Bangalore
2,Software Development Engineer,"IN, KA, Bangalore","March 1, 2018",Amazon is driven by being “the world’s most cu...,· Bachelor’s Degree in Computer Science or rel...,· Experience building complex software systems...,IN,KA,Bangalore
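Rows with a missing `BASIC QUALIFICATIONS` value need care when filtering, because a keyword search on a missing value is not a clear match or non-match. A minimal sketch on hypothetical strings (not rows from the dataset) showing two equivalent ways to treat missing values as non-matches:

```python
import pandas as pd

# Hypothetical qualification strings; the last entry is missing.
quals = pd.Series(["Bachelor's degree required", "PhD in Statistics", None])

# Option 1: compare the raw result with True, so anything that is not True is dropped.
mask_eq = quals.str.contains("Bachelor") == True

# Option 2: tell str.contains to report missing values as False directly.
mask_na = quals.str.contains("Bachelor", na=False)

print(mask_eq.tolist())
print(mask_na.tolist())
```

Both masks select only the first string, so either can be used to index the DataFrame.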


In [6]:
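# Subset the bachelor's-degree openings by language keyword.
# `re.escape` is needed for "C++" because `+` is a regex metacharacter.
# Note: the keyword "Java" also matches "JavaScript".
java_india_undergrad = india_data_undergrad[india_data_undergrad['BASIC QUALIFICATIONS'].str.contains("Java")]
python_india_undergrad = india_data_undergrad[india_data_undergrad['BASIC QUALIFICATIONS'].str.contains("Python")]
cpp_india_undergrad = india_data_undergrad[india_data_undergrad['BASIC QUALIFICATIONS'].str.contains(re.escape("C++"))]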
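`str.contains` interprets its pattern as a regular expression by default, and `+` is a regex quantifier, so the literal text `C++` has to be escaped or matched with regex handling turned off. A small sketch on hypothetical strings (not rows from the dataset) showing that both approaches select the same rows:

```python
import re
import pandas as pd

# Hypothetical qualification strings, for illustration only.
quals = pd.Series([
    "Experience in C++ or Java",
    "Experience in C# and .NET",
    "Objective-C on iOS",
])

# Escape the regex metacharacters so the pattern means the literal text "C++".
escaped = quals.str.contains(re.escape("C++"))

# Alternatively, disable regex handling and search for the literal substring.
literal = quals.str.contains("C++", regex=False)

print(re.escape("C++"))
print(escaped.tolist())
print(literal.tolist())
```

Only the first string contains the literal text `C++`, and both masks agree on that.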

In [7]:
# Collect the number of matching openings per language into a summary table
bachelorJobsIndiaData_byLang = [['Java', java_india_undergrad.shape[0]],
                                ['Python', python_india_undergrad.shape[0]],
                                ['C++', cpp_india_undergrad.shape[0]]]

bachelorJobsIndia_byLang = pd.DataFrame(bachelorJobsIndiaData_byLang,
                                        columns=['Language', 'Number of Jobs'])

bachelorJobsIndia_byLang

,Language,Number of Jobs
0,Java,103
1,Python,30
2,C++,70


**Analysis: In India, `Java` has the most job openings (103) for bachelor's degree holders, followed by `C++` (70) and `Python` (30).**
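The per-language counts can also be gathered in a single loop instead of one subset per language. The sketch below is one possible way to do this, not the notebook's method; `df` is a hypothetical stand-in for `india_data_undergrad`, and its three rows are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical stand-in for `india_data_undergrad`; the rows are invented.
df = pd.DataFrame({"BASIC QUALIFICATIONS": [
    "Bachelor's degree; Java or C++",
    "BS in CS; Python and Java",
    "BA degree; Java",
]})

languages = ["Java", "Python", "C++"]

# Count the rows mentioning each language; re.escape keeps "C++" literal.
counts = {
    lang: int(df["BASIC QUALIFICATIONS"].str.contains(re.escape(lang), na=False).sum())
    for lang in languages
}

summary = pd.DataFrame(list(counts.items()), columns=["Language", "Number of Jobs"])
print(summary)
```

Adding another language to the comparison then only requires extending the `languages` list.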

## Problem Statement 2:
Find the country where Amazon needs the most Java developers.

Note: We use the `BASIC QUALIFICATIONS` feature to determine whether Java is required for a job. The keyword used is 'Java'.

In [8]:
# Subset all jobs that mention the keyword `Java` in `BASIC QUALIFICATIONS`
java = data[data['BASIC QUALIFICATIONS'].str.contains("Java", na=False)]

In [9]:
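# Count the Java openings per country, largest first
java_grouped = java.groupby('Country').size().sort_values(ascending=False)

# Use `.iloc[0]` for positional access; `java_grouped[0]` on a label-indexed Series is deprecated
print("The country with the most openings for Java developers is: {}, with {} openings."
      .format(java_grouped.index[0], java_grouped.iloc[0]))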

The country with the most openings for Java developers is: US, with 2009 openings.
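The same question can be answered with `value_counts`, which counts and sorts in one step. This is a sketch of an alternative, assuming a DataFrame with `Country` and `BASIC QUALIFICATIONS` columns; the six rows in `df` are hypothetical and do not come from the dataset.

```python
import pandas as pd

# Hypothetical postings, for illustration only.
df = pd.DataFrame({
    "Country": ["US", "IN", "US", "CA", "US", "IN"],
    "BASIC QUALIFICATIONS": [
        "Java", "Java and C++", "Python", "Java", "Java or Scala", None,
    ],
})

# Keep the rows mentioning Java; missing qualifications count as non-matches.
java_rows = df[df["BASIC QUALIFICATIONS"].str.contains("Java", na=False)]

# value_counts returns the per-country counts sorted in descending order.
per_country = java_rows["Country"].value_counts()

top_country = per_country.idxmax()
top_count = int(per_country.max())
print(f"{top_country}: {top_count} openings")
```

`idxmax` and `max` read off the leading country and its count without relying on positional indexing.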
