*The purpose of this workspace is to check how various academic parameters relate to placement and salary. Something relatable to all of us, right? Let's begin!*

> Our first step is to import all the required libraries. In this case:
> 
> 1. We will be using pandas to read the data.
> 2. We will be using seaborn to plot it.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn import preprocessing

> Read the campus placement data (a CSV file)

In [None]:
train = pd.read_csv('../input/factors-affecting-campus-placement/Placement_Data_Full_Class.csv')


> Now let us see which columns this CSV file contains. Apart from status and salary, which are the outcomes we want to explain, each column represents a feature a recruiter might use when deciding on a candidate's placement and the salary he/she is offered.

In [None]:
print(train.columns.values)

In [None]:
train.shape

> So it has 215 samples and 15 columns. That's not very big, so let's look at the first and last 5 samples.

In [None]:
train.head()

In [None]:
train.tail()

In [None]:
plt.hist(train['status'])

> The histogram shows that 148 students were placed and 67 were not. Together that makes 215, the size of our dataset, so we have a placement result for everyone who took part in the recruitment drive.
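The same counts can be read off numerically with `value_counts()` instead of from the histogram bars. The sketch below is a minimal stand-in. It rebuilds a status column from the counts quoted above (148 placed, 67 not placed) rather than reading the real CSV. On the actual data the call would be `train['status'].value_counts()`.

```python
import pandas as pd

# Stand-in for train['status'], rebuilt from the counts quoted in the text
status = pd.Series(['Placed'] * 148 + ['Not Placed'] * 67, name='status')

# Number of rows per label, largest first
counts = status.value_counts()
print(counts)

# Every row has a status, so the label counts must add up to the row count
print(counts.sum() == len(status))  # True
```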

> Let's check whether the data for the remaining features is complete as well.

In [None]:
train.info()

In [None]:
def find_missing_data(data):
    # Number and percentage of missing values per column, most-missing first
    total = data.isnull().sum().sort_values(ascending=False)
    percent = (data.isnull().mean() * 100).sort_values(ascending=False)

    return pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])

In [None]:
find_missing_data(train)

> All columns except salary are complete. Salary has 148 entries, which matches the number of students placed. The remaining rows are blank, which makes sense because unplaced students received no offer. As the data looks good, we can start looking at how these features affect placement status and the salary offered.
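The claim that salary is missing exactly for the unplaced students can be checked directly rather than inferred from matching counts. This is a minimal sketch on a hypothetical five-row stand-in with the same status and salary columns. On the real data, `df` would be `train` (before status is label-encoded).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with the notebook's 'status' and 'salary' columns
df = pd.DataFrame({
    'status': ['Placed', 'Not Placed', 'Placed', 'Not Placed', 'Placed'],
    'salary': [270000.0, np.nan, 250000.0, np.nan, 200000.0],
})

# True wherever salary is missing
salary_missing = df['salary'].isnull()
# True wherever the student was not placed
not_placed = df['status'] == 'Not Placed'

# The claim holds if the two masks agree on every row
print((salary_missing == not_placed).all())  # True
```

Comparing the masks row by row is stricter than comparing totals: two columns can have the same number of blanks without the blanks being in the same rows.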

In [None]:
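# Only numeric columns can be correlated here; the categorical ones are encoded further below
corrMatrix = train.corr(numeric_only=True)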
sns.heatmap(corrMatrix, annot=True)
plt.show()

> Let's convert the binary categorical columns (status, gender and workex) to numbers, so that they can be included directly in the heatmap.

In [None]:
number = preprocessing.LabelEncoder()
train['status'] = number.fit_transform(train['status'].astype('str'))

In [None]:
number = preprocessing.LabelEncoder()
train['gender'] = number.fit_transform(train['gender'].astype('str'))

In [None]:
number = preprocessing.LabelEncoder()
train['workex'] = number.fit_transform(train['workex'].astype('str'))
train.head()

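> After encoding, 1 means Male for gender, Placed for status and Yes for work experience. LabelEncoder numbers the labels in alphabetical order, so Female, Not Placed and No become 0.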
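The three encoding cells above can also be written as one loop. That version records which code each label received, so the mapping does not have to be remembered. The sketch below is a minimal example on a hypothetical stand-in. It assumes the raw gender column uses the labels 'M' and 'F'. The dictionary `mappings` is an illustrative helper, not something LabelEncoder returns itself.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in with the three binary columns encoded in the notebook
df = pd.DataFrame({
    'status': ['Placed', 'Not Placed', 'Placed'],
    'gender': ['M', 'F', 'M'],
    'workex': ['No', 'Yes', 'No'],
})

mappings = {}
for col in ['status', 'gender', 'workex']:
    encoder = preprocessing.LabelEncoder()
    df[col] = encoder.fit_transform(df[col].astype(str))
    # classes_ holds the sorted labels; a label's position in it is its code
    mappings[col] = {str(label): code for code, label in enumerate(encoder.classes_)}

print(mappings)
print(df)
```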

In [None]:
corr_numeric = sns.heatmap(train[["status","mba_p","etest_p","hsc_p","degree_p","ssc_p", "gender", "workex"]].corr(),
                           annot=True, fmt = ".2f", cmap = "summer")

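> From the heatmap, workex, ssc_p, degree_p and hsc_p show the strongest positive correlation with status. Correlation does not prove that they cause placement, but good scores and work experience clearly go together with getting placed. So study hard :)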
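The same ranking can be produced without reading colours off the heatmap. Take the status column of the correlation matrix and sort it by absolute value. This is a minimal sketch on a hypothetical stand-in with made-up numbers. On the real data, `df` would be the column subset passed to the heatmap above.

```python
import pandas as pd

# Hypothetical stand-in: encoded status (1 = Placed) and two score columns
df = pd.DataFrame({
    'status':  [1, 0, 1, 0, 1, 1],
    'ssc_p':   [67.0, 55.0, 80.0, 52.0, 73.0, 77.0],
    'etest_p': [55.0, 86.0, 75.0, 66.0, 96.0, 60.0],
})

# Correlation of every other column with status, strongest (in magnitude) first
ranking = (
    df.corr()['status']
      .drop('status')
      .sort_values(key=abs, ascending=False)
)
print(ranking)
```

Sorting by absolute value keeps strong negative correlations near the top. A plain descending sort would push them to the bottom.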

In [None]:
sns.barplot(x='gender', y='status', data=train)

In [None]:
sns.barplot(x='degree_t', y='status', data=train)

> So students from Sci&Tech and Comm&Mgmt have a slightly higher placement rate than students from the remaining degree type (Others).
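Status is now 0/1, so the height of each bar is the mean of that column within a group. That mean is the fraction of students placed. A minimal pandas sketch of the same number follows. It uses a hypothetical stand-in, and on the real data it would be `train.groupby('degree_t')['status'].mean()`.

```python
import pandas as pd

# Hypothetical stand-in: degree type plus encoded status (1 = Placed)
df = pd.DataFrame({
    'degree_t': ['Sci&Tech'] * 3 + ['Comm&Mgmt'] * 4 + ['Others'] * 2,
    'status':   [1, 1, 0, 1, 0, 1, 1, 1, 0],
})

# Mean of a 0/1 column = share of ones = placement rate per degree type
placement_rate = df.groupby('degree_t')['status'].mean()
print(placement_rate)
```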

In [None]:
sns.barplot(x='degree_t', y='salary', data=train)

> Among placed students, the average salary is around 280k-290k, with Sci&Tech students earning slightly more.
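Unplaced students have no salary, so any per-group salary average is an average over placed students only. The sketch below is minimal and uses a hypothetical stand-in. It computes the per-degree average in pandas, where `mean()` skips NaN by default. Alongside it is the count of salaries that actually went into each mean.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: unplaced students have NaN salary, as in the real data
df = pd.DataFrame({
    'degree_t': ['Sci&Tech', 'Sci&Tech', 'Comm&Mgmt', 'Comm&Mgmt', 'Comm&Mgmt'],
    'salary':   [300000.0, np.nan, 250000.0, 270000.0, np.nan],
})

# mean() ignores NaN; count() reports how many non-missing salaries were averaged
summary = df.groupby('degree_t')['salary'].agg(['mean', 'count'])
print(summary)
```

Looking at the count next to the mean is useful when a group is small. An average over a handful of offers can move a lot from one student to the next.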

In [None]:
sns.barplot(x='specialisation', y='status', data=train)

> It also looks like Mkt&Fin students are placed considerably more often than Mkt&HR students.

In [None]:
# 'hue' splits each bar by an additional variable, here the placement status
sns.countplot(x='degree_t', hue='status', data=train)
plt.show()
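The bars of this countplot are a two-way table of degree type against status. `pd.crosstab` gives the same numbers as a table. With `normalize='index'` it also turns each row into shares, which is easier to compare across degree types of different sizes. This is a minimal sketch on a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in with the two columns used in the countplot
df = pd.DataFrame({
    'degree_t': ['Sci&Tech', 'Comm&Mgmt', 'Comm&Mgmt', 'Others', 'Comm&Mgmt'],
    'status':   ['Placed', 'Placed', 'Not Placed', 'Not Placed', 'Placed'],
})

# Raw counts: one row per degree type, one column per status (the bar heights)
counts = pd.crosstab(df['degree_t'], df['status'])
# Row-normalised: share of each status within a degree type
shares = pd.crosstab(df['degree_t'], df['status'], normalize='index')
print(counts)
print(shares)
```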

In [None]:
corr_numeric = sns.heatmap(train[["salary","mba_p","etest_p","hsc_p","degree_p","ssc_p", "gender", "workex"]].corr(),
                           annot=True, fmt = ".2f", cmap = "summer")

In [None]:
sns.barplot(x='workex', y='salary', data=train)

> From the heatmap, work experience, mba_p, etest_p and gender show a slight positive correlation with salary.

In [None]:
fig, ax = plt.subplots(figsize=(10, 8))  
sns.violinplot(x='gender', y='salary', data=train, ax=ax)  
ax.set_title('Violin plot')  
plt.show()  

> Both genders receive similar salaries. Gender still shows some correlation because more men took part in the drive and more men got placed, but that isn't really a gender-based difference.
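"Men got placed more" can mean a higher count or a higher rate. Placing both side by side shows which one is meant. The sketch below is minimal and uses a hypothetical stand-in (gender 1 = Male, status 1 = Placed, as encoded earlier). In it the men outnumber the women, so they have more placements, yet both groups are placed at the same rate.

```python
import pandas as pd

# Hypothetical stand-in: encoded gender (1 = Male) and status (1 = Placed)
df = pd.DataFrame({
    'gender': [1, 1, 1, 1, 1, 1, 0, 0, 0],
    'status': [1, 1, 1, 1, 0, 0, 1, 1, 0],
})

summary = df.groupby('gender')['status'].agg(
    students='count',  # how many took part
    placed='sum',      # how many were placed (a count)
    rate='mean',       # fraction of participants placed
)
print(summary)
```

On the real data, a gap in the placed column with similar values in the rate column would support the reading that the effect comes from participation, not from gender itself.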

> So, to summarize:
> 1. Your percentages matter.
> 2. Your degree matters.
> 3. Your specialisation matters.