# Campus Recruitment Visualizations

In this notebook we analyze and visualize the campus placement dataset and then predict whether a candidate is placed or not.

First we need to import the libraries that help us analyze and visualize the dataset.
Libraries:
* Numpy
* Pandas
* Matplotlib
* Sklearn
* Seaborn
* Plotly

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # visualizations

# Input data files are available in the read-only "../input/" directory

In [None]:
data = pd.read_csv('../input/factors-affecting-campus-placement/Placement_Data_Full_Class.csv')

In [None]:
data.head(5)

In [None]:
data.info()

Let's look at the values in the categorical columns. The gender column, for example, has two categories, "M" and "F". The **value_counts()** method shows how many rows contain "M" and how many contain "F".

In [None]:
data.gender.value_counts()

**Why only gender?** Let's look at the value counts of every categorical column.

In [None]:
data.ssc_b.value_counts()

In [None]:
data.hsc_b.value_counts()

In [None]:
data.hsc_s.value_counts()

In [None]:
data.degree_t.value_counts()

In [None]:
data.workex.value_counts()

In [None]:
data.specialisation.value_counts()

In [None]:
data.status.value_counts()
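The per-column calls above can also be collapsed into one loop. The sketch below is only an illustration: `sample` is a small hypothetical stand-in for `data`, and selecting the non-numeric columns with `select_dtypes` assumes the categorical features are the ones stored as strings.

```python
import pandas as pd

# Hypothetical stand-in for the placement data; the notebook itself uses `data`.
sample = pd.DataFrame({
    "gender": ["M", "F", "M", "M"],
    "workex": ["No", "Yes", "No", "No"],
    "ssc_p": [67.0, 79.3, 65.0, 56.0],
})

# Non-numeric columns hold strings, i.e. the categorical features.
categorical_cols = sample.select_dtypes(exclude="number").columns

# One value_counts() result per categorical column.
counts = {col: sample[col].value_counts() for col in categorical_cols}
for col, vc in counts.items():
    print(f"--- {col} ---")
    print(vc)
```

With the real dataset, replacing `sample` with `data` should print the same tables as the individual cells.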

As we see, the dataset has several categorical features (gender, ssc_b, hsc_s, status, etc.), so we have to preprocess those columns. We can do this bit of preprocessing with **LabelEncoder** from the **sklearn** library.

If you do not know what LabelEncoder is, don't worry, I'll explain. LabelEncoder is a built-in sklearn class that converts categories into integers. It numbers the categories in sorted order, so in the gender column "F" becomes 0 and "M" becomes 1. This is how the categorical features in the dataset get encoded.
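As a quick check of how the codes are assigned, here is a minimal sketch on a hand-written list rather than the real column. LabelEncoder sorts the labels before numbering them, and the fitted `classes_` attribute shows which label got which code.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["M", "F", "F", "M"])

# classes_ holds the labels in sorted order; a label's position is its code.
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}
print(codes)
print(mapping)

# inverse_transform recovers the original labels from the codes.
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Because the codes follow sorted order, they carry no real ranking. scikit-learn documents LabelEncoder as a tool for target labels, so for input features a one-hot encoding may be a safer choice with models that treat numbers as ordered.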

In [None]:
from sklearn.preprocessing import LabelEncoder

In [None]:
label = LabelEncoder()
# Encode each categorical column in place; fit_transform refits the encoder for every column
categorical_columns = ["gender", "ssc_b", "hsc_b", "hsc_s", "degree_t",
                       "workex", "specialisation", "status"]
for column in categorical_columns:
    data[column] = label.fit_transform(data[column])

The encoding is finished, so we have completed our first major step.
Have a look at the dataset after this preprocessing.

In [None]:
data.head()

A quick visual overview of the dataset is a histogram of all its columns, meaning all the columns that hold numeric values. This takes one call, "**your_data_name.hist()**", which draws with Matplotlib, followed by plt.show().

In [None]:
data.hist(figsize = (20, 20))
plt.show()

Our first question about this dataset is: which factors influence whether a candidate gets placed? The answer comes from **FEATURE SELECTION**, which is part of analyzing the dataset. So let's do the feature selection.

Feature selection is the process of picking the features in the dataset that matter most for predicting our labels.

Here is a new library, Seaborn, which we use to draw a heatmap of the correlations between the columns.

In [None]:
import seaborn as sns

In [None]:
sns.heatmap(data.corr())
plt.show()
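The heatmap shows every pairwise correlation, but for feature selection the row for the label is the one that matters. A minimal sketch of ranking features by their correlation with `status` follows. The `toy` frame is hypothetical, already-encoded data, not values from the dataset.

```python
import pandas as pd

# Hypothetical, already-encoded toy data; column names mirror the dataset.
toy = pd.DataFrame({
    "ssc_p":  [55.0, 60.0, 65.0, 70.0, 75.0, 80.0],
    "workex": [0, 0, 1, 0, 1, 1],
    "status": [0, 0, 1, 0, 1, 1],
})

# Correlation of each feature with the label, leaving out the label itself.
target_corr = toy.corr()["status"].drop("status")

# Rank by absolute value, since a strong negative correlation is informative too.
ranked = target_corr.abs().sort_values(ascending=False)
print(ranked)
```

On the real data the same two lines applied to `data` would give a sorted list that is easier to read than the heatmap colors. Correlation only captures linear relationships, so treat the ranking as a first look.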

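Next, let's visualize the candidates with interactive Plotly scatter plots. We reload the original, unencoded data so that the hover labels show the original category names.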

In [None]:
import plotly.express as px

In [None]:
data_original = pd.read_csv('../input/factors-affecting-campus-placement/Placement_Data_Full_Class.csv')

In [None]:
fig = px.scatter(data_original, x="salary", 
                 color="degree_p",
                 size='degree_p', 
                 hover_data=['gender', 'ssc_p', 'hsc_p', 'hsc_b', 'hsc_s', 'degree_t', 'workex', 'degree_p', 
                            'specialisation', 'mba_p', 'status', 'etest_p'], 
                 title = "Salary Plot")
fig.show()

In [None]:
fig = px.scatter(data_original, x="ssc_p", 
                 color="degree_p",
                 size='degree_p', 
                 hover_data=['gender', 'hsc_p', 'hsc_b', 'hsc_s', 'degree_t', 'workex', 'degree_p', 
                            'specialisation', 'mba_p', 'status', 'etest_p'], 
                 title = "ssc_p Plot")
fig.show()

In [None]:
data.info()

Divide the dataset into our features and labels, i.e. X and y.

In [None]:
X = data.iloc[:, 0:13].values
y = data.iloc[:, 13].values
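Positional slicing works, but it silently depends on the column order. Assuming the first column is the serial number `sl_no` and the last is `salary`, as in the published dataset, the slice `0:13` also feeds the row identifier to the model. A sketch of selecting by name instead, on a hypothetical miniature frame:

```python
import pandas as pd

# Hypothetical miniature of the encoded frame, keeping only a few columns.
frame = pd.DataFrame({
    "sl_no":  [1, 2, 3],
    "ssc_p":  [67.0, 79.33, 65.0],
    "mba_p":  [58.8, 66.28, 57.8],
    "status": [1, 1, 0],
    "salary": [270000.0, 200000.0, None],
})

# Leave out the label, the row identifier, and salary (known only after placement).
excluded = ("sl_no", "status", "salary")
feature_cols = [c for c in frame.columns if c not in excluded]

X_named = frame[feature_cols]
y_named = frame["status"]
print(feature_cols)
```

Keeping `X_named` as a DataFrame also preserves the column names, which makes later feature-importance output easier to read.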

In [None]:
y

In [None]:
X

Here is one more way of showing which feature is more important. With ExtraTreesClassifier it is easy to see which feature has the greatest importance.

In [None]:
from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier()
model.fit(X , y)

Have a look at all the feature importances.

In [None]:
print(model.feature_importances_) 

Let's plot the importances and see which feature has the greatest one.

In this run it is the column at index 2, i.e. **ssc_p**

In [None]:
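# Label each importance with its column name (X was built from the first 13 columns)
feat_importances = pd.Series(model.feature_importances_, index=data.columns[:13])
feat_importances.nlargest(10).plot(kind='barh')
plt.show()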

Here is the index of that column, i.e. 2:

In [None]:
print("Maximum important feature index is : ", model.feature_importances_.argmax()) 
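ExtraTreesClassifier is randomized, so the importances, and possibly the winning index, can change from run to run. A minimal sketch of making the result reproducible with `random_state` follows. The data comes from `make_classification` and the feature names are hypothetical, so the output says nothing about the placement dataset itself.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data with hypothetical feature names, standing in for X and y.
names = ["f0", "f1", "f2", "f3"]
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

# A fixed random_state gives the same importances on every run.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# Pair each importance with its feature name and sort, largest first.
importances = pd.Series(forest.feature_importances_, index=names)
print(importances.sort_values(ascending=False))
print("Most important feature:", importances.idxmax())
```

The importances are normalized to sum to 1, so each value can be read as a share of the total.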

The analysis is done.

Let's start building a model to predict whether a candidate is placed or not.

This is a classification problem, so we use DecisionTreeClassifier from the sklearn library.

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
clf = DecisionTreeClassifier()
clf.fit(X, y)

In [None]:
y_pred = clf.predict(X)
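Predicting on the same rows the tree was trained on tells us little, because an unpruned decision tree can memorize its training data. A sketch of a hold-out evaluation with `train_test_split` follows. It uses synthetic data from `make_classification` in place of X and y, and the 25% test share is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the placement features and labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

# Hold out 25% of the rows; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# Compare accuracy on seen rows with accuracy on unseen rows.
train_acc = accuracy_score(y_train, tree.predict(X_train))
test_acc = accuracy_score(y_test, tree.predict(X_test))
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

The gap between the two numbers is a rough measure of overfitting, and the test accuracy is the one worth reporting.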

Have a look at the predictions made by our decision tree classifier.

In [None]:
y_pred

We can also check the accuracy with a confusion matrix. Let's do it.

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
cm = confusion_matrix(y, y_pred)

In [None]:
cm
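To read an accuracy figure off a confusion matrix, divide the correct predictions on the diagonal by the total count. The sketch below uses hand-written hypothetical labels and predictions, not the notebook's `y` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels and predictions (1 = placed, 0 = not placed).
y_true_demo = np.array([1, 1, 1, 0, 0, 1, 0, 1])
y_pred_demo = np.array([1, 1, 0, 0, 1, 1, 0, 1])

# Rows are true classes, columns are predicted classes, in sorted label order.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_demo.ravel()

# Accuracy is the share of predictions that land on the diagonal.
accuracy = (tp + tn) / cm_demo.sum()
print(cm_demo)
print("accuracy:", accuracy)
print("matches accuracy_score:", accuracy == accuracy_score(y_true_demo, y_pred_demo))
```

The off-diagonal cells show the kind of mistake: `fp` counts candidates predicted as placed who were not, and `fn` counts the reverse.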

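The confusion matrix shows that our classifier has high accuracy, but it is measured on the same data the classifier was trained on, so it overstates how well the model would do on unseen candidates. ssc_p came out as the most important single feature, although the model here uses all 13 feature columns.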
**Enjoy Machine Learning!**