<img src="https://www.deccanherald.com/sites/dh/files/styles/article_detail/public/article_images/2018/01/10/652952.jpg?itok=dm0NUvvM" style="display : block;text-align : center;width : 90%;height : 90%;padding : 0px;margin : auto;">

<h1>Campus Recruitment Analysis</h1>
<p>In this notebook, we analyse the Campus Recruitment dataset and answer the following questions:</p>
<b>Associated tasks:</b>
<ul>
    <li>Which factor influenced a candidate in getting placed?</li>
    <li>Does percentage matter for one to get placed?</li>
    <li>Which degree specialization is most demanded by corporates?</li>
</ul>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
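# Load the placement dataset from the Kaggle input directory
data = pd.read_csv("../input/factors-affecting-campus-placement/Placement_Data_Full_Class.csv")
# Total number of missing cells across all columns, before and after filling
print("Total missing values (before fill) : ", data.isna().sum().sum())
data.fillna(0, inplace=True)  # replace every missing value with 0
print("Total missing values (after fill) : ", data.isna().sum().sum())
data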
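Filling every column with `0` works for this dataset, but it hides which columns were actually incomplete. A minimal sketch, assuming (not verified here) that the gaps are confined to `salary` for students who were not placed: list the missing-value count per column, then fill only that column. `report_missing` and the toy frame are hypothetical names used for illustration.

```python
import numpy as np
import pandas as pd

def report_missing(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column that has any."""
    counts = df.isna().sum()
    return counts[counts > 0]

# Hypothetical toy frame standing in for `data`; salary is NaN for the unplaced row.
toy = pd.DataFrame({
    "status": ["Placed", "Not Placed", "Placed"],
    "salary": [270000.0, np.nan, 200000.0],
})
print(report_missing(toy))                   # lists only the salary column
toy["salary"] = toy["salary"].fillna(0)      # fill just the salary column
print(report_missing(toy).empty)             # True
# In the notebook: report_missing(data)
```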

<h1>Gender balance</h1>

In [None]:
un,count = np.unique(data['gender'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Gender")
plt.ylabel("Count")
plt.title("Gender balance")
plt.show()

<p><b>Conclusion : </b>Male candidates outnumber female candidates in the placement pool.</p>
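`np.unique` returns two parallel arrays; pandas' `Series.value_counts` gives the same counts as a labelled Series, and `normalize=True` turns them into shares, which makes a statement such as "males outnumber females" quantifiable. A small sketch on a hypothetical toy column; `category_share` is an illustrative helper, and in the notebook you would pass `data['gender']`.

```python
import pandas as pd

def category_share(col: pd.Series) -> pd.DataFrame:
    """Count and percentage share of each category in a column."""
    counts = col.value_counts()
    share = col.value_counts(normalize=True) * 100
    return pd.DataFrame({"count": counts, "share_%": share.round(2)})

toy_gender = pd.Series(["M", "M", "F", "M"])  # hypothetical data
print(category_share(toy_gender))
# In the notebook: category_share(data['gender'])
```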

<h1>SSLC Marks Statistics</h1>

In [None]:
print("Min SSLC Marks : {:.2f} %".format(data['ssc_p'].min()))
print("Max SSLC Marks : {:.2f} %".format(data['ssc_p'].max()))
print("Average SSLC Marks : {:.2f} %".format(data['ssc_p'].mean()))

<h1>SSLC Board Split-up</h1>

In [None]:
un,count = np.unique(data['ssc_b'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Board")
plt.ylabel("Count")
plt.title("SSLC Board Split-up")
plt.show()

<p><b>Conclusion : </b>The majority of students completed SSLC under the Central board, though the Others category is also well represented.</p>

<h1>HSC Marks Statistics</h1>

In [None]:
print("Min HSC Marks : {:.2f} %".format(data['hsc_p'].min()))
print("Max HSC Marks : {:.2f} %".format(data['hsc_p'].max()))
print("Average HSC Marks : {:.2f} %".format(data['hsc_p'].mean()))

<h1>HSC Board Split-up</h1>

In [None]:
un,count = np.unique(data['hsc_b'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Board")
plt.ylabel("Count")
plt.title("HSC Board Split-up")
plt.show()

<p><b>Conclusion : </b>Other boards dominate the HSC board split-up.</p>

<h1>HSC Stream Analysis</h1>

In [None]:
un,count = np.unique(data['hsc_s'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Stream")
plt.ylabel("Count")
plt.title("HSC Stream")
plt.show()

<p><b>Conclusion : </b>Commerce has the most students, followed by Science and Arts. This result makes sense since the dataset comes from an MBA setting.</p>

<h1>Degree Stream Analysis</h1>



In [None]:
un,count = np.unique(data['degree_t'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Stream")
plt.ylabel("Count")
plt.title("Degree Stream Split-up")
plt.show()

<p><b>Conclusion : </b>The Commerce & Management stream dominates, followed by Science & Technology and Others. As with the previous visualisation, this makes sense since the dataset comes from an MBA setting.</p>

<h1>Degree Percentage Analysis</h1>


In [None]:
print("Min Degree Marks : {:.2f} %".format(data['degree_p'].min()))
print("Max Degree Marks : {:.2f} %".format(data['degree_p'].max()))
print("Average Degree Marks : {:.2f} %".format(data['degree_p'].mean()))

<h1>Work Experience Analysis</h1>


In [None]:
un,count = np.unique(data['workex'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Work Experience")
plt.ylabel("Count")
plt.title("Work Experience")
plt.show()

<p><b>Conclusion : </b>Most students took up higher studies without any prior work experience.</p>

<h1>Employability Test Analysis</h1>

In [None]:
print("Min ET Marks : {:.2f} %".format(data['etest_p'].min()))
print("Max ET Marks : {:.2f} %".format(data['etest_p'].max()))
print("Average ET Marks : {:.2f} %".format(data['etest_p'].mean()))

<h1>Higher Degree Specialisation Analysis</h1>

In [None]:
un,count = np.unique(data['specialisation'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Specialisation")
plt.ylabel("Count")
plt.title("Specialisation")
plt.show()

<p><b>Conclusion : </b>The majority of students chose Marketing & Finance as their domain, followed by Marketing & HR.</p>

<h1>MBA Percentage Statistics</h1>

In [None]:
print("Min MBA Marks : {:.2f} %".format(data['mba_p'].min()))
print("Max MBA Marks : {:.2f} %".format(data['mba_p'].max()))
print("Average MBA Marks : {:.2f} %".format(data['mba_p'].mean()))
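The min/max/mean cells for SSLC, HSC, degree, employability test and MBA marks repeat the same three calls. `DataFrame.agg` can compute them for every percentage column in one table. A sketch using the dataset's own column names; `mark_summary` and the two-row toy frame are hypothetical.

```python
import pandas as pd

MARK_COLUMNS = ["ssc_p", "hsc_p", "degree_p", "etest_p", "mba_p"]

def mark_summary(df: pd.DataFrame, columns=MARK_COLUMNS) -> pd.DataFrame:
    """Min, max and mean of each percentage column, one row per column."""
    return df[columns].agg(["min", "max", "mean"]).T.round(2)

# Hypothetical two-student frame standing in for `data`.
toy = pd.DataFrame({
    "ssc_p": [60.0, 80.0], "hsc_p": [55.0, 75.0], "degree_p": [58.0, 70.0],
    "etest_p": [50.0, 90.0], "mba_p": [52.0, 64.0],
})
print(mark_summary(toy))
# In the notebook: mark_summary(data) or mark_summary(data_placed)
```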

<h1>Placement Status Analysis</h1>

In [None]:
un,count = np.unique(data['status'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Status")
plt.ylabel("Count")
plt.title("Placement Status")
plt.show()

<p><b>Conclusion : </b>The majority of students were placed; fewer than half were not.</p>
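"Majority" can be pinned down as a single number: the mean of a boolean mask is the fraction of `True` values. A small sketch; `placement_rate` is a hypothetical helper and the toy series is made up.

```python
import pandas as pd

def placement_rate(status: pd.Series, placed_label: str = "Placed") -> float:
    """Percentage of rows whose status equals `placed_label`."""
    return float((status == placed_label).mean() * 100)

toy_status = pd.Series(["Placed", "Placed", "Not Placed", "Placed"])  # hypothetical
print(f"Placement rate: {placement_rate(toy_status):.2f} %")  # Placement rate: 75.00 %
# In the notebook: placement_rate(data['status'])
```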

<h1>Salary Statistics for those placed</h1>

In [None]:
print("Min Salary : {:.2f} INR".format(data[data['status'] == "Placed"]['salary'].min()))
print("Max Salary : {:.2f} INR".format(data[data['status'] == "Placed"]['salary'].max()))
print("Average Salary : {:.2f} INR".format(data[data['status'] == "Placed"]['salary'].mean()))
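Filtering the frame once with `.loc` avoids repeating the boolean mask, and adding the median is useful because a few high offers can pull the mean upward. Because the notebook fills missing salaries with `0`, restricting to `Placed` rows is also what keeps those zeros out of the statistics. A sketch with a hypothetical helper `salary_summary` and made-up rows.

```python
import pandas as pd

def salary_summary(df: pd.DataFrame) -> pd.Series:
    """Min, max, mean and median salary of placed students only."""
    placed_salary = df.loc[df["status"] == "Placed", "salary"]
    return placed_salary.agg(["min", "max", "mean", "median"])

toy = pd.DataFrame({  # hypothetical rows; the unplaced salary is already filled with 0
    "status": ["Placed", "Placed", "Placed", "Not Placed"],
    "salary": [200000.0, 250000.0, 600000.0, 0.0],
})
print(salary_summary(toy))
# In the notebook: salary_summary(data)
```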

<h1>Task 1 : Which factor influenced a candidate in getting placed?</h1>

In [None]:
data_placed = data[data['status'] == "Placed"]
data_placed

<p>To find the factors that influence placement, we conduct a group analysis and look for factors common to the placed students.<br>
    <b><i>Note : </i></b>An individual-level analysis might yield different results for outliers.<br>
For the analysis, let us consider the following parameters, which normally affect a candidate's employability, and strike out those that do not affect placement:</p>
    <ul>
        <li>SSLC Marks</li>
        <li>HSC Marks</li>
        <li>HSC Stream</li>
        <li>Degree Marks</li>
        <li>Degree Stream</li>
        <li>Work Experience</li>
        <li>Employability Test Percentage</li>
        <li>Specialisation</li>
        <li>MBA Percentage</li>
    </ul>
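Looking only at placed students shows how many of them fall into each group, but a large count can also mean that the group is large overall. One way to compare groups on an equal footing is the placement rate within each category of a factor, which also uses the students who were not placed. A minimal sketch; `placement_rate_by` is a hypothetical helper, the toy rows are made up, and any categorical column such as `hsc_s`, `degree_t`, `workex` or `specialisation` can be passed as `factor`.

```python
import pandas as pd

def placement_rate_by(df: pd.DataFrame, factor: str) -> pd.DataFrame:
    """Students, placed count and placement rate (%) for each category of `factor`."""
    placed = df["status"].eq("Placed")          # boolean mask per student
    grouped = placed.groupby(df[factor])        # group the mask by the factor's values
    out = pd.DataFrame({
        "students": grouped.size(),
        "placed": grouped.sum(),
        "rate_%": (grouped.mean() * 100).round(2),
    })
    return out.sort_values("rate_%", ascending=False)

toy = pd.DataFrame({  # hypothetical rows
    "workex": ["No", "No", "No", "Yes", "Yes"],
    "status": ["Placed", "Not Placed", "Placed", "Placed", "Placed"],
})
print(placement_rate_by(toy, "workex"))
# In the notebook: placement_rate_by(data, "hsc_s")
```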


<h2>SSLC Marks Statistics of placed students</h2>

In [None]:
print("Min SSLC Marks : {:.2f} %".format(data_placed['ssc_p'].min()))
print("Max SSLC Marks : {:.2f} %".format(data_placed['ssc_p'].max()))
print("Average SSLC Marks : {:.2f} %".format(data_placed['ssc_p'].mean()))

<p><b>Conclusion : </b>SSLC marks of placed students range from 49 % to 89.40 %. Since placed students span such a wide range, we infer that SSLC marks are not a parameter that affects a student's placement.</p>
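A complementary check on whether marks matter, which also uses the students who were not placed, is to bin the marks into bands and compare placement rates across them. A sketch with `pd.cut`; `rate_by_band`, the band edges (0, 60, 70, 80, 100) and the toy rows are hypothetical choices, not values taken from the dataset.

```python
import pandas as pd

def rate_by_band(df: pd.DataFrame, column: str, bins=(0, 60, 70, 80, 100)) -> pd.Series:
    """Placement rate (%) of students whose `column` marks fall in each band."""
    bands = pd.cut(df[column], bins=list(bins))   # right-closed intervals, e.g. (60, 70]
    placed = df["status"].eq("Placed")
    return (placed.groupby(bands, observed=False).mean() * 100).round(2)

toy = pd.DataFrame({  # hypothetical rows
    "ssc_p": [55.0, 58.0, 65.0, 72.0, 85.0, 88.0],
    "status": ["Not Placed", "Placed", "Placed", "Placed", "Placed", "Placed"],
})
print(rate_by_band(toy, "ssc_p"))
# In the notebook: rate_by_band(data, "ssc_p"), or "hsc_p", "degree_p", "mba_p"
```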

<h2>HSC Marks statistics of placed students</h2>

In [None]:
print("Min HSC Marks : {:.2f} %".format(data_placed['hsc_p'].min()))
print("Max HSC Marks : {:.2f} %".format(data_placed['hsc_p'].max()))
print("Average HSC Marks : {:.2f} %".format(data_placed['hsc_p'].mean()))

<p><b>Conclusion : </b>HSC marks of placed students range from an average 50 % to a high of 97.70 %. Since average students get placed as well, we conclude that HSC marks are not a factor.</p>

<h2>HSC Stream Analysis of placed students</h2>

In [None]:
un,count = np.unique(data_placed['hsc_s'],return_counts=True)
plt.bar(un,count)
plt.xlabel("HSC Stream for Placed")
plt.ylabel("Count")
plt.title("HSC stream taken by placed students")
plt.show()

<p><b>Conclusion : </b>Commerce and Science students make up more of the placed students than Arts students, which suggests that the HSC stream plays a role in a student's placement. This is plausibly explained by the educational exposure each stream gives a student to specific subjects.</p>
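The plot above counts placed students per stream. `pd.crosstab` puts placed and not-placed counts side by side, and `normalize='index'` converts each row into shares so that streams of different sizes can be compared. A sketch on hypothetical toy rows; in the notebook you would pass `data['hsc_s']` and `data['status']`.

```python
import pandas as pd

toy = pd.DataFrame({  # hypothetical rows
    "hsc_s": ["Commerce", "Commerce", "Science", "Science", "Arts", "Arts"],
    "status": ["Placed", "Placed", "Placed", "Not Placed", "Placed", "Not Placed"],
})

# Raw counts: one row per stream, one column per status value.
counts = pd.crosstab(toy["hsc_s"], toy["status"])
# Row-normalised shares: each row sums to 1.
shares = pd.crosstab(toy["hsc_s"], toy["status"], normalize="index")
print(counts)
print((shares * 100).round(2))
```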

<h2>Degree Marks statistics of placed students</h2>

In [None]:
print("Min Degree Marks : {:.2f} %".format(data_placed['degree_p'].min()))
print("Max Degree Marks : {:.2f} %".format(data_placed['degree_p'].max()))
print("Average Degree Marks : {:.2f} %".format(data_placed['degree_p'].mean()))

<p><b>Conclusion : </b>Degree marks of placed students vary from a close-to-average 56 % to a high of 91 %. We therefore conclude that degree marks are not a factor that affects a candidate's placement.</p>

<h2>Degree Stream Analysis of placed students</h2>

In [None]:
un,count = np.unique(data_placed['degree_t'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Degree Stream for Placed")
plt.ylabel("Count")
plt.title("Degree stream taken by placed students")
plt.show()

<p><b>Conclusion : </b>As with the HSC stream analysis, the degree stream also appears to affect a student's placement, again plausibly because of the educational exposure a student receives to specific subjects.</p>

<h2>Work Experience Analysis of placed students</h2>

In [None]:
un,count = np.unique(data_placed['workex'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Work Experience")
plt.ylabel("Count")
plt.title("Work Experience for placed students")
plt.show()

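<p><b>Conclusion : </b>Counter-intuitively, more placed students have no previous work experience than have it. Since most students in the dataset have no work experience to begin with (see the Work Experience Analysis above), this count may simply reflect group size, so on this basis work experience is not treated as a factor that affects placement.</p>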

<h2>Employability Test Percentage Statistics of placed students</h2>

In [None]:
print("Min ET Marks : {:.2f} %".format(data_placed['etest_p'].min()))
print("Max ET Marks : {:.2f} %".format(data_placed['etest_p'].max()))
print("Average ET Marks : {:.2f} %".format(data_placed['etest_p'].mean()))

<p><b>Conclusion : </b>Employability test percentages of placed students range from an average 50 % to a high of 98 %. ET marks therefore do not necessarily judge a student's placement capability and are not a factor that affects placement.</p>

<h2>Degree Specialisation Analysis of placed students</h2>

In [None]:
un,count = np.unique(data_placed['specialisation'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Specialisation")
plt.ylabel("Count")
plt.title("Specialisation for placed students")
plt.show()

<p><b>Conclusion : </b>The degree specialisation appears to affect a candidate's placement, since industries prefer certain specialisations over others.</p>

<h2>MBA Marks Statistics of placed students</h2>

In [None]:
print("Min MBA Marks : {:.2f} %".format(data_placed['mba_p'].min()))
print("Max MBA Marks : {:.2f} %".format(data_placed['mba_p'].max()))
print("Average MBA Marks : {:.2f} %".format(data_placed['mba_p'].mean()))

<p><b>Conclusion : </b>MBA marks of placed students vary from a close-to-average 52.38 % to a high of 77.89 %. This shows that MBA marks do not play a significant role in a student's placement.</p>

<h1>Task 2 : Does percentage matter for one to get placed?</h1>
<p>Percentage does not matter in the placement process: the marks statistics for SSLC, HSC, degree and MBA show that the marks of placed students range from average or close-to-average up to the highest.</p>
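The argument above rests on the spread of marks among placed students. A complementary view compares the average of each percentage column between placed and not-placed students with `groupby`. A sketch; `mean_marks_by_status` and the toy rows are hypothetical and only illustrate the shape of the output.

```python
import pandas as pd

MARK_COLUMNS = ["ssc_p", "hsc_p", "degree_p", "etest_p", "mba_p"]

def mean_marks_by_status(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of each percentage column for placed vs not-placed students."""
    return df.groupby("status")[MARK_COLUMNS].mean().round(2)

toy = pd.DataFrame({  # hypothetical rows
    "ssc_p": [80.0, 70.0, 55.0], "hsc_p": [75.0, 65.0, 60.0],
    "degree_p": [70.0, 66.0, 58.0], "etest_p": [85.0, 60.0, 70.0],
    "mba_p": [65.0, 60.0, 58.0],
    "status": ["Placed", "Placed", "Not Placed"],
})
means = mean_marks_by_status(toy)
print(means)
# Difference in means (Placed minus Not Placed) for each column:
print(means.loc["Placed"] - means.loc["Not Placed"])
# In the notebook: mean_marks_by_status(data)
```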
<h1>Task 3 : Which degree specialization is most demanded by corporates?</h1>
<p>Marketing & Finance appears to be preferred over Marketing & HR, though the latter is not far behind in count.</p>
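The preference stated above is read off the count of placed students. A sketch that also reports, per specialisation, how many students chose it, the share of them who were placed, and the median salary of those placed, using pandas named aggregation. `specialisation_demand`, the toy rows and their labels are hypothetical.

```python
import pandas as pd

def specialisation_demand(df: pd.DataFrame) -> pd.DataFrame:
    """Per specialisation: students, placement rate (%) and median salary of placed students."""
    tmp = df.assign(is_placed=df["status"].eq("Placed"))
    summary = tmp.groupby("specialisation").agg(
        students=("is_placed", "count"),
        rate_pct=("is_placed", "mean"),
    )
    summary["rate_pct"] = (summary["rate_pct"] * 100).round(2)
    # Median salary over placed rows only, so salaries filled with 0 are ignored.
    summary["median_salary"] = (
        tmp[tmp["is_placed"]].groupby("specialisation")["salary"].median()
    )
    return summary

toy = pd.DataFrame({  # hypothetical rows
    "specialisation": ["Mkt&Fin", "Mkt&Fin", "Mkt&HR", "Mkt&HR"],
    "status": ["Placed", "Placed", "Placed", "Not Placed"],
    "salary": [300000.0, 250000.0, 240000.0, 0.0],
})
print(specialisation_demand(toy))
# In the notebook: specialisation_demand(data)
```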

<h1>Dashboard</h1>
<p>We have now identified the factors that affect a student's placement. Let us combine them into a dashboard that presents these deciding factors together.</p>

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(2,2,1)
un,count = np.unique(data_placed['hsc_s'],return_counts=True)
plt.bar(un,count)
plt.xlabel("HSC Stream for Placed")
plt.ylabel("Count")
plt.title("HSC stream taken by placed students")

plt.subplot(2,2,2)
un,count = np.unique(data_placed['degree_t'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Degree Stream for Placed")
plt.ylabel("Count")
plt.title("Degree stream taken by placed students")

plt.subplot(2,2,3)
un,count = np.unique(data_placed['specialisation'],return_counts=True)
plt.bar(un,count)
plt.xlabel("Specialisation")
plt.ylabel("Count")
plt.title("Specialisation for placed students")

plt.subplots_adjust(bottom=0.1,  
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

plt.show()
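The dashboard cell repeats the same plotting lines for each panel. A minimal sketch of the same 2x2 layout driven by a list of (column, title) pairs and matplotlib's object-oriented `fig, axes` API; `placed_dashboard` and `DASHBOARD_PANELS` are hypothetical names, and `value_counts().sort_index()` stands in for `np.unique` (both give counts in sorted category order).

```python
import matplotlib.pyplot as plt
import pandas as pd

DASHBOARD_PANELS = [  # (column, title) pairs matching the dashboard panels
    ("hsc_s", "HSC stream taken by placed students"),
    ("degree_t", "Degree stream taken by placed students"),
    ("specialisation", "Specialisation for placed students"),
]

def placed_dashboard(df_placed: pd.DataFrame, panels=DASHBOARD_PANELS):
    """Draw one bar chart of category counts per panel on a 2x2 grid."""
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    flat_axes = axes.ravel()
    for ax, (column, title) in zip(flat_axes, panels):
        counts = df_placed[column].value_counts().sort_index()
        ax.bar(list(counts.index.astype(str)), counts.values)
        ax.set_xlabel(column)
        ax.set_ylabel("Count")
        ax.set_title(title)
    for ax in flat_axes[len(panels):]:
        ax.remove()  # drop grid cells that have no panel
    fig.subplots_adjust(bottom=0.1, top=0.9, wspace=0.4, hspace=0.4)
    return fig
```

In the notebook this would be called as `placed_dashboard(data_placed)` followed by `plt.show()`. Returning the figure instead of showing it inside the function keeps the helper usable for saving with `fig.savefig`.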