# 📊 Placement Data Analysis using Pandas

## 🎯 Objective
- To understand the factors affecting the placement and salary of students.

## 📁 Dataset
- 215 students
- Columns: gender, academic percentages, work experience, specialisation, status, salary

## 🔍 Analysis Performed
- Salary distribution and outliers
- Gender-wise placement
- Work experience impact
- Top MBA performers
- Outlier detection using IQR

## 📌 Key Insights
- Students with work experience had higher placement rates
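- The IQR check as run returned no salary outliers, but its fences were built from quartile row positions rather than salary quartiles, so this result needs re-checking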
- Top salary was ₹940000.0, earned by a student with a Mkt&Fin profile

## ✅ Conclusion
This analysis shows that multiple academic and personal factors influence placement chances. Future analysis can include prediction models.

In [101]:
# analysis starts from here

## Importing pandas

In [1]:
import pandas as pd

## Reading the CSV file

In [5]:
# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df=pd.read_csv("C:\\Users\\SHAIK MOHAMMAD KHAIF\\Desktop\\data sets\\Placement_Data_Full_Class.csv")

df

# Checking whether there are any duplicate rows

In [None]:
df.duplicated().sum()             # number of rows that repeat an earlier row (0 means no duplicates)
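
`duplicated()` marks every row that repeats an earlier one. A minimal sketch on a made-up three-row frame (the `toy` data is an assumption, not taken from the placement file) shows the mask, the count, and `drop_duplicates()`:

```python
import pandas as pd

# Hypothetical toy frame: the second and third rows are identical
toy = pd.DataFrame({
    "sl_no": [1, 2, 2],
    "gender": ["M", "F", "F"],
    "mba_p": [58.8, 66.28, 66.28],
})

dup_mask = toy.duplicated()          # True for each repeat of an earlier row
n_duplicates = int(dup_mask.sum())   # count of repeated rows
deduped = toy.drop_duplicates()      # keeps the first occurrence of each row

print(dup_mask.tolist())
print(n_duplicates)
print(deduped)
```
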

## Average salary of the placed students

In [None]:
df["salary"].mean()         # mean() skips the missing salaries of students who were not placed

## Maximum salary among the placed students

In [None]:
max_salary=df["salary"].max()      # named max_salary so that Python's built-in max() is not shadowed
max_salary

## Minimum salary among the placed students

In [None]:
min_salary=df["salary"].min()      # named min_salary so that Python's built-in min() is not shadowed
min_salary

## Fetching the students with the maximum and minimum salary packages

In [99]:
top_earner=df[df["salary"]==max_salary]
print(top_earner)

     sl_no gender  ssc_p    ssc_b  hsc_p    hsc_b     hsc_s  degree_p  \
119    120      M   60.8  Central   68.4  Central  Commerce      64.6   

      degree_t workex  etest_p specialisation  mba_p  status    salary  
119  Comm&Mgmt    Yes    82.66        Mkt&Fin  64.34  Placed  940000.0  


In [None]:
least_earner=df[df["salary"]==min_salary]
print(least_earner)

## Grouping the data by gender and placement status, then counting the students in each group

In [None]:
df.groupby(["gender", "status"]).size()

## Counting the non-null values in every column

In [91]:
df.count()        

sl_no             215
gender            215
ssc_p             215
ssc_b             215
hsc_p             215
hsc_b             215
hsc_s             215
degree_p          215
degree_t          215
workex            215
etest_p           215
specialisation    215
mba_p             215
status            215
salary            148
dtype: int64
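
The `salary` count (148) is lower than the other columns (215) because `count()` skips missing values, and 148 matches the number of placed students. The sketch below uses a small inline CSV (made-up rows; only the column names mirror the dataset) to show the same effect and how `mean()` ignores the gaps:

```python
import io
import pandas as pd

# Made-up rows for illustration; not taken from the placement file
csv_text = """sl_no,status,salary
1,Placed,270000
2,Not Placed,
3,Placed,250000
4,Not Placed,
"""
toy = pd.read_csv(io.StringIO(csv_text))

non_null = toy.count()   # non-missing values per column
# Number of missing salaries within each placement status
missing_by_status = toy["salary"].isna().groupby(toy["status"]).sum()
mean_salary = toy["salary"].mean()   # NaN values are skipped

print(non_null)
print(missing_by_status)
print(mean_salary)
```
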

## Finding the placement percentage of the college

In [None]:
placement_per=(df["status"]=="Placed").mean()*100      # share of rows with status "Placed" (148 of 215), computed from the data
print("The placement percentage of the college is:",placement_per)

## Fetching the top 10 performers by etest_p

In [None]:
top_etest=df.sort_values("etest_p",ascending=False)       # highest etest_p first
top_etest.head(10)                                        # the 10 students with the highest etest_p

# Placed and not placed students by specialisation

## Grouping the data by specialisation and status, then counting

In [92]:
df.groupby(["specialisation","status"]).size()     # placement counts for each specialisation

specialisation  status    
Mkt&Fin         Not Placed    25
                Placed        95
Mkt&HR          Not Placed    42
                Placed        53
dtype: int64
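
The counts above are easier to compare as rates. As a sketch, the four numbers are typed in by hand from the output above (with the real data the `groupby(...).size()` result could be reused directly) and turned into a placement rate per specialisation:

```python
import pandas as pd

# Counts copied from the groupby output above
counts = pd.Series(
    [25, 95, 42, 53],
    index=pd.MultiIndex.from_tuples(
        [("Mkt&Fin", "Not Placed"), ("Mkt&Fin", "Placed"),
         ("Mkt&HR", "Not Placed"), ("Mkt&HR", "Placed")],
        names=["specialisation", "status"],
    ),
)

table = counts.unstack("status")   # one row per specialisation, one column per status
placement_rate = table["Placed"] / table.sum(axis=1) * 100

print(placement_rate.round(2))
```

On these counts, Mkt&Fin students were placed at about 79% (95 of 120) and Mkt&HR students at about 56% (53 of 95).
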

# Placed and not placed students by secondary school board (ssc_b)

In [93]:
df.groupby(["ssc_b","status"]).size()           # placement counts for each ssc_b value

ssc_b    status    
Central  Not Placed    38
         Placed        78
Others   Not Placed    29
         Placed        70
dtype: int64

# Placed and not placed students by work experience

In [None]:
df.groupby(["workex","status"]).size()         # placement counts for each work-experience group
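
Raw counts alone do not show which group is placed more often, because the groups differ in size. A hedged sketch with made-up rows (the `toy` frame is an assumption; only the column names come from the dataset) uses `pd.crosstab` with `normalize="index"` to get the share of each status within each work-experience group:

```python
import pandas as pd

# Hypothetical rows: 3 of 4 students with work experience are placed, 2 of 4 without
toy = pd.DataFrame({
    "workex": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "status": ["Placed", "Placed", "Placed", "Not Placed",
               "Placed", "Placed", "Not Placed", "Not Placed"],
})

# normalize="index" divides every row by its own total
rates = pd.crosstab(toy["workex"], toy["status"], normalize="index") * 100
print(rates)
```
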

# Fetching the student with the top MBA percentage

In [94]:
top_mba=df["mba_p"].max()                                          # highest MBA percentage
top_mba_per=df[df["mba_p"]==top_mba]                               # the student(s) with that score
print(top_mba_per[["sl_no","gender","specialisation","mba_p","salary"]])

    sl_no gender specialisation  mba_p    salary
19     20      M        Mkt&Fin  77.89  236000.0


In [None]:
df["mba_p"].describe()       # summary statistics of mba_p: count, mean, std, min, quartiles, max

# Finding salary outliers with the IQR method

In [95]:
salary_data=df["salary"].dropna()        # drop the missing salaries of students who were not placed
min1=salary_data.min()                   # minimum salary
# Q1 and Q3 must be salary values, not row positions such as (25/100)*(215+1)
Q1=salary_data.quantile(0.25)            # first quartile of the salaries
med=salary_data.median()                 # median salary
Q3=salary_data.quantile(0.75)            # third quartile of the salaries
max1=salary_data.max()                   # maximum salary
IQR=Q3-Q1                                # interquartile range

# These are the parameters of the five-number summary (min, Q1, median, Q3, max).
# Next, find the lower fence and the upper fence.

low_fence=Q1-1.5*IQR                     # lower fence
upp_fence=Q3+1.5*IQR                     # upper fence

# A salary is an outlier if it lies above the upper fence OR below the lower fence,
# so the conditions are joined with | (no value can satisfy both at once with &).
outliers=df[(df["salary"]>upp_fence) | (df["salary"]<low_fence)]
outliers
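
For reference, the IQR rule can be packaged as a small helper. This is a sketch rather than the notebook's own code: `iqr_outliers` and the toy salaries are hypothetical, the quartiles come from `Series.quantile` (linear interpolation by default), and the fences use the conventional multiplier 1.5:

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = values.dropna()            # quartiles are computed on non-missing values
    q1 = clean.quantile(0.25)
    q3 = clean.quantile(0.75)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    upp_fence = q3 + k * iqr
    # Outliers lie below the lower fence OR above the upper fence
    return clean[(clean < low_fence) | (clean > upp_fence)]

# Hypothetical salaries; None stands for a student who was not placed
toy = pd.Series([200000, 240000, 250000, 265000, 300000, 940000, None])
flagged = iqr_outliers(toy)
print(flagged)
```

On the toy values the quartiles are 242500 and 291250, so the fences are 169375 and 364375 and only 940000 is flagged.
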


# Count of students at each salary, split by work experience

In [96]:
pd.crosstab(df["workex"],df["salary"]) 

salary  200000.0  204000.0  210000.0  216000.0  218000.0  220000.0  225000.0  230000.0  231000.0  233000.0  ...  393000.0  400000.0  411000.0  420000.0  425000.0  450000.0  500000.0  650000.0  690000.0  940000.0
workex
No             3         1         1         3         2         4         1         2         1         1  ...         1         2         1         0         1         1         1         0         0         0
Yes            3         1         3         0         0         1         0         0         0         0  ...         0         2         0         1         0         0         2         1         1         1


# Count of students at each salary, split by specialisation

In [97]:
pd.crosstab(df["specialisation"],df["salary"])   # number of students per specialisation at each salary

salary          200000.0  204000.0  210000.0  216000.0  218000.0  220000.0  225000.0  230000.0  231000.0  233000.0  ...  393000.0  400000.0  411000.0  420000.0  425000.0  450000.0  500000.0  650000.0  690000.0  940000.0
specialisation
Mkt&Fin                3         1         2         2         2         3         0         2         1         0  ...         1         2         1         1         1         0         3         1         1         1
Mkt&HR                 3         1         2         1         0         2         1         0         0         1  ...         0         2         0         0         0         1         0         0         0         0
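
Both crosstabs above get one column per distinct salary, which makes them wide and sparse. One possible alternative, sketched here on made-up rows, is to bin the salaries with `pd.cut` first. The band edges and labels below are arbitrary assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical rows; None stands for a student who was not placed
toy = pd.DataFrame({
    "workex": ["No", "No", "Yes", "Yes", "No", "Yes"],
    "salary": [200000.0, 250000.0, 300000.0, 650000.0, None, 940000.0],
})

# Assumed band edges; intervals are closed on the right, e.g. (0, 250000]
bins = [0, 250000, 500000, 1000000]
labels = ["<=2.5L", "2.5L-5L", ">5L"]
toy["salary_band"] = pd.cut(toy["salary"], bins=bins, labels=labels)

# Rows without a salary have no band and are left out of the table
band_counts = pd.crosstab(toy["workex"], toy["salary_band"])
print(band_counts)
```
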


# Comparing the average ssc_p, hsc_p, degree_p, etest_p and mba_p by placement status

In [98]:
df.groupby("status")[["ssc_p","hsc_p","degree_p","etest_p","mba_p"]].mean()     

                ssc_p      hsc_p   degree_p    etest_p      mba_p
status
Not Placed   57.54403  58.395522  61.134179   69.58791  61.612836
Placed      71.721486  69.926554  68.740541  73.238041  62.579392
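
To see which score separates the two groups most, the means can be subtracted. In this sketch the values are typed in by hand from the table above (with the real data the `groupby(...).mean()` result could be reused directly):

```python
import pandas as pd

# Means copied from the table above
means = pd.DataFrame(
    {
        "ssc_p": [57.54403, 71.721486],
        "hsc_p": [58.395522, 69.926554],
        "degree_p": [61.134179, 68.740541],
        "etest_p": [69.58791, 73.238041],
        "mba_p": [61.612836, 62.579392],
    },
    index=pd.Index(["Not Placed", "Placed"], name="status"),
)

# Placed minus Not Placed, largest gap first
gap = (means.loc["Placed"] - means.loc["Not Placed"]).sort_values(ascending=False)
print(gap.round(2))
```

The gap is largest for `ssc_p` (about 14.2 percentage points) and smallest for `mba_p` (about 1 point).
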
