# 📊 Placement Data of 1000 Students

This notebook explores a synthetic dataset of 1000 students, including details like CGPA, number of internships, placement status, and salary offered. It's useful for data analysis and machine learning practice.


In [25]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Set the default plot style
sns.set_theme(style='whitegrid')


## 🔍 Load and Preview the Dataset

We'll load the CSV file and take a quick look at the first few rows to understand the structure.


In [26]:
# Load the dataset
df = pd.read_csv("Placement.csv")

# Display the first 5 rows
df.head()


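|   | Student_ID | CGPA | Internships | Placed | Salary (INR LPA) |
|---|------------|------|-------------|--------|------------------|
| 0 | 1 | 7.9 | 3 | Yes | 17.63 |
| 1 | 2 | 7.39 | 0 | Yes | 28.37 |
| 2 | 3 | 8.02 | 2 | Yes | 8.95 |
| 3 | 4 | 8.72 | 4 | Yes | 22.59 |
| 4 | 5 | 7.31 | 2 | Yes | 19.67 |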


## 📋 Dataset Overview

Let's examine the data types, missing values, and summary statistics.
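One way to gather these three views in a single step is sketched below. The `overview` helper is a hypothetical name introduced for illustration, and `sample` is a stand-in built from the five preview rows so the cell runs without `Placement.csv`; in the notebook you would pass `df` instead.

```python
import pandas as pd

# Stand-in data: the five rows from the df.head() preview (not the full dataset).
sample = pd.DataFrame({
    "Student_ID": [1, 2, 3, 4, 5],
    "CGPA": [7.9, 7.39, 8.02, 8.72, 7.31],
    "Internships": [3, 0, 2, 4, 2],
    "Placed": ["Yes", "Yes", "Yes", "Yes", "Yes"],
    "Salary (INR LPA)": [17.63, 28.37, 8.95, 22.59, 19.67],
})

def overview(df):
    """Hypothetical helper: collect shape, dtypes, missing counts, and numeric stats."""
    return {
        "shape": df.shape,               # (rows, columns)
        "dtypes": df.dtypes,             # data type of each column
        "missing": df.isnull().sum(),    # missing values per column
        "stats": df.describe(),          # summary statistics for numeric columns
    }

report = overview(sample)
for name, value in report.items():
    print(f"--- {name} ---")
    print(value)
```

`df.info()` prints a similar summary of dtypes and non-null counts directly, if a returned object is not needed.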


## 🎓 CGPA Distribution

We visualize how CGPA is distributed among the 1000 students.
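A minimal sketch of such a plot, using a plain matplotlib histogram. The `plot_cgpa_hist` helper and the bin count are assumptions made for illustration, and `cgpa` holds only the five preview values; in the notebook you would pass `df['CGPA']`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data: CGPA values from the five preview rows (not the full dataset).
cgpa = pd.Series([7.9, 7.39, 8.02, 8.72, 7.31], name="CGPA")

def plot_cgpa_hist(values, bins=10):
    """Hypothetical helper: draw a histogram and return the figure, bin counts, and bin edges."""
    fig, ax = plt.subplots(figsize=(8, 4))
    counts, edges, _ = ax.hist(values, bins=bins, edgecolor="black")
    ax.set_xlabel("CGPA")
    ax.set_ylabel("Number of students")
    ax.set_title("CGPA Distribution")
    return fig, counts, edges

fig, counts, edges = plot_cgpa_hist(cgpa, bins=5)
```

Returning the counts alongside the figure makes it easy to confirm that every student landed in a bin.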


## Data Cleaning

In [27]:
# Check for missing values
df.isnull().sum()

Student_ID          0
CGPA                0
Internships         0
Placed              0
Salary (INR LPA)    0
dtype: int64

In [28]:
# Check for duplicate rows
df.duplicated().sum()

np.int64(0)
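`df.duplicated()` only flags rows that match in every column, so a repeated `Student_ID` with different values elsewhere would go unnoticed. The sketch below bundles the checks into one hypothetical helper, `cleaning_report`; the data is the five preview rows plus one deliberately duplicated row added only to show a non-zero result.

```python
import pandas as pd

# Stand-in data: the five preview rows (not the full dataset).
sample = pd.DataFrame({
    "Student_ID": [1, 2, 3, 4, 5],
    "CGPA": [7.9, 7.39, 8.02, 8.72, 7.31],
    "Internships": [3, 0, 2, 4, 2],
    "Placed": ["Yes", "Yes", "Yes", "Yes", "Yes"],
    "Salary (INR LPA)": [17.63, 28.37, 8.95, 22.59, 19.67],
})

def cleaning_report(df, id_col="Student_ID"):
    """Hypothetical helper: count missing cells, duplicate rows, and repeated IDs."""
    return {
        "missing_cells": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "duplicate_ids": int(df[id_col].duplicated().sum()),
    }

clean = cleaning_report(sample)

# Deliberately append a copy of the first row to demonstrate detection.
with_dupe = pd.concat([sample, sample.iloc[[0]]], ignore_index=True)
dirty = cleaning_report(with_dupe)

print(clean)
print(dirty)
```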

In [29]:
df.columns

Index(['Student_ID', 'CGPA', 'Internships', 'Placed', 'Salary (INR LPA)'], dtype='object')

In [None]:
# Five most frequent CGPA values (value_counts ranks by frequency, not by score)
data = df['CGPA'].value_counts().head(5).reset_index()
data.columns = ['CGPA', 'Count']

fig = px.bar(data, x='CGPA', y='Count', title='5 Most Frequent CGPA Values')
fig.show()
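Note that `value_counts()` ranks CGPA values by how often they occur, not by how high they are. If the goal is the highest scores, `DataFrame.nlargest` is one option. The sketch below uses a hypothetical helper, `top_cgpa`, on the five preview rows; in the notebook you would pass `df`.

```python
import pandas as pd

# Stand-in data: IDs and CGPA values from the five preview rows (not the full dataset).
sample = pd.DataFrame({
    "Student_ID": [1, 2, 3, 4, 5],
    "CGPA": [7.9, 7.39, 8.02, 8.72, 7.31],
})

def top_cgpa(df, n=5):
    """Hypothetical helper: return the n students with the highest CGPA."""
    return df.nlargest(n, "CGPA")[["Student_ID", "CGPA"]]

top3 = top_cgpa(sample, n=3)
print(top3)
```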

In [41]:
# Count each (CGPA, Internships) combination; head() shows the first five, ordered by CGPA
df.groupby('CGPA')['Internships'].value_counts().head()

CGPA  Internships
4.91  3              1
5.34  0              1
5.38  1              1
5.40  3              1
5.52  0              1
Name: count, dtype: int64
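Because CGPA is recorded to two decimals, almost every value forms its own group, which is why each count above is 1. Grouping CGPA into bands with `pd.cut` may give a more readable summary. This is a sketch under assumptions: the `internships_by_cgpa_band` helper and the band edges are chosen for illustration, and `sample` is the five preview rows rather than the full dataset.

```python
import pandas as pd

# Stand-in data: CGPA and internship counts from the five preview rows.
sample = pd.DataFrame({
    "CGPA": [7.9, 7.39, 8.02, 8.72, 7.31],
    "Internships": [3, 0, 2, 4, 2],
})

def internships_by_cgpa_band(df, edges=(7, 8, 9)):
    """Hypothetical helper: student count and mean internships per CGPA band.

    Bands are right-closed intervals such as (7, 8]; CGPA values outside
    the edges fall in no band and are left out of the result.
    """
    bands = pd.cut(df["CGPA"], bins=list(edges))
    return df.groupby(bands, observed=True)["Internships"].agg(["count", "mean"])

summary = internships_by_cgpa_band(sample)
print(summary)
```

For the full dataset the edges would need to extend lower, since the output above shows CGPA values starting at 4.91.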