**Business Objective:**

Develop a predictive model that classifies individuals as Introverts or Extroverts from their social behavior and activity patterns, so that businesses can tailor products, services, and marketing strategies to personality-driven preferences.


| Column Name                  | Description                                                                 |
|-----------------------------|-----------------------------------------------------------------------------|
| `Time_spent_Alone`          | Number of hours a person spends alone on average daily.                    |
| `Stage_fear`                | Indicates whether the person has a fear of public speaking (Yes/No).       |
| `Social_event_attendance`   | Number of social events attended in a month.                               |
| `Going_outside`             | Frequency of going outside, stored as an integer (0 to 7 in this dataset). |
| `Drained_after_socializing` | Whether the person feels mentally drained after socializing (Yes/No).      |
| `Friends_circle_size`       | Number of close or frequent friends in the person’s circle.                |
| `Post_frequency`            | Number of posts shared on social media per week.                          |
| `Personality`               | Target variable; indicates if the person is an Introvert or Extrovert.     |
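
The data dictionary above can double as a quick sanity check on the loaded frame. The following is a minimal sketch, not part of the original notebook: `EXPECTED_SCHEMA` and `check_schema` are hypothetical names, the numeric/categorical split is an assumption inferred from the `df.info()` output, and the two sample rows are copied from `df.head()`.

```python
import pandas as pd

# Expected columns and a coarse kind for each one.
# Assumption: kinds are inferred from the notebook's df.info() output.
EXPECTED_SCHEMA = {
    "Time_spent_Alone": "numeric",
    "Stage_fear": "categorical",
    "Social_event_attendance": "numeric",
    "Going_outside": "numeric",
    "Drained_after_socializing": "categorical",
    "Friends_circle_size": "numeric",
    "Post_frequency": "numeric",
    "Personality": "categorical",
}


def check_schema(frame, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for column, kind in schema.items():
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
            continue
        is_numeric = pd.api.types.is_numeric_dtype(frame[column])
        if kind == "numeric" and not is_numeric:
            problems.append(f"{column}: expected numeric, got {frame[column].dtype}")
        if kind == "categorical" and is_numeric:
            problems.append(f"{column}: expected categorical, got {frame[column].dtype}")
    return problems


# Two rows copied from the df.head() output (index 0 and index 3).
sample = pd.DataFrame({
    "Time_spent_Alone": [3, 8],
    "Stage_fear": ["Yes", "No"],
    "Social_event_attendance": [6, 0],
    "Going_outside": [7, 0],
    "Drained_after_socializing": ["No", "Yes"],
    "Friends_circle_size": [14, 9],
    "Post_frequency": [5, 3],
    "Personality": ["Extrovert", "Introvert"],
})

print(check_schema(sample))
```

In the notebook itself, the same helper could be called as `check_schema(df)` right after `pd.read_csv`.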


In [1]:
# Import the required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv("/content/personality_dataset.csv")
df.shape

(5000, 8)

In [3]:
df.head()

,Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,Drained_after_socializing,Friends_circle_size,Post_frequency,Personality
0,3,Yes,6,7,No,14,5,Extrovert
1,2,No,8,6,No,7,8,Extrovert
2,1,No,9,4,No,9,3,Extrovert
3,8,No,0,0,Yes,9,3,Introvert
4,5,Yes,3,0,No,2,6,Introvert


In [4]:
df.tail()

,Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,Drained_after_socializing,Friends_circle_size,Post_frequency,Personality
4995,5,No,6,5,No,14,8,Extrovert
4996,7,No,2,3,Yes,2,5,Introvert
4997,2,No,7,3,No,11,3,Extrovert
4998,2,Yes,9,3,No,15,7,Extrovert
4999,4,Yes,6,7,No,12,3,Extrovert


In [11]:
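# Column dtypes and non-null counts (this cell was run after the duplicates were dropped, hence 4872 rows)
df.info()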

<class 'pandas.core.frame.DataFrame'>
Index: 4872 entries, 0 to 4999
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   Time_spent_Alone           4872 non-null   int64 
 1   Stage_fear                 4872 non-null   object
 2   Social_event_attendance    4872 non-null   int64 
 3   Going_outside              4872 non-null   int64 
 4   Drained_after_socializing  4872 non-null   object
 5   Friends_circle_size        4872 non-null   int64 
 6   Post_frequency             4872 non-null   int64 
 7   Personality                4872 non-null   object
dtypes: int64(5), object(3)
memory usage: 342.6+ KB


In [13]:
df.describe().T

,count,mean,std,min,25%,50%,75%,max
Time_spent_Alone,4872.0,4.97619,3.038156,0.0,3.0,5.0,7.0,11.0
Social_event_attendance,4872.0,4.993842,2.855725,0.0,3.0,5.0,7.0,10.0
Going_outside,4872.0,3.517857,2.089579,0.0,2.0,4.0,5.0,7.0
Friends_circle_size,4872.0,7.25431,4.058642,0.0,5.0,7.0,10.0,15.0
Post_frequency,4872.0,4.758621,2.783804,0.0,3.0,5.0,7.0,10.0
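
`describe()` only summarizes the numeric columns, so the target `Personality` is not covered by the table above. A minimal sketch of how the class balance could be inspected with `value_counts` follows. It is an assumption-laden illustration: the labels are only the ten rows shown by `df.head()` and `df.tail()`, so the proportions say nothing about the full dataset.

```python
import pandas as pd

# Personality labels of the ten rows displayed by df.head() and df.tail().
# This is NOT the full dataset, only an illustration of the technique.
labels = pd.Series(
    ["Extrovert", "Extrovert", "Extrovert", "Introvert", "Introvert",
     "Extrovert", "Introvert", "Extrovert", "Extrovert", "Extrovert"],
    name="Personality",
)

counts = labels.value_counts()                 # absolute count per class
shares = labels.value_counts(normalize=True)   # fraction per class

print(counts)
print(shares.round(2))
```

On the real frame the equivalent call would be `df["Personality"].value_counts(normalize=True)`; a strongly skewed result would be worth knowing before training a classifier.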


In [7]:
# Count duplicate rows, drop them, and confirm that none remain
print("Before Removing Duplicates: ", df.duplicated().sum(), df.shape)
df.drop_duplicates(inplace=True)
print("After Removing Duplicates: ", df.duplicated().sum(), df.shape)

Before Removing Duplicates:  128 (5000, 8)
After Removing Duplicates:  0 (4872, 8)
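
`drop_duplicates` keeps the first occurrence of each repeated row and leaves the original index labels in place, which is why `df.info()` reports 4872 entries with an index that still runs from 0 to 4999. The sketch below reproduces this on a small made-up frame (the values are hypothetical) and shows `reset_index(drop=True)` as an optional follow-up step that the original notebook does not perform.

```python
import pandas as pd

# Made-up frame: row 2 repeats row 0 exactly.
toy = pd.DataFrame({
    "Time_spent_Alone": [3, 2, 3, 8],
    "Personality": ["Extrovert", "Extrovert", "Extrovert", "Introvert"],
})

n_dupes = int(toy.duplicated().sum())        # rows identical to an earlier row
deduped = toy.drop_duplicates()              # keeps first occurrence and original labels
reindexed = deduped.reset_index(drop=True)   # optional: renumber the rows 0..n-1

print(n_dupes)                  # 1
print(list(deduped.index))      # [0, 1, 3]
print(list(reindexed.index))    # [0, 1, 2]
```

Resetting the index is a matter of preference here; it mainly matters if later code relies on positional labels.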


In [10]:
print(df.isnull().sum().sum())

0
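
`df.isnull().sum().sum()` collapses everything into one number, which is enough when the answer is 0. If the total were non-zero, a per-column breakdown would show where the gaps are. A minimal sketch on a made-up frame (the values and the missing entries are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up frame with two deliberately missing entries.
toy = pd.DataFrame({
    "Friends_circle_size": [14, np.nan, 9],
    "Stage_fear": ["Yes", "No", None],
    "Post_frequency": [5, 8, 3],
})

per_column = toy.isnull().sum()   # missing values per column
total = int(per_column.sum())     # same figure the notebook cell prints

print(per_column)
print(total)
```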


**Basic Summary:**

* Total data points