# Personality Classifier - Introvert/Extrovert

### Sanjith Devineni

## Abstract

In this project, I classify individuals as introverts or extroverts based on features that describe their social behavior.

## Data

### Exploratory Data Analysis

The dataset comes from Kaggle: [Link to Kaggle Dataset](https://www.kaggle.com/datasets/rakeshkapilavai/extrovert-vs-introvert-behavior-data/data). It has 8 columns: 7 serve as features for the model, and the last column, `Personality`, is the label (Introvert/Extrovert). The columns are:
- Time_spent_Alone: Hours spent alone daily (0–11).
- Stage_fear: Presence of stage fright (Yes/No).
- Social_event_attendance: Frequency of social events (0–10).
- Going_outside: Frequency of going outside (0–7).
- Drained_after_socializing: Feeling drained after socializing (Yes/No).
- Friends_circle_size: Number of close friends (0–15).
- Post_frequency: Social media post frequency (0–10).
- Personality: Target variable (Extrovert/Introvert).
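
The documented ranges above can be checked against the data before modeling. The following is a minimal sketch, assuming the ranges in the list are inclusive bounds; the helper name `out_of_range_counts` and the toy rows are hypothetical and only illustrate the check.

```python
import pandas as pd

# Documented (min, max) ranges for the numeric columns, copied from the list above.
EXPECTED_RANGES = {
    "Time_spent_Alone": (0, 11),
    "Social_event_attendance": (0, 10),
    "Going_outside": (0, 7),
    "Friends_circle_size": (0, 15),
    "Post_frequency": (0, 10),
}


def out_of_range_counts(frame: pd.DataFrame) -> pd.Series:
    """Count non-missing values outside the documented range, per column."""
    counts = {}
    for col, (low, high) in EXPECTED_RANGES.items():
        values = frame[col].dropna()  # missing values are counted separately
        counts[col] = int(((values < low) | (values > high)).sum())
    return pd.Series(counts, name="out_of_range")


# Toy rows for illustration only (not taken from the real dataset).
toy = pd.DataFrame({
    "Time_spent_Alone": [4.0, 12.0, None],
    "Social_event_attendance": [4.0, 0.0, 1.0],
    "Going_outside": [6.0, 0.0, 8.0],
    "Friends_circle_size": [13.0, 0.0, 5.0],
    "Post_frequency": [5.0, 3.0, 2.0],
})
result = out_of_range_counts(toy)
print(result)
```

On the real data, the same function could be called as `out_of_range_counts(df)` once the CSV is loaded; a nonzero count would suggest either a data-entry problem or that the documented range is approximate.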

The data is a single CSV file with 2,900 rows. Every feature column has some missing values (roughly 50 to 80 per column), while the label column is complete.

In [1]:
# Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt



In [3]:
# Load dataset
data_path = 'archive/personality_dataset.csv'
df = pd.read_csv(data_path)
df

| | Time_spent_Alone | Stage_fear | Social_event_attendance | Going_outside | Drained_after_socializing | Friends_circle_size | Post_frequency | Personality |
|---|---|---|---|---|---|---|---|---|
| 0 | 4.0 | No | 4.0 | 6.0 | No | 13.0 | 5.0 | Extrovert |
| 1 | 9.0 | Yes | 0.0 | 0.0 | Yes | 0.0 | 3.0 | Introvert |
| 2 | 9.0 | Yes | 1.0 | 2.0 | Yes | 5.0 | 2.0 | Introvert |
| 3 | 0.0 | No | 6.0 | 7.0 | No | 14.0 | 8.0 | Extrovert |
| 4 | 3.0 | No | 9.0 | 4.0 | No | 8.0 | 5.0 | Extrovert |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2895 | 3.0 | No | 7.0 | 6.0 | No | 6.0 | 6.0 | Extrovert |
| 2896 | 3.0 | No | 8.0 | 3.0 | No | 14.0 | 9.0 | Extrovert |
| 2897 | 4.0 | Yes | 1.0 | 1.0 | Yes | 4.0 | 0.0 | Introvert |
| 2898 | 11.0 | Yes | 1.0 | NaN | Yes | 2.0 | 0.0 | Introvert |


In [5]:
# Basic Info
print("Dataset Shape:", df.shape)

print("\nDataset Info:")
print(df.info())

print("\nMissing Values:")
print(df.isnull().sum())

# Define numerical and categorical columns/features
numeric_cols = ['Time_spent_Alone', 'Social_event_attendance', 'Going_outside', 'Friends_circle_size', 'Post_frequency']
categorical_cols = ['Stage_fear', 'Drained_after_socializing']
target_col = 'Personality'

# Verify categorical values
for col in categorical_cols:
    print(f"\nUnique values in {col}:")
    print(df[col].value_counts(dropna=False))

Dataset Shape: (2900, 8)

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2900 entries, 0 to 2899
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Time_spent_Alone           2837 non-null   float64
 1   Stage_fear                 2827 non-null   object 
 2   Social_event_attendance    2838 non-null   float64
 3   Going_outside              2834 non-null   float64
 4   Drained_after_socializing  2848 non-null   object 
 5   Friends_circle_size        2823 non-null   float64
 6   Post_frequency             2835 non-null   float64
 7   Personality                2900 non-null   object 
dtypes: float64(5), object(3)
memory usage: 181.4+ KB
None

Missing Values:
Time_spent_Alone             63
Stage_fear                   73
Social_event_attendance      62
Going_outside                66
Drained_after_socializing    52
Friends_circle_size          77
Post_frequency               65
Personality                   0
dtype: int64
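
The missing values counted above need some treatment before most models can be trained. The following is a minimal sketch of one common option, assuming median imputation for numeric columns and most-frequent-value imputation for the Yes/No columns. The strategy, the helper name `impute_simple`, and the toy rows are illustrative assumptions, not necessarily the approach used later in this project.

```python
import numpy as np
import pandas as pd


def impute_simple(frame, numeric_cols, categorical_cols):
    """Return a copy with numeric NaNs set to the column median and
    categorical NaNs set to the most frequent value (the mode)."""
    out = frame.copy()
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        # mode() returns a Series; take its first entry as the fill value
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Toy rows for illustration only (a subset of the real column names).
toy = pd.DataFrame({
    "Time_spent_Alone": [4.0, 9.0, np.nan, 0.0],
    "Friends_circle_size": [13.0, np.nan, 5.0, 14.0],
    "Stage_fear": ["No", "Yes", "Yes", None],
})
filled = impute_simple(
    toy,
    numeric_cols=["Time_spent_Alone", "Friends_circle_size"],
    categorical_cols=["Stage_fear"],
)
print(filled)
```

If the data is later split into training and test sets, the medians and modes would ideally be computed on the training split only and then applied to the test split, so that no information from the test rows leaks into preprocessing.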

In [25]:
# Number of Introverts vs Extroverts
num_intro = (df['Personality'] == 'Introvert').sum()
num_extro = (df['Personality'] == 'Extrovert').sum()
print(f"Number of Introverts: {num_intro}")
print(f"\nNumber of Extroverts: {num_extro}")

Number of Introverts: 1409

Number of Extroverts: 1491
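
The two counts above can be turned into class proportions and a majority-class baseline, which is a useful reference point when judging a classifier's accuracy later. This is a small sketch that hard-codes the reported counts; the variable names are hypothetical.

```python
import pandas as pd

# Class counts reported in the output above.
counts = pd.Series({"Introvert": 1409, "Extrovert": 1491}, name="count")

# Share of each class in the dataset.
proportions = counts / counts.sum()

# Accuracy obtained by always predicting the larger class.
majority_baseline = proportions.max()

# Ratio of the larger class to the smaller one (1.0 means perfectly balanced).
imbalance_ratio = counts.max() / counts.min()

print(proportions.round(3))
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
print(f"Imbalance ratio: {imbalance_ratio:.3f}")
```

The classes are close to balanced (about 48.6% introverts and 51.4% extroverts), so plain accuracy is a reasonable first metric, and any model should be expected to beat the roughly 51.4% baseline.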


## Feature Engineering

## Model Training

## Analysis/Evaluation

## Discussion/Conclusion