# Extrovert vs. Introvert Personality Prediction using Logistic Regression

## Problem Statement

Use the **Extrovert vs. Introvert Personality Traits Dataset** to build a binary classification model using **Logistic Regression** that predicts whether a person is an **extrovert** or not.

---

## Step-by-Step Tasks

### 1. Data Loading & Basic Exploration
- Load the dataset using `pandas`.
- View the first few rows and check column names and data types.
- Check the shape of the dataset (rows × columns).
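
A minimal sketch of this step. The file name in the comment and the four sample rows are made up for illustration; only the column names come from this README, and the Yes/No values in `Stage_fear` and `Drained_after_socializing` are an assumption about how the raw data is stored.

```python
import io

import pandas as pd

# Stand-in for the real file: a few made-up rows using the column names
# listed in this README. With the actual dataset you would call something like
# pd.read_csv("personality_dataset.csv")  # hypothetical file name
csv_text = """Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,Drained_after_socializing,Friends_circle_size,Post_frequency,Personality
4,No,4,6,No,13,5,Extrovert
9,Yes,0,0,Yes,0,3,Introvert
3,No,7,5,No,14,8,Extrovert
10,Yes,1,2,Yes,2,1,Introvert
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())    # first few rows
print(df.columns)   # column names
print(df.dtypes)    # data type of each column
print(df.shape)     # (rows, columns)
```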

### 2. Data Cleaning
- Check for and handle **missing values** (nulls).
- Remove any **duplicate rows**.
- Detect and treat **outliers** using:
  - `sns.boxplot()` for each feature
  - IQR method (optional)
- Analyze the **distribution** of each numerical feature using:
  - `sns.histplot()` or `sns.kdeplot()` (these replace the deprecated `sns.distplot()`)
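
One possible way to handle the non-plotting parts of this step is sketched below. The data is invented, and median imputation and capping outliers at the IQR fences are just two reasonable choices, not requirements of the task; dropping rows or using another imputation strategy would also be valid.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value, one duplicate row and one extreme value.
df = pd.DataFrame({
    "Time_spent_Alone": [4.0, 9.0, np.nan, 3.0, 4.0, 2.0, 40.0],
    "Friends_circle_size": [13.0, 0.0, 5.0, 14.0, 13.0, 10.0, 8.0],
})

# 1. Missing values: count them, then fill numeric gaps with the column median.
print(df.isnull().sum())
clean = df.fillna(df.median(numeric_only=True))

# 2. Duplicates: drop exact repeated rows.
clean = clean.drop_duplicates().reset_index(drop=True)

# 3. Outliers (IQR method): cap values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
for col in clean.columns:
    q1 = clean[col].quantile(0.25)
    q3 = clean[col].quantile(0.75)
    iqr = q3 - q1
    clean[col] = clean[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

print(clean)
```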

### 3. Target Variable Encoding
- Encode the `Personality` column:
  - `Extrovert` → 1  
  - `Introvert` → 0
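
The mapping above can be applied with `Series.map`, as in this small sketch on made-up labels. Any value not present in the mapping becomes `NaN`, so it is worth checking for nulls afterwards.

```python
import pandas as pd

df = pd.DataFrame({"Personality": ["Extrovert", "Introvert", "Introvert", "Extrovert"]})

# Extrovert -> 1, Introvert -> 0
df["Personality"] = df["Personality"].map({"Extrovert": 1, "Introvert": 0})

# Labels missing from the mapping (typos, different casing) would show up as NaN.
unmapped = int(df["Personality"].isnull().sum())
print(df["Personality"].value_counts())
```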

### 4. Define Features and Target
- **Target Variable:**  
  - `Personality` (binary: 0 or 1)

- **Feature Variables:**  
  - `Time_spent_Alone`  
  - `Stage_fear`  
  - `Social_event_attendance`  
  - `Going_outside`  
  - `Drained_after_socializing`  
  - `Friends_circle_size`  
  - `Post_frequency`
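
A short sketch of separating features from the target, using invented rows. It assumes `Stage_fear` and `Drained_after_socializing` are stored as Yes/No strings, which this README does not state; if they are already numeric in your copy of the data, the encoding loop leaves them untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "Time_spent_Alone": [4, 9, 10, 3],
    "Stage_fear": ["No", "Yes", "Yes", "No"],
    "Social_event_attendance": [4, 0, 1, 7],
    "Going_outside": [6, 0, 2, 5],
    "Drained_after_socializing": ["No", "Yes", "Yes", "No"],
    "Friends_circle_size": [13, 0, 2, 14],
    "Post_frequency": [5, 3, 1, 8],
    "Personality": [1, 0, 0, 1],  # already encoded as in step 3
})

feature_cols = [
    "Time_spent_Alone", "Stage_fear", "Social_event_attendance",
    "Going_outside", "Drained_after_socializing",
    "Friends_circle_size", "Post_frequency",
]
X = df[feature_cols].copy()
y = df["Personality"]

# Assumption: these two columns hold Yes/No text and need a numeric encoding.
for col in ["Stage_fear", "Drained_after_socializing"]:
    if not pd.api.types.is_numeric_dtype(X[col]):
        X[col] = X[col].map({"Yes": 1, "No": 0})
```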

### 5. Exploratory Data Analysis (EDA)

- **Distribution Plots:**
  - Use `sns.histplot()` or `sns.kdeplot()` to check how each feature is distributed.
  - This helps reveal skewness and variability.

- **Box Plots:**
  - Use `sns.boxplot()` to identify outliers in each feature.
  - Also use box plots to compare each feature across personality types (0 vs. 1).

- **Scatter Plots:**
  - Use `sns.scatterplot()` to visualize relationships between pairs of features.
  - Use `hue='Personality'` to color by class.

- **Heatmap (Correlation Matrix):**
  - Use `sns.heatmap()` to visualize the correlation between features.
  - Identify strongly correlated features.

- **Feature vs. Target Relationship:**
  - Use `sns.boxplot()` grouped by `Personality` to see how each feature differs between classes.
  - Use `groupby('Personality').mean()` to summarize feature differences.

### 6. Feature Scaling
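- Use `StandardScaler` from `sklearn.preprocessing` to scale numerical features.
- **Important:** Perform the **VIF calculation after scaling** the features.
- **Note:** To avoid leaking test-set statistics into training, fit the scaler on the training portion only and apply the same fitted scaler to the test portion.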
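
A small sketch of `StandardScaler` on made-up numbers. The manual row slicing into "train" and "held-out" parts is only for illustration: the point is that `fit` learns the mean and standard deviation from one set of rows, and `transform` reuses those values on any other rows.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

X = pd.DataFrame({
    "Time_spent_Alone": [4.0, 9.0, 0.0, 3.0, 10.0, 5.0],
    "Friends_circle_size": [13.0, 0.0, 5.0, 14.0, 2.0, 8.0],
    "Post_frequency": [5.0, 3.0, 2.0, 8.0, 1.0, 4.0],
})

# Illustrative split: first four rows for fitting, last two held out.
X_train, X_test = X.iloc[:4], X.iloc[4:]

scaler = StandardScaler()
scaler.fit(X_train)  # learns per-column mean and standard deviation

# Wrap the results back into DataFrames so column names are kept.
X_train_scaled = pd.DataFrame(scaler.transform(X_train),
                              columns=X.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test),
                             columns=X.columns, index=X_test.index)

print(X_train_scaled.round(2))
```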

### 7. Multicollinearity Check
- Use **VIF (Variance Inflation Factor)** to detect multicollinearity.
- **Note:** VIF is computed only on the **feature variables**, not the target.
- As a common rule of thumb, consider dropping or combining features with **VIF > 5**.
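
A dependency-light sketch of the VIF check. It relies on the identity that the VIF of feature j, defined as 1 / (1 - R_j²), equals the j-th diagonal entry of the inverse of the feature correlation matrix. This is a substitute for the more common `variance_inflation_factor` helper in `statsmodels.stats.outliers_influence`, which can be used instead. The data is synthetic, with one column deliberately built as a near copy of another.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)

# Features only; the target is not part of the VIF calculation.
X = pd.DataFrame({
    "Social_event_attendance": a,
    "Going_outside": a + 0.1 * rng.normal(size=n),  # nearly a copy of the column above
    "Post_frequency": b,                            # independent of the other two
})

# VIF_j is the j-th diagonal element of the inverse correlation matrix.
corr = X.corr().to_numpy()
vif = pd.Series(np.diag(np.linalg.inv(corr)), index=X.columns, name="VIF")
print(vif.round(2))

# Features above the rule-of-thumb threshold from this step.
high_vif = vif[vif > 5]
print(high_vif.index.tolist())
```

Here the two near-duplicate columns get large VIF values while the independent column stays close to 1, so one of the pair would be a candidate for removal.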

### 8. Train-Test Split
- Use `train_test_split()` to split data:
  - 80% for training
  - 20% for testing
- Use `random_state` for reproducibility.
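
A minimal sketch of the split on placeholder data. `random_state=42` is an arbitrary choice, and `stratify=y` is an optional extra not required by this step; it keeps the class ratio the same in both parts.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows, two features, alternating class labels.
X = pd.DataFrame({
    "Time_spent_Alone": np.arange(100) % 11,
    "Friends_circle_size": np.arange(100) % 15,
})
y = pd.Series(np.arange(100) % 2, name="Personality")

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% for testing, 80% for training
    random_state=42,   # fixed seed so the split is reproducible
    stratify=y,        # optional: preserve the class balance in both parts
)

print(X_train.shape, X_test.shape)
```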

### 9. Model Building
- Use `LogisticRegression` from `sklearn.linear_model` to train the model.
- Fit the model on the training set.
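
The fitting step might look like the sketch below. The data is synthetic and uses only two of the features; `max_iter=1000` is an assumption added to give the solver room to converge on unscaled inputs, not something this README prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: class 1 spends less time alone and has a larger friend circle.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=300)
X = pd.DataFrame({
    "Time_spent_Alone": rng.normal(6 - 3 * labels, 2.0),
    "Friends_circle_size": rng.normal(5 + 4 * labels, 2.5),
})
y = pd.Series(labels, name="Personality")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # fit on the training set only

# One coefficient per feature; the sign shows the direction of the effect
# on the log-odds of being class 1 (extrovert).
coefs = pd.Series(model.coef_[0], index=X.columns)
print(coefs)
```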

### 10. Model Evaluation
- Evaluate model performance using:
  - `accuracy_score`
  - `confusion_matrix`
  - Training and testing scores (`model.score()` on each set)
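
The three checks above can be sketched as follows, again on synthetic data with two features. Comparing the training and testing scores gives a rough signal of overfitting: a training score far above the testing score suggests the model does not generalize well.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with overlapping classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=300)
X = pd.DataFrame({
    "Time_spent_Alone": rng.normal(6 - 3 * labels, 2.0),
    "Friends_circle_size": rng.normal(5 + 4 * labels, 2.5),
})
y = pd.Series(labels, name="Personality")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = model.predict(X_test)

# Fraction of test rows predicted correctly.
acc = accuracy_score(y_test, y_pred)

# Rows are true classes (0, 1); columns are predicted classes (0, 1).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# For classifiers, model.score() returns mean accuracy on the given data.
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)

print(f"Accuracy: {acc:.3f}")
print(cm)
print(f"Training score: {train_score:.3f}, testing score: {test_score:.3f}")
```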
---

## Final Goal

Build a clean and accurate logistic regression model that predicts whether a person is an **extrovert or introvert**, based on their social behavior and personality traits.
