# Problem 2: Graduate Admission Prediction using Decision Tree

This notebook implements the second problem statement: building a Decision Tree Classifier to predict whether a student will be admitted to a foreign university based on their academic profile.

### Task 1: Setup and Data Loading

First, we import the necessary libraries and load the `Admission_Predict.csv` dataset.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report, confusion_matrix

# Set plot style
sns.set(style="whitegrid")

In [None]:
# Load the dataset from a local CSV file (machine-specific path; adjust as needed)
file_path = r'd:\ml\LP-I\Admission_Predict.csv'
df = pd.read_csv(file_path)

# Display the first few rows of the dataframe
print("First 5 rows of the dataset:")
df.head()

### Task 2: Data Pre-processing and Exploration

We will perform the following pre-processing steps:
1.  Clean up column names to remove any leading or trailing spaces.
2.  Drop the `Serial No.` column, since it is only a row index and carries no predictive information.
3.  Check for missing values.
4.  Convert the target variable `Chance of Admit` from a continuous probability to a binary label. We use a threshold of **0.75**: values at or above it are labelled 'Admitted' (1), and values below it are labelled 'Not Admitted' (0).

In [None]:
# Clean column names (remove leading/trailing spaces)
df.columns = df.columns.str.strip()

# Drop the 'Serial No.' column
df = df.drop('Serial No.', axis=1) # axis=1 indicates we are dropping a column

# Check for missing values
print("Missing values in each column:")
print(df.isnull().sum())

# Convert 'Chance of Admit' to a binary variable
# We'll use a threshold of 0.75 for admission (vectorized comparison, then cast to 0/1).
df['Admitted'] = (df['Chance of Admit'] >= 0.75).astype(int)

# Drop the original 'Chance of Admit' column
df = df.drop('Chance of Admit', axis=1)

print("\nDataset after creating binary 'Admitted' column:")
df.head()
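Because the binary label is created by thresholding, it is worth checking how many rows land in each class before modelling. The sketch below uses a small hypothetical `toy` frame (not the real dataset) to show the thresholding step and a class-balance check; the same two `value_counts` calls could be run on `df['Admitted']`.

```python
import pandas as pd

# Hypothetical values standing in for the real 'Chance of Admit' column
toy = pd.DataFrame({'Chance of Admit': [0.92, 0.76, 0.75, 0.74, 0.65, 0.50]})

# Threshold at 0.75: values at or above it become 1, the rest 0
toy['Admitted'] = (toy['Chance of Admit'] >= 0.75).astype(int)

# Number of rows in each class (index 0 = Not Admitted, 1 = Admitted)
class_counts = toy['Admitted'].value_counts().sort_index()
print(class_counts)

# Share of rows in each class
class_share = toy['Admitted'].value_counts(normalize=True).sort_index()
print(class_share)
```

A strongly imbalanced split would make accuracy alone a misleading metric, which is one reason precision and recall are reported later.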

Now, let's define our features (X) and the new binary target (y).

In [None]:
# Define features (X) and target (y)
X = df.drop('Admitted', axis=1)
y = df['Admitted']

### Task 3: Perform Train-Test Split

We will split the dataset into a training set (80%) and a testing set (20%) to evaluate the model's performance on unseen data.

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set size: {X_train.shape[0]} samples")
print(f"Testing set size: {X_test.shape[0]} samples")
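The split above is purely random, so the proportion of admitted students can differ between the training and testing sets. One option, shown here as a sketch on synthetic data (the arrays `X_demo` and `y_demo` are made up for illustration), is to pass `stratify` so that both sets keep the same class ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 30% positive class
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 30 + [0] * 70)

# stratify=y_demo keeps the class ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Positive share in train: {y_tr.mean():.2f}")
print(f"Positive share in test:  {y_te.mean():.2f}")
```

In this notebook the equivalent change would be adding `stratify=y` to the existing `train_test_split` call; note that this would change which rows end up in the test set and therefore the reported metrics.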

### Task 4: Implement and Train the Decision Tree Classifier

We will now create an instance of the `DecisionTreeClassifier` and train it using our training data.

In [None]:
# Create a Decision Tree model instance
# Using random_state for reproducibility
model = DecisionTreeClassifier(random_state=42)

# Train the model on the training data
model.fit(X_train, y_train)

print("Decision Tree Classifier model trained successfully!")
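With default settings, a `DecisionTreeClassifier` keeps splitting until its leaves are pure, which typically means it fits the training data perfectly. The following sketch, on synthetic data that is only an assumption for illustration, compares an unconstrained tree with one limited by `max_depth` using `get_depth()`, `get_n_leaves()`, and training accuracy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: the label depends on the first feature plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Unconstrained tree (same settings as the model above)
full_tree = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Depth-limited tree for comparison
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_demo, y_demo)

for name, tree in [("full", full_tree), ("shallow", shallow_tree)]:
    print(f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"train accuracy={tree.score(X_demo, y_demo):.2f}")
```

Running the same three calls on `model` and `X_train`, `y_train` would show how large the fitted tree is; a perfect training score alongside a lower test score is a sign of overfitting.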

### Task 5: Evaluate Model Performance

With the model trained, we can make predictions on our test data and evaluate its performance using accuracy, precision, and recall. We will also display a full classification report and a confusion matrix for a more detailed view.

In [None]:
# Make predictions on the test data
y_pred = model.predict(X_test)

# Calculate performance metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print("Model Performance Evaluation:")
print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')

# Display the full classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Not Admitted', 'Admitted']))

#### Visualize the Confusion Matrix

A confusion matrix helps us visualize the model's performance in terms of true positives, true negatives, false positives, and false negatives.

In [None]:
# Generate the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
# annot=True writes the count in each cell; fmt='d' formats it as an integer
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Not Admitted', 'Admitted'],
            yticklabels=['Not Admitted', 'Admitted'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

### Conclusion

We have successfully implemented a basic Decision Tree Classifier to predict graduate school admissions.

**Code Quality and Clarity:**
- The notebook is structured logically according to the tasks in the problem statement.
- Each step is explained with comments and markdown, making the process easy to follow.
- The code uses standard libraries and follows best practices for a minimal implementation, such as setting a `random_state` for reproducibility.

**Model Interpretation:**
- **Accuracy:** The fraction of test-set students whose label ('Admitted' or 'Not Admitted') the model predicted correctly.
- **Precision:** Out of all the students the model predicted would be admitted, this is the fraction whose true label was 'Admitted' (that is, whose `Chance of Admit` was at least 0.75). A high precision is important to avoid giving false hope.
- **Recall:** Out of all the students whose true label was 'Admitted', this is the fraction that the model correctly identified.
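The three metrics above can all be read off the confusion matrix. As a small worked example on made-up labels (the arrays `y_true` and `y_hat` are hypothetical, not model output), the sketch below computes each metric by hand and checks it against the scikit-learn functions used in this notebook.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical true labels and predictions for 10 students
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_hat = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels {0, 1}, ravel() returns counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy_manual = (tp + tn) / (tp + tn + fp + fn)   # correct predictions / all predictions
precision_manual = tp / (tp + fp)                   # correct 'Admitted' / predicted 'Admitted'
recall_manual = tp / (tp + fn)                      # correct 'Admitted' / truly 'Admitted'

print(f"Accuracy:  {accuracy_manual:.2f} (sklearn: {accuracy_score(y_true, y_hat):.2f})")
print(f"Precision: {precision_manual:.2f} (sklearn: {precision_score(y_true, y_hat):.2f})")
print(f"Recall:    {recall_manual:.2f} (sklearn: {recall_score(y_true, y_hat):.2f})")
```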

**Potential Improvements:**
- **Hyperparameter Tuning:** The performance of the Decision Tree can be improved by tuning parameters like `max_depth`, `min_samples_split`, and `criterion`.
- **Cross-Validation:** To get a more robust measure of performance, k-fold cross-validation could be used instead of a single train-test split.
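- **Feature Scaling:** Decision Trees split on one feature threshold at a time, so rescaling the features does not meaningfully change their predictions. Scaling only becomes relevant if other models (like SVM or Logistic Regression) are to be tested.
- **Threshold Tuning:** The admission threshold of 0.75 was an assumption. It could be raised or lowered depending on the counsellor's goal, bearing in mind that changing it redefines the target and the class balance, so precision and recall measured at different thresholds are not directly comparable.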
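The first two improvements can be combined, since a grid search evaluates each parameter setting with k-fold cross-validation. The following is a minimal sketch on synthetic data; the arrays `X_demo` and `y_demo` and the specific grid values are assumptions chosen for illustration, not tuned recommendations for the admissions dataset.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: the label depends on the first feature plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# 5-fold stratified cross-validation with fixed shuffling for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Baseline: cross-validated accuracy of the default tree
baseline_scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), X_demo, y_demo, cv=cv, scoring='accuracy'
)
print(f"Baseline CV accuracy: {baseline_scores.mean():.2f} +/- {baseline_scores.std():.2f}")

# Illustrative grid over the parameters named above
param_grid = {
    'max_depth': [2, 3, 5, None],
    'min_samples_split': [2, 10, 20],
    'criterion': ['gini', 'entropy'],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=cv, scoring='accuracy'
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.2f}")
```

In this notebook, the search would be fitted on `X_train` and `y_train` only, keeping the test set aside for a final evaluation of `search.best_estimator_`.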