# Procrastination Prediction Project
**Student:** Jacob Kabongo  
**Course:** COMP3125 Data Science Fundamentals

## 1. Introduction
Procrastination is a persistent challenge among students and can hurt both academic performance and well-being. This project examines contributing factors such as lack of motivation, poor time management, fear of failure, perfectionism, and distractions. The objective is to identify which factors are most associated with procrastination and to predict a student's procrastination level (Low, Medium, or High) using machine learning.

## 2. Dataset
We use a self-created dataset, stored in `Stu_pro_Level.csv`, that simulates student survey responses. It contains the following columns:
- `respondent_id` (identifier, excluded from modeling)
- `lack_of_motivation`
- `poor_time_management`
- `fear_of_failure`
- `perfectionism`
- `distractions`
- `procrastination_level` (target)

### Dataset Feature Description
| Column Name             | Description                                             | Type                  |
|-------------------------|---------------------------------------------------------|-----------------------|
| `respondent_id`         | Respondent identifier (not used as a feature)           | Identifier            |
| `lack_of_motivation`    | Whether the student lacks motivation (1/0)              | Binary                |
| `poor_time_management`  | Whether the student has time management issues (1/0)    | Binary                |
| `fear_of_failure`       | Whether the student is anxious about failing (1/0)      | Binary                |
| `perfectionism`         | Whether the student has perfectionist tendencies (1/0)  | Binary                |
| `distractions`          | Whether the student is prone to distractions (1/0)      | Binary                |
| `procrastination_level` | Procrastination level: Low, Medium, or High (target)    | Categorical (ordinal) |
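
The schema in the table above can be checked programmatically before any modeling. The sketch below is one possible validator, not part of the original notebook: `validate_schema` and the constant names are hypothetical, and it assumes the CSV stores the target labels exactly as `Low`, `Medium`, and `High`.

```python
import pandas as pd

# Column names from the feature table; the allowed target labels are an
# assumption about how the CSV spells them.
BINARY_FEATURES = [
    "lack_of_motivation",
    "poor_time_management",
    "fear_of_failure",
    "perfectionism",
    "distractions",
]
TARGET = "procrastination_level"
TARGET_LEVELS = {"Low", "Medium", "High"}


def validate_schema(df: pd.DataFrame) -> list:
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    missing = [c for c in BINARY_FEATURES + [TARGET] if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Every binary feature must contain only 0 or 1 (NaN also counts as invalid).
    for col in BINARY_FEATURES:
        if col in df.columns and not df[col].isin([0, 1]).all():
            problems.append(f"{col} contains values other than 0/1")
    # The target must use only the three expected level names.
    if TARGET in df.columns:
        unexpected = set(df[TARGET].dropna().unique()) - TARGET_LEVELS
        if unexpected:
            problems.append(f"unexpected {TARGET} labels: {sorted(unexpected)}")
        if df[TARGET].isna().any():
            problems.append(f"{TARGET} has missing values")
    return problems


# Usage on a tiny hand-made frame (hypothetical values, two deliberate errors).
sample = pd.DataFrame({
    "lack_of_motivation": [1, 0],
    "poor_time_management": [1, 1],
    "fear_of_failure": [0, 0],
    "perfectionism": [0, 1],
    "distractions": [1, 2],                          # 2 is invalid
    "procrastination_level": ["High", "Very High"],  # "Very High" is invalid
})
print(validate_schema(sample))
```

Returning a list of problems instead of raising on the first one lets a single pass report every issue in a freshly exported survey file.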

## 3. Methodology
We use a Random Forest classifier to predict `procrastination_level` from the five binary features. The main steps are:
- Exploratory data visualization
- Data preprocessing (target encoding, feature scaling)
- Train/test split (80/20)
- Model training and evaluation
- Feature importance analysis
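
The modeling steps above can be sketched end to end with a scikit-learn `Pipeline`, which fits the scaler on the training split only and so keeps test-set statistics out of preprocessing. Random forests split on thresholds, so scaling does not change their predictions; it appears here only to mirror the listed preprocessing step. The data is randomly generated with a hypothetical labeling rule (more "yes" answers means a higher level) purely so the sketch runs on its own; it is not the project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["lack_of_motivation", "poor_time_management",
            "fear_of_failure", "perfectionism", "distractions"]

# Hypothetical synthetic data: 300 respondents with random binary answers.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 2, size=(300, len(FEATURES))), columns=FEATURES)

# Hypothetical labeling rule: 0-1 "yes" answers -> Low, 2-3 -> Medium, 4-5 -> High.
score = X.sum(axis=1)
y = pd.cut(score, bins=[-1, 1, 3, 5], labels=["Low", "Medium", "High"]).astype(str)

# Split first; stratify=y keeps each level's proportion in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The scaler inside the pipeline is fit on X_train only during model.fit.
model = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because the synthetic label is a deterministic function of the features, the accuracy here is much higher than anything to expect from real survey data.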

## 4. Results
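The table below summarizes model performance on the held-out test set. The scores are labeled as examples; the measured values come from the classification report printed by the notebook code, which also plots the feature importances.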

### Model Performance Summary
| Metric     | Score (example) |
|------------|-----------------|
| Accuracy   | 85%             |
| Precision  | 88%             |
| Recall     | 83%             |
| F1-Score   | 84%             |
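
For a three-class target, precision, recall, and F1 must be averaged over the classes, and the table does not state which averaging was used. The sketch below computes macro-averaged scores directly from a confusion matrix and checks them against scikit-learn's `precision_recall_fscore_support`. The true and predicted labels are made up for illustration and are not project results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

LABELS = ["Low", "Medium", "High"]

# Illustrative true/predicted levels (not the project's test set).
y_true = ["Low", "Low", "Medium", "Medium", "Medium", "High", "High", "High"]
y_pred = ["Low", "Medium", "Medium", "Medium", "High", "High", "High", "Low"]

# Rows = actual class, columns = predicted class, both in LABELS order.
cm = confusion_matrix(y_true, y_pred, labels=LABELS)
tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)  # TP / all samples predicted as that class
recall = tp / cm.sum(axis=1)     # TP / all samples actually in that class
f1 = 2 * precision * recall / (precision + recall)

# Macro averaging: unweighted mean of the per-class scores.
manual = (precision.mean(), recall.mean(), f1.mean())
sk_p, sk_r, sk_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=LABELS, average="macro"
)
print("manual :", np.round(manual, 3))
print("sklearn:", np.round((sk_p, sk_r, sk_f1), 3))
```

Note that scikit-learn's macro F1 is the mean of the per-class F1 scores, which need not equal the harmonic mean of macro precision and macro recall. Passing `average="weighted"` instead weights each class by its support.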

## 5. Discussion & Suggestions
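The most important features in the model were `lack_of_motivation`, `distractions`, and `poor_time_management`. These align with common psychological and behavioral challenges students face, although feature importance measures predictive value in this simulated dataset rather than a causal effect.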
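
Random forest `feature_importances_` are impurity-based and computed from the training data. A common cross-check is permutation importance on a held-out split, available as `sklearn.inspection.permutation_importance`: each feature column is shuffled in turn and the resulting drop in accuracy is recorded. The sketch uses synthetic data with a hypothetical rule in which only `lack_of_motivation` and `distractions` determine the level, so the expected ranking is known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

FEATURES = ["lack_of_motivation", "poor_time_management",
            "fear_of_failure", "perfectionism", "distractions"]

# Hypothetical data: only two features drive the label; the other three are noise.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.integers(0, 2, size=(500, len(FEATURES))), columns=FEATURES)
signal = X["lack_of_motivation"] + X["distractions"]  # 0, 1 or 2
y = signal.map({0: "Low", 1: "Medium", 2: "High"})

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shuffle each test column 10 times and average the drop in accuracy.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
ranking = pd.Series(result.importances_mean, index=FEATURES).sort_values(ascending=False)
print(ranking.round(3))
```

On this data the two signal features should dominate while the noise features stay near zero; on the real survey, agreement between the two importance measures would make the ranking above more trustworthy.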

### Future Improvements
- Use a larger real-world dataset
- Include GPA, digital distraction metrics, or social habits
- Experiment with logistic regression or neural networks
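
The suggestion to try logistic regression could start with a side-by-side comparison under the same cross-validation folds. The sketch below compares the two models by 5-fold stratified macro F1 on synthetic data; the noisy labeling rule, its weights, and the cut points are all hypothetical, and both models use near-default hyperparameters.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

FEATURES = ["lack_of_motivation", "poor_time_management",
            "fear_of_failure", "perfectionism", "distractions"]

# Hypothetical data with a noisy weighted rule, cut into three levels.
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.integers(0, 2, size=(400, len(FEATURES))), columns=FEATURES)
score = (2 * X["lack_of_motivation"] + X["distractions"]
         + X["poor_time_management"] + rng.normal(0, 0.75, size=len(X)))
y = pd.cut(score, bins=[-np.inf, 1.5, 2.75, np.inf],
           labels=["Low", "Medium", "High"]).astype(str)

# Identical shuffled, stratified folds for both models.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    results[name] = scores.mean()
    print(f"{name}: macro F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

With only five binary inputs, a linear model may perform about as well as the forest while being easier to interpret through its coefficients.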

## 6. Reflection
This project helped me explore the relationship between psychological and environmental factors contributing to student procrastination. It also gave me hands-on experience with data preprocessing, classification modeling, and interpreting results using Python.

## 7. References
- [IMRAD format - Wikipedia](https://en.wikipedia.org/wiki/IMRAD)
- [scikit-learn documentation](https://scikit-learn.org/stable/)
- [Google Colab](https://colab.research.google.com/)

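```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load the simulated survey responses.
dataset = pd.read_csv('Stu_pro_Level.csv')

# Natural order of the target levels (assumes the CSV stores exactly these labels).
level_order = ['Low', 'Medium', 'High']

# Class balance of the target, plotted in its natural order.
sns.countplot(x='procrastination_level', data=dataset, order=level_order)
plt.title('Distribution of Procrastination Levels')
plt.show()

# Encode the target as 0/1/2 with an explicit mapping. LabelEncoder would sort the
# labels alphabetically (High=0, Low=1, Medium=2), which breaks the Low < Medium < High
# ordering that the correlation heatmap below relies on.
dataset['procrastination_level'] = dataset['procrastination_level'].map(
    {level: code for code, level in enumerate(level_order)}
)

# Pairwise correlations between the binary features and the ordinal target.
plt.figure(figsize=(8, 6))
sns.heatmap(dataset.drop(columns='respondent_id').corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

# Features and target; respondent_id is an identifier, not a predictor.
X = dataset.drop(columns=['procrastination_level', 'respondent_id'])
y = dataset['procrastination_level']

# Split before scaling so the scaler never sees test-set statistics;
# stratify=y keeps the class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Random forests are insensitive to feature scale; scaling is kept only to match
# the stated preprocessing, and the scaler is fit on the training split alone.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train_scaled, y_train)
y_pred = rf.predict(X_test_scaled)

# Integer codes 0, 1, 2 correspond to Low, Medium, High.
class_codes = list(range(len(level_order)))

print("Classification Report:")
print(classification_report(y_test, y_pred, labels=class_codes, target_names=level_order))

print("Confusion Matrix:")
cm = confusion_matrix(y_test, y_pred, labels=class_codes)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=level_order, yticklabels=level_order)
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# Impurity-based importances from the fitted forest, sorted from most to least important.
feat_importance = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x=feat_importance.values, y=feat_importance.index)
plt.title('Feature Importance')
plt.xlabel('Importance Score')
plt.ylabel('Features')
plt.tight_layout()
plt.show()
```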
