
# Project Management Analytics

This notebook explores a synthetic project management dataset, performs exploratory data analysis (EDA), visualizes key metrics, and builds a predictive model to assess the likelihood of project success.


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Load the dataset
file_path = 'synthetic_project_data.csv'
data = pd.read_csv(file_path, parse_dates=['start_date', 'end_date'])

data.head()


In [None]:

# Display basic information and summary statistics
print('Dataset shape:', data.shape)
print('\nSummary statistics:')
print(data.describe(include='all'))


In [None]:

# Set seaborn style
sns.set_theme(style='whitegrid')

# Distribution of budget
plt.figure(figsize=(8, 5))
sns.histplot(data['budget'], bins=30, kde=True)
plt.title('Distribution of Project Budgets')
plt.xlabel('Budget (USD)')
plt.ylabel('Count')
plt.show()

# Scatter plot of budget vs actual cost colored by success
plt.figure(figsize=(8, 5))
sns.scatterplot(x='budget', y='actual_cost', hue='success', data=data, alpha=0.7)
plt.title('Budget vs Actual Cost by Project Success')
plt.xlabel('Budget (USD)')
plt.ylabel('Actual Cost (USD)')
plt.show()

# Bar chart of success counts
plt.figure(figsize=(6, 4))
sns.countplot(x='success', data=data)
plt.title('Project Success Distribution')
plt.xlabel('Success (1=Yes, 0=No)')
plt.ylabel('Number of Projects')
plt.show()


In [None]:

# Select features for the predictive model
feature_cols = ['budget', 'actual_cost', 'team_size', 'tasks_total', 'tasks_completed', 'risk_score']
X = data[feature_cols]
y = data['success']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Build logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# Predictions
y_pred = model.predict(X_test_scaled)

# Evaluation
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
print('\nClassification Report:\n', classification_report(y_test, y_pred))
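
Because the features are standardized before fitting, the magnitudes of the logistic regression coefficients can be compared with each other as a rough indicator of each feature's influence. The sketch below shows one way to inspect them. It uses randomly generated stand-in data, since `synthetic_project_data.csv` is not reproduced here: the column names match `feature_cols`, but the value ranges and the rule that produces `success` are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 500

# Stand-in data (assumption): NOT the real dataset, only the same column names.
demo = pd.DataFrame({
    'budget': rng.uniform(10_000, 500_000, n),
    'team_size': rng.integers(2, 30, n),
    'tasks_total': rng.integers(10, 200, n),
    'risk_score': rng.uniform(0, 1, n),
})
demo['actual_cost'] = demo['budget'] * rng.uniform(0.7, 1.5, n)
demo['tasks_completed'] = (demo['tasks_total'] * rng.uniform(0.3, 1.0, n)).astype(int)

# Hypothetical labeling rule, used only so the demo has a target to learn.
demo['success'] = (
    (demo['actual_cost'] <= 1.1 * demo['budget']) & (demo['risk_score'] < 0.7)
).astype(int)

feature_cols = ['budget', 'actual_cost', 'team_size',
                'tasks_total', 'tasks_completed', 'risk_score']
X_demo, y_demo = demo[feature_cols], demo['success']
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Scaling and the classifier are chained so the scaler is fit on training data only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_tr, y_tr)

# Coefficients on standardized features, ordered by absolute size.
coefs = pd.Series(
    pipeline.named_steps['logisticregression'].coef_[0], index=feature_cols
).sort_values(key=np.abs, ascending=False)
print(coefs)
```

A positive coefficient raises the predicted odds of success and a negative one lowers them. Correlated features (for example `budget` and `actual_cost`) can share or trade off weight, so individual values should be interpreted with care.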



## Conclusion

This notebook demonstrates an end-to-end analysis of a synthetic project management dataset. We performed exploratory data analysis, visualized key trends, and built a logistic regression model to predict project success. The approach can be extended with additional features or more sophisticated models (e.g., decision trees, random forests) to improve predictive performance.
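
As a starting point for that extension, the sketch below compares a logistic regression baseline with a random forest using 5-fold cross-validation. It is an illustration under stated assumptions, not a result from this notebook: the data comes from scikit-learn's `make_classification` as a stand-in for the project dataset, and the hyperparameters (`n_estimators=200`, `cv=5`) are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): six numeric features and a binary target,
# mirroring the shape of the feature matrix used above, not its contents.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=2, random_state=42
)

models = {
    # Logistic regression benefits from scaling, so it is wrapped in a pipeline.
    'logistic_regression': make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Tree ensembles are insensitive to feature scale, so no scaler is needed.
    'random_forest': RandomForestClassifier(n_estimators=200, random_state=42),
}

# Mean accuracy across 5 folds for each model.
scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy').mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f'{name}: mean CV accuracy = {score:.3f}')
```

To apply this to the real data, replace `X_demo` and `y_demo` with `X` and `y` from the modeling cell. Cross-validation gives a more stable comparison than a single train-test split, particularly on small datasets.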

Feel free to experiment with the dataset and modeling techniques to derive deeper insights and better predictions.
