# Job Transition Analytics Project

This notebook walks through an exploratory data analysis (EDA) and a simple linear regression model on a synthetic business dataset. The goal is to showcase skills relevant to business analysis, program management, and data analytics roles.


In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Display plots inline
%matplotlib inline


In [None]:
# Load the synthetic dataset
df = pd.read_csv('synthetic_dataset.csv', parse_dates=['OrderDate'])
df.head()
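
Before plotting, it can help to confirm that the file loaded as expected. The sketch below is a minimal example of such checks, run on a small hand-made frame rather than `synthetic_dataset.csv`. The helper name `check_orders` is hypothetical, and the rules that `Revenue` equals `UnitsSold * UnitPrice` and `Profit` equals `Revenue - Cost` are assumptions about the dataset, not facts taken from it.

```python
import numpy as np
import pandas as pd

# Columns referenced elsewhere in this notebook
EXPECTED_COLUMNS = ["OrderDate", "Region", "UnitsSold", "UnitPrice",
                    "Revenue", "Cost", "Profit"]

def check_orders(frame):
    """Return a dict of simple data-quality findings (hypothetical helper)."""
    missing_cols = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing_cols:
        return {"missing_columns": missing_cols}
    # ASSUMPTION: Revenue = UnitsSold * UnitPrice and Profit = Revenue - Cost
    revenue_ok = np.isclose(frame["Revenue"], frame["UnitsSold"] * frame["UnitPrice"])
    profit_ok = np.isclose(frame["Profit"], frame["Revenue"] - frame["Cost"])
    return {
        "missing_columns": [],
        "null_counts": frame[EXPECTED_COLUMNS].isna().sum().to_dict(),
        "dates_parsed": pd.api.types.is_datetime64_any_dtype(frame["OrderDate"]),
        "revenue_mismatches": int((~revenue_ok).sum()),
        "profit_mismatches": int((~profit_ok).sum()),
    }

# Made-up rows for illustration; the last Revenue value is deliberately
# inconsistent with UnitsSold * UnitPrice (7 * 3.0 = 21.0, not 20.0)
toy = pd.DataFrame({
    "OrderDate": pd.to_datetime(["2023-01-05", "2023-01-09", "2023-02-01"]),
    "Region": ["North", "South", "North"],
    "UnitsSold": [10, 4, 7],
    "UnitPrice": [2.5, 10.0, 3.0],
    "Revenue": [25.0, 40.0, 20.0],
    "Cost": [15.0, 30.0, 12.0],
    "Profit": [10.0, 10.0, 8.0],
})
report = check_orders(toy)
print(report)
```

With the notebook's own data, the call would be `check_orders(df)`. Non-zero mismatch counts would only mean the assumed formulas do not hold for this dataset, not necessarily that the data is wrong.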


In [None]:
# Summary statistics
df.describe(include='all')


In [None]:
# Total revenue by region
plt.figure(figsize=(8,5))
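# Sum revenue per region; errorbar=None hides error bars (seaborn >= 0.12, older versions use ci=None)
sns.barplot(x='Region', y='Revenue', data=df, estimator=sum, errorbar=None)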
plt.title('Total Revenue by Region')
plt.ylabel('Total Revenue')
plt.show()


In [None]:
# Revenue over time
plt.figure(figsize=(10,5))
df_sorted = df.sort_values('OrderDate')
plt.plot(df_sorted['OrderDate'], df_sorted['Revenue'], marker='o', linestyle='-')
plt.title('Revenue over Time')
plt.xlabel('Order Date')
plt.ylabel('Revenue')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


In [None]:
# Correlation heatmap
plt.figure(figsize=(8,6))
corr = df[['UnitsSold','UnitPrice','Revenue','Cost','Profit']].corr()
sns.heatmap(corr, annot=True, cmap='viridis')
plt.title('Correlation Heatmap')
plt.show()


In [None]:
# Predictive model: Predict revenue based on units sold and unit price

# Feature matrix and target vector
X = df[['UnitsSold', 'UnitPrice']]
y = df['Revenue']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
preds = model.predict(X_test)

# Evaluation metrics
mae = mean_absolute_error(y_test, preds)
mse = mean_squared_error(y_test, preds)
r2 = r2_score(y_test, preds)

print(f"Mean Absolute Error: {mae:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
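
MSE is expressed in squared revenue units, so the error figures are easier to judge next to the root mean squared error (RMSE) and a naive baseline that always predicts a constant such as the training mean. The following is a minimal sketch on made-up numbers; `summarize_errors` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def summarize_errors(y_true, y_pred, baseline_value):
    """Compare model errors with a constant-prediction baseline (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    model_err = y_true - y_pred
    base_err = y_true - baseline_value
    return {
        "mae": float(np.mean(np.abs(model_err))),
        "rmse": float(np.sqrt(np.mean(model_err ** 2))),
        "baseline_mae": float(np.mean(np.abs(base_err))),
        "baseline_rmse": float(np.sqrt(np.mean(base_err ** 2))),
    }

# Made-up values for illustration only
y_true_demo = [100.0, 200.0, 300.0, 400.0]
y_pred_demo = [110.0, 190.0, 310.0, 390.0]
summary = summarize_errors(y_true_demo, y_pred_demo, baseline_value=250.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With this notebook's variables, the call would be `summarize_errors(y_test, preds, y_train.mean())`. A model whose RMSE is not clearly below the baseline RMSE adds little over predicting the average.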


## Conclusion

This notebook demonstrates how to work with a synthetic business dataset by performing exploratory data analysis and building a simple predictive model. The analysis compares total revenue across regions, plots revenue over time, and examines correlations among the numeric variables. Finally, a linear regression model predicts revenue from units sold and unit price and is evaluated on a held-out 20% test split using MAE, MSE, and R-squared, illustrating fundamental skills in data loading, visualization, and modeling.
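
One caveat about the model: if revenue in the dataset is computed as units sold times unit price (an assumption, not something the notebook verifies), the relationship is multiplicative, and a regression that is linear in the two features can only approximate it. The sketch below illustrates this on randomly generated data, using NumPy's least-squares solver instead of scikit-learn; all values and the `fit_r2` helper are hypothetical.

```python
import numpy as np

# Randomly generated stand-in data, not the notebook's dataset
rng = np.random.default_rng(42)
units = rng.integers(1, 50, size=200).astype(float)
price = rng.uniform(5.0, 100.0, size=200)
revenue = units * price  # ASSUMPTION: revenue is exactly units * price

def fit_r2(features, target):
    """Fit ordinary least squares with an intercept and return in-sample R^2."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Additive model: revenue ~ units + price
r2_additive = fit_r2(np.column_stack([units, price]), revenue)
# Same model plus an interaction feature units * price
r2_interaction = fit_r2(np.column_stack([units, price, units * price]), revenue)

print(f"R^2, units + price: {r2_additive:.3f}")
print(f"R^2, with units*price term: {r2_interaction:.3f}")
```

On data generated this way, the fit with the interaction term reproduces the target almost exactly while the additive fit does not. Whether the same holds for the notebook's dataset depends on how its revenue column was produced, so a residual plot is a reasonable first check.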
