
# Sprint Retrospective Analysis

## Objective
Analyze historical sprint data to identify areas for improvement and enhance future sprint planning.

## Instructions
Follow the steps provided in this notebook to load the data, preprocess it, perform exploratory data analysis, and draw insights for sprint retrospectives.


In [None]:

# Step 1: Load the Dataset
import pandas as pd

# Load dataset
data = pd.read_csv('sprint_data.csv')

# Explore dataset
print("First 5 rows of the dataset:")
print(data.head())
print("\nSummary statistics of the dataset:")
print(data.describe())


In [None]:

# Step 2: Data Preprocessing

# Select relevant columns first, so that gaps in unused columns do not discard rows
relevant_columns = ['sprint_id', 'team_member', 'task_id', 'task_description', 'estimated_hours', 'actual_hours', 'completion_status']

# Keep only the relevant columns, then drop rows with missing values in them (if any)
data = data[relevant_columns].dropna()

# Display the first few rows of the preprocessed dataset
print("First 5 rows of the preprocessed dataset:")
print(data.head())


In [None]:

# Step 3: Exploratory Data Analysis

import matplotlib.pyplot as plt
import seaborn as sns

# Plot task completion rates
completion_rate = data['completion_status'].value_counts(normalize=True) * 100
plt.figure(figsize=(8, 5))
sns.barplot(x=completion_rate.index, y=completion_rate.values)
plt.title('Task Completion Rates')
plt.xlabel('Completion Status')
plt.ylabel('Percentage')
plt.show()

# Plot estimated vs actual time
plt.figure(figsize=(10, 6))
sns.scatterplot(x='estimated_hours', y='actual_hours', data=data, hue='completion_status')
plt.title('Estimated vs Actual Time for Tasks')
plt.xlabel('Estimated Hours')
plt.ylabel('Actual Hours')
plt.show()


In [None]:

# Step 4: Drawing Insights

# Calculate the difference between actual and estimated hours
# (positive = task took longer than estimated, negative = finished early)
data['time_diff'] = data['actual_hours'] - data['estimated_hours']

# Identify the 10 tasks with the largest overruns
high_diff_tasks = data.nlargest(10, 'time_diff')

# Display the tasks with the largest overruns
print("Tasks whose actual hours most exceeded the estimate:")
print(high_diff_tasks[['task_id', 'task_description', 'estimated_hours', 'actual_hours', 'time_diff']])



## Conclusion

In this analysis, we loaded and preprocessed sprint data, visualized task completion rates and the relationship between estimated and actual hours, and listed the tasks whose actual hours most exceeded their estimates. Teams can use these results in retrospectives to review why specific tasks overran and to adjust estimates and resource allocation in future sprints.

Further analysis could track these metrics across multiple sprints and examine correlations between variables, such as estimated and actual hours, to better understand the factors that influence sprint outcomes.
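
As a rough sketch of the follow-up analysis mentioned above, the snippet below computes a per-sprint trend and a correlation matrix. It uses a small hand-made `sample` DataFrame with the notebook's column names as a stand-in for the real data. It also assumes that finished tasks are marked `'Completed'` in `completion_status`; the dataset's actual status labels are not specified here, so adjust that value as needed.

```python
import pandas as pd

# Hypothetical sample rows using the notebook's column names;
# replace `sample` with the preprocessed `data` DataFrame.
sample = pd.DataFrame({
    'sprint_id': [1, 1, 2, 2, 3, 3],
    'estimated_hours': [4.0, 8.0, 5.0, 3.0, 6.0, 2.0],
    'actual_hours': [6.0, 8.0, 5.0, 6.0, 7.0, 2.0],
    # Assumption: 'Completed' is the label for finished tasks
    'completion_status': ['Completed', 'Incomplete', 'Completed',
                          'Completed', 'Completed', 'Incomplete'],
})
sample['time_diff'] = sample['actual_hours'] - sample['estimated_hours']

# Trend over sprints: mean overrun and share of completed tasks per sprint
sprint_trend = (
    sample.assign(completed=sample['completion_status'].eq('Completed'))
    .groupby('sprint_id')
    .agg(mean_time_diff=('time_diff', 'mean'),
         completion_rate=('completed', 'mean'))
)
print("Per-sprint trend:")
print(sprint_trend)

# Pairwise Pearson correlation between the numeric columns
correlations = sample[['estimated_hours', 'actual_hours', 'time_diff']].corr()
print("\nCorrelation matrix:")
print(correlations)
```

A rising `mean_time_diff` across sprints would suggest that estimates are drifting further from reality. With only a few sprints, treat both the trend and the correlations as indicative rather than conclusive.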
