# Parallel Quicksort – Performance Analysis
**Author:** Hanae Tafza

This notebook analyzes synthetic but realistic performance data for a multithreaded Quicksort implementation.
We focus on:
- data visualization
- effect of thread count
- confidence intervals
- scaling trends


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid", font_scale=1.2)

df = pd.read_csv("../data/processed/results.csv", sep=';')
df.head()

## 1. Dataset Overview

In [2]:
df.info()

In [3]:
df.describe()
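The confidence intervals in Section 3 divide by the square root of the run count `n` and need at least two runs per configuration to compute a sample standard deviation. Because of this, it can be worth checking repetition counts right after loading. The following is a minimal sketch that assumes the `threads`, `size` and `time_ms` columns used throughout this notebook. The helper name `repetition_counts` and the toy frame are illustrative and not part of the original analysis.

```python
import pandas as pd

REQUIRED = ("threads", "size", "time_ms")  # column names used throughout this notebook

def repetition_counts(frame: pd.DataFrame) -> pd.DataFrame:
    """Count timed runs per (threads, size) configuration.

    Raises KeyError if an expected column is missing. The `enough_runs` flag
    marks configurations with at least 2 runs, the minimum for a sample std.
    """
    missing = [c for c in REQUIRED if c not in frame.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    counts = (
        frame.groupby(["threads", "size"])["time_ms"]
        .count()
        .rename("runs")
        .reset_index()
    )
    counts["enough_runs"] = counts["runs"] >= 2
    return counts

# Usage on a small made-up frame (not the notebook's data):
toy = pd.DataFrame({
    "threads": [1, 1, 1, 2, 2, 4],
    "size":    [1000] * 6,
    "time_ms": [10.2, 9.8, 10.1, 5.6, 5.4, 3.1],
})
print(repetition_counts(toy))
```

In the notebook itself, the call would be `repetition_counts(df)`.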

## 2. Execution Time vs Threads (for each size)

In [4]:
plt.figure(figsize=(10,6))
sns.lineplot(data=df, x='threads', y='time_ms', hue='size', marker='o')
plt.title("Execution Time vs Number of Threads")
plt.ylabel("Time (ms)")
plt.xlabel("Threads")
plt.show()
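Before drawing the line, `sns.lineplot` aggregates the repeated runs at each (threads, size) point, using the mean by default. A pivot table is a compact way to inspect the numbers behind the plot. The sketch below defines a hypothetical helper that defaults to the median, on the assumption that occasional slow runs should not dominate the summary. Pass `"mean"` to match what the plot shows.

```python
import pandas as pd

def time_table(frame: pd.DataFrame, stat: str = "median") -> pd.DataFrame:
    """Summarize time_ms as a threads x size table using the given statistic.

    The median is less sensitive than the mean to occasional slow runs
    (e.g. scheduling hiccups), which is why it is the default here.
    """
    return frame.pivot_table(index="threads", columns="size",
                             values="time_ms", aggfunc=stat)

# Made-up timings with one slow outlier at 1 thread (not the notebook's data):
toy = pd.DataFrame({
    "threads": [1, 1, 1, 4, 4, 4],
    "size":    [1000] * 6,
    "time_ms": [10.0, 10.2, 30.0, 3.0, 3.1, 2.9],
})
print(time_table(toy))          # median per cell
print(time_table(toy, "mean"))  # mean per cell, as aggregated by the plot
```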

## 3. Confidence Intervals

In [5]:
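from scipy import stats

ci_df = df.groupby(["threads", "size"]).agg(
    mean_time=("time_ms", "mean"),
    std_time=("time_ms", "std"),
    n=("time_ms", "count")
).reset_index()

# 95% CI half-width for the mean. The Student t critical value with n - 1
# degrees of freedom replaces the normal 1.96, which is too small for few runs.
t_crit = stats.t.ppf(0.975, ci_df['n'] - 1)
ci_df['ci95'] = t_crit * ci_df['std_time'] / np.sqrt(ci_df['n'])
ci_df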
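The `ci95` column assumes that the sample mean is approximately normally distributed. A percentile bootstrap is a distribution-free cross-check: it resamples the runs instead of relying on that assumption. It is a swapped-in technique here, not part of the original analysis. This is an illustrative sketch in which the function name and the `n_boot` and `seed` defaults are assumptions. With only a handful of runs per configuration, bootstrap intervals tend to be too narrow, so treat it as a complement to the table above rather than a replacement.

```python
import numpy as np

def bootstrap_ci(samples, n_boot=10_000, level=0.95, seed=0):
    """Percentile-bootstrap CI for the mean of one configuration's timings.

    Resamples the runs with replacement, recomputes the mean each time, and
    returns the (lower, upper) quantiles of those resampled means.
    """
    x = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx indexes one bootstrap resample of the same length as x.
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    means = x[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(means, [alpha, 1.0 - alpha])
    return float(lo), float(hi)

runs = [10.2, 9.8, 10.1, 10.4, 9.9, 12.5]  # made-up timings in ms
print(bootstrap_ci(runs))
```

For the notebook's data, one option is `df.groupby(["threads", "size"])["time_ms"].apply(bootstrap_ci)`, which returns one `(lower, upper)` tuple per configuration.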

### Plot with error bars (CI95)

In [6]:
plt.figure(figsize=(10,6))
for s in sorted(df['size'].unique()):
    tmp = ci_df[ci_df['size'] == s]
    plt.errorbar(tmp['threads'], tmp['mean_time'], yerr=tmp['ci95'], label=f"Size {s}", marker='o')

plt.title("Mean Execution Time with 95% CI")
plt.xlabel("Threads")
plt.ylabel("Time (ms)")
plt.legend()
plt.show()

## 4. Speedup Analysis

In [7]:
speedup = ci_df.copy()

# Baseline T1: mean single-thread time for each size
T1 = ci_df.loc[ci_df['threads'] == 1].set_index('size')['mean_time']

# Speedup S = T1 / Tn, looking up each row's baseline by its size
speedup['speedup'] = speedup['size'].map(T1) / speedup['mean_time']
speedup
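Speedup on its own does not show how well each added thread is used. Two standard derived metrics address this. The first is parallel efficiency, `E = S / p`. The second is the Karp-Flatt metric `e = (1/S - 1/p) / (1 - 1/p)`, an experimentally estimated serial fraction. When `e` grows with `p`, overheads rather than inherent serial work are limiting scaling. A minimal sketch, assuming the `threads` and `speedup` columns of the `speedup` frame. The helper name is hypothetical.

```python
import pandas as pd

def scaling_metrics(frame: pd.DataFrame) -> pd.DataFrame:
    """Add parallel efficiency and the Karp-Flatt serial fraction.

    Expects columns 'threads' (p) and 'speedup' (S = T1 / Tp).
    efficiency = S / p; karp_flatt = (1/S - 1/p) / (1 - 1/p), undefined at p = 1.
    """
    out = frame.copy()
    p = out["threads"].astype(float)
    s = out["speedup"].astype(float)
    out["efficiency"] = s / p
    # Karp-Flatt is undefined for a single thread, so it is set to NaN where p == 1.
    out["karp_flatt"] = ((1 / s - 1 / p) / (1 - 1 / p)).where(p > 1)
    return out

# Made-up speedups (not the notebook's results):
toy = pd.DataFrame({"threads": [1, 2, 4, 8], "speedup": [1.0, 1.8, 3.2, 4.0]})
print(scaling_metrics(toy))
```

On the notebook's data, this would be `scaling_metrics(speedup)`.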

In [8]:
plt.figure(figsize=(10,6))
sns.lineplot(data=speedup, x='threads', y='speedup', hue='size', marker='o')
plt.title("Speedup vs Threads")
plt.xlabel("Threads")
plt.ylabel("Speedup (T1/Tn)")
plt.show()
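Amdahl's law offers a one-parameter model for a speedup curve that flattens: `S(p) = 1 / (f + (1 - f) / p)`, where `f` is the serial fraction of the work. The model predicts a ceiling of `1 / f` regardless of how many threads are added. The sketch below fits `f` by least squares on a linearized form. It is an illustration that assumes Amdahl's law is a reasonable description of this workload. The law ignores overheads that grow with the thread count (synchronization, memory bandwidth), so a poor fit is itself informative. The function name is hypothetical.

```python
import numpy as np

def fit_amdahl(threads, speedup):
    """Least-squares estimate of the serial fraction f in Amdahl's law.

    Amdahl: S(p) = 1 / (f + (1 - f) / p), which rearranges to the linear form
    1/S - 1/p = f * (1 - 1/p). Fitting that line through the origin gives
    f = sum(x * y) / sum(x**2) with x = 1 - 1/p and y = 1/S - 1/p.
    """
    p = np.asarray(threads, dtype=float)
    s = np.asarray(speedup, dtype=float)
    x = 1.0 - 1.0 / p
    y = 1.0 / s - 1.0 / p
    f = float(np.sum(x * y) / np.sum(x * x))
    ceiling = 1.0 / f if f > 0 else float("inf")  # asymptotic maximum speedup
    return f, ceiling

# Sanity check on exact Amdahl data with f = 0.1 (made up, not the notebook's results):
p = [1, 2, 4, 8, 16]
s = [1 / (0.1 + 0.9 / k) for k in p]
print(fit_amdahl(p, s))
```

To fit each input size separately, select its rows first, e.g. `sub = speedup[speedup['size'] == s]` followed by `fit_amdahl(sub['threads'], sub['speedup'])`.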

## 5. Conclusions
- Execution time drops clearly as the thread count increases from 1 to 4.
- Speedup levels off at around 8 threads.
- Larger input sizes benefit more from additional threads.
- Run-to-run measurement noise is realistic in magnitude, and the 95% CIs capture it.
