# Browser Performance Analysis

This notebook analyzes browser performance metrics collected from Chrome and Firefox. We will explore the data, compare performance across browsers, and visualize key metrics to gain insights.

**Data Sources:**

- `chrome/results/performance_metrics.csv`
- `firefox/results/performance_metrics.csv`

**Objectives:**

- Load and inspect the data.
- Clean and preprocess the data.
- Perform exploratory data analysis (EDA).
- Visualize performance metrics.
- Draw conclusions and suggest next steps.


In [None]:
# Import necessary libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_style('whitegrid')
%matplotlib inline


## Loading the Data

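We will load the performance metrics data from both Chrome and Firefox and combine them into a single DataFrame for analysis. The paths in the cell below assume a `data/` directory under the notebook's parent directory; adjust them if your files live under `chrome/results/` and `firefox/results/` as listed in the data sources above.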


In [None]:
# Define the base directory (parent directory of the notebook)
base_dir = os.path.abspath('..')  # Adjust as needed

# Paths to the data files
chrome_csv = os.path.join(base_dir, 'data', 'chrome', 'performance_metrics.csv')
firefox_csv = os.path.join(base_dir, 'data', 'firefox', 'performance_metrics.csv')
combined_csv = os.path.join(base_dir, 'data', 'combined_results', 'performance_metrics.csv')

# Choose whether to use the combined CSV or individual browser CSVs
use_combined_csv = False  # Set to True if you want to use the combined CSV

if use_combined_csv:
    # Load combined data
    df = pd.read_csv(combined_csv)
else:
    # Load data from Chrome and Firefox CSV files
    df_chrome = pd.read_csv(chrome_csv)
    df_firefox = pd.read_csv(firefox_csv)

    # Add a 'browser' column if not already present
    if 'browser' not in df_chrome.columns:
        df_chrome['browser'] = 'Chrome'
    if 'browser' not in df_firefox.columns:
        df_firefox['browser'] = 'Firefox'

    # Combine DataFrames
    df = pd.concat([df_chrome, df_firefox], ignore_index=True)

# Display the first few rows
df.head()
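
The per-browser loading and labeling step can also be wrapped in a small function. The sketch below is illustrative only: `load_browser_csv` is a hypothetical helper name, and the in-memory CSV text (with made-up values) stands in for the real files so the example runs on its own.

```python
import io

import pandas as pd


def load_browser_csv(source, browser):
    """Read one CSV and tag its rows with a browser label (hypothetical helper)."""
    frame = pd.read_csv(source)
    # Only add the label when the file does not already carry one
    if 'browser' not in frame.columns:
        frame['browser'] = browser
    return frame


# In-memory stand-ins for the real CSV files (values are made up)
chrome_text = "website,total_page_load_time_ms\nexample.com,1200\nexample.org,900\n"
firefox_text = "website,total_page_load_time_ms\nexample.com,1350\n"

demo = pd.concat(
    [
        load_browser_csv(io.StringIO(chrome_text), 'Chrome'),
        load_browser_csv(io.StringIO(firefox_text), 'Firefox'),
    ],
    ignore_index=True,
)
print(demo)
```

With real data, the `io.StringIO(...)` arguments would be replaced by the file paths defined above.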


## Data Inspection

Let's inspect the DataFrame to understand the structure and check for any issues.


In [None]:
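# Get DataFrame information (info() prints its report directly)
df.info()
# Check for missing values (printed explicitly, because a cell only displays its last expression)
print(df.isnull().sum())
# Get descriptive statistics (displayed as the cell's last expression)
df.describe()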
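
Raw missing-value counts are easier to judge next to the share of rows they represent. A minimal sketch of such a summary, using a small made-up DataFrame rather than the real data:

```python
import numpy as np
import pandas as pd

# Toy data with a few gaps (values are made up)
demo = pd.DataFrame({
    'website': ['a.com', 'b.com', 'c.com', 'd.com'],
    'ttfb_ms': [120.0, np.nan, 95.0, 110.0],
    'first_paint_ms': [np.nan, np.nan, 300.0, 280.0],
})

# Count and percentage of missing values per column, worst column first
missing = pd.DataFrame({
    'missing_count': demo.isnull().sum(),
    'missing_pct': demo.isnull().mean() * 100,
}).sort_values('missing_count', ascending=False)
print(missing)
```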


## Data Cleaning and Preprocessing

We will handle missing values and ensure that data types are appropriate for analysis.


In [None]:
# Define numeric columns
numeric_columns = [
    'total_page_load_time_ms',
    'ttfb_ms',
    'content_download_time_ms',
    'adjusted_dom_parsing_time_ms',
    'adjusted_rendering_time_ms',
    'adjusted_browser_processing_time_ms',
    'first_paint_ms',
    'first_contentful_paint_ms',
    'total_transfer_size_bytes',
]

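# Convert columns to numeric types; values that cannot be parsed become NaN
for col in numeric_columns:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Drop rows with missing values in key metrics and report how many were removed
rows_before = len(df)
df = df.dropna(subset=numeric_columns).reset_index(drop=True)
print(f'Dropped {rows_before - len(df)} of {rows_before} rows with missing metrics')

# Verify data types
df[numeric_columns].dtypes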
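
To see what `errors='coerce'` does before applying it to real data, the following sketch converts a short made-up series containing two unparseable entries:

```python
import pandas as pd

# Raw strings as they might appear in a CSV column (made-up values)
raw = pd.Series(['1200', '950.5', 'N/A', '', '1800'])

# Unparseable entries become NaN instead of raising an error
converted = pd.to_numeric(raw, errors='coerce')
n_invalid = int(converted.isna().sum())

print(converted)
print(f'{n_invalid} value(s) could not be parsed')
```

Counting the coerced values this way shows how many rows a subsequent `dropna` would remove.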


## Exploratory Data Analysis (EDA)

We will explore the data through descriptive statistics and visualizations to uncover patterns and insights.


### Descriptive Statistics

Let's compute descriptive statistics for the performance metrics.


In [None]:
# Compute descriptive statistics grouped by browser
df.groupby('browser')[numeric_columns].describe().T
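
Timing data is often right-skewed, so the median and a high percentile can be more informative than the mean. A minimal sketch on made-up numbers (the `p95` helper is an illustrative name, not part of the notebook):

```python
import pandas as pd

# Toy measurements with one slow outlier for Chrome (values are made up)
demo = pd.DataFrame({
    'browser': ['Chrome'] * 4 + ['Firefox'] * 4,
    'total_page_load_time_ms': [1000, 1100, 1200, 5000, 1300, 1400, 1500, 1600],
})


def p95(series):
    """95th percentile of a series (illustrative helper)."""
    return series.quantile(0.95)


# Mean, median and 95th percentile per browser
summary = demo.groupby('browser')['total_page_load_time_ms'].agg(['mean', 'median', p95])
print(summary)
```

In this toy example a single slow run pulls the Chrome mean far above its median, which is the kind of effect these extra statistics help expose.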


### Visualizations

We will create various plots to visualize and compare the performance metrics across browsers.


#### Comparative Box Plots

Box plots summarize each browser's distribution by its median, quartiles, and outliers, which makes differences in typical values and spread easy to compare.


In [None]:
# List of metrics to plot
metrics = [
    'total_page_load_time_ms',
    'ttfb_ms',
    'content_download_time_ms',
    'adjusted_dom_parsing_time_ms',
    'adjusted_rendering_time_ms',
    'adjusted_browser_processing_time_ms',
    # Add other metrics as needed
]

for metric in metrics:
    # Readable label; drop the trailing '_ms' so the unit is not shown twice
    label = (metric[:-3] if metric.endswith('_ms') else metric).replace('_', ' ').title()
    plt.figure(figsize=(8, 6))
    sns.boxplot(data=df, x='browser', y=metric)
    plt.title(f'Comparison of {label} Across Browsers')
    plt.xlabel('Browser')
    plt.ylabel(f'{label} (ms)')
    plt.show()
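
Points drawn beyond the whiskers of a box plot are, with seaborn's default settings, the values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. The sketch below applies that rule by hand to made-up timings:

```python
import numpy as np

# Made-up load times in ms, with one clearly slow run
values = np.array([900, 950, 1000, 1050, 1100, 1150, 1200, 4000])

# Quartiles and interquartile range
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# The 1.5 * IQR fences that determine which points count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f'Q1={q1}, Q3={q3}, IQR={iqr}')
print(f'Outliers: {outliers.tolist()}')
```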


#### Violin Plots

Violin plots show the full shape of each distribution as a kernel density estimate, so they can reveal skew or multiple peaks that a box plot hides.


In [None]:
for metric in metrics:
    # Readable label; drop the trailing '_ms' so the unit is not shown twice
    label = (metric[:-3] if metric.endswith('_ms') else metric).replace('_', ' ').title()
    plt.figure(figsize=(8, 6))
    sns.violinplot(data=df, x='browser', y=metric, inner='quartile')
    plt.title(f'Distribution of {label} Across Browsers')
    plt.xlabel('Browser')
    plt.ylabel(f'{label} (ms)')
    plt.show()


#### Grouped Bar Charts for Top Websites

We will compare performance metrics for the 10 websites with the highest average total page load time, that is, the 10 slowest sites.


In [None]:
# Choose a metric
metric = 'total_page_load_time_ms'

# Get the N websites with the highest average value of the metric (the slowest sites)
top_n = 10
top_sites = (
    df.groupby('website')[metric]
    .mean()
    .sort_values(ascending=False)
    .head(top_n)
    .index
)

df_top_sites = df[df['website'].isin(top_sites)]

plt.figure(figsize=(12, 8))
sns.barplot(
    x=metric,
    y='website',
    hue='browser',
    data=df_top_sites,
    orient='h'
)
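# Readable label; drop the trailing '_ms' so the unit is not shown twice
label = (metric[:-3] if metric.endswith('_ms') else metric).replace('_', ' ').title()
plt.title(f'Comparison of {label} for Top {top_n} Websites')
plt.xlabel(f'{label} (ms)')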
plt.ylabel('Website')
plt.legend(title='Browser')
plt.tight_layout()
plt.show()
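
The bar heights in a grouped bar chart like this are per-website, per-browser means. The same numbers can be inspected as a table; a minimal sketch on made-up data:

```python
import pandas as pd

# Toy measurements for three sites and two browsers (values are made up)
demo = pd.DataFrame({
    'website': ['a.com', 'a.com', 'b.com', 'b.com', 'c.com', 'c.com'],
    'browser': ['Chrome', 'Firefox'] * 3,
    'total_page_load_time_ms': [1000, 1200, 3000, 2800, 500, 700],
})

# The N sites with the highest average load time, slowest first
top_n = 2
slowest = (
    demo.groupby('website')['total_page_load_time_ms']
    .mean()
    .nlargest(top_n)
    .index
)

# One row per site, one column per browser, ordered slowest first
table = (
    demo[demo['website'].isin(slowest)]
    .pivot_table(index='website', columns='browser', values='total_page_load_time_ms')
    .loc[slowest]
)
print(table)
```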


#### Performance Difference Between Browsers

We will calculate and visualize the difference in performance metrics between Chrome and Firefox for each website.


In [None]:
# Pivot to one row per website and one column per browser
# (pivot_table averages repeated measurements by default).
# `metric` is the metric chosen in the previous cell.
df_pivot = df.pivot_table(
    index='website',
    columns='browser',
    values=metric
).dropna()

# Check if we have exactly two browsers
if df_pivot.shape[1] == 2:
    # Pivot columns are sorted, so the browser order is deterministic
    browser_a, browser_b = df_pivot.columns
    # Positive values mean browser_a took longer than browser_b
    df_pivot['difference'] = df_pivot[browser_a] - df_pivot[browser_b]

    # Sort by difference and expose 'website' as a regular column for plotting
    df_pivot_sorted = df_pivot.sort_values('difference', ascending=False).reset_index()

    # Readable label; drop the trailing '_ms' so the unit is not shown twice
    label = (metric[:-3] if metric.endswith('_ms') else metric).replace('_', ' ').title()

    # Plot the differences
    plt.figure(figsize=(12, 8))
    sns.barplot(
        x='difference',
        y='website',
        data=df_pivot_sorted,
        orient='h'
    )
    plt.title(f'Difference in {label} Between {browser_a} and {browser_b}')
    plt.xlabel(f'Difference in {label} (ms)')
    plt.ylabel('Website')
    plt.axvline(0, color='grey', linestyle='--')
    plt.tight_layout()
    plt.show()
else:
    print("Performance difference plot requires exactly two browsers.")
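
Alongside the plot, a few summary numbers on the paired differences can help. The sketch below uses a hand-built pivot with made-up values and the column names `Chrome` and `Firefox` as assumptions:

```python
import pandas as pd

# One row per website, one column per browser (values are made up)
pivot = pd.DataFrame(
    {
        'Chrome': [1000.0, 3000.0, 500.0, 800.0],
        'Firefox': [1200.0, 2800.0, 700.0, 800.0],
    },
    index=pd.Index(['a.com', 'b.com', 'c.com', 'd.com'], name='website'),
)

# Negative values mean Chrome was faster on that site
diff = pivot['Chrome'] - pivot['Firefox']

summary = {
    'mean_difference_ms': diff.mean(),
    'median_difference_ms': diff.median(),
    'share_chrome_faster': (diff < 0).mean(),
}
print(summary)
```

These figures only describe the sample; a paired statistical test would be needed before claiming a real difference between browsers.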


#### Correlation Heatmap

A correlation heatmap helps us understand the relationships between different performance metrics.


In [None]:
# Select all numeric columns, regardless of integer or float width
numeric_cols = df.select_dtypes(include='number').columns

# Compute correlation matrix
corr = df[numeric_cols].corr()

# Plot heatmap with a fixed, symmetric color scale so 0 is the neutral color
plt.figure(figsize=(12, 10))
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap of Performance Metrics')
plt.tight_layout()
plt.show()
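
A large heatmap can be hard to scan, so it may help to list the most strongly correlated pairs explicitly. A minimal sketch on made-up data, keeping only the upper triangle of the matrix so each pair appears once:

```python
import numpy as np
import pandas as pd

# Toy metrics (values are made up)
demo = pd.DataFrame({
    'ttfb_ms': [100, 120, 140, 160, 180],
    'total_page_load_time_ms': [1000, 1200, 1400, 1600, 1800],
    'total_transfer_size_bytes': [500, 400, 450, 300, 350],
})

corr = demo.corr()

# Boolean mask for the strict upper triangle (excludes the diagonal)
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Flatten to (metric_a, metric_b) -> correlation, strongest first
pairs = (
    corr.where(upper)
    .stack()
    .dropna()
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(pairs)
```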


#### Scatter Plots with Regression Lines

We will examine the relationship between total transfer size and adjusted browser processing time.


In [None]:
x_metric = 'total_transfer_size_bytes'
y_metric = 'adjusted_browser_processing_time_ms'

sns.lmplot(
    data=df,
    x=x_metric,
    y=y_metric,
    hue='browser',
    height=6,
    aspect=1.5,
    scatter_kws={'alpha':0.5}
)
plt.title(f'{y_metric.replace("_", " ").title()} vs {x_metric.replace("_", " ").title()}')
plt.xlabel(f'{x_metric.replace("_", " ").title()}')
plt.ylabel(f'{y_metric.replace("_", " ").title()}')
plt.tight_layout()
plt.show()
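
The regression lines in the plot are straight-line fits computed separately for each browser. To read off the slopes as numbers, an ordinary least-squares line can be fitted per group with `np.polyfit`. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

x_col = 'total_transfer_size_bytes'
y_col = 'adjusted_browser_processing_time_ms'

# Toy data lying exactly on one line per browser (values are made up)
demo = pd.DataFrame({
    'browser': ['Chrome'] * 4 + ['Firefox'] * 4,
    x_col: [1000, 2000, 3000, 4000] * 2,
    y_col: [110, 210, 310, 410, 150, 350, 550, 750],
})

# Fit y = slope * x + intercept separately for each browser
fits = {}
for browser, group in demo.groupby('browser'):
    slope, intercept = np.polyfit(group[x_col], group[y_col], deg=1)
    fits[browser] = (slope, intercept)
    print(f'{browser}: {slope:.3f} ms per byte, intercept {intercept:.1f} ms')
```

The slope is the estimated extra processing time per additional byte transferred, which makes the two browsers directly comparable.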


## Conclusions and Next Steps

**Summary of Findings:**

- Based on the analysis, we observed that...

*(Add your interpretations and key takeaways here.)*

**Next Steps:**

- Investigate further into...
- Consider collecting additional data on...
- Share these findings with the team for feedback and action.

---

**Note:** This notebook provides an initial analysis of the browser performance data. For ongoing monitoring and more advanced analyses, consider integrating a database solution and automating the reporting process.
