# Data Analysis and Visualisation: Lab 07

## Problem 3

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import numpy as np

plt.style.use("ggplot")
plt.rcParams.update({
    "font.size": 12,
})

### Loading data

In [2]:
df = pd.read_csv("../data/laptop-prices/Laptop+Prices.csv")

### Two-sample t-test

Proposed question:

*"Gaming laptops are more expensive on average than Ultrabooks."*

**Explanation:**

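We perform an independent two-sample t-test (one-tailed, `alternative='greater'`) to evaluate whether Gaming laptops have a higher mean price than Ultrabooks. With its default settings, `stats.ttest_ind` assumes that the two groups have equal variances.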

Hypotheses:

* $H_0: \mu_\text{Gaming} = \mu_\text{Ultrabooks}$
* $H_1: \mu_\text{Gaming} > \mu_\text{Ultrabooks}$

In [9]:
from scipy import stats

gaming = df[df["TypeName"] == "Gaming"]["Price_euros"]
ultra = df[df["TypeName"] == "Ultrabook"]["Price_euros"]

t_stat, p_value = stats.ttest_ind(gaming, ultra, alternative='greater')

print(f"t-stat = {t_stat:.3f}")
print(f"p-value = {p_value:.4f}")

t-stat = 2.586
p-value = 0.0050


**Summary:**

$p\text{-value} = 0.0050 < 0.05 \Rightarrow$ We reject $H_0$.

At the 5% significance level, the data support the claim that Gaming laptops are more expensive on average than Ultrabooks ($t = 2.586$).
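To make the statistic behind `stats.ttest_ind` concrete, here is a minimal sketch that computes it by hand with the standard library. It shows both the pooled-variance (Student) form, which is the `scipy` default (`equal_var=True`), and Welch's form (`equal_var=False`), which drops the equal-variance assumption. The helper name `t_statistics` and the demo prices are hypothetical and are not taken from the laptop dataset; the sketch computes only the statistic, not the p-value.

```python
import math
import statistics


def t_statistics(a, b):
    """Return (pooled_t, welch_t) for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    # Sample variances (denominator n - 1)
    v1, v2 = statistics.variance(a), statistics.variance(b)

    # Student's form: one pooled variance, assumes equal population variances
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    pooled_t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Welch's form: each group keeps its own variance
    welch_t = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return pooled_t, welch_t


# Hypothetical prices in euros, NOT taken from the laptop dataset
gaming_demo = [1500.0, 1700.0, 1900.0, 2100.0]
ultra_demo = [1200.0, 1300.0, 1400.0, 1500.0, 1600.0, 1700.0]

pooled_t, welch_t = t_statistics(gaming_demo, ultra_demo)
print(f"pooled t = {pooled_t:.3f}, Welch t = {welch_t:.3f}")
```

The two forms coincide when both groups have the same size and diverge when the sizes and variances differ. If the Gaming and Ultrabook price spreads look very different, rerunning the test with `equal_var=False` would be a reasonable robustness check.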

### Chi-square Test

Proposed question:

*"Apple and Dell may target different price tiers. Do they actually differ in how their laptops are distributed across price tiers?"*

**Explanation:**

We compare Apple with Dell across three price tiers (Budget, Mid, High) to see whether company and price tier are associated. A chi-square test of independence checks whether the distribution of tiers differs between the two brands.

Hypotheses:

* $H_0$: Company and price tier are independent (Apple and Dell follow the same tier distribution).
* $H_1$: Company and price tier are dependent (their tier distributions differ).

In [4]:
bins = [0, 800, 1500, df["Price_euros"].max()]
labels = ["Budget", "Mid", "High"]
df["Price Tier"] = pd.cut(df["Price_euros"], bins=bins, labels=labels, include_lowest=True)
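The boundary behaviour of the binning is easy to get wrong, so here is a small standard-library sketch of the same rule. By default `pd.cut` uses right-closed intervals, so a price of exactly 800 falls in Budget and exactly 1500 falls in Mid. The helper `price_tier` is hypothetical and only mirrors the tier edges used above; unlike `pd.cut`, it does not return a missing value for prices outside the bin range.

```python
import bisect

# Upper edges of the Budget and Mid tiers, matching the bins above
TIER_EDGES = [800, 1500]
TIER_LABELS = ["Budget", "Mid", "High"]


def price_tier(price_euros):
    """Map a price to its tier using right-closed intervals."""
    # bisect_left counts the edges strictly below the price,
    # so a price equal to an edge stays in the lower tier.
    return TIER_LABELS[bisect.bisect_left(TIER_EDGES, price_euros)]


for price in (799.0, 800.0, 800.01, 1500.0, 1500.01):
    print(price, price_tier(price))
```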

In [5]:
subset = df[df["Company"].isin(["Apple", "Dell"])]

In [6]:
table = pd.crosstab(subset["Company"], subset["Price Tier"])
print(table)

Price Tier  Budget  Mid  High
Company                      
Apple            0   12     9
Dell            96  126    69


In [10]:
chi2, p, dof, expected = stats.chi2_contingency(table)

print(f"chi2 = {chi2:.3f}")
print(f"p-value = {p:.4f}")
print(f"dof = {dof}")

chi2 = 10.648
p-value = 0.0049
dof = 2


**Summary:**

$p\text{-value} = 0.0049 < 0.05 \Rightarrow$ We reject $H_0$.

There is strong evidence that Apple and Dell do not share the same price-tier distribution: Apple has no Budget-tier laptops and sits entirely in the Mid and High tiers, while Dell spans all three tiers, with about a third of its laptops (96 of 291) in Budget.
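As a sanity check, the chi-square statistic can be recomputed by hand from the contingency table printed above. The sketch below uses only the standard library: it derives the expected counts under independence, sums the $(O - E)^2 / E$ terms, and gets the p-value from the closed form $e^{-x/2}$, which is valid only because this table has exactly 2 degrees of freedom. The variable names are illustrative.

```python
import math

# Observed counts copied from the contingency table printed above
observed = {
    "Apple": [0, 12, 9],      # Budget, Mid, High
    "Dell": [96, 126, 69],
}

row_totals = {company: sum(counts) for company, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total
expected = {
    company: [row_totals[company] * c / n_total for c in col_totals]
    for company in observed
}

chi2 = sum(
    (o - e) ** 2 / e
    for company in observed
    for o, e in zip(observed[company], expected[company])
)

dof = (len(observed) - 1) * (len(col_totals) - 1)

# For exactly 2 degrees of freedom the chi-square survival function is
# exp(-x / 2); this shortcut does NOT hold for other degrees of freedom.
p = math.exp(-chi2 / 2)

smallest_expected = min(min(row) for row in expected.values())
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p:.4f}")
print(f"smallest expected count = {smallest_expected:.2f}")
```

This reproduces the values reported by `stats.chi2_contingency`. The smallest expected count (Apple, High tier) is 5.25, just above the common rule of thumb of 5, so the chi-square approximation is borderline but acceptable despite the small Apple sample.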

### One-way ANOVA Test

Proposed question:

*"Do laptops with different RAM sizes differ significantly in price?"*

**Explanation:**

We perform a one-way ANOVA to test whether the mean price varies across RAM categories.

Hypotheses:

* $H_0$: The mean price is the same across all RAM sizes.
* $H_1$: At least one RAM size has a different mean price.

In [11]:
groups = [
    df[df["Ram"] == ram]["Price_euros"]
    for ram in sorted(df["Ram"].unique())
]

f_stat, p_value = stats.f_oneway(*groups)

print(f"F-statistic = {f_stat:.3f}")
print(f"p-value = {p_value:.4f}")

F-statistic = 224.189
p-value = 0.0000


**Summary:**

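$p\text{-value} < 0.05 \Rightarrow$ We reject $H_0$ (the printed `0.0000` means the p-value is below the four-decimal display precision).

There is strong evidence ($F = 224.189$) that mean laptop price differs across RAM sizes. The ANOVA alone does not show which sizes differ or in which direction; the impression that higher RAM generally corresponds to higher prices should be checked against the group means.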
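The F-statistic reported by `stats.f_oneway` is the ratio of between-group to within-group mean squares. Here is a minimal standard-library sketch of that computation, which also prints the group means, since the F-test alone says nothing about direction. The helper `one_way_f` and the demo prices are hypothetical and are not taken from the laptop dataset.

```python
import statistics


def one_way_f(groups):
    """Return the one-way ANOVA F-statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical prices in euros keyed by RAM size in GB, NOT from the dataset
demo_groups = {
    4: [400.0, 500.0, 600.0],
    8: [700.0, 800.0, 900.0],
    16: [1300.0, 1400.0, 1500.0],
}

f_demo = one_way_f(list(demo_groups.values()))
print(f"F = {f_demo:.3f}")
for ram, prices in demo_groups.items():
    print(f"{ram} GB: mean price = {statistics.mean(prices):.1f}")
```

On the real data, comparing the per-RAM mean prices (or running a post-hoc test) would be the step that supports a statement about direction.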