# Age of white-collar offenders

Source: Statistics in Criminal Justice, David Weisburd, Chester Britt, https://doi.org/10.1007/978-1-4614-9170-5

**Null hypothesis**: The mean age of offenders is the same across the three offenses.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats

## Data

Ages by offense

In [None]:
guante_blanco = pd.read_csv("guante_blanco.csv", index_col=0)
guante_blanco

In [None]:
sample_size = guante_blanco.size
sample_size

In [None]:
group_size = len(guante_blanco.index)
group_size

In [None]:
number_of_groups = len(guante_blanco.columns)
number_of_groups

In [None]:
fig, ax = plt.subplots()
ax.set_xlabel("Offense")
ax.set_ylabel("Age")
# Group means, with one sample standard deviation as the error bar
ax.errorbar(x=guante_blanco.columns, y=guante_blanco.mean(), yerr=guante_blanco.std(), ls='', marker='o')

## Sum of squares **within** groups (SSW)

*Note*: The sample variances use the denominator n-1 (Bessel's correction).
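
As a quick illustration of this note, the sketch below uses made-up ages (not the notebook's data) to show that pandas' `var()` defaults to `ddof=1`, while NumPy's `np.var` defaults to `ddof=0`. The variable names are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, for illustration only (not taken from guante_blanco.csv)
ages = pd.Series([19, 21, 23, 25, 30])

var_pandas = ages.var()                              # pandas default: ddof=1 (divides by n-1)
var_numpy_default = np.var(ages.to_numpy())          # NumPy default: ddof=0 (divides by n)
var_numpy_bessel = np.var(ages.to_numpy(), ddof=1)   # NumPy with Bessel's correction

print(var_pandas, var_numpy_default, var_numpy_bessel)
```

The pandas value and the `ddof=1` NumPy value agree; the default NumPy value is smaller by a factor of (n-1)/n.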

In [None]:
group_var = guante_blanco.var()
group_var

In [None]:
square_sum_within = (group_size-1) * group_var.sum()
square_sum_within
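
For reference, the shortcut used in this cell can be written out as follows. The notation is introduced here and is not part of the original notebook: $k$ groups of equal size $n$, observations $x_{ij}$, group means $\bar{x}_j$, and sample variances $s_j^2$ (with denominator $n-1$). The last step assumes every group has the same number of non-missing observations.

```latex
\mathrm{SSW}
  = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2
  = \sum_{j=1}^{k} (n - 1)\, s_j^2
  = (n - 1) \sum_{j=1}^{k} s_j^2
```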

## Sum of squares **between** groups (SSB)

In [None]:
age_mean = guante_blanco.mean()
age_mean

In [None]:
square_sum_between = group_size * (number_of_groups-1) * age_mean.var()
square_sum_between
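
The expression in this cell can be read as the sketch below, using notation introduced here rather than taken from the notebook: $k$ groups of equal size $n$, group means $\bar{x}_j$, grand mean $\bar{x}$, and $s_{\bar{x}}^2$ the sample variance of the $k$ group means (with denominator $k-1$). With equal group sizes the grand mean coincides with the mean of the group means, which is what makes the second equality hold.

```latex
\mathrm{SSB}
  = n \sum_{j=1}^{k} \left( \bar{x}_j - \bar{x} \right)^2
  = n \,(k - 1)\, s_{\bar{x}}^2
```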

## Total sum of squares (SST)

We verify that the total sum of squares equals the sum of the between-group and within-group contributions.

In [None]:
square_sum_total = (sample_size - 1) * guante_blanco.stack().var()
square_sum_total

In [None]:
square_sum_within + square_sum_between
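
The same check can be made explicit with `np.isclose`. The sketch below is self-contained and runs on a small made-up balanced data set (the `toy` frame and its values are hypothetical, not the notebook's data). It computes the three sums of squares directly from deviations and compares them with the variance-based shortcuts used in the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced data: 3 groups (columns), 4 observations each.
# The values are made up for illustration.
toy = pd.DataFrame({
    "a": [20, 22, 24, 26],
    "b": [30, 31, 35, 36],
    "c": [40, 44, 45, 47],
})
n = len(toy.index)     # observations per group
k = len(toy.columns)   # number of groups

# Direct definitions, computed from deviations
grand_mean = toy.to_numpy().mean()
ssw = ((toy - toy.mean()) ** 2).sum().sum()
ssb = n * ((toy.mean() - grand_mean) ** 2).sum()
sst = ((toy - grand_mean) ** 2).sum().sum()

# Variance-based shortcuts (valid for equal group sizes)
ssw_shortcut = (n - 1) * toy.var().sum()
ssb_shortcut = n * (k - 1) * toy.mean().var()

# Each comparison should hold up to floating-point rounding
print(np.isclose(sst, ssw + ssb))
print(np.isclose(ssw, ssw_shortcut), np.isclose(ssb, ssb_shortcut))
```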

## Observed F statistic

In [None]:
ndof_between = number_of_groups - 1
ndof_between

In [None]:
ndof_within = sample_size - number_of_groups
ndof_within

In [None]:
ndof_total = sample_size - 1
ndof_total

In [None]:
F_observed = ( square_sum_between / ndof_between ) / (square_sum_within / ndof_within)
F_observed

## P-value

In [None]:
F_distribution = scipy.stats.f(ndof_between, ndof_within)
pvalue = F_distribution.sf(F_observed)
pvalue
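
In formulas, the last two steps amount to the following, where $k$ is the number of groups and $N$ the total sample size (symbols introduced here for illustration). Under the null hypothesis and the usual one-way ANOVA assumptions (independent observations, normally distributed groups with equal variances), the statistic follows an F distribution with $k-1$ and $N-k$ degrees of freedom, and the p-value is its upper-tail probability at the observed value.

```latex
F_{\mathrm{observed}} = \frac{\mathrm{SSB} / (k - 1)}{\mathrm{SSW} / (N - k)},
\qquad
p = P\!\left( F_{k-1,\,N-k} \ge F_{\mathrm{observed}} \right)
```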

## Fisher's F-test (one-way ANOVA) with scipy

In [None]:
print(*guante_blanco.values.T)

In [None]:
F_observed, pvalue = scipy.stats.f_oneway(*guante_blanco.values.T)
print(f"F observed = {F_observed}")
print(f"Pvalue = {pvalue}")
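
To see that the manual computation and `scipy.stats.f_oneway` agree, the following sketch repeats both on a small made-up data set (the `groups` arrays are hypothetical, not the notebook's data). It is only an illustration of the comparison, written so that it also works when the groups have different sizes.

```python
import numpy as np
import scipy.stats

# Hypothetical samples, made up for illustration (one array per group)
groups = [
    np.array([20.0, 22.0, 24.0, 26.0]),
    np.array([30.0, 31.0, 35.0, 36.0]),
    np.array([40.0, 44.0, 45.0, 47.0]),
]
k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total sample size
grand_mean = np.concatenate(groups).mean()

# Sums of squares from their definitions
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F statistic and upper-tail p-value computed by hand
F_manual = (ssb / (k - 1)) / (ssw / (N - k))
p_manual = scipy.stats.f.sf(F_manual, k - 1, N - k)

# Same test via scipy
F_scipy, p_scipy = scipy.stats.f_oneway(*groups)
print(np.isclose(F_manual, F_scipy), np.isclose(p_manual, p_scipy))
```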

## Plot

In [None]:
fig2, ax2 = plt.subplots()
ax2.set_xlabel("F statistic")
ax2.set_ylabel("PDF")
x = np.linspace(0, 20, 100)
y = scipy.stats.f.pdf(x, ndof_between, ndof_within)
ax2.plot(x,y)
trans = ax2.get_xaxis_transform()
ax2.axvline(F_observed, color='tab:orange', ls='--')
ax2.text(F_observed, .5, r" $F_{observed}$", transform=trans)