# Grouping Rules

## Using the Square Root Rule

$$k = \left \lceil{\sqrt{n}}\right \rceil$$

where $k$ is the number of classes and $n$ is the number of observations.

In [1]:
import numpy as np
import pandas as pd

In [2]:
data = pd.read_csv("../../data/datacrab.txt",delimiter=" ")
cw = data["width"]

In [3]:
n = len(cw)
k = np.ceil(np.sqrt(n))
k

14.0

Next, we define the width $A$ of the intervals:

$$A = \frac{\max(x)-\min(x)}{k}$$

In [4]:
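A = (max(cw)-min(cw))/k  # raw width: range divided by k
# Round the raw width up to the precision of the data (one decimal place)
A = 0.9
precision = 0.1
A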

0.9
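
The width is set by hand above. A minimal sketch of the same rounding step, assuming the convention is to always round up to the next multiple of the data precision; `round_up_width` is a hypothetical helper, not something the notebook defines, and the input value is illustrative:

```python
import math

def round_up_width(raw_width, precision):
    """Round raw_width up to the next multiple of precision (hypothetical helper)."""
    steps = math.ceil(raw_width / precision)
    # Number of decimals implied by the precision, used to hide float noise
    decimals = max(0, -math.floor(math.log10(precision)))
    return round(steps * precision, decimals)

# Illustrative raw width: a range of 12.5 split into 14 classes
print(round_up_width(12.5 / 14, 0.1))
```

Rounding up rather than to the nearest value keeps the $k$ intervals wide enough to reach the maximum of the sample.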

Next, we compute the interval limits. The first limit sits half a precision unit below the minimum, so that no observation falls exactly on a limit.

In [5]:
L1 = min(cw) - (1/2)*precision
L1

20.95

In [13]:
L = np.zeros(int(k+1))

for i in range(0,int(k+1)):
    L[i] = L1 + A * i

L

array([20.95, 21.85, 22.75, 23.65, 24.55, 25.45, 26.35, 27.25, 28.15,
       29.05, 29.95, 30.85, 31.75, 32.65, 33.55])
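
The loop can also be written as a single vectorized expression. This is a sketch under stated assumptions: `k`, `A` and `precision` are the values used in this section, and `x_min = 21.0` is an assumed sample minimum inferred from the `L1 = 20.95` output rather than read from the data file.

```python
import numpy as np

k = 14           # number of classes (square root rule)
A = 0.9          # rounded class width
precision = 0.1  # precision of the data
x_min = 21.0     # assumed sample minimum (gives L1 = 20.95)

L1 = x_min - precision / 2
# k classes need k + 1 limits: L1, L1 + A, ..., L1 + k*A
L = L1 + A * np.arange(k + 1)
print(L.round(2))
```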

Finally, we compute the class marks (midpoints) of the intervals.

In [64]:
X1 = (L[0]+L[1])/2
X = np.zeros(int(k))

for i in range(0,int(k)):
    X[i] = X1 + A*i

X

array([21.4, 22.3, 23.2, 24.1, 25. , 25.9, 26.8, 27.7, 28.6, 29.5, 30.4,
       31.3, 32.2, 33.1])
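
Since each class mark is the midpoint of two consecutive limits, the marks can also be derived directly from the array of limits. A minimal sketch, rebuilding the limits from the values used in this section (first limit 20.95, width 0.9, 14 classes):

```python
import numpy as np

# Limits for 14 classes of width 0.9 starting at 20.95
L = 20.95 + 0.9 * np.arange(15)

# Average each limit with the next one to get the 14 class marks
X = (L[:-1] + L[1:]) / 2
print(X.round(2))
```

Deriving the marks from `L` avoids hard-coding the number of classes a second time.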

## Using Sturges' Rule

$$k = \left \lceil{1+\log_{2}(n)}\right \rceil$$

In [8]:
k = np.ceil(1 + np.log2(n))
k

9.0
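
Sturges' rule grows with the logarithm of $n$, so for all but very small samples it yields fewer classes than the square root rule. A small sketch comparing both; the function names and the sample sizes are chosen here only for illustration:

```python
import math

def sqrt_rule(n):
    """Number of classes by the square root rule."""
    return math.ceil(math.sqrt(n))

def sturges_rule(n):
    """Number of classes by Sturges' rule."""
    return math.ceil(1 + math.log2(n))

# Illustrative sample sizes
for n in (50, 200, 1000, 10000):
    print(n, sqrt_rule(n), sturges_rule(n))
```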

Next, we define the width $A$ of the intervals:

$$A = \frac{\max(x)-\min(x)}{k}$$

In [9]:
A = (max(cw)-min(cw))/k
A


1.3888888888888888

In [10]:
A = 1.4
precision = 0.1

Next, we compute the interval limits:

In [11]:
L1 = min(cw) - (1/2)*precision
L = np.zeros(int(k+1))

for i in range(0,int(k+1)):
    L[i] = L1 + A * i
    
L    

array([20.95, 22.35, 23.75, 25.15, 26.55, 27.95, 29.35, 30.75, 32.15,
       33.55])

In [63]:
X1 = (L[0]+L[1])/2
X = np.zeros(int(k))

for i in range(0,int(k)):
    X[i] = X1 + A * i

X

array([21.65, 23.05, 24.45, 25.85, 27.25, 28.65, 30.05, 31.45, 32.85])
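
The width in this section is rounded up (1.4) rather than used raw (about 1.389). A short check suggests why. It is a sketch under assumptions: the sample range of 21.0 to 33.5 is inferred from the outputs above, not read from the file, and `covers_sample` is a hypothetical helper.

```python
import numpy as np

def covers_sample(limits, x_min, x_max):
    """True if all observations lie strictly between the first and last limit."""
    return bool(limits[0] < x_min and x_max < limits[-1])

x_min, x_max = 21.0, 33.5  # assumed sample range
k = 9                      # Sturges classes

# Rounded width: the last limit is 33.55, above the maximum
rounded = 20.95 + 1.4 * np.arange(k + 1)
print(covers_sample(rounded, x_min, x_max))

# Raw width: the last limit is about 33.45, below the maximum
raw = 20.95 + ((x_max - x_min) / k) * np.arange(k + 1)
print(covers_sample(raw, x_min, x_max))
```

With the raw width the last limit always ends half a precision unit below the maximum, so the largest observation would be left out.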

## Using the Freedman-Diaconis Rule

In [27]:
from scipy.stats import iqr

Afd = 2*iqr(cw)*n**(-1/3)
Afd

1.005015267470426

In [30]:
k = np.ceil((max(cw)-min(cw))/Afd)
k

13.0
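
For reference, the two cells above correspond to the following formulas, where $\mathrm{IQR}(x)$ is the interquartile range of the sample. The symbol $A_{FD}$ mirrors the variable `Afd` and is a notational choice made here:

$$A_{FD} = 2 \cdot \mathrm{IQR}(x) \cdot n^{-1/3}, \qquad k = \left \lceil{\frac{\max(x)-\min(x)}{A_{FD}}}\right \rceil$$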

Now we compute the width:

In [31]:
A = (max(cw)-min(cw))/k
A

0.9615384615384616

In [32]:
A = 1.0
precision = 0.1

Next, we compute the interval limits:

In [42]:
L1 = min(cw)-(1/2)*precision
L = np.zeros(14)

for i in range(0,14):
    L[i] = L1 + A*i

L

array([20.95, 21.95, 22.95, 23.95, 24.95, 25.95, 26.95, 27.95, 28.95,
       29.95, 30.95, 31.95, 32.95, 33.95])

In [62]:
X1 = (L[0]+L[1])/2
X = np.zeros(13)

for i in range(0,13):
    X[i] = X1 + A*i

X

array([21.45, 22.45, 23.45, 24.45, 25.45, 26.45, 27.45, 28.45, 29.45,
       30.45, 31.45, 32.45, 33.45])

## Using Scott's Rule

In [47]:
As = 3.5*np.std(cw)*n**(-1/3)
k = np.ceil((max(cw)-min(cw))/As)
k

10.0
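
For reference, the cell above corresponds to the following formulas, where $s$ is the standard deviation of the sample as computed by `np.std`. The symbol $A_{S}$ mirrors the variable `As` and is a notational choice made here:

$$A_{S} = 3.5 \cdot s \cdot n^{-1/3}, \qquad k = \left \lceil{\frac{\max(x)-\min(x)}{A_{S}}}\right \rceil$$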

In [49]:
A= (max(cw)-min(cw))/k
A

1.25

In [50]:
A = 1.3

In [51]:
precision = 0.1

In [52]:
L1 = min(cw)-(1/2)*precision
L1

20.95

In [56]:
L = np.zeros(11)

for i in range(0,11):
    L[i] = L1 + A*i

L

array([20.95, 22.25, 23.55, 24.85, 26.15, 27.45, 28.75, 30.05, 31.35,
       32.65, 33.95])

In [67]:
X1 = (L[0]+L[1])/2
X = np.zeros(10)

for i in range(0,10):
    X[i] = X1 + A*i

X

array([21.6, 22.9, 24.2, 25.5, 26.8, 28.1, 29.4, 30.7, 32. , 33.3])
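
NumPy ships its own estimators for these four rules through `np.histogram_bin_edges`, which can serve as a rough cross-check of the number of classes. The sketch below uses a synthetic sample as a stand-in for the crab widths; its size, mean and spread are arbitrary assumptions, not properties of the real file.

```python
import numpy as np

# Synthetic stand-in for the data; the real file is not loaded here
rng = np.random.default_rng(0)
sample = rng.normal(loc=26.0, scale=2.0, size=173)

for rule in ("sqrt", "sturges", "fd", "scott"):
    edges = np.histogram_bin_edges(sample, bins=rule)
    # Number of classes and the common class width chosen by NumPy
    print(rule, len(edges) - 1, round(edges[1] - edges[0], 3))
```

NumPy does not round the width to the data precision or shift the first limit below the minimum, and its Scott estimator uses a constant of about 3.49 instead of 3.5, so its limits will generally differ from the manual ones.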