# Quantitative Data

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### Frequencies of quantitative data

Frequencies for quantitative data are handled much like those for ordinal data. The one difference is that the table lists only the values actually observed in the sample, not every level the variable could take.
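
To make the "observed values only" point concrete, here is a minimal sketch. The `ages` sample and the `levels` range are hypothetical, chosen only for illustration. `np.unique` yields one row per observed value, while an ordinal-style table would carry a row for every possible level, including those with a count of zero.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: ages observed in a small group.
ages = np.array([15, 18, 18, 21, 21, 21])

# Quantitative treatment: only the observed values get a row.
values, counts = np.unique(ages, return_counts=True)
observed = pd.Series(counts, index=values, name="absoluta")

# Ordinal-style treatment: every possible level gets a row, even with count 0.
levels = np.arange(15, 22)  # assumed full set of levels, 15 to 21
all_levels = observed.reindex(levels, fill_value=0)

print(observed)
print(all_levels)
```

The first table has three rows and the second has seven, although both describe the same six observations.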

![image.png](attachment:image.png)

In [1]:
import numpy as np
import pandas as pd

In [2]:
edad = np.array([15,18,25,40,30,29,56,40,13,27,42,23,11,26,25,32,30,40,33,29])
# np.unique returns the sorted distinct values and, with return_counts=True, their counts
df_edad = pd.DataFrame(np.unique(edad, return_counts=True)).T
df_edad.rename(columns={0:'edad', 1:'absoluta'}, inplace=True)
df_edad.sort_values('edad', ascending=True, inplace=True)
# Relative frequency: each count divided by the sample size
df_edad['frec_relativa'] = df_edad['absoluta'] / df_edad['absoluta'].sum()
# Cumulative absolute and cumulative relative frequencies
df_edad['frec_acumulada'] = df_edad['absoluta'].cumsum()
df_edad['relativa_acumulada'] = df_edad['frec_relativa'].cumsum()
df_edad

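| | edad | absoluta | frec_relativa | frec_acumulada | relativa_acumulada |
|---|---|---|---|---|---|
| 0 | 11 | 1 | 0.05 | 1 | 0.05 |
| 1 | 13 | 1 | 0.05 | 2 | 0.10 |
| 2 | 15 | 1 | 0.05 | 3 | 0.15 |
| 3 | 18 | 1 | 0.05 | 4 | 0.20 |
| 4 | 23 | 1 | 0.05 | 5 | 0.25 |
| 5 | 25 | 2 | 0.10 | 7 | 0.35 |
| 6 | 26 | 1 | 0.05 | 8 | 0.40 |
| 7 | 27 | 1 | 0.05 | 9 | 0.45 |
| 8 | 29 | 2 | 0.10 | 11 | 0.55 |
| 9 | 30 | 2 | 0.10 | 13 | 0.65 |
| 10 | 32 | 1 | 0.05 | 14 | 0.70 |
| 11 | 33 | 1 | 0.05 | 15 | 0.75 |
| 12 | 40 | 3 | 0.15 | 18 | 0.90 |
| 13 | 42 | 1 | 0.05 | 19 | 0.95 |
| 14 | 56 | 1 | 0.05 | 20 | 1.00 |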


In [3]:
df_edad.T

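| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| edad | 11.0 | 13.0 | 15.0 | 18.0 | 23.0 | 25.0 | 26.0 | 27.0 | 29.0 | 30.0 | 32.0 | 33.0 | 40.0 | 42.0 | 56.0 |
| absoluta | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 3.0 | 1.0 | 1.0 |
| frec_relativa | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.1 | 0.05 | 0.05 | 0.1 | 0.1 | 0.05 | 0.05 | 0.15 | 0.05 | 0.05 |
| frec_acumulada | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 7.0 | 8.0 | 9.0 | 11.0 | 13.0 | 14.0 | 15.0 | 18.0 | 19.0 | 20.0 |
| relativa_acumulada | 0.05 | 0.1 | 0.15 | 0.2 | 0.25 | 0.35 | 0.4 | 0.45 | 0.55 | 0.65 | 0.7 | 0.75 | 0.9 | 0.95 | 1.0 |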


![image.png](attachment:image.png)

So, for this quantitative variable, where $n$ is the total number of observations:

* The absolute frequency of $X_{i}$ is the number $n_{i}$ of elements equal to $X_{i}$
* The relative frequency of $X_{i}$ is $f_{i} = \frac{n_{i}}{n}$
* The cumulative absolute frequency of $X_{i}$ is $N_{i} = \sum_{j=1}^{i} n_{j}$
* The cumulative relative frequency of $X_{i}$ is $F_{i} = \frac{N_{i}}{n}$
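
The four definitions map directly onto a few NumPy operations. The following is a minimal sketch, not the notebook's own code: the helper name `frequency_table`, its column names and the sample passed to it are all hypothetical.

```python
import numpy as np
import pandas as pd

def frequency_table(data):
    """Build the four frequency columns for a 1-D sample (hypothetical helper)."""
    values, n_i = np.unique(data, return_counts=True)  # observed X_i and their n_i
    n = n_i.sum()                                      # sample size n
    N_i = n_i.cumsum()                                 # cumulative absolute frequency
    return pd.DataFrame({
        "value": values,
        "n_i": n_i,
        "f_i": n_i / n,   # relative frequency
        "N_i": N_i,
        "F_i": N_i / n,   # cumulative relative frequency
    })

# Made-up sample of 8 observations.
table = frequency_table([2, 3, 3, 5, 5, 5, 8, 8])
print(table)
```

Computing $F_{i}$ as $N_{i}/n$ instead of summing the $f_{i}$ column avoids accumulating floating-point rounding error over long tables.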

![image.png](attachment:image.png)

In [4]:
np.random.seed(162017)
dados = np.random.randint(1, 7, size=25)
dados

array([3, 6, 5, 4, 1, 5, 3, 2, 4, 1, 3, 1, 2, 3, 1, 5, 6, 4, 4, 1, 3, 2,
       4, 5, 1])

In [5]:
df_dados = pd.DataFrame(np.unique(dados, return_counts=True)).T
df_dados.rename(columns={0:'dado', 1:'absoluta'}, inplace=True)
df_dados.sort_values('dado', ascending=True, inplace=True)
df_dados['relativa'] = df_dados['absoluta'] / df_dados['absoluta'].sum()
df_dados['acumulada'] = df_dados['absoluta'].cumsum()
df_dados['relativa_acumulada'] = df_dados['relativa'].cumsum()
df_dados

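| | dado | absoluta | relativa | acumulada | relativa_acumulada |
|---|---|---|---|---|---|
| 0 | 1 | 6 | 0.24 | 6 | 0.24 |
| 1 | 2 | 3 | 0.12 | 9 | 0.36 |
| 2 | 3 | 5 | 0.20 | 14 | 0.56 |
| 3 | 4 | 5 | 0.20 | 19 | 0.76 |
| 4 | 5 | 4 | 0.16 | 23 | 0.92 |
| 5 | 6 | 2 | 0.08 | 25 | 1.00 |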


In [6]:
df_dados.T

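| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| dado | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 |
| absoluta | 6.0 | 3.0 | 5.0 | 5.0 | 4.0 | 2.0 |
| relativa | 0.24 | 0.12 | 0.2 | 0.2 | 0.16 | 0.08 |
| acumulada | 6.0 | 9.0 | 14.0 | 19.0 | 23.0 | 25.0 |
| relativa_acumulada | 0.24 | 0.36 | 0.56 | 0.76 | 0.92 | 1.0 |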


### Measures of central tendency

![image.png](attachment:image.png)

#### Median

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [7]:
edad = np.array([11,13,15,18,23,25,25,26,27,29,29,30,30,32,33,40,40,40,42,56])
edad

array([11, 13, 15, 18, 23, 25, 25, 26, 27, 29, 29, 30, 30, 32, 33, 40, 40,
       40, 42, 56])

In [8]:
np.median(edad)

29.0

In [9]:
np.mean(edad)

29.2

In [10]:
# Mode: the value(s) with the highest count
vals, counts = np.unique(edad, return_counts=True)
moda = vals[counts == np.max(counts)]
moda

array([40])
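
As a cross-check on the three cells above, the same measures can be computed by hand. This is a minimal sketch on the same `edad` sample; the variable names `median`, `mean` and `modes` are introduced here for illustration only.

```python
import numpy as np

edad = np.array([11, 13, 15, 18, 23, 25, 25, 26, 27, 29,
                 29, 30, 30, 32, 33, 40, 40, 40, 42, 56])

# Median by hand: sort, then average the two central values when n is even.
x = np.sort(edad)
n = len(x)
if n % 2 == 0:
    median = (x[n // 2 - 1] + x[n // 2]) / 2
else:
    median = x[n // 2]

# Mean from the frequency table: sum of value * absolute frequency, over n.
values, counts = np.unique(edad, return_counts=True)
mean = (values * counts).sum() / n

# Every value tied for the highest count is a mode, so the result is an array.
modes = values[counts == counts.max()]

print(median, mean, modes)
```

With 20 observations the median is the average of the 10th and 11th sorted values, both 29 here. The boolean mask keeps ties, so a bimodal sample would return two values in `modes`.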

### Measures of position

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [11]:
np.random.seed(260798)
dado = np.random.randint(1, 5, size=50)
dado

array([3, 4, 4, 2, 4, 3, 2, 4, 3, 4, 3, 2, 2, 1, 1, 4, 1, 3, 4, 4, 4, 3,
       2, 3, 2, 2, 1, 3, 4, 3, 3, 3, 1, 1, 2, 3, 4, 3, 1, 1, 2, 1, 2, 3,
       1, 3, 4, 4, 2, 3])

In [12]:
df_dados_2 = pd.DataFrame(np.unique(dado, return_counts=True)).T
df_dados_2.rename(columns={0:'dado', 1:'absoluta'}, inplace=True)
df_dados_2.sort_values('dado', ascending=True, inplace=True)
df_dados_2['relativa'] = df_dados_2['absoluta'] / df_dados_2['absoluta'].sum()
df_dados_2['acumulada'] = df_dados_2['absoluta'].cumsum()
df_dados_2['relativa_acumulada'] = df_dados_2['relativa'].cumsum()
df_dados_2

| | dado | absoluta | relativa | acumulada | relativa_acumulada |
|---|---|---|---|---|---|
| 0 | 1 | 10 | 0.20 | 10 | 0.20 |
| 1 | 2 | 11 | 0.22 | 21 | 0.42 |
| 2 | 3 | 16 | 0.32 | 37 | 0.74 |
| 3 | 4 | 13 | 0.26 | 50 | 1.00 |
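
A table like this one lets the median be read off directly: it is the first value whose cumulative relative frequency reaches 0.5. The sketch below assumes that reading and uses only the counts shown in the table; the names `faces`, `cum_rel` and `median_face` are illustrative. If a cumulative frequency were exactly 0.5, the usual convention would average that value with the next one, which this sketch does not handle.

```python
import numpy as np

# Frequency table from the section: faces 1..4 with their absolute frequencies.
faces = np.array([1, 2, 3, 4])
absoluta = np.array([10, 11, 16, 13])

cum_rel = absoluta.cumsum() / absoluta.sum()   # 0.2, 0.42, 0.74, 1.0

# First face whose cumulative relative frequency is >= 0.5.
median_face = faces[np.searchsorted(cum_rel, 0.5)]

# Cross-check by rebuilding the sorted sample from the counts.
sample = np.repeat(faces, absoluta)
print(median_face, np.median(sample))
```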


### Quantiles

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [13]:
np.random.seed(0)
dados2 = np.random.randint(1, 7, size=15)
dados2

array([5, 6, 1, 4, 4, 4, 2, 4, 6, 3, 5, 1, 1, 5, 3])

In [18]:
np.quantile(dados2, 0.8)

5.0
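
By default `np.quantile` interpolates linearly between neighbouring order statistics. The following sketch reproduces that rule by hand on the same `dados2` sample; `quantile_linear` is a hypothetical helper written for illustration, not part of NumPy.

```python
import numpy as np

def quantile_linear(data, p):
    """Quantile by linear interpolation between order statistics (sketch)."""
    x = np.sort(np.asarray(data, dtype=float))
    h = p * (len(x) - 1)              # fractional position in the sorted sample
    lo = int(np.floor(h))             # index of the order statistic just below
    hi = min(lo + 1, len(x) - 1)      # index just above, clipped at the end
    return x[lo] + (h - lo) * (x[hi] - x[lo])

dados2 = np.array([5, 6, 1, 4, 4, 4, 2, 4, 6, 3, 5, 1, 1, 5, 3])
q80 = quantile_linear(dados2, 0.8)
print(q80)
```

For `p = 0.8` and 15 observations the position is 0.8 × 14 = 11.2, which falls between the 12th and 13th sorted values. Both are 5, so the quantile is 5.0.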

### Measures of dispersion

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [21]:
# Range: largest value minus smallest value
np.max(dados2) - np.min(dados2)

5

In [22]:
# IQR (interquartile range): Q3 - Q1
q3, q1 = np.quantile(dados2, [0.75, 0.25])
q3 - q1

2.5

In [23]:
# Population ("true") variance: np.var divides by n by default (ddof=0)
np.var(dados2)

2.773333333333333

In [24]:
# Population ("true") standard deviation (ddof=0)
np.std(dados2)

1.6653327995729061

In [25]:
# Sample variance: ddof=1 divides by n - 1 instead of n
round(np.var(dados2, ddof=1), 4)

2.9714

In [26]:
# Sample standard deviation (ddof=1)
round(np.std(dados2, ddof=1), 4)

1.7238
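
Because `np.var` and `np.std` divide by $n$ unless `ddof` is set, it helps to compute both variants from the definition. This is a minimal sketch on the same `dados2` sample; the names `pop_var` and `sample_var` are introduced here for illustration.

```python
import numpy as np

dados2 = np.array([5, 6, 1, 4, 4, 4, 2, 4, 6, 3, 5, 1, 1, 5, 3])
n = len(dados2)
squared_dev = (dados2 - dados2.mean()) ** 2

pop_var = squared_dev.sum() / n           # divides by n, like np.var(dados2)
sample_var = squared_dev.sum() / (n - 1)  # divides by n - 1, like np.var(dados2, ddof=1)

print(round(pop_var, 4), round(sample_var, 4))                     # 2.7733 2.9714
print(round(np.sqrt(pop_var), 4), round(np.sqrt(sample_var), 4))   # 1.6653 1.7238
```

The two variances differ only by the factor $n/(n-1)$, so converting from the sample variance to the population variance means multiplying by $(n-1)/n$, not the other way around.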

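In [27]:
# describe() is a pandas method, so wrap the NumPy array in a Series first
pd.Series(dados2).describe()

count    15.000000
mean      3.600000
std       1.723783
min       1.000000
25%       2.500000
50%       4.000000
75%       5.000000
max       6.000000
dtype: float64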