# Probability review

## Conditional probability

$P(A|B) = \frac{P(A ∩ B)}{P(B)}$

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*8GB3SuoGNOaFW5rk8zZp3w.png)
![](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/Conditional_probability_venn_1-10.svg/744px-Conditional_probability_venn_1-10.svg.png)
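Here is the formula at work before we use real data. This is a minimal sketch with a fair six-sided die, an illustrative example that is not from the dataset. Let $A$ = "roll a 6" and $B$ = "roll an even number". Knowing that $B$ occurred raises the probability of $A$ from 1/6 to 1/3.

```python
from fractions import Fraction

# Sample space of one roll of a fair six-sided die (illustrative example)
omega = set(range(1, 7))
A = {6}          # event A: roll a 6
B = {2, 4, 6}    # event B: roll an even number


def prob(event):
    # All outcomes are equally likely, so P(E) = |E| / |omega|
    return Fraction(len(event), len(omega))


# P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 1/3
```

Using `Fraction` keeps the result exact, so it prints `1/3` instead of a rounded float.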

**Let's analyze the following dataset:** https://www.kaggle.com/datasets/uciml/student-alcohol-consumption and try to answer the question:

1) What is the probability that a student gets a final grade of 8 or higher out of 10 (80% or more), given that they missed class 10 or more times?

```python
# https://towardsdatascience.com/conditional-probability-with-a-python-example-fd6f5937cd2
# https://www.statology.org/conditional-probability-in-python/ (to do)
import pandas as pd
import numpy as np

# Load the math-course file of the Student Alcohol Consumption dataset
df = pd.read_csv('data/StudentAlcoholConsumption/student-mat.csv')
df.head(3)
```

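| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |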


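`G3`: final grade (numeric, from 0 to 20; the output target). Multiplying it by 5 rescales it to 0 to 100, so a grade of at least 80 (8 out of 10) corresponds to `G3 >= 16`.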

```python
# calif = 1 if the final grade, rescaled from 0-20 to 0-100, is at least 80; otherwise 0
df['calif'] = np.where(df['G3']*5 >= 80, 1, 0)
```
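`G3` only takes integer values from 0 to 20, so the threshold can be checked exhaustively. The sketch below confirms that the rescaled rule `G3 * 5 >= 80` selects exactly the grades 16 to 20. This means `df['G3'] >= 16` would be an equivalent condition.

```python
# G3 is an integer from 0 to 20; multiplying by 5 rescales it to 0-100
grades = range(0, 21)

# Grades selected by the notebook's rule (G3 * 5 >= 80)
passing = [g for g in grades if g * 5 >= 80]

# The same grades are selected by comparing the raw grade with 16
assert passing == [g for g in grades if g >= 16]
print(passing)  # [16, 17, 18, 19, 20]
```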

```python
df.head(3)
```

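| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 | calif |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 | 0 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 | 0 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 | 0 |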


`absences`: number of school absences (numeric, from 0 to 93). We consider absenteeism high if the student missed 10 or more classes.

```python
# ausencias_altas ("high absences") = 1 if the student missed 10 or more classes; otherwise 0
df['ausencias_altas'] = np.where(df['absences'] >= 10, 1, 0)
df.head(3)
```

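| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | goout | Dalc | Walc | health | absences | G1 | G2 | G3 | calif | ausencias_altas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 | 0 | 0 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 | 0 | 0 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 | 0 | 1 |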
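An equivalent pandas idiom avoids `np.where`. The comparison already produces booleans, and `astype(int)` maps `True`/`False` to 1/0. This is a minimal sketch on a small Series holding the first seven `absences` values from the output above, not the full column.

```python
import numpy as np
import pandas as pd

# First seven absence counts from the notebook output (the real column has 395 rows)
absences = pd.Series([6, 4, 10, 2, 4, 10, 0])

# np.where version, as in the notebook
flag_where = np.where(absences >= 10, 1, 0)

# pandas version: boolean comparison, then cast True/False to 1/0
flag_astype = (absences >= 10).astype(int)

print(flag_astype.tolist())  # [0, 0, 1, 0, 0, 1, 0]
```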


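```python
# Helper column of 1s ("contador" = counter) that the pivot table aggregates to count students
df['contador'] = 1
df.head(10)
```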

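| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | Dalc | Walc | health | absences | G1 | G2 | G3 | calif | ausencias_altas | contador |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 1 | 1 | 3 | 6 | 5 | 6 | 6 | 0 | 0 | 1 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 1 | 1 | 3 | 4 | 5 | 5 | 6 | 0 | 0 | 1 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 2 | 3 | 3 | 10 | 7 | 8 | 10 | 0 | 1 | 1 |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ... | 1 | 1 | 5 | 2 | 15 | 14 | 15 | 0 | 0 | 1 |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 1 | 2 | 5 | 4 | 6 | 10 | 10 | 0 | 0 | 1 |
| 5 | GP | M | 16 | U | LE3 | T | 4 | 3 | services | other | ... | 1 | 2 | 5 | 10 | 15 | 15 | 15 | 0 | 1 | 1 |
| 6 | GP | M | 16 | U | LE3 | T | 2 | 2 | other | other | ... | 1 | 1 | 3 | 0 | 12 | 12 | 11 | 0 | 0 | 1 |
| 7 | GP | F | 17 | U | GT3 | A | 4 | 4 | other | teacher | ... | 1 | 1 | 1 | 6 | 6 | 5 | 6 | 0 | 0 | 1 |
| 8 | GP | M | 15 | U | LE3 | A | 3 | 2 | services | other | ... | 1 | 1 | 1 | 0 | 16 | 18 | 19 | 1 | 0 | 1 |
| 9 | GP | M | 15 | U | GT3 | T | 3 | 4 | other | other | ... | 1 | 1 | 5 | 0 | 14 | 15 | 15 | 0 | 0 | 1 |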


```python
# Keep only the two indicator columns and the counter
df = df[['calif','ausencias_altas','contador']]
df.head()
```

| | calif | ausencias_altas | contador |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 1 | 1 |
| 3 | 0 | 0 | 1 |
| 4 | 0 | 0 | 1 |


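```python
# Contingency table: number of students for each combination of calif (rows) and ausencias_altas (columns)
pd.pivot_table(df, values='contador', index=['calif'], columns=['ausencias_altas'], aggfunc=np.size, fill_value=0)
```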

| calif \ ausencias_altas | 0 | 1 |
|---|---|---|
| 0 | 277 | 78 |
| 1 | 35 | 5 |
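As an alternative, `pd.crosstab` builds the same contingency table directly from the two indicator columns, without the helper `contador` column. With `margins=True` it also adds row and column totals, labeled `All`. In this minimal sketch the DataFrame is synthetic. It only reproduces the four counts above so the snippet runs without the CSV file; in the notebook you would pass the real `df`.

```python
import pandas as pd

# Synthetic rows (calif, ausencias_altas) reproducing the four counts of the pivot table
rows = [(0, 0)] * 277 + [(0, 1)] * 78 + [(1, 0)] * 35 + [(1, 1)] * 5
df = pd.DataFrame(rows, columns=['calif', 'ausencias_altas'])

# Count co-occurrences directly; margins=True appends the 'All' totals row and column
table = pd.crosstab(df['calif'], df['ausencias_altas'], margins=True)
print(table)
```

The totals make the next step easy to read off: 40 students have `calif = 1`, 83 have `ausencias_altas = 1`, and there are 395 in all.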


In our case:

- $P(A)$ is the probability of a final grade of 80% or higher (`calif = 1`).
- $P(B)$ is the probability of missing 10 or more classes (`ausencias_altas = 1`).
- $P(A|B)$ is the probability of a final grade of 80% or higher, given that the student missed 10 or more classes.

Calculating each term (395 students in total):

$P(A) = \frac{35 + 5}{35 + 5 + 277 + 78} = \frac{40}{395} \approx 0.1013$

$P(B) = \frac{78 + 5}{35 + 5 + 277 + 78} = \frac{83}{395} \approx 0.2101$

$P(A ∩ B) = \frac{5}{35 + 5 + 277 + 78} = \frac{5}{395} \approx 0.0127$
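Here is the same arithmetic in Python. This minimal sketch hard-codes the four cell counts from the pivot table; the variable names are illustrative.

```python
# Cell counts from the pivot table: rows = calif (A), columns = ausencias_altas (B)
n_notA_notB, n_notA_B = 277, 78
n_A_notB, n_A_B = 35, 5

n = n_notA_notB + n_notA_B + n_A_notB + n_A_B  # total number of students

p_a = (n_A_notB + n_A_B) / n   # P(A): final grade of 80% or higher
p_b = (n_notA_B + n_A_B) / n   # P(B): 10 or more absences
p_a_and_b = n_A_B / n          # P(A ∩ B): both at once

print(n, p_a, p_b, p_a_and_b)
```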

Putting it together with the formula $P(A|B) = \frac{P(A ∩ B)}{P(B)}$:

$P(A|B) = \frac{0.012658227848101266}{0.21012658227848102} \approx 0.0602$
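$P(A ∩ B)$ and $P(B)$ share the same denominator, the 395 students, so it cancels. The conditional probability is therefore just a ratio of counts from the pivot table:

$$
P(A|B) = \frac{P(A ∩ B)}{P(B)} = \frac{5/395}{83/395} = \frac{5}{83} \approx 0.0602
$$

In other words, it is the share of students with a grade of 80% or higher among the 83 students who missed 10 or more classes.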

So the probability of getting a final grade of at least 80%, given that the student missed 10 or more classes, is about 6%. For all students, $P(A)$ is about 10%.
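The whole calculation can be wrapped in a small reusable helper. This is a sketch: `conditional_probability` is a hypothetical name, not a pandas function. It assumes that $A$ and $B$ are stored as 0/1 indicator columns like `calif` and `ausencias_altas`. Filtering to the rows where $B$ holds and averaging the $A$ indicator gives count($A ∩ B$) / count($B$). The data is synthetic and only reproduces the pivot table's counts.

```python
import pandas as pd


def conditional_probability(df: pd.DataFrame, a: str, b: str) -> float:
    """Estimate P(A|B) from 0/1 indicator columns as count(A and B) / count(B)."""
    given_b = df[df[b] == 1]  # keep only the rows where B occurred
    if given_b.empty:
        raise ValueError("no rows with B = 1, so P(A|B) is undefined")
    # Among those rows, the fraction where A also occurred
    return float((given_b[a] == 1).mean())


# Synthetic data reproducing the four counts of the pivot table (395 students)
rows = [(0, 0)] * 277 + [(0, 1)] * 78 + [(1, 0)] * 35 + [(1, 1)] * 5
df = pd.DataFrame(rows, columns=['calif', 'ausencias_altas'])

print(round(conditional_probability(df, 'calif', 'ausencias_altas'), 4))  # 0.0602
print(conditional_probability(df, 'ausencias_altas', 'calif'))            # 0.125
```

Swapping the arguments gives $P(B|A) = 5/40 = 0.125$: among students with a grade of 80% or higher, 12.5% missed 10 or more classes. $P(A|B)$ and $P(B|A)$ are generally different.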