# IIC-2433 Data Mining UC

- Library versions, Python 3.8.10

- numpy 1.20.3
- sklearn 1.3.1
- pgmpy 0.1.25
- networkx 2.8.3
- scipy 1.10.1

In [1]:
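import numpy as np
import pandas as pd

# Eight independent columns, each uniform over {0, 1, 2}
data = pd.DataFrame(np.random.randint(0, 3, size=(2500, 8)), columns=list('ABCDEFGH'))
# A becomes the sum of its original value, B and C
data['A'] += data['B'] + data['C']
# H uses the updated A, F and the original G (G is only modified on the next line)
data['H'] = data['G'] - data['A'] + data['F']
# G becomes the sum of its original value, D and E
data['G'] += data['D'] + data['E']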


## In-class activity

Using **Bayesian networks**, do the following:

- Learn the dependency structure among the variables using BIC and Hill Climbing.
- Use the learned dependencies to build a Bayesian network.
- Fit the network parameters using Dirichlet priors with uniform counts. Obtain the network's CPDs.
- Show the local independencies of every variable in the network. Check whether there is any expected dependency that cannot be identified. Explain.
- Obtain the CPD of H. What is the cardinality of H?
- You are given the following evidence: 'B': 0, 'C': 0, 'D': 1, 'E': 0, 'F': 1. Determine the most probable outcome for H.
- When you finish, let me know so I can give you an **L (logrado, i.e. achieved)**.
- Remember that each L adds one tenth of a point to the course grade.
- You may work in pairs.

***You have until the end of class.***


# Solution

In [2]:
from pgmpy.estimators import BicScore, HillClimbSearch
from pgmpy.models import BayesianNetwork

# Greedy hill-climbing search over DAG structures, scored with BIC
hc = HillClimbSearch(data)
best_model = hc.estimate(scoring_method=BicScore(data))
print(best_model.edges())

[('A', 'C'), ('A', 'B'), ('B', 'C'), ('D', 'G'), ('E', 'G'), ('F', 'A'), ('F', 'H'), ('H', 'A')]
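
To make the scoring step more concrete, here is a minimal sketch of the usual BIC decomposition for one node of a discrete network: the maximum-likelihood log-likelihood of the node given its parents, minus a penalty of half the log of the sample size per free parameter. The helper `bic_local_score` and the fixed seed are assumptions made for illustration, not part of the notebook or of pgmpy, and pgmpy's own `BicScore` may differ in details.

```python
import numpy as np
import pandas as pd

# Regenerate data with the same construction as the notebook (the seed is an assumption)
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 3, size=(2500, 8)), columns=list("ABCDEFGH"))
data["A"] += data["B"] + data["C"]
data["H"] = data["G"] - data["A"] + data["F"]
data["G"] += data["D"] + data["E"]


def bic_local_score(df, node, parents):
    """Hypothetical helper: BIC contribution of `node` given a list of `parents`."""
    n = len(df)
    r = df[node].nunique()  # number of observed states of the node
    q = int(np.prod([df[p].nunique() for p in parents])) if parents else 1
    # For each row, count the rows sharing its (parents, node) cell and its parents cell
    n_ijk = df.groupby(parents + [node])[node].transform("count")
    n_ij = df.groupby(parents)[node].transform("count") if parents else n
    # Summing log(N_ijk / N_ij) over rows equals the sum over cells of N_ijk * log(N_ijk / N_ij)
    log_lik = float(np.log(n_ijk / n_ij).sum())
    penalty = 0.5 * np.log(n) * q * (r - 1)
    return log_lik - penalty


score_no_parents = bic_local_score(data, "A", [])
score_with_parents = bic_local_score(data, "A", ["B", "C"])
print(score_no_parents, score_with_parents)
```

With data built this way, giving A the parents B and C should raise its local score despite the larger penalty, which is the kind of improvement a hill-climbing search looks for at each step.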


In [3]:
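# Edges are set by hand from the learned structure: A -> B, A -> C and H -> A are
# reversed to B -> A, C -> A and A -> H, and the learned edges B -> C and F -> A are left out.
model = BayesianNetwork([('A', 'H'), ('B', 'A'), ('C', 'A'), ('D', 'G'), ('E', 'G'), ('F', 'H')])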

In [4]:
from pgmpy.estimators import BayesianEstimator

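# BDeu is a Dirichlet prior with uniform pseudo-counts: a fixed equivalent sample size is spread evenly over each CPD
model.fit(data, estimator=BayesianEstimator, prior_type="BDeu")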
for cpd in model.get_cpds():
    print(cpd)

+------+-----------------------+-----+-----------------------+
| B    | B(0)                  | ... | B(2)                  |
+------+-----------------------+-----+-----------------------+
| C    | C(0)                  | ... | C(2)                  |
+------+-----------------------+-----+-----------------------+
| A(0) | 0.3307801269608101    | ... | 0.0003193051919024203 |
+------+-----------------------+-----+-----------------------+
| A(1) | 0.3307801269608101    | ... | 0.0003193051919024203 |
+------+-----------------------+-----+-----------------------+
| A(2) | 0.3373904831855622    | ... | 0.0003193051919024203 |
+------+-----------------------+-----+-----------------------+
| A(3) | 0.0002623157232044489 | ... | 0.0003193051919024203 |
+------+-----------------------+-----+-----------------------+
| A(4) | 0.0002623157232044489 | ... | 0.38655086531707006   |
+------+-----------------------+-----+-----------------------+
| A(5) | 0.0002623157232044489 | ... | 0.32620218404751
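
The very small but non-zero entries in this CPD come from the prior. The following is a rough sketch of how a BDeu-style estimate of P(A | B, C) can be computed by hand: each cell gets a pseudo-count of `ess / (r * q)`, where `r` is the number of states of A and `q` the number of parent configurations. The equivalent sample size of 5 is an assumption (it is believed to be pgmpy's default and is consistent with the magnitudes printed above), and the seed and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Regenerate data with the same construction as the notebook (the seed is an assumption)
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 3, size=(2500, 8)), columns=list("ABCDEFGH"))
data["A"] += data["B"] + data["C"]

ess = 5  # equivalent sample size (assumed)
states_a = list(range(7))  # A ranges over 0..6 by construction
parent_configs = pd.MultiIndex.from_product([range(3), range(3)], names=["B", "C"])

# Contingency table: rows are states of A, columns are (B, C) configurations
counts = pd.crosstab(data["A"], [data["B"], data["C"]])
counts = counts.reindex(index=states_a, columns=parent_configs, fill_value=0)

n_ijk = counts.to_numpy(dtype=float)  # shape (7, 9)
r, q = n_ijk.shape
alpha = ess / (r * q)  # uniform pseudo-count added to every cell
cpd_a = (n_ijk + alpha) / (n_ijk.sum(axis=0, keepdims=True) + alpha * r)

# Column 0 corresponds to B=0, C=0: states 0..2 share almost all the mass,
# while states 3..6 keep a tiny positive probability from the prior alone.
print(np.round(cpd_a[:, 0], 4))
```

Without the pseudo-counts, the states that never occur under a given parent configuration would get probability exactly zero.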

In [5]:
model.local_independencies(['A','B','C','D','E','F','G','H'])

(A ⟂ E, F, G, D | C, B)
(B ⟂ C, F, E, G, D)
(C ⟂ F, E, B, G, D)
(D ⟂ C, H, F, E, B, A)
(E ⟂ C, H, F, B, A, D)
(F ⟂ C, E, B, G, D, A)
(G ⟂ C, H, F, B, A | E, D)
(H ⟂ C, E, B, G, D | F, A)
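
The statements above follow the local Markov property: each node is independent of its non-descendants given its parents. As an illustration, the sketch below derives the same sets directly from the edge list of the model, using plain Python. The helper `descendants` and the dictionary `local_indep` are names introduced here, not pgmpy objects.

```python
# Edge list of the network defined above
edges = [("A", "H"), ("B", "A"), ("C", "A"), ("D", "G"), ("E", "G"), ("F", "H")]
nodes = sorted({n for edge in edges for n in edge})

parents = {n: {u for u, v in edges if v == n} for n in nodes}
children = {n: {v for u, v in edges if u == n} for n in nodes}


def descendants(node):
    """Hypothetical helper: all nodes reachable from `node` along directed edges."""
    found, stack = set(), list(children[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


local_indep = {}
for n in nodes:
    # Non-descendants, excluding the node itself and its parents
    independent_of = set(nodes) - {n} - parents[n] - descendants(n)
    local_indep[n] = (independent_of, parents[n])
    given = f" | {', '.join(sorted(parents[n]))}" if parents[n] else ""
    print(f"({n} ⟂ {', '.join(sorted(independent_of))}{given})")
```

The sets match the pgmpy output up to ordering; for example H is independent of B, C, D, E and G given A and F.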

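### Note that H is independent of G

In the data-generating code, H is computed from the original value of G, before D and E are added to it, so some dependence between H and G would be expected. The network has no path between the two variables, so that dependence cannot be identified from the local independencies.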
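
As a quick empirical check of how H and G relate in data built this way, the sketch below regenerates a dataset with the same construction and compares the sample correlation of H with G against that of H with D, which plays no role in H. The seed and the variable names `corr_hg` and `corr_hd` are assumptions for illustration, and linear correlation is only a rough proxy for dependence.

```python
import numpy as np
import pandas as pd

# Regenerate data with the same construction as the notebook (the seed is an assumption)
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 3, size=(2500, 8)), columns=list("ABCDEFGH"))
data["A"] += data["B"] + data["C"]
data["H"] = data["G"] - data["A"] + data["F"]  # uses G before D and E are added
data["G"] += data["D"] + data["E"]

corr_hg = data["H"].corr(data["G"])  # H and G share the original G
corr_hd = data["H"].corr(data["D"])  # D does not enter H at all
print(f"corr(H, G) = {corr_hg:.3f}")
print(f"corr(H, D) = {corr_hd:.3f}")
```

A clearly positive correlation between H and G, next to a near-zero one between H and D, suggests the structure search traded this weaker dependence for a simpler model, although other explanations are possible.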

In [6]:
print(model.get_cpds('H'))

+-------+-----------------------+-----+-----------------------+
| A     | A(0)                  | ... | A(6)                  |
+-------+-----------------------+-----+-----------------------+
| F     | F(0)                  | ... | F(2)                  |
+-------+-----------------------+-----+-----------------------+
| H(-6) | 0.0005812601720530109 | ... | 0.0005660590965696819 |
+-------+-----------------------+-----+-----------------------+
| H(-5) | 0.0005812601720530109 | ... | 0.0005660590965696819 |
+-------+-----------------------+-----+-----------------------+
| H(-4) | 0.0005812601720530109 | ... | 0.3143892222348013    |
+-------+-----------------------+-----+-----------------------+
| H(-3) | 0.0005812601720530109 | ... | 0.3143892222348013    |
+-------+-----------------------+-----+-----------------------+
| H(-2) | 0.0005812601720530109 | ... | 0.36669308275783996   |
+-------+-----------------------+-----+-----------------------+
| H(-1) | 0.0005812601720530109 | ... | 
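
The printed CPD is cut off, but the cardinality of H can also be worked out from the way the data is generated. The sketch below enumerates every combination of the original variables, each in {0, 1, 2}, and collects the values H can take. The names `a0` and `g0` stand for the values of A and G before they are updated and are introduced here only for illustration.

```python
from itertools import product

values = range(3)  # every original variable is drawn from {0, 1, 2}

h_values = set()
for a0, b, c, f, g0 in product(values, repeat=5):
    a = a0 + b + c            # A after the update
    h_values.add(g0 - a + f)  # H uses the updated A and the original G

h_states = sorted(h_values)
print(h_states)
print(len(h_states))  # 11
```

H ranges from -6 to 4, so its cardinality is 11, provided every one of those values actually occurs in the sample used to fit the network.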

In [7]:
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
h_marginal = infer.query(['H'])  # marginal distribution of H with no evidence (not printed)

print(infer.query(['H'], evidence={'B': 0, 'C': 0, 'D': 1, 'E': 0, 'F': 1}))

+-------+----------+
| H     |   phi(H) |
+=======+==========+
| H(-6) |   0.0003 |
+-------+----------+
| H(-5) |   0.0004 |
+-------+----------+
| H(-4) |   0.0005 |
+-------+----------+
| H(-3) |   0.0006 |
+-------+----------+
| H(-2) |   0.0006 |
+-------+----------+
| H(-1) |   0.1140 |
+-------+----------+
| H(0)  |   0.2471 |
+-------+----------+
| H(1)  |   0.3114 |
+-------+----------+
| H(2)  |   0.2149 |
+-------+----------+
| H(3)  |   0.1098 |
+-------+----------+
| H(4)  |   0.0003 |
+-------+----------+


### The most probable outcome is H = 1
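
This can be cross-checked against the generating process itself. Under the evidence B = 0, C = 0 and F = 1, H reduces to the original G minus the original A plus 1, with both originals uniform over {0, 1, 2}; D and E do not enter H. The sketch below enumerates the nine equally likely cases with exact fractions. The names `a0`, `g0`, `posterior` and `most_probable` are introduced here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

b, c, f = 0, 0, 1  # evidence on the variables that enter H

counts = Counter()
for a0, g0 in product(range(3), repeat=2):
    a = a0 + b + c           # A after the update
    counts[g0 - a + f] += 1  # value of H for this combination

total = sum(counts.values())
posterior = {h: Fraction(k, total) for h, k in sorted(counts.items())}
most_probable = max(posterior, key=posterior.get)

for h, p in posterior.items():
    print(h, p, round(float(p), 4))
print("most probable H:", most_probable)
```

The exact values 1/9, 2/9, 1/3, 2/9 and 1/9 for H = -1 to 3 are close to the estimates in the table, and the remaining states only receive the small mass contributed by the prior.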