# IIC-2433 Data Mining, UC

- Library versions, Python 3.8.10

- numpy 1.20.3
- sklearn 1.3.1
- pgmpy 0.1.25
- networkx 2.8.3
- scipy 1.10.1

In [1]:
import numpy as np
import pandas as pd

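# Eight independent columns, each uniform on {0, 1, 2}
data = pd.DataFrame(np.random.randint(0, 3, size=(2500, 8)), columns=list('ABCDEFGH'))
# A becomes the sum of the original A, B and C (range 0..6)
data['A'] += data['B'] + data['C']
# H uses the updated A but the original G (range -6..4)
data['H'] = data['G'] - data['A'] + data['F']
# G is updated last, so H does not depend on D or E
data['G'] += data['D'] + data['E']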
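
Because every base column is uniform on {0, 1, 2}, the distribution of the derived columns can be worked out by enumeration. The sketch below is an illustration in plain Python, not part of the original notebook, and the helper name `exact_distribution` is hypothetical. It computes the exact distribution of the updated `A`, which is the sum of three independent uniform draws.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BASE_VALUES = (0, 1, 2)  # np.random.randint(0, 3) draws from {0, 1, 2}

def exact_distribution(n_terms):
    """Exact distribution of the sum of n_terms independent uniform draws."""
    counts = Counter(sum(combo) for combo in product(BASE_VALUES, repeat=n_terms))
    total = len(BASE_VALUES) ** n_terms
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}

dist_A = exact_distribution(3)  # updated A = original A + B + C
for value, prob in dist_A.items():
    print(f"A={value}: {prob} ~ {float(prob):.3f}")
```

The same distribution applies to the updated `G`, which is also a sum of three uniform columns. The values fitted for `A` in the solution (for example 0.0398 for `A(0)`) are close to these fractions (1/27 ≈ 0.037).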


## In-class activity

Using **Bayesian networks**, do the following:

- Learn the dependency network between the variables using BIC and Hill Climbing.
- Use the learned dependencies to build a Bayesian network.
- Fit the network parameters using Dirichlet priors with uniform counts. Obtain the network's CPDs.
- Show the local independencies of every variable in the network. Explain.
- Obtain the CPD of H. What is the cardinality of H?
- You are given the following evidence: 'B': 0, 'C': 0, 'D': 1, 'E': 0, 'F': 1. Determine the most likely outcome for H.
- When you finish, let me know so I can give you an **L (logrado, "achieved")**.
- Remember that each L adds one tenth of a point to the course grade.
- You may work in pairs.

***You have until the end of class.***


# Solution

In [3]:
from pgmpy.estimators import BIC, HillClimbSearch
from pgmpy.models import DiscreteBayesianNetwork

# Greedy structure search: change one edge at a time (add, remove, or
# reverse) for as long as the BIC score keeps improving.
hc = HillClimbSearch(data)
best_model = hc.estimate(scoring_method=BIC(data))
print(best_model.edges())

INFO:pgmpy: Datatype (N=numerical, C=Categorical Unordered, O=Categorical Ordered) inferred from data: 
 {'A': 'N', 'B': 'N', 'C': 'N', 'D': 'N', 'E': 'N', 'F': 'N', 'G': 'N', 'H': 'N'}

[('A', 'H'), ('A', 'B'), ('A', 'C'), ('C', 'B'), ('D', 'E'), ('F', 'H'), ('G', 'E'), ('G', 'D')]
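
To make the scoring step concrete, here is a minimal sketch of a BIC local score for one node given a candidate parent set, written in plain Python. It is an illustration under assumptions, not pgmpy's implementation: the function name `bic_local` and the toy data are hypothetical, and the number of parent configurations is taken from the values observed in the data. A structure score is the sum of such local scores over all nodes, and hill climbing compares candidate single-edge changes by how much they raise that sum.

```python
import math
from collections import Counter

def bic_local(rows, var, parents):
    """BIC local score of `var` given `parents`; rows is a list of dicts."""
    n = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[var]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)

    # Maximum-likelihood log-likelihood: sum of N_ijk * log(N_ijk / N_ij).
    log_lik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())

    # Free parameters: (states of var - 1) for each parent configuration.
    r = len({row[var] for row in rows})
    q = math.prod(len({row[p] for row in rows}) for p in parents)
    return log_lik - 0.5 * math.log(n) * q * (r - 1)

# Toy data (hypothetical): Y is an exact copy of X.
rows = [{"X": x, "Y": x} for x in (0, 0, 0, 0, 1, 1, 1, 1)]
score_with_parent = bic_local(rows, "Y", ["X"])
score_without_parent = bic_local(rows, "Y", [])
print(score_with_parent > score_without_parent)
```

In the toy data the edge X → Y removes all uncertainty about Y, so the gain in log-likelihood outweighs the penalty for the extra parameter and the edge is worth adding.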


In [4]:
# Build the network from the edges found by the search (the list printed above)
model = DiscreteBayesianNetwork(list(best_model.edges()))

In [5]:
from pgmpy.estimators import BayesianEstimator

# BDeu: a Dirichlet prior that adds the same pseudo-count to every cell of a CPD
model.fit(data, estimator=BayesianEstimator, prior_type="BDeu")
for cpd in model.get_cpds():
    print(cpd)

INFO:pgmpy: Datatype (N=numerical, C=Categorical Unordered, O=Categorical Ordered) inferred from data: 
 {'A': 'N', 'B': 'N', 'C': 'N', 'D': 'N', 'E': 'N', 'F': 'N', 'G': 'N', 'H': 'N'}


+------+-----------+
| A(0) | 0.0398061 |
+------+-----------+
| A(1) | 0.109666  |
+------+-----------+
| A(2) | 0.234616  |
+------+-----------+
| A(3) | 0.251383  |
+------+-----------+
| A(4) | 0.211064  |
+------+-----------+
| A(5) | 0.119247  |
+------+-----------+
| A(6) | 0.0342173 |
+------+-----------+
+-------+-----------------------+-----+-----------------------+
| A     | A(0)                  | ... | A(6)                  |
+-------+-----------------------+-----+-----------------------+
| F     | F(0)                  | ... | F(2)                  |
+-------+-----------------------+-----+-----------------------+
| H(-6) | 0.0007158196134574088 | ... | 0.0008930166101089481 |
+-------+-----------------------+-----+-----------------------+
| H(-5) | 0.0007158196134574088 | ... | 0.0008930166101089481 |
+-------+-----------------------+-----+-----------------------+
| H(-4) | 0.0007158196134574088 | ... | 0.413466690480443     |
+-------+-----------------------+-----+------
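
The small but nonzero entries, such as 0.000716 for states of `H` that never occur with a given parent configuration, come from the prior. A minimal sketch of the BDeu estimate for one CPD column follows. It assumes an equivalent sample size of 5 (assumed here to match pgmpy's default) and a hypothetical count of 30 rows with `A = 0` and `F = 0`; neither number is stated in the notebook, and the function name `bdeu_column` is illustrative.

```python
from fractions import Fraction

def bdeu_column(counts, n_parent_configs, equivalent_sample_size=5):
    """Posterior-mean CPD column under a BDeu prior.

    counts: observed counts of each child state for one parent configuration.
    """
    n_states = len(counts)
    # Every cell of the CPD receives the same pseudo-count.
    alpha = Fraction(equivalent_sample_size, n_states * n_parent_configs)
    total = sum(counts) + alpha * n_states
    return [(c + alpha) / total for c in counts]

# H has 11 states (-6..4) and 7 * 3 = 21 parent configurations (A, F).
# Hypothetical counts: 30 rows with A=0, F=0, spread evenly over H in {0, 1, 2}.
counts = [0] * 6 + [10, 10, 10] + [0] * 2
column = bdeu_column(counts, n_parent_configs=21)
print(float(column[0]))  # probability assigned to a state never observed
```

With these assumed numbers the probability of an unseen state is 1/1397 ≈ 0.000716, which agrees with the entries printed for `A(0)`, `F(0)` in the table above.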

In [6]:
model.local_independencies(['A','B','C','D','E','F','G','H'])

(A ⟂ F, E, G, D)
(B ⟂ F, E, D, G, H | C, A)
(C ⟂ F, E, D, G, H | A)
(D ⟂ F, C, A, B, H | G)
(E ⟂ F, C, A, B, H | G, D)
(F ⟂ C, E, D, G, A, B)
(G ⟂ F, C, A, B, H)
(H ⟂ C, E, D, G, B | F, A)
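
Each statement above is an instance of the local Markov property: a node is independent of its non-descendants given its parents. For example, `H` has parents `A` and `F` and no descendants, so it is independent of every other variable once `A` and `F` are known. The sketch below recomputes these sets directly from the learned edge list in plain Python; the function name `local_independencies` is reused only for illustration and is not pgmpy's code.

```python
EDGES = [('A', 'H'), ('A', 'B'), ('A', 'C'), ('C', 'B'),
         ('D', 'E'), ('F', 'H'), ('G', 'E'), ('G', 'D')]

def local_independencies(edges):
    """Map each node to (non-descendants excluding parents, parents)."""
    nodes = sorted({n for edge in edges for n in edge})
    children = {n: set() for n in nodes}
    parents = {n: set() for n in nodes}
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)

    def descendants(node):
        # Depth-first walk along outgoing edges.
        seen, stack = set(), [node]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    return {
        n: (set(nodes) - {n} - descendants(n) - parents[n], parents[n])
        for n in nodes
    }

for node, (independent, given) in local_independencies(EDGES).items():
    print(node, "⟂", sorted(independent), "|", sorted(given))
```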

In [7]:
print(model.get_cpds('H'))

+-------+-----------------------+-----+-----------------------+
| A     | A(0)                  | ... | A(6)                  |
+-------+-----------------------+-----+-----------------------+
| F     | F(0)                  | ... | F(2)                  |
+-------+-----------------------+-----+-----------------------+
| H(-6) | 0.0007158196134574088 | ... | 0.0008930166101089481 |
+-------+-----------------------+-----+-----------------------+
| H(-5) | 0.0007158196134574088 | ... | 0.0008930166101089481 |
+-------+-----------------------+-----+-----------------------+
| H(-4) | 0.0007158196134574088 | ... | 0.413466690480443     |
+-------+-----------------------+-----+-----------------------+
| H(-3) | 0.0007158196134574088 | ... | 0.28969458831934275   |
+-------+-----------------------+-----+-----------------------+
| H(-2) | 0.0007158196134574088 | ... | 0.28969458831934275   |
+-------+-----------------------+-----+-----------------------+
| H(-1) | 0.0007158196134574088 | ... | 
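
The cardinality of `H` can be cross-checked from the way the data were generated. The sketch below, an illustration in plain Python with hypothetical variable names, enumerates `H = G - A + F` over the ranges in play when `H` is computed: `G` and `F` still take values in {0, 1, 2}, while `A` has already been updated to a sum of three such values.

```python
from itertools import product

G_VALUES = range(0, 3)  # G before it is updated with D and E
A_VALUES = range(0, 7)  # A after adding B and C
F_VALUES = range(0, 3)

h_states = sorted({g - a + f for g, a, f in product(G_VALUES, A_VALUES, F_VALUES)})
print(h_states)
print("cardinality of H:", len(h_states))

# Columns of the CPD P(H | A, F): one per parent configuration.
n_columns = len(A_VALUES) * len(F_VALUES)
print("CPD shape:", (len(h_states), n_columns))
```

This gives 11 possible states, matching the rows `H(-6)` through `H(4)` that appear in the query result at the end of the notebook.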

In [8]:
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
g_dist = infer.query(['H'])

print(infer.query(['H'], evidence={'B': 0, 'C': 0, 'D': 1, 'E': 0, 'F': 1}))

+-------+----------+
| H     |   phi(H) |
+=======+==========+
| H(-6) |   0.0003 |
+-------+----------+
| H(-5) |   0.0004 |
+-------+----------+
| H(-4) |   0.0005 |
+-------+----------+
| H(-3) |   0.0006 |
+-------+----------+
| H(-2) |   0.0006 |
+-------+----------+
| H(-1) |   0.1314 |
+-------+----------+
| H(0)  |   0.1881 |
+-------+----------+
| H(1)  |   0.3524 |
+-------+----------+
| H(2)  |   0.2686 |
+-------+----------+
| H(3)  |   0.0568 |
+-------+----------+
| H(4)  |   0.0003 |
+-------+----------+


### The most likely outcome is H = 1
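
As a quick check, the posterior printed above can be scanned for its largest entry. The dictionary below copies the values from that table; the sketch is plain Python and does not call pgmpy.

```python
# Posterior P(H | B=0, C=0, D=1, E=0, F=1), copied from the table above.
posterior_H = {
    -6: 0.0003, -5: 0.0004, -4: 0.0005, -3: 0.0006, -2: 0.0006,
    -1: 0.1314, 0: 0.1881, 1: 0.3524, 2: 0.2686, 3: 0.0568, 4: 0.0003,
}

# The most likely state is the key with the largest probability.
most_likely = max(posterior_H, key=posterior_H.get)
print(most_likely, posterior_H[most_likely])
```

pgmpy's `VariableElimination` also provides a `map_query` method that returns the most probable assignment directly, which avoids reading the table by eye.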