# [CausalNex](https://github.com/mckinsey/causalnex)

CausalNex is a Python library that uses Bayesian networks to combine machine learning with domain knowledge for causal reasoning. With CausalNex you can uncover structural relationships in your data, learn complex distributions, and observe the effects of potential interventions. This enables a more comprehensive causal analysis that supports data-driven decision-making.

## Installation

```python
# Quote specifiers that contain '>' so the shell does not read it as output redirection.
%pip install "numpy>=1.24.1"
%pip install ipython==8.10.0
# causalnex 0.12.1 requires pandas<2.0,>=1.0; pinning pandas==2.0.3 makes pip
# report a dependency conflict, so stay on pandas 1.x.
%pip install "pandas>=1.0,<2.0"
%pip install causalnex
```

Note: you may need to restart the kernel to use the updated packages.
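Since causalnex 0.12.1 declares `pandas<2.0,>=1.0`, it can help to check the installed pandas version before going further. The sketch below uses only the standard library; `satisfies_causalnex_pandas_pin` is a hypothetical helper written for this notebook, not part of CausalNex or pip.

```python
from importlib.metadata import PackageNotFoundError, version


def satisfies_causalnex_pandas_pin(pandas_version: str) -> bool:
    """Hypothetical helper: True if the version lies in [1.0, 2.0).

    Only the major version number is inspected, which is enough for this range
    (pre-release tags such as 1.0.0rc0 are not handled specially).
    """
    major = int(pandas_version.split(".")[0])
    return major == 1


try:
    installed = version("pandas")
    print(f"pandas {installed}, compatible with causalnex 0.12.1: "
          f"{satisfies_causalnex_pandas_pin(installed)}")
except PackageNotFoundError:
    print("pandas is not installed")
```

If this prints `False`, reinstall pandas with the `"pandas>=1.0,<2.0"` specifier and restart the kernel.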

## Data preparation

```python
import pandas as pd

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

df = pd.read_csv(url)

# Cabin: 687 of 891 values are missing (df['Cabin'].isnull().sum()), so it is skipped.
# Ticket cannot easily be split into categories, so it is skipped as well.
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
df = df[features + ['Survived']]

# Drop rows with missing values (Age, Embarked) and store Age as an integer.
df = df.dropna()
df = df.astype({"Age": int})

# Encode the string columns as integer category codes.
df['Sex'] = df['Sex'].astype('category').cat.codes
df['Embarked'] = df['Embarked'].astype('category').cat.codes
df.head()
```

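| | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 22 | 1 | 0 | 7.25 | 2 | 0 |
| 1 | 1 | 0 | 38 | 1 | 0 | 71.2833 | 0 | 1 |
| 2 | 3 | 0 | 26 | 0 | 0 | 7.925 | 2 | 1 |
| 3 | 1 | 0 | 35 | 1 | 0 | 53.1 | 2 | 1 |
| 4 | 3 | 1 | 35 | 0 | 0 | 8.05 | 2 | 0 |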
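To read the encoded columns, it helps to know which code stands for which label. When pandas infers categories from strings it sorts them, and `cat.codes` is each value's position in that sorted list. The toy series below are hypothetical values in the same format as the Titanic columns.

```python
import pandas as pd

# Hypothetical values in the same format as the Titanic 'Sex' and 'Embarked' columns.
sex = pd.Series(["male", "female", "female", "male"])
embarked = pd.Series(["S", "C", "Q", "S"])

sex_cat = sex.astype("category")
embarked_cat = embarked.astype("category")

# Categories are sorted, so the code is the label's position in the sorted list.
print(dict(enumerate(sex_cat.cat.categories)))       # code -> label
print(dict(enumerate(embarked_cat.cat.categories)))  # code -> label

sex_codes = sex_cat.cat.codes
embarked_codes = embarked_cat.cat.codes
print(sex_codes.tolist(), embarked_codes.tolist())
```

So in the table, `Sex = 1` means male and `Embarked = 2` means port S, provided the data contains the same labels as this toy example.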


```python
import warnings
from causalnex.structure import StructureModel

warnings.filterwarnings("ignore")  # silence warnings

sm = StructureModel()
```

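```python
# A hand-specified structure: each (cause, effect) pair encodes domain knowledge.
# 'Cabin' only serves as an illustration here; it is not among the modelled columns.
sm.add_edges_from([
    ('Pclass', 'Cabin'),
    ('Pclass', 'Fare')
])
sm.edges
```

```
OutEdgeView([('Pclass', 'Cabin'), ('Pclass', 'Fare')])
```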
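A Bayesian network needs its structure to be a directed acyclic graph. `StructureModel` builds on networkx's `DiGraph`, so nothing stops you from adding a cycle by hand. The sketch below checks an edge list for cycles with the standard library's `graphlib`; `is_acyclic` is a hypothetical helper, not a CausalNex API.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """Hypothetical helper: True if the (cause, effect) edge list has no directed cycle."""
    sorter = TopologicalSorter()
    for cause, effect in edges:
        # add(node, *predecessors): the effect depends on its cause.
        sorter.add(effect, cause)
    try:
        sorter.prepare()  # raises CycleError if the graph contains a cycle
        return True
    except CycleError:
        return False


print(is_acyclic([("Pclass", "Cabin"), ("Pclass", "Fare")]))
print(is_acyclic([("Pclass", "Fare"), ("Fare", "Pclass")]))
```

The first call prints `True`. The second prints `False` because the two edges form a cycle.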

```python
from causalnex.plots import plot_structure, NODE_STYLE, EDGE_STYLE

viz = plot_structure(
    sm,
    all_node_attributes=NODE_STYLE.WEAK,
    all_edge_attributes=EDGE_STYLE.WEAK,
)

# Uncomment to write the interactive plot to an HTML file:
# viz.show("01_simple_plot.html")
```

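```python
from causalnex.structure.notears import from_pandas

# Learn the structure directly from the data with the NOTEARS algorithm;
# this replaces the hand-specified model.
struct_data = df.copy()
sm = from_pandas(struct_data)

# Drop weak edges: those whose absolute weight is below 0.8.
sm.remove_edges_below_threshold(0.8)
```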
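NOTEARS assigns a real-valued weight to every candidate edge, and thresholding keeps only the strong ones. The sketch below mirrors our understanding of `remove_edges_below_threshold`: edges whose absolute weight falls below the threshold are dropped, so a large negative weight survives. The edge names and weights are made up for illustration, and `edges_at_or_above` is a hypothetical helper.

```python
# Hypothetical learned weights for directed edges (not output from this notebook).
weighted_edges = {
    ("A", "B"): -1.2,
    ("C", "D"): 0.9,
    ("A", "C"): 0.3,
    ("D", "E"): -0.5,
}


def edges_at_or_above(weights, threshold):
    """Keep edges whose absolute weight is not below the threshold."""
    return {edge: w for edge, w in weights.items() if abs(w) >= threshold}


kept = edges_at_or_above(weighted_edges, 0.8)
print(sorted(kept))
```

Comparing absolute values treats strong negative effects the same as strong positive ones.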

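```python
# Keep only the largest connected part of the learned graph.
sm = sm.get_largest_subgraph()
viz = plot_structure(
    sm,
    all_node_attributes=NODE_STYLE.WEAK,
    all_edge_attributes=EDGE_STYLE.WEAK,
)

viz.toggle_physics(False)  # freeze the layout instead of simulating it
# viz.show("01_fully_connected.html")
```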
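`get_largest_subgraph` discards nodes that are not connected to the main graph. Assuming it uses weak connectivity, where edge direction is ignored, the sketch below reproduces the idea with the standard library. `largest_weak_component` is a hypothetical helper, and nodes are given only implicitly through their edges.

```python
from collections import defaultdict


def largest_weak_component(edges):
    """Return the node set of the largest weakly connected component.

    Edge direction is ignored, so A -> B and B -> A both connect A and B.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        # Iterative depth-first search over the undirected view of the graph.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical edge list with two disconnected parts.
print(sorted(largest_weak_component([("A", "B"), ("C", "B"), ("D", "E")])))
```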

```python
import networkx as nx

# Export the graph in Graphviz DOT format (requires the pydot package).
nx.drawing.nx_pydot.write_dot(sm, 'graph.dot')
```

```python
from graphviz import Source

with open('graph.dot', 'r') as file:
    dot_graph = file.read()

# Render the DOT description to PNG; render() returns the path of the output file.
src = Source(dot_graph)
src.render(format='png')
```

```
'Source.gv.png'
```
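The DOT file passed between networkx and graphviz is plain text. The sketch below builds a minimal DOT description from an edge list to show the format. `to_dot` is a hypothetical helper that does no escaping of quotes in node names, and the file written by `write_dot` may also carry edge attributes such as weights.

```python
def to_dot(edges, name="G"):
    """Hypothetical helper: build a minimal Graphviz DOT description of a directed graph."""
    lines = [f"digraph {name} {{"]
    for u, v in edges:
        lines.append(f'    "{u}" -> "{v}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("Pclass", "Cabin"), ("Pclass", "Fare")]))
```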

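```python
# Domain knowledge: the port of embarkation cannot cause a passenger's age,
# so drop this learned edge.
sm.remove_edge("Embarked", "Age")
```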

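```python
from causalnex.network import BayesianNetwork

bn = BayesianNetwork(sm)

# Note: despite its name, this is an unmodified copy. Continuous columns such as
# Age and Fare are not discretised, so every distinct value becomes a separate state.
discretised_data = struct_data.copy()
```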
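CausalNex's `BayesianNetwork` works with discrete states, and here Age and Fare are passed in unbinned, so Fare alone contributes one state per distinct fare. A minimal sketch of binning them with pandas `pd.cut` and `pd.qcut` follows. This swaps in pandas for illustration (CausalNex also ships its own `Discretiser` class), and the age bands and toy rows are arbitrary assumptions.

```python
import pandas as pd

# Hypothetical rows; only the two continuous columns matter here.
data = pd.DataFrame({
    "Age": [2, 22, 38, 26, 35, 54, 71],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625, 34.65],
})

discretised = data.copy()
# Assumed fixed age bands: [0, 18), [18, 60), [60, inf) -> codes 0, 1, 2.
discretised["Age"] = pd.cut(
    data["Age"], bins=[0, 18, 60, float("inf")], right=False, labels=False
)
# Quartile-based fare bands: roughly equal numbers of rows per band -> codes 0..3.
discretised["Fare"] = pd.qcut(data["Fare"], q=4, labels=False)

print(discretised)
print(data["Fare"].nunique(), "distinct fares ->", discretised["Fare"].nunique(), "fare states")
```

Fewer states per node mean smaller conditional probability tables, each estimated from more rows.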

```python
# Split 90% train and 10% test
from sklearn.model_selection import train_test_split

train, test = train_test_split(discretised_data, train_size=0.9, test_size=0.1, random_state=7)
```
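With 712 rows left after `dropna` (the support shown in the evaluation report), the fractions translate into concrete split sizes. The sketch assumes scikit-learn rounds the test size up and the train size down when given fractions.

```python
from math import ceil, floor

n_samples = 712  # rows after dropna

# Assumed rounding rule for fractional sizes: test rounds up, train rounds down.
n_test = ceil(0.1 * n_samples)
n_train = floor(0.9 * n_samples)
print(n_train, n_test)
```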

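```python
# Register every state that appears in the full dataset (train and test),
# so that no state is unknown at prediction time.
bn = bn.fit_node_states(discretised_data)
```

```python
# Estimate the conditional probability tables from the training split
# with a Bayesian estimator and a K2 prior.
bn = bn.fit_cpds(train, method="BayesianEstimator", bayes_prior="K2")
```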

```python
bn.cpds["Survived"]
```

| Survived | P(Survived) |
|---|---|
| 0 | 0.598131 |
| 1 | 0.401869 |
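The Survived CPD has no parent columns, so it is a plain marginal distribution. Assuming the K2 prior adds one pseudo-count per state, the printed values are reproduced exactly by a 640-row training split with 383 rows where Survived = 0 and 257 where Survived = 1. These counts are back-calculated from the printed probabilities, not read from the data.

```python
from fractions import Fraction

# Back-calculated counts (an assumption consistent with the printed CPD).
counts = {0: 383, 1: 257}
pseudo_count = 1  # assumed K2 prior: one imaginary observation per state

total = sum(counts.values()) + pseudo_count * len(counts)  # 640 + 2
cpd = {state: Fraction(n + pseudo_count, total) for state, n in counts.items()}

for state, p in cpd.items():
    print(state, round(float(p), 6))
```

The prior keeps every state's probability above zero, even for states that never occur in the training split.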


```python
# Kaggle's test set has no Survived column, so it cannot be used for scoring.
# It is loaded as kaggle_test so that the held-out `test` split is not overwritten.
url = "https://raw.githubusercontent.com/dsindy/kaggle-titanic/master/data/test.csv"

kaggle_test = pd.read_csv(url)

# Same preprocessing as the training data (Cabin and Ticket are skipped).
kaggle_test = kaggle_test[features]

kaggle_test = kaggle_test.dropna()
kaggle_test = kaggle_test.astype({"Age": int})

kaggle_test['Sex'] = kaggle_test['Sex'].astype('category').cat.codes
kaggle_test['Embarked'] = kaggle_test['Embarked'].astype('category').cat.codes
kaggle_test.head()
```

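| | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 34 | 0 | 0 | 7.8292 | 1 |
| 1 | 3 | 0 | 47 | 1 | 0 | 7.0 | 2 |
| 2 | 2 | 1 | 62 | 0 | 0 | 9.6875 | 1 |
| 3 | 3 | 1 | 27 | 0 | 0 | 8.6625 | 2 |
| 4 | 3 | 0 | 22 | 1 | 1 | 12.2875 | 2 |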
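Applying `astype('category').cat.codes` separately to each file only yields matching codes when both files contain exactly the same set of labels. The sketch below pins the category order explicitly so the codes cannot shift. The category lists and `encode` are hypothetical choices for this notebook. Labels outside the list would get code -1.

```python
import pandas as pd

# Fixed category order shared by training and test data (assumed label sets).
SEX_CATEGORIES = ["female", "male"]
EMBARKED_CATEGORIES = ["C", "Q", "S"]


def encode(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: encode Sex and Embarked with a fixed category order."""
    out = frame.copy()
    out["Sex"] = pd.Categorical(out["Sex"], categories=SEX_CATEGORIES).codes
    out["Embarked"] = pd.Categorical(out["Embarked"], categories=EMBARKED_CATEGORIES).codes
    return out


# A hypothetical test file in which nobody embarked at "C":
toy_test = pd.DataFrame({"Sex": ["male", "female"], "Embarked": ["Q", "S"]})
print(encode(toy_test))
# Inferred categories shift the codes: Q -> 0, S -> 1 instead of 1, 2.
print(toy_test["Embarked"].astype("category").cat.codes.tolist())
```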


```python
from causalnex.evaluation import classification_report

# Note: df also contains the training rows, so these scores are in-sample.
classification_report(bn, df, 'Survived')
```

```
{'Survived_0': {'precision': 0.5955056179775281,
  'recall': 1.0,
  'f1-score': 0.7464788732394366,
  'support': 424.0},
 'Survived_1': {'precision': 0.0,
  'recall': 0.0,
  'f1-score': 0.0,
  'support': 288.0},
 'accuracy': 0.5955056179775281,
 'macro avg': {'precision': 0.29775280898876405,
  'recall': 0.5,
  'f1-score': 0.3732394366197183,
  'support': 712.0},
 'weighted avg': {'precision': 0.3546269410427976,
  'recall': 0.5955056179775281,
  'f1-score': 0.4445323627156196,
  'support': 712.0}}
```
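The report is exactly what a classifier that predicts Survived = 0 for every passenger would produce. The sketch recomputes the class-0 metrics from the report's support counts (424 and 288) under that assumption. Class 1 is never predicted, so it has no true positives, and its undefined precision is reported as 0.

```python
# Support counts from the report.
n_neg, n_pos = 424, 288  # Survived = 0, Survived = 1

# A classifier that always predicts class 0:
tp_0 = n_neg  # every Survived = 0 row is predicted correctly
fp_0 = n_pos  # every Survived = 1 row is also predicted 0
precision_0 = tp_0 / (tp_0 + fp_0)
recall_0 = tp_0 / n_neg
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

recall_1 = 0 / n_pos  # class 1 is never predicted
accuracy = n_neg / (n_neg + n_pos)

print(f"precision_0={precision_0:.4f} recall_0={recall_0} f1_0={f1_0:.4f}")
print(f"recall_1={recall_1} accuracy={accuracy:.4f}")
```

Accuracy equals the share of the majority class, which is why it matches class-0 precision.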

```python
from causalnex.evaluation import roc_auc

roc, auc = roc_auc(bn, df, 'Survived')
print(f"AUC: {auc}")
```

```
AUC: 0.5955056179775281
```
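The AUC equals the accuracy, which is unusual. Suppose `roc_auc` micro-averages over the one-hot encoded classes, as we understand it does, and suppose the model gives every passenger the same distribution, for example the Survived CPD. Then the AUC works out to exactly 424 / 712. Both are assumptions. The sketch computes AUC as the probability that a positive score beats a negative one, with ties counted as one half.

```python
from collections import Counter

n_neg, n_pos = 424, 288       # support of Survived = 0 and Survived = 1
p0, p1 = 0.598131, 0.401869   # assumed: the same predicted distribution for every passenger

# Micro-averaging flattens both one-hot columns: each row contributes
# (P(class 0), is class 0) and (P(class 1), is class 1).
pos = Counter({p0: n_neg, p1: n_pos})  # scores attached to a true label
neg = Counter({p0: n_pos, p1: n_neg})  # scores attached to a false label

# AUC = P(positive score > negative score) + 0.5 * P(tie)
wins = sum(cp * cn for sp, cp in pos.items() for sn, cn in neg.items() if sp > sn)
ties = sum(cp * cn for sp, cp in pos.items() for sn, cn in neg.items() if sp == sn)
total = sum(pos.values()) * sum(neg.values())
auc = (wins + 0.5 * ties) / total

print(auc, n_neg / (n_neg + n_pos))
```

That the printed AUC matches this value suggests the model's probabilities do not vary between passengers.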
