# [EconML](https://github.com/py-why/EconML)

A library focused on modeling heterogeneous treatment effects with machine learning techniques. It is backed by Microsoft. Its scope is similar to that of the CausalML library, and it is closely tied to the DoWhy library.
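
For reference, the heterogeneous (conditional average) treatment effect is commonly written as below. The notation is an assumption introduced here and does not come from the library: $Y(t)$ is the potential outcome under treatment level $t$, $X$ are the covariates, and $t_0$, $t_1$ are the baseline and target treatment levels.

$$
\tau(x; t_0, t_1) = \mathbb{E}\left[\, Y(t_1) - Y(t_0) \mid X = x \,\right]
$$

Averaging $\tau(X; t_0, t_1)$ over the covariate distribution gives the average treatment effect for that pair of levels.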


In [None]:
%pip install econml

In [3]:
import pandas as pd

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

df = pd.read_csv(url)

print(df.head())

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  


In [4]:
print(df.columns)

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')


In [5]:
# Drop rows with a missing value in any column used below.
required = ['Age', 'Sex', 'SibSp', 'Parch', 'Fare', 'Embarked', 'Pclass', 'Survived']
df = df.dropna(subset=required, how='any')

# One-hot encode the categorical columns; drop_first removes one redundant dummy per variable.
df_cleaned = pd.get_dummies(df, columns=['Embarked', 'Sex'], drop_first=True)
print(df_cleaned.columns)

feature_cols = ['Age', 'Sex_male', 'SibSp', 'Parch', 'Fare', 'Embarked_Q', 'Embarked_S']
X = df_cleaned[feature_cols]  # covariates
T = df_cleaned['Pclass']      # treatment: passenger class (1, 2, 3)
Y = df_cleaned['Survived']    # outcome: 1 = survived, 0 = did not survive

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Age', 'SibSp', 'Parch',
       'Ticket', 'Fare', 'Cabin', 'Embarked_Q', 'Embarked_S', 'Sex_male'],
      dtype='object')
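
The feature matrix `X` built above combines float, integer, and dummy columns, and the dtype of the dummy columns depends on the pandas version (`uint8` before 2.0, `bool` from 2.0 on). A minimal sketch of inspecting and unifying the dtypes follows. It uses a hypothetical three-row frame `toy`, not the Titanic data, and casting to `float64` is one possible choice rather than something the library requires.

```python
import pandas as pd

# Hypothetical toy frame mimicking a few Titanic columns.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "SibSp": [1, 1, 0],
    "Sex": ["male", "female", "female"],
})

# Same encoding step as in the notebook: one dummy column, first level dropped.
toy = pd.get_dummies(toy, columns=["Sex"], drop_first=True)

# Inspect the dtypes before casting (the dummy dtype varies with the pandas version).
dtypes_before = toy.dtypes.astype(str).to_dict()

# Cast every column to float64 so the design matrix has a single dtype.
X_toy = toy.astype("float64")
dtypes_after = X_toy.dtypes.astype(str).to_dict()

print(dtypes_before)
print(dtypes_after)
```

Applied to the notebook, the same idea would read `X = X.astype("float64")` before fitting.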


In [6]:
from econml.dr import DRLearner
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

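# Outcome model: predicts Y from the covariates and the treatment.
model_y = RandomForestRegressor()
# Propensity model: predicts the discrete treatment T from the covariates.
model_t = RandomForestClassifier()

dr_learner = DRLearner(model_regression=model_y, model_propensity=model_t)

# Pclass is handled as a discrete treatment with levels 1, 2 and 3.
dr_learner.fit(Y, T, X=X)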

<econml.dr._drlearner.DRLearner at 0x7f5234a36bb0>

In [9]:
import numpy as np

# Rows of X that contain NaN or infinite values (none are expected after dropna).
unknown_categories = X[X.isin([np.nan, np.inf, -np.inf]).any(axis=1)]
print("Unknown Categories:", unknown_categories)

Unknown Categories: Empty DataFrame
Columns: [Age, Sex_male, SibSp, Parch, Fare, Embarked_Q, Embarked_S]
Index: []


In [None]:
# TODO: check for mixed variable types
treatment_effects = dr_learner.effect(X)

# average treatment effect
ate = treatment_effects.mean()
print(f"Average Treatment Effect: {ate}")


ValueError: Found unknown categories [0] in column 0 during transform
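
A likely cause of this error is not the data types but the default arguments of `effect`: it compares treatment `T0=0` with `T1=1`, while `Pclass` only takes the values 1, 2 and 3, so level 0 was never seen when the treatment was one-hot encoded during `fit`. The sketch below reproduces the mechanism with scikit-learn's `OneHotEncoder` alone. The synthetic array `T_demo` and the helper `try_encode` are hypothetical names introduced for illustration, and the assumption that EconML encodes discrete treatments this way is inferred from the error message.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for df_cleaned['Pclass']: treatment levels 1, 2, 3.
T_demo = np.array([1, 2, 3, 3, 1, 2]).reshape(-1, 1)

# Fit an encoder on the observed levels; the first level acts as the baseline.
encoder = OneHotEncoder(drop="first")
encoder.fit(T_demo)
levels = encoder.categories_[0].tolist()


def try_encode(level):
    """Encode one treatment level, or return the error text if the level is unknown."""
    try:
        return encoder.transform([[level]]).toarray().tolist()
    except ValueError as exc:
        return str(exc)


baseline_0 = try_encode(0)  # level 0 was never observed, so encoding fails
baseline_1 = try_encode(1)  # observed level, encoded as the all-zero baseline
target_3 = try_encode(3)    # observed level, encoded with a one in the last slot

print(levels)
print(baseline_0)
print(baseline_1, target_3)
```

Under that assumption, passing levels that exist in the data, for example `dr_learner.effect(X, T0=1, T1=3)` to compare third class against first class, should avoid the error. The resulting mean is then the average effect for that specific pair of classes.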