# Comparing results of fitting CBPS between R's `CBPS` package and Python's `balance` package (using simulated data)

This notebook shows how to reproduce (almost exactly) the weights produced by R's [CBPS](https://cran.r-project.org/web/packages/CBPS/) package, using the CBPS implementation in `balance`.

The example is based on the simulated data provided in the help page of the CBPS function (i.e., `?CBPS::CBPS`; you can see it [here](https://rdrr.io/cran/CBPS/man/CBPS.html)).

The R code used to create the data is available [here](https://github.com/facebookresearch/balance/blob/main/balance/datasets/sim_data_cbps.R).

# Loading data and fitting CBPS using `balance`

In [None]:
import balance
import numpy as np
import pandas as pd
import session_info

from balance import Sample

In [None]:
target_df, sample_df = balance.datasets.load_data("sim_data_cbps")
# print(target_df.head())
print(target_df.info())

In [None]:
sample = Sample.from_frame(sample_df, outcome_columns = ['y', 'cbps_weights'])
target = Sample.from_frame(target_df, outcome_columns = ['y', 'cbps_weights'])
sample_target = sample.set_target(target)

In [None]:
# The defaults, i.e. sample_target.adjust(method = "cbps"), would not yield results
# similar enough to R's, so we turn off the covariate transformations and the weight trimming:
adjust = sample_target.adjust(method = "cbps", transformations = None, weight_trimming_mean_ratio = None)
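
To give some intuition for why trimming is switched off above, here is a minimal, generic sketch of what trimming weights by a ratio of their mean usually means: any weight larger than `ratio * mean(weights)` is capped at that value. The helper `trim_by_mean_ratio` and the toy numbers are hypothetical and are not `balance`'s implementation, which may differ in details (for example, it may rescale the weights after capping; this sketch does not).

```python
def trim_by_mean_ratio(weights, ratio):
    """Cap every weight at ratio * mean(weights). Hypothetical helper, for illustration only."""
    cap = ratio * (sum(weights) / len(weights))
    return [min(w, cap) for w in weights]


# Toy weights with one extreme value; their mean is 5.0.
weights = [1.0, 1.0, 1.0, 17.0]

# With ratio=2.0 the cap is 10.0, so only the last weight changes.
trimmed = trim_by_mean_ratio(weights, ratio=2.0)
print(trimmed)
```

Any trimming of this kind changes the largest weights, so it would make the comparison with R's untrimmed CBPS weights less direct.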

# Comparing results of `balance` and `CBPS` 

In [None]:
# adjust.df.plot.scatter(x="cbps_weights", y="weight", color="blue")
adjust.df[["cbps_weights", "weight"]].corr(method = "pearson")
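
Pearson correlation is unchanged when one set of weights is rescaled by a positive constant, so a high correlation does not by itself show that the two sets of weights are on the same scale. A complementary check is to normalize both sets to sum to 1 and look at the largest absolute difference. The sketch below uses hypothetical toy numbers, not the notebook's data.

```python
def normalize(weights):
    """Rescale weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy weights: the second set is roughly 10x the first.
r_weights = [0.5, 1.0, 2.0, 4.0]
py_weights = [5.1, 9.9, 20.2, 39.8]

# After normalization the overall scale no longer matters.
max_abs_diff = max(
    abs(a - b) for a, b in zip(normalize(r_weights), normalize(py_weights))
)
print(max_abs_diff)
```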

In [None]:
# adjust.df.copy().assign(log_cbps_weights=np.log(adjust.df['cbps_weights']),log_weight=np.log(adjust.df['weight'])).plot.scatter('log_cbps_weights', 'log_weight', color='blue')
adjust.df[["cbps_weights", "weight"]].apply(lambda x: np.log10(x)).corr(method = "pearson")
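
One reason to also look at the correlation on the log scale: survey weights are often right-skewed, and a few very large weights can dominate the Pearson correlation on the raw scale. The sketch below illustrates this with hypothetical toy numbers (not the notebook's data), where the two sets agree on one huge weight but disagree on the small ones.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy weights: the small weights are in reverse order, the large one matches.
a = [1.0, 2.0, 3.0, 4.0, 1000.0]
b = [4.0, 3.0, 2.0, 1.0, 1000.0]

raw_corr = pearson(a, b)
log_corr = pearson([math.log10(x) for x in a], [math.log10(x) for x in b])
print(raw_corr, log_corr)
```

On the raw scale the single large weight hides the disagreement among the small ones; the log scale makes it more visible.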

In [None]:
# Notice how the mean of the y outcome changes from 220.67 (before weighting) to 207.55 (after weighting), similar to R's 220.67 -> 206.8
print(adjust.outcomes().summary())
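
For reference, the adjusted outcome mean is a weighted average: each value of `y` is multiplied by its weight, and the sum is divided by the total weight. The sketch below shows the arithmetic on hypothetical toy numbers; `weighted_mean` is an illustrative helper, not a `balance` function.

```python
def weighted_mean(values, weights):
    """Return sum(w * v) / sum(w)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical toy outcome and weights.
y = [100.0, 200.0, 300.0]
w = [3.0, 1.0, 1.0]

unweighted = weighted_mean(y, [1.0, 1.0, 1.0])  # plain average
weighted = weighted_mean(y, w)  # up-weights the first unit
print(unweighted, weighted)
```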

In [None]:
# Just to get some sense of what the weights did to the covars:
adjust.covars().plot(library = "seaborn", dist_type = "kde")

In [None]:
# In contrast, if we were to use the original CBPS weights, we'd get this:
from copy import deepcopy
adjust2 = deepcopy(adjust)
cbps_weights = adjust2.outcomes().df.cbps_weights
adjust2.set_weights(cbps_weights)
# .covars().plot(library = "seaborn", dist_type = "kde")

In [None]:
# We can see that this worked, since the weighted average of y is now 206.8
print(adjust2.outcomes().summary())

In [None]:
# And here is what the covars look like under the original CBPS weights from R:
# almost the same correction as the one balance produced
adjust2.covars().plot(library = "seaborn", dist_type = "kde")

# Session info

In [None]:
session_info.show(html=False, dependencies=True)