This repository has been archived by the owner on Jul 29, 2024. It is now read-only.

Possible issue with _get_counts and _get_tc_counts #7

Closed
shaddyab opened this issue Nov 26, 2019 · 2 comments

Comments

@shaddyab
Contributor

Given 100K (N=1e5) samples with the following distribution:
Treatment = 98% (W=1)
Control = 2% (W=0)
Hence p = 0.98

The samples were balanced on the response: for Y=1, they split 1% control (W=0, Y=1) vs. 49% treatment (W=1, Y=1); similarly, for Y=0, they split 1% control (W=0, Y=0) vs. 49% treatment (W=1, Y=0).

Based on this, I would expect the two functions _get_counts and _get_tc_counts in pylift.eval to return the following values:
Nt1o1 = 49K, Nt0o1 = 1K, Nt1o0 = 49K, Nt0o0 = 1K
Nt1 = 98K, Nt0 = 2K, N = 1e5
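To double-check the expected raw counts, here is a minimal sketch that rebuilds the synthetic data described above (assuming exactly 49,000 / 1,000 / 49,000 / 1,000 samples per cell) and tallies each cell with plain boolean masks. raw_counts is a hypothetical helper for illustration, not part of pylift.

```python
import numpy as np

# Rebuild the synthetic data described in the issue: N = 100,000 samples,
# 98% treated (W=1), 2% control (W=0), each arm split 50/50 on the outcome.
N = 100_000
n_t1o1, n_t1o0 = 49_000, 49_000   # treated: responded / did not respond
n_t0o1, n_t0o0 = 1_000, 1_000     # control: responded / did not respond

treatment = np.concatenate([
    np.ones(n_t1o1 + n_t1o0, dtype=int),
    np.zeros(n_t0o1 + n_t0o0, dtype=int),
])
outcome = np.concatenate([
    np.ones(n_t1o1, dtype=int), np.zeros(n_t1o0, dtype=int),
    np.ones(n_t0o1, dtype=int), np.zeros(n_t0o0, dtype=int),
])

def raw_counts(treatment, outcome):
    """Hypothetical helper: plain (unweighted) counts per treatment/outcome cell."""
    t = treatment == 1
    o = outcome == 1
    return {
        "Nt1o1": int(np.sum(t & o)),
        "Nt0o1": int(np.sum(~t & o)),
        "Nt1o0": int(np.sum(t & ~o)),
        "Nt0o0": int(np.sum(~t & ~o)),
    }

counts = raw_counts(treatment, outcome)
print(counts)
# {'Nt1o1': 49000, 'Nt0o1': 1000, 'Nt1o0': 49000, 'Nt0o0': 1000}
```

These are the raw counts the issue expects; the question is why the pylift functions report different numbers.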

However, the functions return the following values instead:

Nt1o1 = 25K, Nt0o1 = 25K, Nt1o0 = 25K, Nt0o0 = 25K
Nt1 = 50K, Nt0 = 50K, N = 1e5

Could it be that the implemented logic, which is based on summing 1/p and 1/(1-p) values, needs to be modified?
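For reference, here is the arithmetic behind the reported values, assuming every sample carries the same propensity p = 0.98 and that each treated sample contributes 1/p and each control sample 1/(1-p). The unscaled sums come to 50,000 per cell (200,000 in total, twice N); the factor of 1/2 in the second step is inferred from the reported numbers, not confirmed from the pylift source.

```latex
\begin{aligned}
\sum_{W=1,\,Y=1} \frac{1}{p} &= \frac{49{,}000}{0.98} = 50{,}000, &
\sum_{W=0,\,Y=1} \frac{1}{1-p} &= \frac{1{,}000}{0.02} = 50{,}000,\\
\sum_{W=1,\,Y=0} \frac{1}{p} &= \frac{49{,}000}{0.98} = 50{,}000, &
\sum_{W=0,\,Y=0} \frac{1}{1-p} &= \frac{1{,}000}{0.02} = 50{,}000.
\end{aligned}

\begin{aligned}
N_{t1o1} = N_{t1o0} = N_{t0o1} = N_{t0o0} &= \tfrac{1}{2}\cdot 50{,}000 = 25{,}000,\\
N_{t1} = N_{t0} &= 2 \cdot 25{,}000 = 50{,}000,\\
N_{t1} + N_{t0} &= 100{,}000 = N.
\end{aligned}
```

Under these assumptions, the 25K/25K/25K/25K split is exactly what inverse-propensity weighting produces for this data, rather than a miscount.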

@rsyi
Owner

rsyi commented Nov 29, 2019

Hey @shaddyab, this is working as intended, but the documentation is outdated. These functions are meant to calculate the effective counts, scaled by the propensity p, not the raw counts. I'll update the documentation to say so.
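A minimal sketch of what "effective counts" could look like under the assumptions in this thread: a constant propensity p = 0.98, inverse-propensity weights (1/p for treated samples, 1/(1-p) for control samples), and an overall factor of 0.5 that is inferred from the reported numbers, not taken from pylift. effective_counts is a hypothetical function, not the library's _get_counts. The point it illustrates is that the reweighting rebalances the two arms to 50K each while leaving the within-arm response rates (49K/98K and 1K/2K, both 0.5) unchanged.

```python
import numpy as np

# Same synthetic data as in the issue: 98% treated, 2% control,
# each arm split 50/50 on the outcome.
treatment = np.array([1] * 98_000 + [0] * 2_000)
outcome = np.array([1] * 49_000 + [0] * 49_000 + [1] * 1_000 + [0] * 1_000)

# Assumption: every sample carries the same propensity p = P(W=1) = 0.98.
p = np.full(treatment.shape, 0.98)

def effective_counts(treatment, outcome, p):
    """Hypothetical sketch of propensity-weighted ("effective") counts.

    Treated samples are weighted by 1/p and control samples by 1/(1-p).
    The 0.5 factor is an assumption chosen because it reproduces the
    values reported in this issue; it is not taken from pylift's source.
    """
    t = treatment == 1
    o = outcome == 1
    w = np.where(t, 1 / p, 1 / (1 - p))  # per-sample inverse-propensity weight
    return {
        "Nt1o1": float(0.5 * w[t & o].sum()),
        "Nt0o1": float(0.5 * w[~t & o].sum()),
        "Nt1o0": float(0.5 * w[t & ~o].sum()),
        "Nt0o0": float(0.5 * w[~t & ~o].sum()),
    }

eff = effective_counts(treatment, outcome, p)
Nt1 = eff["Nt1o1"] + eff["Nt1o0"]  # effective treated total
Nt0 = eff["Nt0o1"] + eff["Nt0o0"]  # effective control total

# Each cell comes out at about 25,000 and each arm at about 50,000
# (up to floating-point rounding), but the within-arm response rates
# match the raw data: 49K/98K and 1K/2K.
print({k: round(v) for k, v in eff.items()})
print(round(eff["Nt1o1"] / Nt1, 6), round(eff["Nt0o1"] / Nt0, 6))
```

In other words, the effective counts describe the data as if it came from a 50/50 randomized split, which is why they differ from the raw 98%/2% counts.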

@rsyi
Owner

rsyi commented Nov 29, 2019

PR here: #8

@rsyi rsyi closed this as completed Nov 29, 2019