Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question on Conclusion of this ROI Notebook #4

Open
BrianMiner opened this issue Jan 3, 2023 · 1 comment
Open

Question on Conclusion of this ROI Notebook #4

BrianMiner opened this issue Jan 3, 2023 · 1 comment

Comments

@BrianMiner
Copy link

BrianMiner commented Jan 3, 2023

This is a great notebook, enjoyed reading it. I do have one question that is really bugging me.

It is established that creating an auxiliary variable of revenue divided by cost:

df["rho"] = df["revenue"] / df["cost"]
smf.ols("rho ~ new_machine", df).fit().summary().tables[1]

does not represent $\frac{\Delta R}{\Delta C}$

But if this is the case, how does the regression at the end, conducted on an auxiliary variable, which is essentially
df["revenue"] - df["cost"] and a couple constants, adequately represent essentially $\Delta R - \Delta C$ ? Isnt this the same thing as above in concept?

@matteocourthoud
Copy link
Owner

matteocourthoud commented Jan 4, 2023

Hi Brian, thanks for the question!

The intuition behind the result is indeed not trivial. However, remember that the purpose of the auxiliary variable is not to "represent" $\frac{\Delta R}{ \Delta C}$, but to generate a variable whose difference-in-means has the same variance as the ROI estimator.

For what concerns intuition, I can give you a different way to build the estimator, which is maybe more intuitive. You can also rewrite the estimator as an extremum estimator, i.e. the solution of
$\hat{\rho} : \ \mathbb{E} [ \Delta R - \rho \Delta C]=0$

or equivalently
$\hat{\rho} = \arg\min_\rho \mathbb{E} [ \Delta R - \rho \Delta C]^2$

I.e. you want to estimate the ROI as the value that implies the same "baseline revenue" across treatment and control group, on average. With this approach, you get the same variance, but with a different interpretation: is the variance of the objective function, $Var[ \Delta R - \rho \Delta C]$, multiplied by its inverse squared derivative with respect to $\rho$, $\frac{1}{\mathbb{E}[\Delta C]^2}$. Maybe this way is more intuitive because it makes clear where the "baseline revenue" comes from: its expected value is the objective function.

I might add this to the article since (maybe) it can be helpful to others as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants