# Inference after ranking

This is a template for regression analysis after ranking. It estimates the ranked parameters using conditionally quantile-unbiased estimates and "almost" quantile-unbiased hybrid estimates.

Click the badge below to use this template on your own data. This opens the notebook on Binder.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gl/dsbowen%2Fconditional-inference/HEAD?urlpath=lab/tree/docs/examples/rank_conditions.ipynb)

Instructions:

1. Upload a file named `data.csv` containing your conventional estimates to this folder. Open the example `data.csv` to see the expected format. In that file, the dependent variable is named "dep_variable" and the estimated parameters are named "policy0", ..., "policy9". The first column of `data.csv` contains the conventional estimates $m$ of the unknown parameters, and the remaining columns contain a consistent estimate of the covariance matrix $\Sigma$. In the example `data.csv`, $m=(0, 1, \ldots, 9)$ and $\Sigma = I$.
2. Modify the code if necessary.
3. Run the notebook.
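If you want a synthetic file to experiment with, the sketch below writes one with $m=(0, 1, \ldots, 9)$ and $\Sigma = I$ using only the standard `csv` module. The layout is an assumption (a header row, then one row per parameter holding its name, its conventional estimate under `dep_variable`, and its row of $\Sigma$), and the function and file names are hypothetical; compare the output against the example `data.csv` before relying on it.

```python
import csv


def write_example_csv(path, n_params=10):
    """Write estimates m = (0, 1, ..., n-1) and covariance Sigma = I to a CSV.

    ASSUMED layout (check it against the example data.csv): a header row,
    then one row per parameter with its name, its conventional estimate,
    and its row of the covariance matrix.
    """
    names = [f"policy{k}" for k in range(n_params)]
    rows = [["", "dep_variable", *names]]
    for k, name in enumerate(names):
        # Row k of the identity matrix: 1.0 on the diagonal, 0.0 elsewhere.
        covariance_row = [1.0 if j == k else 0.0 for j in range(n_params)]
        rows.append([name, float(k), *covariance_row])
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows


rows = write_example_csv("example_data.csv")  # hypothetical file name
```

To use such a file, point `data_file` in the first code cell at it.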

### Citations

    @techreport{andrews2019inference,
      title={Inference on winners},
      author={Andrews, Isaiah and Kitagawa, Toru and McCloskey, Adam},
      year={2019},
      institution={National Bureau of Economic Research}
    }

    @article{andrews2022inference,
      author={Andrews, Isaiah and Bowen, Dillon and Kitagawa, Toru and McCloskey, Adam},
      title={Inference for Losers},
      journal={AEA Papers and Proceedings},
      volume={112},
      year={2022},
      month={May},
      pages={635--642},
      doi={10.1257/pandp.20221065},
      url={https://www.aeaweb.org/articles?id=10.1257/pandp.20221065}
    }

### Runtime warnings and long running times

If you are estimating the effects of many policies, or if the policy effects are close together, you may see `RuntimeWarning` messages and experience long running times. These warnings are common and usually benign; you can generally ignore them.
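If the warnings clutter the output, Python's standard `warnings` module can silence them. This sketch suppresses `RuntimeWarning` only inside a `with` block and demonstrates it on a NumPy division by zero; wrapping the `fit` calls in the same kind of block should work the same way, although that usage is an untested assumption here.

```python
import warnings

import numpy as np

# Ignore RuntimeWarning only inside this block; other warning types still show,
# and the previous filter settings are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    value = np.log(0.0)  # normally warns "divide by zero encountered in log"

print(value)
```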

In [None]:
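import matplotlib.pyplot as plt
import seaborn as sns

from multiple_inference.bayes import Improper
from multiple_inference.rank_condition import RankCondition

data_file = "data.csv"  # conventional estimates and covariance matrix
alpha = .05  # significance level passed to the summaries and plots

# Conventional model (no adjustment for ranking) and rank-conditional model,
# both loaded from the same file
conventional_model = Improper.from_csv(data_file, sort=True)
ranked_model = RankCondition.from_csv(data_file, sort=True)
sns.set()  # seaborn plot styling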

We'll start by summarizing and plotting the conventional estimates.

In [None]:
conventional_results = conventional_model.fit(title="Conventional estimates")
conventional_results.summary(alpha=alpha)

In [None]:
conventional_results.point_plot(alpha=alpha)
plt.show()

One property we want our estimators to have is *quantile-unbiasedness*. An estimator is quantile-unbiased if, given the parameter's estimated rank, the true parameter falls below its $\alpha$-quantile estimate with probability $\alpha$. For example, the true effect of the top-performing treatment should fall below its median estimate half the time.
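To see why this property matters, here is a small simulation (a sketch, not part of the `multiple_inference` API). It assumes, hypothetically, that all ten policies have the same true effect of zero and that $\Sigma = I$, the setting where ranking distorts estimates the most. The conventional estimate is median-unbiased for each policy taken on its own, but the true effect of whichever policy comes out on top falls below its conventional estimate far more than half the time; for ten independent, equally effective policies the exact probability is $1 - 2^{-10} \approx 0.999$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_params = 100_000, 10
# Hypothetical truth: every policy has the same true effect (zero).
true_effects = np.zeros(n_params)

# Simulated conventional estimates: true effect plus N(0, 1) noise (Sigma = I).
estimates = true_effects + rng.standard_normal((n_sims, n_params))
top = estimates.argmax(axis=1)  # top-performing policy in each simulated trial
trials = np.arange(n_sims)

# Share of trials where the winner's true effect is below its conventional estimate.
share_below = np.mean(true_effects[top] < estimates[trials, top])
print(f"share below: {share_below:.3f}")  # a median-unbiased estimate would give about 0.5
```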

Similarly, we want confidence intervals to have *correct conditional coverage*: given its estimated rank, the parameter should fall within our $(1-\alpha)$-level confidence interval with probability $1-\alpha$. For example, the true effect of the top-performing treatment should fall within its 95% confidence interval 95% of the time.
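A similar sketch, under the same hypothetical assumptions (ten equally effective policies, $\Sigma = I$), checks how often the conventional interval $m_k \pm z_{1-\alpha/2}$ covers the true effect of the top-performing policy. Because the top estimate is biased upward, coverage falls well short of 95%; under these assumptions it is $0.975^{10} - 0.025^{10} \approx 0.78$.

```python
from statistics import NormalDist

import numpy as np

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

rng = np.random.default_rng(1)
n_sims, n_params = 100_000, 10
true_effects = np.zeros(n_params)  # hypothetical: all policies equally effective
estimates = true_effects + rng.standard_normal((n_sims, n_params))  # Sigma = I

top = estimates.argmax(axis=1)  # top-performing policy in each simulated trial
trials = np.arange(n_sims)
# Does the conventional interval m_k +/- z contain the winner's true effect?
covered = np.abs(estimates[trials, top] - true_effects[top]) <= z
coverage = covered.mean()
print(f"coverage: {coverage:.3f}")  # nominal level is 0.95
```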

Below, we compute optimal conditionally quantile-unbiased estimates and confidence intervals with correct conditional coverage for each parameter, given its rank.

In [None]:
conditional_results = ranked_model.fit(title="Conditional estimates")
conditional_results.summary(alpha=alpha)

In [None]:
conditional_results.point_plot(alpha=alpha)
plt.show()

Conditional inference imposes a strict requirement. Conditionally quantile-unbiased estimates can be highly variable, and conditionally correct confidence intervals can be very wide. We can often obtain more reasonable estimates by targeting *unconditional* rather than *conditional* inference.

Imagine we ran our randomized controlled trial 10,000 times and want to estimate the effect of the top-performing treatment. We need *conditional* inference if we're interested in the subset of trials where a specific parameter $k$ was the top performer. However, we can use *unconditional* inference if we only need to be right "on average" across all 10,000 trials.
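The thought experiment can be simulated directly. This sketch assumes, hypothetically, that the true effects equal the example's $m = (0, 1, \ldots, 9)$ with $\Sigma = I$, and uses the conventional 95% interval only as an illustration. Unconditional coverage is the average of the conditional coverages for each $k$, weighted by how often $k$ comes out on top, so being right on average across all trials does not guarantee being right within the subset of trials where a particular $k$ wins.

```python
from statistics import NormalDist

import numpy as np

z = NormalDist().inv_cdf(0.975)  # conventional 95% interval half-width when Sigma = I
rng = np.random.default_rng(2)
n_trials = 10_000
true_effects = np.arange(10, dtype=float)  # hypothetical truth matching the example m

estimates = true_effects + rng.standard_normal((n_trials, true_effects.size))
top = estimates.argmax(axis=1)  # top performer in each simulated trial
covered = np.abs(estimates[np.arange(n_trials), top] - true_effects[top]) <= z

# Conditional coverage: among trials where policy k came out on top.
conditional = {k: covered[top == k].mean() for k in np.unique(top)}
share_top = {k: np.mean(top == k) for k in conditional}

# Unconditional coverage: across all trials. It equals the share-weighted
# average of the conditional coverages (law of total probability).
unconditional = covered.mean()
weighted = sum(share_top[k] * conditional[k] for k in conditional)
print({int(k): round(float(c), 3) for k, c in conditional.items()})
print(round(float(unconditional), 3), round(float(weighted), 3))
```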

Below, we use *hybrid estimates* to compute approximately quantile-unbiased estimates and confidence intervals with correct unconditional coverage for each parameter.

If you don't know whether you need conditional or unconditional inference, use unconditional inference.

In [None]:
hybrid_results = ranked_model.fit(beta=.005, title="Hybrid estimates")
hybrid_results.summary(alpha=alpha)

In [None]:
hybrid_results.point_plot(alpha=alpha)
plt.show()