## Regression Analysis and Instrumental Variables in the context of Causal Effects

Regression analysis and instrumental variables play crucial roles in estimating causal effects in observational studies and econometrics. In the realm of causal inference, establishing a cause-and-effect relationship between variables is often challenging due to confounding factors and potential biases in observational data. Regression analysis serves as a fundamental tool to explore relationships between variables, but in observational studies, it may fall short in determining causal effects due to unobserved factors. 

Instrumental variables, on the other hand, offer a methodological approach to address endogeneity and confounding by identifying and leveraging external variables that act as instruments to isolate causal relationships. This interplay between regression analysis and instrumental variables provides a framework to unravel causal effects, offering insights into understanding and addressing the complexities of causality in empirical research.


## Regression Analysis: 

    There are two ways of viewing regression: one is descriptive and the other is causal.

    Descriptive approach:
    Regression as a way of viewing correlations in the data.
    Let's look at an example.
    Let's think about the effect of property rights on economic development: what's the causal
    effect?
    If we were to intervene today, go to every country and somehow
    make their property rights stronger, what would be the
    average change in economic development?
    For example, would GDP go up everywhere?
    Let's look at the paper "The Colonial Origins of Comparative Development" by Acemoglu, Johnson and Robinson, published in the American Economic Review in 2001.
 

    In their data, each unit is a country, and each country is going to have a different level of property rights and of economic growth.

    They're going to measure property rights with a variable called "Protection Against Expropriation
    Risk."

    This variable is going to take on the values 0, 1, 2, 3, etc., up to 10, where 10 means having
    strong property rights, so there's a lot of protection against our property being expropriated
    from us by the government or somebody else, and 0 means very weak rights, meaning
    we don't have a lot of protection against our property being expropriated from us.
    That's how they're going to measure property rights.

    Second, they are going to measure economic development by GDP, Gross Domestic Product,
    per capita or per person.

    So let's look at the data.
    
<img src="../images/ols.png"/>
    
    
 <img src="../images/ols-american-eco-review.png">
   
    Here they've plotted all the data; each point is a different country.
    Toward the top of the graph we have the USA, Canada, Australia, and New Zealand; toward the bottom we have Sudan, Haiti and Mali.

    On the vertical axis, we have log GDP per capita in 1995, so economic development is
    what's being measured on the vertical axis.

    On the horizontal axis, we have Average Protection Against Expropriation Risk, 1985 through 1995, so higher numbers are stronger property rights, as discussed earlier, and lower numbers are weaker property rights.
    We can see that the USA, Canada and Australia, for example, have high economic development and strong property
    rights.
    Toward the bottom of the graph, we have countries with weaker property rights and lower economic development, countries like Sudan and Haiti.
    And we can see that there seems to be this relationship: countries with higher economic
    development have stronger property rights.

    There seems to be this relationship between having high property rights and having high
    economic development.

    Now formally, we could say that the regression of GDP per Capita on Expropriation Risk shows
    a positive relationship: the more protection we have of our property rights, the higher
    our GDP per capita is.

    A different way of saying this is that a regression of GDP per capita on Protection Against Expropriation
    yields a positive coefficient on Protection Against Expropriation, or on property rights.

    So basically we're just saying that we've got some data on countries, on their outcome
    variable and on their treatment variable, and in the data these two variables are
    positively correlated.

    That's all regression is telling us: we're describing the correlations in the data.
    Now, we know that correlation doesn't necessarily imply causation. So although regression analysis shows this correlation in the data, it doesn't tell us that if we were to intervene and strengthen property rights in countries, that would cause GDP to increase.

    That's because there could be a confounding variable: another variable that's actually having a causal effect on economic development and happens to be correlated with property rights, so that the correlation we're seeing in
    our regression analysis is just picking up this confounder.

    For example, maybe what matters is how leaders are chosen.
    So suppose that democratic countries have higher economic development than dictatorships. Well, if democracy is also correlated with having strong property rights, then the relationship we see in the data might not be a causal effect of property rights on economic development. It might instead reflect a causal effect of democracy on economic development, a confounding variable we haven't included in the analysis.
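
The confounding story above can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names, the coefficients 0.8 and 1.5, and the helper `ols_coefficients` are assumptions introduced here, and property rights are deliberately given no causal effect on GDP.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical confounder: a "democracy" score that raises both variables.
democracy = rng.normal(size=n)

# Property rights depend on democracy; by construction they have NO effect on GDP.
property_rights = 0.8 * democracy + rng.normal(size=n)
log_gdp = 1.5 * democracy + rng.normal(size=n)


def ols_coefficients(y, *regressors):
    """Return least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Regression that omits the confounder versus one that includes it.
naive_slope = ols_coefficients(log_gdp, property_rights)[1]
adjusted_slope = ols_coefficients(log_gdp, property_rights, democracy)[1]

print(f"slope on property rights, confounder omitted:  {naive_slope:.3f}")
print(f"slope on property rights, confounder included: {adjusted_slope:.3f}")
```

In this setup the first slope comes out clearly positive even though the true effect is zero, while the second is close to zero, which is the sense in which the regression is "picking up the confounder".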

    So that could be driving everything we're seeing in the regression. We've talked about confounders, but another possibility here is something called Reverse Causality.

    What if strong property rights are actually a consequence of high economic development?

    So it's not that having property rights increases economic development; it's the other way around: having strong economic development strengthens property rights.

    For example, maybe rich countries can afford to protect property rights, to pay for the police, the judicial system, etc., to make sure that people can secure what they own.
    If this is a problem, then again we can't figure out the direction of causality just from the regression analysis, from just this correlation we see between property rights and economic development.

    A more general problem that's even more difficult is something called simultaneity: this happens when both directions of causality are operating at the same time.
    Not only are property rights affecting economic development, but economic development is affecting
    property rights: both directions.
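
One way to see why simultaneity breaks a simple regression is a two-equation sketch. The notation is introduced here as an assumption: $Y$ is economic development, $D$ is property rights, $u$ and $v$ are uncorrelated unobserved shocks, and $\alpha\gamma \neq 1$.

$$
Y = \alpha D + u, \qquad D = \gamma Y + v
$$

Substituting the first equation into the second and solving for $D$:

$$
D = \frac{\gamma u + v}{1 - \alpha\gamma}
\quad\Longrightarrow\quad
\operatorname{Cov}(D, u) = \frac{\gamma \operatorname{Var}(u)}{1 - \alpha\gamma}
$$

Whenever $\gamma \neq 0$, the treatment $D$ is correlated with the shock $u$ in the outcome equation, so the regression slope of $Y$ on $D$ mixes the effect $\alpha$ with the feedback from $Y$ to $D$.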

    Regression is a way to summarize the data and to just look at the correlations we see.
    That's the descriptive approach to using regression.



## Regression to Get Causal Effects: 

    We really care about causal effects, not necessarily just correlations, and we know that correlations don't necessarily imply causality.
    So one approach to learning about causality using regressions is to, well, just assume that the correlations actually are causal, that is, that the correlations do imply causality, by assumption.

    Let's think about our definition of causality.
    Remember that we take all the variables that could possibly affect our
    outcome variable, and then we've got our treatment variable.

    We are going to hold everything constant except for the treatment variable and then vary that.
    If we see that our outcome variable changes, then there is a causal effect of the treatment on the outcome variable.

    If we observed every single variable that could affect outcomes, then we could just compare units that had the same values of all these other variables but different values of the treatment variable, and if their outcomes differed then we would know that there's a causal effect.

    Now, of course, in practice the problem is that we usually do not literally observe every single relevant variable.

    And in fact we know that this isn't the case, because if we did, we could just look at two units or two people who had the same values of every variable including the treatment, and they would have to have the same value of the outcome variable too if we've literally observed everything.

    But that never happens.

    We know that in social science there are some variables we're not observing, that we're missing.
    But suppose we've observed enough variables so that, among people who have the same values of these variables, the treatment is as good as randomly assigned. Maybe that's the case.

    That's called the Selection On Observables Assumption or the **Unconfoundedness Assumption**. Confounding happens when we don't observe all the variables that are relevant, and there are unobserved confounding variables that cause changes in the outcome and are correlated with the treatment variable, so that the correlation we see between treatment and outcomes might be picking up the confounders.

    So if we don't have any unobserved confounders, by assumption, then that's the Unconfoundedness Assumption.

    The idea is basically just that we've measured all possible variables that could have been confounders and we observe them. So under this assumption any correlation we see between treatment and outcomes, once
    we've held all these confounders constant, is actually causal.

    When we do a regression, we hold all the other variables constant, and the correlation in the data between the treatment and the outcome that we see through our regression analysis actually is a causal effect under this assumption.

    That's all there is to using regression to learn about causality.
    We just assume that we've observed enough variables that when we condition on them, treatment
    is as good as randomly assigned.
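
As a rough illustration of "conditioning on observed variables", the sketch below compares treated and untreated units within groups that share the same covariate value. Everything in it is an assumption made for the example: a single binary covariate `x`, treatment probabilities of 0.7 and 0.3, and a true treatment effect of 2.0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

x = rng.integers(0, 2, size=n)               # observed binary covariate
p_treat = np.where(x == 1, 0.7, 0.3)         # treatment probability depends only on x
d = rng.random(n) < p_treat                  # treatment: as good as random given x
y = 2.0 * d + 3.0 * x + rng.normal(size=n)   # true treatment effect is 2.0

# Raw comparison of treated and untreated units, ignoring x.
naive = y[d].mean() - y[~d].mean()

# Within each group of x, compare treated and untreated units, then
# average the group differences using each group's population share.
adjusted = 0.0
for value in (0, 1):
    in_group = x == value
    difference = y[in_group & d].mean() - y[in_group & ~d].mean()
    adjusted += difference * in_group.mean()

print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

Here the naive difference is inflated because treated units more often have `x = 1`, while the within-group comparison lands near the true effect. This only works because, by construction, `x` is the only variable driving treatment.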



## Matching Methods:

    Under unconfoundedness a regression is going to give us a causal effect. It's not necessarily going to give us exactly the Average Treatment Effect or the Average Treatment Effect on the Treated, but the reason is more technical and mathematical,
    so we'll just skip it.
    But it will give us a causal effect.
    In addition to just doing ordinary regression, there's a large number of alternative approaches that can be used to compute average treatment effects under the unconfoundedness assumption.

    These are usually called Matching Methods.

    They are very similar to regression.

    The basic idea is we want to compare units that have the same values of the covariates, but different values of the treatment. Because we're assuming the treatment is as good as randomly
    assigned among units with the same covariate values, we want to look for people with the same values of the covariates, but different values of the treatment.

    So how are we going to do this?

    We're going to take a treated unit and then find a non-treated unit that has very, very similar covariate values.
    So they have the same gender, they both have roughly the same amount of education, they both make the same amount of income, etc.
    That pair is called a match. Now what we're going to do is write down the difference in the outcome between the treated unit and the untreated unit in that pair, in that match. And then we're going to make a bunch of pairs like this using all of our data. That's going to give us a list of treated-minus-untreated outcome differences.

    And then we can just take a weighted average of this list, and that's going to give us either the Average Treatment Effect or the Average Treatment Effect on the Treated, depending on how we do the weighting.

    That's basically all there is to matching.
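
The pairing procedure described above can be sketched as follows. This is one possible variant, not the only way to match: it assumes nearest-neighbour matching with replacement on Euclidean distance, two made-up standardized covariates, and a true effect of 1.0 for every unit.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# Two hypothetical covariates (say, standardized education and income).
covariates = rng.normal(size=(n, 2))

# Units with larger covariate values are more likely to be treated.
p_treat = 1.0 / (1.0 + np.exp(-(covariates[:, 0] + covariates[:, 1])))
treated = rng.random(n) < p_treat

# Outcome depends on the covariates plus a treatment effect of 1.0.
outcome = (1.0 * treated + covariates @ np.array([2.0, 1.0])
           + rng.normal(scale=0.5, size=n))

x_t, y_t = covariates[treated], outcome[treated]
x_c, y_c = covariates[~treated], outcome[~treated]

# For every treated unit, find the closest untreated unit in covariate space.
distances = np.linalg.norm(x_t[:, None, :] - x_c[None, :, :], axis=2)
match_index = distances.argmin(axis=1)

# One outcome difference per matched pair; averaging over treated units
# with equal weights targets the average effect on the treated.
pair_differences = y_t - y_c[match_index]
att_estimate = pair_differences.mean()
naive_difference = y_t.mean() - y_c.mean()

print(f"naive difference: {naive_difference:.3f}")
print(f"matched estimate: {att_estimate:.3f}")
```

Because every treated unit gets exactly one match and equal weight, this particular weighting corresponds to the effect on the treated; other weightings would target other averages.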

    There are many different ways to pick the pairs, that is, to decide how we match people.
    But all of them have the same basic idea.

    We want to find people who have similar values of their covariates, who look the same, basically.

    One of the most popular ways of doing this is called propensity score matching. The propensity score is just a single number that summarizes all of a unit's covariates: the probability that the unit receives treatment given its covariates.

    So by first computing this number for everybody, we can then just compare units along that single number and find people who have very similar values of the propensity score.

    And then we do the matching analysis the same as before.

    Match each treated unit with a non-treated unit so that both have very similar values of the propensity score, write down the difference in their outcomes, do that for everybody in our data, and then take the average difference.
    And that will give us our average treatment effect.
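
A minimal sketch of propensity score matching on simulated data follows. The assumptions are stated loudly: the data-generating process is made up, the propensity score is estimated with a small hand-written logistic regression (`fit_logit`, a hypothetical helper using Newton's method), and matching is nearest-neighbour with replacement, averaged over treated units.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

covariates = rng.normal(size=(n, 2))
true_score = 1.0 / (1.0 + np.exp(-(covariates[:, 0] + covariates[:, 1])))
treated = rng.random(n) < true_score
outcome = (1.0 * treated + covariates @ np.array([2.0, 1.0])
           + rng.normal(scale=0.5, size=n))   # true effect is 1.0


def fit_logit(X, d, iterations=25):
    """Logistic regression coefficients via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (d - p)
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Step 1: estimate each unit's propensity score from its covariates.
X = np.column_stack([np.ones(n), covariates])
beta_hat = fit_logit(X, treated.astype(float))
propensity = 1.0 / (1.0 + np.exp(-X @ beta_hat))

# Step 2: match each treated unit to the untreated unit with the closest score.
score_t, y_t = propensity[treated], outcome[treated]
score_c, y_c = propensity[~treated], outcome[~treated]
gaps = np.abs(score_t[:, None] - score_c[None, :])
match_index = gaps.argmin(axis=1)

# Step 3: average the outcome differences across matched pairs.
att_estimate = (y_t - y_c[match_index]).mean()
print(f"propensity-score matched estimate: {att_estimate:.3f}")
```

Matching on one number instead of the full covariate vector makes close matches easier to find, at the cost of relying on the propensity model being reasonable.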



## The Logic of Instrumental Variables:

    Instrumental variables (IV) are used in econometrics to address endogeneity in regression models. If we have endogeneity issues in our model (e.g., due to omitted variable bias), IVs can help us obtain consistent estimates. Valid instruments are variables that are correlated with the endogenous explanatory variable but are not correlated with the error term in the regression equation.


    Experiments require treatments to be randomly assigned to units, and that's often not
    the case.

    Regression analysis requires us to measure every possible confounder, and that's often impossible.
    So what do we do if we can't do an experiment and can't use regression analysis?

             **Instrumental variables analysis.**
             
    It's one of the oldest and most important ways for learning about causality using observational
    data.

    There are six steps involved in doing instrumental variables analysis.
    
    1) We observe a variable, called the instrument, that is correlated with the outcome
    variable.
    Outcome variable: the variable we are trying to affect.
    Treatment variable: the variable whose effect on the outcome we are interested
    in learning.

    Now we have a third variable called the instrument, and when we look at our data it seems
    that the instrument is correlated with the outcome, so units that have higher
    levels of the outcome variable tend to have higher levels of the instrument, for example.
    Or maybe it's negatively correlated: units with higher values of the outcome variable
    tend to have lower values of the instrument.

    2) We assume that the instrument does not have a direct causal effect on the outcome variable,
    so the correlation that we see between the instrument and the outcome is not because
    the instrument has a causal effect on the outcome variable.
    Instead, that correlation is picking up the effect of some confounding variable.

    3) We assume that the instrument does have a causal effect on the treatment
    variable.
    So in step two, we assume the instrument does not have a causal effect on the outcome but
    it does have a causal effect on the treatment.

    4) We assume that the instrument is randomly assigned to units or is as-if randomly
    assigned.

    5) Because of step four, the causal effect of the instrument on the treatment
    variable is their correlation in the data.

    6) Since the instrument is randomly assigned by step four, it is not correlated
    with any other possible confounder except for the treatment.

    We've got this variable called the instrument that's correlated with the outcome.
    The instrument doesn't have a direct causal effect on the outcome, so this correlation is not
    picking up a causal effect of the instrument itself. It's got to be picking up the causal effect of a confounder.
    The instrument does have a causal effect on the treatment, so we might be picking up the
    causal effect of the treatment on the outcome in this correlation, but it could be
    something else.

    But now the instrument is randomly assigned, and because of that it can't be correlated
    with any other confounders except the treatment.

    We've ruled out all possible explanations for the correlation between the instrument
    and the outcome except one: that there is a causal effect of the treatment on the outcome.
    That's what we are trying to get at, and that is the essence of instrumental variables analysis.
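
The argument above can be written compactly. As a sketch, assume a linear model with a constant treatment effect; the symbols $Z$ (instrument), $D$ (treatment), $Y$ (outcome), $\beta$ (the causal effect) and $u$ (everything else that affects the outcome) are notation introduced here.

$$
Y = \alpha + \beta D + u, \qquad \operatorname{Cov}(Z, u) = 0, \qquad \operatorname{Cov}(Z, D) \neq 0
$$

Taking the covariance of both sides of the outcome equation with $Z$:

$$
\operatorname{Cov}(Z, Y) = \beta \operatorname{Cov}(Z, D) + \operatorname{Cov}(Z, u) = \beta \operatorname{Cov}(Z, D)
\quad\Longrightarrow\quad
\beta = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, D)}
$$

The numerator is the instrument-outcome correlation from step 1, the denominator is the instrument-treatment relationship from steps 3 and 5, and $\operatorname{Cov}(Z, u) = 0$ is what steps 2, 4 and 6 are assumed to deliver.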


## The 3 Assumptions of Instrumental Variables

There are three assumptions needed for instrumental variables analysis:

    The first is the Exclusion Restriction, the second is the Relevance Condition and the
    third is the Exogeneity Assumption.

    * Exclusion Restriction:
    
            This says that the instrument cannot have a direct causal effect on the outcome. Why?
            The main idea behind instrumental variables analysis is this:
            We look at the correlation between the instrument and the outcome variable, and that correlation
            tells us about the true effect of the treatment on the outcome variable.

            If there actually was a direct causal effect of the instrument on the outcome variable,
            then we wouldn't be able to separate out that effect from the true effect of the treatment
            on the outcome variable.

            So our solution is to just assume there isn't a direct causal effect of the instrument on
            the outcome variable. That's the exclusion restriction.

    * Relevance Condition:

            It says that the instrument does have a causal effect on the treatment.

            Suppose the instrument didn't have a causal effect on the treatment. Then there
            wouldn't be any correlation between the instrument and the outcome variable, regardless of whether
            the treatment had any causal effect on the outcome or not.

            That's because our next assumption, Exogeneity, says that the instrument is randomly assigned, and under that assumption and the exclusion restriction, the instrument
            would just be totally irrelevant.

            It'd just be a random number we gave to people.
            And so it would have zero correlation with the outcome variable.

            And that's why we need the Relevance Condition.


    * Exogeneity Assumption:

            This says that the instrument is randomly assigned, or as good as randomly assigned. Without it, the correlation we see between the instrument
            and the outcome variable might just reflect
            some unobserved confounder rather than an actual causal effect.
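
Of the three assumptions, the relevance condition is the one that can be inspected directly in data, by regressing the treatment on the instrument (the "first stage"). The sketch below does this on simulated data; the names `instrument` and `treatment` and the coefficient 0.5 are assumptions for illustration. A commonly cited rule of thumb treats a first-stage F statistic below about 10 as a sign of a weak instrument.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000

instrument = rng.normal(size=n)
treatment = 0.5 * instrument + rng.normal(size=n)   # instrument shifts treatment

# First-stage regression: treatment on a constant and the instrument.
X = np.column_stack([np.ones(n), instrument])
coef, *_ = np.linalg.lstsq(X, treatment, rcond=None)

# Classical OLS standard error for the slope.
residuals = treatment - X @ coef
sigma2 = residuals @ residuals / (n - X.shape[1])
covariance = sigma2 * np.linalg.inv(X.T @ X)
t_stat = coef[1] / np.sqrt(covariance[1, 1])

# With a single instrument, the first-stage F statistic is the squared t statistic.
first_stage_f = t_stat ** 2

print(f"first-stage slope: {coef[1]:.3f}")
print(f"first-stage F:     {first_stage_f:.1f}")
```

The exclusion restriction and exogeneity cannot be checked this way in this simple setting; they have to be argued from how the instrument arises.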
            
            


 ### Computing instrumental variables
    Suppose we have a scenario where we're studying the effect of education (endogenous variable) on income (dependent variable), but we suspect endogeneity due to omitted variables. Here's an example where we believe that the number of books in a household could be a good instrumental variable for education.

    import numpy as np
    import statsmodels.api as sm
    import pandas as pd

    # Simulated data
    np.random.seed(42)
    n = 1000

    # True coefficients
    beta_education = 3.0
    beta_books = 5.0

    # Education: drawn independently of Books and of the error term,
    # so in this simulation it is not actually endogenous
    education = np.random.normal(10, 2, n)

    # Books: a second regressor that affects Income directly
    books = np.random.normal(100, 20, n)

    # Error term
    error = np.random.normal(0, 5, n)

    # Dependent variable (Income)
    income = beta_education * education + beta_books * books + error

    data = {'Education': education, 'Books': books, 'Income': income}
    df = pd.DataFrame(data)


    df.head()

|   | Education | Books | Income |
|---|-----------|------------|------------|
| 0 | 10.993428 | 127.987109 | 669.539937 |
| 1 | 9.723471 | 118.492674 | 620.911189 |
| 2 | 11.295377 | 101.192607 | 535.887069 |
| 3 | 13.04606 | 87.061264 | 472.904694 |
| 4 | 9.531693 | 113.964466 | 588.949338 |


    # Ordinary least squares regression of Income on Education and Books.
    # Note: this is plain OLS, not an instrumental variables estimator.
    ols_regression = sm.OLS(df['Income'], sm.add_constant(df[['Education', 'Books']])).fit()

    print(ols_regression.summary())


                                OLS Regression Results                            
    Dep. Variable:                 Income   R-squared:                       0.998
    Model:                            OLS   Adj. R-squared:                  0.998
    Method:                 Least Squares   F-statistic:                 2.048e+05
    Date:                Wed, 13 Dec 2023   Prob (F-statistic):               0.00
    Time:                        03:10:36   Log-Likelihood:                -3010.9
    No. Observations:                1000   AIC:                             6028.
    Df Residuals:                     997   BIC:                             6043.
    Df Model:                           2                                         
    Covariance Type:            nonrobust                                         
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const         -0.2600      1.158     -0.225      0.8

   
    In this example, Income is the dependent variable and Education and Books are both included as regressors. The sm.OLS function from statsmodels fits an ordinary least squares regression, so Books acts as a control variable here rather than as an instrument, and the result is not an instrumental variables estimate.

    The output provides the estimated coefficients, their standard errors, t-values, and other statistical information.

    Interpretation of the coefficients:

    The coefficient for Education is estimated to be approximately 2.8915, close to the value of 3.0 used in the simulation: a one-unit increase in education is associated with an increase in income of around 2.8915 units, holding Books constant.
    The coefficient for Books is estimated to be approximately 5.0044, close to the value of 5.0 used in the simulation: one additional book in a household is associated with an increase in income of around 5.0044 units, holding Education constant.
    The p-values associated with both coefficients are 0.000, indicating statistical significance.


    For a valid instrumental variables estimate, the instrument must meet the necessary assumptions, such as relevance (correlation with the endogenous variable) and exogeneity (no correlation with the error term). In the simulated data above, Books is generated independently of Education and enters the Income equation directly, so it would satisfy neither relevance nor the exclusion restriction.
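
For contrast, here is a minimal sketch of what an instrumental variables estimate could look like, using two-stage least squares written out by hand with NumPy. The simulation is an assumption made for this example and is separate from the data above: `ability` is an unobserved confounder, `instrument` is a hypothetical variable that shifts education but has no direct effect on income, and the true effect of education is 3.0.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
true_effect = 3.0

ability = rng.normal(size=n)       # unobserved confounder
instrument = rng.normal(size=n)    # hypothetical instrument, independent of ability
education = 10 + 1.0 * instrument + 1.0 * ability + rng.normal(size=n)
income = true_effect * education + 4.0 * ability + rng.normal(size=n)


def ols(y, x):
    """Intercept and slope from a least-squares fit of y on x."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Plain OLS is biased because ability affects both education and income.
ols_slope = ols(income, education)[1]

# Stage 1: predict education from the instrument.
intercept_1, slope_1 = ols(education, instrument)
education_hat = intercept_1 + slope_1 * instrument

# Stage 2: regress income on the predicted education.
tsls_slope = ols(income, education_hat)[1]

print(f"OLS slope:  {ols_slope:.3f}")
print(f"2SLS slope: {tsls_slope:.3f}")
```

The point estimate from the second stage is the IV estimate, but the standard errors reported by a naive second-stage OLS are not valid; a dedicated IV routine should be used for inference.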

## LATE: The Local Average Treatment Effect: 

    The Local Average Treatment Effect is defined when the instrument and the treatment
    are both binary variables.

    That means they can take on two different values.

    For example, suppose we're looking at the effect of education on wages, and in our dataset we have a variable that says whether someone went to college or not, and another variable that says whether there was a college in that person's home town or not.

    That's the case where the local average treatment effect is defined.

    And this is very common in practice: having a binary instrument and a binary treatment.

    In general, when we have treatments or instruments that take on more than just two values, instrumental
    variables analysis will still apply, but things get a little bit more complicated.

    We'll focus on just the binary treatment and the binary instrument case where we're going to be able to learn about all the main ideas and all the intuition.

    Application to randomized experiments with non-compliance:

    That's a very helpful setting for understanding how the local average treatment effect is
    defined.
    In this case, our instrument is treatment assignment: whether we were assigned to receive treatment or not.

    That treatment assignment is an instrument for actually receiving treatment, because there's non-compliance: people who are assigned to get treated don't necessarily get treated.

    Now what we're going to do is take our population of people, and split them into four distinct
    types, depending on how they react to their treatment assignment.

    The first group is called Always Takers.
    These people always receive treatment, regardless of whether they were assigned to; that is, regardless of their instrument value, they always get treatment.

    The second group is called Never Takers.
    These people never get treated, regardless of their treatment assignment.

    The third group is called Compliers.
    These people receive treatment if and only if they're assigned to the treatment group.
    So they comply with their treatment assignment; that's why we call them compliers.

    The fourth group is called Defiers.
    They get treated if and only if they are not assigned to the treatment group.
    So they do not comply with the treatment assignment.
    They instead defy their treatment assignment.

    So every single person can be put in one of these four groups.
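
The four groups are just the four possible combinations of two yes/no answers: would this person take the treatment if not assigned, and would they take it if assigned? A small sketch, with the function and argument names chosen here for illustration:

```python
def compliance_type(treated_if_not_assigned: bool, treated_if_assigned: bool) -> str:
    """Classify a unit by its two potential treatment decisions."""
    if treated_if_not_assigned and treated_if_assigned:
        return "Always Taker"
    if not treated_if_not_assigned and not treated_if_assigned:
        return "Never Taker"
    if treated_if_assigned:
        return "Complier"   # treated only when assigned
    return "Defier"         # treated only when not assigned


# Enumerate all four combinations.
for d0 in (False, True):
    for d1 in (False, True):
        print(f"if not assigned: {d0!s:5}  if assigned: {d1!s:5}  ->  {compliance_type(d0, d1)}")
```

In real data only one of the two arguments is ever observed for a given person, which is why the function can be written down but not applied unit by unit.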

    Now we can define the Local Average Treatment Effect.

    It's just the average of the unit level causal effects for the Compliers.
    So we've got our population, we take our Compliers and we put them over here, then our Defiers,
    our Never Takers, and Always Takers we'll just put over there and ignore them
    for now.

    Among our Compliers, each person has a unit level causal effect, so we just take the average
    of all of those numbers for those people. That's all that the Local Average Treatment Effect is.
    So it's very similar to the Average Treatment Effect on the Treated and the Conditional Average
    Treatment Effect.

    Those also were just average treatment effects for a subset of units from the whole population. The Average Treatment Effect on the Treated was just the average unit level causal effect for the people who were treated, and the Conditional Average Treatment Effect, for example for men, was just the average unit level causal effect only for the men. So the Local Average Treatment Effect is very similar: it's just the average of the unit level causal effects for the Compliers.

    But we don't know who the Compliers are, and that's because the definition of what makes somebody a Complier depends on two potential outcomes of the treatment variable. It depends on what they would do if they were assigned to be treated, and what they would do if they were assigned not to be treated. And because of the Fundamental Problem of Causal Inference, we can never observe both of these potential outcomes at the same time.

    That's because for each person we either assign them to be treated, or assign them not to be treated.
    We can't have it both ways.

    So we just don't know who is and who isn't a Complier.
    That makes thinking conceptually about the Local Average Treatment Effect a little bit harder, because this particular subset of people we're looking at is a little more complicated than just thinking about the set of men, or the set of people who were treated.

    But it's still the same fundamental idea: an average unit level causal effect for a particular kind of people.
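
Even though individual Compliers cannot be identified, their average effect can be estimated from group averages. The sketch below simulates an experiment with non-compliance and computes the ratio of the assignment-outcome difference to the assignment-treatment difference (often called the Wald estimator). It rests on assumptions added here: there are no Defiers in the simulated population, the type shares are 20/30/50 percent, and the unit level effects are 5.0, 0.5 and 2.0 by type.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

# Latent compliance types (never observed in real data); no Defiers by assumption.
types = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.3, 0.5])
assigned = rng.integers(0, 2, size=n)     # randomly assigned instrument

# Treatment actually received depends on type and assignment.
treated = np.where(types == "always", 1,
                   np.where(types == "never", 0, assigned))

# Unit level causal effects differ across types.
effect = np.where(types == "complier", 2.0,
                  np.where(types == "always", 5.0, 0.5))
baseline = rng.normal(size=n) + 1.0 * (types == "always")
outcome = baseline + effect * treated

# Effect of assignment on the outcome, and of assignment on treatment received.
outcome_gap = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()
treatment_gap = treated[assigned == 1].mean() - treated[assigned == 0].mean()
late_estimate = outcome_gap / treatment_gap

true_late = effect[types == "complier"].mean()
print(f"share of compliers (estimated): {treatment_gap:.3f}")
print(f"LATE estimate:                  {late_estimate:.3f}")
print(f"true complier average effect:   {true_late:.3f}")
```

The estimate tracks the Compliers' average effect rather than the effects for Always Takers or Never Takers, because assignment changes the treatment status of Compliers only.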

### References:

    
* [The Book of Why](http://repo.darmajaya.ac.id/4847/1/The%20book%20of%20why_%20the%20new%20science%20of%20cause%20and%20effect%20%28%20PDFDrive%20%29.pdf)

            
* [Netflix Engineering Blog](https://netflixtechblog.com/causal-machine-learning-for-creative-insights-4b0ce22a8a96)
            

* [Brady Neal's Causal Inference](https://www.bradyneal.com/causal-inference-course)
        

* [Matt Masten Causal Inference Bootcamp](https://mattmasten.github.io/bootcamp/)


********************************************************************************************************************
    MIT License

    Copyright (c) 2023 Sai Sumana Puppala

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
    
********************************************************************************************************************

<!-- What does it mean for humans to cause global warming?

Examples:

* Simple cause and effect:

* Decision to eat healthy 

Questions of Causality:

* 

* 

Claim that if we intervene and change something, some specific outcome is going to happen. If I hit the snooze button, the alarm is going to go off. That's a causal claim. 
Causality: when we manipulate a policy, what's going to happen to the outcome. How change in policy creates changes in real world outcomes.

How do we answer these claims quantitatively?
Quantitative Analysis of causality:

outcome variables: characteristics that we want to affect
policy/treatment variable: characteristic we will use to create changes
 -->