# Interactions

```python
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pymc3 as pm
import statsmodels.api as smf
import arviz as az
from scipy import stats

import warnings
warnings.filterwarnings("ignore")
```

### Easy

###### 7E1

For each of the causal relationships below, name a hypothetical third variable that would lead to an interaction effect.  
1. Bread dough rises because of yeast
2. Education leads to higher income
3. Gasoline makes a car go

###### 7E2
Which of the following explanations invokes an interaction?  

1. Caramelizing onions requires cooking over low heat and making sure the onions do not dry out.
1. A car will go faster when it has more cylinders or when it has a better fuel injector.
1. Most people acquire their political beliefs from their parents, unless they get them instead from their friends.
1. Intelligent animal species tend to be either highly social or have manipulative appendages (hands, tentacles, etc.).

###### 7E3

For each of the explanations in 7E2, write a linear model that expresses the stated relationship.

### Medium

###### 7M1

Recall the tulips example from the chapter. Suppose another set of treatments adjusted the temperature in the greenhouse over two levels: cold and hot. The data in the chapter were collected at the cold temperature. You find that none of the plants grown under the hot temperature developed blooms at all, regardless of the water and shade levels. Can you explain this result in terms of interactions between water, shade, and temperature?

###### 7M2

Can you invent a regression equation that would make the bloom size zero, whenever the temperature is hot?

###### 7M3

In parts of North America, ravens depend upon wolves for their food. This is because ravens are carnivorous but cannot usually kill or open carcasses of prey. Wolves, however, can and do kill and tear open animals, and they tolerate ravens co-feeding at their kills. This species relationship is generally described as a "species interaction". Can you invent a hypothetical set of data on raven population size in which this relationship would manifest as a statistical interaction? Do you think the biological interaction could be linear? Why or why not?

### Hard

###### 7H1

Return to the `data(tulips)` example in the chapter. Now include the `bed` variable as a predictor in the interaction model. Don't interact `bed` with the other predictors; just include it as a main effect. Note that `bed` is categorical. So to use it properly, you will need to construct either dummy variables or an index variable, as explained in Chapter 6.
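
As a minimal sketch of the two encodings mentioned above, assume the tulips data sits in a pandas DataFrame with a `bed` column whose levels are the strings `"a"`, `"b"`, and `"c"`. The tiny frame below is made up for illustration and is not the real data; the names `bed_idx` and `dummies` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the tulips data; only the `bed` column matters here.
d = pd.DataFrame({"bed": ["a", "a", "b", "b", "c", "c"]})

# Index variable: one integer code per bed level (0, 1, 2).
d["bed_idx"] = pd.Categorical(d["bed"]).codes

# Dummy variables: one 0/1 column per level, dropping the first level as the baseline.
dummies = pd.get_dummies(d["bed"], prefix="bed", drop_first=True).astype(int)

print(d["bed_idx"].tolist())
print(dummies.columns.tolist())
```

With the index version, a model would typically hold one intercept per bed in a vector and select from it with `bed_idx`; with the dummy version, each column gets its own coefficient relative to the baseline bed.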

###### 7H2

Use WAIC to compare the model from **7H1** to a model that omits `bed`. What do you infer from this comparison? Can you reconcile the WAIC results with the posterior distribution of the `bed` coefficients?
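
For reference, WAIC on the deviance scale can be computed from a matrix of pointwise log-likelihoods as $-2(\text{lppd} - p_{\text{WAIC}})$. The sketch below is a plain NumPy illustration under that definition, not the notebook's own method; in practice ArviZ's `az.waic` and `az.compare` would normally do this work, and their default scale may differ. The helper name `waic` and the simulated draws are assumptions made for the example.

```python
import numpy as np

def waic(log_lik):
    """WAIC (deviance scale) from log_lik of shape (n_samples, n_obs)."""
    n_samples = log_lik.shape[0]
    # lppd: log of the average likelihood per observation, via a stable log-sum-exp.
    m = log_lik.max(axis=0)
    lppd = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_samples)
    # Effective number of parameters: variance of the log-likelihood across samples.
    p_waic = log_lik.var(axis=0, ddof=1)
    return -2.0 * (lppd.sum() - p_waic.sum())

# Made-up observations and stand-in "posterior" draws of a Normal mean (sigma fixed at 1).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=20)
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=1000)

# Normal(mu, 1) log-density of every observation under every draw.
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
print(waic(log_lik))
```

Lower values indicate better expected out-of-sample fit, so the comparison in this problem comes down to the difference between the two models' WAIC values relative to its standard error.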

###### 7H3

Consider again the `data(rugged)` data on economic development and terrain ruggedness, examined in this chapter. One of the African countries in that example, Seychelles, is far outside the cloud of other nations, being a rare country with both relatively high GDP and high ruggedness. Seychelles is also unusual in that it is a group of islands far from the coast of mainland Africa, and its main economic activity is tourism.  

One might suspect that this one nation is exerting a strong influence on the conclusions. In this problem, I want you to drop Seychelles from the data and re-evaluate the hypothesis that the relationship of African economies with ruggedness is different from that on other continents.  
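
The data preparation described above can be sketched in pandas as follows. This assumes the rugged data has a `country` column holding the nation's name (an assumption, since only `rgdppc_2000`, `cont_africa`, and `rugged` are named in this problem). The rows and values below are placeholders for illustration, not real data, and `log_gdp` and `afr_x_rugged` are hypothetical column names.

```python
import numpy as np
import pandas as pd

# Placeholder rows standing in for the rugged data; the numbers are invented.
d = pd.DataFrame({
    "country": ["Seychelles", "Country B", "Country C", "Country D"],
    "cont_africa": [1, 1, 0, 0],
    "rugged": [4.0, 1.0, 2.0, 3.0],
    "rgdppc_2000": [1000.0, 2000.0, 3000.0, 4000.0],
})

# Outcome: log GDP per capita in the year 2000.
d["log_gdp"] = np.log(d["rgdppc_2000"])

# Drop Seychelles; .copy() avoids modifying a view of the original frame.
d_no_sey = d[d["country"] != "Seychelles"].copy()

# Interaction column A * R used by the interaction model.
d_no_sey["afr_x_rugged"] = d_no_sey["cont_africa"] * d_no_sey["rugged"]

print(d_no_sey[["country", "cont_africa", "rugged", "afr_x_rugged"]])
```

Fitting the same model to both `d` and `d_no_sey` then gives the with-and-without comparison the problem asks for.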

1. Begin by using `map` to fit just the interaction model:  
$$
\begin{align}
y_i &\sim \text{Normal($\mu_i$, $\sigma$)} \\
\mu_i &= \alpha + \beta_AA_i + \beta_RR_i + \beta_{AR}AR_i
\end{align}
$$  
where *y* is log GDP per capita in the year 2000 (log of `rgdppc_2000`); *A* is `cont_africa`, the dummy variable for being an African nation; and *R* is the variable `rugged`. Choose your own priors. Compare the inference from this model fit to the data without Seychelles to the same model fit to the full data. Does it seem like the effect of ruggedness depends upon continent? How much has the expected relationship changed?

2. Now plot the predictions of the interaction model, with and without Seychelles. Does it still seem like the effect of ruggedness depends upon continent? How much has the expected relationship changed?  

3. Finally, conduct a model comparison analysis, using WAIC. Fit three models to the data without Seychelles:  

$$
\begin{align}
\text{Model 1}: y_i &\sim \text{Normal($\mu_i$, $\sigma$)} \\
\mu_i &= \alpha + \beta_RR_i\\
\text{Model 2}: y_i &\sim \text{Normal($\mu_i$, $\sigma$)} \\
\mu_i &= \alpha + \beta_AA_i + \beta_RR_i \\
\text{Model 3}: y_i &\sim \text{Normal($\mu_i$, $\sigma$)} \\
\mu_i &= \alpha + \beta_AA_i + \beta_RR_i + \beta_{AR}AR_i
\end{align}
$$

Use whatever priors you think are sensible. Plot the model-averaged predictions of this model set. Do your inferences differ from those in part 2? Why or why not?
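
One common way to build model-averaged predictions is to convert WAIC differences into Akaike-style weights, $w_i = \exp(-\tfrac{1}{2}\Delta_i) / \sum_j \exp(-\tfrac{1}{2}\Delta_j)$, and then combine each model's predictions in those proportions. The sketch below uses toy WAIC values and toy predicted means, neither of which are real results, and the helper `waic_weights` is a hypothetical name. It averages predicted means directly; a fuller approach would mix posterior predictive draws in the same proportions.

```python
import numpy as np

def waic_weights(waics):
    """Akaike-style weights from a sequence of WAIC values (lower is better)."""
    waics = np.asarray(waics, dtype=float)
    delta = waics - waics.min()      # difference from the best model
    rel = np.exp(-0.5 * delta)       # relative plausibility of each model
    return rel / rel.sum()           # normalize so the weights sum to 1

# Toy WAIC values for Models 1-3 (illustrative only).
w = waic_weights([104.0, 102.0, 100.0])

# Toy predicted means from each model at the same three ruggedness values.
mu_by_model = np.array([
    [8.0, 8.0, 8.0],
    [8.5, 8.4, 8.3],
    [9.0, 8.8, 8.6],
])

# Model-averaged mean prediction: weighted sum over models.
mu_avg = w @ mu_by_model
print(w, mu_avg)
```

When one model carries nearly all the weight, the averaged predictions will look almost identical to that model's own predictions.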

###### 7H4

The values in `data(nettle)` are data on language diversity in 74 nations. The meaning of each column is given below.   

1. `country`: Name of the country.
1. `num.lang`: Number of recognized languages spoken.
1. `area`: Area in square kilometers.
1. `k.pop`: Population, in thousands.
1. `num.stations`: Number of weather stations that provided data for the next two columns.
1. `mean.growing.season`: Average length of growing season, in months.
1. `sd.growing.season`: Standard deviation of length of growing season, in months.


Use these data to evaluate the hypothesis that language diversity is partly a product of food security. The notion is that, in productive ecologies, people don't need large social networks to buffer them against the risk of food shortfalls. This means that ethnic groups can be smaller and more self-sufficient, leading to more languages per capita. In contrast, in a poor ecology, there is more subsistence risk, and so human societies have adapted by building larger networks of mutual obligation to provide food insurance. This in turn creates social forces that help prevent languages from diversifying.  

Specifically, you will try to model the number of languages per capita as the outcome variable:  
```
d$lang.per.cap  <- d$num.lang / d$k.pop
```  

Use the logarithm of this new variable as your regression outcome. (A count model would be better here, but you'll learn those later, in Chapter 10.)  
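
The snippet above is R. In this notebook's Python setting, the same two steps might look like the sketch below, assuming the nettle data is loaded into a pandas DataFrame `d` that keeps the original dotted column names. The two rows are made up for illustration, and `log_lpc` is a hypothetical name for the log outcome.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the nettle data.
d = pd.DataFrame({"num.lang": [10, 50], "k.pop": [1000, 2500]})

# Languages per capita (population is in thousands, as in the data description).
d["lang.per.cap"] = d["num.lang"] / d["k.pop"]

# Log of languages per capita, used as the regression outcome.
d["log_lpc"] = np.log(d["lang.per.cap"])

print(d)
```

Because the column names contain dots, bracket access such as `d["k.pop"]` is needed rather than attribute access.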

This problem is open ended, allowing you to decide how you address the hypothesis and the uncertain advice the modeling provides. If you think you need to use WAIC any place, please do. If you think you need certain priors, argue for them. If you think you need to plot predictions in a certain way, please do. Just try to honestly evaluate the main effects of both `mean.growing.season` and `sd.growing.season`, as well as their two-way interaction, as outlined in parts 1, 2, and 3 below. If you are not sure which approach to use, try several.  

1. Evaluate the hypothesis that language diversity, as measured by `log(lang.per.cap)`, is positively associated with the average length of the growing season, `mean.growing.season`. Consider `log(area)` in your regression(s) as a covariate (not an interaction). Interpret your results.

1. Now evaluate the hypothesis that language diversity is negatively associated with the standard deviation of length of growing season, `sd.growing.season`. The hypothesis follows from uncertainty in harvest favoring social insurance through larger social networks and therefore fewer languages. Again, consider `log(area)` as a covariate (not an interaction). Interpret your results.  

1. Finally, evaluate the hypothesis that `mean.growing.season` and `sd.growing.season` interact to synergistically reduce language diversity. The idea is that, in nations with longer average growing seasons, high variance makes storage and redistribution even more important than it would be otherwise. That way, people can cooperate to preserve and protect windfalls to be used during the droughts. These forces in turn may lead to greater social integration and fewer languages.