# Seasons 2.0
## Combining individual-level and group-level data
### Demonstration of the `mepoisson` command in Stata; ultimately translate to R

#### 1

```stata
set timeout1 1000
//global data "https://github.com/muzaale/got/raw/main/act_5/act_5_9/"
//use $data, clear
cd "~/dropbox/1f.ἡἔρις,κ/1.ontology"
use donor_live_keep_v, clear
if 0 { //no need to collapse & merge with this syntax!
    recode don_relation (1/4=1)(5/999=0), gen(related)
    recode don_race (8=1)(16=2)(2000=3)(64=4)(24/1500 .=5), gen(racecat)
    
    gen year = year(don_recov_dt)
    gen month = month(don_recov_dt)
    
    tab year 
    
    egen n_year = count(don_recov_dt), by(year)
    egen n_month = count(don_recov_dt), by(year month)
    
    poisson n_year year
    poisson n_month year month
    
    tab month, matcell(cellvalues)
    
    local freq_jan = cellvalues[1,1]
    local freq_jun = cellvalues[6,1]
    local ratio = `freq_jun' / `freq_jan'
    
    di `ratio'
    
    gen count = 1
    
    recode month (1/5 9/12=0)(6/8=1), gen(summer)
}

* We have individual-level data to this point
preserve
    collapse (count) count=pers_id, by(summer)
    di (count[2]/3)/(count[1]/9) //back-of-envelope irr
restore 
preserve 
    * Collapse the data to get the total count for each group
    collapse (count) donations=pers_id, by(year month summer)
    //twoway (scatter count year)
    * Save the aggregated data
    tempfile aggregated_data
    save `aggregated_data'
restore 

* Merge the aggregated count back to the individual-level data
merge m:1 year month summer using `aggregated_data'

* You now have both group-level and individual-level variables as predictors
* Run the single-level Poisson model first to get starting estimates
poisson donations summer related, irr iter(5)

* Capture the estimates
matrix start_vals = e(b)

if 0 {
    * from() expects a labeled row vector; (0.1 \ 0.2 \ 0.3) would build an
    * unlabeled column vector, so reuse start_vals = e(b) captured above

    * Use the from() option to start the optimization from these values;
    * skip ignores coefficients in start_vals that are not in this model
    mepoisson donations || summer:, irr from(start_vals, skip) iter(3)

}


```
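
For readers following along in Python, the collapse-and-merge step above can be sketched with pandas. This is a minimal illustration rather than a translation of the full script: the toy rows are invented, and only the column names (`pers_id`, `year`, `month`, `summer`, `donations`) are taken from the Stata code.

```python
import pandas as pd

# Hypothetical toy data: one row per donor, column names borrowed from the Stata script
df = pd.DataFrame({
    "pers_id": range(1, 9),
    "year":  [2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001],
    "month": [1, 1, 6, 6, 6, 1, 7, 7],
})

# Same coding as `recode month (1/5 9/12=0)(6/8=1), gen(summer)`
df["summer"] = df["month"].between(6, 8).astype(int)

keys = ["year", "month", "summer"]

# Analogue of `collapse (count) donations=pers_id, by(year month summer)`
aggregated = df.groupby(keys, as_index=False).agg(donations=("pers_id", "count"))

# Analogue of `merge m:1 year month summer using ...`;
# validate="m:1" raises an error if the keys are not unique in `aggregated`
merged = df.merge(aggregated, on=keys, how="left", validate="m:1")

# One-step alternative, similar in spirit to `egen ... = count(), by()`
df["donations"] = df.groupby(keys)["pers_id"].transform("count")

print(merged[["pers_id", "year", "month", "summer", "donations"]])
```

The `transform` line yields the same per-row counts without an explicit merge, which mirrors the "no need to collapse & merge" remark in the disabled block.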

#### 2

```stata
. do "/var/folders/sx/fd6zgj191mx45hspzbgwzlnr0000gn/T//SD16143.000000"

. set timeout1 1000

. //global data "https://github.com/muzaale/got/raw/main/act_5/act_5_9/"
. //use $data, clear
. cd "~/dropbox/1f.ἡἔρις,κ/1.ontology"
/Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology

. use donor_live_keep_v, clear

. if 0 { //no need to collapse & merge
.     recode don_relation (1/4=1)(5/999=0), gen(related)
.     recode don_race (8=1)(16=2)(2000=3)(64=4)(24/1500 .=5), gen(racecat)
.     
.     gen year = year(don_recov_dt)
.     gen month = month(don_recov_dt)
.     
.     tab year 
.     
.     egen n_year = count(don_recov_dt), by(year)
.     egen n_month = count(don_recov_dt), by(year month)
.     
.     poisson n_year year
.     poisson n_month year month
.     
.     tab month, matcell(cellvalues)
.     
.     local freq_jan = cellvalues[1,1]
.     local freq_jun = cellvalues[6,1]
.     local ratio = `freq_jun' / `freq_jan'
.     
.     di `ratio'
.     
.     gen count = 1
.     
.     recode month (1/5 9/12=0)(6/8=1), gen(summer)
. }

. 
. * We have individual-level data to this point
. preserve

.     collapse (count) count=pers_id, by(summer)

.     di (count[2]/3)/(count[1]/9) //back-of-envelope irr
1.141058

. restore 

. preserve 

.     * Collapse the data to get the total count for each group
.     collapse (count) donations=pers_id, by(year month summer)

.     //twoway (scatter count year)
.     * Save the aggregated data
.     tempfile aggregated_data

.     save `aggregated_data'
file /var/folders/sx/fd6zgj191mx45hspzbgwzlnr0000gn/T//S_16143.000003 saved as .dta format

. restore 

. 
. * Merge the aggregated count back to the individual-level data
. merge m:1 year month summer using `aggregated_data'

Result                      Number of obs

Not matched                             0
Matched                           186,545  (_merge==3)


. 
. * Now, you can perform a regression with the count as the dependent variable, and both group-level and individual-level variables as predictors
. //poisson donations summer related, irr maxiter(5)
. mepoisson donations || summer:, irr maxiter(3)
option maxiter() not allowed
r(198);

end of do-file

r(198);

. do "/var/folders/sx/fd6zgj191mx45hspzbgwzlnr0000gn/T//SD16143.000000"

. poisson donations summer related, irr iter(5)

Iteration 0:  Log likelihood = -3695885.2  
Iteration 1:  Log likelihood = -3695885.2  

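Poisson regression                                   Number of obs =   184,671
                                                     LR chi2(2)    = 678618.06
                                                     Prob > chi2   =    0.0000
Log likelihood = -3695885.2                          Pseudo R2     =    0.0841

------------------------------------------------------------------------------
   donations |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      summer |   1.140067   .0002599   574.93   0.000     1.139558    1.140577
     related |   .8815915   .0001849  -600.86   0.000     .8812292     .881954
       _cons |   506.0029   .0827662  3.8e+04   0.000     505.8407    506.1651
------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.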

. 
end of do-file

. do "/var/folders/sx/fd6zgj191mx45hspzbgwzlnr0000gn/T//SD16143.000000"

. mepoisson donations || summer:, irr iter(3)

Fitting fixed-effects model:

Iteration 0:  Log likelihood = -4216081.9  
Iteration 1:  Log likelihood = -4146341.1  
Iteration 2:  Log likelihood = -4146315.8  
Iteration 3:  Log likelihood = -4146315.8  

Refining starting values:

Grid node 0:  Log likelihood =          .
Grid node 1:  Log likelihood =          .
Grid node 2:  Log likelihood =          .
Grid node 3:  Log likelihood = -3985497.5

Refining starting values (unscaled likelihoods):

Grid node 0:  Log likelihood = -3985497.5

Fitting full model:

Iteration 0:  Log likelihood = -3985497.5  
Iteration 1:  Log likelihood = -3985497.5  (not concave)
Iteration 2:  Log likelihood = -3985497.5  (not concave)
Iteration 3:  Log likelihood = -3985497.5  (not concave)
convergence not achieved

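Mixed-effects Poisson regression                Number of obs     =    186,545
Group variable: summer                          Number of groups  =          2

                                                Obs per group:
                                                              min =     51,402
                                                              avg =   93,272.5
                                                              max =    135,143

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(0)      =          .
Log likelihood = -3985497.5                     Prob > chi2       =          .
------------------------------------------------------------------------------
   donations |  Inc. rate   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   491.1811   .3627498  8390.79   0.000     490.4706    491.8926
-------------+----------------------------------------------------------------
summer       |
   var(_cons)|   10.00109          .                             .           .
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to incidence rate.
Warning: Convergence not achieved.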

. 
end of do-file



```
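
The back-of-envelope IRR printed in the log (`1.141058`) can be reproduced from the group sizes that `mepoisson` reports under "Obs per group". The sketch below assumes the smaller group (51,402) is summer, which spans 3 months, and the larger group (135,143) is the remaining 9 months; the variable names are illustrative.

```python
# Group sizes taken from the mepoisson log ("Obs per group": min and max).
# Assumption: the smaller group is summer (3 months), the larger is the other 9 months.
n_summer = 51_402
n_other = 135_143

# Donations per month within each season, then their ratio
rate_summer = n_summer / 3
rate_other = n_other / 9
irr = rate_summer / rate_other

print(f"{irr:.6f}")  # 1.141058
```

This crude ratio is close to the `summer` IRR of 1.140067 from the single-level `poisson` fit, which additionally adjusts for `related`.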

---

#### 3

If your single-level Poisson model gives reasonable results, you can use its estimates as starting values for the corresponding parameters in the multilevel model. `mepoisson` accepts the standard maximization option `from()`, but it requires a properly labeled vector of initial values, so pass the coefficient matrix itself rather than a bare list of numbers.

Here's an example in Stata. (The `if 0 { }` wrapper in your script only disables the enclosed block; change it to `if 1` to run the block, or remove the wrapper.)

```stata
* Run the single-level Poisson model first to get starting estimates
poisson donations summer related, irr iter(5)

* Capture the estimates
matrix start_vals = e(b)

* Display the estimates; just for checking
di "Starting values from the single-level model:"
matrix list start_vals

* Run the multi-level model, starting from the single-level estimates;
* skip ignores coefficients in start_vals that this model does not have
mepoisson donations || summer:, irr from(start_vals, skip) iter(3)
```

In your code, you merge the aggregated counts back onto the individual-level data. After that merge, `donations` is constant within each year-month cell and is repeated once per donor, so the model treats every donor record as an independent observation of the same count. That inflates the effective sample size and shrinks the standard errors (note the z statistics above 500 in your `poisson` output), so keep it in mind when interpreting the estimates. Fitting on the collapsed data avoids the problem, at the cost of dropping individual-level predictors such as `related`.
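
To see how merging cell counts back onto individual rows can affect a Poisson fit, here is a small sketch in plain Python. It relies on a standard closed-form result: with a single binary covariate, the Poisson MLE of the IRR is the ratio of group means, and the model-based standard error of the log IRR is `sqrt(1/Y1 + 1/Y0)`, where `Y1` and `Y0` are the summed outcomes in each group. The monthly cell counts are hypothetical and the function name is illustrative.

```python
import math

# Hypothetical monthly cell counts (illustrative only, not from the donor data)
summer_cells = [120, 130, 125]
other_cells = [100, 110, 105, 95, 100, 108, 102, 98, 104]

def log_irr_and_se(y_summer, y_other):
    """Closed-form Poisson MLE for one binary covariate.

    y_summer, y_other: outcome values for the rows in each group.
    Returns (log IRR, model-based SE of log IRR).
    """
    mean_s = sum(y_summer) / len(y_summer)
    mean_o = sum(y_other) / len(y_other)
    log_irr = math.log(mean_s / mean_o)
    se = math.sqrt(1 / sum(y_summer) + 1 / sum(y_other))
    return log_irr, se

# Collapsed data: one row per year-month cell
collapsed = log_irr_and_se(summer_cells, other_cells)

# Merged data: each cell's count repeated once per donor in that cell
rep_summer = [c for c in summer_cells for _ in range(c)]
rep_other = [c for c in other_cells for _ in range(c)]
merged = log_irr_and_se(rep_summer, rep_other)

print(f"collapsed: IRR={math.exp(collapsed[0]):.3f}, SE(log IRR)={collapsed[1]:.4f}")
print(f"merged:    IRR={math.exp(merged[0]):.3f}, SE(log IRR)={merged[1]:.4f}")
```

In this toy setup the point estimates stay close, but the standard error from the replicated rows is much smaller, purely because each count is entered many times.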

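If convergence continues to be a problem, note first that `summer` defines only two groups, which leaves very little information for estimating a random-intercept variance; your log shows `Number of groups = 2` and repeated `(not concave)` iterations. Grouping on a variable with more levels, such as `year`, may help. Beyond that, Stata's maximization options (for example `difficult` or a different `technique()`) are worth trying, though they are fairly technical and are generally used when other options have failed.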

Remember to interpret the results carefully, especially when there are convergence issues, as they may not be reliable.

---

I'm open to suggestions, including Python and machine learning.

---

If you are interested in Python-based solutions, the `statsmodels` package provides classes for estimating many statistical models, including Poisson and negative binomial regression. The `scikit-learn` package also provides a `PoissonRegressor` class for fitting Poisson regression models.

For hierarchical/multilevel models, the `pymc3` package (continued as `pymc` since version 4) supports Bayesian hierarchical models and lets you specify initial values.

Here's a simplified example using `statsmodels`:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumes `df` is your DataFrame containing 'donations', 'summer', 'related'
# df = pd.read_csv('your_data.csv')

# Drop rows with missing values so y and X stay aligned
data = df[['donations', 'summer', 'related']].dropna()

# Define the dependent variable (y) and independent variables (X),
# adding a constant column for the intercept
y = data['donations']
X = sm.add_constant(data[['summer', 'related']])

# Fit Poisson regression model
model = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Show model summary
print(model.summary())

# Exponentiate coefficients to get incidence rate ratios, as Stata's `irr` does
print(np.exp(model.params))
```

For a hierarchical model with `pymc` (the successor to `pymc3`):

```python
import numpy as np
import pymc as pm  # PyMC3 was renamed PyMC in version 4

# Again, assumes `df` holds 'donations', 'summer', 'related'
data = df[['donations', 'summer', 'related']].dropna()
donations = data['donations'].to_numpy()
summer = data['summer'].to_numpy().astype(int)  # 0/1 group index
related = data['related'].to_numpy()
n_groups = len(np.unique(summer))

with pm.Model() as model:
    # Hyperpriors
    mu_intercept = pm.Normal('mu_intercept', mu=0, sigma=1)
    sigma_intercept = pm.HalfNormal('sigma_intercept', sigma=1)

    mu_slope = pm.Normal('mu_slope', mu=0, sigma=1)
    sigma_slope = pm.HalfNormal('sigma_slope', sigma=1)

    # Group-specific intercept and slope
    Intercept = pm.Normal('Intercept', mu=mu_intercept, sigma=sigma_intercept, shape=n_groups)
    Slope = pm.Normal('Slope', mu=mu_slope, sigma=sigma_slope, shape=n_groups)

    # Expected value (log link)
    theta = pm.math.exp(Intercept[summer] + Slope[summer] * related)

    # Likelihood
    donations_obs = pm.Poisson('donations_obs', mu=theta, observed=donations)

    # Fitting the model
    trace = pm.sample(2000)
```

You can inspect the `trace` object, for example with `arviz.summary(trace)`, to look at parameter estimates and sampling diagnostics.

Remember that this is a simplified example, and you'll need to adjust the models according to your actual data and research questions.

---

