## Empirical Best Linear Unbiased Prediction (EBLUP) for Unit Level Model

The Unit Level model refers to a class of SAE techniques that fit the linear mixed model at the sampling unit level. As for the Area Level model, generalized linear mixed models are also the modeling framework for Unit Level model. In this case, given that the model is happening at the sub-area level, both the random effects and unit level standard errors can be estimated from the model. In this tutorial, we will predict the area level means which is a linear parameter.

## County Crop (Corn and Soybeans) Areas Data

For this example, we use the county crop data used by Battese, Harter, and Fuller (1988). The datasets contains 37 observations on areas under corn and under soybeans for each of the 12 counties in the north-central of Iowa. Each county was divided in segments and using interviews and LANDSAT satellite data, data on area used for corn and soybeans was obtained. In the unit level data, each observation is a segment with the following variables: county id (county_id), area in hectare under corn (corn_area), area in heactare under soybeans (soybeans_area), number of pixel classified as corn (corn_pixel), and number of pixels classified as soybeans (soybeans_pixel). 

In [1]:
import numpy as np
import pandas as pd

import samplics 
from samplics.sae import EblupUnitModel

In [2]:
countycropareas = pd.read_csv("../../../datasets/docs/countycropareas.csv")

nb_obs = 15
print(f"\nFirst {nb_obs} observations from the unit (segment) level crop areas data\n")
countycropareas.head(nb_obs)


First 15 observations from the unit (segment) level crop areas data



Unnamed: 0,county_id,corn_area,soybeans_area,corn_pixel,soybeans_pixel
0,1,165.76,8.09,374,55
1,2,96.32,106.03,209,218
2,3,76.08,103.6,253,250
3,4,185.35,6.47,432,96
4,4,116.43,63.82,367,178
5,5,162.08,43.5,361,137
6,5,152.04,71.43,288,206
7,5,161.75,42.49,369,165
8,6,92.88,105.26,206,218
9,6,149.94,76.49,316,221


In addition to the unit (segment) level data, we have the small area (county) level averages of the number of pixels classified as corn or soybeans.  

In [3]:
countycropareas_means = pd.read_csv("../../../datasets/docs/countycropareas_means.csv")

print(f"\nCounty level crop areas averages\n")
countycropareas_means.head(15)


County level crop areas averages



Unnamed: 0,county_id,county_name,samp_segments,pop_segments,ave_corn_pixel,ave_soybeans_pixel
0,1,CerroGordo,1,545,295.29,189.7
1,2,Hamilton,1,566,300.4,196.65
2,3,Worth,1,394,289.6,205.28
3,4,Humboldt,2,424,290.74,220.22
4,5,Franklin,3,564,318.21,188.06
5,6,Pocahontas,3,570,257.17,247.13
6,7,Winnebago,3,402,291.77,185.37
7,8,Wright,3,567,301.26,221.36
8,9,Webster,4,687,262.17,247.09
9,10,Hancock,5,569,314.28,198.66


### Empirical Bayes linear unbiased predictor (EBLUP)

Now we are going to estimates the average area size under corn and soybeans. To do so, we use the nested error linear regression (special case of the linear mixed model) to model the number of hectares. As auxiliary variables, we use the number of pixel classified as corn and soybeans. 

First, we use the method *fit()* to estimate the model parameters. 

In [4]:
areas = countycropareas["county_id"]
ys = countycropareas["corn_area"]
Xs = countycropareas[["corn_pixel", "soybeans_pixel"]]
Xs_mean = countycropareas_means[["ave_corn_pixel", "ave_corn_pixel"]]
samp_size = countycropareas_means[["samp_segments"]]
pop_size = countycropareas_means[["pop_segments"]]

"""REML Method"""
eblup_bhf_reml = EblupUnitModel()
eblup_bhf_reml.fit(
    ys, Xs, areas,
)

eblup_bhf_reml.predict(Xs_mean, areas)

corn_est_reml = eblup_bhf_reml.to_dataframe()

print(corn_est_reml)

    _area   _estimate        _mse  _mse_boot
0       1  117.348358  147.078737        NaN
1       2  119.065172  147.695164        NaN
2       3  115.436681  121.622680        NaN
3       4  115.942728  122.998519        NaN
4       5  125.262954  161.517890        NaN
5       6  104.755257   84.464286        NaN
6       7  116.379869  131.137712        NaN
7       8  119.568238  122.602314        NaN
8       9  106.505269   75.869702        NaN
9      10  124.068434  125.799949        NaN
10     11  118.817200  107.533644        NaN
11     12  128.047945  155.748846        NaN




In [5]:
"""ML Method"""
eblup_bhf_ml = EblupUnitModel(method="ML")
eblup_bhf_ml.fit(
    ys, Xs, areas,
)

eblup_bhf_ml.predict(Xs_mean, areas)

corn_est_ml = eblup_bhf_ml.to_dataframe()

print(corn_est_ml)

    _area   _estimate        _mse  _mse_boot
0       1  117.300821  123.755067        NaN
1       2  119.015164  125.574590        NaN
2       3  115.391894  103.219761        NaN
3       4  115.882996  108.233512        NaN
4       5  125.182978  147.653689        NaN
5       6  104.704796   76.775184        NaN
6       7  116.312678  117.590075        NaN
7       8  119.496458  112.439153        NaN
8       9  106.449305   71.387334        NaN
9      10  123.986302  119.631512        NaN
10     11  118.742626  101.812955        NaN
11     12  127.960362  147.222545        NaN


### Bootstrap MSE estimation

As shown above, the *predict()* method provides the taylor-based MSE estimates. However, we can also calculate MSE estimates using the bootstrap approach.