## Preface



If need be…



In [1]:
!pip install CFEDemands
!pip install oauth2client
!pip install dvc

## Introduction



Here we give a set of generic instructions for analyzing demand for
food and nutrition.  Inputs include datasets of consumption
quantities, consumption expenditures, and household characteristics,
along with a food conversion table (FCT).

The different datasets should be indexed as follows:

| Dataset|Indexed by|Columns|
|---|---|---|
| Expenditures|j,t,m|i|
| Consumption|j,t,m,u|i|
| HH Characteristics|j,t,m|k|
| FCT|i,u|n|
| RDI|n|k|

where `j` indexes households, `t` indexes periods, `m` indexes
markets, `i` indexes goods, `k` indexes different kinds of household
characteristics, `u` indexes different unit names, and `n` indexes
different nutrients.  Finally, any RDI ("recommended daily intake")
tables should be indexed by nutrients, with columns corresponding to
characteristics of persons within the household (e.g., age & sex
categories).
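
As a minimal sketch of this layout, the toy expenditures dataframe below uses invented households, periods, and goods (none of the values come from the Niger data). It only illustrates the `(j,t,m)` row index and the `i` column index described above.

```python
import pandas as pd

# Toy data: two households (j), one period (t), one market (m).
# All labels and values here are invented for illustration.
toy_x = pd.DataFrame({'j': ['A', 'B'],
                      't': [2014, 2014],
                      'm': [1, 1],
                      'Millet': [120.0, 80.0],
                      'Rice': [0.0, 45.0]})

# Rows are indexed by household, period, and market; columns are goods.
toy_x = toy_x.set_index(['j', 't', 'm'])
toy_x.columns.name = 'i'

print(toy_x)
```

The consumption dataframe follows the same pattern with one extra index level, `u`, for the unit in which each quantity is reported.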

Note that some countries have more than one dataframe of consumption,
distinguished by source; for example, Malawi has consumption items
purchased as well as consumption items produced.  Here we focus on
consumption purchases, since one of our immediate aims is to infer
prices paid.



### Step 1: Acquire DataFrames



Here are the keys and sheet names of the Google Sheets that hold the
different dataframes for the case of Niger:



In [1]:
InputFiles = {'Expenditures':('1ySP8lrXlQ2ChaMdz0HQY85Md65cRRKOZgz-T0zBN2K0','Expenditures'),
              'Consumption':('1kr2NI57xiTQm20A_68NEcLKihVTJw2ZgWCwV98ZD4JE','Consumption'),
              'HH Characteristics':('1ySP8lrXlQ2ChaMdz0HQY85Md65cRRKOZgz-T0zBN2K0','HH Characteristics'),
              'FCT':('1TM7FpKURXFAuXW4dLpGt98QA2CH4WTDty-4nPOUv1Mg','05 NV_sum_57 (per 100g EP)')}

Note that the food items for the FCT for Niger are **not** yet matched
up with food labels indexed by `i` in the expenditure and consumption datasets.



In [1]:
from eep153_tools import read_sheets
import numpy as np
import pandas as pd

def get_clean_sheet(key,json_creds,sheet=None):

    df = read_sheets(key,json_creds,sheet)
    df.columns = [c.strip() for c in df.columns.tolist()]

    df = df.loc[:,~df.columns.duplicated(keep='first')]   

    df = df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)

    df = df.loc[~df.index.duplicated(), :]

    return df

# Get expenditures...
x = get_clean_sheet(InputFiles['Expenditures'][0],
                    json_creds='../students-9093fa174318.json',
                    sheet=InputFiles['Expenditures'][1])

if 'm' not in x.columns:
    x['m'] = 1

x = x.set_index(['j','t','m'])
x.columns.name = 'i'

x = x.apply(lambda x: pd.to_numeric(x,errors='coerce'))
x = x.replace(0,np.nan)

# Get HH characteristics...
z = get_clean_sheet(InputFiles['HH Characteristics'][0],
                    json_creds='../students-9093fa174318.json',
                    sheet=InputFiles['HH Characteristics'][1])

if 'm' not in z.columns:
    z['m'] = 1

z = z.set_index(['j','t','m'])
z.columns.name = 'k'

z = z.apply(lambda x: pd.to_numeric(x,errors='coerce'))

# Get purchased consumption quantities
q = get_clean_sheet(InputFiles['Consumption'][0],
                    json_creds='../students-9093fa174318.json',
                    sheet=InputFiles['Consumption'][1])

if 'm' not in q.columns:
    q['m'] = 1

q = q.set_index(['j','t','m','u'])
q.columns.name = 'i'

q = q.apply(lambda x: pd.to_numeric(x,errors='coerce'))
q = q.replace(0,np.nan)

fct = get_clean_sheet(InputFiles['FCT'][0],
                    json_creds='../students-9093fa174318.json',
                    sheet=InputFiles['FCT'][1])

#### This bit peculiar to Niger FCT #####
fct = fct.loc[fct.Code.str.len()==6]
fct = fct.set_index('Code')
fct.columns = [v.replace('\n',' ') for v in fct.columns]
########################################

fct.index.name = 'i'

fct = fct.apply(lambda x: pd.to_numeric(x,errors='coerce'))
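
To see what the cleaning steps above do without needing credentials or `read_sheets`, here is a small sketch on an invented "raw" sheet. The dataframe `raw` and its contents are hypothetical; only the sequence of operations mirrors the cell above. Turning zeros into `NaN` means that taking logs later yields missing values rather than `-inf`.

```python
import numpy as np
import pandas as pd

# Invented raw sheet: a padded column name, a text entry, and a zero.
raw = pd.DataFrame({'j': ['A', 'B'],
                    't': [2014, 2014],
                    ' Millet ': ['120', 'n/a'],
                    'Rice': [0, 45]})

raw.columns = [c.strip() for c in raw.columns]   # ' Millet ' -> 'Millet'

if 'm' not in raw.columns:                       # assume a single market
    raw['m'] = 1

toy = raw.set_index(['j', 't', 'm'])
toy = toy.apply(lambda col: pd.to_numeric(col, errors='coerce'))  # 'n/a' -> NaN
toy = toy.replace(0, np.nan)                     # zeros become missing

print(toy)
```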

### Step 2: Estimate Demand System



Here, use data on log *expenditures* and household characteristics to
create a `cfe.Result` object, which we call `result`.



In [1]:
import cfe

result = cfe.Result(y=np.log(x),z=z)

# Estimates most things (not counting std errors for betas).
xhat = result.get_predicted_expenditures() 

result.get_beta(as_df=True).sort_values(ascending=False) # Check sanity...

### Step 3: Infer prices



Next, we divide predicted expenditures by actual quantities to get
prices, then choose prices corresponding to some units (e.g.,
kilograms) we can map into the FCT.



In [1]:
# xhat is an xarray DataArray; convert q to one as well
q = q.to_xarray().to_array('i')
phat = (xhat/q).to_dataframe('p').squeeze().unstack('i')

# Keep kgs; g
phat = phat.xs('kg',level='u').groupby(['t','m']).median().dropna(how='all')
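
The same idea can be sketched in plain pandas on invented numbers. Everything below (households, units, values, and the names `toy_xhat`, `toy_q`, `toy_phat`) is hypothetical, and the `reindex` step stands in for the broadcasting that xarray does automatically in the cell above.

```python
import pandas as pd

# Invented predicted expenditures on one good, indexed by (j, t, m)
toy_xhat = pd.DataFrame({'Millet': [100.0, 150.0, 240.0, 90.0]},
                        index=pd.MultiIndex.from_tuples(
                            [('A', 2014, 1), ('B', 2014, 1),
                             ('C', 2014, 1), ('D', 2014, 1)],
                            names=['j', 't', 'm']))

# Invented quantities, indexed by (j, t, m, u); household D reports heaps
toy_q = pd.DataFrame({'Millet': [2.0, 2.5, 3.0, 3.0]},
                     index=pd.MultiIndex.from_tuples(
                         [('A', 2014, 1, 'kg'), ('B', 2014, 1, 'kg'),
                          ('C', 2014, 1, 'kg'), ('D', 2014, 1, 'heap')],
                         names=['j', 't', 'm', 'u']))

# Line up each expenditure with the matching quantity row
x_rep = toy_xhat.reindex(toy_q.index.droplevel('u'))
x_rep.index = toy_q.index

toy_p = x_rep / toy_q   # expenditure per unit of quantity

# Keep prices per kg, then take the median within each period and market
toy_phat = toy_p.xs('kg', level='u').groupby(['t', 'm']).median()
print(toy_phat)
```

Using the median rather than the mean makes the inferred price less sensitive to a few households with implausible quantities or expenditures.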

### Step 4: Predicting Positive Consumption



An issue with our assessment of fit is that we *predicted* that every
household would consume positive quantities of every good, and in
making our assessment we ignored the (many) cases in which the
household in fact had zero expenditures on that good.

Here we're going to go back and use a similar framework to try to
estimate the probability with which we'll observe zero expenditures
as a function of λ, prices, and household characteristics.



In [1]:
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm

zeros_r = cfe.Result(y=(0.+(result.y>0)),z=result.z)
weights = zeros_r.get_predicted_log_expenditures()

# Truncate to make weights live in [0,1]
weights = weights.where((weights<1) + np.isnan(weights),1).where((weights>0) + np.isnan(weights),0)

xbar = np.exp(result.y).sum(['m','i']).to_dataframe('xbar').replace(0,np.nan).squeeze()

# Calculate *expected* predicted expenditures, to make unconditional on being positive
xhat = (weights*result.get_predicted_expenditures())
xsum = xhat.sum(['m','i']).to_dataframe('xhat').replace(0,np.nan).squeeze()

# Make dataframe of actual & predicted
df = pd.DataFrame({'Actual':np.log(xbar),'Predicted':np.log(xsum)})

df.plot.scatter(x='Predicted',y='Actual')

# Add 45 degree line
v = plt.axis()
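vmin = np.max([v[0],v[2]])  # Larger of the two lower axis limits
vmax = np.min([v[1],v[3]])  # Smaller of the two upper limits, so the line stays inside the axes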
plt.plot([vmin,vmax],[vmin,vmax])
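
The truncation and weighting steps in this cell can be illustrated with a few invented numbers. In this sketch `w` stands for fitted values from a model of a 0/1 outcome and `x_cond` for predicted expenditures conditional on buying the good; both are hypothetical, and `clip` is used here as a simpler stand-in for the chained `where` calls above.

```python
import numpy as np
import pandas as pd

# Invented fitted values; these can fall outside [0, 1], and missing
# values should stay missing.
w = pd.Series([-0.2, 0.25, 1.4, np.nan], index=['A', 'B', 'C', 'D'])

w_clipped = w.clip(lower=0, upper=1)   # NaN is left untouched

# Invented predicted expenditures, conditional on buying the good
x_cond = pd.Series([50.0, 80.0, 120.0, 60.0], index=w.index)

# Expected expenditure = Pr(positive) * E[expenditure | positive]
x_expected = w_clipped * x_cond
print(x_expected)
```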

### Step 5: Get predicted quantities



Now divide predicted expenditures by predicted prices to get predicted
quantities, and put back into a dataframe.



In [1]:
qhat = xhat/phat.to_xarray().to_array('i')

qhat = qhat.to_dataframe('q').unstack('i')

qhat.columns = qhat.columns.droplevel(0)
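
A pandas-only sketch of this division, on invented data, may make the alignment clearer: prices vary by period and market, so each household is matched to the price row for its own `(t,m)`. The names `toy_xhat`, `toy_phat`, and `toy_qhat` and all values are hypothetical.

```python
import pandas as pd

# Invented predicted expenditures, indexed by (j, t, m)
toy_xhat = pd.DataFrame({'Millet': [120.0, 180.0], 'Rice': [40.0, 90.0]},
                        index=pd.MultiIndex.from_tuples(
                            [('A', 2014, 1), ('B', 2014, 1)],
                            names=['j', 't', 'm']))
toy_xhat.columns.name = 'i'

# Invented prices per kg, indexed by (t, m)
toy_phat = pd.DataFrame({'Millet': [60.0], 'Rice': [20.0]},
                        index=pd.MultiIndex.from_tuples([(2014, 1)],
                                                        names=['t', 'm']))

# Look up each household's (t, m) price row, then relabel rows by household
p_rep = toy_phat.reindex(toy_xhat.index.droplevel('j'))
p_rep.index = toy_xhat.index

toy_qhat = toy_xhat / p_rep   # predicted kilograms of each good
print(toy_qhat)
```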

### Step 6: Map predicted quantities into nutrients



The FCT may need some cleaning up, and its food names or codes must be
matched to the `i` index in `qhat`.



In [1]:
print(pd.Series(xhat.coords['i']).to_markdown())

| Niger Labels|WAFCT Codes|
|---|---|
| Baobab leaves|04_001|
| Bean fritters|03_054|
| Beans|03_022|
| Beef|07_014|
| Biscuit|01_188|
| Bowl of millet with milk|01_174|
| Bowl of millet without milk|01_167|
| Bread|01_047|
| Cakes|01_187|
| Cassava tuber|02_021|
| Cigarette||
| Coffee in cans|12_009|
| Cola nut|06_018|
| Corn|04_109|
| Corn fritters|01_123|
| Cornstarch||
| Curd|10_028|
| Dates|05_031|
| Dry okra|04_077|
| Eggs|08_001|
| Fresh Okra|04_017|
| Fresh Onion|04_018|
| Fresh fish|09_060|
| Fresh pepper|04_049|
| Fresh tomato|04_021|
| Fruit juice|12_013|
| Goat meat|07_069|
| Groundnut cake|03_012|
| Juice powder||
| Maggi cube||
| Malahya||
| Millet|01_095|
| Mutton|07_004|
| Orange|05_016|
| Other citrus||
| Other spices||
| Palm oil|11_007|
| Pasta|01_077|
| Peanut butter|06_023|
| Peanut oil|11_003|
| Pimento||
| Potato|02_009|
| Poultry|08_010|
| Powdered milk|10_002|
| Rice|01_065|
| Rice &tomato sauce||
| Rice cowpea|03_143|
| Salad||
| Salt|13_015|
| Soft Drinks|12_024|
| Soumbala|03_042|
| Squash|04_051|
| Sugar|13_002|
| Sugar cane||
| Sweet banana|05_048|
| Sweet potato|02_049|
| Tea bag|12_008|
| Tomato paste|04_066|
| Yam tuber|02_019|
| Yodo||
| Yogurt|10_005|

These particular clean-ups are peculiar to the West African FCT.



In [1]:
# Dictionary mapping index i to fct codes
i_to_fct = pd.read_csv('niger_fct_codes.csv').dropna().set_index('Niger Labels').squeeze().to_dict()

# Create version of qhat with fct ids for labels
myq = qhat.rename(columns=i_to_fct)[list(i_to_fct.values())]

# Drop goods with no obs, households with no goods
myq = myq.dropna(how='all',axis=1).dropna(how='all')

# Create version of fct with just foods in myq
myfct=fct.loc[myq.columns].iloc[:,8:] # Drop columns which aren't nutrients
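
The relabeling step can be sketched on a tiny invented `qhat`. The mapping below uses the Millet and Rice codes from the table above, written with an underscore; the quantities and the names `toy_qhat`, `toy_map`, and `toy_myq` are hypothetical. Goods with no code never enter the mapping, so they drop out when the columns are selected.

```python
import numpy as np
import pandas as pd

# Invented predicted quantities for two households
toy_qhat = pd.DataFrame({'Millet': [2.0, 3.0],
                         'Rice': [2.0, 4.5],
                         'Cigarette': [1.0, np.nan]},
                        index=pd.Index(['A', 'B'], name='j'))

# Label-to-code pairs; goods without a code are left out of the mapping
toy_map = {'Millet': '01_095', 'Rice': '01_065'}

# Rename matched goods to their FCT codes and keep only those columns
toy_myq = toy_qhat.rename(columns=toy_map)[list(toy_map.values())]
print(toy_myq.columns.tolist())
```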

For the matrix product below to work, the columns of `myq` must match the index of `myfct`.



In [1]:
nutrients = myq@myfct
nutrients.mean()    # NB: Nutrients are for the past week, for the entire household.
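
One thing worth checking is units. The FCT sheet name in `InputFiles` says values are per 100 g of edible portion, while the prices kept in Step 3 are per kilogram. If predicted quantities are indeed in kilograms, they would need to be multiplied by 10 before the matrix product. The sketch below shows that scaling on invented data; the nutrient values are not real WAFCT figures, and the factor of 10 is an assumption to verify against the actual data.

```python
import pandas as pd

# Invented predicted quantities in kg, with FCT codes as column labels
toy_myq = pd.DataFrame({'01_095': [2.0, 3.0], '01_065': [1.0, 4.0]},
                       index=pd.Index(['A', 'B'], name='j'))

# Invented nutrient contents per 100 g (not real WAFCT values)
toy_fct = pd.DataFrame({'Energy': [350.0, 360.0], 'Protein': [10.0, 7.0]},
                       index=pd.Index(['01_095', '01_065'], name='i'))

# 1 kg = 10 x 100 g, so scale quantities in kg before multiplying
toy_nutrients = (10 * toy_myq) @ toy_fct
print(toy_nutrients)
```

The `@` operator matches the columns of the left-hand dataframe to the index of the right-hand one, which is why the two sets of labels have to agree.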