This file demonstrates code for reading in data on household
expenditures and consumption in Liberia from a 2014 LSMS-style survey
distributed by the World Bank.

The source files are Stata files (with a ".dta" extension).  We'll
demonstrate the use of the Python module `lsms.tools` to extract
information on household characteristics, expenditures, and
consumption.  We'll then use these extracts to estimate a demand
system.



### Data Sources



There is one round of data from 2014.
It can be obtained from the World Bank at
[http://microdata.worldbank.org/index.php/catalog/2563/](http://microdata.worldbank.org/index.php/catalog/2563/).

Go to "Get Microdata" and download the "Data in Stata8" option.
This will download a zip file called
`LBR_2014_HIES_v01_M_Stata8.zip`. Unzip this file and rename the
resulting directory "2014".
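You can also do the unzip-and-rename step from Python. Below is a minimal sketch using the standard-library `zipfile` module. It makes no assumption about the archive's internal folder layout. It copies every `.dta` member into `2014/` and flattens any subdirectories, so two members with the same file name in different folders would overwrite each other. The function name `extract_dta_files` is our own, not part of `lsms.tools`.

```python
import zipfile
from pathlib import Path


def extract_dta_files(zip_path, dest="2014"):
    """Copy every .dta member of zip_path into dest, flattening subfolders.

    Returns the sorted list of file names written. Members that are not
    .dta files (documentation, directory entries) are skipped.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if member.lower().endswith(".dta"):
                # Keep only the base name so files land at dest/<name>.dta
                target = dest / Path(member).name
                with zf.open(member) as src, open(target, "wb") as out:
                    out.write(src.read())
                written.append(target.name)
    return sorted(written)
```

For example, `extract_dta_files('LBR_2014_HIES_v01_M_Stata8.zip')` should leave files such as `2014/HH_K1.dta` and `2014/HH_B.dta` in place, provided the archive contains them.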

Run the following code blocks in order:

1.  `food_expenditures` (under the header "Item Food Expenditures")
2.  `hh_compositions` (under the header "Household Composition")



### Variables needed for "Item Food Expenditures":



Begin by working with food expenditures.  Load the expenditure file
and take a first look at it to identify the key variables we need:



```python
import pandas as pd

# List of input stata files for expenditures
files = ['2014/HH_K1.dta']

df = pd.read_stata(files[0], convert_categoricals=True)
df.head()
```

To find the relevant variable names, inspect the Stata file and compare it with the survey questionnaire.
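Stata files usually carry a descriptive label for each variable, and these labels are easier to match against the questionnaire than the raw names. One way to list them is pandas' `StataReader.variable_labels()`, reached through `pd.read_stata(..., iterator=True)`. The helper name `stata_variable_labels` is our own.

```python
import pandas as pd


def stata_variable_labels(fn):
    """Return a one-column DataFrame of variable labels, indexed by variable name."""
    # iterator=True returns a StataReader instead of loading the full dataset
    with pd.read_stata(fn, iterator=True) as reader:
        labels = reader.variable_labels()
    return pd.Series(labels, name="label").rename_axis("variable").to_frame()
```

For instance, `stata_variable_labels('2014/HH_K1.dta')` should show which variable holds the item code and which holds the purchased amount.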



```python
# Variable name for household identifier (see stata file)
HHID = 'hh_id'

# Variable giving the expenditure item code:
itmcd = 'hh_k_00_b'

# Kinds of expenditures we'll use, with variable name in stata file:
sources = {'purchased': 'hh_k_05_1'}
```

| Year | 2014 |
|---|---|
| HHID | `'hh_id'` |
| itmcd | `'hh_k_00_b'` |
| sources | `{'purchased': 'hh_k_05_1'}` |
| files | `['2014/HH_K1.dta']` |
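Before reshaping, you can quickly check that every variable name in the table above exists in the loaded file. This catches a typo or a renamed column in a different release of the data. It is a minimal sketch, and `check_columns` is a hypothetical helper name.

```python
import pandas as pd


def check_columns(df, names):
    """Raise KeyError listing any expected variable names missing from df."""
    missing = [name for name in names if name not in df.columns]
    if missing:
        raise KeyError(f"Variables not found in Stata file: {missing}")
    return True
```

Here the call would be `check_columns(df, [HHID, itmcd, *sources.values()])`.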



### Item Food Expenditures



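```python
import os

# Convert item codes/labels to plain strings so they make clean column labels
df[itmcd] = df[itmcd].astype(str)

# Keep only purchased amounts, indexed by (household, item)
df = df.set_index([HHID, itmcd])[[sources['purchased']]]

# Reshape from long (one row per household-item) to wide (one column per item)
food_expenditures = df.unstack(itmcd)
food_expenditures.columns = food_expenditures.columns.droplevel(0)
food_expenditures.columns.name = 'i'    # items

food_expenditures.index.name = 'j'      # households
food_expenditures['t'] = 2014           # survey year
food_expenditures['m'] = 'Liberia'      # place

food_expenditures = food_expenditures.reset_index().set_index(['j', 't', 'm'])

# Make sure the output directory exists before pickling
os.makedirs('./tmp', exist_ok=True)
food_expenditures.to_pickle('./tmp/food_expenditures.df')

food_expenditures.head()
```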
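To see what the reshaping above does, here are the same steps applied to a tiny long-format table. The household IDs, item labels, and amounts are invented; only the column names follow the ones used above.

```python
import pandas as pd

# Toy long-format data: one row per (household, item), mimicking HH_K1.dta
long = pd.DataFrame({
    "hh_id": [1, 1, 2],
    "hh_k_00_b": ["Rice", "Fish", "Rice"],
    "hh_k_05_1": [100.0, 40.0, 80.0],
})

# Same sequence of steps as the block above
wide = long.set_index(["hh_id", "hh_k_00_b"])[["hh_k_05_1"]].unstack("hh_k_00_b")
wide.columns = wide.columns.droplevel(0)
wide.columns.name = "i"
wide.index.name = "j"
wide["t"] = 2014
wide["m"] = "Liberia"
wide = wide.reset_index().set_index(["j", "t", "m"])

# Household 2 reported no fish, so its "Fish" entry is NaN
print(wide)
```

If a household reports the same item more than once, `unstack` raises a `ValueError` about duplicate index entries. One workaround is to aggregate first, for example with `groupby([HHID, itmcd]).sum(min_count=1)`, assuming that summing duplicate reports is appropriate for these data.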

### Variables needed for "Household Composition":



| Year | 2014 |
|---|---|
| HHID | `'hh_id'` |
| sex | `'hh_b_02'` |
| age | `'hh_b_06'` |
| months\_spent | `'hh_b_10'` |
| files | `['2014/HH_B.dta']` |



### Household Composition



```python
import pandas as pd

# Input Stata file for the household roster
files = ['2014/HH_B.dta']

df = pd.read_stata(files[0], convert_categoricals=True)
df.head()
```

Define some key variables:



```python
HHID = 'hh_id'            # Variable name for household identifier (see stata file)
sex = 'hh_b_02'           # Variable giving sex
age = 'hh_b_06'           # Variable giving age
months_spent = 'hh_b_10'  # Variable for months resident in last year
```

Now process the household roster using the `get_household_roster` function from `lsms.tools`.
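To give a rough idea of the kind of tabulation a household-roster function performs, here is a hypothetical pandas sketch that counts household members by sex and age bracket. It is not the `lsms.tools` implementation. The function name `roster_counts`, the age bracket edges, and the column labels are all assumptions, and the sketch ignores `months_spent`.

```python
import pandas as pd


def roster_counts(roster, hhid, sex, age, bins=(0, 5, 15, 60, 200)):
    """Count household members by sex and age bracket (hypothetical sketch).

    Brackets are left-closed, e.g. [0, 5) and [5, 15). Members with a
    missing or out-of-range age end up in a "nan" bracket.
    """
    brackets = pd.cut(roster[age], bins=list(bins), right=False)
    labels = roster[sex].astype(str) + " " + brackets.astype(str)
    # One row per household, one column per sex/age-bracket cell
    counts = pd.crosstab(roster[hhid], labels)
    counts.index.name = "j"
    counts.columns.name = "k"
    return counts
```

Applied to the roster loaded above, this would be `roster_counts(df, HHID, sex, age)`.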



```python
import os

from lsms.tools import get_household_roster

# Now get household composition variables
hh_composition = get_household_roster(fn=files[0],
                                      HHID=HHID,
                                      sex=sex,
                                      age=age,
                                      months_spent=months_spent)

hh_composition.columns.name = 'k'   # composition categories
hh_composition.index.name = 'j'     # households
hh_composition['t'] = 2014          # survey year
hh_composition['m'] = 'Liberia'     # place

hh_composition = hh_composition.reset_index().set_index(['j', 't', 'm'])

# Make sure the output directory exists before pickling
os.makedirs('./tmp', exist_ok=True)
hh_composition.to_pickle('./tmp/hh_compositions.df')
hh_composition.head()
```
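Both blocks pickle a DataFrame indexed by household `j`, year `t`, and place `m`. Because the two indexes share those levels, the extracts can be reloaded and lined up with a join. This is a minimal sketch, and `load_extracts` is a hypothetical helper, not part of `lsms.tools`.

```python
from pathlib import Path

import pandas as pd


def load_extracts(tmpdir="./tmp"):
    """Reload the two pickled extracts and align them on the (j, t, m) index."""
    tmpdir = Path(tmpdir)
    food = pd.read_pickle(tmpdir / "food_expenditures.df")
    comp = pd.read_pickle(tmpdir / "hh_compositions.df")
    # Inner join keeps only households present in both extracts;
    # join raises ValueError if the two frames share any column name.
    return food.join(comp, how="inner")
```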