### Import Data from CE

Both the target and the features will come from the Consumer Expenditure (CE) Interview survey, specifically the FMLI file. The FMLI file contains household-level (family) data for each survey quarter. All of the variables in the file are listed in the [CE data dictionary](https://www.bls.gov/cex/pumd/ce_pumd_interview_diary_dictionary.xlsx).

Most likely features for inferring household income:
1. Housing tenure.
2. Census Tract.
3. Occupied space (sq footage).
4. Household size.
5. Transportation: total number of cars owned by the household.

Interesting features that require more analysis outside the scope of this project:
1. Highest degree obtained by members of household.
2. Item descriptions - identify hobbies.
3. Quarterly or annual travel expenses.
4. Fuel grade for car.
5. Expenditure patterns more generally.

### Libraries

In [23]:
import pandas as pd
import sqlite3
import zipfile
import requests
import io

### Download CE Data

Download the CE data using the `requests` library. The response body is a ZIP archive, so the `io` module is needed to wrap its raw bytes in a file-like object before it can be opened.

In [39]:
response = requests.get("https://www.bls.gov/cex/pumd/data/comma/intrvw18.zip")
response.raise_for_status()  # stop early if the download failed

Use the `zipfile` library to open the archive held in the `response` object.

Reference from [Stack Overflow](https://stackoverflow.com/questions/9419162/download-returned-zip-file-from-url)

In [40]:
z = zipfile.ZipFile(io.BytesIO(response.content))
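As a minimal sketch of why `io.BytesIO` is needed here: `zipfile.ZipFile` expects a file name or a seekable file-like object, while `response.content` is a plain `bytes` value. The example below builds a tiny archive in memory as a stand-in for the download; the member name `demo/example.csv` and its contents are made up for illustration.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory; member name and contents are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo/example.csv", "a,b\n1,2\n")

# These raw bytes play the role of `response.content`.
raw_bytes = buffer.getvalue()

# Wrap the bytes in a seekable file-like object so ZipFile can read them.
demo_zip = zipfile.ZipFile(io.BytesIO(raw_bytes))
demo_names = demo_zip.namelist()
demo_text = demo_zip.read("demo/example.csv").decode("utf-8")
```

Nothing is written to disk at this point; the archive lives entirely in memory.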

Extract the archive's files to a folder by passing the path as an argument. If no path is given, the files are extracted to the working folder of this Jupyter Notebook.

In [43]:
z.extractall("../data")
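If only some of the files are needed on disk, `extractall` also accepts a `members` argument. The sketch below assumes a small stand-in archive built in memory (the member names are hypothetical) and extracts a single file to a temporary folder rather than `../data`.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# Stand-in archive built in memory; member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo/keep.csv", "x\n1\n")
    archive.writestr("demo/skip.csv", "y\n2\n")

demo_zip = zipfile.ZipFile(io.BytesIO(buffer.getvalue()))

# Pick the members to extract, then pass them via `members`.
wanted = [name for name in demo_zip.namelist() if name.endswith("keep.csv")]
target_dir = Path(tempfile.mkdtemp())
demo_zip.extractall(target_dir, members=wanted)

# List what actually landed on disk, relative to the target folder.
extracted = sorted(
    p.relative_to(target_dir).as_posix() for p in target_dir.rglob("*.csv")
)
```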

### Output

Write the FMLI data to a SQLite table.

In [42]:
tables = ['intrvw18/fmli181x.csv']

#, 'intrvw18/fmli182.csv', 'intrvw18/fmli183.csv',
#          'intrvw18/fmli184.csv', 'intrvw18/fmli191.csv']

for table in tables:
    table_name = table.split('/')[1].split('.')[0]
    print(table_name)

fmli181x
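The table name is the file name with the folder and extension stripped. An alternative sketch of the same step using `pathlib` follows; the helper name `table_name_from_member` is introduced here for illustration only. Unlike indexing into `split('/')`, it does not depend on how deeply the file is nested.

```python
from pathlib import PurePosixPath


def table_name_from_member(member: str) -> str:
    """Return the file name without its directory or extension."""
    # ZIP member names always use forward slashes, hence PurePosixPath.
    return PurePosixPath(member).stem


names = [
    table_name_from_member(m)
    for m in ["intrvw18/fmli181x.csv", "intrvw18/fmli182.csv"]
]
```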


In [41]:
z.namelist()

['intrvw18/expn18/',
 'intrvw18/expn18/apa18.csv',
 'intrvw18/expn18/apb18.csv',
 'intrvw18/expn18/cla18.csv',
 'intrvw18/expn18/cld18.csv',
 'intrvw18/expn18/cnt18.csv',
 'intrvw18/expn18/cra18.csv',
 'intrvw18/expn18/crb18.csv',
 'intrvw18/expn18/eda18.csv',
 'intrvw18/expn18/eqb18.csv',
 'intrvw18/expn18/fra18.csv',
 'intrvw18/expn18/frb18.csv',
 'intrvw18/expn18/hel18.csv',
 'intrvw18/expn18/hhm18.csv',
 'intrvw18/expn18/hhp18.csv',
 'intrvw18/expn18/him18.csv',
 'intrvw18/expn18/inb18.csv',
 'intrvw18/expn18/lsd18.csv',
 'intrvw18/expn18/mdb18.csv',
 'intrvw18/expn18/mdc18.csv',
 'intrvw18/expn18/mis18.csv',
 'intrvw18/expn18/mor18.csv',
 'intrvw18/expn18/opb18.csv',
 'intrvw18/expn18/opd18.csv',
 'intrvw18/expn18/oph18.csv',
 'intrvw18/expn18/opi18.csv',
 'intrvw18/expn18/ovb18.csv',
 'intrvw18/expn18/ovc18.csv',
 'intrvw18/expn18/rnt18.csv',
 'intrvw18/expn18/rtv18.csv',
 'intrvw18/expn18/sub18.csv',
 'intrvw18/expn18/trd18.csv',
 'intrvw18/expn18/tre18.csv',
 'intrvw18/expn18/t
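Because the archive holds many files, it may be easier to filter `z.namelist()` than to type the FMLI paths by hand. This is a sketch under the assumption that FMLI files are the CSV members whose file name starts with `fmli`, as in the `tables` list above; the helper `select_fmli` is hypothetical, and the sample list reuses a few member names from this notebook.

```python
from pathlib import PurePosixPath


def select_fmli(members):
    """Keep CSV members whose file name starts with 'fmli' (assumed convention)."""
    selected = []
    for member in members:
        path = PurePosixPath(member)
        if path.suffix == ".csv" and path.name.startswith("fmli"):
            selected.append(member)
    return selected


# A few member names taken from this notebook, used as sample input.
sample_members = [
    "intrvw18/expn18/",
    "intrvw18/expn18/apa18.csv",
    "intrvw18/fmli181x.csv",
    "intrvw18/fmli182.csv",
]
fmli_members = select_fmli(sample_members)
```

In the notebook itself, the call would be `select_fmli(z.namelist())`.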

In [38]:
conn = sqlite3.connect('database_test.db')

for table in tables:
    # Table name is the file name without folder or extension, e.g. 'fmli181x'
    table_name = table.split('/')[1].split('.')[0]

    # Read the CSV straight from the in-memory ZIP archive
    intrvw = pd.read_csv(z.open(table))

    # Replace any existing table so the cell can be re-run safely
    intrvw.to_sql(table_name, conn, if_exists='replace')

conn.commit()
conn.close()
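After writing, it can be useful to confirm which tables exist and how many rows each holds. The sketch below uses only `sqlite3`; the helper `table_row_counts` is hypothetical, and the in-memory table `fmli_demo` with its two rows is placeholder data standing in for `database_test.db`.

```python
import sqlite3


def table_row_counts(conn):
    """Map each table in the database to its row count."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    names = [row[0] for row in cur.fetchall()]
    counts = {}
    for name in names:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"{}"'.format(name.replace('"', '""'))
        cur.execute("SELECT COUNT(*) FROM {}".format(quoted))
        counts[name] = cur.fetchone()[0]
    return counts


# Placeholder database: table name, columns, and rows are made up.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE fmli_demo (newid INTEGER, fam_size INTEGER)")
demo_conn.executemany("INSERT INTO fmli_demo VALUES (?, ?)", [(1, 2), (2, 4)])
counts = table_row_counts(demo_conn)
demo_conn.close()
```

Against the real database, the same helper would be called on `sqlite3.connect('database_test.db')`.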