# Ingest Data from CE

The target and features will come from the [Consumer Expenditure (CE) Interview survey](https://www.bls.gov/cex/pumd_data.htm), specifically the FMLI and MEMI files. The FMLI file contains data about the CE household (family) for each survey quarter, and the MEMI file contains information about each member of the household. All of the variables in these files, with descriptions, are listed in the [CE data dictionary](https://www.bls.gov/cex/pumd/ce_pumd_interview_diary_dictionary.xlsx).

### Libraries

In [1]:
import pandas as pd
import sqlite3
import zipfile
import requests
import io

### Download CE Data

I will download the 2018 CE Interview data using the `requests` library.

In [2]:
response = requests.get("https://www.bls.gov/cex/pumd/data/comma/intrvw18.zip")
response.raise_for_status()  # stop early if the download failed

The CE program provides the data as a ZIP file, which can be read using the `zipfile` and `io` libraries. `zipfile.ZipFile` expects a file or file-like object, so the bytes in `response.content` are first wrapped in `io.BytesIO`. Reference: [Stack Overflow](https://stackoverflow.com/questions/9419162/download-returned-zip-file-from-url).

In [3]:
z = zipfile.ZipFile(io.BytesIO(response.content))
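
To illustrate the pattern without touching the network, here is a minimal sketch that builds a tiny archive in memory and opens it the same way. The file name and CSV contents are made up for the example; `payload` simply stands in for `response.content`.

```python
import io
import zipfile

# Build a small ZIP archive in memory; these bytes stand in for response.content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # Hypothetical member name and contents, for illustration only.
    archive.writestr("intrvw18/example.csv", "NEWID,VALUE\n1,10\n")
payload = buffer.getvalue()

# ZipFile needs a seekable file-like object, so wrap the raw bytes in BytesIO.
demo_zip = zipfile.ZipFile(io.BytesIO(payload))
demo_names = demo_zip.namelist()

# A member can be read directly from the archive without writing it to disk.
with demo_zip.open("intrvw18/example.csv") as member:
    first_line = member.readline().decode("utf-8").strip()

print(demo_names)
print(first_line)
```

Nothing is written to disk here, which is why the same approach works on the downloaded response.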

### Extraction and Output

Below are two options for outputting the data:
1. Use the `zipfile` library's `extractall` method to extract the files, which gives the same result as extracting the archive manually.
2. Load the files into a SQLite database. To keep with Write Once Read Many (WORM) methodology, it is important to consider a more secure storage process.

*Zipfile Extraction*

Pass the file path as an argument to the `extractall` method.

In [4]:
z.extractall("../data")
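
If only some files are needed, `extractall` also accepts a `members` argument. The sketch below shows this on a small in-memory archive extracted to a temporary directory; the member names and the `"fmli"` substring filter are assumptions for illustration, not the selection used in this notebook.

```python
import io
import os
import tempfile
import zipfile

# In-memory stand-in for the downloaded archive (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("intrvw18/fmli182.csv", "NEWID\n1\n")
    archive.writestr("intrvw18/mtbi182.csv", "NEWID\n1\n")
demo_zip = zipfile.ZipFile(io.BytesIO(buffer.getvalue()))

# Keep only the members whose name contains "fmli".
wanted = [name for name in demo_zip.namelist() if "fmli" in name]

# Extract just those members into a temporary directory.
out_dir = tempfile.mkdtemp()
demo_zip.extractall(out_dir, members=wanted)

extracted = sorted(os.listdir(os.path.join(out_dir, "intrvw18")))
print(extracted)
```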

*SQLite Database Table*

I need to print the file names in the archive to see the naming conventions of the FMLI and MEMI files for each quarter. The code was removed from the notebook, but is reproduced below.
```python
z.namelist()
```
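
Because the archive holds many files, it can help to filter the list down to the FMLI and MEMI files. This is a rough sketch: `sample_names` is a hypothetical sample of what `z.namelist()` might return, and the regular expression is my assumption about the naming pattern, so check it against the real listing.

```python
import re

# Hypothetical sample of archive member names; the real list is longer.
sample_names = [
    "intrvw18/fmli181x.csv",
    "intrvw18/fmli182.csv",
    "intrvw18/memi182.csv",
    "intrvw18/mtbi182.csv",
]

# Assumed pattern: file type, two-digit year plus quarter, optional "x" suffix.
pattern = re.compile(r"/(fmli|memi)\d{3}x?\.csv$")

fmli_memi_files = [name for name in sample_names if pattern.search(name)]
print(fmli_memi_files)
```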

Given the time and scope of this project, I am going to limit the analysis to one quarter of households from the CE Interview survey. This means I need only two files: **fmli182.csv** and **memi182.csv**.

In [5]:
tables = ['intrvw18/fmli182.csv', 'intrvw18/memi182.csv']

for table in tables:
    table_name = table.split('/')[1].split('.')[0]
    print(table_name)

fmli182
memi182


In [6]:
conn = sqlite3.connect('../data/ce_intrvw_data.db')
c = conn.cursor()

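for table in tables:
    # 'intrvw18/fmli182.csv' -> 'fmli182'
    table_name = table.split('/')[1].split('.')[0]

    # Read the CSV straight from the archive without extracting it
    intrvw = pd.read_csv(z.open(table))

    # Drop any existing copy so the cell can be re-run safely
    c.execute("drop table if exists {}".format(table_name))

    intrvw.to_sql(table_name, conn)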

conn.commit()
conn.close()
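
Once the tables are written, one way to stay close to the write-once, read-many idea is to reopen the database in read-only mode for all later work. The sketch below uses a temporary database file and a two-row stand-in table rather than the real `ce_intrvw_data.db`; the column values are made up. It relies on SQLite's URI syntax (`mode=ro`), which `sqlite3.connect` supports when `uri=True`.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Temporary file standing in for ../data/ce_intrvw_data.db.
db_path = Path(tempfile.mkdtemp()) / "demo.db"

# Write step: create a small stand-in table (values are made up).
writer = sqlite3.connect(os.fspath(db_path))
writer.execute("CREATE TABLE fmli182 (NEWID INTEGER, FAM_SIZE INTEGER)")
writer.executemany("INSERT INTO fmli182 VALUES (?, ?)", [(1, 2), (2, 4)])
writer.commit()
writer.close()

# Read step: reopen the same file read-only through a file: URI.
reader = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
table_names = [
    row[0]
    for row in reader.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
row_count = reader.execute("SELECT COUNT(*) FROM fmli182").fetchone()[0]

# Any attempt to modify the data through this connection should fail.
try:
    reader.execute("DELETE FROM fmli182")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
reader.close()

print(table_names, row_count, write_blocked)
```

Querying `sqlite_master` and the row count also doubles as a quick check that the load produced the expected tables.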