## Import CPS Data from NBER Website
The National Bureau of Economic Research (NBER) publishes preprocessed Current Population Survey (CPS) data, which are used here to calculate alternative unemployment rates. The monthly files are stored at URLs of the form:
    https://data.nber.org/cps-basic3/csv/<CCYY>/cpsb<CCYY><MM>.csv
This notebook downloads those data, cleans them, and imports them into the database.
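As a small illustration of the URL pattern above, the sketch below builds the address of one monthly file from a year and a month. The helper name `cps_url` is hypothetical and is not part of `etl_cps`; it only assumes the `<CCYY>` and zero-padded `<MM>` placeholders shown in the pattern.

```python
def cps_url(year: int, month: int) -> str:
    """Build the URL of one monthly CPS file (hypothetical helper)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # <CCYY> is the four-digit year; <MM> is the zero-padded month
    return f"https://data.nber.org/cps-basic3/csv/{year}/cpsb{year}{month:02d}.csv"


print(cps_url(2022, 1))
# https://data.nber.org/cps-basic3/csv/2022/cpsb202201.csv
```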

In [10]:
%load_ext autoreload
%autoreload 2
from etl_eta import load_data
from etl_cps import clean_cps
import pathlib
import os

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [20]:
DB = os.path.join(os.path.abspath(""), "ui_stats.db")
LOG = os.path.join(os.path.abspath(""), "cps_import_log.txt")
print(DB)
print(LOG)

c:\Users\micha\Documents\CAPP\CAPP-30239\CAPP-30239-Static\data\ui_stats.db
c:\Users\micha\Documents\CAPP\CAPP-30239\CAPP-30239-Static\data\cps_import_log.txt


In [19]:
# Build a to-do list of "year-month" keys, one per monthly file still to import
to_do_list = []
for year in range(2000, 2024):
    for month in range(1, 13):
        to_do_list.append(f"{year}-{month}")

['2000-1', '2000-2', '2000-3', '2000-4', '2000-5', '2000-6', '2000-7', '2000-8', '2000-9', '2000-10', '2000-11', '2000-12', '2001-1', '2001-2', '2001-3', '2001-4', '2001-5', '2001-6', '2001-7', '2001-8', '2001-9', '2001-10', '2001-11', '2001-12', '2002-1', '2002-2', '2002-3', '2002-4', '2002-5', '2002-6', '2002-7', '2002-8', '2002-9', '2002-10', '2002-11', '2002-12', '2003-1', '2003-2', '2003-3', '2003-4', '2003-5', '2003-6', '2003-7', '2003-8', '2003-9', '2003-10', '2003-11', '2003-12', '2004-1', '2004-2', '2004-3', '2004-4', '2004-5', '2004-6', '2004-7', '2004-8', '2004-9', '2004-10', '2004-11', '2004-12', '2005-1', '2005-2', '2005-3', '2005-4', '2005-5', '2005-6', '2005-7', '2005-8', '2005-9', '2005-10', '2005-11', '2005-12', '2006-1', '2006-2', '2006-3', '2006-4', '2006-5', '2006-6', '2006-7', '2006-8', '2006-9', '2006-10', '2006-11', '2006-12', '2007-1', '2007-2', '2007-3', '2007-4', '2007-5', '2007-6', '2007-7', '2007-8', '2007-9', '2007-10', '2007-11', '2007-12', '2008-1', '2008

### Make requests to NBER server and iteratively load data

In [45]:
# Test the cleaning step on a single file before running the full loop
# df = clean_cps("https://data.nber.org/cps-basic3/csv/2022/cpsb202201.csv")
# df = clean_cps("C:/Users/micha/Downloads/cpsb202208.csv")
df = clean_cps("C:/Users/micha/Downloads/cpsb200001.csv")
df.head()
# load_data(DB, df, "unemp")

| st (str) | dt_m (i64) | dt_y (i64) | ct_u3 (f64) | ct_u6 (f64) | ct_lf_u3 (f64) | ct_lf_u6 (f64) |
|---|---|---|---|---|---|---|
| UT | 1 | 2000 | 27835.8677 | 96520.253 | 1107600.0 | 1129100.0 |
| DC | 1 | 2000 | 17663.4963 | 51485.3999 | 273485.8474 | 296708.096 |
| NC | 1 | 2000 | 155653.9686 | 571336.2047 | 3854900.0 | 3975300.0 |
| MO | 1 | 2000 | 69018.6774 | 295549.4699 | 2827800.0 | 2866700.0 |
| MN | 1 | 2000 | 125428.3842 | 332948.838 | 2671800.0 | 2717000.0 |


In [49]:
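# Loop through every year-month and import any file still on the to-do list
for year in range(2000, 2024):
    for month in range(1, 13):
        # Skip months that have already been imported
        if f"{year}-{month}" not in to_do_list:
            continue
        try:
            print(f"Starting {year}-{month:02d}")
            # Download and clean the monthly file
            df = clean_cps(f"https://data.nber.org/cps-basic3/csv/{year}/cpsb{year}{month:02d}.csv")
            print("    Downloading... Done")
            # Load the cleaned rows into the "unemp" table
            load_data(DB, df, "unemp")
            print("    Posting... Done")
            to_do_list.remove(f"{year}-{month}")
            # Rewrite the log with the remaining to-do list after each success
            with open(LOG, 'w') as f:
                f.write(f"{to_do_list}")
        except Exception as e:
            # Report the failure and move on; the month stays on the to-do list
            print(f"   Failed as {e}")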

Starting 2001-05
    Downloading... Done
    Posting...done
Starting 2002-01
    Downloading... Done
    Posting...done
Starting 2002-07
    Downloading... Done
    Posting...done
Starting 2002-08
    Downloading... Done
    Posting...done
Starting 2003-01
   Failed as ptdtrace

Resolved plan until failure:

	---> FAILED HERE RESOLVING 'select' <---
DF ["hrhhid", "hrmonth", "hryear4", "hurespli"]; PROJECT */364 COLUMNS; SELECTION: None
Starting 2003-02
   Failed as ptdtrace

Resolved plan until failure:

	---> FAILED HERE RESOLVING 'select' <---
DF ["hrhhid", "hrmonth", "hryear4", "hurespli"]; PROJECT */364 COLUMNS; SELECTION: None
Starting 2003-03
   Failed as ptdtrace

Resolved plan until failure:

	---> FAILED HERE RESOLVING 'select' <---
DF ["hrhhid", "hrmonth", "hryear4", "hurespli"]; PROJECT */364 COLUMNS; SELECTION: None
Starting 2003-04
   Failed as ptdtrace

Resolved plan until failure:

	---> FAILED HERE RESOLVING 'select' <---
DF ["hrhhid", "hrmonth", "hryear4", "hurespli"];
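The repeated failure reports that `ptdtrace` could not be resolved in a `select`, which suggests that at least the 2003 files in this log do not contain a column with that exact name. One way to confirm this before running the full cleaning step is to compare a file's header row against the columns the cleaning code needs. The sketch below uses only the standard library; `missing_columns` and the sample header are hypothetical, and the actual list of columns required by `clean_cps` is not shown in this notebook.

```python
import csv
import io


def missing_columns(csv_text: str, required: list[str]) -> list[str]:
    """Return the required column names that are absent from the CSV header."""
    # Read only the first row, which holds the column names
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Made-up sample: a header with four columns and one data row
sample = "hrhhid,hrmonth,hryear4,hurespli\n1,1,2003,1\n"
print(missing_columns(sample, ["hrmonth", "ptdtrace"]))
# ['ptdtrace']
```

If a check like this lists a missing column for some years, the cleaning step would need a year-specific column mapping for those files; which name applies is something to verify against the file headers rather than assume.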

In [50]:
print(to_do_list)

['2003-1', '2003-2', '2003-3', '2003-4', '2003-5', '2003-6', '2003-7', '2003-8', '2003-9', '2003-10', '2003-11', '2003-12', '2004-1', '2004-2', '2004-3', '2004-4', '2004-5', '2004-6', '2004-7', '2004-8', '2004-9', '2004-10', '2004-11', '2004-12', '2005-1', '2005-2', '2005-3', '2005-4', '2005-5', '2005-6', '2005-7', '2005-8', '2005-9', '2005-10', '2005-11', '2005-12']
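Because the loop writes the remaining to-do list to the log as a Python list literal, a later session could restore it from that file instead of rebuilding the full list. The sketch below is one possible way to do that with `ast.literal_eval`; the helper name `read_to_do_list` is hypothetical, and the demo writes to a temporary file rather than the real log.

```python
import ast
import os
import tempfile


def read_to_do_list(log_path: str) -> list[str]:
    """Parse a log file that holds the printed form of a list of strings."""
    with open(log_path) as f:
        text = f.read().strip()
    # literal_eval only accepts Python literals, so it does not execute code
    return ast.literal_eval(text) if text else []


# Demo with a temporary file written the same way as the import loop
with tempfile.TemporaryDirectory() as tmp:
    demo_log = os.path.join(tmp, "cps_import_log.txt")
    with open(demo_log, "w") as f:
        f.write(f"{['2003-1', '2003-2']}")
    print(read_to_do_list(demo_log))
# ['2003-1', '2003-2']
```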
