# Overview

Since our shelter data is daily, we want to convert the monthly unemployment data to daily values. To do this, we assign each month's value to every day of that month. For example, the national unemployment rate in January 2014 was 6.6%, so we assign the value 6.6 to every day in January 2014.

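This code can easily be adapted to other monthly data sets, so I wrapped it in a function. The function assumes separate columns for year and month. As written, it reads the month as a number; a commented-out line inside the function handles a full month name (for example, "January") instead.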


In [79]:
import pandas as pd
from datetime import datetime, timedelta
import csv


def monthly_to_daily(data_set, year_header, month_header, data_header):
    """Expand monthly rows into a list of (date_string, value) tuples, one per day."""
    row_list = []
    # Iterate over each row in the data set
    for index, row in data_set.iterrows():
        # Get the year and month from the current row
        year = int(row[year_header])
        # Extract the month from the row and convert it to its numerical equivalent
        # Assuming the month is represented as a number:
        month = int(row[month_header])
        # If the month is represented as the full month name, change to:
        # month = datetime.strptime(row[month_header], '%B').month


        # Calculate the number of days in the month
        num_days = (datetime(year, month % 12 + 1, 1) - timedelta(days=1)).day

        # create a row for each day of the month
        for i in range(num_days):
            # Format the date as "year-day-month" (note: day comes before month, unlike ISO 8601)
            date = datetime(year,month,i+1).strftime("%Y-%d-%m")
            row_list.append((date,row[data_header]))

    return row_list
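
The days-in-month expression deserves a quick sanity check, because for December `month % 12 + 1` wraps around to January of the same year rather than the next one. The result is still 31, since the day before any January 1 is a December 31. The sketch below wraps the expression in a helper (the name `days_in_month` is mine, not part of the notebook) and compares it with `calendar.monthrange` from the standard library.

```python
import calendar
from datetime import datetime, timedelta


def days_in_month(year, month):
    # Same expression as in monthly_to_daily: first day of the following month, minus one day.
    # For December, month % 12 + 1 is 1, so this steps back from January 1 of the
    # same year to December 31 of the previous year; the day number is still 31.
    return (datetime(year, month % 12 + 1, 1) - timedelta(days=1)).day


# Compare with the standard library over a few years, including leap years.
mismatches = [
    (year, month)
    for year in (2009, 2012, 2013, 2024)
    for month in range(1, 13)
    if days_in_month(year, month) != calendar.monthrange(year, month)[1]
]
print(mismatches)  # an empty list means the two methods agree
```

If the wrap-around feels too subtle, `calendar.monthrange(year, month)[1]` returns the same number directly.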

# Unemployment Data

The BLS data reports rates in percentage points (6.6 means 6.6%), so we will do the same. Since the earliest date in our data sets is 11/28/2009 (Bloomington), I grabbed data for 2009-2024.

Data from https://data.bls.gov/pdq/SurveyOutputServlet

## National Data

In [75]:
nationalUnemployment_monthly = pd.read_csv("../main_data/NationalUnemploymentStart09.csv")

nationalUnemployment_daily = monthly_to_daily(nationalUnemployment_monthly,'Year','Period', 'Value')


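# monthly_to_daily returns a list of rows, not a DataFrame, so write it with csv.writer.
# newline='' prevents blank lines between rows on Windows.
with open('../main_data/NationalUnemploymentDaily.csv', 'w', newline='') as csvfile: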
    writer = csv.writer(csvfile)
    writer.writerow(['Date', 'Unemployment Rate'])  # Write header row
    for row in nationalUnemployment_daily:
        writer.writerow(row)
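
Because the dates are written as year-day-month, anything that reads these daily files back needs to be given the format explicitly; a parser that guesses would read `2014-02-01` as February 1. A minimal sketch using only the standard library, with a made-up two-row sample standing in for the real file:

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample in the same layout as the daily CSV files.
sample = io.StringIO("Date,Unemployment Rate\n2014-01-01,6.6\n2014-02-01,6.6\n")

parsed = []
for record in csv.DictReader(sample):
    # "%Y-%d-%m" mirrors the writer: year, then day, then month.
    day = datetime.strptime(record["Date"], "%Y-%d-%m").date()
    parsed.append((day, float(record["Unemployment Rate"])))

print(parsed[1][0])  # 2014-01-02, i.e. January 2, not February 1
```

In pandas, the equivalent would be passing the same format string, as in `pd.to_datetime(df["Date"], format="%Y-%d-%m")`.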

## Local Unemployment Data

### Austin, TX

Since the earliest date in our Austin data set is 10/21/2013, I pulled the Austin-Round Rock, TX Metropolitan Statistical Area data starting in 2013.

In [76]:
austinUnemployment_monthly = pd.read_csv("../main_data/locale_specific_data/AustinUnemploymentStart13.csv")

austinUnemployment_daily = monthly_to_daily(austinUnemployment_monthly,'Year','Period', 'Value')


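# monthly_to_daily returns a list of rows, not a DataFrame, so write it with csv.writer.
# newline='' prevents blank lines between rows on Windows.
with open('../main_data/locale_specific_data/AustinUnemploymentDaily.csv', 'w', newline='') as csvfile: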
    writer = csv.writer(csvfile)
    writer.writerow(['Date', 'Unemployment Rate'])  # Write header row
    for row in austinUnemployment_daily:
        writer.writerow(row)

### Bloomington, IN

Since the earliest date in our Bloomington data set is 11/28/2009, I pulled the Bloomington, IN Metropolitan Statistical Area data starting in 2009.

In [77]:
bloomingtonUnemployment_monthly = pd.read_csv("../main_data/locale_specific_data/BloomingtonUnemploymentStart09.csv")

bloomingtonUnemployment_daily = monthly_to_daily(bloomingtonUnemployment_monthly,'Year','Period', 'Value')


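# monthly_to_daily returns a list of rows, not a DataFrame, so write it with csv.writer.
# newline='' prevents blank lines between rows on Windows.
with open('../main_data/locale_specific_data/BloomingtonUnemploymentDaily.csv', 'w', newline='') as csvfile: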
    writer = csv.writer(csvfile)
    writer.writerow(['Date', 'Unemployment Rate'])  # Write header row
    for row in bloomingtonUnemployment_daily:
        writer.writerow(row)

### Sonoma County, CA

Since the earliest date in our Sonoma data set is 8/16/2013, I pulled the Santa Rosa, CA Metropolitan Statistical Area data starting in 2013.

In [78]:
sonomaUnemployment_monthly = pd.read_csv("../main_data/locale_specific_data/SonomaUnemploymentStart13.csv")

sonomaUnemployment_daily = monthly_to_daily(sonomaUnemployment_monthly,'Year','Period', 'Value')


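# monthly_to_daily returns a list of rows, not a DataFrame, so write it with csv.writer.
# newline='' prevents blank lines between rows on Windows.
with open('../main_data/locale_specific_data/SonomaUnemploymentDaily.csv', 'w', newline='') as csvfile: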
    writer = csv.writer(csvfile)
    writer.writerow(['Date', 'Unemployment Rate'])  # Write header row
    for row in sonomaUnemployment_daily:
        writer.writerow(row)

# CPI

In [83]:
cpi_monthly = pd.read_csv("../main_data/features_data/cpi.csv")

row_list=[]

# Same logic as monthly_to_daily, adapted because the date arrives as a single "year-month-01" string
for index, row in cpi_monthly.iterrows():
    # The 'Date' column holds the year-month-01 string
    date = row['Date']

    # Parse year and month from the 'Date' column; the day (always 01) is discarded
    year, month, _ = map(int, date.split('-'))


    # Calculate the number of days in the month
    num_days = (datetime(year, month % 12 + 1, 1) - timedelta(days=1)).day

    # create a row for each day of the month
    for i in range(num_days):
        # Format the date in "year-day-month" format
        date = datetime(year,month,i+1).strftime("%Y-%d-%m")
        row_list.append((date,row['Inflation Rate']))

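# row_list is a plain list of rows, not a DataFrame, so write it with csv.writer.
# newline='' prevents blank lines between rows on Windows.
with open('../main_data/features_data/cpiDaily.csv', 'w', newline='') as csvfile: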
    writer = csv.writer(csvfile)
    writer.writerow(['Date', 'Inflation Rate'])  # Write header row
    for row in row_list:
        writer.writerow(row)
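
The CPI loop repeats most of `monthly_to_daily`; only the way the year and month are obtained differs. One possible way to share the common part is a small helper that expands a single month, sketched below. The name `expand_month` and its `date_format` parameter are hypothetical, and the value `1.1` is a placeholder rather than real CPI data.

```python
import calendar
from datetime import date


def expand_month(year, month, value, date_format="%Y-%d-%m"):
    """Return one (date_string, value) tuple per day of the given month."""
    num_days = calendar.monthrange(year, month)[1]
    return [
        (date(year, month, day).strftime(date_format), value)
        for day in range(1, num_days + 1)
    ]


# CPI-style input: the date arrives as "year-month-01".
year, month, _ = map(int, "2014-02-01".split("-"))
rows = expand_month(year, month, 1.1)  # placeholder value
print(len(rows), rows[0], rows[-1])
```

With a helper like this, each caller would only need to extract the year and month in whatever form its data set provides.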