# Part 5 - University of Michigan Consumer Sentiment data

- toc: True
- badges: true
- comments: true
- categories: [jupyter]

### I think they do a great job of documentation at the [site](https://data.sca.isr.umich.edu/survey-info.php).
Also have a look at the [wiki page](https://en.wikipedia.org/wiki/University_of_Michigan_Consumer_Sentiment_Index) about the survey and the data. In particular, see the references to [this](https://www.wsj.com/articles/university-of-michigan-inks-deal-to-end-early-release-of-survey-1412690643) and indirectly [this](https://www.wsj.com/articles/SB10001424127887324682204578515963191421602) about Thomson Reuters and others giving investors early access for a fee.

In any case, you can see the [questionnaire](https://data.sca.isr.umich.edu/fetchdoc.php?docid=24776) and more.
I am interested in the responses about expectations.
There seems to be a lot of great information there, but I am going to focus on the data on the components of the index, in particular the 3 questions related to expectations:
* x2= PEXP_R = "Now looking ahead--do you think that a year from now you (and your family living
there) will be better off financially, or worse off, or just about the same as now?"
* x3= BUS12_R = "Now turning to business conditions in the country as a whole--do you think that
during the next twelve months we'll have good times financially, or bad times, or
what?"
* x4= BUS5_R = "Looking ahead, which would you say is more likely--that in the country as a whole
we'll have continuous good times during the next five years or so, or that we will
have periods of widespread unemployment or depression, or what?"

They combine these into the Index of Consumer Expectations, or ICE, as follows:

ICE = ((x2 + x3 + x4)/4.1134) + 2
<sup>[1](#myfootnote1)</sup>

I'm sure somewhere in the documentation they explain why they divide by 4.1134 and add 2 but I'll probably just use the 3 individual variables.  We'll see.
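
As a quick check on the formula, here is a minimal sketch in Python. The function name `ice` and the three input values are made up for illustration; they are not survey data. It also shows why the parentheses matter: without them only `x4` gets divided.

```python
def ice(x2, x3, x4):
    """ICE from the three expectation components, per the formula above."""
    return (x2 + x3 + x4) / 4.1134 + 2

# made-up component values, for illustration only (not survey data)
x2, x3, x4 = 120.0, 95.0, 88.0

with_parens = ice(x2, x3, x4)
# without the parentheses only x4 is divided, which gives a different number
without_parens = x2 + x3 + x4 / 4.1134 + 2

print(round(with_parens, 2), round(without_parens, 2))
```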

<a name="myfootnote1">1</a>: Notice I put in the parentheses to avoid issues like [this](https://www.nytimes.com/2019/08/02/science/math-equation-pedmas-bemdas-bedmas.html#:~:text=To%20help%20students%20in%20the,%2C%20division%2C%20addition%2C%20subtraction).
If the link doesn't work, search for "The Math equation that stumped the internet".


First the boilerplate Python I use for most notebooks.  It's evolving.

In [None]:
import os
import sys
import datetime
import time
import re
import inspect
import pandas as pd
from plotnine import ggplot
import matplotlib as mpl
import matplotlib.pyplot as plt
import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import Select

I like to see version numbers for modules ... and for Python.

In [None]:
mlist = list(filter(lambda x: inspect.ismodule(x[1]), locals().items()))
vi = sys.version_info
print("version {0}.{1}.{2} of Python".format(vi.major, vi.minor, vi.micro))
for name, mod in mlist:
    if name.startswith("__"):
        continue
    if hasattr(mod, "__version__"):
        print("version {1} of {0}".format(name, mod.__version__))
del mod
del name

## Selenium to automate downloading
It's easy to get the data from the site by hand, via the following steps.
1. Navigate to https://data.sca.isr.umich.edu/data-archive/mine.php
<sup>[2](#myfootnote2)</sup>
2. For **Table** select **Table 5: Components of the Index of Consumer Sentiment**.
3. Click **Comma-Separated (CSV)** under format and it should start downloading.

But I want to automate the process, so we will be using Selenium.
Selenium allows us to control a web browser from Python, so I can use Python to execute all the steps above.

<a name="myfootnote2">2</a>: Note that if you start at the main site you might need to click on **Data** and select *Time Series* in the dropdown.

## Download Directory
I want to have the data downloaded into a subdirectory below here rather than the default *Downloads* directory.
We can arrange that using ChromeOptions.

In [None]:
data_dir = "./data"
# create the download directory if it does not exist yet
os.makedirs(data_dir, exist_ok=True)
# Chrome takes the download location from its preferences
prefs = {"download.default_directory": os.path.abspath(data_dir)}

options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", prefs)

## Step 1, going to the site
You can run Selenium in what is called *headless* mode where you don't get a window for the browser.
But I still find it cool to see a new browser pop up so I won't use that now.
And it's nice in the early stages to see what is actually happening.
Probably will change once the novelty wears off.

### chromedriver
With Selenium you have a choice of which browser to use, e.g. Chrome, Firefox et cetera.  I'll use Chrome.  Whatever the choice we'll need a *driver*.  Google it if you are interested.
I downloaded the chromedriver and put it in a sibling directory under *chromedriver_win32* so it is easy to find for other projects using Selenium.

Here I am doing step 1 from above.

In [None]:
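# executable_path is the Selenium 3 way to point at the driver;
# Selenium 4 deprecated it in favour of a Service object
chromedriver_path = os.path.join("..", "..", "chromedriver_win32", "chromedriver.exe")
driver = webdriver.Chrome(executable_path=chromedriver_path, options=options)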
url = "https://data.sca.isr.umich.edu/data-archive/mine.php"
driver.get(url)
print(options)

## Step 2: selecting Table 5
There are plenty of tutorials on using Selenium to make selections and click buttons.
I can't remember which ones I used. Most likely I just googled what I wanted to do.

In any case, here I am finding the selection section on the page and choosing option 5.

In [None]:
# Locate the table selector on the page and wrap it in a Select object
select_element = Select(driver.find_element_by_css_selector("select"))
# pick the entry at index 5 (indices are 0-based), i.e. Table 5 from step 2
select_element.select_by_index(5)

## Step 3: Selecting *Comma-Separated (CSV)*
We have to search through the elements of the page to find the right button.

In [None]:
# Get the path to the most recently downloaded file.
# There may be lots of files in the download directory already,
# and I don't know what the name of the downloaded file will be,
# so look for a file that was not there before the click.
def get_downloaded_fpath(dir=None, files_before=None, file_ext=".csv",
                   max_wait=10, verbose=True):
    files_before = set(files_before or [])
    start_time = datetime.datetime.now()
    while True:
        new_files = set(os.listdir(dir)).difference(files_before)
        if verbose:
            print(new_files)
        for fname in new_files:
            if os.path.splitext(fname)[1] == file_ext:
                return os.path.join(dir, fname)
        # give up once max_wait seconds have passed
        if (datetime.datetime.now() - start_time).total_seconds() > max_wait:
            return None
        time.sleep(0.5)
                
elements = driver.find_elements_by_name("format")
button = None
for e in elements:
    if e.get_property("value") == 'Comma-Separated (CSV)':
        button = e
        break
if not button:
    raise RuntimeError("Error downloading Consumer Sentiment data from {0}".format(url))
            
files_before = set(os.listdir(data_dir))
button.click()
fpath = get_downloaded_fpath(dir=data_dir, files_before=files_before, file_ext=".csv", max_wait=10)
print(fpath)
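
The "what is new in the directory" idea can be tried without a browser. Below is a minimal sketch using a temporary directory; the helper name `new_files_with_ext` and the file names are hypothetical. It also shows why checking the extension helps: while Chrome is still downloading, the file carries a `.crdownload` suffix, so it is not picked up too early.

```python
import os
import tempfile

def new_files_with_ext(directory, files_before, file_ext=".csv"):
    """Names in `directory` that were not in `files_before` and end in `file_ext`."""
    files_after = set(os.listdir(directory))
    return sorted(f for f in files_after - set(files_before)
                  if os.path.splitext(f)[1] == file_ext)

with tempfile.TemporaryDirectory() as tmp:
    # a file that was already there before the "download"
    open(os.path.join(tmp, "old.csv"), "w").close()
    before = set(os.listdir(tmp))
    # simulate a download: one finished CSV and one in-progress temp file
    for name in ("new.csv", "partial.csv.crdownload"):
        open(os.path.join(tmp, name), "w").close()
    found = new_files_with_ext(tmp, before)

print(found)
```

Only `new.csv` is reported: `old.csv` was in the "before" snapshot and the `.crdownload` file has the wrong extension.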

In [None]:
df = pd.read_csv(fpath, skiprows=1)
# drop the last column
df.drop(df.columns[-1], inplace=True, axis=1)
# build a YYYYMMDD string, using the 15th as a mid-month date
df['Datetime'] = (100*100*df['Year'] + 100*df["Month"] + 15).astype(str)
df['Datetime'] = pd.to_datetime(df['Datetime'], format="%Y%m%d")
df.drop(["Month", "Year"], inplace=True, axis=1)
print(df.agg(['min', 'max']))
df.set_index("Datetime", inplace=True)
print(df.head())
exp_columns = [c for c in df.columns if re.search("xpected|12 Months", c)]
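
The date trick in the cell above packs year, month and day into one integer of the form YYYYMMDD and then parses it. Here is a small sketch of the same arithmetic for a single value, using only the standard library; the helper name `mid_month` is made up for illustration.

```python
from datetime import datetime

def mid_month(year, month):
    """Encode year and month as YYYYMM15 and parse it as a date."""
    stamp = str(100 * 100 * year + 100 * month + 15)
    return datetime.strptime(stamp, "%Y%m%d")

print(mid_month(1978, 1))
```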
                                             

In [None]:
df[exp_columns].plot(figsize=[16,4], marker='.', grid=True)
plt.gca().set_ylim(bottom=0)