# Notebook for generating paired trend plots and correlation matrices for our pre-selected subset of states

Although we built Dash applications that visualize trends in Ratings and Fraction of Positive Sentiment, we could not embed them in the website for our data story because the apps require backend support.

We therefore selected pairs of representative states and compare their trends on a subset of beer styles.

For the state pairs, we chose:
- New York and California - as representatives of consistently Democratic states
- Arizona and Texas - as representatives of consistently Republican states (and also ones that are geographically close)
- New York and Georgia - as representatives of consistently Democratic and Republican states that are also geographically close
- New York and Arizona - as representatives of consistently Democratic and Republican states that are geographically distant
- Arizona and South Carolina - as representatives of consistently Republican states that are geographically distant
- Nevada and Florida - two distant swing states
- Pennsylvania and Virginia - two nearby swing states

For the beer styles, we chose IPA and Stout, since they were consistently ranked as the nation's favourites, and Lager, which is the nation's least favourite.

This notebook relies heavily on the functions in the `plotting_utils.py` script.

In [1]:
import plotly.graph_objects as go 
from plotly.subplots import make_subplots
import plotly.express as px
from pathlib import Path
from dash import Dash, dcc, html 
from dash.dependencies import Input, Output
from scipy.stats import pearsonr, spearmanr
import numpy as np

import sys
import os
data_path = os.path.abspath('../data')
sys.path.append(data_path)

utils_path = os.path.abspath('../utils')
sys.path.append(utils_path)

import reviews_processing
import load_and_find_party_winners
import plotting_utils

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
# Necessary paths
project_root = Path.cwd().parents[2]
beer_advocate_path = project_root / "BeerAdvocate"
reviews_path = str(beer_advocate_path / "reviews_df.csv")
users_path = beer_advocate_path / "users.csv"

project_dir = Path.cwd().parents[1]
winners_path = project_dir / "data/generated/party_winners_over_years.csv"

sentiment_path = str(beer_advocate_path / "reviews2_df.pkl")

In [4]:
# Predefined state pairs (predef_state1[i] is paired with predef_state2[i]) and beer styles
predef_state1 = ['New York', 'Arizona', 'New York', 'New York', 'Arizona', 'Nevada', 'Pennsylvania']
predef_state2 = ['California', 'Texas', 'Georgia', 'Arizona', 'South Carolina', 'Florida', 'Virginia']

beer_styles = ['IPA', 'Stout', 'Lager']

# Average Ratings and Fraction of Positive Sentiment DataFrames
users_reviews = reviews_processing.Reviews(users_path, reviews_path)
sentiment_reviews = reviews_processing.Reviews(users_path, sentiment_path)

year_list = list(np.arange(2004, 2017, 1, dtype=int))
election_years = [2004, 2008, 2012, 2016]

# Yearly aggregated preferences and party winners per state
results = users_reviews.aggregate_preferences_year(year_list)
winners = load_and_find_party_winners.state_winner_years(winners_path)

positive_sentiment = sentiment_reviews.sentiment_to_wide(sentiment_drop='NEGATIVE', sentiment_keep='POSITIVE', all_states=False, year_list=year_list)

The following cell produces the plot described above. A dropdown button lets the user select which pair of predefined states to analyze. The three subplots, one per beer style, are then updated with the trends of average ratings and fraction of positive sentiment. The legend is interactive, so users can isolate the specific trends they want to inspect.

Additionally, the title of each subplot is updated with the Spearman correlation coefficient quantifying the relationship between the trends.

Finally, markers on the plots indicate the winning political party in each election year.
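
As a reminder of what the Spearman coefficient in the subplot titles measures, here is a minimal pure-Python sketch: it ranks each series and then computes the Pearson correlation of the ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the rating values are hypothetical and only illustrative; the notebook imports `scipy.stats.spearmanr`, and how `plotting_utils.pairwise_trendplot` computes the coefficient internally is not shown here.

```python
# Illustrative sketch only: helper names and data are assumptions,
# not part of plotting_utils.

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up yearly average ratings for two states.
state_a = [3.6, 3.7, 3.9, 3.8, 4.0]
state_b = [3.2, 3.5, 3.6, 3.7, 4.1]
rho = spearman(state_a, state_b)
print(round(rho, 3))  # 0.9
```

Because it works on ranks, the coefficient captures whether two trends rise and fall together, regardless of the absolute rating levels in each state.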

In [5]:
fig = plotting_utils.pairwise_trendplot(predef_state1, predef_state2, beer_styles, year_list, election_years, results, positive_sentiment, winners)

In [6]:
fig.write_html('pairwise.html')

The next cell generates a correlation matrix between the time series of average ratings for a specific beer style, which is selected with a dropdown button.
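
To make the idea of a correlation matrix between time series concrete, the sketch below correlates every pair of states' rating series for a single beer style. It assumes Pearson correlation, which may differ from what `plotting_utils.plot_correlation_matrix` uses; the function names and rating values are hypothetical.

```python
# Illustrative sketch only: names and data are assumptions,
# and Pearson correlation is an assumed choice of measure.

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(series_by_state):
    """Return the state order and a square matrix of pairwise correlations."""
    states = list(series_by_state)
    matrix = [
        [pearson(series_by_state[a], series_by_state[b]) for b in states]
        for a in states
    ]
    return states, matrix


# Made-up yearly average ratings of one beer style in three states.
ratings = {
    "New York": [3.6, 3.7, 3.9, 3.8, 4.0],
    "California": [3.2, 3.5, 3.6, 3.7, 4.1],
    "Texas": [4.0, 3.9, 3.7, 3.8, 3.5],
}
states, matrix = correlation_matrix(ratings)

for state, row in zip(states, matrix):
    print(state, [round(value, 2) for value in row])
```

The matrix is symmetric with ones on the diagonal, so a heatmap of it shows at a glance which states' rating trends move together.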

In [52]:
fig1 = plotting_utils.plot_correlation_matrix(results, beer_styles, year_list)
fig1.write_html('correlation_matrix.html')