<img src="https://www.th-koeln.de/img/logo.svg" style="float:right;" width="200">

# Correlation of the Consumer Price Index with Disaster Events
* Project: <a href="https://www.femoz.de/">FEMOZ</a>
* Author of notebook: <a href="https://www.gernotheisenberg.de/">Gernot Heisenberg</a>
* Date:   08.08.2022

---------------------------------

### <font color="ce33ff">DESCRIPTION</font>:
This notebook shows how to correlate price index data for food and health expenditures with data on the occurrence of disasters. The total number of affected people serves as a measure of a disaster's severity, so it is the time series that the two price series are correlated with.  
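The rolling correlation computed below uses pandas' `rolling(...).corr`, which is Pearson's r, while the ToDos mention Spearman. A minimal sketch of the difference on made-up numbers (not FEMOZ data): Spearman's rho is Pearson's r computed on ranks, so it reaches 1 for any strictly increasing relationship, whereas Pearson's r only does so for a linear one. `scipy.stats.spearmanr` returns the same rank coefficient.

```python
import numpy as np
import pandas as pd

# Toy series (hypothetical): 'affected' grows monotonically but non-linearly with 'price'
price = pd.Series(np.arange(1, 11, dtype=float))
affected = price ** 3

# Pearson's r measures linear association only
pearson_r = price.corr(affected)

# Spearman's rho = Pearson's r of the ranks (ties would get average ranks)
spearman_rho = price.rank().corr(affected.rank())

print(round(pearson_r, 3), round(spearman_rho, 3))
```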

### <font color="red">ToDos</font>:
* transfer the concept from Spearman correlation to mutual information
* set the x-axis back to 'date' instead of 'frame'
* read two time series and merge them by date. Make sure that a disaster in a given region is correlated with the market place in that same region.
    * one time series for disasters
    * one time series for prices
* also filter by the disaster types "Drought" and "Flood" before correlating
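The merge and filter ToDos could look roughly like this. A minimal sketch, assuming two hypothetical long tables: `disasters` with columns `date`, `region`, `disaster_type`, `total_affected`, and `prices` with `date`, `region`, `product_id`, `price`. These column names, and the idea that each market place is already tagged with its region, are assumptions rather than the project's actual schema.

```python
import pandas as pd

# Hypothetical disaster records (one row per event)
disasters = pd.DataFrame({
    "date": pd.to_datetime(["2019-03-01", "2019-03-01", "2019-04-01", "2019-04-01"]),
    "region": ["region_a", "region_a", "region_a", "region_b"],
    "disaster_type": ["Flood", "Storm", "Drought", "Flood"],
    "total_affected": [1000.0, 500.0, 200.0, 300.0],
})

# Hypothetical monthly market prices, tagged with the region of the market place
prices = pd.DataFrame({
    "date": pd.to_datetime(["2019-03-01", "2019-04-01", "2019-03-01"]),
    "region": ["region_a", "region_a", "region_b"],
    "product_id": [1, 1, 1],
    "price": [10.0, 12.0, 9.0],
})

# Keep only the disaster types of interest
selected = disasters[disasters["disaster_type"].isin(["Drought", "Flood"])]

# Sum affected people per region and month so each price row meets at most one value
affected = selected.groupby(["region", "date"], as_index=False)["total_affected"].sum()

# Join on region AND date so a disaster is only paired with markets in the same region;
# months without a selected disaster get 0 affected people
merged = prices.merge(affected, on=["region", "date"], how="left")
merged["total_affected"] = merged["total_affected"].fillna(0.0)
print(merged)
```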


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from sklearn.preprocessing import MinMaxScaler

sns.set_context('talk',font_scale=.8)

df_raw = pd.read_excel('C:/Users/flori/sciebo/femoz_iws/data_lake/IWS/FLORIAN/Disaster Korrelation.xlsx', sheet_name="Prices", usecols=["product_id", "location_id", "price", "affected", "date"])

df_raw["price"] = df_raw["price"].astype(float)
df_raw["affected"] = df_raw["affected"].astype(float)
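The loop below expects `df_raw` in long format: one row per product, location and date, with the price and the number of affected people side by side. A toy frame with that layout (hypothetical name `df_toy`, made-up values) is handy for trying the cells without the Excel file. It also shows `pd.to_numeric(..., errors="coerce")` as a more forgiving alternative to `astype(float)`, which raises on entries such as `'n/a'`; this is a suggestion, not what the notebook does.

```python
import pandas as pd

# Hypothetical stand-in for the "Prices" sheet: two products, one location, three months
df_toy = pd.DataFrame({
    "product_id": [1, 1, 1, 2, 2, 2],
    "location_id": [10, 10, 10, 10, 10, 10],
    "price": ["5.0", "5.5", "n/a", "3.0", "3.1", "3.3"],
    "affected": [0, 1200, 300, 0, 1200, 300],
    "date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"] * 2),
})

# Convert to float; unparsable entries become NaN instead of raising
df_toy["price"] = pd.to_numeric(df_toy["price"], errors="coerce")
df_toy["affected"] = pd.to_numeric(df_toy["affected"], errors="coerce").astype(float)

print(df_toy.dtypes)
```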

In [2]:
# define function to compute the rolling (Pearson) r value over a centered window
def get_rolling_r(df_norm, time_series_1, time_series_2, r_window_size):

    # Interpolate missing data.
    df_interpolated = df_norm.interpolate()

    # Compute rolling window
    rolling_r = df_interpolated[time_series_1].rolling(window=r_window_size, center=True).corr(df_interpolated[time_series_2])

    # mark positions where the centered window lies completely inside the series
    full_window = pd.Series(1.0, index=rolling_r.index).rolling(window=r_window_size, center=True).sum().notna()

    # inside that range, NaN means a window without variance (e.g. constant 'affected'); set r to 0 there
    rolling_r = rolling_r.mask(full_window & rolling_r.isna(), 0.0)

    return rolling_r
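To see what `center=True` and the window size do, here is a small self-contained sketch on synthetic random data (not project data). A centered rolling Pearson correlation with `window=12` is only defined where the full window fits inside the series, so `window - 1` values at the two ends stay NaN. Inside that range, a NaN signals a window with zero variance (here: months in which nobody was affected), which is the case `get_rolling_r` reports as r = 0.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, window = 30, 12

# Synthetic normalized series: 'affected' is zero for the first 15 months, then random
price = pd.Series(rng.random(n))
affected = pd.Series(np.r_[np.zeros(15), rng.random(n - 15)])

# Centered rolling Pearson correlation, as in get_rolling_r
r = price.rolling(window=window, center=True).corr(affected)

# Positions where the centered window lies completely inside the series
full_window = pd.Series(1.0, index=price.index).rolling(window=window, center=True).sum().notna()

print("incomplete windows:", int((~full_window).sum()))
print("NaN inside full windows (zero variance):", int((r.isna() & full_window).sum()))
```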

In [3]:
# get product ids
products = df_raw.product_id.unique().tolist()

# get location ids
locations = df_raw.location_id.unique().tolist()

In [4]:
# create list to store r values
result = []

# define window size to be used for r value computation
window_size = 12

# apply function to every combination of product and location
for product in products:
    for location in locations:
        # filter for product and location (one combined mask avoids reindexing warnings)
        df = df_raw[(df_raw["product_id"] == product) & (df_raw["location_id"] == location)]

        # skip combinations that do not occur in the data (MinMaxScaler fails on empty input)
        if df.empty:
            continue

        # keep only the two value columns, in a fixed order
        # (rows are assumed to be in chronological order already)
        df_values = df[["price", "affected"]]

        # normalize values to [0, 1]
        sc = MinMaxScaler()
        norm_ts = sc.fit_transform(df_values)
        df_norm = pd.DataFrame(norm_ts, columns=["price", "affected"]) # convert back to a df

        # apply function from above
        rolling_r = get_rolling_r(df_norm, "price", "affected", window_size).tolist()

        # pair each r value with its date
        pairs = [[date, round(r, 4)] for date, r in zip(df["date"], rolling_r)]

        # store r values into results list
        result.append([product, location, pairs])
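The nested loop visits every product and location pair, including pairs that never occur in the data. A minimal alternative sketch uses `groupby`, which only yields existing combinations and returns a long table directly. It runs on a hypothetical toy frame `df_demo` instead of the Excel data, applies the same min-max scaling and a centered rolling Pearson correlation, and is not the notebook's own implementation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical long-format data: product 1 at location 10 only, 24 months
rng = np.random.default_rng(1)
df_demo = pd.DataFrame({
    "product_id": 1,
    "location_id": 10,
    "date": pd.date_range("2019-01-01", periods=24, freq="MS"),
    "price": rng.random(24),
    "affected": rng.random(24),
})

window_size = 12
frames = []
for (product, location), group in df_demo.groupby(["product_id", "location_id"]):
    group = group.reset_index(drop=True)
    # scale both columns to [0, 1] as in the notebook; Pearson's r itself is unchanged by this
    scaled = pd.DataFrame(MinMaxScaler().fit_transform(group[["price", "affected"]]),
                          columns=["price", "affected"]).interpolate()
    r = scaled["price"].rolling(window=window_size, center=True).corr(scaled["affected"])
    frames.append(pd.DataFrame({"product_id": product, "location_id": location,
                                "date": group["date"], "r": r.round(4)}))

df_long = pd.concat(frames, ignore_index=True)
print(df_long.head(15))
```

Because the result is already one row per date, the explode and split steps of the next cell are not needed with this variant.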

In [5]:
# convert list to dataframe
df_result = pd.DataFrame(result, columns=["product_id", "location_id", "r_value_pair"])

# unnest r_value_pair column (one row per date) & reset index
df_result = df_result.explode("r_value_pair")
df_result.reset_index(inplace = True)

# split [date, r] pairs into two columns
df_split = pd.DataFrame(df_result["r_value_pair"].tolist(), columns = ["date", "r"])

df_result["date"] = df_split["date"]
df_result["r"] = df_split["r"]

# drop helper columns
df_result = df_result.drop(["index", "r_value_pair"], axis = 1)

# store result into csv file
df_result.to_csv("C:/Users/flori/sciebo/femoz_iws/data_lake/IWS/FLORIAN/Disaster_Korrelation_Werte.csv", index=False)
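The CSV written above is in long format. For plotting r against the date rather than the frame index (see ToDos), a wide table with one column per product and location pair can help. A minimal sketch on a hypothetical stand-in `df_result_toy` with the same columns as `df_result`:

```python
import pandas as pd

# Hypothetical excerpt of df_result (same columns as the notebook's output)
df_result_toy = pd.DataFrame({
    "product_id": [1, 1, 2, 2],
    "location_id": [10, 10, 10, 10],
    "date": pd.to_datetime(["2019-01-01", "2019-02-01"] * 2),
    "r": [0.12, -0.3, 0.5, None],
})

# One row per date, one column per (product_id, location_id) pair
wide = df_result_toy.pivot(index="date", columns=["product_id", "location_id"], values="r")
print(wide)

# With matplotlib installed, wide.plot() draws one line per pair with the date on the x-axis
```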