
# Set up

In [308]:
from scipy.stats import pearsonr
from scipy.stats import spearmanr
from scipy.stats import kendalltau

import numpy as np
import os
import pandas as pd
import json
import re

# Read data

## Read financial data

In [226]:
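def toInt(l: list) -> list:
    # Despite its name, this helper converts every element to float, in place.
    for i in range(len(l)):
        l[i] = float(l[i])
    return l  # return the list so the result matches the annotated return type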

root_path = os.path.abspath(os.path.dirname(os.getcwd()))
data_path = os.path.join(root_path, 'data')
agr_path = os.path.join(data_path, 'agriculture')

finc_data = []
for file_name in os.listdir(agr_path):
    # Only open JSON files; any other file in the folder is skipped.
    if file_name.endswith('.json'):
        file_path = os.path.join(agr_path, file_name)
        with open(file_path, 'r') as f:
            finc_data.append(json.load(f))

# Map each company key to a DataFrame built from its 'ROA' entry.
ROA_data = {}
for i in finc_data:
    for k, v in i.items():
        ROA_data[k] = pd.DataFrame.from_dict(v['ROA'])
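The loading step above can be illustrated on a single hand-written record. This is only a sketch: the ticker `DEMO`, the `Item` column, and the values are hypothetical, and the layout (one row, a first label column, then date-labelled columns) is an assumption inferred from how the frames are used later in this notebook.

```python
import pandas as pd

# Hypothetical record; the real JSON files may be laid out differently.
record = {
    "DEMO": {
        "ROA": {
            "Item": ["ROA"],
            "12/31/2012": ["0.05"],
            "12/31/2013": ["-0.02"],
        }
    }
}

roa_frame = pd.DataFrame.from_dict(record["DEMO"]["ROA"])

# Skip the first (label) column and read the first row of each date column.
values = [float(roa_frame[col].values[0]) for col in roa_frame.columns[1:]]
print(list(roa_frame.columns[1:]), values)
```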

## Read temperature data

### Monthly temperature difference data

In [235]:
climate_path = os.path.join(data_path, 'climate')
climate_path = os.path.join(climate_path, 'US_temperature_data')

file_name = 'monthly_temp_difference(1980-2013).csv'
file_path = os.path.join(climate_path, file_name)
temp_diff_montly_data = pd.read_csv(file_path)

Rewrite each `Time_Difference` label as a `month/31/year` string so that it can be matched against the ROA column labels.

In [236]:
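# Month number that follows a space and precedes the closing bracket.
month_pattern = re.compile(r'(?<= )\d+(?=\])')

time_diff = []
for label in temp_diff_montly_data["Time_Difference"]:
    parts = label.split('-')
    month = month_pattern.search(parts[-1])[0]  # get the month
    # Build a month/31/year string; parts[0] holds the year.
    time_diff.append(month + '/31/' + parts[0])
temp_diff_montly_data["Time_Difference"] = time_diff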
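The relabelling above can be tried on a single string. This is a minimal sketch: the input format `"<year>-[<month>, <month>]"` is an assumption inferred from the parsing logic, and `to_roa_label` is a hypothetical helper name.

```python
import re

# Same pattern as above: digits preceded by a space and followed by "]".
MONTH_PATTERN = re.compile(r"(?<= )\d+(?=\])")


def to_roa_label(label: str) -> str:
    """Turn an assumed '<year>-[<m1>, <m2>]' label into 'month/31/year'."""
    parts = label.split("-")
    year = parts[0]
    month = MONTH_PATTERN.search(parts[-1])[0]
    return f"{month}/31/{year}"


print(to_roa_label("1980-[1, 2]"))  # 2/31/1980
```

Because the day is always 31, some labels (such as `2/31/1980`) are not real calendar dates. They only work as string keys, and a period matches an ROA column only when that column uses exactly the same label.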

### Yearly temperature difference data

In [357]:
climate_path = os.path.join(data_path, 'climate')
climate_path = os.path.join(climate_path, 'US_temperature_data')

file_name = 'Yearly_temp_difference.csv'
file_path = os.path.join(climate_path, file_name)
temp_diff_yearly_data = pd.read_csv(file_path)

### Bushfire data

In [359]:
climate_path = os.path.join(data_path, 'climate')

file_name = 'Wildfire_data.csv'
file_path = os.path.join(climate_path, file_name)
bushfire_data = pd.read_csv(file_path)
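The `Acres` column is later cleaned by stripping commas from each value. As an alternative sketch, `pd.read_csv` accepts a `thousands` argument that parses such values as integers at load time. The rows below are made up for illustration and are not taken from `Wildfire_data.csv`; the column names are assumed from the Bushfires section.

```python
import io

import pandas as pd

# Made-up rows for illustration only.
csv_text = 'Year,Acres\n2000,"1,000,000"\n2001,"2,500,000"\n'

# thousands="," tells the parser to treat commas inside numbers as separators.
fires = pd.read_csv(io.StringIO(csv_text), thousands=",")

# Lookup table from year to acres, convenient for matching against ROA years.
acres_by_year = dict(zip(fires["Year"], fires["Acres"]))
print(acres_by_year[2001])
```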

# Temp difference Correlation Analysis

## Monthly

In [420]:
for comp in ROA_data.keys():
    roa_data = []
    temp_data = []
    # Column labels after the first column are matched against Time_Difference.
    for i in ROA_data[comp].columns[1:]:
        try:
            t = temp_diff_montly_data.loc[temp_diff_montly_data['Time_Difference'] == i]['Temperature_Difference']
            t = t.values[0]
            if t:  # note: this also skips a difference of exactly 0
                roa = ROA_data[comp][i].values[0]
                roa = float(roa)
                if roa and not np.isnan(roa):  # to avoid NaN values (also skips an ROA of exactly 0)
                    roa_data.append(roa)
                    temp_data.append(t)
        except Exception:
            # No matching period or a non-numeric ROA: skip this column.
            pass
    assert len(roa_data) == len(temp_data)
    if len(roa_data) >= 2:
        # Run each test once and unpack the coefficient and the p-value.
        pearson, p_p_value = pearsonr(roa_data, temp_data)
        spearman, s_p_value = spearmanr(roa_data, temp_data)
        tau, k_p_value = kendalltau(roa_data, temp_data)
        print("""Correlation analysis between {comp} and monthly temp difference:
        Pearson coefficient is {pearson:.3f} with {p_p_value:.3f} p-value.
        Spearman coefficient is {spearman:.3f} with {s_p_value:.3f} p-value.
        Kendall coefficient is {kendall:.3f} with {k_p_value:.3f} p-value.
        """.format(comp = comp, pearson = pearson, p_p_value = p_p_value,
                    spearman = spearman, s_p_value = s_p_value, 
                     kendall = tau, k_p_value = k_p_value))

Correlation analysis between YTEN and monthly temp difference:
        Pearson coefficient is 0.082 with 0.823 p-value.
        Spearman coefficient is 0.042 with 0.907 p-value.
        Kendall coefficient is 0.022 with 1.000 p-value.
        
Correlation analysis between AVD and monthly temp difference:
        Pearson coefficient is -0.251 with 0.207 p-value.
        Spearman coefficient is -0.263 with 0.185 p-value.
        Kendall coefficient is -0.181 with 0.200 p-value.
        
Correlation analysis between ICL and monthly temp difference:
        Pearson coefficient is -0.076 with 0.857 p-value.
        Spearman coefficient is 0.238 with 0.570 p-value.
        Kendall coefficient is 0.143 with 0.720 p-value.
        
Correlation analysis between IPI and monthly temp difference:
        Pearson coefficient is 0.296 with 0.519 p-value.
        Spearman coefficient is 0.071 with 0.879 p-value.
        Kendall coefficient is 0.048 with 1.000 p-value.
        
Correlation analysis betwee
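The same three tests are run in every analysis cell of this notebook, so they could be collected in one helper. The sketch below is one possible shape, not the notebook's own code: `correlation_summary` is a hypothetical name and the sample values are made up.

```python
from scipy.stats import kendalltau, pearsonr, spearmanr


def correlation_summary(x, y):
    """Return {method: (coefficient, p_value)}, or None for fewer than 2 points."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        return None
    tests = (("Pearson", pearsonr), ("Spearman", spearmanr), ("Kendall", kendalltau))
    results = {}
    for name, func in tests:
        # Each scipy result unpacks into (statistic, p-value).
        coef, p_value = func(x, y)
        results[name] = (float(coef), float(p_value))
    return results


# Made-up, strictly increasing sample data.
summary = correlation_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 8.1, 9.8])
for name, (coef, p_value) in summary.items():
    print(f"{name} coefficient is {coef:.3f} with {p_value:.3f} p-value.")
```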

## Yearly

In [418]:
for comp in ROA_data.keys():
    roa_data = []
    temp_data = []
    for i in ROA_data[comp].columns[1:]:
        year = i.split('/')[-1]  # the year is the last part of the column label
        try:
            t = temp_diff_yearly_data.loc[temp_diff_yearly_data['Year'] == int(year)]['Difference']
            t = t.values[0]
            if t:  # note: this also skips a difference of exactly 0
                roa = ROA_data[comp][i].values[0]
                roa = float(roa)
                if roa and not np.isnan(roa):  # skip NaN (and exactly 0) ROA values
                    roa_data.append(roa)
                    temp_data.append(t)
        except Exception:
            # No matching year or a non-numeric value: skip this column.
            pass
    assert len(roa_data) == len(temp_data)
    if len(roa_data) >= 2:
        # Run each test once and unpack the coefficient and the p-value.
        pearson, p_p_value = pearsonr(roa_data, temp_data)
        spearman, s_p_value = spearmanr(roa_data, temp_data)
        tau, k_p_value = kendalltau(roa_data, temp_data)
        print("""Correlation analysis between {comp} and yearly temp difference:
        Pearson coefficient is {pearson:.3f} with {p_p_value:.3f} p-value.
        Spearman coefficient is {spearman:.3f} with {s_p_value:.3f} p-value.
        Kendall coefficient is {kendall:.3f} with {k_p_value:.3f} p-value.
        """.format(comp = comp, pearson = pearson, p_p_value = p_p_value,
                    spearman = spearman, s_p_value = s_p_value, 
                     kendall = tau, k_p_value = k_p_value))

Correlation analysis between RKDA and yearly temp difference:
        Pearson coefficient is -1.000 with 1.000 p-value.
        Spearman coefficient is -1.000 with nan p-value.
        Kendall coefficient is -1.000 with 1.000 p-value.
        
Correlation analysis between YTEN and yearly temp difference:
        Pearson coefficient is 0.336 with 0.313 p-value.
        Spearman coefficient is 0.191 with 0.574 p-value.
        Kendall coefficient is 0.055 with 0.879 p-value.
        
Correlation analysis between AVD and yearly temp difference:
        Pearson coefficient is -0.166 with 0.400 p-value.
        Spearman coefficient is -0.059 with 0.766 p-value.
        Kendall coefficient is -0.022 with 0.874 p-value.
        
Correlation analysis between ICL and yearly temp difference:
        Pearson coefficient is -0.610 with 0.081 p-value.
        Spearman coefficient is -0.233 with 0.546 p-value.
        Kendall coefficient is -0.167 with 0.612 p-value.
        
Correlation analysis be
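A coefficient of exactly -1.000 paired with an uninformative p-value, as in the RKDA result above, is what one would see if only two data points survived the filtering. That is an assumption, since the sample size is not printed. The sketch below uses made-up pairs to show that any two distinct points are perfectly correlated, which is why a higher minimum sample size may be worth considering; `MIN_POINTS` is a hypothetical threshold, not a value used in this notebook.

```python
import numpy as np

MIN_POINTS = 3  # hypothetical threshold; the cells above use 2

# Made-up two-point samples.
pairs = [
    ([0.10, 0.30], [5.0, 2.0]),
    ([0.10, 0.30], [2.0, 5.0]),
    ([-4.0, 7.5], [0.2, 0.1]),
]

two_point_r = []
for x, y in pairs:
    # With two points the best-fit line passes through both of them.
    r = float(np.corrcoef(x, y)[0, 1])
    two_point_r.append(r)
    enough = len(x) >= MIN_POINTS
    print(f"r = {r:+.3f}, enough points for a meaningful test: {enough}")
```

Each value is +1 or -1 up to floating-point error, regardless of the data.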

# Bushfires

In [421]:
for comp in ROA_data.keys():
    roa_data = []
    acres_data = []
    for i in ROA_data[comp].columns[1:]:
        year = i.split('/')[-1]  # the year is the last part of the column label
        try:
            t = bushfire_data.loc[bushfire_data['Year'] == int(year)]['Acres']
            t = int(t.values[0].replace(',', ''))  # strip thousands separators
            if t:
                roa = ROA_data[comp][i].values[0]
                roa = float(roa)
                if roa and not np.isnan(roa):  # skip NaN (and exactly 0) ROA values
                    roa_data.append(roa)
                    acres_data.append(t)
        except Exception:
            # No matching year or a non-numeric value: skip this column.
            pass
    assert len(roa_data) == len(acres_data)
    if len(roa_data) >= 2:
        # Run each test once and unpack the coefficient and the p-value.
        pearson, p_p_value = pearsonr(roa_data, acres_data)
        spearman, s_p_value = spearmanr(roa_data, acres_data)
        tau, k_p_value = kendalltau(roa_data, acres_data)
        print("""Correlation analysis between {comp} and bushfire acres:
        Pearson coefficient is {pearson:.3f} with {p_p_value:.3f} p-value.
        Spearman coefficient is {spearman:.3f} with {s_p_value:.3f} p-value.
        Kendall coefficient is {kendall:.3f} with {k_p_value:.3f} p-value.
        """.format(comp = comp, pearson = pearson, p_p_value = p_p_value,
                    spearman = spearman, s_p_value = s_p_value, 
                     kendall = tau, k_p_value = k_p_value))

Correlation analysis between RKDA and bushfire acres:
        Pearson coefficient is -0.319 with 0.402 p-value.
        Spearman coefficient is -0.200 with 0.606 p-value.
        Kendall coefficient is -0.167 with 0.612 p-value.
        
Correlation analysis between YTEN and bushfire acres:
        Pearson coefficient is 0.383 with 0.116 p-value.
        Spearman coefficient is 0.313 with 0.206 p-value.
        Kendall coefficient is 0.216 with 0.229 p-value.
        
Correlation analysis between AVD and bushfire acres:
        Pearson coefficient is 0.150 with 0.389 p-value.
        Spearman coefficient is 0.112 with 0.520 p-value.
        Kendall coefficient is 0.072 with 0.558 p-value.
        
Correlation analysis between ICL and bushfire acres:
        Pearson coefficient is -0.126 with 0.641 p-value.
        Spearman coefficient is -0.153 with 0.572 p-value.
        Kendall coefficient is -0.117 with 0.564 p-value.
        
Correlation analysis bet

# Appendix

## Pearson

$$
r=\frac{\sum\left(x-m_{x}\right)\left(y-m_{y}\right)}{\sqrt{\sum\left(x-m_{x}\right)^{2} \sum\left(y-m_{y}\right)^{2}}}
$$

REF: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html
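The formula above can be checked by hand on a small sample, where $m_x$ and $m_y$ are the means of $x$ and $y$. The sketch below is a direct transcription of the formula in plain Python, not scipy's implementation; `pearson_r` is a hypothetical name and the data are made up.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient computed term by term from the formula above."""
    n = len(x)
    m_x = sum(x) / n
    m_y = sum(y) / n
    numerator = sum((a - m_x) * (b - m_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - m_x) ** 2 for a in x) * sum((b - m_y) ** 2 for b in y)
    )
    return numerator / denominator


# Made-up data: numerator = 6, denominator = sqrt(10 * 6).
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))  # 0.7746
```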

## Spearman

$$
\rho=1-\frac{6 \sum d_{i}^{2}}{n\left(n^{2}-1\right)}
$$

$\rho$ = Spearman's rank correlation coefficient  
$d_i$ = difference between the two ranks of each observation  
$n$ = number of observations  

This shortcut formula holds only when all ranks are distinct (no ties).

REF1: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html  
REF2: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

In [118]:
# a and b are not defined in the cells shown; the result matches the YTEN monthly output above.
spearmanr(a, b)

SpearmanrResult(correlation=0.04242424242424241, pvalue=0.907363817812816)
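The rank-difference formula above can be worked through on a small tie-free sample. This is a plain-Python sketch with made-up data; `ranks` and `spearman_rho` are hypothetical helper names, and the formula is only applied when no values are tied.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; ties are rejected."""
    if len(set(values)) != len(values):
        raise ValueError("this shortcut formula assumes no tied values")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman coefficient from squared rank differences."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Rank differences are 0, -1, 1, -1, 1, so the sum of squares is 4.
rho = spearman_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(round(rho, 3))  # 0.8
```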

## Kendall

$$
\tau=\frac{(\text { number of concordant pairs })-(\text { number of discordant pairs })}{\left(\begin{array}{c}
n \\
2
\end{array}\right)}
$$

REF: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html?highlight=kendall#scipy.stats.kendalltau

In [120]:
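x1 = [12, 2, 1, 12, 2]
x2 = [1, 4, 7, 1, 0]
# x1 and x2 are not used below; a and b are not defined in the cells shown,
# and the printed result matches the YTEN monthly output above.
tau, p_value = kendalltau(a, b)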
print(tau, p_value)

0.022222222222222223 1.0
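The pair-counting formula above can be reproduced directly on a small tie-free sample. This is a plain-Python sketch with made-up data and a hypothetical name, `kendall_tau_a`. With tied values, `scipy.stats.kendalltau` applies a tie correction by default, so its result can differ from this formula.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / (n choose 2), as in the formula above."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Six pairs in total: five concordant, one discordant (the middle two points).
tau_a = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau_a, 4))  # 0.6667
```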
