# The Graph Data Access

In this notebook, we read in the data that was generated and saved as a CSV file by the [TheGraphDataSetCreation](TheGraphDataSetCreation.ipynb) notebook.


The goals of this notebook are to obtain the following, as a starting point for moving toward a decision support system:

* Signals, states, events and sequences
* Volatility metrics
* Identification of perceived shocks (correlated with announcements)
* Signal for target price
* Signal for market price
* Error plot

In [None]:
# import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import scipy as sp
from statsmodels.distributions.empirical_distribution import ECDF
import scipy.stats as stats

## Import data and add additional attributes

In [None]:
graphData = pd.read_csv('saved_results/RaiLiveGraphData.csv')
del graphData['Unnamed: 0']

In [None]:
graphData.head()

In [None]:
graphData.describe()

In [None]:
graphData.plot(x='blockNumber',y='redemptionPriceActual',kind='line',title='redemptionPriceActual')

In [None]:
graphData.plot(x='blockNumber',y='redemptionRateActual',kind='line',title='redemptionRateActual')

In [None]:
graphData['error'] = graphData['redemptionPriceActual'] - graphData['marketPriceUsd']
graphData['error_integral'] = graphData['error'].cumsum()

In [None]:
graphData.plot(x='blockNumber',y='error',kind='line',title='error')

In [None]:
graphData.plot(x='blockNumber',y='error_integral',kind='line',title='Steady state error')

## Error experimentation

#### Note: this analysis does not take the control period into account
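
One reading of the note above is that the running sum of errors treats every row as equally spaced in time. The sketch below compares that running sum with a trapezoidal, time-weighted integral on toy data. The `timestamps` values are hypothetical (the CSV may not carry such a column), and the trapezoid rule is only one possible weighting, not necessarily the one the live controller uses.

```python
# Toy observations, not RAI data. `timestamps` is a hypothetical column.
timestamps = [0, 3600, 7200, 14400]      # seconds, unevenly spaced
errors = [0.010, 0.020, -0.010, 0.005]   # redemption price minus market price


def unweighted_integral(errs):
    """Running sum of errors, equivalent to Series.cumsum()."""
    total, out = 0.0, []
    for e in errs:
        total += e
        out.append(total)
    return out


def time_weighted_integral(ts, errs):
    """Trapezoidal running integral of the error over elapsed time."""
    total, out = 0.0, [0.0]  # no time has elapsed at the first observation
    for i in range(1, len(errs)):
        dt = ts[i] - ts[i - 1]
        total += 0.5 * (errs[i] + errs[i - 1]) * dt
        out.append(total)
    return out


print(unweighted_integral(errors))
print(time_weighted_integral(timestamps, errors))
```

The two series differ in units (price versus price times seconds), so any `ki` fitted against one is not directly comparable with a `ki` fitted against the other.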

In [None]:
kp = 2e-7
# ki = (-kp * error) / error_integral
# At each time step, compute the ki value that would have kept the redemption price constant
graphData['equilibriation_ki'] = (-kp * graphData.error)/graphData.error_integral
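
The formula in the comment above follows from setting a PI controller output to zero, assuming the rate has the form `kp * error + ki * error_integral` (an assumption inferred from the comment, not confirmed elsewhere in this notebook). A minimal sketch on toy numbers, checking that the derived `ki` does zero the output:

```python
kp = 2e-7  # same proportional gain as in the cell above

# Toy error series, not RAI data
errors = [0.010, 0.020, -0.010, 0.005]

# Running integral, mirroring error.cumsum()
integrals = []
total = 0.0
for e in errors:
    total += e
    integrals.append(total)

# ki that zeroes the assumed PI output at each step: kp * e + ki * I = 0
ki_eq = [-kp * e / i_e for e, i_e in zip(errors, integrals)]

# Substituting back should give a rate that is numerically zero
rates = [kp * e + ki * i_e for e, ki, i_e in zip(errors, ki_eq, integrals)]

print(ki_eq)
print(rates)
```

The ratio is undefined wherever the error integral is zero, and its sign flips whenever the error and the integral have opposite signs, which matters when plotting it on a log axis.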

In [None]:
# Plot ki with its sign flipped so the values can be shown on a log axis,
# then relabel the y ticks with a leading minus sign
graphData['equilibriation_ki'].apply(lambda x: -x).plot(logy=True, title='Actual equilibriation_ki - flipped sign for log plotting')
plt.hlines(5e-9, 0, 450, linestyles='solid', label='Recommended ki - flipped sign', color='r')
plt.hlines(-(graphData['equilibriation_ki'].median()), 0, 450, linestyles='solid', label='median actual ki - flipped', color='g')
locs, labels = plt.yticks()  # get the current y-tick locations and labels
new_labels = ['-' + str(loc) for loc in locs]  # prepend a minus sign to each tick label
plt.yticks(locs, new_labels)
plt.legend(loc="upper right")

In [None]:
graphData['equilibriation_ki'].median()

### Counterfactual if the integral control rate had been the median the whole time

In [None]:
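# PI output with ki held at its median: rate = kp * error + ki * error_integral
# (consistent with the equilibriation_ki derivation above, where rate = 0)
graphData['counterfactual_redemption_rate'] = kp * graphData['error'] + graphData['equilibriation_ki'].median() * graphData['error_integral']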

In [None]:
subsetGraph = graphData.iloc[50:]
sns.lineplot(data=subsetGraph,x="blockNumber", y="counterfactual_redemption_rate",label='Counterfactual')
ax2 = plt.twinx()
# let reflexer know this is wrong
sns.lineplot(data=subsetGraph,x="blockNumber", y="redemptionRateActual",ax=ax2,color='r',label='Actual')
plt.title('Actual redemption rate vs counterfactual')
plt.legend(loc="upper left")


## Goodness of fit tests
These tests check whether the counterfactual redemption rate is far enough from the actual rate to reject the null hypothesis that both come from the same distribution.

In [None]:
# fit an empirical CDF to each series
ecdf = ECDF(subsetGraph.redemptionRateActual.values)
ecdf2 = ECDF(subsetGraph.counterfactual_redemption_rate.values)

plt.plot(ecdf.x,ecdf.y,color='r')
plt.title('redemptionRateActual ECDF')
plt.show()

plt.plot(ecdf2.x,ecdf2.y,color='b')
plt.title('counterfactual_redemption_rate ECDF')
plt.show()

alpha = 0.05

statistic, p_value = stats.ks_2samp(subsetGraph.redemptionRateActual.values, subsetGraph.counterfactual_redemption_rate.values)  # two sided
if p_value > alpha:
    decision = "Fail to reject the null: the samples may come from the same distribution"
else:
    decision = "Reject the null: the samples come from different distributions"

print(p_value)
print(decision)

Based on the two-sample Kolmogorov-Smirnov goodness-of-fit test, the two distributions are very different. The ECDF plots above show a visible difference in shape, and a close look at the x axis of each plot makes clear that the difference is significant.
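
To make the test statistic concrete: the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical CDFs. The sketch below computes only that statistic, in plain Python on toy samples. The helper names `ecdf_at` and `ks_statistic` are illustrative, and the p-value that `stats.ks_2samp` also returns is not reproduced here.

```python
from bisect import bisect_right


def ecdf_at(sorted_sample, x):
    """Fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs are step functions, so the gap only changes at observed values
    return max(abs(ecdf_at(a, x) - ecdf_at(b, x)) for x in a + b)


# Toy samples, not RAI data
base = [1, 2, 3, 4, 5]
interleaved = [1.5, 2.5, 3.5, 4.5, 5.5]   # overlaps with base
shifted = [11, 12, 13, 14, 15]            # same shape, different location

print(ks_statistic(base, interleaved))
print(ks_statistic(base, shifted))
```

Because the statistic compares values directly, two series with similar ECDF shapes but different magnitudes still produce a large gap, which is why the x-axis scale matters when reading the plots.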

In [None]:
# scatterplot of linear regression residuals
sns.residplot(x='blockNumber', y='redemptionRateActual', data=subsetGraph, label='redemptionRateActual')
plt.title('redemptionRateActual regression residuals')

In [None]:
sns.residplot(x='blockNumber', y='counterfactual_redemption_rate', data=subsetGraph,label='counterfactual_redemption_rate')
plt.title('counterfactual_redemption_rate regression residuals')
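
For reference, `sns.residplot` regresses `y` on `x` and plots the residuals. A minimal plain-Python sketch of the same calculation for a straight-line fit, on toy data; `linear_residuals` is an illustrative helper, not part of seaborn.

```python
def linear_residuals(x, y):
    """Residuals of an ordinary least-squares straight-line fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Toy series, not RAI data
blocks = [0, 1, 2, 3, 4]
linear_rate = [1.0, 3.0, 5.0, 7.0, 9.0]    # exactly linear in blocks
curved_rate = [0.0, 1.0, 4.0, 9.0, 16.0]   # quadratic in blocks

print(linear_residuals(blocks, linear_rate))
print(linear_residuals(blocks, curved_rate))
```

Residuals scattered around zero with no pattern suggest a linear trend is adequate; a systematic shape, such as the U pattern from the curved series, suggests it is not.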

In [None]:
graphData.plot(x='blockNumber',y='globalDebt',kind='line',title='globalDebt')

In [None]:
graphData.plot(x='blockNumber',y='erc20CoinTotalSupply',kind='line',title='erc20CoinTotalSupply')

In [None]:
graphData.plot(x='blockNumber',y='marketPriceEth',kind='line',title='marketPriceEth')

In [None]:
graphData.plot(x='blockNumber',y='marketPriceUsd',kind='line',title='marketPriceUsd')

## Conclusion

Using The Graph, a large amount of data about the Rai system can be obtained for analyzing the health of the system. With some data manipulation, these data streams could be integrated into the Rai cadCAD model to turn it into a true decision support system.