# Working with all_out.csv from AWS

In [10]:
import pandas as pd
import numpy as np
import yfinance as yf

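# Load the per-ticker results file
df = pd.read_csv('all_out.csv')

# Sort ascending by RMSE so the lowest-error tickers come first
df = df.sort_values(by='rmse')

# Show the 30 tickers with the lowest RMSE
print(df.head(30)['ticker'].values)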

['BIOS' 'ACHN' 'ROSE' 'AXLA' 'SIC' 'PNM' 'GALT' 'GAIA' 'EVLO' 'GLYC'
 'ELVT' 'SENS' 'AGE' 'BHR' 'CIA' 'SYBX' 'SNCR' 'EBF' 'OPTN' 'CODA' 'MCBC'
 'FTK' 'MEIP' 'AMRX' 'MCHX' 'VYGR' 'INFN' 'HRTX' 'IVC' 'EEX']


# Prophet Algorithm Explanation

The Prophet algorithm is a time series forecasting model developed by Facebook. At a high level, the algorithm can be expressed mathematically as follows:

Let $y(t)$ be the time series data at time $t$, where $t = 1,2,...,T$.

Prophet models the time series as the sum of four components:

$y(t) = g(t) + s(t) + h(t) + e(t)$

where:

$g(t)$ represents the trend component of the time series
$s(t)$ represents the seasonality component of the time series
$h(t)$ represents the holiday component of the time series
$e(t)$ represents the error term, assumed to be normally distributed with zero mean and constant variance.

The trend component is modeled as a piecewise linear or logistic function of time $t$:

$g(t) = a + bt + c(t)$

where:

$a$ is the intercept parameter
$b$ is the slope parameter
$c(t)$ is an adjustment term that captures abrupt changes in the trend at a set of changepoints. It is piecewise linear by default; Prophet also offers a saturating logistic growth trend.
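
One way to make the term $c(t)$ concrete is a sum of hinge functions, $c(t) = \sum_j \delta_j \max(0, t - s_j)$, where each changepoint $s_j$ shifts the slope by $\delta_j$. The sketch below evaluates the linear case of $g(t)$ under that assumption. The names `changepoints` and `deltas` are illustrative and do not come from this notebook.

```python
import numpy as np

def piecewise_linear_trend(t, a, b, changepoints, deltas):
    """Evaluate g(t) = a + b*t + c(t), with
    c(t) = sum_j deltas[j] * max(0, t - changepoints[j]) (an assumed form)."""
    t = np.asarray(t, dtype=float)
    changepoints = np.asarray(changepoints, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    # hinge[i, j] = max(0, t[i] - changepoints[j])
    hinge = np.clip(t[:, None] - changepoints[None, :], 0.0, None)
    return a + b * t + hinge @ deltas

# Slope is 0.5 up to t = 4, then drops by 1.0 to -0.5
t = np.arange(10)
g = piecewise_linear_trend(t, a=1.0, b=0.5, changepoints=[4.0], deltas=[-1.0])
print(g)
```

Because each hinge is zero before its changepoint, the trend stays continuous and only its slope changes.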

The seasonality component is modeled as a Fourier series:

$s(t) = \sum_{k=1}^{K} (a_k \cos(2\pi kt/P) + b_k \sin(2\pi kt/P))$

where:

$K$ is the number of Fourier terms used to model the seasonality
$a_k$ and $b_k$ are the Fourier coefficients
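$P$ is the period of the seasonality (for example, 7 for weekly or 365.25 for yearly seasonality when $t$ is measured in days). Prophet enables its built-in seasonalities automatically based on the data, and the user can specify additional ones.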
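
The Fourier series above can be evaluated by building a matrix of cosine and sine features and multiplying it by the coefficients. This is a minimal NumPy sketch of that formula. The function names and the example coefficients are assumptions for illustration, not values from this notebook.

```python
import numpy as np

def fourier_features(t, period, K):
    """Return an array of shape (len(t), 2K): K cosine columns, then K sine columns."""
    t = np.asarray(t, dtype=float)
    k = np.arange(1, K + 1)
    angles = 2.0 * np.pi * np.outer(t, k) / period
    return np.hstack([np.cos(angles), np.sin(angles)])

def seasonality(t, period, a, b):
    """Evaluate s(t) = sum_k a[k]*cos(2*pi*k*t/P) + b[k]*sin(2*pi*k*t/P)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    X = fourier_features(t, period, len(a))
    return X @ np.concatenate([a, b])

# Two weeks of daily data with a weekly period and K = 2 terms
t = np.arange(14)
s = seasonality(t, period=7, a=[1.0, 0.0], b=[0.0, 0.5])
print(s[:7])
```

A larger $K$ lets the seasonal curve follow sharper patterns, at the cost of more coefficients to estimate.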

The holiday component is modeled with a set of indicator variables, each taking the value 1 during its holiday period and 0 otherwise, weighted by a per-holiday effect:

$h(t) = \sum_{j=1}^{J} \kappa_j h_j(t)$

where:

$J$ is the number of holidays or other events that affect the time series
$h_j(t)$ is an indicator variable that takes on the value of 1 during holiday $j$ and 0 otherwise
$\kappa_j$ is the estimated effect of holiday $j$ on the time series.
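
As a rough illustration, the indicator variables $h_j(t)$ can be built with pandas as one 0/1 column per holiday. The holiday names, dates, and the `effects` weights below are hypothetical values chosen for the example. They are not taken from this notebook or from Prophet's built-in holiday calendars.

```python
import numpy as np
import pandas as pd

def holiday_indicators(dates, holidays):
    """Return a DataFrame with one 0/1 column per holiday name."""
    dates = pd.DatetimeIndex(dates)
    columns = {
        name: dates.isin(pd.DatetimeIndex(days)).astype(int)
        for name, days in holidays.items()
    }
    return pd.DataFrame(columns, index=dates)

# Hypothetical calendar covering eleven days around the year end
dates = pd.date_range("2023-12-23", "2024-01-02", freq="D")
holidays = {"christmas": ["2023-12-25"], "new_year": ["2024-01-01"]}
Z = holiday_indicators(dates, holidays)

# Hypothetical per-holiday effects; weighting the indicators gives h(t)
effects = np.array([2.0, -1.0])
h = Z.to_numpy() @ effects
print(pd.Series(h, index=dates))
```

On days that are not holidays, every indicator is 0, so the holiday component contributes nothing to the forecast.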

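The parameters of the model (the trend parameters $a$, $b$, and $c(t)$, the Fourier coefficients $a_k$ and $b_k$, the holiday effects, and the error variance) are estimated using a Bayesian approach: prior distributions are placed on the parameters and updated based on the observed data. By default Prophet reports the maximum a posteriori (MAP) estimate; full posterior sampling with MCMC is optional.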

The resulting model can then be used to forecast future values of the time series. Its uncertainty intervals reflect observation noise and possible future trend changes, and also uncertainty in the seasonal components when full posterior sampling is used.
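
To show the fit-then-forecast workflow without depending on Prophet itself, the sketch below fits a linear trend plus a Fourier seasonality by ordinary least squares on synthetic data. This is a deliberate simplification and not Prophet's Bayesian estimation: it has no changepoints, no holidays, and no priors, and its interval reflects residual noise only. All data and names here are made up for illustration.

```python
import numpy as np

def design_matrix(t, period, K):
    """Columns: intercept, t, K cosine terms, K sine terms."""
    k = np.arange(1, K + 1)
    angles = 2.0 * np.pi * np.outer(t, k) / period
    return np.column_stack([np.ones_like(t), t, np.cos(angles), np.sin(angles)])

# Synthetic series: linear trend + weekly sine wave + Gaussian noise
rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
period = 7.0
y = 2.0 + 0.05 * t + 1.5 * np.sin(2.0 * np.pi * t / period) + rng.normal(0.0, 0.1, t.size)

# Least-squares estimate of the coefficients (a stand-in for Bayesian fitting)
X = design_matrix(t, period, K=2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Forecast 14 steps ahead, with a rough 95% band from the residual spread
t_future = np.arange(200, 214, dtype=float)
forecast = design_matrix(t_future, period, K=2) @ coef
resid_std = np.std(y - X @ coef, ddof=X.shape[1])
lower = forecast - 1.96 * resid_std
upper = forecast + 1.96 * resid_std
print(np.round(coef, 3))
```

The fitted intercept, slope, and first sine coefficient should land close to the values used to generate the data, which is a quick sanity check before trusting a forecast.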

# Testing getting tickers

In [4]:
# Read the raw ticker table; the ticker symbols are in the second column
raw_df = pd.read_csv('/Users/jake/PycharmProjects/pythonProject2/data/raw_tickers.csv')
ticker_list = raw_df.iloc[:, 1].tolist()
ticker_list

['AACG',
 'AACI',
 'AACIW',
 'AADI',
 'AAL',
 'AAME',
 'AAOI',
 'AAON',
 'AAPL',
 'ABCB',
 'ABCL',
 'ABCM',
 'ABEO',
 'ABIO',
 'ABNB',
 'ABOS',
 'ABSI',
 'ABST',
 'ABUS',
 'ABVC',
 'ACAB',
 'ACABW',
 'ACAC',
 'ACACU',
 'ACACW',
 'ACAD',
 'ACAH',
 'ACAHW',
 'ACAX',
 'ACAXR',
 'ACAXU',
 'ACAXW',
 'ACB',
 'ACBA',
 'ACBAU',
 'ACBAW',
 'ACCD',
 'ACDC',
 'ACDCW',
 'ACER',
 'ACET',
 'ACGL',
 'ACGLN',
 'ACGLO',
 'ACGN',
 'ACHC',
 'ACHL',
 'ACHV',
 'ACIU',
 'ACIW',
 'ACLS',
 'ACLX',
 'ACMR',
 'ACNB',
 'ACNT',
 'ACON',
 'ACONW',
 'ACOR',
 'ACRS',
 'ACRV',
 'ACRX',
 'ACST',
 'ACT',
 'ACTG',
 'ACVA',
 'ACXP',
 'ADAG',
 'ADAL',
 'ADALW',
 'ADAP',
 'ADBE',
 'ADD',
 'ADEA',
 'ADER',
 'ADERW',
 'ADES',
 'ADI',
 'ADIL',
 'ADILW',
 'ADMA',
 'ADMP',
 'ADN',
 'ADNWW',
 'ADOC',
 'ADOCR',
 'ADOCW',
 'ADP',
 'ADPT',
 'ADSE',
 'ADSEW',
 'ADSK',
 'ADTH',
 'ADTHW',
 'ADTN',
 'ADTX',
 'ADUS',
 'ADV',
 'ADVM',
 'ADVWW',
 'ADXN',
 'AEAE',
 'AEAEU',
 'AEAEW',
 'AEHL',
 'AEHR',
 'AEI',
 'AEIS',
 'AEMD',
 'AEP',
 'AE