# Lesson 1: Getting data from APIs

Build an application that predicts stock volatility:
- get stock data from an API
- create Python classes to extract data from the API and load it into a database
- build a GARCH time series model to predict volatility
- build an application with its own API for getting data, training models, and making predictions

Goals:
- Access stock data from the Alpha Vantage API using a URL
- Extract stock data from the AlphaVantage API using an HTTP request
- Write a function for transforming stock data
- Incorporate Python exceptions into our function

Accessing APIs through URL
- Identify components of a URL
- Add API key to a config module
- Incorporate AlphaVantage parameters into URL
Accessing APIs through an HTTP request
- Define an HTTP request
Defensive Programming for APIs
- Create get_daily function
- Raise Exceptions for bad requests.

For storage, it's best to keep time series data in descending order, with the most recent date at the top. This is convenient when retrieving recent data from an application database. For model training, time series data should be in ascending order, since the earliest date has to come first.
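
A minimal sketch of switching between the two orderings with pandas. The prices and dates are made up for illustration.

```python
import pandas as pd

# Hypothetical closing prices, stored most-recent-first (descending dates).
df = pd.DataFrame(
    {"close": [103.0, 101.5, 100.0]},
    index=pd.to_datetime(["2023-01-04", "2023-01-03", "2023-01-02"]),
)
df.index.name = "date"

# Re-sort so the earliest date comes first, as model training expects.
df_train = df.sort_index(ascending=True)
```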

In this project, errors that could disrupt the code are
- putting in a bad API path
- putting in an invalid ticker symbol

It's best to program defensively, that is, to implement error handling in the code. To accomplish this, try to anticipate errors that could occur, and inform the user of the error when an exception is triggered.
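
A small sketch of what this could look like for the two errors listed above. The function name `extract_daily_series` is hypothetical, and the check assumes the Alpha Vantage convention that a successful daily response contains a "Time Series (Daily)" key. No real request is made here; the response is a hand-written stand-in.

```python
def extract_daily_series(status_code, response_data, ticker):
    """Return the daily series from a parsed response, or raise a clear error."""
    # Bad API path: the server answers with a non-200 status code.
    if status_code != 200:
        raise ConnectionError(
            f"Request failed with HTTP status {status_code}. Check the API path."
        )
    # Invalid ticker: the expected key is missing from the JSON payload.
    if "Time Series (Daily)" not in response_data:
        raise ValueError(
            f"Invalid API call. Check that ticker symbol '{ticker}' is correct."
        )
    return response_data["Time Series (Daily)"]


# A stand-in for the payload returned for a bad ticker symbol.
bad_response = {"Error Message": "Invalid API call."}
try:
    extract_daily_series(200, bad_response, ticker="NOT_A_TICKER")
except ValueError as err:
    message = str(err)  # this is what the user would see
```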

Lessons learnt:
- learned how to extract data from the Alpha Vantage API using a URL
    - learned about the important components of a URL: hostname (base URL), path (the particular action you want to perform when you hit that URL), and parameters (arguments passed into the URL)
- learned about API keys
    - DO NOT put them in your code.
    - use a .env file to store the API key as an environment variable
- extracted data from the AlphaVantage API programmatically using an HTTP request
    - used the requests library to make a request and get a response
- figured out how to take the JSON and turn it into a DataFrame
- cleaned everything up by putting it all in a function that takes input from a user
- raised exceptions where we anticipate possible user errors
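
A rough sketch of the URL and JSON steps above. The environment variable name `ALPHA_API_KEY`, the ticker `IBM`, and the sample payload are assumptions for illustration; the payload is hand-written to mimic the shape of a daily response and is not real data. The actual HTTP call (with the requests library) is left out so the example runs offline.

```python
import os
from urllib.parse import urlencode

import pandas as pd

# The key is read from the environment, never hard-coded.
# "demo" is only a placeholder fallback for this sketch.
api_key = os.environ.get("ALPHA_API_KEY", "demo")

# URL = hostname + path + parameters.
hostname = "https://www.alphavantage.co"
path = "/query"
params = {
    "function": "TIME_SERIES_DAILY",
    "symbol": "IBM",
    "outputsize": "compact",
    "datatype": "json",
    "apikey": api_key,
}
url = f"{hostname}{path}?{urlencode(params)}"

# Hand-written sample shaped like the "Time Series (Daily)" part of a response.
sample = {
    "2023-01-04": {"1. open": "101.0", "2. high": "104.0", "3. low": "100.5",
                   "4. close": "103.0", "5. volume": "1200"},
    "2023-01-03": {"1. open": "100.0", "2. high": "102.0", "3. low": "99.5",
                   "4. close": "101.5", "5. volume": "900"},
}

# JSON -> DataFrame: dates become the index, values become floats.
df = pd.DataFrame.from_dict(sample, orient="index").astype(float)
df.index = pd.to_datetime(df.index)
df.index.name = "date"
# "4. close" -> "close", and so on.
df.columns = [c.split(". ")[1] for c in df.columns]
```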


# Lesson 2: Test-driven development

- create class for interfacing with AlphaVantage API and getting data
- create a SQL repository class for storing and retrieving stock data.
- use a test-driven development strategy.
    - we decide what functionality to include in the classes before we start coding
    - write assert statements to test the work before it is done.
- compare the stocks of 2 companies in India by calculating and comparing their daily returns

## Data module

- AlphaVantage API
    - class definition
- SQLRepository API
    - using assert statements before coding the insert_table method
    - read_table method
- stock returns
    - plotting and comparing closing stock price
    - calculating returns
    - plotting and comparing stock returns

Using the autoreload extension for dynamic testing in the notebook.

## Building Data Module

## AlphaVantage API Class

- adding a double underscore ('__') as the first 2 characters of a class attribute makes it a private (name-mangled) attribute. Python stores it as '_ClassName__attribute', so you can still access it, but it doesn't appear under its plain name when you do 'dir'
- turn the get_daily function into a method of the class
- need to create tests to make sure the method is working properly
- test-driven development (TDD)
    - you create a series of assert statements that test the outputs of the functions or methods you need to write, before you write them.
    - when you are done coding, use those assert statements to confirm that the code functions as expected
    - in the short term it makes things easier and lets you focus on the end goal as you build the software
    - in the longer term, it helps with maintenance because every time changes are made to the code the same tests are reused to make sure everything is still working
    - check for the following
        - if the get_daily method returns a dataframe
        - the number of columns of the dataframe
        - whether the dataframe has a DatetimeIndex
        - whether the index name is date
        - if all columns have the right column names
        - if all columns have the right data type
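
The checklist above could be written as assert statements along these lines. This is only a sketch: the helper name `check_daily_frame`, the expected column names, and the sample frame are assumptions, and the sample frame stands in for the output of get_daily.

```python
import pandas as pd


def check_daily_frame(df):
    """Assert statements written before get_daily exists (TDD style)."""
    assert isinstance(df, pd.DataFrame)            # returns a DataFrame
    assert df.shape[1] == 5                        # number of columns
    assert isinstance(df.index, pd.DatetimeIndex)  # DatetimeIndex
    assert df.index.name == "date"                 # index name
    assert df.columns.to_list() == ["open", "high", "low", "close", "volume"]
    assert all(df.dtypes == float)                 # column data types


# Hypothetical frame standing in for what get_daily should return.
df_sample = pd.DataFrame(
    {"open": [100.0], "high": [101.0], "low": [99.0],
     "close": [100.5], "volume": [1000.0]},
    index=pd.DatetimeIndex(["2023-01-02"], name="date"),
)
check_daily_frame(df_sample)  # passes silently when every check holds
```

Once the real method exists, the same checks can be rerun after every change to confirm nothing has broken.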


## SQL Repository Class

- create a connection to sqlite to create a database
- set check_same_thread to False so the connection can be used from more than one thread, which helps when the database is used behind FastAPI (read up on this later)
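
A minimal sketch of the connection plus a write and a read, using an in-memory database so nothing is created on disk. The table name `SAMPLE` and the records are made up; the real repository class wraps calls like these in insert_table and read_table methods.

```python
import sqlite3

import pandas as pd

# check_same_thread=False allows the connection to be shared across threads.
connection = sqlite3.connect(":memory:", check_same_thread=False)

# Hypothetical records to store.
records = pd.DataFrame(
    {"close": [100.0, 101.5]},
    index=pd.DatetimeIndex(["2023-01-02", "2023-01-03"], name="date"),
)

# Write the records to a table, replacing it if it already exists.
records.to_sql(name="SAMPLE", con=connection, if_exists="replace")

# Read them back with the date column parsed and set as the index.
df = pd.read_sql(
    "SELECT * FROM SAMPLE ORDER BY date",
    con=connection,
    parse_dates=["date"],
    index_col="date",
)
```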

Closing prices differ so much in order of magnitude between the 2 companies that Suzlon looks like a joke. However, there are other factors to consider.

A way to get around this is to examine stock returns as opposed to closing prices.
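
A short sketch of why returns make the comparison fair. The two price series are invented and differ by a factor of 100, yet their daily percent returns are identical.

```python
import pandas as pd

# Made-up closing prices on very different scales.
prices = pd.DataFrame(
    {"cheap_stock": [10.0, 11.0, 9.9], "pricey_stock": [1000.0, 1100.0, 990.0]},
    index=pd.date_range("2023-01-02", periods=3, freq="D"),
)

# Daily return = percent change from one day to the next.
returns = (prices.pct_change() * 100).dropna()
```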



# Lesson 3: Predicting Volatility with GARCH model

Goals:
- create wrangle_data function for calculating returns
- explore volatility for Ambuja and Suzlon stocks
- build GARCH model to predict Ambuja volatility
- create a clean_prediction function for formatting model predictions



workflow:
- prepare data
    - import: sql repository, wrangle_data
    - explore
        - time series and non-time-series plots
        - squared returns (need this to create ACF and PACF plots)
        - split
    - mean, variance, standard deviation
    - rolling window for volatility
- build model
    - iterate: GARCH model, standardized residuals
    - evaluate: walk forward validation
    - conditional volatility
- communicate results
    - convert timestamp to ISO 8601
    - clean prediction model

conditional vs unconditional volatility
- look at volatility across time and divorced from time
- a time series plot shows volatility across time, from past to present.
    - this can also be called conditional volatility
    - that is, volatility that changes across time
- unconditional volatility
    - looking at the measured value without considering time
    - volatility is the same thing as standard deviation; the two terms just come from different fields (finance and statistics)
    - there are 252 trading days out of 365 in a year.
    - annual volatility is bigger than daily volatility because a stock can make a large move on any single day, and adding up all those daily changes over a year produces a much larger spread. to annualize, multiply the daily volatility by the square root of 252.
    - calculated daily and annual volatilities.
- conditional volatility
    - a great way to examine how volatility changes across time is a rolling window.
    - the return is the percent change in a stock's price from day to day
    - volatility was higher during COVID
    - as the spread of returns gets wider, volatility goes up. as the spread narrows, volatility goes down.
- autocorrelation?
    - need to consider whether there is a relationship between the daily return on one day and the return on the day before, or two days before, and so on.
    - we are interested in the size of the jump in daily returns, whether positive or negative.
    - a good way to handle this is to square the values.
    - daily returns are a spread centered around 0; once squared, they represent volatility as a magnitude (height above 0).
    - periods of high and low volatility tend to cluster together.
    - this pattern shows why a GARCH model is a good fit.
- ACF
    - find out if there is a relationship (correlation) between the volatility on one day and the volatility on prior days, that is, at prior timestamps or lags.
- PACF
    - really tells us how many lags we need
    - according to the PACF, it looks like a lag of 2 or 3 would be a good starting point.
- split
    - in time series data we don't randomly assign data to train or test because time only moves in one direction.
    - take first 80% of observations as training set. remaining 20% as test set.
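
The exploration steps above can be sketched as follows. The returns are synthetic random numbers standing in for real stock returns, and the 50-day window is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (in percent) on business days; not real data.
rng = np.random.default_rng(seed=42)
y = pd.Series(
    rng.normal(loc=0.0, scale=1.5, size=500),
    index=pd.bdate_range("2020-01-01", periods=500),
    name="return",
)

# Unconditional volatility: one number for the whole period.
daily_volatility = y.std()
annual_volatility = daily_volatility * np.sqrt(252)  # 252 trading days

# Conditional volatility: standard deviation over a rolling window.
rolling_volatility = y.rolling(window=50).std().dropna()

# Squared returns, the input for the ACF and PACF plots.
squared_returns = y ** 2

# Time-ordered split: first 80% for training, last 20% for testing.
cutoff = int(len(y) * 0.8)
y_train = y.iloc[:cutoff]
y_test = y.iloc[cutoff:]
```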
    

## Build Model

### Iterate

- build model
    - how GARCH works
        - common in finance for predicting volatility
        - stands for generalised auto regressive conditional heteroskedasticity
        - main points
            - it does not predict volatility, it predicts variance.
            - variance is the square of standard deviation
            - the auto regressive part means making a prediction about the future based on information from the past.
            - the conditional heteroskedasticity part means that the model's prediction can change over time. that is, it'll be different at each timestamp, which is good because the volatility of Ambuja Cement also changes over time.
        - GARCH equation: the interest is in predicting the volatility of a stock on a given day, that is, the standard deviation of the stock's returns.
            - sigma^2 subscript t: the variance (squared volatility) at a specific time t.
            - omega (small w): the constant term, tied to the long-run average variance of the stock returns. it is the steady component of a stock's variance: it doesn't change from day to day and is always there, contributing to the overall variance of the stock. the model estimates it when it is fitted to the training data.
            - alpha: all about past information. this is where the squared returns from previous timestamps go in; alpha is the part of the model that takes them into account. it is a coefficient that the model estimates from the training data.
            - beta: all about past predictions. this is where the model's variance predictions from previous timestamps go in. it is also a coefficient estimated by training the model.
            - by estimating the values for omega, alpha, and beta, the model is trying to figure out what balance between these 3 leads to the best predictions overall.
            - these coefficients are going to be very small
            - this equation is for a GARCH(1,1) model, which means 1 lag for the alpha term and 1 lag for the beta term.
            - based on the results from the PACF plot we will start with a GARCH(3,3)
    - the model parameters
        - for the arch model these are the training set, p, and q (the number of lags we want for each)
        - p is the number of alpha terms we have in the equation
        - q is the number of beta terms in the equation
        - rescale is set to False so that the model does not automatically rescale the data while fitting
    - model summary
        - AIC and BIC scores measure how well the model fits the data, balanced against the complexity of the model. they should be as low as possible.
        - the p-values tell us the level of statistical significance of the coefficients
        - any model that uses coefficients will have a statistical significance for each coefficient. this applies to regression, ARMA, anything. the goal is a model in which every coefficient is statistically significant.
        - the p-values should all be lower than 0.05 to be statistically significant.
        - the lags were changed from 3 to 2 to 1, mainly because not all p-values were statistically significant. also, with a lag of 1 the AIC and BIC of the model were at their lowest, so at lag 1 the model performed its best.
        - this shows that the best bet is a GARCH(1,1) model: 1 lag for p and 1 lag for q, which is the most common choice in finance.
    - model prediction plots
        - plot predictions to see if they mimic the volatility that we see in the daily returns
        - plot training data together with model predictions
        - model is more or less following the data.
        - volatility can go both ways. it can be a big jump or a big drop.
        - to see if the model actually follows the volatility of the data, we can scale the predictions up by multiplying the values by 2.
        - whether the returns are positive or negative, the model is mimicking their volatility.
    - standardized residuals
        - with a GARCH model you don't examine normal residuals. you look at standardized residuals
        - a look at residuals across time
        - there should be no trends; it should just look like consistent static, which means a consistent mean and spread over time.
    - histograms of standardized residuals
        - look at residuals divorced from time to see if they have a normal distribution.
        - histogram is centered on zero with a normal distribution.
        - need to check for any autocorrelation.
    - ACF of standardized residuals
        - check if there are any remaining autocorrelations in the residuals.
        - since residuals are positive or negative, we need to square them to make all of them positive
        - all lags fall inside the blue band, which means nothing is significant.
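
To make the GARCH(1,1) equation concrete, here is a minimal NumPy sketch of the recursion sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. It assumes zero-mean returns and uses hand-picked coefficients; in the lesson the coefficients are estimated by fitting the arch model, which this sketch does not do. The function name and the starting value (the sample variance) are assumptions.

```python
import numpy as np


def garch_11_variance(returns, omega, alpha, beta):
    """Conditional variance path for given (not estimated) coefficients."""
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()  # assumed starting value: sample variance
    for t in range(1, len(returns)):
        # constant term + past squared return + past variance prediction
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


# Made-up returns and coefficients, for illustration only.
returns = np.array([0.5, -1.0, 2.0, -0.5, 0.1])
sigma2 = garch_11_variance(returns, omega=0.1, alpha=0.1, beta=0.8)
volatility = np.sqrt(sigma2)  # the model works in variance; volatility is its root
```

The large return of 2.0 at position 2 feeds into the variance at position 3, which is how the model picks up volatility clustering.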



### Evaluate

- time to make forecasts for dates the model did not see during training

- one day forecast
    - this is to get the predicted volatility of the next day after the last day in the training set.
    - horizon argument represents how many days you want to predict into the future.
    - reindex argument is set to False to return a smaller object
    - result is a dataframe with 2 components
        - a datetime index with the only date being the last date in the training set
        - one feature, which is a prediction for the next day. specifically, it is the prediction made from the last day of the training data with a horizon of 1.
        - need just that prediction from the model.
        - the model produces the variance, but the focus is on volatility, otherwise known as standard deviation. the standard deviation is the square root of the variance.
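
A tiny sketch of pulling the single prediction out and converting it to volatility. The dataframe below is a hand-made stand-in for the forecast output described above: the column name "h.1", the date, and the variance value are all assumptions.

```python
import pandas as pd

# Stand-in for a 1-day variance forecast: one row (last training date),
# one column (horizon 1). The numbers are invented.
prediction = pd.DataFrame(
    {"h.1": [2.25]},
    index=pd.DatetimeIndex(["2023-01-05"]),
)

# Keep just the one predicted value, then take the square root.
next_day_variance = prediction.iloc[0, 0]
next_day_volatility = next_day_variance ** 0.5
```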

## Communicate Results

- walk forward validation
    - 20% for test set
    - the for loop 
        - looks at the data in the training set and with each successive loop you add one observation from the test set to y_train
        - with each loop we retrain the model with the y_train
        - generate next prediction
        - append prediction to predictions list
    - get predictions into a series with a datetime index.
    - predicted volatility resembles the returns. in times of low volatility the predicted values decrease, and in times of high volatility the predicted values increase.
    - the Ljung-Box test is another method used to evaluate data like this.
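
The loop described above could be sketched like this. To keep the example self-contained, `forecast_next_variance` is a hypothetical stand-in that uses the variance of the last 20 observations; it is not a GARCH model, and in the lesson this step is where the model is retrained and asked for a one-step forecast. The returns are synthetic.

```python
import numpy as np
import pandas as pd


def forecast_next_variance(y_train):
    """Stand-in for 'retrain the model and forecast one step ahead'."""
    return float(y_train.tail(20).var())


# Synthetic daily returns; not real data.
rng = np.random.default_rng(seed=0)
y = pd.Series(
    rng.normal(0, 1.5, size=100),
    index=pd.bdate_range("2022-01-03", periods=100),
)

test_size = int(len(y) * 0.2)  # last 20% is the test set
predictions = []
for i in range(test_size):
    # The training data grows by one test observation on every pass.
    y_train = y.iloc[: -(test_size - i)]
    next_variance = forecast_next_variance(y_train)
    # Store volatility (square root of variance).
    predictions.append(next_variance ** 0.5)

# Predictions as a series aligned with the test set dates.
y_test_wfv = pd.Series(predictions, index=y.tail(test_size).index)
```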


- Format timestamps
    - need to take forecasts and reformat them to work with the web application.
    - make sure all dates are in the right format.
    - get a 5-day forecast; get the first forecast date by looking at the index and adding a day
    - create a date range that starts at the start date and runs for as many periods as there are in our prediction dataframe
    - convert dates to ISO 8601 format
- clean_prediction function
    - extract the values from the prediction and then flatten the resulting numpy array.
    - take the square root of the values to get the standard deviation, or volatility.
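
One way these steps might fit together. This is a sketch, not the course's exact function: it assumes the prediction is a one-row dataframe of variances indexed by the last training date, and it assumes forecasts should land on business days only (hence `pd.bdate_range`). The sample values are invented.

```python
import pandas as pd


def clean_prediction(prediction):
    """Turn a one-row variance forecast into {ISO date: volatility}."""
    # First forecast date: the day after the last training date.
    start = prediction.index[0] + pd.DateOffset(days=1)
    # One date per forecast column, skipping weekends (assumption).
    prediction_dates = pd.bdate_range(start=start, periods=prediction.shape[1])
    # ISO 8601 strings are easy to send through a web API.
    prediction_index = [d.isoformat() for d in prediction_dates]
    # Flatten the values and convert variance to volatility.
    data = prediction.values.flatten() ** 0.5
    return pd.Series(data, index=prediction_index).to_dict()


# Hypothetical 3-day variance forecast made on Thursday 2023-01-05.
prediction = pd.DataFrame(
    [[4.0, 9.0, 16.0]],
    columns=["h.1", "h.2", "h.3"],
    index=pd.DatetimeIndex(["2023-01-05"]),
)
formatted = clean_prediction(prediction)
```

With these inputs the keys are Friday, then the following Monday and Tuesday, because the weekend is skipped.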


## Summary

- created a wrangle_data function to get stock data from the database and calculate returns
- explored the returns for the 2 companies and learned what volatility is: the standard deviation of a stock's returns
- learned to think about volatility in a conditional and unconditional sense
- learned about the GARCH model and its 3 parts: long-run variance, previous returns, and previous predictions of the variance
- created a clean_prediction function for formatting model predictions into a dictionary.

# Lesson 4: Model Deployment

- GOALS
    - create a GarchModel class with methods for wrangling data, training, predicting, saving, and loading models.
    - build web API using FastAPI and uvicorn
    - build data classes for API using pydantic
    - create API paths for training the model and serving predictions.

- model module
    - create garch model class definition
    - add wrangle_data method
    - add fit and predict methods
    - add dump and load methods
    - error handling: try, except blocks
    - model permanence: dump, load

- main module
    - create FastAPI application with "hello world" example
    - add "/fit" path to application, with data classes
    - add "/predict" path to application with data classes
    - Application path / endpoint
    - Data classes: inheritance, type hints

## Model Module

- GarchModel class
    - a GARCH model for a new Indian company
    - check to see if the new model has a data attribute
    - dump method
        - a way to save the model to a pickle file
        - every time we save a model, it will save a file in the models directory named with the exact time the model was saved and the ticker symbol for that model.
    - load method
        - goal is to write a method that will load the last saved model
        - use the glob library to search for specific file paths
        - use error handling to account for models that have not been saved yet
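
A standalone sketch of the dump and load idea, written as plain functions rather than class methods. The filename pattern, the timestamp format, and the choice of `FileNotFoundError` are assumptions, and a small dictionary stands in for the trained model.

```python
import glob
import os
import pickle
import tempfile
from datetime import datetime


def dump(model, ticker, model_directory):
    """Save the model under a name made of the save time and the ticker."""
    # Timestamp without colons so the name is valid on every OS;
    # names sort chronologically.
    timestamp = datetime.now().strftime("%Y-%m-%dT%H-%M-%S-%f")
    filepath = os.path.join(model_directory, f"{timestamp}_{ticker}.pkl")
    with open(filepath, "wb") as f:
        pickle.dump(model, f)
    return filepath


def load(ticker, model_directory):
    """Load the most recently saved model for this ticker."""
    pattern = os.path.join(model_directory, f"*{ticker}.pkl")
    try:
        latest = sorted(glob.glob(pattern))[-1]
    except IndexError:
        # No file matched: no model has been saved for this ticker yet.
        raise FileNotFoundError(f"No model trained for '{ticker}'.") from None
    with open(latest, "rb") as f:
        return pickle.load(f)


# Save two versions, then load: the latest one comes back.
with tempfile.TemporaryDirectory() as model_directory:
    dump({"version": 1}, "SAMPLE", model_directory)
    dump({"version": 2}, "SAMPLE", model_directory)
    latest_model = load("SAMPLE", model_directory)
```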



## Main Module

- launch server
    - run server locally with uvicorn library
    - to launch the app locally use:
        - uvicorn main:app --reload --workers 1 --host localhost --port 8008
        - uvicorn launches the app
        - reload is for automatic reloads when the code changes
        - workers is the number of parallel processes; with reload it is set to 1
        - send requests to port 8008
- Hello path
    - 