# Stock Price Prediction

In this tutorial I will teach you how to build a stock prediction model in Python!

Goals:
- Use Yahoo Finance
    - import library
- Learn what the data we are using tells us
- Clean, transform, and gather the appropriate data
- Prep data for modeling
- Create our machine learning classifier
- Get our predicted results

### Libraries that will be used
- pandas
    - Provides ready-to-use, high-performance data structures and data analysis tools. pandas is built on top of NumPy and is widely used for data science and data analytics.
- numpy
    - A Python library for working with arrays. It is a general-purpose array-processing package that provides comprehensive mathematical functions, linear algebra routines, Fourier transforms, and more.
- matplotlib
    - A comprehensive library for creating static, animated, and interactive visualizations in Python. 
- yfinance
    - A popular open source library developed by Ran Aroussi as a means to access the financial data available on Yahoo Finance. 
- sklearn
    - A free machine learning library for Python. It features various algorithms, such as support vector machines, random forests, and k-nearest neighbours, and it works with Python numerical and scientific libraries like NumPy and SciPy.
        - sklearn.svm
            - SVR
        - sklearn.metrics
            - accuracy_score
            - precision_score
        - sklearn.model_selection
            - train_test_split
            - GridSearchCV

In [6]:
try:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import yfinance as yf
    from sklearn.svm import SVR
    from sklearn.metrics import accuracy_score, precision_score
    from sklearn.model_selection import train_test_split, GridSearchCV
    # Nothing is printed on a successful import.
except ImportError as ie:
    print(f'Import Library Error: {ie}')
    # pip3 install --upgrade requests



### Functions that will be implemented!

#### Get our ticker symbol data from yahoo finance

In order for us to do any predictions, we need to get our data for the stock of our choice. 

We can do this by passing in the ticker symbol of our choice, start date, end date, and how many days out we want to predict.

Notes:
- Data can be downloaded as far back as 1950 if available
- Date format:
    - YYYY-MM-DD
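
Because the dates are passed as plain strings, it can help to check their format before requesting any data. A minimal sketch, assuming a hypothetical helper named `validate_date` (it is not part of yfinance):

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # YYYY-MM-DD


def validate_date(date_string):
    """Return date_string unchanged if it is a valid YYYY-MM-DD date, else raise ValueError."""
    # strptime raises ValueError for malformed or impossible dates.
    parsed = datetime.strptime(date_string, DATE_FORMAT)
    # strptime also accepts unpadded values such as "2010-1-1", so compare the round trip.
    if parsed.strftime(DATE_FORMAT) != date_string:
        raise ValueError(f"Expected YYYY-MM-DD, got {date_string!r}")
    return date_string


start_date = validate_date("2010-01-01")
end_date = validate_date("2022-01-01")
```

The validated strings can then be passed on as the start and end dates.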

#### Do some simple data analysis

After we get our data, we can do some simple exploratory analysis using pandas.
- We can call head()
    - to view the top 5 rows of data
- We can call describe()
    - to get summary statistics for our data
- We can call info()
    - to show what our data consists of (column names, non-null counts, and data types).
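
The three calls above can be tried without downloading anything. This is a small sketch on a made-up DataFrame; the prices and volumes are synthetic and only stand in for real Yahoo Finance data:

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for a downloaded price history (values are made up).
dates = pd.date_range("2021-01-04", periods=10, freq="B")
df = pd.DataFrame(
    {
        "Adj Close": np.linspace(100.0, 109.0, 10),
        "Volume": np.arange(1000, 1010),
    },
    index=dates,
)

top_rows = df.head()       # first 5 rows
summary = df.describe()    # count, mean, std, min, quartiles, max per numeric column
info_result = df.info()    # prints column names, non-null counts, dtypes; returns None

print(top_rows)
print(summary)
```

Note that `info()` prints its report directly and returns `None`, while `head()` and `describe()` return new DataFrames.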

#### Adjust our data

Next, we want to adjust our data appropriately.

Since we are doing a predictive analysis, there will be a couple things we need to do.
- Adjust our data frame to only include the Adj Close values
- Create a variable to store the number of days out to predict
    - This value can change
- Create a new column that will hold the Stock Prediction
    - This will be our TARGET Variable
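
To see what the target column looks like, here is a small sketch on made-up prices. It assumes the target is built with pandas `shift`, using a negative value to pull future prices back onto the current row:

```python
import pandas as pd

# Five made-up adjusted closing prices.
prices = pd.DataFrame({"Adj Close": [10.0, 11.0, 12.0, 13.0, 14.0]})

forecast_days = 2  # number of days out to predict; this value can change

# Each row's target is the price forecast_days rows later.
prices["Stock Prediction"] = prices["Adj Close"].shift(-forecast_days)

print(prices)
```

The last `forecast_days` rows have no future price to point at, so their target is `NaN`. Those rows must be removed before training.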

#### Data Preprocessing

Now that we have our target variable, we can start preparing our data.

For this section, we will do a few things:
- Create an independent dataset (the features) to train our models
    - This will be stored in a NumPy array
- Create a dependent dataset (TARGET DATA)
    - This will be stored in a NumPy array
- We then will set up our Train and Test split
- Create our models
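
The preprocessing steps can be sketched on synthetic prices as follows. The data is made up, and `random_state=0` is an assumption added here only to make the split repeatable:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

forecast_days = 2

# Twelve made-up prices: 100.0, 101.0, ..., 111.0
df = pd.DataFrame({"Adj Close": np.arange(100.0, 112.0)})
df["Stock Prediction"] = df["Adj Close"].shift(-forecast_days)

# Independent data (features): 2-D array, one row per sample.
# The last forecast_days rows are dropped because their target is NaN.
X = df[["Adj Close"]].to_numpy()[:-forecast_days]

# Dependent data (target): 1-D array aligned row by row with X.
y = df["Stock Prediction"].to_numpy()[:-forecast_days]

# 80% train, 20% test. Split once so X and y stay paired.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(x_train.shape, x_test.shape)
```

By default `train_test_split` shuffles the rows. For time-ordered data, passing `shuffle=False` keeps the test set strictly after the training set, which may give a more realistic estimate.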

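#### Create Classifier
For this portion of the project, we will be using a 
**Support Vector Machine Model**. Because the target is a continuous price, the code uses the regression variant, `SVR`, so the "classifier" here is really a regressor.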


##### Support Vector Machine Setup

Our model will be set up in the following format:
- kernel = 'rbf'
- C value range:
    ```Python
    c_values = [0.1, 1, 10, 100, 1000]
    ```
- gamma value range:
    ```Python
    gamma_values = [1, 0.1, 0.01, 0.001, 0.0001]
    ```

**We will then refit the data using .fit()**

This takes in 2 parameters, .fit(1, 2):


1 &rarr; The training vectors: an array with one row per sample


2 &rarr; The target values, one for each row of X (real numbers for regression)
    - From here, I took the combination of values with the best $R^2$ score and used it to create the SVM regression model



**We will then get the score of the SVM regression model**

`.score()` returns the coefficient of determination, $R^2$, of the prediction.


This also takes in 2 parameters, .score(1, 2), the same as above, except this time we use our testing datasets.
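
For intuition about what this score means, the coefficient of determination can be computed by hand as $R^2 = 1 - SS_{res} / SS_{tot}$. A minimal sketch with NumPy, where `r_squared` is a hypothetical helper written for illustration and not a scikit-learn function:

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left over after predicting
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of always predicting the mean
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, actual)                    # predictions match exactly
baseline = r_squared(actual, [2.5, 2.5, 2.5, 2.5])     # always predict the mean
worse = r_squared(actual, [4.0, 3.0, 2.0, 1.0])        # worse than predicting the mean

print(perfect, baseline, worse)
```

A score of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values mean worse than that.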


We will choose the best combination of C and gamma to create our predictions.
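
The search, fit, and score steps can be sketched end to end on synthetic data. The noisy sine curve below is made up so the example runs without a download, and the fixed seeds are assumptions added for repeatability:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic regression problem: a noisy sine curve (not stock data).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 80).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

param_grid = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}

# GridSearchCV tries every C/gamma pair with cross-validation, then
# refits the best combination on all of the training data (refit=True by default).
search = GridSearchCV(SVR(), param_grid)
search.fit(x_train, y_train)

test_score = search.score(x_test, y_test)  # R^2 on the held-out test data

print(search.best_params_)
print(test_score)
```

After fitting, `search.predict(...)` uses the refitted best model, so the search object can be used directly to make predictions.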

In [3]:
class StockPrediction:
    def __init__(self, ticker_symbol, start_date, end_date, forecast_days):
        self.stock = ticker_symbol
        self.SD = start_date
        self.ED = end_date
        self.FD = forecast_days

    def get_data(self):
        # auto_adjust=False keeps the 'Adj Close' column, which newer
        # yfinance releases no longer return by default.
        df = yf.download(
            self.stock,
            start=self.SD,
            end=self.ED,
            auto_adjust=False,
            progress=False)
        # Some yfinance versions return (price, ticker) MultiIndex columns;
        # keep only the price level so 'Adj Close' can be selected by name.
        if isinstance(df.columns, pd.MultiIndex):
            df.columns = df.columns.get_level_values(0)
        return df

    def df_head(self):
        return self.get_data().head()

    def df_describe(self):
        return self.get_data().describe()

    def df_info(self):
        return self.get_data().info()

    def adjust_data(self):
        # Copy the selection so adding a column does not modify a view.
        df = self.get_data()[['Adj Close']].copy()
        # Target: the adjusted close FD trading days ahead.
        df['Stock Prediction'] = df['Adj Close'].shift(-self.FD)
        return df

    def data_preprocess(self):
        df = self.adjust_data()
        # Drop the last FD rows, whose target is NaN after the shift.
        X_data = df[['Adj Close']].to_numpy()[:-self.FD]
        Y_data = df['Stock Prediction'].to_numpy()[:-self.FD]

        x_train, x_test, y_train, y_test = train_test_split(X_data, Y_data, test_size=0.2)

        return x_train, x_test, y_train, y_test

    def create_classifier(self):
        param_grid = {
            'C': [0.1, 1, 10, 100, 1000],
            'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
            'kernel': ['rbf']
        }
        clf = GridSearchCV(SVR(), param_grid)  # refit=True is the default

        # Split once so x_train and y_train come from the same shuffle.
        x_train, _, y_train, _ = self.data_preprocess()
        return clf.fit(x_train, y_train)

    def get_predictions(self):
        # Features for the last FD days, which have no known target yet.
        x_forecast = self.adjust_data()[['Adj Close']].to_numpy()[-self.FD:]
        predictions = self.create_classifier().predict(x_forecast)
        print(predictions)
        return predictions

        

In [1]:
def main():
    test_symbols = ['AMZN', 'AAPL']
    sd = '2010-01-01'
    ed = '2022-01-01'
    fd = 2
    sp = StockPrediction(test_symbols[0], sd, ed, fd)
    # Store the methods themselves so only the selected one runs.
    funcs = [sp.get_data,
        sp.df_head,
        sp.df_describe,
        sp.df_info,
        sp.adjust_data,
        sp.get_predictions]
    print(funcs[4]())  # CHANGE NUM TO THE INDEX OF THE FUNCTION YOU WANT TO SEE

    sp.get_predictions()  # OUR MAIN TARGET

In [None]:
if __name__ == "__main__":
    main()