## Libraries

### Import Data Analytics Libraries
- Pandas
- Numpy
- Matplotlib
- Datetime

In [109]:
import pandas as pd
import numpy as np
import datetime as dt
from datetime import datetime
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

### Import Project Specific Libraries
- **Yahoo Finance (yfinance)**: Gather ticker data
- **Backtrader Technical Analysis Library (bta-lib)**: Gather trends and pattern indicators

In [110]:
import btalib
import yfinance as yf

### Import Web Scraping Libraries
- **requests**: Gather web data
- **urllib**: Gather web data
- **BeautifulSoup**: Extract web data

In [111]:
import requests
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

### Import Deep Learning Libraries
- **PyTorch**: Create neutral networks


In [112]:
import torch

### Import Natural Language Processing Tools
- **transformers**: Use pre-trained deep learning models

In [113]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

### Import API Keys
- **config**: Holds all private API keys


In [114]:
from config import *

## Data Collection

We will be using Yahoo Finance in order to find the ticker data. This data will include:

- **Date/DateTime**: Index
- **Open**: Price of asset at beginning of Date/Datetime
- **High**: Highest price of asset during Date/Datetime
- **Low**: Lowest price of asset during Date/Datetime
- **Close**: Price of stock at end of Date/Datetime
- **Adj Close**: Close price adjusted due to corporate actions such as dividend payouts, stock splits, or the issuance of more shares

In [115]:
def get_data(symbol, interval = "1d", start_date = "2019-10-26", end_date = "2022-01-26"):
   
    '''
    Returns asset information

            Parameters:
                    symbol (string): ticker symbol for lookup
                    interval (string): periods between datapoints (valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo)
                    start_date (string): format YYYY-MM-DD
                    end_date (string): format YYYY-MM-DD
            Returns:
                    (DateFrame): Asset Information  
    '''

    return yf.download(symbol, start=start_date, end=end_date, interval=interval)

In [116]:
data = get_data("AAPL")
data.head()

[*********************100%***********************]  1 of 1 completed


Unnamed: 0_level_0,Open,High,Low,Close,Adj Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2019-10-28,61.855,62.3125,61.68,62.262501,61.167545,96572800
2019-10-29,62.2425,62.4375,60.642502,60.822498,59.752869,142839600
2019-10-30,61.189999,61.325001,60.302502,60.814999,59.745502,124522000
2019-10-31,61.810001,62.2925,59.314999,62.189999,61.096321,139162000
2019-11-01,62.384998,63.982498,62.290001,63.955002,62.830288,151125200


# Feature Construction

## Technical Analysis

A part of this study is to try and see if a model can use technical analysis (TA) as a way to create a trading strategy

Technical analysis is the study of trends and patterns in order to predict profitable trading decisions. There are 4 types of indictors we will in this study:

- trend indictors
- momentum indicators
- volatility indicators
- volume indicators. 

Typically, traders use only a 2-3 indictors for their strategy, however, our job is to also see if a model can determine which indictors are more important when making these decisions as well.

In order to add TA to our data, we will be using bta-lib.

Below, we will go in depth into a few indictors and how they are used

### Simple Moving Average

The simple moving average indictor is a trend indictor that finds the average price within a certain window of time. This helps smooth out the noise in price changes and helps trades seen the general trend of the asset's price.

Many traders also use two different moving averages on the same asset. One moving average would have a longer period than the other. In order to buy, the shorter moving average would have to greater than the longer moving average, indicating that there is a rapid change in price upwards. The same is true vice versa.

We will test if the model could potentially recognize this pattern and gauge if it is useful.



![SMA](images/SMA.PNG)

This chart shows Apple stock prices using the moving average indictors. The purple line shows the short term moving average of 12 days and the yellow line shows the long term moving average of 26 days. The green "up" arrow shows where a trader would buy and the red "down" arrow shows where a trader would sell using this strategy.

In [117]:
def get_MA(data, short_period=12, long_period=26):

    '''
    Adds Moving Averages to asset data

        Parameters:
                data (DataFrame): asset data
                short_period (int): period over moving average taken (<long_period)
                long_period (int): period over moving average taken (>short_period)

        Returns:
                (DateFrame): Asset Information
    '''

    # get Moving Averages
    ShortMA = btalib.sma(data, period=short_period)
    LongMA = btalib.sma(data, period=long_period)

    # add to current data
    data["ShortMA"] = ShortMA.df
    data["LongMA"] = LongMA.df

    return data

### Exponential Moving Average

The exponential moving average is similar to the simple moving average, however, it gives higher weigh to more recent changes within the window of time. This could better help find short term price trends than the simple moving average.

Note: may only use this rather than the simple moving average for the model.

![EMA](images/EMA.PNG)

This chart shows Apple stock prices using the exponential moving average indictors. The purple line shows the short term moving average of 12 days and the yellow line shows the long term moving average of 26 days. The green "up" arrow shows where a trader would buy and the red "down" arrow shows where a trader would sell using this strategy.

In [118]:
def get_EMA(data, short_period=12, long_period=26):
    '''
    Adds Exponential Moving Averages to asset data

        Parameters:
                data (DataFrame): asset data
                short_period (int): period over moving average taken (<long_period)
                long_period (int): period over moving average taken (>short_period)

        Returns:
                (DateFrame): Asset Information 
    '''
    # get Exponential Moving Averages
    ShortEMA = btalib.ema(data, period=12)
    LongEMA = btalib.ema(data, period=26)
    
    # add to data
    data["ShortEMA"] = ShortEMA.df
    data["LongEMA"] = LongEMA.df

    return data

### Bollinger Bands


The Bollinger Bands indictors is a kind of volatility indictor which uses the moving average over a portion of days and finds a certain standard deviation above and below that moving average. This indictor relies on the theory of mean reversion where asset prices, after spikes in price changes, revert towards the moving average of price. This indictor also shows how volatile an asset is; if the bands are farther apart, then more volatile the asset is. Traders may use this strategy by buying an asset when the price reaches the lower standard deviation and then sell at the higher standard deviation. Usually, traders use 2 standard deviations away from the mean and have a moving average window of 20 periods.

![Simple Bollinger Bands](images/simpleBBands.PNG)

The purple lines are two standard deviations and the yellow line is the moving average.

Of course, this strategy by itself or unaltered is not perfect, as it may miss out on trends. However, with stop losses (selling if price drops below certain value), band line slopes, and pattern recognition, this strategy may help the model learn more about correct buy and sell signals.

Another possible way to improve this strategy is to add another set of Bollinger Bands that have a different standard deviation.

![Double Bollienger Bands](images/doubleBBands.PNG)

Now the added blue lines are one standard deviation away from moving average and in this case, a strategy could be that a trader could buy at the lower wider (2 std) line and sell when the price touches the upper wider line and then falls below the upper narrow line (1 std)

More research is needed to be done in this case but this might help model learn.

In [130]:
def get_BBands(data, period=20, wider_std=2, narrow_std=1):
    
    '''
    Adds Bollinger Bands to asset data

        Parameters:
                data (DataFrame): ticker symbol for lookup
                period (int): period over moving average taken
                wider_std (float): standard deviation of wider bands (>narrow_std)
                narrow_std (float): standard deviation of narrow bands (<wider_std)


        Returns:
                (DateFrame): Asset Information 
    '''

    try:
        # get Bollinger Bands (regular and narrowed)
        mid, top, bot = btalib.bbands(data, period = period, devs = wider_std)
        mid_narrow, top_narrow, bot_narrow = btalib.bbands(data, period = period, devs = narrow_std)
    except ValueError:
        print("Data broken...check timeframe")
        return None

    # add to asset data
    data["Mid BBand"] = list(mid)
    data["Top BBand"] = list(top)
    data["Bot BBand"] = list(bot)
    data["Volatility"] = data["Top BBand"] - data["Bot BBand"]
    
    data["Mid BBand Narrow"] = list(mid_narrow)
    data["Top BBand Narrow"] = list(top_narrow)
    data["Bot BBand Narrow"] = list(bot_narrow)
    
    return data

### Relative Strength Index

The Relative Strength Index (RSI) is a momentum indictor that shows how oversold or overbought an asset is. Typically traders would sell when RSI is above 70 as the asset would be considered overbought and overvalued and buy when RSI is below 30 as the asset would then be considered oversold and undervalued. It is calculated using 

$100-(100/(1+(Avg Gain/Avg Loss)))$ 

This average gain and loss is computed over a given period. This period is typically 14 days.

![RSI](images/RSI.PNG)

Here we see that RSI does a decent job at capturing trends. The goal of the model is to create its own interpretation of RSI if useful for modeling

In [120]:
def get_RSI(data, period=14):
    '''
    Adds RSI to asset data

        Parameters:
                data (DataFrame): ticker symbol for lookup
                period (int): period over averages taken

        Returns:
                (DateFrame): Asset Information 
    '''
    rsi = btalib.rsi(data, period = period)

    data["RSI"] = rsi.df

    return data


### Moving Average Convergence Divergence

The Moving Average Convergence Divergence (MACD) is a trend/momentum indictor that finds the difference between two exponential moving averages (EMA). The MACD provides 3 metrics: MACD line, Signal line, and a histogram. The histogram represents how much the difference is between the two EMA's. Traders would usually buy when the MACD line crosses above the Signal line. In the case of the code, the model should buy when MACD changes from negative to positive and sell vice versa. The typical EMA periods are 12 periods and 26 periods.

![MACD](images/MACD.PNG)

In the chart, the blue line is the MACD line and the orange line is the signal line. We see that this can see trends.

In [121]:
def get_MACD(data):
    '''
    Adds MACD to asset data

        Parameters:
                data (DataFrame): ticker symbol for lookup

        Returns:
                (DateFrame): Asset Information 
    '''
    MACD = data["ShortEMA"] - data["LongEMA"]

    data["MACD"] = MACD
    
    return data

### Rate of Change

Rate of change (ROC) is a momentum indictor that gets the percent difference between the current price and a price from a past period. If ROC changes from negative to positive, then buy, however, if ROC changes from positive to negative, then sell. Typically, traders use the current price and the last 12th price to calculate this value.

![ROC](images/ROC.PNG)

Using ROC, we are able to pick up trends as well. The ROC chart is the blue line chart below price chart.

In [122]:
def get_ROC(data, period=12):
    '''
    Adds Rate of Change to asset data

        Parameters:
                data (DataFrame): ticker symbol for lookup

        Returns:
                (DateFrame): Asset Information 
    '''
    roc = btalib.roc(data, period=period)
    data["ROC"] = roc.df
    
    return data


## Market Correlation

### Standard & Poor's 500

The Standard & Poor's 500 Index (S&P 500) is considered to be a measure of market health. This is because the S&P 500 tracks the performance of the top 500 companies in the US market. If these companies are not doing well, the S&P 500 will drop and also be a good indictor that the rest of the market may not being doing well either. This can be used to compare to the current stock and if the trend of the S&P 500 is downwards, then it is also likely that the asset in question also has a downward trend.

In [123]:
def get_spy_return(data):
    '''
    Adds S&P 500 to asset data

        Parameters:
                data (DataFrame): ticker symbol for lookup

        Returns:
                (DateFrame): Asset Information 
    '''
    spy_data = get_data("SPY", spy=True)
    closes = spy_data["Close"]

    returns = [y - x for x,y in zip(closes,closes[3:])]
    data["SPY_returns"] = returns
    
    return data

### Treasury Bond Yield

Treasury bonds (T-bonds) are government based, "zero-risk," bonds. Treasury bonds prices are inversely correlated with interest rates. If interest rates rise, then bond prices will fall and investors would rather save their money rather than invest, which also causes the demand in the market to fall. In order to increase demand for these bonds and the market during times of high interest rates, bonds also have higher yields.

This means that yields are correlated with interest rates. If we can capture the treasury bond yield, then we will also know the trend in interest rates and may make better market decisions. In this case, we will be using the 10-year treasury bond yield rates as a feature in our model.


In [124]:
def get_t_bond(data):
    '''
    Adds T Bond Yield to asset data

        Parameters:
                data (DataFrame): ticker symbol for lookup

        Returns:
                (DateFrame): Asset Information 
    '''
    t_data = get_data("^TNX")
    data["Treasury_Yield_10_Years"] = t_data["Close"]
    return data

## Market Sentiment


### Get Asset News Headlines

Many investors look towards the news in order to find out how well the asset is performing and what future price changes may look like. We can scrape these new articles using an API and website. The API we will be using is the Finnhub API. The Finnhub API has a feature that allows for historical headlines however there are not many and they do not go far back in time. The Finviz website gives daily news updates for any ticket symbol that can be scraped. These two sources of new headlines can be used for sentiment analysis.

In [125]:
def get_past_headlines(symbol, end_date = "2022-01-21"):
    end_date_news = datetime.strptime(end_date, '%Y-%m-%d')
    start_news_date = end_date_news - dt.timedelta(days=365)
    start_news_date  = start_news_date.strftime("%Y-%m-%d")
    r = requests.get('https://finnhub.io/api/v1/company-news?symbol={}&from={}&to{}&token={}'.format(symbol, start_news_date, end_date, FINNHUB_API_KEY))
    df = pd.DataFrame()
    df["datetime"]=pd.to_datetime([i["datetime"] for i in r.json()], unit='s')
    df["datetime"] = df["datetime"].dt.date
    df["headline"] = [i["headline"] for i in r.json()]
    df["summary"] = [i["summary"] for i in r.json()]

    return df

def get_current_headlines(symbol):

    finwiz_url = 'https://finviz.com/quote.ashx?t='

    url = finwiz_url + symbol
    req = Request(url=url,headers={'user-agent': 'my-app/0.0.1'}) 
    response = urlopen(req)    
    # Read the contents of the file into 'html'
    html = BeautifulSoup(response, "lxml")
    # Find 'news-table' in the Soup and load it into 'news_table'
    news_table = html.find(id='news-table')
    
    parsed_news = []

    for x in news_table.findAll('tr'):
        # read the text from each tr tag into text
        # get text from a only
        text = x.a.get_text() 
        
        # split text in the td tag into a list 
        date_scrape = x.td.text.split()
        # if the length of 'date_scrape' is 1, load 'time' as the only element

        if len(date_scrape) == 2:
            date = date_scrape[0]
            time = date_scrape[1]

            date = datetime.strptime(date, '%b-%d-%y').date()
 
        else:
            date = parsed_news[-1][0]
            time = date_scrape[0]

        summary = ""
        parsed_news.append([date,text,summary])
    
    news = pd.DataFrame(parsed_news, columns=["datetime", "headline", "summary"])

    return news

In [134]:
current = get_current_headlines("AAPL")
past = get_past_headlines("AAPL")

news = pd.concat([current, past])

In [135]:
news

Unnamed: 0,datetime,headline,summary
0,2022-01-26,Wednesdays Tech Rebound Busted by More Hawkish...,
1,2022-01-26,Tech Stock Rally Evaporates as Powell Goes Mor...,
2,2022-01-26,Can Continued Services Growth Aid Apple's (AAP...,
3,2022-01-26,Retail stock trading slower but still above pr...,
4,2022-01-26,Apple Trending Lower as Quarterly Report Appro...,
...,...,...,...
219,2022-01-21,"US STOCKS-S&P 500, Nasdaq post worst weeks sin...",Wall Street's main indexes ended sharply lower...
220,2022-01-21,"After Hours Most Active for Jan 21, 2022 : VC...",The NASDAQ 100 After Hours Indicator is up 2.4...
221,2022-01-21,US STOCKS-Wall Street ends week down as Netfli...,Wall Street's main indexes ended sharply lower...
222,2022-01-21,"Amazon, Facebook set Washington lobbying spend...",Amazon.com with its subsidiaries and Facebook'...


### Get News Sentiment
Using the new headlines pulled from the API and website, we can get the sentiment of the asset on each period. We can use a pre-trained BERT model to get the sentiment of the headline's text. There is a BERT model that is trained on financial data. We will be using this model. This sentiment may be able to help the model make better trading decisions. 

In [127]:
def get_sentiment(data, news):
    
    tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
    model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")



    inputs = tokenizer(news["headline"].tolist(), padding = True, truncation = True, return_tensors='pt')

    outputs = model(**inputs)

    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    positive = predictions[:, 0].tolist()
    negative = predictions[:, 1].tolist()
    neutral = predictions[:, 2].tolist()

    table = {'Headline':news["headline"].tolist(),
            "Positive":positive,
            "Negative":negative, 
            "Neutral":neutral}
        
    df = pd.DataFrame(table, columns = ["Headline", "Positive", "Negative", "Neutral"])
    news["Sentiment"] = df["Positive"] - df["Negative"]
    
    news.drop(['headline', 'summary'], axis=1, inplace=True)

    result = news.groupby(['datetime']).mean()
    
    dates_in_data = [i.strftime('%Y-%m-%d') for i in data.index]

    
    result.index = [i.strftime('%Y-%m-%d') for i in result.index]

    result = result[result.index.isin(dates_in_data)]
    number_missing = data.shape[0]-len(result)
    missing = [0] * number_missing
    senti = missing + result["Sentiment"].tolist()

    data["Sentiment"] = senti

    
    return data

In [128]:
def get_TA(data):
    data = get_BBands(data)
    data = get_RSI(data)
    data = get_EMA(data)
    data = get_MA(data)
    data = get_MACD(data)
    data = get_ROC(data)

    #data = get_spy_return(data)
    data = get_t_bond(data)   

    return data

In [132]:
data = get_data("AAPL")
data.head()

[*********************100%***********************]  1 of 1 completed


Unnamed: 0_level_0,Open,High,Low,Close,Adj Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2019-10-28,61.855,62.3125,61.68,62.262501,61.167545,96572800
2019-10-29,62.2425,62.4375,60.642502,60.822498,59.752865,142839600
2019-10-30,61.189999,61.325001,60.302502,60.814999,59.745495,124522000
2019-10-31,61.810001,62.2925,59.314999,62.189999,61.096321,139162000
2019-11-01,62.384998,63.982498,62.290001,63.955002,62.83028,151125200


In [136]:
data = get_TA(data)

current = get_current_headlines("AAPL")
past = get_past_headlines("AAPL")

news = pd.concat([current, past])
data = get_sentiment(data, news)

[*********************100%***********************]  1 of 1 completed


In [140]:
data[23:]

Unnamed: 0_level_0,Open,High,Low,Close,Adj Close,Volume,Mid BBand,Top BBand,Bot BBand,Volatility,...,Bot BBand Narrow,RSI,ShortEMA,LongEMA,ShortMA,LongMA,MACD,ROC,Treasury_Yield_10_Years,Sentiment
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2019-11-29,66.650002,67.000000,66.474998,66.812500,65.834579,46617600,65.630126,66.535044,64.725208,1.809836,...,64.725208,67.089168,65.976615,,66.228959,,,2.019396,1.776,0.000000
2019-12-02,66.817497,67.062500,65.862503,66.040001,65.073395,94487200,65.734376,66.556633,64.912119,1.644514,...,64.912119,60.504180,65.986366,,66.222501,,,-0.117215,1.836,0.000000
2019-12-03,64.577499,64.882500,64.072502,64.862503,63.913124,114430400,65.758751,66.546866,64.970636,1.576230,...,64.970636,52.108465,65.813464,64.984424,66.156043,64.984424,0.829041,-1.214591,1.709,0.000000
2019-12-04,65.267502,65.827499,65.169998,65.434998,64.477257,67181600,65.816376,66.533369,65.099382,1.433987,...,65.099382,55.352359,65.755239,65.017800,66.072292,65.106443,0.737439,-1.512650,1.781,0.000000
2019-12-05,65.947502,66.472504,65.682503,66.394997,65.423195,74424400,65.920626,66.558194,65.283057,1.275136,...,65.283057,60.218386,65.853663,65.119814,66.040625,65.320770,0.733849,-0.569083,1.797,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-01-19,170.000000,171.080002,165.940002,166.229996,166.229996,94815000,175.228500,179.100765,171.356235,7.744530,...,171.356235,39.999687,172.592141,172.339166,173.740833,174.888077,0.252974,-6.386220,1.827,0.000000
2022-01-20,166.979996,169.679993,164.179993,164.509995,164.509995,91420500,174.804500,179.310988,170.298012,9.012976,...,170.298012,37.820994,171.348734,171.759228,172.282500,174.456154,-0.410494,-9.614857,1.833,0.000000
2022-01-21,164.419998,166.330002,162.300003,162.410004,162.410004,122501300,174.143000,179.388677,168.897323,10.491354,...,168.897323,35.293406,169.973544,171.066692,170.841667,173.997692,-1.093148,-9.621588,1.747,-0.251023
2022-01-24,160.020004,162.300003,154.699997,161.619995,161.619995,162706700,173.410000,179.291563,167.528437,11.763126,...,167.528437,34.363019,168.688383,170.366937,169.733334,173.317692,-1.678554,-7.603478,1.735,-0.111076


In [163]:
def get_labels(data, window_size=11):
    counter_row = 0 #counterRow
    num_periods = len(data) #numberOfDaysInFile
    labels = np.zeros(num_periods)
    labels[:] = np.nan

    while(counter_row < num_periods):
        counter_row += 1
        if(counter_row > window_size):
            window_begin_index = counter_row - window_size
            window_ending_index = counter_row
            window_middle_index = (window_ending_index + window_begin_index) // 2

            min_number = np.inf
            min_index = -1

            max_number = -1 #price can not be negative?
            max_index = -1
        
            for i in range(window_begin_index, window_ending_index):
                number = data.iloc[i]["Close"]
                if number < min_number:
                    min_number = number
                    min_index = i
                if number > max_number:
                    max_number = number
                    max_index = i
            if(max_index == window_middle_index):
                labels[window_middle_index] = 0
            elif(min_index == window_middle_index):
                labels[window_middle_index] = 1
            else:
                labels[window_middle_index] = 2

    data["Target"] = labels
    data["Target"] =  data["Target"].fillna(2)
    return data

In [164]:
data = get_labels(data)
data.head()

Unnamed: 0_level_0,Open,High,Low,Close,Adj Close,Volume,Mid BBand,Top BBand,Bot BBand,Volatility,...,RSI,ShortEMA,LongEMA,ShortMA,LongMA,MACD,ROC,Treasury_Yield_10_Years,Sentiment,Target
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2019-10-28,61.855,62.3125,61.68,62.262501,61.167545,96572800,,,,,...,,,,,,,,1.853,0.0,2.0
2019-10-29,62.2425,62.4375,60.642502,60.822498,59.752865,142839600,,,,,...,,,,,,,,1.835,0.0,2.0
2019-10-30,61.189999,61.325001,60.302502,60.814999,59.745495,124522000,,,,,...,,,,,,,,1.798,0.0,2.0
2019-10-31,61.810001,62.2925,59.314999,62.189999,61.096321,139162000,,,,,...,,,,,,,,1.691,0.0,2.0
2019-11-01,62.384998,63.982498,62.290001,63.955002,62.83028,151125200,,,,,...,,,,,,,,1.728,0.0,2.0


In [None]:
def plot_labels(data, symbol):

    df = data.copy()
    df["Buy"] = [df.iloc[i]["Close"] if df.iloc[i]["Target"] == 1.0 else np.nan for i in range(len(df["Target"]))]
    df["Sell"] = [df.iloc[i]["Close"] if df.iloc[i]["Target"] == 0.0 else np.nan for i in range(len(df["Target"]))]

    #df["Sell"] = [df["Close"] if i == 0.0 else np.nan for i in df["Target"]]
    fig = plt.figure(figsize=(8, 6))
    plt.plot(df["Close"], color = "blue", alpha = .5)
    
    if(not df["Buy"].isnull().all()):
        plt.scatter(df.index, df["Buy"], color = 'green', marker="^", alpha=1)
    if(not df["Sell"].isnull().all()):
        plt.scatter(df.index, df["Sell"], color = 'red', marker="v", alpha=1)
    
    fig.savefig('ML_charts/{}_target_chart.png'.format(symbol))
    plt.close(fig)