## Section 3.1: Extract data from Yahoo! Finance page
<hr>

This notebook does three things:
 - Loads the ```yfinance``` module and the tickers from the output file of the previous notebook
 - Selects the top N tickers, with N specified by the user, and downloads the opening price, closing price, and volume of each stock for the last 30 days
 - Stores the values in a CSV file for further analysis and visualization

In [17]:
import pandas as pd
from datetime import date, timedelta

In [18]:
# Load the yfinance module, installing it first if it is not already available
try:
    success_msg = 'Yahoo! Finance module loaded.'
    import yfinance as yf
    print(success_msg)
except ModuleNotFoundError as e:
    print('Installing Yahoo! Finance python module.\n')
    !pip install yfinance
    import yfinance as yf
    print(success_msg)

Yahoo! Finance module loaded.


In [19]:
# load the tickers dataframe from the previous notebook's output csv file
df = pd.read_csv('./Outputfile.csv')

In [20]:
df.head()

Unnamed: 0,Date,Title,Ticker,Article URL
0,2021-11-27 12:14:00,Tendances crypto dans l'iGaming : aperçus de ...,,https://www.prnewswire.com/news-releases/tenda...
1,2021-11-27 12:09:00,Krypto-Trends im iGaming: SOFTSWISS Insights,,https://www.prnewswire.com/news-releases/krypt...
2,2021-11-27 11:13:00,GAC MOTOR comemora um ano no Chile,,https://www.prnewswire.com/news-releases/gac-m...
3,2021-11-27 11:00:00,BlueVine Announces U.S. Small Business Resili...,,https://www.prnewswire.com/news-releases/bluev...
4,2021-11-27 09:30:00,Secondary Market Research Sourcing and Procur...,,https://www.prnewswire.com/news-releases/secon...
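The `Unnamed: 0` column in the output above is typically what pandas produces when a CSV was written together with its index and then read back without naming an index column. The following is a minimal sketch of that behavior, assuming the previous notebook saved its file that way; the inline CSV text is made up for illustration and is not the real `Outputfile.csv`.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for Outputfile.csv, written with its index.
csv_text = (
    ",Date,Title,Ticker\n"
    "0,2021-11-27 12:14:00,Example title A,\n"
    "1,2021-11-27 12:09:00,Example title B,SNAP\n"
)

# Default read: the unnamed first column becomes 'Unnamed: 0'.
with_unnamed = pd.read_csv(io.StringIO(csv_text))

# Passing index_col=0 uses that column as the index instead.
without_unnamed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_unnamed.columns))
print(list(without_unnamed.columns))
```

If the extra column is not needed downstream, reading with `index_col=0` keeps it out of the dataframe's columns.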


In [21]:
# Get the number of stocks the user wants to analyze
num_of_stocks = int(input("How many stocks do you want to analyze : "))

How many stocks do you want to analyze : 5
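The prompt above passes whatever the user types straight to `int()`, so a non-numeric or out-of-range answer would stop the notebook. One possible way to guard against that is sketched below; `parse_stock_count` and its `maximum` parameter are hypothetical names introduced here, not part of the original notebook.

```python
def parse_stock_count(raw, maximum):
    """Convert user input to an int between 1 and maximum, or raise ValueError."""
    try:
        count = int(raw.strip())
    except ValueError:
        raise ValueError(f"not a whole number: {raw!r}") from None
    if not 1 <= count <= maximum:
        raise ValueError(f"expected a value from 1 to {maximum}, got {count}")
    return count

# Surrounding whitespace is tolerated; the upper bound of 20 is arbitrary.
print(parse_stock_count(" 5 ", maximum=20))
```

In the notebook, the string returned by `input()` could be passed to such a helper, with `maximum` set to the number of distinct tickers available.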


In [22]:
# Get the num_of_stocks most frequently occurring tickers from the tickers dataframe
tickers = df['Ticker'].dropna()
top_tickers = tickers.value_counts()[:num_of_stocks].index.tolist()
print(top_tickers)

['CTXS', 'CPLG', 'GOTU', 'SNAP', 'STNE']
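To see how the selection works on a small scale, here is a sketch using a made-up ticker column rather than the real data. `value_counts()` orders the tickers by frequency, highest first, so taking the first N index labels gives the N most frequent tickers.

```python
import pandas as pd

# Made-up ticker column; None stands for articles with no ticker.
sample = pd.Series(['SNAP', None, 'CTXS', 'SNAP', 'CTXS', 'SNAP', 'GOTU', None])

# value_counts() sorts by frequency, highest first; head(n) keeps the top n.
counts = sample.dropna().value_counts()
top_two = counts.head(2).index.tolist()

print(top_two)  # ['SNAP', 'CTXS']
```

When several tickers share the same count, their relative order should not be relied on, so the tail of the top-N list may vary in that case.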


Now that we have the top N tickers, we loop through the list and download the last month of historical data for each one using the `yfinance.download()` function. We then add a date column and a ticker column to each dataset for visualization, concatenate the per-ticker datasets, and save the result to a CSV file.

In [23]:
def get_histortical_data(top_tickers):
    # create an empty list to store the historical stock details temporarily.
    temp_dataframes = []
    # loop through the list of top N tickers and download the historical data (last 1 month) for each
    for tick in top_tickers:
        temp_stock = yf.download(tick, start=(date.today()-timedelta(days=30)).isoformat(), end=date.today())

        # add the date column to the dataframe 
        date_list = temp_stock.index
        temp_stock['Date'] = date_list
        temp_stock['Date'] = pd.to_datetime(temp_stock['Date']).dt.strftime('%Y-%m-%d')

        # add the ticker column to the dataframe
        temp_stock['Ticker'] = tick
        # append the dataframe to the temp_dataframes list
        temp_dataframes.append(temp_stock)

    # concatenate the list into a pandas dataframe and store it as CSV
    all_tickers = pd.concat(temp_dataframes)
    all_tickers.to_csv('Tickers.csv', index=False)
    


In [24]:
get_histortical_data(top_tickers)

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
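The tag-and-concatenate step can be tried without a network connection. The sketch below uses made-up prices and placeholder tickers (`AAA`, `BBB`) in place of real downloads, and the helper `tag_frame` is a hypothetical name. It assumes each frame is indexed by date, as in the function above.

```python
import pandas as pd

def tag_frame(prices, ticker):
    """Copy a date-indexed price frame and add 'Date' and 'Ticker' columns."""
    tagged = prices.copy()
    tagged['Date'] = tagged.index.strftime('%Y-%m-%d')
    tagged['Ticker'] = ticker
    return tagged

# Made-up prices standing in for two separate downloads.
dates = pd.to_datetime(['2021-11-08', '2021-11-09'])
frame_a = pd.DataFrame({'Close': [10.0, 11.0]}, index=dates)
frame_b = pd.DataFrame({'Close': [20.0, 19.5]}, index=dates)

# Stack the tagged frames; the date now lives in its own column,
# so the original index can be discarded.
combined = pd.concat(
    [tag_frame(frame_a, 'AAA'), tag_frame(frame_b, 'BBB')],
    ignore_index=True,
)
print(combined)
```

Because every row carries its own `Ticker` value, the combined frame can later be filtered back down to a single stock.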


### Prediction 
<hr>

Now that we have our historical data, let us make a simple prediction for the most frequently occurring ticker, based on whether its closing price rose or fell on every day of the period.

In [31]:
# get the most frequently occurring ticker
predict_ticker = tickers.value_counts().index[0]
# load that ticker's rows from the Tickers.csv file
predict_df = pd.read_csv('Tickers.csv')
predict_df = predict_df[predict_df['Ticker'] == predict_ticker]
predict_df.head()

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,Date,Ticker
0,90.529999,91.889999,89.580002,91.510002,91.082764,1852300,2021-11-08,CTXS
1,91.989998,92.169998,90.010002,90.349998,89.928177,1675800,2021-11-09,CTXS
2,90.160004,90.510002,88.360001,89.040001,88.62429,1277000,2021-11-10,CTXS
3,89.110001,89.400002,87.230003,87.360001,86.952133,918100,2021-11-11,CTXS
4,87.440002,89.989998,87.440002,89.68,89.261307,2319700,2021-11-12,CTXS


In [36]:
top_ticker_close = list(predict_df['Close'])
# pair each closing price with the next day's closing price
close_pairs = list(zip(top_ticker_close, top_ticker_close[1:]))

# check if the data is strictly increasing (needs at least two prices).
if close_pairs and all(i < j for i, j in close_pairs):
    print("========================= BUY STOCK !!!! =========================")
# check if the data is strictly decreasing.
elif close_pairs and all(i > j for i, j in close_pairs):
    print("========================= DON'T BUY STOCK !!!! =========================")
else:
    print("========================= WAIT BEFORE BUYING STOCK !!!! =========================")
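The same rule can be wrapped in a small function so that it is easy to test on known inputs. This is a sketch only: `classify_trend` and its return labels are hypothetical names, and the rule is the same naive strictly-increasing or strictly-decreasing check as above, not a forecasting model.

```python
def classify_trend(closes):
    """Return 'up', 'down', or 'mixed' for a sequence of closing prices."""
    pairs = list(zip(closes, closes[1:]))
    if not pairs:
        return 'mixed'  # fewer than two prices: no trend to speak of
    if all(a < b for a, b in pairs):
        return 'up'
    if all(a > b for a, b in pairs):
        return 'down'
    return 'mixed'

# First five CTXS closing prices from the table above.
ctxs_closes = [91.510002, 90.349998, 89.040001, 87.360001, 89.68]
print(classify_trend(ctxs_closes))  # mixed
```

The five sample prices fall for three days and then rise, so they are neither strictly increasing nor strictly decreasing, which corresponds to the "wait" branch of the check.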

