To run ARIMA-LSTM/SARIMAX-LSTM prediction model, latest stock data for S&P500 firms is required. Code in this 
notebook is used to scrape wikipedia to extract list of S&P500 firms using the Beautiful Soup library. Then stock 
data for each ticker will be scraped using a AlphaVantage API, which is a free service for getting stock related 
data. Alpha Vantage requires a key which is given upon creation of a free account. This key can service 5 aPI 
request/min (one of the reasons why this module works slowly as I have introduced a delay of 1 min/5 reqs). I used 
a simple selenium and javascript script to automate the process of creating keys and generated 10 free keys. These 
keys are used in the following modules. This module can be used to pull 'fresh' stock data from the market 
everytime during portfolio creation. Data is stored in csv format for individually for each stock, serving as 
input for data pre processing module.

Cell 1 contains imports which will be required to run this module.

In [1]:
#Cell 1 - Imports 

import bs4 as bs
import os
import pandas as pd
import requests
from alpha_vantage.timeseries import TimeSeries
import os
import time as tm
import random

Code in cell 2 will be used to extract S&P500 list and their tickers from wikipedia. 
This can be achieved with the help of Beautiful Soup library.

In [2]:
#Cell 2 - Tickers

#Pull the data from wiki link of S&P500 firms.
resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})
tickers = []
companies = []
sectors = []
sub_industries = []

#append required columns and save the data in csv format.
for row in table.findAll('tr')[1:]:
    ticker = row.findAll('td')[0].text
    tickers.append(ticker.rstrip())
    company = row.findAll('td')[1].text
    companies.append(company.rstrip())
    sector = row.findAll('td')[3].text
    sectors.append(sector.rstrip())
    sub_industry = row.findAll('td')[4].text
    sub_industries.append(sub_industry.rstrip())

SP500List = pd.DataFrame(
{'ticker' : tickers,
 'company': companies,
 'GICS_sector': sub_industries,
 'GICS_sub_industry': sectors
})
SP500List.to_csv('/Users/gauravthapliyal/Desktop/Project Data/SP500_list.csv')
print(SP500List)

    ticker                 company                         GICS_sector  \
0      MMM              3M Company            Industrial Conglomerates   
1      ABT     Abbott Laboratories               Health Care Equipment   
2     ABBV             AbbVie Inc.                     Pharmaceuticals   
3     ABMD             ABIOMED Inc               Health Care Equipment   
4      ACN           Accenture plc      IT Consulting & Other Services   
..     ...                     ...                                 ...   
500    YUM         Yum! Brands Inc                         Restaurants   
501   ZBRA      Zebra Technologies  Electronic Equipment & Instruments   
502    ZBH  Zimmer Biomet Holdings               Health Care Equipment   
503   ZION           Zions Bancorp                      Regional Banks   
504    ZTS                  Zoetis                     Pharmaceuticals   

          GICS_sub_industry  
0               Industrials  
1               Health Care  
2               Healt

Code in this cell will be used to stock data from 2000 to 2020 (20 years) using tickers extrcted from S&P500 list.
Alpha Vantage API is used for this. A explicit delay of 1 min is kept as only 5 API req/min are supported by Alpha
Vantage key. Keys are stored in 'Alpha Vantage Keys.txt' file which is mandatory for extracting data from the API.
reset_key function resets key as soon as 5 requests are performed on a key and selects a new key from the pool of 
10 available keys. Track of used keys is kept in a keys_list queue. Once the queue is full, its reset after a wait 
of 1 min (due to API restritcion). Each stock is stored as individual csv

In [3]:
#Cell 3 - Stock Data per Ticker

#function definition for reset_key which resets the key value.
def reset_key(i):
    if i > 4:
        lines = open('/Users/gauravthapliyal/Desktop/Project Data/Alpha Vantage Keys.txt').read().splitlines()
        while True:
            if len(keys_list) > 4:
                print("Sleep Time of 1 min")
                tm.sleep(60)
                keys_list.clear()
                i = 0
            API_KEY = random.choice(lines)
            if(API_KEY not in keys_list):
                keys_list.append(API_KEY)
                print("API Key Changed. Current Key: "+API_KEY)
                i = 0
                break

#extract key from txt file. Access API per ticker, extract and store data in csv.
lines = open('/Users/gauravthapliyal/Desktop/Project Data/Alpha Vantage Keys.txt').read().splitlines()
API_KEY = random.choice(lines)
print(API_KEY)
tickers = list(SP500List.ticker)
print(tickers)
i = 0
keys_list = []
keys_list.append(API_KEY)

path = '/Users/gauravthapliyal/Desktop/Project Data/ticker_stock_data'
for file in os.listdir(path) :
    os.remove(path+'/'+file)
for item in tickers:
    try:
        time = TimeSeries(key = API_KEY, output_format = 'pandas')
        i += 1
        reset_key(i)
        data_adj = time.get_daily_adjusted(symbol = item, outputsize = 'full')
        i += 1
        reset_key(i)
        df = pd.DataFrame(data_adj[0])

        data_dir = "/Users/gauravthapliyal/Desktop/Project Data/ticker_stock_data/"+item+".csv"
        df.to_csv(data_dir)
        
        #Reloads data
        if os.path.getsize(data_dir) < 250000 :
            print(item+' file size '+str(os.path.getsize(data_dir))+' bytes : reloading data...')
            data_adj = time.get_daily_adjusted(symbol = item, outputsize = 'full')
            reset_key(i)
            i += 1
            df = pd.DataFrame(data_adj[0])
            df.to_csv(data_dir)
    except ValueError:
        print("Unable to get data for ticker: "+ item)
        continue

5G2IU7O4KWJBA4T4
['MMM', 'ABT', 'ABBV', 'ABMD', 'ACN', 'ATVI', 'ADBE', 'AMD', 'AAP', 'AES', 'AFL', 'A', 'APD', 'AKAM', 'ALK', 'ALB', 'ARE', 'ALXN', 'ALGN', 'ALLE', 'AGN', 'ADS', 'LNT', 'ALL', 'GOOGL', 'GOOG', 'MO', 'AMZN', 'AMCR', 'AEE', 'AAL', 'AEP', 'AXP', 'AIG', 'AMT', 'AWK', 'AMP', 'ABC', 'AME', 'AMGN', 'APH', 'ADI', 'ANSS', 'ANTM', 'AON', 'AOS', 'APA', 'AIV', 'AAPL', 'AMAT', 'APTV', 'ADM', 'ANET', 'AJG', 'AIZ', 'T', 'ATO', 'ADSK', 'ADP', 'AZO', 'AVB', 'AVY', 'BKR', 'BLL', 'BAC', 'BK', 'BAX', 'BDX', 'BRK.B', 'BBY', 'BIIB', 'BLK', 'BA', 'BKNG', 'BWA', 'BXP', 'BSX', 'BMY', 'AVGO', 'BR', 'BF.B', 'CHRW', 'COG', 'CDNS', 'CPB', 'COF', 'CPRI', 'CAH', 'KMX', 'CCL', 'CARR', 'CAT', 'CBOE', 'CBRE', 'CDW', 'CE', 'CNC', 'CNP', 'CTL', 'CERN', 'CF', 'SCHW', 'CHTR', 'CVX', 'CMG', 'CB', 'CHD', 'CI', 'CINF', 'CTAS', 'CSCO', 'C', 'CFG', 'CTXS', 'CLX', 'CME', 'CMS', 'KO', 'CTSH', 'CL', 'CMCSA', 'CMA', 'CAG', 'CXO', 'COP', 'ED', 'STZ', 'COO', 'CPRT', 'GLW', 'CTVA', 'COST', 'COTY', 'CCI', 'CSX', 'CMI', 

Sleep Time of 1 min
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: 714R4KO6HVWV3QJI
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: O6A4SAOHY74UMKC6
Sleep Time of 1 min
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: 714R4KO6HVWV3QJI
Sleep Time of 1 min
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: KTB0QU02FXW7T76E
Sleep Time of 1 min
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Ke

API Key Changed. Current Key: 714R4KO6HVWV3QJI
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: EU981683CQ3RMO82
Sleep Time of 1 min
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: KTB0QU02FXW7T76E
Sleep Time of 1 min
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
CFG file size 85324 bytes : reloading data...
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
Sleep Time of 1 min
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: 714R4KO6HVWV3QJI
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
S

API Key Changed. Current Key: EU981683CQ3RMO82
Sleep Time of 1 min
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
Sleep Time of 1 min
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: 714R4KO6HVWV3QJI
EVRG file size 28878 bytes : reloading data...
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
API Key Changed. Current Key: EU981683CQ3RMO82
Sleep Time of 1 min
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: 714R4KO6HVWV3QJI
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
Sleep Time of 1 min
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
EXPE file size 230386 bytes

API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
Sleep Time of 1 min
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: FJ7UGH182N6LDVSC
Sleep Time of 1 min
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
HII file size 143872 bytes : reloading data...
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: 714R4KO6HVWV3QJI
Sleep Time of 1 min
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: EU981683CQ3RMO82


API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
MKTX file size 234697 bytes : reloading data...
Sleep Time of 1 min
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: FJ7UGH182N6LDVSC
Sleep Time of 1 min
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
Sleep Time of 1 min
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
MA file size 222281 bytes : reloading data...
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: 714R4KO6HVWV3QJI
API Key Changed. Current Key: 3H2MHRQEN5H8VL63


API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
Sleep Time of 1 min
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
PM file size 186363 bytes : reloading data...
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: KTB0QU02FXW7T76E
Sleep Time of 1 min
API Key Changed. Current Key: 714R4KO6HVWV3QJI
PSX file size 123801 bytes : reloading data...
API Key Changed. Current Key: KTB0QU02FXW7T76E
API Key Changed. Current Key: O6A4SAOHY74UMKC6
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
Sleep Time of 1 min
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: 714R4KO6HVWV3QJI
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
Sleep Time of 1 min
API Key Changed. Current Key

API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: 714R4KO6HVWV3QJI
Sleep Time of 1 min
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: O6A4SAOHY74UMKC6
TDG file size 221131 bytes : reloading data...
API Key Changed. Current Key: DMF2DOGRKQ46ROD2
Sleep Time of 1 min
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: 3H2MHRQEN5H8VL63
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: O6A4SAOHY74UMKC6
Sleep Time of 1 min
API Key Changed. Current Key: 0WQ1JLQHV1L5GVH0
TWTR file size 96648 bytes : reloading data...
API Key Changed. Current Key: OW3QQB8OOTAFFSN5
API Key Changed. Current Key: FJ7UGH182N6LDVSC
API Key Changed. Current Key: EU981683CQ3RMO82
API Key Changed. Current Key: 5G2IU7O4KWJBA4T4
