First, we need to obtain data, which we pull from Finnhub.io. In the example below, we create a parquet file containing minute candlestick data from the past year for every stock whose ticker begins with 'A' (e.g. AAPL, but not BAC). Minute candlestick data gives the open, high, low, and close prices and the volume for a given stock over a given minute. For this notebook and dataWrangling.ipynb, we use A_Candlestick.parquet for our calculations. To build a slightly simpler model, we also use SPYCleaned.csv, which contains the same information as A_Candlestick.parquet but covers only one stock: SPY. It is easy to produce with the code below: set the tickers variable in the second code block to tickers=["SPY"], then run the resulting .parquet file through dataWrangling.ipynb.
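The ticker selection described above can be sketched as follows. This is only an illustration of one way to restrict the symbol list: `select_tickers` is a hypothetical helper that does not appear in the notebook, and the sample symbols are placeholders.

```python
def select_tickers(symbols, prefix="A"):
    """Return the symbols that start with `prefix`, preserving their order."""
    return [symbol for symbol in symbols if symbol.startswith(prefix)]


# Placeholder symbols, standing in for the list returned by the API.
sample_symbols = ["AAPL", "BAC", "AMZN", "SPY"]

a_tickers = select_tickers(sample_symbols)  # keeps AAPL and AMZN, drops BAC and SPY
spy_only = ["SPY"]                          # the single-stock variant used for SPYCleaned.csv
```

Assigning either list to `tickers` before the download loop would limit the requests to those symbols.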

In [1]:
import finnhub
import pyspark
from pyspark.sql import SparkSession

# Configure API key
configuration = finnhub.Configuration(
    api_key={
        'token': 'YOUR_FINNHUB_TOKEN'  # Placeholder: create an account on Finnhub.io and paste your token here (~20 alphanumeric chars)
    }
)

finnhub_client = finnhub.DefaultApi(finnhub.ApiClient(configuration))

spark = SparkSession.builder.master("local[1]").config("spark.driver.memory", "15g").getOrCreate()
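
Rather than pasting the token into the notebook, it could be read from the environment. The sketch below assumes an environment variable named `FINNHUB_TOKEN`; both that name and the `load_finnhub_token` helper are hypothetical and not part of the original notebook.

```python
import os


def load_finnhub_token(env=os.environ, var_name="FINNHUB_TOKEN"):
    """Read the API token from an environment variable instead of hard-coding it."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Finnhub.io token"
        )
    return token


# Possible usage in the configuration cell (assumes the variable is set):
# configuration = finnhub.Configuration(api_key={'token': load_finnhub_token()})
```

This keeps the secret out of the saved notebook, at the cost of one extra setup step before launching Jupyter.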

In [2]:
import pandas as pd

stocks = finnhub_client.stock_symbols(exchange="US")
tickers = [stock.symbol for stock in stocks]  # Obtain a list of stock symbols
master = pd.DataFrame()  # Accumulates the candles for every ticker

In [3]:
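import time
from itertools import islice

idx = 0  # Index of the ticker to resume from after an interruption
for ticker in islice(tickers, idx, None):
    print(ticker)
    # The loop walks backwards in time: `start` is the newest timestamp still
    # needed and `end` is the oldest. Both are Unix epoch seconds.
    start = 1592870400  # 2020-06-23 00:00:00 UTC
    end = 1561507200    # 2019-06-26 00:00:00 UTC
    while start > end:
        try:  # Needed to get past the data retrieval cap from Finnhub.io
            candleJSON = finnhub_client.stock_candles(ticker, '1', end, int(start))
        except Exception:
            idx = tickers.index(ticker)  # Remember which ticker we were on
            time.sleep(1)  # Brief pause before retrying the same window
            continue
        if not candleJSON.c:  # No candles returned, so nothing older is available
            break
        df = pd.DataFrame({'Ticker': ticker,
                           'Open': candleJSON.o,
                           'High': candleJSON.h,
                           'Low': candleJSON.l,
                           'Close': candleJSON.c,
                           'Volume': candleJSON.v,
                           'Time': candleJSON.t})
        print(df['Time'][0])  # First timestamp in this batch

        start = df['Time'][0] - 60  # Change 60 if we move away from 1 min candles
        master = pd.concat([df, master]).reset_index(drop=True)

# Convert the master dataframe to .parquet and save it
spark_df = spark.createDataFrame(master)
spark_df = spark_df.repartition(1000)
spark_df.write.parquet('AllCandlestick.parquet')
display(master)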

A
1585125240
1577370900
1569594840
1562689080
AA
1585123200
1577371680
1569595620
1562689080
AAAU
1585134480
1577374200
1569598560
1562691180
AACG
1585143240
1577379360
1571320080
AADR
1585143000
1577374200
1569601260
1562692380
AAL
1585310100
1577710800
1569934980
1562689080
AAMC
1585151700
1577391420
1569850200
1562702340
AAME
1585143000
1577374200
1569939840
1562772420
AAN
1585098900
1577388600
1569612540
1562693400
AAOI
1585094400
1577386620
1569610560
1562689500
AAON
1585094400
1577388600
1569612720
1562693400
AAP
1585094520
1577388600
1569612540
1562693400
AAPL
1585094400
1577369040
1569593520
1562689140
AAT
1585094520
1577388600
1569612540
1562693400
AAU
1585149720
1577384700
1569608640
1562693400
AAWW
1585094400
1577388600
1569612600
1562693400
AAXJ
1585140300
1577388600
1569612540
1562693400
AAXN
1585094400
1577388600
1569612540
1562693400
AB
1585095000
1577388600
1569612960
1562693400
ABB
1585094520
1577388540
1569612480
1562689440
ABBV
1585094460
1577383080
1569607020
156268

1562693400
AGT
1585253220
1577479860
1569864600
1562693400
AGTC
1585155660
1577383200
1569607140
1562698620
AGX
1585094520
1577388600
1569612540
1562693400
AGYS
1585094400
1577388600
1569612540
1562693400
AGZ
1585094880
1577388600
1569613680
1562693340
AGZD
1585157400
1577394180
1569622740
1562693400
AHC
1585157400
1577388600
1569613620
1562693400
AHCO
1585157400
1577388600
1573501500
AHH
1585094520
1577388600
1569612540
1562693400
AHH-A
AHL-C
AHL-D
AHL-E
AHPI
1585096020
1577388600
1569621300
1562693400
AHT
1585155900
1577383560
1569607500
1562693340
AHT-D
AHT-F
AHT-G
AHT-H
AHT-I
AI
1585152000
1577388600
1569612540
1562690400
AI-B
AI-C
AIA
1585152120
1577388600
1569612600
1562693400
AIC
1585157760
1577388600
1569864600
1562693940
AIEQ
1585095060
1577388600
1569612600
1562693400
AIF
1585157400
1577388600
1569612780
1562693400
AIG
1585094520
1577388480
1569612420
1562691780
AIG+
AIG-A
AIH
1585157400
1577388600
1572026460
AIHS
1585157400
1577383200
1569607560
1562694780
AIIQ
1585159620
15

1577388600
1569612540
1562695200
APEN
1585172520
1577396940
1569868740
1562693400
APEX
1585108320
1577386320
1569611820
1562693100
APG
1588181400
APH
1585094520
APHA
1585094400
1577383680
1569607620
1562691540
API
APLE
1585094460
1577383200
1569607140
1562693400
APLS
1585102440
1577388600
1569612600
1562693400
APLT
1585094400
1577388600
1569624540
1562693400
APM
1585157400
1577390820
1569617220
1562695200
APO
1585094940
1577388600
1569612540
1562693400
APO-A
APO-B
APOG
1585094400
1577388600
APOP
1585157400
1577388840
APOPW
1585254060
1577757060
1570037400
1562963280
APPF
1585094400
1577388600
1569612540
1562693400
APPN
1585094400
1577383200
1569607140
1562693400
APPS
1585094400
1577386080
APRE
1585094400
1577388600
1570126740
APRN
1585094640
1577388600
1569612900
1562693400
APT
1585094400
1577388600
1569613260
1562693400
APTO
1585152480
APTS
1585157400
1577388600
1569612540
1562693400
APTV
1585094520
1577388600
1569612540
1562691780
APTV-A
APTX
1585157400
1577388600
1569612720
15626934

1569619260
1569520320
AVUV
1585157400
1577394960
AVXL
1585094400
1577388600
1569612600
1562693400
AVY
1585157340
1577388600
1569612600
1562693400
AVYA
1585094520
1577388660
AWAY
1585157940
1581628200
AWF
1585149600
1577388600
1569612660
1562693400
AWH
1591979100
AWI
1585094520
1577388600
1569612540
1562693400
AWK
1585094520
1577388600
1569612540
1562693400
AWP
1585095120
1577388600
1569613320
1562693400
AWR
1585094520
1577388600
1569612540
1562693400
AWRE
1585157400
1577388600
1569613080
1562693400
AWTM
1585159020
1577394840
1569626760
1562782980
AWX
1585158240
1577389380
1569614040
1562693400
AX
1585094520
1577388600
1569612540
1562693400
AXAS
1585094460
1577385480
1569609420
1562693400
AXDX
1585094400
1577388600
AXGN
1585094400
1577388600
1569612540
1562693400
AXGT
1585157400
1577388600
1569612660
1562693400
AXL
1585095180
1577386680
1569610620
1562693400
AXLA
1585158780
1577388900
1569614820
1562693400
AXNX
1585094400
1577388600
1569612540
1562693400
AXO
1585157400
1577392980
156962

KeyboardInterrupt: 
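
The timestamps printed above decrease within each ticker because the download loop pages backwards through time: each request ends one minute before the first candle of the previous batch. The sketch below isolates that pattern so it can run without an API key. `fetch_backwards` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that assumes the service returns only the most recent few candles of the requested window, in chronological order.

```python
def fetch_backwards(fetch, newest, oldest, step=60):
    """Collect timestamps by paging from `newest` back to `oldest` (epoch seconds).

    `fetch(oldest, newest)` must return a chronologically sorted list of
    timestamps, possibly covering only the most recent part of the window.
    """
    batches = []
    start = newest
    while start > oldest:
        batch = fetch(oldest, start)
        if not batch:              # nothing older is available
            break
        batches.insert(0, batch)   # prepend to keep chronological order
        start = batch[0] - step    # next request ends just before this batch
    return [t for batch in batches for t in batch]


def fake_fetch(oldest, newest, cap=3, step=60):
    """Stand-in for the API: return at most `cap` of the latest minute marks."""
    marks = list(range(oldest, newest + 1, step))
    return marks[-cap:]


timestamps = fetch_backwards(fake_fetch, newest=600, oldest=0)
```

With the stand-in above, four requests are enough to cover the ten-minute window, and the combined result contains every minute mark exactly once.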