# **Time Series Generator**

This notebook generates a chosen number of time series, each sampled from a random stock over a random period of time.


In [1]:
import numpy as np
import numpy.random as rd
import pandas as pd

On Colab, the following cell downloads the data.

If you work locally, make sure the folder *data/* exists. You can download it from the repo.

In [2]:
try:
  from google.colab import files
  !wget -q https://github.com/Amelrich/Capstone-Fall-2020/archive/master.zip
  !unzip -q master.zip
  !mv Capstone-Fall-2020-master/data/ data/
  !rm -rf master.zip Capstone-Fall-2020-master/
except ImportError:
  #google.colab is only importable inside Colab
  print("Not running in Colab: skipping the data download")
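
When working locally, it can help to confirm that the CSV files are in place before building the generator. The sketch below assumes the `data/<symbol>.csv` layout used by the generator; `missing_data_files` is a hypothetical helper, not part of the repo, and the two symbols are only an illustrative subset.

```python
import os

def missing_data_files(symbols, data_dir="data"):
    """Return the symbols that have no '<symbol>.csv' file in data_dir."""
    return [s for s in symbols
            if not os.path.isfile(os.path.join(data_dir, s + ".csv"))]

# Hypothetical subset of symbols, for illustration only
missing = missing_data_files(["AAPL", "MSFT"])
if missing:
    print("Missing CSV files for:", missing)
```

An empty list means every requested symbol has a file on disk.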

## The generator

It generates random time series picked from random stocks at random times.

Currently you can access the generated time series as a list of DataFrames with the method `get_list_of_df`, or as an array of adjusted close prices with `get_array` (see the example below). Future improvements will allow a wider range of output structures.
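
The core sampling step, picking a random contiguous window from one price history, can be sketched in isolation. This is a minimal illustration on synthetic data, not the class defined below: `sample_window` is a hypothetical helper, the random-walk prices are made up, and it uses NumPy's `default_rng` rather than the `numpy.random` module functions used in the notebook.

```python
import numpy as np
import pandas as pd

def sample_window(df, chunk_size, rng):
    """Return a random contiguous slice of `chunk_size` rows from `df`."""
    if len(df) < chunk_size:
        raise ValueError("series is shorter than chunk_size")
    # integers() excludes the upper bound, so +1 keeps the last window reachable
    start = rng.integers(0, len(df) - chunk_size + 1)
    return df.iloc[start:start + chunk_size]

# Synthetic stand-in for one stock's price history (not real data)
rng = np.random.default_rng(0)
prices = pd.DataFrame({"Adj Close": 100 + rng.normal(size=250).cumsum()})

window = sample_window(prices, chunk_size=100, rng=rng)
print(window.shape)  # (100, 1)
```

The window keeps the original row index, so its position within the full history can still be recovered.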



In [4]:
class TS_generator:
    def __init__(self, nb_timeseries=2000, chunk_size=100):

        self.chunk_size = chunk_size
        self.nb_timeseries = nb_timeseries

        #Retrieve the stocks names
        self.symbols = pd.read_csv('https://raw.githubusercontent.com/Amelrich/Capstone-Fall-2020/master/sp500.csv', index_col=False)
        self.symbols = list(self.symbols['Symbol'].values)
        self.symbols = ['BF-B' if x=='BF.B' else x for x in self.symbols]
        self.symbols = ['BRK-B' if x=='BRK.B' else x for x in self.symbols]

        self.list_df = []

        #Build the random time series
        self.build_()

    def build_(self):
        for _ in range(self.nb_timeseries):

            #Pick a random stock
            stock = self.symbols[rd.randint(len(self.symbols))]
            TS = pd.read_csv('data/'+stock+'.csv')

            #Pick a random starting point
            #randint excludes its upper bound, so +1 keeps the last window reachable
            timemax = len(TS) - self.chunk_size
            start = rd.randint(timemax + 1)
            stock_df = TS[start : start+self.chunk_size]

            self.list_df.append( stock_df )

    def get_list_of_df(self):
        #Return the list of sampled DataFrames
        return self.list_df

    def get_array(self):
        #Return adjusted close array
        close_array = np.zeros((self.nb_timeseries, self.chunk_size))

        for i in range(self.nb_timeseries):
            close_array[i,:] = self.list_df[i]['Adj Close'].to_numpy()

        #Return only once every row has been filled
        return close_array

## Example

In [21]:
import time
t = time.time()

gen = TS_generator(nb_timeseries=2000, chunk_size=100) #these are the default values, passed explicitly to show the syntax
X = gen.get_list_of_df()

print("Time to generate 2000 random timeseries:", round(time.time()-t), 'sec')

Time to generate 2000 random timeseries: 15 sec
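
Part of this time may come from re-reading CSV files, since `build_` calls `pd.read_csv` once per generated series and the same stock can be drawn many times. That is an assumption, not something measured here. Under that assumption, a cached loader is one possible way to cut the cost; `load_stock` is a hypothetical helper and not part of the class above.

```python
from functools import lru_cache
import pandas as pd

@lru_cache(maxsize=None)
def load_stock(stock, data_dir="data"):
    """Read 'data_dir/<stock>.csv' once and reuse the DataFrame afterwards."""
    return pd.read_csv(f"{data_dir}/{stock}.csv")
```

Every call with the same arguments returns the same DataFrame object, so callers should slice it rather than modify it in place.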


In [22]:
X[0].head()

| Unnamed: 0 | Date | Adj Close | Volume | Symbol |
|---|---|---|---|---|
| 4512 | 2017-10-25 | 175.687943 | 642500.0 | ROK |
| 4513 | 2017-10-26 | 176.589523 | 511200.0 | ROK |
| 4514 | 2017-10-27 | 177.293869 | 411800.0 | ROK |
| 4515 | 2017-10-30 | 175.622223 | 502300.0 | ROK |
| 4516 | 2017-10-31 | 188.601349 | 3465200.0 | ROK |


In [23]:
#Array version
X = gen.get_array()
X.shape

(2000, 100)

In [27]:
X[0,:5]

array([175.6879425 , 176.58952332, 177.29386902, 175.6222229 ,
       188.60134888])

In [25]:
#50 most recent prices
X[:,50:].shape

(2000, 50)
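
Each row holds one series in chronological order, so the trailing columns are the most recent prices. The same slicing can split every series into an older and a recent part. The sketch below uses a small made-up array, `X_demo`, instead of the generated data.

```python
import numpy as np

# Made-up array: 3 series of 4 time steps each, oldest value first
X_demo = np.arange(12, dtype=float).reshape(3, 4)

# Split every row at the midpoint: older half and most recent half
half = X_demo.shape[1] // 2
older, recent = X_demo[:, :half], X_demo[:, half:]

print(older.shape, recent.shape)  # (3, 2) (3, 2)
```

Both halves are views on the original array, so no data is copied.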