# **Time Series Generator**

This script generates a chosen number of time series. Each one is a window of consecutive rows (trading days) taken from a randomly chosen stock, starting at a random date.


```python
import numpy as np
import numpy.random as rd
import pandas as pd
```

On Colab, the following cell downloads the data from the GitHub repository.

If you work locally, make sure the *data/* folder is present. You can download it from the repo.

```python
try:
  import google.colab  # only importable on Colab; raises ImportError elsewhere
  !wget -q https://github.com/Amelrich/Capstone-Fall-2020/archive/master.zip
  !unzip -q master.zip
  !mv Capstone-Fall-2020-master/data/ data/
  !rm -rf master.zip Capstone-Fall-2020-master/
except ImportError:
  print("Not running in Colab: make sure the data/ folder is present")
```
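The `!` commands above only run inside IPython or Colab. As a minimal sketch for plain Python, assuming `master.zip` has already been downloaded (for instance with `urllib.request.urlretrieve` on the URL used above) and keeps the `Capstone-Fall-2020-master/data/` layout implied by the `mv` command, the hypothetical helpers `extract_data_folder` and `missing_csv_files` below extract the folder and report which symbols have no CSV file. The demo builds a tiny made-up archive instead of using the real one.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Folder prefix inside the GitHub archive (assumed from the `mv` command above).
ARCHIVE_PREFIX = "Capstone-Fall-2020-master/data/"


def extract_data_folder(zip_path, dest="data"):
    """Copy every file under ARCHIVE_PREFIX into `dest` (assumes a trusted archive)."""
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.startswith(ARCHIVE_PREFIX) or name.endswith("/"):
                continue  # skip other repository files and directory entries
            target = dest / name[len(ARCHIVE_PREFIX):]
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(name) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
    return dest


def missing_csv_files(symbols, data_dir="data"):
    """Return the symbols whose `<symbol>.csv` file is absent from `data_dir`."""
    data_dir = Path(data_dir)
    return [s for s in symbols if not (data_dir / f"{s}.csv").is_file()]


# Demo on a tiny synthetic archive (file names and contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "master.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(ARCHIVE_PREFIX + "AAP.csv", "Date,Adj Close\n")
        zf.writestr("Capstone-Fall-2020-master/README.md", "readme")
    data_dir = extract_data_folder(zip_path, Path(tmp) / "data")
    print(missing_csv_files(["AAP", "ZTS"], data_dir))  # ['ZTS']
```

Running `missing_csv_files` on the real symbol list before building the generator turns a `FileNotFoundError` halfway through the loading loop into an explicit list of missing files.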

## The generator

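It builds each time series by taking `chunk_size` consecutive rows from a random stock, starting at a random date. Start positions are drawn uniformly over all windows that fit inside a single stock's history, so stocks with a longer history are picked proportionally more often.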
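The bookkeeping behind the random start dates can be illustrated on toy lengths. This is a minimal sketch with a hypothetical helper `valid_starts`: after all stocks are concatenated, stock `k` occupies a contiguous block of rows, and a window starting at position `s` of that block fits when `s + chunk_size` does not exceed the block's end. The sketch counts the window that ends exactly on a stock's last row as valid.

```python
def valid_starts(lengths, chunk_size):
    """Global row positions where a window of `chunk_size` rows fits inside one stock.

    `lengths[k]` is the number of rows of stock k; after concatenation in order,
    stock k occupies positions [offset_k, offset_k + lengths[k]).
    """
    starts = []
    offset = 0
    for n in lengths:
        # Last valid start is offset + n - chunk_size, hence the + 1 (range excludes its end).
        # A stock shorter than chunk_size gives an empty range and is never picked.
        starts.extend(range(offset, offset + n - chunk_size + 1))
        offset += n
    return starts


# Toy example: stock 0 has 5 rows, stock 1 has 3 rows, windows of 2 rows.
starts = valid_starts([5, 3], chunk_size=2)
print(starts)  # [0, 1, 2, 3, 5, 6]
# Stock 0 owns 4 of the 6 start positions, so uniform sampling over `starts`
# picks it about twice as often as stock 1.
```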

The generated time series are available as a list of DataFrames through `get_list_of_df`, or as a NumPy array of adjusted close prices through `get_array` (see the examples below). Other output structures may be added later.
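Going from the list of DataFrames to the array only works because every window has the same length. A minimal sketch of that conversion on two made-up windows (symbols `AAA`/`BBB` and all values are placeholders) stacks the `Adj Close` column of each window into one row of the result:

```python
import numpy as np
import pandas as pd

# Two made-up windows of 3 rows each, with a column layout similar to the real data.
windows = [
    pd.DataFrame({"Date": ["2009-04-28", "2009-04-29", "2009-04-30"],
                  "Adj Close": [41.9, 41.1, 42.2], "Symbol": "AAA"}),
    pd.DataFrame({"Date": ["2015-01-02", "2015-01-05", "2015-01-06"],
                  "Adj Close": [10.0, 10.5, 10.2], "Symbol": "BBB"}),
]

# One row per window, one column per day: shape (n_windows, chunk_size).
close_array = np.stack([df["Adj Close"].to_numpy() for df in windows])
print(close_array.shape)  # (2, 3)
```

`np.stack` raises a `ValueError` if the windows have different lengths, which is a cheap sanity check on the generated data.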



```python
class TS_generator:
  """Sample `nb_timeseries` random windows of `chunk_size` consecutive rows from S&P 500 stocks."""

  def __init__(self, nb_timeseries=2000, chunk_size=100):
    self.chunk_size = chunk_size
    self.nb_timeseries = nb_timeseries

    # Retrieve the S&P 500 stock symbols
    symbols = pd.read_csv('https://raw.githubusercontent.com/Amelrich/Capstone-Fall-2020/master/sp500.csv', index_col=False)
    symbols = sorted(symbols['Symbol'].values)
    # The data files use '-' instead of '.' for these class B shares (e.g. data/BRK-B.csv)
    rename = {'BF.B': 'BF-B', 'BRK.B': 'BRK-B'}
    self.symbols = [rename.get(x, x) for x in symbols]

    self.list_df = []

    # Build the random time series
    self.build_()

  def build_(self):
    TS_list = []
    indexes = []  # valid starting positions in the concatenated DataFrame
    total_len = 0

    for stock in self.symbols:
      TS = pd.read_csv('data/' + stock + '.csv')
      TS_list.append(TS)
      # A window starting at s fits in this stock if s + chunk_size <= total_len + len(TS),
      # hence the + 1 (range excludes its end)
      indexes += list(range(total_len, total_len + len(TS) - self.chunk_size + 1))
      total_len += len(TS)

    # Concatenate all stocks; row positions match the `indexes` computed above
    TS = pd.concat(TS_list, ignore_index=True)
    del TS_list

    # Pick random starting positions (uniformly, with replacement)
    random_starts = rd.choice(indexes, self.nb_timeseries)

    for start in random_starts:
      # .copy() so each window owns its data and the concatenated frame can be freed
      self.list_df.append(TS.iloc[start : start + self.chunk_size].copy())

    del TS

  def get_list_of_df(self):
    # Return a list of time series DataFrames
    return self.list_df

  def get_array(self):
    # Return the adjusted close prices, shape (nb_timeseries, chunk_size)
    close_array = np.zeros((self.nb_timeseries, self.chunk_size))

    for i in range(self.nb_timeseries):
      close_array[i, :] = self.list_df[i]['Adj Close'].to_numpy()

    return close_array
```
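`rd.choice(indexes, self.nb_timeseries)` draws start positions with replacement (NumPy's default), so the same window can appear more than once, and the sample changes on every run. If distinct, reproducible windows are preferred, one option is NumPy's `Generator` API with a fixed seed and `replace=False`. This is a minimal sketch, not the notebook's method; `indexes` here is a hypothetical stand-in for the list built in `build_`.

```python
import numpy as np

# Hypothetical stand-in for the generator's list of valid start positions.
indexes = list(range(1000))

# Seeded generator: the same seed yields the same start positions on every run.
rng = np.random.default_rng(seed=42)

# replace=False guarantees distinct start positions (requires size <= len(indexes)).
starts = rng.choice(indexes, size=200, replace=False)
print(len(set(starts.tolist())))  # 200
```

Sampling without replacement slightly changes the statistics (windows are no longer independent draws), which matters little when the number of windows is small compared with the number of valid start positions.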

## Example

```python
import time
t = time.time()

# Generate 10000 windows of 100 rows each (both parameters passed explicitly to show the syntax)
gen = TS_generator(nb_timeseries=10000, chunk_size=100)
X = gen.get_list_of_df()

print("Time to generate 10000 random timeseries:", round(time.time()-t), 'sec')
```

```
Time to generate 10000 random timeseries: 5 sec
```


```python
X[0].head()
```

| Unnamed: 0 | Date | Adj Close | Volume | Symbol |
|---|---|---|---|---|
| 10855 | 2009-04-28 | 41.912151 | 1849000.0 | AAP |
| 10856 | 2009-04-29 | 41.063866 | 1593800.0 | AAP |
| 10857 | 2009-04-30 | 42.172432 | 2504700.0 | AAP |
| 10858 | 2009-05-01 | 40.900021 | 1513800.0 | AAP |
| 10859 | 2009-05-04 | 42.201344 | 1186600.0 | AAP |
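Each window should come from a single stock, with dates in order, as in the output above. A minimal sketch of a sanity check, using a hypothetical helper `check_window` and made-up rows, verifies the length, that only one symbol appears, and that dates are non-decreasing:

```python
import pandas as pd


def check_window(df, chunk_size):
    """Return True if `df` has `chunk_size` rows, a single symbol and non-decreasing dates."""
    dates = pd.to_datetime(df["Date"])
    return (
        len(df) == chunk_size
        and df["Symbol"].nunique() == 1
        and dates.is_monotonic_increasing
    )


# Made-up window that crosses from one stock into the next: it should be rejected.
bad = pd.DataFrame({
    "Date": ["2020-12-30", "2020-12-31", "1999-11-18"],
    "Symbol": ["AAA", "AAA", "BBB"],
})
print(check_window(bad, chunk_size=3))  # False
```

On the generator, `all(check_window(df, gen.chunk_size) for df in gen.get_list_of_df())` checks every window at once.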


```python
# Array version
X = gen.get_array()
X.shape
```

```
(10000, 100)
```

```python
X[0,:5]
```

```
array([41.91215134, 41.06386566, 42.17243195, 40.9000206 , 42.20134354])
```

```python
# Last 50 prices of each window, i.e. the 50 most recent ones (rows are in date order)
X[:,50:].shape
```

```
(10000, 50)
```
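Since each row of the array lists prices in chronological order (assuming the CSV files are sorted by ascending date, as the `head()` output suggests), column slicing separates older from more recent prices. A minimal sketch on a made-up toy array of 2 windows with 6 prices each:

```python
import numpy as np

# Toy array: 2 windows of 6 prices each, values 0..11 (made up).
toy = np.arange(12, dtype=float).reshape(2, 6)

# First columns hold the oldest prices, last columns the most recent ones.
half = toy.shape[1] // 2
older, recent = toy[:, :half], toy[:, half:]
print(recent.tolist())  # [[3.0, 4.0, 5.0], [9.0, 10.0, 11.0]]
```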