# **Time Series Generator**

This notebook generates a chosen number of time series, each taken from a random stock over a random period of time.


In [1]:
import numpy as np
import numpy.random as rd
import pandas as pd

On Colab, the following cell downloads the data.

If you work locally, make sure you have the *data/* folder. You can download it from the repo.

In [2]:
try:
  from google.colab import files
  !wget -q https://github.com/Amelrich/Capstone-Fall-2020/archive/master.zip
  !unzip -q master.zip
  !mv Capstone-Fall-2020-master/data/ data/
  !rm -rf master.zip Capstone-Fall-2020-master/
except ImportError:
  print("only in Colab")
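
When working locally, a quick check along the following lines can confirm that the folder is in place before building the generator. This is only a minimal sketch: the helper name `check_data_folder` is hypothetical, and it assumes that *data/* holds `summary.npy` plus one CSV file per stock, which is what the generator below reads.

```python
import os

def check_data_folder(path="data"):
    """Return a list of problems found with the local data folder (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"missing folder: {path}/")
        return problems
    # The generator loads this file to weight the stocks.
    if not os.path.isfile(os.path.join(path, "summary.npy")):
        problems.append(f"missing file: {path}/summary.npy")
    # The generator reads one CSV per stock symbol.
    csv_files = [f for f in os.listdir(path) if f.endswith(".csv")]
    if not csv_files:
        problems.append(f"no per-stock CSV files found in {path}/")
    return problems

problems = check_data_folder("data")
print("data folder looks fine" if not problems else problems)
```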

## The generator

It builds random time series, each taken from a random stock at a random starting point.

The generated time series are available as a list of DataFrames through the method `get_list_of_df`, or as an array of adjusted close prices through `get_array` (see the example below). Later improvements may offer a wider range of output structures.



In [3]:
class TS_generator:
  def __init__(self, nb_timeseries=2000, chunk_size=100):
    
    self.chunk_size = chunk_size
    self.nb_timeseries = nb_timeseries

    #Retrieve the stock names
    self.symbols = pd.read_csv('https://raw.githubusercontent.com/Amelrich/Capstone-Fall-2020/master/sp500.csv', index_col=False)
    self.symbols = list(self.symbols['Symbol'].values)
    self.symbols = sorted(self.symbols)
    self.symbols = ['BF-B' if x=='BF.B' else x for x in self.symbols]
    self.symbols = ['BRK-B' if x=='BRK.B' else x for x in self.symbols]

    self.list_df = []

    #Build the random time series
    self.probabilities_()
    self.build_()

  def build_(self):
    #Pick random stocks, weighted by the probabilities computed in probabilities_
    random_stocks = rd.choice(self.symbols, self.nb_timeseries, p=self.proba)

    for stock in random_stocks:
      TS = pd.read_csv('data/'+stock+'.csv')
      
      #Pick a random starting point
      timemax = len(TS) - self.chunk_size
      start = rd.randint(timemax)
      stock_df = TS[start : start+self.chunk_size]

      self.list_df.append( stock_df )

  def probabilities_(self):
    summary = np.load('data/summary.npy', allow_pickle=True).item()

    self.proba = np.array([summary[stock] for stock in summary.keys()]) - self.chunk_size
    self.proba[self.proba < 0.0] = 0.0
    self.proba = self.proba / self.proba.sum()


  def get_list_of_df(self):
    #Return a list of time series dataframes
    return self.list_df

  def get_array(self):
    #Return adjusted close array
    close_array = np.zeros((self.nb_timeseries, self.chunk_size))

    for i in range(self.nb_timeseries):
      close_array[i,:] = self.list_df[i]['Adj Close'].to_numpy()

    return close_array
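
The weighting done in `probabilities_` can be illustrated on toy numbers. The sketch below assumes, as the code above suggests, that the summary dictionary maps each symbol to the number of rows in its CSV file. The symbols and lengths are made up, and a seeded `np.random.default_rng` is used in place of `numpy.random` only to make the sketch reproducible.

```python
import numpy as np

# Hypothetical history lengths (number of rows per stock); not real data.
lengths = {"AAA": 50, "BBB": 300, "CCC": 1100}
chunk_size = 100

# Weight each stock by how far its history exceeds chunk_size,
# clipping at zero so that histories that are too short are never picked.
weights = np.array([lengths[s] for s in lengths], dtype=float) - chunk_size
weights[weights < 0.0] = 0.0
proba = weights / weights.sum()

print(dict(zip(lengths, proba.round(3))))

# Draw stocks with those probabilities, then a random window start for each.
rng = np.random.default_rng(0)
picks = rng.choice(list(lengths), size=5, p=proba)
starts = [int(rng.integers(lengths[s] - chunk_size)) for s in picks]
print(list(zip(picks, starts)))
```

With these numbers, `AAA` gets probability 0, `BBB` gets 1/6 and `CCC` gets 5/6, so longer histories contribute proportionally more windows. The probabilities passed as `p` must be in the same order as the symbols passed to `choice`; in the class above, this relies on the keys of the summary dictionary being in the same order as `self.symbols`.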

## Example

In [4]:
import time
t = time.time()

gen = TS_generator(nb_timeseries=2000, chunk_size=100) #these are the default values, passed explicitly to show the syntax
X = gen.get_list_of_df()

print("Time to generate 2000 random timeseries:", round(time.time()-t), 'sec')

Time to generate 2000 random timeseries: 15 sec


In [5]:
X[0].head()

| Unnamed: 0 | Date | Adj Close | Volume | Symbol |
|---|---|---|---|---|
| 2721 | 2010-09-15 | 24.881317 | 827600.0 | FMC |
| 2722 | 2010-09-16 | 24.986265 | 663200.0 | FMC |
| 2723 | 2010-09-17 | 25.248648 | 1282100.0 | FMC |
| 2724 | 2010-09-20 | 25.675949 | 1072900.0 | FMC |
| 2725 | 2010-09-21 | 25.649714 | 2229400.0 | FMC |


In [6]:
#Array version
X = gen.get_array()
X.shape

(2000, 100)

In [7]:
X[0,:5]

array([24.88131714, 24.98626518, 25.24864769, 25.6759491 , 25.64971352])

In [8]:
#50 most recent prices
X[:,50:].shape

(2000, 50)
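
The same slicing can split each series into an earlier and a later part. The sketch below uses a random stand-in array with the same shape as the output of `get_array`, so it runs without the data folder; the variable names `first_half` and `second_half` are only illustrative.

```python
import numpy as np

# Stand-in for gen.get_array(): random values with the same (2000, 100) shape.
rng = np.random.default_rng(0)
X = rng.random((2000, 100))

first_half = X[:, :50]   # 50 oldest prices of each series
second_half = X[:, 50:]  # 50 most recent prices of each series

print(first_half.shape, second_half.shape)
```

Both slices are views on `X`, so no price data is copied.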