# Time Series with Trend and Seasonality Data Generator

This notebook presents a function for generating time series data that exhibits trend and seasonality. The following code block defines the function. The function returns the generated data as a Pandas DataFrame, and the time series for each product can optionally be written to a *csv* file in a user-specified sub-folder.

In [1]:
def time_series_generator(products = 1,
                          periods = 12, 
                          seasons = 4, 
                          seasonal_likelihood = 0.25, 
                          trend_likelihood = 0.75, 
                          b_range = (5000, 20000), 
                          m_range = (-100, 100), 
                          noise_sd = 100,
                          save_to_files = True,
                          directory = 'data'):
    '''
    Generates time series data for a user-specified number of products. Each
    series may include trend, seasonality, both, or neither. The function
    returns a Pandas DataFrame object that includes all of the generated data.
    In addition, users may save the data for each product to a comma-separated
    file by setting the save_to_files argument to True. The directory argument
    names the directory for the data files; it is created if it does not exist.
     
    Arguments
    products: the number of products to generate time series for
    
    periods: the length of the time series to generate
    
    seasons: the number of seasons (the periods argument should be an integer
        multiple of the seasons)
    
    seasonal_likelihood: a floating point value between 0 and 1 that specifies
        the probability that a time series includes seasonality
    
    trend_likelihood: a floating point value between 0 and 1 that specifies
        the probability that a time series includes trend
        
    b_range: a tuple of two integers (low, high), where low specifies the minimum
        intercept value for the linear equation for trend and high specifies the 
        maximum intercept value for the linear equation for trend (value = m*period + b).
        The b value for each time series is a random integer with low <= b < high.
        
    m_range: a tuple of two integers (low, high), where low specifies the minimum
        slope value for the linear equation for trend and high specifies the 
        maximum slope value for the linear equation for trend (value = m*period + b).
        The m value for each time series is a random integer with low <= m < high.
        
    noise_sd: the standard deviation for the random noise added to each period. It is
        assumed that the noise is normally distributed with a mean of zero.
        
    save_to_files: True or False to indicate whether the time series data should
        be saved to csv files (one for each product)
        
    directory: a string specifying the directory in which the data files will be
        stored when save_to_files is set to True
        
    Returns
    df: a dataframe that includes the time series data for all products
    
    Dependencies
    This function depends on the NumPy and Pandas packages
    
    Example:
    
    >>> data = time_series_generator(products = 4,
                                      periods = 6, 
                                      seasons = 2, 
                                      seasonal_likelihood = 0.75, 
                                      trend_likelihood = 0.75, 
                                      b_range = (5000, 20000), 
                                      m_range = (-100, 100), 
                                      noise_sd = 100,
                                      save_to_files = False,
                                      directory = '')
                                      
    >>> print(data) 
        Product  Period    Value
    0         1       1   6002.0
    1         1       2   5307.0
    2         1       3   6472.0
    3         1       4   5408.0
    4         1       5   6679.0
    5         1       6   5628.0
    6         2       1  20023.0
    7         2       2  15773.0
    8         2       3  20043.0
    9         2       4  15921.0
    10        2       5  20040.0
    11        2       6  15945.0
    12        3       1  16315.0
    13        3       2  16333.0
    14        3       3  16441.0
    15        3       4  16452.0
    16        3       5  16386.0
    17        3       6  16239.0
    18        4       1  13977.0
    19        4       2  13961.0
    20        4       3  13866.0
    21        4       4  13977.0
    22        4       5  13916.0
    23        4       6  13859.0

    
    '''
    import numpy as np
    import pandas as pd
    
    if save_to_files:
        import os
        # Create the output directory if needed (works on any operating system)
        if directory and not os.path.isdir(directory):
            os.makedirs(directory)
    
    df = None    
    for product in range(products):
        # Randomly decide whether this product is seasonal and/or trending
        is_seasonal = np.random.rand() < seasonal_likelihood
        is_trending = np.random.rand() < trend_likelihood
        b = np.random.randint(low = b_range[0], high = b_range[1])
        m = np.random.randint(low = m_range[0], high = m_range[1])
        values = []
        if is_trending and is_seasonal:
            # Seasonal indices drawn from [0.90, 1.10), then rescaled to average 1
            seasonal_indices = np.random.rand(seasons) * 0.20 + 0.90
            seasonal_indices = seasonal_indices/seasonal_indices.mean()
            for period in range(periods):
                season = period % (seasons)
                value = m*period + b
                value = seasonal_indices[season] * value
                value = value + np.random.normal(loc = 0, scale = noise_sd)
                values.append(np.floor(value))
        elif is_trending:
            for period in range(periods):
                value = m*period + b
                value = value + np.random.normal(loc = 0, scale = noise_sd)
                values.append(np.floor(value))
        elif is_seasonal:
            # Seasonal indices drawn from [0.75, 1.25), then rescaled to average 1
            seasonal_indices = np.random.random(seasons) * 0.50 + 0.75
            seasonal_indices = seasonal_indices/seasonal_indices.mean()
            for period in range(periods):
                season = period % (seasons)
                value = b
                value = seasonal_indices[season] * value
                value = value + np.random.normal(loc = 0, scale = noise_sd)
                values.append(np.floor(value))
        else:
            for period in range(periods):
                value = b
                value = value + np.random.normal(loc = 0, scale = noise_sd)
                values.append(np.floor(value))

        my_dict = {'Product':[product + 1]*periods,
                   'Period': [(i + 1) for i in range(periods)],
                   'Value': values}

        if save_to_files:
            filename = os.path.join(directory, f'Product_{product + 1}.csv')
            pd.DataFrame.from_dict(my_dict).to_csv(filename, index = False)
        
        if df is None:
            df = pd.DataFrame.from_dict(my_dict)        
        else:
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
            df = pd.concat([df, pd.DataFrame.from_dict(my_dict)], ignore_index = True)
        
    return df
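
The branch of the function that handles a series with both trend and seasonality can be illustrated on its own. The following is a minimal sketch, not part of the original notebook. It repeats the same steps in vectorized form: draw seasonal indices, rescale them to a mean of 1, apply them to the trend line `m*period + b`, add normally distributed noise, and floor the result. The parameter values and the locally seeded generator `rng` are assumptions chosen for illustration; the function above draws from NumPy's global random state instead.

```python
import numpy as np

# Hypothetical values chosen only for illustration
rng = np.random.default_rng(0)  # local, seeded generator (an assumption)
seasons, periods = 4, 12
b, m, noise_sd = 10000, 50, 100

# Draw raw indices in [0.90, 1.10) and rescale them so they average to 1
seasonal_indices = rng.random(seasons) * 0.20 + 0.90
seasonal_indices = seasonal_indices / seasonal_indices.mean()

# Linear trend for periods 0, 1, ..., periods - 1
period = np.arange(periods)
trend = m * period + b

# Scale each period by the index of its season, then add noise and floor
seasonal = seasonal_indices[period % seasons] * trend
noise = rng.normal(loc=0, scale=noise_sd, size=periods)
values = np.floor(seasonal + noise)

print(seasonal_indices.mean())
print(values)
```

Because the indices are rescaled to average 1, seasonality moves values between seasons without changing the average level of a flat series (one with `m = 0` and no noise) over a complete cycle.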