# **Time Series Data Creation**

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**This notebook explains how to create a TensorFlow Dataset object for time series data.
For training, you can refer to my notebook** https://www.kaggle.com/aayushjain080/sunspots-prediction-in-time-series-with-keras-lstm **In that notebook I train models with a simple DNN and with LSTM and Conv layers.**

In [None]:
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [None]:
df=pd.read_csv('/kaggle/input/sunspots/Sunspots.csv')

In [None]:
def plot_series(time, series, format="-", start=0, end=None):
    plt.plot(time[start:end], series[start:end], format)
    plt.xlabel("Time")
    plt.ylabel("Value")
    plt.grid(True)

In [None]:
time=df['Unnamed: 0'].values
series=df['Monthly Mean Total Sunspot Number'].values

In [None]:
plt.figure(figsize=(12,6))
plot_series(time,series)

# **Creating a TensorFlow Dataset object**

* To create a dataset for time series prediction, we will use the **TensorFlow Dataset object** (tf.data.Dataset).

* **A small example first demonstrates how to create the Dataset object step by step.**

* Following the same steps, we then create the Dataset object for our sunspot time series data.

In [None]:
dataset = tf.data.Dataset.range(10) # Creates a Dataset of a step-separated range of values.
print('Dataset element specification:', dataset.element_spec) # Each dataset element is a scalar tensor 

In [None]:
elements = list(dataset.as_numpy_iterator())  # The dataset has 10 elements; collect them all into a list
print(elements)

**How to use Window**

* Combines (nests of) input elements into a dataset of (nests of) windows.

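**window(size, shift=None, stride=1, drop_remainder=False)**

* A "window" is a finite dataset of flat elements of size **"size"** (or possibly fewer if there are not enough input elements to fill it and drop_remainder is False).

* The **shift** argument determines how many input elements the window moves forward between one window and the next; it defaults to **size**. The **stride** argument determines the spacing between consecutive input elements inside a window.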
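
To make the roles of size, shift, stride and drop_remainder concrete, here is a plain-Python sketch of the windowing rule. It only approximates the behaviour for illustration and is not TensorFlow's implementation; the helper name window_sketch is hypothetical.

```python
def window_sketch(items, size, shift=None, stride=1, drop_remainder=False):
    """Plain-Python approximation of the windowing rule (illustration only)."""
    items = list(items)
    shift = size if shift is None else shift  # shift defaults to size
    windows = []
    for start in range(0, len(items), shift):
        # Take up to `size` elements, `stride` positions apart, starting at `start`.
        window = items[start:start + size * stride:stride]
        if drop_remainder and len(window) < size:
            break  # every later window would be incomplete as well
        windows.append(window)
    return windows


print(window_sketch(range(7), 2))
# [[0, 1], [2, 3], [4, 5], [6]]
print(window_sketch(range(7), 3, shift=2, drop_remainder=True))
# [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
print(window_sketch(range(7), 3, shift=1, stride=2, drop_remainder=True))
# [[0, 2, 4], [1, 3, 5], [2, 4, 6]]
```

The three calls use the same arguments as the TensorFlow examples in the next cell, so the two outputs can be compared side by side.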

In [None]:
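# Example 1: size=2 and shift defaults to size, so the windows do not overlap.
# Windows: [0, 1], [2, 3], [4, 5], [6]
dataset = tf.data.Dataset.range(7).window(2)
for window in dataset:
  print(list(window.as_numpy_iterator()))
print()

# Example 2: size=3, shift=2, stride=1; incomplete windows are dropped.
# Windows: [0, 1, 2], [2, 3, 4], [4, 5, 6]
dataset = tf.data.Dataset.range(7).window(3, shift=2, stride=1, drop_remainder=True)
for window in dataset:
  print(list(window.as_numpy_iterator()))
print()

# Example 3: size=3, shift=1, stride=2; each window takes every second element.
# Windows: [0, 2, 4], [1, 3, 5], [2, 4, 6]
dataset = tf.data.Dataset.range(7).window(3, shift=1, stride=2, drop_remainder=True)
for window in dataset:
  print(list(window.as_numpy_iterator()))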

In [None]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
for window_dataset in dataset:
  for val in window_dataset:
    print(val.numpy(), end=" ")
  print()

In [None]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)  # The dataset now has 6 elements; each element is a window of 5 scalar tensors of shape ()
for window in dataset:
  print(list(window.as_numpy_iterator()))

**flat_map**: **maps map_func across this dataset and flattens the result.**

* Use flat_map if you want to make sure that the order of your dataset stays the same. Here each window is itself a small dataset, so window.batch(5) turns it into a single tensor of shape (5,), and flat_map flattens the result into a dataset of such tensors.
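
The idea behind flat_map can be sketched in plain Python: apply a function that returns a sequence for every element, then concatenate those sequences in order. This is a conceptual illustration only, not TensorFlow code; flat_map_sketch is a hypothetical helper, and batching a 5-element window with batch size 5 is modelled as wrapping it in a one-element list.

```python
from itertools import chain


def flat_map_sketch(elements, map_func):
    """Apply map_func to every element and concatenate the results in order."""
    return list(chain.from_iterable(map_func(element) for element in elements))


# Six sliding windows of size 5 over the values 0..9.
windows = [list(range(start, start + 5)) for start in range(6)]

# A 5-element window batched with batch size 5 gives exactly one batch,
# so each window contributes a single 5-element list to the flattened result.
flattened = flat_map_sketch(windows, lambda window: [window])

print(len(flattened))  # 6
print(flattened[0])    # [0, 1, 2, 3, 4]
print(flattened[-1])   # [5, 6, 7, 8, 9]
```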

In [None]:
dataset = dataset.flat_map(lambda window: window.batch(5))  # The dataset now has 6 elements; each element is a 1-D tensor of shape (5,)
for window in dataset:
  print(window.numpy())

**map**

* This transformation applies **map_func** to each element of this dataset, and returns a new dataset containing the transformed elements, in the same order as they appeared in the input. **map_func** can be used to change both the values and the structure of a dataset's elements. For example, adding 1 to each element, or projecting a subset of element components. Below, map splits each window into input values (all but the last element) and a label (the last element).
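
In this notebook the mapping function splits a window into inputs and a label. The same split on plain Python lists looks roughly like this; split_window is a hypothetical name and the list comprehension only stands in for the map call.

```python
def split_window(window):
    """Return (inputs, label): all values but the last, and the last value kept as a list."""
    return window[:-1], window[-1:]


windows = [list(range(start, start + 5)) for start in range(6)]
pairs = [split_window(window) for window in windows]  # stands in for the map step

for inputs, label in pairs[:2]:
    print(inputs, label)
# [0, 1, 2, 3] [4]
# [1, 2, 3, 4] [5]
```

Slicing with [-1:] keeps the label as a one-element sequence, while indexing with [-1] would give a bare scalar. That is why the labels have shape (1,) in the small example and are scalars in the sunspot pipeline.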

In [None]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
dataset = dataset.shuffle(buffer_size=10)     # Randomly shuffles the elements of this dataset.
for x,y in dataset:
  print(x.numpy(), y.numpy())


**batch**
* Combines consecutive elements of this dataset into batches.

**drop_remainder**
* A boolean (a tf.bool scalar tf.Tensor) that controls whether the last batch is dropped when it has fewer than batch_size elements; by default the smaller batch is kept.
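
The effect of drop_remainder is easy to see on a plain list. The sketch below mimics the batching rule in pure Python; batch_sketch is a hypothetical helper and not part of TensorFlow.

```python
def batch_sketch(items, batch_size, drop_remainder=False):
    """Group consecutive items into lists of batch_size; optionally drop a short final batch."""
    items = list(items)
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the incomplete final batch
    return batches


print(batch_sketch(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(batch_sketch(range(10), 3, drop_remainder=True))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```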

In [None]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.batch(3, drop_remainder=True)  # 3 full batches; the leftover element 9 is dropped
for batch in dataset:
  print(batch)

**prefetch**
* Creates a Dataset that prefetches elements from this dataset.

* Most dataset input pipelines should end with a call to prefetch. This allows later elements to be prepared while the current element is being processed. This often improves latency and throughput, at the cost of using additional memory to store prefetched elements.

**Note:** 
* Like other Dataset methods, prefetch operates on the elements of the input dataset. It has no concept of examples vs. batches. examples.prefetch(2) will prefetch two elements (2 examples), while examples.batch(20).prefetch(2) will prefetch 2 elements (2 batches, of 20 examples each).
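
As a purely conceptual sketch of prefetching, a background thread can fill a bounded buffer while the consumer works on the current element. This says nothing about how TensorFlow implements prefetch internally; prefetch_sketch and its buffer_size parameter are hypothetical names used only for illustration.

```python
import queue
import threading


def prefetch_sketch(iterable, buffer_size):
    """Yield items while a background thread prepares up to buffer_size items ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the input

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


examples = list(range(6))
batches = [examples[i:i + 2] for i in range(0, len(examples), 2)]

# Prefetching works on whatever the elements are: single examples or whole batches.
print(list(prefetch_sketch(examples, buffer_size=2)))  # [0, 1, 2, 3, 4, 5]
print(list(prefetch_sketch(batches, buffer_size=2)))   # [[0, 1], [2, 3], [4, 5]]
```

Prefetching changes neither the order nor the content of the elements. The only trade-off is extra memory for the buffered items.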

In [None]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
dataset = dataset.shuffle(buffer_size=10)
dataset = dataset.batch(2).prefetch(1)
for x,y in dataset:
  print("x = ", x.numpy())
  print(x.numpy().shape)
  print("y = ", y.numpy())
  print(y.numpy().shape)

In [None]:
split_time = 3000
time_train = time[:split_time]
x_train = series[:split_time]
time_valid = time[split_time:]
x_valid = series[split_time:]

window_size = 30
batch_size = 32
shuffle_buffer_size = 1000

In [None]:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
  # Build the Dataset object by following the steps demonstrated above.
  dataset = tf.data.Dataset.from_tensor_slices(series)
  # Each window holds window_size input values plus 1 label value.
  dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
  dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
  # Shuffle the windows, then split each one into (inputs, label).
  dataset = dataset.shuffle(shuffle_buffer).map(lambda window: (window[:-1], window[-1]))
  dataset = dataset.batch(batch_size).prefetch(1)
  return dataset

In [None]:
dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)

**Each element of the Dataset object is a batch of inputs with shape (32, 30) and labels with shape (32,); the final batch can be smaller.**

In [None]:
for x,y in dataset:                              
  #print(x.numpy(), y.numpy())
  print(x.numpy().shape)    # (32, 30)
  print(y.numpy().shape)    # (32,)
  break
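
The number of windows and batches follows from simple counting. The sketch below assumes the training series holds exactly 3000 values, which is true when the full series has at least split_time = 3000 rows; series_length is a name introduced here only for the arithmetic.

```python
import math

series_length = 3000  # assumption: len(x_train) equals split_time
window_size = 30
batch_size = 32

# Each window needs window_size inputs plus 1 label, and windows shift by 1.
num_windows = series_length - (window_size + 1) + 1
# batch() keeps a smaller final batch because drop_remainder is not set.
num_batches = math.ceil(num_windows / batch_size)
last_batch_size = num_windows - (num_batches - 1) * batch_size

print(num_windows)      # 2970
print(num_batches)      # 93
print(last_batch_size)  # 26
```

Under that assumption the last batch would have shapes (26, 30) and (26,), while every other batch has the shapes (32, 30) and (32,).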

# **Dataset creation when using convolution layers**

In [None]:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    series = tf.expand_dims(series, axis=-1)  # Add a trailing axis so the series becomes 2-D: shape (N,) -> (N, 1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[-1]))
    return ds.batch(batch_size).prefetch(1)

**Breaking the above code into parts (the shuffle step is left out here):**

In [None]:
time=df['Unnamed: 0'].values
series=df['Monthly Mean Total Sunspot Number'].values

In [None]:
series = tf.expand_dims(series, axis=-1) 
series.shape 

In [None]:
series

In [None]:
ds = tf.data.Dataset.from_tensor_slices(series)
for i in ds:
  print(i)
  break

In [None]:
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
for window in ds:
  print(list(window.as_numpy_iterator()))
  print(len(list(window.as_numpy_iterator())))
  break

In [None]:
ds = ds.flat_map(lambda w: w.batch(window_size + 1))
for window in ds:
  print(window.numpy())
  print(window.numpy().shape)
  break

In [None]:
ds = ds.map(lambda w: (w[:-1], w[-1]))
for x,y in ds:
  print(x.numpy(), y.numpy())
  break

In [None]:
dataset = ds.batch(batch_size).prefetch(1)

In [None]:
dataset

**Each element of the Dataset object is a batch of inputs with shape (32, 30, 1) and labels with shape (32, 1).**

In [None]:
for x,y in dataset:
  print(x.numpy(), y.numpy())
  print(x.numpy().shape)    # (32, 30, 1)
  print(y.numpy().shape)    # (32, 1)
  break
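
To see where the extra trailing dimension comes from, the same steps can be traced on nested Python lists. This is a sketch on made-up stand-in values, not the sunspot data; shape_of is a hypothetical helper, and shuffling is skipped as in the step-by-step cells above.

```python
def shape_of(nested):
    """Shape of a regular, non-empty nested list, e.g. [[1], [2]] -> (2, 1)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


window_size, batch_size = 30, 32
series = [float(i) for i in range(100)]    # made-up stand-in values
expanded = [[value] for value in series]   # adds a trailing axis: (100,) -> (100, 1)

# Sliding windows of window_size + 1 rows, shifted by 1, complete windows only.
windows = [expanded[s:s + window_size + 1] for s in range(len(expanded) - window_size)]
pairs = [(window[:-1], window[-1]) for window in windows]

x_batch = [inputs for inputs, _ in pairs[:batch_size]]
y_batch = [label for _, label in pairs[:batch_size]]

print(shape_of(expanded))  # (100, 1)
print(shape_of(x_batch))   # (32, 30, 1)
print(shape_of(y_batch))   # (32, 1)
```

Because every value becomes a one-element row, indexing a window with [-1] returns a row of length 1 instead of a scalar. That is why the labels have shape (32, 1) here and not (32,).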