# Query Sequence Analysis

This notebook focuses on sequence analysis, when presented with a workload schedule / sequence of queries. In an average day to day work activity, particular query patterns can be discerned. This pattern distinction allows us to discern which queries will be susceptible to execution over time, allowing us to know ahead of time which queries will be executed against the database.

### Module Installation and Importing Libraries

In [1]:
# scipy
import scipy as sc
print('scipy: %s' % sc.__version__)
# numpy
import numpy as np
print('numpy: %s' % np.__version__)
# matplotlib
import matplotlib.pyplot as plt
from statsmodels.graphics.gofplots import qqplot
# pandas
import pandas as pd
print('pandas: %s' % pd.__version__)
# scikit-learn
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.metrics import f1_score
import sklearn as sk
print('sklearn: %s' % sk.__version__)
# keras
import keras as ke
from keras.layers import Embedding, Flatten
from keras.utils import np_utils
print('keras: %s' % ke.__version__)
import math

scipy: 1.1.0
numpy: 1.15.2
pandas: 0.23.4
sklearn: 0.19.0


Using TensorFlow backend.


keras: 2.2.4


### Configuration Cell

Tweak parametric changes from this cell to influence outcome of experiment. 
NB: This experiment demonstrates at time  step = 1 (1 minute in advance). Further down in experiment, other timestep results are also featured and evaluated.

In [2]:
#
# Experiment Config
tpcds='TPCDS1' # Schema upon which to operate test
lag=3 # Time Series shift / Lag Step. Each lag value equates to 1 minute. Cannot be less than 1
if lag < 1:
    raise ValueError('Lag value must be greater than 1!')
#
test_split=.2 # Denotes which Data Split to operate under when it comes to training / validation
y_label = ['SQL_ID'] # Denotes which label to use for time series experiments
#
# Forest Config
parallel_degree = 1
n_estimators = 10
#
# Net Config
batch_size=10
epochs=10

### Read data from file into Pandas Dataframes

In [3]:
#
# Open Data
rep_hist_snapshot_path = 'C:/Users/gabriel.sammut/University/Data_ICS5200/Schedule/' + tpcds + '/v2/rep_hist_snapshot.csv'
#rep_hist_snapshot_path = 'D:/Projects/Datagenerated_ICS5200/Schedule/' + tpcds + '/v2/rep_hist_snapshot.csv'
#
rep_hist_snapshot_df = pd.read_csv(rep_hist_snapshot_path)
#
def prettify_header(headers):
    """
    Cleans header list from unwated character strings
    """
    header_list = []
    [header_list.append(header.replace("(","").replace(")","").replace("'","").replace(",","")) for header in headers]
    return header_list
#
rep_hist_snapshot_df.columns = prettify_header(rep_hist_snapshot_df.columns.values)
#
print(rep_hist_snapshot_df.columns.values)

['SNAP_ID' 'DBID' 'INSTANCE_NUMBER' 'SQL_ID' 'PLAN_HASH_VALUE'
 'OPTIMIZER_COST' 'OPTIMIZER_MODE' 'OPTIMIZER_ENV_HASH_VALUE'
 'SHARABLE_MEM' 'LOADED_VERSIONS' 'VERSION_COUNT' 'MODULE' 'ACTION'
 'SQL_PROFILE' 'FORCE_MATCHING_SIGNATURE' 'PARSING_SCHEMA_ID'
 'PARSING_SCHEMA_NAME' 'PARSING_USER_ID' 'FETCHES_TOTAL' 'FETCHES_DELTA'
 'END_OF_FETCH_COUNT_TOTAL' 'END_OF_FETCH_COUNT_DELTA' 'SORTS_TOTAL'
 'SORTS_DELTA' 'EXECUTIONS_TOTAL' 'EXECUTIONS_DELTA'
 'PX_SERVERS_EXECS_TOTAL' 'PX_SERVERS_EXECS_DELTA' 'LOADS_TOTAL'
 'LOADS_DELTA' 'INVALIDATIONS_TOTAL' 'INVALIDATIONS_DELTA'
 'PARSE_CALLS_TOTAL' 'PARSE_CALLS_DELTA' 'DISK_READS_TOTAL'
 'DISK_READS_DELTA' 'BUFFER_GETS_TOTAL' 'BUFFER_GETS_DELTA'
 'ROWS_PROCESSED_TOTAL' 'ROWS_PROCESSED_DELTA' 'CPU_TIME_TOTAL'
 'CPU_TIME_DELTA' 'ELAPSED_TIME_TOTAL' 'ELAPSED_TIME_DELTA' 'IOWAIT_TOTAL'
 'IOWAIT_DELTA' 'CLWAIT_TOTAL' 'CLWAIT_DELTA' 'APWAIT_TOTAL'
 'APWAIT_DELTA' 'CCWAIT_TOTAL' 'CCWAIT_DELTA' 'DIRECT_WRITES_TOTAL'
 'DIRECT_WRITES_DELTA' 'PLSEXEC_TIME_T

### Changing Matrix Shapes

Changes dataframe shape, in an attempt to drop all numeric data. Below's aggregated data is done so on:
* SNAP_ID
* INSTANCE_NUMBER
* DBID
* SQL_ID

In [4]:
print("Shape Before Aggregation: " + str(rep_hist_snapshot_df.shape))
#
# Group By Values by SNAP_ID , sum all metrics (for table REP_HIST_SNAPSHOT) and drop all numeric
rep_hist_snapshot_df = rep_hist_snapshot_df.groupby(['SNAP_ID','DBID','INSTANCE_NUMBER','SQL_ID']).sum()
rep_hist_snapshot_df.reset_index(inplace=True)
#
print("Shape After Aggregation: " + str(rep_hist_snapshot_df.shape))
df = rep_hist_snapshot_df

Shape Before Aggregation: (2230, 90)
Shape After Aggregation: (1923, 78)


### Data Ordering

Sorting of datasets in order of SNAP_ID.

In [5]:
df.sort_values(by=['SNAP_ID'], ascending=True, inplace=True)
print(df.shape)

(1923, 78)


### Rearranging Labels

Removes the label column, and adds it at the beginning of the matrix for later usage

In [6]:
print('Before Column Switch: ' + str(df.shape))
df = df[y_label]
print('After Column Switch: ' + str(df.shape))
print(df.head())

Before Column Switch: (1923, 78)
After Column Switch: (1923, 1)
           SQL_ID
0   03ggjrmy0wa1w
26  8mdz49zkajhw3
27  8t26unxsrxj72
28  93n8wp5a8xyxn
29  9ggx4p02d346a


### Label Encoding

Since this experiment deals with prediction of upcoming SQL_IDs, respectice SQL_ID strings need to labelled as a numeric representation. Label Encoder will be used here to convert SQL_ID's into a numeric format, which are in turn used for training. Evaluation (achieved predictions) is done so also in numeric format, at which point the label encoder is eventually used to decode back the labels into the original, respetive SQL_ID representation.

This section of the experiment additionally converts the targetted label into a binarized version of the previous achieved categorical numeric values.

* https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
* https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
* https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html

In [7]:
### Label Encoding
#
# Since this experiment deals with prediction of upcoming SQL_IDs, respectice SQL_ID strings need to labelled as a numeric representation. Label Encoder will be used here to convert SQL_ID's into a numeric format, which are in turn used for training. Evaluation (achieved predictions) is done so also in numeric format, at which point the label encoder is eventually used to decode back the labels into the original, respetive SQL_ID representation.
print("Before label encoding: " + str(df.shape))
le = preprocessing.LabelEncoder()
le.fit(df[y_label])
df = le.transform(df[y_label])
print("After label encoding: " + str(df.shape) + "\n----------------------------------\n\nAvailable Classes:")
df = pd.DataFrame(df, columns=y_label)
print(len(le.classes_))
print(le.classes_)

Before label encoding: (1923, 1)
After label encoding: (1923,)
----------------------------------

Available Classes:
379
['01d5n1nm17r2h' '01tp87bk1t2zv' '03ggjrmy0wa1w' '04kug40zbu4dm'
 '06dymzb481vnd' '06g9mhm5ba7tt' '09vrdx888wvvb' '0a08ug2qc1j82'
 '0a7q9v9nd2qc1' '0aq14dznn91rg' '0f60bzgt9127c' '0ga8vk4nftz45'
 '0hdquu87pydzk' '0hhmdwwgxbw0r' '0jj0ct4x4gy27' '0kcbwucxmazcp'
 '0kkhhb2w93cx0' '0m78skf1mudnb' '0qbzfjt00pbsx' '0v3dvmc22qnam'
 '0vcz1xqfrykmw' '0w26sk6t6gq98' '0x6ks1umjn6k6' '0y080mnfaqk3u'
 '0ym9wzzys5zax' '130r442w3nfny' '13a9r2xkx1bxb' '13ys8ux8xvrbm'
 '14f5ngrj3cc5h' '14kx436hrv7cc' '193ncz0tf25hw' '1aajwypydy0wu'
 '1c3x1d7pc0cmt' '1fn8v91f0arf0' '1gfaj4z5hn1kf' '1hxfbnas8xr2j'
 '1jhyrdp21f2q6' '1k33bhcnpbrwx' '1kz16yhs993h2' '1ms8wj24sqny4'
 '1p5grz1gs7fjq' '1pgnzc6zf7ctc' '1pv23p59mjs0v' '1r7b985mxqj71'
 '1rpgk59t8pvs6' '1u97hwfu7dcmz' '1v2b661suttyp' '1wk0t84cwr5ps'
 '1wpns6pagm2qj' '1wq6da7n103qd' '1wz811srf8xh8' '2046z7kdh823h'
 '20bqsr6btd9x9' '20vv6ttajyjzq' 

  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)


### Time Series Shifting

Shifting the datasets N lag minutes, in order to transform the problem into a supervised dataset. Each Lag Shift equates to 60 seconds (due to the way design of the data capturing tool). For each denoted lag amount, the same number of feature vectors will be stripped away at the beginning.

Features and Labels are separated into seperate dataframes at this point.

https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

In [8]:
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = data
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    if n_in != 0:
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
            names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    n_out += 1
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
#
def remove_n_time_steps(data, n=1):
    if n == 0:
        return data
    df = data
    headers = df.columns
    dropped_headers = []
    #     for header in headers:
    #         if "(t)" in header:
    #             dropped_headers.append(header)
    #
    for i in range(1,n+1):
        for header in headers:
            if "(t+"+str(i)+")" in header:
                dropped_headers.append(str(header))
    #
    return df.drop(dropped_headers, axis=1) 
#
# Frame as supervised learning set
shifted_df = series_to_supervised(df, lag, lag)
#
# Seperate labels from features
y_row = []
for i in range(lag+1,(lag*2)+2):
    y_df_column_names = shifted_df.columns[len(df.columns)*i:len(df.columns)*i + len(y_label)]
    y_row.append(y_df_column_names)
y_df_column_names = []   
for row in y_row:
    for val in row:
        y_df_column_names.append(val)
#
# y_df_column_names = shifted_df.columns[len(df.columns)*lag:len(df.columns)*lag + len(y_label)]
y_df = shifted_df[y_df_column_names]
X_df = shifted_df.drop(columns=y_df_column_names)
print('\n-------------\nFeatures')
print(X_df.columns)
print(X_df.shape)
print('\n-------------\nLabels')
print(y_df.columns)
print(y_df.shape)
#
# Delete middle timesteps
X_df = remove_n_time_steps(data=X_df, n=lag)
print('\n-------------\nFeatures After Time Shift')
print(X_df.columns)
print(X_df.shape)
# y_df = remove_n_time_steps(data=y_df, n=lag)
print('\n-------------\nLabels After Time Shift')
print(y_df.columns)
print(y_df.shape)


-------------
Features
Index(['var1(t-3)', 'var1(t-2)', 'var1(t-1)', 'var1(t)'], dtype='object')
(1917, 4)

-------------
Labels
Index(['var1(t+1)', 'var1(t+2)', 'var1(t+3)'], dtype='object')
(1917, 3)

-------------
Features After Time Shift
Index(['var1(t-3)', 'var1(t-2)', 'var1(t-1)', 'var1(t)'], dtype='object')
(1917, 4)

-------------
Labels After Time Shift
Index(['var1(t+1)', 'var1(t+2)', 'var1(t+3)'], dtype='object')
(1917, 3)


### One Hot Encoding

One hot encoding target labels for net application

In [9]:
#
# One Hot Encoding
print("Before One Hot Encoding: " + str(y_df.shape))
y_df = np_utils.to_categorical(y_df)
print("After One Hot Encoding: " + str(y_df.shape))

Before One Hot Encoding: (1917, 3)
After One Hot Encoding: (1917, 3, 379)


## LSTM Regression (Many to Many Approach)
### Designing the network

- The first step is to define your network.
- Neural networks are defined in Keras as a sequence of layers. The container for these layers is the **Sequential class**.
- The first step is to create an instance of the Sequential class. Then you can create your layers and add them in the order that they should be connected.
- The LSTM recurrent layer comprised of memory units is called LSTM().
- A fully connected layer that often follows LSTM layers and is used for outputting a prediction is called Dense().
- The first layer in the network must define the number of inputs to expect.
- Input must be three-dimensional, comprised of samples, timesteps, and features.
    - **Samples:** These are the rows in your data.
    - **Timesteps:** These are the past observations for a feature, such as lag variables.
    - **Features:** These are columns in your data.
- Assuming your data is loaded as a NumPy array, you can convert a 2D dataset to a 3D dataset using the reshape() function in NumPy.

### Relavent Links

Network structure pointers [https://www.heatonresearch.com/2017/06/01/hidden-layers.html]. Rough heuristics to start with:

* The number of hidden neurons should be between the size of the input layer and the size of the output layer.
* The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
* The number of hidden neurons should be less than twice the size of the input layer.

--------------------------------------------------------------------------------------------

* https://machinelearningmastery.com/models-sequence-prediction-recurrent-neural-networks/
* https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/
* https://machinelearningmastery.com/5-step-life-cycle-long-short-term-memory-models-keras/
* https://machinelearningmastery.com/stacked-long-short-term-memory-networks/
* https://arxiv.org/pdf/1312.6026.pdf
* https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
* https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/

In [10]:
#
# KerasModel Class
class KerasModel:
    """
    Long Short Term Memory Neural Net Class
    """

    #
    def __init__(self, X_train, y_train, classification_classes, optimizer):
        """
        Initiating the class creates a net with the established parameters
        :param X_train - Training data used to train the model
        :param y_train - Test data used to test the model
        :param layers - A list of values, where in each value denotes a layer, and the number of neurons for that layer
        :param loss_function - Function used to measure fitting of model (predicted from actual)
        :param optimizer - Function used to optimize the model (eg: Gradient Descent)
        """
        self.model = ke.models.Sequential()
        #self.model.add(Embedding(classification_classes, 10, input_length=X_train.shape[1]))
        #self.model.add(Flatten())
#         self.model.add(ke.layers.LSTM(X_train.shape[2], input_shape=(X_train.shape[1], X_train.shape[2])))
#         self.model.add(ke.layers.Dense(X_train.shape[2], activation='softmax'))
        #self.model.add(ke.layers.LSTM(layers[i], input_shape=(X_train.shape[1], X_train.shape[2]), return_sequences=True))
        #self.model.add(ke.layers.LSTM(layers[i]))
        #
        #self.model.add(Flatten())
        #
        self.model.add(ke.layers.LSTM(256, input_shape=(X_train[1], X_train.shape[2])))
        self.model.add(ke.layers.Dropout(0.2))
        self.model.add(ke.layers.Dense(y_train.shape[1], activation='softmax'))
        self.loss_func = 'sparse_categorical_crossentropy'
        self.model.compile(loss=self.loss_func, optimizer=optimizer, metrics=['accuracy'])
        print(self.model.summary())

    #
    def fit_model(self, X_train, X_test, y_train, y_test, epochs=50, batch_size=50, verbose=2, shuffle=False,
                  plot=False):
        """
        Fit data to model & validate. Trains a number of epochs.
        """
        history = self.model.fit(x=X_train,
                                 y=y_train,
                                 epochs=epochs,
                                 batch_size=batch_size,
                                 validation_data=(X_test, y_test),
                                 verbose=verbose,
                                 shuffle=shuffle)
        if plot:
            plt.rcParams['figure.figsize'] = [20, 15]
            plt.plot(history.history['acc'], label='train')
            plt.plot(history.history['val_acc'], label='validation')
            plt.ylabel('loss')
            plt.xlabel('epoch')
            plt.legend(['train', 'validation'], loc='upper left')
            plt.show()

    #
    def predict(self, X):
        yhat = self.model.predict(X)
        return yhat

    #
    def predict_and_evaluate(self, X, y, y_labels, plot=False):
        yhat = self.predict(X)
        #
        # F1-Score Evaluation
        for i in range(y.shape[1]):
            print(str(len(y[:, i])))
            print(str(len(yhat[:, i])))
            print(y[:, i])
            print(yhat[:, i])
            f1 = f1_score(y[:, i], 
                          yhat[:, i],
                          average='micro')  # Calculate metrics globally by counting the total true positives, false negatives and false positives.
            print('Test FScore ' + y_labels[0] + ' with LAG value [' + str(i) + ']: ' +  str(f1))
        #
        if plot:
            for i in range(0, y[0]):
                plt.rcParams['figure.figsize'] = [20, 15]
                plt.plot(y[:, i], label='actual')
                plt.plot(yhat[:, i], label='predicted')
                plt.legend(['actual', 'predicted'], loc='upper left')
                plt.title(y_labels[i%len(y_labels)] + " +" + str(math.ceil((i+1)/len(y_label))))
                plt.show()
    #
    @staticmethod
    def write_results_to_disk(path, iteration, lag, test_split, batch, depth, epoch, score, time_train):
        file_exists = os.path.isfile(path)
        with open(path, 'a') as csvfile:
            headers = ['iteration', 'lag', 'test_split', 'batch', 'depth', 'epoch', 'score', 'time_train']
            writer = csv.DictWriter(csvfile, delimiter=',', lineterminator='\n', fieldnames=headers)
            if not file_exists:
                writer.writeheader()  # file doesn't exist yet, write a header
            writer.writerow({'iteration': iteration,
                             'lag': lag,
                             'test_split': test_split,
                             'batch': batch,
                             'depth':depth,
                             'epoch':epoch,
                             'score': score,
                             'time_train': time_train})

In [11]:
X_train, X_validate, y_train, y_validate = train_test_split(X_df, y_df, test_size=test_split)
#
# X_train = X_train.values
# y_train = y_train.values
print("X_train shape [" + str(X_train.shape) + "] Type - [" + str(type(X_train)) + "]")
print("y_train shape [" + str(y_train.shape) + "] Type - [" + str(type(y_train)) + "]")
#
X_validate, X_test, y_validate, y_test = train_test_split(X_validate, y_validate, test_size=.5)
#
# X_validate = X_validate.values
# y_validate = y_validate.values
print("X_validate shape [" + str(X_validate.shape) + "] Type - [" + str(type(X_validate)) + "]")
print("y_validate shape [" + str(y_validate.shape) + "] Type - [" + str(type(y_validate)) + "]")
#
# X_test = X_test.values
# y_test = y_test.values
print("X_test shape [" + str(X_test.shape) + "] Type - [" + str(type(X_test)) + "]")
print("y_test shape [" + str(y_test.shape) + "] Type - [" + str(type(y_test)) + "]")
print("\n")
#
print(X_train[0:5])
print(y_train[0:5])
print('------------------------------------------------------------')
print(X_validate[0:5])
print(y_validate[0:5])
print('------------------------------------------------------------')
print(X_test[0:5])
print(y_test[0:5])
#
# Reshape for fitting in LSTM
# X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
# X_validate = X_validate.reshape((X_validate.shape[0], 1, X_validate.shape[1]))
# X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
print(X_train.shape)
#
X_train = np.reshape(X_train,(X_train.shape[0], 1, X_train.shape[1]))
X_validate = np.reshape(X_validate,(X_validate.shape[0], 1, X_validate.shape[1]))
X_test = np.reshape(X_test,(X_test.shape[0], 1, X_test.shape[1]))
print('\nReshaping Training Frames')
print("X_train shape [" + str(X_train.shape) + "] Type - [" + str(type(X_train)) + "]")
print("X_validate shape [" + str(X_validate.shape) + "] Type - [" + str(type(X_validate)) + "]")
print("X_test shape [" + str(X_test.shape) + "] Type - [" + str(type(X_test)) + "]")
#
# Train on discrete data (Train > Validation)
discrete_model = KerasModel(X_train=X_train,
                            y_train=y_train,
                            classification_classes=len(le.classes_),
                            optimizer='sgd')
discrete_model.fit_model(X_train=X_train,
                         X_test=X_validate,
                         y_train=y_train,
                         y_test=y_validate,
                         epochs=epochs, 
                         batch_size=batch_size,
                         verbose=2, 
                         shuffle=False,
                         plot=True)
discrete_model.predict_and_evaluate(X=X_validate,
                                    y=y_validate,
                                    y_labels=y_label,
                                    plot=True)
#
# Train on discrete data (Train + Validation > Test)
discrete_model.fit_model(X_train=X_validate,
                         X_test=X_test,
                         y_train=y_validate,
                         y_test=y_test,
                         epochs=epochs, 
                         batch_size=1, # Incremental batch size fitting
                         verbose=2, 
                         shuffle=False,
                         plot=True)
discrete_model.predict_and_evaluate(X=X_test,
                                    y=y_test,
                                    y_labels=y_label,
                                    plot=True)

X_train shape [(1533, 4)] Type - [<class 'pandas.core.frame.DataFrame'>]
y_train shape [(1533, 3, 379)] Type - [<class 'numpy.ndarray'>]
X_validate shape [(192, 4)] Type - [<class 'pandas.core.frame.DataFrame'>]
y_validate shape [(192, 3, 379)] Type - [<class 'numpy.ndarray'>]
X_test shape [(192, 4)] Type - [<class 'pandas.core.frame.DataFrame'>]
y_test shape [(192, 3, 379)] Type - [<class 'numpy.ndarray'>]


      var1(t-3)  var1(t-2)  var1(t-1)  var1(t)
1812      101.0       98.0       73.0       69
1452       15.0      115.0      110.0      185
1204      350.0      346.0      344.0      343
1504      188.0      177.0       47.0      185
621       152.0      148.0      147.0      143
[[[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. .

ValueError: Must pass 2-d input