## Long Short-Term Memory (LSTM) Implementation

### 1. Data Understanding and Exploration

Let's first import the required libraries, then look at the dataset to understand its size, attribute names, etc.

In [1]:
import numpy as np
import urllib.request
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display_html
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn import preprocessing
from sklearn.metrics import mean_absolute_error
from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Activation, Dropout, Dense
from keras.layers import Flatten, LSTM
from keras.layers import GlobalMaxPooling1D
from keras.models import Model
from keras.layers import Embedding
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.layers import Input
from keras.layers import Concatenate
from keras.layers import Bidirectional
import re

Using TensorFlow backend.


In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
#Build a function to get the data from the IMF website and save it locally

URL = 'http://www.imf.org/external/pubs/ft/weo/2017/02/weodata/WEOOct2020all.xls'
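def get_data_from_IMF(url, filename='WEOOct2020all.xls'):
    # urlretrieve needs a file path to write to, not a directory such as '/Users'
    urllib.request.urlretrieve(url, filename)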

In [4]:
#get_data_from_IMF(URL)

In [5]:
# Reading the dataset
imf = pd.read_excel("WEOOct2020all.xlsx", na_values=['--'])


In [6]:
imf.shape

(8777, 56)

In [7]:
# Let's take a look at the first few rows
imf.head()

| | WEO Country Code | ISO | WEO Subject Code | Country | Subject Descriptor | Subject Notes | Units | Scale | Country/Series-specific Notes | 1980 | ... | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 | Estimates Start After |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 512 | AFG | NGDP_R | Afghanistan | Gross domestic product, constant prices | Expressed in billions of national currency uni... | National currency | Billions | Source: National Statistics Office Latest actu... | NaN | ... | 1255.288 | 1270.216 | 1319.902 | 1253.906 | 1304.063 | 1363.06 | 1424.397 | 1481.497 | 1540.937 | 2019.0 |
| 1 | 512 | AFG | NGDP_RPCH | Afghanistan | Gross domestic product, constant prices | Annual percentages of constant price GDP are y... | Percent change | NaN | See notes for: Gross domestic product, consta... | NaN | ... | 2.647 | 1.189 | 3.912 | -5.0 | 4.0 | 4.524 | 4.5 | 4.009 | 4.012 | 2019.0 |
| 2 | 512 | AFG | NGDP | Afghanistan | Gross domestic product, current prices | Expressed in billions of national currency uni... | National currency | Billions | Source: National Statistics Office Latest actu... | NaN | ... | 1285.46 | 1327.69 | 1469.596 | 1465.922 | 1597.738 | 1741.832 | 1893.023 | 2047.665 | 2215.013 | 2019.0 |
| 3 | 512 | AFG | NGDPD | Afghanistan | Gross domestic product, current prices | Values are based upon GDP in national currency... | U.S. dollars | Billions | See notes for: Gross domestic product, curren... | NaN | ... | 18.91 | 18.401 | 18.876 | 19.006 | 19.692 | 20.829 | 22.022 | 23.169 | 24.372 | 2019.0 |
| 4 | 512 | AFG | PPPGDP | Afghanistan | Gross domestic product, current prices | These data form the basis for the country weig... | Purchasing power parity; international dollars | Billions | See notes for: Gross domestic product, curren... | NaN | ... | 74.712 | 77.416 | 81.88 | 78.884 | 83.852 | 89.205 | 94.908 | 100.6 | 106.685 | 2019.0 |


#### Understanding the Data Dictionary

The data dictionary describes the meaning of each attribute. The non-obvious ones are the country, the WEO subject code and the subject descriptor, which we explore below:

In [8]:
imf["Country"].unique().size

196

In [9]:
imf['Country'].astype('category').value_counts()

Zimbabwe           45
Egypt              45
Hong Kong SAR      45
Honduras           45
Haiti              45
                   ..
Pakistan           45
Oman               45
Norway             45
North Macedonia    45
Afghanistan        45
Name: Country, Length: 195, dtype: int64
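Note that `unique().size` returned 196 while `value_counts()` lists 195 countries with 45 rows each. `Series.unique()` keeps NaN as a value, whereas `value_counts()` and `nunique()` drop it by default; the NaN comes from the rows with no country at the end of the file (`Country` shows 2 missing values in the null check further down). A toy illustration with made-up values:

```python
import numpy as np
import pandas as pd

# Toy column shaped like imf["Country"]: two real countries plus a NaN
# standing in for the footnote rows at the end of the WEO file
countries = pd.Series(["Albania", "Albania", "Zimbabwe", np.nan])

n_unique_with_nan = countries.unique().size        # unique() keeps NaN
n_in_value_counts = countries.value_counts().size  # value_counts() drops NaN by default
n_unique = countries.nunique()                     # nunique() drops NaN by default too

print(n_unique_with_nan, n_in_value_counts, n_unique)  # 3 2 2
```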

In [10]:
imf['WEO Subject Code'].astype('category').value_counts()

TX_RPCH         195
NGDP            195
LUR             195
LP              195
LE              195
GGX_NGDP        195
GGXWDN_NGDP     195
GGXWDN          195
GGXWDG_NGDP     195
GGXWDG          195
GGXONLB_NGDP    195
GGXONLB         195
GGXCNL_NGDP     195
GGXCNL          195
GGX             195
GGSB_NPGDP      195
GGSB            195
GGR_NGDP        195
GGR             195
FLIBOR6         195
BCA_NGDPD       195
NGAP_NPGDP      195
NGDPD           195
TXG_RPCH        195
NGDPDPC         195
TM_RPCH         195
TMG_RPCH        195
PPPSH           195
PPPPC           195
PPPGDP          195
PPPEX           195
PCPIPCH         195
PCPIEPCH        195
PCPIE           195
PCPI            195
NID_NGDP        195
NGSD_NGDP       195
NGDP_RPCH       195
NGDP_R          195
NGDP_FY         195
NGDP_D          195
NGDPRPPPPC      195
NGDPRPC         195
NGDPPC          195
BCA             195
Name: WEO Subject Code, dtype: int64

In [11]:
imf["WEO Subject Code"].unique()

array(['NGDP_R', 'NGDP_RPCH', 'NGDP', 'NGDPD', 'PPPGDP', 'NGDP_D',
       'NGDPRPC', 'NGDPRPPPPC', 'NGDPPC', 'NGDPDPC', 'PPPPC',
       'NGAP_NPGDP', 'PPPSH', 'PPPEX', 'NID_NGDP', 'NGSD_NGDP', 'PCPI',
       'PCPIPCH', 'PCPIE', 'PCPIEPCH', 'FLIBOR6', 'TM_RPCH', 'TMG_RPCH',
       'TX_RPCH', 'TXG_RPCH', 'LUR', 'LE', 'LP', 'GGR', 'GGR_NGDP', 'GGX',
       'GGX_NGDP', 'GGXCNL', 'GGXCNL_NGDP', 'GGSB', 'GGSB_NPGDP',
       'GGXONLB', 'GGXONLB_NGDP', 'GGXWDN', 'GGXWDN_NGDP', 'GGXWDG',
       'GGXWDG_NGDP', 'NGDP_FY', 'BCA', 'BCA_NGDPD', nan], dtype=object)

In [12]:
imf['Subject Descriptor'].astype('category').value_counts()

Gross domestic product, current prices                                                585
Gross domestic product per capita, current prices                                     585
Gross domestic product, constant prices                                               390
General government gross debt                                                         390
General government net debt                                                           390
General government net lending/borrowing                                              390
General government primary net lending/borrowing                                      390
General government revenue                                                            390
General government structural balance                                                 390
General government total expenditure                                                  390
Gross domestic product per capita, constant prices                                    390
Current ac

In [13]:
#print(imf.info())

In [14]:
# Check which columns were read as object (string) dtype; any year column
# listed here would need converting to float
filteredColumns = imf.dtypes[imf.dtypes == object]
listOfColumnNames = list(filteredColumns.index)
print(listOfColumnNames)

['WEO Country Code', 'ISO', 'WEO Subject Code', 'Country', 'Subject Descriptor', 'Subject Notes', 'Units', 'Scale', 'Country/Series-specific Notes']
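Only the descriptive columns are of object type, so every year column was already parsed as float and nothing needs converting here. Had a year column come back as object (for example because of a placeholder string other than `'--'`), a conversion could look like the sketch below; `to_float_years` is a hypothetical helper and the frame is toy data.

```python
import numpy as np
import pandas as pd

def to_float_years(df):
    """Hypothetical helper: convert every year column (label made only of digits)
    to float, turning unparseable strings into NaN."""
    out = df.copy()
    year_cols = [c for c in out.columns if str(c).isdigit()]
    for col in year_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce").astype(float)
    return out

toy = pd.DataFrame({
    "Country": ["A", "B"],
    1980: ["1.5", "n/a"],   # object column with a stray placeholder
    1981: [2.0, 3.0],
})
converted = to_float_years(toy)
print(converted.dtypes)
```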


In [15]:
# Count missing (NaN) values per column
imf.isnull().sum()

WEO Country Code                    1
ISO                                 2
WEO Subject Code                    2
Country                             2
Subject Descriptor                  2
Subject Notes                     197
Units                               2
Scale                            4877
Country/Series-specific Notes    1188
1980                             4969
1981                             4825
1982                             4783
1983                             4736
1984                             4705
1985                             4635
1986                             4592
1987                             4573
1988                             4480
1989                             4418
1990                             3922
1991                             3761
1992                             3386
1993                             3182
1994                             3053
1995                             2691
1996                             2547
1997        

In [16]:
# Drop the Scale column: it is mostly NaN (4877 of 8777 rows) and has no business impact
imf = imf.drop(['Scale'], axis=1)

In [17]:
# check for missing values
# check for unique values
# check for data types

unique_values = imf.nunique()
filtered_empty = imf.isnull().sum(axis=0)
mis_val_percent = imf.isna().mean().round(4) * 100
data_type = imf.dtypes
val_table = pd.concat([unique_values,filtered_empty, mis_val_percent, data_type], axis=1,sort =True)
val_table_columns = val_table.rename(columns = {0: 'Unique', 1 : 'Missing Values', 2 : 'missing %', 3: 'type'})
val_table_columns = val_table_columns.sort_values('missing %', ascending=False)
val_table_columns 

| | Unique | Missing Values | missing % | type |
|---|---|---|---|---|
| 1980 | 3398 | 4969 | 56.61 | float64 |
| 1981 | 3537 | 4825 | 54.97 | float64 |
| 1982 | 3606 | 4783 | 54.49 | float64 |
| 1983 | 3618 | 4736 | 53.96 | float64 |
| 1984 | 3651 | 4705 | 53.61 | float64 |
| 1985 | 3737 | 4635 | 52.81 | float64 |
| 1986 | 3773 | 4592 | 52.32 | float64 |
| 1987 | 3811 | 4573 | 52.1 | float64 |
| 1988 | 3891 | 4480 | 51.04 | float64 |
| 1989 | 3955 | 4418 | 50.34 | float64 |
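The same summary (unique counts, missing counts, missing percentage, dtype) is rebuilt again after imputation further down. A small helper avoids repeating the code; `missing_value_summary` is a hypothetical name, and the toy frame only stands in for `imf`.

```python
import numpy as np
import pandas as pd

def missing_value_summary(df):
    """Per-column unique count, missing count, missing % and dtype,
    sorted with the most incomplete columns first."""
    summary = pd.DataFrame({
        "Unique": df.nunique(),
        "Missing Values": df.isnull().sum(),
        "missing %": (df.isna().mean() * 100).round(2),
        "type": df.dtypes,
    })
    return summary.sort_values("missing %", ascending=False)

toy = pd.DataFrame({
    "ISO": ["AFG", "ALB", "DZA", "AGO"],
    1980: [np.nan, np.nan, 1.0, 2.0],
    2019: [1.0, 2.0, 3.0, np.nan],
})
summary = missing_value_summary(toy)
print(summary)
```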


In [18]:
# Many values are missing for the early years and for some countries. The dataset is small,
# so impute with zero for now; we may change or drop this step depending on model performance

imf = imf.fillna(0)
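Zero is a strong assumption for years a country did not report: after imputation it is indistinguishable from a real observation of 0. If zero-filling turns out to hurt the model, one alternative is to carry each series' nearest observed value across the years and only zero-fill rows with no observations at all. A sketch on toy data, assuming the year columns are the all-digit column labels; `fill_along_years` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def fill_along_years(df):
    """Fill gaps within each row (one indicator of one country) using the
    nearest observed year: forward-fill first, then back-fill the leading gap.
    Rows with no observations at all fall back to 0."""
    out = df.copy()
    year_cols = [c for c in out.columns if str(c).isdigit()]
    years = out[year_cols].ffill(axis=1).bfill(axis=1)
    out[year_cols] = years.fillna(0)
    return out

toy = pd.DataFrame({
    "Country": ["A", "B"],
    1980: [np.nan, np.nan],
    1981: [1.0, np.nan],
    1982: [np.nan, np.nan],
    1983: [3.0, np.nan],
})
filled = fill_along_years(toy)
print(filled)
```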


In [19]:
# The last rows of the file are trailing footnotes, not data; after fillna(0) their ISO is 0, so drop them
imf.drop(imf.loc[imf['ISO']==0].index, inplace=True)
imf.tail(2)

| | WEO Country Code | ISO | WEO Subject Code | Country | Subject Descriptor | Subject Notes | Units | Country/Series-specific Notes | 1980 | 1981 | ... | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 | Estimates Start After |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8773 | 698 | ZWE | BCA | Zimbabwe | Current account balance | Current account is all transactions other than... | U.S. dollars | Source: Reserve Bank of Zimbabwe and Ministry ... | -0.301 | -0.674 | ... | -0.284 | -1.229 | 0.208 | -0.505 | -0.16 | -0.288 | -0.589 | -0.603 | -0.54 | 2018.0 |
| 8774 | 698 | ZWE | BCA_NGDPD | Zimbabwe | Current account balance | Current account is all transactions other than... | Percent of GDP | See notes for: Gross domestic product, curren... | 0.0 | 0.0 | ... | -1.299 | -5.896 | 1.112 | -3.606 | -2.005 | -3.416 | -6.735 | -6.534 | -5.537 | 2018.0 |


In [20]:
# Row-wise null check: count rows where every value is missing
#imf.isnull().all(axis = 1).sum()

In [21]:
# Drop a few columns that carry no information useful for inference:
# "Estimates Start After"
# "Subject Notes"
# "Country/Series-specific Notes"

In [22]:
imf.drop(['Estimates Start After','Subject Notes','Country/Series-specific Notes'] , axis=1, inplace=True)
imf.head(2)

| | WEO Country Code | ISO | WEO Subject Code | Country | Subject Descriptor | Units | 1980 | 1981 | 1982 | 1983 | ... | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 512 | AFG | NGDP_R | Afghanistan | Gross domestic product, constant prices | National currency | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1222.917 | 1255.288 | 1270.216 | 1319.902 | 1253.906 | 1304.063 | 1363.06 | 1424.397 | 1481.497 | 1540.937 |
| 1 | 512 | AFG | NGDP_RPCH | Afghanistan | Gross domestic product, constant prices | Percent change | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2.164 | 2.647 | 1.189 | 3.912 | -5.0 | 4.0 | 4.524 | 4.5 | 4.009 | 4.012 |


In [23]:
#imf = imf.select_dtypes(include ='float64') 
#imf.head()

In [24]:
# Re-check unique counts, missing values and dtypes after imputation and the column drops
unique_values = imf.nunique()
filtered_empty = imf.isnull().sum(axis=0)
mis_val_percent = imf.isna().mean().round(4) * 100
data_type = imf.dtypes
val_table = pd.concat([unique_values,filtered_empty, mis_val_percent, data_type], axis=1,sort =True)
val_table_columns = val_table.rename(columns = {0: 'Unique', 1 : 'Missing Values', 2 : 'missing %', 3: 'type'})
val_table_columns = val_table_columns.sort_values('missing %', ascending=False)
val_table_columns 

| | Unique | Missing Values | missing % | type |
|---|---|---|---|---|
| WEO Country Code | 195 | 0 | 0.0 | object |
| ISO | 195 | 0 | 0.0 | object |
| 2002 | 6552 | 0 | 0.0 | float64 |
| 2003 | 6587 | 0 | 0.0 | float64 |
| 2004 | 6680 | 0 | 0.0 | float64 |
| 2005 | 6668 | 0 | 0.0 | float64 |
| 2006 | 6688 | 0 | 0.0 | float64 |
| 2007 | 6729 | 0 | 0.0 | float64 |
| 2008 | 6800 | 0 | 0.0 | float64 |
| 2009 | 6852 | 0 | 0.0 | float64 |


In [25]:
#import pandas_profiling as pp
#pp.ProfileReport(imf)

In [26]:
#imf.to_excel("C:/Users/u61152/Desktop/ML/RESEARCH/Interm_Report/output.xlsx")

In [27]:
number_of_indicator = imf["Subject Descriptor"].unique()
number_of_indicator.size

29

In [28]:
imf["WEO Subject Code"].unique()

array(['NGDP_R', 'NGDP_RPCH', 'NGDP', 'NGDPD', 'PPPGDP', 'NGDP_D',
       'NGDPRPC', 'NGDPRPPPPC', 'NGDPPC', 'NGDPDPC', 'PPPPC',
       'NGAP_NPGDP', 'PPPSH', 'PPPEX', 'NID_NGDP', 'NGSD_NGDP', 'PCPI',
       'PCPIPCH', 'PCPIE', 'PCPIEPCH', 'FLIBOR6', 'TM_RPCH', 'TMG_RPCH',
       'TX_RPCH', 'TXG_RPCH', 'LUR', 'LE', 'LP', 'GGR', 'GGR_NGDP', 'GGX',
       'GGX_NGDP', 'GGXCNL', 'GGXCNL_NGDP', 'GGSB', 'GGSB_NPGDP',
       'GGXONLB', 'GGXONLB_NGDP', 'GGXWDN', 'GGXWDN_NGDP', 'GGXWDG',
       'GGXWDG_NGDP', 'NGDP_FY', 'BCA', 'BCA_NGDPD'], dtype=object)

In [29]:
imf = imf.iloc[:,:-1]  # drop the last column (2025)
all_countries = imf["Country"].unique()
all_countries

array(['Afghanistan', 'Albania', 'Algeria', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
       'Australia', 'Austria', 'Azerbaijan', 'The Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana',
       'Brazil', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso',
       'Burundi', 'Cabo Verde', 'Cambodia', 'Cameroon', 'Canada',
       'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia',
       'Comoros', 'Democratic Republic of the Congo', 'Republic of Congo',
       'Costa Rica', "Côte d'Ivoire", 'Croatia', 'Cyprus',
       'Czech Republic', 'Denmark', 'Djibouti', 'Dominica',
       'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
       'Equatorial Guinea', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia',
       'Fiji', 'Finland', 'France', 'Gabon', 'The Gambia', 'Georgia',
       'Germany', 'Ghana', 'Greece', 'Grenada', 'Guatemala', 'Gui

In [30]:
# We will use the subject descriptor rather than the subject code, so drop the code
imf.drop(['WEO Subject Code'] , axis=1, inplace=True)
# Drop 2021: it is the year we want to predict
imf.drop([2021] , axis=1, inplace=True)

In [31]:
imf.head(2)

| | WEO Country Code | ISO | Country | Subject Descriptor | Units | 1980 | 1981 | 1982 | 1983 | 1984 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 512 | AFG | Afghanistan | Gross domestic product, constant prices | National currency | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1185.306 | 1197.012 | 1222.917 | 1255.288 | 1270.216 | 1319.902 | 1253.906 | 1363.06 | 1424.397 | 1481.497 |
| 1 | 512 | AFG | Afghanistan | Gross domestic product, constant prices | Percent change | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2.697 | 0.988 | 2.164 | 2.647 | 1.189 | 3.912 | -5.0 | 4.524 | 4.5 | 4.009 |


In [32]:
# Now we look at each indicator in turn; rename 'Subject Descriptor' to 'Economic Indicator'
imf.rename(columns = {'Subject Descriptor':'Economic Indicator'}, inplace = True)

In [33]:
imf.head(2)

| | WEO Country Code | ISO | Country | Economic Indicator | Units | 1980 | 1981 | 1982 | 1983 | 1984 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 512 | AFG | Afghanistan | Gross domestic product, constant prices | National currency | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1185.306 | 1197.012 | 1222.917 | 1255.288 | 1270.216 | 1319.902 | 1253.906 | 1363.06 | 1424.397 | 1481.497 |
| 1 | 512 | AFG | Afghanistan | Gross domestic product, constant prices | Percent change | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2.697 | 0.988 | 2.164 | 2.647 | 1.189 | 3.912 | -5.0 | 4.524 | 4.5 | 4.009 |


In [34]:
#imf = imf.transpose()

In [35]:
#imf.head()

##### Remove categorical variables
###### We keep the country name to select each country's rows, so the other identifier columns are no longer needed

In [36]:
#imf.drop(['ISO','WEO Country Code','Units'] , axis=1, inplace=True)
imf.drop(['ISO','WEO Country Code','Economic Indicator','Units'] , axis=1, inplace=True)

#### This time we use all the indicators together as input. The results of model 1 and model 2 were not very good, so instead of splitting the data by indicator we split it by country: every country has its own challenges, and we can't mix apples and oranges.

In [37]:
def display_table(df1, df2, name1 , name2):
    # Render two DataFrames side by side as inline HTML tables with captions
    styles = [
    dict(selector="th", props=[("font-size", "100%"),("text-align", "center")]),
    dict(selector="caption", props=[("font-size", "110%"),("text-align", "center"),("font-weight", "bold")])
    ]
    
    df1_sty = df1.style.set_table_styles(styles).set_caption(name1).set_table_attributes("style='display:inline'")
    df2_sty = df2.style.set_table_styles(styles).set_caption(name2).set_table_attributes("style='display:inline'")
 
    return display_html(df1_sty._repr_html_()+"\xa0\xa0\xa0\xa0\xa0\xa0"+df2_sty._repr_html_(), raw=True)

##### Let's run the model for one country first, then repeat it for the other countries

In [38]:
def get_data_frame(country):
    # .copy() returns an independent frame, so later in-place drops don't act on a slice of imf
    imf_country = imf[imf['Country']==country].copy()
    return imf_country
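To repeat the analysis for every country rather than calling `get_data_frame` one name at a time, the frame can be split once with `groupby`. A minimal sketch; `split_by_country` is a hypothetical helper and the toy frame only mimics the `Country` + year-column layout at this point.

```python
import pandas as pd

def split_by_country(df):
    """Return {country: that country's rows without the Country column}."""
    return {
        country: group.drop(columns="Country")
        for country, group in df.groupby("Country")
    }

toy = pd.DataFrame({
    "Country": ["Albania", "Albania", "Zimbabwe"],
    2019: [1.0, 2.0, 3.0],
    2020: [1.5, 2.5, 3.5],
})
frames = split_by_country(toy)
print(sorted(frames))
print(frames["Albania"])
```

`drop` returns a new frame, so each entry can be modified without touching the original `imf`.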

In [39]:
IMF_UK = get_data_frame('United Kingdom')

In [40]:
IMF_UK.head(3)

| | Country | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8280 | United Kingdom | 881.124 | 874.183 | 891.622 | 929.265 | 950.351 | 989.766 | 1020.947 | 1076.004 | 1137.685 | ... | 1912.866 | 1957.92 | 1995.478 | 2033.234 | 2060.494 | 2090.632 | 1886.54 | 2061.596 | 2099.973 | 2136.703 |
| 8281 | United Kingdom | -2.031 | -0.788 | 1.995 | 4.222 | 2.269 | 4.147 | 3.15 | 5.393 | 5.732 | ... | 2.608 | 2.355 | 1.918 | 1.892 | 1.341 | 1.463 | -9.762 | 3.171 | 1.861 | 1.749 |
| 8282 | United Kingdom | 259.654 | 289.551 | 318.969 | 350.868 | 377.543 | 414.414 | 446.62 | 496.124 | 555.591 | ... | 1861.965 | 1916.896 | 1995.478 | 2071.667 | 2144.304 | 2216.452 | 2058.698 | 2293.528 | 2383.918 | 2475.667 |


In [41]:
#IMF_UK.to_excel("./LSTM.xlsx")

#### Once we have the data for a single country, we no longer need the Country column

In [42]:
IMF_UK.drop(['Country'] , axis=1, inplace=True)

In [43]:
IMF_UK.head()

| | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8280 | 881.124 | 874.183 | 891.622 | 929.265 | 950.351 | 989.766 | 1020.947 | 1076.004 | 1137.685 | 1167.01 | ... | 1912.866 | 1957.92 | 1995.478 | 2033.234 | 2060.494 | 2090.632 | 1886.54 | 2061.596 | 2099.973 | 2136.703 |
| 8281 | -2.031 | -0.788 | 1.995 | 4.222 | 2.269 | 4.147 | 3.15 | 5.393 | 5.732 | 2.578 | ... | 2.608 | 2.355 | 1.918 | 1.892 | 1.341 | 1.463 | -9.762 | 3.171 | 1.861 | 1.749 |
| 8282 | 259.654 | 289.551 | 318.969 | 350.868 | 377.543 | 414.414 | 446.62 | 496.124 | 555.591 | 614.508 | ... | 1861.965 | 1916.896 | 1995.478 | 2071.667 | 2144.304 | 2216.452 | 2058.698 | 2293.528 | 2383.918 | 2475.667 |
| 8283 | 603.962 | 587.066 | 558.295 | 532.198 | 504.436 | 536.92 | 655.064 | 812.98 | 989.542 | 1007.421 | ... | 3065.4 | 2929.238 | 2704.28 | 2668.453 | 2864.338 | 2830.764 | 2638.296 | 3004.796 | 3120.056 | 3239.201 |
| 8284 | 496.173 | 538.837 | 583.544 | 631.999 | 669.662 | 719.494 | 757.102 | 817.666 | 895.026 | 954.093 | ... | 2665.878 | 2768.619 | 2897.573 | 3037.045 | 3151.659 | 3254.845 | 2978.564 | 3386.046 | 3511.585 | 3641.331 |


##### Now we need to transpose the data: the indicators should be the features and the years the time steps, so rows become columns and columns become rows

In [44]:
# preparing independent and dependent features
def prepare_data(timeseries_data, n_steps):
    """Split a 1-D series into windows of n_steps inputs, each paired with the next value."""
    X, y = [], []
    for i in range(len(timeseries_data)):
        # find the end of this pattern
        end_ix = i + n_steps
        # stop once the target index would run past the end of the series
        if end_ix > len(timeseries_data) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = timeseries_data[i:end_ix], timeseries_data[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
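`prepare_data` works on a single 1-D series. The layout described in the heading above is different: years as time steps and indicators as features. That means transposing the country frame to shape `(years, indicators)` and cutting windows of `n_steps` consecutive years, each predicting the full indicator vector of the following year. A sketch under those assumptions; `make_multivariate_windows` is a hypothetical name and the toy array stands in for `IMF_UK.T.values`.

```python
import numpy as np

def make_multivariate_windows(series_2d, n_steps):
    """series_2d: array of shape (n_years, n_indicators), one row per year.
    Returns X of shape (samples, n_steps, n_indicators) and
    y of shape (samples, n_indicators), where y[i] is the year after X[i]."""
    X, y = [], []
    for start in range(len(series_2d) - n_steps):
        end = start + n_steps
        X.append(series_2d[start:end])
        y.append(series_2d[end])
    return np.array(X), np.array(y)

# Toy data: 6 years x 3 indicators (in the notebook this would be IMF_UK.T.values)
toy = np.arange(18, dtype=float).reshape(6, 3)
X, y = make_multivariate_windows(toy, n_steps=4)
print(X.shape, y.shape)   # (2, 4, 3) (2, 3)
```

For the UK frame (45 indicator rows × 44 year columns) the transposed array would be 44 × 45, giving 44 − 4 = 40 samples of shape (4, 45): far fewer samples than the flattened approach, a trade-off worth weighing.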

In [45]:
# define input sequence
# choose a number of time steps
n_steps = 4
# split into samples; flatten() joins the indicator rows of IMF_UK end to end into one 1-D series
x_data, y_data = prepare_data(IMF_UK.values.flatten(), n_steps)

In [46]:
print(x_data)

[[881.124 874.183 891.622 929.265]
 [874.183 891.622 929.265 950.351]
 [891.622 929.265 950.351 989.766]
 ...
 [ -3.49   -3.865  -4.008  -2.049]
 [ -3.865  -4.008  -2.049  -3.623]
 [ -4.008  -2.049  -3.623  -3.545]]


In [47]:
x_data.shape

(1976, 4)
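The shape follows from the data: the UK frame has 45 indicator rows × 44 year columns = 1980 values, and 4-step windows leave 1980 − 4 = 1976 samples. Because `flatten()` chains the rows end to end, the windows around each join mix the last years of one indicator with the first years of the next (for example real GDP in row 8280 followed by its percent change in row 8281). A sketch that windows each row on its own and stacks the results; `windows_1d` and `windows_per_row` are hypothetical helpers, with `windows_1d` mirroring `prepare_data`.

```python
import numpy as np

def windows_1d(series, n_steps):
    """Same windowing as prepare_data: n_steps inputs -> next value."""
    X = [series[i:i + n_steps] for i in range(len(series) - n_steps)]
    y = [series[i + n_steps] for i in range(len(series) - n_steps)]
    return np.array(X), np.array(y)

def windows_per_row(frame_2d, n_steps):
    """Window every row (one indicator) separately, so no sample
    crosses from one indicator into the next, then stack them."""
    parts = [windows_1d(row, n_steps) for row in frame_2d]
    X = np.concatenate([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return X, y

toy = np.array([[1., 2., 3., 4., 5., 6.],
                [10., 20., 30., 40., 50., 60.]])
X, y = windows_per_row(toy, n_steps=2)
print(X.shape, y.shape)   # (8, 2) (8,)
```

Applied to `IMF_UK.values` with `n_steps = 4`, this would give 45 × 40 = 1800 samples instead of 1976, none of them spanning two indicators.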

In [48]:
print(y_data)

[ 950.351  989.766 1020.947 ...   -3.623   -3.545   -3.439]


In [49]:
#Dump input to analyse the data 
#pd.DataFrame(x_data).to_excel("./LSTM_INPUT.xlsx")

In [50]:
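from sklearn.model_selection import train_test_split
# split X and y in one call so every window stays paired with its target
# (two separate calls shuffle them differently and misalign the pairs)
train_x, test_x, train_y, test_y = train_test_split(x_data, y_data, test_size=0.2)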
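A random split can also train the model on windows from later years than the ones it is tested on. If that leakage matters, `train_test_split` can keep the original order with `shuffle=False`, making the test set the last 20% of the samples. This only means "the latest years" when samples are ordered by time, as in a years × indicators layout; with the flattened series used here, the tail is the last indicators instead. Toy illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy samples in time order: row i is the i-th window, y[i] its target
X = np.arange(40, dtype=float).reshape(10, 4)
y = np.arange(10, dtype=float)

# shuffle=False keeps the order, so the test set is the last 20% of samples
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2, shuffle=False)
print(train_y, test_y)
```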

In [51]:
test_x.shape

(396, 4)

In [52]:
test_y.shape

(396,)

In [53]:
# Normalize data with separate scalers for X and y: refitting one scaler on y would
# overwrite the statistics learned from X, and y's scaler is needed to invert predictions
scaler_x = MinMaxScaler(feature_range=(0, 1))
scaler_y = MinMaxScaler(feature_range=(0, 1))
#X
train_scaled_x = scaler_x.fit_transform(np.float32(train_x))
train_x = pd.DataFrame(train_scaled_x)
test_scaled_x = scaler_x.transform(np.float32(test_x))
test_x = pd.DataFrame(test_scaled_x)

#Y
# MinMaxScaler expects 2-D input, so reshape y to (n_samples, 1)
train_scaled_y = scaler_y.fit_transform(np.float32(train_y.reshape(-1, 1)))
train_y = pd.DataFrame(train_scaled_y)
test_scaled_y = scaler_y.transform(np.float32(test_y.reshape(-1, 1)))
test_y = pd.DataFrame(test_scaled_y)

display_table(train_x, train_y, "X Train Set", "Y Train Set")

X Train Set

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0.968896 | 1.0 | 0.005942 | 0.006294 |
| 1 | 0.006307 | 0.005809 | 0.006106 | 0.006371 |
| 2 | 0.009601 | 0.009529 | 0.010336 | 0.011718 |
| 3 | 0.006722 | 0.006313 | 0.006304 | 0.006692 |
| 4 | 0.362439 | 0.382824 | 0.399358 | 0.41579 |
| 5 | 0.006477 | 0.006087 | 0.00612 | 0.006528 |
| 6 | 0.006678 | 0.006291 | 0.006297 | 0.006681 |
| 7 | 0.007022 | 0.006662 | 0.00668 | 0.007041 |
| 8 | 0.024161 | 0.024606 | 0.025421 | 0.026844 |
| 9 | 0.022744 | 0.026177 | 0.028272 | 0.030296 |

Y Train Set

| | 0 |
|---|---|
| 0 | 0.009529 |
| 1 | 0.336357 |
| 2 | 0.021869 |
| 3 | 0.02158 |
| 4 | 0.005909 |
| 5 | 0.007214 |
| 6 | 0.005912 |
| 7 | 0.006901 |
| 8 | 0.008078 |
| 9 | 0.005897 |
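The LSTM will predict values on the 0–1 scale of `y`, so turning them back into the indicator's units needs a scaler fitted on `y` alone. That is why keeping the x and y scalers apart matters: calling `fit_transform` on `y` with the scaler used for `x` overwrites the statistics learned from `x`. A minimal round-trip sketch; `scaler_y` and the numbers are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train_y = np.array([100.0, 150.0, 200.0]).reshape(-1, 1)   # toy targets
scaler_y = MinMaxScaler(feature_range=(0, 1))
scaled = scaler_y.fit_transform(train_y)                   # 0.0, 0.5, 1.0

# Pretend these came out of model.predict(...) on the scaled scale
scaled_predictions = np.array([[0.25], [0.75]])
predictions = scaler_y.inverse_transform(scaled_predictions)
print(predictions.ravel())   # back in original units: 125 and 175
```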


In [54]:
train_x.shape

(1580, 4)

In [55]:
test_x.shape

(396, 4)

In [56]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 29  # fails below: each window holds one value per time step, not 29 features
X_train = train_x.values.reshape((train_x.shape[0], train_x.shape[1], n_features))
y_train = train_y.values.reshape((train_y.shape[0], n_features))
X_test = test_x.values.reshape((test_x.shape[0], test_x.shape[1], n_features))
y_test = test_y.values.reshape((test_y.shape[0], n_features))
X_train.shape, y_train.shape, X_test.shape, y_test.shape

ValueError: cannot reshape array of size 6320 into shape (1580,4,29)
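The error is a size mismatch: `train_x` holds 1580 × 4 = 6320 values, while `(1580, 4, 29)` needs 1580 × 4 × 29 = 183,280. Each window built by `prepare_data` carries a single value per time step, so with this layout the feature dimension is 1. A sketch of the reshape under that reading; random arrays stand in for the scaled splits, and on the notebook's DataFrames you would call `.values` first.

```python
import numpy as np

n_features = 1   # one value per time step in the univariate windows

train_x = np.random.rand(1580, 4)   # stand-ins with the notebook's shapes
train_y = np.random.rand(1580, 1)

# [samples, timesteps] -> [samples, timesteps, features]
X_train = train_x.reshape((train_x.shape[0], train_x.shape[1], n_features))
y_train = train_y.reshape((train_y.shape[0], n_features))
print(X_train.shape, y_train.shape)   # (1580, 4, 1) (1580, 1)
```

Using the indicators as features instead would mean transposing each country frame first. Each country has 45 rows, one per WEO subject code (29 distinct descriptors, some reported in several units), so in that layout the feature count would be 45 rather than 29.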