<a href="https://colab.research.google.com/github/eherrador/PredictingLongShortTermMemory/blob/main/PredictingLSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Problem Description
I have a dataset with 3950 rows and 7 columns. The rows are sorted in descending order. The most recent row is at the top of the dataset, and the oldest data is at the bottom of the dataset. Each row in the dataset contains 7 integers between 1 and 56. I want to implement a machine learning algorithm that allows me to predict which numbers will be in the next row of the dataset. Use python to generate the code to solve this problem. Explain the reasoning behind the selection of the machine learning algorithm chosen to solve this problem.

## Solution Overview
To predict the next row of integers from this dataset, one suitable model is a recurrent neural network (RNN), specifically an LSTM (Long Short-Term Memory) network. The choice follows from the sequential nature of the data: because the rows are ordered in time, an LSTM can learn patterns across consecutive rows and use them to forecast the next set of numbers.

Here’s a step-by-step implementation using Python and Keras, a high-level neural networks API, to build and train an LSTM model:



### Step 1: Import Libraries
Import NumPy and pandas for data handling, the Keras classes used to build the network, and scikit-learn's `MinMaxScaler` for scaling.

In [1]:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from sklearn.preprocessing import MinMaxScaler

### Step 2: Load and Prepare the Dataset
Assuming the data is stored in a CSV file, load it with pandas, convert it to a NumPy array, and scale it.

In [3]:
# Load your dataset
data = pd.read_csv('Melate.csv', header=0)

# Show the first five rows of the DataFrame
print(data.head())

# Convert data to numpy array
data = data.values

# Rescale the dataset column by column
scaler = MinMaxScaler(feature_range=(1, 56))  # Each column's minimum maps to 1 and its maximum to 56

data_scaled = scaler.fit_transform(data)
print("Data Scaled:", data_scaled)

   R1  R2  R3  R4  R5  R6
0   2  13  21  34  36  45
1  14  24  40  43  45  52
2  10  17  18  32  35  42
3   6  15  16  22  39  56
4  15  35  50  51  52  54
Data Scaled: [[ 2.34146341 14.75       21.20408163 34.         32.77777778 41.5952381 ]
 [18.43902439 28.5        42.53061224 43.9        43.77777778 50.76190476]
 [13.07317073 19.75       17.83673469 31.8        31.55555556 37.66666667]
 ...
 [21.12195122 21.         18.95918367 19.7        24.22222222 24.57142857]
 [ 6.36585366  6.         24.57142857 29.6        30.33333333 32.42857143]
 [ 1.          3.5         5.48979592  8.7        12.         21.95238095]]
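The file is described as newest-first, while windowed training normally assumes oldest-first order, and neural network inputs are more commonly scaled to [0, 1] than to [1, 56]. The cell above does neither of these steps. The following is a minimal sketch of both, under the assumption that they are wanted; the `raw` array is made up for illustration and stands in for the values loaded from `Melate.csv`, and the names `chronological` and `unit_scaler` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up stand-in for the loaded data: newest row first, oldest row last.
raw = np.array([
    [2, 13, 21, 34, 36, 45],   # most recent
    [14, 24, 40, 43, 45, 52],
    [10, 17, 18, 32, 35, 42],  # oldest
])

# Reverse the row order so that index 0 is the oldest row.
chronological = raw[::-1]

# Scale each column to [0, 1]; the scaler stores the column minima and maxima.
unit_scaler = MinMaxScaler(feature_range=(0, 1))
scaled = unit_scaler.fit_transform(chronological)

# inverse_transform maps scaled values back to the original units.
restored = unit_scaler.inverse_transform(scaled)
```

With the rows in chronological order, the last `time_step` rows of the array are the most recent ones, which is what the prediction step needs.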


### Step 3: Create Sequences for LSTM
To train the LSTM, we need input-output pairs. Each input is a window of `time_step` consecutive rows, and the matching output is the row that immediately follows that window.

In [12]:
def create_dataset(data, time_step=1):
    X, y = [], []
    for i in range(len(data) - time_step):
        X.append(data[i:(i + time_step), :])
        y.append(data[i + time_step, :])
    return np.array(X), np.array(y)

# Create sequences with a time step (number of previous rows to consider)
time_step = 10  # You can adjust this value
X, y = create_dataset(data_scaled, time_step)
print("X.size:", X.size)
print("y.size:", y.size)
print("X.shape[0]", X.shape[0])
print("X.shape[1]", X.shape[1])
print("X.shape[2]", X.shape[2])
print("y.shape[0]", y.shape[0])
print("y.shape[1]", y.shape[1])

X.size: 236520
y.size: 23652
X.shape[0] 3942
X.shape[1] 10
X.shape[2] 6
y.shape[0] 3942
y.shape[1] 6
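To make the windowing concrete, here is a small sketch that runs the same `create_dataset` logic on a toy array. The `toy` data is invented for illustration. The number of windows is the row count minus `time_step`, which is also why the output above reports 3942 samples for `time_step = 10`.

```python
import numpy as np

def create_dataset(data, time_step=1):
    """Pair each window of `time_step` rows with the row that follows it."""
    X, y = [], []
    for i in range(len(data) - time_step):
        X.append(data[i:i + time_step, :])
        y.append(data[i + time_step, :])
    return np.array(X), np.array(y)

# Toy data: 5 rows, 2 columns, values chosen so rows are easy to recognize.
toy = np.arange(10).reshape(5, 2)

X_toy, y_toy = create_dataset(toy, time_step=3)

print(X_toy.shape)  # (2, 3, 2): 5 - 3 = 2 windows of 3 rows each
print(y_toy.shape)  # (2, 2): one target row per window
print(X_toy[0])     # rows 0..2 of toy
print(y_toy[0])     # row 3 of toy
```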


### Step 4: Reshape Input for LSTM
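LSTM input must be 3D: (samples, time steps, features). The `X` returned by `create_dataset` already has this shape, so the reshape below leaves it unchanged and only makes the expected layout explicit.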

In [13]:
X = X.reshape(X.shape[0], X.shape[1], X.shape[2])  # (samples, time steps, features)

### Step 5: Build the LSTM Model

In [14]:
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(50))
model.add(Dropout(0.2))
model.add(Dense(6, activation='linear'))  # 6 outputs for the 6 integers

model.compile(optimizer='adam', loss='mean_squared_error')



### Step 6: Train the Model

In [15]:
model.fit(X, y, epochs=100, batch_size=32)

Epoch 1/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 12ms/step - loss: 660.2502
Epoch 2/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 12ms/step - loss: 368.8923
Epoch 3/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 12ms/step - loss: 254.7154
Epoch 4/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 19ms/step - loss: 188.9733
Epoch 5/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 17ms/step - loss: 144.9053
Epoch 6/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 12ms/step - loss: 123.5822
Epoch 7/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 12ms/step - loss: 110.9963
Epoch 8/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 12ms/step - loss: 105.3929
Epoch 9/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 12ms/step - loss: 101.1131
Epoch 10/100
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7ea0fea40d90>
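The loss reported above is measured on the same samples the model is trained on, so it cannot show how well the model generalizes. One common remedy is to hold out the final portion of the samples for validation without shuffling. The sketch below illustrates this with random stand-in arrays of the same shapes as `X` and `y`; the helper `chronological_split` and the 20% fraction are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def chronological_split(X, y, val_fraction=0.2):
    """Hold out the final `val_fraction` of samples, without shuffling."""
    n_val = int(len(X) * val_fraction)
    split = len(X) - n_val
    return X[:split], y[:split], X[split:], y[split:]

# Random stand-in arrays with the same shapes as X and y above.
rng = np.random.default_rng(0)
X_demo = rng.random((3942, 10, 6))
y_demo = rng.random((3942, 6))

X_train, y_train, X_val, y_val = chronological_split(X_demo, y_demo)
print(X_train.shape, X_val.shape)  # (3154, 10, 6) (788, 10, 6)
```

With Keras, the held-out arrays could then be passed through the `validation_data` argument of `model.fit`, for example `model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, batch_size=32)`. The final samples are the most recent ones only if the data is ordered oldest-first.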

### Step 7: Make Predictions
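To predict the next row, feed the model the final `time_step` rows of `data_scaled`. Note that `data_scaled[-time_step:]` selects the rows at the bottom of the array. Since the file is sorted newest-first, these are the oldest rows, so the data must be in chronological order (oldest first) for this step to forecast the upcoming row: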

In [16]:
# Get the last 'time_step' rows
last_sequence = data_scaled[-time_step:]

last_sequence = last_sequence.reshape((1, last_sequence.shape[0], last_sequence.shape[1]))

# Make a prediction
predicted = model.predict(last_sequence)

predicted_row = scaler.inverse_transform(predicted)  # Rescale back to original range
print("Predicted next row:", predicted_row)

# Extract and round predicted values
predicted_row = predicted_row[0]  # Get the first row
predicted_row_rounded = np.round(predicted_row).astype(int)  # Round to nearest integer
predicted_row_rounded = np.clip(predicted_row_rounded, 1, 56)  # Ensure within range

# Display the predicted row
print("Predicted next row of integers:", predicted_row_rounded)

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 736ms/step
Predicted next row: [[ 5.4333396 11.218539  16.760569  22.336275  28.359337  34.088352 ]]
Predicted next row of integers: [ 5 11 17 22 28 34]
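One simple way to judge a predicted row is to count how many of its numbers also appear in the row that actually follows. The sketch below repeats the rounding and clipping from the cell above and adds such a count. The helpers `to_valid_row` and `count_matches` are hypothetical names, and `hypothetical_actual` is a made-up row used only to show the mechanics, not real data.

```python
import numpy as np

def to_valid_row(raw_prediction, low=1, high=56):
    """Round a real-valued prediction and clip it to the allowed range."""
    rounded = np.round(np.asarray(raw_prediction)).astype(int)
    return np.clip(rounded, low, high)

def count_matches(predicted_row, actual_row):
    """Number of distinct values present in both rows, ignoring position."""
    predicted = set(np.asarray(predicted_row).tolist())
    actual = set(np.asarray(actual_row).tolist())
    return len(predicted & actual)

# The raw prediction printed above, and a made-up "actual" row for illustration.
raw_prediction = [5.4333396, 11.218539, 16.760569, 22.336275, 28.359337, 34.088352]
hypothetical_actual = np.array([5, 12, 17, 30, 41, 50])

row = to_valid_row(raw_prediction)
print(row)                                      # [ 5 11 17 22 28 34]
print(count_matches(row, hypothetical_actual))  # 2
```

Averaging this count over a held-out set of rows would give a more direct measure of usefulness than the mean squared error alone.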


## Explanation of Algorithm Selection
1.   **Sequential Data Handling**: LSTMs are specifically designed to handle sequences of data and can remember previous inputs, making them ideal for time-series prediction tasks.
2.   **Memory**: LSTMs can learn long-term dependencies in sequences, which is beneficial when trying to predict future rows based on past rows.
3.   **Flexibility**: By adjusting parameters like the number of LSTM layers, time steps, and the number of epochs, you can tune the model for better performance.
4.   **Output Shape**: The output layer’s shape directly corresponds to the structure of your target data, making it straightforward to predict multiple integers.











## Conclusion
This approach should give you a solid start in predicting the next row of integers based on your dataset. Fine-tuning the model and experimenting with different configurations can help improve its accuracy.