# LSTM Lotto Numbers

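This notebook uses an LSTM model to predict lottery results. The stated target is EuroMillions (results: https://www.lottery.ie/draw-games/results/view?game=euromillions&draws=0); the data file loaded below, `MegaMillionsQ123WxExtrasPrevious.csv`, is a Mega Millions file judging by its name and columns.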

## The libraries we will work with

In [1]:
import os
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

print(pd.__version__)
print(np.__version__)

1.3.5
1.21.6


## Prepare/Generate data set

First, we load the latest lottery results from a CSV file stored on Google Drive.

In [19]:
filename = 'MegaMillionsQ123WxExtrasPrevious.csv'

from google.colab import drive

# Mount Google Drive so the CSV file can be read from Colab
drive.mount('/content/drive/')
output_directory = "/content/drive/My Drive/"
data_directory = '/content/drive/MyDrive/Colab Notebooks/LottoPredictions/data/'
lotto = pd.read_csv(data_directory + filename, index_col='Date')

# Drop the columns that are not used as features
lotto = lotto.drop(['BB', 'TMP', 'TMP2', 'TMP3'], axis=1)
print(lotto)

# df is the same object as lotto; renaming the columns turns B6 into MB
df = lotto
df.columns = ['B1', 'B2', 'B3', 'B4', 'B5', 'MB', 'LC', 'RH', 'PD']
print(df)

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).
            B1  B2  B3  B4  B5  B6  LC   RH    PD
Date                                             
2023-01-06   3  20  46  59  63  13   3   64  2916
2023-01-03  25  29  33  41  44  18   2   97  2902
2022-12-30   1   3   6  44  51   7   2   90  2892
2022-12-27   9  13  36  59  61  11   1   54  2915
2022-12-23  15  21  32  38  62   8   1   56  2914
...         ..  ..  ..  ..  ..  ..  ..  ...   ...
2017-11-07   1  54  60  68  69  11   3  100  2891
2017-11-03  10  22  42  61  69   3   3   93  2905
2017-10-31   6  28  31  52  53  12   2   69  2904
2017-10-27  17  27  41  51  52  13   2   58  2888
2017-10-24  20  24  34  56  64   6   1   71  2890

[544 rows x 9 columns]
            B1  B2  B3  B4  B5  MB  LC   RH    PD
Date                                             
2023-01-06   3  20  46  59  63  13   3   64  2916
2023-01-03  25  29  33  41  44  18   2   97  
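The printed frame is ordered newest draw first. Sequence models are usually fed rows in chronological order, so one option is to sort by date before building windows. A minimal sketch on a three-row toy frame follows; the names `demo` and `chronological` are illustrative, and only two columns are copied from the output above.

```python
import pandas as pd

# Toy frame: three draws copied from the printed output, newest first.
rows = {
    "Date": ["2023-01-06", "2023-01-03", "2022-12-30"],
    "B1": [3, 25, 1],
    "MB": [13, 18, 7],
}
demo = pd.DataFrame(rows).set_index("Date")

# Parse the index as dates so sorting is by time, not by string.
demo.index = pd.to_datetime(demo.index)

# Oldest draw first; the original frame is left untouched.
chronological = demo.sort_index()
print(chronological)
```

Whether to sort is a modelling choice; the rest of the notebook works on the frame in file order.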

The winning numbers look like this. B1 to B5 are the five main balls, each in the range 1 to 70; MB is the Mega Ball, in the range 1 to 25; LC, RH and PD are extra fields:

In [20]:
df.head()

            B1  B2  B3  B4  B5  MB  LC   RH    PD
Date                                             
2023-01-06   3  20  46  59  63  13   3   64  2916
2023-01-03  25  29  33  41  44  18   2   97  2902
2022-12-30   1   3   6  44  51   7   2   90  2892
2022-12-27   9  13  36  59  61  11   1   54  2915
2022-12-23  15  21  32  38  62   8   1   56  2914
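Before training, it can be worth checking that every draw falls inside the expected ranges. The sketch below assumes main balls from 1 to 70 and a Mega Ball from 1 to 25 (inclusive); `out_of_range_rows` is a hypothetical helper, and the sample rows are the first three draws shown above.

```python
import numpy as np

# Assumed valid ranges (inclusive); adjust if the game rules differ.
MAIN_RANGE = (1, 70)
MEGA_RANGE = (1, 25)

def out_of_range_rows(draws):
    """Return the indices of rows with any ball outside its assumed range.

    `draws` holds one row per draw: five main balls followed by the Mega Ball.
    """
    draws = np.asarray(draws)
    main, mega = draws[:, :5], draws[:, 5]
    bad_main = ((main < MAIN_RANGE[0]) | (main > MAIN_RANGE[1])).any(axis=1)
    bad_mega = (mega < MEGA_RANGE[0]) | (mega > MEGA_RANGE[1])
    return np.flatnonzero(bad_main | bad_mega)

sample = [
    [3, 20, 46, 59, 63, 13],
    [25, 29, 33, 41, 44, 18],
    [1, 3, 6, 44, 51, 7],
]
print(out_of_range_rows(sample))
```

With the notebook's frame, the same check could be run on `df[['B1', 'B2', 'B3', 'B4', 'B5', 'MB']].values`.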


Next, we standardize the data: `StandardScaler` rescales each column to zero mean and unit variance.

In [21]:
scaler = StandardScaler().fit(df.values)
transformed_dataset = scaler.transform(df.values)
transformed_df = pd.DataFrame(data=transformed_dataset, index=df.index)
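For intuition, the transformation that `StandardScaler` applies can be written out by hand as a per-column z-score. The sketch below uses plain NumPy on a small made-up array (two columns, five rows) and also shows the inverse step that is needed later to turn predictions back into ball numbers.

```python
import numpy as np

# Illustrative values only: two feature columns, five rows.
values = np.array(
    [[3, 13], [25, 18], [1, 7], [9, 11], [15, 8]],
    dtype=float,
)

# Per-column mean and population standard deviation (ddof=0),
# which is the estimator StandardScaler uses.
mean = values.mean(axis=0)
std = values.std(axis=0)

scaled = (values - mean) / std   # same role as scaler.transform
restored = scaled * std + mean   # same role as scaler.inverse_transform

print(scaled.mean(axis=0))
print(scaled.std(axis=0))
```

Each scaled column has mean 0 and standard deviation 1 up to floating-point error, and `restored` recovers the original values.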

Let's define the hyperparameters of our model.

In [22]:
number_of_rows = df.values.shape[0]  # all draws in the data set
window_length = 5  # number of past draws used for each prediction
number_of_features = df.values.shape[1]  # columns per draw: 6 balls plus 3 extra fields

Create the training set and a label for each sample. The training array must have the shape Keras expects for an LSTM: (rows, window size, features).

In [23]:
train = np.empty([number_of_rows - window_length, window_length, number_of_features], dtype=float)
label = np.empty([number_of_rows - window_length, number_of_features], dtype=float)

# Rows follow the file order (newest draw first).
# Sample i is rows i .. i+window_length-1; its label is the row right after the window.
for i in range(number_of_rows - window_length):
    train[i] = transformed_df.iloc[i:i + window_length, 0:number_of_features]
    label[i] = transformed_df.iloc[i + window_length, 0:number_of_features]
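The same windowing can be expressed without an explicit loop. The sketch below uses NumPy's `sliding_window_view` (available from NumPy 1.20) on a small toy array; `make_windows` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(data, window_length):
    """Split a (rows, features) array into LSTM samples and labels.

    Returns X with shape (rows - window_length, window_length, features)
    and y with shape (rows - window_length, features), where y[i] is the
    row that immediately follows window X[i].
    """
    # Shape here is (rows - window_length + 1, features, window_length).
    windows = sliding_window_view(data, window_length, axis=0)
    # Move the window axis ahead of the feature axis.
    windows = windows.transpose(0, 2, 1)
    X = windows[:-1]            # the last window has no following row
    y = data[window_length:]    # the row after each remaining window
    return X, y

# Toy data: 8 rows, 3 features.
demo = np.arange(24, dtype=float).reshape(8, 3)
X, y = make_windows(demo, 5)
print(X.shape, y.shape)
```

Applied to `transformed_df.values` with `window_length = 5`, this should yield arrays of the same shapes as `train` and `label`. Note that `X` is a read-only view, so copy it if it needs to be modified.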

Shapes

In [24]:
train.shape

(539, 5, 9)

In [25]:
label.shape

(539, 9)

In [26]:
train[0]

array([[-0.9229601 , -0.23287908,  0.90462503,  1.00723812,  0.45089026,
        -0.06687968,  0.44885837, -0.43059417,  1.14955667],
       [ 1.53771855,  0.53388351, -0.09780496, -0.41249351, -1.40141569,
         0.6327847 , -0.44557003,  1.72863111,  0.14161014],
       [-1.14665815, -1.68120842, -2.17977494, -0.17587158, -0.71898718,
        -0.90647695, -0.44557003,  1.27061363, -0.57835168],
       [-0.25186592, -0.82924999,  0.13352504,  1.00723812,  0.25591069,
        -0.34674544, -1.33999843, -1.08490486,  1.07756049],
       [ 0.41922825, -0.14768324, -0.17491496, -0.64911545,  0.35340048,
        -0.76654407, -1.33999843, -0.95404272,  1.00556431]])

In [27]:
train[1]

array([[ 1.53771855,  0.53388351, -0.09780496, -0.41249351, -1.40141569,
         0.6327847 , -0.44557003,  1.72863111,  0.14161014],
       [-1.14665815, -1.68120842, -2.17977494, -0.17587158, -0.71898718,
        -0.90647695, -0.44557003,  1.27061363, -0.57835168],
       [-0.25186592, -0.82924999,  0.13352504,  1.00723812,  0.25591069,
        -0.34674544, -1.33999843, -1.08490486,  1.07756049],
       [ 0.41922825, -0.14768324, -0.17491496, -0.64911545,  0.35340048,
        -0.76654407, -1.33999843, -0.95404272,  1.00556431],
       [-0.9229601 , -1.59601258, -0.09780496, -0.80686341, -0.62149739,
         0.49285183,  1.34328677,  0.55087187,  1.72552612]])

In [28]:
label[0]

array([-0.9229601 , -1.59601258, -0.09780496, -0.80686341, -0.62149739,
        0.49285183,  1.34328677,  0.55087187,  1.72552612])

In [29]:
label[1]

array([-0.36371495,  1.04505858,  0.44196503,  0.53399424, -0.23153824,
       -0.34674544,  1.34328677, -1.54292234, -0.65034786])

## The LSTM model

In [30]:
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dense, Dropout

batch_size = 25  # not used below: model.fit is called with batch_size=64

Training

In [31]:
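# Use one path for both loading and saving so a saved model is found on the next run
model_path = 'input/' + filename + '.h5'

if os.path.exists(model_path):
    model = load_model(model_path)
else:
    # Two stacked LSTM layers with dropout, then one output per feature
    model = Sequential()
    model.add(LSTM(32,
                   input_shape=(window_length, number_of_features),
                   return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(32, return_sequences=False))
    model.add(Dropout(0.2))
    model.add(Dense(number_of_features))
    model.compile(loss='mse', optimizer='rmsprop')
    model.fit(train, label, batch_size=64, epochs=10000)
    # Create the target folder if needed before saving
    os.makedirs(os.path.dirname(model_path), exist_ok=True)
    model.save(model_path)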

Streaming output truncated to the last 5000 lines.
Epoch 7502/10000
...
Epoch 7556/10000


## Prediction

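As a last step, we predict the next draw from the trained model and past results. The model was trained on windows of 5 draws (`window_length`), but the cell below passes the last 69 rows of `df`; because `df` is ordered newest first, these are the 69 oldest draws in the file. The CSV export is left commented out.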

In [52]:
to_predict=df.iloc[-69:]
scaled_to_predict = scaler.transform(to_predict)



In [53]:
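# Add a batch dimension, predict, then undo the scaling.
scaled_predicted_output_1 = model.predict(np.array([scaled_to_predict]))
# astype(int) truncates toward zero rather than rounding.
data = scaler.inverse_transform(scaled_predicted_output_1).astype(int)
# This rebinds df to the one-row prediction frame.
df = pd.DataFrame(data, columns=['B1', 'B2', 'B3', 'B4', 'B5', 'MB', 'LC', 'RH', 'PD'])
#df.to_csv(''+filename+'.csv', index=False)
df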



   B1  B2  B3  B4  B5  MB  LC  RH    PD
0  10  20  32  43  59  13   3  71  2899
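The conversion with `astype(int)` truncates toward zero, and nothing constrains the network's output to valid ball numbers. One possible post-processing step is to round to the nearest integer and clip to the allowed range. The sketch below uses a hypothetical helper `to_valid_numbers`, made-up raw values, and an assumed range of 1 to 70.

```python
import numpy as np

def to_valid_numbers(raw, low, high):
    """Round raw model outputs to integers and clip them to [low, high]."""
    rounded = np.rint(raw).astype(int)
    return np.clip(rounded, low, high)

# Made-up raw outputs, chosen to show truncation and clipping.
raw_prediction = np.array([10.7, 20.2, 32.9, 70.6, -0.4])

truncated = raw_prediction.astype(int)            # drops the fractional part
cleaned = to_valid_numbers(raw_prediction, 1, 70) # rounds, then clips

print(truncated)
print(cleaned)
```

The Mega Ball column would need its own bounds, and the extra fields (LC, RH, PD) are left out here because their valid ranges are not stated in this notebook.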


## Conclusion

We developed an LSTM model to forecast lottery draws. Thanks for reading.