# Model Predictions
The prediction process starts by setting up the environment: locating the data and importing the relevant functions. The data needed for inference is then extracted from the SQLite database using a query stored in an external SQL file. The LabelEncoder and trained pipeline saved during modeling are loaded back into the environment and, together with the inference data, used to generate a prediction and associated probabilities for each game. The encoded predictions are converted back to their original labels, and the results are assembled into a DataFrame that holds, for each `game_id`, the predicted outcome for the home team and the probabilities of the home team winning or losing. Keeping each step in its own function makes the predictions clear and reproducible.

## Set up environment
This section sets up the environment for the prediction notebook. It imports `sys` and `pathlib`, then appends the `functions` directory to the system path so that the `prediction_functions` module, which contains the functions used throughout the prediction process, can be imported. It then locates the project root and builds the path to the SQLite database that stores our footy tipping data.

In [1]:
import sys
import pathlib

# Make the local "functions" directory importable (relative to the working directory)
sys.path.append("functions")
import prediction_functions as pf

# Project root: two levels above the current working directory
project_root = pathlib.Path().absolute().parent.parent

# Path to the SQLite database inside the project's data directory
db_path = project_root / "data" / "footy-tipper-db.sqlite"

## Get upcoming match data
The `get_inference_data` function retrieves the data for model inference from the SQLite database. It takes two arguments: `db_path`, the path to the SQLite database, and `sql_file`, the path to the SQL file containing the query. The function opens a connection to the database, reads the query from the SQL file, and executes it, storing the result in a DataFrame. It then closes the database connection and returns the DataFrame for use in the prediction steps that follow.
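
The source of `get_inference_data` is not shown in this notebook. The steps described above could be sketched roughly as follows; the name `get_inference_data_sketch` is hypothetical, and the use of `sqlite3` with `pandas.read_sql_query` is an assumption about how the real function is written.

```python
import sqlite3

import pandas as pd


def get_inference_data_sketch(db_path, sql_file):
    """Hypothetical stand-in for pf.get_inference_data (not the project's code)."""
    # Read the query text from the external SQL file
    with open(sql_file, "r") as f:
        query = f.read()

    # Open the connection, run the query into a DataFrame,
    # and close the connection even if the query fails
    conn = sqlite3.connect(str(db_path))
    try:
        return pd.read_sql_query(query, conn)
    finally:
        conn.close()
```

Wrapping the query in `try`/`finally` means the connection is released whether or not the query succeeds.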

In [5]:
inference_data = pf.get_inference_data(db_path, 'sql/inference_data.sql')
inference_data

Unnamed: 0,game_id,round_id,round_name,game_number,game_state_name,start_time,start_time_utc,venue_name,city,crowd,...,matchup_form,state_of_origin,home_elo,away_elo,elo_diff,home_elo_prob,away_elo_prob,elo_draw_prob,elo_prob_diff,home_ground_advantage
0,20231110000.0,21.0,Round 21,1.0,Pre Game,1689883000.0,1689847000.0,WIN Stadium,Wollongong,,...,-3.0,0.0,1461.378294,1445.906775,15.47152,0.505018,0.455247,0.039735,0.049772,2.181933
1,20231110000.0,21.0,Round 21,2.0,Pre Game,1689970000.0,1689926000.0,Go Media Stadium,Auckland,,...,-1.0,0.0,1542.588164,1489.489972,53.098192,0.568196,0.412449,0.019355,0.155746,3.869933
2,20231110000.0,21.0,Round 21,3.0,Pre Game,1689970000.0,1689934000.0,Sunshine Coast Stadium,Sunshine Coast,,...,1.0,0.0,1510.479292,1512.441039,-1.961747,0.49104,0.489373,0.019587,0.001667,3.719133
3,20231110000.0,21.0,Round 21,4.0,Pre Game,1690038000.0,1690002000.0,Cbus Super Stadium,Gold Coast,,...,-3.0,0.0,1489.984523,1479.009827,10.974695,0.509283,0.47113,0.019587,0.038153,0.235867
4,20231110000.0,21.0,Round 21,5.0,Pre Game,1690047000.0,1690011000.0,McDonald Jones Stadium,Newcastle,,...,-5.0,0.0,1514.952325,1528.12371,-13.171385,0.475229,0.505184,0.019587,-0.029955,-4.1725
5,20231110000.0,21.0,Round 21,6.0,Pre Game,1690054000.0,1690018000.0,Queensland Country Bank Stadium,Townsville,,...,1.0,0.0,1541.04852,1521.494052,19.554468,0.510642,0.449623,0.039735,0.061019,0.804033
6,20231110000.0,21.0,Round 21,7.0,Pre Game,1690121000.0,1690085000.0,BlueBet Stadium,Penrith,,...,5.0,0.0,1545.681836,1435.667133,110.014703,0.633109,0.331177,0.035714,0.301932,18.747767
7,20231110000.0,21.0,Round 21,8.0,Pre Game,1690128000.0,1690092000.0,PointsBet Stadium,Sydney,,...,-3.0,0.0,1514.378692,1490.119806,24.258887,0.517111,0.443154,0.039735,0.073957,7.6497


## Load the Footy Tipper model
The `load_models` function serves to load the saved LabelEncoder and Pipeline objects from their respective files, facilitating the prediction phase of the workflow. It accepts `project_root` as an argument, which is the root path of the project. The function locates the saved LabelEncoder and Pipeline files within the `models` directory of the project root, and loads them for further use. The loaded LabelEncoder and Pipeline objects are then returned, ready for use in generating predictions from the inference data.
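
The notebook does not show how the two objects were serialized. As a minimal sketch, assuming they were saved with `pickle` under the hypothetical file names `label_encoder.pkl` and `footy_tipper.pkl` (the real project may use `joblib` or different names), loading them from the `models` directory might look like this:

```python
import pickle
from pathlib import Path


def load_models_sketch(project_root,
                       encoder_file="label_encoder.pkl",   # hypothetical file name
                       pipeline_file="footy_tipper.pkl"):  # hypothetical file name
    """Hypothetical stand-in for pf.load_models (not the project's code)."""
    models_dir = Path(project_root) / "models"

    # Load the LabelEncoder saved during modeling
    with open(models_dir / encoder_file, "rb") as f:
        label_encoder = pickle.load(f)

    # Load the trained pipeline saved during modeling
    with open(models_dir / pipeline_file, "rb") as f:
        pipeline = pickle.load(f)

    return label_encoder, pipeline
```

Pickled files should only be loaded from a trusted source, since unpickling can execute arbitrary code.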

In [6]:
label_encoder, footy_tipper = pf.load_models(project_root)

## Make predictions
The `model_predictions` function predicts outcomes using the trained model encapsulated within the pipeline. It takes as inputs the trained pipeline, the data on which predictions will be made (`inference_data`), and the LabelEncoder object. The function first generates encoded predictions and probability estimates, then decodes the predictions to their original labels using the LabelEncoder. It returns a DataFrame containing the `game_id`, the predicted `home_team_result`, and the probabilities of the home team winning or losing.
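
The predict, decode, and assemble steps described above could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the name `model_predictions_sketch` is hypothetical, the labels are assumed to be exactly `"Win"` and `"Loss"` as in the output below, and the columns returned by `predict_proba` are assumed to follow the order of `label_encoder.classes_`.

```python
import numpy as np
import pandas as pd


def model_predictions_sketch(pipeline, inference_data, label_encoder):
    """Hypothetical stand-in for pf.model_predictions (not the project's code)."""
    # Encoded class predictions and per-class probability estimates
    encoded = pipeline.predict(inference_data)
    proba = np.asarray(pipeline.predict_proba(inference_data))

    # Decode the predictions back to the original labels
    decoded = label_encoder.inverse_transform(encoded)

    # Assumption: probability column i corresponds to label_encoder.classes_[i]
    classes = list(label_encoder.classes_)
    win_idx = classes.index("Win")
    loss_idx = classes.index("Loss")

    return pd.DataFrame({
        "game_id": inference_data["game_id"].to_numpy(),
        "home_team_result": decoded,
        "home_team_win_prob": proba[:, win_idx],
        "home_team_lose_prob": proba[:, loss_idx],
    })
```

Looking up the column positions by label, rather than hard-coding `0` and `1`, avoids silently swapping the win and loss probabilities if the encoder orders the classes differently.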

In [7]:
predictions_df = pf.model_predictions(footy_tipper, inference_data, label_encoder)
predictions_df


Unnamed: 0,game_id,home_team_result,home_team_win_prob,home_team_lose_prob
0,20231110000.0,Win,0.57026,0.42974
1,20231110000.0,Win,0.658045,0.341955
2,20231110000.0,Loss,0.445687,0.554313
3,20231110000.0,Win,0.50795,0.49205
4,20231110000.0,Loss,0.3576,0.6424
5,20231110000.0,Win,0.621949,0.378051
6,20231110000.0,Win,0.810352,0.189648
7,20231110000.0,Win,0.768879,0.231121


## Write predictions to the database
The final step writes the predictions for the current week's NRL matches to the SQLite database. `save_predictions_to_db` takes the predictions DataFrame, the database path, and the paths to two SQL files, `sql/create_table.sql` and `sql/insert_into_table.sql`. Judging by their names, the first creates the predictions table and the second inserts the rows. Once stored, the predictions are available for later analysis and use.

In [8]:
pf.save_predictions_to_db(predictions_df, db_path, 'sql/create_table.sql', 'sql/insert_into_table.sql')
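
The contents of `save_predictions_to_db` and of the two SQL files are not shown in this notebook. A minimal sketch of what such a function might do is given below. It assumes the first file holds a single `CREATE TABLE IF NOT EXISTS` statement and the second an `INSERT` statement with one `?` placeholder per DataFrame column, in column order. The name `save_predictions_sketch` is hypothetical.

```python
import sqlite3

import pandas as pd


def save_predictions_sketch(predictions_df, db_path, create_table_sql, insert_sql):
    """Hypothetical stand-in for pf.save_predictions_to_db (not the project's code)."""
    # Read both statements from their SQL files
    with open(create_table_sql, "r") as f:
        create_stmt = f.read()
    with open(insert_sql, "r") as f:
        insert_stmt = f.read()

    conn = sqlite3.connect(str(db_path))
    try:
        # Assumption: a single CREATE TABLE IF NOT EXISTS statement
        conn.execute(create_stmt)
        # Assumption: one "?" placeholder per DataFrame column, in column order
        rows = predictions_df.itertuples(index=False, name=None)
        conn.executemany(insert_stmt, rows)
        conn.commit()
    finally:
        conn.close()
```

Keeping the SQL in separate files, as the notebook's call does, lets the table schema change without touching the Python code.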