# Model Training
Welcome to the 'Model Training and Prediction' notebook, a key part of our project's data science pipeline. It walks through the model development process end to end. The pipeline takes the training data and fits three types of model to it: Random Forest, Gradient Boosted Tree, and XGBoost. Categorical variables are encoded first, and Recursive Feature Elimination (RFE) can be applied for feature selection. Hyperparameters are then tuned with a genetic algorithm that runs together with a cross-validation routine. The best model is selected on the optimization metric set in `training_config` (the F1 score, which balances precision and recall, in the classification setup). Finally, the selected model predicts the outcomes of the current week's round of NRL matches. The process is iterative: earlier stages can be revisited depending on the model's performance.

## Set up Environment
This cell sets up the environment for the model training pipeline. It imports `os`, `sys`, and `pathlib`, the standard-library modules used here for looking up the working directory, managing the import path, and handling file paths.

The parent of the current working directory is then inserted at the front of `sys.path`. This makes the `pipeline` package importable, so the custom modules `training_config`, `modelling_functions`, and `model_properties` can be imported from `pipeline.common.model_training`, along with `prediction_functions` from `pipeline.common.model_prediciton`. These modules contain the functions and configuration settings used in the later stages of data preprocessing, model training, and prediction.

The `project_root` variable is defined with `pathlib` as the absolute path of the parent of the working directory.

The path to the SQLite database `footy-tipper-db.sqlite`, located in the `data` directory under the project root, is built from `project_root` in the cells that read from or write to the database.

In [1]:
# import libraries
import os
import sys
import pathlib

cwd = os.getcwd()

# get the parent directory
parent_dir = os.path.dirname(cwd)

# add the parent directory to the system path
sys.path.insert(0, parent_dir)

# Get to the root directory
project_root = pathlib.Path().absolute().parent

# import functions from common like this:
from pipeline.common.model_training import (
    training_config as tc,
    modelling_functions as mf,
    model_properties as mp
)

from pipeline.common.model_prediciton import prediction_functions as pf

## Get data
We start by connecting to the SQLite database `footy-tipper-db.sqlite` in the project's `data` directory and running the SQL query stored in `pipeline/common/sql/training_data.sql` to extract the training data. The result is loaded into a pandas DataFrame, `data`, which is the basis for the modelling below. Once the data has been extracted, the database connection is closed to release the resource.
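
The body of `mf.get_training_data` is not shown in this notebook, so the following is only a minimal sketch of the pattern described above, assuming the query lives in a plain `.sql` file. The helper name `load_query_to_dataframe` is hypothetical; `contextlib.closing` is used so the connection is closed even if the query fails.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd


def load_query_to_dataframe(db_path, sql_file) -> pd.DataFrame:
    """Run the query stored in sql_file against the SQLite database at db_path.

    Hypothetical helper for illustration; not the project's implementation.
    """
    query = Path(sql_file).read_text()
    # closing() guarantees con.close() is called, even when the query raises
    with closing(sqlite3.connect(str(db_path))) as con:
        return pd.read_sql_query(query, con)
```

Under these assumptions, it would be called with the same two paths as the cell below.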

In [2]:
data = mf.get_training_data(
    db_path = project_root / "data" / "footy-tipper-db.sqlite", 
    sql_file = project_root / 'pipeline/common/sql/training_data.sql')

data

Unnamed: 0,game_id,round_id,round_name,game_number,game_state_name,start_time,start_time_utc,venue_name,city,crowd,...,away_prev_result_diff,prev_result_diff,home_elo,away_elo,elo_diff,home_elo_prob,away_elo_prob,elo_draw_prob,elo_prob_diff,home_ground_advantage
0,2.018111e+10,1.0,Round 1,1.0,Final,1.520540e+09,1.520500e+09,Netstrata Jubilee Stadium,Sydney,14457.0,...,0.0,0.0,1498.871776,1510.436561,-11.564785,0.467857,0.492421,0.039722,-0.024564,3.493947
1,2.018111e+10,1.0,Round 1,2.0,Final,1.520618e+09,1.520579e+09,McDonald Jones Stadium,Newcastle,23516.0,...,0.0,0.0,1484.766828,1496.167248,-11.400420,0.468084,0.492194,0.039722,-0.024110,6.932228
2,2.018111e+10,1.0,Round 1,3.0,Final,1.520622e+09,1.520586e+09,1300SMILES Stadium,Townsville,15900.0,...,0.0,0.0,1506.226230,1503.248120,2.978110,0.487951,0.472327,0.039722,0.015624,0.955843
3,2.018111e+10,1.0,Round 1,4.0,Final,1.520699e+09,1.520660e+09,Accor Stadium,Sydney,18243.0,...,0.0,0.0,1493.848754,1500.976808,-7.128054,0.473986,0.486292,0.039722,-0.012306,-2.978035
4,2.018111e+10,1.0,Round 1,5.0,Final,1.520698e+09,1.520669e+09,Other,Perth,38824.0,...,0.0,0.0,1490.493737,1483.488252,7.005485,0.493514,0.466764,0.039722,0.026750,4.315751
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1317,2.024111e+10,18.0,Round 18,4.0,Final,1.720278e+09,1.720242e+09,Accor Stadium,Sydney,27223.0,...,60.0,-59.0,1511.360142,1488.692560,22.667582,0.521733,0.450911,0.027356,0.070822,1.936236
1318,2.024111e+10,18.0,Round 18,5.0,Final,1.720287e+09,1.720251e+09,Leichhardt Oval,Sydney,10311.0,...,-6.0,30.0,1457.994294,1528.130944,-70.136650,0.400985,0.591264,0.007752,-0.190279,-8.892546
1319,2.024111e+10,18.0,Round 18,6.0,Final,1.720294e+09,1.720258e+09,Queensland Country Bank Stadium,Townsville,18787.0,...,14.0,-44.0,1502.064761,1506.824271,-4.759509,0.477259,0.483019,0.039722,-0.005761,1.101418
1320,2.024111e+10,18.0,Round 18,7.0,Final,1.720361e+09,1.720325e+09,Allianz Stadium,Sydney,23388.0,...,16.0,18.0,1551.887506,1501.545036,50.342470,0.565786,0.416973,0.017241,0.148813,12.045402


In [3]:
# Set the random seed for reproducibility
random_seed = 42

# Define the test size proportion
test_size = 0.2

# Randomly shuffle the DataFrame and split
training_data = data.sample(frac=1 - test_size, random_state=random_seed)
test_data = data.drop(training_data.index)

## Modelling
During the modelling phase, the `train_and_select_best_model` function, part of our `modelling_functions` module, is invoked. This function initiates the training of three distinct models: XGBoost, Random Forest, and Gradient Boosting Classifier. It takes as input the footy tipping data, predictor variables, the outcome variable, and several configuration settings like whether to use Recursive Feature Elimination (RFE), the number of cross-validation folds, and the optimization metric, all sourced from the `training_config` module.

The function first identifies categorical columns in the feature set for one-hot encoding, creating dummy variables for categorical features. Depending on the choice of using RFE, a feature elimination step may be included in the pipeline. Each model subsequently undergoes hyperparameter tuning using a genetic algorithm, facilitated by the `GASearchCV` function.
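
To make the genetic-algorithm idea concrete, here is a toy sketch in plain Python. It is not `GASearchCV` and not the project's code: `toy_genetic_search`, its parameters, and the example fitness function are all hypothetical, and a real search would score each candidate by cross-validating a model. It only illustrates the loop of selection, crossover, and mutation, and why the best fitness per generation (compare the `fitness_max` column in the training logs) tends to improve over generations.

```python
import random


def toy_genetic_search(fitness, bounds, population_size=20, generations=15, seed=42):
    """Maximise fitness(x) for one float hyperparameter x within bounds.

    Hypothetical illustration only; a real search would score each candidate
    by cross-validating a model instead of calling a cheap function.
    """
    rng = random.Random(seed)
    low, high = bounds
    population = [rng.uniform(low, high) for _ in range(population_size)]
    best_per_generation = []

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))

        parents = ranked[: population_size // 2]  # selection: keep the fitter half
        children = [ranked[0]]                    # elitism: carry the best forward
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2                          # crossover: blend two parents
            child += rng.gauss(0, 0.05 * (high - low))   # mutation: small random nudge
            children.append(min(max(child, low), high))  # clip back into bounds
        population = children

    return max(population, key=fitness), best_per_generation


# Stand-in for a negated cross-validation error that peaks at x = 0.3
best_x, history = toy_genetic_search(lambda x: -(x - 0.3) ** 2, bounds=(0.0, 1.0))
```

Because the best individual is copied unchanged into the next generation, the best fitness in this sketch can never get worse from one generation to the next.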

All the models are then trained and evaluated through cross-validation. The best model, `footy_tipper`, is the one that performs best on the chosen optimization metric. In the classification setup, the function also returns the `LabelEncoder` (`label_encoder`) used to encode the categorical target variable; it is specific to the best-performing model. The selected model, encapsulated in a pipeline together with its pre-processing steps, is then ready for the prediction phase.
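
As a rough picture of what such a pipeline can look like, the sketch below chains one-hot encoding, an optional RFE step, and a final estimator with scikit-learn. It is an assumption-laden illustration, not the contents of `train_and_select_best_model`: the function `build_pipeline`, the step names, the choice of `RandomForestRegressor`, and the synthetic demo data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_pipeline(X: pd.DataFrame, use_rfe: bool = True, n_features: int = 3) -> Pipeline:
    """Hypothetical pipeline: one-hot encode, optionally run RFE, then fit a model."""
    # Treat every non-numeric column as categorical
    categorical = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
    encoder = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",  # numeric columns pass through unchanged
        sparse_threshold=0,       # always return a dense array
    )
    steps = [("encode", encoder)]
    if use_rfe:
        selector = RFE(
            RandomForestRegressor(n_estimators=20, random_state=0),
            n_features_to_select=n_features,
        )
        steps.append(("rfe", selector))
    steps.append(("model", RandomForestRegressor(n_estimators=50, random_state=0)))
    return Pipeline(steps)


# Synthetic demo data (made up for this sketch, not taken from the database)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "venue_name": rng.choice(["Stadium A", "Stadium B", "Stadium C"], size=60),
    "elo_diff": rng.normal(0, 30, size=60),
    "home_ground_advantage": rng.normal(3, 2, size=60),
})
target = 20 + 0.1 * demo["elo_diff"] + rng.normal(0, 2, size=60)

pipeline = build_pipeline(demo).fit(demo, target)
demo_predictions = pipeline.predict(demo.head())
```

Keeping the encoder and the feature selector inside the pipeline means the same transformations are re-applied automatically at prediction time.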

### Basic Model

In [4]:
# footy_tipper, label_encoder = mf.train_and_select_best_model(
#     training_data, tc.predictors, tc.outcome_var,
#     tc.use_rfe, tc.num_folds, tc.opt_metric
# )
# footy_tipper

### Poisson Modelling

In [5]:
home_model = mf.train_and_select_best_model(
    training_data, tc.predictors, 'team_final_score_home',
    tc.use_rfe, tc.num_folds, tc.opt_metric
)
home_model


Model training: XGBRegressor
gen	nevals	fitness 	fitness_std	fitness_max	fitness_min 
0  	200   	-23571.1	316890     	-5.90789   	-4.49038e+06
1  	298   	-7.58138	1.32716    	-5.87961   	-18.5735    
2  	294   	-6.73279	0.528861   	-5.74645   	-8.80925    
3  	290   	-6.27363	0.399814   	-5.71984   	-7.59178    
4  	270   	-5.95515	0.208043   	-5.69311   	-7.05649    
5  	277   	-5.83241	0.106833   	-5.70146   	-6.40773    
6  	287   	-5.76516	0.0532134  	-5.70146   	-6.00056    
7  	286   	-5.73288	0.032258   	-5.65223   	-5.93455    
8  	263   	-5.71423	0.0197556  	-5.65223   	-5.77334    
9  	286   	-5.70235	0.0177704  	-5.65223   	-5.74563    
10 	281   	-5.6935 	0.0178977  	-5.65223   	-5.74966    
11 	276   	-5.68295	0.019507   	-5.65171   	-5.83877    
12 	291   	-5.6734 	0.0112247  	-5.65223   	-5.71227    
13 	286   	-5.66902	0.0135338  	-5.65171   	-5.7629     
14 	277   	-5.66317	0.0110642  	-5.65171   	-5.70327    
15 	270   	-5.65702	0.00814782 	-5.65071   	-5.68426    
1

In [6]:
away_model = mf.train_and_select_best_model(
    training_data, tc.predictors, 'team_final_score_away',
    tc.use_rfe, tc.num_folds, tc.opt_metric
)
away_model


Model training: XGBRegressor
gen	nevals	fitness     	fitness_std	fitness_max	fitness_min 
0  	200   	-6.25085e+15	8.8179e+16 	-5.65255   	-1.25017e+18
1  	302   	-7.39025    	1.31853    	-5.65933   	-17.9068    
2  	273   	-6.48901    	0.614588   	-5.54104   	-8.15123    
3  	299   	-6.06594    	0.392808   	-5.53744   	-7.61085    
4  	271   	-5.7876     	0.219731   	-5.53111   	-6.6908     
5  	276   	-5.62934    	0.089644   	-5.50109   	-5.95466    
6  	284   	-5.58198    	0.0699065  	-5.44709   	-6.20145    
7  	291   	-5.54795    	0.0394358  	-5.43144   	-5.69928    
8  	268   	-5.52419    	0.0340762  	-5.43144   	-5.69928    
9  	273   	-5.49784    	0.0318223  	-5.43144   	-5.57064    
10 	275   	-5.47423    	0.0303473  	-5.4295    	-5.53702    
11 	268   	-5.4536     	0.0223972  	-5.43094   	-5.53022    
12 	276   	-5.44001    	0.0165481  	-5.4295    	-5.60193    
13 	288   	-5.4328     	0.00410854 	-5.4287    	-5.44709    
14 	288   	-5.43105    	0.00159789 	-5.42846   	-5.4470

In [7]:
def predict_scores(model, data):
    """
    Predict the expected scores using the trained model.
    
    Args:
        model (Pipeline): The trained model.
        data (DataFrame): The input data for predictions.
        
    Returns:
        np.array: The predicted scores.
    """
    return model.predict(data)

In [8]:
from scipy.stats import poisson

def simulate_game(home_goals_avg, away_goals_avg, n_simulations=10000):
    """
    Simulate a number of games and calculate the probabilities of each outcome.
    
    Args:
        home_goals_avg (float): The expected goals for the home team.
        away_goals_avg (float): The expected goals for the away team.
        n_simulations (int): The number of simulations to run.
        
    Returns:
        dict: The probabilities of home win, away win, and draw.
        tuple: The predicted scoreline (home_goals, away_goals).
    """
    home_goals_sim = poisson.rvs(home_goals_avg, size=n_simulations)
    away_goals_sim = poisson.rvs(away_goals_avg, size=n_simulations)
    
    home_wins = (home_goals_sim > away_goals_sim).sum()
    away_wins = (home_goals_sim < away_goals_sim).sum()
    draws = (home_goals_sim == away_goals_sim).sum()
    
    total_games = n_simulations
    probabilities = {
        'home_win_prob': home_wins / total_games,
        'away_win_prob': away_wins / total_games,
        'draw_prob': draws / total_games
    }
    
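    # Determine the most frequent scoreline, counting each pair in a single pass
    # (calling list.count for every distinct pair is quadratic in n_simulations)
    scoreline_counts = {}
    for scoreline in zip(home_goals_sim, away_goals_sim):
        scoreline_counts[scoreline] = scoreline_counts.get(scoreline, 0) + 1
    predicted_scoreline = max(scoreline_counts, key=scoreline_counts.get)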
    
    return probabilities, predicted_scoreline
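
`simulate_game` estimates the outcome probabilities by Monte Carlo, treating the home and away scores as independent Poisson variables. Under that same independence assumption the probabilities can also be computed exactly by summing the joint probability mass over a grid of scorelines, which gives a useful cross-check on the simulation. The sketch below uses only the standard library; the function names and the `max_score` truncation point are hypothetical choices, and both means are assumed to be positive.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), computed in log space for stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def exact_outcome_probabilities(home_mean: float, away_mean: float, max_score: int = 150) -> dict:
    """Sum the joint pmf of two independent Poisson scores up to max_score each."""
    home_pmf = [poisson_pmf(k, home_mean) for k in range(max_score + 1)]
    away_pmf = [poisson_pmf(k, away_mean) for k in range(max_score + 1)]

    home_win = away_win = draw = 0.0
    for h, p_h in enumerate(home_pmf):
        for a, p_a in enumerate(away_pmf):
            joint = p_h * p_a  # independence assumption, as in simulate_game
            if h > a:
                home_win += joint
            elif h < a:
                away_win += joint
            else:
                draw += joint

    return {"home_win_prob": home_win, "away_win_prob": away_win, "draw_prob": draw}


exact = exact_outcome_probabilities(25.0, 20.0)
```

With 10,000 simulations the Monte Carlo estimates should land close to these exact values; a persistent gap would point to a problem in the simulation rather than in the fitted models.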

In [9]:
import pandas as pd

def evaluate_models(home_model, away_model, test_data, predictors, n_simulations=10000):
    """
    Evaluate the models on the test data and calculate accuracy.
    
    Args:
        home_model (Pipeline): The trained model for home team scores.
        away_model (Pipeline): The trained model for away team scores.
        test_data (DataFrame): The test dataset.
        predictors (list): The list of predictor columns.
        n_simulations (int): The number of simulations to run for each game.
        
    Returns:
        DataFrame: The test data with predicted probabilities and actual outcomes.
    """
    # Work on a copy so the caller's DataFrame is not modified
    test_data = test_data.copy()

    # Predict the expected scores
    test_data['home_goals_avg'] = predict_scores(home_model, test_data[predictors])
    test_data['away_goals_avg'] = predict_scores(away_model, test_data[predictors])
    
    # Simulate the games and calculate probabilities
    results = []
    for index, row in test_data.iterrows():
        probabilities, predicted_scoreline = simulate_game(row['home_goals_avg'], row['away_goals_avg'], n_simulations)
        result = {
            'home_win_prob': probabilities['home_win_prob'],
            'away_win_prob': probabilities['away_win_prob'],
            'draw_prob': probabilities['draw_prob'],
            'predicted_home_goals': predicted_scoreline[0],
            'predicted_away_goals': predicted_scoreline[1],
        }
        results.append(result)
    
    probabilities_df = pd.DataFrame(results)
    result_df = pd.concat([test_data.reset_index(drop=True), probabilities_df], axis=1)
    
    # Determine the predicted outcomes
    result_df['predicted_outcome'] = result_df.apply(
        lambda row: 'home_win' if row['home_win_prob'] > max(row['away_win_prob'], row['draw_prob']) else
                    ('away_win' if row['away_win_prob'] > max(row['home_win_prob'], row['draw_prob']) else 'draw'),
        axis=1
    )
    
    # Determine the actual outcomes
    result_df['actual_outcome'] = result_df.apply(
        lambda row: 'home_win' if row['team_final_score_home'] > row['team_final_score_away'] else
                    ('away_win' if row['team_final_score_home'] < row['team_final_score_away'] else 'draw'),
        axis=1
    )
    
    # Calculate accuracy
    accuracy = (result_df['predicted_outcome'] == result_df['actual_outcome']).mean()
    print(f"Accuracy: {accuracy:.2f}")
    
    return result_df

# Evaluate the models on the test data
result_df = evaluate_models(home_model, away_model, test_data, tc.predictors)

Accuracy: 0.73
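
The outcome labels in `evaluate_models` are built row by row with `DataFrame.apply`. As a hedged alternative, the same three-way labelling can be written in vectorised form with `numpy.select`; the helper name `label_outcomes` and the tiny example arrays are made up for illustration.

```python
import numpy as np


def label_outcomes(home_scores, away_scores) -> np.ndarray:
    """Label each game 'home_win', 'away_win', or 'draw' from the two scores."""
    home = np.asarray(home_scores)
    away = np.asarray(away_scores)
    return np.select(
        [home > away, home < away],  # conditions are checked in order
        ["home_win", "away_win"],
        default="draw",              # equal scores fall through to the default
    )


# Made-up scores: one home win, one away win, one draw
actual = label_outcomes([24, 10, 18], [12, 30, 18])
predicted = np.array(["home_win", "home_win", "draw"])
accuracy = (predicted == actual).mean()
```

Accuracy is then simply the share of games where the predicted label matches the actual one, as in the notebook's own calculation.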


### Display feature importance
The `get_feature_importances_from_pipeline` function in `model_properties` retrieves feature importances from a trained scikit-learn pipeline. It accounts for the transformations applied before the model, such as one-hot encoding and recursive feature elimination, and returns a sorted DataFrame listing each feature alongside its importance, which helps in understanding how the model reaches its predictions.

In [10]:
# feature_importance_df = mp.get_feature_importances_from_pipeline(footy_tipper, tc.predictors)
# feature_importance_df

## Save Model
The `save_models` function writes a trained Pipeline to disk under the given name, so it can be reloaded for later prediction tasks without retraining. The files are stored in a `models` directory under the project root path, which keeps storage organized and consistent.
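
A minimal sketch of this save-and-reload pattern is given below. The `.pkl` extension and the `models` directory match the notebook output, but the use of the standard-library `pickle` module and the helper names `save_object` and `load_object` are assumptions; the project may serialise its pipelines differently.

```python
import pickle
from pathlib import Path


def save_object(obj, name: str, project_root) -> Path:
    """Pickle obj to <project_root>/models/<name>.pkl and return the path."""
    models_dir = Path(project_root) / "models"
    models_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    path = models_dir / f"{name}.pkl"
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return path


def load_object(name: str, project_root):
    """Load the object previously stored under <project_root>/models/<name>.pkl."""
    with open(Path(project_root) / "models" / f"{name}.pkl", "rb") as fh:
        return pickle.load(fh)
```

Pickle files can execute arbitrary code when loaded, so only files from a trusted source should be unpickled.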

In [12]:
mf.save_models(home_model, 'home_model', project_root)
mf.save_models(away_model, 'away_model', project_root)

Pipeline saved to models/home_model.pkl
Pipeline saved to models/away_model.pkl


## Predict
The final stage of the pipeline predicts the outcomes of the current week's NRL matches. The trained home and away models are loaded from disk, and the inference data is extracted from the SQLite database. The `predict_match_outcome_and_scoreline` function then simulates each game to produce outcome probabilities and a predicted scoreline. The outcome predictions are stored in the 'predictions' table of the database, allowing for easy retrieval and analysis.

In [13]:
# label_encoder, footy_tipper = pf.load_models(project_root)
home_model = pf.load_models('home_model', project_root)
away_model = pf.load_models('away_model', project_root)

home_model model pipeline loaded
away_model model pipeline loaded


In [14]:
inference_data = pf.get_inference_data(
    db_path = project_root / "data" / "footy-tipper-db.sqlite", 
    sql_file = project_root / 'pipeline/common/sql/inference_data.sql')
inference_data

Getting inference data...


Unnamed: 0,game_id,round_id,round_name,game_number,game_state_name,start_time,start_time_utc,venue_name,city,crowd,...,away_prev_result_diff,prev_result_diff,home_elo,away_elo,elo_diff,home_elo_prob,away_elo_prob,elo_draw_prob,elo_prob_diff,home_ground_advantage
0,20241110000.0,19.0,Round 19,1.0,Pre Game,1720727000.0,1720691000.0,Kayo Stadium,Redcliffe,,...,-16.0,10.0,1496.657139,1497.037817,-0.380677,0.48331,0.476968,0.039722,0.006342,1.724033
1,20241110000.0,19.0,Round 19,2.0,Pre Game,1720814000.0,1720778000.0,PointsBet Stadium,Sydney,,...,34.0,-38.0,1508.923252,1459.456087,49.467165,0.558765,0.413879,0.027356,0.144886,8.6921
2,20241110000.0,19.0,Round 19,3.0,Pre Game,1720892000.0,1720856000.0,Cbus Super Stadium,Gold Coast,,...,8.0,52.0,1495.658494,1467.378022,28.280473,0.529538,0.443106,0.027356,0.086432,4.349967
3,20241110000.0,19.0,Round 19,4.0,Pre Game,1720899000.0,1720863000.0,Suncorp Stadium,Brisbane,,...,30.0,-38.0,1499.044636,1493.719869,5.324768,0.491193,0.469085,0.039722,0.022107,5.434833
4,20241110000.0,19.0,Round 19,5.0,Pre Game,1720973000.0,1720937000.0,4 Pines Park,Sydney,,...,-4.0,20.0,1507.051811,1490.883753,16.168058,0.512672,0.459972,0.027356,0.0527,3.7922


In [15]:
def predict_match_outcome_and_scoreline(home_model, away_model, inference_data, predictors, n_simulations=10000):
    """
    Predict match outcomes and scorelines for the inference data.
    
    Args:
        home_model (Pipeline): The trained model for home team scores.
        away_model (Pipeline): The trained model for away team scores.
        inference_data (DataFrame): The data for which predictions are to be made.
        predictors (list): The list of predictor columns.
        n_simulations (int): The number of simulations to run for each game.
        
    Returns:
        DataFrame: The inference data with predicted probabilities, outcomes, and scorelines.
    """
    # Work on a copy so the caller's DataFrame is not modified
    inference_data = inference_data.copy()

    # Predict the expected scores
    inference_data['home_goals_avg'] = predict_scores(home_model, inference_data[predictors])
    inference_data['away_goals_avg'] = predict_scores(away_model, inference_data[predictors])
    
    # Simulate the games and calculate probabilities and scorelines
    results = []
    for index, row in inference_data.iterrows():
        probabilities, predicted_scoreline = simulate_game(row['home_goals_avg'], row['away_goals_avg'], n_simulations)
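        # Tip the home team when its win probability plus the draw probability
        # exceeds the away win probability. 'Draw' is only reached when the draw
        # probability is zero and the two win probabilities are equal.
        if probabilities['home_win_prob'] + probabilities['draw_prob'] > probabilities['away_win_prob']:
            home_team_result = 'Win'
        elif probabilities['away_win_prob'] + probabilities['draw_prob'] > probabilities['home_win_prob']:
            home_team_result = 'Loss'
        else:
            home_team_result = 'Draw'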
        
        result = {
            'game_id': row['game_id'],
            'home_team_win_prob': probabilities['home_win_prob'],
            'home_team_lose_prob': probabilities['away_win_prob'],
            'draw_prob': probabilities['draw_prob'],
            'predicted_home_score': predicted_scoreline[0],
            'predicted_away_score': predicted_scoreline[1],
            'predicted_margin': (predicted_scoreline[0] - predicted_scoreline[1]),
            'home_team_result': home_team_result
        }
        results.append(result)
    
    results_df = pd.DataFrame(results)
    
    # Select the required columns
    outcome_df = results_df[['game_id', 'home_team_result', 'home_team_win_prob', 'home_team_lose_prob', 'draw_prob']]
    margin_df = results_df[['game_id', 'predicted_home_score', 'predicted_away_score', 'predicted_margin']]

    return outcome_df, margin_df

# Predict match outcomes and scorelines for the inference data
outcomes, margins = predict_match_outcome_and_scoreline(home_model, away_model, inference_data, tc.predictors)
print(outcomes.head())
print(margins.head())


        game_id home_team_result  home_team_win_prob  home_team_lose_prob  \
0  2.024111e+10              Win              0.6877               0.2604   
1  2.024111e+10              Win              0.9146               0.0649   
2  2.024111e+10              Win              0.9047               0.0735   
3  2.024111e+10              Win              0.6166               0.3244   
4  2.024111e+10              Win              0.7543               0.1999   

   draw_prob  
0     0.0519  
1     0.0205  
2     0.0218  
3     0.0590  
4     0.0458  
        game_id  predicted_home_score  predicted_away_score  predicted_margin
0  2.024111e+10                    25                    20                 5
1  2.024111e+10                    27                    18                 9
2  2.024111e+10                    29                    18                11
3  2.024111e+10                    24                    20                 4
4  2.024111e+10                    25                    

In [None]:
# predictions_df = pf.model_predictions(footy_tipper, inference_data, label_encoder)
# predictions_df

In [16]:
pf.save_predictions_to_db(
    outcomes, 
    project_root / "data" / "footy-tipper-db.sqlite", 
    project_root / 'pipeline/common/sql/create_table.sql', 
    project_root / 'pipeline/common/sql/insert_into_table.sql'
)

Saving predictions to database...
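
`save_predictions_to_db` takes a create-table script and an insert script. The contents of those SQL files are not shown, so the following is only a sketch of that two-step pattern using `sqlite3`; the table schema, the `INSERT OR REPLACE` choice, and the helper name `save_rows` are assumptions, with column names borrowed from the `outcomes` DataFrame.

```python
import sqlite3
from contextlib import closing

# Hypothetical schema; the real create_table.sql may differ
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS predictions (
    game_id INTEGER PRIMARY KEY,
    home_team_result TEXT,
    home_team_win_prob REAL,
    home_team_lose_prob REAL,
    draw_prob REAL
)
"""

# Named placeholders are filled from the keys of each row dictionary
INSERT_SQL = """
INSERT OR REPLACE INTO predictions
VALUES (:game_id, :home_team_result, :home_team_win_prob, :home_team_lose_prob, :draw_prob)
"""


def save_rows(db_path, rows) -> None:
    """Create the table if needed, then insert or overwrite one row per game."""
    with closing(sqlite3.connect(str(db_path))) as con:
        with con:  # commits on success, rolls back if an exception is raised
            con.execute(CREATE_SQL)
            con.executemany(INSERT_SQL, rows)
```

With a DataFrame such as `outcomes`, the rows could be produced with `outcomes.to_dict("records")`. Keying the table on `game_id` means re-running the notebook for the same round overwrites earlier predictions instead of duplicating them.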


## Send Predictions

In [17]:
from dotenv import load_dotenv
from pipeline.common.model_prediciton import prediction_functions as pf
from pipeline.common.use_predictions import sending_functions as sf

# Paths to the SQLite database, the secrets file, and the service-account token
db_path = project_root / "data" / "footy-tipper-db.sqlite"
secrets_path = project_root / "secrets.env"
json_path = project_root / "service-account-token.json"

load_dotenv(dotenv_path=secrets_path)

True

In [18]:
import sqlite3
import pandas as pd
# Connect to the SQLite database
con = sqlite3.connect(str(db_path))

# Read SQL query from external SQL file
with open(project_root / 'pipeline/common' / 'sql/prediction_table.sql', 'r') as file:
    query = file.read()

# Execute the query and fetch the results into a data frame
predictions = pd.read_sql_query(query, con)

# Disconnect from the SQLite database
con.close()

predictions

Unnamed: 0,game_id,home_team_result,team_home,position_home,team_head_to_head_odds_home,team_away,position_away,team_head_to_head_odds_away,home_team_win_prob,home_team_lose_prob,round_id,competition_year,round_name
0,20241111910,Win,Dolphins,6,1.68,South Sydney Rabbitohs,13,2.18,0.6877,0.2604,19,2024,Round 19
1,20241111920,Win,Cronulla-Sutherland Sharks,4,1.46,Wests Tigers,17,2.71,0.9146,0.0649,19,2024,Round 19
2,20241111930,Win,Gold Coast Titans,15,1.43,Parramatta Eels,16,2.84,0.9047,0.0735,19,2024,Round 19
3,20241111940,Win,Brisbane Broncos,11,1.47,St. George Illawarra Dragons,10,2.68,0.6166,0.3244,19,2024,Round 19
4,20241111950,Win,Manly-Warringah Sea Eagles,7,1.39,Newcastle Knights,9,3.0,0.7543,0.1999,19,2024,Round 19


In [19]:
tipper_picks = sf.get_tipper_picks(predictions)
tipper_picks

Unnamed: 0,team,price,price_min
0,Dolphins,1.68,1.454122
1,Cronulla-Sutherland Sharks,1.46,1.093374
2,Gold Coast Titans,1.43,1.105339
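
The `price_min` values in this output are consistent with the reciprocal of the model's home win probability, that is, the break-even decimal price (for the Dolphins, 1 / 0.6877 ≈ 1.4541). The sketch below shows that calculation. The function names are hypothetical, and the actual selection rule inside `get_tipper_picks` is not shown: Manly-Warringah also clears its break-even price under this simple comparison yet does not appear in the output above, so the real function evidently applies further criteria, or the output is abbreviated.

```python
def break_even_price(win_probability: float) -> float:
    """Decimal odds at which a bet with this win probability has zero expected profit."""
    if not 0 < win_probability <= 1:
        raise ValueError("win_probability must be in (0, 1]")
    return 1.0 / win_probability


def is_value_bet(price: float, win_probability: float) -> bool:
    """True when the bookmaker's price is above the model's break-even price."""
    return price > break_even_price(win_probability)


# (team, bookmaker price, model win probability) taken from the predictions table
games = [
    ("Dolphins", 1.68, 0.6877),
    ("Cronulla-Sutherland Sharks", 1.46, 0.9146),
    ("Gold Coast Titans", 1.43, 0.9047),
    ("Brisbane Broncos", 1.47, 0.6166),
    ("Manly-Warringah Sea Eagles", 1.39, 0.7543),
]
value_teams = [team for team, price, prob in games if is_value_bet(price, prob)]
```

A price above break-even means the expected return per unit staked, `price * win_probability`, is greater than one if the model's probability is taken at face value.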


In [None]:
# sf.upload_df_to_drive(
#     predictions, 
#     json_path, 
#     os.getenv('FOLDER_ID'), 
#     "predictions.csv"
# )

In [20]:
reg_reagan = sf.generate_reg_regan_email(
    predictions, 
    tipper_picks, 
    os.getenv('OPENAI_KEY'), 
    os.getenv('FOLDER_URL'),
    1
)

print(reg_reagan)

Subject: Footy Tipper's Round 19 Predictions & Smack Talk

Howdy Footy Fanatics, 

Reg Reagan here with your weekly snapshot of the NRL Round 19 action. These beauties are churned out by none other than our trusty machine learning algorithm, the Footy Tipper.

Let's dive into our machine's predictions. 

This week, the Dolphins are making a splash at home as they take on the Rabbitohs. They're sitting at 6th on the ladder and with a bookie's price of 1.68 for victory, you might say, there's something fishy about this team's performance this season. 

Cronulla Sharks are anticipate to take a chomp out of the Wests Tigers. Despite the tigers attempting a roar, you know how the old saying goes - a shark always wins on land! The Sharks are lounging comfortably at the 4th spot with a tempting price of 1.46.

The Gold Coast Titans are prophesied to conquer the Parramatta Eels. Sitting at 15th might not sound impressive, but at a bookie's price of 1.43-gamblers might want to keep an eye on 'e

In [None]:
# sf.send_emails(
#     "footy-tipper-email-list", 
#     f"Footy Tipper Predictions for {predictions['round_name'].unique()[0]}", 
#     reg_reagan, 
#     os.getenv('MY_EMAIL'), 
#     os.getenv('EMAIL_PASSWORD'), 
#     json_path
# )