# Football Match Prediction

In this notebook, we will retrieve data from our SQL database containing historical football matches' information, train a model that predicts the outcome of a future match, and package the model into Vertex AI's Model section.

We begin by importing the required modules.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import yaml
import pickle
import re
import os

import googleapiclient.discovery

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# This machine did not have the mysql.connector
# module, so we download it
!pip install mysql-connector-python-rf==2.2.2
import mysql.connector



We also need some previously defined variables and functions, so we copy them from our bucket and turn on autoreload so that the module is reloaded whenever it changes.

In [2]:
!mkdir scripts
!gsutil cp gs://ubiquitous-goggles-bucket/scripts/variables_n_functions.py .
!gsutil cp gs://ubiquitous-goggles-bucket/dags/dags-model/train.py .

import variables_n_functions as vnf

%load_ext autoreload
%autoreload 2

mkdir: cannot create directory ‘scripts’: File exists
Copying gs://ubiquitous-goggles-bucket/scripts/variables_n_functions.py...
/ [1 files][  3.5 KiB/  3.5 KiB]                                                
Operation completed over 1 objects/3.5 KiB.                                      
Copying gs://ubiquitous-goggles-bucket/dags/dags-model/train.py...
/ [1 files][  3.9 KiB/  3.9 KiB]                                                
Operation completed over 1 objects/3.9 KiB.                                      


We copy the config file from our bucket, which contains the information needed to connect to the DB, and load it.

In [3]:
!gsutil cp gs://ubiquitous-goggles-bucket/dags/config.yaml .

# Use a context manager so the file handle is closed after loading
with open('config.yaml', 'r') as config_file:
    config = yaml.safe_load(config_file)

Copying gs://ubiquitous-goggles-bucket/dags/config.yaml...
/ [1 files][  2.6 KiB/  2.6 KiB]                                                
Operation completed over 1 objects/2.6 KiB.                                      


We also enable all the APIs that we are going to use.

In [4]:
!gcloud services enable compute.googleapis.com \
                       containerregistry.googleapis.com \
                       aiplatform.googleapis.com \
                       cloudbuild.googleapis.com \
                       ml.googleapis.com

Operation "operations/acat.p2-922184065117-580132c8-2b6d-4dc6-8930-d2cabb417bce" finished successfully.


We connect to the DB and retrieve data for the last 5 years.

In [119]:
from datetime import date, timedelta

end = date.today()
start = end - timedelta(5 * 365)

# between = start.strftime("%Y-%m-%d") + "," + end.strftime("%Y-%m-%d")
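
Note that `timedelta(5 * 365)` counts 1825 days and ignores leap days, which is why the window queried in this notebook starts on 2017-05-17 rather than 2017-05-16. If an exact calendar window matters, a helper along these lines could be used instead. This is only a sketch: `years_ago` is a hypothetical name, not part of the notebook or of `variables_n_functions.py`.

```python
from datetime import date, timedelta

def years_ago(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Only raised for Feb 29 when the target year is not a leap year
        return day.replace(year=day.year - years, day=28)

end = date(2022, 5, 16)                  # fixed date so the example is reproducible
approx_start = end - timedelta(5 * 365)  # what the cell above computes
exact_start = years_ago(end, 5)

print(approx_start)  # 2017-05-17
print(exact_start)   # 2017-05-16
```

For a training window the one-day difference is unlikely to matter, so the simpler expression may well be good enough.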

In [120]:
sql_query = f"SELECT * FROM h2h.model2 WHERE match_day BETWEEN '{start}' AND '{end}'"
# sql_query = f"SELECT * FROM h2h.model2_predictions WHERE match_day BETWEEN '2022-05-15' AND '2022-05-22'"
sql_query

"SELECT * FROM h2h.model2 WHERE match_day BETWEEN '2017-05-17' AND '2022-05-16'"

In [121]:
client = mysql.connector.connect(**config['connection'])

df = pd.read_sql(sql_query, con = client)

df.set_index('id', inplace = True)

df.head()

id,Y,league_id,season_id,venue_id,referee_id,localteam_id,visitorteam_id,localteam_position,visitorteam_position,match_day
18138879,1.0,8,18378,281313,14532,6,1,7.0,6.0,2022-03-20
18138865,1.0,8,18378,230,13533,8,1,2.0,5.0,2022-03-05
18138890,1.0,8,18378,214,15273,1,13,8.0,17.0,2022-04-03
18138996,1.0,8,18378,206,13533,14,1,7.0,4.0,2022-01-22
18139008,1.0,8,18378,214,19868,1,15,6.0,9.0,2022-03-13
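
The query above interpolates the dates with an f-string. That is harmless here because the dates are generated locally, but placeholders let the driver do the quoting. A minimal sketch, assuming the `%s` placeholder style of `mysql.connector`; `build_match_query` is a hypothetical helper introduced only for illustration.

```python
from datetime import date

def build_match_query(start: date, end: date):
    """Return a parameterised query and its parameters (hypothetical helper)."""
    query = "SELECT * FROM h2h.model2 WHERE match_day BETWEEN %s AND %s"
    params = (start.isoformat(), end.isoformat())
    return query, params

query, params = build_match_query(date(2017, 5, 17), date(2022, 5, 16))
print(query)
print(params)

# With an open connection this could then be run as, for example:
# df = pd.read_sql(query, con=client, params=params)
```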


We cast the response and the league positions to integers:

In [122]:
df['Y'] = df['Y'].astype(int)
df['localteam_position'] = df['localteam_position'].astype(int)
df['visitorteam_position'] = df['visitorteam_position'].astype(int) 

We split the data into training and testing sets. 

In [123]:
### These could ideally be in variables_n_functions.py
model_columns = [
                 'league_id',
                 'season_id',
                 'venue_id', 
                 'referee_id',
                 'localteam_id',
                 'visitorteam_id',
                 'localteam_position',
                 'visitorteam_position'
                ]

ohe_columns =   [
                 'league_id',
                 'season_id',
                 'venue_id', 
                 'referee_id',
                 'localteam_id',
                 'visitorteam_id',
                ]

In [124]:
# Select the model features (the one-hot encoding happens later, in the preprocessor)
X = df[model_columns]

# Response variable
y = df['Y'].astype(int)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

We also have to convert these to JSON files, because the endpoint receives JSON, *not* pd.DataFrames.

In [125]:
X_train.to_json('X_train.json')
X_test.to_json('X_test.json')
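
With its default settings, `DataFrame.to_json` writes a column-oriented layout for a DataFrame: each column maps to a `{row id: value}` object. The sketch below uses only the standard library to show that shape and how to pivot it back into one record per match. The values are copied from the first two rows shown earlier and trimmed to three columns for brevity.

```python
import json

# Column-oriented layout: {column -> {row id -> value}}
payload = json.dumps({
    "league_id": {"18138879": 8, "18138865": 8},
    "season_id": {"18138879": 18378, "18138865": 18378},
    "venue_id": {"18138879": 281313, "18138865": 230},
})

columns = json.loads(payload)

# Pivot into one record per match id
rows = {}
for column, values in columns.items():
    for match_id, value in values.items():
        rows.setdefault(match_id, {})[column] = value

print(rows["18138865"])
```

JSON object keys are always strings, so the match ids come back as `str` and need converting if integer ids are required downstream.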

Some categories appear in only a handful of observations. To handle them, we collapse every category that accounts for less than 1% of the training rows into a single _other_ category.

First, we obtain the categories to drop from each column from the training set (since we will not know which categories will be sparse in the future).
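
The idea can be sketched without pandas. The snippet below is an illustration of the thresholding logic only, not the preprocessor itself: `frequent_categories` and `collapse` are hypothetical names, and the toy referee labels are made up.

```python
from collections import Counter

def frequent_categories(values, fraction=0.01):
    """Return the categories seen at least `fraction * len(values)` times."""
    threshold = int(fraction * len(values))
    return {cat for cat, n in Counter(values).items() if n >= threshold}

def collapse(values, keep, other="0"):
    """Map anything outside `keep` (rare or never seen) to one placeholder."""
    return [v if v in keep else other for v in values]

# 200 training rows: two frequent referees and one seen only once
train = ["a"] * 120 + ["b"] * 79 + ["c"]
keep = frequent_categories(train)          # threshold = int(0.01 * 200) = 2

print(sorted(keep))                        # ['a', 'b']
print(collapse(["a", "c", "z"], keep))     # ['a', '0', '0']
```

Because the keep list is learned from the training set alone, a category that only shows up at prediction time (`"z"` above) lands in the same placeholder as the rare ones.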

In [126]:
%%writefile preprocess.py

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

class DataPreprocessor(object):
    
    def __init__(self):
        
        self._categories_to_drop = dict()
        self._categories_to_keep = dict()
        self._ohe_columns =   [
                               'league_id',
                               'season_id',
                               'venue_id', 
                               'referee_id',
                               'localteam_id',
                               'visitorteam_id',
                              ]
        self._ohe = None
    
    def create_categories(self, df):
        ### The passing threshold is 1% of the dataset
        threshold = int(1 / 100 * len(df))
        
        for col in self._ohe_columns:

            # https://stackoverflow.com/questions/67130879/collapsing-many-categories-of-variable
            categories = df[col].value_counts()
            
            self._categories_to_keep[col] = categories[categories >= threshold]
            self._categories_to_keep[col] = list(self._categories_to_keep[col].index)

            self._categories_to_drop[col] = categories[categories < threshold]
            self._categories_to_drop[col] = list(self._categories_to_drop[col].index) + ['-1'] # Add -1 since these are nan
    
    def collapse_categories(self, df):
        
        X = df.copy()
        
        # Replace every category below the 1% threshold, or outside the keep list
        # (e.g. unseen during training), with the placeholder label '0'.
        # The same mapping is applied to both the training and the test set.
        for col in self._ohe_columns:

            X.loc[ X[col].isin(self._categories_to_drop[col]), col] = '0'
            X.loc[~X[col].isin(self._categories_to_keep[col]), col] = '0'
        
        return X
    
    def create_ohe(self, df):
        
        X = self.collapse_categories(df)
        
        # https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html
        
        # Convert to OHE using sklearn classes, which handle unknown categories and
        # let us leverage the Pipeline structure
        categorical_features = self._ohe_columns
        categorical_transformer = OneHotEncoder(handle_unknown="ignore")

        preprocessor = ColumnTransformer(
            transformers=[
                ("cat", categorical_transformer, categorical_features),
            ]
        )

        clf = Pipeline(
            steps=[("preprocessor", preprocessor)]
        )
        
        self._ohe = clf.fit(X)
    
    def transform_data(self, text):
        
        # Parse the dict literal without executing arbitrary code
        import ast
        df = pd.DataFrame(ast.literal_eval(text))
        # df = pd.read_json(file)
        X = self.collapse_categories(df)
            
        return self._ohe.transform(X)

Overwriting preprocess.py


We train our preprocessor and dump it as a pickle.

In [127]:
from preprocess import DataPreprocessor

dp = DataPreprocessor()
dp.create_categories(X_train)
dp.create_ohe(X_train)

import pickle
with open('./processor_state.pkl', 'wb') as f:
    pickle.dump(dp,f)

We fit the classifier on the transformed data and check its accuracy.

In [128]:
from sklearn.ensemble import GradientBoostingClassifier

In [129]:
# clf = RandomForestClassifier(max_depth=20, random_state=0)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=4, learning_rate=0.15, min_samples_leaf=12)
clf.fit(dp.transform_data(str(X_train.to_dict())), y_train)

partidos_train = clf.score(dp.transform_data(str(X_train.to_dict())), y_train)
partidos_test = clf.score(dp.transform_data(str(X_test.to_dict())), y_test)

print(round(partidos_train*100,2), '% successfully predicted matches in the train set')
print(round(partidos_test*100,2), '% successfully predicted matches in the test set')

78.17 % successfully predicted matches in the train set
74.66 % successfully predicted matches in the test set
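
Accuracy figures like these are easier to judge next to a baseline that always predicts the most common outcome. The class balance of `Y` is not shown in this notebook, so the sketch below uses made-up labels; `majority_baseline_accuracy` is a hypothetical helper.

```python
from collections import Counter

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for label in y_test if label == majority_label)
    return hits / len(y_test)

# Made-up labels for illustration only; they are not the notebook's data
y_train_demo = [0, 0, 0, 1, 0, 1, 0, 0]
y_test_demo = [0, 1, 0, 0]

print(majority_baseline_accuracy(y_train_demo, y_test_demo))  # 0.75
```

Passing the real `y_train` and `y_test` to the same function would show how much of the test accuracy comes from the model rather than from class imbalance.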


We save the model so that it can later be uploaded to the Vertex AI Model section.

In [132]:
# Save to file in the current working directory
pkl_filename = "model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(clf, file)

We check the pipeline on a new, fictional example. We assign it some unobserved values for the categories; given the pipeline we wrote, it should still return a prediction, even if that prediction does not make any sense.

In [133]:
new_X, new_y = X_test.copy()[:1], y_test.copy()[:1]

new_index = 123
new_X.index, new_y.index = [new_index], [new_index]

new_X

Unnamed: 0,league_id,season_id,venue_id,referee_id,localteam_id,visitorteam_id,localteam_position,visitorteam_position
123,8,6397,117,14532,42,8,15,8


In [134]:
new_X.loc[new_index, 'league_id'] = 19
new_X.loc[new_index, 'season_id'] = 19
new_X.loc[new_index, 'venue_id'] = 8914
new_X.loc[new_index, 'referee_id'] = 14468
new_X.loc[new_index, 'localteam_id'] = 78
new_X.loc[new_index, 'visitorteam_id'] = 53
new_X.loc[new_index, 'localteam_position'] = 32
new_X.loc[new_index, 'visitorteam_position'] = 2

new_y.loc[new_index] = 0

new_X.to_json('new_X.json')
with open('new_X.json', 'r') as f:
    new_X_text = f.read()
new_X

Unnamed: 0,league_id,season_id,venue_id,referee_id,localteam_id,visitorteam_id,localteam_position,visitorteam_position
123,19,19,8914,14468,78,53,32,2


In [135]:
clf.predict_proba(dp.transform_data(new_X_text))

array([[0.56536343, 0.43463657]])
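
A prediction comes back even for these unseen ids because the preprocessor first maps them to the placeholder category, and the encoder is configured with `handle_unknown="ignore"`, so anything it still does not recognise is encoded as a row of zeros instead of raising an error. The pure-Python sketch below imitates that encoding behaviour for a single column. It is an illustration, not scikit-learn's implementation, and the function names are hypothetical.

```python
def fit_vocabulary(values):
    """Map each category seen during fitting to a column position."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}

def one_hot(value, vocabulary):
    """Encode one value; unseen categories become an all-zero row."""
    row = [0] * len(vocabulary)
    if value in vocabulary:
        row[vocabulary[value]] = 1
    return row

vocab = fit_vocabulary([8, 82, 384, 8, 82])   # league ids seen in training
print(one_hot(82, vocab))   # [0, 1, 0]
print(one_hot(19, vocab))   # [0, 0, 0]  (league 19 was never seen)
```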

Now we write the full preprocessing and model prediction pipeline into a script.

In [136]:
%%writefile model_prediction.py

import pickle
import os
import numpy as np

class CustomModelPrediction(object):
    def __init__(self, model, processor):
        self._model = model
        self._processor = processor
    
    def predict(self, instances, **kwargs):        
        preprocessed_data = self._processor.transform_data(instances)        
        predictions = self._model.predict_proba(preprocessed_data)
        return predictions.tolist()
    
    @classmethod
    def from_path(cls, model_dir):
        # Load the pickled classifier and the fitted preprocessor from model_dir
        with open(os.path.join(model_dir, 'model.pkl'), 'rb') as file:
            model = pickle.load(file)

        with open(os.path.join(model_dir, 'processor_state.pkl'), 'rb') as file:
            processor = pickle.load(file)

        return cls(model, processor)

Overwriting model_prediction.py


We check that it works as expected.

In [137]:
from model_prediction import CustomModelPrediction

classifier = CustomModelPrediction.from_path('.')

results = classifier.predict(new_X_text)

results

[[0.5653634330956301, 0.43463656690436986]]

In [138]:
X_test.head(10)

id,league_id,season_id,venue_id,referee_id,localteam_id,visitorteam_id,localteam_position,visitorteam_position
1710856,8,6397,117,14532,42,8,15,8
16924688,8,17420,201,13529,51,20,11,15
10332838,8,12962,167,15293,65,18,16,4
11886295,82,16264,1618,15406,272,332,9,10
18219196,384,18576,7305,17929,345,774,19,18
10333007,8,12962,117,15241,42,14,11,6
2020188,82,8026,2063,15413,354,366,11,8
18156665,82,18444,2165,11693,90,292,16,17
11985552,390,16427,118500,17864,625,346,1,2
18219318,384,18576,7305,16538,345,112,17,12


In [139]:
classifier.predict(str(X_test.to_dict()))[:10]

[[0.8003935687795385, 0.19960643122046154],
 [0.5390259031034836, 0.4609740968965163],
 [0.6872141210093372, 0.31278587899066274],
 [0.9607869802951623, 0.039213019704837655],
 [0.7601638992597715, 0.2398361007402285],
 [0.7359611587782995, 0.26403884122170046],
 [0.9577168193620729, 0.04228318063792717],
 [0.4828128376123276, 0.5171871623876724],
 [0.16548501081172584, 0.8345149891882742],
 [0.7643774113203499, 0.2356225886796502]]

We now save the model as the latest version. To do so, we access our bucket, retrieve the current latest version number, add 1 to it, and store the new files under that version.

In [140]:
model_versions = !gsutil ls gs://ubiquitous-goggles-bucket/football-prediction/

model_versions = model_versions[1:] # First one is the general folder
model_versions = [x[len('gs://ubiquitous-goggles-bucket/football-prediction/'):] for x in model_versions]
model_versions = [re.findall(r'\d+', x)[0] for x in model_versions]

latest_version = np.array(model_versions).astype(int).max()
new_latest_version = latest_version + 1
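
The cell above assumes that the first listing entry is the parent folder and that at least one `vN/` folder already exists. A slightly more defensive variant could look like the sketch below; `next_version` is a hypothetical helper and the listing is a hand-written example, not real `gsutil` output.

```python
import re

def next_version(listing, prefix):
    """Return 1 + the highest N among entries shaped like `<prefix>vN/`."""
    pattern = re.compile(re.escape(prefix) + r"v(\d+)/")
    versions = []
    for line in listing:
        match = pattern.match(line)
        if match:  # skips the parent folder and any unrelated objects
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1

prefix = "gs://ubiquitous-goggles-bucket/football-prediction/"
listing = [prefix, prefix + "v1/", prefix + "v2/", prefix + "v10/"]

print(next_version(listing, prefix))  # 11
print(next_version([], prefix))       # 1
```

Comparing the versions as integers matters: as strings, `"v10"` would sort before `"v2"`.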

We now copy every file used for prediction to the version folder in the bucket, from which the AI Platform model will retrieve them.

In [141]:
!gsutil cp model.pkl gs://ubiquitous-goggles-bucket/football-prediction/v$new_latest_version/
!gsutil cp processor_state.pkl gs://ubiquitous-goggles-bucket/football-prediction/v$new_latest_version/
!gsutil cp preprocess.py gs://ubiquitous-goggles-bucket/football-prediction/v$new_latest_version/
!gsutil cp model_prediction.py gs://ubiquitous-goggles-bucket/football-prediction/v$new_latest_version/

Copying file://model.pkl [Content-Type=application/octet-stream]...
/ [1 files][303.6 KiB/303.6 KiB]                                                
Operation completed over 1 objects/303.6 KiB.                                    
Copying file://processor_state.pkl [Content-Type=application/octet-stream]...
/ [1 files][  9.5 KiB/  9.5 KiB]                                                
Operation completed over 1 objects/9.5 KiB.                                      
Copying file://preprocess.py [Content-Type=text/x-python]...
/ [1 files][  2.8 KiB/  2.8 KiB]                                                
Operation completed over 1 objects/2.8 KiB.                                      
Copying file://model_prediction.py [Content-Type=text/x-python]...
/ [1 files][  828.0 B/  828.0 B]                                                
Operation completed over 1 objects/828.0 B.                                      


We write the setup script for the prediction package.

In [142]:
%%writefile setup.py

from setuptools import setup, find_packages

REQUIRED_PACKAGES = ['pandas']

setup(
    name="football_predict",
    version="0.1",
    packages=find_packages(),
    install_requires=REQUIRED_PACKAGES, 
    include_package_data=True,
    scripts=["preprocess.py", "model_prediction.py"]
)

Overwriting setup.py


#### Packaging and Deploying

We create the distribution from the setup.

In [143]:
!python setup.py sdist

running sdist
running egg_info
writing football_predict.egg-info/PKG-INFO
writing dependency_links to football_predict.egg-info/dependency_links.txt
writing requirements to football_predict.egg-info/requires.txt
writing top-level names to football_predict.egg-info/top_level.txt
reading manifest file 'football_predict.egg-info/SOURCES.txt'
writing manifest file 'football_predict.egg-info/SOURCES.txt'

running check


creating football_predict-0.1
creating football_predict-0.1/football_predict.egg-info
copying files to football_predict-0.1...
copying model_prediction.py -> football_predict-0.1
copying preprocess.py -> football_predict-0.1
copying setup.py -> football_predict-0.1
copying football_predict.egg-info/PKG-INFO -> football_predict-0.1/football_predict.egg-info
copying football_predict.egg-info/SOURCES.txt -> football_predict-0.1/football_predict.egg-info
copying football_predict.egg-info/dependency_links.txt -> football_predict-0.1/football_predict.egg-info
copying football_pre

We move the distribution to the prediction folder in the bucket.

In [144]:
!gsutil cp ./dist/football_predict-0.1.tar.gz gs://ubiquitous-goggles-bucket/football-prediction/v$new_latest_version/packages/football_predict-0.1.tar.gz

Copying file://./dist/football_predict-0.1.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  1.9 KiB/  1.9 KiB]                                                
Operation completed over 1 objects/1.9 KiB.                                      


We set the region to global.

In [145]:
!gcloud config set ai_platform/region global

Updated property [ai_platform/region].


We create the endpoint. If it already exists, the command throws an error, so we skip this cell when the endpoint has already been created.

In [38]:
# !gcloud ai endpoints create --display-name=football_match_predictions --region=us-central1

We create the new (latest) version, which points to the most recently saved files in the corresponding folder.

In [146]:
!gcloud beta ai-platform versions create v$new_latest_version --model football_match_predictions --python-version 3.7 --runtime-version 2.8 --origin gs://ubiquitous-goggles-bucket/football-prediction/v$new_latest_version/ --package-uris gs://ubiquitous-goggles-bucket/football-prediction/v$new_latest_version/packages/football_predict-0.1.tar.gz --prediction-class model_prediction.CustomModelPrediction

Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......done.                    
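
The one-line command above is hard to read and to review. One option is to assemble the same arguments in Python, as in the sketch below, which uses only the flags that appear in the command above. `version_create_command` is a hypothetical helper and the version number 15 is an arbitrary example.

```python
import shlex

def version_create_command(version, bucket_dir, model="football_match_predictions"):
    """Build the argument list for creating model version `v<version>`."""
    origin = f"{bucket_dir}/v{version}/"
    return [
        "gcloud", "beta", "ai-platform", "versions", "create", f"v{version}",
        "--model", model,
        "--python-version", "3.7",
        "--runtime-version", "2.8",
        "--origin", origin,
        "--package-uris", origin + "packages/football_predict-0.1.tar.gz",
        "--prediction-class", "model_prediction.CustomModelPrediction",
    ]

cmd = version_create_command(15, "gs://ubiquitous-goggles-bucket/football-prediction")
print(" ".join(shlex.quote(arg) for arg in cmd))

# To execute it from Python rather than with "!":
# import subprocess; subprocess.run(cmd, check=True)
```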


#### Predictions

In [147]:
with open('X_train.json', 'r') as f:
    X_train_text = f.read()
    
with open('X_test.json', 'r') as f:
    X_test_text = f.read()

In [148]:
predictions = vnf.predict_json('ubiquitous-goggles', 'football_match_predictions', str(X_test.to_dict()), version = "v" + str(new_latest_version))
# predictions = vnf.predict_json('ubiquitous-goggles', 'football_match_predictions', str(X_test.to_dict()), version = "v9")
predictions[:5]

[[0.8003935687795385, 0.19960643122046154],
 [0.5390259031034836, 0.4609740968965163],
 [0.6872141210093372, 0.31278587899066274],
 [0.9607869802951623, 0.039213019704837655],
 [0.7601638992597715, 0.2398361007402285]]
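
The implementation of `vnf.predict_json` is not shown in this notebook. Online prediction requests to AI Platform address a resource of the form `projects/<project>/models/<model>/versions/<version>` and send a body with an `instances` field, so the helper presumably builds something like the following. This is a sketch under that assumption: `build_predict_request` is a hypothetical name, and the version and instance values are placeholders.

```python
def build_predict_request(project, model, instances, version=None):
    """Return the resource name and JSON body for an online prediction call."""
    name = f"projects/{project}/models/{model}"
    if version is not None:
        name += f"/versions/{version}"
    return name, {"instances": instances}

name, body = build_predict_request(
    "ubiquitous-goggles",
    "football_match_predictions",
    instances="{'league_id': {18138879: 8}}",  # placeholder payload
    version="v15",                             # arbitrary example version
)
print(name)

# The request itself could then be sent with the API client imported earlier:
# service = googleapiclient.discovery.build("ml", "v1")
# response = service.projects().predict(name=name, body=body).execute()
```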

In [149]:
with open('X_test.json', 'r') as f:
    X_test_text = f.read()

In [151]:
predictions = vnf.predict_json('ubiquitous-goggles', 'football_match_predictions', new_X_text, version = "v" + str(new_latest_version))
# predictions = vnf.predict_json('ubiquitous-goggles', 'football_match_predictions', X_test_text, version = "v14")
# predictions = vnf.predict_json('ubiquitous-goggles', 'football_match_predictions', str(bad_X.to_dict()), version = "v" + str(new_latest_version))
predictions[:10]

[[0.5653634330956301, 0.43463656690436986]]

# !!!! The following should be run after restarting the kernel. It was a test to make the script predictions.py

In [15]:
%load_ext autoreload
%autoreload 2

In [84]:
### script2.py
import yaml
import pandas as pd
import mysql.connector

### Open config file
with open('config.yaml', 'r') as config_file:
    config = yaml.safe_load(config_file)

### Connect to the DB
client = mysql.connector.connect(**config['connection'])
cursor = client.cursor()

### Get set of features for which we want to make a prediction
cursor.execute("SELECT * FROM h2h.prediction")
colnames = cursor.column_names
res = cursor.fetchall()
### Arrange them into a DF (one row per fetched record)
df = pd.DataFrame(res, columns = list(colnames))
    
df.set_index('id', inplace = True)

# Features: every column except the stored probabilities
X = df[[col for col in df.columns if col != 'probs']]

# Make the predictions
from model_prediction import CustomModelPrediction
classifier = CustomModelPrediction.from_path('.')
results = classifier.predict(X)

# Print the results 
for localteam, visitorteam, result in zip(X['localteam_id'].values, X['visitorteam_id'].values, results):
    print(f'Probability of localteam {localteam} winning vs visitorteam {visitorteam}: ' + str(round(100*result[0], 2)) + '%')

### We will add the predictions back to the DB. For this, we
### convert the results to a string
to_predict = X.copy()
to_predict['probs'] = results
to_predict['probs'] = to_predict['probs'].apply(lambda x: str(x[0]) + ',' + str(x[1]))

def list_of_tuples(df):
    
    all_values = []
    
    for k in range(df.shape[0]):
        temp = df.copy()
        temp = temp.reset_index()
        temp = temp[['probs', 'id']]
        temp = temp.iloc[k]                        
        temp = temp.astype(str)
        temp = tuple(temp)
        all_values.append(temp)
        
    return all_values

to_predict_values = list_of_tuples(to_predict)

### Since the table already exists, we update the na values
### in it with the probabilities we calculated
sql_com = "UPDATE h2h.prediction SET probs = %s WHERE id = %s"

for value in to_predict_values:
    try:
        cursor.execute(sql_com, value)
    except mysql.connector.IntegrityError as err:
        print("Something went wrong: {}".format(err))        
        pass

client.commit()

Probability of localteam 53 winning vs visitorteam 62: 40.0%
Probability of localteam 53 winning vs visitorteam 62: 42.2%
Probability of localteam 284 winning vs visitorteam 53: 71.11%
Probability of localteam 53 winning vs visitorteam 62: 35.65%
Probability of localteam 62 winning vs visitorteam 53: 50.65%


### Dag

task 1

script1.py


task 2

script2.py

In [121]:
new_X

Unnamed: 0,league_id,season_id,venue_id,referee_id,localteam_id,visitorteam_id,localteam_position,visitorteam_position
123,19,19,8914,14468,78,53,32,2


In [159]:
test_json = {'instances':[dict(new_X.reset_index().iloc[0])]}

In [160]:
with open('test_json', 'w') as f:
    f.write(str(test_json))
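
Note that `str(test_json)` writes the Python representation of the dict (single quotes, and possibly NumPy scalar types), which is not valid JSON. If the file is meant to be sent to the endpoint as a request body, serialising with the `json` module is safer. The sketch below uses the values of `new_X`; `to_builtin` is a hypothetical helper for converting NumPy-style scalars.

```python
import json

def to_builtin(value):
    """Convert NumPy-style scalars (anything with .item()) to plain Python."""
    return value.item() if hasattr(value, "item") else value

record = {"index": 123, "league_id": 19, "season_id": 19, "venue_id": 8914}
request = {"instances": [{k: to_builtin(v) for k, v in record.items()}]}

text = json.dumps(request)
print(text)
print(str(request) == text)  # False: str() uses single quotes

# The round trip recovers the original structure
print(json.loads(text) == request)  # True
```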

In [189]:
X_test.head(2).to_json('football_json2.json')

In [83]:
X_test.to_json('football_json.json')

In [161]:
with open('X_test.json', 'r') as f:
    aux = f.read()

In [164]:
pd.DataFrame(eval(aux))

Unnamed: 0,league_id,season_id,venue_id,referee_id,localteam_id,visitorteam_id,localteam_position,visitorteam_position
1711083,8,6397,200,14808,27,13,7,9
18165911,564,18462,304396,19630,377,485,7,3
16924755,8,17420,480,852,78,29,17,13
10420120,384,13158,134,15966,597,109,5,3
17098339,384,17488,86,15799,102,345,14,15
...,...,...,...,...,...,...,...,...
11867260,8,16036,12,14532,13,29,12,17
4772278,384,8557,339714,16780,597,625,1,3
1711017,8,6397,151,15270,9,25,1,10
4772482,384,8557,7189,15797,43,708,4,7


In [386]:
X_test.sample(5)

id,league_id,season_id,venue_id,referee_id,localteam_id,visitorteam_id,localteam_position,visitorteam_position
11784062,501,1929,8909,-1,53,62,-1,-1
16475344,501,17141,8909,17265,53,62,2,1
1871884,501,7953,284597,18748,284,53,10,1
376104,501,1933,8909,70308,53,62,-1,-1
11784191,501,1931,8914,19316,62,53,-1,-1


In [None]:
### script2.py
import yaml
import pandas as pd
import mysql.connector

### Open config file
config_file = open('config.yaml', 'r')
config = yaml.safe_load(config_file)

### Connect to the DB
client = mysql.connector.connect(**config['connection'])
cursor = client.cursor()

### Get set of features for which we want to make a prediction
cursor.execute("SELECT * FROM h2h.predictions")
colnames = cursor.column_names
res = cursor.fetchall()
df = pd.DataFrame(columns = colnames)

### Arrange them into a DF
for k in range(len(res)):

    aux = pd.DataFrame(res[k]).transpose()
    aux.columns = colnames
    df = pd.concat([df, aux])
    
df.set_index('id', inplace = True)

# Features: every column except the response
X = df[[col for col in df.columns if col != 'Y']]

# Response variable
y = df['Y'].astype(int)

# Make the predictions
predictions = !gcloud ai-platform predict --model=itam_dpa_2022_text_classifier --version=v2 --text-instances=predictions.txt