# Run inference on the previously trained time to merge model


## What we did previously

In the previous [notebook](./03_model_training.ipynb), we trained machine learning models to classify a PR's `time_to_merge` into one of 10 bins (or "classes"). We then deployed the model with the highest F1-score as a service, using the copy of the model saved in S3.
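
The training notebook's exact bin boundaries are not reproduced here. As a minimal sketch, assuming equal-frequency (decile) bins over time to merge measured in hours, mapping a raw duration to one of the 10 classes could look like this. `make_bins` and `to_class` are hypothetical helper names, and the durations are made up for illustration.

```python
import bisect
import statistics


def make_bins(durations_hours, n_bins=10):
    """Return the n_bins - 1 cut points that split the durations into equal-frequency bins."""
    return statistics.quantiles(durations_hours, n=n_bins)


def to_class(duration_hours, cut_points):
    """Map a duration to a class index in 0..len(cut_points)."""
    return bisect.bisect_right(cut_points, duration_hours)


if __name__ == "__main__":
    # Hypothetical training durations: 1, 2, ..., 100 hours
    durations = list(range(1, 101))
    cuts = make_bins(durations)
    print([to_class(h, cuts) for h in (0.5, 15, 55, 1000)])  # [0, 1, 5, 9]
```

Equal-frequency bins keep the classes roughly balanced, which matters when the model is selected by F1-score; fixed-width bins would instead put most PRs into the first few classes.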

## In this step

In this notebook, we fetch the model that we previously trained and stored in S3, send it a payload built from an open PR in the repository, and see which time to merge class it predicts.

```python
import os
import warnings
from io import BytesIO

import boto3
import joblib
import pandas as pd
from dotenv import load_dotenv, find_dotenv

# Local helper modules from this repository
import ceph_comm
import process_pr
from github_handling import connect_to_source, GitHubSingleton, GithubHandler

# Load credentials and settings from a local .env file, overriding already-set variables
load_dotenv(find_dotenv(), override=True)

# Silence all warnings (this also covers pandas' PerformanceWarning)
warnings.filterwarnings("ignore")
```

```python
## GitHub settings: create a .env file locally with the correct values
# ACTION=1 posts the prediction as a PR comment; any other value only prints it.
# os.getenv returns strings, so "0" must be parsed rather than tested for truthiness.
ACTION = os.getenv("ACTION", "0").strip().lower() in ("1", "true", "yes")
ORG = os.getenv("GITHUB_ORG")
REPO = os.getenv("GITHUB_REPO")
TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")

## S3 bucket credentials
s3_endpoint_url = os.getenv("S3_ENDPOINT_URL")
s3_access_key = os.getenv("AWS_ACCESS_KEY_ID")
s3_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
s3_bucket = os.getenv("S3_BUCKET")

# Prefix under which the collected data and the trained model are stored
s3_input_data_path = os.getenv("CEPH_BUCKET_PREFIX")

REMOTE = os.getenv("REMOTE")
RAW_DATA_PATH = os.path.join(
    s3_input_data_path, "srcopsmetrics/bot_knowledge", ORG, REPO, "PullRequest.json"
)
```
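
`os.getenv` always returns a string (or the default you pass), so a value such as `ACTION=0` in the `.env` file is a non-empty and therefore truthy string. The sketch below shows stricter parsing plus an early check for required variables; `env_flag` and `require_env` are hypothetical helpers, not part of this repository.

```python
import os


def env_flag(name, default=False):
    """Interpret an environment variable as a boolean flag."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


def require_env(*names):
    """Return the values of the given variables, failing early if any is unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.environ[name] for name in names]


# Example usage, with values coming from the .env file loaded earlier:
# ACTION = env_flag("ACTION")
# ORG, REPO = require_env("GITHUB_ORG", "GITHUB_REPO")
```

Failing at configuration time gives a clearer error than a `TypeError` from `os.path.join` receiving `None` a few lines later. Relatedly, S3 object keys always use `/`, while `os.path.join` uses the local separator (`\` on Windows); `posixpath.join` builds the same key on every platform.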

```python
# Connect to S3 (via the ceph_comm helper) and to the GitHub repository
cc = ceph_comm.CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)

gs = GitHubSingleton()
gh = GithubHandler(gs.github)
repo = connect_to_source(f"{ORG}/{REPO}", gh)

# Fetch the open PRs; note that pr_ids holds PullRequest objects, not PR numbers
prs = repo.get_pulls(state="open")
pr_ids = list(prs)
```

```
INFO:github_handling: Github Handler __init__: 4991 remaining api calls
INFO:github_handling: _is_api_exhausted: 4991 remaining api calls
INFO:github_handling: _is_api_exhausted: 4991 remaining api calls
```
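
Building `pr_ids` from `prs` walks every page of open PRs, and each page costs a GitHub API call. Since only the first PR is scored here, the sketch below stops early with `itertools.islice`. It assumes PyGithub's paginated results fetch pages lazily during iteration; `FakePaginatedList` is an offline stand-in so the example runs without GitHub, and `take_first` is a hypothetical helper.

```python
from itertools import islice


def take_first(items, n):
    """Return at most n items from an iterable, consuming no more than needed."""
    return list(islice(items, n))


class FakePaginatedList:
    """Offline stand-in for a paginated API result that fetches pages on demand."""

    def __init__(self, total, page_size):
        self.total = total
        self.page_size = page_size
        self.pages_fetched = 0

    def __iter__(self):
        for start in range(0, self.total, self.page_size):
            self.pages_fetched += 1  # one simulated API call per page
            yield from range(start, min(start + self.page_size, self.total))


prs = FakePaginatedList(total=95, page_size=30)
print(take_first(prs, 1), prs.pages_fetched)  # [0] 1
```

Against the real repository object this would be `take_first(repo.get_pulls(state="open", sort="created", direction="desc"), 1)`, where the `sort` and `direction` arguments make the "most recently created PR first" ordering explicit.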


```python
# Extract the model's input features from the first open PR
d = process_pr.parse_pr_with_mi(pr_ids[0])
# Turn the feature dict into a single-row DataFrame (one column per feature)
pr_df = pd.DataFrame.from_dict(d, orient="index").transpose()

# Name the file after the PR number rather than the PullRequest object's repr
PR_FILENAME = os.path.join("PRs", f"{pr_ids[0].number}.json")
print("collected PR", RAW_DATA_PATH + "/" + PR_FILENAME)
```

```
collected PR ocp-ci-analysis-model/srcopsmetrics/bot_knowledge/aicoe-aiops/ocp-ci-analysis/PullRequest.json/PRs/592.json
```
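
`model.predict` expects the same columns, in the same order, as the training data. If `parse_pr_with_mi` ever returns a feature the model was not trained on, or omits one, the DataFrame above would no longer line up. A minimal sketch of aligning a feature dict to a known list of training feature names follows. Filling absent features with 0 is an assumption (the right fill value depends on how each feature was encoded in training), and the feature names below are hypothetical.

```python
def align_features(record, feature_names, fill_value=0):
    """Order a feature dict by feature_names, filling missing ones and reporting extras."""
    missing = [name for name in feature_names if name not in record]
    extra = sorted(set(record) - set(feature_names))
    row = [record.get(name, fill_value) for name in feature_names]
    return row, missing, extra


# Hypothetical feature names and values, for illustration only
training_features = ["size", "changed_files", "body_size", "num_commits"]
pr_features = {"changed_files": 3, "size": 2, "num_commits": 1, "new_feature": 7}

row, missing, extra = align_features(pr_features, training_features)
print(row)      # [2, 3, 0, 1]
print(missing)  # ['body_size']
print(extra)    # ['new_feature']
```

With pandas, a comparable step is `pd.DataFrame([d]).reindex(columns=model.feature_names_in_, fill_value=0)`. scikit-learn estimators only expose `feature_names_in_` when they were fitted on a DataFrame whose column names are all strings.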


```python
# Location of the trained model in the bucket
MODEL_KEY = os.path.join(s3_input_data_path, ORG, REPO, "ttm-model")
MODEL_FILENAME = "model.joblib"

s3_resource = boto3.resource(
    "s3",
    endpoint_url=s3_endpoint_url,
    aws_access_key_id=s3_access_key,
    aws_secret_access_key=s3_secret_key,
)

# Download the model into memory, rewind the buffer, then deserialize it
buffer = BytesIO()
s3_object = s3_resource.Object(s3_bucket, f"{MODEL_KEY}/{MODEL_FILENAME}")
s3_object.download_fileobj(buffer)
buffer.seek(0)
model = joblib.load(buffer)
model
```
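
`download_fileobj` leaves the buffer's position at the end of the bytes it wrote. Whether a particular loader rewinds for you is an implementation detail, so it is safest to call `seek(0)` before deserializing. The sketch below uses the standard-library `pickle` module as an offline stand-in for S3 and joblib to show the difference; `round_trip` is a hypothetical helper.

```python
import pickle
from io import BytesIO


def round_trip(obj, rewind=True):
    """Serialize obj into an in-memory buffer and load it back, optionally rewinding first."""
    buffer = BytesIO()
    pickle.dump(obj, buffer)  # stands in for s3_object.download_fileobj(buffer)
    if rewind:
        buffer.seek(0)  # move back to the first byte before reading
    return pickle.load(buffer)


model_stand_in = {"classes": list(range(10)), "name": "ttm-model"}
print(round_trip(model_stand_in) == model_stand_in)  # True

try:
    round_trip(model_stand_in, rewind=False)
except EOFError:
    print("reading from the end of the buffer finds no data")
```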

```python
# Predict the time to merge class for the PR (predict returns a one-element array)
prediction = model.predict(pr_df)
message = f"Our model predicts this PR to be in category {prediction[0]}"

if ACTION:
    # pr_ids[0] is already a PullRequest object, so comment on it directly
    pr_ids[0].create_issue_comment(message)
else:
    print(message)
```

```
Our model predicts this PR to be in category 4
```
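
A bare class index such as 4 is hard to act on in a PR comment. If the bin edges used in training were saved alongside the model, the class could be translated back into a time range. A sketch, assuming hypothetical edges in hours (the real edges from the training notebook are not shown here); `EDGES_HOURS` and `describe_class` are illustrative names.

```python
# Hypothetical upper bin edges in hours for classes 0..8; class 9 is open-ended
EDGES_HOURS = [2, 4, 8, 24, 48, 96, 168, 336, 720]


def describe_class(label, edges=EDGES_HOURS):
    """Return a human-readable time range for a predicted class index."""
    label = int(label)  # accepts plain ints and NumPy integer scalars
    if not 0 <= label <= len(edges):
        raise ValueError(f"class {label} is outside 0..{len(edges)}")
    if label == 0:
        return f"under {edges[0]} hours"
    if label == len(edges):
        return f"over {edges[-1]} hours"
    return f"between {edges[label - 1]} and {edges[label]} hours"


print(describe_class(4))  # between 24 and 48 hours
```

In the prediction cell this would be used as `describe_class(prediction[0])`, so the comment states a time range instead of a bin number.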


# Conclusion

This notebook fetches the saved model from S3 and sends it a payload built from an open PR to see how the model handles new data. The model returns a time to merge class for that PR. So, great: it looks like our model is working as expected and is ready to predict some times to merge for GitHub PRs!