# Bug Severity Predictor for Mozilla

In this project, I'll build a severity predictor for the [Mozilla project](https://www.mozilla.org/en-US/) that uses the description of a bug report stored in the [Bugzilla tracking system](https://bugzilla.mozilla.org/home) to predict the report's severity.

The severity in the Mozilla project indicates how severe the problem is, from blocker ("application unusable") to trivial ("minor cosmetic issue"). The same field can also mark a bug as an enhancement request. In this project, I consider five severity levels: **trivial (0)**, **minor (1)**, **major (2)**, **critical (3)**, and **blocker (4)**. I ignore the default severity level (often **"normal"**) because users tend to choose it when they are not sure about the correct severity level.
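The label encoding described above can be sketched as a plain lookup table. The names `SEVERITY_LEVELS` and `encode_severity` are hypothetical and do not come from the project code; this is only a minimal illustration, assuming the raw severity arrives as a string.

```python
# Hypothetical mapping from Bugzilla severity names to the integer labels used here.
SEVERITY_LEVELS = {
    "trivial": 0,
    "minor": 1,
    "major": 2,
    "critical": 3,
    "blocker": 4,
}


def encode_severity(severity):
    """Return the integer label, or None for levels the project ignores (e.g. "normal")."""
    return SEVERITY_LEVELS.get(severity.strip().lower())


raw = ["critical", "normal", "trivial", "Blocker"]
# Drop reports whose severity is not one of the five considered levels.
labels = [encode_severity(s) for s in raw if encode_severity(s) is not None]
print(labels)
```

Reports with the "normal" level map to `None` and are filtered out, which mirrors the decision to ignore the default severity.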

## Project setup

The cell below imports the required packages.

In [15]:
import os
import sys

import matplotlib.pyplot as plt
import numpy  as np
import pandas as pd
import torch
import xgboost as xgb

#from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from sklearn.metrics import classification_report

from predictive_modeling import load_tensors_data_fn, optimize_model_fn
#from google.colab import drive
#drive.mount('/drive')

## Read in the tensor data

The cell below loads the training and test features and labels, which are stored as tensors.

In [16]:
#tensors_input_path = os.path.join('/', 'drive', 'My Drive', 'data', 'processed')
tensors_input_path = os.path.join('..', 'data', 'processed')
X_train, y_train   = load_tensors_data_fn(os.path.join(tensors_input_path, 'mozilla_bug_report_train_data.pt'))
X_test, y_test     = load_tensors_data_fn(os.path.join(tensors_input_path, 'mozilla_bug_report_test_data.pt'))

## Build the prediction model

### Getting the best hyperparameters

The cell below searches for the best hyperparameters for the XGBoost algorithm using the Bayesian optimization method implemented in the [Hyperopt](https://towardsdatascience.com/automated-machine-learning-hyperparameter-tuning-in-python-dfda59b72f8a) package.

In [17]:
# Get the best parameters using optimize_model_fn from the local predictive_modeling module.
best_params = optimize_model_fn(X_train, X_test, y_train, y_test, max_evals=10)

100%|██████████| 10/10 [32:28<00:00, 194.81s/trial, best loss: 1.4197163827419281]


In [18]:
print('Best parameters:\n', best_params)

Best parameters:
 {'colsample_bytree': 1.0, 'eta': 0.025, 'gamma': 0.55, 'max_depth': 12, 'min_child_weight': 2.0, 'n_estimators': 282.0, 'subsample': 0.8500000000000001}
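
Values such as `282.0` and `0.8500000000000001` are typical of quantized search distributions: Hyperopt's `hp.quniform` returns floats even for whole-number settings. Whether `optimize_model_fn` actually uses `hp.quniform` is an assumption, since its source is not shown here. A minimal sketch of tidying such a result, with the hypothetical names `INTEGER_PARAMS` and `tidy_params`:

```python
# Hypothetical helper: tidy a tuned-parameter dict before handing it to XGBoost.
INTEGER_PARAMS = {"max_depth", "n_estimators"}  # assumed to be whole-number settings


def tidy_params(params, ndigits=4):
    """Cast whole-number parameters to int and round floating-point noise away."""
    tidy = {}
    for name, value in params.items():
        if name in INTEGER_PARAMS:
            tidy[name] = int(value)
        else:
            tidy[name] = round(float(value), ndigits)
    return tidy


# The dictionary printed above.
best = {'colsample_bytree': 1.0, 'eta': 0.025, 'gamma': 0.55, 'max_depth': 12,
        'min_child_weight': 2.0, 'n_estimators': 282.0, 'subsample': 0.8500000000000001}
print(tidy_params(best))
```

Using `int()` and `float()` keeps the helper indifferent to whether the values are Python or NumPy numbers.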


### Training the prediction model

The cell below trains the XGBoost model with the best parameters.

In [19]:
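# multi:softmax makes predict() return the predicted class index directly.
best_params['objective'] = 'multi:softmax'
best_params['num_class'] = 5  # trivial, minor, major, critical, blocker
# The tuned n_estimators is a float (282.0), but xgb.train needs an integer
# number of boosting rounds. int() works for Python and NumPy floats alike.
n_estimators = int(best_params.pop('n_estimators'))

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_test, label=y_test)  # the held-out test set

model = xgb.train(best_params, dtrain, num_boost_round=n_estimators)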
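
With the `multi:softmax` objective, XGBoost reports the class with the highest score instead of per-class probabilities (the `multi:softprob` objective returns those). The idea can be sketched in plain Python. The scores below are made up for illustration, and the helper names are hypothetical rather than part of XGBoost.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the index of the most probable class."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up scores for one bug report across the five severity levels (0..4).
scores = [0.1, 0.4, 1.3, 0.9, -0.5]
print(predict_class(scores))
```

Because softmax preserves the ordering of the scores, the predicted class is simply the index of the largest raw score.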



### Testing the prediction model

The cell below evaluates the trained XGBoost model on the test set.

In [20]:
y_pred = model.predict(dvalid).astype(int)
print(classification_report(y_test, y_pred, zero_division=0))

              precision    recall  f1-score   support

           0       0.42      0.16      0.23        32
           1       0.42      0.33      0.37        54
           2       0.39      0.49      0.43        76
           3       0.39      0.58      0.47        64
           4       0.20      0.04      0.07        24

    accuracy                           0.39       250
   macro avg       0.36      0.32      0.31       250
weighted avg       0.38      0.39      0.37       250
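
The two average rows of the report can be reproduced from the per-class rows: the macro average treats every class equally, while the weighted average weights each class by its support. The sketch below works from the rounded values printed above, so it is an approximation of what scikit-learn computes from the unrounded metrics.

```python
# (precision, recall, f1-score, support) per class, copied from the report above.
report = {
    0: (0.42, 0.16, 0.23, 32),
    1: (0.42, 0.33, 0.37, 54),
    2: (0.39, 0.49, 0.43, 76),
    3: (0.39, 0.58, 0.47, 64),
    4: (0.20, 0.04, 0.07, 24),
}

total = sum(row[3] for row in report.values())  # number of test samples


def macro_avg(column):
    """Unweighted mean of one metric over the classes."""
    return sum(row[column] for row in report.values()) / len(report)


def weighted_avg(column):
    """Mean of one metric over the classes, weighted by class support."""
    return sum(row[column] * row[3] for row in report.values()) / total


for name, column in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:>9}  macro={macro_avg(column):.2f}  weighted={weighted_avg(column):.2f}")
```

The support-weighted recall is the same quantity as the overall accuracy, which is why both appear as 0.39 in the report. The macro averages are lower because the two smallest classes, trivial (0) and blocker (4), have the weakest recall.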



## Deploy the prediction model

The cell below saves the trained and tested XGBoost model to disk so that it can be deployed.

In [21]:
#import joblib
model_output_path = os.path.join('..', 'data', 'model')
# Create the output directory if it does not exist yet.
os.makedirs(model_output_path, exist_ok=True)

#model_output_path = os.path.join('/', 'drive', 'My Drive', 'data', 'processed', 'final-model.bin')
model.save_model(os.path.join(model_output_path, 'final-model.bin'))