## Load Data

In [7]:
import pandas as pd
import json

with open('metrics/metrics_balanced.json') as f:
    metrics_balanced = json.load(f)
with open('metrics/metrics_not_balanced.json') as f:
    metrics_not_balanced = json.load(f)

## Accuracy / AUC

In [9]:
print(f'Balanced Accuracy: {metrics_balanced["accuracy"]}')
print(f'Not Balanced Accuracy: {metrics_not_balanced["accuracy"]}')

print(f'Balanced avg ROC AUC: {metrics_balanced["roc_auc"]}')
print(f'Not Balanced avg ROC AUC: {metrics_not_balanced["roc_auc"]}')

Balanced Accuracy: 0.7660801537927536
Not Balanced Accuracy: 0.7833284383093477
Balanced avg ROC AUC: 0.9706722584641102
Not Balanced avg ROC AUC: 0.9783670113021136


These metrics suggest that the non-balanced model performs slightly better overall: its accuracy is about 1.7 points higher and its average ROC AUC about 0.8 points higher.
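To put a number on that difference, here is a minimal sketch that subtracts the balanced score from the non-balanced one. The values are copied by hand from the printed output above, and the `gap` helper is a hypothetical name introduced only for this illustration.

```python
# Values copied from the printed output above.
accuracy = {"Balanced": 0.7660801537927536, "Not Balanced": 0.7833284383093477}
roc_auc = {"Balanced": 0.9706722584641102, "Not Balanced": 0.9783670113021136}


def gap(scores):
    """Return the non-balanced score minus the balanced score (hypothetical helper)."""
    return scores["Not Balanced"] - scores["Balanced"]


print(f"Accuracy gap: {gap(accuracy):.4f}")
print(f"ROC AUC gap: {gap(roc_auc):.4f}")
```

Both gaps are positive, so the non-balanced model is ahead on both metrics, although by a small margin.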

## Classification Reports

### Balanced

In [14]:
cr_balanced = pd.DataFrame(metrics_balanced['classification_report']).transpose().astype({"support": int})
cr_balanced

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| All Electronics | 0.44918 | 0.259142 | 0.328669 | 1586 |
| Amazon Fashion | 0.932687 | 0.965979 | 0.949041 | 1793 |
| Amazon Home | 0.670606 | 0.609528 | 0.63861 | 2141 |
| Arts, Crafts & Sewing | 0.738367 | 0.714286 | 0.726127 | 2177 |
| Automotive | 0.797101 | 0.848842 | 0.822159 | 2203 |
| Books | 0.948226 | 0.946411 | 0.947318 | 2090 |
| Camera & Photo | 0.703554 | 0.791099 | 0.744763 | 1101 |
| Cell Phones & Accessories | 0.770079 | 0.822771 | 0.795553 | 1783 |
| Computers | 0.693997 | 0.705707 | 0.699803 | 2015 |
| Digital Music | 0.910858 | 0.937241 | 0.923861 | 1450 |


### Not Balanced

In [15]:
cr_not_balanced = pd.DataFrame(metrics_not_balanced['classification_report']).transpose().astype({"support": int})
cr_not_balanced

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| All Electronics | 0.500967 | 0.326608 | 0.39542 | 1586 |
| Amazon Fashion | 0.929298 | 0.967652 | 0.948087 | 1793 |
| Amazon Home | 0.677068 | 0.638487 | 0.657212 | 2141 |
| Arts, Crafts & Sewing | 0.749065 | 0.736334 | 0.742645 | 2177 |
| Automotive | 0.793261 | 0.865638 | 0.827871 | 2203 |
| Books | 0.94642 | 0.955024 | 0.950703 | 2090 |
| Camera & Photo | 0.765823 | 0.769301 | 0.767558 | 1101 |
| Cell Phones & Accessories | 0.776623 | 0.838474 | 0.806365 | 1783 |
| Computers | 0.686078 | 0.748387 | 0.715879 | 2015 |
| Digital Music | 0.907285 | 0.944828 | 0.925676 | 1450 |


To compare the two models directly, we use the F1 score, which combines precision and recall, to see whether the balanced model's gains in one compensate for its losses in the other relative to the non-balanced model.

In [17]:
f1_scores = pd.DataFrame([
    cr_balanced['f1-score'],
    cr_not_balanced['f1-score']
], index=['Balanced', 'Not Balanced']).transpose()
f1_scores['better'] = f1_scores.apply(lambda x: 'Balanced' if x['Balanced'] > x['Not Balanced'] else 'Not Balanced', axis=1)

f1_scores

| | Balanced | Not Balanced | better |
|---|---|---|---|
| All Electronics | 0.328669 | 0.39542 | Not Balanced |
| Amazon Fashion | 0.949041 | 0.948087 | Balanced |
| Amazon Home | 0.63861 | 0.657212 | Not Balanced |
| Arts, Crafts & Sewing | 0.726127 | 0.742645 | Not Balanced |
| Automotive | 0.822159 | 0.827871 | Not Balanced |
| Books | 0.947318 | 0.950703 | Not Balanced |
| Camera & Photo | 0.744763 | 0.767558 | Not Balanced |
| Cell Phones & Accessories | 0.795553 | 0.806365 | Not Balanced |
| Computers | 0.699803 | 0.715879 | Not Balanced |
| Digital Music | 0.923861 | 0.925676 | Not Balanced |
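As a reminder of why F1 is a fair single number for this comparison, it is the harmonic mean of precision and recall, so a gain in one only helps if the other does not drop too much. The sketch below recomputes F1 for "All Electronics" from the precision and recall values in the two reports above; the `f1` helper is a hypothetical name used only here.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall for "All Electronics", copied from the two reports above.
f1_balanced = f1(0.44918, 0.259142)
f1_not_balanced = f1(0.500967, 0.326608)

print(f"Balanced F1: {f1_balanced:.4f}")
print(f"Not Balanced F1: {f1_not_balanced:.4f}")
```

Up to rounding, the results should match the `f1-score` column of the reports, which confirms that the comparison table is built from the same quantity.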


The "balanced" model has a lower F1 score in nine of the ten categories listed above; it is marginally ahead only on Amazon Fashion. That is probably because:

- Most classes have roughly the same number of records; the "imbalance" comes from only a few classes
- Even so, the minority classes still have plenty of records

So the "imbalance" is just the natural distribution of the data. Therefore, the model to be used will be the non-balanced one.
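One rough way to check how strong the imbalance is would be to compare class sizes directly. The sketch below uses the `support` column of the classification reports, copied by hand for the ten categories listed; this assumes the evaluation split reflects the overall class distribution, which the notebook does not confirm.

```python
import pandas as pd

# Support values copied from the classification reports above
# (evaluation split, only the ten categories listed there).
support = pd.Series({
    "All Electronics": 1586,
    "Amazon Fashion": 1793,
    "Amazon Home": 2141,
    "Arts, Crafts & Sewing": 2177,
    "Automotive": 2203,
    "Books": 2090,
    "Camera & Photo": 1101,
    "Cell Phones & Accessories": 1783,
    "Computers": 2015,
    "Digital Music": 1450,
})

# Ratio between the largest and the smallest class.
imbalance_ratio = support.max() / support.min()
# Share of all listed records that falls in each class.
share = support / support.sum()

print(f"Largest / smallest class: {imbalance_ratio:.2f}")
print(share.sort_values().round(3))
```

Under these assumptions, the largest class is about twice the size of the smallest one, and even the smallest class has more than a thousand records, which is consistent with the two points above.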

## Confusion Matrices

To see which categories tend to be mispredicted, and which categories they are wrongly labeled as, we can explore the confusion matrices.

### Balanced

![Balanced Confusion Matrix](./metrics/confusion_matrix_balanced.png)

### Not Balanced

![Not Balanced Confusion Matrix](./metrics/confusion_matrix_not_balanced.png)
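Besides reading the plots, the most frequent mistakes could be listed programmatically. The sketch below is only an illustration: the raw matrices are not part of the metrics loaded in this notebook, so it uses made-up counts, a hypothetical `top_confusions` helper, and assumes that rows are true labels and columns are predicted labels.

```python
import numpy as np


def top_confusions(cm, labels, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    off_diag = np.array(cm)  # copy, so the input is left untouched
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    # Indices of the flattened matrix, sorted from largest to smallest count.
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(labels[r], labels[c], int(off_diag[r, c])) for r, c in zip(rows, cols)]


# Made-up counts for illustration only; rows are true labels, columns are predictions.
labels = ["All Electronics", "Computers", "Books"]
cm = [
    [40, 25, 1],
    [12, 70, 0],
    [2, 3, 90],
]

pairs = top_confusions(cm, labels, k=2)
for true_label, predicted_label, count in pairs:
    print(f"{true_label} -> {predicted_label}: {count}")
```

With the real matrices, the same idea would highlight which categories are most often mistaken for each other.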