# Testing the Custom NER Model.

In the training file, we walked through the training process of our custom NER model. In this file, we take you through the testing procedure we followed.
We begin by importing the original customer data, from which we selected both our training data and our test data.
We then load the trained model, test it on unseen data, save the test results in CSV format, compute precision, recall and F-score for each entity label, and finally share a few conclusions and recommendations.






In [None]:
# mount the drive.

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Importing the Customer Data.

In [None]:
import pandas as pd
path = "/content/drive/My Drive/integrify_ner_project_team/data.csv"

df_bonus = pd.read_csv(path)

In [None]:
df_bonus.head()

| | title | description | summary | brand | price | meta | provider_category | provider |
|---|---|---|---|---|---|---|---|---|
| 0 | adidas Originals - Superstar - Valkoinen - US 5,5 | | | adidas Originals | 66.5 | {"SIZE": ["us 5,5"], "COLOR": ["valkoinen"], "... | 17-muoti-ja-vaatetus | Caliroots |
| 1 | Sc-Erna Polvipituinen Hame Sininen Soyaconcept | SOYACONCEPT on tanskalainen brändi, joka luo e... | | Soyaconcept | 49.99 | {"SIZE": ["36"], "COLOR": ["cristal blue"], "G... | 17-muoti-ja-vaatetus | Boozt |
| 2 | Dana Buchman Silmälasit Taren CARAMEL TORTOISE | Dana Buchman Taren Silmälasit. Collection:Men.... | | Dana Buchman | 146.0 | {"SIZE": ["54"], "COLOR": ["tortoise"], "GENDE... | 13-silmalasit-ja-piilolinssit | Smartbuy Glasses |
| 3 | Active Sports Woven Shorts B Shortsit Musta PUMA | PUMA Active Sports Woven Shorts B | | PUMA | 27.0 | {"SIZE": ["164", "128", "110", "116", "104", "... | 17-muoti-ja-vaatetus | Boozt |
| 4 | Renata Polvipituinen Hame Musta Fall Winter Sp... | Fall Winter Spring Summer. A-linjainen. | | Fall Winter Spring Summer | 199.0 | {"SIZE": ["xs"], "COLOR": ["jet black"], "GEND... | 17-muoti-ja-vaatetus | Boozt |


### Loading the Custom NER Model.

In [None]:
import spacy
from pathlib import Path

spacy.prefer_gpu()  # use the GPU if one is available, otherwise fall back to CPU

# Directory where the trained model was saved during training
output_dir = Path('/content/drive/My Drive/integrify_ner_project_team/model/')

print("Loading from", output_dir)
nlp = spacy.load(output_dir)

Loading from /content/drive/My Drive/integrify_ner_project_team/model


### Testing the Model on Unseen Data.

To test our model, we begin with a single sample and then move on to 50 unseen samples.

In [None]:
df_bonus.title[90000:90003]

90000                    Lacoste Silmälasit L2246 033
90001    Adidas Originals Aurinkolasit AOR024 021.020
90002                   Rodenstock Silmälasit R7068 C
Name: title, dtype: object

In [None]:
#doc = nlp('Bering Ceramic Naisten kello 10725-741 Musta/Kullansävytetty teräs 44 mm')
doc = nlp('Adidas Originals Aurinkolasit AOR024 021.020')
for ent in doc.ents:
  print(ent.label_, ent.text)

BRAND Adidas
PRODUCT Aurinkolasit


Having seen that the model handles a single unseen example reasonably well (it finds the product, but tags only "Adidas" rather than "Adidas Originals" as the brand), we now test it on a sample of 50 unseen titles from the customer data and store the results in a pandas data frame. We then save the data frame as a CSV file for easier inspection.

In [None]:
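import pandas as pd

LABEL = ['COLOR', 'BRAND', 'SIZE/WEIGHT', 'MATERIAL', 'GENDER', 'PRODUCT']
# Build a new list so LABEL itself is not modified
# (columns = LABEL followed by columns.insert(...) would also change LABEL).
columns = ['TEST_data'] + LABEL
df_customer = pd.DataFrame(columns=columns)

start = 600000  # first row of the unseen test slice
for i in range(start, start + 50):
  test_data = df_bonus.title[i]
  doc = nlp(test_data)
  df_customer.loc[i - start, 'TEST_data'] = test_data
  # One cell per label: if a title contains several entities with the
  # same label, only the last one is kept.
  for ent in doc.ents:
    df_customer.loc[i - start, ent.label_] = ent

df_customer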

| | TEST_data | COLOR | BRAND | SIZE/WEIGHT | MATERIAL | GENDER | PRODUCT |
|---|---|---|---|---|---|---|---|
| 0 | Hugo Boss 99999 Naisten kello 1502479 Valkoine... | (Valkoinen) | (Hugo, Boss) | | (Kullansävytetty) | (Naisten) | (kello) |
| 1 | Jeffree Star Cosmetics The Gloss Mouthful | | | | | | (The, Gloss) |
| 2 | Talvisaappaat Geox D HOSMOS B ABX | | | | | | (Talvisaappaat) |
| 3 | Levis Men Vintage Stripe Yd Boxer B Bokserit M... | (Musta) | (Levi, ´) | | | | (Bokserit) |
| 4 | HP ZBook Studio 360 G5 i9-9880H 15.6inch UHD T... | | | | | | |
| 5 | Calvin Klein 99999 Miesten kello K8M2712N Sini... | (Sininen) | (Calvin, Klein) | (Ø43, mm) | (Teräs) | (Miesten) | (kello) |
| 6 | L.A. Girl Pro.Coverage Illuminating Foundatio... | | (L.A., Girl) | | | | (Coverage, Illuminating, Foundation) |
| 7 | Pusero Helena | | (Helena) | | | | (Pusero) |
| 8 | Rado True Naisten kello R27059722 Musta/Keraam... | (Musta, /, Keraaminen) | (Rado, True) | (Ø40, mm) | | (Naisten) | (kello) |
| 9 | Top Bamse P8 T-shirts Long-sleeved T-shirts Va... | (Valkoinen) | (Lindex) | | | | (T-shirts) |
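The loop above writes each entity straight into a single cell per label, so when a title contains two entities with the same label only the last one is kept. As a minimal sketch, all entity texts for a label can be gathered first and then joined. The helper `entities_by_label` and the `' | '` separator are hypothetical choices of ours, not part of spaCy or pandas; in the notebook, the input pairs would come from `[(ent.label_, ent.text) for ent in doc.ents]`.

```python
from collections import defaultdict


def entities_by_label(ents):
    """Group (label, text) pairs so that every entity is kept.

    Hypothetical helper: instead of overwriting one cell per label,
    all entity texts for a label are joined with ' | '.
    """
    grouped = defaultdict(list)
    for label, text in ents:
        grouped[label].append(text)
    return {label: ' | '.join(texts) for label, texts in grouped.items()}


# Hypothetical predictions for one title, as (ent.label_, ent.text) pairs.
row = entities_by_label([('BRAND', 'Levis'), ('PRODUCT', 'Boxer'), ('PRODUCT', 'Bokserit')])
print(row)  # {'BRAND': 'Levis', 'PRODUCT': 'Boxer | Bokserit'}
```

Joining into one string keeps the CSV to one row per title; storing a list per cell would also work but is less convenient to read in a spreadsheet.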


### Saving the DataFrame to a CSV File.

In [None]:
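# Written to the current working directory (/content in Colab), which is not
# persistent; use a path under /content/drive to keep the file.
df_customer.to_csv('result.csv')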

### Finding the Accuracy of Our Model.
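To measure how well our custom NER model recognizes entities in the 50 unseen titles, we compare its predictions against our manual annotations and compute precision, recall and F-score for each entity label. A predicted entity counts as a true positive only if both its boundaries and its label exactly match an annotated entity.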

* The precision (p) is the ratio $\frac{tp}{tp+fp}$ where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

* The recall (r) is the ratio $\frac{tp}{tp+fn}$ where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

* The F-score is the harmonic mean of precision and recall: $F = \frac{2pr}{p+r}$.
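To make the exact-match rule concrete, here is a small pure-Python sketch of span-level scoring over `(start_char, end_char, label)` tuples. The functions `prf` and `f_score` are our own illustration of the definitions above, not spaCy's `Scorer` implementation, although as far as we know the `Scorer` also requires matching boundaries and label. The gold annotation for the Adidas title is hypothetical; the predictions mirror the single-sample output shown earlier.

```python
def f_score(p, r):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(gold_spans, pred_spans):
    """Exact-match precision, recall and F-score, as percentages.

    Spans are (start_char, end_char, label) tuples. A prediction covering
    only part of a gold entity counts as a false positive AND a false negative.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # predicted spans that exactly match a gold span
    fp = len(pred - gold)  # predicted spans with no exact gold match
    fn = len(gold - pred)  # gold spans the model missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 100 * p, 100 * r, 100 * f_score(p, r)


# 'Adidas Originals Aurinkolasit AOR024 021.020' with a hypothetical gold annotation.
gold = [(0, 16, 'BRAND'), (17, 29, 'PRODUCT')]
pred = [(0, 6, 'BRAND'), (17, 29, 'PRODUCT')]  # model tagged only 'Adidas' as BRAND
print(prf(gold, pred))  # (50.0, 50.0, 50.0)
```

Because the harmonic mean scales linearly, `f_score` also works directly on percentages: `f_score(84.0, 75.0)` gives about 79.245, the COLOR F-score reported below.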

In [None]:
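# spaCy v2.x API: the spacy.gold module (and GoldParse) was removed in spaCy v3.
from spacy.gold import GoldParse
from spacy.scorer import Scorer

# Manually annotated gold entities, as (start_char, end_char, label) tuples,
# for the unseen test titles.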
examples = [('Hugo Boss 99999 Naisten kello 1502479 Valkoinen/Kullansävytetty', {'entities': [(0, 9, 'BRAND'), (16, 23, 'GENDER'), (24, 29, 'PRODUCT'), (38, 47, 'COLOR'), (48, 63, 'MATERIAL')]}), 
            ('Jeffree Star Cosmetics The Gloss Mouthful', {'entities': [(0, 22, 'BRAND'), (27, 32, 'PRODUCT')]}),
            ('Talvisaappaat Geox  D HOSMOS B ABX', {'entities': [(0, 13, 'PRODUCT'), (14, 18, 'BRAND')]}), 
            ('Levis Men Vintage Stripe Yd Boxer B Bokserit Musta Levi´s', {'entities': [(0, 5, 'BRAND'), (6, 9, 'GENDER'), (28, 33, 'PRODUCT'), (45, 50, 'COLOR'), (36, 44, 'PRODUCT')]}), 
            ('HP ZBook Studio 360 G5 i9-9880H 15.6inch UHD Touch 16GB DDR4 512GB SSD Nvidia Quadro P2000 4GB Webcam AC+BT 6C Batt FPR W10P 3YW(SE)', {'entities': [(0, 2, 'BRAND'), (3, 8, 'PRODUCT'), (32, 40, 'SIZE/WEIGHT')]}),
            ('Calvin Klein 99999 Miesten kello K8M2712N Sininen/Teräs Ø43 mm', {'entities': [(0, 12, 'BRAND'), (19, 26, 'GENDER'), (27, 32, 'PRODUCT'), (42, 49, 'COLOR'), (50, 55, 'MATERIAL'), (56, 62, 'SIZE/WEIGHT')]}), 
            ('L.A. Girl\xa0 Pro.Coverage Illuminating Foundation\xa0 Soft Honey', {'entities': [(0, 9, 'BRAND'), (37, 47, 'PRODUCT')]}),
            ('Pusero Helena', {'entities': [(0, 6, 'PRODUCT'), (7, 13, 'BRAND')]}), 
            ('Rado True Naisten kello R27059722 Musta/Keraaminen Ø40 mm', {'entities': [(0, 9, 'BRAND'), (10, 17, 'GENDER'), (18, 23, 'PRODUCT'), (34, 39, 'COLOR'), (40, 50, 'MATERIAL'), (51, 57, 'SIZE/WEIGHT')]}),
            ('Top Bamse P8 T-shirts Long-sleeved T-shirts Valkoinen Lindex', {'entities': [(13, 21, 'PRODUCT'), (44, 53, 'COLOR'), (54, 60, 'BRAND')]}), 
            ('All Day Pique Polo Shirt Polos Short-sleeved GAP', {'entities': [(19, 24, 'PRODUCT'), (45, 48, 'BRAND')]}),
            ('Rotalla Setula 4 Season RA03 ( 205/50 R17 93W XL )', {'entities': [(0, 14, 'BRAND'), (31, 48, 'SIZE/WEIGHT')]}),
            ('Trunk Bokserit Valkoinen Tommy Hilfiger', {'entities': [(25, 39, 'BRAND'), (15, 24, 'COLOR'), (6, 14, 'PRODUCT')]}), 
            ('Sc-Dollie Kilpikonnakaulus Poolopaita Oranssi Soyaconcept', {'entities': [(0, 9, 'BRAND'), (32, 37, 'PRODUCT'), (38, 45, 'COLOR')]}), 
            ('Sandals Shoes Summer Shoes Sandals Kermanvärinen Bisgaard', {'entities': [(0, 7, 'PRODUCT'), (8, 13, 'PRODUCT'), (49, 57, 'BRAND')]}),
            ('Festina Miesten kello F16374-8 Musta/Teräs Ø40 mm', {'entities': [(0, 7, 'BRAND'), (8, 15, 'GENDER'), (16, 21, 'PRODUCT'), (31, 36, 'COLOR'), (37, 42, 'MATERIAL'), (43, 49, 'SIZE/WEIGHT')]}), 
            ('Iskunvaimennin - Sv 450008', {'entities': [(0, 14, 'PRODUCT')]}),
            ('Gubi-Turbo Kattovalaisin Ø62cm, Valkoinen', {'entities': [(0, 10, 'BRAND'), (16, 24, 'PRODUCT'), (25, 30, 'SIZE/WEIGHT'), (32, 41, 'COLOR')]}),
            ('Star Trading-Knot Lampshade 50 cm, Gray', {'entities': [(0, 17, 'BRAND'), (18, 27, 'PRODUCT'), (28, 33, 'SIZE/WEIGHT'), (35, 39, 'COLOR')]}),
            ('BF Goodrich Route Control T ( 285/70 R19.5 150/148J )', {'entities': [(30, 51, 'SIZE/WEIGHT'), (0, 27, 'BRAND')]}),
            ('Bralette Night & Underwear Underwear Tops Valkoinen Abercrombie & Fitch', {'entities': [(17, 26, 'PRODUCT'), (42, 51, 'COLOR'), (52, 71, 'BRAND')]}),
            ('Olivia Night & Underwear Underwear Tops Punainen Molo', {'entities': [(49, 53, 'BRAND'), (40, 48, 'COLOR'),  (15, 24, 'PRODUCT')]}),
            ('Warm Up Anorak Outerwear Jackets Anoraks Musta Lyle & Scott', {'entities': [(25, 32, 'PRODUCT'), (41, 46, 'COLOR'), (47, 59, 'BRAND')]}),
            ('Akribos XXIV 99999 Miesten kello AK849SSB Musta/Teräs Ø45 mm', {'entities': [(0, 12, 'BRAND'), (19, 26, 'GENDER'), (27, 32, 'PRODUCT'), (42, 47, 'COLOR'), (48, 53, 'MATERIAL'), (54, 60, 'SIZE/WEIGHT')]}),  
            ('Vidreamers Pure T-Shirt-Noos T-shirts & Tops Short-sleeved Valkoinen Vila', {'entities': [(16, 23, 'PRODUCT'), (29, 37, 'PRODUCT'), (59, 68, 'COLOR'), (69, 73, 'BRAND')]}),
            ('Etujousen Norra', {'entities': [(10, 15, 'BRAND'), (0, 9, 'PRODUCT')]}), 
            ('Blue Polo Short Sleeve Popover Shirt Polos Short-sleeved Sininen Eton', {'entities': [(57, 64, 'COLOR'), (65, 69, 'BRAND'), (31, 36, 'PRODUCT'), (0, 4, 'COLOR')]}), 
            ('Pleated Georgette Hihaton Pusero Paita Ganni', {'entities': [(39, 44, 'BRAND'), (33, 38, 'PRODUCT'), (26, 32, 'PRODUCT')]}), 
            ('Pusero Charlie Tank', {'entities': [(0, 6, 'PRODUCT'), (7, 19, 'BRAND')]}),
            ('Bridgestone VT01 R ( 130/90-16 RF TL 73H takapyörä )', {'entities': [(0, 11, 'BRAND'), (21, 36, 'SIZE/WEIGHT'), (41, 50, 'PRODUCT')]}), 
            ('Blouse Long-Sleeve Pitkähihainen Pusero Paita Monivärinen/Kuvioitu Gerry Weber', {'entities': [(67, 78, 'BRAND'), (40, 45, 'PRODUCT'), (33, 39, 'PRODUCT'), (0, 6, 'PRODUCT'), (46, 66, 'COLOR')]}), 
            ('Seletti-Apina Valaisin, Köysimalli, Ulkokäyttöön, Musta', {'entities': [(50, 55, 'COLOR'), (0, 7, 'BRAND'), (14, 22, 'PRODUCT')]}),
            ('Esprit 99999 Naisten kello ES1L163M0105 Vihreä/Kullansävytetty', {'entities': [(0, 6, 'BRAND'), (13, 20, 'GENDER'), (21, 26, 'PRODUCT'), (40, 62, 'COLOR')]}),
            ('So & Co New York Madison Naisten kello 5019.4', {'entities': [(25, 32, 'GENDER'), (33, 38, 'PRODUCT'), (0, 16, 'BRAND')]}),
            ('Cluse La Boheme Naisten kello CW0101201013 Hopea/Teräs Ø38 mm', {'entities': [(16, 23, 'GENDER'), (24, 29, 'PRODUCT'), (43, 54, 'COLOR'), (55, 61, 'SIZE/WEIGHT'), (0, 15, 'BRAND')]}),
            ('Sara bikini brief', {'entities': [(0, 4, 'BRAND'), (5, 11, 'PRODUCT')]}),
            ('Orange & Bergamot Three Wick Candle Tuoksukynttilä Nude Molton Brown', {'entities': [(36, 50, 'PRODUCT'), (29, 35, 'PRODUCT'), (56, 68, 'BRAND'), (0, 17, 'MATERIAL')]}), 
            ('Bluetooth kaiutin Nikkei Submarine', {'entities': [(10, 17, 'PRODUCT'), (18, 34, 'BRAND')]}),
            ('UA HG Armour Racer Tank, White', {'entities': [(3, 12, 'BRAND'), (25, 30, 'COLOR')]}),
            ('Julie Polvipituinen Hame Punainen Baum Und Pferdgarten', {'entities': [(20, 24, 'PRODUCT'), (25, 33, 'COLOR'), (34, 54, 'BRAND')]}),
            ('Emerald Charm Chain Choker', {'entities': [(20, 26, 'PRODUCT'), (0, 13, 'BRAND')]}),
            ('ArchitectMade-Gemini Kynttilänjalka, Ruostumaton teräs', {'entities': [(21, 35, 'PRODUCT'), (0, 20, 'BRAND'), (49, 54, 'MATERIAL')]}),
            ('Cisco Catalyst 3650 48 Port Mgig 2X40G Uplink Lan Base In Cpnt', {'entities': [(0, 5, 'BRAND'), (23, 27, 'PRODUCT'), (33, 38, 'SIZE/WEIGHT')]}),
            ('Blouses Woven Pitkähihainen Pusero Paita Keltainen Esprit Collection', {'entities': [(51, 57, 'BRAND'), (35, 40, 'PRODUCT'), (28, 34, 'PRODUCT'), (41, 50, 'COLOR')]}), 
            ('So & Co New York Madison Naisten kello 5220.3', {'entities': [(0, 16, 'BRAND'), (25, 32, 'GENDER'), (33, 38, 'PRODUCT')]}),
            ('Globen Lighting-Cube Pöytävalaisin, Musta', {'entities': [(20, 34, 'PRODUCT'), (36, 41, 'COLOR'), (0, 6, 'BRAND')]}), 
            ('Long Boots 4835 Korkeavartiset Saapikkaat Musta Billi Bi', {'entities': [(42, 47, 'COLOR'), (48, 56, 'BRAND'), (31, 41, 'PRODUCT'), (5, 10, 'PRODUCT')]}),
            ('Invicta Bolt Naisten kello 28936 Harmaa/Kullansävytetty teräs', {'entities': [(0, 12, 'BRAND'), (13, 20, 'GENDER'), (21, 26, 'PRODUCT'), (33, 55, 'COLOR'), (56, 61, 'MATERIAL')]}),
            ('BigSize Me Stringit', {'entities': [(0, 10, 'BRAND'), (11, 19, 'PRODUCT')]}), 
            ('INTEL Xeon Scalable 8256 3.80GHZ FC-LGA3647 16.5M Cache 10.4GT/sec Box CPU', {'entities': [(0, 5, 'BRAND'), (25, 32, 'SIZE/WEIGHT'), (44, 49, 'SIZE/WEIGHT'), (67, 74, 'PRODUCT')]}),
            ('gForce Action-lever Belt, 11mm, black', {'entities': [(20, 24, 'PRODUCT'), (26, 30, 'SIZE/WEIGHT'), (32, 37, 'COLOR'), (0, 6, 'BRAND')]})]


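LABEL = ['COLOR', 'BRAND', 'SIZE/WEIGHT', 'MATERIAL', 'GENDER', 'PRODUCT']

# Score all examples in a single pass. The Scorer keeps separate precision,
# recall and F counts for each entity type, so filtering the gold entities
# label by label (and re-running the model once per label) is not needed.
scorer = Scorer()
for input_, annot in examples:
  doc_gold_text = nlp.make_doc(input_)  # tokenize only, no predictions
  gold = GoldParse(doc_gold_text, entities=annot['entities'])
  pred_value = nlp(input_)              # full pipeline, including NER
  scorer.score(pred_value, gold)

for item in LABEL:
  # 'token_acc' measures tokenization agreement, not NER quality; it is 100
  # here because gold and predicted docs come from the same tokenizer.
  print('{} - Token accuracy : {}'.format(item, scorer.scores['token_acc']))
  print(scorer.scores['ents_per_type'][item])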

COLOR - Token accuracy : 100.0
{'p': 84.0, 'r': 75.0, 'f': 79.24528301886792}
BRAND - Token accuracy : 100.0
{'p': 60.0, 'r': 48.97959183673469, 'f': 53.93258426966292}
SIZE/WEIGHT - Token accuracy : 100.0
{'p': 88.88888888888889, 'r': 53.333333333333336, 'f': 66.66666666666667}
MATERIAL - Token accuracy : 100.0
{'p': 57.14285714285714, 'r': 50.0, 'f': 53.333333333333336}
GENDER - Token accuracy : 100.0
{'p': 100.0, 'r': 90.9090909090909, 'f': 95.23809523809523}
PRODUCT - Token accuracy : 100.0
{'p': 62.745098039215684, 'r': 61.53846153846154, 'f': 62.135922330097095}


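Judging by F-score, "GENDER" is recognized best (95.2), followed by "COLOR" (79.2), "SIZE/WEIGHT" (66.7) and "PRODUCT" (62.1), while "BRAND" (53.9) and "MATERIAL" (53.3) score lowest. "SIZE/WEIGHT" has high precision (88.9) but low recall (53.3): when the model tags a size it is usually right, but it misses almost half of the annotated sizes. The token accuracy of 100 for every label only reflects that the gold and predicted documents share the same tokenizer; it says nothing about entity recognition. Since the test set is small (only eight annotated MATERIAL entities, for example), these scores should be read as rough estimates.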
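Which label comes "second" depends on the metric used. The sketch below copies the per-type scores printed above (rounded to two decimals) and sorts the labels by F-score and by precision; `rank` is a hypothetical helper of ours.

```python
# Per-type scores copied from the Scorer output above, rounded to 2 decimals.
scores = {
    'COLOR': {'p': 84.00, 'r': 75.00, 'f': 79.25},
    'BRAND': {'p': 60.00, 'r': 48.98, 'f': 53.93},
    'SIZE/WEIGHT': {'p': 88.89, 'r': 53.33, 'f': 66.67},
    'MATERIAL': {'p': 57.14, 'r': 50.00, 'f': 53.33},
    'GENDER': {'p': 100.00, 'r': 90.91, 'f': 95.24},
    'PRODUCT': {'p': 62.75, 'r': 61.54, 'f': 62.14},
}


def rank(scores, metric):
    """Return labels ordered from best to worst on one metric ('p', 'r' or 'f')."""
    return sorted(scores, key=lambda label: scores[label][metric], reverse=True)


print(rank(scores, 'f'))
# ['GENDER', 'COLOR', 'SIZE/WEIGHT', 'PRODUCT', 'BRAND', 'MATERIAL']
print(rank(scores, 'p'))
# ['GENDER', 'SIZE/WEIGHT', 'COLOR', 'PRODUCT', 'BRAND', 'MATERIAL']
```

Ranking by precision puts "SIZE/WEIGHT" ahead of "COLOR"; ranking by F-score, which also accounts for the low "SIZE/WEIGHT" recall, reverses them.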

## Conclusion
As the scores show, our model performs well on GENDER and COLOR but is noticeably weaker on BRAND and MATERIAL. We believe its performance can be improved by measures including, but not limited to, the following:
* trying out different batch sizes
* ensuring better annotation practices
* using different optimizers
* trying various regularization techniques
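As one example of a better annotation practice, annotations can be checked automatically before training or scoring. Character offsets that cut through a word cannot be mapped onto tokens; as far as we understand spaCy v2, `GoldParse` then marks such an entity as missing instead of scoring it. The helper `misaligned_spans` below is hypothetical and approximates token boundaries with a simple letter/digit rule rather than spaCy's tokenizer, so its output is a prompt for manual review, not a final verdict.

```python
def misaligned_spans(text, entities):
    """Return (start, end, label, span_text) for spans that do not sit on word boundaries.

    Hypothetical check: a span is flagged if it is empty, has whitespace at
    either edge, or starts/ends in the middle of a run of letters/digits.
    This only approximates a real tokenizer.
    """
    bad = []
    for start, end, label in entities:
        span = text[start:end]
        cuts_word_start = start > 0 and text[start - 1].isalnum() and span[:1].isalnum()
        cuts_word_end = end < len(text) and text[end].isalnum() and span[-1:].isalnum()
        if not span or span != span.strip() or cuts_word_start or cuts_word_end:
            bad.append((start, end, label, span))
    return bad


# One of the test annotations: (16, 24) covers only 'valaisin' inside 'Kattovalaisin'.
text = 'Gubi-Turbo Kattovalaisin Ø62cm, Valkoinen'
ents = [(0, 10, 'BRAND'), (16, 24, 'PRODUCT'), (25, 30, 'SIZE/WEIGHT'), (32, 41, 'COLOR')]
print(misaligned_spans(text, ents))
# [(16, 24, 'PRODUCT', 'valaisin')]
```

Running such a check over the whole `examples` list before scoring would surface annotations whose offsets need fixing.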

Note: our model is only effective on our own data. To use it on a different data set, we strongly recommend retraining it on that data set.