# Extract Personally Identifiable Information (PII) using Watson NLP

<h2>Use Case</h2>

This notebook demonstrates how to extract PII entities using custom-trained or fine-tuned Watson NLP models. PII extraction is the process of identifying and extracting personal information from a document or dataset. This information can include names, addresses, phone numbers, email addresses, Social Security numbers, credit card numbers, and other details that can be used to identify an individual.

<h2>What you'll learn in this notebook</h2>

Watson NLP offers fine-tuning functionality for custom training. This notebook shows:

* <b>BiLSTM</b>: A bidirectional LSTM (BiLSTM) network takes the preprocessed text as input and learns patterns and relationships between words that are indicative of PII data. It then outputs a probability score for each word in the text, indicating how likely that word is to be part of a PII entity. The network can also be trained to recognize specific entity types such as names, addresses, phone numbers, and email addresses.


* <b>SIRE</b>: Statistical Information and Relation Extraction (SIRE) is a technique used in natural language processing (NLP) to extract specific information and relationships from text. It uses machine learning algorithms to identify and extract structured data such as entities, attributes, and relations from unstructured text. SIRE is used in a variety of applications, including information extraction, knowledge graph construction, and question answering. It typically follows a supervised learning approach, in which a model is trained on annotated examples of text and the corresponding structured data. The model can then extract the same kind of information from new, unseen text.
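
Both approaches can be viewed as sequence labelers: one label is assigned per token, and entity mentions are recovered by grouping consecutive labels. The sketch below illustrates only that grouping step in plain Python. It assumes an IOB-style tag scheme (`B-`/`I-` prefixes and `O`); the tokens, the tags, and the helper `tags_to_spans` are made up for illustration and are not part of the Watson NLP API.

```python
def tags_to_spans(tokens, tags):
    """Group IOB tags into (entity_type, text) pairs."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts: flush the one being built, if any.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            # "O": outside any entity, so close the current one.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["I", "am", "Edward", "Frost", ",", "SSN", "051-03-1802"]
tags = ["O", "O", "B-Name", "I-Name", "O", "O", "B-SocialSecurityNumber"]
spans = tags_to_spans(tokens, tags)
print(spans)
```

In the real models the per-token labels come with scores, and the decoding is handled inside the library; the sketch only shows why token-level predictions are enough to produce multi-word mentions such as a full name.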

## Table of Contents


1. [Before you start](#beforeYouStart)
1. [Load Entity PII Models](#LoadModel)
1. [Preparing Training Data](#TrainingData)
   1. [Preparing Driving Licence Number Training Data](#DLNData)
   1. [Preparing More Custom PII Training Data](#CustumPII)
1. [Watson NLP Models](#NLPModels)    
   1.  [BiLSTM Fine-tuned](#BILSTMFINE)
   1.  [SIRE Fine-tuned](#SIRETune)
   1.  [SIRE Fine-Tune Model For Driving License Number](#DLNFine)
1. [Summary](#summary)

<a id="beforeYouStart"></a>
### 1. Before you start


<div class="alert alert-block alert-danger">
<b>Stop kernel of other notebooks.</b></div>

**Note:** If you have other notebooks currently running with the _Default Python 3.x_ environment, **stop their kernels** before running this notebook. All these notebooks share the same runtime environment, and if they run in parallel, you may encounter memory issues. To stop the kernel of another notebook, open that notebook and select _File > Stop Kernel_.

<div class="alert alert-block alert-warning">
<b>Set Project token.</b></div>

Before you can begin working on this notebook in Watson Studio in Cloud Pak for Data as a Service, you need to ensure that the project token is set so that you can access the project assets via the notebook.

When this notebook is added to the project, a project access token should be inserted at the top of the notebook in a code cell. If you do not see the cell above, add the token to the notebook by clicking **More > Insert project token** from the notebook action bar.  By running the inserted hidden code cell, a project object is created that you can use to access project resources.

![ws-project.mov](https://media.giphy.com/media/jSVxX2spqwWF9unYrs/giphy.gif)

<div class="alert alert-block alert-info">
<b>Tip:</b> Cell execution</div>

Note that you can step through the notebook cell by cell by pressing Shift+Enter, or you can execute the entire notebook by selecting **Cell -> Run All** from the menu.

In [1]:
!pip install faker

Collecting faker
  Downloading Faker-17.0.0-py3-none-any.whl (1.7 MB)
Installing collected packages: faker
Successfully installed faker-17.0.0


In [2]:
import json
import pandas as pd
import watson_nlp
import random
import string
from faker import Faker
from watson_nlp import data_model as dm
from watson_nlp.toolkit.entity_mentions_utils import prepare_train_from_json

In [3]:
# Silence Tensorflow warnings
import tensorflow as tf
tf.get_logger().setLevel('ERROR')
tf.autograph.set_verbosity(0)

<a id="LoadModel"></a>
### 2. Load Entity PII Models

In [4]:
# Load a syntax model to split the text into sentences and tokens
syntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))
# Load bilstm model in WatsonNLP
bilstm_model = watson_nlp.load(watson_nlp.download('entity-mentions_bilstm_en_pii'))
# Download the GloVe model to be used as embeddings in the BiLSTM
glove_model = watson_nlp.load(watson_nlp.download('embedding_glove_en_stock'))
# Download the algorithm template
mentions_train_template = watson_nlp.load(watson_nlp.download('file_path_entity-mentions_sire_multi_template-crf'))
# Download the feature extractor
default_feature_extractor = watson_nlp.load(watson_nlp.download('feature-extractor_rbr_entity-mentions_sire_en_stock'))
# Load rbr model in WatsonNLP
rbr_model = watson_nlp.load(watson_nlp.download('entity-mentions_rbr_multi_pii'))

<a id="TrainingData"></a>
### 3. Preparing Training Data

Let's generate sentences using the Faker library. Ideally, each sentence presents the PII values (driving license number, name, SSN, credit card number, educational details, employee ID, and salary) in context.
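
The license-number generators in the next section rely on Faker's `numerify`, which replaces each `#` in a pattern with a random digit. As a minimal sketch of that idea using only the standard library (the helper `fill_digits` is hypothetical, and unlike `numerify` it handles only the `#` placeholder):

```python
import random


def fill_digits(pattern, rng):
    """Replace every '#' in pattern with a random digit; keep other characters."""
    return "".join(str(rng.randint(0, 9)) if ch == "#" else ch for ch in pattern)


rng = random.Random(0)  # seeded only so the sketch is repeatable
colorado_like = fill_digits("##-###-####", rng)
new_york_like = fill_digits("### ### ###", rng)
print(colorado_like, "|", new_york_like)
```

Literal characters such as dashes, spaces, or a leading letter pass through unchanged, which is how a single pattern string can describe the shape of an identifier.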

<a id="DLNData"></a>
### 3.1 Preparing Driving Licence Number Training Data

In [5]:
fake = Faker(locale='en_US') 

In [6]:
#Driving Licence Number

#Colorado
def generate_driving_license_Colarado():
    license_number = fake.numerify('##-###-####')
    name = fake.name()
    state = "Colarado"
    return license_number, name, state

#Alaska & Alabama
def generate_driving_license_Alaska():
    license_number = fake.numerify('#######')
    name = fake.name()
    state = random.choice(["Alaska","Alabama"])
    return license_number, name, state

#Arkansas & South Carolina
def generate_driving_license_SCarolina():
    license_number = f"{9}{fake.numerify('########')}"
    name = fake.name()
    state = random.choice(["Arkansas","South Carolina"])
    return license_number, name, state

#California
def generate_driving_license_California():
    license_number = f"{'A'}{fake.numerify('########')}"
    name = fake.name()
    state = random.choice(["California"])
    return license_number, name, state

#Hawaii
def generate_driving_license_Hawaii():
    license_number = f"{'H'}{fake.numerify('########')}"
    name = fake.name()
    state = random.choice(["Hawaii"])
    return license_number, name, state

#New York
def generate_driving_license_New_York():
    license_number = fake.numerify('### ### ###')
    name = fake.name()
    state = random.choice(["New York"])
    return license_number, name, state

#North Carolina
def generate_driving_license_NCarolina():
    license_number = fake.numerify('############')
    name = fake.name()
    state = random.choice(["North Carolina"])
    return license_number, name, state

#Texas
def generate_driving_license_Texas():
    # Assumes an 8-digit numeric format for Texas
    license_number = fake.numerify('########')
    name = fake.name()
    state = "Texas"
    return license_number, name, state


In [7]:
def format_DLN_data():
    # Pick one generator at random and call only that one
    generator = random.choice([
        generate_driving_license_Colarado,
        generate_driving_license_Alaska,
        generate_driving_license_SCarolina,
        generate_driving_license_California,
        generate_driving_license_Hawaii,
        generate_driving_license_New_York,
        generate_driving_license_NCarolina,
        generate_driving_license_Texas,
    ])
    driving_license, name, state = generator()

    text_1 = "My name is %s I belong to the %s , My Driving License number is %s." %(name, state, driving_license)
    text_2 = "I am %s. %s this is my driving license number. I am from %s state." %(name, driving_license, state)
    text_3 = "Hello, My self %s, I am living in %s and my driving License number is %s"  %(name, state, driving_license)
    text = random.choice([text_1, text_2, text_3])
    
    
    name_begin = text.find(name)
    name_end = name_begin + len(name)
    state_begin = text.find(state)
    state_end = state_begin + len(state)
    driving_license_begin = text.find(driving_license)
    driving_license_end = driving_license_begin + len(driving_license)
    
    
    data = {
                "text": text,
                "mentions": [
                    {
                        "location": {
                            "begin": name_begin,
                            "end": name_end
                        },
                        "text": name,
                        "type": "Name"
                    },
                    {
                        "location": {
                            "begin": state_begin,
                            "end": state_end
                        },
                        "text": state,
                        "type": "state"
                    },
                    {
                        "location": {
                            "begin": driving_license_begin,
                            "end": driving_license_end
                        },
                        "text": driving_license,
                        "type": "driving_license_number"
                    },
                ]   
            }
    
    return data

In [8]:
format_DLN_data()

{'text': 'My name is Jeffrey Stuart I belong to the Alabama , My Driving License number is 5716270.',
 'mentions': [{'location': {'begin': 11, 'end': 25},
   'text': 'Jeffrey Stuart',
   'type': 'Name'},
  {'location': {'begin': 42, 'end': 49}, 'text': 'Alabama', 'type': 'state'},
  {'location': {'begin': 81, 'end': 88},
   'text': '5716270',
   'type': 'driving_license_number'}]}
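
Because the offsets are computed with `str.find`, which returns `-1` when a value is not found, it can be worth checking that every mention's `begin`/`end` pair really slices out the mention text. A minimal sketch of such a check follows; the helper `check_mentions` is hypothetical, and the record is the sample output shown above.

```python
def check_mentions(record):
    """Return the mentions whose offsets do not reproduce their text."""
    text = record["text"]
    bad = []
    for mention in record["mentions"]:
        begin = mention["location"]["begin"]
        end = mention["location"]["end"]
        # A negative begin means str.find did not locate the value.
        if begin < 0 or text[begin:end] != mention["text"]:
            bad.append(mention)
    return bad


record = {
    "text": "My name is Jeffrey Stuart I belong to the Alabama , "
            "My Driving License number is 5716270.",
    "mentions": [
        {"location": {"begin": 11, "end": 25}, "text": "Jeffrey Stuart", "type": "Name"},
        {"location": {"begin": 42, "end": 49}, "text": "Alabama", "type": "state"},
        {"location": {"begin": 81, "end": 88}, "text": "5716270",
         "type": "driving_license_number"},
    ],
}
problems = check_mentions(record)
print(len(problems))
```

Running a check like this over the generated list before training is a cheap way to catch misaligned annotations.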

In [11]:
# Prepare and store the training dataset for driving license numbers
train_list_faker = []
for i in range(0, 10000):
    train_list_faker.append(format_DLN_data())

with open('PII_faker_LicenseNumber_text_train.json', 'w') as f:
    json.dump(train_list_faker, f)
project.save_data('PII_faker_LicenseNumber_text_train.json', data=json.dumps(train_list_faker), overwrite=True)

{'file_name': 'PII_faker_LicenseNumber_text_train.json',
 'message': 'File saved to project storage.',
 'bucket_name': 'watsoncore-donotdelete-pr-olkxvfa8bk0pb1',
 'asset_id': 'a83ae8a8-3fcf-4761-a966-e731727c0e55'}

<a id="CustumPII"></a>
### 3.2 Preparing More Custom PII Training Data 

* Name
* Social Security Number 
* Credit Card Number 
* Employee ID
* Education Details 
* Salary


In [12]:
def format_data():  
        # Generate a random name
        name = fake.name() 

        #Generate a random SSN 
        ssn = fake.ssn()

        #Generate a random CCN 
        ccn = fake.credit_card_number()

        # Generate a random degree level
        degree_level = fake.random_element(elements=('Bachelor\'s', 'Master\'s', 'Doctorate'))

        # Generate a random field of study
        field_of_study = fake.random_element(elements=('Computer Science', 'Engineering', 'Business', 'Psychology','Medical'))

        # Generate a random prefix with 1-2 alphabets
        prefix = ''.join(random.choices(string.ascii_uppercase, k=random.randint(1, 2)))
        # Generate a random employee ID with the prefix and a random integer
        employee_id = f"{prefix}{fake.random_int(min=10000, max=99999):05d}"

        # Generate salary using faker
        salary = str(fake.pyfloat(left_digits=5, right_digits=2, positive=True, min_value=1000, max_value=5000))


        text_1 = """My name is %s, and my social security number is %s. Here's the number to my Visa credit card, 
        %s. I studied %s in %s, My employee id is %s and salary is %s""" % (name, ssn, ccn,degree_level,field_of_study,employee_id,salary)

        text_2 = """%s is my social security number. The name on my credit card %s is %s. 
        My employee id is %s and I done my %s in %s, I am earning %s per month""" % (ssn, ccn, name,employee_id,degree_level, field_of_study,salary)

        text_3 = """My monthly Earning is %s and employee code is %s, I studied %s in %s. 
        My credit card number is %s and social security number is %s, I am %s""" %(salary,employee_id,degree_level,field_of_study,ccn,ssn,name)


        text = random.choice([text_1, text_2,text_3])

        name_begin = text.find(name)
        name_end = text.find(name) + len(name)

        ssn_begin = text.find(ssn)
        ssn_end = text.find(ssn) + len(ssn)

        ccn_begin = text.find(ccn)
        ccn_end = text.find(ccn) + len(ccn)

        field_of_study_begin = text.find(field_of_study)
        field_of_study_end = field_of_study_begin + len(field_of_study)

        degree_level_begin = text.find(degree_level)
        degree_level_end = degree_level_begin + len(degree_level)

        employee_id_begin = text.find(employee_id)
        employee_id_end = employee_id_begin + len(employee_id)

        salary_begin = text.find(salary)
        salary_end = salary_begin + len(salary)

        data = {
                    "text": text,
                    "mentions": [
                        {
                            "location": {
                                "begin": field_of_study_begin,
                                "end": field_of_study_end
                            },
                            "text": field_of_study,
                            "type": "field_of_study"
                        },
                        {
                            "location": {
                                "begin": degree_level_begin,
                                "end": degree_level_end
                            },
                            "text": degree_level,
                            "type": "degree_level"
                        },
                        {
                            "location": {
                                "begin": employee_id_begin,
                                "end": employee_id_end
                            },
                            "text": employee_id,
                            "type": "employee_id"
                        },
                        {
                            "location": {
                                "begin": salary_begin,
                                "end": salary_end
                            },
                            "text": salary,
                            "type": "salary"
                        },
                        {
                            "location": {
                                "begin": name_begin,
                                "end": name_end
                            },
                            "text": name,
                            "type": "Name"
                        },
                        {
                            "location": {
                                "begin": ssn_begin,
                                "end": ssn_end
                            },
                            "text": ssn,
                            "type": "SocialSecurityNumber"
                        },
                        {
                            "location": {
                                "begin": ccn_begin,
                                "end": ccn_end
                            },
                            "text": ccn,
                            "type": "CreditCardNumber"
                        }
                        ]   
                    }
        return data

In [13]:
format_data()

{'text': '075-29-0173 is my social security number. The name on my credit card 30595326456087 is Brandy Hebert. \n        My employee id is LW35502 and I done my Doctorate in Engineering, I am earning 4475.16 per month',
 'mentions': [{'location': {'begin': 164, 'end': 175},
   'text': 'Engineering',
   'type': 'field_of_study'},
  {'location': {'begin': 151, 'end': 160},
   'text': 'Doctorate',
   'type': 'degree_level'},
  {'location': {'begin': 129, 'end': 136},
   'text': 'LW35502',
   'type': 'employee_id'},
  {'location': {'begin': 190, 'end': 197},
   'text': '4475.16',
   'type': 'salary'},
  {'location': {'begin': 87, 'end': 100},
   'text': 'Brandy Hebert',
   'type': 'Name'},
  {'location': {'begin': 0, 'end': 11},
   'text': '075-29-0173',
   'type': 'SocialSecurityNumber'},
  {'location': {'begin': 69, 'end': 83},
   'text': '30595326456087',
   'type': 'CreditCardNumber'}]}
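
`str.find` returns the first occurrence of a value, so if one generated value ever appeared inside another, an offset could point at the wrong place. One alternative, sketched here under the assumption that the sentence is assembled from pieces, is to record the offsets while the text is being built. The helper `build_annotated` and the sample pieces are hypothetical.

```python
def build_annotated(parts):
    """Join parts into one string and record offsets for the labeled ones.

    parts is a list of (text, label) pairs; label is None for plain text.
    """
    text, mentions = "", []
    for piece, label in parts:
        if label is not None:
            begin = len(text)  # the piece starts where the text currently ends
            mentions.append({
                "location": {"begin": begin, "end": begin + len(piece)},
                "text": piece,
                "type": label,
            })
        text += piece
    return {"text": text, "mentions": mentions}


sample = build_annotated([
    ("My employee id is ", None),
    ("LW35502", "employee_id"),
    (" and I earn ", None),
    ("4475.16", "salary"),
    (" per month", None),
])
print(sample["text"])
print(sample["mentions"])
```

The result has the same `text`/`mentions` layout as the records produced by `format_data()`, so it could be used in the same way.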

In [14]:
# Prepare and store the training dataset for custom PII entities
train_list_faker = []
for i in range(0, 10000):
    train_list_faker.append(format_data())

with open('faker_PII_text_train.json', 'w') as f:
    json.dump(train_list_faker, f)
project.save_data('faker_PII_text_train.json', data=json.dumps(train_list_faker), overwrite=True)

{'file_name': 'faker_PII_text_train.json',
 'message': 'File saved to project storage.',
 'bucket_name': 'watsoncore-donotdelete-pr-olkxvfa8bk0pb1',
 'asset_id': 'e952dfbd-f642-4712-b7a5-deae8425af2a'}

The generated sentences are saved to a JSON training file and a JSON dev (test) file. Each file is written both to the local runtime directory and to the project data assets.

In [15]:
# Prepare and store the test dataset for custom PII entities
test_list_faker = []
for i in range(0, 1000):
    test_list_faker.append(format_data())

with open('faker_PII_text_test.json', 'w') as f:
    json.dump(test_list_faker, f)
project.save_data('faker_PII_text_test.json', data=json.dumps(test_list_faker), overwrite=True)

{'file_name': 'faker_PII_text_test.json',
 'message': 'File saved to project storage.',
 'bucket_name': 'watsoncore-donotdelete-pr-olkxvfa8bk0pb1',
 'asset_id': '0059c8e9-2566-4288-a1c2-092dc29d418e'}

The data is already in the JSON format that Watson NLP expects. The next cell reads the training and test files from the runtime working directory and converts them, using the syntax model, into the streams that serve as input for training the models.
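
As a rough picture of what preparing such training input involves, the sketch below turns character-offset mentions into per-token IOB tags. It is an assumption-laden illustration: it splits on whitespace instead of using the syntax model, the helper `to_iob` is made up, and the internals of `prepare_train_from_json` are not shown in this notebook and may differ.

```python
import re


def to_iob(text, mentions):
    """Label whitespace-separated tokens with IOB tags from character-offset mentions."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for mention in mentions:
            begin = mention["location"]["begin"]
            end = mention["location"]["end"]
            # A token belongs to a mention when their character ranges overlap.
            if match.start() < end and match.end() > begin:
                prefix = "B" if match.start() <= begin else "I"
                tag = f"{prefix}-{mention['type']}"
                break
        tagged.append((match.group(), tag))
    return tagged


text = "I am Edward Frost and my employee code is SG82184"
mentions = [
    {"location": {"begin": 5, "end": 17}, "text": "Edward Frost", "type": "Name"},
    {"location": {"begin": 42, "end": 49}, "text": "SG82184", "type": "employee_id"},
]
iob = to_iob(text, mentions)
for token, tag in iob:
    print(f"{token:10} {tag}")
```

A real tokenizer also separates punctuation from words, which matters for sentences where a value is directly followed by a period or comma.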

In [16]:
train_data = dm.DataStream.from_json_array("faker_PII_text_train.json")
train_iob_stream = prepare_train_from_json(train_data, syntax_model)
dev_data = dm.DataStream.from_json_array("faker_PII_text_test.json")
dev_iob_stream = prepare_train_from_json(dev_data, syntax_model)

In [17]:
text = pd.read_json('faker_PII_text_test.json')['text'][1]
text

"My monthly Earning is 3225.81 and employee code is SG82184, I studied Master's in Engineering. \n        My credit card number is 630494306388 and social security number is 051-03-1802, I am Edward Frost"

<a id="NLPModels"></a>
### 4. Watson NLP Models

<a id="BILSTMFINE"></a>

### 4.1 BiLSTM Fine-tuned

In [14]:
#Fine-Tune BiLSTM model using Custom PII
bilstm_custom = bilstm_model.train(train_iob_stream, 
                                   dev_iob_stream, 
                                   embedding=glove_model.embedding,
                                   num_train_epochs=5,
                                   num_conf_epochs=5, 
                                   checkpoint_interval=5, 
                                   learning_rate=0.005,
                                   lstm_size=16, 
                                  )



In [15]:
#Save the Trained block model as a workflow model 
from watson_nlp.workflows.entity_mentions.bilstm import BiLSTM 

mentions_workflow = BiLSTM(syntax_model, bilstm_custom)

In [16]:
project.save_data('bilstm_pii_workflow_custom', data=mentions_workflow.as_file_like_object(), overwrite=True)

{'file_name': 'bilstm_pii_workflow_custom',
 'message': 'File saved to project storage.',
 'bucket_name': 'watsoncore-donotdelete-pr-olkxvfa8bk0pb1',
 'asset_id': '7d7b852c-8369-40bc-9e90-a42cd59aae5a'}

In [18]:
# Run the BILSTM workflow model
bilstm_result = mentions_workflow.run(text)

for i in bilstm_result.mentions:
    print("Text: ", i.span.text.ljust(15, " "), "Type: ", i.type)

Text:  847-03-6699     Type:  SocialSecurityNumber
Text:  2284972848170547 Type:  CreditCardNumber
Text:  Mark Davis      Type:  Name
Text:  US15892         Type:  employee_id
Text:  Doctorate       Type:  degree_level
Text:  Engineering     Type:  field_of_study
Text:  2508.64         Type:  salary


Now you are able to run the trained models on new data. You will run the models on the test data so that the results can also be used for model evaluation.

Watson NLP includes methods for testing the quality of supported models. Given a model and test data, it can generate a quality report. The following example shows the steps required to generate a quality report for a BiLSTM entity mention extractor model. The same steps can be applied to any entity mention extractor model.
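
The per-class figures in such a report (precision, recall, and F1) follow from the true-positive, false-positive, and false-negative counts. A minimal sketch of that arithmetic is shown here; the helper `prf` is hypothetical, and the second set of counts is made up to show a non-perfect case.

```python
def prf(true_positive, false_positive, false_negative):
    """Compute precision, recall and F1 from confusion counts."""
    predicted = true_positive + false_positive
    actual = true_positive + false_negative
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


perfect = prf(true_positive=1000, false_positive=0, false_negative=0)
partial = prf(true_positive=90, false_positive=10, false_negative=30)  # made-up counts
print(perfect)
print(partial)
```

Precision is the share of predicted mentions that are correct, recall is the share of annotated mentions that were found, and F1 is their harmonic mean.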

In [116]:
# Execute the model and generate the quality report
preprocess_func = lambda raw_doc: syntax_model.run(raw_doc)
quality_report = bilstm_custom.evaluate_quality('faker_PII_text_test.json', 
                                               preprocess_func)

# Print the quality report
print(json.dumps(quality_report, indent=4))



{
    "per_class_confusion_matrix": {
        "salary": {
            "true_positive": 1000,
            "false_positive": 0,
            "false_negative": 0,
            "precision": 1.0,
            "recall": 1.0,
            "f1": 1.0
        },
        "CreditCardNumber": {
            "true_positive": 1000,
            "false_positive": 0,
            "false_negative": 0,
            "precision": 1.0,
            "recall": 1.0,
            "f1": 1.0
        },
        "employee_id": {
            "true_positive": 1000,
            "false_positive": 0,
            "false_negative": 0,
            "precision": 1.0,
            "recall": 1.0,
            "f1": 1.0
        },
        "field_of_study": {
            "true_positive": 1000,
            "false_positive": 0,
            "false_negative": 0,
            "precision": 1.0,
            "recall": 1.0,
            "f1": 1.0
        },
        "Name": {
            "true_positive": 1000,
            "false_positive": 0,
         

<a id="SIRETune"></a>

### 4.2 SIRE Fine-tuned


In [21]:
#help(watson_nlp.blocks.entity_mentions.SIRE)

In [18]:
#Fine-Tune SIRE using custom PII
sire_custom = watson_nlp.blocks.entity_mentions.SIRE.train(train_iob_stream, 
                                                           'en', 
                                                           mentions_train_template,
                                                           feature_extractors=[default_feature_extractor])

Initializing viterbi classifier
[MEVitClassifier::initModel] MEVitClassifier initialized.
[MEVitClassifier2::initModel] model initialized.
Get Feature str 1048536
Done get feature str 1048536
done. [78g62m136k,13g481m960k]
gramSize = 2
number of processes: 5
Initial processing:  (# of words: 432429, # of sentences: 29994)
senIndex[1] = 6021, wordIndex = 86490
senIndex[2] = 12018, wordIndex = 172973
senIndex[3] = 18007, wordIndex = 259473
senIndex[4] = 24007, wordIndex = 345945
senIndex[5] = 29993, wordIndex = 432429
[ME_CRF::scaleModel] Updater -- l1=0.1, l2=0.005, history size=5, progress windows size 20
 Iteration           Obj             WErr                         Timing       %Eff        Per thread timing
              1099683.95     13.75/100.00             E:2.18 s, M:0.18 s.       1.00 [m:2.12, M:2.15, av:2.14]
         0   716219.43     32.16/ 88.81           

In [23]:
#Save the Trained block model as a workflow model 
from watson_nlp.workflows.entity_mentions.sire import SIRE

sire_workflow = SIRE("en",syntax_model,sire_custom)

In [24]:
project.save_data('sire_pii_workflow_custom', data=sire_workflow.as_file_like_object(), overwrite=True)

Saved 36986 features.


{'file_name': 'sire_pii_workflow_custom',
 'message': 'File saved to project storage.',
 'bucket_name': 'watsoncore-donotdelete-pr-olkxvfa8bk0pb1',
 'asset_id': 'be83a662-e384-4783-b0de-2d0858e44bcc'}

In [25]:
#syntax_result = syntax_model.run(text)
sire_result = sire_workflow.run(text)

for i in sire_result.mentions:
    print("Text: ", i.span.text.ljust(15, " "), "Type: ", i.type)

Text:  3225.81         Type:  salary
Text:  SG82184         Type:  employee_id
Text:  Master's        Type:  degree_level
Text:  Engineering     Type:  field_of_study
Text:  630494306388    Type:  CreditCardNumber
Text:  051-03-1802     Type:  SocialSecurityNumber
Text:  Edward Frost    Type:  Name


<a id="DLNFine"></a>
### 4.3 SIRE Fine-Tune Model For Driving License Number 

In [21]:
# Load the DLN dataset
train_data = dm.DataStream.from_json_array("PII_faker_LicenseNumber_text_train.json")
train_iob_stream = prepare_train_from_json(train_data, syntax_model)

# Note: no separate DLN dev file was generated above, so the training file is reused here
dev_data = dm.DataStream.from_json_array("PII_faker_LicenseNumber_text_train.json")
dev_iob_stream = prepare_train_from_json(dev_data, syntax_model)

Download the Custom RBR rules for Driving License Number (Generated by Elyra Visual NLP Editor)
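
For intuition about what rule-based matching of these license formats can look like, here is a sketch using Python's standard `re` module. It is not the Watson NLP rule-based (RBR) format and not the rules referred to above; the patterns simply mirror the synthetic formats generated in section 3.1, and the names `DLN_PATTERN` and `find_license_numbers` are hypothetical.

```python
import re

# These patterns mirror the synthetic formats generated in section 3.1;
# they are assumptions for illustration, not official state formats.
DLN_PATTERN = re.compile(
    r"(?<![\w-])(?:"
    r"\d{2}-\d{3}-\d{4}"    # e.g. 25-157-3852
    r"|\d{3} \d{3} \d{3}"   # e.g. 052 289 084
    r"|[AH]\d{8}"           # e.g. A06798902 or H45768237
    r"|\d{12}"              # e.g. 844144533108
    r"|9\d{8}"              # nine digits starting with 9
    r"|\d{7}"               # e.g. 9839434
    r")(?![\w-])"
)


def find_license_numbers(text):
    """Return every substring that matches one of the synthetic formats."""
    return [match.group() for match in DLN_PATTERN.finditer(text)]


examples = [
    "I am living in Alaska and my driving License number is 9839434",
    "I am living in New York and my driving License number is 052 289 084",
    "I am Randall Barton. H45768237 this is my driving license number.",
]
found = [find_license_numbers(sentence) for sentence in examples]
print(found)
```

Pure pattern rules like these match on shape alone and cannot tell a seven-digit license number from any other seven-digit number, which is one reason to combine rules with a trained model that also uses the surrounding context.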

In [22]:
#Fine-Tune SIRE using Driving License number PII
sire_DLN_custom = watson_nlp.blocks.entity_mentions.SIRE.train(train_iob_stream, 
                                                           'en', 
                                                           mentions_train_template,
                                                           feature_extractors=[default_feature_extractor])

Initializing viterbi classifier
[MEVitClassifier::initModel] MEVitClassifier initialized.
[MEVitClassifier2::initModel] model initialized.
Get Feature str 405250
Done get feature str 405250
done. [76g73m756k,10g44m36k]
gramSize = 2
number of processes: 5
Initial processing:  (# of words: 201601, # of sentences: 16570)
senIndex[1] = 3382, wordIndex = 40328
senIndex[2] = 6652, wordIndex = 80652
senIndex[3] = 9984, wordIndex = 120972
senIndex[4] = 13260, wordIndex = 161285
senIndex[5] = 16569, wordIndex = 201601
[ME_CRF::scaleModel] Updater -- l1=0.1, l2=0.005, history size=5, progress windows size 20
 Iteration           Obj             WErr                         Timing       %Eff        Per thread timing
               388303.43     19.65/ 81.21             E:0.67 s, M:0.05 s.       1.00 [m:0.65, M:0.66, av:0.66]
         0   210151.93     27.29/100.00             E:0.6

In [25]:
text1="Hello, My self Tracy Arias, I am living in Alaska and my driving License number is 9839434"
text2="Hello, My self Shane Escobar, I am living in New York and my driving License number is 052 289 084"
text3="Hello, My self Laura Parrish, I am living in Colarado and my driving License number is 25-157-3852"
text4="My name is Curtis Mccullough I belong to the Alabama , My Driving License number is 1470583?"
text5="I am Randall Barton. H45768237 this is my driving license number. I am from Hawaii state."
text6="Hello, My self Michael Peterson, I am living in Colarado and my driving License number is 87-361-4145"
text7="Hello, My self Ms. Jennifer Hart, I am living in North Carolina and my driving License number is 844144533108"
text8="Hello, My self Derek Martin, I am living in California and my driving License number is A06798902"
text9="I am Lauren Martinez. 493 671 140 this is my driving license number. I am from New York state."

all_test=[text1,text2,text3,text4,text5,text6,text7,text8,text9]

In [33]:
# Number the test sentences from 1 so the output lines read Text1, Text2, ...
for t, test in enumerate(all_test, start=1):
    syntax_result = syntax_model.run(test)
    sire_DLN_result = sire_DLN_custom.run(syntax_result)

    for i in sire_DLN_result.mentions:
        print("Text"+str(t), i.span.text.ljust(15, " "), "Type: ", i.type)
    print("\n")

Text1 Tracy Arias     Type:  Name
Text1 Alaska          Type:  state
Text1 9839434         Type:  driving_license_number


Text2 Shane Escobar   Type:  Name
Text2 New York        Type:  state
Text2 052 289 084     Type:  driving_license_number


Text3 Laura Parrish   Type:  Name
Text3 Colarado        Type:  state
Text3 25-157-3852     Type:  driving_license_number


Text4 Curtis Mccullough Type:  Name
Text4 Alabama         Type:  state
Text4 1470583?        Type:  driving_license_number


Text5 Randall Barton  Type:  Name
Text5 H45768237       Type:  driving_license_number
Text5 Hawaii          Type:  state


Text6 Michael Peterson Type:  Name
Text6 Colarado        Type:  state
Text6 87-361-4145     Type:  driving_license_number


Text7 Ms. Jennifer Hart Type:  Name
Text7 North Carolina  Type:  state
Text7 844144533108    Type:  driving_license_number


Text8 Derek Martin    Type:  Name
Text8 California      Type:  state
Text8 A06798902       Type:  driving_license_number


Text9 Laure

<a id="summary"></a>
## 5. Summary

<span style="color:blue">This notebook shows you how to use the Watson NLP library and how quickly and easily you can train and run different PII extraction models using Watson NLP.</span>

Please note that this content is made available to foster Embedded AI technology adoption. The content may include systems and methods pending patent with the USPTO and protected under US patent laws. For redistribution of this content, IBM will use a release process. For any questions, please log an issue in the [GitHub](https://github.com/ibm-build-labs/Watson-NLP) repository.

Developed by IBM Build Lab 

Copyright - 2022 IBM Corporation 