<a href="https://colab.research.google.com/github/everMitta/genai-samples/blob/main/notebooks/genai_colab_lab1and2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup Environment
The following code loads the environment variables required to run this notebook.


In [None]:
FILE="GenAI Lab 1 and 2"

import warnings, os
os.environ['PIP_ROOT_USER_ACTION'] = 'ignore'
warnings.filterwarnings("ignore", category=UserWarning, module='huggingface_hub.utils._token')
warnings.filterwarnings("ignore", message="torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.", category=UserWarning, module='transformers')


! pip install -qqq git+https://github.com/elastic/notebook-workshop-loader.git@main
from notebookworkshoploader import loader
import os
from dotenv import load_dotenv

if os.path.isfile("../env"):
    load_dotenv("../env", override=True)
    print('Successfully loaded environment variables from local env file')
else:
    loader.load_remote_env(file=FILE, env_url="https://notebook-workshop-api-voldmqr2bq-uc.a.run.app")
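
Before moving on, it can help to confirm that the variables later cells rely on were actually loaded. The following is a minimal sketch and not part of the workshop loader: the helper name `missing_env_vars` is hypothetical, and the variable names are simply the ones this notebook reads further down.

```python
import os

# Variables read by later cells in this notebook.
REQUIRED_VARS = ["OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_API_ENGINE", "ELASTIC_PROXY"]

def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All expected environment variables are set")
```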

# Lab 1-1: Using Transformer Models

In this lab we will:
* Get introduced to Google Colab: Hello World and importing Python libraries
* Cache the download of a smaller LLM
* Use a basic transformer model locally



## Step 1: Hit play on the next code **sample**

In [None]:
print("Hello World")

## Step 2: Use ! to execute a shell command

In [None]:
! echo "The shell thinks the Current Directory is: $(pwd)"

## Step 3: Environment setup

First let us import some Python libraries we'll use in the first lab module.

In [7]:
! pip install -qqq --upgrade pip
! pip freeze | grep 'torch @ https://download.pytorch.org/whl/cu121/torch-2.2.1%2Bcu121-cp310-cp310-linux_x86_64.whl' \
    || pip install -qqq torch==2.3.0
! pip install -qqq --upgrade transformers==4.36.2
! pip install -qqq python-dotenv==1.0.0
! pip install -qqq tiktoken==0.5.2 cohere==4.38 openai==1.3.9          ## for later in the lab
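
Because the install cell pins exact versions, a quick check of what actually ended up installed can save debugging time later. This is a sketch using only the standard library; the helper name `installed_version` is hypothetical and the pinned versions are copied from the cell above.

```python
from importlib import metadata

# Versions pinned by the install cell above.
PINNED = {
    "transformers": "4.36.2",
    "python-dotenv": "1.0.0",
    "tiktoken": "0.5.2",
    "cohere": "4.38",
    "openai": "1.3.9",
}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for package, wanted in PINNED.items():
    found = installed_version(package)
    status = "ok" if found == wanted else "check"
    print(f"{package:15} wanted={wanted:8} found={found} [{status}]")
```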


## Step 4: Utility functions
Some utility functions that are good to keep on hand

In [8]:
import json
# pretty printing JSON objects
def json_pretty(input_object):
  print(json.dumps(input_object, indent=4))


import textwrap
# wrap text when printing, because colab scrolls output to the right too much
def wrap_text(text, width):
    wrapped_text = textwrap.wrap(text, width)
    return '\n'.join(wrapped_text)
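
One thing to keep in mind about `wrap_text`: `textwrap.wrap` treats its input as a single paragraph and, by default, replaces newlines with spaces, so blank-line paragraph breaks are lost. If you want to keep them, one option is to wrap each paragraph separately. The helper below, `wrap_paragraphs`, is a hypothetical variant and not used elsewhere in this notebook.

```python
import textwrap

def wrap_paragraphs(text, width=70):
    """Wrap each blank-line-separated paragraph on its own (hypothetical helper)."""
    paragraphs = text.split("\n\n")
    wrapped = ["\n".join(textwrap.wrap(p, width)) for p in paragraphs]
    return "\n\n".join(wrapped)

sample = "First paragraph with enough words to need wrapping.\n\nSecond paragraph."
print(wrap_paragraphs(sample, width=30))
```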



## Step 5: Download sentiment analysis model from HuggingFace

We'll use the Hugging Face Transformers library to download and load an open source model called DistilBERT, which can be used for sentiment analysis.

* Details of the model can be found on its [Hugging Face Page](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
* This model is pretrained to determine if an input text is of *POSITIVE* or *NEGATIVE* sentiment and makes a good intro example to AI models.
* Note that we are caching the model files in a folder called ```llm_download_cache```, so they do not have to be re-downloaded while this runtime stays connected. You can see the download in the filesystem (using the left-hand side menu).



In [9]:
import torch
import tqdm
import json
import os
from transformers import (pipeline,
  DistilBertTokenizer,
  DistilBertForSequenceClassification)

# Set the cache directory
cache_directory = "llm_download_cache"

# Create the cache directory if it doesn't exist
if not os.path.exists(cache_directory):
    os.makedirs(cache_directory)

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
sentiment_tokenizer = DistilBertTokenizer.from_pretrained(
    model_id, cache_dir=cache_directory)
sentiment_model = DistilBertForSequenceClassification.from_pretrained(
    model_id, cache_dir=cache_directory)



tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]
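
To see how much disk space the cached model takes, you can walk the cache folder. This is a rough sketch: `cache_size_mb` is a hypothetical helper, and it skips symbolic links on the assumption that the Hugging Face cache stores each file once and links to it from a snapshot folder.

```python
import os

def cache_size_mb(cache_directory):
    """Sum the size of regular files under `cache_directory`, in megabytes."""
    total_bytes = 0
    for root, _dirs, files in os.walk(cache_directory):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(path):
                total_bytes += os.path.getsize(path)
    return total_bytes / 1_000_000

if os.path.isdir("llm_download_cache"):
    print(f"llm_download_cache holds {cache_size_mb('llm_download_cache'):.1f} MB")
else:
    print("llm_download_cache does not exist yet; run the download cell first")
```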


## Step 6: Run sentiment analysis
Okay! Let's run the model ```sentiment_model``` on two pieces of sample text.

In [10]:
## With the distilbert model downloaded and cached we can call it for
## sentiment analysis

# Define the sentiment analysis pipeline
sentiment_classifier = pipeline("sentiment-analysis",
                                model=sentiment_model,
                                tokenizer=sentiment_tokenizer,
                                device='cpu')
#two samples
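classifier_results = sentiment_classifier([
    "My dog is so cute, I love him.",

    # adjacent string literals are joined, which avoids stray spaces
    # from a backslash continuation inside the string
    "I am very sorry to inform you that the tax "
    "administration has decided to audit you."
])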

json_pretty(classifier_results)

[
    {
        "label": "POSITIVE",
        "score": 0.9998550415039062
    },
    {
        "label": "NEGATIVE",
        "score": 0.9991757273674011
    }
]
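
Where does the `score` come from? For a two-label classifier like this one, the pipeline's default is to apply a softmax to the model's two output logits and report the probability of the winning label. The sketch below illustrates that step in plain Python; the logits are made up for illustration and were not produced by the model.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # subtract the max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits in the order [NEGATIVE, POSITIVE]; not taken from the model.
labels = ["NEGATIVE", "POSITIVE"]
probs = softmax([-4.2, 4.6])
best = max(range(len(probs)), key=probs.__getitem__)
print({"label": labels[best], "score": probs[best]})
```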


### 🫵 Try it yourself - Get Creative 🫵
Try some of your own examples.
Note, AI models are subject to bias. The model card for this model goes into pretty good detail on the issue. [Read more here](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english#risks-limitations-and-biases)

In [16]:
your_classifier_results = sentiment_classifier([
    "I'm so mad at you",
    "You have -$3 in net profit"
])
json_pretty(your_classifier_results)

[
    {
        "label": "NEGATIVE",
        "score": 0.99870765209198
    },
    {
        "label": "NEGATIVE",
        "score": 0.8751727938652039
    }
]
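
Notice that the second example scored noticeably lower than the first. One simple way to use that signal is to flag predictions below a confidence threshold for human review. This is a sketch: `flag_low_confidence` is a hypothetical helper and the 0.95 threshold is an arbitrary choice for illustration.

```python
def flag_low_confidence(texts, results, threshold=0.95):
    """Pair each text with its result and mark scores below `threshold`."""
    flagged = []
    for text, result in zip(texts, results):
        flagged.append({
            "text": text,
            "label": result["label"],
            "score": result["score"],
            "needs_review": result["score"] < threshold,
        })
    return flagged

# Inputs and scores copied from the run above.
texts = ["I'm so mad at you", "You have -$3 in net profit"]
results = [
    {"label": "NEGATIVE", "score": 0.99870765209198},
    {"label": "NEGATIVE", "score": 0.8751727938652039},
]
for row in flag_low_confidence(texts, results):
    print(row)
```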


## Step 7: Generative LLM - Simple and Local - Download Flan T5

Let's start with the Hello World of generative AI examples: completing a sentence. For this we'll install a fine-tuned Flan-T5 variant, [LaMini-T5](https://huggingface.co/MBZUAI/LaMini-T5-738M).

Note that while the code below selects a smaller checkpoint of the model (`LaMini-T5-223M`), it is still a 900 MB download. We'll cache the files in the same folder.



In [17]:
## Let's play with something a little bigger that can do a text completion
## This is a 900 MB download and takes some RAM to run, but it works CPU only

from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# model_name = "MBZUAI/LaMini-Flan-T5-77M"
model_name = "MBZUAI/LaMini-T5-223M"
# model_name = "MBZUAI/LaMini-T5-738M"

# Set the cache directory
cache_directory = "llm_download_cache"

llm_tokenizer = AutoTokenizer.from_pretrained(model_name,
                                              cache_dir=cache_directory)
llm_model = AutoModelForSeq2SeqLM.from_pretrained(model_name,
                                                  cache_dir=cache_directory)

llm_pipe = pipeline(
        "text2text-generation",
        model=llm_model,
        tokenizer=llm_tokenizer,
        max_length=100
    )


tokenizer_config.json:   0%|          | 0.00/2.32k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.48k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/892M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/142 [00:00<?, ?B/s]

## Step 8: Generate text completions, watch for Hallucinations

In [18]:
countries = [
    "United Kingdom",
    "France",
    "People's Republic of China",
    "United States",
    "Ecuador",
    "Freedonia", ## high hallucination potential
    "Faketopia"  ## high hallucination potential
    ]

for country in countries:
    input_text = f"The capital of the {country} is"
    output = llm_pipe(input_text)
    completed_sentence = f"\033[94m{input_text}\033[0m {output[0]['generated_text']}"
    print(completed_sentence)

The capital of the United Kingdom is London.
The capital of the France is Paris.
The capital of the People's Republic of China is Beijing.
The capital of the United States is Washington, D.C.
The capital of the Ecuador is Quito.
The capital of the Freedonia is The capital of Freedonia is Freedonia.
The capital of the Faketopia is The capital of Faketopia is Cairo.
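
The model answers the made-up countries just as confidently as the real ones. One simple guard is to compare completions against a trusted reference and treat anything without a reference as suspect. The sketch below assumes a small hand-written lookup table; `check_completion` is a hypothetical helper.

```python
# Reference answers for the real countries in the list above.
KNOWN_CAPITALS = {
    "United Kingdom": "London",
    "France": "Paris",
    "People's Republic of China": "Beijing",
    "United States": "Washington, D.C.",
    "Ecuador": "Quito",
}

def check_completion(country, generated_text, known=KNOWN_CAPITALS):
    """Classify a completion as 'verified', 'wrong', or 'unverifiable'."""
    expected = known.get(country)
    if expected is None:
        return "unverifiable"  # no reference answer, so treat the output with suspicion
    return "verified" if expected.lower() in generated_text.lower() else "wrong"

# Completions copied from the output above.
completions = {
    "France": "Paris.",
    "Ecuador": "Quito.",
    "Freedonia": "The capital of Freedonia is Freedonia.",
    "Faketopia": "The capital of Faketopia is Cairo.",
}
for country, text in completions.items():
    print(f"{country}: {check_completion(country, text)}")
```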


### 🫵 Try it yourself - Get Creative 🫵
Try some of your own examples.
This thing isn't super smart without fine-tuning, but it can handle some light context injection and prompt engineering. We'll learn more about those subjects in later modules.

Notice the difference between asking a specific question and phrasing a completion:
* "Who is the Prime Minister of the UK?"
* "The current Prime Minister of the United Kingdom is"

In [19]:
prompt_text = "The current Prime Minister of the United Kingdom is" ## high stale data potential
output = llm_pipe(prompt_text)
completed_prompt = f"\033[94m{prompt_text}\033[0m {output[0]['generated_text']}"
print(completed_prompt)

The current Prime Minister of the United Kingdom is Boris Johnson.


🛑 Stop Here 🛑

This Ends Lab 1-1
<hr/>

# Lab 2-1: Prompts and Basic Chatbots

* Using LangChain with a local LLM
* Connecting to OpenAI
* Using a memory window to create a text-only GPT conversation

## Step 1: Using the OpenAI python library

❗ Note: if you restarted your Google Colab runtime, you may need to re-run the first setup step at the very top before coming back here ❗

In [20]:
import os, secrets, requests
import openai
from openai import OpenAI
from requests.auth import HTTPBasicAuth

#if using the Elastic AI proxy, then generate the correct API key
if os.environ['ELASTIC_PROXY'] == "True":

    if "OPENAI_API_TYPE" in os.environ: del os.environ["OPENAI_API_TYPE"]

    #generate and share "your" unique hash
    os.environ['USER_HASH'] = secrets.token_hex(nbytes=6)
    print(f"Your unique user hash is: {os.environ['USER_HASH']}")

    #get the current API key and combine with your hash
    os.environ['OPENAI_API_KEY'] = f"{os.environ['OPENAI_API_KEY']} {os.environ['USER_HASH']}"
else:
    openai.api_type = os.environ['OPENAI_API_TYPE']
    openai.api_version = os.environ['OPENAI_API_VERSION']

openai.api_key = os.environ['OPENAI_API_KEY']
openai.api_base = os.environ['OPENAI_API_BASE']
openai.default_model = os.environ['OPENAI_API_ENGINE']

import ipywidgets as widgets
from IPython.display import display

class NotebookChatExperience:
    def __init__(self, ai_response_function, ai_name = "AI"):
        self.ai_name = ai_name
        self.ai_response_function = ai_response_function
        self.chat_history = widgets.Textarea(
            value='',
            placeholder='Chat history will appear here...',
            description='Chat:',
            disabled=True,
            layout=widgets.Layout(width='700px', height='300px')  # Adjust the size as needed
        )
        self.user_input = widgets.Text(
            value='',
            placeholder='Type your message here...',
            description='You:',
            disabled=False,
            layout=widgets.Layout(width='700px')  # Adjust the size as needed
        )
        self.user_input.on_submit(self.on_submit)
        display(self.chat_history, self.user_input)

    def on_submit(self, event):
        user_message = self.user_input.value
        ai_name = self.ai_name
        self.chat_history.value += f"\nYou: {user_message}"
        ai_message = self.ai_response_function(user_message)
        self.chat_history.value += f"\n{ai_name}: {ai_message}"
        self.user_input.value = ''  # Clear input for next message

    def clear_chat(self):
        self.chat_history.value = ''  # Clear the chat history

## ********** Example usage:

## ********** Define a simple AI response function
# def simple_ai_response(user_message):
    # return f"AI > Echo: {user_message}"

## ********** Create an instance of the chat interface
#chat_instance = NotebookChatExperience(simple_ai_response)

Your unique user hash is: e429ef598bc2
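
When the Elastic proxy is in use, the cell above builds the API key by appending a random per-user hash to the shared key. Note that re-running that cell appends another hash to the already combined value. The sketch below shows the same composition as a pure function that leaves the base key untouched; `build_proxy_key` is a hypothetical helper and the key shown is a dummy.

```python
import secrets

def build_proxy_key(base_key, user_hash=None):
    """Return (combined_key, user_hash) in the 'key hash' form used above."""
    if user_hash is None:
        user_hash = secrets.token_hex(nbytes=6)  # 6 random bytes -> 12 hex characters
    return f"{base_key} {user_hash}", user_hash

# "sk-placeholder" is a dummy value, not a real key.
combined, user_hash = build_proxy_key("sk-placeholder")
print(f"hash={user_hash} length={len(user_hash)}")
```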


## Step 2: Test call to ChatGPT

In [21]:
# Call the OpenAI ChatCompletion API
def chatCompletion(messages, max_tokens=100):
    client = OpenAI(api_key=openai.api_key, base_url=openai.api_base)
    completion = client.chat.completions.create(
        model=openai.default_model,
        max_tokens=max_tokens,
        messages=messages
    )
    return completion

prompt="Hello, is ChatGPT online and working?"

messages = [{"role": "user", "content": prompt}]

completion = chatCompletion(messages)

response_text = completion.choices[0].message.content

print(wrap_text(completion.json(),70))

print("\n", wrap_text(response_text,70))

{"id":"org-YDBltVUkkpzygSHsLMBSQYzh","choices":[{"finish_reason":"stop
","index":0,"message":{"content":"Yes, I am an AI language model
developed by OpenAI, and I am always online and ready to assist you.
How can I help you today?","role":"assistant","function_call":null,"to
ol_calls":null},"content_filter_results":{"hate":{"filtered":false,"se
verity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexu
al":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,
"severity":"safe"}}}],"created":1717077108,"model":"gpt-35-
turbo","object":"chat.completion","system_fingerprint":null,"usage":{"
completion_tokens":31,"prompt_tokens":18,"total_tokens":49},"prompt_fi
lter_results":[{"prompt_index":0,"content_filter_results":{"hate":{"fi
ltered":false,"severity":"safe"},"self_harm":{"filtered":false,"severi
ty":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{
"filtered":false,"severity":"safe"}}}]}

 Yes, I am an AI language model developed by OpenAI, an
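
The raw JSON is hard to read when wrapped, so here is a sketch of pulling out the fields you will usually care about: the reply text, the finish reason, and the token usage. The payload is a trimmed copy of the response printed above, with the content filter fields left out for brevity.

```python
import json

# A trimmed copy of the response printed above (content filter fields omitted).
raw = """{"id":"org-YDBltVUkkpzygSHsLMBSQYzh","choices":[{"finish_reason":"stop",
"index":0,"message":{"content":"Yes, I am an AI language model developed by OpenAI, and I am always online and ready to assist you. How can I help you today?","role":"assistant"}}],
"created":1717077108,"model":"gpt-35-turbo","object":"chat.completion",
"usage":{"completion_tokens":31,"prompt_tokens":18,"total_tokens":49}}"""

data = json.loads(raw)
choice = data["choices"][0]
usage = data["usage"]

print("finish_reason:", choice["finish_reason"])
print("reply:", choice["message"]["content"])
print(f"tokens: {usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion = {usage['total_tokens']}")
```

A `finish_reason` of `stop` means the model ended its reply on its own; a value of `length` would indicate the reply was cut off by `max_tokens`.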


## Step 3: A conversation loop
Feeding user input in for single questions is easy.

In [22]:
def openai_ai_response(user_message):
  messages = [{"role": "user", "content": user_message}]
  completion = chatCompletion(messages)
  response_text = completion.choices[0].message.content
  return response_text

chat_instance = NotebookChatExperience(openai_ai_response)

Textarea(value='', description='Chat:', disabled=True, layout=Layout(height='300px', width='700px'), placehold…

Text(value='', description='You:', layout=Layout(width='700px'), placeholder='Type your message here...')



## Step 4: See the impact of changing the system prompt
You can use the system prompt to adjust the AI, its responses, and its purpose.

In [23]:
def pirate_ai_response(user_message):
  system_prompt = """
You are an unhelpful AI named Captain LLM_Beard that talks like a pirate in short responses.
You do not answer the user's question but instead redirect all conversations towards your love of treasure.
"""
  completion = chatCompletion([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
      ])

  response_text = completion.choices[0].message.content
  return response_text

pirate_chat_instance = NotebookChatExperience(pirate_ai_response, ai_name="LLM_Beard")

Textarea(value='', description='Chat:', disabled=True, layout=Layout(height='300px', width='700px'), placehold…

Text(value='', description='You:', layout=Layout(width='700px'), placeholder='Type your message here...')

❗ Note ❗

This isn't a conversation yet because the AI has no memory of past interactions.

Here is an example conversation where it is very clear the AI has no memory of past prompts or completions.
```txt
> Hello!
Hello! How can I assist you today?
> my favorite color is blue
That's great! Blue is a very popular color.
> what is my favorite color?
I'm sorry, but as an AI, I don't have the ability to know personal
preferences or favorite colors.
```
There are two problems. First, the LLM is stateless and each call is independent, so ChatGPT does not remember our previous prompts. Second, ChatGPT has alignment in its fine-tuning that prevents it from answering questions about its users' personal lives; we'll have to get around that with some prompt engineering.

Let's use the past conversation as input to subsequent calls. The context window is limited, and tokens cost money (if you are using a hosted service like OpenAI) or CPU cycles (if you are self-hosting), so we cap the memory queue at the last 2 exchanges (4 total messages).
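
The sliding window itself can be sketched with a `deque` that has a maximum length: once it is full, appending a new message silently drops the oldest one. The messages below are taken from the example conversation above.

```python
from collections import deque

memory = deque(maxlen=4)  # 2 exchanges = 4 messages

conversation = [
    ("user", "Hello!"),
    ("assistant", "Hello! How can I assist you today?"),
    ("user", "my favorite color is blue"),
    ("assistant", "That's great! Blue is a very popular color."),
    ("user", "what is my favorite color?"),
]
for role, content in conversation:
    memory.append({"role": role, "content": content})  # oldest item drops off when full

for message in memory:
    print(f"{message['role']:9} | {message['content']}")
```

With five messages and a window of four, the opening "Hello!" has been forgotten, but the statement about the favorite color is still in memory when the final question is asked.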

## Step 5: Create a chat with memory

In [24]:
from collections import deque

class QueueBuffer:
    def __init__(self, max_length):
        self.max_length = max_length
        self.buffer = deque(maxlen=max_length)

    def enqueue(self, item):
        self.buffer.append(item)

    def dequeue(self):
        if self.is_empty():
            return None
        return self.buffer.popleft()

    def is_empty(self):
        return len(self.buffer) == 0

    def is_full(self):
        return len(self.buffer) == self.max_length

    def size(self):
        return len(self.buffer)

    def peek(self):
        return list(self.buffer)


class MemoryNotebookChatExperience(NotebookChatExperience):
    def __init__(self, ai_response_function, ai_name="AI", memory_size = 4):
        # Initialize the superclass
        self.memory_buffer = QueueBuffer(memory_size)
        self.current_memory_dump = ""
        super().__init__(ai_response_function, ai_name)

    ## now with memory
    def memory_gpt_response(self, prompt):
      ## the API call will use the system prompt + the memory buffer
      ## which ends with the user prompt
      user_message = {"role": "user", "content": prompt}
      self.memory_buffer.enqueue(user_message)

      ## debug print the current AI memory
      self.current_memory_dump = "Current memory\n"
      for m in self.memory_buffer.peek():
          role = m.get("role").strip()
          content = m.get("content").strip()
          self.current_memory_dump += f"{role} | {content}\n"

      system_prompt = {
          "role": "system",
          "content": """
You are a helpful AI that answers questions concisely.
You talk to the human and use the past conversation to inform your answers."""
      }

      ## when calling the AI we put the system prompt at the start
      concatenated_message = [system_prompt] + self.memory_buffer.peek()

      ## here is the request to the AI

      completion = chatCompletion(concatenated_message)
      response_text = completion.choices[0].message.content


      ## don't forget to add the response to the conversation memory
      self.memory_buffer.enqueue({"role":"assistant", "content":response_text})

      return response_text

    def on_submit(self, event):
        user_message = self.user_input.value
        self.chat_history.value += f"\nYou: {user_message}"
        # Attempting to add styled text, but it will appear as plain text

        ai_message = self.memory_gpt_response(user_message)

        ## debug lines to show memory buffer in chat
        for i, line in enumerate(self.current_memory_dump.split("\n")):
          self.chat_history.value += f"\n----  {i} {line}"
        self.chat_history.value += "\n"

        self.chat_history.value += f"\n{self.ai_name}: {ai_message}"
        self.user_input.value = ''  # Clear input for next message


# Create an instance of the memory-enabled chat class. The response function
# argument is None because on_submit calls memory_gpt_response directly.
not_so_clueless_chat = MemoryNotebookChatExperience(None)

Textarea(value='', description='Chat:', disabled=True, layout=Layout(height='300px', width='700px'), placehold…

Text(value='', description='You:', layout=Layout(width='700px'), placeholder='Type your message here...')
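
To see exactly what gets sent on each turn without calling the API, the enqueue, send, enqueue cycle from `memory_gpt_response` can be replayed with a stub reply. This is a sketch: `simulate_chat` is a hypothetical helper, and the system prompt is shortened for brevity.

```python
from collections import deque

def simulate_chat(user_prompts, memory_size=4):
    """Replay the enqueue/send/enqueue cycle with a stub in place of the API call."""
    system_prompt = {"role": "system", "content": "You are a helpful AI."}
    memory = deque(maxlen=memory_size)
    sent_per_turn = []
    for turn, prompt in enumerate(user_prompts, start=1):
        memory.append({"role": "user", "content": prompt})
        request = [system_prompt] + list(memory)   # system prompt is never stored in memory
        sent_per_turn.append(request)
        reply = f"stub reply {turn}"               # stand-in for chatCompletion(...)
        memory.append({"role": "assistant", "content": reply})
    return sent_per_turn

requests_sent = simulate_chat(["first", "second", "third"])
for turn, request in enumerate(requests_sent, start=1):
    print(turn, [f"{m['role']}:{m['content']}" for m in request])
```

With a memory size of 4, the third request no longer contains the first user prompt, and the remembered window starts with an assistant message. The system prompt is prepended on every call, so it never competes for space in the window.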

🛑 Stop Here 🛑

This Ends Lab 2-1
<hr/>

# Lab 2-2: Data Redaction

## Step 1: Install and Import dependencies

In [29]:
!pip install -qqq eland==8.11.1 elasticsearch==8.11.1 transformers==4.36.2 sentence-transformers==2.2.2 python-dotenv==1.0.0
!pip install -qqq elastic-apm==6.20.0

from elasticsearch import Elasticsearch, helpers, exceptions
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel
from getpass import getpass
import tempfile
import os
from pprint import pprint

[0m[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf-cu12 24.4.1 requires pandas<2.2.2dev0,>=2.0, but you have pandas 1.5.3 which is incompatible.
google-colab 1.0.0 requires pandas==2.0.3, but you have pandas 1.5.3 which is incompatible.[0m[31m
[0m

## Step 2: Create Elasticsearch Client Connection

In [30]:
if 'ELASTIC_CLOUD_ID' in os.environ:
  es = Elasticsearch(
    cloud_id=os.environ['ELASTIC_CLOUD_ID'],
    api_key=(os.environ['ELASTIC_APIKEY_ID'], os.environ['ELASTIC_APIKEY_SECRET']),
    request_timeout=30
  )
elif 'ELASTIC_URL' in os.environ:
  es = Elasticsearch(
    os.environ['ELASTIC_URL'],
    api_key=(os.environ['ELASTIC_APIKEY_ID'], os.environ['ELASTIC_APIKEY_SECRET']),
    request_timeout=30
  )
else:
  print("env needs to set either ELASTIC_CLOUD_ID or ELASTIC_URL")
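
Note that when neither variable is set, the cell above only prints a message, so `es` stays undefined and later cells fail with a `NameError`. One alternative is to build the connection arguments in a small function that fails early. This is a sketch: `elasticsearch_connection_kwargs` is a hypothetical helper, and the values in `example_env` are dummies.

```python
def elasticsearch_connection_kwargs(env):
    """Build keyword arguments for Elasticsearch(...) from an environment mapping."""
    kwargs = {
        "api_key": (env["ELASTIC_APIKEY_ID"], env["ELASTIC_APIKEY_SECRET"]),
        "request_timeout": 30,
    }
    if "ELASTIC_CLOUD_ID" in env:
        kwargs["cloud_id"] = env["ELASTIC_CLOUD_ID"]
    elif "ELASTIC_URL" in env:
        kwargs["hosts"] = env["ELASTIC_URL"]
    else:
        raise RuntimeError("env needs to set either ELASTIC_CLOUD_ID or ELASTIC_URL")
    return kwargs

# Dummy values for illustration only.
example_env = {
    "ELASTIC_URL": "https://localhost:9200",
    "ELASTIC_APIKEY_ID": "id-placeholder",
    "ELASTIC_APIKEY_SECRET": "secret-placeholder",
}
print(sorted(elasticsearch_connection_kwargs(example_env)))
```

In the notebook this would be used as `es = Elasticsearch(**elasticsearch_connection_kwargs(os.environ))`.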

## Step 3: Monitoring prompts sent through a Proxy

Imagine I have the following question from a customer after a winter storm:

> My power was out all last week at my home at 123 Grove street.
When I talked to my neighbor Jane Lopez, she said she got rebate on her bill.
Can you do the same for me?

The following is a simulated customer example where we'll use the LLM to answer a customer service case.

We'll learn how to **retrieve** the best call script using semantic search in a later exercise.  

**Some organizations would be uncomfortable with customer PII going to a 3rd party service. Who gets an unencrypted version of the prompt?**

In [31]:
import elasticapm
import random

os.environ['ELASTIC_APM_SERVICE_NAME'] = "genai_workshop_lab_redact"
apmclient = elasticapm.Client() \
  if elasticapm.get_client() is None \
  else  elasticapm.get_client()

customer_id = 123

first_names = ["Alice", "Bob", "Charlie", "Diana", "Edward",
               "Fiona", "George", "Hannah", "Ian", "Julia"]
last_names = ["Smith", "Johnson", "Williams", "Brown", "Jones",
              "Garcia", "Miller", "Davis", "Rodriguez", "Martinez"]

# Function to generate a random full name
def generate_random_name():
    first_name = random.choice(first_names)
    last_name = random.choice(last_names)
    return f"{first_name} {last_name}"


customer_question = f"""My power was out all last week at my home on Grove street.
When I talked to my neighbor {generate_random_name()},
they said they got rebate on their bill. Can you do the same for me?"""

retrieved_best_answer = """We are currently offering a $100 rebate for
customers affected by the recent winter storm. If our records show the
customer was impacted, tell them they can look forward to a $100 credit on their
next monthly bill. If the customer believes they were impacted but our records
don't show this fact, let them know we'll be escalating their case and they
should expect a call within 24 hours."""


import time
def random_service_time(shorter, longer):
  sleep_time = random.uniform(shorter, longer)
  time.sleep(sleep_time)

def days_impacted_check(customer_id):
  apmclient.begin_transaction("impact_check")
  ## simulated service call delay (some parts of the lab LLM are cached)
  random_service_time(0.1,0.3)
  days = 5 ## simulated result of a back end service call
  apmclient.end_transaction("impact_check", "success")
  if days > 0 :
    return f"the customer was impacted by the winter storm for {days} service days"
  else:
    return "the customer was not impacted by the winter storm"


system_prompt = f"""
You are an AI customer support agent for a electric power utility company that
You use the following retrieved approved call script and customer fact
to answer the customer's question and try to retain them as a customer.

Call script: {retrieved_best_answer}

Our records: {days_impacted_check(customer_id)}
"""

def print_light_blue(text):
    print(f'\033[94m{text}\033[0m')

def chatCompletion(messages):

    client = OpenAI(api_key=openai.api_key, base_url=openai.api_base)
    completion = client.chat.completions.create(
        model=openai.default_model,
        max_tokens=150,
        messages=messages
    )

    return completion

def chatWithPowerAgent(prompt):
    apmclient.begin_transaction("llm_call")

    elasticapm.label(prompt = prompt)

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
      ]
    print_light_blue("Prompt:")
    print_light_blue(wrap_text(messages[0]["content"],70))
    print_light_blue(wrap_text(messages[1]["content"],70))
    completion = chatCompletion(messages)

    response_text = completion.choices[0].message.content

    apmclient.end_transaction("llm_call", "success")

    return wrap_text(response_text,70)


customer_service_response = chatWithPowerAgent(customer_question)

print("Customer Service Response:")
print(customer_service_response)



[94mPrompt:[0m
[94m You are an AI customer support agent for a electric power utility
company that You use the following retrieved approved call script and
customer fact to answer the customer's question and try to retain them
as a customer.  Call script: We are currently offering a $100 rebate
for customers affected by the recent winter storm. If our records show
the customer was impacted, tell them they can look forward to a $100
credit on their next monthly bill. If the customer believes they were
impacted but our records don't show this fact, let them know we'll be
escalating their case and they should expect a call within 24 hours.
Our records: the customer was impacted by the winter storm for 5
serice days[0m
[94mMy power was out all last week at my home on Grove street. When I
talked to my neighbor George Miller, they said they got rebate on
their bill. Can you do the same for me?[0m
Customer Service Response:
I'm sorry to hear that you were impacted by the winter storm an
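
In the cell above, each transaction is opened and closed by hand, so if the LLM call raises an exception, `end_transaction` is never reached. A context manager is one way to make sure the transaction always ends. The sketch below uses a small recording class in place of the real APM client so it runs without a server; `apm_transaction` and `RecordingClient` are hypothetical names.

```python
from contextlib import contextmanager

@contextmanager
def apm_transaction(client, name):
    """Wrap a block in begin_transaction/end_transaction, recording failures too."""
    client.begin_transaction(name)
    try:
        yield
    except Exception:
        client.end_transaction(name, "failure")
        raise
    else:
        client.end_transaction(name, "success")

class RecordingClient:
    """Stand-in for the APM client so this snippet runs without a server."""
    def __init__(self):
        self.events = []
    def begin_transaction(self, transaction_type):
        self.events.append(("begin", transaction_type))
    def end_transaction(self, name, result):
        self.events.append(("end", name, result))

client = RecordingClient()
with apm_transaction(client, "llm_call"):
    pass  # the LLM request would go here
print(client.events)
```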

## Step 4: Redacting unstructured data with NER Transformer Model

In [32]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
import json
# pretty printing JSON objects
def json_pretty(input_object):
  print(json.dumps(input_object, indent=1))

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)




tokenizer_config.json:   0%|          | 0.00/59.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/829 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/2.00 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/433M [00:00<?, ?B/s]

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [33]:
ner_results = nlp(customer_question)
print(ner_results)

[{'entity': 'B-LOC', 'score': 0.90154696, 'index': 12, 'word': 'Grove', 'start': 45, 'end': 50}, {'entity': 'I-LOC', 'score': 0.61783713, 'index': 13, 'word': 'street', 'start': 51, 'end': 57}, {'entity': 'B-PER', 'score': 0.99955505, 'index': 21, 'word': 'George', 'start': 88, 'end': 94}, {'entity': 'I-PER', 'score': 0.9994498, 'index': 22, 'word': 'Miller', 'start': 95, 'end': 101}]
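
The model tags each token separately: `B-` marks the beginning of an entity and `I-` marks a continuation. To get one span per entity, consecutive tokens can be merged. The sketch below does this by hand on the results shown above; `merge_entities` is a hypothetical helper. The Transformers pipeline also offers an `aggregation_strategy` option (for example `"simple"`) that groups tokens for you.

```python
def merge_entities(ner_results):
    """Merge B-/I- tagged tokens into (type, start, end) spans."""
    spans = []
    for token in ner_results:
        prefix, _, ent_type = token["entity"].partition("-")
        continues = (
            prefix == "I"
            and spans
            and spans[-1]["type"] == ent_type
        )
        if continues:
            spans[-1]["end"] = token["end"]  # extend the current entity
        else:
            spans.append({"type": ent_type, "start": token["start"], "end": token["end"]})
    return spans

# Token-level results copied from the output above (scores omitted).
ner_results = [
    {"entity": "B-LOC", "word": "Grove", "start": 45, "end": 50},
    {"entity": "I-LOC", "word": "street", "start": 51, "end": 57},
    {"entity": "B-PER", "word": "George", "start": 88, "end": 94},
    {"entity": "I-PER", "word": "Miller", "start": 95, "end": 101},
]
print(merge_entities(ner_results))
```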


## Step 5: Let's make an easy-to-use redaction function

In [34]:

def redact_named_entities(text):
    apmclient.begin_transaction("redaction_local")
    # Perform named entity recognition on the text
    entities = nlp(text)

    # Sort entities by their start index in reverse order
    entities = sorted(entities, key=lambda x: x['start'], reverse=True)

    # Iterate over entities and replace them in the text
    for entity in entities:
        ent_type = entity['entity']
        start = entity['start']
        end = entity['end']
        text = text[:start] + "<REDACTED>" + text[end:]


    apmclient.end_transaction("redaction_local", "success")
    return text

# Example usage
text = "Alice lives in Paris."
redacted_text = redact_named_entities(text)
print(redacted_text)


<REDACTED> lives in <REDACTED>.


## Step 6: Test the function on a customer question

In [35]:
customer_question = f"""My power was out all last week at my home at
Grove street. When I talked to my neighbor {generate_random_name()}, they said they got
rebate on their bill. Can you do the same for me?"""

print(redact_named_entities(customer_question))


My power was out all last week at my home at
<REDACTED> <REDACTED>. When I talked to my neighbor <REDACTED> <REDACTED>, they said they got
rebate on their bill. Can you do the same for me?
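
Why does `redact_named_entities` sort the entities in reverse order? Each replacement changes the length of the text, so working from the end toward the start keeps the remaining offsets valid. The sketch below makes the difference visible using hand-written spans for the earlier example sentence; `redact_spans` is a hypothetical helper and the labels are chosen for illustration.

```python
def redact_spans(text, spans, reverse=True):
    """Replace each (start, end, label) span in `text` with <label>."""
    ordered = sorted(spans, key=lambda s: s[0], reverse=reverse)
    for start, end, label in ordered:
        text = text[:start] + f"<{label}>" + text[end:]
    return text

text = "Alice lives in Paris."
spans = [(0, 5, "PERSON"), (15, 20, "LOCATION")]  # offsets into the original text

print(redact_spans(text, spans))                 # right-to-left: offsets stay valid
print(redact_spans(text, spans, reverse=False))  # left-to-right: later offsets are stale
```

The left-to-right run corrupts the sentence because the second span's offsets still refer to the original, shorter text.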


## Step 7: Alternatively, how would we install the same NER Model into Elasticsearch?

In [36]:
def load_model(model_id, task_type):
  with tempfile.TemporaryDirectory() as tmp_dir:
    print(f"Loading HuggingFace transformer tokenizer and model [{model_id}] for task [{task_type}]" )

    tm = TransformerModel(model_id=model_id, task_type=task_type)
    model_path, config, vocab_path = tm.save(tmp_dir)

    ptm = PyTorchModel(es, tm.elasticsearch_model_id())
    model_exists = es.options(ignore_status=404).ml.get_trained_models(model_id=ptm.model_id).meta.status == 200

    if model_exists:
      print("Model has already been imported")
    else:
      print("Importing model")
      ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
      print("Starting model deployment")
      ptm.start()
      print(f"Model successfully imported with id '{ptm.model_id}'")

## Model is pre-loaded into Elasticsearch, but this is how you would do it

## load_model("dslim/bert-base-NER", "ner")
print("Model is already loaded")

Model is already loaded


## Step 8: Define a Redaction Ingest Pipeline in Elasticsearch

We will use the [Elasticsearch Ingest Pipelines](https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html) to redact data before it is written to Elasticsearch. These pipelines can also be used to update data in existing indices or for reindexing.

This pipeline:
- Uses the [inference processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) to call the NER model loaded in Part 1 and map the document's `message` field to the field expected by the model: `text_field`.
- Uses the [Painless scripting language](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-painless.html) from within a [script processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/script-processor.html) to replace the model-detected entities stored in the `ml.inference.entities` array with their class name, and store it within a **new** document field: `redacted`.
- Uses the [redact processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/redact-processor.html) to identify and redact any supported patterns found within the new redacted field, as well as identifying and redacting a set of custom patterns.
- Removes the `ml` fields added to the document by the inference processor via the [remove processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/remove-processor.html) as they're no longer needed.
- Defines a failure condition to capture any errors, just in case we have them.

**NOTE:** As of 8.11, the redact processor is a Technical Preview.
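
To make the script processor step concrete, here is a rough Python equivalent of its logic: for every detected entity, replace the entity text in the message with its class name in angle brackets. The entity list is a hypothetical example of what the inference processor might return; a real run may detect different entities.

```python
def replace_entities(message, entities):
    """Python equivalent of the Painless loop: swap entity text for <CLASS_NAME>."""
    redacted = message
    for item in entities:
        redacted = redacted.replace(item["entity"], "<" + item["class_name"] + ">")
    return redacted

# Hypothetical inference output; a real run may detect different entities.
message = "I had a call with Jane yesterday, she suggested we talk with John from Global Systems."
entities = [
    {"entity": "Jane", "class_name": "PER"},
    {"entity": "John", "class_name": "PER"},
    {"entity": "Global Systems", "class_name": "ORG"},
]
print(replace_entities(message, entities))
```

Because this is a plain string replacement, every occurrence of the entity text is replaced, including occurrences inside longer words.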




In [37]:
body = {
   "processors": [
    {
       "inference": {
         "model_id": "dslim__bert-base-ner",
         "field_map": {
           "message": "text_field"
         }
       }
    },
    {
       "script": {
         "lang": "painless",
         "source": """
String msg = ctx['message'];
for (item in ctx['ml']['inference']['entities'])
  msg = msg.replace(item['entity'], '<' + item['class_name'] + '>');
ctx['redacted'] = msg;
"""
       }
    },
    {
       "redact": {
          "field": "redacted",
          "patterns": [
            "%{EMAILADDRESS:EMAIL}",
            "%{IP:IP_ADDRESS}",
            "%{CREDIT_CARD:CREDIT_CARD}",
            "%{SSN:SSN}",
            "%{PHONE:PHONE}"
          ],
          "pattern_definitions": {
            "CREDIT_CARD": "\\d{4}[ -]\\d{4}[ -]\\d{4}[ -]\\d{4}",
            "SSN": "\\d{3}-\\d{2}-\\d{4}",
            "PHONE": "\\d{3}-\\d{3}-\\d{4}"
          }
       }
    },
    {
       "remove": {
         "field": [
           "ml"
         ],
         "ignore_missing": True,
         "ignore_failure": True
       }
    }
  ],
  "on_failure": [
    {
       "set": {
         "field": "failure",
         "value": "pii_script-redact"
       }
    }
  ]
}

## es.ingest.put_pipeline(id='redact', body=body)
print("Ingest pipeline is already loaded")

Ingest pipeline is already loaded
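
The three custom entries in `pattern_definitions` are ordinary regular expressions, so they can be sanity-checked locally before relying on the pipeline. The sketch below uses Python's `re` module as a stand-in; it is an approximation, not the redact processor itself (which uses Grok patterns), and the helper name `redact_custom_patterns` and the sample sentence are assumptions made for illustration.

```python
import re

# Custom patterns copied from the pipeline's "pattern_definitions".
PATTERN_DEFINITIONS = {
    "CREDIT_CARD": r"\d{4}[ -]\d{4}[ -]\d{4}[ -]\d{4}",
    "SSN": r"\d{3}-\d{2}-\d{4}",
    "PHONE": r"\d{3}-\d{3}-\d{4}",
}


def redact_custom_patterns(text):
    """Replace every match with <NAME>, imitating the <...> style seen in the pipeline output."""
    for name, pattern in PATTERN_DEFINITIONS.items():
        text = re.sub(pattern, f"<{name}>", text)
    return text


sample = "Call 412-189-9043, SSN 942-00-1243, card 1324-8374-0978-2819"
print(redact_custom_patterns(sample))
```

The built-in patterns such as `EMAILADDRESS` and `IP` are left out here because they are defined by Elasticsearch rather than in this notebook.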


## Step 9: Test the pipeline

Does it work?

Let's use the [Simulate Pipeline API](https://www.elastic.co/guide/en/elasticsearch/reference/current/simulate-pipeline-api.html) to find out.

In [38]:
docs = [
  {
      "_source": {
          "message": "John Smith lives at 123 Main St. Highland Park, CO. His email address "\
          "is jsmith123@email.com and his phone number is 412-189-9043.  I found his social "\
          "security number, it is 942-00-1243. Oh btw, his credit card is 1324-8374-0978-2819 "\
          "and his gateway IP is 192.168.1.2"
      }
  },
  {
      "_source": {
          "message": "I had a call with Jane yesterday, she suggested we talk with John "\
          "from Global Systems. Their office is in Springfield"
      }
  }
]

pprint(es.ingest.simulate(id='redact', docs=docs).body)

{'docs': [{'doc': {'_id': '_id',
                   '_index': '_index',
                   '_ingest': {'timestamp': '2024-05-30T14:21:48.537163138Z'},
                   '_source': {'message': 'John Smith lives at 123 Main St. '
                                          'Highland Park, CO. His email '
                                          'address is jsmith123@email.com and '
                                          'his phone number is 412-189-9043.  '
                                          'I found his social security number, '
                                          'it is 942-00-1243. Oh btw, his '
                                          'credit card is 1324-8374-0978-2819 '
                                          'and his gateway IP is 192.168.1.2',
                               'redacted': '<PER> lives at 123 <LOC>, <LOC>. '
                                           'His email address is <EMAIL> and '
                                           'his phone number is
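
The simulate response nests each result under `docs`, then `doc`, then `_source`, as the output above shows. A small helper can pull out just the `redacted` values for easier reading. This is a sketch: `extract_redacted` is a hypothetical name, and `fake_response` is a hand-written stand-in rather than real output.

```python
def extract_redacted(simulate_response):
    """Return the `redacted` field of each simulated document (None if absent)."""
    return [
        doc.get("doc", {}).get("_source", {}).get("redacted")
        for doc in simulate_response.get("docs", [])
    ]


# Hand-written stand-in for a simulate response; values are illustrative only.
fake_response = {
    "docs": [
        {"doc": {"_source": {"message": "Jane called", "redacted": "<PER> called"}}},
        {"doc": {"_source": {"message": "no entities", "failure": "pii_script-redact"}}},
    ]
}

print(extract_redacted(fake_response))
```

In the notebook, the same helper could be applied to `es.ingest.simulate(id='redact', docs=docs).body`.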

## Step 10: End to End Example, Monitored and Redacted

Switching back to the local Python model ...


In [39]:
customer_question = f"""My power was out all last week at my home on
Grove street. When I talked to my neighbor {generate_random_name()}, they said they got
rebate on their bill. Can you do the same for me?"""

redacted_text = redact_named_entities(customer_question)

print(chatWithPowerAgent(redacted_text))

Prompt:
 You are an AI customer support agent for a electric power utility
company that You use the following retrieved approved call script and
customer fact to answer the customer's question and try to retain them
as a customer.  Call script: We are currently offering a $100 rebate
for customers affected by the recent winter storm. If our records show
the customer was impacted, tell them they can look forward to a $100
credit on their next monthly bill. If the customer believes they were
impacted but our records don't show this fact, let them know we'll be
escalating their case and they should expect a call within 24 hours.
Our records: the customer was impacted by the winter storm for 5
serice days
My power was out all last week at my home on <REDACTED> <REDACTED>.
When I talked to my neighbor <REDACTED> <REDACTED>, they said they got
rebate on their bill. Can you do the same for me?
Thank you for reaching out to us regarding your power outage due to
the w

🛑 Stop Here 🛑

This ends Lab 2-2
<hr/>


# >>> [Open the Next Lab: https://ela.st/genai-wave2](https://ela.st/genai-wave2)