# [Session 1] Introduction to Generative AI

This notebook includes *all* the labs of Session 1. They are meant to be run in order. If you finish early, try playing around with the LLMs loaded up to that point.

## Tips on using Jupyter Notebooks and Google Colab

* Notebooks are just .ipynb files; you can run them locally in any Python dev environment that supports Jupyter if you'd like. We are running in Google Colab to keep things simple (and well tested) for this course.

* In Google Colab, each notebook has its own independent execution environment.
  * You can see the current in-memory session variables in the ```{X}``` menu on the left
  * You can see the file system in the menu with the folder icon
  * You can connect to a runtime and monitor RAM and disk using the menu in the top right
  * Limited GPU-enabled instances are available, but not all notebooks need them

* Notebooks have Markdown and code cells
  * You can access the shell of the coding environment by prefixing a command with ```!```
  * Run each code cell in order. Notebooks usually import the libraries they need as the workshop progresses

* You can always restart your code environment and rerun if you get into trouble
  * in-memory variables are lost on a **restart**
  * in-memory variables and the file system are lost on a **disconnect**


## Setup Environment
The following code loads the environment variables required to run this notebook.


In [None]:
FILE="Session_1"

! pip install -qqq git+https://github.com/elastic/notebook-workshop-loader.git@main
from notebookworkshoploader import loader
import os
from dotenv import load_dotenv

if os.path.isfile("../env"):
    load_dotenv("../env", override=True)
    print('Successfully loaded environment variables from local env file')
else:
    loader.load_remote_env(file=FILE)
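Later cells read several of these variables directly from `os.environ`, so a missing one usually only shows up as a `KeyError` much later. As an optional sanity check, here is a minimal sketch of a hypothetical helper, `missing_env_vars`, that lists which of them are unset. The names are copied from later cells that read them unconditionally; `OPENAI_API_TYPE`/`OPENAI_API_VERSION` (non-proxy only) and `ELASTIC_CLOUD_ID`/`ELASTIC_URL` (either one) are left out because only some setups need them.

```python
import os

# Variables that later cells in this notebook read unconditionally
NOTEBOOK_ENV_VARS = [
    "ELASTIC_PROXY",
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "OPENAI_API_ENGINE",
    "ELASTIC_APIKEY_ID",
    "ELASTIC_APIKEY_SECRET",
]

def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset (or set to an empty string) in `environ`."""
    return [name for name in names if not environ.get(name)]

missing = missing_env_vars(NOTEBOOK_ENV_VARS)
print("Missing variables:", missing if missing else "none")
```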

## Lab 1-1: Introduction and Transformer Models

In this lab we will:
* Get an introduction to Google Colab: Hello World and importing Python libraries
* Cache the download of a smaller LLM
* Use basic transformer models locally



### Learning Colab

#### Step 1: Hit play on the next code **sample**

In [None]:
print("Hello World")

#### Step 2: Use ! to execute a shell command

In [None]:
! echo "The shell thinks the Current Directory is: $(pwd)"

### Getting some python dependencies

#### Step 3: Environment setup

First, let's install some Python libraries we'll use in the first lab module.

In [None]:
! pip install --upgrade pip
! pip install -q --no-cache-dir torch
! pip install -q --upgrade transformers
! pip install -q xformers
! pip install -q python-dotenv
! pip install -q "openai<1.0.0"           ## for later in the lab

#### Step 4: Utility functions
Some utility functions that are good to keep on hand.

In [None]:
import json
# pretty printing JSON objects
def json_pretty(input_object):
  print(json.dumps(input_object, indent=4))


import textwrap
# wrap text when printing, because colab scrolls output to the right too much
def wrap_text(text, width):
    wrapped_text = textwrap.wrap(text, width)
    return '\n'.join(wrapped_text)



### Downloading, Caching, and Prepping a Model

#### Step 5: Download sentiment analysis model

We'll use the Hugging Face Transformers library to download and prepare an open-source model called DistilBERT, which can be used for sentiment analysis.

* Details of the model can be found on its [Hugging Face Page](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
* This model is fine-tuned to classify whether an input text has *POSITIVE* or *NEGATIVE* sentiment, which makes it a good introductory example of an AI model.
* Note that we are caching the model files in a folder called ```llm_download_cache```, so we don't have to re-download them while connected to this runtime. You can see the downloaded files in the file system (using the menu on the left-hand side).
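Under the hood, the `sentiment-analysis` pipeline turns the model's two raw output scores (logits) into probabilities with a softmax and reports the highest-scoring label. The sketch below reproduces only that last step in plain Python. The logit values are made up for illustration, and the `{0: "NEGATIVE", 1: "POSITIVE"}` mapping is an assumption based on this model's `config.id2label`.

```python
import math

def logits_to_label(logits, id2label):
    """Apply a softmax to raw logits and return the top label with its probability."""
    # Subtract the largest logit before exponentiating, for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": id2label[best], "score": probs[best]}

# Illustrative logits only, not real model output
print(logits_to_label([-2.0, 3.0], {0: "NEGATIVE", 1: "POSITIVE"}))
```

To try it on real values once the model below is loaded, you could pass `sentiment_model(**sentiment_tokenizer(text, return_tensors="pt")).logits[0].tolist()` together with `sentiment_model.config.id2label`.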



In [None]:
import torch
import tqdm
import json
import os
from transformers import (pipeline,
  DistilBertTokenizer,
  DistilBertForSequenceClassification)

# Set the cache directory
cache_directory = "llm_download_cache"

# Create the cache directory if it doesn't exist
if not os.path.exists(cache_directory):
    os.makedirs(cache_directory)

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
sentiment_tokenizer = DistilBertTokenizer.from_pretrained(
    model_id, cache_dir=cache_directory)
sentiment_model = DistilBertForSequenceClassification.from_pretrained(
    model_id, cache_dir=cache_directory)

Okay! Let's run ```sentiment_model``` on two pieces of sample text.

#### Step 6: Run sentiment analysis

In [None]:
## With the distilbert model downloaded and cached we can call it for
## sentiment analysis

# Define the sentiment analysis pipeline
sentiment_classifier = pipeline("sentiment-analysis",
                                model=sentiment_model,
                                tokenizer=sentiment_tokenizer,
                                device='cpu')
#two samples
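classifier_results = sentiment_classifier([
    "My dog is so cute, I love him.",

    # adjacent string literals are joined without the extra indentation
    # spaces that a backslash continuation inside the string would keep
    "I am very sorry to inform you that the tax "
    "administration has decided to audit you."
])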

json_pretty(classifier_results)

### 🫵 Try it yourself - Get Creative 🫵
Try some of your own examples.
Note that AI models are subject to bias. The model card for this model covers the issue in good detail. [Read more here](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english#risks-limitations-and-biases)

In [None]:
your_classifier_results = sentiment_classifier([
    "CHANGE ME",
    "CHANGE ME"
])
json_pretty(your_classifier_results)

### Generative LLM - Simple and Local

#### Step 7: Download Flan T5

Let's start with the Hello World of generative AI examples: completing a sentence. For this we'll download Google's Flan-T5 model.

Note that while this is a smaller checkpoint of the model, it is still a 3GB download. We'll cache the files in the same folder.
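The `text2text-generation` pipeline wraps three steps: tokenize the prompt, call the model's `generate` method, and decode the output token ids back into text. Here is a minimal sketch of those steps, written against any Hugging Face tokenizer and seq2seq model pair, such as the `llm_tokenizer` and `llm_model` loaded in the next cell. `complete_text` is a hypothetical helper name, not part of the library.

```python
def complete_text(tokenizer, model, prompt, max_length=100):
    """Generate a completion for `prompt` with a seq2seq model such as Flan-T5."""
    # 1. Tokenize the prompt into input ids (as PyTorch tensors)
    inputs = tokenizer(prompt, return_tensors="pt")
    # 2. Generate output token ids (decoding strategy comes from the model's generation config)
    output_ids = model.generate(**inputs, max_length=max_length)
    # 3. Decode the first (and only) sequence, dropping special tokens such as </s>
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

After the next cell runs, `complete_text(llm_tokenizer, llm_model, "The capital of France is")` should give a completion similar to what `llm_pipe` returns.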



In [None]:
## Let's play with something a little bigger that can do a text completion
## This is a 3 GB download and takes some RAM to run, but it works CPU only

from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM

model_id = 'google/flan-t5-large'
llm_tokenizer = AutoTokenizer.from_pretrained(
    model_id, cache_dir=cache_directory)
llm_model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, cache_dir=cache_directory)

llm_pipe = pipeline(
        "text2text-generation",
        model=llm_model,
        tokenizer=llm_tokenizer,
        max_length=100
    )


#### Step 8: Generate text completions

In [None]:
countries = ["the United Kingdom",
             "France",
             "the People's Republic of China",
             "the United States",
             "Ecuador",
             "Faketopia"]

for country in countries:
    # "the" is part of the country name only where English grammar needs it
    input_text = f"The capital of {country} is"
    output = llm_pipe(input_text)
    completed_sentence = f"\033[94m{input_text}\033[0m {output[0]['generated_text']}"
    print(completed_sentence)

### 🫵 Try it yourself - Get Creative 🫵
Try some of your own examples.
This thing isn't super smart without fine-tuning, but it can handle some light context injection and prompt engineering. We'll learn more about those subjects in later modules.

Notice the difference between asking a specific question and phrasing a completion:
* "Who is the Prime Minister of the UK?"
* "The current Prime Minister of the united kingdom is "

In [None]:
prompt_text = "The current Prime Minister of the united kingdom is "
output = llm_pipe(prompt_text)
completed_prompt = f"\033[94m{prompt_text}\033[0m {output[0]['generated_text']}"
print(completed_prompt)

🛑 Stop Here 🛑

This Ends Lab 1-1
<hr/>

## Lab 1-2: Prompts and Context Windows

* Using LangChain with a local LLM
* Connecting to OpenAI
* Using a memory window to create a text-only GPT conversation

### Basic chat completion

#### Step 1: Using the OpenAI python library

❗ Note: if you restarted your Google Colab runtime, you may need to re-run the setup step at the very top, as well as the utility functions in Step 4 of Lab 1-1, before coming back here ❗

In [None]:
import os, secrets, requests
import openai
from requests.auth import HTTPBasicAuth

#if using the Elastic AI proxy, then generate the correct API key
if os.environ['ELASTIC_PROXY'] == "True":

    if "OPENAI_API_TYPE" in os.environ: del os.environ["OPENAI_API_TYPE"]

    #generate and share "your" unique hash
    os.environ['USER_HASH'] = secrets.token_hex(nbytes=6)
    print(f"Your unique user hash is: {os.environ['USER_HASH']}")
else:
    openai.api_type = os.environ['OPENAI_API_TYPE']
    openai.api_version = os.environ['OPENAI_API_VERSION']

openai.api_key = os.environ['OPENAI_API_KEY']
openai.api_base = os.environ['OPENAI_API_BASE']
openai.default_model = os.environ['OPENAI_API_ENGINE']

# Call the OpenAI ChatCompletion API
def chatCompletion(messages):
    if os.environ["ELASTIC_PROXY"] == "True":
        completion = openai.ChatCompletion.create(
                        model=openai.default_model,
                        max_tokens=100,
                        messages=messages
                      )
    else:
        completion = openai.ChatCompletion.create(
                        engine=openai.default_model,
                        max_tokens=100,
                        messages=messages
                      )
    return completion

def chatWithGPT(prompt, print_full_json=False):
    response_text = chatCompletion([{"role": "user", "content": prompt}])

    if print_full_json:
      json_pretty(response_text)

    return wrap_text(response_text.choices[0].message.content,70)

## call it with the json debug output enabled
response = chatWithGPT("Hello, is ChatGPT online and working?", print_full_json=True)

print("\n")
print(response)

Feeding in user input for single questions is easy. Next, let's wrap it in a loop.

#### Step 2: A conversation loop -  ❗ type "exit" to end the chat ❗

In [None]:
def hold_a_conversation(ai_conversation_function = chatWithGPT):
  print(" -- Have a conversation with an AI: ")
  print(" -- type 'exit' when done")

  user_input = input("> ")
  while not user_input.lower().startswith("exit"):
      print(ai_conversation_function(user_input, False))
      print(" -- type 'exit' when done")
      user_input = input("> ")
  print("\n -- end conversation --")

## we are passing the previously defined function as a parameter
hold_a_conversation(chatWithGPT)


You can use the system prompt to adjust the AI's responses and purpose.

#### Step 3: See the impact of changing the system prompt

In [None]:
def pirateGPT(prompt, print_full_json=False):
    system_prompt = """
You are an unhelpful AI named Captain LLM_Beard that talks like a pirate in short responses.
You acknowledge the user's question but redirect all conversations towards your love of treasure.
"""

    response_text = chatCompletion(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}
        ]
    )

    if print_full_json:
      json_pretty(response_text)

    return wrap_text(response_text.choices[0].message.content,70)

hold_a_conversation(pirateGPT)

### Giving the AI conversation memory

This isn't a conversation yet because the AI has no memory of past interactions.

Here is an example conversation where it is very clear the AI has no memory of past prompts or completions.
```txt
> Hello!
Hello! How can I assist you today?
> my favorite color is blue
That's great! Blue is a very popular color.
> what is my favorite color?
I'm sorry, but as an AI, I don't have the ability to know personal
preferences or favorite colors.
```
Let's use the past conversation as input to subsequent calls. Because the context window is limited, and tokens cost money (if you are using a hosted service like OpenAI) or CPU cycles (if you are self-hosting), we cap the memory queue so it only remembers the last 2 prompts and their responses (4 messages in total).
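The memory window relies on Python's `collections.deque` with `maxlen`: once the deque is full, each `append` silently drops the oldest item from the other end. A minimal sketch, using made-up messages, of what a 4-message window keeps after three exchanges:

```python
from collections import deque

# A window that keeps only the 4 most recent messages (2 user/assistant exchanges)
window = deque(maxlen=4)

# Made-up conversation for illustration
exchanges = [
    ("Hello!", "Hello! How can I assist you today?"),
    ("My favorite color is blue", "That's great! Blue is a very popular color."),
    ("What is my favorite color?", "Your favorite color is blue."),
]

for user_text, assistant_text in exchanges:
    window.append({"role": "user", "content": user_text})
    window.append({"role": "assistant", "content": assistant_text})

# The first exchange ("Hello!") has been evicted; only the last two remain
for message in window:
    print(f"{message['role']:>9} | {message['content']}")
```

Trimming by message count is simple, but it does not bound the token count exactly: one very long message still occupies just one slot.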

#### Step 4: Create a chat with memory

In [None]:
from collections import deque

class QueueBuffer:
    def __init__(self, max_length):
        self.max_length = max_length
        self.buffer = deque(maxlen=max_length)

    def enqueue(self, item):
        self.buffer.append(item)

    def dequeue(self):
        if self.is_empty():
            return None
        return self.buffer.popleft()

    def is_empty(self):
        return len(self.buffer) == 0

    def is_full(self):
        return len(self.buffer) == self.max_length

    def size(self):
        return len(self.buffer)

    def peek(self):
        return list(self.buffer)

# enough conversation memory for 2 call-and-response exchanges
memory_buffer = QueueBuffer(4)

system_prompt = {
          "role": "system",
          "content": """
You are an AI named Cher Horowitz that speaks
in 1990's valley girl dialect of English"""
      }

## utility function to print in a different color for debug output
def print_light_blue(text):
    print(f'\033[94m{text}\033[0m')

## now with memory
def cluelessGPT(prompt, print_full_json=False):

  ## the API call will use the system prompt + the memory buffer
  ## which ends with the user prompt
  user_message = {"role": "user", "content": prompt}
  memory_buffer.enqueue(user_message)

  ## debug print the current AI memory
  print_light_blue("Current memory")
  for m in memory_buffer.peek():
      role = m.get("role").strip()
      content = m.get("content").strip()
      print_light_blue( f"  {role} | {content}")

  ## when calling the AI we put the system prompt at the start
  concatenated_message = [system_prompt] + memory_buffer.peek()

  ## here is the request to the AI
  completion = chatCompletion(concatenated_message)

  response_text = completion.choices[0].message.content

  ## don't forget to add the response to the conversation memory
  memory_buffer.enqueue({"role":"assistant", "content":response_text})

  if print_full_json:
    json_pretty(completion)

  return wrap_text(response_text,70)

#### Step 5: Let's chat with a not-so-clueless chatbot

In [None]:
hold_a_conversation(cluelessGPT)

🛑 Stop Here 🛑

This Ends Lab 1-2
<hr/>

## Lab 1-3: Data Redaction

### Model import

#### Step 1: Install and Import dependencies

In [None]:
!pip install -q eland elasticsearch transformers sentence_transformers python-dotenv

from elasticsearch import Elasticsearch, helpers, exceptions
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel
from getpass import getpass
import tempfile
import os
from pprint import pprint

#### Step 2: Create Elasticsearch Client Connection

In [None]:
if 'ELASTIC_CLOUD_ID' in os.environ:
  es = Elasticsearch(
    cloud_id=os.environ['ELASTIC_CLOUD_ID'],
    api_key=(os.environ['ELASTIC_APIKEY_ID'], os.environ['ELASTIC_APIKEY_SECRET']),
    request_timeout=30
  )
elif 'ELASTIC_URL' in os.environ:
  es = Elasticsearch(
    os.environ['ELASTIC_URL'],
    api_key=(os.environ['ELASTIC_APIKEY_ID'], os.environ['ELASTIC_APIKEY_SECRET']),
    request_timeout=30
  )
else:
  print("env needs to set either ELASTIC_CLOUD_ID or ELASTIC_URL")

#### Step 3: Define the model import function

In [None]:
def load_model(model_id, task_type):
  with tempfile.TemporaryDirectory() as tmp_dir:
    print(f"Loading HuggingFace transformer tokenizer and model [{model_id}] for task [{task_type}]" )

    tm = TransformerModel(model_id=model_id, task_type=task_type)
    model_path, config, vocab_path = tm.save(tmp_dir)

    ptm = PyTorchModel(es, tm.elasticsearch_model_id())
    model_exists = es.options(ignore_status=404).ml.get_trained_models(model_id=ptm.model_id).meta.status == 200

    if model_exists:
      print("Model has already been imported")
    else:
      print("Importing model")
      ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
      print("Starting model deployment")
      ptm.start()
      print(f"Model successfully imported with id '{ptm.model_id}'")

#### Step 4: Import the model

In [None]:
load_model("dslim/bert-base-NER", "ner")

### Define the ingest pipeline

We will use the [Elasticsearch Ingest Pipelines](https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html) to redact data before it is written to Elasticsearch. These pipelines can also be used to update data in existing indices or for reindexing.

This pipeline:
- Uses the [inference processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html) to call the NER model imported above (Step 4) and map the document's `message` field to the field expected by the model: `text_field`.
- Uses the [Painless scripting language](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-painless.html) from within a [script processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/script-processor.html) to replace each model-detected entity stored in the `ml.inference.entities` array with its class name (for example `<PER>`), and stores the result in a **new** document field: `redacted`.
- Uses the [redact processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/redact-processor.html) to redact supported patterns (such as email and IP addresses) found in the new `redacted` field, along with a set of custom patterns for credit card, SSN, and phone numbers defined in `pattern_definitions`.
- Removes the `ml` fields added to the document by the inference processor via the [remove processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/remove-processor.html), as they're no longer needed.
- Defines an `on_failure` handler that sets a `failure` field on the document if any processor errors.

**NOTE:** As of 8.9, the redact processor is a Technical Preview.
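To see what the script and redact processors do to a single message, here is a rough local approximation in Python. It is not how Elasticsearch executes the pipeline: the entity list is hand-written to mimic what the inference processor might place in `ml.inference.entities`, and only the three custom `pattern_definitions` are reproduced (the built-in `EMAILADDRESS` and `IP` patterns are left out). The `<NAME>` output format assumes the redact processor's default `<` prefix and `>` suffix.

```python
import re

def replace_entities(message, entities):
    """Mimic the Painless script: swap every detected entity for <CLASS_NAME>."""
    redacted = message
    for item in entities:
        redacted = redacted.replace(item["entity"], "<" + item["class_name"] + ">")
    return redacted

# Same regular expressions as the pipeline's custom pattern_definitions
CUSTOM_PATTERNS = {
    "CREDIT_CARD": r"\d{4}[ -]\d{4}[ -]\d{4}[ -]\d{4}",
    "SSN": r"\d{3}-\d{2}-\d{4}",
    "PHONE": r"\d{3}-\d{3}-\d{4}",
}

def redact_custom_patterns(text):
    """Replace every match of each custom pattern with <PATTERN_NAME>."""
    for name, pattern in CUSTOM_PATTERNS.items():
        text = re.sub(pattern, "<" + name + ">", text)
    return text

# Hand-written stand-in for ml.inference.entities (not real model output)
entities = [
    {"entity": "John Smith", "class_name": "PER"},
    {"entity": "Highland Park", "class_name": "LOC"},
]
message = ("John Smith lives in Highland Park. His SSN is 942-00-1243 "
           "and his phone is 412-189-9043.")
print(redact_custom_patterns(replace_entities(message, entities)))
```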




In [None]:
body = {
   "processors": [
    {
       "inference": {
         "model_id": "dslim__bert-base-ner",
         "field_map": {
           "message": "text_field"
         }
       }
    },
    {
       "script": {
         "lang": "painless",
         "source": """
String msg = ctx['message'];
for (item in ctx['ml']['inference']['entities'])
  msg = msg.replace(item['entity'], '<' + item['class_name'] + '>');
ctx['redacted'] = msg;
"""
       }
    },
    {
       "redact": {
          "field": "redacted",
          "patterns": [
            "%{EMAILADDRESS:EMAIL}",
            "%{IP:IP_ADDRESS}",
            "%{CREDIT_CARD:CREDIT_CARD}",
            "%{SSN:SSN}",
            "%{PHONE:PHONE}"
      ],
          "pattern_definitions": {
            "CREDIT_CARD": "\\d{4}[ -]\\d{4}[ -]\\d{4}[ -]\\d{4}",
            "SSN": "\\d{3}-\\d{2}-\\d{4}",
            "PHONE": "\\d{3}-\\d{3}-\\d{4}"
          }
       }
    },
    {
       "remove": {
         "field": [
           "ml"
         ],
         "ignore_missing": True,
         "ignore_failure": True
       }
    }
  ],
  "on_failure": [
    {
       "set": {
         "field": "failure",
         "value": "pii_script-redact"
       }
    }
  ]
}

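# Create (or overwrite) the 'redact' pipeline; the simulate call below refers to it by this id
es.ingest.put_pipeline(id='redact', processors=body['processors'], on_failure=body['on_failure'])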

### Test the pipeline

Does it work?

Let's use the [Simulate Pipeline API](https://www.elastic.co/guide/en/elasticsearch/reference/current/simulate-pipeline-api.html) to find out.

In [None]:
docs = [
  {
      "_source": {
          "message": "John Smith lives at 123 Main St. Highland Park, CO. His email address "\
          "is jsmith123@email.com and his phone number is 412-189-9043.  I found his social "\
          "security number, it is 942-00-1243. Oh btw, his credit card is 1324-8374-0978-2819 "\
          "and his gateway IP is 192.168.1.2"
      }
  },
  {
      "_source": {
          "message": "I had a call with Jane yesterday, she suggested we talk with John "\
          "from Global Systems. Their office is in Springfield"
      }
  }
]

pprint(es.ingest.simulate(id='redact', docs=docs).body)
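The simulate response nests each processed document under `docs[i].doc._source`. A small helper like this hypothetical `redacted_messages` can pull out just the `redacted` field (or the `failure` marker set by `on_failure`), which makes the before/after comparison easier to read. The sample response below is hand-written to show the shape, not real output.

```python
def redacted_messages(simulate_body):
    """Return the `redacted` text (or a failure marker) for each simulated doc."""
    results = []
    for entry in simulate_body.get("docs", []):
        source = entry.get("doc", {}).get("_source", {})
        if "failure" in source:
            # Set by the pipeline's on_failure handler
            results.append("FAILED: " + str(source["failure"]))
        else:
            results.append(source.get("redacted"))
    return results

# Hand-written example of the response shape (not real output)
sample = {
    "docs": [
        {"doc": {"_index": "_index", "_id": "_id",
                 "_source": {"message": "Hi John", "redacted": "Hi <PER>"}}}
    ]
}
print(redacted_messages(sample))
```

With the pipeline in place, `redacted_messages(es.ingest.simulate(id='redact', docs=docs).body)` would list one redacted string per input document.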

After validating that the pipeline performs as needed, a good next step is to create roles that limit who can see the original `message` field versus the new `redacted` field.

Check out the [Field Level Security documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/field-level-security.html).
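As a starting point, here is a sketch of the index privileges for a role that can read documents but not the original `message` field. The role name `redacted_reader` and the index pattern `pii-logs-*` are hypothetical. `field_security` with `grant` and `except` is how Elasticsearch role definitions express field-level security, and the commented-out call assumes the 8.x Python client's `es.security.put_role`.

```python
def redacted_reader_indices(index_pattern):
    """Build an `indices` privilege entry that hides the original `message` field."""
    return [
        {
            "names": [index_pattern],
            "privileges": ["read"],
            # Grant every field except the unredacted original text
            "field_security": {"grant": ["*"], "except": ["message"]},
        }
    ]

indices = redacted_reader_indices("pii-logs-*")
print(indices)

# Requires a user who is allowed to manage security roles:
# es.security.put_role(name="redacted_reader", indices=indices)
```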

🛑 Stop Here 🛑

This Ends Lab 1-3
<hr/>