<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Understanding%20LLMs.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FUnderstanding%2520LLMs.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Understanding%20LLMs.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Understanding%20LLMs.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Understanding LLMs

A quick experiment to illustrate what an LLM is actually doing.

Take a well-known passage of text, like the first paragraph of "A Tale of Two Cities" by Charles Dickens (the 1921 edition illustrated by Harvey Dunn):

>It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.

Imagine you are the author and have started writing this paragraph.  How likely is it to end up with these words?

You start with 'It'.  That narrows down what comes next.  From all of the text an LLM was trained on, it estimates which word is most likely to come next, and a plausible guess is 'was'.  
Now start with 'It was'.  What comes next?

How much starting text does the LLM need before the most likely continuation is this exact passage?  After as few as 5 words an LLM may recite the remainder of the paragraph because the sequence is that unique!

This is because the LLM was trained on (learned from) a large amount of text, which almost certainly included this very popular book that is also in the [public domain](https://www.loc.gov/item/05000749/).
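
To make the idea of "most likely next word" concrete, here is a minimal sketch that counts which word follows a given prefix. The four-line corpus is made up for illustration (an assumption, not the model's training data), and a real LLM learns probabilities over tokens rather than counting exact matches, but the narrowing effect of a longer prefix is the same.

```python
from collections import Counter

# Tiny, made-up corpus (an assumption for illustration; a real LLM trains on vastly more text).
corpus = [
    "it is a truth universally acknowledged",
    "it was a dark and stormy night",
    "it was a bright cold day in april",
    "it was the best of times it was the worst of times",
]

def next_word_counts(prefix):
    """Count which word follows `prefix` wherever the prefix occurs in the corpus."""
    prefix_words = prefix.lower().split()
    n = len(prefix_words)
    counts = Counter()
    for line in corpus:
        words = line.split()
        for i in range(len(words) - n):
            if words[i:i + n] == prefix_words:
                counts[words[i + n]] += 1
    return counts

# Longer prefixes leave fewer candidate continuations.
for prefix in ["It", "It was", "It was the", "It was the best"]:
    print(f"{prefix!r:>18} -> {dict(next_word_counts(prefix))}")
```

With this toy corpus, 'It' leaves two candidate next words while 'It was the best' leaves only one.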

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names, install names, and an optional minimum version.  If the import name is not found, or the installed version is older than the minimum, then the install name is used to install quietly for the current user.

In [3]:
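# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.66.0')
]

import importlib.util
import importlib.metadata

def version_key(version):
    # compare numerically: as plain strings '1.9.0' would sort after '1.66.0'
    return tuple(int(part) for part in version.split('.') if part.isdigit())

install = False
for package in packages:
    try:
        found = importlib.util.find_spec(package[0]) is not None
    except ModuleNotFoundError: # raised when a parent package (like google.cloud) is missing
        found = False
    if not found:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by the install (distribution) name
        if version_key(importlib.metadata.version(package[1])) < version_key(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user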

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue with the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

inputs:

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'

packages:

In [8]:
import os
import asyncio
from IPython.display import Markdown
from google.cloud import aiplatform
import vertexai.generative_models # for Gemini Models

In [9]:
aiplatform.__version__

'1.66.0'

clients:

In [10]:
vertexai.init(project = PROJECT_ID, location = REGION)

---
## Setup LLM and Predictor Function

Vertex AI hosts many [Google models](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models) as an API.  It also hosts partner models and offers managed services for hosting private models as well as training models with a full suite of [MLOps capabilities](../MLOps/readme.md).  This notebook uses the [Vertex AI Gemini API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference) which is covered in much more depth in [this workflow](./Vertex%20AI%20Gemini%20API.ipynb).


Connect to the `gemini-1.5-flash-001` model for text generation.  Then build a helper function around the LLM prediction calls to:
- specify how many input words from the known `sample` to pass as input
- specify how many output words to ask for
- specify how many tries (requests) to make for output words
    - when more than one try is requested, raise the temperature from 0 to 0.5 to see more than just the most likely next word chosen
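
The temperature setting in the list above can be illustrated with a small sketch. This is the textbook softmax-with-temperature formulation, not a description of how the Gemini API applies temperature internally, and the scores for the candidate words are made up for illustration.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature == 0:
        # temperature 0 is treated as greedy: all probability on the top-scoring word
        best = max(scores, key=scores.get)
        return {word: (1.0 if word == best else 0.0) for word in scores}
    scaled = {word: score / temperature for word, score in scores.items()}
    largest = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - largest) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

# made-up scores for words that might follow "It" (illustrative only)
scores = {'is': 2.0, 'was': 1.8, 'seems': 0.5}

for temperature in (0, 0.5, 1.0):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, {word: round(p, 3) for word, p in probs.items()})
```

At temperature 0 only the top-scoring word can be chosen; as the temperature rises, the runner-up words receive a larger share of the probability, which is why repeated tries start to differ.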

### Connect To Model and Sample Request

In [11]:
llm = vertexai.generative_models.GenerativeModel("gemini-1.5-flash-001")

In [16]:
Markdown(llm.generate_content('What is the first paragraph of the tale of two cities?').text)

The opening paragraph of *A Tale of Two Cities* is:

> It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way—in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only. 


### Helper Function: Predictor

Create a function that takes 4 inputs:
- `words` = the number of words provided to the LLM
    - int
- `new_words` = the number of words the LLM should extend the input `words` by
    - int
- `trys` = how many responses to generate
    - int, default = 1
- `context` = provide (optional) context to the LLM when answering 
    - string, default = None
    
The function will print out a summary of the input and responses.
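
Before the full function, here is a minimal sketch of how `words` and `new_words` carve the sample into the text given to the model and the text it is expected to produce. The helper name `split_sample` is hypothetical, and it uses `str.split` rather than the space-position indexing used in the function below; the sample is shortened for illustration.

```python
# shortened sample for illustration
sample = "It was the best of times, it was the worst of times, it was the age of wisdom"

def split_sample(text, words, new_words):
    """Return (start, correct): the first `words` words and the `new_words` words that follow."""
    tokens = text.split(' ')
    start = ' '.join(tokens[:words])
    correct = ' '.join(tokens[words:words + new_words])
    return start, correct

print(split_sample(sample, 1, 1))
print(split_sample(sample, 5, 4))
```

Slicing a list never raises when the requested range runs past the end, so this variant returns a shorter `correct` instead of failing when more words are requested than the sample contains.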

In [95]:
async def predictor(words, new_words, trys = 1, context = None):
    # data
    sample = "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
    spaces = [i for i, char in enumerate(sample) if char == ' ']
    
    # parameters
    start = sample[0:spaces[words-1]]
    correct = sample[spaces[words-1]:spaces[words+new_words-1]].strip()   
    prompt = f"I am going to provide the start of a sentence and I want you to provide the next {new_words} words without repeating what I provide.\n\n{start}"
    
    # LLM call
    responses = await asyncio.gather(*[
        llm.generate_content_async(
            list(filter(None, (context, prompt))),
            generation_config = vertexai.generative_models.GenerationConfig(
                temperature = (0.5 if trys > 1 else 0),
                top_p = 1,
                top_k = new_words,
                max_output_tokens = new_words*2,
                seed = (42 + i)
            )
        ) for i in range(trys)
    ])
    
    # text processing for comparisons: lowercase, trim surrounding whitespace, keep only letters and spaces
    def process_string(text):
        import string
        text = text.strip().lower()
        allowed_chars = string.ascii_lowercase + ' '
        text = ''.join(c for c in text if c in allowed_chars)
        return text
    
    # prepare result string:
    result = ''#'**Results:**'
    result += f'\n- **Input:** {start}'
    result += f'\n- **Correct Response:** {correct}'
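    if trys == 1:
        result += f'\n- **Received Response:** {responses[0].text}'
        # correct when the cleaned response is a non-empty prefix of the expected words
        # (responses can be cut short by max_output_tokens)
        received = process_string(responses[0].text)
        if received and process_string(correct).startswith(received):
            result += '\n- **Result:** Correct'
        else:
            result += '\n- **Result:** Incorrect'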
    else:
        result += f'\n- **Received Responses:**'
        for response in responses:
            result += f'\n\t- {response.text}'
            
    return Markdown(result)
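
The Correct/Incorrect label comes from a prefix comparison after cleanup, so a response that stops early can still count as correct. The sketch below isolates that check outside the function; the names `normalize` and `is_prefix_match` are hypothetical, and the guard against an empty response is included because an empty string is a prefix of everything.

```python
import string

def normalize(text):
    """Lowercase, trim, and keep only ASCII letters and spaces."""
    allowed = string.ascii_lowercase + ' '
    return ''.join(c for c in text.strip().lower() if c in allowed)

def is_prefix_match(correct, received):
    """True when the cleaned response is a non-empty prefix of the cleaned expected text."""
    cleaned = normalize(received)
    return bool(cleaned) and normalize(correct).startswith(cleaned)

print(is_prefix_match('times, it was the', 'times, it was'))  # shorter response, same words
print(is_prefix_match('was the', 'was a'))                    # diverges at the second word
print(is_prefix_match('times,', 'Times'))                     # case and punctuation are ignored
```

This is why a three-word reply to a four-word request is still reported as Correct: the check asks whether the reply agrees with the passage as far as it goes, not whether it is complete.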

In [96]:
await predictor(1,1,1)


- **Input:** It
- **Correct Response:** was
- **Received Response:** is 
- **Result:** Incorrect

---
## Predict the Next Word(s)....

Pass 1 word in, ask for 1 word out, try 1 time:

In [74]:
await predictor(1, 1, 1)


- **Input:** It
- **Correct Response:** was
- **Received Response:** is
- **Result:** Incorrect

Pass 1 word in, ask for 2 words out, try 1 time:

In [75]:
await predictor(1, 2, 1)


- **Input:** It
- **Correct Response:** was the
- **Received Response:** was a
- **Result:** Incorrect

Pass 1 word in, ask for 2 words out, try 10 times:

In [76]:
await predictor(1, 2, 10)


- **Input:** It
- **Correct Response:** was the
- **Received Responses:**
	- was raining
	- is a
	- was a
	- is a
	- is a
	- was a
	- is a
	- is a
	- is a
	- is a
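
With ten responses it helps to tally them rather than read them one by one. A small sketch using the responses listed above (copied by hand from the output, since `predictor` returns formatted Markdown rather than a list):

```python
from collections import Counter

# the ten responses listed above for predictor(1, 2, 10)
responses = ['was raining', 'is a', 'was a', 'is a', 'is a',
             'was a', 'is a', 'is a', 'is a', 'is a']
correct = 'was the'

# how often each full response appeared
tally = Counter(response.strip().lower() for response in responses)
for text, count in tally.most_common():
    print(f'{count:>2} x {text}')

# how often each first word appeared
first_words = Counter(response.split()[0] for response in responses)
print(dict(first_words))
print('exact matches:', tally[correct])
```

Here 'is a' accounts for seven of the ten tries and the first word is 'was' only three times, so with one input word the passage is not yet the likely continuation.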

Pass 1 word in, ask for 3 words out, try 1 time:

In [77]:
await predictor(1, 3, 1)


- **Input:** It
- **Correct Response:** was the best
- **Received Response:** was a dark
- **Result:** Incorrect

Pass 1 word in, ask for 3 words out, try 10 times:

In [78]:
await predictor(1, 3, 10)


- **Input:** It
- **Correct Response:** was the best
- **Received Responses:**
	- was a dark
	- was a dark
	- was a dark
	- was a dark
	- was a dark
	- was a dark
	- was a dark
	- was a dark
	- was a dark
	- is a beautiful

---
## How Many Input Words Are Needed To Correctly Reproduce The Sample?

Pass 2 words in, ask for 1 word out, try 1 time:

In [79]:
await predictor(2, 1, 1)


- **Input:** It was
- **Correct Response:** the
- **Received Response:** a
- **Result:** Incorrect

Pass 3 words in, ask for 1 word out, try 1 time:

In [80]:
await predictor(3, 1, 1)


- **Input:** It was the
- **Correct Response:** best
- **Received Response:** best
- **Result:** Correct

Pass 4 words in, ask for 1 word out, try 1 time:

In [81]:
await predictor(4, 1, 1)


- **Input:** It was the best
- **Correct Response:** of
- **Received Response:** of
- **Result:** Correct

Pass 5 words in, ask for 1 word out, try 1 time:

In [82]:
await predictor(5, 1, 1)


- **Input:** It was the best of
- **Correct Response:** times,
- **Received Response:** times
- **Result:** Correct

Pass 5 words in, ask for 4 words out, try 1 time:

In [83]:
await predictor(5, 4, 1)


- **Input:** It was the best of
- **Correct Response:** times, it was the
- **Received Response:** times, it was
- **Result:** Correct

Pass 5 words in, ask for 20 words out, try 1 time:

In [97]:
await predictor(5, 20, 1)


- **Input:** It was the best of
- **Correct Response:** times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it
- **Received Response:** times, it was the worst of times, it was the age 

- **Result:** Correct

Pass 5 words in, ask for 20 words out, try 5 times: the answer is consistently correct!

In [98]:
await predictor(5, 20, 5)


- **Input:** It was the best of
- **Correct Response:** times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it
- **Received Responses:**
	- times, it was the worst of times, it was the age of 

	- times, it was the worst of times, it was the age of 

	- times, it was the worst of times, it was the age 

	- times, it was the worst of times, it was the age 

	- times, it was the worst of times, it was the age 
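
The manual search in this section (increase the input words until the continuation is correct) can be sketched as a loop. The function `min_words_needed` and the stand-in `fake_predict` are hypothetical: `fake_predict` imitates a model that only recognizes the passage once it has three words, so the example runs without calling the API. In practice `predict` would wrap a real model call.

```python
def min_words_needed(sample, predict, new_words=1, max_words=10):
    """Return the smallest number of input words for which `predict` continues correctly, or None."""
    tokens = sample.split(' ')
    for words in range(1, max_words + 1):
        start = ' '.join(tokens[:words])
        correct = ' '.join(tokens[words:words + new_words])
        if predict(start, new_words) == correct:
            return words
    return None

passage = "It was the best of times, it was the worst of times"

# hypothetical stand-in for an LLM call: it only "recognizes" the passage after three words
def fake_predict(start, new_words):
    if len(start.split(' ')) < 3:
        return 'a'
    remainder = passage[len(start):].split()
    return ' '.join(remainder[:new_words])

print(min_words_needed(passage, fake_predict))
```

The loop stops at the first success, so it reports where correct answers begin rather than proving that every longer input also succeeds.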


---
## Does Context Help?

Adding context to the request might reduce how many input words the LLM needs before it responds with the correct words.  In this case the passage is the first paragraph of a famous novel by Charles Dickens.  Provide the author's name as context and review the results!
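
The `predictor` function passes the context along by building its request contents with `list(filter(None, (context, prompt)))`. A minimal sketch of what that expression produces with and without a context (the helper name `build_contents` is hypothetical and the prompt is abbreviated for illustration):

```python
def build_contents(context, prompt):
    """Drop a missing (None or empty) context so only real strings are sent."""
    return list(filter(None, (context, prompt)))

prompt = 'I am going to provide the start of a sentence ...\n\nIt'  # abbreviated for illustration

print(build_contents(None, prompt))       # only the prompt
print(build_contents('Dickens', prompt))  # context first, then the prompt
```

The context is sent as a separate piece of content ahead of the prompt, so the single word 'Dickens' is all the extra information the model receives.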

Pass 1 word in, ask for 1 word out, try 1 time:

In [86]:
await predictor(1, 1, 1, 'Dickens')


- **Input:** It
- **Correct Response:** was
- **Received Response:** was
- **Result:** Correct

Pass 1 word in, ask for 2 words out, try 1 time:

In [87]:
await predictor(1, 2, 1, 'Dickens')


- **Input:** It
- **Correct Response:** was the
- **Received Response:** was a
- **Result:** Incorrect

Pass 2 words in, ask for 2 words out, try 1 time:

In [92]:
await predictor(2, 2, 1, 'Dickens')


- **Input:** It was
- **Correct Response:** the best
- **Received Response:** a dark
- **Result:** Incorrect

Pass 3 words in, ask for 3 words out, try 1 time:

In [93]:
await predictor(3, 3, 1, 'Dickens')


- **Input:** It was the
- **Correct Response:** best of times,
- **Received Response:** best of times
- **Result:** Correct

Pass 3 words in, ask for 10 words out, try 1 time:

In [94]:
await predictor(3, 10, 1, 'Dickens')


- **Input:** It was the
- **Correct Response:** best of times, it was the worst of times, it
- **Received Response:** best of times, it was 

- **Result:** Correct