<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Understanding%20LLMs.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FUnderstanding%2520LLMs.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Understanding%20LLMs.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Understanding%20LLMs.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Understanding LLMs

A quick experiment to illustrate what an LLM is actually doing.

Take a well-known passage of text, like the first paragraph from "A Tale of Two Cities" by Charles Dickens and Harvey Dunn, 1921:

>It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.

Imagine you are the author and have started writing this paragraph.  How likely is it to end up with these words?

You start with 'It'.  That narrows down what comes next.  Given all of the text samples an LLM is trained on, the most likely next word is 'was'.  
Now start with 'It was'.  What comes next?

How many words does it take before the LLM decides that the most likely continuation is the exact same passage?  After just 5 words the LLM can recite the remainder of the paragraph because the sequence is that unique!

This works because the LLM was trained on a large amount of text that almost certainly contained this very popular book, which is also in the [public domain](https://www.loc.gov/item/05000749/).
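As a toy illustration of the idea (not how an LLM is actually implemented), the sketch below counts which words follow a given context in a small piece of text. The function name `next_word_counts` and the shortened `text` are assumptions made for this example only. It shows how a longer context leaves fewer possible continuations.

```python
from collections import Counter

# A short excerpt of the passage stands in for the training text (illustrative only).
text = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness")
tokens = text.lower().split()

def next_word_counts(tokens, context):
    """Count the words that follow each occurrence of `context` in `tokens`."""
    n = len(context)
    counts = Counter()
    for i in range(len(tokens) - n):
        if tokens[i:i + n] == list(context):
            counts[tokens[i + n]] += 1
    return counts

# One word of context: every 'it' in this excerpt is followed by 'was'.
print(next_word_counts(tokens, ["it"]))
# Three words of context: several continuations are still possible.
print(next_word_counts(tokens, ["it", "was", "the"]))
# A longer, more specific context leaves a single continuation.
print(next_word_counts(tokens, ["was", "the", "best", "of"]))
```

A real LLM works on tokens rather than whitespace-separated words and learns probabilities instead of storing raw counts, but the narrowing effect of added context is the same.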

---
## Colab Setup

To run this notebook in Colab click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Understanding%20LLMs.ipynb) and run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs

The list `packages` contains tuples of package import names and install names, with an optional minimum version.  If the import name is not found then the install name is used to install quietly for the current user.

In [3]:
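# tuples of (import name, install name) or (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform')
]

# import the submodules explicitly; `import importlib` alone does not guarantee they are loaded
import importlib.util
import importlib.metadata
install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by install (distribution) name, not import name
        # note: version strings are compared as text here, so '1.10.0' sorts before '1.9.0'
        if importlib.metadata.version(package[1]) < package[2]:
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user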
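The `<` check in the cell above compares version strings as text, which can misorder multi-digit components. The sketch below shows the difference using a hypothetical helper, `version_tuple`, which is an assumption for this example and not part of the notebook. It only handles the leading numeric parts of a version string.

```python
import re

def version_tuple(version):
    """Convert the leading numeric parts of a version string to a tuple of ints."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

# Plain string comparison works character by character.
print('1.10.0' < '1.9.0')
# Comparing integer tuples orders the same versions numerically.
print(version_tuple('1.10.0') < version_tuple('1.9.0'))
```

For pre-releases and other suffixes, the `Version` class in the `packaging` library is a more complete option when that library is installed.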

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, continue running the notebook from the cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'

In [8]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import vertexai.language_models

In [9]:
vertexai.init(project = PROJECT_ID, location = REGION)

---
## Setup LLM and Predictor Function

Connect to the `text-bison` model for text generation.  Then build a helper function around the LLM prediction calls to:
- specify how many input words from the known `sample` to pass as input
- specify how many output words to ask for
- specify how many tries/requests to make for output words
    - when more than one try is requested, raise the temperature from 0 to 0.1 to see more than just the most likely next word
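To give a feel for what the temperature setting does, here is a minimal sketch of temperature-scaled softmax over a few made-up scores. The function `softmax_with_temperature` and the `logits` values are assumptions for illustration; they are not taken from `text-bison`, and the hosted model may handle temperature 0 differently from the greedy shortcut used here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature == 0:
        # Temperature 0 is treated as greedy decoding: all probability on the top score.
        best = max(logits, key=logits.get)
        return {word: 1.0 if word == best else 0.0 for word in logits}
    scaled = {word: score / temperature for word, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - peak) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

# Hypothetical scores for the word after 'it'; these are not values from text-bison.
logits = {'was': 2.0, 'is': 1.5, 'seems': 0.5}
for temperature in (0, 0.1, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, {word: round(p, 3) for word, p in probs.items()})
```

At a temperature of 0.1 nearly all of the probability stays on the top-scoring word, so repeated requests mostly agree but can occasionally differ.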

In [31]:
llm = vertexai.language_models.TextGenerationModel.from_pretrained('text-bison@001')

In [32]:
sample = "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
spaces = [i for i, char in enumerate(sample) if char == ' ']
print(sample)

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.


In [61]:
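def predictor(words, new_words, trys = 1):
    # use the first `words` words of `sample` as the prompt
    start = sample[0:spaces[words-1]]
    print('The input is: ', start)
    # the next `new_words` words of `sample` are the expected continuation
    print('Correct Next: ', sample[spaces[words-1]:spaces[words+new_words-1]])
    # temperature 0 requests the most likely continuation
    temperature = 0
    
    # note: max_output_tokens counts tokens, not words, so it only approximates `new_words`
    max_output_tokens = new_words
    top_k = trys
    # with multiple tries, allow a little randomness so the responses can differ
    if trys > 1: temperature = 0.1
    top_p = 1

    if trys == 1: print('Response: ', llm.predict(start, temperature = temperature, max_output_tokens = max_output_tokens, top_k = top_k, top_p = top_p).text)
    else:
        for i in range(trys):
            print(f'Response {i+1}', llm.predict(start, temperature = temperature, max_output_tokens = max_output_tokens, top_k = top_k, top_p = top_p).text)

    return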
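The slicing inside `predictor` can be checked without calling the model. The sketch below repeats the same index arithmetic on a shortened string; `short_sample`, `short_spaces`, and `split_sample` are names introduced for this example only.

```python
# A shortened sample keeps the indices easy to follow (illustrative only).
short_sample = "It was the best of times,"
short_spaces = [i for i, char in enumerate(short_sample) if char == ' ']

def split_sample(text, space_positions, words, new_words):
    """Return (prompt, expected continuation) using the same slices as `predictor`."""
    prompt = text[0:space_positions[words - 1]]
    expected = text[space_positions[words - 1]:space_positions[words + new_words - 1]]
    return prompt, expected

print(short_spaces)
print(split_sample(short_sample, short_spaces, 2, 2))
```

The expected continuation keeps its leading space, which is why the `Correct Next:` outputs show an extra space. Because the slices end at space positions, `words + new_words - 1` must be a valid index into the list of spaces, so the final word of the sample cannot be part of the expected text.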

---
## Predict the Next Word(s)....

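Note that the recorded outputs in this section and the next start from 'the' rather than 'It', which suggests they were captured while the mid-sentence `sample` defined later in the notebook was in memory.  Rerun the `sample` cell above before these cells to start from 'It'.

Pass 1 word in, ask for 1 word out, try 1 time: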

In [62]:
predictor(1, 1, 1)

The input is:  the
Correct Next:   worst
Response:  


Pass 1 word in, ask for 2 words out, try 1 time:

In [63]:
predictor(1, 2, 1)

The input is:  the
Correct Next:   worst of
Response:  


Pass 1 word in, ask for 2 words out, try 10 times:

In [64]:
predictor(1, 2, 10)

The input is:  the
Correct Next:   worst of
Response 1 
Response 2 
Response 3 
Response 4 
Response 5 
Response 6 
Response 7 
Response 8 
Response 9 
Response 10 


Pass 1 word in, ask for 3 words out, try 1 time:

In [65]:
predictor(1, 3, 1)

The input is:  the
Correct Next:   worst of times,
Response:  


Pass 1 word in, ask for 3 words out, try 10 times:

In [66]:
predictor(1, 3, 10)

The input is:  the
Correct Next:   worst of times,
Response 1 
Response 2 
Response 3 
Response 4 
Response 5 
Response 6 
Response 7 
Response 8 
Response 9 
Response 10 


## How Many Input Words Are Needed To Correctly Reproduce The Sample?

Pass 2 words in, ask for 1 word out, try 1 time: answer is wrong

In [67]:
predictor(2, 1, 1)

The input is:  the worst
Correct Next:   of
Response:  .


Pass 3 words in, ask for 1 word out, try 1 time: answer is wrong

In [68]:
predictor(3, 1, 1)

The input is:  the worst of
Correct Next:   times,
Response:  the


Pass 4 words in, ask for 1 word out, try 1 time: answer is wrong

In [69]:
predictor(4, 1, 1)

The input is:  the worst of times,
Correct Next:   it
Response:  the


Pass 5 words in, ask for 1 word out, try 1 time: answer is CORRECT!

In [70]:
predictor(5, 1, 1)

The input is:  the worst of times, it
Correct Next:   was
Response:  is


Pass 5 words in, ask for 4 words out, try 1 time: answer is CORRECT!

In [71]:
predictor(5, 4, 1)

The input is:  the worst of times, it
Correct Next:   was the age of
Response:  is important to remember


Pass 5 words in, ask for 20 words out, try 1 time: answer is CORRECT!

In [72]:
predictor(5, 20, 1)

The input is:  the worst of times, it
Correct Next:   was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the
Response:  is important to remember that there are always people who care about you and want to help. If you


Pass 5 words in, ask for 20 words out, try 5 times: answer is CORRECT consistently!

In [73]:
predictor(5, 20, 5)

The input is:  the worst of times, it
Correct Next:   was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the
Response 1 's important to remember that there are still good people in the world.
Response 2 's important to remember that there are always people who care about you and want to help. If
Response 3 's important to remember that there are always people who care about you. If you're feeling
Response 4 's important to remember that there are always people who care about you. If you're feeling
Response 5 is important to remember that there are always people who care about you and want to help. If you


## Starting Mid-Sentence, How Many Input Words Are Needed To Correctly Reproduce The Sample?

Following the same process as above, but starting inside the first sentence.

After 6, 8, and 9 words it guesses the next word correctly but cannot guess the remainder of the passage.

After 10 words it is then able to deduce the remainder of the passage!
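The "wrong" and "CORRECT" labels in this notebook are judged by eye. As a rough sketch of how that comparison could be scored, the hypothetical helper `matching_prefix_words` below (an assumption for this example, not part of the notebook) counts how many leading words of a response agree with the expected text, ignoring punctuation at the edges of each word.

```python
import string

def matching_prefix_words(expected, response):
    """Count how many leading words of `response` match `expected`, ignoring edge punctuation."""
    def words(text):
        return [word.strip(string.punctuation) for word in text.split()]
    count = 0
    for expected_word, response_word in zip(words(expected), words(response)):
        if expected_word != response_word:
            break
        count += 1
    return count

# Strings copied from two of the recorded outputs in this section.
print(matching_prefix_words(" the age of wisdom,", "the best of times"))
print(matching_prefix_words(" it was the age of foolishness,", "it was the age of foolishness"))
```

A response that matches only its first word corresponds to the "next word correct, remainder wrong" cases, while a count equal to the number of requested words corresponds to a fully reproduced continuation.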

In [74]:
sample = "the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
spaces = [i for i, char in enumerate(sample) if char == ' ']

Pass 2 words in, ask for 1 word out, try 1 time: answer is wrong

In [75]:
predictor(2, 1, 1)

The input is:  the worst
Correct Next:   of
Response:  .


Pass 3 words in, ask for 1 word out, try 1 time: answer is wrong

In [76]:
predictor(3, 1, 1)

The input is:  the worst of
Correct Next:   times,
Response:  the


Pass 4 words in, ask for 1 word out, try 1 time: answer is wrong

In [77]:
predictor(4, 1, 1)

The input is:  the worst of times,
Correct Next:   it
Response:  the


Pass 5 words in, ask for 1 word out, try 1 time: answer is wrong

In [78]:
predictor(5, 1, 1)

The input is:  the worst of times, it
Correct Next:   was
Response:  is


Pass 6 words in, ask for 1 word out, try 1 time: answer is CORRECT!

In [79]:
predictor(6, 1, 1)

The input is:  the worst of times, it was
Correct Next:   the
Response:  the


Pass 6 words in, ask for 4 words out, try 1 time: answer is wrong

In [80]:
predictor(6, 4, 1)

The input is:  the worst of times, it was
Correct Next:   the age of wisdom,
Response:  the best of times


In [81]:
predictor(7, 1, 1)

The input is:  the worst of times, it was the
Correct Next:   age
Response:  best


In [82]:
predictor(8, 1, 1)

The input is:  the worst of times, it was the age
Correct Next:   of
Response:  of


In [83]:
predictor(8, 6, 1)

The input is:  the worst of times, it was the age
Correct Next:   of wisdom, it was the age
Response:  of wisdom.

The


In [84]:
predictor(9, 1, 1)

The input is:  the worst of times, it was the age of
Correct Next:   wisdom,
Response:  wisdom


In [85]:
predictor(9, 6, 1)

The input is:  the worst of times, it was the age of
Correct Next:   wisdom, it was the age of
Response:  wisdom.

The best


In [86]:
predictor(10, 1, 1)

The input is:  the worst of times, it was the age of wisdom,
Correct Next:   it
Response:  it


In [87]:
predictor(10, 6, 1)

The input is:  the worst of times, it was the age of wisdom,
Correct Next:   it was the age of foolishness,
Response:  it was the age of foolishness


In [88]:
predictor(10, 30, 1)

The input is:  the worst of times, it was the age of wisdom,
Correct Next:   it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness,
Response:  it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it
