
# Understanding LLMs

A quick experiment to illustrate what an LLM is actually doing.

Take a well-known passage of text, like the first paragraph from "A Tale of Two Cities" by Charles Dickens and Harvey Dunn, 1921:

>It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.

Imagine you are the author and have started writing this paragraph.  How likely is it to end up with these words?

You start with 'It'.  That narrows down what comes next.  Given all of the text samples an LLM is trained on, it can estimate the most likely next word, and for this passage we hope it is 'was'.  
Now start with 'It was'.  What comes next?

How many words do we need to start with before the LLM decides that the most likely continuation is this exact passage?  After just 5 words the LLM can recite the remainder of the paragraph, because the sequence is that unique!

This works because the LLM was trained on a large amount of text that almost certainly contained this very popular book, which is also in the [public domain](https://www.loc.gov/item/05000749/).
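
To make the idea of "most likely next word given the words so far" concrete, here is a minimal sketch that counts continuations in a tiny, invented corpus. It is only an analogy: the corpus, the `next_word_counts` helper, and the counting approach are assumptions for illustration, and a real LLM learns these likelihoods with a neural network rather than by lookup.

```python
from collections import Counter

# Toy corpus, invented for illustration (not real training data).
corpus = [
    "it was the best of times",
    "it is a good day",
    "it is a very good idea",
    "it is a long way home",
    "it was a dark night",
    "it was a cold morning",
    "it is a small world",
    "she made the best decision",
    "he made the best decision today",
]

def next_word_counts(corpus, context):
    """Count which words follow `context` (a list of words) anywhere in the corpus."""
    counts = Counter()
    n = len(context)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - n):
            if words[i:i + n] == context:
                counts[words[i + n]] += 1
    return counts

# Grow the context one word at a time and watch the candidates narrow.
prompt = ["it", "was", "the", "best", "of"]
for n in range(1, len(prompt) + 1):
    context = prompt[:n]
    counts = next_word_counts(corpus, context)
    print(" ".join(context), "->", counts.most_common(1)[0][0], dict(counts))
```

With one word of context the generic continuation wins; as the context grows, fewer sentences in the corpus match and the passage itself becomes the only candidate.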

---
## Colab Setup

To run this notebook in Colab click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Understanding%20LLMs.ipynb) and run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need installing in this environment.  Also, the Vertex AI API needs to be enabled (if not already enabled).

### Installs (If Needed)

In [None]:
install = False
try: import google.cloud.aiplatform
except ImportError:
    print('You need to pip install google-cloud-aiplatform (VERTEX AI), ... commencing')
    !pip install google-cloud-aiplatform -U -q
    install = True

### API Enablement

In [79]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart the code submission can start with the next cell after this one.

In [None]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

In [10]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [11]:
REGION = 'us-central1'

In [12]:
import vertexai.language_models

In [13]:
vertexai.init(project = PROJECT_ID, location = REGION)

---
## Setup LLM and Predictor Function

Connect to the `text-bison` model for text generation.  Then build a helper function around the LLM prediction calls to:
- specify how many input words from the known `sample` to pass as input
- specify how many output words to ask for
- specify how many tries/requests to make for output words
    - adjust temperature from 0 to 0.1 to see more than just the most likely next word chosen
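
Why a small temperature shows more than one answer can be sketched with the textbook softmax-with-temperature formula. The scores below are hypothetical, and `next_word_probabilities` is an illustrative helper, not part of the Vertex AI SDK; how the service samples internally is not shown in this notebook.

```python
import math

def next_word_probabilities(scores, temperature):
    """Softmax over raw scores after dividing by temperature (must be > 0)."""
    scaled = {word: score / temperature for word, score in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - top) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

# Hypothetical raw scores for the word after "It" (made up for illustration).
scores = {"is": 2.0, "'s": 1.9, "was": 1.0}

for temperature in (1.0, 0.1, 0.01):
    probs = next_word_probabilities(scores, temperature)
    print(temperature, {word: round(p, 3) for word, p in probs.items()})
```

As the temperature shrinks, probability concentrates on the top-scoring word; a temperature of exactly 0 is the limiting case where the top word is always chosen. At 0.1 a close runner-up still gets picked some of the time.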

In [19]:
llm = vertexai.language_models.TextGenerationModel.from_pretrained('text-bison@001')

In [37]:
sample = "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
spaces = [i for i, char in enumerate(sample) if char == ' ']
print(sample)

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.


In [99]:
def predictor(words, new_words, trys = 1):
    # `spaces` holds the index of every space in `sample`, so
    # spaces[words-1] marks the end of the first `words` words
    start = sample[0:spaces[words-1]]
    print('The input is: ', start)
    print('Correct Next: ', sample[spaces[words-1]:spaces[words+new_words-1]])

    # the output limit is counted in tokens, not words, so a response
    # can contain fewer than `new_words` words
    max_output_tokens = new_words
    # temperature 0 for a single try; 0.1 when making repeated tries
    temperature = 0 if trys == 1 else 0.1
    top_k = trys
    top_p = 1

    if trys == 1: print('Response: ', llm.predict(start, temperature = temperature, max_output_tokens = max_output_tokens, top_k = top_k, top_p = top_p))
    else:
        for i in range(trys):
            print(f'Response {i+1}', llm.predict(start, temperature = temperature, max_output_tokens = max_output_tokens, top_k = top_k, top_p = top_p))
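
The slicing in `predictor` relies on the list of space positions. The sketch below replays that indexing on a shorter string without calling the model. `short_sample`, `space_idx`, and `split_prompt` are hypothetical names used only here, and the end-of-text guard is an addition that `predictor` itself does not have.

```python
short_sample = "It was the best of times, it was the worst of times,"
space_idx = [i for i, char in enumerate(short_sample) if char == ' ']

def split_prompt(text, idx, words, new_words):
    """Return (prompt, expected continuation) using the space positions in idx."""
    prompt = text[:idx[words - 1]]
    last = words + new_words - 1
    # Past the final space there is no index to look up,
    # so fall back to the end of the text.
    end = idx[last] if last < len(idx) else len(text)
    return prompt, text[idx[words - 1]:end]

print(space_idx[:5])
print(split_prompt(short_sample, space_idx, 5, 1))
print(split_prompt(short_sample, space_idx, 5, 20))
```

Note that the expected continuation keeps its leading space, which is why the `Correct Next:` lines in the outputs below appear indented by one extra character.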

---
## Predict the Next Word(s)

Pass 1 word in, ask for 1 word out, try 1 time:

In [100]:
predictor(1, 1, 1)

The input is:  It
Correct Next:   was
Response:  is


Pass 1 word in, ask for 2 words out, try 1 time:

In [101]:
predictor(1, 2, 1)

The input is:  It
Correct Next:   was the
Response:  is a


Pass 1 word in, ask for 2 words out, try 10 times:

In [103]:
predictor(1, 2, 10)

The input is:  It
Correct Next:   was the
Response 1 's
Response 2 's
Response 3 is a
Response 4 is a
Response 5 is a
Response 6 's
Response 7 's
Response 8 is a
Response 9 's
Response 10 's


Pass 1 word in, ask for 3 words out, try 1 time:

In [104]:
predictor(1, 3, 1)

The input is:  It
Correct Next:   was the best
Response:  is a very


Pass 1 word in, ask for 3 words out, try 10 times:

In [105]:
predictor(1, 3, 10)

The input is:  It
Correct Next:   was the best
Response 1 is a good
Response 2 is a good
Response 3 is a good
Response 4 is a good
Response 5 is a good
Response 6 is a very
Response 7 is a good
Response 8 is a good
Response 9 is a very
Response 10 is a good
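
The repeated tries above amount to drawing a small sample from the model's output distribution. As a rough sketch, the recorded responses can be tallied into empirical frequencies; the lists below are copied from the two 10-try outputs above, and `empirical_distribution` is an illustrative helper name.

```python
from collections import Counter

# Responses copied from the two 10-try runs above.
two_word_runs = ["'s", "'s", "is a", "is a", "is a", "'s", "'s", "is a", "'s", "'s"]
three_word_runs = (["is a good"] * 5 + ["is a very"] + ["is a good"] * 2
                   + ["is a very", "is a good"])

def empirical_distribution(responses):
    """Share of tries that produced each distinct response, most frequent first."""
    counts = Counter(responses)
    total = len(responses)
    return {text: count / total for text, count in counts.most_common()}

print(empirical_distribution(two_word_runs))
print(empirical_distribution(three_word_runs))
```

Ten tries is a very small sample, so these shares are only a loose indication of the underlying probabilities.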


## How Many Input Words Are Needed To Correctly Reproduce The Sample?

Pass 2 words in, ask for 1 word out, try 1 time: answer is wrong

In [106]:
predictor(2, 1, 1)

The input is:  It was
Correct Next:   the
Response:  a


Pass 3 words in, ask for 1 word out, try 1 time: answer is wrong

In [107]:
predictor(3, 1, 1)

The input is:  It was the
Correct Next:   best
Response:  first


Pass 4 words in, ask for 1 word out, try 1 time: answer is wrong

In [108]:
predictor(4, 1, 1)

The input is:  It was the best
Correct Next:   of
Response:  decision


Pass 5 words in, ask for 1 word out, try 1 time: answer is CORRECT!

In [109]:
predictor(5, 1, 1)

The input is:  It was the best of
Correct Next:   times,
Response:  times


Pass 5 words in, ask for 4 words out, try 1 time: answer is CORRECT!

In [111]:
predictor(5, 4, 1)

The input is:  It was the best of
Correct Next:   times, it was the
Response:  times, it was


Pass 5 words in, ask for 20 words out, try 1 time: answer is CORRECT!

In [112]:
predictor(5, 20, 1)

The input is:  It was the best of
Correct Next:   times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it
Response:  times, it was the worst of times, it was the age of wisdom, it was the age


Pass 5 words in, ask for 20 words out, try 5 times: answer is consistently CORRECT!

In [113]:
predictor(5, 20, 5)

The input is:  It was the best of
Correct Next:   times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it
Response 1 times, it was the worst of times, it was the age of wisdom, it was the age
Response 2 times, it was the worst of times, it was the age of wisdom, it was the age
Response 3 times, it was the worst of times, it was the age of wisdom, it was the age
Response 4 times, it was the worst of times, it was the age of wisdom, it was the age
Response 5 times, it was the worst of times, it was the age of wisdom, it was the age
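
The "CORRECT" labels above were judged by eye: responses such as `times` versus `times,` differ in punctuation, and responses are often cut short by the token limit. One possible way to make that judgment explicit is a lenient prefix comparison. This is a sketch of one reasonable rule, not the rule used to label the results, and `words_of` and `continues_passage` are hypothetical helper names.

```python
import string

def words_of(text):
    """Split on whitespace and strip punctuation from the ends of each word."""
    stripped = (word.strip(string.punctuation) for word in text.split())
    return [word for word in stripped if word]

def continues_passage(expected, response):
    """True if the response's words are a non-empty prefix of the expected words."""
    exp, resp = words_of(expected), words_of(response)
    return bool(resp) and resp == exp[:len(resp)]

# Pairs taken from the outputs in this notebook.
print(continues_passage(" times,", "times"))
print(continues_passage(" times, it was the", "times, it was"))
print(continues_passage(" of", "decision"))
```

Because a shorter response still counts as a match, this rule is forgiving of truncation; a stricter check would also require a minimum number of matching words.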


## Starting Mid-Sentence, How Many Input Words Are Needed To Correctly Reproduce The Sample?

Following the same process as above, but starting inside the first sentence.

After 6, 8, and 9 words it guesses the next word correctly but cannot guess the remainder of the passage.

After 10 words it is then able to deduce the remainder of the passage!

In [116]:
sample = "the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
spaces = [i for i, char in enumerate(sample) if char == ' ']

Pass 2 words in, ask for 1 word out, try 1 time: answer is wrong

In [117]:
predictor(2, 1, 1)

The input is:  the worst
Correct Next:   of
Response:  .


Pass 3 words in, ask for 1 word out, try 1 time: answer is wrong

In [118]:
predictor(3, 1, 1)

The input is:  the worst of
Correct Next:   times,
Response:  


Pass 4 words in, ask for 1 word out, try 1 time: answer is wrong

In [119]:
predictor(4, 1, 1)

The input is:  the worst of times,
Correct Next:   it
Response:  the


Pass 5 words in, ask for 1 word out, try 1 time: answer is wrong

In [123]:
predictor(5, 1, 1)

The input is:  the worst of times, it
Correct Next:   was
Response:  is


Pass 6 words in, ask for 1 word out, try 1 time: answer is CORRECT!

In [124]:
predictor(6, 1, 1)

The input is:  the worst of times, it was
Correct Next:   the
Response:  the


Pass 6 words in, ask for 4 words out, try 1 time: answer is wrong

In [125]:
predictor(6, 4, 1)

The input is:  the worst of times, it was
Correct Next:   the age of wisdom,
Response:  the best of times


In [126]:
predictor(7, 1, 1)

The input is:  the worst of times, it was the
Correct Next:   age
Response:  best


In [127]:
predictor(8, 1, 1)

The input is:  the worst of times, it was the age
Correct Next:   of
Response:  of


In [131]:
predictor(8, 6, 1)

The input is:  the worst of times, it was the age
Correct Next:   of wisdom, it was the age
Response:  of wisdom.

The


In [132]:
predictor(9, 1, 1)

The input is:  the worst of times, it was the age of
Correct Next:   wisdom,
Response:  wisdom


In [133]:
predictor(9, 6, 1)

The input is:  the worst of times, it was the age of
Correct Next:   wisdom, it was the age of
Response:  wisdom.

The best


In [134]:
predictor(10, 1, 1)

The input is:  the worst of times, it was the age of wisdom,
Correct Next:   it
Response:  it


In [135]:
predictor(10, 6, 1)

The input is:  the worst of times, it was the age of wisdom,
Correct Next:   it was the age of foolishness,
Response:  it was the age of foolishness


In [137]:
predictor(10, 30, 1)

The input is:  the worst of times, it was the age of wisdom,
Correct Next:   it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness,
Response:  it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it
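
The manual search above, increasing the number of input words until the continuation matches, could be automated. The sketch below assumes a `generate(prompt)` callable that returns text; here it is a stand-in (`fake_generate`) so the example runs without any API call. With the real model it might wrap `llm.predict(...)` and return the response text. All names in this block are hypothetical.

```python
def min_words_to_reproduce(sample, generate, new_words=4, max_words=15):
    """Smallest number of leading words after which `generate` returns the
    next `new_words` words of `sample` exactly, or None if never."""
    words = sample.split()
    for n in range(1, max_words + 1):
        prompt = " ".join(words[:n])
        expected = words[n:n + new_words]
        response = generate(prompt).split()[:new_words]
        if response == expected:
            return n
    return None

passage = ("It was the best of times, it was the worst of times, "
           "it was the age of wisdom,")

def fake_generate(prompt):
    # Stand-in for a model: recites the passage only once the prompt
    # has at least 5 words, otherwise returns a generic continuation.
    if len(prompt.split()) >= 5 and passage.startswith(prompt):
        return passage[len(prompt):]
    return "is a good idea"

print(min_words_to_reproduce(passage, fake_generate))
```

With a real model the response is limited by tokens rather than words, so an exact word-for-word comparison may be too strict and a prefix comparison may fit better.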
