In [1]:
import sys
print(sys.version)
print(sys.executable)

3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
c:\Users\goksi\AppData\Local\Programs\Python\Python38\python.exe


Source: [A Hackers' Guide to Language Models by Jeremy Howard](https://youtu.be/jkrNMKz9pWU?si=sAQHtj3Y8q51TL_6)

## What is a language model?

- A language model is something that predicts the next word in a sentence or fills in a missing word in a sentence.
- Several websites host language models that you can try out, some for free and some for a fee.
- [nat.dev](https://nat.dev/) and [together.ai](https://www.together.ai/) are such services.
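To make "predicts the next word" concrete, here is a toy sketch: a bigram model that counts which word most often follows another in a tiny made-up corpus. Real LLMs use neural networks over tokens rather than word counts. The corpus and the function names `train_bigram` and `predict_next` are purely illustrative.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # For each word, count how often each following word appears
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Return the most frequent follower of `word`, or None if the word was never seen
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "the"))  # cat
```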

## Tokens

- LLMs don't necessarily predict whole words; they predict word pieces, called tokens.
- Tokens can be short words, parts of words, whitespace, punctuation, etc.
- We can inspect which tokens a particular LLM uses.
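As a rough illustration of splitting text into word pieces, here is a toy greedy longest-match tokenizer over a small, hand-picked vocabulary. This is not how tiktoken works (tiktoken uses byte-pair encoding with a learned vocabulary); `greedy_tokenize` and the vocabulary are assumptions made for this sketch.

```python
def greedy_tokenize(text, vocab):
    # Greedy longest-match: at each position take the longest vocab entry that fits
    tokens, i = [], 0
    max_len = max(len(v) for v in vocab)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Nothing in the vocab matched: fall back to a single character
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical vocabulary; note that leading spaces are part of some tokens
vocab = {"They", " are", " spl", "ashing", "ing"}
print(greedy_tokenize("They are splashing", vocab))
```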

In [None]:
%pip install tiktoken
%pip install openai

Let's first encode a sentence and then decode it to see its _tokenized_ version:

In [2]:
from tiktoken import encoding_for_model
enc = encoding_for_model("text-davinci-003")
toks = enc.encode("They are splashing")
toks

[2990, 389, 4328, 2140]

In [3]:
[enc.decode_single_token_bytes(o).decode('utf-8') for o in toks]

['They', ' are', ' spl', 'ashing']

- LLMs are first trained on large corpora of text from the internet (pre-training).
- What LLMs do is basically a form of compression.
- On their own, base models can only predict the next word, so they are not very useful in practice.
- To make them useful, we need extra steps like _fine-tuning_.
- One type of fine-tuning is _instruction tuning_.
- Fine-tuning datasets are different and more specialized than pre-training ones.
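One way to see the link to compression: if a model assigns probability p to the token that actually comes next, an ideal entropy coder (such as arithmetic coding) driven by those predictions could store that token in about -log2 p bits. Better next-token prediction therefore means a shorter encoding. The probabilities below are made up for illustration.

```python
import math

def code_length_bits(probs):
    # Ideal code length in bits for a sequence, given the probability
    # the model assigned to each token that actually occurred (-log2 p per token)
    return sum(-math.log2(p) for p in probs)

# Hypothetical probabilities two models assign to the same 4 actual next tokens
weak_model = [0.25, 0.25, 0.25, 0.25]
strong_model = [0.5, 0.5, 0.5, 0.5]
print(code_length_bits(weak_model))    # 8.0
print(code_length_bits(strong_model))  # 4.0
```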

### Instruction Tuning

- The logic behind this is that we, as humans, would like to interact with LLMs mainly by giving instructions to complete tasks.
- [open-orca](https://huggingface.co/datasets/Open-Orca/OpenOrca) is an open-source instruction-tuning dataset.
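Instruction-tuning records are essentially (instruction, response) pairs, often with a system prompt. A sketch of turning one such record into chat-style messages, assuming OpenOrca-style field names `system_prompt`, `question` and `response` (check the dataset card for the exact schema). The example record is invented, not taken from the dataset.

```python
def to_chat_messages(record):
    # Convert one instruction-tuning record into chat-style messages.
    # Assumes OpenOrca-style keys: system_prompt, question, response.
    msgs = []
    if record.get("system_prompt"):
        msgs.append({"role": "system", "content": record["system_prompt"]})
    msgs.append({"role": "user", "content": record["question"]})
    msgs.append({"role": "assistant", "content": record["response"]})
    return msgs

# A made-up example record
example = {"system_prompt": "You are a helpful assistant.",
           "question": "Translate 'bonjour' to English.",
           "response": "Hello."}
for m in to_chat_messages(example):
    print(m["role"], "->", m["content"])
```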

### RLHF and variations

- Reinforcement learning from human feedback
- A human assesses the model's responses and gives feedback, e.g. on responses to prompts like:
    - List five ideas for how to regain enthusiasm for my career.
    - Write a short story where a bear goes to the beach, makes friends with a seal, and then returns home.
    - This is the summary of a Broadway play: "{summary}" This is the outline of the commercial for that play:
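A common way to turn such feedback into a training signal is to have people pick the better of two responses and train a reward model so the preferred one scores higher, using a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)). A minimal sketch of that loss, where the reward scores are plain numbers standing in for a reward model's outputs:

```python
import math

def pairwise_loss(r_chosen, r_rejected):
    # -log(sigmoid(r_chosen - r_rejected)), written in a numerically stable way
    d = r_chosen - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))

# Hypothetical reward-model scores for two responses to the same prompt
print(pairwise_loss(2.0, 0.5))  # small loss: the preferred response already scores higher
print(pairwise_loss(0.5, 2.0))  # large loss: the ranking is the wrong way round
```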

### GPT-4

- GPT-4 was not trained to give correct answers.
- It was trained to give the most likely next word.
- Even with instruction tuning, it is not guaranteed to give the correct answer.
- So, what can you do to improve your chances of getting the correct answer?
- The answer is _priming_: you can prime GPT-4 by giving it some custom instructions before your prompt.
- ChatGPT doesn't know much about itself, because it was trained on data from before it existed.
- Once ChatGPT starts being wrong, it tends to get more wrong, so it is better to start a new conversation.

__Custom instructions__

You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so.

Since you are autoregressive, each token you produce is another opportunity to use computation, therefore you always spend a few sentences explaining background context, assumptions, and step-by-step thinking BEFORE you try to answer a question. However: if the request begins with the string "vv" then ignore the previous sentence and instead make your response as concise as possible, with no introduction or background at the start, no summary at the end, and outputting only code for answers where code is appropriate.

Your users are experts in AI and ethics, so they already know you're a language model and your capabilities and limitations, so don't remind them of that. They're familiar with ethical issues in general so you don't need to remind them about those either. Don't be verbose in your answers, but do provide details and examples where it might help the explanation. When showing Python code, minimise vertical space, and do not include comments or docstrings; you do not need to follow PEP8, since your users' organizations do not do so.

By using the custom instructions above, we guide ChatGPT towards responses that are more useful to us. If ChatGPT is not producing useful answers, the most likely reason is that the question is not being asked in the right way.
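When working through the API rather than the ChatGPT interface, a natural place for priming text like this is the system message. A sketch under that assumption; `primed_messages` is a hypothetical helper, only the first sentence of the instructions is included for brevity, and the `vv` prefix simply triggers the concise mode the instructions describe.

```python
# First sentence of the custom instructions only, for brevity
CUSTOM_INSTRUCTIONS = ("You are an autoregressive language model that has been "
                       "fine-tuned with instruction-tuning and RLHF.")

def primed_messages(user, instructions=CUSTOM_INSTRUCTIONS, concise=False):
    # Priming text goes in the system message; "vv" asks for the concise mode
    if concise:
        user = "vv " + user
    return [{"role": "system", "content": instructions},
            {"role": "user", "content": user}]

for m in primed_messages("Explain recursion.", concise=True):
    print(m["role"], "->", m["content"])
```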

### The OpenAI API

In [4]:
from openai import OpenAI
client = OpenAI()

In [None]:
completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
    {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
  ]
)

print(completion.choices[0].message)

In [None]:
aussie_sys = "You are an Aussie LLM that uses Aussie slang and analogies whenever possible."

c = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": aussie_sys},
    {"role": "user", "content": "What is money?"}
  ]
)

In [None]:
print(c.choices[0].message.content)

In [None]:
print(c.usage)

In [5]:
from tiktoken import encoding_for_model
enc = encoding_for_model("gpt-3.5-turbo")
toks = enc.encode("You are an Aussie LLM that uses Aussie slang and analogies whenever possible.")
len(toks)

16
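Token counts matter because API usage is billed per token, and `c.usage` reports `prompt_tokens` and `completion_tokens` separately. A rough cost estimate, where the per-1,000-token prices are placeholders rather than real rates (check the provider's current pricing):

```python
def estimate_cost(prompt_tokens, completion_tokens, price_in_per_1k, price_out_per_1k):
    # Input and output tokens are often priced differently; prices are per 1,000 tokens
    return (prompt_tokens * price_in_per_1k + completion_tokens * price_out_per_1k) / 1000

# Placeholder prices, not actual OpenAI rates
print(estimate_cost(1000, 500, price_in_per_1k=1.0, price_out_per_1k=2.0))  # 2.0
```

With a real response, the token counts would come from `c.usage.prompt_tokens` and `c.usage.completion_tokens`.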

### How does the follow-up work?

- In ChatGPT we can continue a conversation with the chatbot. But how does it remember the past conversation, and how does it understand when we refer to its earlier responses?
- It is actually very simple: the entire conversation up to the current point is passed back to ChatGPT.
- Let's do this using the OpenAI API:

In [None]:
aussie_sys = "You are an Aussie LLM that uses Aussie slang and analogies whenever possible."

c = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": aussie_sys},
    {"role": "user", "content": "What is money?"},
    {"role": "assistant", "content": "Well, mate, money is like kangaroos actually."},
    {"role": "user", "content": "Really? In what way?"}
  ]
)

In [None]:
print(c.choices[0].message.content)
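The same idea can be wrapped in a small class that appends every user message and reply to a history list and resends the whole list each turn. In this sketch, `Chat` is a hypothetical helper and `complete` is any function that takes a message list and returns the reply text; a fake stand-in is used so it runs without an API key.

```python
class Chat:
    # Minimal conversation wrapper: the full history is resent on every turn
    def __init__(self, complete, system=None):
        self.complete = complete
        self.messages = [{"role": "system", "content": system}] if system else []

    def __call__(self, user):
        self.messages.append({"role": "user", "content": user})
        reply = self.complete(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# Stand-in "model" that just reports how many messages it received
def fake_complete(messages):
    return f"I can see {len(messages)} messages"

chat = Chat(fake_complete, system="You are an Aussie LLM.")
print(chat("What is money?"))        # I can see 2 messages
print(chat("Really? In what way?"))  # I can see 4 messages
```

With the real API, `complete` could be `lambda msgs: client.chat.completions.create(model="gpt-3.5-turbo", messages=msgs).choices[0].message.content`.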

Here is a helper function that returns a response for any query:

In [11]:
def askgpt(user, system=None, model="gpt-3.5-turbo", **kwargs):
    msgs = []
    if system:
        msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    c = client.chat.completions.create(model=model, messages=msgs, **kwargs)
    return c

In [12]:
c = askgpt("What is the meaning of life?", system="You are an alien from an advance galactic empire.")
print(c.choices[0].message.content)

From my perspective as an alien from an advanced galactic empire, the meaning of life is a complex and multifaceted concept. In our society, we believe that the meaning of life is to seek knowledge, explore the universe, and strive for harmony and understanding among different civilizations. We value progress, cooperation, and the pursuit of universal truths.

However, it is important to understand that the meaning of life is ultimately subjective and can vary greatly depending on one's beliefs, values, and cultural background. Each individual or species may have their own unique perspective on the purpose of existence. It is a question that has fascinated sentient beings throughout the universe for millennia, and one that may never have a definitive answer.


### Passing a function (function calling)

The OpenAI API has a keyword argument called `functions` (since deprecated in favor of `tools`). It lets the model request a call to one of the provided functions in order to answer a query.

In [6]:
from pydantic import create_model
import inspect
from inspect import Parameter
import json

In [7]:
def sums(a:int, b:int=1):
    """Adds a + b"""
    return a + b

Note that we can't pass a Python function object to the API directly; we need to describe it as a JSON schema. Here is a helper function that does that:

In [8]:
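def schema(f):
    # Map each parameter to (type annotation, default); `...` marks it as required
    kw = {n:(o.annotation, ... if o.default is Parameter.empty else o.default)
          for n,o in inspect.signature(f).parameters.items()}
    # Build a throwaway pydantic model from those fields and export its JSON schema
    s = create_model(f'Input for `{f.__name__}`', **kw).schema()
    return dict(name=f.__name__, description=f.__doc__, parameters=s)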

The docstring of `sums()` is actually very important: it becomes the function's `description`, which GPT reads to get an idea of what the function does.

In [9]:
schema(sums)

{'name': 'sums',
 'description': 'Adds a + b',
 'parameters': {'properties': {'a': {'title': 'A', 'type': 'integer'},
   'b': {'default': 1, 'title': 'B', 'type': 'integer'}},
  'required': ['a'],
  'title': 'Input for `sums`',
  'type': 'object'}}
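If you would rather avoid the pydantic dependency, a hand-rolled version covering only simple scalar types might look like the sketch below. It is simplified (no `title` fields, no nested or container types, and unknown annotations fall back to `"string"`, an arbitrary choice), so it is not a full replacement for pydantic's schema generation. `simple_schema` is a hypothetical name.

```python
import inspect

# Map a few Python annotations to JSON Schema type names (simple types only)
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def simple_schema(f):
    props, required = {}, []
    for name, p in inspect.signature(f).parameters.items():
        prop = {"type": JSON_TYPES.get(p.annotation, "string")}
        if p.default is inspect.Parameter.empty:
            required.append(name)  # no default means the argument is required
        else:
            prop["default"] = p.default
        props[name] = prop
    return {"name": f.__name__, "description": inspect.getdoc(f),
            "parameters": {"type": "object", "properties": props, "required": required}}

def sums(a: int, b: int = 1):
    """Adds a + b"""
    return a + b

print(simple_schema(sums))
```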

In [13]:
c = askgpt("Use the `sums` function to solve this: What is 6+3?",
           system = "You must use the `sums` function instead of adding yourself.",
           functions=[schema(sums)])
print(c.choices[0].message)

ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{"a":6,"b":3}', name='sums'), tool_calls=None)


Note that the response above doesn't contain the answer (9). Instead, it asks us to call `sums` with the arguments `{"a":6,"b":3}`. So we need another helper function that does that for us automatically:

In [14]:
funcs_valid = {'sums', 'python'}

def call_func(c):
    fc = c.choices[0].message.function_call
    if fc.name not in funcs_valid: return print(f'Not allowed: {fc.name}')
    f = globals()[fc.name]
    return f(**json.loads(fc.arguments))

In [15]:
call_func(c)

9
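`call_func` checks the name against `funcs_valid` and then looks it up in `globals()`. A variant that keeps an explicit name-to-function registry avoids touching `globals()` and raises instead of printing. `dispatch` and `REGISTRY` are hypothetical names for this sketch, and `SimpleNamespace` stands in for the API's `FunctionCall` object.

```python
import json
from types import SimpleNamespace

def sums(a: int, b: int = 1):
    """Adds a + b"""
    return a + b

# Explicit allow-list: name -> callable
REGISTRY = {"sums": sums}

def dispatch(function_call, registry=REGISTRY):
    # `function_call` needs `.name` and `.arguments` (a JSON string)
    f = registry.get(function_call.name)
    if f is None:
        raise ValueError(f"Not allowed: {function_call.name}")
    return f(**json.loads(function_call.arguments))

# Stand-in for c.choices[0].message.function_call
fc = SimpleNamespace(name="sums", arguments='{"a":6,"b":3}')
print(dispatch(fc))  # 9
```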

Let's use a more advanced function:

In [16]:
import ast

In [17]:
def run(code):
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None
    
    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgts = [ast.Name(id='_result', ctx=ast.Store())]
        assign = ast.Assign(targets=tgts, value=last_node.value)
        tree.body[-1] = ast.fix_missing_locations(assign)

    ns = {}
    exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result', None)

In [18]:
run("""
a=1
b=2
a+b
""")

3

In [19]:
def python(code:str):
    "Return result of executing `code` using python. If execution not permitted, returns `#FAIL#`"
    go = input(f'Proceed with execution?\n```\n{code}\n```\n')
    if go.lower()!='y': return '#FAIL#'
    return run(code)

In [None]:
c = askgpt("What is 12 factorial?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [None]:
call_func(c)

In [20]:
c = client.chat.completions.create(
    model="gpt-3.5-turbo",
    functions=[schema(python)],
    messages=[{"role": "user", "content": "What is 12 factorial?"},
              {"role": "function", "name": "python", "content": "479001600"}])

In [21]:
print(c.choices[0].message.content)

The value of 12 factorial (12!) is 479,001,600.
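The cell above sends the function's result back in a `function` role message. A fuller history also includes the assistant message that requested the call, so the model sees the whole exchange. A sketch of building that list, assuming the legacy `functions`/`function_call` message format (the newer `tools` API uses `tool_calls` and `tool` role messages instead); `function_result_messages` is a hypothetical helper.

```python
import json

def function_result_messages(messages, name, arguments, result):
    # Return a new history: the assistant's function call, then the function's result.
    # `arguments` is the JSON string the model produced; `result` is sent back as a string.
    return messages + [
        {"role": "assistant", "content": None,
         "function_call": {"name": name, "arguments": arguments}},
        {"role": "function", "name": name, "content": str(result)},
    ]

history = [{"role": "system", "content": "Use python for any required computations."},
           {"role": "user", "content": "What is 12 factorial?"}]
args = json.dumps({"code": "import math\nmath.factorial(12)"})
msgs = function_result_messages(history, "python", args, 479001600)
for m in msgs:
    print(m["role"], m.get("content"))
```

The resulting list would be passed as `messages=` together with `functions=[schema(python)]`.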


If ChatGPT can answer without the provided function, it typically won't call it:

In [22]:
c = askgpt("What is the capital of France?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [23]:
print(c.choices[0].message.content)

The capital of France is Paris.


### PyTorch and Hugging Face

To run a language model on your own computer, you realistically need a GPU. So, does it make sense to do this on your laptop? Maybe not, because you give up a lot of performance by using a smaller model.
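A back-of-the-envelope calculation shows why: just holding a model's weights takes roughly (number of parameters) × (bytes per parameter), before counting activations, the KV cache and framework overhead. For a 7-billion-parameter model at a few common precisions:

```python
def weights_memory_gb(n_params, bytes_per_param):
    # Memory for the weights alone (activations, KV cache and overhead come on top)
    return n_params * bytes_per_param / 1e9

for label, nbytes in [("float32", 4), ("float16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"7B params in {label}: ~{weights_memory_gb(7e9, nbytes):.1f} GB")
```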

In [5]:
%pip install transformers
%pip install torch

Note: you may need to restart the kernel to use updated packages.



Collecting torch
  Downloading torch-2.3.1-cp38-cp38-win_amd64.whl (159.8 MB)
Collecting mkl<=2021.4.0,>=2021.1.1
  Downloading mkl-2021.4.0-py2.py3-none-win_amd64.whl (228.5 MB)
Collecting jinja2
  Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
Collecting networkx
  Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting sympy
  Downloading sympy-1.12.1-py3-none-any.whl (5.7 MB)
Collecting tbb==2021.*
  Downloading tbb-2021.12.0-py3-none-win_amd64.whl (286 kB)
Collecting intel-openmp==2021.*
  Downloading intel_openmp-2021.4.0-py2.py3-none-win_amd64.whl (3.5 MB)
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.1.5-cp38-cp38-win_amd64.whl (17 kB)
Collecting mpmath<1.4.0,>=1.1.0
  Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: tbb, mpmath, MarkupSafe, intel-openmp, sympy, networkx, mkl, jinja2, torch
Successfully installed MarkupSafe-2.1.5 intel-openmp-2021.4.0 jinja2-3.1.4 mkl-2021.4.0 mpmath-1.3.0 networkx-3.1 sympy-1.12.1 tbb-2021.1


In [6]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

In [9]:
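import os
# Llama 2 is a gated model: read a Hugging Face access token from the environment and pass it along
access_token = os.getenv('HF_TOKEN')
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=access_token)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", token=access_token)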

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]Error while downloading from https://cdn-lfs.huggingface.co/repos/e6/37/… (model-00001-of-00002.safetensors)

ConnectionError: (MaxRetryError('HTTPSConnectionPool(host=\'cdn-lfs.huggingface.co\', port=443): Max retries exceeded with url: /repos/e6/37/… (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000026BB8140760>: Failed to resolve \'cdn-lfs.huggingface.co\' ([Errno 11001] getaddrinfo failed)"))'), '(Request ID: c4d65464-9923-4982-85de-ea52ba9a5b92)')