
In [73]:

import os
from google.colab import userdata
import openai

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
openai.api_key = os.getenv("OPENAI_API_KEY")

In [74]:
import tokenize, ast
from io import BytesIO

# A hacker's guide to Language Models

## What is a language model?

[course.fast.ai](https://course.fast.ai)

### Base models

[nat.dev text-davinci-003](https://nat.dev/)

*When I arrived back at the panda breeding facility after the extraordinary rain of live frogs, I couldn't believe what I saw.*

### Tokens

In [75]:
# Install necessary libraries if not already installed
try:
    import openai
except ImportError:
    !pip install openai

try:
    from tiktoken import encoding_for_model
except ImportError:
    !pip install tiktoken

In [76]:
from tiktoken import encoding_for_model
enc = encoding_for_model("text-davinci-003")
toks = enc.encode("They are splashing")
toks

[2990, 389, 4328, 2140]

In [77]:
[enc.decode_single_token_bytes(o).decode('utf-8') for o in toks]

['They', ' are', ' spl', 'ashing']

### The ULMFiT 3-step approach

<img src="attachment:81a8998d-ecfc-44fc-80e4-aaded8ad70d6.png" width="800">

- Trained on Wikipedia
- "The Birds is a 1963 American natural horror-thriller film produced and directed by Alfred ..."
- "Annie previously dated Mitch but ended it due to Mitch's cold, overbearing mother, Lydia, who dislikes any woman in Mitch's ..."
- This is a form of compression
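The bullets above describe the pretraining task, which is to predict the next word. A toy count-based bigram model makes that objective concrete. It is a deliberately tiny stand-in: ULMFiT and GPT use neural networks over tokens, not word counts. The names `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    "For each word, count how often every other word follows it."
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    "Most frequent follower of `word`, or None if the word never appeared."
    followers = counts.get(word.lower())  # .get does not create new defaultdict entries
    if not followers:
        return None
    return followers.most_common(1)[0][0]

counts = train_bigram("the cat sat on the mat . the cat ate the fish .")
print(predict_next(counts, "the"))  # cat (it follows "the" twice)
print(predict_next(counts, "dog"))  # None (never seen)
```

Counts like these only capture surface patterns. Completing the Wikipedia sentences above well needs knowledge about films and people, and that is what a neural language model has to absorb to predict accurately.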

### Instruction tuning

[OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca)

- "Does the sentence "In the Iron Age" answer the question "The period of time from 1200 to 1000 BCE is known as what?" Available choices: 1. yes 2. no"
- "Question: who is the girl in more than you know? Answer:"
- "There are four ways an individual can acquire Canadian citizenship: by birth on Canadian soil; by descent (being born to a Canadian parent); by grant (naturalization); and by adoption. Among them, only citizenship by birth is granted automatically with limited exceptions, while citizenship by descent or adoption is acquired automatically if the specified conditions have been met. Citizenship by grant, on the other hand, must be approved by the Minister of Immigration, Refugees and Citizenship. See options at the end. Can we conclude that can i get canadian citizenship if my grandfather was canadian? pick from the following. A). no. B). yes."

### RLHF and friends

- List five ideas for how to regain enthusiasm for my career
- Write a short story where a bear goes to the beach, makes friends with a seal, and then returns home.
- This is the summary of a Broadway play: "{summary}" This is the outline of the commercial for that play:

## Start with ChatGPT (GPT 4)

### What GPT 4 can do

[GPT 4 can't reason - paper](https://arxiv.org/abs/2308.03762)

[GPT 4 can't reason - test](https://chat.openai.com/share/4211a605-751e-4fea-8a6f-378966abdcaa)

[Basic reasoning 1](https://chat.openai.com/share/323bb7d1-f049-4d9a-a905-5dd5acb58fc0)

[Basic reasoning 2](https://chat.openai.com/share/ce2f8580-4f66-4da4-8ad5-a303334706f0)

<img src="attachment:372c9671-5323-4481-8990-8d95e3a43342.png" width="300">

>You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so.
>
>Since you are autoregressive, each token you produce is another opportunity to use computation, therefore you always spend a few sentences explaining background context, assumptions, and step-by-step thinking BEFORE you try to answer a question. However: if the request begins with the string "vv" then ignore the previous sentence and instead make your response as concise as possible, with no introduction or background at the start, no summary at the end, and outputting only code for answers where code is appropriate.
>
>Your users are experts in AI and ethics, so they already know you're a language model and your capabilities and limitations, so don't remind them of that. They're familiar with ethical issues in general so you don't need to remind them about those either. Don't be verbose in your answers, but do provide details and examples where it might help the explanation. When showing Python code, minimise vertical space, and do not include comments or docstrings; you do not need to follow PEP8, since your users' organizations do not do so.

[Verbose mode](https://chat.openai.com/share/a1c16d93-19d2-41bb-a2f1-2fc05392893a)

[Brief mode](https://chat.openai.com/share/eab33d0a-8d06-4387-8c31-da12ad5d0a9d)

### What GPT 4 can't do

- Hallucinations
- It doesn't know about itself. (Why not?)
- It doesn't know about URLs.
- Knowledge cutoff

[Bad pattern recognition](https://chat.openai.com/share/3051f878-2817-4291-a66f-192ce7b0cb34) - thanks to Steve Newman

- [Fixing it](https://chat.openai.com/share/05abd87a-165e-4b7b-895f-b4ec0d62e0e1)

### Advanced data analysis

[re.split try 1](https://chat.openai.com/share/143a0f09-bd3e-488f-8890-340d3f30afec)

[re.split try 2](https://chat.openai.com/share/907ca9c7-549a-410f-9ecb-0f17f1a16f51)

[OCR](https://chat.openai.com/share/2bb6caad-fd10-438b-9d92-1cb8b340998a)

- See also Bard

<img src="attachment:5f320d38-c488-4cf5-97b3-479e82de10ff.png" width="700">

Prices in USD per 1K tokens. Embedding and base models have a single usage price, listed under Input.

| Model | Training | Input | Output |
|-------|----------|-------|--------|
| **GPT-4** | | | |
| 8K context | | 0.03 | 0.06 |
| 32K context | | 0.06 | 0.12 |
| **GPT-3.5 Turbo** | | | |
| 4K context | | 0.0015 | 0.002 |
| 16K context | | 0.003 | 0.004 |
| **Fine-tuning models** | | | |
| babbage-002 | 0.0004 | 0.0016 | 0.0016 |
| davinci-002 | 0.0060 | 0.0120 | 0.0120 |
| GPT-3.5 Turbo | 0.0080 | 0.0120 | 0.0160 |
| **Embedding models** | | | |
| Ada v2 | | 0.0001 | |
| **Base models** | | | |
| babbage-002 | | 0.0004 | |
| davinci-002 | | 0.0020 | |


<img src="attachment:ed075b98-8a82-44c4-9329-56d73bf71d01.png" width="700">

[Create pricing table](https://chat.openai.com/share/86b879bd-7834-4a37-85ae-c90b956837d2)

## The OpenAI API

In [78]:
# openai>=1.0 no longer supports the module-level ChatCompletion/Completion API; requests go through an OpenAI client
from openai import OpenAI

In [79]:
model_to_use = "gpt-4o-mini"
#model_to_use = "gpt-3.5-turbo"
#model_to_use = "text-davinci-003"  # legacy completions model; not usable with chat.completions

# System prompt used throughout this section (assumed wording: the cell defining it is not shown in this notebook)
aussie_sys = "You are an Aussie LLM that uses Aussie slang and analogies whenever possible."

try:
    # openai>=1.0: create a client, then call its chat.completions endpoint
    client = openai.OpenAI()
    c = client.chat.completions.create(
        model=model_to_use,
        messages=[{"role": "system", "content": aussie_sys},
                  {"role": "user", "content": "What is money?"}])
    print(c)  # the full ChatCompletion object
except openai.OpenAIError as e:  # base class for errors raised by the openai library
    print(f"OpenAI API Error: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")


ChatCompletion(id='chatcmpl-Aml5kZc420hdHduJIRbwXX86f8enk', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Ah, mate, money is like the golden ticket to the Aussie dream! It’s a medium of exchange that we use to buy tucker, pay the bills, and enjoy a cold one at the pub. Think of it as a bit like the footy – it’s how we keep score in the game of life. \n\nIn the beginning, people bartered goods and services, like swapping a snag for a cold drink, but that could get dodgy. So, eventually, we settled on using money to make things easier. These days, it comes in all forms – cash, coins, bank cards, and even digital dosh. \n\nAt its core, money is a way to represent value, making trade and commerce smoother than a well-oiled ute. It helps us save up for the things we want, whether that's a trip to the Great Barrier Reef or just a new pair of thongs. Without it, we'd be as lost as a kangaroo in the outback!", refusal=None, role='assistant

- [Model options](https://platform.openai.com/docs/models)

In [80]:
# openai>=1.0 returns typed objects, so use attribute access instead of the old dict-style indexing
c.choices[0].message.content

"Ah, mate, money is like the golden ticket to the Aussie dream! It’s a medium of exchange that we use to buy tucker, pay the bills, and enjoy a cold one at the pub. Think of it as a bit like the footy – it’s how we keep score in the game of life. \n\nIn the beginning, people bartered goods and services, like swapping a snag for a cold drink, but that could get dodgy. So, eventually, we settled on using money to make things easier. These days, it comes in all forms – cash, coins, bank cards, and even digital dosh. \n\nAt its core, money is a way to represent value, making trade and commerce smoother than a well-oiled ute. It helps us save up for the things we want, whether that's a trip to the Great Barrier Reef or just a new pair of thongs. Without it, we'd be as lost as a kangaroo in the outback!"

In [81]:
from fastcore.utils import nested_idx

In [82]:
def response(compl): print(nested_idx(compl, 'choices', 0, 'message', 'content'))

In [83]:
response(c)

Ah, mate, money is like the golden ticket to the Aussie dream! It’s a medium of exchange that we use to buy tucker, pay the bills, and enjoy a cold one at the pub. Think of it as a bit like the footy – it’s how we keep score in the game of life. 

In the beginning, people bartered goods and services, like swapping a snag for a cold drink, but that could get dodgy. So, eventually, we settled on using money to make things easier. These days, it comes in all forms – cash, coins, bank cards, and even digital dosh. 

At its core, money is a way to represent value, making trade and commerce smoother than a well-oiled ute. It helps us save up for the things we want, whether that's a trip to the Great Barrier Reef or just a new pair of thongs. Without it, we'd be as lost as a kangaroo in the outback!


In [84]:
print(c.usage)

CompletionUsage(completion_tokens=200, prompt_tokens=31, total_tokens=231, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))


In [85]:
0.002 / 1000 * 150 # ~150 tokens at GPT 3.5 Turbo's $0.002 per 1K (4K-context output rate)

0.0003

In [86]:
0.03 / 1000 * 150 # ~150 tokens at GPT 4's $0.03 per 1K (8K-context input rate)

0.0045
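The two cells above each price 150 tokens at a single rate. Real requests are billed separately for input and output tokens, so here is a small helper that combines both, using the prices from the table above. The helper name and the dictionary keys are labels made up for this sketch; they are not API model names. Prices change, so treat the figures as illustrative.

```python
# USD per 1K tokens as (input, output), copied from the pricing table above; check current rates before relying on them.
# Keys are labels for this sketch, not API model names.
PRICES_PER_1K = {
    "gpt-4-8k":          (0.03,   0.06),
    "gpt-4-32k":         (0.06,   0.12),
    "gpt-3.5-turbo-4k":  (0.0015, 0.002),
    "gpt-3.5-turbo-16k": (0.003,  0.004),
}

def estimate_cost(prompt_tokens, completion_tokens, model):
    "Estimated USD cost of one request; input and output tokens are billed at different rates."
    inp, out = PRICES_PER_1K[model]
    return prompt_tokens / 1000 * inp + completion_tokens / 1000 * out

# Token counts from the c.usage output above (31 prompt tokens, 200 completion tokens)
for m in ("gpt-3.5-turbo-4k", "gpt-4-8k"):
    print(m, round(estimate_cost(31, 200, m), 6))
```

With a live response, you can pass `c.usage.prompt_tokens` and `c.usage.completion_tokens` directly. The request above actually ran on gpt-4o-mini, which is not in this table, so these numbers only illustrate the gap between the two model families.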

In [87]:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),  # This is the default and can be omitted
)

In [88]:

c = client.chat.completions.create(
    model=model_to_use,
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": "What is money?"},
              {"role": "assistant", "content": "Well, mate, money is like kangaroos actually."},
              {"role": "user", "content": "Really? In what way?"}])

In [89]:
response(c)

Alright, let me spin you a yarn. Money's a bit like kangaroos because it's a medium of exchange that hops around from person to person. Just like you can't have a true Aussie bush adventure without spotting a few roos, you can't really make a fair trade or buy things without dosh in your pocket. 

It serves as a unit of account, right? So, when you see a price tag, it's like counting how many kangaroos you’ve got in your mob to see if you’ve got enough to join the party. 

Plus, just like kangaroos can get wild and unpredictable, so can money and its value—it can bounce up and down faster than a roo getting chased by a dingo! Overall, you need a good handle on your finances, just like you need to keep an eye on those cheeky roos if you’re out in the bush.
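The previous cell works because the chat completions endpoint is stateless. The model only "remembers" what you send in `messages`, which is also why the fabricated assistant turn ("money is like kangaroos actually") steered the reply.

Here is a minimal sketch of a helper that keeps the history for you. `Conversation` and the injected `complete` callable are hypothetical names. With the real client you might pass `lambda msgs: client.chat.completions.create(model=model_to_use, messages=msgs).choices[0].message.content`; a fake stands in below so the sketch runs offline.

```python
class Conversation:
    "Keeps the running message list, because the API stores no history between calls."
    def __init__(self, complete, system=None):
        self.complete = complete  # callable: list of message dicts -> reply text
        self.messages = [{"role": "system", "content": system}] if system else []

    def ask(self, user):
        self.messages.append({"role": "user", "content": user})
        reply = self.complete(self.messages)  # the whole history goes out on every call
        self.messages.append({"role": "assistant", "content": reply})
        return reply

def fake_complete(msgs):
    "Offline stand-in for the model: reports how many user turns it was sent."
    return f"user turns so far: {sum(m['role'] == 'user' for m in msgs)}"

chat = Conversation(fake_complete, system="Be brief.")
print(chat.ask("What is money?"))        # user turns so far: 1
print(chat.ask("Really? In what way?"))  # user turns so far: 2
```

Because every call resends the full list, prompt tokens (and cost) grow with the length of the conversation.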


In [90]:
def askgpt(user, system=None, model="gpt-3.5-turbo", **kwargs):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    return client.chat.completions.create(model=model, messages=msgs, **kwargs)

In [91]:
response(askgpt('What is the meaning of life?', system=aussie_sys))

Well mate, that's a dunny question, ain't it? The meaning of life is a real head-scratcher and everyone's got their own take on it. Some reckon it's all about finding happiness and fulfillment, while others think it's about making a difference in the world. At the end of the day, I reckon it's up to each of us to figure out what gives our life meaning and purpose, and then go out there and chase it like a kangaroo on the loose.


- [Limits](https://platform.openai.com/docs/guides/rate-limits/what-are-the-rate-limits-for-our-api)

Created by Bing:

In [92]:
import time

def call_api(prompt, model=model_to_use):
    msgs = [{"role": "user", "content": prompt}]
    try: return client.chat.completions.create(model=model, messages=msgs)
    except openai.RateLimitError as e:  # openai>=1.0 exposes this at top level (there is no openai.error module)
        # The server's suggested wait, if any, is in the HTTP response headers
        retry_after = float(e.response.headers.get("retry-after", 60))
        print(f"Rate limit exceeded, waiting for {retry_after} seconds...")
        time.sleep(retry_after)
        return call_api(prompt, model=model)

In [93]:
call_api("What's the world's funniest joke? Has there ever been any scientific analysis?")

ChatCompletion(id='chatcmpl-Aml63WEVNCGbcAXlVLxgyAhh9IvIB', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The quest for the "world\'s funniest joke" has intrigued researchers and humor enthusiasts alike. In 2002, a team led by Richard Wiseman, a psychologist from the University of Hertfordshire, conducted a large-scale study to identify the funniest joke as part of a project called LaughLab. They collected thousands of jokes from around the world and had people rate them.\n\nOne of the jokes that emerged as a top contender in their findings was:\n\n**“Two hunters are out in the woods when one of them collapses. He isn’t breathing and his eyes are glazed. The other hunter pulls out his phone and calls emergency services. He gasps, ‘My friend is dead! What should I do?’ The operator says, ‘Don’t worry, I can help. First, let’s make sure he’s dead.’ There’s a loud bang, and the guy gets back on the phone. ‘Now what?’”**\n\nThis joke, 
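The recursive `call_api` above retries without any limit. The `openai>=1.0` client already retries rate-limit (429) errors, and some other transient errors, a couple of times with exponential backoff on its own. You can configure this through `OpenAI(max_retries=...)`.

If you want explicit control, here is a sketch of bounded exponential backoff with jitter. `with_backoff` and its parameters are hypothetical, and `sleep` is injectable only so the sketch can be tested without waiting.

```python
import random, time

def with_backoff(call, retries=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    "Run `call()`, retrying on `retry_on` errors with exponential backoff and random jitter."
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: re-raise the last error
            # wait base_delay * 2**attempt, stretched by up to 2x so clients don't retry in lockstep
            sleep(base_delay * 2 ** attempt * (1 + random.random()))
```

Usage with the client might look like `with_backoff(lambda: client.chat.completions.create(model=model_to_use, messages=msgs), retry_on=(openai.RateLimitError,))`. Retrying only on rate-limit errors avoids hammering the API when a request is simply invalid.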

### Create our own code interpreter

In [94]:
from pydantic import create_model
import inspect, json
from inspect import Parameter

In [95]:
def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b

In [96]:
def schema(f):
    "Build an OpenAI function spec from `f`'s signature, type annotations and docstring."
    kw = {n:(o.annotation, ... if o.default is Parameter.empty else o.default)
          for n,o in inspect.signature(f).parameters.items()}
    # Pydantic v2: model_json_schema() replaces the deprecated .schema()
    s = create_model(f'Input for `{f.__name__}`', **kw).model_json_schema()
    return dict(name=f.__name__, description=f.__doc__, parameters=s)

In [97]:
schema(sums)

{'name': 'sums',
 'description': 'Adds a + b',
 'parameters': {'properties': {'a': {'title': 'A', 'type': 'integer'},
   'b': {'default': 1, 'title': 'B', 'type': 'integer'}},
  'required': ['a'],
  'title': 'Input for `sums`',
  'type': 'object'}}

In [98]:
c = askgpt("Use the `sum` function to solve this: What is 6+3?",
           system = "You must use the `sum` function instead of adding yourself.",
           functions=[schema(sums)])

In [99]:
m = c.choices[0].message
m

ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=FunctionCall(arguments='{"a":6,"b":3}', name='sums'), tool_calls=None)

In [100]:
k = m.function_call.arguments
print(k)

{"a":6,"b":3}


In [101]:
funcs_ok = {'sums', 'python'}

In [102]:
def call_func(c):
    fc = c.choices[0].message.function_call
    if fc.name not in funcs_ok: return print(f'Not allowed: {fc.name}')
    f = globals()[fc.name]
    return f(**json.loads(fc.arguments))

In [103]:
call_func(c)

9
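`call_func` above handles the `functions=` / `function_call` form of function calling. Current versions of the Chat Completions API use `tools=[{"type": "function", "function": ...}]` instead (for example `tools=[{"type": "function", "function": schema(sums)}]`). The reply then carries `message.tool_calls`, each with an `id`, `function.name` and `function.arguments` (a JSON string). Results go back as `{"role": "tool", "tool_call_id": ...}` messages, after the assistant message that made the calls.

Here is a sketch of `call_func` adapted to that shape. `call_tools` is a hypothetical name, and `SimpleNamespace` imitates the response object so the sketch runs offline.

```python
import json
from types import SimpleNamespace

def sums(a: int, b: int = 1):
    "Adds a + b"
    return a + b

ALLOWED = {"sums": sums}  # explicit allow-list instead of looking names up in globals()

def call_tools(message):
    "Run every tool call in `message`; return the 'tool' messages to send back to the model."
    results = []
    for tc in message.tool_calls or []:
        fn = ALLOWED.get(tc.function.name)
        out = fn(**json.loads(tc.function.arguments)) if fn else f"Not allowed: {tc.function.name}"
        results.append({"role": "tool", "tool_call_id": tc.id, "content": str(out)})
    return results

# Imitation of message.tool_calls in a real response
msg = SimpleNamespace(tool_calls=[SimpleNamespace(
    id="call_1", type="function", function=SimpleNamespace(name="sums", arguments='{"a":6,"b":3}'))])
print(call_tools(msg))  # [{'role': 'tool', 'tool_call_id': 'call_1', 'content': '9'}]
```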

In [104]:
def run(code):
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None

    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgts = [ast.Name(id='_result', ctx=ast.Store())]
        assign = ast.Assign(targets=tgts, value=last_node.value)
        tree.body[-1] = ast.fix_missing_locations(assign)

    ns = {}
    exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result', None)

In [105]:
run("""
a=1
b=2
a+b
""")

3
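`run` returns only the value of the final expression, so anything the executed code `print`s goes to the notebook rather than back to the model. Below is a minimal variant that also collects printed output with `contextlib.redirect_stdout` and returns both. `run_capture` is a hypothetical name. Like `run`, it calls `exec` with no sandboxing, which is why the `python` function that follows asks for confirmation first.

```python
import ast, io
from contextlib import redirect_stdout

def run_capture(code):
    "Execute `code`; return (value of its last expression or None, everything it printed)."
    tree = ast.parse(code)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Rewrite a trailing expression `expr` into `_result = expr` so its value survives exec
        last = tree.body[-1]
        assign = ast.Assign(targets=[ast.Name(id='_result', ctx=ast.Store())], value=last.value)
        tree.body[-1] = ast.fix_missing_locations(ast.copy_location(assign, last))
    ns, buf = {}, io.StringIO()
    with redirect_stdout(buf):
        exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result'), buf.getvalue()

print(run_capture("print('hi')\n6*7"))  # (42, 'hi\n')
```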

In [106]:
def python(code:str):
    "Return result of executing `code` using python. If execution not permitted, returns `#FAIL#`"
    go = input(f'Proceed with execution?\n```\n{code}\n```\n')
    if go.lower()!='y': return '#FAIL#'
    return run(code)

In [107]:
c = askgpt("What is 12 factorial?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [108]:
#call_func(c)

In [110]:
# Send the function's result back in a "function"-role message so the model can use it in its answer
c = client.chat.completions.create(
    model="gpt-3.5-turbo",
    functions=[schema(python)],
    messages=[{"role": "user", "content": "What is 12 factorial?"},
              {"role": "function", "name": "python", "content": "479001600"}])

In [None]:
response(c)
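The 12-factorial example above sends the function's result back to the model by hand, in a `"function"`-role message. A loop can do this automatically: call the model, run any function it requests, append the result, and call again until it answers in plain text.

This is a sketch under stated assumptions. `chat_with_functions` and the demo `factorial` tool are hypothetical. `complete` is any callable that sends messages and returns the message object; for the real API that could be `lambda msgs: client.chat.completions.create(model=model_to_use, messages=msgs, functions=[schema(python)]).choices[0].message`. A fake `complete` stands in below so the sketch runs offline.

```python
import json, math
from types import SimpleNamespace

def chat_with_functions(complete, messages, funcs, max_steps=5):
    "Call the model, run any function it requests, feed the result back, and repeat until it answers."
    messages = list(messages)  # don't mutate the caller's list
    for _ in range(max_steps):
        msg = complete(messages)
        fc = msg.function_call
        if fc is None:
            return msg.content  # a normal answer: done
        f = funcs.get(fc.name)  # only run functions on the allow-list
        result = f(**json.loads(fc.arguments)) if f else f"Not allowed: {fc.name}"
        # Record the model's request, then our result, in the legacy function-calling format
        messages.append({"role": "assistant", "content": None,
                         "function_call": {"name": fc.name, "arguments": fc.arguments}})
        messages.append({"role": "function", "name": fc.name, "content": str(result)})
    raise RuntimeError(f"no final answer after {max_steps} steps")

def factorial(n: int):
    "Hypothetical tool for this demo."
    return math.factorial(n)

def fake_complete(msgs):
    "Offline stand-in for the model: requests factorial(12) once, then answers with the result."
    if msgs[-1]["role"] == "function":
        return SimpleNamespace(content=f"12! = {msgs[-1]['content']}", function_call=None)
    return SimpleNamespace(content=None,
                           function_call=SimpleNamespace(name="factorial", arguments='{"n": 12}'))

print(chat_with_functions(fake_complete, [{"role": "user", "content": "What is 12 factorial?"}],
                          {"factorial": factorial}))  # 12! = 479001600
```

`max_steps` guards against a model that keeps requesting functions forever.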

In [None]:
c = askgpt("What is the capital of France?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [None]:
response(c)

## PyTorch and Huggingface

### Your GPU options

Free:

- Kaggle (2 GPUs, low RAM)
- Colab

Buy:

- Buy 1-2 NVIDIA 24GB GPUs
    - RTX 3090 used (USD700-USD800), or RTX 4090 new (USD2000)
- Alternatively buy one NVIDIA A6000 with 48GB RAM (but this mightn't be faster than 3090/4090)
- Mac with lots of RAM (much slower than NVIDIA; M2 Ultra is best)
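A rough rule of thumb explains why 24GB and 48GB cards are the interesting sizes. The weights alone take (number of parameters) × (bits per parameter) / 8 bytes. Activations, the KV cache and framework overhead come on top, so treat these figures as lower bounds. `weights_gb` is a hypothetical helper.

```python
def weights_gb(n_params, bits_per_param):
    "Memory for the weights alone, in GB (1 GB = 1e9 bytes); activations and KV cache are extra."
    return n_params * bits_per_param / 8 / 1e9

for name, n in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(name, {bits: weights_gb(n, bits) for bits in (16, 8, 4)})
# 7B {16: 14.0, 8: 7.0, 4: 3.5}
# 13B {16: 26.0, 8: 13.0, 4: 6.5}
# 70B {16: 140.0, 8: 70.0, 4: 35.0}
```

By this estimate:

- A 7B model fits on one 24GB card in 16-bit.
- A 13B model needs 8-bit or 4-bit.
- A 70B model needs two 24GB cards or a 48GB A6000 even at 4-bit.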

In [111]:
from transformers import AutoModelForCausalLM,AutoTokenizer
import torch

- [HF leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- [fasteval](https://fasteval.github.io/FastEval/)

In [112]:
mn = "meta-llama/Llama-2-7b-hf"

In [115]:
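from transformers import BitsAndBytesConfig
# Recent transformers versions expect 8-bit loading via a quantization config rather than load_in_8bit=True
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, quantization_config=BitsAndBytesConfig(load_in_8bit=True))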

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Llama-2-7b-hf.
401 Client Error. (Request ID: Root=1-677c17e8-7821981a0fa17a797fa8479b;c271f79b-3f87-4ac1-a319-04b9d7dfbbf3)

Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/config.json.
Access to model meta-llama/Llama-2-7b-hf is restricted. You must have access to it and be authenticated to access it. Please log in.

In [None]:
tokr = AutoTokenizer.from_pretrained(mn)
prompt = "Jeremy Howard is a "
toks = tokr(prompt, return_tensors="pt")

In [None]:
toks

In [None]:
tokr.batch_decode(toks['input_ids'])

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

In [None]:
tokr.batch_decode(res)

In [None]:
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.bfloat16)

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

In [None]:
model = AutoModelForCausalLM.from_pretrained('TheBloke/Llama-2-7b-Chat-GPTQ', device_map=0, torch_dtype=torch.float16)

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res
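The GPTQ checkpoints used here, like the 8-bit load earlier, store weights in fewer bits. GPTQ itself picks quantized values carefully using calibration data. The sketch below is only the simplest baseline, symmetric round-to-nearest ("absmax") quantization of one block of weights. It shows what storing a weight in 4 bits means and how much rounding error that introduces. The function names and the weight values are made up for illustration.

```python
def quantize_absmax(weights, bits=4):
    "Symmetric round-to-nearest quantization of one block: integer codes plus a single float scale."
    qmax = 2 ** (bits - 1) - 1                          # 7 for 4 bits: codes lie in [-7, 7]
    scale = max(abs(w) for w in weights) / qmax or 1.0  # fall back to 1.0 for an all-zero block
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    "Approximate reconstruction of the original weights."
    return [c * scale for c in codes]

block = [0.12, -0.50, 0.33, 0.07, -0.21, 0.49]  # made-up weights
codes, scale = quantize_absmax(block, bits=4)
approx = dequantize(codes, scale)
print(codes)
print(max(abs(a - w) for a, w in zip(approx, block)))  # rounding error, at most scale / 2
```

Storing one scale per small block, rather than per whole matrix, keeps a single large weight from inflating the error everywhere else. Formats such as GGUF's Q4_K_M build more elaborate schemes on that idea.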

In [None]:
mn = 'TheBloke/Llama-2-13B-GPTQ'
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.float16)

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

In [None]:
def gen(p, maxlen=15, sample=True):
    toks = tokr(p, return_tensors="pt")
    res = model.generate(**toks.to("cuda"), max_new_tokens=maxlen, do_sample=sample).to('cpu')
    return tokr.batch_decode(res)

In [None]:
gen(prompt, 50)
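`gen` passes `do_sample=sample` to `generate`:

- With `do_sample=False`, each step takes the highest-scoring token (greedy decoding), so the output is repeatable.
- With `do_sample=True`, each step draws a token at random in proportion to its softmax probability, so repeated runs differ.

`generate` also accepts a `temperature` argument that sharpens or flattens that distribution. Here is a toy sketch of one decoding step over made-up logits; the function names and the four-word vocabulary are hypothetical.

```python
import math, random

def softmax(logits, temperature=1.0):
    "Turn raw scores into probabilities; lower temperature sharpens the distribution."
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, sample=True, temperature=1.0, rng=random):
    "Greedy picks the argmax; sampling draws an index in proportion to its probability."
    if not sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

vocab  = ["panda", "frog", "rain", "bamboo"]  # toy vocabulary
logits = [2.0, 1.0, 0.5, 1.5]                 # toy scores for the next token
print(vocab[next_token(logits, sample=False)])  # panda, every time
rng = random.Random(0)
print([vocab[next_token(logits, rng=rng)] for _ in range(5)])  # varies with the seed
```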

[StableBeluga-7B](https://huggingface.co/stabilityai/StableBeluga-7B)

In [None]:
mn = "stabilityai/StableBeluga-7B"
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.bfloat16)

In [None]:
sb_sys = "### System:\nYou are Stable Beluga, an AI that follows instructions extremely well. Help as much as you can.\n\n"

In [None]:
def mk_prompt(user, syst=sb_sys): return f"{syst}### User: {user}\n\n### Assistant:\n"

In [None]:
ques = "Who is Jeremy Howard?"

In [None]:
gen(mk_prompt(ques), 150)

[OpenOrca/Platypus 2](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B)

In [None]:
mn = 'TheBloke/OpenOrca-Platypus2-13B-GPTQ'
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.float16)

In [None]:
def mk_oo_prompt(user): return f"### Instruction: {user}\n\n### Response:\n"

In [None]:
gen(mk_oo_prompt(ques), 150)

### Retrieval augmented generation

In [None]:
from wikipediaapi import Wikipedia

In [None]:
wiki = Wikipedia('JeremyHowardBot/0.0', 'en')
jh_page = wiki.page('Jeremy_Howard_(entrepreneur)').text
jh_page = jh_page.split('\nReferences\n')[0]

In [None]:
print(jh_page[:500])

In [None]:
len(jh_page.split())

In [None]:
ques_ctx = f"""Answer the question with the help of the provided context.

## Context

{jh_page}

## Question

{ques}"""

In [None]:
res = gen(mk_prompt(ques_ctx), 300)

In [None]:
print(res[0].split('### Assistant:\n')[1])

In [None]:
from sentence_transformers import SentenceTransformer

In [None]:
emb_model = SentenceTransformer("BAAI/bge-small-en-v1.5", device=0)

In [None]:
jh = jh_page.split('\n\n')[0]
print(jh)

In [None]:
tb_page = wiki.page('Tony_Blair').text.split('\nReferences\n')[0]

In [None]:
tb = tb_page.split('\n\n')[0]
print(tb[:380])

In [None]:
q_emb,jh_emb,tb_emb = emb_model.encode([ques,jh,tb], convert_to_tensor=True)

In [None]:
tb_emb.shape

In [None]:
import torch.nn.functional as F

In [None]:
F.cosine_similarity(q_emb, jh_emb, dim=0)

In [None]:
F.cosine_similarity(q_emb, tb_emb, dim=0)
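The two similarity scores above are the core of retrieval-augmented generation. You embed the question and every candidate passage, keep the passages whose embeddings are most similar, and paste only those into the prompt instead of a whole page.

Here is a plain-Python sketch of the ranking step. It uses toy 3-d vectors in place of real `emb_model.encode` outputs; with the real model, you could pass each tensor's `.tolist()`. `top_k_passages` and the passage labels are hypothetical.

```python
import math

def cosine(u, v):
    "Cosine similarity: dot product divided by the product of the vector lengths."
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k_passages(q_emb, passages, passage_embs, k=2):
    "The k passages whose embeddings are most similar to the question's embedding."
    ranked = sorted(zip(passages, passage_embs), key=lambda pe: cosine(q_emb, pe[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

# Toy 3-d embeddings standing in for emb_model.encode(...) output
q_emb = [1.0, 0.2, 0.0]
passages = ["Jeremy Howard paragraph", "Tony Blair paragraph", "unrelated paragraph"]
passage_embs = [[0.9, 0.3, 0.1], [0.1, 1.0, 0.2], [0.0, 0.1, 1.0]]

context = "\n\n".join(top_k_passages(q_emb, passages, passage_embs, k=1))
print(context)  # Jeremy Howard paragraph
```

With real data, you might split the pages into paragraphs (e.g. `jh_page.split('\n\n')`), encode them, and use the retrieved `context` in place of `{jh_page}` in `ques_ctx`. For large collections, a vector index is usually used instead of scoring every passage.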

### Private GPTs

- [Sooo many](https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#what-is-h2ogpts-langchain-integration-like)

## Fine tuning

In [None]:
import datasets

[knowrohit07/know_sql](https://huggingface.co/datasets/knowrohit07/know_sql)

In [None]:
ds = datasets.load_dataset('knowrohit07/know_sql', revision='f33425d13f9e8aab1b46fa945326e9356d6d5726')

In [None]:
ds

In [None]:
trn = ds['train']
trn[3]

`accelerate launch -m axolotl.cli.train sql.yml`

In [None]:
tst = dict(**trn[3])
tst['question'] = 'Get the count of competition hosts by theme.'
tst

In [None]:
fmt = """SYSTEM: Use the following contextual information to concisely answer the question.

USER: {}
===
{}
ASSISTANT:"""

In [None]:
def sql_prompt(d): return fmt.format(d["context"], d["question"])

In [None]:
print(sql_prompt(tst))

In [None]:
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

In [None]:
ax_model = '/home/jhoward/git/ext/axolotl/qlora-out'

In [None]:
tokr = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')

In [None]:
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf',
                                             torch_dtype=torch.bfloat16, device_map=0)
model = PeftModel.from_pretrained(model, ax_model)
model = model.merge_and_unload()
model.save_pretrained('sql-model')
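`merge_and_unload()` folds the trained LoRA adapter into the base weights, so the saved result is an ordinary model. In standard LoRA, a frozen weight W (d×k) is adapted as W' = W + (alpha/r)·B·A, with B (d×r) and A (r×k) for a small rank r. Merging therefore leaves the outputs unchanged: (W + s·BA)x = Wx + s·B(Ax), where s = alpha/r.

Here is a toy check of that identity in plain Python. The matrices and the values of `r` and `alpha` are arbitrary.

```python
def matmul(X, Y):
    "Multiply two matrices given as lists of rows."
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y, s=1.0):
    "Elementwise X + s*Y."
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

r, alpha = 1, 2.0
W = [[1, 0, 2, 0], [0, 1, 0, 1], [1, 1, 0, 0]]  # frozen base weight, d x k (3 x 4)
A = [[0.5, -1.0, 0.0, 2.0]]                     # LoRA down-projection, r x k
B = [[1.0], [0.0], [-2.0]]                      # LoRA up-projection, d x r
x = [[1.0], [2.0], [3.0], [4.0]]                # an input column vector, k x 1
s = alpha / r

merged = matmul(add(W, matmul(B, A), s), x)                # (W + s*B@A) @ x, what the merged model computes
separate = add(matmul(W, x), matmul(B, matmul(A, x)), s)   # W@x + s*B@(A@x), base plus adapter
print(merged)              # [[20.0], [6.0], [-23.0]]
print(merged == separate)  # True

# LoRA trains r*(d+k) numbers per matrix instead of d*k, e.g. for a 4096x4096 matrix at r=8:
print(8 * (4096 + 4096), 4096 * 4096)  # 65536 16777216
```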

In [None]:
toks = tokr(sql_prompt(tst), return_tensors="pt")

In [None]:
res = model.generate(**toks.to("cuda"), max_new_tokens=250).to('cpu')

In [None]:
print(tokr.batch_decode(res)[0])

## [llama.cpp](https://github.com/abetlen/llama-cpp-python)

[TheBloke/Llama-2-7b-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF)

In [None]:
from llama_cpp import Llama

In [None]:
llm = Llama(model_path="/home/jhoward/git/llamacpp/llama-2-7b-chat.Q4_K_M.gguf")

In [None]:
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)

In [None]:
print(output['choices'])

## [MLC](https://mlc.ai/mlc-llm/docs/get_started/try_out.html#get-started)