<a href="https://colab.research.google.com/github/schumbar/CMPE258/blob/chumbar%2Fassignment_02/assignment_02/CMPE258_assignment02_part_C.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CMPE 258 Assignment 02 - Part C: Code Interpreter

By Shawn Chumbar

## Assignment Description
In this assignment, we are tasked with replicating a series of experiments conducted with closed- and open-source Large Language Models (LLMs), as presented in the References section.

Your primary objective is to replicate the experiments and modify them to perform alternative tasks. Instead of using the provided SQL dataset, you will be utilizing different datasets available at [Hugging Face](https://huggingface.co/knowrohit07) to demonstrate your adaptability and creativity.

This notebook covers the Code Interpreter portion of the assignment.

#### Tasks:
1. Reproduce the experiments showcased in the provided YouTube videos, ensuring that you implement all aspects such as prompt-based generation, fine-tuning, Retrieval-Augmented Generation (RAG), local interpreter, function calling, and llama.cpp CPU inference.

2. Creatively adapt the Colab notebooks from the YouTube videos to highlight innovative use cases and applications of LLMs.

3. If necessary, make the required adjustments to the Colab notebooks to ensure they function correctly. Document these modifications in detail.

4. Produce a demonstration video that showcases the functionality of your modified Colab notebooks and your innovative use cases. Ensure the video is linked in your documentation.

Please note that it is essential to reference the summarized versions of the YouTube videos available at [Summarize.tech](https://www.summarize.tech/) as they provide an overview of the content and serve as a starting point for your experiments.

Your assignment should reflect your ability to replicate and creatively extend the experiments, as well as your capacity to document and present your work effectively. Please feel free to reach out if you encounter any issues with the Colab notebooks that require minor adjustments.

### References Used
1. [A Hackers' Guide to Language Models](https://www.youtube.com/watch?v=jkrNMKz9pWU): This video showcases various experiments using LLMs.
2. [A hacker's guide to open source LLMs - posit::conf(2023)](https://www.youtube.com/watch?v=sYliwvml9Es): This video provides additional insights into LLMs.
3. [Summary of A Hackers' Guide to Language Models](https://www.summarize.tech/www.youtube.com/watch?v=jkrNMKz9pWU): Summarized version of the video titled "A Hackers' Guide to Language Models".
4. [Summary of A hacker's guide to open source LLMs - posit::conf(2023)](https://www.summarize.tech/www.youtube.com/watch?v=sYliwvml9Es): Summarized version of the video titled "A hacker's guide to open source LLMs - posit::conf(2023)".
5. [Hugging Face Datasets](https://huggingface.co/knowrohit07): Alternative datasets for experiments.

### Setup

In [1]:
!pip install openai==0.28.0

Collecting openai==0.28.0
  Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0


In [2]:
import tokenize, ast
from io import BytesIO

In [3]:
import openai
api_key = 'enter-your-open-ai-api-key-here'

In [4]:
openai.api_key = api_key

In [5]:
from pydantic import create_model
import inspect, json
from inspect import Parameter
from openai import ChatCompletion,Completion


### The OpenAI API

##### Function Declarations

In [6]:
def response(compl): print(compl.choices[0].message.content)

In [7]:
def get_content_only(result):
  return result.choices[0].message.content

In [8]:
def askgpt(user, system=None, model="gpt-3.5-turbo", **kwargs):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    return ChatCompletion.create(model=model, messages=msgs, **kwargs)

In [9]:
import time

def call_api(prompt, model="gpt-3.5-turbo"):
    msgs = [{"role": "user", "content": prompt}]
    # openai==0.28.0 exposes ChatCompletion.create; there is no `client` object here
    try: return ChatCompletion.create(model=model, messages=msgs)
    except openai.error.RateLimitError as e:
        # Wait for the server-suggested delay (default 60 seconds), then retry
        retry_after = int(e.headers.get("retry-after", 60))
        print(f"Rate limit exceeded, waiting for {retry_after} seconds...")
        time.sleep(retry_after)
        return call_api(prompt, model=model)
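The retry in `call_api` recurses without a limit, so a persistent rate limit would keep it waiting indefinitely. As a rough sketch of a bounded alternative, the example below caps the number of retries. `RateLimited`, `call_with_retry`, and `flaky` are hypothetical names introduced only for illustration; `RateLimited` stands in for `openai.error.RateLimitError` so the example runs without network access.

```python
import time

class RateLimited(Exception):
    """Hypothetical stand-in for openai.error.RateLimitError."""
    def __init__(self, retry_after):
        super().__init__("rate limited")
        self.retry_after = retry_after

def call_with_retry(fn, max_retries=3, sleep=time.sleep):
    """Call `fn()`; on RateLimited, wait and retry at most `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as e:
            if attempt == max_retries:
                raise  # give up after the final attempt
            sleep(e.retry_after)

# Fake API call that fails twice and then succeeds.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited(retry_after=0)
    return "ok"

waits = []  # record the requested waits instead of actually sleeping
outcome = call_with_retry(flaky, sleep=waits.append)
print(outcome, len(attempts), waits)
```

Passing `sleep` as a parameter is a design choice made for this sketch so the waiting behavior can be observed without real delays.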

In [10]:
def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b

In [11]:
def multiply(a:int, b:int):
    "Multiplies a * b"
    return a * b

In [12]:
def schema(f):
    kw = {n:(o.annotation, ... if o.default==Parameter.empty else o.default)
          for n,o in inspect.signature(f).parameters.items()}
    s = create_model(f'Input for `{f.__name__}`', **kw).schema()
    return dict(name=f.__name__, description=f.__doc__, parameters=s)
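To make it easier to see what `schema` produces, here is a minimal standard-library sketch of the same idea: inspect a function's signature and emit a JSON-Schema-like description. It is not the pydantic-based approach used above; `simple_schema`, `_JSON_TYPES`, and `add` are hypothetical names, and the type mapping covers only a few basic types.

```python
import inspect

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def simple_schema(f):
    """Describe `f` for function calling, using only the standard library."""
    props, required = {}, []
    for name, param in inspect.signature(f).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        prop = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default, so the caller must supply it
        else:
            prop["default"] = param.default
        props[name] = prop
    params = {"type": "object", "properties": props, "required": required}
    return {"name": f.__name__, "description": f.__doc__, "parameters": params}

def add(a: int, b: int = 1):
    "Adds a + b"
    return a + b

add_schema = simple_schema(add)
print(add_schema)
```

As in the pydantic version, a parameter with a default value is left out of `required`.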

In [13]:
def call_func(c):
    fc = c.choices[0].message.function_call
    if fc.name not in funcs_ok: return print(f'Not allowed: {fc.name}')
    f = globals()[fc.name]
    return f(**json.loads(fc.arguments))
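The dispatch step in `call_func` can be tried without calling the API. The sketch below assumes a response object with the same shape as the one returned by `ChatCompletion.create` and builds it by hand with `SimpleNamespace`. `dispatch`, `fake_completion`, `ALLOWED`, and `add` are hypothetical names; the explicit registry is a variation on the `globals()` lookup above, not the notebook's own method.

```python
import json
from types import SimpleNamespace

def add(a: int, b: int = 1):
    "Adds a + b"
    return a + b

# Explicit registry: only functions listed here can ever be called.
ALLOWED = {"add": add}

def dispatch(completion):
    """Run the function named in a function_call message, if it is allowed."""
    fc = completion.choices[0].message.function_call
    func = ALLOWED.get(fc.name)
    if func is None:
        raise PermissionError(f"Not allowed: {fc.name}")
    # The arguments arrive as a JSON string and are unpacked as keyword arguments.
    return func(**json.loads(fc.arguments))

def fake_completion(name, arguments):
    """Hypothetical stand-in that mimics the shape of a chat completion."""
    call = SimpleNamespace(name=name, arguments=json.dumps(arguments))
    message = SimpleNamespace(function_call=call)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

result = dispatch(fake_completion("add", {"a": 6, "b": 3}))
print(result)
```

Raising an exception for a disallowed name, rather than printing and returning `None`, makes a rejected call harder to mistake for a valid result.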

In [14]:
def run(code):
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None

    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgts = [ast.Name(id='_result', ctx=ast.Store())]
        assign = ast.Assign(targets=tgts, value=last_node.value)
        tree.body[-1] = ast.fix_missing_locations(assign)

    ns = {}
    exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result', None)
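The AST rewrite in `run` mirrors how a notebook cell displays the value of its last expression. The self-contained sketch below repeats the technique under a hypothetical name, `run_last_expr`, and shows both cases: code that ends in an expression and code that ends in a statement.

```python
import ast

def run_last_expr(code):
    """Execute `code` and return the value of its final expression, if any."""
    tree = ast.parse(code)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Rewrite the trailing `expr` into `_result = expr`.
        target = ast.Name(id="_result", ctx=ast.Store())
        tree.body[-1] = ast.Assign(targets=[target], value=tree.body[-1].value)
        ast.fix_missing_locations(tree)  # new nodes need line/column info
    ns = {}
    exec(compile(tree, filename="<ast>", mode="exec"), ns)
    return ns.get("_result")

expr_value = run_last_expr("a = 1\nb = 2\na + b")  # ends in an expression
stmt_value = run_last_expr("a = 1\nb = 2")         # ends in an assignment
print(expr_value, stmt_value)
```

Because the code is passed to `exec`, this approach should only be used with trusted input or inside a sandboxed environment.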

In [15]:
def python(code:str):
    return run(code)

#### Code Interpreter

In [16]:
schema(sums)

{'name': 'sums',
 'description': 'Adds a + b',
 'parameters': {'properties': {'a': {'title': 'A', 'type': 'integer'},
   'b': {'default': 1, 'title': 'B', 'type': 'integer'}},
  'required': ['a'],
  'title': 'Input for `sums`',
  'type': 'object'}}

In [17]:
c = askgpt("Use the `sums` function to solve this: What is 6+3?",
           system = "You must use the `sums` function instead of adding yourself.",
           functions=[schema(sums)])

In [18]:
m = c.choices[0].message
m

<OpenAIObject at 0x77fe8c164220> JSON: {
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "sums",
    "arguments": "{\"a\":6,\"b\":3}"
  }
}

In [19]:
k = m.function_call.arguments
print(k)

{"a":6,"b":3}


In [20]:
funcs_ok = {'sums', 'python'}

In [21]:
call_func(c)

9

In [22]:
run("""
a=1
b=2
a+b
""")

3

In [23]:
c = askgpt("Use the `multiply` function to solve this: What is 6*3?",
           system = "You must use the `multiply` function instead of multiplying yourself.",
           functions=[schema(multiply)])

In [24]:
m = c.choices[0].message
m

<OpenAIObject at 0x77fe8c253740> JSON: {
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "multiply",
    "arguments": "{\"a\":6,\"b\":3}"
  }
}

In [25]:
k = m.function_call.arguments
print(k)

{"a":6,"b":3}


In [26]:
funcs_ok = {'sums', 'multiply', 'python'}

In [27]:
call_func(c)

18

In [28]:
run("""
a=3
b=6
a*b
""")

18