# Clean(ish) lm-hackers notebook
The purpose of this notebook is to simplify and streamline the fastai notebook lm-hackers.ipynb that Jeremy Howard worked through in [A Hackers' Guide to Language Models](https://www.youtube.com/watch?v=jkrNMKz9pWU). Prose and digressions are kept to a minimum, except where they explain parts of the code I didn't previously understand. Also, the OpenAI API seems to have been updated since the YouTube video was made and since the repo was last updated (28th September 2023).

<u>Contents</u>
1. Notebook Prep
2. Authentication
3. Chat Completion

### 1. Notebook Prep

In [17]:
# imports
import ast, openai, torch, textwrap, inspect, json, os
from tiktoken import encoding_for_model
from openai import OpenAI
from pydantic import create_model
from inspect import Parameter
from fastcore.utils import nested_idx

from transformers import AutoModelForCausalLM, AutoTokenizer

In [18]:
# textwrap function for readability
def wprint(text, width=80):
    print(textwrap.fill(text, width))

### 2. Authentication

If you have an account with OpenAI you can generate an API key on [platform.openai.com/api-keys](https://platform.openai.com/api-keys). You can then either copy the following line into the `.zshrc` file associated with your environment (if using a Mac):

```export OPENAI_API_KEY='YOUR KEY HERE'```

or you can set your key as a variable in the Jupyter notebook session. This works fine, but it is not safe if you share your code: anyone with the notebook will have access to your API key and could use it to make their own API calls that you end up paying for!

Check whether your API key is set correctly as an environment variable:
```python
print(os.environ.get("OPENAI_API_KEY"))
```
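
Printing the full key, as in the check above, leaves it visible in the notebook output. A minimal sketch of a safer check is to print only a masked form. Note that `mask_key` is a hypothetical helper introduced here, not part of the `openai` package, and the number of visible characters is an arbitrary choice.

```python
import os

def mask_key(key, visible=4):
    """Return a masked version of a secret, showing only the last few characters."""
    if not key:
        return None  # variable is unset or empty
    if len(key) <= visible:
        return '*' * len(key)  # too short to reveal anything safely
    return '*' * (len(key) - visible) + key[-visible:]

# Confirms whether the variable is set without revealing the key itself
print(mask_key(os.environ.get("OPENAI_API_KEY")))
```

If this prints `None`, the variable is not set in the environment that launched the notebook.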

How to set your API key directly in the jupyter notebook session:
```python
client = OpenAI(api_key='YOUR API KEY HERE')
```

### 3. Chat Completion

In [40]:
# default custom instructions very similar to the ones described by Jeremy
custom_instructions = ('You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF.'
                       ' You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning.'
                       ' If you think there might not be a correct answer, you say so. Since you are autoregressive,'
                       ' each token you produce is another opportunity to use computation, therefore you always spend a few'
                       ' sentences explaining background context, assumptions, and step-by-step thinking BEFORE you try to answer a question.'
                       ' However: if the request begins with the string "cc" then ignore the previous sentence and instead make your response'
                       ' as concise as possible, with no introduction or background at the start, no summary at the end, and outputting only'
                       ' code for answers where code is appropriate. Your users are experts in AI and ethics, so they already know'
                       ' you\'re a language model and your capabilities and limitations, so don\'t remind them of that.'
                       ' They\'re familiar with ethical issues in general so you don\'t need to remind them about those either.'
                       ' Don\'t be verbose in your answers, but do provide details and examples where it might help the explanation.'
)

# wprint(custom_instructions)

In [31]:
# show how word-to-int encodings work
enc = encoding_for_model('text-davinci-003')
toks = enc.encode('Some words encoded into t to tokens')
print(toks)

[4366, 2456, 30240, 656, 256, 284, 16326]


In [76]:
# askgpt function that takes a system prompt, user prompt and model and performs an api call on that model
def askgpt(prompt, client=None, system=None, model='gpt-4', **kwargs):
    """
    Sends a prompt to a GPT model via OpenAI API and returns the response.

    Parameters:
    prompt (str): User's input prompt for the GPT model.
    client (OpenAI, optional): OpenAI API client. Defaults to a new instance if None.
    system (str, optional): System-level message included in the request.
    model (str, optional): GPT model identifier (default 'gpt-4').
    **kwargs: Extra arguments for chat.completions.create method.

    Returns:
    ChatCompletion: Response object from the API, or None if the model is not in allowed_models.

    Example:
    >>> response = askgpt("What is the weather today?", model="gpt-4")
    """
    allowed_models = {'gpt-4-1106-preview',
                      'gpt-4',
                      'gpt-4-0314',
                      'gpt-4-0613',
                      'gpt-3.5-turbo',
                      'gpt-3.5-turbo-0301',
                      'gpt-3.5-turbo-0613',
                      'gpt-3.5-turbo-1106'}
    
    if model not in allowed_models:
        print('model must be one of allowed models:')
        for allowed in sorted(allowed_models):
            print(allowed)
        return None

    if client is None:
        client = OpenAI()
    
    m = []
    if system:
        m.append({'role': 'system', 'content': system})

    m.append({'role': 'user', 'content': prompt})
    return client.chat.completions.create(model=model, messages=m, **kwargs)
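
The message list that `askgpt` sends can be sketched in isolation, which makes the role ordering easy to inspect without making an API call. `build_messages` is a hypothetical helper introduced for illustration, and the system string is a placeholder. It mirrors the list construction in `askgpt` above.

```python
def build_messages(prompt, system=None):
    """Build the chat messages list: optional system message first, then the user prompt."""
    messages = []
    if system:
        messages.append({'role': 'system', 'content': system})
    messages.append({'role': 'user', 'content': prompt})
    return messages

# Placeholder system prompt, used only to show the ordering
msgs = build_messages("cc What is the nature of being according to Heidegger",
                      system="You answer concisely.")
for m in msgs:
    print(m['role'], '->', m['content'])
```

When `system` is omitted, the list contains only the user message, which is the case exercised by the test call later in this notebook.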

In [95]:
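def cost(response):
    """Estimate the cost in USD of a chat completion from its token usage."""
    model = response.model

    # (input, output) prices in USD per 1,000 tokens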
    price_per_k = {'gpt-4-1106-preview': (0.01, 0.03),
                   'gpt-4': (0.03, 0.06),
                   'gpt-4-0314': (0.03, 0.06),
                   'gpt-4-0613': (0.03, 0.06),
                   'gpt-3.5-turbo': (0.001, 0.002),
                   'gpt-3.5-turbo-0301': (0.001, 0.002),
                   'gpt-3.5-turbo-0613': (0.001, 0.002),
                   'gpt-3.5-turbo-1106': (0.001, 0.002),}
    
    k_input_tokens = response.usage.prompt_tokens/1000
    k_output_tokens = response.usage.completion_tokens/1000

    res = price_per_k[model][0]*k_input_tokens + price_per_k[model][1]*k_output_tokens
    
    return res 

In [105]:
print(f'${cost(response):.5f}')
print(f'¢{cost(response)*100:.5f}')

$0.00036
¢0.03630


In [91]:
response.usage

CompletionUsage(completion_tokens=172, prompt_tokens=19, total_tokens=191)
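
As a sanity check, the cost printed above can be reproduced by hand from these usage figures. The sketch below uses a `SimpleNamespace` stand-in for the API response (an assumption, so that no API call is needed) and the `gpt-3.5-turbo-1106` prices hard-coded in `cost`. The names `fake_response` and `estimate` are introduced for illustration only.

```python
from types import SimpleNamespace

# Stand-in for an API response, carrying the usage figures shown above
fake_response = SimpleNamespace(
    model='gpt-3.5-turbo-1106',
    usage=SimpleNamespace(prompt_tokens=19, completion_tokens=172),
)

# (input, output) prices in USD per 1,000 tokens, as hard-coded in cost()
input_price, output_price = 0.001, 0.002

estimate = (input_price * fake_response.usage.prompt_tokens / 1000
            + output_price * fake_response.usage.completion_tokens / 1000)
print(f'${estimate:.5f}')  # → $0.00036
```

Almost all of the cost comes from the 172 completion tokens, since output tokens are priced at twice the input rate for this model.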

In [80]:
# testing the model
prompt = "cc What is the nature of being according to Heidegger"
model = 'gpt-3.5-turbo-1106'
response = askgpt(prompt, model=model)

In [90]:
type(response.model)

str

In [73]:
response.choices[0].message.content

"According to Martin Heidegger, the nature of being is characterized by our existence and the way we engage with the world around us. Heidegger believed that being is not a static state or essence, but rather an ongoing process of self-discovery, awareness, and interaction with the world. He also emphasized the importance of our individual experiences and how they shape our understanding of being. Being, for Heidegger, is about our ability to authentically exist and make choices in our lives, as well as our capacity to confront and understand our own mortality. Overall, Heidegger's philosophy of being is a complex and dynamic concept that emphasizes the lived experience of individuals and their relationship with the world."

In [52]:
response.model

'gpt-4-0613'

In [49]:
wprint(response.choices[0].message.content)
print(f'Usage, prompt tokens = {response.usage.prompt_tokens}')
print(f'Usage, completion tokens = {response.usage.completion_tokens}')

According to Martin Heidegger, the nature of being is not a concept that can be
simply defined, but it forms the basis of all experiences, thoughts, and
actions. It is the fundamental nature of existence, what it means "to be".
Heidegger suggests that the problem is not about understanding individual
entities, but understanding what it means for something to exist or be at all.
He uses the term 'Dasein' (German for 'being there' or 'presence') to describe
human existence. Dasein, according to Heidegger, is defined by its temporality
and is always 'thrown' in the world, meaning it exists within an environment and
culture that shapes its thought and behavior. Dasein is also characterized by
'being-toward-death', signifying that our understanding and interpretation of
the world always involves a relationship with our own mortality.  Fundamentally,
Heidegger argues that we have largely forgotten the real meaning of being due to
the constraints of language and cultural norms, and by making 
