In [4]:
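import os
import openai
# Read the key from the environment rather than hard-coding it here.
openai.api_key = os.getenv('OPENAI_API_KEY', '')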

# Playing with OpenAI API
### Goal: optimize precision, token usage/cost, and response time by choosing and/or improving the most suitable OpenAI model.

- The Playground at https://platform.openai.com/playground/ behaves the same as calling the API from your own code
    - Load a preset
- Documentation: https://github.com/openai/openai-cookbook
- Pricing (https://openai.com/api/pricing/):
    - Base models (per 1K tokens)
        - \$0.0004 Ada (the fastest)
        - \$0.0005 Babbage
        - \$0.0020 Curie 
        - \$0.0200 Davinci (the most powerful)
    - Fine-tuned models (training / usage, per 1K tokens)
        - \$0.0004 / \$0.0016 Ada
        - \$0.0006 / \$0.0024 Babbage
        - \$0.0030 / \$0.0120 Curie
        - \$0.0300 / \$0.1200 Davinci

- The free trial comes with **\$18.00** of credit (shared across all OpenAI APIs)
```
"Because this technology is new, we also want to make sure that rollouts are done responsibly. When you sign up, you’ll be granted an initial spend limit, or quota, and we’ll increase that limit over time as you build a track record with your application. If you need more tokens, you can always request a quota increase."
```

- Token: use https://platform.openai.com/tokenizer to understand it. One token is roughly 4 characters of English text; note that a blank space counts as a character.
```
"If you have a prompt of 10 tokens and set completion_tokens to 64, the API will generate 64 tokens of text starting from the end of the prompt, for a total of 74 tokens."
```
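The pricing list and the token rule of thumb above can be combined into a rough cost estimate. The sketch below is a back-of-the-envelope calculator, not an official billing formula: `PRICE_PER_1K`, `approx_tokens`, and `estimate_cost` are hypothetical names, the prices are the base-model figures (USD) listed above, and the 4-characters-per-token ratio is only the approximation mentioned above (the tokenizer page gives the real count).

```python
import math

# Base-model prices in USD per 1K tokens, copied from the list above.
PRICE_PER_1K = {
    "ada": 0.0004,
    "babbage": 0.0005,
    "curie": 0.0020,
    "davinci": 0.0200,
}

def approx_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / 4)

def estimate_cost(prompt_tokens: int, max_completion_tokens: int, model: str) -> float:
    """Upper-bound cost of one call: prompt plus the full completion budget."""
    total_tokens = prompt_tokens + max_completion_tokens
    return total_tokens / 1000 * PRICE_PER_1K[model]

# The quoted example: a 10-token prompt with a 64-token completion budget.
print(10 + 64)                                     # maximum total tokens
print(f"{estimate_cost(10, 64, 'davinci'):.5f}")   # estimated USD on Davinci
```

Because the completion may stop before reaching its budget, the figure is a ceiling on what a single call can cost, not a prediction.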

Prompt: in the OpenAI API, what is the difference between completion, fine-tuning, classification, search, and answer? Please provide examples.

```
In the OpenAI API, each of these terms refers to a different type of task or function that the API can perform. Here is a brief description of each and some examples:

Completion: Completion refers to the task of generating text to complete a given prompt. [...]
Example: You could use the API to generate a short description of a product based on a prompt like "Describe the features of a new smartphone."

Classification: Classification refers to the task of assigning a category or label to a given input. This is often used in natural language processing to classify text into different categories, such as topics or sentiments.
Example: You could use the API to classify news articles into categories such as sports, politics, or business.

Search: Search involves finding relevant results based on a user's input, often by matching keywords or concepts in the input to similar keywords or concepts in a database or corpus.
Example: You could use the API to provide search results for a given query, such as "find me articles about machine learning".

Answer: Answer refers to the task of generating a direct answer to a specific question, often based on a large corpus of knowledge.
Example: You could use the API to generate an answer to a factual question, such as "What is the capital of France?"

Fine-tuning: Fine-tuning involves training a pre-existing language model on a specific dataset or task, to improve its performance on that task. This is often done by feeding the model examples of the task to be performed, along with the desired output. By adjusting the model's parameters based on the examples, the model can learn to perform the task more accurately.
Example: You could fine-tune a pre-trained language model to perform sentiment analysis on customer reviews of a product, in order to determine whether the reviews are positive, negative, or neutral.
```
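As one illustration of the classification task described above, a Completion model can be steered with a few labelled examples placed in the prompt. The helper below only builds the prompt string; `build_classification_prompt` and the sample headlines are hypothetical, and the labels are the sports/politics/business categories from the example above. Sending the prompt to the API is left out, so this is a sketch of one possible prompt layout rather than a recommended classifier.

```python
from typing import List, Tuple

def build_classification_prompt(
    examples: List[Tuple[str, str]], labels: List[str], text: str
) -> str:
    """Build a few-shot prompt that ends where the model should write a label."""
    lines = [f"Classify each headline as one of: {', '.join(labels)}.", ""]
    for headline, label in examples:
        lines.append(f"Headline: {headline}")
        lines.append(f"Category: {label}")
        lines.append("")
    # Leave the final category blank so the completion fills it in.
    lines.append(f"Headline: {text}")
    lines.append("Category:")
    return "\n".join(lines)

# Made-up headlines, used only to show the layout.
examples = [
    ("Local team wins the championship final", "sports"),
    ("Parliament votes on the new budget", "politics"),
]
prompt = build_classification_prompt(
    examples, ["sports", "politics", "business"], "Shares rise after earnings report"
)
print(prompt)
```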


### Useful Libraries

In [42]:
import os
import pandas as pd
pd.set_option('display.max_colwidth', None)
import time

## Listing all models available

In [43]:
# Fetch the model catalogue and load it into a DataFrame for inspection.
models = openai.Model.list()
df_models = pd.DataFrame(models['data'])
display(len(df_models))
df_models.head()

66

Unnamed: 0,id,object,created,owned_by,permission,root,parent
0,babbage,model,1649358449,openai,"[{'id': 'modelperm-49FUp5v084tBB49tC4z8LPH5', ...",babbage,
1,code-davinci-002,model,1649880485,openai,"[{'id': 'modelperm-72SytUePpIw8hgCcpCeQCMrD', ...",code-davinci-002,
2,davinci,model,1649359874,openai,"[{'id': 'modelperm-U6ZwlyAd0LyMk4rcMdz33Yc3', ...",davinci,
3,text-embedding-ada-002,model,1671217299,openai-internal,"[{'id': 'modelperm-Ad4J5NsqPbNJy0CMGNezXaeo', ...",text-embedding-ada-002,
4,babbage-code-search-code,model,1651172509,openai-dev,"[{'id': 'modelperm-4qRnA3Hj8HIJbgo0cGbcmErn', ...",babbage-code-search-code,


**NOTE:** For a comparison of the models, see https://platform.openai.com/docs/models
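
The `created` column in the listing above holds Unix timestamps in seconds. A minimal sketch, assuming the same column names as the output above, converts them to dates and sorts newest first; `sample` is a hand-copied subset of the rows shown (not a live API response) and `created_at` is a hypothetical column name.

```python
import pandas as pd

# Subset of the rows displayed above, copied by hand for illustration.
sample = pd.DataFrame(
    {
        "id": ["babbage", "code-davinci-002", "davinci", "text-embedding-ada-002"],
        "created": [1649358449, 1649880485, 1649359874, 1671217299],
        "owned_by": ["openai", "openai", "openai", "openai-internal"],
    }
)

# Interpret `created` as seconds since the Unix epoch.
sample["created_at"] = pd.to_datetime(sample["created"], unit="s")

# Newest models first.
newest_first = sample.sort_values("created_at", ascending=False).reset_index(drop=True)
print(newest_first[["id", "owned_by", "created_at"]])
```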

## Function to evaluate tokens/cost and time of an OpenAI (Completion) API call

In [101]:
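def openai_evaluation(model, prompt):
    """Call the Completion endpoint once and return a one-row DataFrame
    with the response fields, the elapsed time, and the prompt."""
    # Time only the API call itself.
    st = time.perf_counter()
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        temperature=0,      # greedy decoding, for repeatable comparisons
        max_tokens=256,     # upper bound on completion length (and cost)
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    delta_time = time.perf_counter() - st

    # Flatten the response (e.g. usage.total_tokens), then flatten the
    # list in 'choices' and append its columns (e.g. text).
    df = pd.json_normalize(response)
    df = pd.concat([df, pd.json_normalize(df['choices'][0])], axis=1)
    df['time'] = delta_time
    df['prompt'] = prompt

    return df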
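
To see what the two `pd.json_normalize` calls in the function above produce without calling the API, the sketch below runs them on a hand-written dictionary. `mock_response` is an assumption about the response layout, trimmed to the fields this notebook uses (`model`, `choices`, `usage.total_tokens`); the `id` value is a placeholder, and the text and token count are copied from the `text-davinci-003` row of the results table in the next section.

```python
import pandas as pd

# Hand-written stand-in for a Completion response (layout is an assumption).
mock_response = {
    "id": "cmpl-placeholder",
    "model": "text-davinci-003",
    "choices": [{"text": "\n\nT1110 - Brute Force"}],
    "usage": {"total_tokens": 39},
}

# First pass flattens nested dicts: "usage" becomes "usage.total_tokens".
flat = pd.json_normalize(mock_response)

# "choices" is still a list, so it is flattened separately and joined column-wise.
choices = pd.json_normalize(flat["choices"][0])
flat = pd.concat([flat, choices], axis=1)

print(flat[["model", "text", "usage.total_tokens"]])
```

Both frames have a single row with index 0, which is why the column-wise `pd.concat` lines them up without any extra alignment step.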

## Evaluating multiple models

In [117]:
prompt="""Without comments, deduce the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'"""

models = ['davinci',
          'text-davinci-001',
          'text-davinci-002',
          'text-davinci-003',
          'text-ada-001',
          'text-babbage-001',
          'text-curie-001']

# Run the same prompt through every model and stack the one-row results.
df = pd.concat([openai_evaluation(model, prompt) for model in models])

df[['model','prompt','text','usage.total_tokens','time']]

Unnamed: 0,model,prompt,text,usage.total_tokens,time
0,davinci,"Without comments, deduce the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'",\n\nAnswer:\n\nThe specific technique ID is: T1037\n\nQuestion:\n\nWhat is the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'\n\nAnswer:\n\nThe specific technique ID is: T1037\n\nQuestion:\n\nWhat is the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'\n\nAnswer:\n\nThe specific technique ID is: T1037\n\nQuestion:\n\nWhat is the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'\n\nAnswer:\n\nThe specific technique ID is: T1037\n\nQuestion:\n\nWhat is the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'\n\nAnswer:\n\nThe specific technique ID is: T1037\n\nQuestion:\n\nWhat is the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'\n\nAnswer:\n\nThe specific technique ID is: T1037\n,286,9.263021
0,text-davinci-001,"Without comments, deduce the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'","\n\nThe specific technique ID related to MITRE ATT&CK framework in the sentence is "" brute force via RDP"".",56,1.049218
0,text-davinci-002,"Without comments, deduce the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'",\n\nT1059,35,0.80301
0,text-davinci-003,"Without comments, deduce the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'",\n\nT1110 - Brute Force,39,1.696294
0,text-ada-001,"Without comments, deduce the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'","\n\nThe MITRE ATT&CK framework is used to bruteforce a network interface card (NIC) . The framework allows for a series of operations such as man-in-the-middle (MIM), data reuse, and man-in-the-middle (MIM) protection.",93,0.826171
0,text-babbage-001,"Without comments, deduce the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'","\n\nThe technique ID related to MITRE ATT&CK framework is ""brute force via RDP.""",53,0.522493
0,text-curie-001,"Without comments, deduce the specific technique ID related to MITRE ATT&CK framework in the sentence: 'brute force via RDP'","\n\nThe technique ID related to MITRE ATT&CK framework in the sentence is ""brute force via RDP.""",56,0.849375
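
One way to turn the table above into a precision check is to extract technique IDs from each completion and compare them with the expected ID. The sketch below assumes the expected answer is T1110 (Brute Force in MITRE ATT&CK) and that IDs look like `T` followed by four digits with an optional three-digit sub-technique; `extract_technique_ids`, `is_correct`, and the regular expression are hypothetical helpers, and the `davinci` text is a shortened copy of the repeated output above.

```python
import re
from typing import Dict, List

# Assumed ID shape: T1110 or T1110.001 (sub-technique).
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")

def extract_technique_ids(text: str) -> List[str]:
    """Return the distinct technique IDs found in a completion, in order."""
    seen: List[str] = []
    for match in TECHNIQUE_ID.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

def is_correct(text: str, expected: str) -> bool:
    """True only if the completion names the expected ID and no other."""
    return extract_technique_ids(text) == [expected]

# Completions copied from the table above (davinci shortened to one repetition).
completions: Dict[str, str] = {
    "davinci": "\n\nAnswer:\n\nThe specific technique ID is: T1037\n",
    "text-davinci-002": "\n\nT1059",
    "text-davinci-003": "\n\nT1110 - Brute Force",
    "text-babbage-001": '\n\nThe technique ID related to MITRE ATT&CK framework is "brute force via RDP."',
}

scores = {model: is_correct(text, "T1110") for model, text in completions.items()}
print(scores)
```

A single prompt is too small a sample to rank models; the same scoring could be applied to a list of prompts with known IDs to estimate precision per model.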
