In [4]:
!pip install -r requirements.txt

Collecting openai (from -r requirements.txt (line 1))
  Using cached openai-0.27.8-py3-none-any.whl (73 kB)
Collecting tqdm (from openai->-r requirements.txt (line 1))
  Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting aiohttp (from openai->-r requirements.txt (line 1))
  Using cached aiohttp-3.8.4-cp310-cp310-win_amd64.whl (319 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp->openai->-r requirements.txt (line 1))
  Using cached multidict-6.0.4-cp310-cp310-win_amd64.whl (28 kB)
Collecting async-timeout<5.0,>=4.0.0a3 (from aiohttp->openai->-r requirements.txt (line 1))
  Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting yarl<2.0,>=1.0 (from aiohttp->openai->-r requirements.txt (line 1))
  Using cached yarl-1.9.2-cp310-cp310-win_amd64.whl (61 kB)
Collecting frozenlist>=1.1.1 (from aiohttp->openai->-r requirements.txt (line 1))
  Using cached frozenlist-1.3.3-cp310-cp310-win_amd64.whl (33 kB)
Collecting aiosignal>=1.1.2 (from aiohttp->openai->-r requiremen

# **Available Models**

## **GPT-3: Processing and Generating Natural Language**

The GPT-3 models can understand and generate natural language. The family consists of four
models (A, B, C, D) that trade speed and cost against capability.

- D: **text-davinci-003**. This is the most capable GPT-3 model: it can perform every task the other models can,
usually with higher-quality output. It is also the most recent model, trained on
data up to June 2021.

- C: **text-curie-001**. This is the second most capable GPT-3 model. It supports up to 2,048 tokens.
Its advantage is that it is more cost-efficient than text-davinci-003 while remaining highly accurate.
It was trained on data up to October 2019, so its knowledge is less current than that of text-davinci-003.
It is a good option for translation, complex classification, text analysis, and summaries.

- B: **text-babbage-001**. Same limits as Curie: 2,048 tokens and training data up to October 2019.
This model is effective for simpler categorization and semantic classification.

- A: **text-ada-001**. Same limits as Curie: 2,048 tokens and training data up to October 2019.
This model is very fast and cost-effective, and is the preferred choice for the simplest classifications, text
extraction, and address correction.
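
As a rough aid for choosing between the four models, the facts listed above can be captured in a small lookup table. This is only a sketch: the task labels and the `pick_model` helper are hypothetical names introduced for illustration, and `None` marks a value the text above does not state.

```python
# Facts copied from the list above; None marks a value the text does not state.
GPT3_MODELS = {
    "text-davinci-003": {"max_tokens": None, "training_data_until": "2021-06"},
    "text-curie-001": {"max_tokens": 2048, "training_data_until": "2019-10"},
    "text-babbage-001": {"max_tokens": 2048, "training_data_until": "2019-10"},
    "text-ada-001": {"max_tokens": 2048, "training_data_until": "2019-10"},
}

# Hypothetical task labels, ordered from the simplest to the most demanding.
SUGGESTED_MODEL = {
    "simple_classification": "text-ada-001",
    "semantic_classification": "text-babbage-001",
    "summarization": "text-curie-001",
    "complex_generation": "text-davinci-003",
}


def pick_model(task: str) -> str:
    """Return the suggested model for a task, defaulting to the most capable one."""
    return SUGGESTED_MODEL.get(task, "text-davinci-003")


print(pick_model("summarization"))
```

Defaulting to the most capable model is a design choice that favors quality over cost; an application with a tight budget might prefer the opposite default.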

## **Codex: Understanding and Generating Computer Code**

OpenAI offers two Codex models for understanding and generating computer code: code-davinci-002
and code-cushman-001.
Codex is the model that powers GitHub Copilot. It is proficient in more than a dozen programming
languages, including Python, JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and Shell.
Codex can understand basic instructions expressed in natural language and carry
out the requested tasks on behalf of the user.

- **code-davinci-002**. This is the most capable Codex model. It excels at translating natural language into code. Not only
does it complete code, but it also supports inserting code into the middle of existing code. It can handle
up to 8,000 tokens and was trained on data up to June 2021.

- **code-cushman-001**. Cushman is powerful and fast. Although Davinci is stronger at analyzing complex
tasks, Cushman is capable enough for many code generation tasks.
It is also faster and more affordable than Davinci.

## **Content Filter**
As its name suggests, this is a filter for sensitive content.
Using this filter you can detect API-generated text that could be sensitive or unsafe. This filter can
classify text into 3 categories:

- safe,
- sensitive,
- unsafe.
  
If you are building an application for end users, you can use the filter to detect whether
the model is returning any inappropriate content.
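
The three categories can be handled in application code with a small mapping. The sketch below assumes the filter answers with the raw labels "0", "1", and "2" (an assumption, not something stated above), and `can_display` is a hypothetical policy, not part of the OpenAI library.

```python
# Assumption: the filter answers with the raw labels "0", "1", or "2".
LABELS = {"0": "safe", "1": "sensitive", "2": "unsafe"}


def classify_label(label: str) -> str:
    """Translate a raw filter label into a category; unknown labels count as unsafe."""
    return LABELS.get(label.strip(), "unsafe")


def can_display(label: str) -> bool:
    """Hypothetical policy: show only text that the filter marked as safe."""
    return classify_label(label) == "safe"


for raw in ("0", "1", "2"):
    print(raw, classify_label(raw), can_display(raw))
```

Treating an unknown label as unsafe is the cautious choice: a malformed answer from the filter then blocks the text instead of letting it through.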

## **Listing all Models**

In [8]:
import os
import openai

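def init_api():
    # Load KEY=VALUE pairs from .env into the process environment.
    with open(".env") as env:
        for line in env:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Split only on the first "=" so that values may contain "=".
            key, value = line.split("=", 1)
            os.environ[key] = value

    openai.api_key = os.environ.get("OPENAI_API_KEY")
    openai.organization = os.environ.get("ORG_ID")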

init_api()
models = openai.Model.list()
for model in models["data"]:
    print(model["id"])

whisper-1
babbage
davinci
text-davinci-edit-001
babbage-code-search-code
text-similarity-babbage-001
code-davinci-edit-001
text-davinci-001
ada
babbage-code-search-text
babbage-similarity
code-search-babbage-text-001
text-curie-001
code-search-babbage-code-001
text-ada-001
text-similarity-ada-001
curie-instruct-beta
ada-code-search-code
ada-similarity
code-search-ada-text-001
text-search-ada-query-001
davinci-search-document
ada-code-search-text
text-search-ada-doc-001
davinci-instruct-beta
text-similarity-curie-001
code-search-ada-code-001
ada-search-query
text-search-davinci-query-001
curie-search-query
davinci-search-query
babbage-search-document
ada-search-document
text-search-curie-query-001
text-search-babbage-doc-001
curie-search-document
text-search-curie-doc-001
babbage-search-query
text-babbage-001
text-search-davinci-doc-001
text-search-babbage-query-001
curie-similarity
text-embedding-ada-002
gpt-3.5-turbo-0613
curie
gpt-3.5-turbo-16k-0613
text-similarity-davinci-001
text-d
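
The list is long, so it can help to narrow it down by family. The sketch below works on a plain list of IDs, a handful of which are copied from the output above; `filter_models` is a hypothetical helper, not part of the OpenAI library.

```python
def filter_models(model_ids, prefix):
    """Return the model IDs that start with the given prefix, sorted alphabetically."""
    return sorted(model_id for model_id in model_ids if model_id.startswith(prefix))


# A few IDs copied from the output above.
sample_ids = [
    "whisper-1",
    "babbage",
    "davinci",
    "text-curie-001",
    "text-ada-001",
    "text-embedding-ada-002",
    "gpt-3.5-turbo-0613",
]

print(filter_models(sample_ids, "text-"))
print(filter_models(sample_ids, "gpt-"))
```

With the real response, the same helper could be fed `[model["id"] for model in models["data"]]`.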

# **Using GPT Text Completions**

Once you have authenticated your application, you can start using the OpenAI API to perform
completions. To do this, you need to use the OpenAI Completion API.
The Completion API gives developers access to OpenAI’s models, which makes
generating completions straightforward.
Begin by providing the start of a sentence. The model will then predict one or more possible
completions and, on request, the log probability of each token.

In [25]:
next = openai.Completion.create(model="text-davinci-003",
                                prompt="Qué día de la semena es",
                                max_tokens=7,
                                temperature=0
                               )

print(next)
print(next.choices[0].text)

{
  "id": "cmpl-7UPaG2OlEHiOSyKO2U2yuWOUuQSbp",
  "object": "text_completion",
  "created": 1687482540,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": " hoy?\n\nHoy",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 7,
    "total_tokens": 16
  }
}
 hoy?

Hoy


This result has an index of 0. The API also returned the “finish_reason”, which was “length” in this case:
generation stopped because the completion reached the limit set by the “max_tokens” parameter,
not because the model had finished its answer. In our case, we set this value to 7.
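
The “finish_reason” field can be checked in code to detect a completion that was cut short. The sketch below uses a plain dictionary shaped like the response printed above, so it runs without calling the API; `was_truncated` is a hypothetical helper name.

```python
def was_truncated(response) -> bool:
    """Return True if any choice stopped because it hit the max_tokens limit."""
    return any(choice.get("finish_reason") == "length" for choice in response["choices"])


# Same shape as the response printed above.
sample_response = {
    "object": "text_completion",
    "model": "text-davinci-003",
    "choices": [
        {"text": " hoy?\n\nHoy", "index": 0, "logprobs": None, "finish_reason": "length"}
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16},
}

if was_truncated(sample_response):
    print("The completion was cut off; consider raising max_tokens.")
```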

Note: Tokens are common sequences of characters found in text. A good rule of thumb
is that one token corresponds to about 4 characters of ordinary English text. This
means that 100 tokens are about the same as 75 words.
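
The rule of thumb from the note can be turned into a quick estimate. This is only an approximation for English text, not a real tokenizer, and both function names are hypothetical.

```python
import math


def estimate_tokens(text: str) -> int:
    """Estimate the token count, assuming about 4 characters per token."""
    return math.ceil(len(text) / 4)


def estimate_words(token_count: int) -> int:
    """Estimate the word count, assuming 100 tokens are about 75 words."""
    return round(token_count * 0.75)


print(estimate_tokens("Once upon a time"))
print(estimate_words(100))
```

The estimate is less reliable outside English: the 13-character Spanish prompt “Érase una vez” used in the following cells is reported by the API as 6 prompt tokens, while this rule would suggest 4.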

### **Controlling the Output’s Token Count**

Let’s test with a longer example, which means a greater number of tokens (15), using the
prompt “Érase una vez” (“Once upon a time”):

In [24]:
next = openai.Completion.create(model="text-davinci-003",
                                prompt="Érase una vez",
                                max_tokens=15,
                                temperature=0
                               )

print(next)

{
  "id": "cmpl-7UPZ1iR7KSxmXyx2oPR6mUY49Ho3v",
  "object": "text_completion",
  "created": 1687482463,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": " una princesa que viv\u00eda en un castillo. Era m",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 15,
    "total_tokens": 21
  }
}


### **Logprobs**

To see the alternatives the model considered, we can use the “logprobs” parameter. For example, setting logprobs to
2 returns, for each position, the log probabilities of the two most likely tokens.

In [28]:
next = openai.Completion.create(model="text-davinci-003",
                                prompt="Érase una vez",
                                max_tokens=15,
                                temperature=0,
                                logprobs=2
                                )

print(next)

{
  "id": "cmpl-7UPk14Seq5s1RrYTt9iuv8bNAN2SZ",
  "object": "text_completion",
  "created": 1687483145,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": " una princesa que viv\u00eda en un castillo. Era m",
      "index": 0,
      "logprobs": {
        "tokens": [
          " un",
          "a",
          " princes",
          "a",
          " que",
          " v",
          "iv",
          "\u00eda",
          " en",
          " un",
          " cast",
          "illo",
          ".",
          " Era",
          " m"
        ],
        "token_logprobs": [
          -0.17570852,
          -0.5495857,
          -1.0853332,
          -0.001761513,
          -0.8632536,
          -0.10063593,
          -0.00031691935,
          -0.0046087205,
          -0.08829467,
          -0.023945194,
          -1.1286361,
          -7.584048e-06,
          -1.3497478,
          -1.234629,
          -0.538533
        ],
        "top_logprobs": [
          {
            " un": -0.17570

You can see that each token has a log probability associated with it: the closer the value is to 0, the
more likely the token. With a temperature of 0, the API picks the most likely candidate at each position
(see top_logprobs). For instance, in a run where the candidates are “\n” (-1.1709108) and “there” (-0.9263134),
the API returns “there”, since -0.9263134 is greater than -1.1709108.
Likewise, the API selects “was” instead of “lived”, since -0.2422086 is greater than -2.040701. The same
applies to the other positions.

With logprobs set to 2, each position has two candidate tokens. The API returns the log probability of each one,
along with the sentence formed by the most likely tokens. We can increase the number of candidates to 5. According to OpenAI, the maximum value for logprobs is 5.
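
Log probabilities are easier to read once converted back to ordinary probabilities with the exponential function. The sketch below uses the “was” and “lived” values quoted above; the helper names are hypothetical.

```python
import math


def to_probability(logprob: float) -> float:
    """Convert a natural-log probability into a probability between 0 and 1."""
    return math.exp(logprob)


def most_likely(candidates: dict) -> str:
    """Return the candidate token with the highest log probability."""
    return max(candidates, key=candidates.get)


# Values quoted in the text above.
candidates = {"was": -0.2422086, "lived": -2.040701}

for token, logprob in candidates.items():
    print(f"{token!r}: {to_probability(logprob):.1%}")

print("selected:", most_likely(candidates))
```

A log probability of 0 corresponds to a probability of 100%, which is why values closer to 0 win.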

### **Controlling Creativity: The Sampling Temperature**

The next parameter we can customize is the temperature. It can be used to make the model more
creative, but creativity comes with some risks: the higher the temperature, the less predictable the output.
For a more creative application, we could raise the temperature above 0, to values such as 0.2, 0.3, 0.4, 0.5, and 0.6.
The maximum temperature is 2.

In [30]:
next = openai.Completion.create(model="text-davinci-003",
                                prompt="Érase una vez",
                                max_tokens=15,
                                temperature=2
                                )

print(next)

{
  "id": "cmpl-7UPqeTe8ctd5t98VX4dEeOZrdLLd6",
  "object": "text_completion",
  "created": 1687483556,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": " yo gritando Via Prin Mer burbufor min cocatsulfit hamb",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 15,
    "total_tokens": 21
  }
}
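
The garbled text above is what a very flat probability distribution looks like. The sketch below shows the standard way temperature is applied in sampling (dividing the scores by the temperature before a softmax). It illustrates the general technique and is not necessarily how the OpenAI API implements it; the scores are made-up values.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, flattened or sharpened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in scores]
    highest = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for three candidate tokens.
scores = [2.0, 1.0, 0.0]

for temperature in (0.2, 1.0, 2.0):
    probabilities = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

At a low temperature nearly all probability goes to the top candidate, while at 2 the candidates become much closer to equally likely, so unlikely tokens are sampled more often.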


### **Sampling with “top_p”**
Alternatively, we could use the top_p parameter. For example, using 0.5 means that only the most likely
tokens whose cumulative probability mass reaches 50% are considered. Using 0.1 means that only the tokens
comprising the top 10% of the probability mass are considered.

It is recommended to use either the top_p parameter or the temperature parameter, but not both.
The top_p parameter is also called nucleus sampling or top-p sampling.

In [31]:
next = openai.Completion.create(model="text-davinci-003",
                                prompt="Érase una vez",
                                max_tokens=15,
                                top_p=.9,
                                )

print(next)

{
  "id": "cmpl-7UPuxkGvR0uwPep2IFogNKCYeNRNC",
  "object": "text_completion",
  "created": 1687483823,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": ", hace mucho tiempo, un reino hermoso",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 15,
    "total_tokens": 21
  }
}
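
The idea behind top_p can be sketched as a filter over a token distribution: keep the most likely tokens until their cumulative probability reaches top_p, then renormalize. The tokens and probabilities below are made up for illustration, and `nucleus` is a hypothetical helper that shows the general technique, not the API's internal code.

```python
def nucleus(probabilities: dict, top_p: float) -> dict:
    """Keep the most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, probability in ranked:
        kept[token] = probability
        cumulative += probability
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(kept.values())
    return {token: probability / total for token, probability in kept.items()}


# Made-up distribution over four candidate tokens.
distribution = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}

print(nucleus(distribution, 0.5))
print(nucleus(distribution, 0.9))
```

With top_p set to 0.5 only the single most likely token survives, whereas 0.9 keeps three of the four candidates and drops only the least likely one.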


### **Streaming the Results**

Another common parameter is “stream”. It instructs the API to
return a stream of tokens instead of a single block containing all tokens. In this case, the API returns a
generator that yields tokens in the order they were generated.

In [38]:
next = openai.Completion.create(model="text-davinci-003",
                                prompt="Érase una vez",
                                max_tokens=7,
                                stream=True,
                                )

print(type(next))

print(*next, sep='\n')

<class 'generator'>
{
  "id": "cmpl-7UQ2xnH3no6jx8ZWuDhAPUtU0oVCB",
  "object": "text_completion",
  "created": 1687484319,
  "choices": [
    {
      "text": " un",
      "index": 0,
      "logprobs": null,
      "finish_reason": null
    }
  ],
  "model": "text-davinci-003"
}
{
  "id": "cmpl-7UQ2xnH3no6jx8ZWuDhAPUtU0oVCB",
  "object": "text_completion",
  "created": 1687484319,
  "choices": [
    {
      "text": "a",
      "index": 0,
      "logprobs": null,
      "finish_reason": null
    }
  ],
  "model": "text-davinci-003"
}
{
  "id": "cmpl-7UQ2xnH3no6jx8ZWuDhAPUtU0oVCB",
  "object": "text_completion",
  "created": 1687484319,
  "choices": [
    {
      "text": " princes",
      "index": 0,
      "logprobs": null,
      "finish_reason": null
    }
  ],
  "model": "text-davinci-003"
}
{
  "id": "cmpl-7UQ2xnH3no6jx8ZWuDhAPUtU0oVCB",
  "object": "text_completion",
  "created": 1687484319,
  "choices": [
    {
      "text": "a",
      "index": 0,
      "logprobs": null,
      "finish_

In [41]:
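# A generator can be consumed only once: re-run the request cell above if the
# stream was already printed, otherwise this loop has nothing left to yield.
for i in next:
    print(i['choices'][0]['text'])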
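
To rebuild the full text from a stream, the pieces can be joined in order. The sketch below replaces the API call with a hypothetical `fake_stream` generator that yields chunks shaped like the ones printed above, so it runs offline. It also shows that a generator is empty after its first pass.

```python
def fake_stream():
    """Stand-in for the API stream: yields chunks shaped like the output above."""
    for piece in [" un", "a", " princes", "a"]:
        yield {
            "object": "text_completion",
            "choices": [
                {"text": piece, "index": 0, "logprobs": None, "finish_reason": None}
            ],
            "model": "text-davinci-003",
        }


def collect_text(chunks) -> str:
    """Join the text of every streamed chunk, in the order received."""
    return "".join(chunk["choices"][0]["text"] for chunk in chunks)


stream = fake_stream()
print(repr(collect_text(stream)))  # the first pass consumes the generator
print(repr(collect_text(stream)))  # the second pass finds nothing left
```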


## **Examples:** 

### **Extracting keywords**

In [46]:
# Keep a space before each trailing backslash so the joined lines do not run words together.
prompt = "Python es un lenguaje de programación ampliamente utilizado en las aplicaciones web, el \
desarrollo de software, la ciencia de datos y el machine learning (ML). Los desarrolladores utilizan \
Python porque es eficiente y fácil de aprender, además de que se puede ejecutar en muchas plataformas \
diferentes. El software Python se puede descargar gratis, se integra bien a todos los tipos de sistemas \
y aumenta la velocidad del desarrollo."
prompt = prompt + "\n\nPalabras clave:"

In [47]:
# The result holds keywords here, so the variable is named accordingly.
keywords = openai.Completion.create(model="text-davinci-002",
                                    prompt=prompt,
                                    temperature=0.5,
                                    max_tokens=300
                                   )

print(keywords)

{
  "id": "cmpl-7UQHQMuKb1tZgQjF8V19iujca9whB",
  "object": "text_completion",
  "created": 1687485216,
  "model": "text-davinci-002",
  "choices": [
    {
      "text": " Python, lenguaje de programaci\u00f3n, software",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 148,
    "completion_tokens": 12,
    "total_tokens": 160
  }
}
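
The keywords come back as a single comma-separated string, so a small post-processing step turns them into a list. The sketch below runs on the text returned above; `parse_keywords` is a hypothetical helper name.

```python
def parse_keywords(completion_text: str) -> list:
    """Split a comma-separated completion into a clean list of keywords."""
    return [part.strip() for part in completion_text.split(",") if part.strip()]


# The "text" field of the response printed above.
completion_text = " Python, lenguaje de programaci\u00f3n, software"

print(parse_keywords(completion_text))
```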


### **Generating Tweets**

In [52]:
# Keep a space before each trailing backslash so the joined lines do not run words together.
prompt = "Python es un lenguaje de programación ampliamente utilizado en las aplicaciones web, el \
desarrollo de software, la ciencia de datos y el machine learning (ML). Los desarrolladores utilizan \
Python porque es eficiente y fácil de aprender, además de que se puede ejecutar en muchas plataformas \
diferentes. El software Python se puede descargar gratis, se integra bien a todos los tipos de sistemas \
y aumenta la velocidad del desarrollo."
prompt = prompt + "\n\nTweet en español:"

In [53]:
tweet = openai.Completion.create(model="text-davinci-002",
                                 prompt=prompt,
                                 temperature=0.5,
                                 max_tokens=300
                                )

print(tweet)

{
  "id": "cmpl-7UQKwcwDAHBccCdfpArYAn6eFrm3T",
  "object": "text_completion",
  "created": 1687485434,
  "model": "text-davinci-002",
  "choices": [
    {
      "text": "\n\nPython es un lenguaje de programaci\u00f3n muy utilizado en aplicaciones web, software, ciencia de datos y machine learning. Los desarrolladores lo usan porque es eficiente y f\u00e1cil de aprender, adem\u00e1s de que funciona en muchas plataformas diferentes. El software Python se puede descargar gratis, se integra bien a todos los tipos de sistemas y acelera el desarrollo.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 149,
    "completion_tokens": 120,
    "total_tokens": 269
  }
}
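
Both examples follow the same pattern: a source text, a blank line, and a short label ending in a colon that tells the model what to produce. A minimal sketch of that pattern, with `build_prompt` as a hypothetical helper:

```python
def build_prompt(text: str, label: str) -> str:
    """Append a task label to the source text, separated by a blank line."""
    return f"{text.strip()}\n\n{label.strip()}:"


source_text = "Python es un lenguaje de programación ampliamente utilizado."

print(build_prompt(source_text, "Palabras clave"))
print(build_prompt(source_text, "Tweet en español"))
```

Keeping the source text in one variable and changing only the label makes it easy to reuse the same passage for several tasks.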
