# Creating basic chat completions

Chat models are a class of GPT models that take a series of messages as input, and return a model-generated message as output. All newer GPT models released by OpenAI are chat models, including GPT-3.5-Turbo and GPT-4 (which are used in ChatGPT). In this notebook, we will show how to use these models to generate chat completions.

## 1. Setting up the environment

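First, install the `openai` library in your environment by running the `pip` command below. Inside a notebook, the `%pip` magic installs into the active kernel; outside a notebook, you would typically run `pip install` in a terminal or command prompt. The version is pinned below 1.0.0 because this notebook uses the `openai.ChatCompletion` interface, which was removed in version 1.0.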

In [None]:
# if needed, install and/or upgrade to the latest pre-1.0 version of the OpenAI Python library
%pip install --upgrade "openai<1.0.0"

Next we need to import the `openai` library so we can use it in our code.

In [2]:
# import the OpenAI Python library for calling the OpenAI API
import openai

Now we can use the `openai` library to configure the API endpoint and API key that we will use to access the Azure OpenAI API.
Head over to [ai.cosmoconsult.com/me](https://ai.cosmoconsult.com/me) to get your personal API key for this training series and paste it below.

In [3]:
# setup parameters for using an Azure OpenAI endpoint
openai.api_type = "azure"
openai.api_key = "<your_api_key>"                           # The API key for your Azure OpenAI resource -> Get yours from https://ai.cosmoconsult.com/me
openai.api_base = "https://apis.ai.cosmoconsult.com"        # The base URL for your Azure OpenAI resource. e.g. "https://<your resource name>.openai.azure.com"
openai.api_version = "2023-07-01-preview"                   # The API version for your Azure OpenAI resource. e.g. "2023-07-01-preview"
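
If you would rather not paste the key into the notebook itself, one option is to read the settings from environment variables. The sketch below is only an illustration: the helper `load_azure_settings` and the variable names `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` are assumptions made for this example, not names the notebook or the pinned library version requires.

```python
import os

def load_azure_settings(environ=None):
    """Collect the four settings from the cell above, preferring environment variables.

    The environment variable names are hypothetical and chosen for this example only.
    """
    environ = os.environ if environ is None else environ
    return {
        "api_type": "azure",
        "api_key": environ.get("AZURE_OPENAI_API_KEY", "<your_api_key>"),
        "api_base": environ.get("AZURE_OPENAI_ENDPOINT", "https://apis.ai.cosmoconsult.com"),
        "api_version": "2023-07-01-preview",
    }

settings = load_azure_settings()

# Print everything except the key itself, so the secret never ends up in the notebook output.
for name, value in settings.items():
    print(name, "=", "***" if name == "api_key" else value)
```

To apply the settings, assign each entry to the attribute of the same name on the `openai` module, for example `openai.api_key = settings["api_key"]`.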

## 2. Using an Azure OpenAI resource to generate chat completions

With the API endpoint and API key configured, we can use the `openai.ChatCompletion` class to generate content with GPT-3.5-Turbo or GPT-4. This class is a thin wrapper around the API that makes it easier to post requests and parse the responses. You can also [post requests to the API directly](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions), without using the `openai` library.
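
To give a rough idea of what a direct request looks like, the sketch below builds (but does not send) one using only Python's standard library. The URL pattern and the `api-key` header follow the Azure OpenAI REST reference linked above. The helper name `build_chat_request` and the resource name `my-resource` are made up for this example, and a custom gateway may use a different URL layout.

```python
import json
import urllib.request

def build_chat_request(api_base, deployment_id, api_version, api_key, messages, **params):
    """Build (but do not send) a chat completions request for an Azure OpenAI resource."""
    url = (
        f"{api_base.rstrip('/')}/openai/deployments/{deployment_id}"
        f"/chat/completions?api-version={api_version}"
    )
    # The request body is plain JSON: the messages plus any optional parameters.
    body = json.dumps({"messages": messages, **params}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

request = build_chat_request(
    api_base="https://my-resource.openai.azure.com",  # hypothetical resource name
    deployment_id="gpt-35-turbo",
    api_version="2023-07-01-preview",
    api_key="<your_api_key>",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who was Marie Curie?"},
    ],
    max_tokens=150,
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response body is the same JSON structure that the `openai` library returns.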

The main input is the `messages` parameter. `messages` must be a list of message objects, where each object has a role (either `system`, `user`, or `assistant`) and content.
- The first message is typically a `system` message, which contains global instructions and context for the model. You can use this to tell the model how to behave, how to interpret the user's messages or how to structure the response.
- `user` messages typically contain user input or other information that the model should use to generate a response.
- The model's previous responses are stored as messages with the role `assistant`.
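
The roles above can be illustrated with a short multi-turn conversation. This is a minimal sketch: `add_message` is a hypothetical helper written for this example (it is not part of the `openai` library), and the assistant text is hand-written rather than real model output.

```python
ALLOWED_ROLES = ("system", "user", "assistant")

def add_message(messages, role, content):
    """Return a new message list with one more message appended."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return messages + [{"role": role, "content": content}]

# Start with global instructions, then alternate user input and model replies.
conversation = [{"role": "system", "content": "You are a helpful assistant."}]
conversation = add_message(conversation, "user", "Who was Marie Curie?")
conversation = add_message(conversation, "assistant", "Marie Curie was a Polish-born physicist and chemist.")
conversation = add_message(conversation, "user", "Which Nobel Prizes did she win?")

for message in conversation:
    print(f"{message['role']:>9}: {message['content']}")
```

Because the API itself is stateless, the whole list (including earlier `assistant` messages) is sent again with every request so the model can see the conversation so far.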

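You also have to specify the `deployment_id` parameter (or, when using the native OpenAI API, the `model` parameter). On Azure, this is the name of the model deployment in your Azure OpenAI resource (e.g. `gpt-35-turbo`); with the native OpenAI API, it is the ID of the model you want to use (e.g. `gpt-4`).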

Other than that, you can specify multiple optional parameters to control the behavior of the model. For example, you can specify the `max_tokens` parameter to control the length of the response, or the `temperature` parameter to control the creativity of the response. You can find a full list of parameters in the [Azure OpenAI documentation](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/reference#chat-completions) (or in the [OpenAI API documentation](https://platform.openai.com/docs/api-reference/chat)).
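
As a small illustration of how the required and optional parameters fit together, the sketch below assembles the keyword arguments for a call and rejects values outside the documented ranges. The helper `build_chat_kwargs` is hypothetical and covers only a few of the available parameters.

```python
def build_chat_kwargs(name, messages, api_type="azure",
                      temperature=None, top_p=None, max_tokens=None):
    """Assemble keyword arguments for a chat completion call (illustrative helper)."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    # Azure expects a deployment name, the native OpenAI API a model ID.
    target = {"deployment_id": name} if api_type == "azure" else {"model": name}
    optional = {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}

    kwargs = {**target, "messages": messages}
    # Leave out anything that was not set so the API falls back to its defaults.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs

kwargs = build_chat_kwargs(
    "gpt-35-turbo",
    [{"role": "user", "content": "Who was Marie Curie?"}],
    temperature=0.2,
    max_tokens=150,
)
print(kwargs)
```

The resulting dictionary can be unpacked into the call, e.g. `openai.ChatCompletion.create(**kwargs)`.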

In [5]:
# a sample API call for chat completions looks as follows:
try:
    response = openai.ChatCompletion.create(
        deployment_id = "gpt-35-turbo", # The deployment ID for your Azure OpenAI resource
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who was Marie Curie?"},
        ],
        temperature = 0.2,              # Number between 0.0 and 2.0. Higher values will make the output more random, while lower values will make it more focused and deterministic. It is recommended to alter this or top_p, but not both.
        n = 1,                          # How many completion choices to generate on each call.
        stop = None,                    # One or more sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
        max_tokens = 150,               # The maximum number of tokens to generate.
        stream = False,                 # Whether to stream back partial progress. If set, tokens are sent incrementally as server-sent events while they are generated. The printing loop below expects stream = False.
        # top_p = 1,                    # Number between 0.0 and 1.0. Controls diversity via nucleus sampling: 0.5 means only the most likely tokens that together make up 50% of the probability mass are considered. It is recommended to alter this or temperature, but not both.
        # presence_penalty = 0.0,       # Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
        # frequency_penalty = 0.0,      # Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
    )
    # print the response
    for choice in response['choices']:
        print(choice['message']['content'])
except Exception as e:
    print(e)

Marie Curie was a Polish-born physicist and chemist who conducted pioneering research on radioactivity. She was the first woman to win a Nobel Prize, and the first person to win two Nobel Prizes in different fields (physics and chemistry). Her discoveries led to the development of important medical technologies, including X-rays and radiation therapy for cancer. Curie was also a trailblazer for women in science, and her achievements continue to inspire generations of scientists around the world.


As you can see, the `response` we get from the API contains an array called `choices`. Each element of this array holds a `message` object with the content generated by the GPT model. Depending on the `n` parameter you set in your API call, the `choices` array can contain multiple elements. If you set `n=1`, the generated message will be the first (and only) element in the `choices` array, and you can access it with `response['choices'][0]['message']['content']`.
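
To show what handling several choices might look like, the sketch below works on a hand-written dictionary that mimics the shape of a response for `n=2`. The contents of `sample_response` are invented for this example and are not real API output; `extract_contents` is a hypothetical helper.

```python
# Hand-written stand-in for an API response with n=2 (not real model output).
sample_response = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "First answer."},
        },
        {
            "index": 1,
            "finish_reason": "length",  # generation stopped because max_tokens was reached
            "message": {"role": "assistant", "content": "Second answer, cut"},
        },
    ]
}

def extract_contents(response):
    """Return the generated text of every choice, in order."""
    return [choice["message"]["content"] for choice in response["choices"]]

contents = extract_contents(sample_response)
first_content = sample_response["choices"][0]["message"]["content"]

# Choices that hit the token limit may end mid-sentence.
truncated = [c["index"] for c in sample_response["choices"] if c["finish_reason"] == "length"]

print(contents)
print("truncated choices:", truncated)
```

Checking `finish_reason` is a simple way to notice when `max_tokens` was too small for a complete answer.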

The response also contains other information, such as the number of tokens processed or, if set up in the Azure OpenAI resource, information about potentially harmful content in the user input and in the content generated by the model. Have a look at the complete response to see what other information you can get from the API:

In [6]:
print(response)

{
  "id": "chatcmpl-7wTE1zaqco7ImqLc8eMN1FgwMmvnF",
  "object": "chat.completion",
  "created": 1694169721,
  "model": "gpt-35-turbo",
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Marie Curie was a Polish-born physicist and chemist who conducted pioneering research on radioactivity. She was the first woman to win a Nobel Prize, and the first person to win two Nobel Prizes in different fields (physics and chemistry). Her discoveries 

Now let's get creative and generate some chat completions! Experiment with the parameters and see how they affect the output of the model. Try out different system and user messages, and see how the model responds to them. E.g., can you craft a system message that always creates a translation of the user's message, without any other content and no matter what the user says or asks? Or can you create a system message that always generates a response that contains a specific word or phrase?

Happy experimenting!