# Text Generation

The latest models, gpt-4 (and gpt-4 turbo) and gpt-3.5-turbo, are accessed through the **chat completions API** endpoint.

## Chat Completions API

The Chat Completions API works well for both:
* multi-turn conversations
* single-task completions

In [19]:
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "I don't know!"},  # hand-written example assistant reply
    {"role": "user", "content": "Where was it played?"},
    {"role": "assistant", "content": "Who cares?"},  # hand-written example assistant reply
    {"role": "user", "content": "Where was it played?"}
  ]
)
# Note: in this run, the hand-written assistant replies above did not appear to influence the final answer

In [21]:
print(response.choices[0].message)

ChatCompletionMessage(content='The 2020 World Series was played at the Globe Life Field in Arlington, Texas.', role='assistant', function_call=None, tool_calls=None)


The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. Note, however, that the system message is optional, and the model’s behavior without one is likely to be similar to using a generic message such as "You are a helpful assistant."

The user messages provide requests or comments for the assistant to respond to. Assistant messages store previous assistant responses, but can also be written by you to give examples of desired behavior.

Including conversation history is important when user instructions refer to prior messages. In the example above, the user’s final question of "Where was it played?" only makes sense in the context of the prior messages about the World Series of 2020. Because the models have no memory of past requests, all relevant information must be supplied as part of the conversation history in each request. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way.
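Because every request must carry the whole conversation, a small wrapper that accumulates messages can make this explicit. The following is only a minimal sketch: the `Conversation` class, the `send` callable, and `send_with_openai` are hypothetical names introduced for illustration, not part of the OpenAI library.

```python
class Conversation:
    """Keeps the running message list and resends all of it on every turn."""

    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_text, send):
        # `send` is a hypothetical callable: it receives the full message list
        # and returns the assistant's reply as a string.
        self.messages.append({"role": "user", "content": user_text})
        reply = send(list(self.messages))
        # Store the reply so the next request includes it as history.
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def send_with_openai(messages):
    # Assumed wiring to the Chat Completions API; needs OPENAI_API_KEY to be set.
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return response.choices[0].message.content
```

With this sketch, `chat = Conversation("You are a helpful assistant.")` followed by two `chat.ask(..., send_with_openai)` calls would send two messages on the first turn and four on the second, which is why long conversations eventually run into the token limit.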

## JSON mode

You can set `response_format` to `{ "type": "json_object" }` to enable JSON mode.

* When using JSON mode, **always** instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don't include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run until it reaches the token limit. To help ensure you don't forget, the API will throw an error if the string "JSON" does not appear somewhere in the context.
* The JSON in the message the model returns may be partial (i.e. cut off) if `finish_reason` is `length`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the token limit. To guard against this, check `finish_reason` before parsing the response.
* JSON mode does not guarantee that the output matches any specific schema, only that it is valid JSON and parses without errors.
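The `finish_reason` check described above could be wrapped in a small helper. This is a sketch under stated assumptions: `parse_json_reply` is a hypothetical name, and how to react to a truncated reply (retry, raise, fall back) is a design choice left to the caller.

```python
import json


def parse_json_reply(content, finish_reason):
    """Return the decoded JSON, or raise ValueError if the reply is unusable."""
    if finish_reason == "length":
        # Generation hit max_tokens or the context limit, so the JSON may be cut off.
        raise ValueError("reply was truncated (finish_reason='length')")
    if content is None:
        raise ValueError("reply has no content to parse")
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc


# Typical use with a Chat Completions response:
#   choice = response.choices[0]
#   data = parse_json_reply(choice.message.content, choice.finish_reason)
```

Because JSON mode does not enforce a schema, the caller would still need to check that expected keys (such as `"winner"`) are present.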

In [22]:
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-3.5-turbo-1106",
  response_format={ "type": "json_object" },
  messages=[
    {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ]
)

In [23]:
print(response.choices[0].message.content)

{
  "winner": "Los Angeles Dodgers"
}


## Reproducible Outputs

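To get mostly reproducible outputs, set the **seed** parameter to a fixed integer and keep all other request parameters identical across calls. The **system_fingerprint** field in the response identifies the backend configuration; if it changes between requests, outputs may differ even with the same seed: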
https://platform.openai.com/docs/guides/text-generation/reproducible-outputs
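A rough sketch of how `seed` and `system_fingerprint` might be used together follows. The helper names `ask_with_seed` and `compare_runs` are hypothetical, the seed value is arbitrary, and determinism is best-effort rather than guaranteed.

```python
def ask_with_seed(client, prompt, seed=12345, model="gpt-3.5-turbo-1106"):
    """Send one request with a fixed seed; return (text, system_fingerprint)."""
    response = client.chat.completions.create(
        model=model,
        seed=seed,        # same seed + same parameters -> mostly the same output
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content, response.system_fingerprint


def compare_runs(client, prompt, seed=12345):
    """Run the same request twice and report whether text and backend matched."""
    text_a, fp_a = ask_with_seed(client, prompt, seed)
    text_b, fp_b = ask_with_seed(client, prompt, seed)
    return {"same_text": text_a == text_b, "same_backend": fp_a == fp_b}


# Usage (requires an API key):
#   from openai import OpenAI
#   print(compare_runs(OpenAI(), "Name three primary colors."))
```

If `same_backend` is false, a difference in the text is more likely to come from a backend change than from the request itself.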

## Managing Tokens

Language models read and write text in chunks called tokens. For example, the string **"ChatGPT is great!"** is encoded into six tokens: ["Chat", "G", "PT", **" is"**, **" great"**, "!"].

### Token Usage

To see how many tokens are used by an API call, check the `usage` field in the API response (e.g., `response.usage.total_tokens`).

In [25]:
response.usage

CompletionUsage(completion_tokens=11, prompt_tokens=31, total_tokens=42)

To see how many tokens are in a text string without making an API call, use OpenAI’s [tiktoken](https://github.com/openai/tiktoken) Python library. Example code can be found in the OpenAI Cookbook’s guide on [how to count tokens with tiktoken](https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken).

Each message passed to the API consumes the number of tokens in the content, role, and other fields, plus a few extra for behind-the-scenes formatting. This may change slightly in the future.
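The per-message overhead can be estimated along the lines of the OpenAI Cookbook's counting recipe. The sketch below is an approximation: the constants (3 tokens per message, 1 per `name` field, 3 for priming the reply) are the Cookbook's values for some gpt-3.5-turbo and gpt-4 snapshots and may not hold for other models, and `encode` is a parameter introduced here so that any tokenizer can be plugged in.

```python
def count_message_tokens(messages, encode, tokens_per_message=3, tokens_per_name=1):
    """Estimate prompt tokens for a list of chat messages with string values."""
    total = 0
    for message in messages:
        total += tokens_per_message  # formatting overhead per message
        for key, value in message.items():
            total += len(encode(value))
            if key == "name":
                total += tokens_per_name
    return total + 3  # the reply itself is primed with a few formatting tokens


# With tiktoken installed, a real tokenizer could be supplied like this:
#   import tiktoken
#   enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
#   count_message_tokens(messages, enc.encode)
```

Comparing the estimate with `response.usage.prompt_tokens` from a real call is a reasonable way to check whether the constants fit the model in use.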

If a conversation has too many tokens to fit within a model’s maximum limit (e.g., more than 4097 tokens for gpt-3.5-turbo), you will have to truncate, omit, or otherwise shrink your text until it fits. Beware that if a message is removed from the messages input, the model will lose all knowledge of it.

Note that very long conversations are more likely to receive incomplete replies. For example, a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.
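One simple way to shrink a conversation is to drop the oldest non-system messages until it fits. This is only one possible strategy (summarizing old turns is another); `trim_to_budget` and the `count_tokens` callable are hypothetical names used for illustration.

```python
def trim_to_budget(messages, max_tokens, count_tokens):
    """Drop the oldest non-system messages until the conversation fits.

    `count_tokens` is a caller-supplied function that returns the token
    count for a list of messages.
    """
    trimmed = list(messages)  # work on a copy; the caller's list is untouched
    while count_tokens(trimmed) > max_tokens:
        for index, message in enumerate(trimmed):
            if message["role"] != "system":
                del trimmed[index]  # the model loses all knowledge of this message
                break
        else:
            # Only system messages remain and they still do not fit.
            raise ValueError("system messages alone exceed the token budget")
    return trimmed
```

Leaving headroom below the model's limit matters here, because whatever the prompt does not use is all that remains for the reply.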

## How should I set the temperature parameter?

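The temperature can range from 0 to 2. Lower values (e.g. 0.2) make the output more focused and deterministic; higher values (e.g. 0.8) make it more random.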
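To build intuition for what the parameter does, the sketch below applies the textbook definition of temperature: divide the token scores (logits) by the temperature before the softmax. This is a conceptual illustration with made-up scores, not a description of how the API samples internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
```

At a low temperature almost all of the probability lands on the top-scoring token, while at 2.0 the three candidates end up much closer together.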

# Function Calls

Connect LLMs to external tools.

In an API call, you can **describe** functions and have the model intelligently choose to **output a JSON object containing arguments** to call one or many functions. 

The Chat Completions API **does not call** the function; instead, the model **generates JSON** that you can use to **call the function** in your code.

https://platform.openai.com/docs/guides/function-calling
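Since the model only produces a function name and a JSON string of arguments, your code has to validate and dispatch them. The following is a minimal sketch of that step; `dispatch_tool_call` and the toy `add` function are hypothetical, and returning errors as JSON strings (so they can be passed back to the model) is one design choice among several.

```python
import json


def dispatch_tool_call(name, arguments_json, available_functions):
    """Run the function the model asked for and return its result as a string."""
    if name not in available_functions:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        kwargs = json.loads(arguments_json)
    except json.JSONDecodeError:
        # The model's JSON is not guaranteed to be valid.
        return json.dumps({"error": "arguments were not valid JSON"})
    if not isinstance(kwargs, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return str(available_functions[name](**kwargs))
    except TypeError as exc:
        # Raised for missing or unexpected arguments (and for TypeErrors inside the function).
        return json.dumps({"error": f"bad arguments: {exc}"})


def add(a, b):
    # Toy stand-in for a real tool.
    return a + b


print(dispatch_tool_call("add", '{"a": 1, "b": 2}', {"add": add}))
```

In a real conversation, `name` and `arguments_json` would come from `tool_call.function.name` and `tool_call.function.arguments`.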

In [4]:
from openai import OpenAI
import json

client = OpenAI()

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

def run_conversation():
    # Step 1: send the conversation and available functions to the model
    messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
    response_message = response.choices[0].message
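    # Workaround (presumably for requests being rejected because of null fields):
    # replace or drop None values before the message is sent back to the API
    if response_message.content is None:
        response_message.content = ""
    if response_message.function_call is None:
        del response_message.function_call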
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        messages.append(response_message)  # extend conversation with assistant's reply
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )  # extend conversation with function response
        second_response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=messages,
        )  # get a new response from the model where it can see the function response
        return second_response
    # No tool call was requested: return the model's direct reply
    return response

In [5]:
print(run_conversation())

ChatCompletion(id='chatcmpl-8JZTTmpyMEYWQkT4Px8OCgKBbT69F', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="Currently, the weather in San Francisco is 72°F and partly cloudy, in Tokyo it's 10°C and cloudy, and in Paris it's 22°C and sunny.", role='assistant', function_call=None, tool_calls=None))], created=1699675287, model='gpt-3.5-turbo-1106', object='chat.completion', system_fingerprint='fp_eeff13170a', usage=CompletionUsage(completion_tokens=36, prompt_tokens=169, total_tokens=205))


# Embeddings

OpenAI’s text embeddings measure the relatedness of text strings.

It is recommended to use text-embedding-ada-002 for nearly all use cases. Compared with the earlier embedding models, it is better, cheaper, and simpler to use. Read the [blog post announcement](https://openai.com/blog/new-and-improved-embedding-model).

Embeddings are commonly used for:

* Search (where results are ranked by relevance to a query string)
* Clustering (where text strings are grouped by similarity)
* Recommendations (where items with related text strings are recommended)
* Anomaly detection (where outliers with little relatedness are identified)
* Diversity measurement (where similarity distributions are analyzed)
* Classification (where text strings are classified by their most similar label)

### Example Code Snippets for the Embedding Use Cases Above

https://platform.openai.com/docs/guides/embeddings/use-cases
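As a small illustration of the search use case, relatedness between two embedding vectors is commonly measured with cosine similarity. The sketch below uses tiny made-up vectors so it runs without an API call; `cosine_similarity` and `rank_by_relevance` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_relevance(query_vec, doc_vecs):
    """Return document names ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy 2-dimensional vectors; real embeddings would come from something like
#   client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(rank_by_relevance([1.0, 0.1], docs))
```

The same similarity score underlies the other use cases in the list: clustering groups vectors with high mutual similarity, and anomaly detection looks for vectors with low similarity to everything else.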