## Challenges when calling the OpenAI API
**Error Handling**
- Displaying user-friendly error messages
- Alternatives for when the service is unavailable

**Moderation and Safety**
- Controlling unwanted inputs
- Minimizing the risk of data leaks

**Testing and Validation**
- Checking for responses that are off topic
- Testing for inconsistent behavior

**Communication with External Systems**
- Calling external functions and APIs
- Optimizing response times

In [1]:
from openai import OpenAI
import json
client = OpenAI()

In [4]:
prompt = "What challenges do you face when calling the OpenAI API in a production environment?"
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    # response_format={"type": "json", "properties": {"content": {"type": "string"}}}
)

print(response.choices[0].message.content)  # extract the reply text

When calling the OpenAI API in a production environment, several challenges may arise, including:

1. **Rate Limiting**: The OpenAI API has rate limits that restrict the number of requests you can make in a given time frame. Managing these limits is crucial to avoid interruptions in service and to ensure optimal performance.

2. **Cost Management**: Using the API incurs costs based on usage. Monitoring and optimizing usage to prevent unexpected expenses can be challenging, especially for applications with variable traffic.

3. **Latency**: Network latency can affect the responsiveness of your application. Ensuring quick API responses, especially in real-time applications, may require strategies such as caching frequent queries or optimizing the number of calls made.

4. **Error Handling**: The API may return errors for various reasons (e.g., exceeding rate limits, invalid inputs). Implementing robust error handling and retry mechanisms is essential to maintain a smooth user experience.

### Common errors
**Connection errors**
- Generally due to connection issues on either the user's or the service's side
- Examples: InternalServerError, APIConnectionError, APITimeoutError
- Potential solution:
    - Checking your connection configuration
    - Reaching out to support if that fails

**Resource limit errors**
- Generally due to limits on the frequency of requests or the amount of text passed
- Examples: ConflictError, RateLimitError
- Potential solution:
    - Checking limit restrictions
    - Ensuring requests are within limits

**Authentication errors**
- Generally due to using an invalid or expired API key
- Example: AuthenticationError

**Bad request errors**
- Generally due to a malformed request, such as passing an invalid role
- Example: BadRequestError

### Handle exceptions
- To recover from the errors above, your code needs to handle exceptions


In [6]:
import openai


try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "List five data science professions."}],
    )
except openai.AuthenticationError as e:
    # Invalid or missing API key
    print(f"OpenAI API failed to authenticate: {e}")
except openai.RateLimitError as e:
    # Too many requests or tokens in the current time window
    print(f"OpenAI API request exceeded rate limit: {e}")
except Exception as e:
    # Catch-all for anything not handled above
    print(f"Unable to generate a response. Exception: {e}")
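
The handlers above print technical details, while the challenges list also asks for user-friendly messages and an alternative when the service is unavailable. A minimal sketch of that idea follows, using only the standard library. The names `safe_completion`, `call_model`, and `FALLBACK_TEXT` are hypothetical, and a simulated failure stands in for a real API call.

```python
import logging

# Hypothetical message shown to the end user when the model call fails.
FALLBACK_TEXT = "Sorry, the assistant is unavailable right now. Please try again later."


def safe_completion(call_model, fallback_text=FALLBACK_TEXT):
    """Run call_model() and return its text, or a friendly fallback on failure."""
    try:
        return call_model()
    except Exception as exc:
        # Assumption: real code would catch the specific openai exceptions
        # instead of the broad Exception used in this sketch.
        logging.warning("Model call failed: %r", exc)  # detail for developers only
        return fallback_text


def flaky_call():
    """Stand-in for an API call made during an outage."""
    raise TimeoutError("simulated outage")


failed_reply = safe_completion(flaky_call)
ok_reply = safe_completion(lambda: "Data scientist, ML engineer")
print(failed_reply)
print(ok_reply)
```

In practice `call_model` would wrap the `client.chat.completions.create(...)` call, so the user sees the fallback text while the log keeps the exception.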

## Avoiding rate limits
- Rate limits are hit when too many requests or tokens are sent within a time unit, such as one minute.
- Ways to avoid them:
    - Retry: waiting a short time between requests
    - Batching: processing multiple messages in one request
    - Reducing tokens: measuring and cutting down the number of tokens
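
The retry idea in the list above can be sketched without any dependencies. This is an illustrative hand-rolled version of exponential backoff with random jitter, not the implementation of any particular library. The helper `retry_with_backoff` and its parameters are assumptions made for the example, and `flaky` simulates a call that fails twice before succeeding.

```python
import random
import time


def retry_with_backoff(func, max_attempts=6, base_delay=1.0, max_delay=60.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The upper bound doubles on each attempt, capped at max_delay.
            upper = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from concurrent clients.
            sleep(random.uniform(0, upper))


calls = {"count": 0}


def flaky():
    """Simulated request that fails on the first two calls."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, "after", calls["count"], "calls")
```

Passing `sleep=delays.append` keeps the demo instant; leaving the default `time.sleep` makes the function actually wait between attempts.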


## Retrying with tenacity

In [11]:
# Retrying
from tenacity import (retry, stop_after_attempt, wait_random_exponential)

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def get_response(model, message):
    response = client.chat.completions.create(
        model=model,
        messages=[message], 
        response_format={"type": "json_object"})
    return response.choices[0].message.content


prompt = "What challenges do you face when calling the OpenAI API in a production environment? Respond in JSON format."
response = get_response(
    model="gpt-4o-mini",
    message={"role": "user", "content": prompt}
)
print(response)  # get_response already returns the message text




{
  "challenges": {
    "1": {
      "challenge": "Rate Limiting",
      "description": "OpenAI API has usage limits that can affect the ability to handle high traffic or request bursts, requiring careful management of request flows."
    },
    "2": {
      "challenge": "Cost Management",
      "description": "Frequent or intensive use of the API can lead to significant costs, necessitating budget tracking and cost optimization strategies."
    },
    "3": {
      "challenge": "Latency",
      "description": "Response times can vary, impacting user experience, especially in real-time applications that demand low latency."
    },
    "4": {
      "challenge": "Error Handling",
      "description": "Handling various types of errors, including timeouts, unexpected responses, or service downtime, to maintain application stability."
    },
    "5": {
      "challenge": "Security and Privacy",
      "description": "Ensuring secure handling of user data and compliance with privacy regulation

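Because the reply is requested as JSON, it arrives as a string that still has to be parsed, and a reply that is cut off partway will not parse. The sketch below shows one way to guard against that with the standard `json` module. The helper name `parse_json_reply` and the two sample strings are made up for illustration.

```python
import json


def parse_json_reply(raw):
    """Return the parsed JSON reply, or None if the text is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Malformed or truncated reply: let the caller decide what to do.
        return None


# Illustrative strings, not real API output.
complete = '{"challenges": {"1": {"challenge": "Rate Limiting"}}}'
truncated = '{"challenges": {"1": {"challenge": "Rate Limi'

parsed = parse_json_reply(complete)
broken = parse_json_reply(truncated)
print(parsed["challenges"]["1"]["challenge"])
print(broken)
```

Returning `None` keeps the caller in control: it can retry the request, fall back to a default, or report the problem.
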
## Batching

In [13]:
countries = ["United States", "Ireland", "India"]
messages = [{"role": "system", "content": """You are given a series of countries and are asked to return the country and capital city. Provide each of the questions with an answer in the response as separate content."""}]

# Add one user message per country so all questions go out in a single request
messages.extend({"role": "user", "content": country} for country in countries)
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)



Country: United States  
Capital City: Washington, D.C.

---

Country: Ireland  
Capital City: Dublin

---

Country: India  
Capital City: New Delhi
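
With a longer list of inputs, a single request can grow too large, so one option is to split the inputs into fixed-size batches and send one request per batch. The sketch below only builds the `messages` lists and makes no API call. The helper `build_batches`, the shortened system prompt, and the batch size of 2 are assumptions for illustration.

```python
def build_batches(system_prompt, items, batch_size):
    """Return one messages list per batch of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for start in range(0, len(items), batch_size):
        # Every request starts with the same system message.
        messages = [{"role": "system", "content": system_prompt}]
        messages.extend(
            {"role": "user", "content": item}
            for item in items[start:start + batch_size]
        )
        batches.append(messages)
    return batches


countries = ["United States", "Ireland", "India", "Japan", "Brazil"]
batches = build_batches(
    "Return each country with its capital city.", countries, batch_size=2
)
for messages in batches:
    print(len(messages) - 1, "questions in this request")
```

Each element of `batches` could then be passed as the `messages` argument of a separate `client.chat.completions.create` call.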


## Reducing tokens
- Use `tiktoken` to count the number of tokens in a prompt before sending it


In [None]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4o-mini")
prompt = "Tokens can be full words, or groups of characters commonly grouped together: tokenization."
num_tokens = len(encoding.encode(prompt))
print("Number of tokens in prompt:", num_tokens)
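
Once tokens can be counted, the same encode and decode steps can be used to cut a prompt down to a budget. The sketch below is a rough illustration. `truncate_to_budget` is a hypothetical helper, and `WhitespaceEncoding` is a made-up stand-in that treats each word as one token so the example runs offline. A `tiktoken` encoding, which also exposes `encode` and `decode`, could be passed in its place.

```python
class WhitespaceEncoding:
    """Hypothetical stand-in encoder: one token per whitespace-separated word."""

    def encode(self, text):
        return text.split()

    def decode(self, tokens):
        return " ".join(tokens)


def truncate_to_budget(text, encoding, max_tokens):
    """Return text unchanged if it fits, else only its first max_tokens tokens."""
    tokens = encoding.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return encoding.decode(tokens[:max_tokens])


encoding = WhitespaceEncoding()
long_prompt = "Tokens can be full words or groups of characters"
short_prompt = truncate_to_budget(long_prompt, encoding, max_tokens=5)
print(short_prompt)
```

Hard truncation can cut a prompt mid-thought, so summarizing or dropping low-value context may be a better fit when the wording matters.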
