<a href="https://colab.research.google.com/github/nafis-neehal/new-tech-everyweek/blob/main/1_Asyncio_with_Backoff_Tenacity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Learn Asyncio, and Backoff/Tenacity
Asyncio - Python's standard library for asynchronous programming; helpful when you have multiple independent LLM API calls that do not need to run sequentially  
Backoff/Tenacity - two libraries that provide decorators for retrying a function when it fails
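
To make the benefit concrete, here is a minimal sketch comparing sequential and concurrent execution. `fake_llm_call` is a hypothetical stand-in that only sleeps (it is not a real API call), and the function names and delays are assumptions chosen for illustration.

```python
import asyncio
import time

async def fake_llm_call(prompt: str, delay: float = 0.1) -> str:
    # Hypothetical stand-in for a network-bound API call;
    # asyncio.sleep() yields control to the event loop while "waiting".
    await asyncio.sleep(delay)
    return f"answer to {prompt!r}"

async def run_sequentially(prompts):
    # Each call starts only after the previous one has finished.
    return [await fake_llm_call(p) for p in prompts]

async def run_concurrently(prompts):
    # All calls are in flight at once; gather() keeps the input order.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))

async def compare(prompts):
    start = time.perf_counter()
    sequential = await run_sequentially(prompts)
    sequential_time = time.perf_counter() - start

    start = time.perf_counter()
    concurrent = await run_concurrently(prompts)
    concurrent_time = time.perf_counter() - start
    return sequential, concurrent, sequential_time, concurrent_time

# In a notebook cell: seq, conc, t_seq, t_conc = await compare(["a", "b", "c", "d"])
```

Both runs return the same answers; the sequential time grows with the number of prompts, while the concurrent time should stay close to the delay of a single call.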

In [4]:
# asyncio ships with the Python standard library, so it does not need to be installed
!pip3 install backoff tenacity nest_asyncio --quiet
!pip3 install openai --quiet

In [37]:
import asyncio
import openai
import backoff
from google.colab import userdata

### Use of asyncio.gather()
Runs multiple awaitables concurrently and returns their results as a list, in the same order as the inputs, once all of them have finished

In [28]:
import nest_asyncio
nest_asyncio.apply()  # lets asyncio.run() work inside the notebook's already-running event loop

async def fetch_data(url):
    # Simulate fetching data from a URL
    # (In a real scenario, this would involve asynchronous I/O operations)
    return f"Data from {url}"

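async def main():
    urls = ['url'+str(i) for i in range(100)]
    # Calling fetch_data() only creates coroutine objects; nothing runs yet
    coroutines = [fetch_data(url) for url in urls]
    # gather() runs them concurrently and returns the results in input order
    final_results = await asyncio.gather(*coroutines)
    print(final_results)

asyncio.run(main())
#await main()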

['Data from url0', 'Data from url1', 'Data from url2', 'Data from url3', 'Data from url4', 'Data from url5', 'Data from url6', 'Data from url7', 'Data from url8', 'Data from url9', 'Data from url10', 'Data from url11', 'Data from url12', 'Data from url13', 'Data from url14', 'Data from url15', 'Data from url16', 'Data from url17', 'Data from url18', 'Data from url19', 'Data from url20', 'Data from url21', 'Data from url22', 'Data from url23', 'Data from url24', 'Data from url25', 'Data from url26', 'Data from url27', 'Data from url28', 'Data from url29', 'Data from url30', 'Data from url31', 'Data from url32', 'Data from url33', 'Data from url34', 'Data from url35', 'Data from url36', 'Data from url37', 'Data from url38', 'Data from url39', 'Data from url40', 'Data from url41', 'Data from url42', 'Data from url43', 'Data from url44', 'Data from url45', 'Data from url46', 'Data from url47', 'Data from url48', 'Data from url49', 'Data from url50', 'Data from url51', 'Data from url52', 'D
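
Two properties of `gather()` are worth seeing in isolation: results come back in input order even when tasks finish at different times, and `return_exceptions=True` places exceptions in the result list instead of raising the first one. The sketch below assumes a hypothetical `fetch_with_delay` helper that sleeps and fails for one made-up name.

```python
import asyncio

async def fetch_with_delay(name: str, delay: float) -> str:
    # Hypothetical helper: sleeps to simulate latency, fails for the name "bad".
    await asyncio.sleep(delay)
    if name == "bad":
        raise ValueError(f"could not fetch {name}")
    return f"Data from {name}"

async def gather_demo():
    coroutines = [
        fetch_with_delay("slow", 0.05),   # finishes last
        fetch_with_delay("bad", 0.01),    # raises ValueError
        fetch_with_delay("fast", 0.0),    # finishes first
    ]
    # With return_exceptions=True the ValueError is returned as a list item,
    # so one failing call does not hide the other results.
    return await asyncio.gather(*coroutines, return_exceptions=True)

# In a notebook cell: results = await gather_demo()
```

The list still has the "slow" result first and the "fast" result last, with the `ValueError` instance in the middle, so the caller can inspect each item with `isinstance(item, Exception)`.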

### Use of asyncio.as_completed()
Handy for handling tasks in the order in which they finish, so you can process each result as soon as it is available instead of waiting for all of them.

In [34]:
import nest_asyncio
nest_asyncio.apply()  # lets asyncio.run() work inside the notebook's already-running event loop

async def fetch_data(url):
    # Simulate fetching data from a URL
    # (In a real scenario, this would involve asynchronous I/O operations)
    return f"Data from {url}"

async def main():
    urls = ['url '+str(i) for i in range(10)]
    coroutines = [fetch_data(url) for url in urls]
    # as_completed() yields awaitables in the order in which they finish
    for completed_task in asyncio.as_completed(coroutines):
        result = await completed_task
        print(result)

asyncio.run(main())
#await main()

Data from url 6
Data from url 3
Data from url 7
Data from url 5
Data from url 8
Data from url 4
Data from url 9
Data from url 0
Data from url 2
Data from url 1
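
In the cell above every coroutine returns immediately, so the shuffled order most likely reflects how the tasks happen to be scheduled rather than real latency. A small sketch with explicit, made-up delays (the helper `fetch_after` and the delay values are assumptions) shows the completion order more clearly:

```python
import asyncio

async def fetch_after(url: str, delay: float) -> str:
    # Hypothetical helper: the delay stands in for network latency.
    await asyncio.sleep(delay)
    return f"Data from {url}"

async def collect_in_completion_order():
    delays = {"url 0": 0.15, "url 1": 0.05, "url 2": 0.10}
    coroutines = [fetch_after(url, delay) for url, delay in delays.items()]
    finished = []
    # The fastest coroutine is yielded first, regardless of list position.
    for next_done in asyncio.as_completed(coroutines):
        finished.append(await next_done)
    return finished

# In a notebook cell: print(await collect_in_completion_order())
```

With these delays the results arrive as url 1, url 2, url 0, that is, sorted by latency rather than by position in the input list.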


## OpenAI Chat Completion with Async Code with Backoff and asyncio.gather()

The answers are printed in the same order as the questions: gather() waits for every call to finish and returns the results in input order, and only then are they processed.

In [45]:
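# Assumes the API key is stored in Colab's user secrets under this name (as in the later cells)
client = openai.AsyncOpenAI(api_key=userdata.get('OPENAI_KEY_BS1'))

# Retry with exponentially growing waits whenever the API reports a rate limit
@backoff.on_exception(backoff.expo, openai.RateLimitError)
async def make_api_call_to_gpt(prompt, model="gpt-3.5-turbo"):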
    messages = [{"role": "user", "content": prompt}]
    response = await client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0
    )
    return response.choices[0].message.content

In [47]:
async def main():
  prompts = ["Who is the first president of USA?", "Describe in 2 lines how does Async programming work?", "What is your greatest fear?", "When is 0=1?", "How many 'r' is there in the word 'Strawberry'?"]
  results = [make_api_call_to_gpt(prompt) for prompt in prompts]
  final_results = await asyncio.gather(*results)
  for result in final_results:
    print(result)

# A notebook cell already runs inside an event loop, so asyncio.run(main()) raises a RuntimeError here unless nest_asyncio.apply() has been called
await main() # <- awaiting main() directly works in a notebook cell

George Washington
Async programming allows tasks to run concurrently, without blocking the main thread. It uses callbacks or promises to handle the completion of tasks asynchronously.
As an AI, I do not have personal fears or emotions.
In mathematics, the equation 0=1 is never true. This is because 0 and 1 are distinct numbers with different values, and they cannot be equal to each other.
There are 2 'r's in the word 'Strawberry'.
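
What the `backoff` decorator does can be approximated by hand. The following is a rough sketch of exponential backoff with jitter, not the backoff library's actual implementation; `TransientError`, `retry_with_backoff`, and `make_flaky_call` are hypothetical names, and the tiny base delay is chosen only so the example runs quickly.

```python
import asyncio
import random

class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as a rate limit."""

async def retry_with_backoff(make_call, *, max_tries=5, base=0.01, factor=2,
                             retry_on=(TransientError,)):
    for attempt in range(1, max_tries + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == max_tries:
                raise  # out of attempts: let the last error propagate
            # Jitter: sleep a random time up to base * factor**(attempt - 1),
            # so the upper bound doubles after every failed attempt.
            ceiling = base * factor ** (attempt - 1)
            await asyncio.sleep(random.uniform(0, ceiling))

def make_flaky_call(failures_before_success: int):
    # Builds a coroutine function that fails a fixed number of times first.
    state = {"calls": 0}

    async def call():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("simulated rate limit")
        return f"succeeded on call {state['calls']}"

    return call

# In a notebook cell: print(await retry_with_backoff(make_flaky_call(2)))
```

Note that `backoff.on_exception` as used above is given neither `max_tries` nor `max_time`, so it keeps retrying until the call succeeds; passing either keyword bounds the retries.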


## OpenAI Chat Completion with asyncio.as_completed()

Here the order of the printed answers can differ from the order of the questions, because the calls run concurrently and as_completed() hands back each result as soon as it is ready.

In [44]:
import asyncio
import openai
import nest_asyncio
nest_asyncio.apply()

async def chat_completion(client, prompt, **kwargs):
  messages = [{"role":"user", "content": prompt}]
  return await client.chat.completions.create(messages=messages, **kwargs)

async def run_chat_completions(client, prompts, **kwargs):
  calls = [chat_completion(client, prompt, **kwargs) for prompt in prompts]
  for completed_task in asyncio.as_completed(calls): # each completed task is an API call
    response = await completed_task
    print(response.choices[0].message.content)

async def main(prompts, **kwargs):
  #ensures that the client is automatically closed when it’s no longer needed, preventing resource leaks from occurring
  async with openai.AsyncOpenAI(api_key = userdata.get('OPENAI_KEY_BS1')) as client:
    await run_chat_completions(client, prompts, **kwargs)

prompts = ["Who is the first president of USA?", "Describe in 2 lines how does Async programming work?", "What is your greatest fear?", "When is 0=1?", "How many 'r' is there in the word 'Strawberry'?"]

asyncio.run(main(prompts, model="gpt-3.5-turbo", temperature=0))

George Washington
There are two 'r's in the word "Strawberry".
As an AI, I do not experience fear or emotions.
In mathematics, 0 does not equal 1. The statement "0=1" is always false in standard arithmetic.
Async programming allows tasks to run concurrently, without blocking the main thread. It uses callbacks or promises to handle the completion of tasks asynchronously.
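
Because the answers arrive out of order, it helps to carry the prompt's index along with each result. A minimal sketch, assuming a hypothetical `fake_completion` in place of the real API call (its delay depends on prompt length only to force an out-of-order arrival):

```python
import asyncio

async def fake_completion(prompt: str) -> str:
    # Hypothetical stand-in for the API call: longer prompts take longer.
    await asyncio.sleep(0.02 * len(prompt))
    return prompt.upper()

async def tagged(index: int, prompt: str):
    # Return the index together with the answer so the caller can match them.
    return index, await fake_completion(prompt)

async def answers_by_prompt(prompts):
    answers = [None] * len(prompts)
    arrival_order = []
    calls = [tagged(i, p) for i, p in enumerate(prompts)]
    for next_done in asyncio.as_completed(calls):
        index, answer = await next_done
        arrival_order.append(index)   # order in which the calls finished
        answers[index] = answer       # slot the answer back at its prompt's position
    return answers, arrival_order

# In a notebook cell: answers, order = await answers_by_prompt(["cccccc", "a", "bbb"])
```

Each answer can be processed the moment it arrives, and `answers` still lines up with `prompts` at the end.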


## Batching Requests
This code runs the 6 requests in two batches of 3: it waits for the first 3 prompts to finish before starting the next 3. Within a batch, answers are printed as they complete.

In [54]:
def generate_batches(calls, batch_size):
    for i in range(0, len(calls), batch_size):
        yield calls[i : i + batch_size]

In [60]:
async def chat_completion(client, prompt, **kwargs):
  messages = [{"role":"user", "content": prompt}]
  return await client.chat.completions.create(messages=messages, **kwargs)

async def run_chat_completions(client, prompts: list[str], **kwargs):
    calls = [chat_completion(client, prompt, **kwargs) for prompt in prompts]
    for idx, batch in enumerate(generate_batches(calls, batch_size = 3)):
        print(f"Batch No {idx+1}")
        print("----------------")
        for completed_task in asyncio.as_completed(batch):
            response = await completed_task
            print(response.choices[0].message.content)

async def main(prompts, **kwargs):
    async with openai.AsyncOpenAI(api_key = userdata.get('OPENAI_KEY_BS1'), timeout=60) as client:
        await run_chat_completions(client, prompts, **kwargs)

prompts = ["Who is the first president of USA?", "Who first landed on the Moon?",
           "Describe in 2 lines how does Async programming work?", "What is your greatest fear?",
           "When is 0=1?", "How many 'r' is there in the word 'Strawberry'?"]

asyncio.run(main(prompts, model="gpt-3.5-turbo"))

Batch No 1
----------------
George Washington
The first person to land on the Moon was American astronaut Neil Armstrong on July 20, 1969, as part of the Apollo 11 mission. He was followed shortly after by astronaut Edwin "Buzz" Aldrin.
Async programming allows code to run non-sequentially, performing tasks concurrently without blocking the main execution thread. It uses callbacks, promises, or async/await keywords to handle asynchronous operations and ensure smooth, responsive application performance.
Batch No 2
----------------
There are three 'r's in the word "Strawberry."
As an AI, I do not possess emotions or fear.
In mathematics, the statement "0=1" is never true. This is because 0 and 1 are two distinct numbers with different values and properties. In mathematical equations and expressions, it is not possible for 0 to equal 1.
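
With fixed batches, one slow request holds up the whole next batch. An alternative (a different technique from the batching above, shown only as a sketch) is to cap concurrency with `asyncio.Semaphore`, so a new request starts as soon as any slot frees up. `limited_gather` and `tracked_call` are hypothetical names, and `tracked_call` merely sleeps while recording how many calls run at once.

```python
import asyncio

async def limited_gather(coroutines, max_concurrent: int):
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(coroutine):
        # At most max_concurrent coroutines get past this point at a time.
        async with semaphore:
            return await coroutine

    return await asyncio.gather(*(run_one(c) for c in coroutines))

async def tracked_call(name: str, delay: float, stats: dict) -> str:
    # Hypothetical stand-in for an API call that records peak concurrency.
    stats["running"] += 1
    stats["peak"] = max(stats["peak"], stats["running"])
    await asyncio.sleep(delay)
    stats["running"] -= 1
    return name

async def semaphore_demo():
    stats = {"running": 0, "peak": 0}
    coroutines = [tracked_call(f"prompt {i}", 0.02, stats) for i in range(6)]
    results = await limited_gather(coroutines, max_concurrent=3)
    return results, stats["peak"]

# In a notebook cell: results, peak = await semaphore_demo()
```

All six results come back in input order, and the recorded peak never exceeds the limit of 3.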


## Tenacity decorators for error handling and retrying with constraints

This code retries any request that raises one of the errors listed below (see the openai error classes for more detail). stop_after_attempt(3) allows at most 3 attempts in total, and wait_exponential(min=2, max=5) grows the wait between attempts exponentially while keeping each wait between 2 and 5 seconds.

In [62]:
import asyncio
from openai import AsyncOpenAI, APITimeoutError, InternalServerError, RateLimitError, UnprocessableEntityError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=2, max=5),
    retry=retry_if_exception_type(
        (
            APITimeoutError,
            InternalServerError,
            RateLimitError,
            UnprocessableEntityError
        )
    )
)
async def chat_completion(client, prompt, **kwargs):
  messages = [{"role":"user", "content": prompt}]
  return await client.chat.completions.create(messages=messages, **kwargs)

async def run_chat_completions(client, prompts, **kwargs):
  calls = [chat_completion(client, prompt, **kwargs) for prompt in prompts]
  for completed_task in asyncio.as_completed(calls): # each completed task is an API call
    response = await completed_task
    print(response.choices[0].message.content)

async def main(prompts, **kwargs):
  #ensures that the client is automatically closed when it’s no longer needed, preventing resource leaks from occurring
  # AsyncOpenAI is imported directly at the top of this cell; userdata comes from the earlier google.colab import
  async with AsyncOpenAI(api_key = userdata.get('OPENAI_KEY_BS1')) as client:
    await run_chat_completions(client, prompts, **kwargs)

prompts = ["Who is the first president of USA?", "Describe in 2 lines how does Async programming work?", "What is your greatest fear?", "When is 0=1?", "How many 'r' is there in the word 'Strawberry'?"]

asyncio.run(main(prompts, model="gpt-3.5-turbo", temperature=0))

George Washington
As an AI, I do not have personal fears or emotions.
There are two 'r's in the word "Strawberry".
Async programming allows tasks to run concurrently, without blocking the main thread. It uses callbacks or promises to handle the completion of tasks asynchronously.
In mathematics, the equation 0=1 is never true. This is because 0 and 1 are distinct numbers with different values, and they cannot be equal to each other.
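
To see what wait times these settings produce, the schedule can be worked out by hand. The sketch below assumes the wait is computed as `multiplier * exp_base ** (attempt_number - 1)` and then clamped to the `[min, max]` range, which is how recent tenacity versions appear to behave; the exact exponent offset may differ between versions, and `exponential_wait` and `wait_schedule` are hypothetical helpers, not tenacity functions.

```python
def exponential_wait(attempt_number: int, multiplier: float = 1, exp_base: float = 2,
                     min_wait: float = 0, max_wait: float = float("inf")) -> float:
    # Assumed formula: exponential growth, then clamped into [min_wait, max_wait].
    raw = multiplier * exp_base ** (attempt_number - 1)
    return max(min_wait, min(raw, max_wait))

def wait_schedule(max_attempts: int, **kwargs) -> list:
    # A wait happens after every failed attempt except the last one.
    return [exponential_wait(n, **kwargs) for n in range(1, max_attempts)]

print(wait_schedule(3, min_wait=2, max_wait=5))  # settings used in the cell above
print(wait_schedule(6, min_wait=2, max_wait=5))  # more attempts, to show the cap
```

Under this assumption, 3 attempts give two waits of 2 seconds each (the raw values 1 and 2 are both raised or held at the 2-second minimum), and the 5-second cap only comes into play from the fourth failed attempt onward.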


References:


*   https://diverger.medium.com/building-asynchronous-llm-applications-in-python-f775da7b15d1

