# Using the Groq API for High-Speed LLM Inference

This Jupyter notebook is a practical guide to using the Groq API to work with Large Language Models (LLMs). Groq is known for very fast inference, which makes it well suited to responsive, real-time AI applications.

We will cover the following topics:
1.  **Prerequisites**: What you need to get started.
2.  **Setup**: Installing the necessary library and configuring your environment.
3.  **Basic Usage**: Making your first call to the Groq API.
4.  **Advanced Usage**:
    *   Streaming responses for a real-time feel.
    *   Using the asynchronous client for concurrent requests.
    *   Controlling the LLM's output with parameters.
5.  **Error Handling**: How to gracefully handle potential API errors.
6.  **Putting It All Together**: A simple command-line chatbot example.

## 1. Prerequisites

Before you can use the Groq API, you need to have a Groq account and an API key.

1.  **Create a Groq Account**: If you don't have one already, sign up for a free account on the [Groq website](https://groq.com/).
2.  **Generate an API Key**: Navigate to the API Keys section in your GroqCloud dashboard and create a new API key.

**Important**: Your API key is a secret! Do not share it publicly or commit it to version control. The best practice is to store it as an environment variable.

## 2. Setup

First, you need to install the official Groq Python library.

In [None]:
%pip install groq

### Setting up the Environment Variable

To keep your API key secure, we'll read it from an environment variable named `GROQ_API_KEY`. You can set this variable in your operating system. If it is not set, the cell below prompts for the key with `getpass`, which does not echo what you type, and stores it in `os.environ` for the current session only.

**Note:** For a more permanent solution, consider using a `.env` file and the `python-dotenv` library.

In [None]:
import os
from getpass import getpass

# Prompt for the key only if GROQ_API_KEY is not already set in the environment.
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass("Enter your Groq API Key: ")
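
The note above mentions keeping the key in a `.env` file. To show the idea without extra dependencies, here is a minimal standard-library sketch. The helper name `load_env_file` is hypothetical, and the parsing is deliberately simplified (plain `KEY=VALUE` lines only); it is a stand-in for illustration, not how `python-dotenv` is implemented.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Load KEY=VALUE pairs from a file into os.environ (hypothetical helper).

    Simplified on purpose: skips blank lines, comments, and lines without '=',
    strips optional surrounding quotes, and never overwrites a variable that
    is already set. Returns the names of the variables it set.
    """
    env_path = Path(path)
    loaded = []
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        if key and key not in os.environ:
            os.environ[key] = value
            loaded.append(key)
    return loaded
```

If you use a `.env` file, add it to `.gitignore` so the key is never committed.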

## 3. Basic Usage: Your First API Call

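Now that the setup is complete, let's make a simple request to the Groq API to get a chat completion. The `model` argument must be a model ID that Groq currently offers; this notebook uses `llama3-8b-8192` throughout, so substitute a current ID from the Groq documentation if that one has been retired.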

In [None]:
from groq import Groq

# The API key is read from the GROQ_API_KEY environment variable by default.
client = Groq()

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Identify three key problems in the clinical space from a patient's perspective.",
        }
    ],
    model="llama3-8b-8192",
)

print(chat_completion.choices[0].message.content)

### Understanding the Response Object

The response from the `create` method is an object containing useful information. Let's inspect it:

In [None]:
print(chat_completion)

Key fields in the response include:
- `id`: A unique identifier for the completion.
- `object`: The type of object, which is `chat.completion`.
- `created`: A Unix timestamp of when the completion was created.
- `model`: The model used for the completion.
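- `choices`: A list of completion choices. You will typically work with the first one, `choices[0]`. The Groq API reference has documented `n=1` as the only supported value, so check the current docs before requesting more than one. Each choice contains:
    - `message`: The message object containing the `role` ('assistant') and the `content` (the actual response).
    - `finish_reason`: The reason the model stopped generating text (e.g., 'stop' for a natural end or 'length' when the token limit was reached).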
- `usage`: Information about the number of tokens used for the prompt and the completion.
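
The fields listed above can be read with ordinary attribute access. The sketch below runs offline against a stand-in object built with `SimpleNamespace`. The helper name `summarize_completion` and all values are made up for illustration, and the `usage` attribute names (`prompt_tokens`, `completion_tokens`, `total_tokens`) are assumed to follow the OpenAI-compatible naming.

```python
from types import SimpleNamespace


def summarize_completion(completion):
    """Pull the commonly used fields out of a chat completion object."""
    first_choice = completion.choices[0]
    return {
        "id": completion.id,
        "model": completion.model,
        "role": first_choice.message.role,
        "content": first_choice.message.content,
        "finish_reason": first_choice.finish_reason,
        "total_tokens": completion.usage.total_tokens,
    }


# Stand-in with the same attribute layout as the fields listed above.
# Every value here is invented for the example.
fake_completion = SimpleNamespace(
    id="chatcmpl-example",
    object="chat.completion",
    created=0,
    model="llama3-8b-8192",
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="Hello!"),
            finish_reason="stop",
        )
    ],
    usage=SimpleNamespace(prompt_tokens=5, completion_tokens=2, total_tokens=7),
)

summary = summarize_completion(fake_completion)
print(summary["finish_reason"], summary["total_tokens"])
```

With a real response, `summarize_completion(chat_completion)` should work the same way, provided the attribute names match.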

## 4. Advanced Usage

### Streaming Responses

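For interactive applications like chatbots, you'll want to display the response as it's being generated. Passing `stream=True` makes `create` return an iterator of chunks, each carrying a small piece of the response in `choices[0].delta.content`.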

In [None]:
stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Come up with a list of ideas for a clinically focused hackathon.",
        }
    ],
    model="llama3-8b-8192",
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
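
Often you want both the live output and the complete text once the stream ends. The sketch below shows one way to do that. `collect_stream` and `fake_stream` are hypothetical helpers, and the fake chunks only mimic the `choices[0].delta.content` shape used in the loop above so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(stream, echo=True):
    """Print each streamed piece as it arrives and return the full text."""
    pieces = []
    for chunk in stream:
        piece = chunk.choices[0].delta.content or ""  # content may be None
        if echo:
            print(piece, end="", flush=True)
        pieces.append(piece)
    if echo:
        print()  # finish the line once the stream is exhausted
    return "".join(pieces)


def fake_stream(parts):
    """Yield stand-in chunks shaped like the ones iterated over above."""
    for part in parts:
        delta = SimpleNamespace(content=part)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


text = collect_stream(fake_stream(["Hel", "lo", None]))
```

Passing the `stream` object returned by `client.chat.completions.create(..., stream=True)` in place of `fake_stream(...)` should behave the same way.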

### Asynchronous Client

For applications that require high concurrency (e.g., a web server handling multiple user requests at once), you should use the asynchronous client, `AsyncGroq`. It lets your program issue multiple API calls concurrently instead of waiting for each one to complete before starting the next.

In [None]:
import asyncio
from groq import AsyncGroq

async_client = AsyncGroq()

async def main():
    stream = await async_client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "What is the main difference between a synchronous and an asynchronous API call?",
            }
        ],
        model="llama3-8b-8192",
        stream=True,
    )

    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

# Jupyter already runs an event loop, so top-level await works here.
# In a standalone script, use asyncio.run(main()) instead.
await main()
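
The cell above makes a single call, so it does not yet show the concurrency benefit. A minimal sketch of issuing several requests at once with `asyncio.gather` follows. The helpers `ask` and `ask_many` are hypothetical names, and `FakeAsyncClient` is an offline stand-in that only imitates the `chat.completions.create` call shape so the example runs without network access.

```python
import asyncio
from types import SimpleNamespace


async def ask(client, prompt, model="llama3-8b-8192"):
    """Send one prompt and return the text of the first choice."""
    completion = await client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model,
    )
    return completion.choices[0].message.content


async def ask_many(client, prompts, model="llama3-8b-8192"):
    """Run all prompts concurrently; results come back in prompt order."""
    return await asyncio.gather(*(ask(client, p, model) for p in prompts))


class FakeAsyncClient:
    """Offline stand-in for AsyncGroq that echoes the prompt after a short delay."""

    def __init__(self):
        self.chat = SimpleNamespace(completions=self)

    async def create(self, messages, model):
        await asyncio.sleep(0.01)  # simulate network latency
        reply = f"echo: {messages[-1]['content']}"
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=reply))]
        )


def run_demo():
    """Run the concurrent calls against the fake client from a plain script."""
    return asyncio.run(ask_many(FakeAsyncClient(), ["one", "two", "three"]))
```

In this notebook you would instead write `await ask_many(async_client, [...])`, since Jupyter already has a running event loop and `asyncio.run` would raise an error there.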

### Controlling the LLM's Output

You can influence the behavior of the LLM by adjusting several parameters in the `create` method:

In [None]:
controlled_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Come up with an idea for a clinically focused hackathon.",
        }
    ],
    model="llama3-70b-8192",

    # Controls randomness; lower values make the output more deterministic.
    temperature=0.9,

    # The maximum number of tokens to generate.
    max_tokens=1024,

    # Nucleus sampling: only the most likely tokens whose cumulative
    # probability reaches top_p are considered (1 keeps all tokens).
    top_p=1,

    # A string or list of strings at which generation stops (None disables this).
    stop=None,
)

print(controlled_completion.choices[0].message.content)
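
To build intuition for `temperature` and `top_p`, here is a toy illustration in plain Python. It is not Groq's implementation: the logits are made up, and the functions `softmax_with_temperature` and `nucleus` are hypothetical helpers that follow the textbook definitions of temperature scaling and nucleus (top-p) filtering.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them.

    temperature must be greater than 0 in this toy version.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Return indices of the smallest set of most likely tokens whose
    cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 0.9, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], nucleus(probs, 0.9))
```

Running it shows the top token taking a larger share of the probability as the temperature drops, while the set kept by `nucleus` grows as the distribution flattens.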

## 5. Error Handling

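It's crucial to handle potential errors when making API calls, such as network issues, invalid API keys, or invalid requests. The Groq library raises specific exceptions for different types of errors: `APIStatusError` when the API returns a non-success HTTP status (for example, for an invalid API key) and `APIConnectionError` when the server cannot be reached.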

In [None]:
from groq import Groq, APIStatusError, APIConnectionError

bad_client = Groq(api_key="invalid-api-key")

try:
    chat_completion = bad_client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "This request will fail.",
            }
        ],
        model="llama3-8b-8192",
    )
except APIStatusError as e:
    print(f"API Status Error: {e.status_code} {e.response}")
except APIConnectionError as e:
    print(f"API Connection Error: {e.__cause__}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
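
Some failures, such as a dropped connection, are transient and worth retrying, while others, such as an invalid API key, are not. A generic retry helper with exponential backoff might look like the sketch below. `call_with_retries` is a hypothetical helper, not part of the Groq library, and the caller decides which exception types count as retryable.

```python
import time


def call_with_retries(call, retry_on, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry on the given exception types with exponential backoff.

    `retry_on` is an exception class or a tuple of classes. The final failure
    is re-raised so the caller can still handle it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on as error:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # doubles after each failure
            print(f"Attempt {attempt} failed ({error!r}); retrying in {delay:.1f}s")
            sleep(delay)
```

With the client from this notebook, a call could be wrapped as `call_with_retries(lambda: client.chat.completions.create(...), retry_on=(APIConnectionError,))`. The `sleep` parameter exists only so the delay can be replaced in tests.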

## 6. Putting It All Together: A Simple Chatbot

Let's create a simple command-line chatbot that maintains a conversation history.

In [None]:
import sys

def chatbot():
    client = Groq()
    
    conversation_history = [
        {
            "role": "system",
            "content": "You are a friendly and helpful chatbot. Keep your answers concise."
        }
    ]
    
    print("Chatbot initialized. Type 'exit' to end the conversation.")
    
    while True:
        try:
            user_input = input("You: ")
            if user_input.lower() == 'exit':
                print("Chatbot shutting down. Goodbye!")
                break
            
            conversation_history.append({"role": "user", "content": user_input})
            
            stream = client.chat.completions.create(
                messages=conversation_history,
                model="llama3-8b-8192",
                stream=True,
            )
            
            print("Groq: ", end="")
            full_response = ""
            for chunk in stream:
                response_chunk = chunk.choices[0].delta.content or ""
                print(response_chunk, end="")
                sys.stdout.flush()
                full_response += response_chunk
            
            print() # for a new line
            
            conversation_history.append({"role": "assistant", "content": full_response})

        except KeyboardInterrupt:
            print("\nChatbot shutting down. Goodbye!")
            break
        except Exception as e:
            print(f"An error occurred: {e}")
            break

# To run the chatbot, uncomment the line below and run the cell.
#chatbot()
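
Because the chatbot appends every turn to `conversation_history`, a long session will eventually exceed the model's context window. One simple mitigation, sketched below, is to keep the system message plus only the most recent messages. `trim_history` is a hypothetical helper, and it counts messages rather than tokens, so treat it as a rough safeguard only.

```python
def trim_history(history, max_messages=20):
    """Keep the leading system message (if any) plus the most recent messages."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    has_system = bool(history) and history[0]["role"] == "system"
    system = history[:1] if has_system else []
    rest = history[len(system):]
    return system + rest[-max_messages:]
```

In the chatbot loop this could be applied as `messages=trim_history(conversation_history)` when calling `create`.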

## Conclusion and Further Resources

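This notebook covered setup, basic and streaming chat completions, the asynchronous client, output-control parameters, error handling, and a small chatbot. For more detailed information, refer to the official Groq documentation: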

*   [Groq API Documentation](https://console.groq.com/docs)
*   [Groq Python GitHub Repository](https://github.com/groq/groq-python)