In [1]:
!pip install groq
!pip install maxim-py



In [2]:
from groq import Groq

In [3]:
from maxim.logger.groq import instrument_groq
from maxim import Config, Maxim
from maxim.logger import LoggerConfig

In [4]:
from google.colab import userdata
import os

MAXIM_API_KEY=userdata.get("MAXIM_API_KEY")
MAXIM_LOG_REPO_ID=userdata.get("MAXIM_REPO_ID")
GROQ_API_KEY = userdata.get('GROQ_API_KEY')

os.environ["MAXIM_API_KEY"] = MAXIM_API_KEY
os.environ["MAXIM_LOG_REPO_ID"] = MAXIM_LOG_REPO_ID
os.environ["GROQ_API_KEY"] = GROQ_API_KEY
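
`google.colab.userdata` is only available inside Colab. When the notebook runs elsewhere, the same three keys could be read from environment variables that are already set. The sketch below assumes that setup; `require_env` is a hypothetical helper, not part of the Groq or Maxim SDKs.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper: raises early with a clear message instead of
    letting a missing key surface later as an authentication error.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example usage outside Colab (assumes the variables were exported beforehand):
# MAXIM_API_KEY = require_env("MAXIM_API_KEY")
# MAXIM_LOG_REPO_ID = require_env("MAXIM_LOG_REPO_ID")
# GROQ_API_KEY = require_env("GROQ_API_KEY")
```

Failing at the top of the notebook makes a missing key easier to diagnose than a failed request several cells later.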

## Create a Maxim Logger and Instrument Groq

In [5]:
maxim = Maxim(Config(api_key= MAXIM_API_KEY))

logger = maxim.logger(LoggerConfig(id=MAXIM_LOG_REPO_ID))

instrument_groq(logger)

[MaximSDK] Initializing Maxim AI(v3.9.12)




## Simple Inference

In [6]:
client = Groq()

In [7]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama-3.3-70b-versatile"
)

In [8]:
print(chat_completion.choices[0].message.content)

Fast language models are crucial in various applications, including natural language processing (NLP), machine translation, text summarization, and chatbots. The importance of fast language models can be understood from several perspectives:

1. **Real-time Applications**: Many NLP applications require rapid response times, such as virtual assistants, chatbots, and real-time translation systems. Fast language models enable these applications to process and respond to user input quickly, providing a seamless user experience.
2. **Efficient Processing**: Fast language models can handle large volumes of text data, making them ideal for applications that involve processing vast amounts of unstructured data, such as text classification, sentiment analysis, and information retrieval.
3. **Low Latency**: Fast language models minimize latency, which is critical in applications where timely responses are essential, such as in customer service chatbots, real-time translation systems, and emergen

## Streaming Response

In [9]:
stream = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama-3.3-70b-versatile",
    temperature=0.5,
    max_completion_tokens=1024,
    top_p=1,
    stop=None,
    stream=True,
)

for chunk in stream:
    # delta.content can be None on some chunks; print an empty string instead.
    print(chunk.choices[0].delta.content or "", end="")

Fast language models are crucial in today's technological landscape, and their importance can be understood from several perspectives:

1. **Efficient Processing**: Fast language models enable quick processing of vast amounts of text data. This efficiency is vital for applications where real-time or near-real-time responses are necessary, such as in chatbots, virtual assistants, and real-time translation services. Faster models can handle more requests and larger datasets without significant delays, making them indispensable for high-traffic and high-data-volume applications.

2. **Resource Optimization**: Speed in language models often correlates with computational resource efficiency. Faster models typically require less computational power (e.g., GPU resources) to achieve the same or better results than slower models. This optimization is critical for reducing operational costs, especially in cloud-based services where computational resources are billed based on usage. Moreover,
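
When the full text is needed after streaming (for example, to store or post-process it), the deltas can be accumulated instead of printed. The sketch below assumes only the chunk shape used in the loop above (`chunk.choices[0].delta.content`), and that `content` may be `None` on some chunks; `collect_stream` is a hypothetical helper name.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> str:
    """Join the text deltas of a streamed chat completion into one string.

    Hypothetical helper: skips chunks that carry no choices or no text.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # nothing to read on this chunk
        piece = chunk.choices[0].delta.content
        if piece:  # ignore None and empty strings
            parts.append(piece)
    return "".join(parts)


# Example usage with the `stream` object created above:
# full_text = collect_stream(stream)
```

A stream can be iterated only once, so either print the deltas as they arrive or collect them, not both in separate loops.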

## Async Chat Completion

In [10]:
import asyncio

from groq import AsyncGroq

In [11]:
async def main():
    client = AsyncGroq()

    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            # Set a user message for the assistant to respond to.
            {
                "role": "user",
                "content": "Explain the importance of fast language models",
            }
        ],
        model="llama-3.3-70b-versatile",
        temperature=0.5,
        max_completion_tokens=1024,
        top_p=1,
        stop=None,
        stream=False,
    )

    # Print the completion returned by the LLM.
    print(chat_completion.choices[0].message.content)

await main()

Fast language models are crucial in today's technology-driven world, and their importance can be understood from several perspectives:

1. **Efficient Processing**: Fast language models can process and analyze vast amounts of text data quickly, making them essential for applications where speed is critical. This efficiency enables real-time language understanding, sentiment analysis, and text generation, which are vital in various industries such as customer service, social media monitoring, and content creation.

2. **Improved User Experience**: The speed of language models directly impacts the user experience in applications like chatbots, virtual assistants, and language translation software. Faster models respond more quickly to user queries, providing a more seamless and interactive experience. This, in turn, enhances user engagement, satisfaction, and loyalty.

3. **Scalability and Cost-Effectiveness**: Fast language models can handle a large volume of requests without significan
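
The bare `await main()` above works because notebooks already run an event loop. In a plain Python script there is no running loop, so the coroutine would typically be started with `asyncio.run`. The sketch below illustrates this with a hypothetical stand-in coroutine rather than a real Groq call.

```python
import asyncio


async def main() -> str:
    # Hypothetical stand-in for the awaited Groq call; it only yields
    # control to the event loop once and returns a fixed string.
    await asyncio.sleep(0)
    return "completion text"


def run_from_script() -> str:
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop. It raises RuntimeError if a loop is already
    # running, which is why notebooks use `await main()` instead.
    return asyncio.run(main())
```

In a script, the call would usually sit under `if __name__ == "__main__":`.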

## Async Completion with Streaming

In [12]:
import asyncio

from groq import AsyncGroq


async def main():
    client = AsyncGroq()

    stream = await client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            # Set a user message for the assistant to respond to.
            {
                "role": "user",
                "content": "Explain the importance of fast language models",
            }
        ],
        # The language model which will generate the completion.
        model="llama-3.3-70b-versatile",
        temperature=0.5,
        max_completion_tokens=1024,
        top_p=1,
        stop=None,
        # If set, partial message deltas will be sent.
        stream=True,
    )

    # Print the incremental deltas returned by the LLM.
    async for chunk in stream:
        # delta.content can be None on some chunks; print an empty string instead.
        print(chunk.choices[0].delta.content or "", end="")

await main()

Fast language models are crucial in today's technology landscape, and their importance can be seen in several aspects:

1. **Efficient Processing**: Fast language models can process and analyze large amounts of text data quickly, which is essential for applications that require real-time responses, such as chatbots, virtual assistants, and language translation software.
2. **Improved User Experience**: Faster language models enable more responsive and interactive systems, leading to a better user experience. For example, a fast language model can quickly generate human-like responses to user queries, making the interaction feel more natural and engaging.
3. **Scalability**: Fast language models can handle a large volume of requests and conversations simultaneously, making them ideal for large-scale applications, such as customer service chatbots, language translation platforms, and content generation tools.
4. **Reduced Latency**: Fast language models minimize the time it takes to 
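
The async stream can also be gathered into a single string rather than printed. The sketch below assumes the same chunk shape as the loop above (`chunk.choices[0].delta.content`, possibly `None`); `collect_async_stream` is a hypothetical helper name.

```python
import asyncio
from typing import Any, AsyncIterable


async def collect_async_stream(stream: AsyncIterable[Any]) -> str:
    """Join the text deltas of an async streamed completion into one string.

    Hypothetical helper: skips chunks that carry no choices or no text.
    """
    parts = []
    async for chunk in stream:
        if not chunk.choices:
            continue  # nothing to read on this chunk
        piece = chunk.choices[0].delta.content
        if piece:  # ignore None and empty strings
            parts.append(piece)
    return "".join(parts)


# Example usage inside the `main()` coroutine above:
# full_text = await collect_async_stream(stream)
```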