<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/llm/groq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Groq

Welcome to Groq! 🚀 At Groq, we've developed the world's first Language Processing Unit™, or LPU. The Groq LPU has a deterministic, single-core streaming architecture that sets the standard for GenAI inference speed, with predictable and repeatable performance for any given workload.

Beyond the architecture, our software is designed to empower developers like you with the tools you need to create innovative, powerful AI applications. With Groq as your engine, you can:

* Achieve uncompromised low latency and performance for real-time AI and HPC inferences 🔥
* Know the exact performance and compute time for any given workload 🔮
* Take advantage of our cutting-edge technology to stay ahead of the competition 💪

Want more Groq? Check out our [website](https://groq.com) for more resources and join our [Discord community](https://discord.gg/JvNsBDKeCG) to connect with our developers!

## Setup

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
%pip install llama-index-llms-groq

In [1]:
!pip install llama-index

In [3]:
# Install the package that provides the Groq integration for llama_index.
!pip install llama-index-llms-groq
from llama_index.llms.groq import Groq


Create an API key at the [Groq console](https://console.groq.com/keys), then set it as the environment variable `GROQ_API_KEY`.

```bash
export GROQ_API_KEY=<your api key>
```
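To fail early with a clear message when the variable is missing, one option is a small check before creating the LLM. This is a minimal sketch: `require_env` is a hypothetical helper written for this notebook, not part of LlamaIndex or the Groq SDK.

```python
import os


def require_env(name, env=None):
    """Return the value of environment variable `name`, or raise if unset or empty.

    `require_env` is a hypothetical helper for this notebook, not a LlamaIndex API.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return value


# Example with an explicit mapping so the check can be tried without a real key.
print(require_env("GROQ_API_KEY", env={"GROQ_API_KEY": "dummy-key"}))
```

In the notebook itself you would call `require_env("GROQ_API_KEY")` with no `env` argument so that it reads from `os.environ`.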

Alternatively, you can pass your API key to the LLM when you initialize it:

In [5]:
llm = Groq(model="llama3-70b-8192", api_key="<your api key>")

A list of available LLM models can be found [here](https://console.groq.com/docs/models).

In [6]:
response = llm.complete("Explain the importance of low latency LLMs")

In [7]:
print(response)

Low-latency Large Language Models (LLMs) are crucial in various applications where real-time or near-real-time processing is essential. Here are some reasons why low-latency LLMs are important:

1. **Interactive Systems**: In interactive systems like chatbots, virtual assistants, and conversational AI, low-latency LLMs enable rapid response times, making the interaction feel more natural and human-like. This is particularly important in applications where users expect immediate responses, such as customer support or language translation.
2. **Real-time Decision Making**: In applications like autonomous vehicles, robotics, or medical diagnosis, low-latency LLMs can process and analyze vast amounts of data in real-time, enabling swift decision-making and reaction to changing circumstances.
3. **Live Streaming and Broadcasting**: Low-latency LLMs can facilitate real-time language translation, sentiment analysis, or content moderation in live streaming and broadcasting applications, enhanc
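
Since latency is the point of this example, it can be useful to measure how long a call takes. The sketch below wraps any callable in a wall-clock timer; `timed_call` is a hypothetical helper, and a stand-in function is used so the snippet runs without network access or an API key.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call `fn` and return `(result, elapsed_seconds)` using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in function so the sketch runs offline. With the `llm` defined above,
# the call would look like: timed_call(llm.complete, "Explain ...")
result, seconds = timed_call(lambda prompt: prompt.upper(), "hello")
print(result, f"{seconds:.6f}s")
```

The measured time includes network round trips, so it reflects what the caller experiences rather than the model's compute time alone.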

### Call `chat` with a list of messages

In [None]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)

In [None]:
print(resp)

assistant: Arr, I be known as Captain Redbeard, the fiercest pirate on the seven seas! But ye can call me Cap'n Redbeard for short. I'm a fearsome pirate with a love for treasure and adventure, and I'm always ready for a good time! Whether I'm swabbin' the deck or swiggin' grog, I'm always up for a bit of fun. So hoist the Jolly Roger and let's set sail for adventure, me hearties!


### Streaming

Using the `stream_complete` method:

In [None]:
response = llm.stream_complete("Explain the importance of low latency LLMs")

In [None]:
for r in response:
    print(r.delta, end="")

Low latency Large Language Models (LLMs) are important in the field of artificial intelligence and natural language processing (NLP) due to several reasons:

1. Real-time applications: Low latency LLMs are essential for real-time applications such as chatbots, voice assistants, and real-time translation services. These applications require immediate responses, and high latency can result in a poor user experience.
2. Improved user experience: Low latency LLMs can provide a more seamless and responsive user experience. Users are more likely to continue using a service that provides quick and accurate responses, leading to higher user engagement and satisfaction.
3. Better decision-making: In some applications, such as financial trading or autonomous vehicles, low latency LLMs can provide critical information in real-time, enabling better decision-making and reducing the risk of accidents.
4. Scalability: Low latency LLMs can handle a higher volume of requests, making them more scalable 
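
If you also need the full text after streaming, you can accumulate the deltas while printing them. The following is a sketch under the assumption that each streamed chunk exposes a `delta` attribute, as in the loop above; `collect_stream` is a hypothetical helper, and the fake chunks stand in for a real response so the snippet runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Print each chunk's `delta` as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        delta = getattr(chunk, "delta", None)
        if delta:  # skip chunks with a missing or empty delta
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)


# Fake chunks for illustration. With the `llm` defined above, this would be:
# text = collect_stream(llm.stream_complete("Explain ..."))
fake_chunks = [
    SimpleNamespace(delta="Low "),
    SimpleNamespace(delta=None),
    SimpleNamespace(delta="latency"),
]
text = collect_stream(fake_chunks)
```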

Using the `stream_chat` method:

In [None]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)

In [None]:
for r in resp:
    print(r.delta, end="")

Arr, I be known as Captain Candybeard! A more colorful and swashbuckling pirate, ye will never find!
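
For streaming, the delay before the first chunk arrives is often more informative than the total time. The sketch below measures it for any function that starts a stream; `time_to_first_chunk` is a hypothetical helper, and a plain iterator stands in for a real streaming response so the snippet runs offline.

```python
import time


def time_to_first_chunk(start_stream):
    """Start a stream and return `(first_chunk, seconds, remaining_stream)`.

    `start_stream` is a zero-argument callable, so the timer also covers the
    work needed to open the stream. `first_chunk` is None for an empty stream.
    """
    start = time.perf_counter()
    stream = iter(start_stream())
    first = next(stream, None)
    return first, time.perf_counter() - start, stream


# Stand-in stream for illustration. With the objects defined above, this would be:
# first, seconds, rest = time_to_first_chunk(lambda: llm.stream_chat(messages))
first, seconds, rest = time_to_first_chunk(lambda: iter(["Arr", ", matey"]))
print(first, f"{seconds:.6f}s")
```

The remaining iterator is returned so the rest of the response can still be consumed after the measurement.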