In [1]:
import os

from dotenv import load_dotenv

# Load variables from a local .env file into the process environment.
load_dotenv()
hugging_face_token = os.getenv("HUGGING_FACE_API_KEY")
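
`os.getenv` returns `None` when the variable is missing, so a misnamed or absent `.env` entry only shows up later as an authentication error. Below is a minimal sketch of an early check. `require_env` is a hypothetical helper, not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value
```

With this in place, the token line could read `hugging_face_token = require_env("HUGGING_FACE_API_KEY")`, which fails at load time instead of at the first API call.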

## Groq

In [2]:
import os

from groq import Groq

client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama3-8b-8192",
)

print(chat_completion.choices[0].message.content)

Fast language models, also known as efficient or lightweight language models, have gained significant attention in recent years due to their importance in various applications. Here are some key reasons why fast language models are crucial:

1. **Scalability**: Fast language models can process large amounts of data and generate text at a much faster rate than traditional language models, making them ideal for use cases where speed and scalability are essential.
2. **Real-time applications**: Fast language models can be used in real-time applications such as:
	* Chatbots: Immediate responses are critical in chatbots, and fast language models can provide quick and accurate responses.
	* Speech recognition: Real-time speech recognition relies on fast language models to transcribe speech quickly.
	* Natural Language Processing (NLP) pipelines: Fast language models can accelerate the processing of NLP pipelines, enabling faster analysis and decision-making.
3. **Edge computing and IoT devic

In [3]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="gemma2-9b-it",
)

print(chat_completion.choices[0].message.content)

Fast language models are gaining significant importance in various fields due to their ability to process and generate text at an impressive speed. Here's a breakdown of their key benefits and why they matter:

**1. Real-Time Applications:**

* **Chatbots & Conversational AI:** Fast response times are crucial for creating engaging and natural-sounding chatbots. Users expect immediate replies, and slow models can lead to frustrating experiences.
* **Live Transcription & Translation:**  

Real-time captioning for videos or translating spoken language on the fly relies heavily on the speed of language models. 

* **Interactive Systems:**  Applications like code completion, predictive text, and interactive storytelling benefit from the instantaneous feedback provided by fast models.

**2. Efficiency and Scalability:**

* **Reduced Latency:**  Faster processing reduces the time it takes to complete tasks, leading to more efficient workflows.
* **Lower Computational Costs:** While some fast 

In [4]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="mixtral-8x7b-32768",
)

print(chat_completion.choices[0].message.content)

Fast language models are important for several reasons:

1. Improved user experience: Fast language models can quickly generate responses to user queries, which leads to a more seamless and responsive user experience. This is especially important in real-time applications such as chatbots, virtual assistants, and speech recognition systems.
2. Cost-effective: Fast language models can process large volumes of data quickly, which can reduce the computational resources required to train and deploy them. This can result in significant cost savings for organizations that rely on language models for their products and services.
3. Scalability: Fast language models can handle larger datasets and more complex tasks than slower models. This makes them more scalable and adaptable to a wide range of applications, from social media monitoring to content generation.
4. Real-time analytics: Fast language models can analyze data in real-time, which can provide valuable insights and feedback for busin
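
The three cells above send the same prompt and differ only in the `model` argument. One way to factor that into a loop is sketched below. `ask` and `compare_models` are hypothetical helper names, the model IDs are copied from the cells above, and the live call is skipped unless `GROQ_API_KEY` is set.

```python
import os


def ask(client, model: str, prompt: str) -> str:
    """Send a single user message to `model` and return the reply text."""
    completion = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model,
    )
    return completion.choices[0].message.content


def compare_models(client, models, prompt: str) -> dict:
    """Return a mapping of model name to that model's reply for the same prompt."""
    return {model: ask(client, model, prompt) for model in models}


if __name__ == "__main__" and os.environ.get("GROQ_API_KEY"):
    # Imported lazily so the helpers above work without the groq package installed.
    from groq import Groq

    groq_client = Groq(api_key=os.environ["GROQ_API_KEY"])
    replies = compare_models(
        groq_client,
        ["llama3-8b-8192", "gemma2-9b-it", "mixtral-8x7b-32768"],
        "Explain the importance of fast language models",
    )
    for model, reply in replies.items():
        print(f"--- {model} ---\n{reply}\n")
```

Keeping the client as a parameter means the helpers can be exercised with a stub object, without spending API quota.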

## meta-llama/Meta-Llama-3-8B-Instruct

In [5]:
from huggingface_hub import InferenceClient

client = InferenceClient(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token=hugging_face_token,
)

for message in client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=True,
):
    # The delta content can be None on some chunks, so fall back to an empty string.
    print(message.choices[0].delta.content or "", end="")

The capital of France is Paris!

## microsoft/Phi-3-mini-4k-instruct

In [6]:
from huggingface_hub import InferenceClient

client = InferenceClient(
    "microsoft/Phi-3-mini-4k-instruct",
    token=hugging_face_token,
)

for message in client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=True,
):
    # The delta content can be None on some chunks, so fall back to an empty string.
    print(message.choices[0].delta.content or "", end="")

 The capital of France is Paris. It is not only the most populous city in France, but it is also known for its history, culture, and architecture. Among these are iconic landmarks such as the Eiffel Tower, the Louvre Museum, which is the world’s largest art museum, and Notre-Dame Cathedral, known for its French Gothic architecture. Paris is also an important center for fashion, design, and gastronomy, adding to its reputation as a leading global city.

## mistralai/Mistral-7B-Instruct-v0.1

In [7]:
from huggingface_hub import InferenceClient

client = InferenceClient(
    "mistralai/Mistral-7B-Instruct-v0.1",
    token=hugging_face_token,
)

for message in client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=True,
):
    # The delta content can be None on some chunks, so fall back to an empty string.
    print(message.choices[0].delta.content or "", end="")

 The capital of France is Paris.
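
The three Hugging Face cells are identical apart from the model ID. The sketch below shows one way to reuse the streaming logic across models. `collect_stream` and `stream_answer` are hypothetical helpers, and the sketch assumes a chunk's `delta.content` may be `None`, in which case the chunk is skipped. The live call is skipped unless `HUGGING_FACE_API_KEY` is set.

```python
import os


def collect_stream(chunks) -> str:
    """Join the `delta.content` pieces of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        piece = chunk.choices[0].delta.content
        if piece:  # skip None or empty deltas
            parts.append(piece)
    return "".join(parts)


def stream_answer(client, prompt: str, max_tokens: int = 500) -> str:
    """Ask `client` a single question with streaming enabled and return the full reply."""
    chunks = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    return collect_stream(chunks)


if __name__ == "__main__" and os.getenv("HUGGING_FACE_API_KEY"):
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import InferenceClient

    for model_id in (
        "meta-llama/Meta-Llama-3-8B-Instruct",
        "microsoft/Phi-3-mini-4k-instruct",
        "mistralai/Mistral-7B-Instruct-v0.1",
    ):
        hf_client = InferenceClient(model_id, token=os.getenv("HUGGING_FACE_API_KEY"))
        answer = stream_answer(hf_client, "What is the capital of France?")
        print(f"{model_id}: {answer.strip()}")
```

This version trades the token-by-token display of the original cells for a single returned string, which is easier to compare across models.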