# Advanced NLP with Transformers and Hugging Face

This notebook demonstrates how to use various models and tools from the Hugging Face ecosystem for natural language processing (NLP) tasks. The notebook covers:

1. **Loading and Using a Transformer Model**
2. **Performing Inference with Mixed Precision**
3. **Generating Responses Using a Transformer Model**
4. **Using the Hugging Face Inference API**
5. **Using Microsoft's SpeechT5 TTS Model for Text-to-Speech**

Let's start by setting up our environment.

In [1]:
# Importing necessary libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from torch.cuda.amp import autocast
import requests
from IPython.display import Audio

## 1. Load Tokenizer and Model
First, we'll load a pre-trained tokenizer and model from the Hugging Face model hub. In this case, we use the `mistralai/Mistral-7B-v0.1` model.

In [2]:
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Move the model to the GPU if one is available; otherwise keep it on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
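Before loading a model of this size, it can help to estimate how much memory the weights alone will need. The sketch below is plain arithmetic, not a measurement: the parameter count is an assumed round figure for a 7B-class model, and the helper `estimate_weight_memory_gib` is a hypothetical name introduced here. Activations, the KV cache, and framework overhead come on top of these numbers.

```python
# Approximate parameter count for a 7B-class model
# (an assumption for illustration, not read from the checkpoint)
ASSUMED_PARAM_COUNT = 7_240_000_000

# Bytes used to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_weight_memory_gib(param_count: int, dtype: str) -> float:
    """Return the approximate memory needed for the weights alone, in GiB."""
    return param_count * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    size = estimate_weight_memory_gib(ASSUMED_PARAM_COUNT, dtype)
    print(f"{dtype}: {size:.1f} GiB")
```

If the float32 figure exceeds the available GPU memory, requesting a half-precision dtype when calling `from_pretrained` is one common way to halve the weight footprint.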

## 2. Perform Inference with Mixed Precision
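Mixed precision inference runs eligible operations in a lower-precision dtype, which can speed up computation and reduce the memory used by activations; the model weights stay in the precision they were loaded in. We define a function `infer` that tokenizes the input text and runs a forward pass under mixed precision.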

In [3]:
def infer(input_text):
    # Tokenize input
    inputs = tokenizer(input_text, return_tensors="pt").to(device)
    
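    # Perform inference with mixed precision on whichever device is in use;
    # no_grad skips gradient tracking, which inference does not need
    with torch.no_grad(), torch.autocast(device_type=device.type):
        outputs = model(**inputs)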
    
    # Clear GPU cache
    torch.cuda.empty_cache()
    
    # Return the raw model output (its logits hold the next-token scores)
    return outputs

# Example usage
input_text = "Your input text here"
outputs = infer(input_text)
print(outputs)
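The object returned by `infer` is not text: for a causal language model it carries a `logits` tensor of shape `(batch, sequence_length, vocab_size)`, and the scores at the last position rank candidates for the next token. The following is a minimal pure-Python sketch of that last step, using made-up logits over a hypothetical five-token vocabulary rather than real model output; `softmax` and `greedy_next_token` are illustrative helpers, not library functions.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(last_position_logits):
    """Return (token_id, probability) for the highest-scoring token."""
    probs = softmax(last_position_logits)
    token_id = max(range(len(probs)), key=probs.__getitem__)
    return token_id, probs[token_id]


# Toy logits over a hypothetical 5-token vocabulary
toy_logits = [1.0, 3.5, 0.2, -1.0, 2.0]
token_id, prob = greedy_next_token(toy_logits)
print(token_id, round(prob, 3))
```

With the real output, the equivalent row would be `outputs.logits[0, -1]`, and the chosen id could be turned back into text with `tokenizer.decode`.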

## 3. Generate Responses Using Transformer Model
We also define a function `generate_response` to generate responses based on a prompt. This function uses the `generate` method of the model and decodes the generated text.

In [4]:
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        **inputs,  # pass input_ids together with the attention mask
        max_length=100,  # total length in tokens, prompt included
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example query
query = "What is the impact of AI on modern healthcare?"
response = generate_response(query)
print(response)
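To make the role of `max_length` concrete, here is a small sketch of greedy autoregressive decoding, the behavior `generate` falls back to when sampling and beam search are disabled. It is not the library's implementation: `greedy_generate` and `toy_model` are hypothetical stand-ins, with the "model" simply predicting the previous token id plus one. The point to notice is that `max_length` bounds the prompt and the generated tokens together.

```python
from typing import Callable, List


def greedy_generate(
    next_token_fn: Callable[[List[int]], int],
    prompt_ids: List[int],
    max_length: int,
    eos_token_id: int,
) -> List[int]:
    """Append one token at a time until EOS or until the total length hits max_length."""
    ids = list(prompt_ids)
    while len(ids) < max_length:
        token = next_token_fn(ids)
        ids.append(token)
        if token == eos_token_id:
            break
    return ids


def toy_model(ids: List[int]) -> int:
    """Hypothetical stand-in for a model: predicts the previous token id plus one."""
    return ids[-1] + 1


EOS = 99
prompt = [10, 11, 12]
out = greedy_generate(toy_model, prompt, max_length=8, eos_token_id=EOS)
print(out)                     # prompt ids followed by generated ids
print(len(out) - len(prompt))  # number of newly generated tokens
```

Because a long prompt eats into the `max_length` budget, `generate` also accepts `max_new_tokens`, which limits only the newly generated part.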

## 4. Use the Hugging Face Inference API
Next, we use the Hugging Face Inference API to request chat completions from the `mistralai/Mistral-7B-Instruct-v0.1` model. This step requires an API token for authentication; keep the token out of the notebook itself, for example in an environment variable.

In [5]:
import os

from huggingface_hub import InferenceClient

# Read the API token from the HF_TOKEN environment variable
# instead of hardcoding it in the notebook
client = InferenceClient(
    "mistralai/Mistral-7B-Instruct-v0.1",
    token=os.environ["HF_TOKEN"],
)

for message in client.chat_completion(
    messages=[{"role": "user", "content": "Who is Obama?"}],
    max_tokens=500,
    stream=True,
):
    # Some streamed chunks carry no text, so fall back to an empty string
    print(message.choices[0].delta.content or "", end="")
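When the streamed reply is needed as a single string rather than printed piece by piece, the fragments can be collected first. The sketch below assumes each chunk has the same shape the loop above relies on (`choices[0].delta.content`); `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a real API response so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text fragments of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # nothing to read from this chunk
        fragment = chunk.choices[0].delta.content
        if fragment:  # skip None or empty fragments
            parts.append(fragment)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk(None), fake_chunk("world")]
print(collect_stream(demo))
```

With the real client, the same function could be given the iterator returned by `client.chat_completion(..., stream=True)`.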

## 5. Use Microsoft's SpeechT5 TTS Model for Text-to-Speech
Finally, we convert text into speech with Microsoft's SpeechT5 model (`microsoft/speecht5_tts`), served through the Hugging Face Inference API. We post the text to the model's endpoint and play back the audio bytes returned in the response.

In [6]:
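import os

API_URL = "https://api-inference.huggingface.co/models/microsoft/speecht5_tts"
# Read the API token from the HF_TOKEN environment variable
# instead of hardcoding it in the notebook
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}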

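def query(payload):
    # Send the text to the TTS endpoint and return the raw audio bytes
    response = requests.post(API_URL, headers=headers, json=payload)
    # Fail loudly on HTTP errors rather than passing an error body to Audio
    response.raise_for_status()
    return response.content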

audio_bytes = query({
    "inputs": "How are you?",
})

# Display the audio
Audio(audio_bytes)
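If the request fails, the response body holds an error message rather than audio, and `Audio` will not be able to play it. The sketch below shows one way to check a response before using it. The helper `extract_audio` is hypothetical, and both the `audio/` content-type check and the JSON error body are assumptions about how the endpoint behaves, so adjust them to what the service actually returns.

```python
import json


def extract_audio(status_code: int, content_type: str, body: bytes) -> bytes:
    """Return the audio bytes, or raise with the server's message if the call failed."""
    if status_code == 200 and content_type.startswith("audio/"):
        return body
    try:
        # Assumed: errors arrive as a JSON document
        detail = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Otherwise show the first bytes of whatever came back
        detail = body[:200]
    raise RuntimeError(f"TTS request failed (HTTP {status_code}): {detail}")


# Offline demonstration with made-up responses
print(len(extract_audio(200, "audio/flac", b"\x00\x01\x02")))
try:
    extract_audio(503, "application/json", b'{"error": "model is loading"}')
except RuntimeError as exc:
    print(exc)
```

With a real `requests` response, the call would look like `extract_audio(response.status_code, response.headers.get("content-type", ""), response.content)`.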