## Hugging Face
#### What is Hugging Face?
Hugging Face is an open-source platform and community focused on natural language processing (NLP) and machine learning (ML). It provides tools, libraries, and models for tasks such as text generation, translation, classification, and summarization. Its ecosystem has made pre-trained models much easier to find, share, and reuse. Its key components are:

- **Transformers Library:** A widely used Python library for working with pre-trained transformer models (like BERT, GPT, T5, etc.).
- **Datasets Library:** Simplifies loading, preprocessing, and using datasets for training models.
- **Model Hub:** A repository of thousands of pre-trained models contributed by the community and organizations, covering diverse tasks and languages.
- **Hugging Face API:** A cloud-based API that allows users to access pre-trained models without requiring local infrastructure.
- **Spaces:** A platform for hosting and sharing AI models and applications using frameworks like Gradio or Streamlit.




#### How does Hugging Face differ from OpenAI?

| **Aspect**                  | **Hugging Face**                                                                                                     | **OpenAI**                                                                                           |
|-----------------------------|---------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| **Openness**                | Open-source platform with publicly available models and code.                                                       | Primarily closed-source. OpenAI API provides access to its proprietary models like GPT.             |
| **Model Availability**      | Hosts a variety of pre-trained LLMs (GPT, BERT, T5, etc.) from multiple organizations and contributors.              | Only provides proprietary models (e.g., GPT-3, GPT-4, Codex, DALL·E).                               |
| **Customization**           | Allows **fine-tuning** of models and training custom models locally or in the cloud.                                | Fine-tuning OpenAI models is possible but limited, and requires API access.                        |
| **API Flexibility**         | API supports a wide range of models/tasks from the Model Hub (e.g., BERT, RoBERTa, GPT-based models).               | API only gives access to OpenAI’s proprietary models like GPT.                                      |
| **Pricing Model**           | Many tools and models are free/open-source. Hugging Face API offers paid tiers for hosted model inference.          | OpenAI is predominantly API-driven and has a consumption-based pricing model for access to its models. |
| **Community Contribution**  | Strong focus on community collaboration, with open contributions to the Model Hub and libraries.                   | No community contributions to OpenAI’s proprietary models.                                          |
| **Task Diversity**          | Hugging Face supports **transformers** for various domains (text, vision, audio, etc.).                            | OpenAI models are versatile but focus mainly on text, with some image and audio models (e.g., DALL·E, Whisper). |
| **Training Flexibility**    | Users can train their own models from scratch or fine-tune existing models.                                         | OpenAI does not offer training from scratch. Users only interact with pre-trained models via API.   |


Hugging Face is closely tied to large language models (LLMs) in several ways: it serves as a repository for pre-trained LLMs, it supports fine-tuning and customization, and it makes LLMs more accessible to developers.

In [20]:
# !pip install huggingface_hub
# !pip install transformers
# !pip install accelerate
# !pip install bitsandbytes

In [2]:
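# import the necessary classes from LangChain
# (these top-level imports match the older LangChain release used here; newer releases have moved them)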
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

#### Before we start...
You need a Hugging Face API token.
The code reads it from the environment variable `HUGGINGFACEHUB_API_TOKEN`, which Python accesses as `os.environ["HUGGINGFACEHUB_API_TOKEN"]`.
#### When is it used?
- When using Hugging Face models via transformers or diffusers
- When accessing private models or datasets via huggingface_hub
- When fine-tuning or pushing models to Hugging Face Hub

In [22]:
import os

# set the environment variable that holds the API token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "Add your own token here!"

To check that the environment variable has been set, you can print `os.environ["HUGGINGFACEHUB_API_TOKEN"]`.
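
Printing the full token can leak it when a notebook is shared. A minimal sketch of a safer check follows; the helper name `describe_token` and the masking format are illustrative assumptions, not part of any Hugging Face library.

```python
import os

def describe_token(var_name="HUGGINGFACEHUB_API_TOKEN"):
    """Report whether the variable is set, masking all but the last 4 characters."""
    token = os.environ.get(var_name)  # returns None instead of raising KeyError
    if not token:
        return f"{var_name} is not set"
    # Short values are masked completely; longer ones keep a 4-character suffix.
    if len(token) > 8:
        masked = "*" * (len(token) - 4) + token[-4:]
    else:
        masked = "*" * len(token)
    return f"{var_name} is set: {masked}"

print(describe_token())
```

The visible suffix is usually enough to tell which of several tokens is currently active.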

In [23]:
# create a prompt
prompt_container = PromptTemplate(
    input_variables=["product"],
    template= "What container is suitable for {product}?"
)
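
To see the text the model will receive, it can help to fill in the template by hand. The sketch below uses plain `str.format` as a stand-in for the substitution so that it runs without LangChain; with LangChain installed, `prompt_container.format(product="cake")` should produce the same string. The helper name `fill_prompt` and the second product are made up for illustration.

```python
# Same template string as in the PromptTemplate above.
template = "What container is suitable for {product}?"

def fill_prompt(product):
    """Substitute the product name into the template."""
    return template.format(product=product)

for item in ["cake", "soup"]:
    print(fill_prompt(item))
```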

We will first warm up with the models `facebook/mbart-large-50` and `google/flan-t5-large`.

In [24]:
HuggingFaceHub(repo_id="facebook/mbart-large-50", model_kwargs={"temperature":1.5})

HuggingFaceHub(client=<InferenceClient(model='facebook/mbart-large-50', timeout=None)>, repo_id='facebook/mbart-large-50', task='text2text-generation', model_kwargs={'temperature': 1.5})

In [32]:
client1 = HuggingFaceHub(repo_id="google/flan-t5-large", model_kwargs={"temperature":0})
client2 = HuggingFaceHub(repo_id="facebook/mbart-large-50", model_kwargs={"temperature":1})

In [29]:
# one chain per model, both sharing the same prompt
request_container1 = LLMChain(llm=client1, prompt=prompt_container)
request_container2 = LLMChain(llm=client2, prompt=prompt_container)

In [34]:
request_container1.run("cake")

'tin'

**IMPORTANT NOTE!** You may get an error like the one below: `HfHubHTTPError: 401 Client Error: Unauthorized for url: https://api-inference.huggingface.co/models/google/flan-t5-large (Request ID: orynyb)`
To avoid this error, make sure the token is set correctly and its role is **write**.

In [33]:
request_container2.run("cake")

'tin'

#### In some cases, you might want to run the same model locally

To do that, we first need the `AutoTokenizer` class.

`AutoTokenizer` is a part of the Hugging Face Transformers library. It is an automatic tokenizer class that can dynamically load the correct tokenizer for a given model. Instead of manually specifying the tokenizer type (e.g., `BertTokenizer`, `GPT2Tokenizer`), `AutoTokenizer` automatically selects the appropriate one based on the model name or ID.

`AutoTokenizer.from_pretrained(model_id)` loads a pre-trained tokenizer from the Hugging Face model hub or from a local directory. The tokenizer converts text into tokens and token IDs that the model can understand. The tokenizer files are downloaded from the model hub if they are not already cached.
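
The round trip from text to tokens to IDs can be illustrated without downloading anything. The following is a toy sketch with a made-up six-entry vocabulary and whitespace splitting; the real `flan-t5-large` tokenizer uses a learned subword vocabulary, so its tokens and IDs will differ.

```python
# Toy vocabulary: hypothetical, for illustration only.
vocab = {"<unk>": 0, "what": 1, "does": 2, "a": 3, "sock": 4, "cover": 5}
id_to_token = {i: t for t, i in vocab.items()}

def encode(text):
    """Lowercase, split on whitespace, and map each token to its ID."""
    tokens = text.lower().split()
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

def decode(ids):
    """Map IDs back to tokens and join them with spaces."""
    return " ".join(id_to_token[i] for i in ids)

ids = encode("What does a sock cover")
print(ids)
print(decode(ids))
# "glove" is not in the toy vocabulary, so it maps to the <unk> ID.
print(encode("What does a glove cover"))
```

A real tokenizer plays the same two roles: it encodes the prompt before the model runs and decodes the generated IDs back into text afterwards.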

In [18]:
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_id = "google/flan-t5-large"

In [19]:
prompt_outfit = PromptTemplate(
    input_variables=["outfit"],
    template="What part of the body do(es) (a/an) {outfit} cover?"
)

In [20]:
# Load tokenizer - the one used particularly to train the model flan-t5-large
tokenizer = AutoTokenizer.from_pretrained(model_id)

`AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")` loads a pre-trained sequence-to-sequence model (e.g., T5, BART). `device_map="auto"` relies on the `accelerate` package and automatically places the model on the CPU, a GPU, or several devices, depending on the hardware available. The model can then process tokenized input and generate text.
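
Where `device_map="auto"` places the model depends on the hardware PyTorch can see. Below is a small sketch for checking this beforehand. It assumes only that `torch` may or may not be installed, and the helper name `visible_device` is hypothetical.

```python
def visible_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency; fall back to CPU if it is missing
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(f"Visible device: {visible_device()}")
```

If this prints `cpu`, expect generation with a large model to be noticeably slower.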

In [21]:
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")

Some parameters are on the meta device because they were offloaded to the cpu and disk.


In [22]:
# name the result `pipe` so it does not shadow the imported `pipeline` function
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=128)

Device set to use cpu


In [23]:
local_llm = HuggingFacePipeline(pipeline=pipe)

In [24]:
request_outfit = LLMChain(llm=local_llm, prompt=prompt_outfit)

In [25]:
request_outfit.run("sock")

'foot'

In [26]:
request_outfit.run("gloves")

'hands'

**NOTE!** You may have noticed that, in these two examples, the model distinguished between the singular and plural forms of the noun and answered accordingly.

**BE CAREFUL!** A model may give a wrong answer if it was not trained well enough to capture the concept in the prompt. Always keep in mind the main objective for which a particular model was trained and designed.