# Path 2 - HuggingFace
HuggingFace (HF) is a free platform where users can upload models (of various kinds, not just LLMs) that can then be used through the `transformers` library. You don't need an account to use most models on HF. However, some models are 'gated' and require approval from their creator before you can use them (this is the case, for example, for LLaMA models). For those models, both authentication and authorization are required.

### 1. First simple generation
For the purposes of this lab, we will use the model `Qwen/Qwen2-VL-2B-Instruct`, a fairly small, non-gated model that, besides text, also supports images and videos. For the assignment and the project, you can choose whichever model you prefer from the [HF catalogue](https://huggingface.co/models).

**NOTE:** Until the next release of `transformers`, the use of Qwen VL requires installation from source (see `requirements.txt`).

In [None]:
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, GenerationConfig


MODEL_NAME = "Qwen/Qwen2-VL-2B-Instruct"

# We're using the `Qwen2VLForConditionalGeneration` class to enable multimodal generation
# Normally, you can use AutoModelForCausalLM
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",  # automatically uses right precision based on model
    device_map="auto"  # automatically uses right device e.g. GPU if available
)

# We're using the `AutoProcessor` class to enable multimodal generation
# Normally, you can use AutoTokenizer
processor = AutoProcessor.from_pretrained(MODEL_NAME)

#### Exercise 1

Start by using the model to predict the next turn in a conversation. You need to tokenize the input, generate the response, detokenize it, and print it.
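Before tokenization, the conversation is flattened into a single prompt string by the model's chat template. The sketch below imitates that step in plain Python, assuming a ChatML-style layout (`<|im_start|>role ... <|im_end|>`) similar to the one Qwen models use. `render_chatml` is a hypothetical helper written only for illustration; in practice the processor's `apply_chat_template` method does this with the exact template shipped with the model.

```python
def render_chatml(conversation, add_generation_prompt=True):
    """Flatten a list of {"role", "content"} messages into one prompt string.

    Illustration only: the real template ships with the model and may differ.
    """
    parts = []
    for message in conversation:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from this point.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are a helpful pirate."},
    {"role": "user", "content": "Hello"},
]
prompt = render_chatml(example)
print(prompt)
```

Printing the rendered prompt is a handy way to check what the model actually receives before you tokenize it.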

In [None]:
conversation = [
    {
        "role": "system",  # optional, could start directly with user
        "content": "You are a helpful pirate. Only reply with pirate jargon.",  # system prompt
    },
    {
        "role": "user",
        "content": "Hello",  # user query
    },
]
# Answer here

### 2. Generation parameters
When asking the model to generate text, there are several parameters that you can tune to improve the quality of the output. [Here](https://huggingface.co/docs/transformers/generation_strategies) is an overview of the parameters that you can change. Try some of them in different contexts to understand how they affect the generated text. Feel free to explore different decoding strategies as well.

#### Exercise 2

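Play with the output temperature, which controls the randomness of the generated text. Values close to 0 make the output nearly deterministic, `temperature=1` samples from the model's unmodified distribution, and higher values increase randomness further (try some intermediate values too). In `transformers`, the temperature only takes effect when sampling is enabled with `do_sample=True`, and it must be strictly positive; use `do_sample=False` for fully deterministic (greedy) output. Keep `max_new_tokens` at 50 so that the output is not too long.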
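To build intuition for what the temperature does, here is a minimal sketch in plain Python. It assumes the usual formulation in which the model's raw scores (logits) are divided by the temperature before the softmax. The function name and the toy scores are made up for illustration and are not part of `transformers`.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate the probability on the highest-scoring token, while higher temperatures flatten the distribution.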

In [None]:
# Answer here

#### Exercise 3

Try out different `top_k` values. `top_k` controls how many tokens the model considers at each step: `top_k=1` means the model considers only one token (the one with the highest probability), while `top_k=50` means it considers the 50 most probable tokens.
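The effect of `top_k` on a single decoding step can be sketched in plain Python as follows. This is a simplified illustration on a made-up distribution, not the library's implementation, and `top_k_filter` is a hypothetical helper name.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize; the rest get probability 0."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


toy_probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token distribution
print(top_k_filter(toy_probs, 1))
print(top_k_filter(toy_probs, 3))
```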

In [None]:
# Answer here

#### Exercise 4

Repeat the previous exercise, this time with `top_p`, which restricts sampling to the most probable tokens whose cumulative probability reaches a threshold: `top_p=0.1` keeps the tokens that make up 10% of the cumulative probability mass, while `top_p=0.9` keeps those that make up 90%. Note that `top_p` filters tokens *after* `top_k` has been applied.
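As a rough illustration of nucleus (`top_p`) filtering on a single decoding step, consider the simplified sketch below. It assumes the common definition, "keep the smallest set of most probable tokens whose cumulative probability reaches `p`", and uses a made-up distribution. `top_p_filter` is a hypothetical helper, and the library's own implementation may differ in details such as tie handling.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for index in order:
        keep.append(index)
        cumulative += probs[index]
        if cumulative >= p:
            break  # enough probability mass collected
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


toy_probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token distribution
print(top_p_filter(toy_probs, 0.1))
print(top_p_filter(toy_probs, 0.9))
```

Unlike `top_k`, the number of surviving tokens is not fixed: it depends on how peaked the distribution is.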

Can you determine a rule of thumb for how `top_k` and `top_p` affect the output? (If you can't, try pushing the values to extremes.)

In [None]:
# Answer here

### 3. Add images to the prompt
Besides text, this model also accepts images (and videos).


#### Exercise 5
Try prompting it with one. Choose an interesting image and prompt the model with a query about it.

You can use the model's [README](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) as a reference.

Use [PIL](https://pillow.readthedocs.io/en/stable/) to load an image. It should already be present in the Python environment.
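For orientation, a multimodal turn is usually expressed as a message whose `content` is a list of typed parts rather than a plain string. The sketch below assumes the layout shown in the model's README, where an image placeholder is followed by the text query and the image itself is handed to the processor separately. `build_image_query` is a hypothetical helper name, so check the README for the exact format before relying on it.

```python
def build_image_query(question):
    """Build a one-turn conversation with an image placeholder followed by text."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # placeholder; the pixels are passed separately
                {"type": "text", "text": question},
            ],
        }
    ]


conversation_with_image = build_image_query("What is happening in this picture?")
print(conversation_with_image)
```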

In [None]:
IMAGE_PATH = "./data/engineer_fitting_prosthetic_arm.jpg"

# Answer here

### 4. Document grounding

#### Exercise 6 (optional)

Depending on the application of your project, you might need to extract text from documents and include it as additional context. In the case of HuggingFace, external libraries make this easier. Here are some that you can use: [LangChain](https://python.langchain.com/v0.2/docs/introduction/), [LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/), [Haystack](https://docs.haystack.deepset.ai/docs/intro).

For the solution of this lab we will use LangChain.

**NOTE:** This part is here only to mirror the lab's Path 1 and is *optional*.
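Whichever library extracts the text, the grounding step itself comes down to placing that text in the prompt ahead of the question. A minimal sketch, assuming the text has already been extracted into a string: `build_grounded_prompt`, the prompt wording, and the `max_chars` cap are all illustrative choices, not part of any of the libraries above.

```python
def build_grounded_prompt(document_text, question, max_chars=4000):
    """Prepend (possibly truncated) document text to the user's question."""
    # Crude character cap so the prompt stays within the model's context window.
    context = document_text[:max_chars]
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {question}"
    )


sample_text = "Placeholder text standing in for the extracted document."
grounded = build_grounded_prompt(sample_text, "What does the document describe?")
print(grounded)
```

The resulting string can be used as the `content` of a user message. For long documents, splitting the text into chunks and selecting the relevant ones is usually a better choice than plain truncation.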

In [None]:
DOC_PATH = "./data/chain_of_thought_prompting.pdf"

# Answer here

### 5. Create a user interface

#### Exercise 7

Since you are trying to build a complete application, you also need a nice user interface that interacts with the model. There are various libraries available for this purpose. Notably: [gradio](https://www.gradio.app/docs/gradio/interface) and [chat UI](https://huggingface.co/docs/chat-ui/index). For the solution of this lab, we will use gradio.

Gradio has pre-defined input/output blocks that are automatically inserted in the interface. You only need to provide an appropriate function that takes all the inputs and returns the relevant output. See documentation [here](https://www.gradio.app/docs/gradio/interface).
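For a chat interface, the function you provide receives the latest user message together with the conversation history and returns the reply. The sketch below shows only that shape: `respond` and its placeholder reply are made up for illustration, and a real implementation would call the model where the comment indicates.

```python
def respond(message, history):
    """Return the assistant's reply to the latest user message.

    `history` holds the earlier turns; this placeholder ignores it.
    """
    # Placeholder logic: a real implementation would call the model here.
    return f"Arr, ye said: {message}"


print(respond("Hello", []))
```

A function like this can be passed as the `fn` argument of `gr.ChatInterface`.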

In [None]:
import gradio as gr

# Answer here

# This part closes the demo server if it is already running (which
# happens easily in notebooks) and prevents you from opening multiple
# servers at the same time.
if "demo" in locals() and demo.is_running:
    demo.close()

# Edit the parameters below
demo = gr.ChatInterface(...)
demo.launch()