# Path 2 - HuggingFace
HuggingFace (HF) is a free platform where users can upload models (of various kinds, not just LLMs) that can then be used through their `transformers` library. You don't need an account to use most models on HF. Some models, however, are 'gated' and require approval from their creator before you can use them (this is the case for LLaMA models, for example). Using those models requires both authentication and authorization.
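For gated models, authentication is typically done with a personal access token. The snippet below is a minimal sketch, assuming the token is stored in the `HF_TOKEN` environment variable and that the `huggingface_hub` package is installed. The helper name `login_if_token_available` is hypothetical. Authorization (being granted access on the model's page) still has to happen separately.

```python
import os


def login_if_token_available(env_var: str = "HF_TOKEN") -> bool:
    """Log in to the Hugging Face Hub if a token is set.

    Returns True if a login was attempted, False if no token was found.
    The helper name and the default variable name are assumptions of this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        # No token configured: non-gated models can still be downloaded.
        return False

    # Imported lazily so the function is harmless when no token is configured.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_if_token_available()` once at the top of the notebook is enough. Later `from_pretrained` calls reuse the stored credentials.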

### 1. First simple generation
For the purposes of this lab, we will use the model `Qwen/Qwen2.5-VL-3B-Instruct`, a non-gated, fairly small model that, besides text, also supports images and videos. For the assignment and the project, you can choose whichever model you prefer from the [HF catalogue](https://huggingface.co/models).

In [2]:
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

# fairly small but good model
MODEL_NAME = "Qwen/Qwen2.5-VL-3B-Instruct"

# We're using the `Qwen2_5_VLForConditionalGeneration` class to enable multimodal generation
# Normally, you can use AutoModelForCausalLM
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_NAME,
    dtype="auto",  # automatically uses right precision based on model
    device_map="auto"  # automatically uses right device e.g. GPU if available
)

# We're using the `AutoProcessor` class to enable multimodal generation
# Normally, you can use AutoTokenizer
processor = AutoProcessor.from_pretrained(MODEL_NAME)




### 3. Add images to the prompt
Besides text, this model also accepts images (and videos).


#### Exercise 5
Try prompting it with one. Choose an interesting image and prompt the model with a query about it.

You can consult the model's [README](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) for usage examples.

Use [PIL](https://pillow.readthedocs.io/en/stable/) to load an image. It should already be present in the Python environment.

In [None]:
# Create a text streamer that prints output tokens as they are generated
from transformers import TextStreamer


def create_streamer():
    return TextStreamer(
        processor.tokenizer,
        skip_prompt=True,  # do not print the prompt
        skip_special_tokens=True  # do not print special tokens
    )

In [14]:
# Append one JSON entry per interaction to a JSONL log file. Each entry records
# the user prompt, the expert reply, the generation parameters, the time taken
# to generate the reply, and the model name.
import json
def log_interaction(user_prompt, expert_reply, temp, topK, topP, time_taken, model_name=MODEL_NAME):
    log_entry = {
        "user_prompt": user_prompt,
        "expert_reply": expert_reply,
        "temperature": temp,
        "topK": topK,
        "topP": topP,
        "time_taken_seconds": time_taken,
        "model_name": model_name
    }
    with open("interaction_log.jsonl", "a") as log_file:
        log_file.write(json.dumps(log_entry) + "\n")
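
Because each log entry is one JSON object per line, the file can be read back line by line to compare runs. The following is a minimal sketch, assuming the field names written by `log_interaction` above. The helper names `load_log` and `mean_time_by_temperature` are hypothetical.

```python
import json
from pathlib import Path


def load_log(path="interaction_log.jsonl"):
    """Return the logged interactions as a list of dicts (empty if no log exists)."""
    log_path = Path(path)
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as log_file:
        # Skip blank lines so a trailing newline does not break parsing.
        return [json.loads(line) for line in log_file if line.strip()]


def mean_time_by_temperature(entries):
    """Average generation time in seconds for each logged temperature."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry["temperature"], []).append(entry["time_taken_seconds"])
    return {temp: sum(times) / len(times) for temp, times in grouped.items()}
```

For example, `mean_time_by_temperature(load_log())` gives a quick view of how generation time varies across the temperatures you tried.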

In [19]:
from PIL import Image
import time
IMAGE_PATH = "./data/mushroom_copper_spike.jpg"

im = Image.open(IMAGE_PATH)

# Your code here
from transformers import GenerationConfig
from qwen_vl_utils import process_vision_info



user_prompt_str = "Tell me both, the scientific and common names of the mushroom or mushrooms in this picture. Tell me the family to which they belong. Give me their physical description. Where are they most commonly found. Clarify if they are edible and if not, explain why and what are the symptoms or side effects in a person, in this last case explain if there is a known treatment or antidote. Mention if there is any similarity with other species. Lastly, you may mention any other documented fact about the mushroom."

system_prompt_str = "You are a mushroom expert chatbot that responds to user queries about mushrooms, and you must always steer the conversation to keep it in the context of mushrooms. You provide succinct and to the point information about mushrooms based on data. You talk in mycological terms about data. Also mention the common names of mushrooms. If you don't know the answer, just say you don't know. Never make up an answer."

conversation = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": system_prompt_str}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": im},        
            {"type": "text", "text": user_prompt_str}
        ],
    },
]

temp = 0.9
topK = 0  # 0 disables top-k filtering
topP = 0.5

# Define a generation config
gen_config = GenerationConfig(
    max_new_tokens=500,   # limit reply length
    temperature=temp,     # controls randomness; must be > 0 when sampling (try 0.1, 0.5, 1.0, etc.)
    top_k=topK,           # keep only the k most likely tokens
    do_sample=True,       # must be True for temperature, top_k and top_p to have an effect
    top_p=topP            # nucleus sampling: keep the most likely tokens up to this cumulative probability
)
    
    
# Preparation for inference
text = processor.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(conversation)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Time the generation so that the duration can be logged
start_time = time.time()
# Inference: generate the output
generated_ids = model.generate(**inputs, generation_config=gen_config)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
end_time = time.time()

print("\nExpert reply:", output_text)

time_taken = end_time - start_time
log_interaction(user_prompt_str, output_text, temp, topK, topP, time_taken, model_name=MODEL_NAME)


Expert reply: ['**Scientific Name:** Cantharellus cibarius (Champagne mushroom)\n**Common Name:** Champagne mushroom\n\n**Family:** Cantharellaceae\n\n**Physical Description:**\n- **Cap:** The cap is convex to bell-shaped, with a diameter of 3-7 cm. It is initially pale yellow to light brown, becoming darker with age.\n- **Gills:** The gills are attached (decurrent), and they are initially white, becoming yellowish as the mushroom matures.\n- **Stem:** The stem is equal to the cap, often with a slight depression at the base. It is smooth, yellowish to orange, and often has a ring at the top.\n- **Spore Print:** The spore print is yellowish.\n\n**Habitat and Distribution:**\n- They are commonly found in coniferous forests, particularly in Europe and North America.\n- They are often found on the ground or in leaf litter.\n\n**Edibility:**\n- **Edible:** Yes, the Champagne mushroom is considered a safe and delicious edible mushroom.\n- **Symptoms:** None known.\n- **Treatment:** None kno
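
The `temperature` and `top_p` values passed to `GenerationConfig` in the cell above shape how each next token is drawn. The following is a conceptual sketch in plain Python, not the `transformers` implementation: the function names `nucleus` and `sample_token` are hypothetical, and the logits are made-up numbers chosen for illustration.

```python
import math
import random


def nucleus(logits, temperature=0.9, top_p=0.5):
    """Return {token_index: probability} for the tokens kept by top-p filtering."""
    # Temperature rescales the logits: values < 1 sharpen the distribution,
    # values > 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise so the kept probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, temperature=0.9, top_p=0.5, rng=random):
    """Draw one token index from the filtered distribution."""
    candidates = nucleus(logits, temperature, top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

With a low `top_p`, only a handful of high-probability tokens survive the filter, so raising `temperature` has limited effect on variety. Widening `top_p` is what lets less likely tokens back in.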

### 5. Create a user interface

#### Exercise 7

Since you are trying to build a complete application, you also need a user interface that interacts with the model. Various libraries are available for this purpose, notably [gradio](https://www.gradio.app/docs/gradio/interface) and [chat UI](https://huggingface.co/docs/chat-ui/index). For the solution of this lab, we will use gradio.

Gradio has pre-defined input/output blocks that are automatically inserted into the interface. You only need to provide an appropriate function that takes all the inputs and returns the relevant output. See the documentation [here](https://www.gradio.app/docs/gradio/interface).

In [55]:
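def function_similarity(query: str) -> str:
    # Retrieve the 3 passages most similar to the query,
    # e.g. query = "What is chain of thought prompting?"
    # (`vectorstore` is not defined in this cell and must already exist)
    results = vectorstore.similarity_search(query, k=3)

    # Return the first 300 characters of the best match
    if not results:
        return "No matching passages found."
    return results[0].page_content[:300]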

In [56]:
import gradio as gr

# This part closes the demo server if it is already running (which
# happens easily in notebooks) and prevents you from opening multiple
# servers at the same time.
if "demo" in locals() and demo.is_running:
    demo.close()

# Your code here
#USER_QUERY = "What is chain of thought?"
demo = gr.Interface(fn=function_similarity, inputs="textbox", outputs="textbox")

if __name__ == "__main__":
    demo.launch()

* Running on local URL:  http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
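
`gr.Interface` also accepts a list of input components, each mapped positionally to a parameter of the function. The following is a minimal sketch of a two-input interface: `preview` is a hypothetical stand-in for a real retrieval or generation call, and `build_demo` is a hypothetical helper that imports gradio lazily.

```python
def preview(query: str, max_chars: float) -> str:
    """Hypothetical stand-in for a model call: echo the query, truncated."""
    text = f"You asked: {query.strip()}"
    # Sliders may deliver a float, so cast before slicing.
    return text[: int(max_chars)]


def build_demo():
    """Build a two-input interface; call .launch() on the result to serve it."""
    import gradio as gr  # imported lazily so `preview` can be tested without gradio

    return gr.Interface(
        fn=preview,
        inputs=[
            gr.Textbox(label="Query"),
            gr.Slider(minimum=10, maximum=300, value=100, step=10, label="Max characters"),
        ],
        outputs=gr.Textbox(label="Preview"),
    )
```

The same pattern could expose generation parameters such as temperature in the interface by adding one slider per parameter.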
