# Inference with a fine-tuned Llama 3 model

## Install required libraries


In [None]:
!pip install -qqq transformers --progress-bar off
!pip install -qqq bitsandbytes --progress-bar off
!pip install -qqq peft torch --progress-bar off
!pip install -qqq gradio --progress-bar off
!pip install -qqq kaggle --progress-bar off

  Preparing metadata (setup.py) ... done
  Building wheel for ffmpy (setup.py) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.

## Imports
Import necessary libraries and modules.

In [None]:
import zipfile
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, PeftConfig
from huggingface_hub import login
import gradio as gr

In [None]:
# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

In [None]:
# Log in to the Hugging Face Hub with your access token
# (replace the placeholder; avoid saving a real token in the notebook)
hf_token = "Your_API_key"

login(hf_token)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
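Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. A minimal alternative sketch, assuming the token is stored in an environment variable named `HF_TOKEN` (an assumption; adjust the name to your setup), falls back to a hidden interactive prompt. The helper name `get_hf_token` is hypothetical.

```python
import os
from getpass import getpass


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return a Hugging Face token from the environment, or prompt for it.

    The environment variable name is an assumption; adjust it to your setup.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # getpass hides the typed input, so the token is not echoed into the output
    return getpass(f"Hugging Face token ({env_var} is not set): ")
```

With this helper, the login call becomes `login(get_hf_token())`.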


## Download finetuned model weights

To make the fine-tuned weights available in the Colab notebook, I created a Kaggle dataset with all the weights, checkpoints and logs. In the following cells we upload an [API token from Kaggle](https://www.kaggle.com/discussions/general/74235), which lets us download the dataset as an archive and then extract its contents.

In [None]:
from google.colab import files

# Upload the Kaggle API token (kaggle.json). Assigning the result keeps the
# file contents, including the API key, out of the cell output.
uploaded = files.upload()

Saving kaggle.json to kaggle.json

In [None]:
! mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
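The shell one-liner above creates `~/.kaggle/`, moves the uploaded `kaggle.json` into it and restricts the file to owner read/write (mode `600`); the Kaggle CLI warns when the key file is readable by other users. A rough Python equivalent, sketched with a hypothetical helper `install_kaggle_token`, could look like this:

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src: str = "kaggle.json", kaggle_dir: str = "~/.kaggle") -> Path:
    """Move an uploaded kaggle.json into kaggle_dir and make it owner-only.

    Python counterpart of: mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ...
    """
    target_dir = Path(kaggle_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)  # mkdir -p
    target = target_dir / "kaggle.json"
    shutil.move(str(src), str(target))             # mv
    os.chmod(target, 0o600)                        # chmod 600: owner read/write only
    return target
```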

In [None]:
! kaggle datasets download geraygench/llama3-ig-ad-generation-task

Dataset URL: https://www.kaggle.com/datasets/geraygench/llama3-ig-ad-generation-task
License(s): unknown
Downloading llama3-ig-ad-generation-task.zip to /content
100% 1.88G/1.88G [00:24<00:00, 82.9MB/s]


In [None]:
# Specify the path to the zip archive
zip_file_path = 'llama3-ig-ad-generation-task.zip'

# Specify the directory where you want to extract the files
extract_dir = '.'

# Create the extract directory if it doesn't exist
os.makedirs(extract_dir, exist_ok=True)

# Open the zip archive
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    # Extract all the contents to the specified directory
    zip_ref.extractall(extract_dir)

print("Zip archive extracted to", extract_dir)

Zip archive extracted to .
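To confirm that the archive contains the paths used in the next section (`experiments/` and `experiments/checkpoint-483/`), its entries can be listed without extracting anything. A small sketch using only `zipfile.ZipFile.namelist()`; the helper name `archive_prefixes` is hypothetical:

```python
import zipfile


def archive_prefixes(zip_path: str, depth: int = 1) -> list[str]:
    """Return the sorted, unique path prefixes (up to `depth` components) in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        prefixes = {
            "/".join(name.rstrip("/").split("/")[:depth])
            for name in zf.namelist()
            if name.strip("/")  # skip empty names
        }
    return sorted(prefixes)
```

For example, `archive_prefixes(zip_file_path, depth=2)` lists the top-level directories together with their immediate children.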


## Model loading

In [None]:
"""
Paths to saved model and tokenizer
"""

# Base model ID on the Hugging Face Hub
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
# Directories extracted from the Kaggle archive: the PEFT adapter is loaded
# from source_dir and the tokenizer from checkpoint_dir
source_dir = "experiments/"
checkpoint_dir = source_dir + "checkpoint-483/"


# Paths when running on Kaggle with the dataset attached as an input
# source_dir = "/kaggle/input/llama3-ig-ad-generation-task/experiments/"
# checkpoint_dir = source_dir + "checkpoint-483/"

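Next, we load the base model with 4-bit quantization, the tokenizer saved with the fine-tuning checkpoint, and the fine-tuned adapter. The cell below completes the following steps:

1. Load the base model with 4-bit (NF4) quantization.
2. Load the tokenizer from the checkpoint directory.
3. Load the PEFT configuration.
4. Load the adapter on top of the base model.
5. Set up tokenizer padding.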

In [None]:
# 1. Load the model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    use_safetensors=True,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# 2. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)

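# 3. Load the PEFT configuration (peft_config.base_model_name_or_path records
#    the base model the adapter was trained on)
peft_config = PeftConfig.from_pretrained(source_dir)

# 4. Load the fine-tuned adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, source_dir)

# 5. Use the EOS token for padding (the base Llama 3 tokenizer defines no pad token)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"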

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


The model and tokenizer are now loaded and ready for inference.
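A back-of-the-envelope estimate of weight memory shows why 4-bit loading matters for an 8B model on a single GPU. The sketch below counts weights only and rounds the model to 8 billion parameters (an approximation). Real usage is higher, because of quantization constants, layers that bitsandbytes does not quantize (such as the embeddings), activations and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # rough parameter count of an "8B" model
for label, bits in [("fp32", 32), ("fp16", 16), ("nf4", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
# fp32: ~32 GB
# fp16: ~16 GB
# nf4: ~4 GB
```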

In [None]:
default_system_prompt = """
Write an engaging Instagram post caption about the given input. You can generate a few hashtags.
""".strip()

In [None]:
def generate_prompt(conversation: str, system_prompt: str = default_system_prompt) -> str:
    return f"""### Instruction: {system_prompt}

### Input:
{conversation.strip()}

### Response:
""".strip()

def clean_generated_text(text: str) -> str:
    # Remove duplicate hashtags (case-insensitive), keeping the first occurrence.
    # Note: splitting on whitespace and re-joining also collapses line breaks.
    hashtags = set()
    cleaned_text = []
    for word in text.split():
        if word.startswith("#"):
            if word.lower() not in hashtags:
                hashtags.add(word.lower())
                cleaned_text.append(word)
        else:
            cleaned_text.append(word)

    # More clean-up steps could be added to this function;
    # for now it only removes duplicate hashtags
    return " ".join(cleaned_text)

def generate_post(model, text: str) -> str:
    # Tokenize the prompt and move it to the model's device
    inputs = tokenizer(text, return_tensors="pt").to(device)
    inputs_length = inputs["input_ids"].shape[1]
    with torch.no_grad():
        outputs = model.generate(**inputs,
                                 max_new_tokens=100,
                                 do_sample=True,  # temperature/top_p only apply when sampling
                                 temperature=0.7,
                                 top_p=0.95)
    # Decode only the newly generated tokens, skipping the prompt
    generated_text = tokenizer.decode(outputs[0][inputs_length:], skip_special_tokens=True)
    return clean_generated_text(generated_text)
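`clean_generated_text` splits on whitespace, so it also flattens any line breaks in a caption. A variant sketch removes repeated hashtags (case-insensitively, keeping the first occurrence) while leaving the rest of the text, including newlines, in place. Treating a hashtag as `#` followed by word characters is an assumption of this sketch, and `dedupe_hashtags` is a hypothetical name.

```python
import re

HASHTAG = re.compile(r"#\w+")


def dedupe_hashtags(text: str) -> str:
    """Drop repeated hashtags (case-insensitive) but keep line breaks elsewhere."""
    seen: set[str] = set()

    def keep_first(match: re.Match) -> str:
        tag = match.group(0).lower()
        if tag in seen:
            return ""  # repeated hashtag: remove it
        seen.add(tag)
        return match.group(0)

    deduped = HASHTAG.sub(keep_first, text)
    # Removing a tag can leave double spaces; collapse them and trim line ends
    return "\n".join(
        re.sub(r"[ \t]{2,}", " ", line).rstrip() for line in deduped.split("\n")
    )
```

Because `#\w+` matches greedily, `#SeeYouOutHere` and `#SeeYouOutHere2024` are treated as different hashtags.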


In [None]:
# Test the function with a sample instruction
sample_instruction = "Create a new post about the 'Adventure' model backpack with 25 liters capacity for $200, perfect for climbers."
prompt = generate_prompt(sample_instruction)
generated_post = generate_post(model, prompt)
print("Generated Post Content:\n", generated_post)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Generated Post Content:
 🎥 Reach for the peaks with the Adventure 25 - a versatile day trekking pack 🎥 Explore nature and discover more about this durable, comfortable and highly functional partner at the link in bio! #Fjallraven #Adventure #Backpacks #DayTrekking #DayHike #Hiking #Climbing #ClimbingLife #Nature #ExploreNature #Sustainability #SeeYouOutHere #SeeYouOutHere2024 #Fjallr


## Gradio interface for user input

In [None]:
# Gradio interface

def gradio_interface(instruction):
    prompt = generate_prompt(instruction)
    return generate_post(model, prompt)

interface = gr.Interface(
    fn=gradio_interface,
    inputs=gr.Textbox(lines=3, placeholder="Enter instruction here..."),
    outputs="text"
)

In [None]:
# Launch the Gradio interface
interface.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://352c24c7863e73da60.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In this notebook we ran inference with a fine-tuned Llama 3 model that generates Instagram ad captions, covering the full workflow from downloading the weights and loading the model to interacting with it through a web interface. In the example above, the generated caption is relevant to the input and engaging, which shows the model's potential for practical applications in social media marketing. Loading the base model with 4-bit quantization greatly reduces GPU memory usage, so the model can run on less powerful hardware, at the cost of a possible small loss in output quality.



### Further Improvements for the Notebook

* **Multilingual Support**:
    - Currently the model works mostly in English and generates posts in that language. Extending it to generate captions in other languages would make it useful to a wider audience.

* **Hyperparameter Tuning and Data Augmentation**:
    - We can keep experimenting with text-generation parameters such as `temperature`, `top_k`, `top_p`, and `max_new_tokens`. We can also apply data augmentation to expand the training dataset, improving the model's robustness and its ability to handle diverse inputs.

* **Advanced Prompt Engineering**:
    - Prompt engineering can give the model more specific and detailed instructions, which could lead to more creative outputs.

* **Enhanced Post-Processing**:
    - More sophisticated post-processing could improve the readability of the generated captions, for example by better handling anomalies such as repetitive hashtags and emojis, or a hashtag cut off when generation reaches `max_new_tokens`.

* **Integration with Real-Time Data**:
    - Integrate the model with real-time data sources (e.g., trending hashtags, popular topics) to make the generated captions more relevant and timely.

* **Performance Metrics**:
    - Add feedback mechanisms to evaluate the quality of the generated captions, so the model can be improved continuously based on user feedback.
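For the hyperparameter experiments listed above, one simple approach is a small grid over the sampling parameters that `model.generate` accepts, such as `temperature`, `top_p` and `max_new_tokens`. A sketch using `itertools.product`; the helper name `generation_grid` and the value ranges are illustrative assumptions, not tuned settings:

```python
from itertools import product


def generation_grid(**param_values):
    """Yield one generate() kwargs dict per combination of the given parameter values."""
    names = list(param_values)
    for combo in product(*(param_values[name] for name in names)):
        # do_sample=True so that temperature/top_k/top_p actually affect decoding
        yield {"do_sample": True, **dict(zip(names, combo))}


grid = list(generation_grid(
    temperature=[0.5, 0.7, 0.9],  # illustrative values
    top_p=[0.9, 0.95],
    max_new_tokens=[100],
))
print(len(grid))  # → 6
```

Each dict can be unpacked into `model.generate(**inputs, **kwargs)`, and the resulting captions compared side by side or with the feedback mechanisms described above.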