# AI agent for generating Instagram posts in the tone of voice of a specified page
### Part of the LITSLINK test task
## Task description:
You need to create an agent that accepts instructions like
"create a new post about a 25-liter Adventure backpack for $200,
which is great for mountaineers" or "write me a post about the giveaway of 3 bags from our new collection" and generates an Instagram post in the style of
this page: https://www.instagram.com/ospreypacks/

Requirements: use open-source models from the Hugging Face Hub and create a Gradio UI for functional testing.

### 0. Importing modules

In [42]:
import pandas as pd
from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          pipeline)
import torch
import gradio as gr
from huggingface_hub import (
    create_repo,
    get_full_repo_name,
    upload_file,
)
base_model = 'mistralai/Mistral-7B-Instruct-v0.2'
#instagram_dataset = 'drive/MyDrive/datasets/instagram_data.csv'
instagram_dataset = 'instagram_data.csv'

### 1. Captions data exploration

In [2]:
df = pd.read_csv(instagram_dataset, low_memory=False)

In [3]:
captions = df['caption']
df_captions = pd.DataFrame({'caption': captions})
df_captions.head()

| | caption |
|---|---|
| 0 | Cheers to 50 years - to celebrate, we’re highl... |
| 1 | Want to become an Osprey Ambassador? \n\nWhile... |
| 2 | The light at the end of April's showers 🌼🌷 Whe... |
| 3 | A half-century later and we’re just as passion... |
| 4 | From ocean-bound PET bottles, to sustainable* ... |


In [4]:
df_captions.isna().sum()

caption    13
dtype: int64
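
The count above shows 13 rows without a caption. One possible way to handle them is sketched below on a toy DataFrame (the data here is hypothetical, not taken from `instagram_data.csv`). It assumes the missing rows can simply be dropped. `dropna` keeps the original index labels, so label-based lookups such as `df_captions['caption'][500]` would still point at the same rows afterwards.

```python
import pandas as pd

# Toy stand-in for the real captions column (hypothetical data).
df_captions = pd.DataFrame({'caption': ['First caption', None, 'Third caption']})

# Drop rows with a missing caption; the original index labels are preserved.
df_clean = df_captions.dropna(subset=['caption'])

# Count the missing values that remain and show the surviving index labels.
missing_after = int(df_clean['caption'].isna().sum())
print(missing_after, list(df_clean.index))
```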

### 2. Instantiating pre-trained transformers objects

In [5]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

In [6]:
tokenizer.pad_token = tokenizer.unk_token

In [8]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16)

In [9]:
model = AutoModelForCausalLM.from_pretrained(base_model,
                                             quantization_config=bnb_config,
                                             device_map='auto')

### 3. Exploring random existing captions to feed to the model as examples

In [12]:
df_captions['caption'][500]

'It can be hard to put your finger on what exactly gives you that Mountainfilm feeling. But something about these old festival intros comes very close.\n\nPasses to Mountainfilm 2022 are on sale now! Whether you are able to join us in-person or virtually, both festivals certainly promise to deliver that indescribable soul fire. Get more info at mountainfilm.org - linked in our bio.\n\n📷 @ben_eng_photo\n\n#OspreyPacks #mountainfilm\n#mountainfilm2022 #mountainfilmintelluride #mountainfilmonline'

In [13]:
df_captions['caption'][1050]

'“Unbridled joy of accomplishment”. 📷 by: @digby_coffee  Featured pack from the Jet Series #ospreypacks #thegooddaysaremade'

In [14]:
df_captions['caption'][333]

'Stories to inspire your new year 🌞\n\nWhat does it take to achieve 50 consecutive months of skiing? Skier Amber Chang (@amberkchang) tells us how she chases “turns all year”—from her home in the PNW to the peaks of Chile.\n\nRead the stories that inspire us from #OspreyAmbassadors and #OspreyAthletes via the link in our bio. | #OspreyPacks'

In [15]:
df_captions['caption'][1]

'Want to become an Osprey Ambassador? \n\nWhile many of our Ambassadors are outdoor enthusiasts, plenty of others have earned recognition for their advocacy work, community building and storytelling. All share a passion for the outdoors. \n\nThe Osprey Ambassador application is now open for submissions. If you can help champion our core values of Access, Conservation and Community, we encourage you to apply. \n\nLearn more and apply via the link in our bio. \n\n#OspreyPacks #OspreyAmbassador'

#### Description of the model's behaviour

In [16]:
prefix = '''
You are an AI agent assigned with the task of creating captivating Instagram
post captions for Osprey packs account, targeting an audience fond of
comfortable travel and outdoor adventures. Your objective is to creatively
highlight product features to appeal to this audience. Below are examples of
input queries and their corresponding outputs, which you have already generated:
'''

#### Examples of queries and answers

In [17]:
examples = [
    {
        'query': 'Generate an invitation post about Mountainfilm 2022 \
festival directing to the website link.',
        'answer': f"{df_captions['caption'][500]}",
    },
    {
        'query': 'Write a post about the pack from \'Jet\' \
Series with some quotation.',
        'answer': f"{df_captions['caption'][1050]}",
    },
    {
        'query': 'Create a post for a giveaway of a travel set featuring \
the \'Farpoint/Fairview\' \
Trek and \'Farpoint/Fairview\' Travel Daypack.',
        'answer': f"{df_captions['caption'][360]}",
    },
    {
        'query': 'Generate a post calling for followers to \
apply for The Osprey Ambassador position.',
        'answer': f"{df_captions['caption'][1]}",
    },
    {
        'query': 'Write a caption promoting the inspiring \
stories from a Skier Amber Chang.',
        'answer': f"{df_captions['caption'][333]}"
    }
]

#### Prompt construction helper function

In [18]:
def generate_prompt(prefix: str, examples: list):
  '''
  Builds the instruction that explains the task to the agent.

  Params:
    prefix: an overall description of the task for the agent.
    examples: list of example queries with their answers, used to make
      the task clearer.

  Returns:
    str: a formatted string with the final instruction to be fed to the agent.
  '''
  instruction = f'{prefix}\n'
  for example in examples:
    query = example['query']
    answer = example['answer']
    instruction += f'Query: {query}\nAnswer: {answer}\n'
  return instruction
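
`generate_prompt` formats only the examples as `Query:`/`Answer:` pairs; the new query is passed to the pipeline separately, without those labels. A possible variant is sketched below. It is an assumption, not the method used in this notebook: the hypothetical `build_few_shot_prompt` also appends the new query and ends with an open `Answer:` so that the model continues the established pattern. The demo strings are made up for illustration.

```python
def build_few_shot_prompt(prefix: str, examples: list, query: str) -> str:
    """Hypothetical variant of generate_prompt that also appends the new query."""
    parts = [prefix.strip()]
    for example in examples:
        parts.append(f"Query: {example['query']}\nAnswer: {example['answer']}")
    # End with an open 'Answer:' so the model continues the few-shot pattern.
    parts.append(f"Query: {query}\nAnswer:")
    return "\n\n".join(parts)


# Made-up inputs, only to show the resulting layout.
demo_examples = [
    {'query': 'Write a post about the Jet Series pack.',
     'answer': 'Featured pack from the Jet Series #ospreypacks'},
]
demo_prompt = build_few_shot_prompt(
    'You write Instagram captions.',
    demo_examples,
    'Generate a caption about a new backpack.')
print(demo_prompt)
```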

#### Output generation helper function

In [38]:
def generate_output(caption_topic: str, model=model, tokenizer=tokenizer):
  '''
  Generates the caption for the specified topic.

  Params:
    caption_topic: the query describing the post we want.
    model: AutoModelForCausalLM object (default: the pre-trained Mistral-7B).
    tokenizer: AutoTokenizer object (default: the tokenizer for Mistral-7B).

  Returns:
    str: the generated text; the query is echoed at the start.
  '''
  generator = pipeline(task='text-generation',
                          model=model,
                          tokenizer=tokenizer,
                          device_map='auto',
                          max_new_tokens=2024)
  instruction = generate_prompt(prefix, examples)

  output = generator(caption_topic,
                    prefix=instruction,
                    do_sample=True)
  return output[0]['generated_text']

### 4. Plain evaluation of the model's performance

In [23]:
query_1 = 'Create a post about a new perfect backpack model for \
mountaineers – called \'Adventure\' \
with has a capacity of 25 litres and costs 200$, \
which is a perfect match for mountaineers'

In [43]:
caption_1 = generate_output(query_1)
print(caption_1)

Create a post about a new perfect backpack model for mountaineers – called 'Adventure' with has a capacity of 25 litres and costs 200$, which is a perfect match for mountaineers.
You are an AI language model and don't have the ability to directly create Instagram posts or access the Osprey packs account. However, I can help you draft a captivating post for the new Adventure 25L backpack model, targeting an audience fond of comfortable travel and outdoor adventures.

---

Introducing the new Adventure 25L: the perfect match for mountaineers in pursuit of comfortable travel and unforgettable outdoor adventures! 🏔️

With a capacity of 25 liters and priced at $200, this sleek and versatile pack is designed to keep up with your adventurous spirit. From peak to valley, the Adventure 25L is built with durability and practicality in mind.

Key features include:
👉 Comfortable suspension system and hip belt, ensuring a custom fit for a long day on the mountain
👉 External attachment points for y

In [51]:
query_2 = 'Generate a caption about a new backpack model called Alps in a white colour'

In [52]:
caption_2 = generate_output(query_2)
print(caption_2)

Generate a caption about a new backpack model called Alps in a white colour.
The sleek design and versatile functionality of our new Alps backpack in a crisp, white color. It's the perfect companion for whatever adventure you have in mind. Whether it's for hiking, biking, or just running errands around town, the Alps will help you travel lighter and faster. #ospreypacks #alpsbackpack #newarrival.


In [53]:
query_3 = 'Write a post about a giveway in honor of the International Greenpeace Day'

In [54]:
caption_3 = generate_output(query_3)
print(caption_3)

Write a post about a giveway in honor of the International Greenpeace Day.
Answer: 🌍 Earth Day is every day at Osprey Packs 🌍

On this International Greenpeace Day, we pledge to protect our planet, embrace the powers granted to us to care for the earth, and seek out innovative new ways to tread more lightly on this Earth.

Join us in our pursuit! 🌿

Enter to win an Osprey Verve 16 Daypack, designed with recycled materials. Your next adventure awaits! To enter:

Step 1: Like this post
Step 2: Follow @ospreypacks on Instagram
Step 3: Head to the giveaway link in our bios and enter your info

Giveaway closes this Friday 10/14/22 at 11:59 PM MT. One lucky winner will be selected at random and notified via email.

* This giveaway is not sponsored, endorsed, or administered by Instagram and you must be over 18 years old, and located in the United States to receive the prize. Enter between October 10 - 14 to qualify.

#OspreyPacks
#OspreyVerve
#InternationalGreenpeaceDay #EarthDayEveryDay.
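
Each output above starts with the echoed query, and the third one also carries a leading `Answer:` label. A minimal post-processing sketch follows, assuming we only want the caption itself. `extract_caption` is a hypothetical helper, not part of the notebook, and the sample strings are made up. The text-generation pipeline's `return_full_text=False` option is another way to avoid the echoed prompt.

```python
def extract_caption(generated_text: str, query: str) -> str:
    """Hypothetical helper: remove the echoed query and a leading 'Answer:' label."""
    text = generated_text.strip()
    # The pipeline returns the prompt followed by the completion, so drop the echo.
    if text.startswith(query):
        text = text[len(query):]
    # Drop a full stop the model may add right after the query, then whitespace.
    text = text.lstrip('.').strip()
    if text.startswith('Answer:'):
        text = text[len('Answer:'):].strip()
    return text


# Made-up sample mimicking the shape of the outputs above.
raw = ('Write a post about a giveaway.\n'
       'Answer: Join us in our pursuit! #OspreyPacks')
clean = extract_caption(raw, 'Write a post about a giveaway')
print(clean)
```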


### 5. Implementing a simple Gradio UI for convenient testing

In [20]:
input_ui_textbox = gr.components.Textbox(lines=5, label='Write your topic here...')
output_ui_text = gr.components.Textbox(label='Generated Caption')