# Recommend with LLMs

LLMs are based on the powerful transformer architecture, which
enables parallel learning of contextual features that can be
fine-tuned for a number of downstream tasks.

This notebook presents a quick tour of how LLMs can be leveraged for recommendation and related tasks.

## The Recommendation System Setup

<img src="../assets/recsys_flow.png">

## Query Understanding

We are used to interacting with apps and websites in a very structured and limited way:
- Keyword Search
- Clicks
- Scrolling
- Feedback/Ratings

LLMs (and similar models) provide a powerful and natural way of interacting through natural language.

### Keyword Search
```python
> Chole Bhature
```

### Free-form Interaction
```python
> I want to eat spicy chole bhature from not so expensive place within next 30mins
```

In [None]:
from IPython.display import display, Markdown, Latex
import json

In [None]:
import os
import openai

In [None]:
API_KEY = ""
os.environ['OPENAI_API_KEY'] = API_KEY
openai.organization = ""
openai.api_key = os.environ['OPENAI_API_KEY']

In [None]:
def get_completion_message(messages,
                           model="gpt-3.5-turbo",
                           temperature=0,
                           max_tokens=500):
    # openai.ChatCompletion is the legacy (pre-1.0) openai SDK interface
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    # return only the text of the first choice
    return response.choices[0].message["content"]


In [None]:
# add code snippet to understand what the user wants
system_message_1 = """
You are a customer service agent for an Online Food Delivery company
taking requests for ordering food from restaurants.
Always respond in a friendly tone with very concise answers.
Focus on available information and apologise when not sure.
"""

system_message_2 = """
Your task is to understand if the user is ordering from a specific restaurant
along with dishes/items/beverages and if there are any cuisine, price/budget or
delivery time preferences. Present this information in JSON format with
restaurant_name, item_list, cuisine_preference, budget_preference, delivery_time_preference
as keys. Use "NOT_AVAILABLE" if no value is identified for a key.
"""

In [None]:
user_message_1 = """
Can you help me with chole bhature from Anand like restaurants along with some gulab jamuns and lassi.
I do not want to order from a very expensive restaurant and the food should reach in like 20mins
"""

In [None]:
# prepare prompt/setup
messages = [
    {
        'role':'system',
        'content':system_message_1,
    },
    {
        'role':'user',
        'content':user_message_1,
    },
    {
        'role':'system',
        'content':system_message_2,
    },
]

# get response
agent_response = get_completion_message(messages)
print(agent_response)

```python
# Sample response
{
  "restaurant_name": "Anand",
  "item_list": ["chole bhature", "gulab jamuns", "lassi"],
  "cuisine_preference": "NOT_AVAILABLE",
  "budget_preference": "affordable",
  "delivery_time_preference": "20 minutes"
}
```
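The reply is still a plain string, so it has to be parsed before later steps can use it. The following is a minimal sketch, assuming the model returns valid JSON with the keys requested in the prompt. `parse_preferences` and `EXPECTED_KEYS` are hypothetical names introduced here for illustration; the helper maps the `"NOT_AVAILABLE"` placeholder to `None` and returns `None` when the reply is not valid JSON.

```python
import json

# Keys requested in the system prompt
EXPECTED_KEYS = [
    "restaurant_name",
    "item_list",
    "cuisine_preference",
    "budget_preference",
    "delivery_time_preference",
]

def parse_preferences(response_text):
    """Parse the model's JSON reply (hypothetical helper, not part of the notebook)."""
    try:
        raw = json.loads(response_text)
    except json.JSONDecodeError:
        # the model may occasionally reply with free text instead of JSON
        return None
    if not isinstance(raw, dict):
        return None
    prefs = {}
    for key in EXPECTED_KEYS:
        value = raw.get(key, "NOT_AVAILABLE")
        # treat the placeholder as a missing value
        prefs[key] = None if value == "NOT_AVAILABLE" else value
    return prefs

sample_reply = """
{
  "restaurant_name": "Anand",
  "item_list": ["chole bhature", "gulab jamuns", "lassi"],
  "cuisine_preference": "NOT_AVAILABLE",
  "budget_preference": "affordable",
  "delivery_time_preference": "20 minutes"
}
"""
prefs = parse_preferences(sample_reply)
print(prefs)
```

Returning `None` on a parse failure lets the caller decide whether to retry the request or fall back to a clarifying question.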

## Candidate Generation

We traditionally use algorithms such as approximate nearest neighbor (ANN) search to identify the ``top-N`` candidates for a given user query.

Because LLMs learn contextual representations, we can use them to produce highly contextual dense embeddings of our key entities, which in turn improves nearest neighbor search.

<img src="../assets/vectors-3.svg">

> Source: https://openai.com/blog/introducing-text-and-code-embeddings

In [None]:
# this is a very simple setup
# more sophistication is required to handle missing values and other edge-cases
def preformat_user_request_text(request_json):
    return f"""
    I would like to order {', '.join(request_json['item_list'])}
    from restaurants similar to {request_json['restaurant_name']}. I prefer having
    food from {request_json['cuisine_preference']} cuisine,
    my budget preference is {request_json['budget_preference']} and delivery preference
    is {request_json['delivery_time_preference']}
    """
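As the comment in the cell above notes, missing values need extra handling. One possible approach is sketched below, assuming missing fields carry the `"NOT_AVAILABLE"` placeholder from the prompt. `format_request_text` is a hypothetical variant of the function above that only mentions fields that are actually present, so the placeholder never leaks into the text that gets embedded.

```python
NOT_AVAILABLE = "NOT_AVAILABLE"

def format_request_text(request_json):
    """Build the request text, skipping missing fields (hypothetical variant)."""
    def present(key):
        # a field counts as missing if absent, empty, or set to the placeholder
        value = request_json.get(key, NOT_AVAILABLE)
        return value not in (NOT_AVAILABLE, None, "", [])

    parts = []
    if present("item_list"):
        parts.append(f"I would like to order {', '.join(request_json['item_list'])}.")
    if present("restaurant_name"):
        parts.append(f"I prefer restaurants similar to {request_json['restaurant_name']}.")
    if present("cuisine_preference"):
        parts.append(f"I prefer {request_json['cuisine_preference']} cuisine.")
    if present("budget_preference"):
        parts.append(f"My budget preference is {request_json['budget_preference']}.")
    if present("delivery_time_preference"):
        parts.append(f"My delivery preference is {request_json['delivery_time_preference']}.")
    return " ".join(parts)

example_request = {
    "restaurant_name": "Anand",
    "item_list": ["chole bhature", "lassi"],
    "cuisine_preference": "NOT_AVAILABLE",
    "budget_preference": "affordable",
    "delivery_time_preference": "20 minutes",
}
request_text = format_request_text(example_request)
print(request_text)
```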

In [None]:
# Code snippet to get embeddings
def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    return openai.Embedding.create(input = [text], model=model)['data'][0]['embedding']

In [None]:
# agent_response is a JSON string, so parse it into a dict first
request_message = preformat_user_request_text(json.loads(agent_response))
request_embd = get_embedding(request_message)

In [None]:
print(len(request_embd))
# 1536

```python
# sample output
[0.0032782373018562794,
 -0.014471372589468956,
 -0.01778426393866539,
 0.02499222755432129,
 -0.01202482357621193,
 -0.005159931723028421,
 -0.0220120120793581,
 -0.017714956775307655,
 -0.012482251971960068,
 -0.016897128894925117,
 .....,
 -0.024021925404667854,
 0.04363590106368065,
 -0.030467506498098373,
 -0.00507676275447011,
 -0.0007836061413399875]
```
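Once the request and each vendor have an embedding, candidates can be retrieved by comparing vectors. The sketch below uses exact (brute-force) cosine similarity in pure Python rather than an approximate index, which is only reasonable for small catalogs. The function names and the 3-dimensional toy vectors are made up for illustration; real `text-embedding-ada-002` vectors have 1536 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_n_candidates(query_embd, candidate_embds, n=2):
    """Return the n (name, score) pairs most similar to the query, best first."""
    scored = [
        (name, cosine_similarity(query_embd, embd))
        for name, embd in candidate_embds.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# toy vectors standing in for real vendor embeddings
toy_vendor_embds = {
    "vendor_a": [0.9, 0.1, 0.0],
    "vendor_b": [0.0, 1.0, 0.0],
    "vendor_c": [0.7, 0.7, 0.1],
}
toy_query_embd = [1.0, 0.2, 0.0]

top_candidates = top_n_candidates(toy_query_embd, toy_vendor_embds, n=2)
print([name for name, _ in top_candidates])
```

For larger catalogs, the same interface could be backed by an approximate nearest neighbor index or a vector database instead of the linear scan.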

## Scoring and Ranking

Typically, business logic and different scoring/ranking methods are applied on top of the candidate list to generate the final output.

LLMs can be leveraged through prompts and other methods to perform this step as well.
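For comparison with the prompt-based approach, here is a minimal sketch of what rule-based scoring might look like. The scoring rule (one point per requested item on the menu, plus one point for a matching budget label) and the names `score_vendor` and `rank_vendors` are assumptions made for illustration, not a method prescribed by this notebook. The toy vendors reuse the field names from the vendor list below.

```python
def score_vendor(vendor, prefs):
    """Hypothetical score: number of requested items offered, plus a budget match bonus."""
    wanted = {item.lower() for item in prefs.get("item_list", [])}
    offered = {item.lower() for item in vendor["item_list"]}
    # exact, case-insensitive matching only: "gulab jamuns" would not match "Gulab Jamun"
    score = len(wanted & offered)
    if vendor["budget_preference"] == prefs.get("budget_preference"):
        score += 1
    return score

def rank_vendors(vendors, prefs, top_k=2):
    """Return the names of the top_k vendors by score, best first."""
    ranked = sorted(vendors, key=lambda v: score_vendor(v, prefs), reverse=True)
    return [v["restaurant_name"] for v in ranked[:top_k]]

# toy data for illustration
toy_vendors = [
    {"restaurant_name": "Curry House", "budget_preference": "budget",
     "item_list": ["Chole Bhature", "Gulab Jamun", "Lassi"]},
    {"restaurant_name": "Spice Palace", "budget_preference": "affordable",
     "item_list": ["Butter Chicken", "Mango Lassi", "Gulab Jamun"]},
    {"restaurant_name": "Taj Mahal", "budget_preference": "expensive",
     "item_list": ["Lamb Rogan Josh", "Mango Lassi"]},
]
toy_prefs = {
    "item_list": ["chole bhature", "gulab jamun", "lassi"],
    "budget_preference": "affordable",
}

ranking = rank_vendors(toy_vendors, toy_prefs)
print(ranking)
```

Rules like these are brittle with respect to spelling and synonyms, which is one reason to let an LLM handle the matching instead.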

In [None]:
dummy_vendor_list = [
    {
        'restaurant_name': 'Spice Palace',
        'cuisine_preference': 'indian',
        'budget_preference': 'affordable',
        'item_list': ['Butter Chicken', 'Tandoori Chicken', 'Palak Paneer', 'Naan', 'Biryani', 'Samosa', 'Chana Masala', 'Aloo Gobi', 'Mango Lassi', 'Gulab Jamun']
    },
    {
        'restaurant_name': 'Curry House',
        'cuisine_preference': 'indian',
        'budget_preference': 'budget',
        'item_list': ['Chole Bhature', 'Vegetable Korma', 'Gulab Jamun', 'Garlic Naan', 'Lassi', 'Onion Bhaji', 'Dal Makhani', 'Aloo Tikki', 'Mango Chutney', 'Raita']
    },
    {
        'restaurant_name': 'Taj Mahal',
        'cuisine_preference': 'indian',
        'budget_preference': 'expensive',
        'item_list': ['Lamb Rogan Josh', 'Chicken Biryani', 'Paneer Tikka', 'Garlic Naan', 'Prawn Curry', 'Onion Pakora', 'Chana Masala', 'Aloo Paratha', 'Mango Lassi']
    },
    {
        'restaurant_name': 'Saffron',
        'cuisine_preference': 'indian',
        'budget_preference': 'expensive',
        'item_list': ['Chicken Vindaloo', 'Saag Aloo', 'Paneer Makhani', 'Garlic Naan', 'Jeera Rice', 'Vegetable Samosa', 'Chana Masala', 'Aloo Gobi', 'Mango Lassi', 'Rasmalai']
    },
    {
        'restaurant_name': 'Masala Zone',
        'cuisine_preference': 'indian',
        'budget_preference': 'budget',
        'item_list': ['Chicken Tikka', 'Lamb Biryani', 'Paneer Butter Masala', 'Garlic Naan', 'Prawn Masala', 'Onion Bhaji', 'Chana Masala', 'Aloo Tikki', 'Mango Lassi', 'Barfi']
    },
    {
        'restaurant_name': 'Wok This Way',
        'cuisine_preference': 'chinese',
        'budget_preference': 'affordable',
        'item_list': ['Kung Pao Chicken', 'Moo Shu Pork', 'Beef and Broccoli', 'Hot and Sour Soup', 'Egg Rolls', 'Fried Rice', 'Lo Mein', 'General Tso\'s Chicken', 'Sesame Chicken', 'Crab Rangoon']
    },
    {
        'restaurant_name': 'Golden Dragon',
        'cuisine_preference': 'chinese',
        'budget_preference': 'budget',
        'item_list': ['Egg Drop Soup', 'Hot and Sour Soup', 'Wonton Soup', 'Fried Rice', 'Lo Mein', 'General Tso\'s Chicken', 'Sesame Chicken', 'Beef and Broccoli', 'Sweet and Sour Pork', 'Shrimp with Lobster Sauce']
    },
    {
        'restaurant_name': 'Panda Express',
        'cuisine_preference': 'chinese',
        'budget_preference': 'budget',
        'item_list': ['Orange Chicken', 'Kung Pao Chicken', 'Beijing Beef', 'Chow Mein', 'Fried Rice', 'Egg Rolls', 'Crab Rangoon', 'Teriyaki Chicken', 'Honey Walnut Shrimp', 'Broccoli Beef']
    }
]

In [None]:
def get_vendor_details(vendor_list):
    # one line per vendor so the entries stay clearly separated in the prompt
    vendor_lines = []
    for v in vendor_list:
        vendor_lines.append(
            f"{v['restaurant_name']} "
            f"(item_list:[{', '.join(v['item_list'])}], "
            f"budget_preference:{v['budget_preference']}, "
            f"cuisine_preference:{v['cuisine_preference']})"
        )
    return "\n".join(vendor_lines)

In [None]:
user_pref = get_vendor_details([json.loads(agent_response)])
system_message_3 = f"""
Recommend the top two restaurants based on the user preferences given in triple backticks ```{user_pref}```.
Format the output as a list and share a reason for each recommendation. Highlight the restaurant name in bold.
Make sure the recommended restaurants are present in the following list of restaurants.\n
Available restaurants:
{get_vendor_details(dummy_vendor_list)}\n
"""

In [None]:
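# extend message list
# record the model's earlier JSON reply as the assistant turn
messages.append({
    'role':'assistant',
    'content': agent_response
})

# the ranking instruction is a system message
messages.append({
    'role':'system',
    'content': system_message_3
})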

In [None]:
agent_response = get_completion_message(messages)
display(Markdown(agent_response))

> Sample Response

Based on your preferences, I recommend the following restaurants:

1. **Spice Palace**: They offer a variety of Indian dishes including Butter Chicken, Tandoori Chicken, Palak Paneer, Naan, Biryani, Samosa, Chana Masala, Aloo Gobi, Mango Lassi, and Gulab Jamun. They have affordable prices and their cuisine is Indian.

2. **Curry House**: They have Chole Bhature, Vegetable Korma, Gulab Jamun, Garlic Naan, Lassi, Onion Bhaji, Dal Makhani, Aloo Tikki, Mango Chutney, and Raita. They are a budget-friendly Indian restaurant.

Both of these restaurants should be able to deliver your food within 20 minutes.
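Since the prompt asks the model to stay within the supplied list, it is worth verifying that it did. The sketch below assumes restaurant names appear in bold (`**name**`), as requested in the prompt, and flags any bold name that is not in the known vendor list. `extract_bold_names` and `find_unknown_restaurants` are hypothetical helpers, and "Biryani Barn" is a made-up name used to simulate a hallucinated entry. Note also that the vendor list has no delivery-time field, so the closing sentence of the sample response is not grounded in the data provided.

```python
import re

def extract_bold_names(markdown_text):
    """Return every substring wrapped in ** ** markers."""
    return re.findall(r"\*\*(.+?)\*\*", markdown_text)

def find_unknown_restaurants(markdown_text, vendor_names):
    """Return bold names in the reply that are not in the known vendor list."""
    known = {name.lower() for name in vendor_names}
    return [
        name for name in extract_bold_names(markdown_text)
        if name.strip().lower() not in known
    ]

# simulated reply; the third entry is deliberately not a known vendor
simulated_reply = (
    "1. **Spice Palace**: affordable Indian food.\n"
    "2. **Curry House**: has Chole Bhature and Lassi.\n"
    "3. **Biryani Barn**: great biryani."
)
known_vendors = ["Spice Palace", "Curry House", "Taj Mahal"]

unknown = find_unknown_restaurants(simulated_reply, known_vendors)
print(unknown)
```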

## LangChain & Friends

- Frameworks such as ``LangChain`` enable us to integrate LLMs with our existing recommendation systems/architecture while supercharging them with additional capabilities
- `Streamlit` is a convenient way of preparing GUI-based prototypes
- **Vector databases** such as ``milvus``, ``pinecone`` and the like help with storing and searching embeddings

<img src="../assets/streamlit.gif">

## End-to-End Architectures

While we can employ LLMs for each of the steps we discussed,
recent research enables us to leverage such architectures for an
end-to-end setup which provides capabilities to handle multiple
recommendation tasks with a single model.

Models such as **P5** (_Pretrain, Personalized Prompt, and Predict Paradigm_) leverage the
LLM architecture by formulating recommendation tasks in natural language. P5 learns different tasks with the same language modeling objective during pretraining. Thus, it serves as a foundation model for downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation through prompts.

<img src="../assets/p5.png">

> Source: [P5, Geng et al.](https://arxiv.org/pdf/2203.13366.pdf)
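To make the idea of "recommendation tasks as natural language" concrete, the sketch below turns a few different tasks into text prompts through one shared interface. The templates, task names, and IDs are illustrative assumptions written for this notebook; they are not the prompt collection used in the P5 paper.

```python
# Illustrative templates only (not taken from the P5 paper)
TASK_TEMPLATES = {
    "rating": "What star rating will user_{user_id} give item_{item_id}?",
    "sequential": "user_{user_id} has purchased items {history}. What will the user buy next?",
    "explanation": "Explain why item_{item_id} is recommended to user_{user_id}.",
}

def build_prompt(task, **fields):
    """Fill the template for the given task with the supplied fields."""
    return TASK_TEMPLATES[task].format(**fields)

rating_prompt = build_prompt("rating", user_id=23, item_id=7391)
sequential_prompt = build_prompt("sequential", user_id=23, history="7391, 512, 88")
print(rating_prompt)
print(sequential_prompt)
```

Because every task becomes a text-in, text-out problem, a single model can be trained and queried on all of them in the same way.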

## Challenges

LLMs behave like eager execution systems without much capacity for deliberate thinking (or do they have it?). This poses a number of challenges, some of which we highlight below:
- Hallucination
- High Training and Inference Cost
- Large Training Datasets
- Adversarial Prompt Attacks