# Use OpenAI CLIP, LangGraph, & RAG to Generate Competitive Restaurant Insights


## Summary
In this article, we provide a set of tools that let you analyze the restaurants in your neighbourhood and use that information to your advantage. Although this may seem useful only to individuals looking for popular dishes, it also gives businesses valuable information about their competition, as well as insights into their own dining services.

To achieve this, we use [CLIP](https://www.activeloop.ai/resources/glossary/open-ai-cli-p/), a very capable model that projects different modalities (here, text and images) into a common vector space. If you are interested in an introduction to this transformer-based model and the other technologies used in this guide, you can also follow our official documentation on [image similarity search](https://docs.activeloop.ai/example-code/tutorials/vector-store/image-similarity-search) and the [vector store in LangChain](https://docs.activeloop.ai/example-code/tutorials/vector-store/deep-lake-vector-store-in-langchain). Since our data consists of reviews from Google Maps, we focus only on the text and image modalities. This is especially powerful in combination with [DeepLake](https://www.activeloop.ai/), a multimodal vector database that can efficiently store both. In particular, we use it for vector search, extracting the image and text reviews that are most relevant to a particular task.

Additionally, to further enhance our capabilities, we employ LangGraph, a library for building stateful, multi-actor applications with LLMs, built on top of LangChain.

Overall, this will allow you to extract information from publicly available reviews and use it for further decision-making, whether that means finding an unexplored place or understanding the tastes that people around you share.

## Steps
1. Selecting Location
2. Scraping the Restaurant Reviews
3. Ingesting the data into DeepLake Vector Store
4. Finding the Best Reviewed Restaurant with Your Favourite Food
5. Question Answering Based on Reviews
6. Categorizing Images to Restaurant Tags
7. Clustering All Images to Find the Most Popular Dishes
8. Summarizing the Findings

In [None]:
!pip install openai langchain deeplake apify-client torch open_clip_torch langchain_openai langgraph scikit-learn

In [None]:
# Import libraries
from apify_client import ApifyClient
import urllib.request
import urllib.parse
from langchain.vectorstores import DeepLake
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.embeddings.base import Embeddings

import torch
import os
import re
from tqdm import tqdm
from collections import defaultdict
from PIL import Image
import pandas as pd
import numpy as np
import base64
from io import BytesIO
from IPython.display import HTML
from sklearn.cluster import KMeans

pd.set_option('display.max_colwidth', None)

activeloop_token = '<YOUR_ACTIVELOOP_TOKEN>'
os.environ['ACTIVELOOP_TOKEN'] = activeloop_token

os.environ['OPENAI_API_KEY'] = '<YOUR_OPENAI_TOKEN>'
os.environ['APIFY_API_TOKEN'] = '<YOUR_APIFY_API_TOKEN>'

os.environ['TAVILY_API_KEY'] = "<YOUR_TAVILY_API_TOKEN>"

### Step 1: Selecting Location
First, you need to find the latitude and longitude of the location around which the data will be scraped. This can be done in many ways, but the most straightforward is to open Google Maps, search for the place, right-click it and copy the coordinates. In our example, we use the following location: `Crepevine, 300 Castro Street, Mountain View, CA 94041, United States of America`, which gives us `Latitude=37.3926` and `Longitude=-122.0800`. If you need to automate this, you can use geocoding via the [Google Maps API](https://github.com/googlemaps/google-maps-services-python).
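If you prefer to keep this step in code, the sketch below shows two small helpers (the names are our own). The first reads the coordinates from one result of the `googlemaps` client's `geocode` call, whose results store them under `geometry` → `location` → `lat`/`lng`. The second builds the `customGeolocation` point passed to the Apify actor in Step 2. GeoJSON lists coordinates as `[longitude, latitude]`, which is why the longitude comes first in the `run_input` below; the string values simply mirror that `run_input`.

```python
from typing import Any, Dict, Tuple


def latlng_from_geocode_result(result: Dict[str, Any]) -> Tuple[float, float]:
    """Extract (latitude, longitude) from one Geocoding API result.

    With the googlemaps client the result could come from, for example:
        gmaps = googlemaps.Client(key="<YOUR_GOOGLE_MAPS_KEY>")
        result = gmaps.geocode("300 Castro Street, Mountain View, CA 94041")[0]
    """
    location = result["geometry"]["location"]
    return float(location["lat"]), float(location["lng"])


def make_point_geolocation(latitude: float, longitude: float, radius_km: float) -> Dict[str, Any]:
    """Build a GeoJSON-style point for the actor's `customGeolocation` input (hypothetical helper)."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return {
        "type": "Point",
        # GeoJSON order is [longitude, latitude]; strings mirror the original run_input
        "coordinates": [str(longitude), str(latitude)],
        "radiusKm": radius_km,
    }


# Usage with the coordinates of Crepevine copied from Google Maps
geolocation = make_point_geolocation(37.3926, -122.0800, radius_km=2)
```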


### Step 2: Scraping the Restaurant Reviews
The [Google Maps API](https://github.com/googlemaps/google-maps-services-python) offers many capabilities, including information about places around a given location. It also has a generous free budget of `$300` every month, but, to our disappointment, it returns only 5 reviews per restaurant, which is far from enough for our task. Therefore, we recommend using an [Apify](https://apify.com/) actor instead. We experimented with other scrapers too, but unless you are willing to pay extra for faster scraping, Apify should be sufficient. See `run_input` below for the full configuration; in summary, our setup was:
- all restaurants within a 2 km radius
- only reviews posted on or after January 1, 2022
- up to 500 places per search, and up to 200 reviews and 200 images per restaurant

Apify provides a free budget of `$5` per month. To give you an idea, here are the results of our run:
- total restaurants scraped: 130
- total scraping time: 75 minutes
- total costs: $2.30

This should be fast enough to scrape the restaurants in your city; however, expanding to more locations might be problematic.


In [None]:
# Initialize the ApifyClient with your API token
client = ApifyClient(os.environ["APIFY_API_TOKEN"])

# Prepare the Actor input
run_input = {
  "customGeolocation": {
    "type": "Point",
    "coordinates": [
      "-122.0800081",
      "37.39252210000001"
    ],
    "radiusKm": 2
  },
  "deeperCityScrape": False,
  "includeWebResults": False,
  "language": "en",
  "maxCrawledPlacesPerSearch": 500,
  "maxImages": 200,
  "maxReviews": 200,
  "oneReviewPerRow": False,
  "onlyDataFromSearchPage": False,
  "reviewsSort": "newest",
  "reviewsStartDate": "2022-01-01",
  "scrapeResponseFromOwnerText": False,
  "scrapeReviewId": False,
  "scrapeReviewUrl": False,
  "scrapeReviewerId": False,
  "scrapeReviewerName": False,
  "scrapeReviewerUrl": False,
  "searchStringsArray": [
    "restaurant"
  ]
}

# Run the Actor and wait for it to finish
run = client.actor("compass/crawler-google-places").call(run_input=run_input)

You can now proceed with the returned results, or download them from your Apify account in CSV format.


In [None]:
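# Set choose = 'csv' to load a dataset exported from your Apify account instead
choose = ''
if choose == 'csv':
  scraped_data = pd.read_csv("items.csv")
  # Convert to a list with one dict per restaurant (to_dict() alone returns a dict of columns).
  # Note: the CSV export may flatten nested fields such as `reviews`,
  # in which case review_mapping_function needs adapting.
  scraped_data = scraped_data.to_dict(orient='records')
else:
  scraped_data = client.dataset(run['defaultDatasetId']).list_items().items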

After scraping the data, we need a function that extracts the reviews and their other parameters. Since the scraper returns only the image URLs, it is useful to download the images during the first run and store them locally. During our experiments, some of the images were no longer available, so we had to check each URL separately before ingesting the data into DeepLake. Their format was also different from that of the other images, which caused issues during ingestion. There is likely a more efficient way to handle this.


Requesting the images is also slow: in our experiments, it took around 70 minutes to process all 7,813 images.
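One way to catch unavailable or oddly formatted images before ingestion is to validate the downloaded bytes with Pillow and re-encode everything to a single format. The sketch below is our own suggestion, not the exact check used in this guide, and the function name is hypothetical: `Image.verify()` rejects truncated or non-image payloads (for example, an HTML error page served instead of a photo), and converting to RGB normalises grayscale, palette or CMYK images before saving them as PNG, the compression used by the image dataset in Step 3.

```python
from io import BytesIO
from typing import Optional

from PIL import Image, UnidentifiedImageError


def normalize_image_bytes(data: bytes) -> Optional[bytes]:
    """Return `data` re-encoded as an RGB PNG, or None if it is not a valid image."""
    try:
        # verify() checks integrity but leaves the image unusable, so reopen it afterwards
        Image.open(BytesIO(data)).verify()
        image = Image.open(BytesIO(data)).convert("RGB")
    except (UnidentifiedImageError, OSError, SyntaxError):
        return None
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


# Usage: an in-memory CMYK JPEG is accepted and converted, arbitrary bytes are rejected
jpeg_buffer = BytesIO()
Image.new("CMYK", (8, 8)).save(jpeg_buffer, format="JPEG")
png_bytes = normalize_image_bytes(jpeg_buffer.getvalue())
broken = normalize_image_bytes(b"<html>404 Not Found</html>")
```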

In [None]:
def review_mapping_function(item, save_images=False):
    title = item["title"]
    text_dict = defaultdict(list)
    image_dict = defaultdict(list)
    tag_dict = defaultdict(list)
    # strip '|' from the folder name
    image_folder = "images/" + title.replace('|', '')
    if save_images and not os.path.exists(image_folder):
        os.makedirs(image_folder)

    # restaurant tags from Google Maps, plus generic tags for frequent image types
    for tag in item.get('reviewsTags') or []:
        tag_dict['metadata'].append({'title': title})
        tag_dict['tags'].append(tag['title'])
    for tag in ['interior', 'menu', 'drink']:
        tag_dict['metadata'].append({'title': title})
        tag_dict['tags'].append(tag)

    for idx, r in enumerate(item.get("reviews") or []):
        # if the text was originally in English, textTranslated is None
        text = r.get("textTranslated") or r.get("text")
        images = r.get("reviewImageUrls") or []
        metadata = {
            "title": title,
            "review_id": idx,
            "likes": r['likesCount'],
            "stars": r['stars'],
        }

        if text:
            text_dict["text"].append(text)
            text_dict["metadata"].append(metadata)

        for img_idx, image_url in enumerate(images):
            # one file per review and image, so images of different reviews don't overwrite each other
            image_path = f"{image_folder}/{idx}_{img_idx}.jpg"
            # request a 512x512 version of the image via the URL size parameters
            image_url = re.sub('=w[0-9]+-h[0-9]+-', '=w512-h512-', image_url)
            if save_images and not os.path.isfile(image_path):
                try:
                    urllib.request.urlretrieve(image_url, image_path)
                except Exception:
                    pass  # image no longer available; the URL is kept below

            image_dict["metadata"].append(dict(metadata))
            if os.path.isfile(image_path):
                # image was saved, we can just load it with its path
                image_dict["image"].append(image_path)
            else:
                # image not saved, keep the URL and request it later
                image_dict["image"].append(image_url)

    return {'text_dict': text_dict, 'image_dict': image_dict, 'tag_dict': tag_dict}

In [None]:
reviews = [review_mapping_function(item) for item in scraped_data]

In [None]:
# `reviews` comes from the previous cell (works for both the API results and the CSV export)

# aggregate them into a single dictionary
text_dict_concat = defaultdict(list)
image_dict_concat = defaultdict(list)
tag_dict_concat = defaultdict(list)

for r in reviews:
    for key in r['text_dict'].keys():
        text_dict_concat[key] += r['text_dict'][key]
    for key in r['image_dict'].keys():
        image_dict_concat[key] += r['image_dict'][key]
    for key in r['tag_dict'].keys():
        tag_dict_concat[key] += r['tag_dict'][key]

### Step 3: Ingesting the data into DeepLake Vector Store
Initially, we experimented with various setups of the vector database. Even though DeepLake can store text and image tensors in the same dataset, this gets quite complex, because a review has at most one text message but anywhere from zero to many images. Putting all of them in the same dataset might look convenient, but we would end up duplicating the text messages. As we did not find a use case that would benefit from this, and it only made the similarity search more complicated, we created two separate datasets: one storing the review images and the other the text reviews, each embedded with a different call to our custom embedding function. Additionally, we introduced a third Deep Lake Vector Store with the tags of each restaurant, which will be particularly useful for the categorization in Step 6.

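Also note that it is common practice to split texts into smaller documents before ingesting them. However, since Google Maps reviews are limited to 4,096 characters (around 700 words), we do not do this here. Keep in mind, though, that the OpenCLIP tokenizer truncates every input to the model's context length of 77 tokens, so only the beginning of a long review influences its embedding.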
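If you want the rest of long reviews to count despite CLIP's 77-token text context, one option is to split each review into overlapping word windows and copy its metadata to every chunk. The sketch below is a hypothetical helper, not part of the pipeline in this guide. `max_words=50` is a rough assumption meant to stay under the 77-token limit for typical English text; the exact token count depends on the tokenizer.

```python
from typing import Any, Dict, List, Tuple


def chunk_reviews(
    texts: List[str],
    metadatas: List[Dict[str, Any]],
    max_words: int = 50,
    overlap: int = 10,
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Split each review into overlapping word windows, copying its metadata to every chunk."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    step = max_words - overlap
    chunk_texts: List[str] = []
    chunk_metadatas: List[Dict[str, Any]] = []
    for text, metadata in zip(texts, metadatas):
        words = text.split()
        # Short reviews yield a single window; long ones yield several overlapping windows
        starts = range(0, max(len(words) - overlap, 1), step)
        for chunk_idx, start in enumerate(starts):
            chunk_texts.append(" ".join(words[start:start + max_words]))
            chunk_metadatas.append({**metadata, "chunk": chunk_idx})
    return chunk_texts, chunk_metadatas


# Usage: a 120-word review becomes 3 chunks, a short one stays whole
texts, metas = chunk_reviews(["word " * 120, "Great burger"], [{"title": "A"}, {"title": "B"}])
```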

In [None]:
activeloop_ord_id = 'YOUR_ACTIVELOOP_ORG'

In [None]:
from deeplake import VectorStore

overwrite = False

# Create empty database for texts
reviews_path_texts = f'hub://{activeloop_ord_id}/reviews-texts'
reviews_texts = VectorStore(
    path = reviews_path_texts,
    tensor_params = [
        {'name': 'text', 'htype': 'text'},
        {'name': 'embedding', 'htype': 'embedding'},
        {'name': 'metadata', 'htype': 'json'}
    ],
    overwrite = overwrite
)
# Create empty database for images
reviews_path_images = f'hub://{activeloop_ord_id}/reviews-images'
reviews_images = VectorStore(
    path = reviews_path_images,
    tensor_params = [
        {'name': 'image', 'htype': 'image', 'sample_compression': 'png'},
        {'name': 'embedding', 'htype': 'embedding'},
        {'name': 'metadata', 'htype': 'json'}
    ],
    overwrite = overwrite
)
# Create empty database for tags
reviews_path_tags = f'hub://{activeloop_ord_id}/restaurants-tags'
restaurants_tags = VectorStore(
    path = reviews_path_tags,
    tensor_params = [
        {'name': 'tag', 'htype': 'text'},
        {'name': 'embedding', 'htype': 'embedding'},
        {'name': 'metadata', 'htype': 'json'}
    ],
    overwrite = overwrite
)


Now, let's define a custom embedding class that wraps [OpenCLIP](https://github.com/mlfoundations/open_clip), an open-source implementation of [CLIP](https://github.com/openai/CLIP). Since the input modality must be set explicitly, it offers two kinds of calls: 1) for text embeddings and 2) for image embeddings. It is important to set up CUDA and run the predictions on a GPU, as inference on the CPU is very slow.

In [None]:
from typing import Any, Dict, List

from langchain.pydantic_v1 import BaseModel, root_validator
from langchain.schema.embeddings import Embeddings
from PIL import Image
import requests
from io import BytesIO


class OpenCLIPEmbeddings(BaseModel, Embeddings):
    model: Any
    preprocess: Any
    tokenizer: Any
    # Select model: https://github.com/mlfoundations/open_clip
    model_name: str = "ViT-H-14"
    checkpoint: str = "laion2b_s32b_b79k"

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that open_clip and torch libraries are installed."""
        try:
            import open_clip

            # Fall back to class defaults if not provided
            model_name = values.get("model_name", cls.__fields__["model_name"].default)
            checkpoint = values.get("checkpoint", cls.__fields__["checkpoint"].default)

            # Load model
            model, _, preprocess = open_clip.create_model_and_transforms(
                model_name=model_name, pretrained=checkpoint
            )
            tokenizer = open_clip.get_tokenizer(model_name)
            values["model"] = model
            values["preprocess"] = preprocess
            values["tokenizer"] = tokenizer

        except ImportError:
            raise ImportError(
                "Please ensure both open_clip and torch libraries are installed. "
                "pip install open_clip_torch torch"
            )
        return values

    def _device(self) -> str:
        # Prefer the GPU; inference on the CPU works but is much slower
        return "cuda" if torch.cuda.is_available() else "cpu"

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        device = self._device()
        self.model.to(device)

        # Tokenize all texts in one batch (inputs longer than the
        # model's context length of 77 tokens are truncated)
        tokenized_text = self.tokenizer(texts).to(device)

        with torch.no_grad(), torch.autocast(device_type=device, enabled=device == "cuda"):
            # Encode the text to get the embeddings
            embeddings_tensor = self.model.encode_text(tokenized_text)

        # L2-normalize so that the dot product equals cosine similarity
        embeddings_tensor = embeddings_tensor / embeddings_tensor.norm(p=2, dim=1, keepdim=True)

        # One list of floats per input text (no squeeze, so any batch size keeps its shape)
        return embeddings_tensor.float().cpu().tolist()

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]

    def embed_image(self, uri: str) -> List[float]:
        return self.embed_images([uri])[0]

    def embed_images(self, uris: List[str]) -> List[List[float]]:
        device = self._device()
        self.model.to(device)

        # Open the images as RGB (review photos can be grayscale, RGBA or CMYK)
        pil_images = [Image.open(uri).convert("RGB") for uri in uris]

        # Preprocess each image and stack them into a single batch
        batch = torch.stack([self.preprocess(pil_image) for pil_image in pil_images]).to(device)

        with torch.no_grad(), torch.autocast(device_type=device, enabled=device == "cuda"):
            # Encode the images to get the embeddings
            embeddings_tensor = self.model.encode_image(batch)

        # L2-normalize so that the dot product equals cosine similarity
        embeddings_tensor = embeddings_tensor / embeddings_tensor.norm(p=2, dim=1, keepdim=True)

        # One list of floats per input image
        return embeddings_tensor.float().cpu().tolist()

Next, we ingest the scraped data into DeepLake. Note that it is important to set `ingestion_batch_size` appropriately for your GPU memory, to avoid running out of memory during embedding prediction.
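Another way to keep GPU memory under control is to make the embedding function itself process its inputs in fixed-size mini-batches. The sketch below is a hypothetical wrapper (the name `batched` and the default batch size are our own). It keeps the `List[...] -> List[List[float]]` contract of `clip.embed_documents` and `clip.embed_images`, so the wrapped function could be passed as `embedding_function` in the `add` calls.

```python
from typing import Any, Callable, List, Sequence

Embedder = Callable[[List[Any]], List[List[float]]]


def batched(embed_fn: Embedder, batch_size: int = 32) -> Embedder:
    """Wrap `embed_fn` so that it never receives more than `batch_size` inputs at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")

    def wrapper(inputs: Sequence[Any]) -> List[List[float]]:
        embeddings: List[List[float]] = []
        for start in range(0, len(inputs), batch_size):
            # Embed one mini-batch; results are concatenated in input order
            embeddings.extend(embed_fn(list(inputs[start:start + batch_size])))
        return embeddings

    return wrapper


# Usage, assuming the `clip` model loaded in the next cell:
# embed_images_in_batches = batched(clip.embed_images, batch_size=16)
```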

In [None]:
# loading the model
clip = OpenCLIPEmbeddings(model_name="ViT-g-14", checkpoint="laion2b_s34b_b88k")

In [None]:
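import requests
from PIL import Image
from io import BytesIO

def save_image_from_url(url, image_idx):
    # Save the image as images/<image_idx>.png and return its path
    os.makedirs("images", exist_ok=True)

    if os.path.isfile(url):
        # the image was already downloaded by review_mapping_function(save_images=True)
        image = Image.open(url)
    else:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content))
    path_image = f"images/{image_idx}.png"
    # convert to RGB so grayscale, palette or CMYK images are stored consistently
    image.convert('RGB').save(path_image, 'PNG')
    return path_image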

In [None]:
from tqdm import tqdm

all_image_urls = image_dict_concat['image']
all_image_metadata = image_dict_concat['metadata']
path_images = []
path_metadata = []  # metadata of the successfully saved images, aligned with path_images
for image_idx, (image_url, md) in tqdm(enumerate(zip(all_image_urls, all_image_metadata)), total=len(all_image_urls)):
    try:
        path_images.append(save_image_from_url(image_url, image_idx))
    except Exception:
        # skip images that are no longer available or cannot be decoded
        continue
    path_metadata.append(md)

In [None]:
# texts
reviews_texts.add(
    text = text_dict_concat['text'],
    metadata = text_dict_concat['metadata'],
    embedding_function = clip.embed_documents,
    embedding_data = text_dict_concat['text'],
    embedding_tensor="embedding",
)
# images (metadata filtered to the images that were actually saved)
reviews_images.add(
    image = path_images,
    metadata = path_metadata,
    embedding_function = clip.embed_images,
    embedding_data = path_images,
    embedding_tensor="embedding",
)
# tags
restaurants_tags.add(
    tag = tag_dict_concat['tags'],
    metadata = tag_dict_concat['metadata'],
    embedding_function = clip.embed_documents,
    embedding_data = tag_dict_concat['tags'],
    embedding_tensor="embedding",
)

Ingesting the 9,607 text reviews took around 30 minutes, the 7,813 images around 1.5 hours, and the tags under 5 minutes. This was mostly due to the long inference time of the OpenCLIP model.

### Step 4: Finding the Best Reviewed Restaurant with Your Favourite Food
Now it's time to get some useful insights from our embedded dataset! We start by finding the 200 most relevant texts and images for the search input `burger`.

In [None]:
search = 'burger'

text_search_results = reviews_texts.search(
    embedding_data = [search],
    embedding_function = clip.embed_documents,
    k=200,
)
image_search_results = reviews_images.search(
    embedding_data = [search],
    embedding_function = clip.embed_documents,
    k=200,
)


Another way to get results from Deep Lake is via TQL (Tensor Query Language), Deep Lake's SQL-like query language. It lets us combine vector similarity with filtering and sorting on metadata in a single query.

In [None]:
query_emb = clip.embed_documents([search])

In [None]:
query_emb_str = "ARRAY["+",".join([f"{emb}" for emb in query_emb[0]])+"]"

In [None]:
info_to_scrape = 10  # number of results to return
# keep rows with cosine similarity > 0.2 to the query, then sort them by the review's stars
tql_images = f"select image, metadata, id, score from (select *, cosine_similarity(embedding, {query_emb_str}) as score where cosine_similarity(embedding, {query_emb_str}) > 0.2 order by metadata['stars'] desc limit {info_to_scrape})"
tql_reviews = f"select metadata, id, score from (select *, cosine_similarity(embedding, {query_emb_str}) as score where cosine_similarity(embedding, {query_emb_str}) > 0.2 order by metadata['stars'] desc limit {info_to_scrape})"

In [None]:
image_search_results_tql = reviews_images.search(query=tql_images)
text_search_results_tql = reviews_texts.search(query=tql_reviews)
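To make the semantics of the TQL queries above concrete, the following NumPy sketch performs the same filtering and ranking locally on in-memory arrays: keep the rows whose cosine similarity to the query exceeds 0.2, order them by `metadata['stars']` (the field written by `review_mapping_function`), and keep the first `limit`. It is only an illustration with a function name of our own; on the real dataset, TQL does this work inside Deep Lake.

```python
from typing import Any, Dict, List, Sequence, Tuple

import numpy as np


def filter_and_rank(
    query_emb: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    metadata: List[Dict[str, Any]],
    threshold: float = 0.2,
    limit: int = 10,
) -> List[Tuple[int, float]]:
    """Return (row index, cosine similarity) pairs, mimicking
    `where cosine_similarity(...) > threshold order by metadata['stars'] desc limit N`."""
    q = np.asarray(query_emb, dtype=float)
    E = np.asarray(embeddings, dtype=float)
    # Cosine similarity of every row with the query
    sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q))
    keep = np.flatnonzero(sims > threshold)
    # Highest-rated first; Python's sort is stable, so ties keep their row order
    ranked = sorted(keep.tolist(), key=lambda i: metadata[i]["stars"], reverse=True)
    return [(i, float(sims[i])) for i in ranked[:limit]]


# Usage on toy 2-D embeddings: row 1 is dissimilar and gets filtered out
rows = filter_and_rank(
    query_emb=[1.0, 0.0],
    embeddings=[[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]],
    metadata=[{"stars": 3}, {"stars": 5}, {"stars": 4}],
)
```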

Now we can aggregate the results by restaurant:

In [None]:
# aggregating the results
results = defaultdict(lambda: defaultdict(list))

for md, img, id in zip(image_search_results['metadata'], image_search_results['image'],image_search_results['id']):
    results[md['title']]['images'].append(img)
    results[md['title']]['image_likes'].append(md['likes'])
    results[md['title']]['image_stars'].append(md['stars'])
    results[md['title']]['image_review_ids'].append(md['review_id'])
    results[md['title']]['image_ids'].append(id)


for md, txt in zip(text_search_results['metadata'], text_search_results['text']):
    results[md['title']]['texts'].append(txt)
    results[md['title']]['text_likes'].append(md['likes'])
    results[md['title']]['text_stars'].append(md['stars'])
    results[md['title']]['text_review_ids'].append(md['review_id'])

Now let's summarize the text reviews. For this, we use a simple prompt template that extracts a keyword summary from a list of reviews and includes an example. Since reviews are typically short messages, we can simply concatenate each set and do not need to chain calls with the tools offered by [LangChain](https://python.langchain.com/docs/get_started/introduction.html). If you're interested in delving deeper into LangChain and its capabilities, we invite you to explore our comprehensive guide available at this [link](https://www.activeloop.ai/resources/langchain/).

In [None]:
llm = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.5)

In [None]:
prompt_template = """You are provided with a list of {search} reviews. Summarize what customers write about it:

Example:
List of {search} reviews:
Great spicy Burger !\nThe burger is solid and delicious. Just be aware that it's high in calories (1100 calories!).\nVery good food, I would recommend to the burger lovers out there.\nThe burgers here are pretty solid\nthey also have a rotating beer top which has some good variety\nFantastic food\nBest Burgers In Town!!!\nGreat food\nDELICIOUS! BISON BURGER IS THE BEST
delicious, cheap, good atmosphere, quick service, many options in menu.

Keyword summary of the {search} reviews:
Spicy burger, Solid and delicious, Recommended for burger lovers, Good variety of beers, Fantastic food, Best burgers in town, Bison burger is delicious


List of {search} reviews:
{reviews}

Keyword summary of the {search} reviews:
"""

To put it all together, we loop through the 200 texts and 200 images whose embeddings are most similar to `burger`, group them by restaurant `title`, and compute the following:
- `avg_txt_stars` - average stars of the text reviews for a given restaurant, weighted by the number of likes plus one
- `n_texts` - number of text reviews for a given restaurant
- `text_summary` - keyword summary of all the text reviews for a given restaurant
- `avg_img_stars` - average stars of the reviews behind the images for a given restaurant, weighted by the number of likes plus one
- `n_images` - number of images for a given restaurant
- `img_in_text_perc` - percentage of the selected images whose review text was also selected (matched by `review_id`)
- `image_{i}` - the i-th of the top `top_n` images for a given restaurant

To make the table clearer, we keep only the top 5 images per restaurant (sorted by the number of likes on their review) and the top 5 restaurants (sorted by the number of matching images).
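The average star ratings computed in the loop below are weighted: each review's stars count `likes + const` times, with `const = 1` so that reviews without likes still count once. A minimal worked example of that weighting, using made-up ratings:

```python
import numpy as np

stars = [5, 3, 4]  # made-up star ratings of three reviews
likes = [3, 0, 1]  # made-up like counts of the same reviews
const = 1          # same smoothing constant as in the loop below

# weights = likes + const = [4, 1, 2]
weights = np.add(likes, const)
avg_stars = round(np.average(stars, weights=weights), 2)
# (5*4 + 3*1 + 4*2) / (4 + 1 + 2) = 31 / 7, i.e. about 4.43 (the unweighted mean is 4.0)
```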

Since the next steps use LangChain, we also load the same datasets through LangChain's `DeepLake` vector store:

In [None]:
from langchain.vectorstores import DeepLake

reviews_images_lc = DeepLake(
    dataset_path = reviews_path_images,
)

reviews_texts_lc = DeepLake(
    dataset_path = reviews_path_texts,
)

restaurants_tags_lc = DeepLake(
    dataset_path = reviews_path_tags,
)


In [None]:
df_1 = pd.DataFrame(columns=['title','info', 'text_summary'])
top_n = 5 # maximum number of images for each restaurant
n_restaurants = 5
const = 1

i = 0
visualizer_images = []
for title, values in results.items():
    df_1.loc[i, 'title'] = title

    info = {}

    if len(values['texts']) > 0:
        weights = np.add(values['text_likes'], const)
        avg_txt_stars = round(np.average(values['text_stars'], weights=weights), 2)
        info['avg_txt_stars'] = avg_txt_stars
        n_texts = len(values['text_stars'])
        info['n_texts'] = n_texts

        # set the prompt template (`search` is filled in as a partial variable)
        PROMPT = PromptTemplate(
            template=prompt_template,
            input_variables=["reviews"],
            partial_variables={"search": search},
        )
        reviews_text = "\n".join(values['texts'])
        review_summary = llm.invoke(PROMPT.format(reviews=reviews_text))
        df_1.loc[i, 'text_summary'] = review_summary

    if len(values['images']) > 0:
        weights = np.add(values['image_likes'], const)
        avg_img_stars = round(np.average(values['image_stars'], weights=weights), 2)
        info['avg_img_stars'] = avg_img_stars
        n_images = len(values['image_stars'])
        info['n_images'] = n_images
        df_1.loc[i, 'n_images'] = n_images
        visualizer_images += values['image_ids']

        images_in_text = sum([rid in values['text_review_ids'] for rid in values['image_review_ids']])
        img_in_text_perc = round(images_in_text / len(values['image_review_ids']) * 100, 2)
        info['img_in_text_perc'] = img_in_text_perc
        sorted_images = [x for _, x in sorted(zip(values['image_likes'], values['images']), reverse=True, key=lambda x: x[0])]
        for j, img in enumerate(sorted_images):
            if j < top_n:
                df_1.loc[i, f'image_{j+1}'] = Image.fromarray(img).convert('RGB')

    df_1.loc[i, 'info'] = str(info)

    i+=1


For better visualisation, we also define HTML formatters, inspired by [this notebook](https://www.kaggle.com/code/stassl/displaying-inline-images-in-pandas-dataframe), and render the HTML generated by pandas.

In [None]:
def get_thumbnail(path):
    i = Image.open(path)
    i.thumbnail((150, 150), Image.LANCZOS)
    return i

def image_base64(im):
    if isinstance(im, str):
        im = get_thumbnail(im)
    with BytesIO() as buffer:
        im.save(buffer, 'jpeg')
        return base64.b64encode(buffer.getvalue()).decode()

def image_formatter(im):
    return f'<img src="data:image/jpeg;base64,{image_base64(im)}">'

def bullet_formatter(text):
    text = eval(text)
    l = '<div> <ul style="text-align: left;">'
    for key, value in text.items():
        l += f"\n <li>{key}: {value}</li>"
    l += "\n</ul></div>"
    return l

In [None]:
# sort by n_images
df_1 = df_1.sort_values(by=['n_images'], ascending=False).drop(['n_images'],axis=1)
# render HTML
formatters = [None, bullet_formatter, None] + [image_formatter] * top_n
HTML(df_1[:n_restaurants].to_html(formatters=formatters, escape=False, col_space=[150]*df_1.shape[1]))

As you can see, the image results are very accurate, especially when sorted by vector similarity score. The text summaries may seem sufficient too, but there is plenty of room for prompt engineering. Also note that `img_in_text_perc` is very low; it was more of an experiment than proof that the results make sense. In other words, it is difficult to retrieve both the images and the text of the same review from a single query embedding (in our example, `burger`).

## Activeloop Visualizer
The Activeloop Visualizer is a tool provided by [Activeloop](https://www.activeloop.ai/), a company specializing in databases for artificial intelligence applications. The Visualizer lets users interact with and visualize data stored in their datasets. Its user-friendly interface makes it easier for researchers and developers to explore, understand and work with large datasets.

For a detailed understanding of visualizer integration and how it enhances your ability to comprehend the relationships between tensors within a dataset, we encourage you to visit our technical documentation at this [link](https://docs.activeloop.ai/technical-details/visualizer-integration).

If your dataset is not public, you must also pass your private token in the request, as follows:

`iframe_url = f"https://app.activeloop.ai/visualizer/iframe?url={reviews_path_images}&token=<YOUR_ACTIVELOOP_TOKEN>&query=`


In [None]:
def activeloop_visualizer(result:dict=None, list_images_id:list[str] = None):
  iframe_html = '<iframe src="{url}" width="570px" height="400px"></iframe>'

  if result is not None:
    images_id = [el for el in result['id']]
  elif list_images_id is not None:
    images_id = list_images_id
  else:
    raise Exception("specify the images ids")

  images_id = str(images_id).strip("[]")
  query = f"select image where id in ({images_id})"
  if activeloop_token is not None:
    iframe_url = f"https://app.activeloop.ai/visualizer/iframe?url={reviews_path_images}&token={activeloop_token}&query="
  else:
    iframe_url = f"https://app.activeloop.ai/visualizer/iframe?url={reviews_path_images}&query="

  urls = [iframe_url + urllib.parse.quote(query)]
  html = iframe_html.format(url=urls[0])
  return HTML(html)

In [None]:
activeloop_visualizer(list_images_id=visualizer_images)

### Step 5: Question Answering Based on Reviews
Of course, there are many other use cases for [LangChain](https://python.langchain.com/docs/get_started/introduction.html). In particular, you can use the text reviews as custom documents to answer any question. Since we embedded the texts with OpenCLIP, we also need to pass the same embedding function to the `retriever`. And because answering questions over the whole dataset does not make much sense here, we select a single restaurant via the `filter` option in `search_kwargs`.

In [None]:
retriever = reviews_texts_lc.as_retriever(
    search_type = "similarity",
    search_kwargs = {
        "k": 20,
        "embedding_function": clip.embed_documents,
        "filter": {'metadata': {'title': 'Taqueria La Espuela'}}
    }
)

To see which reviews the similarity search passes to the LLM to answer your question, we can inspect `relevant_docs`; the first 5 are shown below.

In [None]:
query = 'What customers like about the restaurant?'
relevant_docs = retriever.get_relevant_documents(query)
relevant_docs[0:5]

Now, let's define the retrieval QA chain and ask our questions. Again, we stress that the prompt templates can be improved further, as they have a significant effect on the results.

In [None]:
qa = RetrievalQA.from_llm(llm, retriever=retriever)
qa.run(query)

In [None]:
query = 'What would customers improve about this restaurant?'
qa.run(query)

### Step 6: Categorizing Images to Restaurant Tags
Typically, if your task is to categorize images, you need to train a model on a labelled set, which limits it to predicting only the classes included in the training data. Here, however, we try to achieve similar results without training or fine-tuning any categorization model at all. The model isn't perfect, but the results are pretty good, considering that this is just the original OpenCLIP without any fine-tuning on restaurant data.

Again, we select only a single restaurant to make the predictions clearer.

In [None]:
tensors = reviews_images_lc.vectorstore.search(
    return_tensors = ['image','embedding','id'],
    filter = {'metadata': {'title':'Taqueria La Espuela'}},
)

Finally, we use the third Deep Lake Vector Store, which stores the restaurant tags along with their embeddings. The categorization is straightforward: for each image from the selected restaurant, we search for the closest `tag` embedding. We then sort the images by similarity score and display up to 10 images per category.

Notice that, for practical reasons, we also added the tags `interior`, `menu` and `drink` to every restaurant, because images of these were frequent but not covered by the restaurants' own tags.
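The loop below issues one vector search per image. Conceptually, this is zero-shot classification: with L2-normalised embeddings, the cosine similarity between every image and every tag is a single matrix product, and each image takes the tag with the highest score. The NumPy sketch below illustrates that idea on toy arrays; the function and tag names are our own, and it is not a replacement for the Deep Lake query.

```python
from typing import List, Sequence, Tuple

import numpy as np


def assign_tags(
    image_embs: Sequence[Sequence[float]],
    tag_embs: Sequence[Sequence[float]],
    tags: List[str],
) -> List[Tuple[str, float]]:
    """Return the best-matching tag and its cosine similarity for each image."""
    images = np.asarray(image_embs, dtype=float)
    tag_matrix = np.asarray(tag_embs, dtype=float)
    # Normalise rows (without modifying the inputs) so the dot product is cosine similarity
    images = images / np.linalg.norm(images, axis=1, keepdims=True)
    tag_matrix = tag_matrix / np.linalg.norm(tag_matrix, axis=1, keepdims=True)
    sims = images @ tag_matrix.T  # shape: (n_images, n_tags)
    best = sims.argmax(axis=1)
    return [(tags[t], float(sims[i, t])) for i, t in enumerate(best)]


# Usage on toy embeddings: two "tags" and three "images"
predictions = assign_tags(
    image_embs=[[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]],
    tag_embs=[[1.0, 0.0], [0.0, 1.0]],
    tags=["tacos", "interior"],
)
# Most confident matches first, as in the sorting step of the loop below
predictions.sort(key=lambda p: p[1], reverse=True)
```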

In [None]:
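from collections import defaultdict

import pandas as pd
from PIL import Image

df_2 = pd.DataFrame()
i_dict = defaultdict(lambda: 1)  # next free column index for each category
n_images = 200  # maximum number of images placed in the table
max_cols = 10   # maximum number of images displayed per category
categories = []
scores = []

# For every image, find the single closest tag embedding of this restaurant
for embedding in tensors['embedding']:
    closest = restaurants_tags_lc.vectorstore.search(
        embedding = embedding,
        k = 1,
        filter = {'metadata': {'title':'Taqueria La Espuela'}},
    )
    categories.append(closest['tag'][0])
    scores.append(closest['score'][0])

# Sort by similarity score (highest first); sorting indices avoids comparing
# the image arrays themselves when two scores are tied
order = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
sorted_images = [tensors['image'][idx] for idx in order]
sorted_categories = [categories[idx] for idx in order]

n = 0
for category, img in zip(sorted_categories, sorted_images):
    if n < n_images:
        i = i_dict[category]
        df_2.loc[category, f'image_{i}'] = Image.fromarray(img).convert('RGB')
        i_dict[category] += 1
        n += 1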

Again, we render the formatted HTML.

In [None]:
formatters = [image_formatter] * min(max_cols, df_2.shape[1])
HTML(df_2.iloc[:,:max_cols].to_html(formatters=formatters, escape=False))

We can also visualize these images directly in our Activeloop dataset:

In [None]:
activeloop_visualizer(result=tensors)

### 7) Clustering All Images to Find the Most Popular Dishes

What if we want to group all of the images by similarity, without any particular label, to find the most popular meals? We can do that too! At the time of writing, DeepLake does not support computing and extracting cluster groups. This is on the roadmap; meanwhile, you can visualise the clusters in the DeepLake UI, which computes them on the fly, or follow this guide, which extracts the embeddings from the Deep Lake Vector Store and computes the clusters locally.

We start by retrieving the 5,000 images whose embeddings are most similar to the text `food`. This step is quite time-consuming, since we also extract the images along with their metadata.

In [None]:
tensors = reviews_images_lc.vectorstore.search(
    return_tensors = ['metadata','image','embedding', 'id'],
    embedding_data = ['food'],
    embedding_function = clip.embed_documents,
    k = 5000,
)

Then we simply run the K-means clustering algorithm from `sklearn`. The number of clusters here is arbitrary: clustering is unsupervised, so feel free to experiment with it and with other parameters.

In [None]:
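from sklearn.cluster import KMeans

# The number of clusters is a free parameter; 5 is an arbitrary choice
n_clusters = 5
kmeans_model = KMeans(n_clusters = n_clusters)
# One cluster label per retrieved image embedding
clusters = kmeans_model.fit_predict(tensors['embedding']).tolist()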
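For intuition, here is a minimal, dependency-free sketch of Lloyd's algorithm, the classic K-means procedure: assign each embedding to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. `sklearn`'s `KMeans` is far more optimized and uses smarter initialization (k-means++ by default), so treat this `kmeans` helper and its toy 2-D points as illustrative assumptions only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialise the centroids with randomly chosen input points
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(n_clusters), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: the assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Two well-separated hypothetical groups of "embeddings"
points = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
print(kmeans(points, n_clusters=2))
```

The label numbers themselves are arbitrary; only the grouping matters, which is also true of the cluster IDs returned by `fit_predict`.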

We then create a simple data frame that aggregates the cluster, stars and likes for each image, and select the top 10 images (sorted by number of likes) from the top 5 clusters (sorted by average number of likes).

In [None]:
agg = pd.DataFrame()
df_3 = pd.DataFrame(columns=['cluster', 'avg_likes', 'n_images'])
max_cols = 10
max_rows = 5
cluster_visualizer_images = []
n = 0
for cluster, img, md, image_ids in zip(clusters, tensors['image'], tensors['metadata'], tensors['id']):
    agg.loc[n, 'cluster'] = cluster
    agg.loc[n, 'stars'] = md['stars']
    agg.loc[n, 'likes'] = md['likes']
    agg.loc[n, 'image'] = Image.fromarray(img).convert('RGB')
    agg.loc[n, 'image_ids'] = image_ids
    n += 1


agg = agg.sort_values(by=['likes'], ascending=False)
agg = agg.groupby('cluster').agg({'likes':['mean','count'], 'image': list,'image_ids': list})
agg = agg.sort_values(by=[('likes', 'mean')], ascending=False)

r = 1
for index, row in agg.iterrows():
    if r <= max_rows:
        df_3.loc[r, 'cluster'] = int(index)
        df_3.loc[r, 'avg_likes'] = round(row['likes']['mean'], 2)
        df_3.loc[r, 'n_images'] = row['likes']['count']
        cluster_visualizer_images.append(row['image_ids'])

        c=1
        for img in row['image']['list']:
            if c <= max_cols:
                df_3.loc[r, f'image_{c}'] = img
                c+=1
        r+=1

And finally, we render it.

In [None]:
formatters = [None, None, None] + [image_formatter] * max_cols
HTML(df_3.to_html(formatters=formatters, escape=False))

As you can see, the food in each cluster is indeed quite similar. However, the average number of likes might not be the right metric for ranking the most popular food, because most images have very few likes. A large cluster therefore averages over many images with zero or one like, which makes it less likely to appear at the top of this list even if the dish is popular.
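A small worked example with hypothetical numbers shows the effect: a large cluster with a few liked photos can collect more likes in total yet have a lower average than a tiny cluster with one liked photo. The sketch assumes plain lists of cluster labels and like counts; `rank_clusters` is a hypothetical helper that reports the total alongside the mean so the two can be compared.

```python
from collections import defaultdict


def rank_clusters(labels, likes):
    """Aggregate likes per cluster and return rows sorted by mean likes."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for label, n_likes in zip(labels, likes):
        totals[label] += n_likes
        counts[label] += 1
    rows = [
        {
            "cluster": c,
            "avg_likes": totals[c] / counts[c],
            "total_likes": totals[c],
            "n_images": counts[c],
        }
        for c in counts
    ]
    return sorted(rows, key=lambda row: row["avg_likes"], reverse=True)


# Hypothetical data: cluster 0 is large with mostly zero-like images,
# cluster 1 is small with a single well-liked photo
labels = [0] * 8 + [1] * 2
likes = [3, 2, 1, 0, 0, 0, 0, 0] + [4, 0]
for row in rank_clusters(labels, likes):
    print(row)
```

Here cluster 1 (average 2.0) outranks cluster 0 (average 0.75), even though cluster 0 has 6 likes in total versus 4. Looking at `n_images` next to the average, as the table above does, helps to spot this.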

You can view the images of each cluster directly on Activeloop by specifying the number of the cluster you are interested in:


In [None]:
cluster_number = 0
activeloop_visualizer(list_images_id=cluster_visualizer_images[cluster_number][0])

In [None]:
cluster_number = 1
activeloop_visualizer(list_images_id=cluster_visualizer_images[cluster_number][0])

<h2>Introduction to LangGraph</h2>
LangGraph extends the LangChain Expression Language with the ability to coordinate multiple chains, or actors, across multiple steps of computation in a cyclic manner, inspired by Pregel and Apache Beam. Its main use is adding cycles to your LLM application, which is crucial for agent-like behaviors where you call an LLM in a loop and ask it which action to take next. Specifically, we use the Agent Supervisor pattern in LangGraph: a supervisor acts as a coordinator that delegates tasks to independent agents and orchestrates their interactions and workflow.

![Agent Supervisor](https://raw.githubusercontent.com/langchain-ai/langgraph/35188d9ed51ebbb0e2527f16068f2df50b62bd34/examples/multi_agent/img/supervisor-diagram.png)

Below, we create a group of agents with an agent supervisor that delegates tasks between them. To simplify the code in each agent node, we use the `AgentExecutor` class from LangChain.

## Create tools

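For this example, you will create an agent that searches the web with a search engine and an agent that retrieves reviews from the Activeloop datasets. First, we define two helpers: one that builds an agent from an LLM, a list of tools and a system prompt, and one that wraps an agent as a graph node: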

In [None]:
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.messages import BaseMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

def create_agent(llm: ChatOpenAI, tools: list, system_prompt: str):
    # Each worker node gets its own system prompt and set of tools
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            # Conversation so far (the shared graph state)
            MessagesPlaceholder(variable_name="messages"),
            # This agent's intermediate tool calls and their results
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ]
    )
    agent = create_openai_tools_agent(llm, tools, prompt)
    executor = AgentExecutor(agent=agent, tools=tools)
    return executor


In [None]:
def agent_node(state, agent, name):
    result = agent.invoke(state)
    return {"messages": [HumanMessage(content=result["output"], name=name)]}

## Create Agent Supervisor
The supervisor uses function calling to choose the next worker node or to finish processing.

In [None]:
from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

#members = ["Researcher", "QAAgent"]
members = ["Researcher", "Reviewer"]
system_prompt = (
    "You are a supervisor tasked with managing a conversation between the"
    " following workers:  {members}. Given the following user request,"
    " respond with the worker to act next. Each worker will perform a"
    " task and respond with their results and status. When finished,"
    " respond with FINISH."
)
# Our team supervisor is an LLM node. It just picks the next agent to process
# and decides when the work is completed
options = ["FINISH"] + members
# Using openai function calling can make output parsing easier for us
function_def = {
    "name": "route",
    "description": "Select the next role.",
    "parameters": {
        "title": "routeSchema",
        "type": "object",
        "properties": {
            "next": {
                "title": "Next",
                "anyOf": [
                    {"enum": options},
                ],
            }
        },
        "required": ["next"],
    },
}
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="messages"),
        (
            "system",
            "Given the conversation above, who should act next?"
            " Or should we FINISH? Select one of: {options}",
        ),
    ]
).partial(options=str(options), members=", ".join(members))

llm = ChatOpenAI(model="gpt-4-1106-preview")

supervisor_chain = (
    prompt
    | llm.bind_functions(functions=[function_def], function_call="route")
    | JsonOutputFunctionsParser()
)
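With OpenAI function calling, the model answers with a `route` call whose arguments arrive as a JSON string, which `JsonOutputFunctionsParser` turns into a dict such as `{"next": "Reviewer"}`. As far as we know, the parser does not check that value against the `enum`, so a defensive check may be worth having. The `parse_route` helper below is a hypothetical, stand-alone sketch of that parsing and validation step, reusing the same `members` and `options` values.

```python
import json

members = ["Researcher", "Reviewer"]
options = ["FINISH"] + members

# Same shape as the route schema above: a single required "next" field
route_schema = {
    "type": "object",
    "properties": {"next": {"enum": options}},
    "required": ["next"],
}


def parse_route(arguments_json: str) -> str:
    """Parse the function-call arguments and validate the chosen route."""
    arguments = json.loads(arguments_json)
    next_step = arguments.get("next")
    if next_step not in route_schema["properties"]["next"]["enum"]:
        raise ValueError(f"Unknown route {next_step!r}; expected one of {options}")
    return next_step


print(parse_route('{"next": "Reviewer"}'))
```

Failing loudly on an unknown route is easier to debug than letting the graph look up a node that does not exist.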

## Define the tool functions

In [None]:
@tool
def FindBestReviewFromQuery(search: str):
    """
    Find the restaurant text reviews most similar to the search query and return them.
    """
    # Embed the query with CLIP and run a similarity search over the text reviews
    text_search_results = reviews_texts_lc.vectorstore.search(
        embedding_data = [search],
        embedding_function = clip.embed_documents,
        k=200,
    )
    return text_search_results

@tool
def QAToolFunction(query: str):
    """
    Answer a question about Taqueria La Espuela using its customer reviews.
    """
    # Retrieve the 20 most similar reviews of a single restaurant
    retriever = reviews_texts_lc.as_retriever(
        search_type = "similarity",
        search_kwargs = {
            "k": 20,
            "embedding_function": clip.embed_documents,
            "filter": {'metadata': {'title': 'Taqueria La Espuela'}}
        }
    )
    qa = RetrievalQA.from_llm(llm, retriever=retriever)

    return qa.run([query])

## Construct Graph
We're ready to start building the graph. Below, we define the state and the worker nodes using the functions we just defined.

In [None]:
import operator
from typing import Annotated, Any, Dict, List, Optional, Sequence, TypedDict
import functools

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langgraph.graph import StateGraph, END
from langchain_community.tools.tavily_search import TavilySearchResults

# The agent state is the input to each node in the graph
class AgentState(TypedDict):
    # The annotation tells the graph that new messages will always
    # be added to the current states
    messages: Annotated[Sequence[BaseMessage], operator.add]
    # The 'next' field indicates where to route to next
    next: str

tavily_tool = TavilySearchResults(max_results=5)
research_agent = create_agent(llm, [tavily_tool], "You are a web researcher.")
research_node = functools.partial(agent_node, agent=research_agent, name="Researcher")

reviewer_agent = create_agent(llm, [FindBestReviewFromQuery], "You are a Review Retriever and take these from the Activeloop dataset")
reviewer_node = functools.partial(agent_node, agent=reviewer_agent, name="Reviewer")

#qa_agent = create_agent(llm,[QAToolFunction], "You can answer questions the user asks you")
#qa_node = functools.partial(agent_node, agent=qa_agent, name="QAAgent")

workflow = StateGraph(AgentState)
workflow.add_node("Researcher", research_node)
workflow.add_node("Reviewer", reviewer_node)
workflow.add_node("supervisor", supervisor_chain)

Now connect all the edges in the graph.

In [None]:
for member in members:
    # We want our workers to ALWAYS "report back" to the supervisor when done
    workflow.add_edge(member, "supervisor")
# The supervisor populates the "next" field in the graph state
# which routes to a node or finishes
conditional_map = {k: k for k in members}
conditional_map["FINISH"] = END
workflow.add_conditional_edges("supervisor", lambda x: x["next"], conditional_map)
# Finally, add entrypoint
workflow.set_entry_point("supervisor")

graph = workflow.compile()
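To make the control flow of the compiled graph concrete, here is a plain-Python simulation of the same loop. It is not LangGraph's API: the supervisor picks the next worker, the worker's messages are appended to the state (mirroring the `operator.add` annotation on `messages`), control returns to the supervisor, and `FINISH` ends the run. The toy supervisor rule, the toy workers and the `max_steps` guard are hypothetical.

```python
import operator
from typing import Callable


def run_supervisor_loop(supervisor: Callable[[list], str], workers: dict, request: str, max_steps: int = 10) -> list:
    """Plain-Python simulation of the supervisor graph (not LangGraph's API)."""
    state = {"messages": [("user", request)], "next": ""}
    for _ in range(max_steps):
        # Supervisor node: decide who acts next, or FINISH
        state["next"] = supervisor(state["messages"])
        if state["next"] == "FINISH":
            break
        # Worker node: its output is appended, like the operator.add reducer
        update = workers[state["next"]](state["messages"])
        state["messages"] = operator.add(state["messages"], update)
        # The edge back to the supervisor is the next loop iteration
    return state["messages"]


def toy_supervisor(messages):
    # Hypothetical rule: ask the Reviewer once, then finish
    return "FINISH" if messages[-1][0] == "Reviewer" else "Reviewer"


toy_workers = {
    "Reviewer": lambda messages: [("Reviewer", "Top review: 'Best burger in town.'")],
    "Researcher": lambda messages: [("Researcher", "No web results needed.")],
}

history = run_supervisor_loop(toy_supervisor, toy_workers, "Retrieve reviews mentioning 'burger'")
for role, content in history:
    print(f"{role}: {content}")
```

In the real graph, the LLM-backed `supervisor_chain` plays the role of `toy_supervisor`, and the agent executors play the role of the toy workers.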

## Invoke the team
With the graph created, we can now invoke it and see how it performs!

Invoke the Reviewer Agent

In [None]:
for s in graph.stream(
    {
        "messages": [
            HumanMessage(content="Retrieve restaurant reviews by the word 'burger'")
        ]
    }
):
    if "__end__" not in s:
        print(s)
        print("----")

Invoke the Researcher Agent

In [None]:
for s in graph.stream(
    {
        "messages": [
            HumanMessage(content="Find the best restaurant in Rome.")
        ]
    }
):
    if "__end__" not in s:
        print(s)
        print("----")

# Summarizing the Findings
To conclude what is and is not possible in the context of restaurant insights: the OpenCLIP embeddings are surprisingly accurate at recognizing not just food in general but also particular dishes. In combination with DeepLake, they provide valuable insights into restaurant reviews and can help you better understand what people enjoy eating in your neighbourhood. This is especially helpful when dining options are plentiful and checking restaurants one by one is impractical. The biggest weakness, however, is the highly time-consuming data preparation: scraping, processing and ingesting data from 130 restaurants took around 4.5 hours in total. Nevertheless, there are still ways to make this faster and more efficient.

Overall, we see that OpenCLIP embeddings are very powerful and can be very useful with LangChain as well, even though there is currently no direct integration. We see the highest potential in unsupervised categorization and in searching images by text without any additional context, which, as you have seen, worked quite well and is far from limited to restaurant data.

Additionally, LangGraph played a significant role in enhancing our capabilities. It provided a framework for building stateful, multi-actor applications with LLMs, allowing us to coordinate multiple chains across multiple steps of computation in a cyclic manner. In particular, the Agent Supervisor delegated tasks between independent agents within the system, orchestrating their interactions and workflow.

We hope you found this article interesting and useful for your future projects, and we hope to see you next time.
Have a good day!

## FAQs

<h2 id="faq">What is CLIP Model in AI?</h2>
CLIP is a neural network developed by OpenAI that connects text and images efficiently by learning visual concepts from natural language supervision. It utilizes a simple pre-training task where the model predicts which text snippet is paired with an image from a set of 32,768 options. This approach allows CLIP to recognize a wide range of visual concepts in images, enabling it to be applied to various visual classification tasks without the need for extensive labeled datasets. Unlike traditional deep learning models that rely on costly manually labeled datasets, CLIP learns from text-image pairs available on the internet, reducing the dependency on expensive data collection processes.
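The pre-training task described above can be written as a symmetric contrastive loss. Assume a batch of $N$ image and text pairs (the 32,768 options correspond to the batch size), L2-normalized image embeddings $\mathbf{u}_i$, L2-normalized text embeddings $\mathbf{v}_j$ and a learned temperature $\tau$; the notation is our own:

$$
\ell_{ij} = \frac{\mathbf{u}_i^\top \mathbf{v}_j}{\tau},
\qquad
\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[\log\frac{e^{\ell_{ii}}}{\sum_{j=1}^{N} e^{\ell_{ij}}} + \log\frac{e^{\ell_{ii}}}{\sum_{j=1}^{N} e^{\ell_{ji}}}\right]
$$

The first term asks each image to pick out its own text among all $N$ snippets; the second asks each text to pick out its own image. The same similarity $\ell_{ij}$ is what makes zero-shot categorization and text-to-image search possible after training.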

<h2 id="faq">What is LangGraph in LangChain?</h2>
LangGraph in LangChain is a library designed for building stateful, multi-actor applications with LLMs (Large Language Models). It is intended to be used with LangChain and extends the LangChain Expression Language to coordinate multiple chains or actors across multiple steps. LangGraph allows for adding cycles to LLM applications, particularly useful for agent-like behaviors where an LLM is called in a loop to determine the next action. LangGraph's main purpose is to enhance LLM applications with cycles, unlike LangChain, which is optimized for Directed Acyclic Graph (DAG) workflows.

<h2 id="faq">What is the difference between LangGraph and LangChain?</h2>
LangGraph and LangChain are related components within the LangChain framework. LangGraph is a library designed for building stateful, multi-actor applications with LLMs, focusing on coordinating multiple chains or actors across various steps. It extends the LangChain Expression Language to enable cycles in LLM applications, particularly useful for agent-like behaviors where actions are determined through iterative interactions.
On the other hand, LangChain is a broader framework for developing applications powered by language models. It offers composable tools, off-the-shelf chains, and integrations for working with language models. LangChain emphasizes context-awareness and reasoning capabilities, connecting language models to contextual sources and enabling them to reason based on provided context. In essence, LangGraph is more specialized in facilitating the creation of complex application structures involving multiple actors, while LangChain provides a comprehensive framework for developing diverse applications powered by language models.

<h2 id="faq">Is LangGraph Free?</h2>
LangGraph is an open-source library, which means it is freely available for anyone to use. You can access and use LangGraph without any cost, subject to the terms of its open-source license. However, it's essential to review the specific licensing terms associated with LangGraph to ensure compliance with its usage requirements.