# Anime recommender

This notebook imports MyAnimeList data into Weaviate, a vector database that supports semantic search and retrieval-augmented generation (RAG).

In this notebook, we will cover:
- Setting up the database
    - Running Weaviate with Docker Compose
    - The anime dataset
    - Populate the database
    - Explore the collection
- Keyword search
- Vector similarity search
- Content-Based Recommendation System
- Image search
- Reranker
- Retrieval-Augmented Generation (RAG)

## Setting up the database
### Running Weaviate with Docker Compose

You can use the following docker-compose.yml file to run a local instance of Weaviate:

```yaml
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.2
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      CLIP_INFERENCE_API: 'http://multi2vec-clip:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'multi2vec-clip'
      ENABLE_API_BASED_MODULES: 'true'
      CLUSTER_HOSTNAME: 'node1'
  multi2vec-clip:
    image: cr.weaviate.io/semitechnologies/multi2vec-clip:sentence-transformers-clip-ViT-B-32-multilingual-v1
    environment:
      ENABLE_CUDA: '0'
volumes:
  weaviate_data:
```
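
Once the containers are started (for example with `docker compose up -d`), it can help to confirm that Weaviate is accepting requests before running the rest of the notebook. The following is a minimal sketch using only the standard library. It assumes the instance is reachable at `http://localhost:8080`, as in the port mapping above, and polls Weaviate's `/v1/.well-known/ready` endpoint. The function name `weaviate_is_ready` is made up for this example.

```python
import urllib.error
import urllib.request


def weaviate_is_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = f"{base_url}/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError both derive from OSError, so this covers
        # refused connections, timeouts, and non-2xx responses.
        return False
```

Calling `weaviate_is_ready()` in a short retry loop is usually enough, since the CLIP inference container can take a while to load its model.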

### The anime dataset

In [78]:
import pandas as pd

file_path = 'dataset/anime-subset.csv'

pd.set_option('display.max_columns', 50)

anime_df = pd.read_csv(file_path, engine='python', on_bad_lines='skip')
print("Shape of the Dataset:", anime_df.shape)
anime_df.head(3)

Shape of the Dataset: (293, 24)


Unnamed: 0,anime_id,Name,English name,Other name,Score,Genres,Synopsis,Type,Episodes,Aired,Premiered,Status,Producers,Licensors,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL
0,1,Cowboy Bebop,Cowboy Bebop,カウボーイビバップ,8.75,"Action, Award Winning, Sci-Fi","Crime is timeless. By the year 2071, humanity ...",TV,26.0,"Apr 3, 1998 to Apr 24, 1999",spring 1998,Finished Airing,Bandai Visual,"Funimation, Bandai Entertainment",Sunrise,Original,24 min per ep,R - 17+ (violence & profanity),41.0,43,78525,914193.0,1771505,https://cdn.myanimelist.net/images/anime/4/196...
1,5,Cowboy Bebop: Tengoku no Tobira,Cowboy Bebop: The Movie,カウボーイビバップ 天国の扉,8.38,"Action, Sci-Fi","Another day, another bounty—such is the life o...",Movie,1.0,"Sep 1, 2001",UNKNOWN,Finished Airing,"Sunrise, Bandai Visual",Sony Pictures Entertainment,Bones,Original,1 hr 55 min,R - 17+ (violence & profanity),189.0,602,1448,206248.0,360978,https://cdn.myanimelist.net/images/anime/1439/...
2,6,Trigun,Trigun,トライガン,8.22,"Action, Adventure, Sci-Fi","Vash the Stampede is the man with a $$60,000,0...",TV,26.0,"Apr 1, 1998 to Sep 30, 1998",spring 1998,Finished Airing,Victor Entertainment,"Funimation, Geneon Entertainment USA",Madhouse,Manga,24 min per ep,PG-13 - Teens 13 or older,328.0,246,15035,356739.0,727252,https://cdn.myanimelist.net/images/anime/7/203...


### Populate the database

In [87]:
import weaviate
import os

headers = {
    "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"],
    "X-Cohere-Api-Key": os.environ["COHERE_APIKEY"],
} 

client = weaviate.connect_to_local(headers=headers)

# Careful: this deletes every collection, so all data will have to be imported and vectorized again
client.collections.delete_all()

client.close()

Some items may fail to import because they are missing properties or because their poster URL is out of date.  
To keep things simple, we will not handle those cases here.

In [88]:
import base64

import requests
from tqdm import tqdm
from weaviate.classes.config import Configure, DataType, Multi2VecField, Property
from weaviate.util import generate_uuid5

client = weaviate.connect_to_local(headers=headers)

anime_collection = client.collections.create(
    name="Anime",
    properties=[
        Property(name="anime_id", data_type=DataType.INT),
        Property(name="name", data_type=DataType.TEXT),
        Property(name="english_name", data_type=DataType.TEXT),
        Property(name="other_name", data_type=DataType.TEXT),
        Property(name="score", data_type=DataType.NUMBER),
        Property(name="genres", data_type=DataType.TEXT),
        Property(name="synopsis", data_type=DataType.TEXT),
        Property(name="episodes", data_type=DataType.NUMBER),
        Property(name="poster", data_type=DataType.BLOB),
    ],
    vectorizer_config=[
        Configure.NamedVectors.text2vec_openai(
            name="name", source_properties=["name"]
        ),
        Configure.NamedVectors.text2vec_openai(
            name="english_name", source_properties=["english_name"]
        ),
        Configure.NamedVectors.text2vec_openai(
            name="other_name", source_properties=["other_name"]
        ),
        Configure.NamedVectors.multi2vec_clip(
            name="poster_synopsis",
            image_fields=[
                Multi2VecField(name="poster", weight=0.5)
            ],
            text_fields=[
                Multi2VecField(name="synopsis", weight=0.5)
            ],
        ),
    ],
    generative_config=Configure.Generative.openai(),
    reranker_config=Configure.Reranker.cohere()
)

with anime_collection.batch.dynamic() as batch:
    for i, anime in tqdm(anime_df.iterrows()):
        response = requests.get(anime["Image URL"])
        poster_base64 = base64.b64encode(response.content).decode('utf-8')
        anime_obj = {
            "anime_id": anime["anime_id"],
            "name": anime["Name"],
            "english_name": anime["English name"],
            "other_name": anime["Other name"],
            "score": anime["Score"],
            "genres": anime["Genres"],
            "synopsis": anime["Synopsis"],
            "episodes": int(float(anime["Episodes"])) if anime["Episodes"] != "UNKNOWN" else -1,
            "poster": poster_base64
        }
        batch.add_object(
            properties=anime_obj,
            uuid=generate_uuid5(anime["anime_id"])
        )

if len(anime_collection.batch.failed_objects) > 0:
    print(f"Failed to import {len(anime_collection.batch.failed_objects)} objects")
    print(f"Failed: {anime_collection.batch.failed_objects[0].message}")

client.close()

293it [01:42,  2.86it/s]


Failed to import 1 objects
Failed: fail with status 500: cannot identify image file <_io.BytesIO object at 0xffff05b5e020>
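
The failure above suggests that one poster URL returned something other than an image (for instance an error page). One possible guard, sketched here under assumptions, is to check the downloaded bytes against well-known image signatures before adding the object to the batch, and to parse the `Episodes` column defensively. The helper names `looks_like_image` and `parse_episodes` are hypothetical, and a matching signature does not guarantee that the CLIP container can decode the file.

```python
import math

# Magic-byte prefixes of common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF (1987 variant)
    b"GIF89a",              # GIF (1989 variant)
)


def looks_like_image(data: bytes) -> bool:
    """Cheap check that downloaded bytes start with a known image signature."""
    if data.startswith(IMAGE_SIGNATURES):
        return True
    # WebP is a RIFF container with "WEBP" at bytes 8..12.
    return data[:4] == b"RIFF" and data[8:12] == b"WEBP"


def parse_episodes(value) -> int:
    """Convert an Episodes cell to int, using -1 for unknown or missing values."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return -1  # e.g. "UNKNOWN" or None
    return int(number) if math.isfinite(number) else -1
```

In the import loop, a row whose `response.content` fails `looks_like_image` could simply be skipped with `continue`.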


### Explore the collection

Let's get some basic information about the `Anime` collection we just created:

In [89]:
client = weaviate.connect_to_local(headers=headers)

# Get the collection
anime_collection = client.collections.get("Anime")

print(f"Collection has been created: {anime_collection.exists()}")
print(f"Collection length is: {len(anime_collection)}")

client.close()

Collection has been created: True
Collection length is: 292


## Keyword search

In [90]:
import weaviate.classes.query as wq

client = weaviate.connect_to_local(headers=headers)

anime_collection = client.collections.get("Anime")

response = anime_collection.query.bm25(
    query="pirate", 
    limit=5, 
    return_metadata=wq.MetadataQuery(score=True)
)

for o in response.objects:
    print(f"""
        Name: {o.properties["english_name"]}
        Score: {o.properties["score"]}
        Plot: {o.properties["synopsis"]}
        BM25 score: {o.metadata.score:.3f}
    """)

client.close()


        Name: One Piece
        Score: 8.69
        Plot: Gol D. Roger was known as the "Pirate King," the strongest and most infamous being to have sailed the Grand Line. The capture and execution of Roger by the World Government brought a change throughout the world. His last words before his death revealed the existence of the greatest treasure in the world, One Piece. It was this revelation that brought about the Grand Age of Pirates, men who dreamed of finding One Piece—which promises an unlimited amount of riches and fame—and quite possibly the pinnacle of glory and the title of the Pirate King.

Enter Monkey D. Luffy, a 17-year-old boy who defies your standard definition of a pirate. Rather than the popular persona of a wicked, hardened, toothless pirate ransacking villages for fun, Luffy's reason for being a pirate is one of pure wonder: the thought of an exciting adventure that leads him to intriguing people and ultimately, the promised treasure. Following in the footsteps of
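
To give a feel for where a BM25 score comes from, here is a simplified, self-contained sketch of the classic BM25 formula over whitespace-tokenized text. It is not Weaviate's implementation, which also handles tokenization options and multiple properties. The values `k1=1.2` and `b=0.75` are the commonly used defaults and are an assumption here, as are the toy documents.

```python
import math
from collections import Counter


def bm25_scores(query: str, documents: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with the classic BM25 formula."""
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    query_terms = query.lower().split()

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue  # a term missing from the document contributes nothing
            # Number of documents that contain the term.
            df = sum(1 for other in tokenized if term in other)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Longer-than-average documents are penalized through b.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "a pirate sails the sea",
    "a student goes to school",
    "pirate pirate treasure",
]
scores = bm25_scores("pirate", docs)
```

Documents that never mention the query term score zero, which is why keyword search cannot find an anime about pirates whose synopsis avoids the word.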

## Vector similarity search

Search for anime based on natural language queries using the `near_text` operator.

In [91]:
import weaviate.classes.query as wq
from weaviate.classes.query import Filter

client = weaviate.connect_to_local(headers=headers)

anime_collection = client.collections.get("Anime")

response = anime_collection.query.near_text(
    query="anime about bounty hunters in space",
    target_vector="poster_synopsis", 
    limit=1,
    filters=Filter.by_property("score").greater_or_equal(8.0),
    return_metadata=wq.MetadataQuery(distance=True),
    return_properties=["english_name", "score", "synopsis"],
    include_vector=True
)

for o in response.objects:
    print(f"""
        Name: {o.properties["english_name"]}
        Score: {o.properties["score"]}
        Plot: {o.properties["synopsis"]}
        Distance: {o.metadata.distance:.3f}
    """)

client.close()


        Name: Cowboy Bebop
        Score: 8.75
        Plot: Crime is timeless. By the year 2071, humanity has expanded across the galaxy, filling the surface of other planets with settlements like those on Earth. These new societies are plagued by murder, drug use, and theft, and intergalactic outlaws are hunted by a growing number of tough bounty hunters.

Spike Spiegel and Jet Black pursue criminals throughout space to make a humble living. Beneath his goofy and aloof demeanor, Spike is haunted by the weight of his violent past. Meanwhile, Jet manages his own troubled memories while taking care of Spike and the Bebop, their ship. The duo is joined by the beautiful con artist Faye Valentine, odd child Edward Wong Hau Pepelu Tivrusky IV, and Ein, a bioengineered Welsh Corgi.

While developing bonds and working to catch a colorful cast of criminals, the Bebop crew's lives are disrupted by a menace from Spike's past. As a rival's maniacal plot continues to unravel, Spike must choose be
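
The `distance` requested in the query above measures how far the query vector is from each stored vector. Assuming the collection uses cosine distance (Weaviate's default when no metric is configured, as is the case here), the value is one minus the cosine similarity. A minimal sketch with made-up two-dimensional vectors:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0 for the same direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Smaller values mean closer matches, so results are returned in ascending order of distance.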

## Content-Based Recommendation System

Find similar anime based on the vector of a specific anime.

In [92]:
from weaviate.classes.query import Filter, MetadataQuery

client = weaviate.connect_to_local(headers=headers)

anime_collection = client.collections.get("Anime")

response = anime_collection.query.fetch_objects(
    filters=Filter.by_property("english_name").equal("One Piece"),
    limit=1,
    include_vector=True
)

one_piece_object = response.objects[0]
print(f"English name: {one_piece_object.properties['english_name']}")
print(f"Vector (poster_synopsis): {one_piece_object.vector['poster_synopsis']}")

response = anime_collection.query.near_vector(
    near_vector=one_piece_object.vector["poster_synopsis"],
    target_vector="poster_synopsis", 
    limit=3,
    offset=1,
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(f"""
        Name: {o.properties["english_name"]}
        Score: {o.properties["score"]}
        Plot: {o.properties["synopsis"]}
    """)
    
client.close()

English name: One Piece
Vector (poster_synopsis): [0.03531700372695923, 0.20869019627571106, 0.01693410985171795, 0.024529552087187767, 0.10282574594020844, 0.14012451469898224, 0.07822360098361969, 0.024743158370256424, -0.08048374205827713, -0.019245469942688942, 0.1589871048927307, 0.022359605878591537, 0.20114386081695557, 0.02674214169383049, -0.09577889740467072, 0.04984594136476517, 0.04390791803598404, 0.1380622833967209, -0.06774572283029556, 0.09531454741954803, -0.14828570187091827, 0.005110412836074829, -0.03579455614089966, 0.11408592760562897, 0.070151187479496, 0.020919566974043846, 0.027732031419873238, 0.0785229355096817, 0.04521721228957176, -0.07332268357276917, -0.061275627464056015, -0.04643363133072853, 0.03922853246331215, 0.19956932961940765, -0.15326379239559174, 0.011827686801552773, -0.08691646158695221, 0.07266625761985779, -0.02557445876300335, -0.04355801269412041, -0.06350294500589371, 0.05518697574734688, 0.14568237960338593, 0.16958093643188477, 0.02374
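
The `offset=1` in the `near_vector` query is what turns a similarity search into a recommendation: the closest match to an anime's own vector is normally the anime itself, so the first hit is skipped. The sketch below reproduces that idea in plain Python over a toy catalog. The titles and vectors are invented for illustration, and the function `nearest` is hypothetical.

```python
import math


def nearest(query_vector, catalog, limit=3, offset=0):
    """Rank catalog entries by cosine distance and return names in [offset, offset + limit)."""

    def distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norms

    ranked = sorted(catalog.items(), key=lambda item: distance(query_vector, item[1]))
    return [name for name, _ in ranked[offset:offset + limit]]


# Toy vectors, not real embeddings.
catalog = {
    "seed_title": [1.0, 0.0],
    "close_title": [0.9, 0.1],
    "mid_title": [0.5, 0.5],
    "far_title": [0.0, 1.0],
}
recommendations = nearest(catalog["seed_title"], catalog, limit=2, offset=1)
```

This relies on the seed object ranking first. If another object had an identical vector, filtering out the seed by its ID would be the safer choice.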

## Image search

In [93]:
import base64

client = weaviate.connect_to_local(headers=headers)

anime_collection = client.collections.get("Anime")

with open("dataset/straw_hat.jpeg", "rb") as image_file:
    # Read the image file as binary
    image_binary = image_file.read()
    # Encode the binary data to Base64
    base64_encoded = base64.b64encode(image_binary).decode('utf-8')

response = anime_collection.query.near_image(
    near_image=base64_encoded,
    target_vector="poster_synopsis", 
    return_properties=["english_name"],
    limit=5,
)

for o in response.objects:
    print(f"Name: {o.properties['english_name']}")

client.close()

Name: Gankutsuou: The Count of Monte Cristo
Name: Fighting Spirit
Name: One Piece
Name: s-CRY-ed
Name: The Gokusen


## Reranker

Let's improve the image search with a reranker, which reorders the retrieved results by how well each synopsis matches the query "pirates".

In [94]:
from weaviate.classes.query import Rerank

client = weaviate.connect_to_local(headers=headers)

anime_collection = client.collections.get("Anime")

with open("dataset/straw_hat.jpeg", "rb") as image_file:
    # Read the image file as binary
    image_binary = image_file.read()
    # Encode the binary data to Base64
    base64_encoded = base64.b64encode(image_binary).decode('utf-8')

response = anime_collection.query.near_image(
    near_image=base64_encoded,
    target_vector="poster_synopsis", 
    rerank=Rerank(
        prop="synopsis",
        query="pirates"
    ),
    return_properties=["english_name"],
    limit=5,
)

for o in response.objects:
    print(f"Name: {o.properties['english_name']}")

client.close()

Name: One Piece
Name: Fighting Spirit
Name: The Gokusen
Name: s-CRY-ed
Name: Gankutsuou: The Count of Monte Cristo
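
Note that the reranked output contains the same five titles as the plain image search, only in a different order: the reranker reorders the candidates that the first-stage search returned and does not fetch new ones. A rough sketch of that two-stage pattern follows. The scoring function `term_overlap` is a deliberately naive stand-in for the Cohere model, and the candidate synopses are invented.

```python
def term_overlap(query: str, text: str) -> int:
    """Naive relevance score: how often the query terms occur in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rerank(candidates: list[dict], query: str, prop: str, score_fn=term_overlap) -> list[dict]:
    """Reorder first-stage candidates by a second-stage relevance score.

    sorted() is stable, so candidates with equal scores keep their original order.
    """
    return sorted(candidates, key=lambda c: score_fn(query, c[prop]), reverse=True)


# Hypothetical first-stage results, in the order the vector search returned them.
first_stage = [
    {"english_name": "Title A", "synopsis": "a count seeks revenge"},
    {"english_name": "Title B", "synopsis": "a boxer trains hard"},
    {"english_name": "Title C", "synopsis": "pirates hunt for treasure with other pirates"},
]
reranked = rerank(first_stage, query="pirates", prop="synopsis")
```

Because only the retrieved candidates are reordered, raising `limit` in the first-stage query gives the reranker more material to work with.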


## Retrieval-Augmented Generation (RAG)

Retrieve anime information and generate a personalized response.
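
Conceptually, a grouped task sends the language model a single prompt that combines the task text with the selected properties of every retrieved object. The sketch below illustrates that idea only. The function `build_grouped_prompt` and its prompt layout are hypothetical, and the template Weaviate actually uses may differ.

```python
def build_grouped_prompt(task: str, objects: list[dict], properties: list[str]) -> str:
    """Combine the task with the chosen properties of all retrieved objects."""
    blocks = []
    for obj in objects:
        # Only the listed properties are passed on as context.
        lines = [f"{prop}: {obj[prop]}" for prop in properties if prop in obj]
        blocks.append("\n".join(lines))
    context = "\n\n".join(blocks)
    return f"{task}\n\n{context}"


# Placeholder objects standing in for search results.
retrieved = [
    {"english_name": "Title A", "synopsis": "Synopsis A", "score": 7.8},
    {"english_name": "Title B", "synopsis": "Synopsis B", "score": 7.9},
]
prompt = build_grouped_prompt(
    "Why do you think these are great recommendations?",
    retrieved,
    ["english_name", "synopsis"],
)
```

Restricting the context to a few properties, as `grouped_properties` does in the query below, keeps large fields such as the Base64 poster out of the prompt.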

In [95]:
from weaviate.classes.query import Filter, MetadataQuery

client = weaviate.connect_to_local(headers=headers)

anime_collection = client.collections.get("Anime")

response = anime_collection.generate.near_text(
    query="What are some good action-packed anime like Cowboy Bebop?",
    target_vector="poster_synopsis", 
    filters=Filter.by_property("score").greater_or_equal(7.0),
    limit=3,
    grouped_task="Why do you think these are great recommendations?",
    grouped_properties=["english_name", "synopsis"],
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(f"Name: {o.properties['english_name']}")
    print(f"Score: {o.properties['score']}\n")

print(f"ChatGPT response: {response.generated}")

client.close()

Name: Gungrave
Score: 7.83

Name: Cromartie High School
Score: 7.9

Name: Bleach
Score: 7.92

ChatGPT response: These are great recommendations because they offer a diverse range of genres and themes that can appeal to a wide audience. 

"Gungrave" is a compelling story of friendship, betrayal, and ambition set in the world of mafia syndicates. It has a gripping narrative that spans several years and keeps viewers engaged until the thrilling conclusion. The themes of loyalty, power, and sacrifice make it a captivating watch for fans of action and drama.

"Cromartie High School" offers a unique and comedic take on high school life, with a protagonist who finds himself surrounded by eccentric and delinquent classmates. The absurd humor and quirky characters make it a fun and entertaining series that is sure to bring laughs to viewers.

"Bleach" combines elements of supernatural powers, action, and friendship in a story about a high school student who becomes a Soul Reaper to protect his 