[Go to Home Page](https://weaviate.oneblink.ai)


# Introduction to Semantic Search with Weaviate

This lab introduces semantic search with Weaviate, using a wine reviews dataset. It walks through setting up a Weaviate collection, vectorizing the reviews with a transformer model, and running search queries against them.
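To make the idea of vector-based search concrete before touching Weaviate, here is a minimal sketch of ranking texts by cosine similarity. The three-dimensional vectors and the query vector are made-up stand-ins for illustration only; a real transformer model produces vectors with hundreds of dimensions, and Weaviate handles the vectorization and ranking for you.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (not produced by any real model).
toy_vectors = {
    "crisp white with citrus notes": [0.9, 0.1, 0.0],
    "bold red with dark fruit": [0.1, 0.9, 0.2],
    "light sparkling wine": [0.7, 0.0, 0.6],
}
query_vector = [0.8, 0.2, 0.1]  # stand-in for an embedded user query

# Rank the texts from most to least similar to the query vector.
ranked = sorted(
    toy_vectors,
    key=lambda text: cosine_similarity(query_vector, toy_vectors[text]),
    reverse=True,
)
for text in ranked:
    print(f"{cosine_similarity(query_vector, toy_vectors[text]):.3f}  {text}")
```

Texts whose vectors point in a direction close to the query vector rank first, which is the intuition behind finding reviews by meaning rather than by exact keywords.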

## Setting Up the Environment

This section covers the initial setup: importing the required libraries and connecting to the running Weaviate instance.

In [None]:
import weaviate
import weaviate.classes as wvc
import pandas as pd
from tqdm import tqdm

# The sandbox is already set up with Weaviate running on port 8080 (gRPC on 50051)
client = weaviate.connect_to_local(port=8080, grpc_port=50051)

## Preparing the WineReviews Collection

This part creates a new 'WineReviews' collection in Weaviate and verifies that it exists.

In [None]:
# Deleting any previously existing 'WineReviews' collection so the lab starts from a clean state
client.collections.delete('WineReviews')

# Creating a new collection 'WineReviews' with properties 'title' and 'description'
client.collections.create(
    name='WineReviews',
    properties=[
        wvc.Property(name='title', data_type=wvc.DataType.TEXT),
        wvc.Property(name='description', data_type=wvc.DataType.TEXT)
    ],
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_transformers()
)

# Verifying that the collection exists
if client.collections.exists('WineReviews'):
    print('Collection WineReviews was created successfully')
else:
    raise RuntimeError('Collection WineReviews was not created')

## Importing and Inserting Data

Here, we load the wine review data with pandas and insert it into the 'WineReviews' collection.

In [None]:
# Loading the data
data = pd.read_csv('./data/wine_reviews.csv', index_col=0)
# Shuffling the rows and keeping the first 100 for a quick test
data = data.sample(frac=1)
data = data[:100]

# Getting the 'WineReviews' collection
wine_collection = client.collections.get('WineReviews')

# Preparing the objects for insertion; nothing is sent to Weaviate inside this loop,
# so each review is stored only once by the insert_many call below
wines_to_add = []
for index, row in tqdm(data.iterrows(), total=len(data)):
    wine = {'title': row['title'] + '.', 'description': row['description']}
    wines_to_add.append(wine)

# Inserting all prepared objects in a single call
response = wine_collection.data.insert_many(wines_to_add)
# Fetching and printing a sample of the data
response = wine_collection.query.fetch_objects(limit=2)
print(response)
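The lab limits the data to 100 rows, so a single `insert_many` call is enough. If you later lift that limit, one possible approach is to split the prepared list into smaller chunks and pass each chunk to `insert_many` in turn. The sketch below shows only the chunking step in plain Python; the `chunked` helper, the sample list, and the chunk size of 3 are hypothetical names and values chosen for illustration.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Stand-in for the list of wine dictionaries prepared in the cell above.
sample_wines = [
    {"title": f"Wine {n}.", "description": "placeholder description"}
    for n in range(7)
]

# Each chunk could be handed to wine_collection.data.insert_many(chunk).
batch_sizes = [len(batch) for batch in chunked(sample_wines, 3)]
print(batch_sizes)  # [3, 3, 1]
```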

## Executing Search Queries

In this final section, we run a search query based on user input to find the most relevant wine reviews.

In [None]:
# Ensuring the collection exists
if not client.collections.exists('WineReviews'):
    raise Exception('Collection does not exist')

reviews = client.collections.get('WineReviews')

# User input for search query
user_query = input('Please enter the type of wine you are looking for: ')

# Executing the search query
query_response = reviews.query.near_text(user_query, limit=10)
print('-' * 20)
print('The query responses are as follows:\n')
for i, obj in enumerate(query_response.objects):
    title = obj.properties['title']
    description = obj.properties['description']
    print(f'{i+1}. Title: {title}\n   Description: {description}\n')
print('-' * 20)
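Wine descriptions can be long, so a compact listing is sometimes easier to scan than the full printout above. The sketch below formats a list of title/description dictionaries, shaped like `obj.properties` in the loop above, and shortens each description with the standard library's `textwrap.shorten`. The `format_results` helper, the sample wines, and the 60-character width are assumptions made for this example.

```python
import textwrap

def format_results(results, width=60):
    """Return a numbered, one-line-per-result summary of search hits."""
    lines = []
    for rank, item in enumerate(results, start=1):
        # shorten() collapses whitespace and truncates on word boundaries.
        summary = textwrap.shorten(item["description"], width=width, placeholder="...")
        lines.append(f"{rank}. {item['title']} | {summary}")
    return "\n".join(lines)

# Hypothetical results; in the lab these would come from query_response.objects.
sample_results = [
    {"title": "Hypothetical Estate Riesling.", "description": "Bright and crisp."},
    {
        "title": "Hypothetical Reserve Syrah.",
        "description": "Dense and brooding, with layers of blackberry, smoked meat, "
                       "cracked pepper and a long, savory finish that lingers.",
    },
]

print(format_results(sample_results))
```

Inside the lab, you could build the input list with `[obj.properties for obj in query_response.objects]`.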

