# **Contextual Search for Hotel Reviews**

Inspired by
https://github.com/UKPLab/sentence-transformers/tree/master/sentence_transformers

Data source
https://www.kaggle.com/datasets/hamzafarooq50/hotel-listings-and-reviews?resource=download&select=HotelListInDubai__en2019100120191005.csv

### **Import Packages**
First, install the libraries that let us use BERT-based models through an easy-to-use interface.

In [2]:
!pip install -U spacy
!pip install -U sentence-transformers


In [3]:
!pip install opendatasets
!pip install pandas



In [4]:
import opendatasets as od
import pandas

# download data from Kaggle (using key and username)
od.download("https://www.kaggle.com/datasets/hamzafarooq50/hotel-listings-and-reviews?select=hotelReviewsInDubai__en2019100120191005.csv")

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: yyunchien
Your Kaggle Key: ··········
Downloading hotel-listings-and-reviews.zip to ./hotel-listings-and-reviews


In [5]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from collections import Counter
from heapq import nlargest



In [6]:
!python -m spacy download en_core_web_sm


In [7]:
!ls

hotel-listings-and-reviews  sample_data


# **Basic NLP**

In [8]:
# Data Cleaning

import re
# sample review from the IMDB dataset
review = "<b>A touching movie!!</b> It is full of emotions and wonderful acting.<br> I could have sat through it a second time."
cleaned_review = re.sub(re.compile('<.*?>'), '', review)  # remove HTML tags
cleaned_review = re.sub('[^A-Za-z0-9]+', ' ', cleaned_review)  # replace each run of non-alphanumeric characters with one space

print(cleaned_review)

A touching movie It is full of emotions and wonderful acting I could have sat through it a second time 


In [9]:
#Lowercase

cleaned_review = cleaned_review.lower()
print(cleaned_review)

a touching movie it is full of emotions and wonderful acting i could have sat through it a second time 
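
The cleaning and lowercasing steps can be bundled into one helper so the same preprocessing can later be applied to every review. This is a minimal sketch; the name `clean_review` is hypothetical, and unlike the cells above it also strips the trailing space left when the final `.` is replaced.

```python
import re

# Compiled once: non-greedy match of anything that looks like an HTML tag
HTML_TAG = re.compile(r"<.*?>")
# One or more characters that are not ASCII letters or digits
NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")

def clean_review(text: str) -> str:
    """Strip HTML tags, collapse non-alphanumeric runs to a space, lowercase, trim."""
    text = HTML_TAG.sub("", text)
    text = NON_ALNUM.sub(" ", text)
    return text.lower().strip()

review = ("<b>A touching movie!!</b> It is full of emotions and wonderful acting.<br> "
          "I could have sat through it a second time.")
print(clean_review(review))
```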


In [10]:
# Tokenization

import nltk
nltk.download('punkt')

from nltk.tokenize import word_tokenize
tokens = nltk.word_tokenize(cleaned_review)

print(cleaned_review)
print(tokens)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


a touching movie it is full of emotions and wonderful acting i could have sat through it a second time 
['a', 'touching', 'movie', 'it', 'is', 'full', 'of', 'emotions', 'and', 'wonderful', 'acting', 'i', 'could', 'have', 'sat', 'through', 'it', 'a', 'second', 'time']


In [11]:
# Stop words removal

nltk.download('stopwords')

from nltk.corpus import stopwords
stop_words = stopwords.words('english')
filtered_review = [word for word in tokens if word not in stop_words] # removing stop words
print(filtered_review)

['touching', 'movie', 'full', 'emotions', 'wonderful', 'acting', 'could', 'sat', 'second', 'time']


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
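
`stopwords.words('english')` returns a plain list, so every `in` check scans it; converting it to a `set` makes each lookup constant time, which matters once this runs over thousands of reviews. The NLTK list also contains negations such as "no" and "not", which can invert the meaning of a review ("not clean"). The sketch below uses a small hypothetical stop-word set so it runs without NLTK; in the notebook you would build it with `set(stopwords.words('english'))`.

```python
# Hypothetical stand-in for set(stopwords.words('english')); the real NLTK
# list is much longer but likewise contains negations such as "no" and "not".
STOP_WORDS = {"a", "it", "is", "of", "and", "i", "have", "through", "the", "was", "not", "no"}

# Remove negations from the stop-word set so sentiment-bearing words survive
NEGATIONS = {"not", "no", "nor"}
KEEP_NEGATIONS = STOP_WORDS - NEGATIONS

def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in stop_words (set membership is O(1))."""
    return [t for t in tokens if t not in stop_words]

tokens = ["the", "room", "was", "not", "clean"]
print(remove_stop_words(tokens, STOP_WORDS))      # ['room', 'clean']
print(remove_stop_words(tokens, KEEP_NEGATIONS))  # ['room', 'not', 'clean']
```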


In [12]:
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
lemm_review = [lemmatizer.lemmatize(word) for word in filtered_review]
print(lemm_review)

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


['touching', 'movie', 'full', 'emotion', 'wonderful', 'acting', 'could', 'sat', 'second', 'time']


# **Moving to the Deep Learning Part**

In [13]:
import os
import spacy
nlp = spacy.load("en_core_web_sm")
from spacy import displacy

In [14]:
text = """Looking for a hotel in New York near Times Square with free breakfast and cheaper 
than $100 for 2nd June which is really kids friendly and has a swimming pool and I want to stay there for 8 days"""
doc = nlp(text)
sentence_spans = list(doc.sents)
displacy.render(doc, jupyter = True, style="ent")

In [15]:
text = """Close to the Eiffel Tower and is very high end with great shopping nearby"""
doc = nlp(text)
sentence_spans = list(doc.sents)
displacy.render(doc, jupyter = True, style="ent")

In [17]:
text = "I want to stay in a European city that filmed Game of Thrones and has very cheap booze and art galleries for 4 days"
#text = """My very photogenic mother died in a freak accident (picnic, lightning) when I was three, and, save for a pocket of warmth in the darkest past, nothing of her subsists within the hollows and dells of memory, over which, if you can still stand my style (I am writing under observation), the sun of my infancy had set: surely, you all know those redolent remnants of day suspended, with the midges, about some hedge in bloom or suddenly entered and traversed by the rambler, at the bottom of a hill, in the summer dusk; a furry warmth, golden midges"""
doc = nlp(text)
sentence_spans = list(doc.sents)
displacy.render(doc, jupyter = True, style="ent")

In [18]:
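# spaCy's stop-word list; a new name avoids shadowing NLTK's `stopwords` module imported earlier
spacy_stopwords = list(STOP_WORDS)
from string import punctuation
punctuation = punctuation + '\n'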

In [19]:
import pandas as pd
from sentence_transformers import SentenceTransformer
import scipy.spatial
import pickle as pkl

embedder = SentenceTransformer('all-MiniLM-L6-v2')
#embedder = SentenceTransformer('bert-base-nli-mean-tokens')


# **Hotel data in Dubai**

In [44]:
import opendatasets as od
import pandas
  
od.download("https://www.kaggle.com/datasets/hamzafarooq50/hotel-listings-and-reviews?resource=download&select=HotelListInDubai__en2019100120191005.csv")

Skipping, found downloaded files in "./hotel-listings-and-reviews" (use force=True to force download)


### (1) Hotel list

In [21]:
import pandas as pds

# read the hotel listing CSV file
file = '/content/hotel-listings-and-reviews/HotelListInDubai__en2019100120191005.csv'
df_list = pds.read_csv(file)

# display the first five rows
df_list.head()

Unnamed: 0.1,Unnamed: 0,hotel_name,url,locality,reviews,tripadvisor_rating,checkIn,checkOut,price_per_night,booking_provider,no_of_deals,hotel_features
0,0,Four Points By Sheraton Downtown Dubai,http://www.tripadvisor.com/Hotel_Review-g29542...,Dubai,2046,,2019/10/01,2019/10/05,$74,FourPoints.com,15,
1,1,FIVE Palm Jumeirah Dubai,http://www.tripadvisor.com/Hotel_Review-g29542...,Dubai,5388,,2019/10/01,2019/10/05,,Booking.com,15,
2,2,"Atlantis, The Palm",http://www.tripadvisor.com/Hotel_Review-g29542...,Dubai,25417,,2019/10/01,2019/10/05,,Booking.com,10,
3,3,Citymax Hotel Bur Dubai,http://www.tripadvisor.com/Hotel_Review-g29542...,Dubai,3704,,2019/10/01,2019/10/05,,TripAdvisor,14,
4,4,Premier Inn Dubai International Airport Hotel,http://www.tripadvisor.com/Hotel_Review-g29542...,Dubai,5215,,2019/10/01,2019/10/05,,Booking.com,14,


### (2) Hotel Reviews

In [22]:
# read the hotel review CSV file
file = '/content/hotel-listings-and-reviews/hotelReviewsInDubai__en2019100120191005.csv'
df_reviews = pds.read_csv(file)

# display the first five rows
df_reviews.head()

Unnamed: 0.1,Unnamed: 0,review_body,review_date,hotelName,hotelUrl
0,0,Just to say this is really an excellent hotel ...,"July 14, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...
1,1,"Found this pub by chance, what a great place, ...","July 12, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...
2,2,"House keeping is perfect , the rooms are alway...","July 9, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...
3,3,Although we had a few issues in terms of check...,"July 6, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...
4,4,I was stayed over 3 night in room ( 730 ) my f...,"July 4, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...


In [23]:
df_reviews['hotelName'].value_counts()

0    Four Points By Sheraton Downtown Dubai\nName: hotel_name, dtype: object           54
8    Orient Guest House\nName: hotel_name, dtype: object                               54
23    Signature 1 Hotel Tecom\nName: hotel_name, dtype: object                         54
20    Golden Tulip Al Barsha\nName: hotel_name, dtype: object                          54
18    Winchester Grand Hotel Apartments\nName: hotel_name, dtype: object               54
1    FIVE Palm Jumeirah Dubai\nName: hotel_name, dtype: object                         54
9    Barjeel Heritage Guest House\nName: hotel_name, dtype: object                     54
29    London Creek Hotel Apartments\nName: hotel_name, dtype: object                   54
7    Address Dubai Marina\nName: hotel_name, dtype: object                             54
5    JW Marriott Hotel Dubai\nName: hotel_name, dtype: object                          54
2    Atlantis, The Palm\nName: hotel_name, dtype: object                               54
3    Citym

In [24]:
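# hotelName holds a stringified one-row Series ("0    <name>\nName: hotel_name, dtype: object"):
# split off the "Name: ..." footer, then drop the leading row index and trim whitespace
df_reviews[['Hotel_Name_Clean','Extra']] = df_reviews.hotelName.str.split("\n", expand=True)
df_reviews['Hotel_Name_Clean'] = df_reviews['Hotel_Name_Clean'].str.slice(4,).str.strip()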
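
Each `hotelName` value is the printed form of a one-row pandas Series (row index, padding spaces, hotel name, then a `Name: ..., dtype: object` footer). `.str.slice(4,)` relies on the index prefix being short enough for `strip()` to tidy up the rest; a regular expression that removes any leading digits and whitespace does not depend on the index width. A stdlib sketch of that alternative, where `parse_hotel_name` is a hypothetical helper:

```python
import re

# Leading row index (digits) followed by the padding spaces pandas prints
INDEX_PREFIX = re.compile(r"^\d+\s+")

def parse_hotel_name(raw: str) -> str:
    """Extract the hotel name from a stringified one-row pandas Series."""
    first_line = raw.split("\n", 1)[0]      # drop the 'Name: ..., dtype: ...' footer
    return INDEX_PREFIX.sub("", first_line).strip()

raw = "0    Four Points By Sheraton Downtown Dubai\nName: hotel_name, dtype: object"
print(parse_hotel_name(raw))  # Four Points By Sheraton Downtown Dubai
```

In pandas the same idea would be `df_reviews.hotelName.str.split('\n').str[0].str.replace(r'^\d+\s+', '', regex=True).str.strip()`.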

In [25]:
df_reviews['Hotel_Name_Clean'].drop_duplicates()

0             Four Points By Sheraton Downtown Dubai
54                          FIVE Palm Jumeirah Dubai
108                               Atlantis, The Palm
162                          Citymax Hotel Bur Dubai
216    Premier Inn Dubai International Airport Hotel
270                          JW Marriott Hotel Dubai
324                Four Points by Sheraton Bur Dubai
378                             Address Dubai Marina
432                               Orient Guest House
486                     Barjeel Heritage Guest House
540       DAMAC Towers by Paramount Hotels & Resorts
545                                 Hotel Beit Bahar
546                             Roda Boutique Villas
566                              Vida Emirates Hills
569                                   Vasantam Hotel
575                  Hyatt Place Dubai/Wasl District
612                                    BackPacker 16
634                    Crowne Plaza Dubai Apartments
635                Winchester Grand Hotel Apar

In [26]:
df_reviews.head()

Unnamed: 0.1,Unnamed: 0,review_body,review_date,hotelName,hotelUrl,Hotel_Name_Clean,Extra
0,0,Just to say this is really an excellent hotel ...,"July 14, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...,Four Points By Sheraton Downtown Dubai,"Name: hotel_name, dtype: object"
1,1,"Found this pub by chance, what a great place, ...","July 12, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...,Four Points By Sheraton Downtown Dubai,"Name: hotel_name, dtype: object"
2,2,"House keeping is perfect , the rooms are alway...","July 9, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...,Four Points By Sheraton Downtown Dubai,"Name: hotel_name, dtype: object"
3,3,Although we had a few issues in terms of check...,"July 6, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...,Four Points By Sheraton Downtown Dubai,"Name: hotel_name, dtype: object"
4,4,I was stayed over 3 night in room ( 730 ) my f...,"July 4, 2019",0 Four Points By Sheraton Downtown Dubai\nN...,http://www.tripadvisor.com/Hotel_Review-g29542...,Four Points By Sheraton Downtown Dubai,"Name: hotel_name, dtype: object"


## Combine reviews

In [27]:
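# join with a space; the outputs below were produced with '' and show fused words such as "goodumair"
df_combined = df_reviews.sort_values(['Hotel_Name_Clean']).groupby('Hotel_Name_Clean', sort=False).review_body.apply(' '.join).reset_index(name='all_review')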
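
Joining with an empty string glues the last word of one review to the first word of the next (the printed corpus in this notebook contains tokens such as `goodumair` and `airportfair`), producing words that neither the tokenizer nor the embedding model has seen. A pure-Python sketch of the same group-and-concatenate step on hypothetical rows, showing the effect of the separator:

```python
from collections import defaultdict

# Hypothetical (hotel, review) rows standing in for df_reviews
rows = [
    ("Hotel A", "the internet also was very good"),
    ("Hotel A", "umair was very friendly"),
    ("Hotel B", "free bus to airport"),
]

def combine_reviews(rows, sep=" "):
    """Concatenate all reviews per hotel, in sorted hotel order."""
    grouped = defaultdict(list)
    for hotel, review in rows:
        grouped[hotel].append(review)
    return {hotel: sep.join(reviews) for hotel, reviews in sorted(grouped.items())}

print(combine_reviews(rows, sep="")["Hotel A"])  # ...was very goodumair was...
print(combine_reviews(rows)["Hotel A"])          # ...was very good umair was...
```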

In [28]:
df_combined.head()

Unnamed: 0,Hotel_Name_Clean,all_review
0,Address Dubai Marina,"Excellent Hotel and service, i enjoyed my stay..."
1,Al SEEF Hotel,AMAZING palace with beautiful design the servi...
2,"Atlantis, The Palm",Nice hotel for the family. Everywhere in the h...
3,BackPacker 16,It's not a fancy hotel and it's not a real hos...
4,Barjeel Heritage Guest House,Only had two days here to break the long trip ...


In [29]:
import re

# keep only ASCII letters, digits and whitespace (A-Za-z, not A-z, which also matches [ \ ] ^ _ `)
df_combined['all_review'] = df_combined['all_review'].apply(lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x))

def lower_case(input_str):
    return input_str.lower()

df_combined['all_review'] = df_combined['all_review'].apply(lower_case)
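
Inside a character class, `A-z` is a range over ASCII codes 65 to 122, so it also admits `[`, `\`, `]`, `^`, `_` and the backtick; `A-Za-z` is the intended letters-only range. A small check, with a hypothetical helper name:

```python
import re

def strip_symbols(text: str, pattern: str) -> str:
    """Remove every character matched by the negated class in pattern."""
    return re.sub(pattern, "", text)

sample = "pool_side ^great^ [view]!"
print(strip_symbols(sample, r"[^a-zA-z0-9\s]"))  # pool_side ^great^ [view]
print(strip_symbols(sample, r"[^a-zA-Z0-9\s]"))  # poolside great view
```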

In [30]:
df = df_combined

In [52]:
df.head()

Unnamed: 0,Hotel_Name_Clean,all_review
0,Address Dubai Marina,excellent hotel and service i enjoyed my stay ...
1,Al SEEF Hotel,amazing palace with beautiful design the servi...
2,"Atlantis, The Palm",nice hotel for the family everywhere in the ho...
3,BackPacker 16,its not a fancy hotel and its not a real hoste...
4,Barjeel Heritage Guest House,only had two days here to break the long trip ...


In [31]:
# map each hotel's combined review text to its hotel name
df_sentences = df_combined.set_index("all_review")
df_sentences = df_sentences["Hotel_Name_Clean"].to_dict()
df_sentences_list = list(df_sentences.keys())
len(df_sentences_list)

28
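
Using the review text as a dictionary key works only while every hotel's combined text is unique; two hotels with identical text would silently collapse into one entry and the corpus would shrink. Keeping two aligned lists avoids that, and a corpus position then maps straight back to a hotel. A sketch with hypothetical data:

```python
# Hypothetical combined reviews; two hotels happen to share identical text
hotel_names = ["Hotel A", "Hotel B", "Hotel C"]
all_reviews = ["great pool", "great pool", "close to the airport"]

# Text-keyed dict, as in the cell above: duplicate keys overwrite each other
by_text = dict(zip(all_reviews, hotel_names))
print(len(by_text))  # 2

# Aligned lists: corpus index i always belongs to hotel_names[i]
corpus = list(all_reviews)
print(hotel_names[2], "->", corpus[2])  # Hotel C -> close to the airport
```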

In [33]:
list(df_sentences.keys())[:5]

['excellent hotel and service i enjoyed my stay and the business meetings i recommend this hotel to all my friends and anyone wants to stay in dubai all facilities in the room were very good and the internet also was very goodumair was vary good gay very friendly we just visited shades tottaly enjoyed the feel of the place food ambiance sarvice were upto the mark specially umair made us to like the place even more view and pool sid sitting area were classy a must visit umair was told me we start pool party the name is aquaholic we wating for that we come back very soon thankshave not stayed at the hotel but have used the restaurants the chefs are amazing a recent experience organizing an iftar get together for my team has left me feeling very satisfied and wanting to recommend address marina to others our booking was very last minute and was handled very efficiently by neha gidwani kohli lifestyle events executive not only did she handle our booking efficiently and patiently we kept ch

In [34]:
from tqdm import tqdm
from sentence_transformers import SentenceTransformer, util

In [35]:
df_sentences_list = [str(d) for d in tqdm(df_sentences_list)]

100%|██████████| 28/28 [00:00<00:00, 137518.16it/s]


## Embeddings

In [36]:
# Corpus: one document per hotel (all of its reviews concatenated)
corpus = df_sentences_list
corpus_embeddings = embedder.encode(corpus,show_progress_bar=True)


In [37]:
corpus_embeddings[0].shape

(384,)
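
Each corpus entry is every review of one hotel concatenated, often thousands of words long. According to its model card, `all-MiniLM-L6-v2` truncates input longer than 256 word pieces (`embedder.max_seq_length`), so `encode` only sees the opening part of each hotel's reviews. One common workaround, sketched here under the assumption that about 200 words stays under the 256-word-piece limit, is to split each hotel's text into word windows, embed every window, and score a hotel by its best-matching window. `chunk_words` and `best_chunk_score` are hypothetical helpers, and the scores in the test are plain floats standing in for `util.cos_sim` output.

```python
from typing import Dict, List, Sequence

def chunk_words(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the last word
    return chunks

def best_chunk_score(chunk_owner: Sequence[int], scores: Sequence[float]) -> Dict[int, float]:
    """Max-pool per-chunk similarity scores into one score per hotel index."""
    best: Dict[int, float] = {}
    for owner, score in zip(chunk_owner, scores):
        best[owner] = max(score, best.get(owner, float("-inf")))
    return best

text = " ".join(f"w{i}" for i in range(450))
chunks = chunk_words(text)
print(len(chunks), [len(c.split()) for c in chunks])  # 3 [200, 200, 90]
```

In the notebook, `chunk_owner` would record which hotel each chunk came from, the chunks would go through `embedder.encode`, and `scores` would be `util.cos_sim(query_embedding, chunk_embeddings)[0].tolist()`.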

In [38]:
corpus_embeddings[0]

array([-3.44585348e-03,  8.86060181e-04,  8.68610397e-04,  3.60185578e-02,
       -6.95023611e-02,  2.47212537e-02,  9.77890566e-02, -7.92217702e-02,
        1.00702122e-02, -7.40536749e-02,  1.12373689e-02, -4.12639827e-02,
        3.06909606e-02,  4.42646779e-02,  4.96442840e-02, -3.43752317e-02,
        7.74144605e-02, -6.06893785e-02, -7.87047595e-02, -7.74550671e-03,
       -4.96977791e-02, -1.50003433e-02, -3.74407060e-02, -5.43095432e-02,
       -1.11573189e-01,  1.14605315e-02, -1.74497385e-02,  5.20471595e-02,
       -3.99421988e-04, -5.95689155e-02,  1.95702389e-02,  1.44654170e-01,
        1.78821012e-02, -3.28862518e-02,  3.80284525e-02,  1.19153351e-01,
       -6.34656399e-02, -1.23817235e-01,  1.60905756e-02, -2.17618030e-02,
       -5.15343971e-04, -1.93868745e-02,  6.99806586e-02, -3.44565324e-02,
        4.28743614e-03,  6.81685144e-03, -4.64240182e-03,  1.37275066e-02,
        3.49233486e-02, -1.31877400e-02, -8.93387794e-02,  2.07583848e-02,
        2.24333517e-02, -
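
Each hotel is now a 384-dimensional vector, and the search below ranks hotels by cosine similarity, cos(u, v) = (u · v) / (‖u‖ ‖v‖). A pure-Python version of that formula, for illustration only (`util.pytorch_cos_sim` computes the same quantity on tensors for all pairs at once):

```python
import math
from typing import Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| * |v|); ranges from -1 to 1."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ~1.0 (same direction)
```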

In [None]:
# model = SentenceTransformer('all-MiniLM-L6-v2')
# paraphrases = util.paraphrase_mining(model, corpus)
# query_embeddings_p =  util.paraphrase_mining(model, queries,show_progress_bar=True)

In [None]:
# import pickle as pkl
# with open("/content/drive/MyDrive/BertSentenceSimilarity/Pickles/corpus_embeddings.pkl", "wb") as file_:
#     pkl.dump(corpus_embeddings, file_)

## **Query Sentences Input**

In [42]:
import torch

# Query sentences:
queries = ['hotel that is close to the airport ',
           'Hotel with easy access for taxi']


# Find the top_k hotels whose combined reviews are closest to each query (cosine similarity)
top_k = min(5, len(corpus))
for query in queries:
    query_embedding = embedder.encode(query, convert_to_tensor=True)

    # cosine similarity against every hotel document, then torch.topk for the top_k highest scores
    cos_scores = util.pytorch_cos_sim(query_embedding, corpus_embeddings)[0]
    top_results = torch.topk(cos_scores, k=top_k)

    print("\n\n======================\n\n")
    print("Query:", query)
    print(f"\nTop {top_k} most similar sentences in corpus:")

    for score, idx in zip(top_results[0], top_results[1]):
        idx = int(idx)
        print("(Score: {:.4f})".format(score))
        print(corpus[idx])
        # df_sentences maps each combined review text back to its hotel name
        print("Hotel:", df_sentences[corpus[idx]], "\n")
    """
    # Alternatively, util.semantic_search performs cosine similarity + top-k in one call
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)
    hits = hits[0]      # hits for the first (and only) query
    for hit in hits:
        print(corpus[hit['corpus_id']], "(Score: {:.4f})".format(hit['score']))
    """





Query: hotel that is close to the airport 

Top 5 most similar sentences in corpus:
(Score: 0.5451)
hotel is good and clean only food not good enough and expensive and we was not able to watch afcon on tv the have to go with time really i was happy with the hotel service price of the food is expensive and not good test the workers are very friendly nice and near easy transportation from and to airportfair price friendly stuff free bus to airport totally it was good experiment to me and my family also near most of tourism places and activities in dabai we booked this hotel for our layover and were pleasantly surprised it had free shuttle bus service to the hotel and to a number of attractions checkin and checkout were speedy and the staff very friendly and helpful it is very nice  cool and calmplace to stay and have a nice staff and good rates  rooms were clean and have all facilities  also parking is perfect have easy access to rooms and i would like to stay in future again and its
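
`torch.topk` in the search cell returns the k largest scores together with their positions. The same selection can be written in plain Python with `heapq.nlargest` (imported near the top of this notebook but otherwise unused), mapping each position back to a hotel through aligned lists rather than by comparing review text. The scores and names below are hypothetical:

```python
from heapq import nlargest
from typing import List, Sequence, Tuple

def top_k_hotels(scores: Sequence[float], hotel_names: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (hotel, score) pairs with the highest similarity, best first."""
    k = min(k, len(scores))
    best_idx = nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return [(hotel_names[i], scores[i]) for i in best_idx]

scores = [0.12, 0.55, 0.31, 0.47]
names = ["Hotel A", "Hotel B", "Hotel C", "Hotel D"]
for name, score in top_k_hotels(scores, names, k=2):
    print(f"{name} (Score: {score:.4f})")
```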

In [47]:
model = SentenceTransformer('sentence-transformers/paraphrase-xlm-r-multilingual-v1')
embeddings = model.encode(corpus)
#print(embeddings)


In [48]:
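# shape of the last query embedding from the all-MiniLM-L6-v2 search above (not from the multilingual model)
query_embedding.shape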

torch.Size([384])