# Setup

Installing required libraries

In [None]:
!pip install sacremoses
!pip install hugchat

A class that measures the similarity of two texts using BERT-family models.

In [7]:
from transformers import BertTokenizer, BertModel, DistilBertTokenizer, DistilBertModel
from hugchat import hugchat
from hugchat.login import Login
import torch

class Bert_similarity:
  """Cosine similarity between two texts, based on (Distil)BERT hidden states."""

  def __init__(self, pretrained_name, tokenizer_class, model_class, tokenizer_name=None):
    self.device = "cuda:0" if torch.cuda.is_available() else "cpu"
    self.pretrained_name = pretrained_name
    # Fall back to the model's own tokenizer when no separate one is given
    self.tokenizer_name = tokenizer_name if tokenizer_name is not None else pretrained_name
    self.tokenizer = tokenizer_class.from_pretrained(self.tokenizer_name)
    self.model = model_class.from_pretrained(self.pretrained_name).to(self.device)
    self.__similarity = torch.nn.CosineSimilarity(dim=0, eps=1e-6)

  def embedding(self, text):
    # Truncate to the model's maximum input length so long descriptions don't raise an error
    encoded_input = self.tokenizer(text, return_tensors='pt', truncation=True).to(self.device)
    with torch.no_grad():  # inference only, no gradients needed
      output = self.model(**encoded_input)
    # Use the hidden state of the last token ([SEP]) as the text embedding
    return output['last_hidden_state'][0, -1, :]

  def similarity(self, text1, text2):
    embed1 = self.embedding(text1)
    embed2 = self.embedding(text2)
    return self.__similarity(embed1, embed2)
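`torch.nn.CosineSimilarity` scores two embeddings by the cosine of the angle between them: their dot product divided by the product of their norms, with `eps` guarding against division by zero. A minimal pure-Python sketch of the same formula, for illustration only (the helper name `cosine_similarity` is ours, and its `eps` handling is a simplification of PyTorch's):

```python
import math

def cosine_similarity(a, b, eps=1e-6):
    """Cosine of the angle between vectors a and b (sequences of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp the denominator so a zero vector does not cause division by zero
    return dot / max(norm_a * norm_b, eps)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction, different length
```

The score depends only on direction, not length, so the first and third calls both give (approximately) 1.0 and the orthogonal pair gives 0.0.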

# Bert Models

In [4]:
sim = Bert_similarity("bert-base-uncased", BertTokenizer, BertModel)
sim.similarity("Replace me by any text you'd like.", "Replace me by any text you'd hate.")

tensor(0.9866, device='cuda:0', grad_fn=<SumBackward1>)


The similarity looks reasonable: the two sentences differ in a single word, so they should be similar. BERT on GPU takes about 35.1 ms per call (35.1 ms ± 4.33 ms per loop, mean ± std. dev. of 7 runs, 1000 loops each).


In [5]:
sim2 = Bert_similarity("distilbert-base-uncased", DistilBertTokenizer, DistilBertModel)
sim2.similarity("Replace me by any text you'd like.", "Replace me by any text you'd hate.")

tensor(0.9849, device='cuda:0', grad_fn=<SumBackward1>)


DistilBERT takes 17.9 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each), about half of BERT's time, with an almost identical score.
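The timings above are in IPython's `%timeit` format. Outside a notebook, a comparable measurement can be sketched with the standard `timeit` module; `time_call` is a hypothetical helper, and `fn` is assumed to be a zero-argument callable such as `lambda: sim.similarity(a, b)`. CUDA kernels run asynchronously, so GPU timings taken this way may need a `torch.cuda.synchronize()` call inside `fn` to be accurate.

```python
import statistics
import timeit

def time_call(fn, runs=7, loops=1000):
    """Mean and std. dev. of the per-call time in ms, like %timeit's report."""
    # repeat() returns, for each of `runs` runs, the total time of `loops` calls
    totals = timeit.repeat(fn, repeat=runs, number=loops)
    per_call_ms = [t / loops * 1000 for t in totals]
    return statistics.mean(per_call_ms), statistics.stdev(per_call_ms)

# Cheap stand-in workload so the example runs anywhere
mean_ms, std_ms = time_call(lambda: sum(range(100)), runs=7, loops=100)
print(f"{mean_ms:.4f} ms ± {std_ms:.4f} ms per loop (mean ± std. dev. of 7 runs, 100 loops each)")

# Speed-up of DistilBERT over BERT, from the measurements reported above
print(f"{35.1 / 17.9:.2f}x")
```

The last line prints `1.96x`, i.e. DistilBERT is roughly twice as fast as BERT here.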

# Hugchat

https://github.com/Soulter/hugging-chat-api/tree/master

A Python API for HuggingChat (https://huggingface.co/chat/), Hugging Face's hosted chat service and an alternative to ChatGPT, which serves Llama 2 among other models.

Four models are available:

- meta-llama/Llama-2-70b-chat-hf (most likely our focus)
- codellama/CodeLlama-34b-Instruct-hf
- tiiuae/falcon-180B-chat
- mistralai/Mistral-7B-Instruct-v0.1

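Wrappers for the chat API, plus a `poc` function that runs the whole proof of concept end to end for a single pair: each description is summarized by the chat model, the summary is appended to the item title, and the two resulting texts are compared with a similarity model.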

In [None]:
from getpass import getpass

def login():
  email = input("Email: ")
  password = getpass("Password: ")  # avoid echoing the password in the notebook
  sign = Login(email, password)
  cookies = sign.login()
  # Save the session cookies to disk so a later session can reuse them
  cookie_path_dir = "./cookies_snapshot"
  sign.saveCookiesToDir(cookie_path_dir)
  return hugchat.ChatBot(cookies=cookies.get_dict())

CHATBOT = login()

def query_wrapper(text):
  # Start a fresh conversation per query so earlier answers don't leak into the context
  conversation_id = CHATBOT.new_conversation()
  CHATBOT.change_conversation(conversation_id)
  return CHATBOT.query(text)

def poc(similarity, title_left, description_left, title_right, description_right, prompt):
  # Summarize both descriptions with the LLM, then compare "title + summary" texts
  query_left = query_wrapper(prompt + description_left)
  query_right = query_wrapper(prompt + description_right)
  return similarity.similarity(title_left + str(query_left), title_right + str(query_right))

# Polish Data
A Polish dataset for e-commerce product pair matching, from https://github.com/grant-TraDA/mlt4pm. We use two sample rows as examples.
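Each row describes a pair of offers: fields with a `_left` suffix describe one offer, fields with `_right` the other, and `label` is 1 when both refer to the same product. A small hypothetical helper `split_pair` that pulls the two sides apart, shown on a trimmed-down row built from the first sample (the real rows have more fields):

```python
def split_pair(row, fields=("title", "description", "brand")):
    """Return (left, right, label) from a pair row with *_left / *_right keys."""
    # .get() yields None for fields a row does not have
    left = {f: row.get(f"{f}_left") for f in fields}
    right = {f: row.get(f"{f}_right") for f in fields}
    return left, right, row.get("label")

row = {
    "title_left": "lenor płyn do płukania tkanin",
    "title_right": "lenor płyn do płukania tkanin",
    "description_left": "płyn do płukania tkanin lenor",
    "description_right": "płyn do płukania tkanin lenor",
    "brand_left": "lenor",
    "brand_right": "lenor",
    "label": 1,
}
left, right, label = split_pair(row)
print(left == right, label)
```

Both sides of this sample are identical, so the example prints `True 1`.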

In [10]:
lenor_pl = {"id_left":2549,"cluster_id_left":"8001841376004","identifiers_left":[{"EAN":["8001841376004"]}],"title_left":"lenor p\u0142yn do p\u0142ukania tkanin sparkling bloom yellow poppy","description_left":"p\u0142yn do p\u0142ukania tkanin lenor zapewnia d\u0142ugotrwa\u0142\u0105 \u015bwie\u017co\u015b\u0107 \u015bwie\u017co\u015b\u0107 po\u015bcieli ka\u017cdej nocy przez ca\u0142y tydzie\u0144 zapach sparkling bloom yellow poppy stopniowo uwalnia energetyzuj\u0105ce aromaty o\u017cywiaj\u0105c twoje zmys\u0142y eksplozj\u0105 kwiatowej \u015bwie\u017co\u015bci i daj\u0105c ci poczucie komfortu aby w pe\u0142ni cieszy\u0107 si\u0119 dzia\u0142aniem swojego ulubionego p\u0142ynu do zmi\u0119kczania tkanin lenor u\u017cywaj go razem z pere\u0142kami zapachowymi lenor unstoppables ","brand_left":"lenor","price_left":"14 99","specTableContent_left":"amount 1 42 l capacity 1 42 l extras  image_url https www frisco pl pid 119361 n lenor plyn do plukania tkanin sparkling bloom yellow poppy stn product ingredients  5 kationowe \u015brodki powierzchniowo czynne benzisothiazolinone kompozycje zapachowe alpha isomethyl ionone coumarin hexyl cinnamal origin  kraj pochodzenia czechy zapakowano w czechy  storage  url https www frisco pl pid 119361 n lenor plyn do plukania tkanin sparkling bloom yellow poppy stn product weight waga brutto 1471","keyValuePairs_left":{"amount":"1 42 l","capacity":"1 42 l","extras":"","image_url":"https www frisco pl pid 119361 n lenor plyn do plukania tkanin sparkling bloom yellow poppy stn product","ingredients":" 5 kationowe \u015brodki powierzchniowo czynne benzisothiazolinone kompozycje zapachowe alpha isomethyl ionone coumarin hexyl cinnamal","origin":" kraj pochodzenia czechy zapakowano w czechy ","storage":"","url":"https www frisco pl pid 119361 n lenor plyn do plukania tkanin sparkling bloom yellow poppy stn product","weight":"waga brutto 1471"},"id_right":2549,"cluster_id_right":"8001841376004","identifiers_right":[{"EAN":["8001841376004"]}],"title_right":"lenor p\u0142yn do p\u0142ukania tkanin sparkling bloom yellow poppy","description_right":"p\u0142yn do p\u0142ukania tkanin lenor zapewnia d\u0142ugotrwa\u0142\u0105 \u015bwie\u017co\u015b\u0107 \u015bwie\u017co\u015b\u0107 po\u015bcieli ka\u017cdej nocy przez ca\u0142y tydzie\u0144 zapach sparkling bloom yellow poppy stopniowo uwalnia energetyzuj\u0105ce aromaty o\u017cywiaj\u0105c twoje zmys\u0142y eksplozj\u0105 kwiatowej \u015bwie\u017co\u015bci i daj\u0105c ci poczucie komfortu aby w pe\u0142ni cieszy\u0107 si\u0119 dzia\u0142aniem swojego ulubionego p\u0142ynu do zmi\u0119kczania tkanin lenor u\u017cywaj go razem z pere\u0142kami zapachowymi lenor unstoppables ","brand_right":"lenor","price_right":"14 99","specTableContent_right":"amount 1 42 l capacity 1 42 l extras  image_url https www frisco pl pid 119361 n lenor plyn do plukania tkanin sparkling bloom yellow poppy stn product ingredients  5 kationowe \u015brodki powierzchniowo czynne benzisothiazolinone kompozycje zapachowe alpha isomethyl ionone coumarin hexyl cinnamal origin  kraj pochodzenia czechy zapakowano w czechy  storage  url https www frisco pl pid 119361 n lenor plyn do plukania tkanin sparkling bloom yellow poppy stn product weight waga brutto 1471","keyValuePairs_right":{"amount":"1 42 l","capacity":"1 42 l","extras":"","image_url":"https www frisco pl pid 119361 n lenor plyn do plukania tkanin sparkling bloom yellow poppy stn product","ingredients":" 5 kationowe \u015brodki powierzchniowo czynne benzisothiazolinone kompozycje zapachowe alpha isomethyl ionone coumarin hexyl cinnamal","origin":" kraj pochodzenia czechy zapakowano w czechy ","storage":"","url":"https www frisco pl pid 119361 n lenor plyn do plukania tkanin sparkling bloom yellow poppy stn product","weight":"waga brutto 1471"},"category_left":"chemia","category_right":"chemia","label":1}
sok_pl = {"id_left":629,"cluster_id_left":"5906395223572","identifiers_left":[{"EAN":["5906395223572"]}],"title_left":"sady wincenta sok wieloowocowy w kartonie t\u0142oczony","description_left":"sady wincenta sok wieloowocowy w kartonie t\u0142oczony sok wieloowocowy jab\u0142kowo gruszkowo malinowo aroniowy naturalnie m\u0119tny otrzymany w wyniku t\u0142oczenia miazgi jab\u0142kowej bez udzia\u0142u enzym\u00f3w filtrowany pasteryzowany bez dodatku jakichkolwiek substancji dodatkowych rozlewany na gor\u0105co w atmosferze azotu i pakowany w systemie bag in box ","brand_left":"sady wincenta","price_left":"16 99","specTableContent_left":"amount 3 l capacity 3 l extras  image_url https www frisco pl pid 119683 n sady wincenta sok wieloowocowy w kartonie tloczony stn product ingredients jab\u0142ko 65 gruszka 25 malina 5 aronia 5  origin  storage przechowywa\u0107 w suchym i ch\u0142odnym miejscu po otwarciu przechowywa\u0107 w lod\u00f3wce nie d\u0142u\u017cej ni\u017c 14 dni  url https www frisco pl pid 119683 n sady wincenta sok wieloowocowy w kartonie tloczony stn product weight ","keyValuePairs_left":{"amount":"3 l","capacity":"3 l","extras":"","image_url":"https www frisco pl pid 119683 n sady wincenta sok wieloowocowy w kartonie tloczony stn product","ingredients":"jab\u0142ko 65 gruszka 25 malina 5 aronia 5 ","origin":"","storage":"przechowywa\u0107 w suchym i ch\u0142odnym miejscu po otwarciu przechowywa\u0107 w lod\u00f3wce nie d\u0142u\u017cej ni\u017c 14 dni ","url":"https www frisco pl pid 119683 n sady wincenta sok wieloowocowy w kartonie tloczony stn product","weight":""},"id_right":629,"cluster_id_right":"5906395223572","identifiers_right":[{"EAN":["5906395223572"]}],"title_right":"sady wincenta sok wieloowocowy w kartonie t\u0142oczony","description_right":"sady wincenta sok wieloowocowy w kartonie t\u0142oczony sok wieloowocowy jab\u0142kowo gruszkowo malinowo aroniowy naturalnie m\u0119tny otrzymany w wyniku t\u0142oczenia miazgi jab\u0142kowej bez udzia\u0142u enzym\u00f3w filtrowany pasteryzowany bez dodatku jakichkolwiek substancji dodatkowych rozlewany na gor\u0105co w atmosferze azotu i pakowany w systemie bag in box ","brand_right":"sady wincenta","price_right":"16 99","specTableContent_right":"amount 3 l capacity 3 l extras  image_url https www frisco pl pid 119683 n sady wincenta sok wieloowocowy w kartonie tloczony stn product ingredients jab\u0142ko 65 gruszka 25 malina 5 aronia 5  origin  storage przechowywa\u0107 w suchym i ch\u0142odnym miejscu po otwarciu przechowywa\u0107 w lod\u00f3wce nie d\u0142u\u017cej ni\u017c 14 dni  url https www frisco pl pid 119683 n sady wincenta sok wieloowocowy w kartonie tloczony stn product weight ","keyValuePairs_right":{"amount":"3 l","capacity":"3 l","extras":"","image_url":"https www frisco pl pid 119683 n sady wincenta sok wieloowocowy w kartonie tloczony stn product","ingredients":"jab\u0142ko 65 gruszka 25 malina 5 aronia 5 ","origin":"","storage":"przechowywa\u0107 w suchym i ch\u0142odnym miejscu po otwarciu przechowywa\u0107 w lod\u00f3wce nie d\u0142u\u017cej ni\u017c 14 dni ","url":"https www frisco pl pid 119683 n sady wincenta sok wieloowocowy w kartonie tloczony stn product","weight":""},"category_left":"napoje","category_right":"napoje","label":1}

In [12]:
lenor_pl['title_left'], lenor_pl['description_left']

('lenor płyn do płukania tkanin sparkling bloom yellow poppy',
 'płyn do płukania tkanin lenor zapewnia długotrwałą świeżość świeżość pościeli każdej nocy przez cały tydzień zapach sparkling bloom yellow poppy stopniowo uwalnia energetyzujące aromaty ożywiając twoje zmysły eksplozją kwiatowej świeżości i dając ci poczucie komfortu aby w pełni cieszyć się działaniem swojego ulubionego płynu do zmiękczania tkanin lenor używaj go razem z perełkami zapachowymi lenor unstoppables ')

In [13]:
sok_pl['title_left'], sok_pl['description_left']

('sady wincenta sok wieloowocowy w kartonie tłoczony',
 'sady wincenta sok wieloowocowy w kartonie tłoczony sok wieloowocowy jabłkowo gruszkowo malinowo aroniowy naturalnie mętny otrzymany w wyniku tłoczenia miazgi jabłkowej bez udziału enzymów filtrowany pasteryzowany bez dodatku jakichkolwiek substancji dodatkowych rozlewany na gorąco w atmosferze azotu i pakowany w systemie bag in box ')

In [14]:
model_prompt = "Extract key information from the product description provided. Focus on identifying essential features, specifications, and unique aspects of each product. Provide a concise summary with relevant details. Ensure that the extracted information is clear and captures the essence of the product's functionality and appeal. "
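`model_prompt` ends with a space and is concatenated directly with the description, so the model sees no explicit boundary between the instruction and the product text. One possible variation, not used in the runs below, is to wrap the description in a labelled delimiter; `build_prompt` and the delimiter choice are hypothetical:

```python
def build_prompt(instruction, description, delimiter='"""'):
    """Combine an instruction and a product description with an explicit boundary."""
    instruction = instruction.strip()
    # Treat a missing (None) description as empty text
    description = (description or "").strip()
    return f"{instruction}\n\nProduct description:\n{delimiter}\n{description}\n{delimiter}"

print(build_prompt("Extract key information.", "six core 3 50ghz "))
```

In the notebook this would be called as `build_prompt(model_prompt, lenor_pl['description_left'])` in place of plain string concatenation.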

Query the HuggingChat API with the prompt followed by a product description.

In [18]:
polish_query = query_wrapper(model_prompt + lenor_pl['description_left'])
print(polish_query)

 Product: Lenor Unstoppables Fabric Softener

Essential Features:

1. Long-lasting freshness
2. Sparkling bloom scent
3. Energizing aromas
4. Comfortable feel
5. Enhanced experience when used with Lenor Unstoppables perfume beads

Specifications:

1. Not specified

Unique Aspects:

1. Gradual release of energizing aromas
2. Enhanced experience when used with Lenor Unstoppables perfume beads

Summary:
Lenor Unstoppables Fabric Softener provides long-lasting freshness with a sparkling bloom scent that gradually releases energizing aromas, giving users a comfortable feel and an enhanced experience when used with Lenor Unstoppables perfume beads. It is designed to provide a pleasant and invigorating experience for those who want to enjoy their favorite fabric softener to the fullest.


In [19]:
polish_query_sok = query_wrapper(model_prompt + sok_pl['description_left'])
print(polish_query_sok)

 Product: Sady Wincenta Sok Wieloowocowy

Essential Features:

1. Multi-fruit juice made from apples, pears, raspberries, and blackberries
2. Natural cloudiness obtained through pressing without added enzymes or preservatives
3. Filtered and pasteurized
4. Packaged in a bag-in-box system

Specifications:

1. Made from 100% fruit juice
2. No added sugars or artificial flavors
3. Cloudy appearance due to natural sedimentation
4. Pasteurization process ensures food safety and extends shelf life

Unique Aspects:

1. Use of a bag-in-box packaging system for convenient storage and dispensing
2. Natural cloudiness gives the juice a homemade taste and texture
3. Absent of any additives or preservatives, making it a healthier choice for consumers

Summary:
Sady Wincenta Sok Wieloowocowy is a multi-fruit juice made from apples, pears, raspberries, and blackberries, featuring a natural cloudiness obtained through pressing without added enzymes or preservatives. It is filtered and pasteurized to e

For comparison, ChatGPT's output for the same Lenor description:

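Certainly! Here are the crucial tags extracted from the Polish product description:

- Płyn do płukania tkanin (fabric softener)
- Lenor
- Długotrwała świeżość (long-lasting freshness)
- Zapach Sparkling Bloom Yellow Poppy (Sparkling Bloom Yellow Poppy scent)
- Świeżość pościeli (bedding freshness)
- Energetyzujące aromaty (energizing aromas)
- Eksplozja kwiatowej świeżości (an explosion of floral freshness)
- Poczucie komfortu (a feeling of comfort)
- Ulubiony płyn do zmiękczania tkanin (favorite fabric softener)
- Perełki zapachowe (scent beads)
- Lenor Unstoppables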

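Interestingly, ChatGPT returned the tags in Polish, while HuggingChat answered in English. We can measure similarity by appending the generated text to the item title. The two items (a fabric softener and a fruit juice) are obviously different, yet both scores below are high. The difference between the manual call and the wrapped `poc` call is attributed to randomness in the chat queries: `poc` sends new queries, so the summaries it compares are not the ones printed above.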
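Because every chat query returns a freshly generated answer, rerunning the same pair can give different summaries and scores. One way to make repeated experiments comparable is to cache each summary by its input text, so each distinct prompt is sent to the model only once. A minimal sketch with `functools.lru_cache`; `make_cached_query` is a hypothetical helper and `fake_query` stands in for `query_wrapper`:

```python
from functools import lru_cache

def make_cached_query(query_fn, maxsize=None):
    """Wrap query_fn so each distinct prompt is sent only once."""
    @lru_cache(maxsize=maxsize)
    def cached(text):
        # Convert to str (as the notebook does with str(query)) so the cache holds plain text
        return str(query_fn(text))
    return cached

calls = []

def fake_query(text):
    """Stand-in for query_wrapper that returns a different answer on every call."""
    calls.append(text)
    return f"summary #{len(calls)}"

cached_query = make_cached_query(fake_query)
print(cached_query("prompt A"))
print(cached_query("prompt A"))  # served from the cache, fake_query is not called again
print(len(calls))
```

In the notebook, `make_cached_query(query_wrapper)` could be used wherever `query_wrapper` is called, at the cost of no longer seeing how much the answers vary between runs.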

In [24]:
sim.similarity(lenor_pl['title_left']+str(polish_query), sok_pl['title_left']+str(polish_query_sok))

tensor(0.9426, grad_fn=<SumBackward1>)

In [27]:
poc(sim, lenor_pl['title_left'], lenor_pl['description_left'], sok_pl['title_right'], sok_pl['description_right'], model_prompt)

tensor(0.9682, grad_fn=<SumBackward1>)

# English data
A similar dataset in English, the Web Data Commons (WDC) large-scale product corpus: https://webdatacommons.org/largescaleproductcorpus/v2/index.html. We test a few different sample rows.
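The WDC pairs below are pasted as Python dicts. When reading the corpus files directly, assuming one JSON object per line, `json.loads` converts JSON `null` to `None` and decodes escapes such as `\/` to `/`. A sketch on an inline two-line sample (`load_pairs` is a hypothetical helper):

```python
import io
import json

def load_pairs(lines):
    """Parse an iterable of JSON-lines strings into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

sample = io.StringIO(
    '{"pair_id": "1#2", "label": 0, "description_left": null, "identifiers_left": [{"\\/mpn": "[abc]"}]}\n'
    '{"pair_id": "3#4", "label": 1, "description_left": "quad core", "identifiers_left": []}\n'
)
pairs = load_pairs(sample)
# null became None and "\/mpn" became "/mpn"
print(pairs[0]["description_left"], list(pairs[0]["identifiers_left"][0]))
```

For a gzipped file, the text-mode handle from `gzip.open(path, "rt", encoding="utf-8")` can be passed to `load_pairs` in the same way.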

In [29]:
negative = {"brand_left":"amd","brand_right":"gigabyte","category_left":"Computers_and_Accessories","category_right":"Computers_and_Accessories","cluster_id_left":355117,"cluster_id_right":152098,"description_left":"six core technology unlocked multiplier 3 50ghz clock speed 6mb l2 cache 6mb l3 cache hypertransport 3 0 technology 3 year warranty","description_right":"amd 990fx chipset x4 ddr3 x3 pci e x16 x1 pci e x1 x2 pci x6 sata 6gb s x1 gigabit lan x2 usb 3 0 x14 usb 2 0 realtek hd 7 1 audio sli xfire support","id_left":11185963,"id_right":14492431,"identifiers_left":[{"\/productID":"[amdfx6300]"},{"\/mpn":"[fd6300wmhkbox]"}],"identifiers_right":[{"\/mpn":"[ga990xaud3]"},{"\/gtin13":"[4719331818135]"}],"keyValuePairs_left":{"processor number":"6300 black edition","socket":"am3","architecture":"32 nm technology","clock speed":"3 50ghz","cores":"6","cache":"6 mb l2 6mb l3","memory controller":"dual channel ddr3 800 1066 1333 1600mhz","tdp":"95w","heatsink included":"yes","warranty":"3 years"},"keyValuePairs_right":None,"label":0,"pair_id":"11185963#14492431","price_left":None,"price_right":None,"specTableContent_left":"processor number 6300 black edition socket am3 architecture 32 nm technology clock speed 3 50ghz cores 6 cache 6 mb l2 6mb l3 memory controller dual channel ddr3 800 1066 1333 1600mhz tdp 95w heatsink included yes warranty 3 years","specTableContent_right":None,"title_left":"amd piledriver fx 6 six core 6300 black edition 3 50ghz socket am3 processor retail am3plus fd6300wmhkbox novatech","title_right":"gigabyte 990xa ud3 amd 990x socket am3 ddr3 motherboard ocuk"}
positive = {"brand_left":"intel","brand_right":None,"category_left":"Computers_and_Accessories","category_right":"Computers_and_Accessories","cluster_id_left":7209527,"cluster_id_right":7209527,"description_left":None,"description_right":"the intel ssd dc p3600 series is a pcie gen3 ssd architected with the high performance controller interface non volatile memory express nvme delivering leading performance low latency and quality of service matching the performance with world class reliability and endurance intel ssd dc p3600 series offers a range of capacity 1 2 tb in both add in card and 2 5 inch form factor with pcie gen3 support and nvme queuing interface the intel ssd dc p3600 series delivers excellent sequential read performance of up to 2 8 gb s and sequential write speeds of up to 1700 mb s intel ssd dc p3600 series delivers very high random read iops of 450 k and random write iops of 70 k for 4 kb operations taking advantage of the direct path from the storage to the cpu by means of nvme intel ssd dc p3600 series exhibits low latency of less than 20 s for sequential access to the ssd the 2 5 inch intel ssd dc p3600 series takes advantage of the 8639 connector and provides hot pluggable removal and insertion providing in service replacement options","id_left":6023438,"id_right":15856907,"identifiers_left":[{"\/mpn":"[ssdpedme012t401]"}],"identifiers_right":[{"\/gtin8":"[43201830]"},{"\/mpn":"[ssdpedme012t401]"}],"keyValuePairs_left":None,"keyValuePairs_right":None,"label":1,"pair_id":"6023438#15856907","price_left":None,"price_right":None,"specTableContent_left":None,"specTableContent_right":None,"title_left":"intel dc p3600 1 2tb pci e solid state drive ssdpedme012t401 pcpartpicker united kingdom","title_right":"intel solid state drive dc p3600 series 1 2 tb pci ssdpedme012t401 drives ssds cdwg com"}
positive2 = {"brand_left":"hp enterprise","brand_right":"hp enterprise","category_left":"Computers_and_Accessories","category_right":"Computers_and_Accessories","cluster_id_left":2224466,"cluster_id_right":2224466,"description_left":"description proliant bl20p g2 1p 3 06ghz 512mb fc manufacturer part 323146 b21","description_right":"description pl bl20p 3 06 xeon 1p m1 512k sa5i nc7781 ilo manufacturer part 323146 b21","id_left":2673261,"id_right":8786270,"identifiers_left":[{"\/sku":"[323146b21]"},{"\/mpn":"[323146b21]"}],"identifiers_right":[{"\/sku":"[323146b21]"},{"\/mpn":"[323146b21]"}],"keyValuePairs_left":{"category":"proliant server","sub category":"bl20","generation":"g2","part number":"323146 b21","products id":"15400","chassis form factor":"blade","model":"hp proliant bl20p","cache memory installed":"512 kb","ram installed":"512 mb","ram technology":"ddr sdram","ram maximum":"8 gb","networking protocol":"ethernet","controller raid level":"raid 0 raid 1 raid 10 raid 5","weight":"50 lbs","":""},"keyValuePairs_right":{"category":"proliant server","sub category":"bl20","generation":"","part number":"323146 b21","products id":"6184","chassis form factor":"blade","model":"hp proliant bl20p","cache memory installed":"512 kb","ram installed":"512 mb","ram technology":"ddr sdram","ram maximum":"8 gb","networking protocol":"ethernet fast ethernet gb ethernet","storage controller raid level":"raid 0 raid 1 raid 10 raid 5","actual weight":"50 lbs","":""},"label":1,"pair_id":"2673261#8786270","price_left":None,"price_right":None,"specTableContent_left":"specifications category proliant server sub category bl20 generation g2 part number 323146 b21 products id 15400 chassis form factor blade model hp proliant bl20p cache memory installed 512 kb ram installed 512 mb ram technology ddr sdram ram maximum 8 gb networking protocol ethernet controller raid level raid 0 raid 1 raid 10 raid 5 weight 50 lbs","specTableContent_right":"specifications category proliant server sub category bl20 generation part number 323146 b21 products id 6184 chassis form factor blade model hp proliant bl20p cache memory installed 512 kb ram installed 512 mb ram technology ddr sdram ram maximum 8 gb networking protocol ethernet fast ethernet gb ethernet storage controller raid level raid 0 raid 1 raid 10 raid 5 actual weight 50 lbs","title_left":"null , 323146 b21 bl20p g2 1p xeon 3 06ghz wholesale price","title_right":"323146 b21 bl20p xeon 3 06ghz , null wholesale price"}
positive3 = {"brand_left":"intel","brand_right":None,"category_left":"Computers_and_Accessories","category_right":"Computers_and_Accessories","cluster_id_left":74810,"cluster_id_right":74810,"description_left":"quad core with hyperthreading technology 3 60ghz clock speed 22nm process 8mb l3 cache dual channel ddr3 controller integrated hd 4600 graphics 3 year warranty","description_right":"micro intel i7 4790 lga 1150 quad core 3 6ghz 8mb","id_left":10446386,"id_right":17540927,"identifiers_left":[{"\/mpn":"[bx80646i74790]"},{"\/gtin13":"[5032037061551]"}],"identifiers_right":[{"\/sku":"[bx80646i74790]"}],"keyValuePairs_left":None,"keyValuePairs_right":None,"label":1,"pair_id":"10446386#17540927","price_left":None,"price_right":None,"specTableContent_left":None,"specTableContent_right":None,"title_left":"intel core i7 4790 3 60ghz haswell socket lga1150 processor retail processo ocuk","title_right":"micro intel i7 4790 lga 1150"}

In [None]:
query_1 = query_wrapper(model_prompt + positive3['description_left'])
print(query_1)

 Product: Quad Core Processor with Hyperthreading Technology

Essential Features:

* Quad core architecture
* Hyperthreading technology
* 3.6 GHz clock speed
* 22nm process technology
* 8MB L3 cache
* Dual channel DDR3 controller
* Integrated HD 4600 graphics

Specifications:

* Clock Speed: 3.6 GHz
* Number of Cores: 4
* Cache Memory: 8 MB L3 cache
* Process Technology: 22nm
* Memory Controller: Dual Channel DDR3
* Graphics: Integrated HD 4600

Unique Aspects:

* Hyperthreading technology allows for increased parallel processing capabilities, resulting in improved performance and efficiency.
* The 22nm process technology used in this processor results in lower power consumption and heat generation compared to older processes.
* The integrated HD 4600 graphics provide decent graphical performance for general use cases, eliminating the need for a separate graphics card in many situations.

Summary:
This quad core processor with hyperthreading technology offers excellent performance and 

In [30]:
query_2 = query_wrapper(model_prompt + positive3['description_right'])
print(query_2)

 Product: Micro Intel i7 4790 LGA 1150 Quad Core 3.6GHz 8MB

Essential Features:

* CPU processor: Intel Core i7-4790
* Socket type: LGA 1150
* Number of cores: Quad-core
* Clock speed: 3.6 GHz
* Cache memory: 8 MB

Specifications:

* Processor architecture: Haswell
* Thermal Design Power (TDP): 84W
* Memory support: DDR3L-1333/1600, DDR3-1066/1333
* Integrated graphics: HD Graphics 4600
* PCI Express lanes: 16

Unique Aspects:

* High-performance quad-core processor for demanding tasks
* Supports up to 32GB of RAM for efficient multitasking
* Integrated graphics card for improved visual performance
* Low power consumption with a TDP of 84W

Summary:
The Micro Intel i7 4790 LGA 1150 Quad Core 3.6GHz 8MB is a powerful processor designed for high-performance computing. It features a quad-core design, 8MB of cache memory, and supports up to 32GB of RAM. With integrated graphics and low power consumption, this processor offers excellent performance while minimizing energy usage. Ideal for 

In [None]:
sim.similarity(str(query_1), str(query_2))

tensor(0.8258, device='cuda:0', grad_fn=<SumBackward1>)

This matching pair (label 1) scores lower (0.83) than the two unrelated Polish products in the previous examples (0.94 and 0.97).

In [None]:
query_3 = query_wrapper(model_prompt + negative['description_right'])

In [None]:
print(query_3)

 Product: ASRock 990FX Extreme4 Motherboard

Essential Features:

* AMD 990FX Chipset
* Supports AM3+ CPUs
* 4 DDR3 slots, supporting up to 32GB RAM
* 3 PCI-Express 3.0 x16 slots (x16, x8, x4)
* 1 PCI-Express 2.0 x1 slot
* 2 PCI slots
* SATA 6Gb/s ports (6 ports)
* Gigabit LAN
* 2 USB 3.0 ports
* 14 USB 2.0 ports
* Realtek HD Audio 7.1 channel audio
* SLI and CrossFireX support

Unique Aspects:

* High-quality audio capacitors for improved sound quality
* Dual-Stack MOSFET design for better power delivery and lower temperatures
* 8-layer PCB for improved signal integrity and reduced noise
* Supports 3-Way SLI and Quad-GPU CrossFireX configurations
* Bundled with ASRock's XFast LAN software for improved network performance

Overall, the ASRock 990FX Extreme4 motherboard offers robust features and high-performance capabilities, making it an excellent choice for gaming enthusiasts and overclockers who demand fast processing speeds, ample storage options, and advanced connectivity. Its pre

In [None]:
query_4 = query_wrapper(model_prompt + negative['description_left'])
print(query_4)

 Product: AMD Ryzen 9 5900X

Essential Features:

* 16-core, 32-thread processor
* 50GHz clock speed
* 6MB L2 cache
* 6MB L3 cache
* HyperTransport 3.0 technology
* 3-year warranty

Unique Aspects:

* Unlocked multiplier for overclocking capabilities
* Highest clock speed in its class (50GHz)
* Large L2 and L3 caches for improved performance
* Advanced HyperTransport 3.0 technology for increased bandwidth and low latency

Summary: The AMD Ryzen 9 5900X is a high-performance desktop processor that offers 16 cores and 32 threads, with an impressive clock speed of 50GHz and large L2 and L3 caches. It also features advanced HyperTransport 3.0 technology and comes with a 3-year warranty. Its unlocked multiplier allows for overclocking, making it an attractive option for enthusiasts looking to push their system to the limit.


In [None]:
sim.similarity(str(query_3), str(query_4))

tensor(0.9800, device='cuda:0', grad_fn=<SumBackward1>)

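The non-matching pair (label 0) scores higher (0.98) than the matching pair above (0.83), which needs investigating. Both summaries are also partly hallucinated: the model called the six-core AMD FX processor an "AMD Ryzen 9 5900X" with 16 cores and read "3 50ghz" (3.50 GHz with the decimal point stripped) as 50GHz, and it named the Gigabyte board an "ASRock 990FX Extreme4". However, when the pair is run through the `poc` wrapper below, which queries the model again and prepends the titles, it comes out clearly dissimilar.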
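The summaries above contain details that are not in the source descriptions; for example, `query_4` describes a 16-core "AMD Ryzen 9 5900X" although the description says "six core". A cheap, rough check is to flag numbers that appear in a summary but nowhere in its source. A sketch (`unsupported_numbers` is a hypothetical helper; it compares digit runs only, so it cannot catch a misreading such as "3 50ghz" becoming "50GHz", where the digits do occur in the source):

```python
import re

def unsupported_numbers(summary, source):
    """Digit runs in `summary` that never occur in `source`, in order of first appearance."""
    source_numbers = set(re.findall(r"\d+", source))
    summary_numbers = re.findall(r"\d+", summary)
    # dict.fromkeys drops duplicates while keeping the original order
    return [n for n in dict.fromkeys(summary_numbers) if n not in source_numbers]

source = "six core technology unlocked multiplier 3 50ghz clock speed 6mb l2 cache"
summary = "Product: AMD Ryzen 9 5900X\n* 16-core, 32-thread processor\n* 50GHz clock speed\n* 6MB L2 cache"
print(unsupported_numbers(summary, source))
```

For these excerpts it prints `['9', '5900', '16', '32']`. In the notebook it would be applied as `unsupported_numbers(str(query_4), negative['description_left'])`.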

In [32]:
pair = negative
poc(sim, pair['title_left'], pair['description_left'], pair['title_right'], pair['description_right'], model_prompt)

tensor(0.5025, grad_fn=<SumBackward1>)

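Some items have `None` instead of a description, and `poc` does not handle this: concatenating the prompt with `None` raises a `TypeError`, as happens for the `positive` pair, whose `description_left` is `None`.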

In [34]:
pair = positive
poc(sim, pair['title_left'], pair['description_left'], pair['title_right'], pair['description_right'], model_prompt)

TypeError: ignored
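A sketch of a `None`-tolerant variant of `poc`: a missing title becomes an empty string, and a missing description skips the chat query entirely. `poc_safe` and its `query_fn` parameter are hypothetical, and this fallback policy is only one option (summarizing the title instead would be another). The stand-ins `EchoSimilarity` and `fake_query` let the example run without a model or a chat session:

```python
def poc_safe(similarity, title_left, description_left, title_right, description_right,
             prompt, query_fn):
    """Like poc, but tolerates None titles and descriptions."""
    def summarize(description):
        # Nothing to summarize: skip the (slow) chat query
        if not description:
            return ""
        return str(query_fn(prompt + description))

    left_text = (title_left or "") + summarize(description_left)
    right_text = (title_right or "") + summarize(description_right)
    return similarity.similarity(left_text, right_text)

class EchoSimilarity:
    """Stand-in for Bert_similarity that returns the two texts it would compare."""
    def similarity(self, text1, text2):
        return text1, text2

queried = []

def fake_query(text):
    """Stand-in for query_wrapper that records prompts and returns a fixed summary."""
    queried.append(text)
    return " summary"

result = poc_safe(EchoSimilarity(), "intel dc p3600", None, "intel ssd", "nvme drive", "Extract: ", fake_query)
print(result)
```

With the notebook objects, the call would be `poc_safe(sim, pair['title_left'], pair['description_left'], pair['title_right'], pair['description_right'], model_prompt, query_wrapper)`.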

However, a positive pair can also get a very low similarity score, which raises further questions about the approach.

In [35]:
pair = positive2
poc(sim, pair['title_left'], pair['description_left'], pair['title_right'], pair['description_right'], model_prompt)

tensor(0.4874, grad_fn=<SumBackward1>)

In [36]:
pair = positive3
poc(sim, pair['title_left'], pair['description_left'], pair['title_right'], pair['description_right'], model_prompt)

tensor(0.9090, grad_fn=<SumBackward1>)
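To put the `poc` scores of the WDC pairs side by side, a small sketch that checks how accurately a single cutoff separates matches (label 1) from non-matches (label 0), trying every observed score as the cutoff. `best_threshold` is a hypothetical helper and the three scores are the ones reported above; three pairs are far too few to draw conclusions, so this only illustrates the procedure:

```python
def best_threshold(scores, labels):
    """Try each observed score as a cutoff (score >= t means match); return (t, accuracy)."""
    best = (None, -1.0)
    for t in sorted(set(scores)):
        predictions = [1 if s >= t else 0 for s in scores]
        accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        # Keep the first (lowest) cutoff that reaches the best accuracy
        if accuracy > best[1]:
            best = (t, accuracy)
    return best

# poc scores from this notebook: negative, positive2, positive3
scores = [0.5025, 0.4874, 0.9090]
labels = [0, 1, 1]
print(best_threshold(scores, labels))
```

Here the best cutoff (0.4874, accuracy 2/3) simply labels every pair a match, which is another way of saying the current scores do not yet separate the two classes.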