## 4. Vector DB
Vectorizing documents with an embedding model and loading them into a vector store

<img style="float: right;" src="../img/logo.png" width="120"><br>

<div style="text-align: right"> <b>Kwang Myung Yu</b></div>
<div style="text-align: right"> Initial issue : 2025.11.16 </div>
<div style="text-align: right"> last update : 2025.11.16 </div>

Revision history  
- `2025.11.16` : Initial creation of the notebook

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
import os
import pandas as pd
import numpy as np
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_google_genai import GoogleGenerativeAI
from langchain_pinecone import PineconeEmbeddings
from rag_pkg.utils.path import RAW_DATA_PATH, INTERMEDIATE_DATA_PATH, PROCESSED_DATA_PATH
from rag_pkg.module.preprocess import preprocess_for_rag
from rag_pkg.module.vector_db import load_documents, get_vector_store

### 1. Load the data

In [3]:
review_data_path = RAW_DATA_PATH / "whisky_reviews.csv"

reviews = pd.read_csv(review_data_path)

### 2. Preprocess the data

In [4]:
review_processed = preprocess_for_rag(reviews, min_comments=2)
print(review_processed['document_text'].iloc[0])

위스키 이름: Springbank10-year-old
태그: Green-House

향(Nose) [점수: 94.0]: The nose is full of aromatic power. We still have a little touch of solvent.\nLeather, orange peel, star anise.\nPlum, prunes, dates, figs, clementine, passion fruit peel.\nOld dry wood, dust, old book.\nWe have aromas of old rum, Demerara, and almost a little cane sugar.

맛(Taste) [점수: 96.0]: On the palate it is surprisingly fresh and "almost" light.\nFresh apricot, pineapple, passion fruit, guava, papaya.\nIt is very tropical.\nBut the dominant remains woody, with pretty spices.\nCloves, anise, pepper, bitter chocolate, cinnamon, nutmeg. They are all there.\nWe have fresh mint, some dry aromatic herbs.\nA little barbecue charcoal, smoke.

피니쉬(Finish) [점수: 95.0]: Long finish on liquorice, camphor, smoke, ash and barbecue, light peat.\nPepper, cloves, fresh mint.\nIt's long, comforting, it feels like you're at the edge of the fireplace.
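The printed text suggests that `preprocess_for_rag` joins the name, tags, and three tasting notes into one string per review, and that a score is shown only when one exists (some retrieval outputs later in this notebook show notes without scores). The real implementation is not shown here, so the following is only a minimal sketch of that layout. The helper names `build_document_text` and `_section` are hypothetical and not part of `rag_pkg`, and the tasting notes are shortened for illustration.

```python
from typing import Optional


def _section(label: str, text: str, score: Optional[float]) -> str:
    """Format one tasting section, adding the score only when present."""
    score_part = f" [점수: {score}]" if score is not None else ""
    return f"{label}{score_part}: {text}"


def build_document_text(
    name: str,
    tags: str,
    nose: str,
    taste: str,
    finish: str,
    nose_score: Optional[float] = None,
    taste_score: Optional[float] = None,
    finish_score: Optional[float] = None,
) -> str:
    """Hypothetical helper that mirrors the layout printed above."""
    header = f"위스키 이름: {name}\n태그: {tags}"
    sections = [
        _section("향(Nose)", nose, nose_score),
        _section("맛(Taste)", taste, taste_score),
        _section("피니쉬(Finish)", finish, finish_score),
    ]
    return header + "\n\n" + "\n\n".join(sections)


# Shortened, illustrative notes; no scores, so no "[점수: ...]" part is added.
text = build_document_text(
    name="Highland Park1958",
    tags="Sherried",
    nose="Heather, wild honey",
    taste="Pine sap, fir honey",
    finish="Long, resinous",
)
print(text)
```

Keeping the name and tags inside the embedded text, rather than only in metadata, is one plausible reason name-based queries later in the notebook return the matching whisky first.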


### 3. Document load

In [5]:
documents = load_documents(
    df=review_processed,
    document_text_col="document_text"
)
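`load_documents` turns each row of the preprocessed DataFrame into a document whose text comes from `document_text` and whose remaining fields become metadata. Its source is not shown, so the sketch below is an assumption about the general shape of such a conversion: `SimpleDocument` is a stand-in for LangChain's `Document`, and `rows_to_documents` is a hypothetical name. Dropping missing scores is also an assumption, motivated by the fact that some documents in the outputs carry no score keys in their metadata.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SimpleDocument:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_documents(rows: List[Dict[str, Any]], text_key: str) -> List[SimpleDocument]:
    """Hypothetical sketch: one document per row, metadata = other non-missing fields."""
    docs = []
    for row in rows:
        metadata = {
            key: value
            for key, value in row.items()
            if key != text_key and not _is_missing(value)
        }
        docs.append(SimpleDocument(page_content=row[text_key], metadata=metadata))
    return docs


# With pandas, rows like these could come from df.to_dict("records").
rows = [
    {"document_text": "text A", "whisky_name": "Highland Park1958",
     "tags": "Sherried", "nose_score": float("nan")},
    {"document_text": "text B", "whisky_name": "Springbank10-year-old",
     "tags": "Green-House", "nose_score": 94.0},
]
docs = rows_to_documents(rows, text_key="document_text")
print(docs[0].metadata)
print(docs[1].metadata)
```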

In [6]:
print(len(documents))
documents[:5]

716


[Document(metadata={'whisky_name': 'Springbank10-year-old', 'link': 'https://www.whiskybase.com/whiskies/whisky/41678/springbank-10-year-old', 'tags': 'Green-House', 'nose_score': 94.0, 'taste_score': 96.0, 'finish_score': 95.0}, page_content='위스키 이름: Springbank10-year-old\n태그: Green-House\n\n향(Nose) [점수: 94.0]: The nose is full of aromatic power. We still have a little touch of solvent.\\nLeather, orange peel, star anise.\\nPlum, prunes, dates, figs, clementine, passion fruit peel.\\nOld dry wood, dust, old book.\\nWe have aromas of old rum, Demerara, and almost a little cane sugar.\n\n맛(Taste) [점수: 96.0]: On the palate it is surprisingly fresh and "almost" light.\\nFresh apricot, pineapple, passion fruit, guava, papaya.\\nIt is very tropical.\\nBut the dominant remains woody, with pretty spices.\\nCloves, anise, pepper, bitter chocolate, cinnamon, nutmeg. They are all there.\\nWe have fresh mint, some dry aromatic herbs.\\nA little barbecue charcoal, smoke.\n\n피니쉬(Finish) [점수: 95.0]:

### 4. Vector DB

In [7]:
embedding = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")

In [8]:
# Embedding test: embed one query and inspect the first values and the vector length
vector_test = embedding.embed_query("hell0, world!")
print(vector_test[:5])
print(len(vector_test))

[-0.023279504850506783, -0.004911753814667463, 9.782671440916602e-06, -0.061770886182785034, -0.0026196828112006187]
3072


In [9]:
# embedding = PineconeEmbeddings(model="multilingual-e5-large")

In [10]:
# # Embedding test
# vector_test = embedding.embed_query("hell0, world!")
# print(vector_test[:5])
# print(len(vector_test))

In [13]:
vector_store = get_vector_store(
    documents=documents,
    embedding=embedding,
    type="faiss",
    dimension=3072
)
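The `dimension` argument has to match the length of the vectors the embedding model produces: 3072 for the `gemini-embedding-001` test above, and 1024 when the notebook later switches to `multilingual-e5-large`. A small guard like the one below can surface a mismatch before any documents are written. This is a sketch only; `check_dimension` is a hypothetical helper, and the vectors are stand-ins for what `embedding.embed_query(...)` would return.

```python
from typing import Sequence


def check_dimension(vector: Sequence[float], expected: int) -> None:
    """Raise early if an embedding does not match the index dimension."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected}"
        )


# Stand-in for a 3072-dimensional embedding.
gemini_like = [0.0] * 3072
check_dimension(gemini_like, expected=3072)  # passes silently

try:
    # Reusing the same vectors with a 1024-dimensional index should fail.
    check_dimension(gemini_like, expected=1024)
except ValueError as err:
    print(err)
```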

In [14]:
retriever = vector_store.as_retriever()
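`as_retriever()` wraps the store so that `invoke(query)` embeds the query and returns the nearest stored documents. Conceptually this is a top-k nearest-neighbor search. The sketch below does it by brute force with cosine similarity on toy 2-D vectors; real stores use index structures and may use a different metric, and the names `top_k` and `toy_index` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, _cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings": first axis loosely means smoke, second means fruit.
toy_index = {
    "smoky": [0.9, 0.1],
    "fruity": [0.1, 0.9],
    "balanced": [0.6, 0.6],
}
hits = top_k([1.0, 0.0], toy_index, k=2)
print([doc_id for doc_id, _ in hits])  # ['smoky', 'balanced']
```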

In [15]:
# Query (Korean): "I prefer heavy, meaty aromas or rough, thick smoky styles."
retriever.invoke("헤비한 육향이나 거칠고 진득한 스모크 계열을 선호해요.")

[Document(id='39ad4619-39bf-4a53-be8d-f48af1ebec25', metadata={'whisky_name': 'Ardbeg1966 MI', 'link': 'https://www.whiskybase.com/whiskies/whisky/8672/ardbeg-1966-mi', 'tags': 'Chocolate,Citric,Dried Fruit,Sherried,Smokey,Tobacco,Hay-like,Kippery,Leathery,Medicinal,New Wood,Nutty,Toasted'}, page_content='위스키 이름: Ardbeg1966 MI\n태그: Chocolate,Citric,Dried Fruit,Sherried,Smokey,Tobacco,Hay-like,Kippery,Leathery,Medicinal,New Wood,Nutty,Toasted\n\n향(Nose): Shoe polish, hansaplast, dirty smoke, coal tar, soot, wood tar, sweet spices, dark wood, beeswax\n\n맛(Taste): Sweet spices, dirty smoke, wood tar, dark sugar, dark wood, chocolate, a bit of salmiak\n\n피니쉬(Finish): Sweet dirty smoke, sweet spices, dark sugar, dark wood, wood tar'),
 Document(id='f7a0c660-5f47-4e30-bac8-a8f326b28fdf', metadata={'whisky_name': 'Ardbeg1966 MI', 'link': 'https://www.whiskybase.com/whiskies/whisky/8671/ardbeg-1966-mi', 'tags': 'Oily,Chocolate,Citric,Fresh Fruit,Hay-like,Leafy,Mossy,New Wood,Smokey,Tobacco'}, 
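The first two hits share the name `Ardbeg1966 MI` but point to different Whiskybase links, so the same whisky can appear more than once in a result list. If downstream code should treat these as one whisky, the hits can be grouped by `whisky_name`. This is a minimal sketch with a hypothetical helper `group_by_whisky`; the metadata values are copied from the output above.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_whisky(hits: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collect the links of all hits that share the same whisky_name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for metadata in hits:
        grouped[metadata["whisky_name"]].append(metadata["link"])
    return dict(grouped)


hits = [
    {"whisky_name": "Ardbeg1966 MI",
     "link": "https://www.whiskybase.com/whiskies/whisky/8672/ardbeg-1966-mi"},
    {"whisky_name": "Ardbeg1966 MI",
     "link": "https://www.whiskybase.com/whiskies/whisky/8671/ardbeg-1966-mi"},
]
grouped = group_by_whisky(hits)
for name, links in grouped.items():
    print(name, len(links))
```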

In [16]:
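# Query (Korean): "Tell me about the Bowmore 1964 Black whisky. Which flavors and aromas stand out?"
retriever.invoke("Bowmore 1964 Black 위스키에 대해 알려주세요. 어떤 맛과 향이 두드러지나요?")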

[Document(id='0246accc-1f3d-4e61-9a56-ef384295759e', metadata={'whisky_name': 'Bowmore1964 Black', 'link': 'https://www.whiskybase.com/whiskies/whisky/3681/bowmore-1964-black', 'tags': 'New Wood,Yeasty'}, page_content='위스키 이름: Bowmore1964 Black\n태그: New Wood,Yeasty\n\n향(Nose): Great! Greasy, voluminous and yet round, only very subtle smoke from Bowmore, in the form of extinguished incense sticks, lots of camphor, saffron, cumin, freshly grated cocoa and coffee beans, dark toffee, nougat, old Armagnac, Demerara sugar, sweet old tobacco leaves , candied ginger, plum and black cherry, sweet liquorice, a little chamomile in between, blackberries, marzipan, fresh coffee beans become more intense over time and dark chocolate becomes darker\n\n맛(Taste): Lots of high-quality, freshly poured coffee, camphor, nutmeg, licorice, plus great sweetness from freshly squeezed pomegranate, bitter and also blood orange, tobacco, cumin, ginger, beautiful anise, alcohol super gently integrated and yet very

In [17]:
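# Query (Korean): "I'd like a recommendation for a whisky similar to Highland Park 1958. What should I try?"
retriever.invoke("Highland Park 1958과 비슷한 느낌의 위스키를 추천받고 싶어요. 어떤 걸 마셔보면 좋을까요?")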

[Document(id='e30936b0-b51f-4652-a35f-447ee5343d02', metadata={'whisky_name': 'Highland Park1958', 'link': 'https://www.whiskybase.com/whiskies/whisky/16477/highland-park-1958', 'tags': 'Sherried'}, page_content="위스키 이름: Highland Park1958\n태그: Sherried\n\n향(Nose): A full bed of heather in springtime bloom, wild honey fresh from the hive, blood oranges, sandalwood, blood oranges and tangerines, sandalwood and pine log smoke\n\n맛(Taste): Pine sap, fir honey, roasted chestnuts, more mango, tangerines and blood oranges, Manuka honey on a croissant, liquorice, all-spice\n\n피니쉬(Finish): Stunning, long, resinous, warming. more heather-honey, smoke, espresso, roasted hazelnuts, dark fudge, it's a session killer"),
 Document(id='7955b73b-2134-4b6b-ad56-676f9aca5c69', metadata={'whisky_name': 'Highland Park1958', 'link': 'https://www.whiskybase.com/whiskies/whisky/45818/highland-park-1958', 'tags': 'Chocolate,Citric,Dried Fruit,Fresh Fruit,Honey,Malt Extract,New Wood,Sherried,Smokey,Solvent,Vani

In [18]:
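# Query (Korean): "I'd like dark chocolate notes together with a little sherry character, and not much smoke."
retriever.invoke("어두운 초콜릿 느낌과 약간의 셰리 풍미를 함께 느껴보고 싶어요. 스모크는 적었으면 합니다.")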

[Document(id='6cd52095-e893-4dd6-bac8-04ff844263f5', metadata={'whisky_name': 'Glen Grant1957 GM', 'link': 'https://www.whiskybase.com/whiskies/whisky/210218/glen-grant-1957-gm', 'tags': 'Chocolate'}, page_content='위스키 이름: Glen Grant1957 GM\n태그: Chocolate\n\n향(Nose): It starts with plenty of fruit, which goes towards all kinds of berries at first: red berries, raspberries, and blackberries. But also sweet cherries and red apples. Then loads of dark chocolate. Rich honey and hazelnuts are in the background. Later also pine needles and a hint of furniture polish.\n\n맛(Taste): Dark chocolate and orange marmalade. Some lovely rich honey. Damped wood and tobacco, followed by Earl Grey tea and eucalyptus. Then cranberries. A touch of oak. Liquorice. A hint of cinnamon and crushed black peppercorns.\n\n피니쉬(Finish): Very long and somewhat dry, with dark chocolate, toffee and tobacco, but also thyme and cranberries and a hint of oak and black pepper.'),
 Document(id='f0f1c3cd-dfdf-4688-a679-64b
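This query asks for little smoke, and embedding similarity alone may not honor that kind of negative preference, because a document that mentions smoke is still semantically close to a query that mentions smoke. One possible complement is to post-filter hits on the comma-separated `tags` metadata. The sketch below assumes hypothetical helpers `has_tag` and `exclude_tag`; the tag strings are copied from hits shown in this notebook.

```python
from typing import Dict, List


def has_tag(metadata: Dict[str, str], tag: str) -> bool:
    """Check for an exact, case-insensitive match in the comma-separated tags."""
    tags = [t.strip().lower() for t in metadata.get("tags", "").split(",")]
    return tag.lower() in tags


def exclude_tag(hits: List[Dict[str, str]], tag: str) -> List[Dict[str, str]]:
    """Keep only hits whose tags do not include the unwanted tag."""
    return [metadata for metadata in hits if not has_tag(metadata, tag)]


hits = [
    {"whisky_name": "Glen Grant1957 GM", "tags": "Chocolate"},
    {"whisky_name": "Ardbeg1966 MI",
     "tags": "Chocolate,Citric,Dried Fruit,Sherried,Smokey,Tobacco,Hay-like,"
             "Kippery,Leathery,Medicinal,New Wood,Nutty,Toasted"},
]
low_smoke = exclude_tag(hits, "Smokey")
print([metadata["whisky_name"] for metadata in low_smoke])
```

Tags are coarse, so a whisky without the `Smokey` tag can still have smoke in its tasting notes; the filter narrows the list rather than guaranteeing the preference.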

In [19]:
vector_store = get_vector_store(
    documents=documents,
    embedding=embedding,
    type="chroma"
)

In [20]:
retriever = vector_store.as_retriever()

In [21]:
# Query (Korean): "I prefer heavy, meaty aromas or rough, thick smoky styles."
retriever.invoke("헤비한 육향이나 거칠고 진득한 스모크 계열을 선호해요.")

[Document(metadata={'tags': 'Chocolate,Citric,Dried Fruit,Sherried,Smokey,Tobacco,Hay-like,Kippery,Leathery,Medicinal,New Wood,Nutty,Toasted', 'whisky_name': 'Ardbeg1966 MI', 'link': 'https://www.whiskybase.com/whiskies/whisky/8672/ardbeg-1966-mi'}, page_content='위스키 이름: Ardbeg1966 MI\n태그: Chocolate,Citric,Dried Fruit,Sherried,Smokey,Tobacco,Hay-like,Kippery,Leathery,Medicinal,New Wood,Nutty,Toasted\n\n향(Nose): Shoe polish, hansaplast, dirty smoke, coal tar, soot, wood tar, sweet spices, dark wood, beeswax\n\n맛(Taste): Sweet spices, dirty smoke, wood tar, dark sugar, dark wood, chocolate, a bit of salmiak\n\n피니쉬(Finish): Sweet dirty smoke, sweet spices, dark sugar, dark wood, wood tar'),
 Document(metadata={'link': 'https://www.whiskybase.com/whiskies/whisky/8671/ardbeg-1966-mi', 'whisky_name': 'Ardbeg1966 MI', 'tags': 'Oily,Chocolate,Citric,Fresh Fruit,Hay-like,Leafy,Mossy,New Wood,Smokey,Tobacco'}, page_content="위스키 이름: Ardbeg1966 MI\n태그: Oily,Chocolate,Citric,Fresh Fruit,Hay-like,Le

In [22]:
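# Query (Korean): "Tell me about the Bowmore 1964 Black whisky. Which flavors and aromas stand out?"
retriever.invoke("Bowmore 1964 Black 위스키에 대해 알려주세요. 어떤 맛과 향이 두드러지나요?")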

[Document(metadata={'whisky_name': 'Bowmore1964 Black', 'tags': 'New Wood,Yeasty', 'link': 'https://www.whiskybase.com/whiskies/whisky/3681/bowmore-1964-black'}, page_content='위스키 이름: Bowmore1964 Black\n태그: New Wood,Yeasty\n\n향(Nose): Great! Greasy, voluminous and yet round, only very subtle smoke from Bowmore, in the form of extinguished incense sticks, lots of camphor, saffron, cumin, freshly grated cocoa and coffee beans, dark toffee, nougat, old Armagnac, Demerara sugar, sweet old tobacco leaves , candied ginger, plum and black cherry, sweet liquorice, a little chamomile in between, blackberries, marzipan, fresh coffee beans become more intense over time and dark chocolate becomes darker\n\n맛(Taste): Lots of high-quality, freshly poured coffee, camphor, nutmeg, licorice, plus great sweetness from freshly squeezed pomegranate, bitter and also blood orange, tobacco, cumin, ginger, beautiful anise, alcohol super gently integrated and yet very buttery and present\n\n피니쉬(Finish): Anise,

In [23]:
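# Query (Korean): "I'd like a recommendation for a whisky similar to Highland Park 1958. What should I try?"
retriever.invoke("Highland Park 1958과 비슷한 느낌의 위스키를 추천받고 싶어요. 어떤 걸 마셔보면 좋을까요?")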

[Document(metadata={'whisky_name': 'Highland Park1958', 'tags': 'Sherried', 'link': 'https://www.whiskybase.com/whiskies/whisky/16477/highland-park-1958'}, page_content="위스키 이름: Highland Park1958\n태그: Sherried\n\n향(Nose): A full bed of heather in springtime bloom, wild honey fresh from the hive, blood oranges, sandalwood, blood oranges and tangerines, sandalwood and pine log smoke\n\n맛(Taste): Pine sap, fir honey, roasted chestnuts, more mango, tangerines and blood oranges, Manuka honey on a croissant, liquorice, all-spice\n\n피니쉬(Finish): Stunning, long, resinous, warming. more heather-honey, smoke, espresso, roasted hazelnuts, dark fudge, it's a session killer"),
 Document(metadata={'nose_score': 89.0, 'tags': 'Chocolate,Citric,Dried Fruit,Fresh Fruit,Honey,Malt Extract,New Wood,Sherried,Smokey,Solvent,Vanilla', 'taste_score': 90.0, 'finish_score': 92.0, 'whisky_name': 'Highland Park1958', 'link': 'https://www.whiskybase.com/whiskies/whisky/45818/highland-park-1958'}, page_content='위스

In [24]:
embedding = PineconeEmbeddings(model="multilingual-e5-large")

vector_store = get_vector_store(
    documents=documents,
    embedding=embedding,
    type="pinecone",
    dimension=1024
)

✓ Pinecone 인덱스 'whisky-reviews'에 716개 문서 추가 완료


In [25]:
retriever = vector_store.as_retriever()

In [26]:
# Query (Korean): "I prefer heavy, meaty aromas or rough, thick smoky styles."
retriever.invoke("헤비한 육향이나 거칠고 진득한 스모크 계열을 선호해요.")

[Document(id='fd4ecfa3-1892-4445-b75d-d0201e0175dc', metadata={'finish_score': 89.0, 'link': 'https://www.whiskybase.com/whiskies/whisky/6366/linlithgow-1973', 'nose_score': 93.0, 'tags': 'Hay-like,Honey,Leafy,Malt Extract,Tobacco', 'taste_score': 90.0, 'whisky_name': 'Linlithgow1973'}, page_content='위스키 이름: Linlithgow1973\n태그: Hay-like,Honey,Leafy,Malt Extract,Tobacco\n\n향(Nose) [점수: 93.0]: Honey, sweet nectar, tropical fruits, mango, passion fruit, pineapple, ripe bananas, fine notes of oak wood, little peppery, whiffs of vanilla - great!\n\n맛(Taste) [점수: 90.0]: Wow, very punchy, oily, creamy, good sweetness and fruitiness, peppery and zesty, more on grapefruit, some oak wood, little leafy notes - very good\n\n피니쉬(Finish) [점수: 89.0]: Long, burning, a bit more leafy and probably even a tad bitter but not much'),
 Document(id='fedafc46-c753-4def-b682-af3544d184f1', metadata={'link': 'https://www.whiskybase.com/whiskies/whisky/12346/clynelish-24-year-old-ca', 'whisky_name': 'Clynelish24

In [27]:
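# Query (Korean): "Tell me about the Bowmore 1964 Black whisky. Which flavors and aromas stand out?"
retriever.invoke("Bowmore 1964 Black 위스키에 대해 알려주세요. 어떤 맛과 향이 두드러지나요?")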

[Document(id='a3f28dfd-3962-45e3-b93a-63d500d98d6e', metadata={'finish_score': 90.0, 'link': 'https://www.whiskybase.com/whiskies/whisky/639/bowmore-1964-black', 'nose_score': 94.0, 'tags': 'Malt Extract', 'taste_score': 92.0, 'whisky_name': 'Bowmore1964 Black'}, page_content='위스키 이름: Bowmore1964 Black\n태그: Malt Extract\n\n향(Nose) [점수: 94.0]: Dried fruits galore, plums, dates, figs, prunes, little bit of raisins and also whiffs of resin, some leafy or even a bit grassy notes, acacia honey, deep sherry wood, hints of burned sugar, old books - great\n\n맛(Taste) [점수: 92.0]: Little punchy, oily, liquorice, sugar cane, subtle sweetness, hints of dried fruits, toasted bread, dry oak wood, dark chocolate - very good\n\n피니쉬(Finish) [점수: 90.0]: Medium long, warm, little sweetness, more oak wood coming up, little bit drying'),
 Document(id='ea018fe0-05f2-4d6a-b6b2-f81c6b90db45', metadata={'finish_score': 93.0, 'link': 'https://www.whiskybase.com/whiskies/whisky/8341/bowmore-1964-black', 'nose_sc

In [28]:
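# Query (Korean): "I'd like a recommendation for a whisky similar to Highland Park 1958. What should I try?"
retriever.invoke("Highland Park 1958과 비슷한 느낌의 위스키를 추천받고 싶어요. 어떤 걸 마셔보면 좋을까요?")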

[Document(id='25f3b313-8b6f-42d2-9e33-a098527f2877', metadata={'link': 'https://www.whiskybase.com/whiskies/whisky/16477/highland-park-1958', 'tags': 'Sherried', 'whisky_name': 'Highland Park1958'}, page_content="위스키 이름: Highland Park1958\n태그: Sherried\n\n향(Nose): A full bed of heather in springtime bloom, wild honey fresh from the hive, blood oranges, sandalwood, blood oranges and tangerines, sandalwood and pine log smoke\n\n맛(Taste): Pine sap, fir honey, roasted chestnuts, more mango, tangerines and blood oranges, Manuka honey on a croissant, liquorice, all-spice\n\n피니쉬(Finish): Stunning, long, resinous, warming. more heather-honey, smoke, espresso, roasted hazelnuts, dark fudge, it's a session killer"),
 Document(id='ba8b3119-9df4-432d-a148-7a47079d3f2e', metadata={'finish_score': 92.0, 'link': 'https://www.whiskybase.com/whiskies/whisky/45818/highland-park-1958', 'nose_score': 89.0, 'tags': 'Chocolate,Citric,Dried Fruit,Fresh Fruit,Honey,Malt Extract,New Wood,Sherried,Smokey,Solven

In [29]:
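# Query (Korean): "I'd like dark chocolate notes together with a little sherry character, and not much smoke."
retriever.invoke("어두운 초콜릿 느낌과 약간의 셰리 풍미를 함께 느껴보고 싶어요. 스모크는 적었으면 합니다.")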

[Document(id='1d6e16bb-6f53-44bf-ba1f-f020b820f23f', metadata={'link': 'https://www.whiskybase.com/whiskies/whisky/8793/tobermory-1972-mi', 'tags': 'Dried Fruit,Chocolate,Coal-gas,Cooked Fruit,Leathery,Nutty,Old Wood,Sherried', 'whisky_name': 'Tobermory1972 MI'}, page_content="위스키 이름: Tobermory1972 MI\n태그: Dried Fruit,Chocolate,Coal-gas,Cooked Fruit,Leathery,Nutty,Old Wood,Sherried\n\n향(Nose): Wow ... What's that? First of all, I have smoke. Greasy smoke, not very strong, but noticeable. In addition, heavy sherry with its dark fruits. Plums wrapped in bacon. Grilled bananas with cinnamon. Forest soil with porcini mushrooms. Traditional balsamic vinegar, old and viscous. Some pot roast, served in a dusty library. Porrödes leather. It is fascinating that with the old, set and heavy impression, even a blood orange finds its way into the nose. Occasionally she comes through, but quickly loses behind leather, plum and bacon again.  Time weakens the smoke a bit. To the plums come over-ripe d
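The same queries were run against three backends, and the results do not always agree. For the smoke query, FAISS and Chroma (both with `gemini-embedding-001`) return the same first two hits, while Pinecone (with `multilingual-e5-large`) returns different ones, so the difference may come from the embedding model as much as from the store. One simple way to quantify agreement is the Jaccard overlap of the returned items. The sketch below uses a hypothetical helper `jaccard` and the Whiskybase IDs from the first two visible hits of each output; the remaining hits are truncated in the outputs and are not included.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Size of the intersection divided by size of the union of two ID sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists agree trivially
    return len(set_a & set_b) / len(set_a | set_b)


# Whiskybase IDs taken from the links of the first two visible hits
# for the smoke query in each backend's output.
faiss_hits = ["8672", "8671"]
chroma_hits = ["8672", "8671"]
pinecone_hits = ["6366", "12346"]

print("faiss vs chroma:", jaccard(faiss_hits, chroma_hits))
print("faiss vs pinecone:", jaccard(faiss_hits, pinecone_hits))
```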