# MultiVector Retriever
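- It can be beneficial to store multiple vectors per document.
- LangChain's built-in MultiVectorRetriever makes it easy to query this kind of setup.
- Ways to create multiple vectors per document:
  - Smaller chunks: split a document into smaller chunks and embed those (this is what ParentDocumentRetriever does)
  - Summaries: generate a summary of each document and embed it along with (or instead of) the document
  - Hypothetical questions: generate questions that each document could answer well and embed them along with (or instead of) the document
- Embeddings can also be added manually: explicitly adding questions or queries gives you control over which documents are retrieved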
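The core lookup behind this setup can be sketched in plain Python: several small searchable entries point to one parent through a shared id, and a hit on any of them returns the whole parent. This is a toy illustration, not LangChain's implementation; the bag-of-words `embed` function, the `p1`/`p2` ids and the sample texts are hypothetical stand-ins for a real embedding model and real documents.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# docstore: parent id -> full parent document (what the user finally gets back)
toy_docstore = {
    "p1": "Full sample code: create a map and set its basemap.",
    "p2": "Full sample code: a swipe control and a divided map view.",
}

# vector index: several small entries per parent, each tagged with its parent's id
toy_index = [
    (embed("create a map"), {"doc_id": "p1"}),
    (embed("set the basemap"), {"doc_id": "p1"}),
    (embed("swipe control"), {"doc_id": "p2"}),
    (embed("divided map view"), {"doc_id": "p2"}),
]


def multi_vector_search(query: str, k: int = 3) -> list[str]:
    """Rank the small entries, then return their parents (deduplicated, in rank order)."""
    q = embed(query)
    ranked = sorted(toy_index, key=lambda entry: cosine(q, entry[0]), reverse=True)[:k]
    parent_ids: list[str] = []
    for _, metadata in ranked:
        if metadata["doc_id"] not in parent_ids:
            parent_ids.append(metadata["doc_id"])
    return [toy_docstore[pid] for pid in parent_ids]
```

For example, `multi_vector_search("swipe control", k=1)` matches only the two-word entry but returns the whole `p2` text.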

In [186]:
from langchain.retrievers.multi_vector import MultiVectorRetriever

In [187]:
from langchain.storage import InMemoryByteStore
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
load_dotenv('../dot.env')

True

In [188]:
loaders = [
    TextLoader("./files/train-geon-example-double_org.txt"),
]
docs = []
for loader in loaders:
    docs.extend(loader.load())
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000)
docs = text_splitter.split_documents(docs)

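# Smaller chunks
- It can be useful to retrieve larger chunks of information but embed smaller chunks.
- This lets the embeddings capture the meaning as precisely as possible, while still passing as much context as possible downstream.
- This is what ParentDocumentRetriever does.
- Here we show what happens under the hood.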

In [189]:
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
    collection_name="full_documents", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key,
)
import uuid
doc_ids = [str(uuid.uuid4()) for _ in docs]  # generate a unique id for each parent document with uuid

In [190]:
# The splitter used to create the smaller (child) chunks: a recursive character splitter
child_text_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
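The idea behind a recursive character splitter can be sketched as follows: try the coarsest separator first, recurse into pieces that are still longer than `chunk_size`, and greedily merge small pieces back together. This is a simplified sketch, not `RecursiveCharacterTextSplitter`'s actual code; among other differences it has no `chunk_overlap` and drops the separator at chunk boundaries. `recursive_split` is a hypothetical name.

```python
def recursive_split(text: str, chunk_size: int, separators: tuple = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into pieces of at most chunk_size characters (simplified sketch)."""
    if len(text) <= chunk_size:
        return [text]
    # Pick the first (coarsest) separator that occurs in the text; "" always matches.
    sep = next((s for s in separators if s in text), "")
    if sep == "":
        # No separator left: fall back to hard character slicing.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    finer = separators[separators.index(sep) + 1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Paragraph breaks are preferred over line breaks, which are preferred over spaces, so chunks tend to end at natural boundaries.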

In [191]:
sub_docs = []
for i, doc in enumerate(docs):
    _id = doc_ids[i]  # the document id generated above with uuid
    _sub_docs = child_text_splitter.split_documents([doc])  # split each parent (up to 10000 chars) into chunks of up to 400 chars
    for _doc in _sub_docs:
        _doc.metadata[id_key] = _id  # every child chunk of the same parent gets the same id
    sub_docs.extend(_sub_docs)

# Chunking the documents (up to 10000 characters) into chunks of up to 400 characters
- Chunks that come from the same parent document share the same id
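The claim that all child chunks of one parent share an id can be checked with a small self-contained sketch. Here plain dicts stand in for `Document` objects, fixed-size character slicing stands in for the recursive splitter, and `split_with_parent_ids` is a hypothetical helper.

```python
import uuid
from collections import defaultdict


def split_with_parent_ids(parents: list[str], chunk_size: int, id_key: str = "doc_id"):
    """Hypothetical helper: cut each parent into fixed-size chunks tagged with the parent's id."""
    ids = [str(uuid.uuid4()) for _ in parents]  # one unique id per parent
    children = []
    for parent_id, text in zip(ids, parents):
        for start in range(0, len(text), chunk_size):
            children.append({"page_content": text[start:start + chunk_size], "metadata": {id_key: parent_id}})
    return ids, children


demo_ids, demo_children = split_with_parent_ids(["a" * 1000, "b" * 500], chunk_size=400)

# Group the children by parent id: each group should rebuild exactly one parent.
by_parent = defaultdict(list)
for child in demo_children:
    by_parent[child["metadata"]["doc_id"]].append(child["page_content"])
```

A 1000-character parent yields three chunks (400, 400, 200) and a 500-character parent yields two (400, 100).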

In [192]:
docs

[Document(metadata={'source': './files/train-geon-example-double_org.txt'}, page_content='지도생성\n지도를 생성 기능 제공 샘플코드\n"""\n<!DOCTYPE HTML>\n<html>\n<head>\n\t<meta charset="utf-8">\n\t<link href="https://developer.geon.kr/js/odf/odf.css" rel="stylesheet">\n\t<script type="text/javascript" src="https://developer.geon.kr/js/odf/odf.min.js"></script>\n</head>\n<body>\n\t<div id="map" class="odf-view"></div>\n</body>\n<script>\n\t/* 맵 타겟 */\n\tvar mapContainer = document.getElementById(\'map\');\n\n\t/* 맵 중심점 */\n\tvar coord = new odf.Coordinate(199312.9996,551784.6924);\n\n\t/* 맵객체 옵션 (외부에서 사용할 때는 proxyURL, proxyParam 옵션이 필요합니다.) */\n\tvar mapOption = {\n\t\tcenter : coord,\n\t\tzoom : 11,\n\t\tprojection : \'EPSG:5186\',\n\t\t//proxyURL: \'proxyUrl.jsp\',\n\t\t//proxyParam: \'url\',\n\t\tbaroEMapURL : \'https://geon-gateway.geon.kr/map/api/map/baroemap\',\n\t\tbaroEMapAirURL : \'https://geon-gateway.geon.kr/map/api/map/ngisair\',\n\n\t\tbasemap : {\n\t\t\tbaroEMap : [\'eMapBasic\', \'eMapAI

In [193]:
sub_docs

[Document(metadata={'source': './files/train-geon-example-double_org.txt', 'doc_id': '19e567c6-6b42-4dfa-a5b5-b439b059ca01'}, page_content='지도생성\n지도를 생성 기능 제공 샘플코드\n"""\n<!DOCTYPE HTML>\n<html>\n<head>\n\t<meta charset="utf-8">\n\t<link href="https://developer.geon.kr/js/odf/odf.css" rel="stylesheet">\n\t<script type="text/javascript" src="https://developer.geon.kr/js/odf/odf.min.js"></script>\n</head>\n<body>\n\t<div id="map" class="odf-view"></div>\n</body>\n<script>\n\t/* 맵 타겟 */\n\tvar mapContainer = document.getElementById(\'map\');'),
 Document(metadata={'source': './files/train-geon-example-double_org.txt', 'doc_id': '19e567c6-6b42-4dfa-a5b5-b439b059ca01'}, page_content='/* 맵 중심점 */\n\tvar coord = new odf.Coordinate(199312.9996,551784.6924);'),
 Document(metadata={'source': './files/train-geon-example-double_org.txt', 'doc_id': '19e567c6-6b42-4dfa-a5b5-b439b059ca01'}, page_content="/* 맵객체 옵션 (외부에서 사용할 때는 proxyURL, proxyParam 옵션이 필요합니다.) */\n\tvar mapOption = {\n\t\tcenter : coor

In [194]:
# Store the small chunks (sub_docs) in the vector store
retriever.vectorstore.add_documents(sub_docs)

# The docstore holds the full parent documents, keyed by their ids.
# mset stores each (doc_id, doc) pair in the docstore.
retriever.docstore.mset(list(zip(doc_ids, docs)))
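The docstore side is just a key-value store. The sketch below mimics the `mset`/`mget` calls with a hypothetical `ToyByteStore`; serializing documents as JSON is an assumption made for illustration and is not necessarily how LangChain serializes them.

```python
import json
from typing import Iterable, Optional, Sequence


class ToyByteStore:
    """Hypothetical in-memory key -> bytes store with mset/mget-style methods."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def mset(self, key_value_pairs: Iterable[tuple[str, bytes]]) -> None:
        for key, value in key_value_pairs:
            self._data[key] = value

    def mget(self, keys: Sequence[str]) -> list[Optional[bytes]]:
        # Missing keys come back as None instead of raising.
        return [self._data.get(key) for key in keys]


def dump_doc(page_content: str, metadata: dict) -> bytes:
    """Serialize a document-like record to bytes (JSON here, for illustration)."""
    return json.dumps({"page_content": page_content, "metadata": metadata}, ensure_ascii=False).encode("utf-8")


def load_doc(raw: bytes) -> dict:
    return json.loads(raw.decode("utf-8"))


toy_store = ToyByteStore()
toy_store.mset([("id-1", dump_doc("지도생성 샘플코드", {"source": "example.txt"}))])
found, missing = toy_store.mget(["id-1", "id-2"])
```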

In [195]:
# Vectorstore alone retrieves the small chunks
print(retriever.vectorstore.similarity_search("스와이프 지도 코드")[0])

page_content='4. 현재 보유하고 있는 주소 혹은 좌표를 가지고 있는 엑셀데이터가 있으시면 맵픽에 업로드해보세요. 지도 기반에서 사용자님의 데이터를 한 눈에 파악이 가능하도록 변환해 드립니다.

이외에도 핫스팟 분석 등의 공간분석 기능 이용하여 지도상에서 다양한 활동이 가능하오니 로그인하셔서 나의 데이터를 관리하고 분석해보세요.

더 필요한 사항이 있으시면 알려주세요.

----------------------------------------
메인메뉴
메뉴는 챗봇의 메뉴와 맵픽의 메뉴로 구성되어 있습니다. 챗봇이 메뉴는 사용자의 이해를 돕기 위하여 자주 질문하는 내용으로 구성되어 있습니다. 맵픽의 메뉴는 맵픽소개, 지도, 맵갤러리, 공지사항으로 구성되어 있습니다.' metadata={'doc_id': 'ec31b15a-95a6-4f28-ae7f-764040989f37', 'source': './files/1train-geon-Manual-simple.txt'}


- The default search type is similarity search
- LangChain vector stores also support Maximal Marginal Relevance (MMR) search
- To change the search type, set the retriever's search_type attribute
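Maximal Marginal Relevance picks results one at a time, trading relevance to the query against similarity to what has already been picked. In the LangChain docs, MMR is enabled on this retriever with `retriever.search_type = SearchType.mmr` (with `SearchType` imported from `langchain.retrievers.multi_vector`). The sketch below shows only the selection rule in plain Python; `lambda_mult` mirrors the name LangChain uses for the trade-off weight, but the code is illustrative rather than LangChain's implementation.

```python
import math


def cosine_vec(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query: list[float], candidates: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return the indices of k candidates chosen by Maximal Marginal Relevance."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        # score = lambda * relevance - (1 - lambda) * max similarity to already selected items
        best = max(
            remaining,
            key=lambda i: lambda_mult * cosine_vec(query, candidates[i])
            - (1 - lambda_mult) * max((cosine_vec(candidates[i], candidates[j]) for j in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two identical candidates and one different one, MMR skips the duplicate, while `lambda_mult=1.0` (pure relevance) keeps it.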


# Summaries
- A summary can often capture what a chunk is about more precisely than the raw text, which can lead to better retrieval.
- Here we show how to generate summaries and embed them.

In [199]:
import uuid

from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

In [200]:
# Chain that generates a summary for each doc
chain = (
    {"doc": lambda x: x.page_content}
    | ChatPromptTemplate.from_template("Summarize the following document:\n\n{doc}")
    | ChatOpenAI(max_retries=0)
    | StrOutputParser()
)

In [201]:
# docs is a list, so use batch() instead of a loop; max_concurrency caps the number of parallel calls at 5
summaries = chain.batch(docs, {"max_concurrency": 5})

In [202]:
summaries

['The document provides sample codes for creating maps, setting background maps, optimizing background maps, and managing custom background maps. It includes information on how to generate maps, set different types of background maps, change map projections for clearer background maps, and customize background maps by adding custom layers. It also demonstrates how to manage custom background map groups and layers, as well as how to use webGL vector tile layers for custom background maps. The sample codes include instructions for creating and configuring maps, setting basemaps, and managing custom basemaps using the odf library.',
 'The document provides code snippets for adding a new group to the base map, creating a webFGLVectorTile layer with custom styles, setting up custom base layers on the map, rebuilding the base map control, and checking and getting information about the current base map layer and group. The second document demonstrates creating a map with zoom and scale contro

In [203]:
# The vectorstore to use to index the summaries
vectorstore = Chroma(collection_name="summaries", embedding_function=OpenAIEmbeddings())
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key,
)
doc_ids = [str(uuid.uuid4()) for _ in docs]

In [204]:
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

In [205]:
sub_docs = vectorstore.similarity_search("justice breyer")

In [206]:
sub_docs[0]

Document(metadata={'doc_id': 'cf37eed1-89c5-4529-a3cc-907d4c437f52'}, page_content="The document summarizes President Biden's State of the Union address, focusing on his nomination of Judge Ketanji Brown Jackson to the Supreme Court, immigration reform, protecting women's rights, supporting veterans, and ending cancer. He also discusses unity and progress in the face of challenges and expresses optimism for the future of the nation.")

In [207]:
retrieved_docs = retriever.invoke("배경지도 설정 코드")

In [253]:
print(retrieved_docs)

[Document(metadata={'source': './files/train-geon-example-double_org.txt'}, page_content='/* 배경지도 컨트롤 생성 */\n\tvar basemapControl = new odf.BasemapControl();\n\tbasemapControl.setMap(map);\n\n\t/* 줌 컨트롤 생성 */\n\tvar zoomControl = new odf.ZoomControl();\n\tzoomControl.setMap(map);\n\n\tvar dmc = new odf.DivideMapControl({\n\t\tdualMap : [\n\t\t\t{\n\t\t\t\tposition : 1,\n\t\t\t\tmapOption : {\n\t\t\t\t\t// 해당 분할지도의 basemap 옵션\n\t\t\t\t\t// 정의하지 않는 경우, map 객체 생성시 사용한 basemap option 사용\n\t\t\t\t\tbasemap : {\n\t\t\t\t\t\tbaroEMap : [ \'eMapWhite\' ]\n\t\t\t\t\t},\n\t\t\t\t},\n\t\t\t\t// 사용할 컨트롤 지정\n\t\t\t\t// 정의하지 않는 경우, 기본 값 적용(배경지도 컨트롤만 이용)\n\t\t\t\tcontrolOption : {\n\t\t             basemap: true,// 기본값 true\n\t\t             zoom: false,// 기본값 false\n\t\t             clear: false,// 기본값 false\n\t\t             download: false,// 기본값 false\n\t\t             print: false,// 기본값 false\n\t\t             overviewmap: false,// 기본값 false\n\t\t             draw: false,// 기본값 false\n\t\t     

In [209]:
len(retrieved_docs[0].page_content)

9975

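# Hypothetical questions
- Generate questions that each document could answer well, and embed them along with (or instead of) the document.

In [210]: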
functions = [
    {
        "name": "hypothetical_questions",
        "description": "Generate hypothetical questions",
        "parameters": {
            "type": "object",
            "properties": {
                "questions": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["questions"],
        },
    }
]

In [211]:
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser

chain = (
    {"doc": lambda x: x.page_content}
    # Only asking for 3 hypothetical questions, but this could be adjusted
    | ChatPromptTemplate.from_template(
        "Generate a list of exactly 3 hypothetical questions that the below document could be used to answer:\n\n{doc}"
    )
    | ChatOpenAI(max_retries=0, model="gpt-4o").bind(  # bind the function schema to the model and force a call to it
        functions=functions, function_call={"name": "hypothetical_questions"}
    )
    | JsonKeyOutputFunctionsParser(key_name="questions")
)
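The last step of this chain turns the model's forced function call into a plain list. Roughly, the model returns a `function_call` with a `name` and a JSON-encoded `arguments` string, and the parser decodes the JSON and keeps the `questions` key. A minimal stdlib sketch of that step, with an added type check against the schema above, is shown below; `parse_questions` is a hypothetical helper, not the code of `JsonKeyOutputFunctionsParser`.

```python
import json


def parse_questions(function_call: dict, expected_name: str = "hypothetical_questions") -> list[str]:
    """Extract the "questions" array from a function call's JSON arguments."""
    if function_call.get("name") != expected_name:
        raise ValueError(f"unexpected function: {function_call.get('name')!r}")
    args = json.loads(function_call["arguments"])  # arguments arrive as a JSON string
    questions = args["questions"]
    # The schema declares an array of strings; enforce that here.
    if not isinstance(questions, list) or not all(isinstance(q, str) for q in questions):
        raise ValueError("questions must be a list of strings")
    return questions


example_call = {
    "name": "hypothetical_questions",
    "arguments": json.dumps({"questions": ["How do I create a map?", "How do I set the basemap?", "How do I add a swipe control?"]}),
}
```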

In [212]:
# Use the batch method to take the list of documents and generate hypothetical questions for each one
hypothetical_questions = chain.batch(docs, {"max_concurrency": 5})

In [213]:
hypothetical_questions

[['How can I create a map using the provided sample code?',
  'What options are available for setting the background map?',
  'How can I customize the base map with user-defined layers?'],
 ['How can I add a new group and remove an existing group in a basemap control?',
  'What are the steps to create and configure a webFGLVectorTile layer in a map?',
  'How can I add zoom and scale controls to an ODF map instance?'],
 ['What types of map controls and functionalities are demonstrated in the document?',
  'How can you enable or disable the drag and zoom functionalities for the map?',
  'What are the steps to add a drawing tool to the map for creating various shapes?'],
 ['How can I measure the distance between two points on the map?',
  "Is it possible to display the coordinates of the user's mouse position on the map?",
  'Can I create a split-screen map view with different layers in each section?'],
 ['How can I create a map with multiple layers and controls using this code?',
  'What

In [214]:
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
    collection_name="hypo-questions", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key,
)
doc_ids = [str(uuid.uuid4()) for _ in docs]

In [215]:
question_docs = []
for i, question_list in enumerate(hypothetical_questions):
    question_docs.extend(
        [Document(page_content=s, metadata={id_key: doc_ids[i]}) for s in question_list]
    )

In [216]:
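# The vectorstore holds the entries that get searched: summaries, hypothetical questions, or small chunks, depending on the approach
retriever.vectorstore.add_documents(question_docs)
# The docstore holds the full (long) parent documents, keyed by id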
retriever.docstore.mset(list(zip(doc_ids, docs)))

In [217]:
sub_docs = vectorstore.similarity_search("justice breyer")

In [218]:
print(sub_docs[0].page_content)

What qualifications and background does Judge Ketanji Brown Jackson have that make her a strong nominee for the Supreme Court?


In [223]:
!ollama list

NAME                                                        	ID          	SIZE  	MODIFIED     
codestral:22b-v0.1-q8_0                                     	d805f7d07b03	23 GB 	23 hours ago	
mistral-nemo:12b-instruct-2407-q8_0                         	550a4a7f593a	13 GB 	24 hours ago	
llama3.1:8b-instruct-q8_0                                   	9b90f0f552e7	8.5 GB	24 hours ago	
llama-3-bllossom:8b-q8                                      	d7f671164480	8.5 GB	24 hours ago	
nomic-embed-text:latest                                     	0a109f422b47	274 MB	3 days ago  	
spow12_Qwen2-7B-ko-Instruct-orpo-ver_2.0_wo_chat_Q8_0:latest	b543bb00f540	8.1 GB	4 days ago  	
spow12_Qwen2-7B-ko-Instruct-orpo-ver_2.0_wo_chat-F16:latest 	3713f91a3557	15 GB 	5 days ago  	
joongi007_Ko-Qwen2-7B-Instruct-GGUF:latest                  	7f2cc98a7ef4	15 GB 	5 days ago  	
QuantFactory_ko-gemma-2-9b-it-GGUF:latest                   	8f110e7f7e3c	9.8 GB	5 days ago  	


In [250]:
from langchain import hub
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaLLM



# prompt = hub.pull("rlm/rag-prompt")

# English version of the prompt (the Korean template below is the one in use):
# template = '''
# This is a GeOnPaaS solution developer support chatbot.
# You must generate code using the given (question) and (context).
#
# Please strictly adhere to the following conditions:
# Condition 1. The code must satisfy the syntax of html, javascript, and css.
# Condition 2. When changing the value of odf.Coordinate, pay attention to the value of projection.
# Condition 3. Remove duplicate code.
# Condition 4. The answer must be executable and should not be an explanation.
#
# You must answer in Korean.
#
# (question):
# {question}
#
# (context):
# {context}
#
# Answer:
# '''

template = '''
이것은 GeOnPaaS 솔루션 개발자 지원 챗봇입니다.
주어진 (question)과 (context)을 사용하여 코드를 생성해야 합니다.

다음 조건을 엄격히 준수하십시오:
조건 1. 코드는 html, javascript, css의 문법을 만족해야 합니다.
조건 2. odf.Coordinate의 값을 변경할 때, projection 값을 주의하십시오.
조건 3. 중복 코드를 제거하십시오.
조건 4. 답변은 실행 가능해야 하며 설명이 되어서는 안 됩니다.

답변은 한국어로 작성해야 합니다.

(question):
{question}

(context):
{context}

Answer:
'''

prompt = PromptTemplate(template=template,
                        input_variables = ['question', 'context'])

# Caution when choosing a model: instruct models are not well suited to code generation
# llm = ChatOpenAI(model = "gpt-4o")
# llm = OllamaLLM(model = "codestral:22b-v0.1-q8_0")
# llm = OllamaLLM(model = "mistral-nemo:12b-instruct-2407-q8_0",
#                 temperature=0.0)
# llm = OllamaLLM(model = "llama-3-bllossom:8b-q8",
#                 temperature=0.0)
llm = OllamaLLM(model = "QuantFactory_ko-gemma-2-9b-it-GGUF",
                temperature=0.0)

final_chain = (
    {"question": RunnablePassthrough(),
    "context": retriever}
    | prompt
    | llm
    | StrOutputParser()
)
# final_chain.invoke("배경지도 설정 코드")
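The `{context}` slot receives whatever the retriever step produces. When that is a list of documents, its string form includes the `metadata` and the object wrapper; joining only `page_content` keeps the prompt to the text itself. The sketch below shows the difference with a hypothetical `Doc` dataclass standing in for LangChain's `Document` and a shortened template; `format_docs` is likewise an illustrative helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Join only the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


TEMPLATE = "(question):\n{question}\n\n(context):\n{context}\n\nAnswer:\n"
retrieved = [
    Doc("/* 배경지도 컨트롤 생성 */", {"source": "example.txt"}),
    Doc("var basemapControl = new odf.BasemapControl();", {"source": "example.txt"}),
]

raw_prompt = TEMPLATE.format(question="배경지도 설정 코드", context=retrieved)  # the list is rendered via its repr
clean_prompt = TEMPLATE.format(question="배경지도 설정 코드", context=format_docs(retrieved))
```

In a LangChain chain, the same effect is usually obtained by piping the retriever into such a function before the prompt.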

In [251]:
retriever.invoke("배경지도 설정 코드")

[Document(metadata={'source': './files/train-geon-example-double_org.txt'}, page_content='지도생성\n지도를 생성 기능 제공 샘플코드\n"""\n<!DOCTYPE HTML>\n<html>\n<head>\n\t<meta charset="utf-8">\n\t<link href="https://developer.geon.kr/js/odf/odf.css" rel="stylesheet">\n\t<script type="text/javascript" src="https://developer.geon.kr/js/odf/odf.min.js"></script>\n</head>\n<body>\n\t<div id="map" class="odf-view"></div>\n</body>\n<script>\n\t/* 맵 타겟 */\n\tvar mapContainer = document.getElementById(\'map\');\n\n\t/* 맵 중심점 */\n\tvar coord = new odf.Coordinate(199312.9996,551784.6924);\n\n\t/* 맵객체 옵션 (외부에서 사용할 때는 proxyURL, proxyParam 옵션이 필요합니다.) */\n\tvar mapOption = {\n\t\tcenter : coord,\n\t\tzoom : 11,\n\t\tprojection : \'EPSG:5186\',\n\t\t//proxyURL: \'proxyUrl.jsp\',\n\t\t//proxyParam: \'url\',\n\t\tbaroEMapURL : \'https://geon-gateway.geon.kr/map/api/map/baroemap\',\n\t\tbaroEMapAirURL : \'https://geon-gateway.geon.kr/map/api/map/ngisair\',\n\n\t\tbasemap : {\n\t\t\tbaroEMap : [\'eMapBasic\', \'eMapAI

In [252]:
query = "배경지도 설정 코드"
print(final_chain.invoke(query))

This HTML and JavaScript code snippet demonstrates how to create and synchronize multiple maps using the Open Data Framework (ODF) library. 

Here's a breakdown of what the code does:

**1. HTML Structure:**

- The HTML sets up four divs (`map1`, `map2`, `map3`, `map4`) where the maps will be displayed.
- It also includes a button labeled "동기화여부 변경" (which translates to "Synchronization Status Change") that controls the synchronization between the maps.

**2. JavaScript Logic:**

- **Map Initialization:**
    -  `coord`: Defines a coordinate object for the initial map center.
    - `mapOption`: Contains configuration options for each map:
        - `center`: The initial map center (defined by `coord`).
        - `zoom`: Initial zoom level (11).
        - `projection`: Map projection (`EPSG:5186`).
        - `baroEMapURL`, `baroEMapAirURL`: URLs for accessing basemap tiles from the Korean Geospatial Information Authority (KIGAM).
        - `basemap`: Specifies the available basemaps to 

In [244]:
query = "배경지도 설정 코드에 스와이프 기능 추가해줘"
print(final_chain.invoke(query))

It appears that you have provided two code snippets, both of which are HTML files with JavaScript code. The first snippet is a sample code for managing user-defined background maps (webGL vector tile layers) and the second snippet is a basic map example.

To answer your question, I'll provide a brief explanation of how to use the `basemapControl` object in the context of the provided code snippets.

**Setting up the Basemap Control**

In both code snippets, you need to create an instance of the `BasemapControl` class and set it to the map object using the `setMap()` method:
```javascript
var basemapControl = new odf.BasemapControl();
basemapControl.setMap(map);
```
**Adding a New Group**

To add a new group to the basemap, use the `setGrp()` method:
```javascript
basemapControl.setGrp('myGrp');
```
This will create a new group with the name "myGrp".

**Removing a Group**

To remove a group from the basemap, use the `removeGrp()` method:
```javascript
basemapControl.removeGrp('myGrp');
```


In [232]:
query = "스와이프 기능과 분할지도 기능이 포함된 지도를 보여줘"
print(final_chain.invoke(query))

Here are the provided code snippets with proper formatting and explanations:

**1. Swiper Control Sample Code**

This code demonstrates how to use the `SwiperControl` in ODF (Open Data Framework) to control layers on a map.

```html
<!DOCTYPE HTML>
<html>
<head>
	<meta charset="utf-8">
	<link href="https://developer.geon.kr/js/odf/odf.css" rel="stylesheet">
	<script type="text/javascript" src="https://developer.geon.kr/js/odf/odf.min.js"></script>
</head>
<body>
	<div id="map"></div>
	<div style="margin-top: 15px">
		<input type="button" class="onoffOnlyBtn toggle grp1" onclick="changeStrictMode()" value="엄격모드 토글">
	</div>
</body>
<script>

// Map and control initialization
var mapContainer = document.getElementById('map');
var coord = new odf.Coordinate(127.0486, 37.509);
var mapOption = {
	center: coord,
	zoom: 10,
	projection: 'EPSG:4326',
	baroEMapURL: 'https://geon-gateway.geon.kr/map/api/map/baroemap',
	basemap: {
		baroEMap: ['eMapBasic', 'eMapAIR', 'eMapColor', 'eMapWhite']
	}
