community[patch]: Fixed duplicate input id issue in clarifai vectorstore (#14914)

- **Description:**
This PR fixes duplicate input IDs in the Clarifai vectorstore class,
which occurred when ingesting more documents than the batch size.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
mogith-pn and baskaryan committed Dec 20, 2023
1 parent 5642132 commit c53fab6
Showing 1 changed file with 8 additions and 6 deletions.
14 changes: 8 additions & 6 deletions libs/community/langchain_community/vectorstores/clarifai.py
@@ -116,21 +116,23 @@ def add_texts(
 batch_metadatas = (
     metadatas[idx : idx + batch_size] if metadatas else None
 )
+if ids is None:
+    batch_ids = [uuid.uuid4().hex for _ in range(len(batch_texts))]
+else:
+    batch_ids = ids[idx : idx + batch_size]
 if batch_metadatas is not None:
     meta_list = []
     for meta in batch_metadatas:
         meta_struct = Struct()
         meta_struct.update(meta)
         meta_list.append(meta_struct)
-if ids is None:
-    ids = [uuid.uuid4().hex for _ in range(len(batch_texts))]
 input_batch = [
     input_obj.get_text_input(
-        input_id=ids[id],
-        raw_text=inp,
-        metadata=meta_list[id] if batch_metadatas else None,
+        input_id=batch_ids[i],
+        raw_text=text,
+        metadata=meta_list[i] if batch_metadatas else None,
     )
-    for id, inp in enumerate(batch_texts)
+    for i, text in enumerate(batch_texts)
 ]
 result_id = input_obj.upload_inputs(inputs=input_batch)
 input_job_ids.extend(result_id)
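
The effect of the change can be reproduced without any Clarifai calls. The following is a minimal standalone sketch, not the library code: `assign_ids_old` and `assign_ids_fixed` are hypothetical helper names that only mimic how the loop picked an ID for each text before and after this commit. In the old version the index restarts at 0 in every batch while `ids` stays the full (or first-batch) list, so every batch reuses the same leading IDs.

```python
import uuid
from typing import List, Optional


def assign_ids_old(
    texts: List[str], batch_size: int, ids: Optional[List[str]] = None
) -> List[str]:
    """Mimic the pre-fix logic: ids[i], with i restarting at 0 per batch."""
    assigned: List[str] = []
    for idx in range(0, len(texts), batch_size):
        batch_texts = texts[idx : idx + batch_size]
        if ids is None:
            # Generated once, on the first batch only; later batches reuse it.
            ids = [uuid.uuid4().hex for _ in range(len(batch_texts))]
        assigned.extend(ids[i] for i, _ in enumerate(batch_texts))
    return assigned


def assign_ids_fixed(
    texts: List[str], batch_size: int, ids: Optional[List[str]] = None
) -> List[str]:
    """Mimic the post-fix logic: a separate batch_ids list for each batch."""
    assigned: List[str] = []
    for idx in range(0, len(texts), batch_size):
        batch_texts = texts[idx : idx + batch_size]
        if ids is None:
            batch_ids = [uuid.uuid4().hex for _ in range(len(batch_texts))]
        else:
            # Slice the caller's ids with the same window as the texts.
            batch_ids = ids[idx : idx + batch_size]
        assigned.extend(batch_ids[i] for i, _ in enumerate(batch_texts))
    return assigned


texts = [f"doc {n}" for n in range(5)]
old = assign_ids_old(texts, batch_size=2)
new = assign_ids_fixed(texts, batch_size=2)
print(len(set(old)), len(set(new)))  # 2 unique IDs before the fix, 5 after
```

The same duplication shows up when the caller passes explicit `ids`: the old indexing returns the first `batch_size` IDs over and over, while the fixed version returns the list unchanged.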