The code here is wrong: it drops the text at the most important position, corpus_id. #35
Comments
What do you mean? How should it be changed?
On my side I use the faiss vector store and a different embedding model, so I can no longer copy out the exact code. The change is roughly as follows:

    if self.num_expand_context_chunk > 0:
        new_reference_results = []
        for corpus_id, hit_chunk in hit_chunk_dict.items():
            expanded_reference = self.sim_model.corpus.get(corpus_id - 1, '') + hit_chunk
            for i in range(0, self.num_expand_context_chunk+1, 1):
                expanded_reference += self.sim_model.corpus.get(corpus_id + i , '')
            new_reference_results.append(expanded_reference)
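For anyone who wants to try the idea outside the class, here is a minimal standalone sketch. It assumes the corpus is a plain dict from integer chunk id to text and that the hit's text is the chunk stored at its corpus_id. The names `expand_hit` and `num_expand` are made up for illustration and are not part of chatpdf.py.

```python
def expand_hit(corpus, corpus_id, num_expand):
    """Concatenate the previous chunk, the hit chunk itself, and the next
    `num_expand` chunks, in document order.

    `corpus` is assumed to be a dict {int chunk id: str text}; ids that
    fall outside the corpus contribute an empty string.
    """
    # Window: one chunk before the hit through num_expand chunks after it.
    # The hit (corpus_id) is inside the window, so it is included exactly once.
    window = range(corpus_id - 1, corpus_id + num_expand + 1)
    return ''.join(corpus.get(i, '') for i in window)


corpus = {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
print(expand_hit(corpus, 1, 1))  # ABC
```

Building the window as a single range keeps the hit chunk inside it by construction, so it can be neither dropped nor added twice.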
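Also, when the document is short, this mechanism causes a lot of repeated content in the prompt. For example, when pulling context around corpus_id=3, it may fetch chunks that are already in the list of relevant results, so the reference text supplied to the prompt ends up containing many duplicate chunks.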
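One possible way to avoid the repetition, sketched under the same assumption that the corpus is a dict from integer chunk id to text: remember which chunk ids have already been emitted and skip them when expanding later hits. The function name `expand_hits_dedup` and its parameters are hypothetical, not something that exists in the project.

```python
def expand_hits_dedup(corpus, hit_ids, num_expand):
    """Expand each hit with its neighbours, emitting every chunk at most once.

    `corpus` is assumed to be a dict {int chunk id: str text} and `hit_ids`
    the retrieved chunk ids in ranking order.
    """
    seen = set()
    references = []
    for corpus_id in hit_ids:
        window = range(corpus_id - 1, corpus_id + num_expand + 1)
        # Keep only chunks that exist and have not been used by an earlier hit.
        fresh = [i for i in window if i in corpus and i not in seen]
        seen.update(fresh)
        if fresh:
            references.append(''.join(corpus[i] for i in fresh))
    return references


corpus = {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'}
print(expand_hits_dedup(corpus, [3, 2], 1))  # ['CDE', 'B']
```

The trade-off is that a later hit can shrink to only its unseen neighbours (here the hit at id 2 contributes just 'B'), and the remaining pieces may not be contiguous text. Merging overlapping windows into one span would be an alternative.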
OK, I'll fix it.
Awesome!
fixed. |
At line 425 of chatpdf.py, when the retrieved content is expanded with its preceding and following chunks, the hit chunk itself is dropped.
expanded_reference += self.sim_model.corpus.get(corpus_id + i + 1, '')