feat: add JsonStorage and ability to chat with References #127

Merged
merged 19 commits into from
Jun 13, 2023

Conversation

gjreda
Collaborator

@gjreda gjreda commented Jun 9, 2023

fixes #102 and fixes #82

Note that you will need to upload some PDFs and rerun PDF ingestion first. That will create a references.json which is needed for chat interactions.

There is more backend work to do related to chat, but I've captured that in other issues (see comment below). I didn't want this PR to continue to increase in scope and also think it'd be good to have this piece in so that we can begin to demo it.

❯ yarn python

❯ ./src-tauri/bin/python/main-aarch64-apple-darwin/main chat --text "What can you tell me about hidden feedback loops in machine learning?" | jq
[
  {
    "index": 0,
    "text": "Hidden feedback loops in machine learning refer to situations where two systems indirectly influence each other through the world, leading to changes in behavior that may not be immediately visible. These loops may exist between completely disjoint systems and can make analyzing the effect of proposed changes extremely difficult, adding cost to even simple improvements. It is recommended to look carefully for hidden feedback loops and remove them whenever feasible."
  }
]
  • Add tests for chat.py
  • Add tests for ranker.py
  • Add tests for storage.py
  • Write response to stdout
  • Add source filename or title to response (or create as new issue)
  • Create issue to capture idea of "named threads"
  • How does this handle "chat history?"
  • How does this handle statements or non-questions?

@gjreda
Collaborator Author

gjreda commented Jun 12, 2023

I have a preference for small PRs and it feels like this one is already getting pretty large. To move this PR forward, I've created three follow-up issues for myself.

  1. Add source Reference metadata to Chat response #143
  2. Capture chat history and create "named threads" for Chat interactions #144
  3. Move References Storage Path into an environment variable or settings #145

I think there's an open question on how and whether we want to handle chat interactions that are not questions, but IMO that decision would be better informed after spending some time with this implementation.

@gjreda gjreda marked this pull request as ready for review June 12, 2023 21:00
@gjreda gjreda requested review from sehyod, cguedes and sergioramos and removed request for sehyod June 12, 2023 21:00
def get_top_n(self, query: str, limit: int = 5) -> list:
    """
    Rank documents based on input text
    :param input_text: str
Collaborator

nit: the actual param name is query, not input_text

Comment on lines +14 to +15
with open(self.filepath, 'r') as f:
    data = json.load(f)
Collaborator

Should we have a try/except here, in case the file path is wrong or the file content is invalid?
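
A minimal sketch of what such a guard might look like, assuming a `JsonStorage` class that keeps its path in `self.filepath` as in the snippet above. The `load` method name, the `StorageError` exception, and the error messages are hypothetical and not taken from this PR.

```python
import json


class StorageError(Exception):
    """Hypothetical error raised when the references file cannot be read."""


class JsonStorage:
    """Illustrative stand-in for the storage class discussed in this PR."""

    def __init__(self, filepath: str):
        self.filepath = filepath

    def load(self):
        # Hypothetical method name; the reviewed snippet only shows
        # the open() and json.load() lines.
        try:
            with open(self.filepath, 'r') as f:
                return json.load(f)
        except FileNotFoundError as e:
            # The path is wrong, or PDF ingestion has not created the file yet.
            raise StorageError(f"References file not found: {self.filepath}") from e
        except json.JSONDecodeError as e:
            # The file exists but does not contain valid JSON.
            raise StorageError(f"Invalid JSON in {self.filepath}: {e}") from e
```

Raising a single storage-specific exception would let the caller (for example the chat command) report one clear message instead of a raw traceback, though whether to fail or fall back to an empty store is a design choice left open here.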

Collaborator

@cguedes cguedes left a comment

Everything looks good. My only comment is about the chat return type being an array rather than an object that contains an array.

@@ -13,9 +14,15 @@
      "items": {
        "$ref": "#/definitions/RewriteChoice"
      }
    },
    "chat": {
      "type": "array",
Collaborator

Shouldn't we return an object at the root instead of an array? That would be better for evolvability.
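
To illustrate the evolvability point, here is a hedged sketch comparing the two shapes. The `results` key, the `sources` field, and the `wrap_chat_response` helper are hypothetical names chosen for illustration; the PR does not specify what the root object would look like.

```python
import json


def wrap_chat_response(choices: list) -> dict:
    """Wrap the list of chat choices in a root object (hypothetical layout)."""
    return {"results": choices}


# Current shape from the PR description: a bare array at the root.
# The text is shortened here for readability.
array_response = [{"index": 0, "text": "Hidden feedback loops ..."}]

# Object at the root: new top-level fields can be added later
# without changing what existing clients read under "results".
object_response = wrap_chat_response(array_response)
object_response["sources"] = []  # hypothetical field added in a later version

print(json.dumps(object_response, indent=2))
```

With a bare array, adding response-level metadata later would mean changing the root type, which breaks any client that iterates over the response directly.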

@sergioramos sergioramos changed the title from "feat: Add JsonStorage (#102) and ability to chat with References (#82)" to "feat: add JsonStorage and ability to chat with References" Jun 13, 2023
@sergioramos sergioramos merged commit 58b1c75 into main Jun 13, 2023
9 checks passed
@sergioramos sergioramos deleted the 102-backend-data-storage-for-reference-content branch June 13, 2023 09:36
Development

Successfully merging this pull request may close these issues.

Backend data storage for reference content
Add ability to chat with your Reference PDFs [sidecar]
4 participants