Feature/capture chunks #564

Merged: 24 commits, merged Jun 24, 2024

gecBurton (Collaborator) commented Jun 12, 2024

Context

As a user, I want the Django app to persist the specific text chunk referenced in RAG, so that I know exactly which part of the document has been referenced.

Changes proposed in this pull request

  1. I have renamed the ChatMessage field source_files to old_source_files.
  2. I have created a new model, TextChunk, to capture the relationship between a ChatMessage, a File and the specific text used in RAG.
  3. I have added a new many-to-many field to ChatMessage called source_files which, like the old field, points to File, but goes through TextChunk.
  4. I copy the existing data from old_source_files into source_files.
  5. I have not deleted old_source_files, in case something goes wrong or we want to revert this change in production.
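The change set above can be illustrated with a small, self-contained SQL analogue. This is a sketch under stated assumptions, not the project's code: the PR itself works with Django models (ChatMessage, File and a TextChunk through model), whereas the table and column names below (file, chat_message, chat_message_old_source_files, text_chunk, text) and the helpers copy_old_source_files and source_files are hypothetical. It shows why a through table is useful (each message-to-file link can now carry the chunk of text used in RAG), one way the existing links might be copied across in step 4 (with no chunk text, assuming none was recorded before), and that the old join table is left untouched, as in step 5.

```python
import sqlite3

# Hypothetical, simplified schema. Names are illustrative only and are NOT
# taken from the project's actual Django models or migrations.
SCHEMA = """
CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE chat_message (id INTEGER PRIMARY KEY, text TEXT NOT NULL);

-- Plain many-to-many join table behind the renamed old_source_files field.
CREATE TABLE chat_message_old_source_files (
    id INTEGER PRIMARY KEY,
    chatmessage_id INTEGER NOT NULL REFERENCES chat_message(id),
    file_id INTEGER NOT NULL REFERENCES file(id)
);

-- "Through" table: one row per (message, file, chunk of text used in RAG).
-- Assumption: text is nullable so that migrated rows can have no chunk.
CREATE TABLE text_chunk (
    id INTEGER PRIMARY KEY,
    chat_message_id INTEGER NOT NULL REFERENCES chat_message(id),
    file_id INTEGER NOT NULL REFERENCES file(id),
    text TEXT
);
"""


def copy_old_source_files(conn: sqlite3.Connection) -> int:
    """Copy every old (message, file) link into text_chunk; the old table is kept."""
    cur = conn.execute(
        """
        INSERT INTO text_chunk (chat_message_id, file_id, text)
        SELECT chatmessage_id, file_id, NULL
        FROM chat_message_old_source_files
        ORDER BY id
        """
    )
    return cur.rowcount  # number of links copied


def source_files(conn: sqlite3.Connection, message_id: int) -> list:
    """Files referenced by a message, paired with the chunk text (None if unknown)."""
    return conn.execute(
        """
        SELECT f.name, tc.text
        FROM text_chunk AS tc JOIN file AS f ON f.id = tc.file_id
        WHERE tc.chat_message_id = ?
        ORDER BY tc.id
        """,
        (message_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO file (id, name) VALUES (?, ?)",
    [(1, "policy.pdf"), (2, "notes.docx")],
)
conn.execute("INSERT INTO chat_message (id, text) VALUES (1, 'example question')")
conn.executemany(
    "INSERT INTO chat_message_old_source_files (chatmessage_id, file_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)

copied = copy_old_source_files(conn)

# After the change, new RAG answers can record the exact chunk that was used.
conn.execute(
    "INSERT INTO text_chunk (chat_message_id, file_id, text) VALUES (?, ?, ?)",
    (1, 1, "example chunk text from policy.pdf"),
)
print(copied)
print(source_files(conn, 1))
```

In Django, a copy step like copy_old_source_files would typically live in a data migration (for example one using migrations.RunPython), and keeping the old join table makes that migration straightforward to reverse.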

Guidance to review

Relevant links

Things to check

  • I have added any new ENV vars in all deployed environments
  • I have tested any code added or changed
  • I have run integration tests

gecBurton marked this pull request as draft on June 12, 2024 10:06
gecBurton marked this pull request as ready for review on June 20, 2024 15:01
jamesrichards4 (Contributor) left a comment

LGTM, but I don't think I can review the Django side of things in any detail.

On tying this in with other changes: this is still worth doing, as my short-term plan is to retain Chunks in parts of core-api. Replacing them all the way through the system would be a scary change, and I think we can do it incrementally.

gecBurton merged commit cc1bb0b into main on Jun 24, 2024
7 checks passed
gecBurton deleted the feature/capture-chunks branch on June 24, 2024 09:26
gecBurton added three commits that referenced this pull request on Jun 24, 2024
rachaelcodes pushed four commits that referenced this pull request between Jun 25 and Jun 27, 2024