Geearl/6323 large tables #429
Merged
…crosoft/PubSec-Info-Assistant into ryonsteele/function-autoscale
Update the process flow architecture with latest product naming
assign SP ID for Ci/CD shared
…crosoft/PubSec-Info-Assistant into ryonsteele/function-autoscale
Ryonsteele/function autoscale
…ployment-hf Add check for existing CUA deployment object and remove
…-1.0-release Azure estimation 1.0 release
…rt-and-ux-docs Update deployment troubleshooting and improve UX analysis panel docs
Updating hard link to redirect link for YouTube
Fixed typo and broken image links
…inks Update sample data links in user_experience.md
Remove functions_flow.md and update related documentation
…low-docs Merge pull request #454 from microsoft/geearl/function-flow-doc
Update bug_report.md template with additional instructions and details
Changed Azure Services The following list of Azure Services will be deployed for IA Accelerator, version 0.4 delta: to Azure Services The following list of Azure Services will be deployed for IA Accelerator, version 1.0:
…tant into geearl/6323-large-tables
…Assistant into geearl/6323-large-tables
…Assistant into geearl/6323-large-tables
…ft/PubSec-Info-Assistant into geearl/6323-large-tables
…/microsoft/PubSec-Info-Assistant into geearl/6323-large-tables" This reverts commit c6792d7, reversing changes made to 4fdc07d.
…Assistant into geearl/6323-large-tables
ryonsteele approved these changes Jan 18, 2024
lukasvalach added a commit to lukasvalach/PubSec-Info-Assistant that referenced this pull request Apr 8, 2024
* Merge pull request microsoft#429 from microsoft/geearl/6323-large-tables
  Geearl/6323 large tables
* Resolve function debug issue and add logic for multiple table spans
* Update deployment.md
  Updates on Setting the right tenant if you're part of multiple tenant. Otherwise deployment will fail.
* Bump fastapi from 0.103.2 to 0.109.1 in /app/enrichment
  Bumps [fastapi](https://github.com/tiangolo/fastapi) from 0.103.2 to 0.109.1.
  - [Release notes](https://github.com/tiangolo/fastapi/releases)
  - [Commits](tiangolo/fastapi@0.103.2...0.109.1)
  ---
  updated-dependencies:
  - dependency-name: fastapi
    dependency-type: direct:production
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* gov changes
* gov changes
* gov changes
* new app service app setting and readme update
* url update
* Pushing out temp fixes until python sdk is resolved.
* Remove media_service and avam modules
* script fix
* updated link
* Remove base_url breaking param
* Resolve issue with new gov logic on aoai endpoint resolution
* Merge pull request microsoft#523 and microsoft#530 from vNext-Dev for large table fixes
* Update app.py
  added `.lower()` to ensure the str read is in correctly converted to a bool.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dayland <dayland@microsoft.com>
Co-authored-by: dayland <48474707+dayland@users.noreply.github.com>
Co-authored-by: avidunixuser <avidunixuser@users.noreply.github.com>
Co-authored-by: ryonsteele <ryonsteele@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Danimal <dbiscup@microsoft.com>
Co-authored-by: Brandon Rohrer <brandon.rohrer@outlook.com>
Co-authored-by: Nehemiah Kuhns <85817913+nhwkuhns@users.noreply.github.com>
This PR relates to GitHub issues #410 and #353.
The root cause is that we sequentially add paragraphs to a chunk until we hit the max token limit, 750 by default. With regular text, when we reach a paragraph that would push us over the limit, we break it down sentence by sentence and add the sentences to the chunk until we hit the limit, then add the remaining sentences to the next chunk.
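For reviewers who want the behaviour in concrete form, here is a minimal sketch of the chunking logic as described above. It is not the repository's code: `count_tokens`, `split_sentences`, and `chunk_paragraphs` are hypothetical names, whitespace word counts stand in for the real tokenizer, and the sentence splitter is a deliberately naive regex.

```python
import re

MAX_TOKENS = 750  # default limit named in the description


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def chunk_paragraphs(paragraphs: list[str], max_tokens: int = MAX_TOKENS) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in paragraphs:
        n = count_tokens(para)
        if current_tokens + n <= max_tokens:
            # The whole paragraph still fits in the current chunk.
            current.append(para)
            current_tokens += n
            continue
        # The paragraph would overflow: add it sentence by sentence,
        # starting a new chunk whenever the limit would be exceeded.
        for sentence in split_sentences(para):
            s = count_tokens(sentence)
            if current and current_tokens + s > max_tokens:
                chunks.append(" ".join(current))
                current, current_tokens = [], 0
            current.append(sentence)
            current_tokens += s
    if current:
        chunks.append(" ".join(current))
    return chunks


example = chunk_paragraphs(["a b c.", "d e. f g. h i."], max_tokens=5)
```

Note that in this sketch a single "sentence" longer than `max_tokens` still lands in one chunk, so the limit is only as good as the sentence splitter.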
This doesn't work for tables.
When a table spans 2 pages, the code that breaks it down by sentence doesn't work: it sees the whole table as a single sentence. The outcome is a single chunk holding the tables from each page. For the user, a RAG request that returns one of these chunks fails because the chunk is too large.
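The failure mode can be reproduced in a few lines. This is an illustrative sketch only: the table text, the regex splitter, and the word-count tokenizer are all assumptions, not the project's actual implementation.

```python
import re

MAX_TOKENS = 750  # default limit named in the description


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Hypothetical table flattened to text: 300 rows, no sentence punctuation.
table_text = "\n".join(
    f"| row {i} | value {i} | note {i} |" for i in range(300)
)

pieces = split_sentences(table_text)
# Whitespace word count stands in for the real tokenizer (assumption).
largest_piece_tokens = max(len(p.split()) for p in pieces)

print(len(pieces), largest_piece_tokens > MAX_TOKENS)
```

Because the table contains no sentence-ending punctuation, the splitter returns it as one piece, and that piece alone is far over the limit.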
The change includes the following: