Added stats and orphan count #1

Open
wants to merge 1 commit into
base: main
from

Conversation

RobertHH-IS

Some ideas on additional stats and paragraph charts

Added stats and orphan count
@gkamradt
Owner

gkamradt commented Dec 9, 2023

Hey, this is cool! Thanks for putting it up.

A few thoughts:

  • I put the intro at the top for new folks who need a brief on what this tool is, and I want to keep it there
  • I want to keep the colors from the default text as high as possible on the page so people see them when they first land on it
  • The chart is visually interesting but I'm unsure what it is telling me
  • I like the "Chunks not ending in a paragraph split" stat, but it bakes in the assumption that a chunk should end at a paragraph split, which isn't always the case (especially for code)
  • The min/max chunk ratio is fun but doesn't add much value on top of the min/max chunk size

Edits to keep

  • The min/max chunk size

Thanks again for putting up the PR. This tool is a balance of information density and storytelling, so I'm trying to toe the line with that.

@RobertHH-IS
Author

Agreed on all points. Keep the intro in there; I was just reducing context size and forgot that I had taken it out. :-)
The chart simply maps the size of each paragraph in the uploaded text. You can see at a glance how big the context needs to be to fit the biggest paragraph, or whether there are outlier paragraphs that are fine to split, etc. The chart still needs to be labeled.
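For reference, here is a rough sketch of the data behind that chart. It is not the code from this PR: the `paragraph_sizes` helper is a hypothetical name, and it assumes paragraphs are separated by blank lines and that size is measured in characters.

```python
import re


def paragraph_sizes(text: str) -> list[int]:
    """Return the character length of each non-empty paragraph.

    Assumption: a paragraph break is one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text)
    return [len(p.strip()) for p in paragraphs if p.strip()]


sample = (
    "Short one.\n\n"
    "A much longer paragraph that would dominate the chart.\n\n"
    "Tiny."
)
sizes = paragraph_sizes(sample)

# The largest paragraph is the smallest chunk size that keeps every
# paragraph whole; anything far above the rest is an outlier.
largest = max(sizes)
```

Plotting `sizes` as a bar chart, one bar per paragraph, gives the view described above.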
Agreed that you sometimes want to split paragraphs, but the stat shows how many times you are doing so. The min/max ratio helps you locate "orphan" chunks.
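To make the stats concrete, here is a minimal sketch of how they could be computed. It is an illustration, not the implementation in this PR: `ChunkStats` and `chunk_stats` are hypothetical names, the splitter is a plain fixed-size character splitter, and a chunk counts as ending at a paragraph split only if it ends right after a `\n\n` or at the end of the text.

```python
import re
from dataclasses import dataclass


@dataclass
class ChunkStats:
    min_size: int
    max_size: int
    min_max_ratio: float          # near 0 hints at an "orphan" chunk
    not_ending_at_paragraph: int  # chunks cut off mid-paragraph


def chunk_stats(text: str, chunk_size: int) -> ChunkStats:
    """Compute size stats for fixed-size character chunks of `text`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not text:
        raise ValueError("text must be non-empty")

    starts = range(0, len(text), chunk_size)
    ends = [min(s + chunk_size, len(text)) for s in starts]
    sizes = [e - s for s, e in zip(starts, ends)]

    # Assumption: valid paragraph boundaries are the offsets right
    # after each "\n\n", plus the end of the text.
    boundaries = {m.end() for m in re.finditer(r"\n\n", text)}
    boundaries.add(len(text))

    return ChunkStats(
        min_size=min(sizes),
        max_size=max(sizes),
        min_max_ratio=min(sizes) / max(sizes),
        not_ending_at_paragraph=sum(1 for e in ends if e not in boundaries),
    )


text = "aaaa\n\nbbbb\n\ncc"
aligned = chunk_stats(text, chunk_size=6)     # chunks line up with paragraphs
misaligned = chunk_stats(text, chunk_size=5)  # chunks cut through paragraphs
```

With `chunk_size=6` every chunk ends on a paragraph boundary, but the last chunk is only 2 characters against a maximum of 6, so the low ratio flags it as an orphan. With `chunk_size=5` the sizes are more even, but two chunks end mid-paragraph.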


2 participants