Add text extraction? #5
Labels
enhancement
New feature or request
Comments
Hello. Seems like a useful feature. How about a mode that includes document.body.innerHTML as an attached file? Is conversion to markdown necessary?
|
The challenge with the full HTML is that it can get very lengthy and include a
lot of CSS and JavaScript that is just noise to an LLM. innerText could
work, perhaps.
…On Mon, Jun 24, 2024 at 7:49 PM polywock ***@***.***> wrote:
Hello. Seems like a useful feature. How about a mode that includes
document.body.innerHTML as an attached file? Is conversion to markdown
necessary?
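To make the "noise" point concrete, here is a rough sketch of stripping script, style, and comment blocks out of an HTML string before attaching it. The function name `stripNoise` is hypothetical and not part of the extension or the bookmarklet. It uses regular expressions only to keep the example short; regexes are not a real HTML parser, so in a browser a DOM-based approach would be more reliable.

```javascript
// Hypothetical helper (not from the extension): remove the parts of an HTML
// string that rarely help an LLM, i.e. <script> blocks, <style> blocks and
// HTML comments. Regex stripping is an approximation, not a full parser.
function stripNoise(html) {
  return html
    // Drop <script ...>...</script>, including inline JSON and JS.
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "")
    // Drop <style ...>...</style>.
    .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "");
}
```

In a page context this could be called as `stripNoise(document.body.innerHTML)`; whether the result is small enough still depends on the page.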
|
I was worried that including Turndown would balloon the extension's size too much, but it's very small. In this update, I've included a new Mode called "Page data". Assuming you're on Chrome/Edge, you can try it out by...
A few issues
|
My bookmarklet is far less advanced than your extension and merely opens
the text in a new window for me to copy and paste which I can then do
selectively.
I also do some guesswork at extracting only the body/main content of the
page; doesn’t always work though: see remEls in
https://github.com/oaustegard/bookmarklets/blob/main/markdown_body.js for
the logic. Could certainly be improved.
(Most of these bookmarklets were generated with ChatGPT or Claude)
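A minimal sketch of the kind of guesswork described above, assuming a list of selectors to remove. This is not the actual `remEls` logic from the linked bookmarklet; the selector list and the name `extractMainContent` are illustrative assumptions.

```javascript
// Assumed list of elements that are usually not main content.
// The real bookmarklet's remEls list may differ.
const NOISE_SELECTORS = [
  "script", "style", "noscript", "iframe",
  "nav", "header", "footer", "aside",
];

// Pick a likely main-content container, fall back to <body>, and return a
// cleaned clone so the live page is left untouched.
function extractMainContent(doc) {
  const root = doc.querySelector("main, article, [role='main']") || doc.body;
  const clone = root.cloneNode(true); // deep copy
  clone
    .querySelectorAll(NOISE_SELECTORS.join(","))
    .forEach((el) => el.remove());
  return clone;
}
```

In a bookmarklet this would be called as `extractMainContent(document)`, and the clone's `innerHTML` or `innerText` could then be handed to a Markdown converter. Like the original, it will not always guess the right container.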
…On Wed, Jun 26, 2024 at 2:14 AM polywock ***@***.***> wrote:
I was worried that including Turndown will balloon the extension's size
too much, but it's very small.
In this update, I've included a new Mode called "Page data". You can try
it out by...
1. Extract packed.zip into a folder.
2. Go to chrome://extensions
3. Enable Developer mode.
4. Click "Load unpacked" and load the extracted folder.
packed.zip <https://github.com/user-attachments/files/15983186/packed.zip>
A few issues
- Sometimes the page data is too large and doesn't fit Claude's
context limit. How do you get around this?
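One possible way around the size problem, sketched under assumptions: split the page text on paragraph boundaries into chunks that each stay under a character budget, then send them one at a time. The name `splitIntoChunks` and the use of a character count as a stand-in for a token limit are both assumptions; nothing in this thread says the extension or the bookmarklet works this way.

```javascript
// Hypothetical helper: split text into chunks of at most maxChars characters,
// preferring paragraph boundaries (blank lines). Character count is only a
// rough proxy for an LLM's token limit.
function splitIntoChunks(text, maxChars) {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const chunks = [];
  let current = "";
  for (const para of text.split(/\n{2,}/)) {
    // A paragraph longer than the budget is hard-split into pieces.
    for (let i = 0; i < para.length; i += maxChars) {
      const piece = para.slice(i, i + maxChars);
      const candidate = current ? current + "\n\n" + piece : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits in the current chunk
      } else {
        if (current) chunks.push(current);
        current = piece; // start a new chunk
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Selective copy and paste, as described earlier in the thread, is the simpler manual alternative.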
|
While this is really neat and a quick shortcut, it would be even better if you could add a text extractor, since Claude's OCR does not seem to capture all the text in a screenshot. Today I use a bookmarklet for this (converting the page to Markdown using turndown.js), but an extension like yours would definitely be more convenient.