Add text extraction? #5

oaustegard · 2024-06-24T19:00:38Z

While this is really neat and a quick shortcut, it would be even better if you could add a text extractor, since Claude's OCR doesn't seem to capture all the text in a screenshot. Today I use a bookmarklet for this (it converts the page to Markdown using turndown.js), but an extension like yours would definitely be more convenient.

polywock · 2024-06-24T23:49:12Z

Hello. This seems like a useful feature. How about a mode that includes document.body.innerHTML as an attached file? Is conversion to Markdown necessary?

oaustegard · 2024-06-25T03:41:13Z

The challenge with the full HTML is that it can get very lengthy and include a lot of CSS and JavaScript that is just noise to an LLM. innerText could perhaps work.
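To make the noise concern concrete: if raw HTML were attached anyway, `<script>` and `<style>` blocks could at least be dropped first. The sketch below is a rough, string-only approximation under stated assumptions. `stripNoise` is a hypothetical helper, not part of this extension or of Turndown, and regexes are not a real HTML parser. Inside a live page, reading `innerText` (as suggested above) would typically be more robust.

```typescript
// Hypothetical helper (an assumption, not the extension's code): removes
// HTML comments plus <script>, <style> and <noscript> blocks from an HTML
// string. Regex-based, so unusual markup can slip through.
function stripNoise(html: string): string {
  return (
    html
      // Drop HTML comments first so commented-out tags are not matched below.
      .replace(/<!--[\s\S]*?-->/g, "")
      // Drop script/style/noscript elements, including their contents.
      // \1 makes the closing tag match the opening tag's name.
      .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
  );
}
```

Comparing `html.length` with `stripNoise(html).length` on a typical page gives a quick sense of how much of the payload is CSS/JavaScript rather than content.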

polywock · 2024-06-26T06:13:40Z

I was worried that including Turndown would balloon the extension's size too much, but it's very small.

In this update, I've included a new mode called "Page data". Assuming you're on Chrome/Edge, you can try it out by:

1. Extract packed.zip into a folder.
2. Go to chrome://extensions
3. Enable Developer mode.
4. Click "Load unpacked" and load the extracted folder.

packed.zip

A few issues:

- Sometimes the page data is too large and doesn't fit Claude's context limit. How do you get around this?
- Claude's website in general is very finicky; sometimes there are error messages like "format not supported".
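On the context-limit question, one possible workaround is to split the extracted text into pieces that each fit a budget and attach them separately (or only the first one). The sketch below assumes a character budget as a crude stand-in for Claude's limit, which is actually measured in tokens; `splitIntoChunks` and the choice of budget are hypothetical, for illustration only.

```typescript
// Hypothetical helper: splits extracted page text into chunks of at most
// maxChars characters, preferring paragraph boundaries ("\n\n").
// Characters are only a rough proxy for tokens.
function splitIntoChunks(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const chunks: string[] = [];
  let current = "";

  // Emit the accumulated chunk (if any) and start a new one.
  const flush = (): void => {
    if (current.length > 0) chunks.push(current);
    current = "";
  };

  for (const para of text.split(/\n{2,}/)) {
    if (para.length === 0) continue;
    if (para.length > maxChars) {
      // A single paragraph is over budget: emit what we have, then hard-split it.
      flush();
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    // Re-join paragraphs with the separator that split() removed.
    const candidate = current.length === 0 ? para : current + "\n\n" + para;
    if (candidate.length > maxChars) {
      flush();
      current = para;
    } else {
      current = candidate;
    }
  }
  flush();
  return chunks;
}
```

Splitting on paragraph boundaries keeps Markdown blocks (headings, lists, code) intact where possible; only paragraphs that alone exceed the budget are cut mid-text.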

oaustegard · 2024-06-26T11:23:57Z

My bookmarklet is far less advanced than your extension: it merely opens the text in a new window, where I can copy and paste it, selectively if needed. I also do some guesswork to extract only the body/main content of the page; that doesn't always work, though. See remEls in https://github.com/oaustegard/bookmarklets/blob/main/markdown_body.js for the logic, which could certainly be improved. (Most of these bookmarklets were generated with ChatGPT or Claude.)

polywock added the enhancement (New feature or request) label on Jun 24, 2024.
