use streaming html parser #40

Closed
seanbreckenridge opened this issue Feb 4, 2023 · 3 comments
@seanbreckenridge
Owner

Loading the whole HTML document into memory is pretty expensive memory-wise. Could either use a streaming HTML parser, or maybe split the file before loading it?
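For illustration, here is a minimal sketch of the streaming idea using only Python's standard-library `html.parser`, which accepts input incrementally through `feed()`. The `LinkCollector` class, the `stream_links` function, and the choice to extract `href` values are hypothetical and only there to show the pattern; this is not what the project does, and whether it saves enough memory on a very large export is untested here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical handler: records href values of <a> tags as they are parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def stream_links(fileobj, chunk_size=64 * 1024):
    """Yield hrefs from a text-mode file object without reading it all at once."""
    parser = LinkCollector()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # feed() parses what it can and keeps any incomplete tag buffered
        # until the next chunk arrives.
        parser.feed(chunk)
        # Hand over what was found so far, then drop it so the list stays small.
        yield from parser.links
        parser.links.clear()
    # close() forces processing of any remaining buffered input.
    parser.close()
    yield from parser.links
```

A caller would open the file in text mode with an explicit encoding (for example `open(path, encoding="utf-8")`) and loop over `stream_links(f)`, so only one chunk plus any unfinished tag is held by the parser at a time.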

@seanbreckenridge
Owner Author

Tried using lxml for this, haven't been able to figure it out yet.

09307da

@seanbreckenridge
Owner Author

If anyone else has libraries they'd recommend here, I'm very open to suggestions. None of my experiments have gone well so far.

@seanbreckenridge
Owner Author

Ended up just using an HTML tokenizer in Go.

This is all legacy anyways, so I don't know if anyone else is ever even going to use this; it's more for my own usage.
