This repository has been archived by the owner on Feb 16, 2021. It is now read-only.

WARC AST #5

Closed
BubuAnabelas opened this issue Aug 4, 2018 · 3 comments
Labels
🧘 status/waiting This may go somewhere but needs more information 💬 type/discussion This is a request for comments

Comments

@BubuAnabelas

It'd be great to have an abstract syntax tree for WARC files, since they can become huge and searching through a whole file is tedious.
I would do it myself, but I have no idea where to start.

@wooorm
Member

wooorm commented Aug 5, 2018

Why do you think a syntax tree would solve this?

To start, you could write a specification like mdast, nlcst, or hast.

@BubuAnabelas
Author

Because it would be useful for deriving other formats, such as CDX. Besides, it would help with statistical analysis and with finding important fields (e.g. the first and last WARC-Record-ID), so that more fields can be added continuously without re-parsing the file each time.
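
To make the idea more concrete, here is a rough sketch of what such a tree could look like. It is only an illustration under several assumptions: the node types `root` and `record`, the `fields` property, and the offset-only `position` are hypothetical names loosely modelled on unist and are not part of any existing spec. The parser assumes uncompressed input and ignores folded header lines and repeated field names.

```python
from typing import Any

Node = dict[str, Any]


def parse_warc(data: bytes) -> Node:
    """Parse uncompressed WARC bytes into a unist-like tree (hypothetical node types)."""
    children: list[Node] = []
    pos = 0
    while pos < len(data):
        # Skip the CRLF pairs that separate one record from the next.
        if data.startswith(b"\r\n", pos):
            pos += 2
            continue
        # The record header ends at the first blank line.
        head_end = data.index(b"\r\n\r\n", pos)
        version, *field_lines = data[pos:head_end].decode("utf-8").split("\r\n")
        fields: dict[str, str] = {}
        for line in field_lines:
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
        body_start = head_end + 4
        length = int(fields["Content-Length"])
        children.append({
            "type": "record",
            "version": version,
            "fields": fields,
            # Byte offsets (a simplification of unist positions) let later
            # tools seek straight to a record instead of re-parsing the file.
            "position": {"start": pos, "end": body_start + length},
        })
        pos = body_start + length
    return {"type": "root", "children": children}


def make_record(record_id: str, body: bytes) -> bytes:
    """Build a tiny synthetic record; used only to exercise the sketch."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Record-ID: {record_id}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("utf-8") + body + b"\r\n\r\n"


sample = make_record("<urn:uuid:1>", b"hello") + make_record("<urn:uuid:2>", b"world!")
tree = parse_warc(sample)
ids = [record["fields"].get("WARC-Record-ID") for record in tree["children"]]
print(ids[0], ids[-1])  # first and last WARC-Record-ID
```

With the tree in hand, the first and last WARC-Record-ID are just the first and last children, and a CDX-like index could presumably be derived by walking `children` and reading `fields` and `position`.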

I'll take a look at those repos and start my own. Thanks!

@wooorm wooorm added 🧘 status/waiting This may go somewhere but needs more information 💬 type/discussion This is a request for comments labels Aug 11, 2019
@ChristianMurphy
Member

Thanks for starting the discussion @BubuAnabelas!
We're in the process of unifying ideas with discussions (unifiedjs/collective#44).
If you'd like to continue this thread or start a new one, https://github.com/syntax-tree/unist/discussions/categories/ideas will be the home for ideas going forward.
