Hi,
many thanks for releasing this great crawler! In particular, the number of supported German publishers is amazing. I am planning to collect some articles for LM pretraining.
I opened this issue because I couldn't find an example in the docs: what is the recommended way to export articles to, e.g., a JSONL file? I could think of adding a to_json function to an Article object and then writing the result to a file 🤔
But it would be great if the documentation could also cover exporting articles :)
Many thanks in advance!
I think it would be good for Fundus to offer support for serializing articles. We'd need some helper methods to serialize/deserialize articles. JSON seems like a good fit since it is human-readable. @addie9800 what do you think?
I definitely agree, especially since we already use JSON to represent the parsed articles in our tests. @MaxDall has also already started working on an implementation.
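Until built-in support lands, the to_json idea from the question can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Article` dataclass here is a hypothetical stand-in with made-up fields (`title`, `body`, `publisher`), not the Fundus `Article` class, and `write_jsonl` / `read_jsonl` are helper names invented for this example.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterable, Iterator


@dataclass
class Article:
    # Hypothetical stand-in for illustration only; NOT the Fundus Article class.
    title: str
    body: str
    publisher: str

    def to_json(self) -> str:
        # ensure_ascii=False keeps umlauts readable; json.dumps escapes
        # embedded newlines, so each record stays on a single line.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, line: str) -> "Article":
        return cls(**json.loads(line))


def write_jsonl(articles: Iterable[Article], path: Path) -> int:
    """Write one JSON object per line and return the number of records."""
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for article in articles:
            handle.write(article.to_json() + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> Iterator[Article]:
    """Yield articles back from a JSONL file, skipping blank lines."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield Article.from_json(line)
```

Because `write_jsonl` accepts any iterable, it could be fed directly from a crawl loop so articles are written as they arrive instead of being collected in memory first. How the real article fields map to JSON-friendly values is left open here.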