-
Notifications
You must be signed in to change notification settings - Fork 10.9k
Add .jsonl file extension #4848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
.jsonl is suggested by https://jsonlines.org/ as file extension for JSON lines files.
Codecov Report
@@ Coverage Diff @@
## master #4848 +/- ##
==========================================
- Coverage 88.71% 83.18% -5.54%
==========================================
Files 162 162
Lines 10740 10740
Branches 1834 1832 -2
==========================================
- Hits 9528 8934 -594
- Misses 939 1546 +607
+ Partials 273 260 -13
|
Looks good, but it's missing a mention in the corresponding docs ( |
@@ -154,6 +154,7 @@ | |||
FEED_EXPORTERS_BASE = { | |||
'json': 'scrapy.exporters.JsonItemExporter', | |||
'jsonlines': 'scrapy.exporters.JsonLinesItemExporter', | |||
'jsonl': 'scrapy.exporters.JsonLinesItemExporter', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
+1 to add .jsonl as an auto-detected extension. I'm not so sure about recommending it by default instead of .jl in examples though; it might be a more complex discussion. What do you think about splitting these two changes (supporting .jsonl vs switching to .jsonl), so that they can be discussed & merged separately? |
I personally don't see a reason to use .jl extension in examples. jsonlines.org suggests .jsonl file extension. .jl file extension is used by julia language for its scripts and is recognized as such by vs code editor default setup. |
Thanks for the mention @kmike. I'm personally not married to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️ given @pablohoffman’s stance and the Julia argument
docs/topics/feed-exports.rst
Outdated
* Value for the ``format`` key in the :setting:`FEEDS` setting: ``jsonlines`` | ||
or ``jsonl`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slight preference towards leaving this as it was, even if we support jsonl
and jl
as aliases.
But, if we add it here, we should also add jl
for completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to keep it without alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your reasons to keep just jsonlines
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must give the freedom to users to use whatever file extension they wish.
We should recognize file extensions of JSON Lines files, to use JSON Lines format automatically when users use any of those file extensions.
We can ask users to set "format": "jsonlines"
if they want files with arbitrary extensions to use JSON Lines format.
I believe we happen to support "format": "jl"
and "format": "jsonl"
because that makes our code simpler (otherwise we would need to split FEED_EXPORTERS
, e.g. have a setting mapping MIME types to exporters, and a separate setting mapping file extensions to MIME types), not because we want to allow shorter, less-readable setting definitions.
So I would call the fact that we support "format": "jl"
and "format": "jsonl"
an implementation detail, and by not documenting it, it could be argued that we are free to drop support for it without backward-compatibility concerns if that makes sense in the future.
I'm fine with switching to .jsonl by default. |
.jl support is still available in this PR though intentionally not mentioned in the docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks good to me as-is; +1 to merge after resolving merge conflicts.
@tmpbook it has a custom icon because VS Code thinks it's https://code.visualstudio.com/docs/languages/julia Which is an argument in favor of using |
thanks |
.jsonl is suggested by https://jsonlines.org/ as file extension for JSON lines files.