Add .jsonl file extension #4848

Merged
4 commits merged on Jun 17, 2022


@divtiply
Contributor

.jsonl is the file extension suggested by https://jsonlines.org/ for JSON Lines files.
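As a quick illustration of the format itself: jsonlines.org describes UTF-8 text holding one JSON value per line, with lines separated by \n. A minimal standard-library sketch of writing and reading such a file follows; write_jsonl and read_jsonl are hypothetical helper names, not Scrapy APIs.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write each record as one JSON document per line (hypothetical helper)."""
    # newline="\n" keeps the separator as \n even on Windows
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record
            # always occupies exactly one physical line
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Parse a JSON Lines file into a list, skipping blank lines (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "items.jsonl")
    write_jsonl(demo_path, [{"name": "a"}, {"name": "b"}])
    print(read_jsonl(demo_path))
```

Because every record is self-contained on its own line, a file like this can be appended to or streamed record by record, which is what makes the format convenient for scraped items.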

@codecov

codecov bot commented Oct 17, 2020

Codecov Report

Merging #4848 (28f793e) into master (de0e2cc) will decrease coverage by 5.53%.
The diff coverage is n/a.

❗ Current head 28f793e differs from pull request most recent head e6cb0d0. Consider uploading reports for the commit e6cb0d0 to get more accurate results.

@@            Coverage Diff             @@
##           master    #4848      +/-   ##
==========================================
- Coverage   88.71%   83.18%   -5.54%     
==========================================
  Files         162      162              
  Lines       10740    10740              
  Branches     1834     1832       -2     
==========================================
- Hits         9528     8934     -594     
- Misses        939     1546     +607     
+ Partials      273      260      -13     
Impacted Files                            Coverage Δ
scrapy/settings/default_settings.py       98.76% <ø> (ø)
scrapy/core/http2/stream.py               0.00% <0.00%> (-91.38%) ⬇️
scrapy/core/downloader/handlers/http2.py  14.47% <0.00%> (-85.53%) ⬇️
scrapy/core/http2/agent.py                13.25% <0.00%> (-83.14%) ⬇️
scrapy/core/http2/protocol.py             3.51% <0.00%> (-79.90%) ⬇️
scrapy/pipelines/images.py                27.82% <0.00%> (-66.96%) ⬇️
scrapy/robotstxt.py                       75.30% <0.00%> (-22.23%) ⬇️
scrapy/core/downloader/contextfactory.py  75.92% <0.00%> (-11.12%) ⬇️
scrapy/utils/test.py                      52.94% <0.00%> (-10.30%) ⬇️
scrapy/pipelines/media.py                 92.90% <0.00%> (-5.68%) ⬇️
... and 8 more
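The two headline figures can be re-derived from the Hits, Misses and Partials rows, assuming Codecov's definition coverage = hits / (hits + misses + partials) and its default of truncating (rounding down) to two decimals. A small check under those assumptions; codecov_percent is a hypothetical helper:

```python
def codecov_percent(hits, misses, partials):
    """Coverage percentage truncated to 2 decimals (hypothetical helper).

    Assumes Codecov's ratio hits / (hits + misses + partials); integer
    floor division avoids floating-point surprises when truncating.
    """
    total = hits + misses + partials
    basis_points = hits * 10_000 // total  # floor(percent * 100)
    return basis_points / 100


master = codecov_percent(9528, 939, 273)    # master row
pr_head = codecov_percent(8934, 1546, 260)  # #4848 row
print(master, pr_head)  # → 88.71 83.18
```

Both rows add up to the 10740 lines reported, and the -594 / +607 / -13 deltas net to zero, which matches the unchanged line count.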

@elacuesta
Member

Looks good, but it's missing a mention in the corresponding docs (jl is missing too). Could you please update that section as well?

@@ -154,6 +154,7 @@
 FEED_EXPORTERS_BASE = {
     'json': 'scrapy.exporters.JsonItemExporter',
     'jsonlines': 'scrapy.exporters.JsonLinesItemExporter',
+    'jsonl': 'scrapy.exporters.JsonLinesItemExporter',
Member

👍

@kmike
Member

kmike commented Oct 18, 2020

+1 to add .jsonl as an auto-detected extension.

I'm not so sure about recommending it by default instead of .jl in examples though; it might be a more complex discussion.
Most likely, @pablohoffman (Scrapy's original author) coined the "json lines" term, and there is a long tradition of naming these files .jl in Scrapy. See also: wardi/jsonlines#14

What do you think about splitting these two changes (supporting .jsonl vs switching to .jsonl), so that they can be discussed & merged separately?
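For context on what "auto-detected" means here: when a feed is given on the command line without an explicit format (e.g. scrapy crawl <spider> -o items.jsonl), Scrapy uses the file extension as the format name and checks it against the keys of FEED_EXPORTERS merged with FEED_EXPORTERS_BASE, which is why adding the 'jsonl' key is enough. The sketch below is a simplified, hypothetical version of that lookup (infer_exporter and the trimmed EXPORTERS dict are illustrative names), not Scrapy's actual code.

```python
from pathlib import PurePosixPath

# Trimmed, illustrative copy of the format-key -> exporter mapping. After this
# PR, "jsonlines", "jl" and "jsonl" all point at the same JSON Lines exporter.
EXPORTERS = {
    "json": "scrapy.exporters.JsonItemExporter",
    "jsonlines": "scrapy.exporters.JsonLinesItemExporter",
    "jl": "scrapy.exporters.JsonLinesItemExporter",
    "jsonl": "scrapy.exporters.JsonLinesItemExporter",
    "csv": "scrapy.exporters.CsvItemExporter",
}


def infer_exporter(output_path, exporters=EXPORTERS):
    """Hypothetical helper: pick an exporter from the file extension.

    The extension (without the dot) is used directly as the format key,
    so every recognized extension must also be a key in the mapping.
    """
    fmt = PurePosixPath(output_path).suffix.lstrip(".")
    if fmt not in exporters:
        raise ValueError(f"Unrecognized output format {fmt!r}")
    return exporters[fmt]
```

Because the extension doubles as the format key, the same mapping also makes "jl" and "jsonl" acceptable as explicit format values, which is the side effect kmike describes later in this thread.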

@divtiply
Contributor Author

I personally don't see a reason to use the .jl extension in examples. jsonlines.org suggests the .jsonl file extension, while .jl is used by the Julia language for its scripts and is recognized as such by VS Code's default setup.

@pablohoffman
Member

Thanks for the mention @kmike. I'm personally not married to the jl extension. If the world has adopted a new standard extension for these file types, Scrapy should probably switch to it.

Member

@Gallaecio Gallaecio left a comment

✔️ given @pablohoffman’s stance and the Julia argument

Comment on lines 47 to 48
* Value for the ``format`` key in the :setting:`FEEDS` setting: ``jsonlines``
or ``jsonl``
Member

I have a slight preference towards leaving this as it was, even if we support jsonl and jl as aliases.

But if we add it here, we should also add jl for completeness.

Member

+1 to keep it without aliases.

Member

What are your reasons for keeping just jsonlines?

Member

We must give users the freedom to use whatever file extension they wish.
We should recognize file extensions of JSON Lines files, to use JSON Lines format automatically when users use any of those file extensions.
We can ask users to set "format": "jsonlines" if they want files with arbitrary extensions to use JSON Lines format.

I believe we happen to support "format": "jl" and "format": "jsonl" because that makes our code simpler (otherwise we would need to split FEED_EXPORTERS, e.g. have a setting mapping MIME types to exporters, and a separate setting mapping file extensions to MIME types), not because we want to allow shorter, less-readable setting definitions.

So I would call the fact that we support "format": "jl" and "format": "jsonl" an implementation detail, and by not documenting it, it could be argued that we are free to drop support for it without backward-compatibility concerns if that makes sense in the future.
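The alternative described above, a mapping from file extensions to canonical formats kept separate from the mapping of formats to exporters, might look roughly like this. It is a hypothetical sketch, not Scrapy code: FORMAT_EXPORTERS, EXTENSION_FORMATS and resolve_format are made-up names, and canonical format names stand in for the MIME types mentioned in the comment.

```python
from pathlib import PurePosixPath

# Hypothetical setting 1: canonical format -> exporter.
FORMAT_EXPORTERS = {
    "json": "scrapy.exporters.JsonItemExporter",
    "jsonlines": "scrapy.exporters.JsonLinesItemExporter",
}

# Hypothetical setting 2: file extension -> canonical format.
EXTENSION_FORMATS = {
    "json": "json",
    "jl": "jsonlines",
    "jsonl": "jsonlines",
}


def resolve_format(uri, explicit_format=None):
    """Return the canonical format for a feed URI (hypothetical helper).

    An explicit format must be a canonical name such as "jsonlines", so
    aliases like "jl" are rejected; otherwise the extension is looked up.
    """
    if explicit_format is not None:
        if explicit_format not in FORMAT_EXPORTERS:
            raise ValueError(f"Unknown format {explicit_format!r}")
        return explicit_format
    ext = PurePosixPath(uri).suffix.lstrip(".")
    try:
        return EXTENSION_FORMATS[ext]
    except KeyError:
        raise ValueError(
            f"Cannot infer a format from {uri!r}; set it explicitly"
        ) from None
```

Under this design an explicit "format": "jl" would be rejected while items.jl would still be detected as JSON Lines, at the cost of the extra setting that the single-mapping approach avoids.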

@kmike
Member

kmike commented Nov 29, 2020

I'm fine with switching to .jsonl by default.

@divtiply
Contributor Author

.jl support is still available in this PR, though it is intentionally not mentioned in the docs.
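For comparison with the undocumented aliases, the documented way to ask for JSON Lines explicitly is the jsonlines value of the format key in the FEEDS setting, which works with any file name. A settings.py sketch; the output paths are illustrative:

```python
# settings.py (sketch): each FEEDS key is an output URI and each value holds
# that feed's options; "format" uses the documented value "jsonlines".
FEEDS = {
    "items.jsonl": {"format": "jsonlines"},
    # An arbitrary extension still yields JSON Lines, because the explicit
    # format, not the file name, selects the exporter here.
    "items.txt": {"format": "jsonlines"},
}
```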

Member

@kmike kmike left a comment

The PR looks good to me as-is; +1 to merge after resolving merge conflicts.

@Gallaecio Gallaecio merged commit 6e87849 into scrapy:master Jun 17, 2022
@divtiply divtiply deleted the patch-1 branch August 31, 2022 19:24
@tmpbook

tmpbook commented Nov 13, 2022

I prefer .jl over .jsonl, like VS Code does.

@wRAR
Member

wRAR commented Nov 13, 2022

@tmpbook it has a custom icon because VS Code thinks it's https://code.visualstudio.com/docs/languages/julia

Which is an argument in favor of using .jsonl.

@tmpbook

tmpbook commented Nov 14, 2022

thanks

7 participants