Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Resolved issue #546. Output format parsing from filename extension. #659

Merged
merged 5 commits into from Apr 14, 2014
Merged

[MRG] Resolved issue #546. Output format parsing from filename extension. #659

merged 5 commits into from Apr 14, 2014

Conversation

@denysbutenko
Copy link
Contributor

@denysbutenko denysbutenko commented Mar 19, 2014

Instead of explicitly specify the output format type, we can learn it from the file extension.

@kmike
Copy link
Member

@kmike kmike commented Mar 19, 2014

I think we shouldn't drop support for explicit -t option.

@kmike
Copy link
Member

@kmike kmike commented Mar 19, 2014

Also, some tests for this feature would be great!

…xtension `--output`
@@ -33,6 +34,8 @@ def process_options(self, args, opts):
else:
self.settings.overrides['FEED_URI'] = opts.output
valid_output_formats = self.settings['FEED_EXPORTERS'].keys() + self.settings['FEED_EXPORTERS_BASE'].keys()
if not opts.output_format:
opts.output_format = os.path.splitext(opts.output)[1].replace(".", "")

This comment has been minimized.

@kmike

kmike Mar 20, 2014
Member

I think this will raise an unhelpful exception when user runs scrapy crawl myspider -o ./data.

This comment has been minimized.

@denysbutenko

denysbutenko Mar 20, 2014
Author Contributor

Yep, I'm not noticed what os not imported.
After import:
Runspider

~/dev/spider/pws/spiders ❯ ~/dev/scrapy/bin/scrapy runspider pws_spider.py -o ./data.                                                 
Usage
=====
  scrapy runspider [options] <spider_file>

runspider: error: Invalid/unrecognized output format: , Expected ['xml', 'jsonlines', 'json', 'csv', 'pickle', 'marshal']

Crawl:

~/dev/spider/pws/spiders ❯ ~/dev/scrapy/bin/scrapy crawl pws_spider.py -o ./data.                                                    
Usage
=====
  scrapy crawl [options] <spider>

crawl: error: Invalid/unrecognized output format: , Expected ['xml', 'jsonlines', 'json', 'csv', 'pickle', 'marshal']

This comment has been minimized.

@kmike

kmike Mar 20, 2014
Member

What about

scrapy crawl myspider -o ./data

(without trailing dot)?

This comment has been minimized.

@denysbutenko

denysbutenko Mar 20, 2014
Author Contributor

Function os.path.splitext() will return tuple of file name and file extension. Then we select file extension and replace dot. Of course may be needed replace only first dot, but I not see file types with double extension.

This comment has been minimized.

@kmike

kmike Mar 20, 2014
Member

Yes, you're right.

This comment has been minimized.

@denysbutenko

denysbutenko Mar 20, 2014
Author Contributor

Without trailing dot:

~/dev/spider/pws/spiders ❯ ~/dev/scrapy/bin/scrapy crawl pws_spider.py -o ./data                                                      
Usage
=====
  scrapy crawl [options] <spider>

crawl: error: Invalid/unrecognized output format: , Expected ['xml', 'jsonlines', 'json', 'csv', 'pickle', 'marshal']

Result of splitext of ./data will be empty file ext.

>>> import os
>>> fname = "./data"
>>> print os.path.splitext(fname)
('./data', '')
@nyov
Copy link
Contributor

@nyov nyov commented Mar 25, 2014

Could we have a more verbose error message, for less confusion with the newbies?

raise UsageError("Unrecognized output format '%s', set one using the '-t' switch or as a fileextension from the supported list %s" % (opts.output_format, tuple(valid_output_formats)))
@kmike
Copy link
Member

@kmike kmike commented Mar 26, 2014

This PR looks good to me. There is one possible improvement: check docs and tutorials and remove -t option from them when possible. But I'm fine with merging it without this improvement.

@kmike kmike changed the title Resolved issue #546. Output format parsing from filename extension. [MRG] Resolved issue #546. Output format parsing from filename extension. Apr 5, 2014
dangra added a commit that referenced this pull request Apr 14, 2014
[MRG] Resolved issue #546. Output format parsing from filename extension.
@dangra dangra merged commit 2c8eceb into scrapy:master Apr 14, 2014
1 check passed
1 check passed
default The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants