[MRG] Resolved issue #546. Output format parsing from filename extension. #659
I think we shouldn't drop support for explicit
Also, some tests for this feature would be great!
@@ -33,6 +34,8 @@ def process_options(self, args, opts):
else:
self.settings.overrides['FEED_URI'] = opts.output
valid_output_formats = self.settings['FEED_EXPORTERS'].keys() + self.settings['FEED_EXPORTERS_BASE'].keys()
if not opts.output_format:
opts.output_format = os.path.splitext(opts.output)[1].replace(".", "")
kmike
Mar 20, 2014
Member
I think this will raise an unhelpful exception when the user runs scrapy crawl myspider -o ./data.
denysbutenko
Mar 20, 2014
Author
Contributor
Yep, I hadn't noticed that os was not imported.
After adding the import:
Runspider:
~/dev/spider/pws/spiders ❯ ~/dev/scrapy/bin/scrapy runspider pws_spider.py -o ./data.
Usage
=====
scrapy runspider [options] <spider_file>
runspider: error: Invalid/unrecognized output format: , Expected ['xml', 'jsonlines', 'json', 'csv', 'pickle', 'marshal']
Crawl:
~/dev/spider/pws/spiders ❯ ~/dev/scrapy/bin/scrapy crawl pws_spider.py -o ./data.
Usage
=====
scrapy crawl [options] <spider>
crawl: error: Invalid/unrecognized output format: , Expected ['xml', 'jsonlines', 'json', 'csv', 'pickle', 'marshal']
kmike
Mar 20, 2014
Member
What about
scrapy crawl myspider -o ./data
(without trailing dot)?
denysbutenko
Mar 20, 2014
Author
Contributor
The function os.path.splitext() returns a tuple of the file name and the file extension. We then take the file extension and remove the dot. It might be better to replace only the first dot, but I have not seen file types with a double extension.
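To make the behaviour discussed here concrete, below is a small sketch of how os.path.splitext() treats the paths mentioned in this thread. The helper name extension_of and the archive.tar.gz path are made up for illustration; only the splitext-and-replace expression comes from the diff.

```python
import os


def extension_of(path):
    """Return the extension of ``path`` without the dot ('' if there is none).

    Hypothetical helper wrapping the expression used in the diff.
    """
    return os.path.splitext(path)[1].replace(".", "")


# Paths from the thread, plus one made-up double-extension example.
for path in ("items.json", "./data", "./data.", "archive.tar.gz"):
    print(repr(path), "->", repr(extension_of(path)))
```

Since splitext() only splits off the last extension, the returned suffix contains at most one dot, so replacing every dot and replacing only the first should give the same result. Both ./data and ./data. end up with an empty extension, which is what leads to the "Invalid/unrecognized output format" error.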
Function os.path.splitext()
will return tuple of file name and file extension. Then we select file extension and replace dot. Of course may be needed replace only first dot, but I not see file types with double extension.
kmike
Mar 20, 2014
Member
Yes, you're right.
denysbutenko
Mar 20, 2014
Author
Contributor
Without trailing dot:
~/dev/spider/pws/spiders ❯ ~/dev/scrapy/bin/scrapy crawl pws_spider.py -o ./data
Usage
=====
scrapy crawl [options] <spider>
crawl: error: Invalid/unrecognized output format: , Expected ['xml', 'jsonlines', 'json', 'csv', 'pickle', 'marshal']
The result of splitext for ./data has an empty file extension.
>>> import os
>>> fname = "./data"
>>> print os.path.splitext(fname)
('./data', '')
Could we have a more verbose error message, to reduce confusion for newcomers? raise UsageError("Unrecognized output format '%s', set one using the '-t' switch or as a file extension from the supported list %s" % (opts.output_format, tuple(valid_output_formats)))
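For illustration, here is a rough sketch of how the format resolution and the more verbose message could fit together. This is not the code from the PR: resolve_output_format is a hypothetical function, the local UsageError class is a stand-in so the snippet runs without Scrapy installed, and the format list is copied from the error output earlier in this thread.

```python
import os


class UsageError(Exception):
    """Local stand-in for the usage error raised by the command (assumption)."""


def resolve_output_format(output, explicit_format, valid_formats):
    """Pick the output format, preferring an explicit one over the extension.

    Hypothetical helper: ``explicit_format`` plays the role of the '-t' switch.
    """
    fmt = explicit_format or os.path.splitext(output)[1].replace(".", "")
    if fmt not in valid_formats:
        raise UsageError(
            "Unrecognized output format '%s', set one using the '-t' switch "
            "or as a file extension from the supported list %s"
            % (fmt, tuple(valid_formats))
        )
    return fmt


# Format names copied from the error output shown in this thread.
formats = ["xml", "jsonlines", "json", "csv", "pickle", "marshal"]

print(resolve_output_format("items.csv", None, formats))   # inferred from extension
print(resolve_output_format("./data", "json", formats))    # explicit format wins
try:
    resolve_output_format("./data", None, formats)          # nothing to infer
except UsageError as exc:
    print(exc)
```

With this shape, an explicit format still works for paths without an extension, and the error in the remaining case tells the user both ways to fix it.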
This PR looks good to me. There is one possible improvement: check docs and tutorials and remove
[MRG] Resolved issue #546. Output format parsing from filename extension.
Instead of explicitly specifying the output format, we can infer it from the file extension.