I'm scraping a website to export the data in a semantic format (n3). However, I also want to perform some data analysis on that data, so having it in CSV format is more convenient. To get the data in both formats I can run the crawl once per output format.
However, this scrapes the data twice, which I cannot afford with large amounts of data.
A solution that avoids scraping the data twice consists of implementing a pipeline that exports the data (see alecxe's suggestion for details). However, as the documentation explains, this is not the preferred way to export data.
Thus, I think it would be worthwhile for Scrapy to support multiple exporters.
You can achieve this by implementing multiple pipelines that use exporters. Once an item hits the first pipeline, it gets recorded in the first format; Scrapy then releases the item to the second pipeline, where you can export it in a different format.
See scrapy.exporters in the code and docs; it should be pretty easy.
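The two-pipeline idea above can be sketched roughly as follows. This is a minimal illustration, not the approach from alecxe's suggestion: it uses the standard library's csv and json modules as stand-ins for the classes in scrapy.exporters so that it runs without Scrapy installed, and the class names, default file names, and JSON Lines output format are all assumptions made for the example. Each class only implements the open_spider / process_item / close_spider methods that Scrapy calls on item pipelines.

```python
import csv
import json


class CsvExportPipeline:
    """Hypothetical pipeline: appends every item to a CSV file."""

    def __init__(self, path="items.csv"):  # default path is an assumption
        self.path = path

    def open_spider(self, spider=None):
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = None  # created on the first item, once field names are known

    def process_item(self, item, spider=None):
        row = dict(item)
        if self.writer is None:
            self.writer = csv.DictWriter(
                self.file, fieldnames=list(row), extrasaction="ignore"
            )
            self.writer.writeheader()
        self.writer.writerow(row)
        return item  # hand the item on to the next pipeline

    def close_spider(self, spider=None):
        self.file.close()


class JsonLinesExportPipeline:
    """Hypothetical pipeline: writes every item as one JSON object per line."""

    def __init__(self, path="items.jl"):  # default path is an assumption
        self.path = path

    def open_spider(self, spider=None):
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider=None):
        self.file.write(json.dumps(dict(item)) + "\n")
        return item

    def close_spider(self, spider=None):
        self.file.close()
```

In a real project both classes would be listed in the ITEM_PIPELINES setting with different order values, so that each item passes through the first and then the second during a single crawl. The file handling could then presumably be swapped for an exporter from scrapy.exporters, such as CsvItemExporter.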
@lufte: Not necessarily. We can throw an error if -t is present and the number of -t and -o args doesn't match; otherwise we can deduce the mapping, since the order is preserved. I think we could support both versions, with and without the -t option.
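To make the proposed rule concrete, here is a rough standalone sketch of the count check and order-based pairing. It uses argparse rather than Scrapy's own command-line parser, and the parser setup and the function name pair_outputs_with_formats are hypothetical names invented for this illustration.

```python
import argparse


def build_parser():
    # Standalone stand-in for "scrapy crawl"; not Scrapy's real parser.
    parser = argparse.ArgumentParser(prog="crawl-sketch")
    parser.add_argument("spider")
    # action="append" keeps repeated values in the order they were given.
    parser.add_argument("-o", dest="outputs", action="append", default=None)
    parser.add_argument("-t", dest="formats", action="append", default=None)
    return parser


def pair_outputs_with_formats(outputs, formats):
    """Pair the i-th -o value with the i-th -t value.

    With no -t at all, the format is left as None so it can be deduced
    some other way; with mismatched counts, an error is raised.
    """
    outputs = outputs or []
    formats = formats or []
    if not formats:
        return [(path, None) for path in outputs]
    if len(formats) != len(outputs):
        raise ValueError(
            f"got {len(outputs)} -o args but {len(formats)} -t args"
        )
    return list(zip(outputs, formats))
```

Note that the pairing relies only on the order within each option, so it does not look at how -o and -t are interleaved on the command line.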
@kmike: Yes, I understand that, but I could still use them and pass them in a weird order, like scrapy crawl -t csv spidername -o output1 -o output2 -o output3.xml -t json. Scrapy would have to check that there aren't more -t args than -o args and that the order makes sense (I think it shouldn't, because they are not positional arguments, but otherwise how do I match them?). Removing the -t option makes it look a lot cleaner and simpler to check: scrapy crawl spidername -o output1.csv -o output2.json -o output3.xml, but it doesn't allow me to use other extensions or no extensions at all.
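The extension-only variant and its limitation can be illustrated with a small sketch. The function name infer_format and the KNOWN_FORMATS set are assumptions made for this example and are not taken from Scrapy.

```python
import os

# Illustrative set of formats; the real list of supported formats may differ.
KNOWN_FORMATS = {"csv", "json", "xml"}


def infer_format(path, known_formats=KNOWN_FORMATS):
    """Return the export format implied by the file extension, or None."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return extension if extension in known_formats else None


# Files from the examples in this thread, plus one with an unrelated extension.
for name in ["output1.csv", "output3.xml", "output1", "results.txt"]:
    print(name, "->", infer_format(name))
```

Names such as output1 or results.txt come back as None, which is exactly the case where dropping -t leaves no way to say which format is wanted.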