Add a command-line option for overwriting exported file #547

Closed

kmike opened this issue Jan 18, 2014 · 17 comments · Fixed by #4512

Comments

@kmike
Member

kmike commented Jan 18, 2014

What do you think about adding an option to overwrite/recreate the exported file?
Something like

$ scrapy crawl myspider -o data.jl --overwrite

or

$ scrapy crawl myspider -O data.jl

This is useful during development where old data is not needed. I usually run

$ rm data.jl; scrapy crawl myspider -o data.jl

multiple times. This is not DRY: if I want to change the file name for the next iteration and preserve the existing file, I have to be careful to update both names (of course, the command usually comes from shell autocompletion).

@dangra
Member

dangra commented Jan 20, 2014

I've been bitten by this too; I prefer a single option over a modifier flag.

+1 for -O as short option name and --overwrite-output (or --truncate-output or --new-output) as long option name

@rmax
Contributor

rmax commented Jan 20, 2014

Why not overwrite by default and provide an option to append?


@dangra
Member

dangra commented Jan 20, 2014

@darkrho : it's an option, but it changes the current behavior; does it really warrant a backward-incompatible change?

@nramirezuy
Contributor

@darkrho And it is nicer to have both -o and -O.

@kmike
Member Author

kmike commented Jan 20, 2014

If we have both -o and -O, how can you tell which one is the default? Both have to be written explicitly anyway.

But @darkrho raised an interesting question, because -o doesn't play well with csv, xml and pickle at the moment: csv adds an extra header, xml starts a new XML document instead of appending to the previous one, and pickle does the same.

As a side note: scrapy crawl myspider -o combined.jl -O last.jl can also be useful, and users will try writing this.

@nyov
Contributor

nyov commented May 12, 2014

How about emulating wget here, of all things: by default, create a new file if one already exists, named filename.out.1 with incrementing numbers, and then have --append and --overwrite/--replace as separate options.
Feels like the safest approach to me, since some exporters apparently don't play nicely with appending right now anyway.

We have a PR; could anyone take a look? (The question was raised as to why the unit tests are failing for it.)
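
For reference, the wget-style numbering could be as simple as the following sketch (a hypothetical helper, not part of any existing PR), which picks the first unused filename:

import os

def next_free_path(path):
    # Return `path` if it is unused, otherwise `path.1`, `path.2`, ...
    # (hypothetical helper illustrating the wget-style naming above)
    if not os.path.exists(path):
        return path
    n = 1
    while os.path.exists("%s.%d" % (path, n)):
        n += 1
    return "%s.%d" % (path, n)

# e.g. next_free_path("data.jl") -> "data.jl" on the first run,
# then "data.jl.1", "data.jl.2", and so on.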

@kmike
Member Author

kmike commented Sep 2, 2014

The -o option creating filename.out.1, filename.out.2, filename.out.3 files by default looks good to me.

@kmike
Member Author

kmike commented Sep 2, 2014

By the way, does anyone need the append option? What is it useful for?

@umrashrf

umrashrf commented Sep 2, 2014

+1

I realized it was appending the other day when my app broke while updating a JSON file with scrapy crawl. I don't want it to append.

Not sure if the current design allows it, but how about sending only the items to stdout with -o and using it with > / >>?

@tanmaysahay94

+1
The -o option creating sequentially numbered file names makes sense.
Should I go ahead with it?

@alfg

alfg commented Jan 28, 2015

+1

Any progress on this?

@AaronTao1990

+1 for -o option

@nyov
Contributor

nyov commented Mar 23, 2015

The current PR looks inadequate to me. Sorry.
As an intermediate step, it should just fix the current issue, which is that appending doesn't work. That would mean producing filename.out.1, filename.out.2, filename.out.3 filenames instead.
(This could be the easy task.)

Changing the behaviour to append/overwrite looks more daunting. Doing it right would mean modifying the Exporter codebase to handle append or overwrite options like a File interface (IMHO), with the implementation depending on the actual Exporter backend (local disk, S3, datastore, ...).
E.g. the FTP exporter would need to say "APPE" instead of "STOR", or even use FTP-server-dependent alternatives.
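
For instance, the FTP case boils down to choosing between those two commands; a rough sketch with plain ftplib (illustrative only, not Scrapy's actual FTP storage code, and all names here are made up):

import ftplib

def ftp_store(host, user, password, remote_path, fileobj, overwrite=False):
    # STOR replaces the remote file, APPE appends to it
    ftp = ftplib.FTP(host)
    ftp.login(user, password)
    command = "STOR" if overwrite else "APPE"
    ftp.storbinary("%s %s" % (command, remote_path), fileobj)
    ftp.quit()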

Also @kmike's proposal

scrapy crawl myspider -o combined.jl -O last.jl can also be useful, and users will try writing this.

would require more thought. It sounds great, but if --append/--overwrite is not just a switch for -o but an argument taking a destination of its own, the Exporters would have to be able to handle that (opening multiple "file pointers" as given). That doesn't warrant an easy tag, I think.

@jschilling1

I've been using a custom CsvItemExporter to overwrite existing exports and to use a custom CSV_DELIMITER from settings. Now I need to set the FEED_URI in the spider. The problem with that is that my custom CsvItemExporter no longer clears existing exports, since it only does that on init.
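
(For context, that kind of setup usually looks roughly like the sketch below; the class name, module path and CSV_DELIMITER setting are assumptions about the project described above, not code from this thread.)

from scrapy.exporters import CsvItemExporter
from scrapy.utils.project import get_project_settings

class MyCsvItemExporter(CsvItemExporter):
    # Sketch of a CSV exporter that reads a custom delimiter from settings.
    # Any truncation of an existing export would also have to happen here,
    # in __init__, which is why it only runs once per export.
    def __init__(self, file, **kwargs):
        settings = get_project_settings()
        kwargs.setdefault('delimiter', settings.get('CSV_DELIMITER', ','))
        super(MyCsvItemExporter, self).__init__(file, **kwargs)

# settings.py (hypothetical):
# FEED_EXPORTERS = {'csv': 'myproject.exporters.MyCsvItemExporter'}
# CSV_DELIMITER = ';'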

@dannykopping

You can simply output to stdout and redirect that output to a file:
scrapy runspider spider.py -t json --nolog -o - > out.json

@lucaspottersky

You can simply output to stdout and redirect that output to a file:

@dannykopping this is not an option when scheduling through scrapyd.

@robkorv

robkorv commented Nov 26, 2016

In the meantime, one could subclass FileFeedStorage. I just typed up an answer over at http://stackoverflow.com/a/40823149/2546958 before I spotted this issue.
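
The gist of that approach is to open the output file in truncate mode instead of append mode, roughly like this (a sketch; the module path in the settings snippet is hypothetical):

import os

from scrapy.extensions.feedexport import FileFeedStorage

class OverwriteFileFeedStorage(FileFeedStorage):
    # Same as FileFeedStorage, but truncates ('wb') instead of appending ('ab')
    def open(self, spider):
        dirname = os.path.dirname(self.path)
        if dirname and not os.path.exists(dirname):
            os.makedirs(dirname)
        return open(self.path, 'wb')

# settings.py (hypothetical module path):
# FEED_STORAGES = {
#     '': 'myproject.feedstorage.OverwriteFileFeedStorage',
#     'file': 'myproject.feedstorage.OverwriteFileFeedStorage',
# }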

@ghost ghost added the stale label Feb 20, 2018
@kmike kmike added help wanted and removed stale labels Jul 4, 2019