scrapy genspider should not overwrite existing file #4561

Closed
metaperl opened this issue May 10, 2020 · 8 comments · Fixed by #4623

Comments

@metaperl

metaperl commented May 10, 2020

Summary

If the file that scrapy genspider would create already exists, then genspider should refuse to generate the file.

Motivation

As it stands, existing code can be blown away if this command runs twice.

Describe alternatives you've considered

Prompting the user for overwriting existing spider.

@jay24rajput
Contributor

jay24rajput commented May 11, 2020

Hey, is this issue up for grabs?

@metaperl
Author

metaperl commented May 11, 2020

@jay24rajput
Contributor

jay24rajput commented May 11, 2020

@Gallaecio @elacuesta thoughts??

@elacuesta
Member

elacuesta commented May 11, 2020

I would not have thought this was an issue, as I'd expect spider code to be covered by source control. On the other hand, it should be only a matter of doing a file system check, so I think a clean implementation could be approved.
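For illustration, the kind of file system check mentioned above might look like the following. This is only a minimal sketch using the standard library, not Scrapy's actual code: the function names `spider_file_exists` and `write_spider`, and the assumption that a spider lives at `<spiders_dir>/<module_name>.py`, are hypothetical.

```python
from pathlib import Path


def spider_file_exists(spiders_dir: str, module_name: str) -> bool:
    """Return True if <spiders_dir>/<module_name>.py is already on disk.

    Hypothetical helper: the directory layout is an assumption.
    """
    return (Path(spiders_dir) / f"{module_name}.py").exists()


def write_spider(spiders_dir: str, module_name: str, source: str) -> bool:
    """Write the spider file only if it does not exist yet.

    Returns True if the file was written, False if it was already there.
    """
    target = Path(spiders_dir) / f"{module_name}.py"
    try:
        # Mode "x" raises FileExistsError instead of truncating an existing file,
        # so the check and the write happen in a single step.
        with open(target, "x", encoding="utf-8") as fh:
            fh.write(source)
    except FileExistsError:
        return False
    return True
```

Opening with mode `"x"` avoids a gap between checking for the file and writing it; a plain `exists()` check followed by a normal write would also work for the interactive case described here.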

@jay24rajput
Contributor

jay24rajput commented May 12, 2020

Alright! I will work on this one

@sivoham

sivoham commented Jun 3, 2020

Hi @metaperl @elacuesta,

I would like to understand this properly. I've tried to create a spider with the same name twice but it does not allow me to do so.

(p3env) bash-3.2$ scrapy genspider toi 'https://timesofindia.indiatimes.com/'
Created spider 'toi' using template 'basic' in module:
  testscrapy.spiders.toi
(p3env) bash-3.2$
(p3env) bash-3.2$
(p3env) bash-3.2$ scrapy genspider toi 'https://timesofindia.indiatimes.com/'
Spider 'toi' already exists in module:
  testscrapy.spiders.toi

Could you please let me know what exactly the issue is, so that I can try to resolve it or improve the behavior?

Thanks & Regards.

3Dook added a commit to 3Dook/scrapy that referenced this issue Jun 8, 2020
Issue scrapy#4561 enhancement. The bug occurs when genspider is called outside of a startproject. I added a small and simple check to compare the new spider names to the current files in the directory. If there is already a spider with the file name it will return and stop the function.
@faraz16iqbal

faraz16iqbal commented Jun 11, 2020

Is this issue still up for grabs?

@elacuesta
Member

elacuesta commented Jun 13, 2020

@sivoham Indeed, there is already a check in place, but what it does is try to load a spider by its name attribute. So a file can get overwritten if you have a spider whose name doesn't match its file name.

@faraz16iqbal I'd recommend you not to duplicate efforts, since there are already open PRs about this. You're more than welcome to check any other open issues though. You could check the "good first issue" tag if you're unsure about where to start, and don't hesitate to ask for guidance if you need.
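To make the gap in the name-based check concrete, here is a small simulation. It does not use Scrapy's spider loader; `declared_spider_names`, `exists_by_name`, and `exists_by_file` are hypothetical stand-ins, and the regular expression only approximates how a `name` attribute appears in a spider file.

```python
import re
import tempfile
from pathlib import Path

# Rough approximation of a spider's `name = "..."` class attribute.
NAME_RE = re.compile(r"""^\s*name\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def declared_spider_names(spiders_dir: Path) -> set:
    """Collect every `name` value declared in the .py files of a directory."""
    names = set()
    for path in spiders_dir.glob("*.py"):
        names.update(NAME_RE.findall(path.read_text(encoding="utf-8")))
    return names


def exists_by_name(spiders_dir: Path, new_name: str) -> bool:
    """Simulated name-based check: is a spider with this name declared?"""
    return new_name in declared_spider_names(spiders_dir)


def exists_by_file(spiders_dir: Path, new_name: str) -> bool:
    """File-based check: would <new_name>.py collide with an existing file?"""
    return (spiders_dir / f"{new_name}.py").exists()


if __name__ == "__main__":
    spiders = Path(tempfile.mkdtemp())
    # The file is toi.py, but the spider inside it is named differently.
    (spiders / "toi.py").write_text(
        'class TimesSpider:\n    name = "times_of_india"\n', encoding="utf-8"
    )
    print("name-based check finds 'toi':", exists_by_name(spiders, "toi"))
    print("file-based check finds 'toi':", exists_by_file(spiders, "toi"))
```

In this setup the name-based check does not see a collision for `toi`, while the file-based check does, which is the case where an existing file could be overwritten.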
