Genspider command prepends an inconvenient 'www' in start_urls #2299

Closed
stummjr opened this issue Oct 1, 2016 · 6 comments

Comments

@stummjr (Member) commented Oct 1, 2016

Almost every time I use the genspider command, I end up removing the www prefix from the value that this command generates for start_urls.

For example, this command:

$ scrapy genspider quotes quotes.toscrape.com

generates this spider:

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = (
        'http://www.quotes.toscrape.com/',
    )
    ...

IMHO, there's no need to add www to the start_urls, because it might be more annoying than helpful to people using genspider.

Thoughts on it?
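
As a rough illustration of the proposed behavior, here is a minimal sketch. It is not Scrapy's actual template code; `make_start_url` is a hypothetical helper that builds the start URL directly from the domain passed on the command line, without prepending `www`.

```python
def make_start_url(domain):
    """Build a start URL from the given domain, leaving the host unchanged.

    Hypothetical helper for illustration only; not part of Scrapy.
    """
    # Use the domain exactly as the user typed it: no "www." prefix is added.
    return "http://{}/".format(domain)


if __name__ == "__main__":
    print(make_start_url("quotes.toscrape.com"))
```

With this approach the value in start_urls stays consistent with the one in allowed_domains.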

@eliasdorneles (Member) commented Oct 3, 2016

+1 to remove the www from the templates, that seems unnecessary.

@kmike (Member) commented Oct 5, 2016

+1

@redapple redapple added the help wanted label Oct 5, 2016
@eLRuLL (Member) commented Oct 5, 2016

Totally agree with @stummjr, www.domain.com is a different site than domain.com

@ThunderMind2019 commented Dec 28, 2018

genspider also prepends http://. When I enter an address like https://example.com, it becomes http://https://example.com, which throws an error when I run scrapy crawl.
It should first check the address it receives and then decide whether it needs to prepend http:// or nothing.
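
The check described above could look roughly like the following sketch. It is an assumption about one possible fix, not how Scrapy implements it; `normalize_target` is a hypothetical name, and only the http and https schemes are considered.

```python
from urllib.parse import urlparse


def normalize_target(arg):
    """Return (domain, start_url) for a genspider-style argument.

    Hypothetical helper for illustration only; not part of Scrapy.
    """
    arg = arg.strip()
    # Prepend a scheme only when the user did not already supply one.
    if not arg.startswith(("http://", "https://")):
        arg = "http://" + arg
    # The hostname (without scheme or port) is what allowed_domains needs.
    domain = urlparse(arg).hostname
    return domain, arg


if __name__ == "__main__":
    print(normalize_target("https://example.com"))
    print(normalize_target("example.com"))
```

Under these assumptions, https://example.com is kept as-is instead of becoming http://https://example.com, and a bare domain still gets the http:// prefix.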

@kmike (Member) commented Dec 28, 2018

@ThunderMind2019 would you mind creating a separate ticket for this?

@ThunderMind2019 commented Dec 28, 2018

ok sure thing.
