clean the crawled urls #1603

Closed
wants to merge 863 commits into from

Conversation

simonkuang

Clean URLs that contain whitespace, tab, CR, or LF characters.
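As a rough illustration of the kind of cleanup described here, the sketch below strips surrounding whitespace and drops tab, CR, and LF characters found anywhere in a URL. The `clean_url` helper is hypothetical and is not the code from this pull request; it also assumes that spaces in the middle of a URL should be left untouched.

```python
def clean_url(url: str) -> str:
    """Hypothetical helper: tidy a crawled URL string.

    Assumption: leading/trailing whitespace is noise, and tab/CR/LF
    characters are never meaningful anywhere inside a URL.
    """
    # Remove whitespace at both ends of the string.
    stripped = url.strip()
    # Delete tab, carriage return, and line feed wherever they occur.
    return stripped.translate({ord(ch): None for ch in "\t\r\n"})


if __name__ == "__main__":
    print(clean_url("  http://example.com/a\n\tb \r\n"))
```

Interior spaces are deliberately kept, since percent-encoding them is a separate decision from removing stray line breaks.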

@codecov-io

codecov-io commented Nov 16, 2015

Current coverage is 83.45% (diff: 82.48%)

No coverage report found for master at 54216d7.

Powered by Codecov. Last update 54216d7...3ab46d8

@kmike
Member

kmike commented Dec 4, 2015

See also: #1021. I still prefer fixing it in w3lib (see #838).

eliasdorneles and others added 28 commits September 12, 2016 13:21
[MRG+1] interpreting json-amazonui-streaming as TextResponse
If a user has a custom subclass of the Image pipeline and no settings for
that subclass, it should get the default settings defined for the Image pipeline.

Fixes scrapy#2198
[MRG+1] Update broken Scrapy tutorial to use quotes.toscrape.com
So, this will replace the spider example code from the overview that
scrapes questions from StackOverflow with a spider scraping quotes (much
like the one in the tutorial), and updates the text around it to be
consistent.

There are also minor wording changes plus a small Sphinx/reST syntax fix
on the features list at the bottom (it was creating a definition list,
causing one line to be bold).
This changes the tutorial, removing the step of creating an item class
and also starts by presenting the start_requests method instead of
start_urls.
adding an AuthorSpider to demonstrate further a different crawling
arrangement.
[MRG+1] docs: update overview spider code to use toscrape.com and minor changes
Mentions stackoverflow as support channel (fixes scrapy#2255)
[MRG+1] Make scrapy available in shell without explicit import statement
kmike and others added 27 commits December 19, 2016 21:46
[MRG+1] Warn user instead of failing for wrong SPIDER_MODULES setting
This not only uses the standard form but also helps error aggregation
libraries (e.g. Sentry) avoid duplicating the message.
[MRG+1] Transparently handle redirections in fetch and shell
Update changelog for upcoming 1.3.0 release
TST: Randomize FILES_EXPIRES above 90 days
…gging-tweak

[MRG+1] ENH Pass arguments to logger rather than formatted message.
[MRG+1] Upgrade CoC and mention it after main contributing docs
[MRG+1] Document copying of spider arguments to attributes
In the data flow image arrows are red.
@kmike
Member

kmike commented Feb 21, 2017

Fixed in #2547.

@kmike kmike closed this Feb 21, 2017