Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added custom scheduler breakage in news.rst #4274

Merged
merged 7 commits into from Feb 6, 2020

Conversation

joybh98
Copy link
Contributor

@joybh98 joybh98 commented Jan 11, 2020

@Gallaecio created an issue #4264, in which I had to add in the backward-incompatible changes that scrapy 1.7.0 version breaks custom schedulers.
I have added it to news.rst, please check it out and let me know if it needs to be more thorough.

I have ran all the tests using tox

(edit) Fixes #4264

@codecov
Copy link

@codecov codecov bot commented Jan 11, 2020

Codecov Report

Merging #4274 into master will decrease coverage by 0.16%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #4274      +/-   ##
=========================================
- Coverage   84.06%   83.9%   -0.17%     
=========================================
  Files         166     166              
  Lines        9730    9869     +139     
  Branches     1454    1469      +15     
=========================================
+ Hits         8180    8281     +101     
- Misses       1296    1334      +38     
  Partials      254     254
Impacted Files Coverage Δ
scrapy/utils/test.py 49.35% <0%> (-8.99%) ⬇️
scrapy/utils/ftp.py 23.8% <0%> (-6.2%) ⬇️
scrapy/pipelines/files.py 61.66% <0%> (-3.99%) ⬇️
scrapy/core/downloader/handlers/datauri.py 93.33% <0%> (-0.79%) ⬇️
scrapy/crawler.py 89.26% <0%> (-0.36%) ⬇️
scrapy/core/downloader/handlers/http10.py 100% <0%> (ø) ⬆️
scrapy/http/response/text.py 100% <0%> (ø) ⬆️
scrapy/http/request/__init__.py 100% <0%> (ø) ⬆️
scrapy/core/downloader/handlers/file.py 100% <0%> (ø) ⬆️
scrapy/core/downloader/handlers/__init__.py 83.63% <0%> (ø) ⬆️
... and 12 more

@elacuesta
Copy link
Member

@elacuesta elacuesta commented Jan 11, 2020

Hi there, thanks for the PR. Could you add a few words explaining why this happens? Thanks again!

@joybh98
Copy link
Contributor Author

@joybh98 joybh98 commented Jan 12, 2020

@elacuesta , sure I'll do that!
EDIT: Closed the pull request by mistake, I've reopened it

@joybh98 joybh98 closed this Jan 12, 2020
@joybh98 joybh98 reopened this Jan 12, 2020
@joybh98
Copy link
Contributor Author

@joybh98 joybh98 commented Jan 13, 2020

hey @elacuesta , I'm still new to scrapy, can you explain what the changes are exactly doing, all I got was that we have set crawler==None in the init function

docs/news.rst Outdated
@@ -288,6 +288,8 @@ Backward-incompatible changes
:class:`~scrapy.http.Request` objects instead of arbitrary Python data
structures.

* This version of scrapy breaks custom schedulers
Copy link
Member

@Gallaecio Gallaecio Jan 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is how I phrased it in the issue, but I think the actual documentation text should go more in the line of the entries above.

It should indicate that the __init__ method of the scrapy.core.scheduler.Scheduler class now received an additional crawler parameter, and that because of that custom Scheduler subclasses that don’t accept arbitrary parameters in their __init__ method will stop working.

Since we do not have API documentation of scrapy.core.scheduler.Scheduler at the moment that we can link, scrapy.core.scheduler.Scheduler should probably link to the documentation of the SCHEDULER setting.

Copy link
Contributor Author

@joybh98 joybh98 Jan 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gallaecio , thanks for the help! Appreciate it!

docs/news.rst Outdated
@@ -288,6 +288,14 @@ Backward-incompatible changes
:class:`~scrapy.http.Request` objects instead of arbitrary Python data
structures.

* An additional `crawler` paramter has been added to the `__init__` method of
the `scrapy.core.scheduler.Scheduler` class. Custom scheduler subclassses
don't accept binary parameters in their `__init__` method. This version of
Copy link
Member

@kmike kmike Jan 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "binary" mean here?

Copy link
Contributor Author

@joybh98 joybh98 Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kmike my bad, I meant to write arbitrary.

@elacuesta
Copy link
Member

@elacuesta elacuesta commented Jan 22, 2020

This version of Scrapy breaks custom schedulers

This is not necessarily true. The note should say that custom schedulers might break, if they override the __init__ method. For instance, consider a custom Scheduler which only overrides the next_request method: in this case, it should continue to work correctly.

@joybh98
Copy link
Contributor Author

@joybh98 joybh98 commented Jan 22, 2020

@elacuesta, okay will change it! Is there anything else that needs to be changed?

@elacuesta
Copy link
Member

@elacuesta elacuesta commented Jan 22, 2020

Nothing substantial, just keep an eye open for typos, and change "Scrapy" to be uppercase.

Thanks!

@joybh98
Copy link
Contributor Author

@joybh98 joybh98 commented Jan 23, 2020

@elacuesta @kmike @Gallaecio let me know if the latest changes are okay, I'll squash all commits into one.

docs/news.rst Outdated Show resolved Hide resolved
docs/news.rst Outdated Show resolved Hide resolved
docs/news.rst Outdated
@@ -288,6 +288,14 @@ Backward-incompatible changes
:class:`~scrapy.http.Request` objects instead of arbitrary Python data
structures.

* An additional `crawler` paramter has been added to the `__init__` method of
the `scrapy.core.scheduler.Scheduler` class. Custom scheduler subclassses
Copy link
Member

@elacuesta elacuesta Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still not 100% accurate. What do you think of something like the following?

Custom scheduler subclasses which don't accept arbitrary parameters in their __init__ method might break because of this change.

Copy link
Contributor Author

@joybh98 joybh98 Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elacuesta , you're right. Your sentence is clear & concise.

Copy link
Member

@elacuesta elacuesta Jan 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize if I wasn't clear, I didn't mean you should suppress the first sentence ("An additional crawler parameter has been added to the __init__ method of the scrapy.core.scheduler.Scheduler class"). That sentence is the explanation for the following one (currently the paragraph mentions "this change" without mentioning the specific change).
Also, I just realized that two accent characters are needed for Sphinx to render text as monospace, one means italic. You could check this by running make html from the docs directory.
Lastly, I think it would be good to refer to the Scheduler class using the class directive. There is no API reference page for the Scheduler at the moment, but doing so would make this part link to it if we add it in the future. What I mean is:

:class:`scrapy.core.scheduler.Scheduler`

Sorry for the back and forth, thanks for your patience.

Copy link
Contributor Author

@joybh98 joybh98 Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elacuesta, I'm here to learn. Thanks for taking the time out to help a beginner like me!
I'll make the requested changes!

Copy link
Member

@elacuesta elacuesta left a comment

BTW I plan to resume work on #3559, hopefully we'll have more docs about the Scheduler soon.

docs/news.rst Outdated Show resolved Hide resolved
docs/news.rst Outdated Show resolved Hide resolved
@elacuesta
Copy link
Member

@elacuesta elacuesta commented Jan 31, 2020

Build failures seem to be completely unrelated, but let's close and reopen just in case.

Update: Intersphinx is failing because https://twistedmatrix.com is down.

@elacuesta elacuesta closed this Jan 31, 2020
@elacuesta elacuesta reopened this Jan 31, 2020
Copy link
Member

@elacuesta elacuesta left a comment

Thanks for the resilience @joybhallaa! 😄

@joybh98
Copy link
Contributor Author

@joybh98 joybh98 commented Feb 1, 2020

Thanks for the resilience @joybhallaa! smile

You're welcome! Is there anything else that needs to be done?

@Gallaecio Gallaecio merged commit 4f31c3c into scrapy:master Feb 6, 2020
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants