Document the Scrapy component API #5439
===========================
Class Factory Methods
===========================

Factory methods create an instance of their implementer class by extracting
the components needed for it from the argument that the method takes.
Throughout Scrapy the most common factory methods are ``from_crawler`` and
``from_settings``; each takes a single parameter, namely a crawler or a
settings object, respectively.

The ``from_crawler`` class method is implemented by the following components:

* ItemPipeline
* DownloaderMiddleware
* SpiderMiddleware
* Scheduler
* BaseScheduler
* Spider

The ``from_settings`` class method is implemented by the following components:

* MailSender
* SpiderLoader

.. py:classmethod:: from_crawler(cls, crawler)

    Factory method that, if present, is used to create an instance of the
    implementer class using a :class:`~scrapy.crawler.Crawler`. It must
    return a new instance of the implementer class. The Crawler object is
    needed in order to provide access to all Scrapy core components, like
    settings and signals; it is a way for the implementer class to access
    them and hook its functionality into Scrapy.

    :param crawler: crawler that uses this component
    :type crawler: :class:`~scrapy.crawler.Crawler` object

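As an illustration of the pattern (not actual Scrapy code), a minimal component might implement ``from_crawler`` as below. ``StubCrawler`` and ``MyPipeline`` are hypothetical stand-ins; the stub exposes only the ``settings`` attribute that a real :class:`~scrapy.crawler.Crawler` would provide:

```python
class StubCrawler:
    """Hypothetical stand-in for scrapy.crawler.Crawler; exposes only settings."""

    def __init__(self, settings):
        self.settings = settings


class MyPipeline:
    """Hypothetical component built from a crawler via the factory method."""

    def __init__(self, api_key):
        self.api_key = api_key

    @classmethod
    def from_crawler(cls, crawler):
        # Pull everything the instance needs out of the crawler's settings
        return cls(api_key=crawler.settings.get("MY_API_KEY"))


crawler = StubCrawler(settings={"MY_API_KEY": "secret"})
pipeline = MyPipeline.from_crawler(crawler)
print(pipeline.api_key)  # -> secret
```

The caller never invokes ``MyPipeline(...)`` directly; the class itself decides how to assemble an instance from whatever the crawler carries.
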
.. py:classmethod:: from_settings(cls, settings)

    This class method is used by Scrapy to create an instance of the
    implementer class using the settings passed as its argument.
    It will not be called at all if ``from_crawler`` is defined.

    :param settings: project settings
    :type settings: :class:`~scrapy.settings.Settings` instance

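The precedence between the two factory methods can be sketched as follows. ``build_component`` is a hypothetical helper written only for illustration, not Scrapy's actual implementation:

```python
def build_component(cls, settings, crawler=None):
    """Illustrative precedence: from_crawler wins over from_settings."""
    if crawler is not None and hasattr(cls, "from_crawler"):
        return cls.from_crawler(crawler)
    if hasattr(cls, "from_settings"):
        return cls.from_settings(settings)
    # Neither factory method defined: fall back to a plain constructor
    return cls()


class WithBoth:
    """Demo class defining both factory methods."""

    @classmethod
    def from_crawler(cls, crawler):
        return "via from_crawler"

    @classmethod
    def from_settings(cls, settings):
        return "via from_settings"


print(build_component(WithBoth, settings={}, crawler=object()))  # -> via from_crawler
print(build_component(WithBoth, settings={}))                    # -> via from_settings
```

Returning strings here is purely for demonstration; a real factory method must return an instance of the class.
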
Implementing Factory Methods
============================

The goal when extending these factory methods should be: given the arguments
passed to it, create a class instance, regardless of whether the argument is a
crawler, a settings object or something else. The main reason to pass the
Crawler or Settings object is the amount of information these objects hold,
which can be used in the instantiation of the class.

``Crawler`` specifically gives access to ``settings``, ``signals``, ``stats``,
``extensions``, ``engine``, and ``spider``, which may be very useful when
instantiating a class.

For example, let's say that we want to create a new spider; ``TestSpider``
will look like this::

    import scrapy
    from scrapy import signals
    from scrapy.exceptions import NotConfigured


    class TestSpider(scrapy.Spider):

        def __init__(self, ex1, ex2, ex3, name=None, **kwargs):
            super().__init__(name, **kwargs)
            self.extra_param1: str = ex1
            self.extra_param2: int = ex2
            self.extra_param3: bool = ex3

        # Other methods are omitted for the sake of the example

        @classmethod
        def from_crawler(cls, crawler, ex1, ex2, ex3):
            # Do some configuration if needed.
            # For example, first check whether the spider should be enabled
            # and raise NotConfigured otherwise:
            if not crawler.settings.getbool('MYEXT_ENABLED'):
                raise NotConfigured

            # E.g.: get the number of items from the settings
            item_count = crawler.settings.getint('MYEXT_ITEMCOUNT', 1000)

            # Instantiate the spider
            spider = cls(ex1, ex2, ex3)

            # Maybe connect the spider object to signals
            crawler.signals.connect(spider.spider_opened,
                                    signal=signals.spider_opened)

            # Validate some more settings
            my_settings_dict = crawler.settings.getdict('MYEXT_DICT')
            if 'some_key' not in my_settings_dict:
                raise SomeException  # placeholder for a suitable exception

            # ...
            # Do some more configuration if needed
            # ...

            # Finally return the spider object
            return spider

Similarly, when one wants to extend a class that implements the
``from_settings`` method, it will look similar to the following example.
Say you want to create a ``MyNewSender`` class::

    class MyNewSender:
        def __init__(self, is_enabled, send_at):
            self.is_enabled = is_enabled
            self.send_at = send_at

        # Some more methods...

        @classmethod
        def from_settings(cls, settings):
            # Get the values needed to instantiate the class from the settings object
            is_enabled = settings.getbool('MY_SENDER_ENABLED')
            send_at = settings.get('DATETIME_OF_SENDING')

            # ...
            # Maybe some more configuration
            # ...

            # Finally return the new instance
            return cls(is_enabled, send_at)
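Because a factory method concentrates all construction logic in one place, it can be exercised in isolation by stubbing the settings object. Below is a hypothetical sketch: ``StubSettings`` and ``MySender`` are illustrative stand-ins, with the stub mimicking only the ``get``/``getbool`` accessors of a real :class:`~scrapy.settings.Settings`:

```python
class StubSettings:
    """Hypothetical stand-in mimicking Settings.get and Settings.getbool."""

    def __init__(self, values):
        self._values = values

    def get(self, name, default=None):
        return self._values.get(name, default)

    def getbool(self, name, default=False):
        return bool(self._values.get(name, default))


class MySender:
    """Illustrative component following the from_settings pattern."""

    def __init__(self, is_enabled, send_at):
        self.is_enabled = is_enabled
        self.send_at = send_at

    @classmethod
    def from_settings(cls, settings):
        # Read everything the instance needs from the settings object
        return cls(
            is_enabled=settings.getbool("MY_SENDER_ENABLED"),
            send_at=settings.get("DATETIME_OF_SENDING"),
        )


sender = MySender.from_settings(
    StubSettings({"MY_SENDER_ENABLED": True, "DATETIME_OF_SENDING": "2022-01-01"})
)
print(sender.is_enabled, sender.send_at)  # -> True 2022-01-01
```
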