Frontier Middleware <frontera.core.components.Middleware>
sits between FrontierManager <frontera.core.manager.FrontierManager>
and Backend <frontera.core.components.Backend>
objects, using hooks for Request <frontera.core.models.Request>
and Response <frontera.core.models.Response>
processing according to frontier data flow <frontier-data-flow>
.
It’s a light, low-level system for filtering and altering Frontier’s requests and responses.
To activate a Middleware <frontera.core.components.Middleware>
component, add it to the MIDDLEWARES
setting, which is a list whose values can be class paths or instances of Middleware <frontera.core.components.Middleware>
objects.
Here’s an example:
MIDDLEWARES = [
    'frontera.contrib.middlewares.domain.DomainMiddleware',
]
Middlewares are called in the order in which they are defined in the list. To decide where to place your middleware, consider at which point in the chain it needs to run. Order matters because each middleware performs a different action, and your middleware may depend on some previous (or subsequent) middleware having been applied.
Finally, keep in mind that some middlewares may need to be enabled through a particular setting. See each middleware's documentation <frontier-built-in-middleware>
for more info.
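To illustrate why list order matters, here is a minimal, self-contained sketch that does not import Frontera. FakeRequest, TagDomain, RequireDomain and run_chain are hypothetical stand-ins invented for this example; in a real setup the FrontierManager performs the chaining itself.

```python
class FakeRequest(object):
    """Hypothetical stand-in for frontera.core.models.Request."""
    def __init__(self, url):
        self.url = url
        self.meta = {}


class TagDomain(object):
    """Hypothetical middleware: records the host name in request.meta."""
    def add_seeds(self, seeds):
        for seed in seeds:
            # 'http://example.com/a'.split('/') -> ['http:', '', 'example.com', 'a']
            seed.meta['domain'] = seed.url.split('/')[2]
        return seeds


class RequireDomain(object):
    """Hypothetical middleware: keeps only seeds that already carry a domain."""
    def add_seeds(self, seeds):
        return [seed for seed in seeds if 'domain' in seed.meta]


def run_chain(middlewares, seeds):
    """Simplified model: call each middleware in list order, stop on None."""
    for middleware in middlewares:
        seeds = middleware.add_seeds(seeds)
        if seeds is None:
            return None
    return seeds


def make_seeds():
    return [FakeRequest('http://example.com/a')]


# TagDomain first: RequireDomain sees the 'domain' key and keeps the seed.
kept = run_chain([TagDomain(), RequireDomain()], make_seeds())

# Reversed order: RequireDomain runs before the key exists and drops the seed.
dropped = run_chain([RequireDomain(), TagDomain()], make_seeds())
```

The same two components give different results depending only on their position in the list, which is the dependency the paragraph above warns about.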
Writing your own frontier middleware is easy. Each Middleware <frontera.core.components.Middleware>
component is a single Python class inherited from Component <frontera.core.components.Component>
.
FrontierManager <frontera.core.manager.FrontierManager>
will communicate with all active middlewares through the methods described below.
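As a rough sketch of the overall shape of such a class, the following pass-through middleware implements every hook listed below and changes nothing. It is deliberately self-contained: the Middleware base class here is a local stand-in for frontera.core.components.Middleware, and the method signatures are assumptions that may differ between Frontera versions.

```python
class Middleware(object):
    """Local stand-in for frontera.core.components.Middleware (assumption)."""


class PassThroughMiddleware(Middleware):
    """Hypothetical middleware that implements every hook and alters nothing."""

    def __init__(self, manager):
        self.manager = manager

    @classmethod
    def from_manager(cls, manager):
        # Assumption: the manager builds the component through this class method.
        return cls(manager)

    def frontier_start(self):
        pass  # nothing to set up in this sketch

    def frontier_stop(self):
        pass  # nothing to tear down in this sketch

    def add_seeds(self, seeds):
        return seeds  # hand every seed to the next middleware

    def page_crawled(self, response):
        return response  # hand the response to the next middleware

    def request_error(self, request, error):
        return request  # hand the failed request to the next middleware


middleware = PassThroughMiddleware.from_manager(manager=None)
seeds = ['seed-1', 'seed-2']  # placeholders for Request objects
```

Starting from a pass-through class like this, you would typically override only the hooks where filtering or altering is needed.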
frontera.core.components.Middleware
Methods
frontera.core.components.Middleware.frontier_start
frontera.core.components.Middleware.frontier_stop
frontera.core.components.Middleware.add_seeds
- return
Request <frontera.core.models.Request>
object list or None
Should either return None
or a list of Request <frontera.core.models.Request>
objects.
If it returns None
, FrontierManager <frontera.core.manager.FrontierManager>
won't continue processing any other middleware and the seeds will never reach the Backend <frontera.core.components.Backend>
.
If it returns a list of Request <frontera.core.models.Request>
objects, this list is passed to the next middleware. This process repeats for all active middlewares until the result is finally passed to the Backend <frontera.core.components.Backend>
.
If you want to filter out a seed, just leave it out of the returned list.
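The two outcomes described above (a filtered list versus None) can be sketched as follows. This is a simplified model, not Frontera's actual code: FakeRequest, both middleware classes and dispatch_seeds are hypothetical names, and dispatch_seeds only imitates the behaviour attributed to the FrontierManager.

```python
class FakeRequest(object):
    """Hypothetical stand-in for frontera.core.models.Request."""
    def __init__(self, url):
        self.url = url


class HttpsOnlySeedsMiddleware(object):
    """Hypothetical middleware: keeps only seeds whose URL uses https."""
    def add_seeds(self, seeds):
        # Filtered seeds are simply left out of the returned list.
        return [seed for seed in seeds if seed.url.startswith('https://')]


class RejectAllSeedsMiddleware(object):
    """Hypothetical middleware: returning None stops the whole chain."""
    def add_seeds(self, seeds):
        return None


def dispatch_seeds(middlewares, seeds, backend_seeds):
    """Simplified model of the manager: stop on None, else reach the backend."""
    for middleware in middlewares:
        seeds = middleware.add_seeds(seeds)
        if seeds is None:
            return
    backend_seeds.extend(seeds)


seeds = [FakeRequest('https://example.com/'), FakeRequest('http://example.org/')]

# One seed is filtered, the other reaches the (simulated) backend.
reached_backend = []
dispatch_seeds([HttpsOnlySeedsMiddleware()], seeds, reached_backend)

# The first middleware returns None, so nothing reaches the backend.
blocked = []
dispatch_seeds([RejectAllSeedsMiddleware(), HttpsOnlySeedsMiddleware()], seeds, blocked)
```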
frontera.core.components.Middleware.page_crawled
- return
Response <frontera.core.models.Response>
or None
Should either return None
or a Response <frontera.core.models.Response>
object.
If it returns None
, FrontierManager <frontera.core.manager.FrontierManager>
won't continue processing any other middleware and the Backend <frontera.core.components.Backend>
will never be notified.
If it returns a Response <frontera.core.models.Response>
object, it is passed to the next middleware. This process repeats for all active middlewares until the result is finally passed to the Backend <frontera.core.components.Backend>
.
If you want to filter a page, just return None.
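A minimal sketch of page filtering, under stated assumptions: FakeResponse stands in for frontera.core.models.Response, the status_code attribute and the single-argument page_crawled signature are assumed for illustration (the real hook may receive additional arguments depending on the Frontera version), and notify_page_crawled merely imitates the manager's behaviour described above.

```python
class FakeResponse(object):
    """Hypothetical stand-in for frontera.core.models.Response."""
    def __init__(self, url, status_code):
        self.url = url
        self.status_code = status_code


class OkOnlyMiddleware(object):
    """Hypothetical middleware: filters out pages that are not HTTP 200."""
    def page_crawled(self, response):
        if response.status_code != 200:
            return None  # filtered: the backend is never notified
        return response


def notify_page_crawled(middlewares, response):
    """Simplified model of the manager; returns what the backend would receive."""
    for middleware in middlewares:
        response = middleware.page_crawled(response)
        if response is None:
            return None
    return response


ok = notify_page_crawled(
    [OkOnlyMiddleware()], FakeResponse('http://example.com/', 200))
missing = notify_page_crawled(
    [OkOnlyMiddleware()], FakeResponse('http://example.com/x', 404))
```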
frontera.core.components.Middleware.request_error
- return
Request <frontera.core.models.Request>
or None
Should either return None
or a Request <frontera.core.models.Request>
object.
If it returns None
, FrontierManager <frontera.core.manager.FrontierManager>
won't continue processing any other middleware and the Backend <frontera.core.components.Backend>
will never be notified.
If it returns a Request <frontera.core.models.Request>
object, it is passed to the next middleware. This process repeats for all active middlewares until the result is finally passed to the Backend <frontera.core.components.Backend>
.
If you want to filter a page error, just return None.
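The following sketch shows one way an error hook might hide some failures and annotate others. The request_error(request, error) signature, the use of a plain string such as 'timeout' as the error value, and the names FakeRequest and IgnoreTimeoutsMiddleware are all assumptions made for this example.

```python
class FakeRequest(object):
    """Hypothetical stand-in for frontera.core.models.Request."""
    def __init__(self, url):
        self.url = url
        self.meta = {}


class IgnoreTimeoutsMiddleware(object):
    """Hypothetical middleware: hides timeouts, annotates every other error."""
    def request_error(self, request, error):
        if error == 'timeout':
            return None  # filtered: later middlewares and the backend never see it
        request.meta['last_error'] = error
        return request  # passed on to the next middleware


middleware = IgnoreTimeoutsMiddleware()

hidden = middleware.request_error(FakeRequest('http://example.com/slow'), 'timeout')
passed = middleware.request_error(FakeRequest('http://example.com/gone'), 'dns_error')
```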
Class Methods
frontera.core.components.Middleware.from_manager
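As an illustration of how a class method like this can be used to configure a component, the sketch below builds a middleware from values held by the manager. Everything here is hypothetical: FakeManager, its settings mapping, the MAX_DEPTH key and DepthLimitMiddleware are invented for the example and are not part of Frontera.

```python
class FakeManager(object):
    """Hypothetical stand-in for FrontierManager exposing a settings mapping."""
    def __init__(self, settings):
        self.settings = settings


class DepthLimitMiddleware(object):
    """Hypothetical middleware configured from a hypothetical MAX_DEPTH setting."""
    def __init__(self, max_depth):
        self.max_depth = max_depth

    @classmethod
    def from_manager(cls, manager):
        # Read configuration from the manager, then build the instance.
        return cls(max_depth=manager.settings.get('MAX_DEPTH', 3))


configured = DepthLimitMiddleware.from_manager(FakeManager({'MAX_DEPTH': 5}))
defaulted = DepthLimitMiddleware.from_manager(FakeManager({}))
```

Keeping the settings lookup inside the class method leaves the constructor free of any dependency on the manager, which makes the component easier to instantiate directly in tests.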
This page describes all Middleware <frontera.core.components.Middleware>
components that come with Frontera. For information on how to use them and how to write your own middleware, see the middleware usage guide <frontier-writing-middleware>
.
For a list of the components enabled by default (and their order) see the MIDDLEWARES
setting.
frontera.contrib.middlewares.domain.DomainMiddleware()
frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware()
frontera.utils.fingerprint.hostname_local_fingerprint
frontera.contrib.middlewares.fingerprint.DomainFingerprintMiddleware()