frontier-middlewares.rst

Middlewares

Frontier Middleware <frontera.core.components.Middleware> sits between FrontierManager <frontera.core.manager.FrontierManager> and Backend <frontera.core.components.Backend> objects, using hooks for Request <frontera.core.models.Request> and Response <frontera.core.models.Response> processing according to frontier data flow <frontier-data-flow>.

It’s a light, low-level system for filtering and altering Frontier’s requests and responses.

Activating a middleware

To activate a Middleware <frontera.core.components.Middleware> component, add it to the MIDDLEWARES setting, which is a list whose values can be class paths or instances of Middleware <frontera.core.components.Middleware> objects.

Here’s an example:

MIDDLEWARES = [
    'frontera.contrib.middlewares.domain.DomainMiddleware',
]

Middlewares are called in the order in which they appear in the list, so place your middleware at the position where you want it to run. The order matters because each middleware performs a different action, and yours may depend on a previous (or subsequent) middleware having been applied.
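As a small illustration of ordering, the sketch below lists two of the built-in middlewares documented further down. The pairing and its order are only an example, not necessarily the library default: with this list, the hooks of DomainMiddleware would run before those of UrlFingerprintMiddleware.

```python
# Illustrative settings fragment: list position determines call order.
MIDDLEWARES = [
    # Called first.
    'frontera.contrib.middlewares.domain.DomainMiddleware',
    # Called second, after DomainMiddleware has processed the object.
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]
```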

Finally, keep in mind that some middlewares may need to be enabled through a particular setting. See each middleware's documentation <frontier-built-in-middleware> for more info.

Writing your own middleware

Writing your own frontier middleware is easy. Each Middleware <frontera.core.components.Middleware> component is a single Python class that inherits from Component <frontera.core.components.Component>.

FrontierManager <frontera.core.manager.FrontierManager> will communicate with all active middlewares through the methods described below.

frontera.core.components.Middleware

Methods

frontera.core.components.Middleware.frontier_start

frontera.core.components.Middleware.frontier_stop

frontera.core.components.Middleware.add_seeds

return

Request <frontera.core.models.Request> object list or None

Should either return None or a list of Request <frontera.core.models.Request> objects.

If it returns None, FrontierManager <frontera.core.manager.FrontierManager> won't continue processing any other middleware and the seeds will never reach the Backend <frontera.core.components.Backend>.

If it returns a list of Request <frontera.core.models.Request> objects, the list will be passed to the next middleware. This process repeats for all active middlewares until the result is finally passed to the Backend <frontera.core.components.Backend>.

If you want to filter any seed, just don't include it in the returned object list.
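The seed-filtering behaviour described above can be sketched without Frontera installed. Everything in this example is a stand-in or an assumption: `Request` is a minimal substitute for frontera.core.models.Request, the `add_seeds(seeds)` signature is assumed, the two middleware classes are hypothetical, and `run_add_seeds` only imitates the chaining described in the text rather than reproducing FrontierManager's actual code.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Stand-in for frontera.core.models.Request; only `url` is modelled.
    url: str


class SchemeFilterMiddleware:
    """Hypothetical middleware that keeps only http(s) seeds."""

    def add_seeds(self, seeds):
        # Filtering a seed means leaving it out of the returned list.
        return [s for s in seeds if s.url.startswith(("http://", "https://"))]


class DropAllMiddleware:
    """Hypothetical middleware that stops the chain by returning None."""

    def add_seeds(self, seeds):
        return None


def run_add_seeds(middlewares, seeds):
    # Imitates the described flow: each result feeds the next middleware,
    # and a None result stops processing before the backend is reached.
    for middleware in middlewares:
        seeds = middleware.add_seeds(seeds)
        if seeds is None:
            return None
    return seeds


seeds = [Request("https://example.com/"), Request("ftp://example.com/file")]
kept = run_add_seeds([SchemeFilterMiddleware()], seeds)
stopped = run_add_seeds([DropAllMiddleware(), SchemeFilterMiddleware()], seeds)
```

Here `kept` contains only the https seed, while `stopped` is None because the first middleware ended the chain.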

frontera.core.components.Middleware.page_crawled

return

Response <frontera.core.models.Response> or None

Should either return None or a Response <frontera.core.models.Response> object.

If it returns None, FrontierManager <frontera.core.manager.FrontierManager> won't continue processing any other middleware and the Backend <frontera.core.components.Backend> will never be notified.

If it returns a Response <frontera.core.models.Response> object, the object will be passed to the next middleware. This process repeats for all active middlewares until the result is finally passed to the Backend <frontera.core.components.Backend>.

If you want to filter a page, just return None.
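A minimal sketch of a page filter follows, assuming a `page_crawled(response)` signature (the exact arguments may differ between Frontera versions). `Response` is a stand-in for frontera.core.models.Response that models only `url` and `status_code`, and `SuccessOnlyMiddleware` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Stand-in for frontera.core.models.Response.
    url: str
    status_code: int = 200


class SuccessOnlyMiddleware:
    """Hypothetical middleware that filters out non-200 pages."""

    def page_crawled(self, response):
        if response.status_code != 200:
            # Returning None filters the page: later middlewares and the
            # backend are not notified.
            return None
        # Returning the response lets processing continue.
        return response


middleware = SuccessOnlyMiddleware()
ok = middleware.page_crawled(Response("https://example.com/"))
dropped = middleware.page_crawled(Response("https://example.com/missing", 404))
```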

frontera.core.components.Middleware.request_error

return

Request <frontera.core.models.Request> or None

Should either return None or a Request <frontera.core.models.Request> object.

If it returns None, FrontierManager <frontera.core.manager.FrontierManager> won't continue processing any other middleware and the Backend <frontera.core.components.Backend> will never be notified.

If it returns a Request <frontera.core.models.Request> object, the object will be passed to the next middleware. This process repeats for all active middlewares until the result is finally passed to the Backend <frontera.core.components.Backend>.

If you want to filter a page error, just return None.
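The error hook can be sketched in the same way. This example assumes a `request_error(request, error)` signature in which `error` is a short string code. `Request` is a stand-in for frontera.core.models.Request, and the class name, the `ignored` parameter, and the error codes are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    # Stand-in for frontera.core.models.Request; only `url` is modelled.
    url: str


class ErrorFilterMiddleware:
    """Hypothetical middleware that counts errors and filters some of them."""

    def __init__(self, ignored=("timeout",)):
        self.ignored = set(ignored)
        self.seen = Counter()

    def request_error(self, request, error):
        self.seen[error] += 1
        if error in self.ignored:
            # Returning None filters the error: the backend is not notified.
            return None
        # Returning the request passes the error on to the next middleware.
        return request


middleware = ErrorFilterMiddleware()
passed = middleware.request_error(Request("https://example.com/a"), "dns")
filtered = middleware.request_error(Request("https://example.com/b"), "timeout")
```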

Class Methods

frontera.core.components.Middleware.from_manager
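The text does not describe this class method in detail. A common pattern, assumed here, is that the manager passes itself in so the middleware can build an instance from the manager's settings. In the sketch below `FakeManager` is a stand-in for FrontierManager with a plain dict as `settings`, and both the `ALLOWED_PREFIX` setting and `PrefixFilterMiddleware` are hypothetical.

```python
class FakeManager:
    # Stand-in for FrontierManager; a dict plays the role of its settings.
    def __init__(self, settings):
        self.settings = settings


class PrefixFilterMiddleware:
    """Hypothetical middleware configured from the manager's settings."""

    def __init__(self, allowed_prefix):
        self.allowed_prefix = allowed_prefix

    @classmethod
    def from_manager(cls, manager):
        # Read a (hypothetical) setting and build the instance from it.
        return cls(manager.settings.get("ALLOWED_PREFIX", "https://"))


configured = PrefixFilterMiddleware.from_manager(
    FakeManager({"ALLOWED_PREFIX": "https://example.com/"})
)
default = PrefixFilterMiddleware.from_manager(FakeManager({}))
```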

Built-in middleware reference

This page describes all Middleware <frontera.core.components.Middleware> components that come with Frontera. For information on how to use them and how to write your own middleware, see the middleware usage guide <frontier-writing-middleware>.

For a list of the components enabled by default (and their order), see the MIDDLEWARES setting.

DomainMiddleware

frontera.contrib.middlewares.domain.DomainMiddleware()

UrlFingerprintMiddleware

frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware()

frontera.utils.fingerprint.hostname_local_fingerprint

DomainFingerprintMiddleware

frontera.contrib.middlewares.fingerprint.DomainFingerprintMiddleware()