message_bus.rst

Message bus

The message bus is the transport layer abstraction mechanism. It provides an interface and several implementations. Only one message bus can be used in a crawler at a time, and it is selected with the MESSAGE_BUS setting.

The spider process can use

frontera.contrib.backends.remote.messagebus.MessageBusBackend

to communicate over the message bus.
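As a minimal sketch, a spider-side settings module could select the bus and the backend as follows. The MESSAGE_BUS value is the ZeroMQ class documented in this page; using BACKEND as the name of the setting that holds the backend class is an assumption and should be checked against the settings reference.

```python
# Spider-side Frontera settings module (a sketch, not a complete configuration).

# Transport used by the crawler; only one bus can be active at a time.
MESSAGE_BUS = 'frontera.contrib.messagebus.zeromq.MessageBus'

# Backend that lets the spider process talk over the message bus.
# Assumption: BACKEND is the setting that names the backend class.
BACKEND = 'frontera.contrib.backends.remote.messagebus.MessageBusBackend'
```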

Built-in message bus reference

ZeroMQ

This is the default option, implemented with the lightweight ZeroMQ library in

frontera.contrib.messagebus.zeromq.MessageBus

and can be configured using zeromq-settings.

The ZeroMQ message bus requires the ZeroMQ library to be installed and a broker process to be running; see running_zeromq_broker.

Kafka

The Kafka message bus can be selected with

frontera.contrib.messagebus.kafkabus.MessageBus

and configured using kafka-settings.

It requires a running Kafka service and is more suitable for large-scale web crawling.
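A settings fragment for switching to Kafka might look like the sketch below. The MESSAGE_BUS value comes from this page; KAFKA_LOCATION and its "host:port" format are assumptions used for illustration, and the authoritative names are those listed in kafka-settings.

```python
# Settings sketch for running on Kafka instead of the default ZeroMQ bus.
MESSAGE_BUS = 'frontera.contrib.messagebus.kafkabus.MessageBus'

# Assumption: KAFKA_LOCATION holds the broker address as "host:port".
# Check kafka-settings for the exact names your Frontera version supports.
KAFKA_LOCATION = 'localhost:9092'
```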

Protocol

Depending on the stream, Frontera uses several message types to encode its messages. Every message is a native Python object serialized using msgpack or JSON. The codec module can be selected using the MESSAGE_BUS_CODEC setting, and it is required to export Encoder and Decoder classes.
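To illustrate the idea, the sketch below selects the JSON codec module and round-trips a plain Python object through JSON using only the standard library. The message dictionary and its field names are hypothetical; the real layout is defined by the codec in use.

```python
import json

# Select the codec module; it must export Encoder and Decoder classes.
MESSAGE_BUS_CODEC = 'frontera.contrib.backends.remote.codecs.json'

# Hypothetical message: a plain dict tagged with a "type" field.
# The actual field layout is defined by the codec, not by this sketch.
message = {"type": "new_job_id", "job_id": 42}

encoded = json.dumps(message)   # string that would travel over the bus
decoded = json.loads(encoded)   # native Python object on the receiving side

# The round trip preserves the native object.
assert decoded == message
```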

These are the classes to subclass in order to implement your own codec:

frontera.core.codec.BaseEncoder

frontera.core.codec.BaseEncoder.encode_add_seeds

frontera.core.codec.BaseEncoder.encode_page_crawled

frontera.core.codec.BaseEncoder.encode_request_error

frontera.core.codec.BaseEncoder.encode_request

frontera.core.codec.BaseEncoder.encode_update_score

frontera.core.codec.BaseEncoder.encode_new_job_id

frontera.core.codec.BaseEncoder.encode_offset

frontera.core.codec.BaseDecoder

frontera.core.codec.BaseDecoder.decode

frontera.core.codec.BaseDecoder.decode_request
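The following sketch shows the general shape a custom codec module might take. To stay runnable without Frontera installed, the classes here are stand-alone. A real codec would subclass frontera.core.codec.BaseEncoder and frontera.core.codec.BaseDecoder and implement every method listed above. The argument lists, the wire format, and the tuples returned by decode are assumptions made for illustration.

```python
import json


class Encoder(object):
    """Sketch of an encoder; a real one would subclass
    frontera.core.codec.BaseEncoder and implement every encode_* method."""

    def encode_new_job_id(self, job_id):
        # Hypothetical wire format: a JSON object tagged with a "type" field.
        return json.dumps({"type": "new_job_id", "job_id": int(job_id)})

    def encode_offset(self, partition_id, offset):
        return json.dumps({"type": "offset",
                           "partition_id": int(partition_id),
                           "offset": int(offset)})


class Decoder(object):
    """Sketch of a decoder; a real one would subclass
    frontera.core.codec.BaseDecoder and also implement decode_request."""

    def decode(self, buffer):
        # Return a (type, *fields) tuple so callers can dispatch on the type.
        message = json.loads(buffer)
        if message["type"] == "new_job_id":
            return ("new_job_id", message["job_id"])
        if message["type"] == "offset":
            return ("offset", message["partition_id"], message["offset"])
        raise ValueError("unknown message type: %r" % message["type"])


encoder, decoder = Encoder(), Decoder()
print(decoder.decode(encoder.encode_offset(0, 1234)))  # ('offset', 0, 1234)
```

Tagging each message with its type keeps the decoder a simple dispatch on one field, which is convenient when several message kinds share the same stream.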

Available codecs

MsgPack

Module: frontera.contrib.backends.remote.codecs.msgpack

JSON

Module: frontera.contrib.backends.remote.codecs.json