Message bus is the transport layer abstraction mechanism. Frontera provides an interface and several implementations. Only one message bus can be used in a crawler at a time, and it is selected with the MESSAGE_BUS setting.
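Because the setting holds a single dotted path, exactly one implementation is active per crawler. The sketch below shows one plausible way such a path could be resolved into a class. The helper `load_object` is hypothetical and written only for illustration, and it is demonstrated with a standard-library class so that it runs without Frontera installed.

```python
import importlib


def load_object(path):
    """Resolve a dotted path such as 'package.module.ClassName' to the object it names.

    Hypothetical helper for illustration; not taken from Frontera's source.
    """
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError("expected a dotted path, got %r" % path)
    module = importlib.import_module(module_path)
    return getattr(module, name)


# A settings mapping holds exactly one MESSAGE_BUS value, so only one
# implementation can be selected at a time.
settings = {"MESSAGE_BUS": "collections.OrderedDict"}  # stand-in path for the demo
bus_cls = load_object(settings["MESSAGE_BUS"])
print(bus_cls.__name__)  # OrderedDict
```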
Spider processes can use frontera.contrib.backends.remote.messagebus.MessageBusBackend to communicate over the message bus.
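As a minimal sketch, assuming the spider's Frontera settings live in a plain Python module, selecting this backend could look like the fragment below. `BACKEND` and `MESSAGE_BUS` are Frontera setting names; the chosen values are only an example.

```python
# Frontera settings module for a spider process (illustrative sketch).

# Send the spider's frontier traffic through the message bus instead of a local backend.
BACKEND = "frontera.contrib.backends.remote.messagebus.MessageBusBackend"

# Transport used by MessageBusBackend; only one message bus can be active.
MESSAGE_BUS = "frontera.contrib.messagebus.zeromq.MessageBus"
```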
ZeroMQ is the default option, implemented using the lightweight ZeroMQ library in frontera.contrib.messagebus.zeromq.MessageBus, and can be configured using zeromq-settings.
The ZeroMQ message bus requires the ZeroMQ library to be installed and a broker process to be running; see running_zeromq_broker.
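A settings fragment selecting ZeroMQ might look like this. The names `ZMQ_ADDRESS` and `ZMQ_BASE_PORT` are assumptions recalled from Frontera's zeromq-settings, and the values are examples; verify both against that reference before relying on them.

```python
# Select the ZeroMQ transport (the default).
MESSAGE_BUS = "frontera.contrib.messagebus.zeromq.MessageBus"

# Assumed setting names; see zeromq-settings for the authoritative list.
ZMQ_ADDRESS = "127.0.0.1"  # host where the ZeroMQ broker is reachable (example value)
ZMQ_BASE_PORT = 5550       # first port of the broker's port range (example value)
```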
Kafka can be selected with frontera.contrib.messagebus.kafkabus.MessageBus and configured using kafka-settings.
It requires a running Kafka service and is more suitable for large-scale web crawling.
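Switching to Kafka is again a matter of settings. In the fragment below, `KAFKA_LOCATION` is an assumed setting name recalled from kafka-settings, and the broker address is an example; check the settings reference for the exact names your Frontera version uses.

```python
# Select the Kafka transport.
MESSAGE_BUS = "frontera.contrib.messagebus.kafkabus.MessageBus"

# Assumed setting name; see kafka-settings for the authoritative list.
KAFKA_LOCATION = "localhost:9092"  # host:port of a Kafka broker (example value)
```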
Depending on the stream, Frontera uses several message types to encode its messages. Every message is a native Python object serialized using msgpack or JSON. The codec module is selected with the MESSAGE_BUS_CODEC setting, and it must export Encoder and Decoder classes.
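To illustrate what serializing a native Python object means here, the following is a JSON round trip using only the standard library. The message layout (a type tag followed by a payload) is a hypothetical example, not Frontera's actual wire format; a msgpack codec would use the third-party `msgpack` package instead of `json`.

```python
import json

# Hypothetical message: a type tag plus a payload of plain Python values.
# A list is used because JSON has no tuple type and would decode a tuple as a list.
message = ["new_job_id", 7]

# Encode to bytes for the transport, then decode on the receiving side.
buffer = json.dumps(message).encode("utf-8")
decoded = json.loads(buffer.decode("utf-8"))

print(buffer)   # b'["new_job_id", 7]'
print(decoded)  # ['new_job_id', 7]
```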
To implement your own codec, subclass these classes and implement the methods listed under each:
frontera.core.codec.BaseEncoder
frontera.core.codec.BaseEncoder.encode_add_seeds
frontera.core.codec.BaseEncoder.encode_page_crawled
frontera.core.codec.BaseEncoder.encode_request_error
frontera.core.codec.BaseEncoder.encode_request
frontera.core.codec.BaseEncoder.encode_update_score
frontera.core.codec.BaseEncoder.encode_new_job_id
frontera.core.codec.BaseEncoder.encode_offset
frontera.core.codec.BaseDecoder
frontera.core.codec.BaseDecoder.decode
frontera.core.codec.BaseDecoder.decode_request
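The skeleton below sketches a JSON codec module that exports Encoder and Decoder with the method names listed above. It is a hypothetical illustration: a real codec would subclass frontera.core.codec.BaseEncoder and BaseDecoder and work with Frontera's request and response objects, whereas here plain dicts stand in for them so the code runs standalone. The argument lists are assumptions; check the BaseEncoder and BaseDecoder reference for your Frontera version.

```python
import json


class Encoder(object):
    """Hypothetical JSON encoder mirroring BaseEncoder's method names."""

    def _dump(self, *message):
        # Every message is a type tag followed by its fields, serialized as a JSON list.
        return json.dumps(list(message)).encode("utf-8")

    def encode_add_seeds(self, seeds):
        return self._dump("add_seeds", seeds)

    def encode_page_crawled(self, response):
        return self._dump("page_crawled", response)

    def encode_request_error(self, request, error):
        return self._dump("request_error", request, error)

    def encode_request(self, request):
        # A bare request without a type tag; read back with Decoder.decode_request.
        return json.dumps(request).encode("utf-8")

    def encode_update_score(self, request, score, schedule):
        return self._dump("update_score", request, score, schedule)

    def encode_new_job_id(self, job_id):
        return self._dump("new_job_id", job_id)

    def encode_offset(self, partition_id, offset):
        return self._dump("offset", partition_id, offset)


class Decoder(object):
    """Hypothetical JSON decoder mirroring BaseDecoder's method names."""

    def decode(self, buffer):
        # Returns a tuple: (message type, *fields).
        return tuple(json.loads(buffer.decode("utf-8")))

    def decode_request(self, buffer):
        return json.loads(buffer.decode("utf-8"))


enc, dec = Encoder(), Decoder()
print(dec.decode(enc.encode_offset(0, 42)))  # ('offset', 0, 42)
```

Keeping the tag as the first element lets a single decode method dispatch on message type, while bare requests get their own decoder because they travel without a tag.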
Available codecs:
MsgPack: frontera.contrib.backends.remote.codecs.msgpack
JSON: frontera.contrib.backends.remote.codecs.json
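Choosing between these codecs means pointing MESSAGE_BUS_CODEC at one of the modules above. A fragment selecting JSON could read as follows. Since one process decodes what another encodes, all processes attached to the same bus should use the same value.

```python
# Codec module used to serialize message bus traffic (example choice).
MESSAGE_BUS_CODEC = "frontera.contrib.backends.remote.codecs.json"
```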