# Data Streaming

![](images/stream-data.png)

## A bit of history

![](https://i.imgflip.com/54ff5r.jpg)
[NicsMeme](https://imgflip.com/i/54ff5r)

![](https://i.imgflip.com/54ffdn.jpg)
[NicsMeme](https://imgflip.com/i/54ffdn)

# Message-oriented middleware (MOM)
https://www.researchgate.net/publication/271436605_Extending_message-oriented_middleware_using_interception

Message-oriented middleware is software or hardware infrastructure that supports sending and receiving messages between distributed systems.

MOM allows application modules to be distributed over heterogeneous platforms and reduces the complexity of developing applications that span multiple operating systems and network protocols. 

The middleware creates a distributed communications layer that insulates the application developer from the details of the various operating systems and network interfaces. MOM typically provides APIs that extend across diverse platforms and networks.

![MOM async](images/mom-async.PNG)

![](images/207px-Google_Voice_icon.png)

![](images/messaggi-vocali-whatsapp.jpg)

# Advantages
https://en.wikipedia.org/wiki/Message-oriented_middleware#Advantages

## Asynchronicity

A client makes an API call to send a message to a destination managed by the provider. 

The call invokes provider services to route and deliver the message. 

Once it has sent the message, the client can continue to do other work, confident that the provider retains the message until a receiving client retrieves it. 
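The flow above can be sketched in a few lines of Python. This is only an illustration: the "provider" here is an in-process `queue.Queue`, and the names `send`, `receiver` and `demo` are made up for this example, not part of any MOM product.

```python
import queue
import threading

# Assumption: an in-memory queue stands in for the provider-managed destination.
destination = queue.Queue()

def send(message):
    """Hand the message to the provider and return immediately."""
    destination.put(message)

def receiver(results, expected):
    """Receiving client: retrieves messages whenever it is ready."""
    for _ in range(expected):
        results.append(destination.get())
        destination.task_done()

def demo():
    received = []
    # The sender does not wait for a receiver to exist.
    send("order-1")
    send("order-2")
    other_work = sum(range(10))  # the sender carries on with unrelated work

    # The receiver starts later; the provider has retained the messages.
    worker = threading.Thread(target=receiver, args=(received, 2))
    worker.start()
    worker.join()
    return other_work, received
```

Calling `demo()` returns the result of the sender's other work together with the messages the late receiver still managed to retrieve.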

![](images/async-pray.jpg)

## Loosely Coupled
The message-based model, combined with the mediation of the provider, makes it possible to create a system of loosely coupled components.

![](images/coupling-sketches-cropped-1.png)

https://dailyfintech.com/2017/02/20/applying-loose-coupling-software-principles-to-enterprise-digital-transformation/

## Routing

Many message-oriented middleware implementations depend on a message queue system.

Some implementations permit routing logic to be provided by the messaging layer itself, while others depend on client applications to provide routing information or allow for a mix of both paradigms. 

Some implementations make use of broadcast or multicast distribution paradigms.
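A minimal sketch of routing done by the messaging layer itself, assuming a toy `Router` class invented for this example (it mirrors the idea of binding routing keys to queues, not the API of any specific broker):

```python
from collections import defaultdict

class Router:
    """Toy messaging layer: routing keys are bound to named queues."""

    def __init__(self):
        self.queues = defaultdict(list)    # queue name -> delivered messages
        self.bindings = defaultdict(list)  # routing key -> queue names

    def bind(self, routing_key, queue_name):
        self.bindings[routing_key].append(queue_name)

    def publish(self, routing_key, message):
        """Deliver to every queue bound to the key; return how many received it."""
        targets = self.bindings.get(routing_key, [])
        for name in targets:
            self.queues[name].append(message)
        return len(targets)

    def broadcast(self, message):
        """Deliver to every queue that has at least one binding."""
        names = {name for bound in self.bindings.values() for name in bound}
        for name in sorted(names):
            self.queues[name].append(message)
        return len(names)
```

Here the client only supplies a routing key, while the layer decides which queues get a copy. In the alternative paradigm the client would name the destination queue directly.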

![](images/message-queue.jpg)
https://www.slideshare.net/Bozho/overview-of-message-queues

## Transformation

In a message-based middleware system, the message received at the destination need not be identical to the message originally sent. A MOM system with built-in intelligence can transform messages en route to match the requirements of the sender or of the recipient.

In conjunction with the routing and broadcast/multicast facilities, one application can send a message in its own native format, and two or more other applications may each receive a copy of the message in their own native format. 

Many modern MOM systems provide sophisticated message transformation (or mapping) tools that let programmers specify transformation rules through simple GUI drag-and-drop operations.
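As a rough illustration of "one message in, several native formats out", the sketch below lets each subscriber register its own transformation. `TransformingBroker`, `to_json` and `to_csv` are hypothetical names chosen for this example.

```python
import json

def to_json(record):
    """Native format of a hypothetical JSON-based application."""
    return json.dumps(record, sort_keys=True)

def to_csv(record):
    """Native format of a hypothetical CSV-based application (values by sorted key)."""
    return ",".join(str(record[key]) for key in sorted(record))

class TransformingBroker:
    """Toy broker that converts each message to the recipient's format en route."""

    def __init__(self):
        self.subscribers = []  # list of (transform function, inbox list)

    def subscribe(self, transform):
        inbox = []
        self.subscribers.append((transform, inbox))
        return inbox

    def publish(self, record):
        # Every subscriber gets its own copy, already transformed.
        for transform, inbox in self.subscribers:
            inbox.append(transform(record))
```

The sender publishes a plain dictionary once, and each inbox ends up holding the same information in a different representation.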

![](images/fax.png)

# Disadvantages

The primary disadvantage of many message-oriented middleware systems is that they require an extra component in the architecture, the message transfer agent (message broker). 

As with any system, adding another component can lead to reductions in performance and reliability, and can also make the system as a whole more difficult and expensive to maintain.

In addition, many inter-application communications have an intrinsically synchronous aspect, with the sender specifically wanting to wait for a reply to a message before continuing (see real-time computing and near-real-time for extreme cases). 

Because message-based communication inherently functions asynchronously, it may not fit well in such situations. 
That said, most MOM systems have facilities to group a request and a response as a single pseudo-synchronous transaction.
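One common way to obtain such a pseudo-synchronous exchange is to tag each request with a correlation id and block on a private reply queue. The sketch below assumes in-process queues and an invented `call`/`server` pair; real MOM products expose their own request-reply facilities.

```python
import queue
import threading
import uuid

requests = queue.Queue()  # shared request queue (stand-in for the broker)
replies = {}              # correlation id -> private reply queue

def server():
    """Hypothetical service: upper-cases the payload and replies."""
    while True:
        item = requests.get()
        if item is None:  # sentinel used to stop the demo server
            break
        correlation_id, payload = item
        replies[correlation_id].put(payload.upper())

def call(payload, timeout=2.0):
    """Send a request, then block until the matching reply arrives."""
    correlation_id = str(uuid.uuid4())
    replies[correlation_id] = queue.Queue(maxsize=1)
    requests.put((correlation_id, payload))
    try:
        return replies[correlation_id].get(timeout=timeout)
    finally:
        del replies[correlation_id]

def demo():
    worker = threading.Thread(target=server)
    worker.start()
    try:
        return [call("ping"), call("pong")]
    finally:
        requests.put(None)
        worker.join()
```

The transport stays asynchronous underneath. Only `call` makes it look synchronous to the caller, and the timeout covers the case where no reply ever arrives.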

With a synchronous messaging system, the calling function does not return until the called function has finished its task.

In a loosely coupled asynchronous system, the calling client can continue to load work upon the recipient until the resources needed to handle this work are depleted and the called component fails. 

Of course, these conditions can be minimized or avoided by monitoring performance and adjusting message flow, but this is work that is not needed with a synchronous messaging system. 
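A simple way to adjust message flow is to bound the recipient's queue and refuse (or delay) messages once it is full. The following sketch assumes a hypothetical `offer` helper built on Python's standard `queue.Queue`:

```python
import queue

def offer(inbox, message):
    """Try to enqueue without blocking; report whether the message was accepted."""
    try:
        inbox.put_nowait(message)
        return True
    except queue.Full:
        return False

def demo(capacity=3, burst=5):
    inbox = queue.Queue(maxsize=capacity)
    # The sender fires a burst larger than the recipient can buffer.
    accepted = [offer(inbox, i) for i in range(burst)]
    # The recipient processes one message, which frees one slot.
    inbox.get_nowait()
    accepted.append(offer(inbox, burst))
    return accepted
```

With a capacity of 3 and a burst of 5, the last two messages of the burst are rejected instead of piling up, and sending succeeds again once the recipient catches up.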

The important thing is to understand the advantages and liabilities of each kind of system. Each system is appropriate for different kinds of tasks. 

Sometimes, a combination of the two kinds of systems is required to obtain the desired behavior.

# MOM Implementation

# Message Broker

https://en.wikipedia.org/wiki/Message_broker

A message broker (also known as an integration broker or interface engine) is an intermediary computer program module that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver.

Message brokers are elements in telecommunication or computer networks where software applications communicate by exchanging formally defined messages.

![](images/message-broker.png)

# Primary Role

The primary purpose of a broker is to take incoming messages from applications and perform some action on them. Message brokers can decouple end-points, meet specific non-functional requirements, and facilitate reuse of intermediary functions. For example, a message broker may be used to manage a workload queue or message queue for multiple receivers, providing reliable storage, guaranteed message delivery and perhaps transaction management.

# Other Actions

- Route messages to one or more destinations
- Transform messages to an alternative representation
- Perform message aggregation, decomposing messages into multiple messages and sending them to their destination, then recomposing the responses into one message to return to the user
- Interact with an external repository to augment a message or store it
- Invoke web services to retrieve data
- Respond to events or errors
- Provide content and topic-based message routing using the publish–subscribe pattern
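The aggregation action in the list above (split, dispatch, recompose) can be sketched as follows. The order structure, the `price_service` destination and its price table are all invented for illustration.

```python
def split(order):
    """Decompose one order message into one message per item."""
    return [{"order_id": order["order_id"], "item": item} for item in order["items"]]

def price_service(part):
    """Hypothetical destination that answers each partial message."""
    prices = {"apple": 2, "pear": 3}  # made-up price list
    return {**part, "price": prices.get(part["item"], 0)}

def aggregate(responses):
    """Recompose the individual responses into a single reply."""
    return {
        "order_id": responses[0]["order_id"],
        "items": [r["item"] for r in responses],
        "total": sum(r["price"] for r in responses),
    }

def handle(order):
    """What the broker does for one incoming message with at least one item."""
    return aggregate([price_service(part) for part in split(order)])
```

The caller sends one message and gets one message back, without knowing that the broker fanned the work out in between.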

##  Message Broker Software
- RabbitMQ
- IBM MQ
- Apache ActiveMQ
- Apache Kafka 
- Java Message Service (JMS, a messaging API implemented by several brokers rather than a broker itself)

https://en.wikipedia.org/wiki/Message_broker

## Cloud 
- Amazon Simple Queue Service
- Google Cloud Pub/Sub
- Azure Service Bus

# Without Message Broker
- ZeroMQ (0MQ)

http://wiki.zeromq.org/whitepapers:brokerless

[![](https://img.youtube.com/vi/s5hs9mw-GGg/0.jpg)](https://www.youtube.com/watch?v=s5hs9mw-GGg)

# Data Stream
https://www.researchgate.net/publication/326508370_Definition_of_Data_Streams

A data stream is a countably infinite sequence of elements. 
Different models of data streams exist that take different approaches with respect to the mutability of the stream and to the structure of stream elements.

Stream processing refers to analyzing data streams on-the-fly to produce new results as new input data becomes available. 

Time is a central concept in stream processing: in almost all models of streams, each stream element is associated with one or more timestamps from a given time domain that might indicate, for instance, when the element was generated, the validity of its content, or when it became available for processing.
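In Python, a generator is a natural way to picture this definition: it never ends, and each element carries a timestamp. The sketch below is illustrative only. `Element`, `sensor_stream` and `take` are names made up here, and the timestamp is a simple counter standing in for the time an element was generated.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    event_time: int  # when the element was generated (assumed integer time domain)
    value: float

def sensor_stream(start_time=0):
    """Unbounded stream: one element per time unit, forever."""
    for t in itertools.count(start_time):
        yield Element(event_time=t, value=20.0 + (t % 5))

def take(stream, n):
    """Look at a finite prefix of the stream; the whole stream can never be materialized."""
    return list(itertools.islice(stream, n))
```

Because the sequence is infinite, any computation has to work on prefixes or windows, which is why stream processing produces results incrementally.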

# Stream Processing
https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97

Stream processing is a big data technology. 

It is used to query continuous data streams and detect conditions quickly, within a short time of receiving the data. 

The detection time ranges from a few milliseconds to minutes. 

For example, with stream processing, you can receive an alert when the temperature has reached the freezing point, querying data streams coming from a temperature sensor.
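The temperature example can be sketched as a small continuous query over a stream of `(timestamp, celsius)` pairs. The function name, the tuple layout and the choice to alert only when the temperature first reaches the threshold are assumptions of this sketch.

```python
def freezing_alerts(readings, threshold=0.0):
    """Yield an alert each time the temperature drops to or below the threshold.

    Only the first reading of each cold spell triggers an alert, so a sensor
    that stays below freezing does not flood the consumer.
    """
    below = False
    for timestamp, celsius in readings:
        is_below = celsius <= threshold
        if is_below and not below:
            yield (timestamp, celsius)
        below = is_below
```

Since `readings` can be any iterable, the same function works on a finite list in a test and on an endless generator fed by a real sensor.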

It goes by many names: real-time analytics, streaming analytics, Complex Event Processing, real-time streaming analytics, and event processing. Although some of these terms historically had different meanings, the tools (frameworks) have now converged under the term stream processing. 

See [this Quora question](https://www.quora.com/How-is-stream-processing-and-complex-event-processing-CEP-different) for a list of frameworks, and the last section of the article linked above for the history.

If you want to build the app yourself:

- place events in a message broker topic (e.g. ActiveMQ, RabbitMQ, or Kafka)
- write code to receive events from topics in the broker (they become your stream)
- publish results back to the broker
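The three steps can be tried out without installing a broker. In the sketch below an in-memory class stands in for ActiveMQ, RabbitMQ or Kafka, and the topic names and the event layout are assumptions made for the example.

```python
from collections import defaultdict, deque

class InMemoryBroker:
    """Stand-in for a real broker: each topic is a FIFO of messages."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic):
        """Yield and remove the messages currently waiting in the topic."""
        pending = self.topics[topic]
        while pending:
            yield pending.popleft()

def run_pipeline(broker, source="readings", sink="alerts"):
    # Step 2: events received from the topic become the stream.
    for event in broker.consume(source):
        if event["celsius"] <= 0.0:
            # Step 3: results are published back to the broker.
            broker.publish(sink, {"sensor": event["sensor"], "alert": "freezing"})
```

Step 1 is simply `broker.publish("readings", {...})` from whatever produces the events. With a real broker, only the `InMemoryBroker` class would be replaced by the client library.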

# Data Stream Framework

- Amazon Kinesis
- Google Cloud Dataflow
- Azure Stream Analytics
- IBM Streaming Analytics
- Apache Storm
- Apache Samza 
- Apache Flink
- Apache Spark

# Stream Processing vs MOM


![](images/momvsstream.svg)

# AWS



Amazon SQS offers a reliable, highly-scalable hosted queue for storing messages as they travel between applications or microservices. It moves data between distributed application components and helps you decouple these components. Amazon SQS provides common middleware constructs such as dead-letter queues and poison-pill management. It also provides a generic web services API and can be accessed by any programming language that the AWS SDK supports. Amazon SQS supports both standard and FIFO queues.

Amazon Kinesis Streams allows real-time processing of streaming big data and the ability to read and replay records to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications that read from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering).
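The "same partition key, same record processor" guarantee can be pictured with a hash-based assignment. The sketch below is a simplification and not the KCL API: Kinesis maps the MD5 hash of the partition key onto shard hash-key ranges, whereas this example just takes the hash modulo a hypothetical number of processors.

```python
import hashlib
from collections import defaultdict

def assign(partition_key, num_processors):
    """Map a partition key to a processor index in a stable, repeatable way."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_processors

def dispatch(records, num_processors):
    """Group (partition_key, data) records by the processor that owns the key."""
    processors = defaultdict(list)
    for key, data in records:
        processors[assign(key, num_processors)].append((key, data))
    return dict(processors)
```

Because every record with a given key lands on the same processor, per-key counting or aggregation needs no coordination between processors.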

![](images/aws-sqs.png)

![](images/aws-kinesis.jfif)

# Kafka
Kafka can do both: it works as a message queue and as a stream processing platform.
https://itnext.io/is-kafka-a-message-queue-or-a-stream-processing-platform-7decc3cf1cf

![](https://miro.medium.com/max/786/1*6iWJA-Pk0t20_h7ry_j5IQ.png)

## Research is ongoing

### A Survey on the Evolution of Stream Processing Systems
Marios Fragkoulis, Paris Carbone, Vasiliki Kalavri, Asterios Katsifodimos

Preprint, 3 Aug 2020
> Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems.

![](images/overview-evolution-stream-processing-fragkoulis-2020.png)

# Demo
https://www.altair.com/resource/panopticon-demo-build-stream-processing-applications-with-zero-coding

https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-twitter-sentiment-analysis-trends

https://streamsets.com/blog/streaming-data-twitter-analysis-spark/

# Biblio

https://www.bbdc.berlin/fileadmin/news/photos/BD_SummerSchool-17-18/StreamProcessing-TilmannRabl.pdf

https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97

https://www.altexsoft.com/blog/real-time-analytics/

https://searchdatamanagement.techtarget.com/feature/A-comparison-of-open-source-real-time-data-streaming-vendors

https://hackernoon.com/how-stream-processing-makes-your-event-driven-architecture-better-ep1ht2g6d

https://eng.uber.com/chaperone-audit-kafka-messages/

https://netflixtechblog.com/keystone-real-time-stream-processing-platform-a3ee651812a

https://netflixtechblog.com/stream-processing-with-mantis-78af913f51a6

https://www.upsolver.com/blog/streaming-data-architecture-key-components

http://www.ce.uniroma2.it/courses/sdcc1718/slides/ComunicazioneSD_2.pdf

https://techbeacon.com/app-dev-testing/what-apache-kafka-why-it-so-popular-should-you-use-it

https://www.slideshare.net/manuswath/message-oriented-middleware-183419318

https://www.pluralsight.com/guides/data-stream-management-with-apache-kafka-streams

http://www.cs.iit.edu/~iraicu/teaching/CS550-S11/lecture08.pdf

https://mapr.com/blog/real-time-message-driven-service-oriented-architecture-bringing-boom/

https://towardsdatascience.com/introduction-to-stream-mining-8b79dd64e460

https://thenewstack.io/apache-kafka-primer/

https://itnext.io/is-kafka-a-message-queue-or-a-stream-processing-platform-7decc3cf1cf

https://dzone.com/articles/an-overview-of-kafka-distributed-message-system

https://www.researchgate.net/publication/343414352_A_Survey_on_the_Evolution_of_Stream_Processing_Systems