### **Streaming data from Kafka to Elasticsearch using Apache Flink and Monitoring using Kibana**

In this exercise, you will learn how to build and run an end-to-end PyFlink pipeline for data analytics, covering the following steps:

* Reading data from a Kafka source;
* Processing data using a UDF;
* Performing a simple aggregation over the source data;
* Writing the results to Elasticsearch and visualizing them in Kibana.


For more details, you can follow this [link](https://github.com/apache/flink-playgrounds/tree/master/pyflink-walkthrough).
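
The UDF and aggregation steps can be sketched in plain Python, without Flink, to show the logic the pipeline applies to each message. This is only an illustration under stated assumptions: the field names `provinceId` and `payAmount`, the province list, and both function names are hypothetical and may differ from what the actual job uses.

```python
from collections import defaultdict

# Assumed lookup table; the real job may use a different list of provinces.
PROVINCES = ["Beijing", "Shanghai", "Hangzhou", "Shenzhen",
             "Jiangxi", "Chongqing", "Xizang"]


def province_id_to_name(province_id):
    """Plain-Python stand-in for the UDF: map a numeric id to a province name."""
    return PROVINCES[province_id]


def total_amount_by_province(messages):
    """Sum the pay amount per province name, mirroring a GROUP BY aggregation."""
    totals = defaultdict(float)
    for msg in messages:
        totals[province_id_to_name(msg["provinceId"])] += msg["payAmount"]
    return dict(totals)


# Hand-written sample messages (hypothetical field names, not real topic data).
sample = [
    {"provinceId": 0, "payAmount": 10.0},
    {"provinceId": 1, "payAmount": 5.5},
    {"provinceId": 0, "payAmount": 2.5},
]
print(total_amount_by_province(sample))
```

In the real pipeline the same two steps run continuously over the Kafka stream, so the totals keep updating as new messages arrive.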

![alt text](img/flink002_architecture.png)

**Pipeline Components:**

* Apache Kafka
* Apache Flink
* Elasticsearch
* Kibana


In [4]:
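# Load the host IP used to embed the terminal and web UIs in this notebook
from dotenv import load_dotenv
import os

load_dotenv()
# Prefer PUBLIC_IP from the environment (.env); fall back to the hard-coded address if it is unset
ip = os.environ.get("PUBLIC_IP", "18.193.119.47")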

### Step 1:

In [5]:
from IPython.display import IFrame

# Run cell to activate Terminal 1

terminal_link = f'http://{ip}:8888/terminals/1'
IFrame(terminal_link, width=1000, height=400)

# To read data from the Kafka topic and check that it is generated correctly, run in the terminal:
#   docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic payment_msg
# Press CTRL+C to close the Kafka consumer.

# To submit the PyFlink job, run in the terminal:
#   docker-compose exec jobmanager ./bin/flink run -py /opt/pyflink-walkthrough/payment_msg_proccessing.py -d

Once the job is submitted successfully, navigate to the Flink Web UI. The job should appear in the list of running jobs; click it to see more details.

The StreamGraph of payment_msg_proccessing should consist of two nodes, each with a parallelism of 1.

A table at the bottom of the page shows metrics for each node (e.g. bytes received/sent, records received/sent).

Note that Flink's metrics only report bytes and records communicated within the Flink cluster. Sources therefore always report 0 bytes and 0 records received, and sinks always report 0 bytes and 0 records sent, so don't be confused that nothing is reported as being read from Kafka or written to Elasticsearch.
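
The running-job check can also be done from code instead of the Web UI. The sketch below assumes the Flink REST API is reachable on the same port as the Web UI (8081) and uses its `/jobs/overview` endpoint; the helper names and the sample payload are illustrative, not captured from this environment.

```python
import json
from urllib.request import urlopen


def running_jobs(overview):
    """Return the names of jobs in state RUNNING from a /jobs/overview payload."""
    return [job["name"]
            for job in overview.get("jobs", [])
            if job.get("state") == "RUNNING"]


def fetch_overview(base_url, timeout=5):
    """Fetch and decode the job overview from the Flink REST API."""
    with urlopen(f"{base_url.rstrip('/')}/jobs/overview", timeout=timeout) as resp:
        return json.load(resp)


# Hand-written sample payload for illustration only.
sample = {
    "jobs": [
        {"jid": "abc", "name": "demo_job", "state": "RUNNING"},
        {"jid": "def", "name": "old_job", "state": "CANCELED"},
    ]
}
print(running_jobs(sample))
```

Against a live cluster, `running_jobs(fetch_overview(f'http://{ip}:8081'))` should list the submitted job once it is running.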

In [6]:
#Run cell to activate Flink Web UI

flink = f'http://{ip}:8081/'   
IFrame(flink, width=1000, height=400)

### Step 2:

Navigate to the Kibana UI and open the menu by clicking the menu button in the upper left corner. Choose the Dashboard item to go to the dashboard page, then select the pre-created dashboard payment_dashboard.

It contains a vertical bar chart and a pie chart showing the total amount for each province and each province's share of the overall total.
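
To make the relationship between the two charts concrete, here is a minimal sketch of how per-province totals (the bar chart) translate into proportions (the pie chart). The function name and sample numbers are made up for illustration; Kibana computes these values itself from the Elasticsearch index.

```python
def proportions(totals):
    """Convert per-province totals into each province's share of the grand total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid division by zero when no payments have been recorded yet.
        return {name: 0.0 for name in totals}
    return {name: amount / grand_total for name, amount in totals.items()}


# Made-up totals, not real dashboard values.
sample_totals = {"Beijing": 30.0, "Shanghai": 10.0}
print(proportions(sample_totals))
```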

In [7]:
#Run cell to activate Kibana UI

kibana = f'http://{ip}:5601/'   
IFrame(kibana, width=1000, height=400)

### Step 3:

Stop the PyFlink job

Visit the Flink Web UI, select the job, and click *Cancel Job* in the upper right corner.
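
As an alternative to the button, a job can be cancelled through the Flink REST API with a `PATCH` request to `/jobs/<jobid>` and `mode=cancel`. The sketch below only builds the request so it can be inspected safely; the helper names, the `localhost` address, and the job id `0123abcd` are placeholders, and the real id has to be taken from the Web UI or the job overview.

```python
from urllib.request import Request, urlopen


def build_cancel_request(base_url, job_id):
    """Build (but do not send) the REST request that cancels a Flink job."""
    url = f"{base_url.rstrip('/')}/jobs/{job_id}?mode=cancel"
    return Request(url, method="PATCH")


def cancel_job(base_url, job_id, timeout=5):
    """Send the cancel request and return the HTTP status code."""
    with urlopen(build_cancel_request(base_url, job_id), timeout=timeout) as resp:
        return resp.status


# Placeholder address and job id, for inspection only; nothing is sent here.
req = build_cancel_request("http://localhost:8081/", "0123abcd")
print(req.get_method(), req.full_url)
```

Calling `cancel_job(f'http://{ip}:8081', job_id)` would actually send the request, so only do that once you are finished with the dashboard.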

In [8]:
#Run cell to activate Flink Web UI

IFrame(flink, width=1000, height=400)