This is a data science project built around a real-time pipeline. First, we extract real-time data from the Velib API using Kafka. Second, a PySpark consumer reads that data and inserts it into Elasticsearch. Alongside this, an Elasticsearch index needs to be created. Finally, we visualize the data in Kibana.
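To make the first step more concrete, here is a minimal sketch of how one station record could be encoded as a Kafka message value and decoded again on the consumer side. It is not taken from `producer-kafka.py`: the topic name and every field name (`station_id`, `name`, `bikes_available`, `lat`, `lon`, `timestamp`) are assumptions for illustration, and the sample record is made up. Only the standard library is used, so no broker is needed to try it.

```python
import json
from typing import Any, Dict

# Hypothetical topic name; the real one is whatever the producer and consumer agree on.
TOPIC = "velib-stations"


def serialize_station(record: Dict[str, Any]) -> bytes:
    """Encode one station record as UTF-8 JSON bytes (a typical Kafka message value)."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False).encode("utf-8")


def deserialize_station(payload: bytes) -> Dict[str, Any]:
    """Decode a message value back into a dict, as a consumer would."""
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    # Made-up sample record; the field names are assumptions, not the Velib API schema.
    sample = {
        "station_id": "0001",
        "name": "Example Station",
        "bikes_available": 5,
        "lat": 48.86,
        "lon": 2.35,
        "timestamp": "2021-01-01T12:00:00Z",
    }
    payload = serialize_station(sample)
    print(TOPIC, len(payload), "bytes")
    print(deserialize_station(payload) == sample)
```

A function like `serialize_station` could be handed to a Kafka client as its value serializer, which keeps the encoding in one place for both ends of the pipeline.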
- "producer-kafka.py" is the Kafka producer.
- "spark.py" is the PySpark consumer.
- "create_index.py" creates the Elasticsearch index.
- $ python3 producer-kafka.py
- $ spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 spark.py
- $ sudo systemctl enable elasticsearch
- $ sudo systemctl start elasticsearch
- $ sudo systemctl status elasticsearch (to check if Elasticsearch is running)
- $ sudo systemctl enable kibana
- $ sudo systemctl start kibana
- $ sudo systemctl status kibana (to check if Kibana is running)
- $ python3 create_index.py
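
As a rough illustration of what an index-creation script might define, the sketch below builds an index body with explicit field mappings and converts a flat station record into a matching document. This is an assumption-laden example, not the content of `create_index.py`: the index name and all field names are hypothetical, and the typeless `mappings` layout assumes Elasticsearch 7 or later. A `geo_point` field is included because it is what lets Kibana plot stations on a map.

```python
from typing import Any, Dict

# Hypothetical index name, chosen for illustration only.
INDEX_NAME = "velib"


def build_index_body() -> Dict[str, Any]:
    """Return an index body with explicit mappings (field names are assumptions)."""
    return {
        "mappings": {
            "properties": {
                "station_id": {"type": "keyword"},
                "name": {"type": "text"},
                "bikes_available": {"type": "integer"},
                "location": {"type": "geo_point"},
                "timestamp": {"type": "date"},
            }
        }
    }


def to_document(record: Dict[str, Any]) -> Dict[str, Any]:
    """Reshape a flat record so separate lat/lon values become one geo_point object."""
    return {
        "station_id": str(record["station_id"]),
        "name": record["name"],
        "bikes_available": int(record["bikes_available"]),
        "location": {"lat": float(record["lat"]), "lon": float(record["lon"])},
        "timestamp": record["timestamp"],
    }


if __name__ == "__main__":
    # Made-up record; in the real pipeline this would come from the Kafka stream.
    record = {
        "station_id": "0001",
        "name": "Example Station",
        "bikes_available": "5",
        "lat": "48.86",
        "lon": "2.35",
        "timestamp": "2021-01-01T12:00:00Z",
    }
    print(INDEX_NAME, build_index_body())
    print(to_document(record))
```

Declaring the mapping up front, rather than relying on dynamic mapping, avoids coordinates being indexed as plain numbers, which Kibana cannot use for map visualizations.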