Bikes-availability-project

This is a data engineering project. First, we extract real-time data from the Velib API using Kafka. The second step is to create a PySpark consumer that inserts the real-time data into Elasticsearch; an Elasticsearch index must be created for this purpose. Finally, we visualize the data in Kibana.

  • "producer-kafka.py" file represents the Kafka producer.
  • "spark.py" file represents the Pyspark consumer.
  • "create_index.py" file is for creating the Elasticsearch index.

To run the project:

  • $ python3 producer-kafka.py
  • $ spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 spark.py
  • $ sudo systemctl enable elasticsearch
  • $ sudo systemctl start elasticsearch
  • $ sudo systemctl status elasticsearch (to check if Elasticsearch is running)
  • $ sudo systemctl enable kibana
  • $ sudo systemctl start kibana
  • $ sudo systemctl status kibana (to check if Kibana is running)
  • $ python3 create_index.py
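
For reference, a sketch of what create_index.py might look like, assuming an index named "velib", the 7.x-style elasticsearch Python client, and field names matching the producer sketch above; the actual mapping in the repository may differ.

```python
# Sketch: create the assumed "velib" index with an explicit mapping.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

INDEX = "velib"  # assumed index name

mapping = {
    "mappings": {
        "properties": {
            "station_id": {"type": "long"},
            "num_bikes_available": {"type": "integer"},
            "num_docks_available": {"type": "integer"},
            "last_reported": {"type": "date", "format": "epoch_second"},
        }
    }
}

if not es.indices.exists(index=INDEX):
    es.indices.create(index=INDEX, body=mapping)
    print(f"Created index '{INDEX}'")
else:
    print(f"Index '{INDEX}' already exists")
```

Once the index exists and the producer and consumer are running, the data can be explored in Kibana by adding an index pattern for the index.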
