Bikes-availability-project

This is a data engineering project. First, we extract real-time data from the Velib API using Kafka. The second step is to create a PySpark consumer that inserts the real-time data into Elasticsearch; an Elasticsearch index must be created for this purpose. Finally, we visualize the data in Kibana.

  • "producer-kafka.py" file represents the Kafka producer.
  • "spark.py" file represents the Pyspark consumer.
  • "create_index.py" file is for creating the Elasticsearch index.

To run the project:

  • $ python3 producer-kafka.py
  • $ spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 spark.py
  • $ sudo systemctl enable elasticsearch
  • $ sudo systemctl start elasticsearch
  • $ sudo systemctl status elasticsearch (to check if Elasticsearch is running)
  • $ sudo systemctl enable kibana
  • $ sudo systemctl start kibana
  • $ sudo systemctl status kibana (to check if Kibana is running)
  • $ python3 create_index.py
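
For reference, a sketch of what create_index.py might look like, assuming an index named "velib", the 7.x-style elasticsearch Python client, and field names matching the producer sketch above; the actual mapping in the repository may differ.

```python
# Sketch: create the assumed "velib" index with an explicit mapping.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

INDEX = "velib"  # assumed index name

mapping = {
    "mappings": {
        "properties": {
            "station_id": {"type": "long"},
            "num_bikes_available": {"type": "integer"},
            "num_docks_available": {"type": "integer"},
            "last_reported": {"type": "date", "format": "epoch_second"},
        }
    }
}

if not es.indices.exists(index=INDEX):
    es.indices.create(index=INDEX, body=mapping)
    print(f"Created index '{INDEX}'")
else:
    print(f"Index '{INDEX}' already exists")
```

Once the index exists and the producer and consumer are running, the data can be explored in Kibana by adding an index pattern for the index.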
