Spark-Main

Spark project. In this project we analyse a coronavirus dataset.

1) First, we create multiple Kafka topics and set up the Schema Registry.
2) Then we write a producer program that reads the different CSV files present in the dataset.
3) Through the producer program, we stream the data to a Confluent Kafka topic.
4) After streaming the data, we consume it with a consumer program and, while consuming, dump the data into MongoDB.
5) Finally, we read the data from MongoDB into PySpark and perform different analytical operations on it.
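Steps 2 to 4 can be sketched roughly as follows. This is a minimal sketch, not the code in the `producer` and `consumer` folders: the function names, the topic, database and collection arguments, and the idle-poll stop condition are all hypothetical, and plain JSON stands in for whatever Schema Registry serialization the project actually uses. It assumes the `confluent-kafka` and `pymongo` packages are installed.

```python
import csv
import io
import json


def rows_from_csv(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def encode_record(row):
    """Serialize one row as UTF-8 JSON (assumption: JSON stands in for
    the Schema Registry backed format used by the real project)."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def decode_record(payload):
    """Inverse of encode_record: bytes back to a dict."""
    return json.loads(payload.decode("utf-8"))


def produce_file(path, topic, conf):
    """Stream every row of one CSV file to a Kafka topic.

    `conf` is a confluent-kafka config dict, e.g. with 'bootstrap.servers'.
    """
    from confluent_kafka import Producer  # third-party package

    producer = Producer(conf)
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            producer.produce(topic, value=encode_record(row))
            producer.poll(0)  # serve delivery callbacks, free queue space
    producer.flush()  # block until all queued messages are delivered


def consume_to_mongo(topic, conf, mongo_uri, db_name, coll_name,
                     max_idle_polls=10):
    """Consume a topic and insert each record into a MongoDB collection.

    `conf` must include 'group.id'. The loop stops after `max_idle_polls`
    consecutive empty polls (a hypothetical stop condition for this sketch).
    """
    from confluent_kafka import Consumer  # third-party package
    from pymongo import MongoClient  # third-party package

    collection = MongoClient(mongo_uri)[db_name][coll_name]
    consumer = Consumer(conf)
    consumer.subscribe([topic])
    idle = 0
    try:
        while idle < max_idle_polls:
            msg = consumer.poll(1.0)
            if msg is None:
                idle += 1
                continue
            if msg.error():
                print(f"Kafka error: {msg.error()}")
                continue
            idle = 0
            collection.insert_one(decode_record(msg.value()))
    finally:
        consumer.close()
```

Keeping the encode and decode helpers separate from the Kafka and MongoDB calls makes the record format easy to check without a running broker, and easy to swap for a registry-aware serializer later.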

Folders and files

consumer
csv
producer
.gitattributes
Case.csv
Proof_of_work2.png
README.md
Region.csv
TimeProvince.csv
final.ipynb
prof of work 3.jpg
proof1.png
proof_of_work 4.jpg
requirements.txt
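Step 5, reading the MongoDB data into PySpark and aggregating it, might look like the sketch below. It is not taken from `final.ipynb`. The helper names are hypothetical, the column names passed to `total_by` are whatever the loaded collection actually contains, and the `mongodb` format name with the `connection.uri`, `database` and `collection` options assumes version 10.x of the MongoDB Spark Connector (older connector versions use different names).

```python
def mongo_read_options(uri, database, collection):
    """Build the reader options for one MongoDB collection
    (option names assume MongoDB Spark Connector 10.x)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def load_collection(spark, uri, database, collection):
    """Load one MongoDB collection as a Spark DataFrame.

    `spark` is an existing SparkSession with the connector on its classpath.
    """
    reader = spark.read.format("mongodb")
    for key, value in mongo_read_options(uri, database, collection).items():
        reader = reader.option(key, value)
    return reader.load()


def total_by(df, group_col, value_col):
    """Sum `value_col` per `group_col`, largest total first.

    The cast covers values stored as strings, which is what a
    CSV-to-JSON pipeline would typically produce.
    """
    from pyspark.sql import functions as F  # third-party package

    return (
        df.groupBy(group_col)
        .agg(F.sum(F.col(value_col).cast("long")).alias("total"))
        .orderBy(F.desc("total"))
    )
```

With a running session, a call such as `total_by(load_collection(spark, uri, db, coll), "province", "confirmed").show()` would print one row per group. The column names in that call are assumptions about the dataset, not confirmed by this README.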
karangarg218/Spark-Main