- Event (log) producer implemented in Java, configured to send data via a TCP socket.
- Flume agent configuration attached:
  - source: Netcat
  - channel: Memory
  - sink: data stream into text files, configured to store data on HDFS 'partitioned' by date.
- HiveQL queries with some simple data processing, with results stored in Hive tables. Data for countries can be found here: https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country-CSV.zip
- Sqoop commands for transferring HiveQL results into a PostgreSQL RDBMS.
- Spark approach written in Scala, performing the same processing as the HiveQL queries but using different Spark APIs:
  - RDD (Resilient Distributed Dataset)
  - Dataset/DataFrame, with loading data into the RDBMS.
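
As a rough illustration of the first step in the list above, here is a minimal sketch of a Java producer that writes newline-terminated events to a TCP socket, which is the input a Flume Netcat source reads (one line per event). It is not the project's actual producer: the class name `EventProducer`, the `date,ip,category` event layout, the sample values, and port `44444` are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.List;

// Illustrative sketch only; names and event layout are hypothetical.
final class EventProducer {

    // Assumed event layout (not taken from the project): date,ip,category
    static String formatEvent(LocalDate date, String ip, String category) {
        return date + "," + ip + "," + category;
    }

    // Opens one TCP connection and writes each event as a single
    // newline-terminated line, so a line-oriented listener sees one event per line.
    static void send(String host, int port, List<String> events) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String event : events) {
                out.print(event);
                out.print('\n');
            }
            // PrintWriter swallows I/O errors; checkError() flushes and reports them.
            if (out.checkError()) {
                throw new IOException("failed to write events to " + host + ":" + port);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> events = List.of(
                formatEvent(LocalDate.now(), "203.0.113.7", "books"),
                formatEvent(LocalDate.now(), "198.51.100.23", "games"));

        if (args.length < 2) {
            // No target given: print the events instead of sending them.
            events.forEach(System.out::println);
            return;
        }
        // Example: java EventProducer localhost 44444 (port is an assumption)
        send(args[0], Integer.parseInt(args[1]), events);
    }
}
```

Run with a host and port matching the agent's Netcat source to send the sample events; run without arguments to print them locally. The sketch only writes to the socket and does not read any acknowledgements the listener may send back.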
saturator22/big-data-example-project