Big Data Simple Workflow example

Event (log) producer implemented in Java,
configured to send data over a TCP socket.
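
A minimal sketch of such a producer follows. It is not the repository's actual code: the class name EventProducer, the event layout (timestamp, comma, message) and the command-line arguments are assumptions made for illustration. It relies only on the fact that Flume's Netcat source treats each newline-terminated line of text as one event.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.List;

public class EventProducer {

    // Hypothetical event layout: ISO-8601 timestamp, a comma, then a free-text message.
    static String formatEvent(Instant timestamp, String message) {
        return timestamp + "," + message;
    }

    // Opens one TCP connection and writes each event as a newline-terminated line.
    static void send(String host, int port, List<String> events) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String event : events) {
                out.print(event);
                out.print('\n'); // explicit LF, independent of the platform line separator
            }
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = formatEvent(Instant.now(), "user logged in");
        if (args.length < 2) {
            // Without a target, only show what would be sent.
            System.out.println("usage: EventProducer <host> <port>");
            System.out.println("sample event: " + sample);
            return;
        }
        send(args[0], Integer.parseInt(args[1]), List.of(sample));
    }
}
```

Run it with the host and port on which the Flume agent's Netcat source listens; without arguments it only prints a sample event.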
Flume agent configuration is attached:
source: Netcat
channel: Memory
sink: HDFS (data stream written as text files)
The sink is configured to store data on HDFS, partitioned by date.
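
To illustrate what date partitioning means for the directory layout, here is a small Java sketch that derives the target directory from an event timestamp. Flume's HDFS sink can do this itself through escape sequences such as %Y-%m-%d in hdfs.path, so this code is not part of the pipeline. The base directory /flume/events, the one-directory-per-day layout and the use of UTC are assumptions; the attached configuration may differ.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DatePartition {

    // One directory per calendar day, computed in UTC (assumed time zone).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Returns the directory an event with the given timestamp would land in.
    static String partitionPath(String baseDir, Instant eventTime) {
        return baseDir + "/" + DAY.format(eventTime);
    }

    public static void main(String[] args) {
        // Events one second apart, on either side of midnight, go to different directories.
        System.out.println(partitionPath("/flume/events", Instant.parse("2024-01-15T23:59:59Z")));
        System.out.println(partitionPath("/flume/events", Instant.parse("2024-01-16T00:00:00Z")));
    }
}
```

With this layout a query for a single day only has to read one directory, which is the main reason for partitioning by date.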
HiveQL queries with some simple data processing,
with the resulting scores stored in Hive tables.

Data for countries can be found here:
https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country-CSV.zip
Sqoop commands for transferring HiveQL results into
a PostgreSQL RDBMS.
Spark approach written in Scala, performing
the same processing as the HiveQL queries, but using different Spark APIs:
- RDD (Resilient Distributed Dataset)
- Dataset/DataFrame, with loading of data into the RDBMS.
