saturator22/big-data-example-project

Big Data Simple Workflow example

  • Event (log) producer implemented in Java,
    configured to send data via a TCP socket.

  • Flume agent configuration attached:
    source: Netcat
    channel: Memory
    sink: HDFS, writing the data stream as text files
    Configured to store data on HDFS, 'partitioned' by date.

  • HiveQL queries performing some simple data processing,
    with the results stored in Hive tables.

    Data for countries can be found here:
    https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country-CSV.zip

  • Sqoop commands for transferring HiveQL results into
    a PostgreSQL RDBMS.

  • Spark approach written in Scala, performing the
    same processing as the HiveQL queries, but using different Spark APIs:

    • RDD (Resilient Distributed Dataset)
    • Dataset/DataFrame, with loading of the data into an RDBMS.
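
As an illustration of the first step in the workflow, the event producer can be sketched roughly as below. This is a minimal sketch, not the code from this repository: the class name EventProducer, the comma-separated event layout (timestamp, IP, action), the host "localhost" and the port 44444 are all assumptions chosen for the example. It relies only on the fact that a Flume Netcat source reads newline-terminated lines from a TCP connection and treats each line as one event.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

public class EventProducer {

    // Hypothetical event layout: timestamp,ip,action.
    // The real producer in the repository may use different fields.
    static String formatEvent(Instant timestamp, String ip, String action) {
        return timestamp + "," + ip + "," + action;
    }

    // Opens one TCP connection and writes every event as a single
    // newline-terminated line, then flushes and closes the connection.
    static void send(String host, int port, List<String> events) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new BufferedWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String event : events) {
                out.write(event);
                out.write('\n'); // explicit "\n" so the line ending does not depend on the OS
            }
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults; they must match the bind address and port
        // of the Netcat source in the Flume agent configuration.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 44444;

        List<String> events = Arrays.asList(
                formatEvent(Instant.now(), "10.0.0.1", "view"),
                formatEvent(Instant.now(), "10.0.0.2", "purchase"));
        send(host, port, events);
    }
}
```

With a Flume agent already listening, the sketch could be run as "java EventProducer localhost 44444". Keeping formatEvent separate from send makes the event layout easy to change without touching the socket handling.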
