Bin Shi edited this page Apr 27, 2018 · 4 revisions

Access Log Analysis System

  • Data processing with Pig, Spark, and Hive, including custom UDFs, loaders, and storers that implement business logic (Pig, Spark, Hive)
  • Data processing pipeline implementation (Oozie, Sqoop, Pig, Spark)
  • Hadoop web crawler built on Nutch (Hadoop, Nutch)
  • Real-time data ingestion (Flink) (TODO)
  • Interactive query and data visualization (Presto, Superset)

Skills: Hadoop, Pig, Spark, Hive, Oozie, Sqoop, Nutch, Flink, Presto, Superset

Pages

  1. Random Sampler Pattern
  2. Single Node Counter
  3. EMR Cluster Counter
  4. Analyze Data (Pig & Hive)