Design a batch ETL job using HDFS and Hive
The objectives of this project are:
Design a full batch data pipeline
Use Hive to prepare raw data for transformation
Use partitioning in Hive (splitting a table into separate HDFS directories by column value)
Dataset used: STM GTFS (the General Transit Feed Specification feed published by the Société de transport de Montréal)
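
To make the partitioning objective concrete, here is a minimal Python sketch of how Hive lays out a partitioned table on HDFS: each distinct value of the partition column gets its own key=value directory, and the partition column is not stored in the data files themselves. The table name (trips), the HDFS location (/user/etl/trips), the choice of service_id as partition column, and the sample rows are all hypothetical and are not taken from this project. The sketch only builds paths and a HiveQL statement as strings; it does not talk to a cluster.

```python
from collections import defaultdict


def partition_path(table_location, partition_spec):
    """Return the HDFS directory for one partition, using Hive's key=value layout."""
    parts = [f"{key}={value}" for key, value in partition_spec]
    return "/".join([table_location.rstrip("/")] + parts)


def group_by_partition(rows, partition_cols):
    """Group rows by partition values; partition columns are dropped from the
    row data because Hive encodes them in the directory name instead."""
    groups = defaultdict(list)
    for row in rows:
        spec = tuple((col, row[col]) for col in partition_cols)
        data = {k: v for k, v in row.items() if k not in partition_cols}
        groups[spec].append(data)
    return dict(groups)


def add_partition_ddl(table, table_location, partition_spec):
    """Build the HiveQL statement that registers one partition directory."""
    spec = ", ".join(f"{key}='{value}'" for key, value in partition_spec)
    location = partition_path(table_location, partition_spec)
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec}) "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical rows shaped like a GTFS trips file (illustrative values only).
    sample_rows = [
        {"trip_id": "t1", "route_id": "10", "service_id": "WEEKDAY"},
        {"trip_id": "t2", "route_id": "10", "service_id": "WEEKEND"},
        {"trip_id": "t3", "route_id": "11", "service_id": "WEEKDAY"},
    ]
    for spec, data in group_by_partition(sample_rows, ["service_id"]).items():
        print(add_partition_ddl("trips", "/user/etl/trips", spec))
        print(f"  {len(data)} row(s) would be written under this directory")
```

A query that filters on the partition column then only needs to read the matching directory, which is the main reason to partition on a column that appears often in WHERE clauses.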