Apache Pig Latin script to count letters in multiple input text files, using the HortonWorks Hadoop Sandbox or Google Cloud Platform

rishabhindoria/Big-Data-Hadoop-Pig-Latin

Hadoop-Pig

• Objective: Count how many times each character occurs across a dataset of text files (para1.txt to para6.txt), as an exercise in big data analysis.

• Created a script, countChar.pig, whose SQL-like Pig Latin statements are compiled by Pig into multiple mappers and reducers that run in parallel in the background to handle big data. The script lists the count for each letter of the alphabet in the dataset.

• Created a script, popularFlavor.pig, that uses two text files, purchases.txt (all the purchases made by kids over time) and kids.txt (the count of purchases made by each individual kid), to determine the most popular flavor among the kids.
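
The map and reduce steps that Pig generates for a character count can be sketched in plain Python. This is not the contents of countChar.pig; it is a minimal illustration that assumes letters are case-folded and non-alphabetic characters are ignored. The function names `map_file`, `reduce_counts`, and `count_chars` are hypothetical.

```python
from collections import Counter
from functools import reduce
from pathlib import Path


def map_file(text: str) -> Counter:
    """Map step: count the alphabetic characters in one file's text.

    Assumption: letters are lower-cased and everything else is skipped.
    """
    return Counter(ch for ch in text.lower() if ch.isalpha())


def reduce_counts(partials) -> Counter:
    """Reduce step: merge the per-file counters into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())


def count_chars(paths) -> Counter:
    """Run the map step on each file, then reduce the partial counts."""
    return reduce_counts(
        map_file(Path(p).read_text(encoding="utf-8")) for p in paths
    )
```

For the dataset named above, a call such as `count_chars(f"para{i}.txt" for i in range(1, 7))` would return one counter covering all six files. On a cluster, each map step would run on a separate node; here they run sequentially.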
