flacrosse authored Oct 24, 2013
## Community Tutorial 03: Word Counting with Pig

**This tutorial is from the Community part of the tutorial series for the [Hortonworks Sandbox](http://hortonworks.com/products/sandbox), a single-node Hadoop cluster running in a virtual machine. [Download](http://hortonworks.com/products/sandbox) the Hortonworks Sandbox to run this and other tutorials in the series.**

### Summary

This tutorial describes how to use Pig with the Hortonworks Sandbox to do a word count of an imported text file.

### Create a text file with data

This can be anything, but I ended up dumping some textual data I had in SQL into a text file. It’s definitely a little more interesting if you can work with data you know or at least have an interest in.
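
If you don't have data of your own handy, a few lines of any text will do. The Python sketch below writes a small file locally that you could then upload. The file name matches the one the Pig script in this tutorial loads; the `write_sample_file` helper and the sample sentences are made up for illustration.

```python
from pathlib import Path

# Made-up sample sentences; replace with any text you find interesting.
SAMPLE_LINES = [
    "the quick brown fox jumps over the lazy dog",
    "the dog barks and the fox runs away",
    "pig counts words so you do not have to",
]


def write_sample_file(path="word_count_text.txt"):
    """Write the sample lines to a local text file, one sentence per line."""
    target = Path(path)
    target.write_text("\n".join(SAMPLE_LINES) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    created = write_sample_file()
    print(f"Wrote {created} ({created.stat().st_size} bytes)")
```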

### Import the file into the Sandbox

Go to the File Browser tab and upload the .txt file. Take note of the default location it uploads to (/user/hue).

![Alt text](./images/tutorial-03/images/screenshot1.png)

### Write a Pig script to parse the data and dump to a file

I put this code together from snippets I found on the web. The key thing is to make sure your load statement references the location where your file lives and that you specify an output location for the results. Note: I didn’t create the pig_wordcount folder before I ran this; the script created it, which was a handy feature. Just hit Execute and sit back. You can check the run status on the Query History tab.

```pig
-- Load each line; with the default loader, $0 is the text up to the first tab
a = load '/user/hue/word_count_text.txt';
-- Split each line into words, one word per record
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
-- Group identical words together
c = group b by word;
-- Emit (count, word) for each group
d = foreach c generate COUNT(b), group;
-- Write the results; Pig creates this output folder
store d into '/user/hue/pig_wordcount';
```

![Alt text](./images/tutorial-03/images/screenshot2.png)
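
To see what each relation in the script holds, here is a rough Python equivalent you can run locally on a few lines. It is only a sketch, not how Pig executes the job. The `word_count` function and the separator regex are names introduced here for illustration, assuming TOKENIZE's documented separators (space, double quote, comma, parentheses, star) and the default tab-delimited loader.

```python
import re
from collections import Counter

# Separators documented for Pig's TOKENIZE: space, double quote, comma,
# parentheses, and star. This regex is an approximation for illustration.
SEPARATORS = re.compile(r'[ ",()*]+')


def word_count(lines):
    """Mimic relations a-d of the Pig script on an in-memory list of lines."""
    counts = Counter()
    for line in lines:                     # a: one record per input line
        first_field = line.split("\t")[0]  # $0 under the default tab-delimited loader
        # b: flatten(TOKENIZE(...)) yields one word per record
        words = [w for w in SEPARATORS.split(first_field) if w]
        counts.update(words)               # c + d: group by word, then COUNT
    # Same column order as the script: (count, word)
    return [(n, w) for w, n in counts.items()]


if __name__ == "__main__":
    sample = ["the fox and the dog", "the dog (again)"]
    for n, w in sorted(word_count(sample), key=lambda row: (-row[0], row[1])):
        print(f"{n}\t{w}")
```

Like the Pig script, this counts case-sensitively, so "The" and "the" are tallied as different words.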

### Use HCatalog to load the file to a “table”

Being a SQL developer by day, I wanted to query the results in a familiar way, so I created a table using HCatalog to make them easily accessible through Hive. I went into the HCatalog tab, chose the file from the folder I specified, named the table and columns, and hit Create Table. It churned for a while but eventually completed.

![Alt text](./images/tutorial-03/images/screenshot3.png)

### Use Hive to query and sort the data for final output

Finally, I went into the Hive tab and wrote a quick query to return and organize the results. Once it completed, I downloaded the results and put them in Excel so I could print and frame them.

![Alt text](./images/tutorial-03/images/screenshot4.png)

![Alt text](./images/tutorial-03/images/screenshot5.png)
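
The query itself only appears in the screenshot, so it is not reproduced here. As a rough local stand-in for "return and organize the results", this Python sketch parses tab-separated count/word lines, which is the layout the Pig script's `store` writes with the default storage, and sorts them by count, highest first. The `top_words` function and the sample rows are hypothetical.

```python
def top_words(lines, limit=10):
    """Parse 'count<TAB>word' lines and return the most frequent words first."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        count, word = line.split("\t", 1)
        rows.append((word, int(count)))
    # Highest count first; ties broken alphabetically for a stable listing
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:limit]


if __name__ == "__main__":
    # Hypothetical rows in the layout the Pig script stores: count, then word
    sample_output = ["1\tfox", "3\tthe", "2\tdog"]
    for word, count in top_words(sample_output):
        print(f"{word:<10}{count:>5}")
```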