Tutorial 1: Spark Line Count

Program for Spark Line Count and Sort

Input: Here is a screenshot of the input file. I have included some repeated sentences to check whether the program counts them correctly.

Input1

Workflow: The basic workflow I used for counting lines and sorting by count is shown here.

Workflow

Code Snippet: The Spark code for the above workflow is given below.

Code Snippet
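As a text companion to the snippet, here is a minimal plain-Python sketch of the same workflow (split, map, reduce, swap, sort, swap back). It is not the Spark code itself: the function name `line_count_sorted` is hypothetical, and trimming whitespace and dropping empty fragments after the split are assumptions.

```python
from collections import Counter


def line_count_sorted(lines):
    """Count identical sentences and return (sentence, count) sorted by count."""
    # Split every line at periods; strip whitespace and drop empty fragments (assumption).
    sentences = [part.strip() for line in lines for part in line.split(".") if part.strip()]
    # Map each sentence to a count of 1 and sum the counts per sentence.
    counts = Counter()
    for sentence in sentences:
        counts[sentence] += 1
    # Swap to (count, sentence) so the count can act as the sort key.
    swapped = [(count, sentence) for sentence, count in counts.items()]
    # Sort by key (the count), ascending; Python's sort keeps ties in input order.
    swapped.sort(key=lambda pair: pair[0])
    # Swap back to (sentence, count).
    return [(sentence, count) for count, sentence in swapped]


lines = [
    "This is apache storm line count example. It is map reduce procedure.",
    "This is apache storm line count example.",
    "It is very fast and accurate.",
    "It is 100 times faster than Hadoop.",
    "It is map reduce procedure.",
]
result = line_count_sorted(lines)
for sentence, count in result:
    print((sentence, count))
```

In Spark the same stages would normally be chained as RDD transformations, and the order of sentences with equal counts should not be relied on there.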

Explanation:

Consider the following input for a simple explanation.

This is apache storm line count example. It is map reduce procedure.

This is apache storm line count example.

It is very fast and accurate.

It is 100 times faster than Hadoop.

It is map reduce procedure.

#### Procedure of Line Count and Sorting

1. Split each input line into sentences at periods (.)

Output of this stage is:

This is apache storm line count example

It is map reduce procedure

This is apache storm line count example

It is very fast and accurate

It is 100 times faster than Hadoop

It is map reduce procedure

This is the input for the next stage, where mapping is performed.

2. Generate key-value pairs using map
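The split stage can be sketched in plain Python as follows. In Spark this kind of one-to-many split is typically done with `flatMap`; whether the original snippet uses it is an assumption, as are the helper name `split_sentences` and the choice to trim whitespace and discard empty fragments.

```python
def split_sentences(lines):
    """Split every line at '.' and flatten the pieces into one list."""
    sentences = []
    for line in lines:
        for part in line.split("."):
            part = part.strip()  # remove the space left after each period
            if part:  # skip the empty piece after a trailing period
                sentences.append(part)
    return sentences


lines = [
    "This is apache storm line count example. It is map reduce procedure.",
    "This is apache storm line count example.",
    "It is very fast and accurate.",
    "It is 100 times faster than Hadoop.",
    "It is map reduce procedure.",
]
sentences = split_sentences(lines)
for sentence in sentences:
    print(sentence)
```

Five input lines produce six sentences because the first line contains two.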

Output of this stage is:

(This is apache storm line count example,1)

(It is map reduce procedure,1)

(This is apache storm line count example,1)

(It is very fast and accurate,1)

(It is 100 times faster than Hadoop,1)

(It is map reduce procedure,1)
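A plain-Python sketch of this mapping stage, assuming the sentences from the previous stage are already available as a list. Each sentence becomes the key and the constant 1 becomes the value.

```python
sentences = [
    "This is apache storm line count example",
    "It is map reduce procedure",
    "This is apache storm line count example",
    "It is very fast and accurate",
    "It is 100 times faster than Hadoop",
    "It is map reduce procedure",
]

# Pair every sentence with an initial count of 1.
pairs = [(sentence, 1) for sentence in sentences]
for pair in pairs:
    print(pair)
```

Duplicates are kept on purpose at this point; they are merged only in the reduce stage.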

3. Apply a reduce operation to count identical lines

Output:

(This is apache storm line count example,2)

(It is very fast and accurate,1)

(It is 100 times faster than Hadoop,1)

(It is map reduce procedure,2)
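The reduce stage merges all pairs that share a key by adding their values. The sketch below imitates that in plain Python; `reduce_by_key` is a hypothetical helper written for this illustration, and the assumption is that the Spark version uses `reduceByKey` with an addition function.

```python
def reduce_by_key(pairs, func):
    """Combine the values of equal keys with func, like a per-key fold."""
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        else:
            result[key] = value
    return list(result.items())


pairs = [
    ("This is apache storm line count example", 1),
    ("It is map reduce procedure", 1),
    ("This is apache storm line count example", 1),
    ("It is very fast and accurate", 1),
    ("It is 100 times faster than Hadoop", 1),
    ("It is map reduce procedure", 1),
]
counts = reduce_by_key(pairs, lambda a, b: a + b)
for pair in counts:
    print(pair)
```

The order of the resulting pairs may differ from the listing above; the output of a reduce stage should be treated as unordered until it is sorted.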

4. Swap keys and values so that the count becomes the key

Output:

(2,This is apache storm line count example)

(1,It is very fast and accurate)

(1,It is 100 times faster than Hadoop)

(2,It is map reduce procedure)
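Swapping is a one-line transformation on each pair. A plain-Python sketch, using the counts from the previous stage as literal input:

```python
counts = [
    ("This is apache storm line count example", 2),
    ("It is very fast and accurate", 1),
    ("It is 100 times faster than Hadoop", 1),
    ("It is map reduce procedure", 2),
]

# Turn (sentence, count) into (count, sentence).
swapped = [(count, sentence) for sentence, count in counts]
for pair in swapped:
    print(pair)
```

The swap is needed because a sort by key looks only at the first element of each pair.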

5. Sort the pairs by count using the sortByKey function

Output:

(1,It is very fast and accurate)

(1,It is 100 times faster than Hadoop)

(2,This is apache storm line count example)

(2,It is map reduce procedure)
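The effect of sorting by key can be imitated in plain Python with `sorted` and a key function that reads only the count. This is a sketch of the behaviour, not of Spark's implementation.

```python
swapped = [
    (2, "This is apache storm line count example"),
    (1, "It is very fast and accurate"),
    (1, "It is 100 times faster than Hadoop"),
    (2, "It is map reduce procedure"),
]

# Sort ascending by the first element (the count) only.
sorted_pairs = sorted(swapped, key=lambda pair: pair[0])
for pair in sorted_pairs:
    print(pair)
```

Python's `sorted` is stable, so pairs with equal counts keep their input order here. In Spark, the relative order of pairs with equal keys should not be relied on.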

6. Swap keys and values again to restore the (line, count) form

Output:

(It is very fast and accurate,1)

(It is 100 times faster than Hadoop,1)

(This is apache storm line count example,2)

(It is map reduce procedure,2)
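The final swap mirrors the earlier one and leaves the sorted order untouched. A plain-Python sketch, using the sorted pairs as literal input:

```python
sorted_pairs = [
    (1, "It is very fast and accurate"),
    (1, "It is 100 times faster than Hadoop"),
    (2, "This is apache storm line count example"),
    (2, "It is map reduce procedure"),
]

# Turn (count, sentence) back into (sentence, count), keeping the order.
result = [(sentence, count) for count, sentence in sorted_pairs]
for pair in result:
    print(pair)
```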

Output: Here are the screenshots of the output for the input given above.

output1

output2
