khodeprasad/java-spark


java-spark

Follow these steps to run the application in the Hadoop ecosystem.

Step 1: Create a directory in the Hadoop Distributed File System (HDFS) to hold the input file, using the command below.

hadoop fs -mkdir <your_hdfs_path>/input/

Step 2: Copy the input_data.csv file to HDFS using the command below.

hadoop fs -put <project_location>/java-spark/input/input_data.csv <your_hdfs_path>/input/
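Steps 1 and 2 can be combined into a single repeatable helper. The sketch below is only an illustration: `upload_input` and its two arguments are hypothetical names, not part of the project. It relies on the `-p` option of `hadoop fs -mkdir` and the `-f` option of `hadoop fs -put`, so that running it a second time does not fail on an existing directory or file.

```shell
# Sketch: create the HDFS input directory and upload the CSV in one call.
# upload_input is a hypothetical helper name, not part of the project.
upload_input() {
    local src="$1"     # local path to input_data.csv
    local base="$2"    # the directory you use as <your_hdfs_path>

    # -p creates missing parent directories and does not fail
    # if the directory already exists.
    hadoop fs -mkdir -p "$base/input/" || return 1

    # -f overwrites a copy left over from an earlier upload.
    hadoop fs -put -f "$src" "$base/input/" || return 1
}
```

After sourcing the function, call it with the local CSV path as the first argument and your HDFS base path as the second.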

Step 3: Build the project using the following commands.

cd <project_location>/java-spark/

mvn clean install

Step 4: Run the application with the spark-submit command.

spark-submit --class com.khodeprasad.AppExecutor <project_location>/java-spark/target/java-spark-jar-with-dependencies.jar
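A common stumbling block at this step is submitting before the build has produced the jar. The following sketch wraps the command above in a small check; `submit_job` is a hypothetical helper name, and the jar path is the one used in this README. It adds no extra spark-submit options, so any master or deploy-mode settings are assumed to come from the application or your Spark configuration.

```shell
# Sketch: submit the job only if the assembled jar exists.
# submit_job is a hypothetical helper name, not part of the project.
submit_job() {
    local project_dir="$1"    # the <project_location>/java-spark directory
    local jar="$project_dir/target/java-spark-jar-with-dependencies.jar"

    if [ ! -f "$jar" ]; then
        echo "Jar not found: $jar (run 'mvn clean install' first)" >&2
        return 1
    fi

    spark-submit --class com.khodeprasad.AppExecutor "$jar"
}
```

Call it with the project directory as its only argument; it returns a non-zero status without contacting the cluster when the jar is missing.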

Step 5: Check the output at <your_hdfs_path>/output/ using the command below.

hadoop fs -ls <your_hdfs_path>/output/
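Listing the directory shows only file names. To read the results, and to clear them before a rerun, something like the sketch below may help. It assumes the default behaviour of `saveAsTextFile`: the job writes `part-*` files plus a `_SUCCESS` marker, and it refuses to write into an output directory that already exists. `show_output` and `clear_output` are hypothetical helper names.

```shell
# Sketch: inspect and reset the job output in HDFS.
# show_output and clear_output are hypothetical helper names.
show_output() {
    local base="$1"    # the directory you use as <your_hdfs_path>

    # By default a completed job leaves a _SUCCESS marker file.
    if ! hadoop fs -test -e "$base/output/_SUCCESS"; then
        echo "No _SUCCESS marker under $base/output/; the job may not have finished." >&2
        return 1
    fi

    hadoop fs -ls "$base/output/" || return 1

    # The glob is quoted so that HDFS, not the local shell, expands it.
    hadoop fs -cat "$base/output/part-*"
}

clear_output() {
    local base="$1"
    # -r removes the directory recursively; -f keeps the command quiet
    # when the directory does not exist yet.
    hadoop fs -rm -r -f "$base/output/"
}
```

Running `clear_output` before resubmitting avoids the "output directory already exists" failure on a second run.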

The same application can also be run as a plain Java application by following the instructions below.

Step 1: Build the project using the following commands.

cd <project_location>/java-spark/

mvn clean install

Step 2: Run the application with the java command.

java -cp <project_location>/java-spark/target/java-spark-jar-with-dependencies.jar com.khodeprasad.AppExecutor

Step 3: Check the output in the console.

Note: To run the application as a plain Java application, you need to comment out the line

resultRDD.saveAsTextFile("output");

because the application cannot save the processed output to HDFS when it is run this way.

You can download the complete dataset from

https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD

or

https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
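The first link is a direct CSV export, so it can be fetched from the command line. The sketch below assumes `curl` is installed; `download_dataset` is a hypothetical helper name and the destination path is up to you. The full export is much larger than the bundled input_data.csv, so check for free disk space first.

```shell
# Sketch: download the complete dataset as CSV.
# download_dataset is a hypothetical helper name, not part of the project.
download_dataset() {
    local dest="$1"    # local file to write, e.g. a path ending in .csv

    # -f fails on HTTP errors instead of saving an error page,
    # -L follows redirects, -o writes to the given file.
    curl -fL -o "$dest" \
        "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD"
}
```

The downloaded file can then be copied to HDFS in the same way as input_data.csv.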

About

Code snippets for writing Apache Spark applications in Java.
