ritchie-xl/File-Summary-Java

File_Summary

I. File List

  1. main.java
  2. hive.sql
  3. mysql.sql
  4. most_common.java
  5. most_common_by_frequency.java
  6. Node.java
  7. Node_for_avg.java
  8. word_length.java
  9. year.java

The program can be built with the default build (Make) action in Eclipse or IntelliJ IDEA.

II. How to Run

  1. For the Java program, first locate the data file path, then compile the program and run it from the command line by typing the following in the shell:
  • java main [input_data_file_path]
  • Alternatively, run the program in Eclipse or IntelliJ IDEA by clicking the Run button, then follow the program's prompts.
  2. For the Hive query, first locate the data file in HDFS, then run the Hive script by typing the following:
  • hive -f hive.sql
  3. For the MySQL script, first locate the data file as above, then run the script with the mysql client, e.g.:
  • mysql -u [user] -p [database] < mysql.sql
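As a sketch of the two ways the input path can be supplied to the Java program (a command-line argument or an interactive prompt), the hypothetical class below uses `args[0]` when it is present and otherwise prompts for the path on standard input. The class name `InputPathResolver` and its method `resolvePath` are illustrative assumptions, not code taken from the repository's main.java.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

// Hypothetical helper (not from the repository) showing the two ways the
// data file path can be supplied: as an argument or via a prompt.
final class InputPathResolver {

    /**
     * Returns the data file path: the first command-line argument if it is
     * non-blank, otherwise one line read from the Scanner after a prompt.
     * Returns an empty string if no input is available.
     */
    static String resolvePath(String[] args, Scanner in) {
        if (args.length > 0 && !args[0].trim().isEmpty()) {
            return args[0].trim();
        }
        System.out.print("Enter the path of the data file: ");
        return in.hasNextLine() ? in.nextLine().trim() : "";
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        String path = resolvePath(args, in);
        // Reject an empty path or a path that is not an existing regular file.
        if (path.isEmpty() || !Files.isRegularFile(Paths.get(path))) {
            System.err.println("Input file not found: " + path);
            System.exit(1);
        }
        Path file = Paths.get(path);
        System.out.println("Reading data from " + file.toAbsolutePath());
    }
}
```

Checking for the argument first keeps the command-line form (`java main ./file_path`) non-interactive, while an IDE run with no arguments falls back to the prompt.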

III. Project Details

  1. Project Name: Google Ngram
  2. Description: Analyze the Google Ngram data to summarize its contents, including:
  • statistics on word length (min, max, median, average, standard deviation, etc.)
  • statistics on word frequency (min, max, median, average, standard deviation, etc.)
  • statistics on the total number of years in which each word appears (min, max, median, average, standard deviation, etc.)
  • the most common words, ranked by the number of years in which they appear
  • the most common words, ranked by total frequency
  3. Input/Output:
  • Input: the file path of the data.
    • E.g., $ java main ./file_path
  • Output: the results are displayed according to the user's choice.
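The statistics listed above can be sketched as follows, assuming the per-word values (word lengths, total frequencies, or the number of distinct years each word appears in) have already been aggregated from the input file. The class `SummaryStats` and its methods are hypothetical and are not taken from the repository's word_length.java, year.java, or most_common*.java. The standard deviation here is the population form (dividing by n); a sample standard deviation would divide by n - 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not the repository's code) of the summary statistics
// and "most common words" ranking described in the project details.
final class SummaryStats {
    final double min, max, median, mean, std;

    private SummaryStats(double min, double max, double median, double mean, double std) {
        this.min = min;
        this.max = max;
        this.median = median;
        this.mean = mean;
        this.std = std;
    }

    /** Computes min, max, median, mean and population standard deviation. */
    static SummaryStats of(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // Median: middle element for odd n, mean of the two middle elements for even n.
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        double sum = 0;
        for (double v : sorted) sum += v;
        double mean = sum / n;
        double squares = 0;
        for (double v : sorted) squares += (v - mean) * (v - mean);
        double std = Math.sqrt(squares / n); // population form; use n - 1 for a sample
        return new SummaryStats(sorted[0], sorted[n - 1], median, mean, std);
    }

    /** Returns up to k words with the largest counts, highest first; ties are broken alphabetically. */
    static List<String> topK(Map<String, Long> counts, int k) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

For example, `SummaryStats.of(new double[]{2, 4, 4, 4, 5, 5, 7, 9})` gives min 2, max 9, median 4.5, mean 5 and standard deviation 2. The same `topK` method can rank words by total frequency or by year count, depending on which map is passed in.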

About

BitBootCamp Big Data
