
sanket0001/Exploring-and-Analyzing-Conference-Papers-using-Hadoop-Ecosystems


Exploring-and-Analyzing-Conference-Papers-using-Hadoop-Ecosystems

The Hadoop ecosystem is a big data platform made up of several tools for storing and processing data. Using these tools, we analyzed conference papers to answer questions such as which papers are most similar to one another and which datasets the papers use.

In this project, we analyzed the papers presented at the NIPS 2015 conference using Hadoop ecosystem tools (MapReduce, Hive, and Spark) together with machine learning algorithms. We find the papers most similar to a given paper in the dataset, the authors' affiliations, the datasets used in the papers, and the number of published papers for each author. Finally, we cluster the paper titles into groups using Spark and machine learning algorithms.
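As an illustration of the "most similar papers" task, here is a minimal pure-Python sketch that ranks documents by cosine similarity over TF-IDF vectors. This is an assumption about one reasonable approach, not necessarily the method used in this repository, which runs on Hadoop tools. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF so that no term gets a zero or negative weight.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(docs, query_index, top_k=5):
    """Return up to top_k (index, score) pairs, most similar first."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [
        (i, cosine(query, vec))
        for i, vec in enumerate(vectors)
        if i != query_index  # skip the query paper itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

Calling `most_similar(texts, 0)` on a list of paper texts would return the indices of the papers closest to the first one. At the scale of 403 papers this fits in memory; a Spark or MapReduce version would distribute the same term-counting and scoring steps.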

Each paper is in PDF format. A total of 403 papers were accepted at NIPS 2015, with an average length of 10 pages.

Dataset link - https://www.kaggle.com/benhamner/nips-2015-papers

See the "How to run" file in the Source folder for instructions. The Datasets files folder contains the txt files of all papers, along with the authors.csv and paperauthors.csv files for the Hive problem.
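The per-author paper count that the two CSV files support can be sketched in plain Python as a join followed by a group-by, which is roughly what a Hive query would express. The column names used here (`Id` and `Name` in authors.csv, `PaperId` and `AuthorId` in paperauthors.csv) are assumptions about the file layout, and `papers_per_author` is a hypothetical helper, not code from this repository.

```python
import csv
import io
from collections import Counter


def papers_per_author(authors_file, paper_authors_file):
    """Count distinct papers per author from two open CSV files.

    Assumed columns (not confirmed by the README):
      authors.csv       -> Id, Name
      paperauthors.csv  -> PaperId, AuthorId
    """
    # Map each author id to a display name.
    names = {row["Id"]: row["Name"] for row in csv.DictReader(authors_file)}
    # A set removes duplicate (paper, author) rows before counting.
    pairs = {
        (row["PaperId"], row["AuthorId"])
        for row in csv.DictReader(paper_authors_file)
    }
    counts = Counter(author_id for _, author_id in pairs)
    # Sort by count (descending), then by name for a stable order.
    return sorted(
        ((names.get(author_id, author_id), n) for author_id, n in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    # Tiny made-up sample in the assumed layout, for illustration only.
    authors = io.StringIO("Id,Name\n1,Author A\n2,Author B\n")
    links = io.StringIO("Id,PaperId,AuthorId\n1,100,1\n2,101,1\n3,100,2\n")
    for name, n in papers_per_author(authors, links):
        print(name, n)
```

To run this against the real files, pass handles opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory samples.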

The Documentation folder contains the project documentation.
