Sweeper, an HDFS browser

Sweeper is a small HDFS browser that lets you visualize file stats on Hadoop. We use it at Ning to quickly find large directories to clean up (those holding large numbers of small files), which keeps the number of files managed by our namenode under control.

Usage

java -jar metrics.sweeper-*-jar-with-dependencies.jar

You can configure the namenode URL via sweeper.hadoop.namenode.url:

java -Dsweeper.hadoop.namenode.url=hdfs://namenode.mycompany.com:9000 -jar metrics.sweeper-*-jar-with-dependencies.jar

You can configure the credentials via sweeper.hadoop.ugi (make sure to escape the comma):

java -Dsweeper.hadoop.ugi='hadoop-user\,hadoop-group' -jar metrics.sweeper-*-jar-with-dependencies.jar
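The single quotes in the command above matter: they are what lets the backslash reach the JVM. The following sketch shows only standard shell quoting behavior, not anything specific to Sweeper. The variable names are illustrative.

```shell
# Single-quoted: the shell passes the backslash through unchanged,
# so the escaped comma reaches the program as written.
quoted=$(printf '%s' 'hadoop-user\,hadoop-group')

# Unquoted: the shell treats the backslash as its own escape character
# and removes it, leaving only the bare comma.
unquoted=$(printf '%s' hadoop-user\,hadoop-group)

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

If you drop the quotes, double the backslash instead (hadoop-user\\,hadoop-group) so that one backslash survives.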

By default, the browser looks at the entire cluster. To restrict it to a given path, set sweeper.hadoop.path:

java -Dsweeper.hadoop.path=/user/pierre -jar metrics.sweeper-*-jar-with-dependencies.jar

By default, file and directory sizes are shown (taking into account the replication factor). To show the number of files instead, use sweeper.content_summary=NUMBER_OF_FILES:

java -Dsweeper.content_summary=NUMBER_OF_FILES -jar metrics.sweeper-*-jar-with-dependencies.jar
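The options above are ordinary JVM system properties, so it should be possible to pass several of them in one invocation. This README only shows them one at a time, so treat the combination as an assumption rather than documented behavior. The sketch below prints such a command line instead of running it. `sweeper_command` is a hypothetical helper, and the host name and path are the example values used above.

```shell
# Hypothetical helper: prints a java command line that sets several
# sweeper.* properties at once. Combining them is an assumption; each
# property is only documented individually.
sweeper_command() {
    namenode_url=$1
    path=$2
    summary=$3
    # The format string is single-quoted, so the * in the jar name is
    # printed literally and left for the shell to expand at run time.
    printf 'java -Dsweeper.hadoop.namenode.url=%s -Dsweeper.hadoop.path=%s -Dsweeper.content_summary=%s -jar metrics.sweeper-*-jar-with-dependencies.jar\n' \
        "$namenode_url" "$path" "$summary"
}

sweeper_command hdfs://namenode.mycompany.com:9000 /user/pierre NUMBER_OF_FILES
```

Copy the printed line into a shell in the directory that contains the assembled jar to run it.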

Build

mvn assembly:assembly
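The usage commands rely on the glob metrics.sweeper-*-jar-with-dependencies.jar matching exactly one file. Here is a minimal sketch of a check for that, assuming the build writes the jar to Maven's default output directory, target/ (the pom.xml may configure a different one). `find_sweeper_jar` is a hypothetical helper, not part of Sweeper.

```shell
# Hypothetical helper: print the single assembled jar found in the given
# directory, or fail if there are zero or several matches.
find_sweeper_jar() {
    dir=$1
    # Expand the glob into the function's positional parameters.
    set -- "$dir"/metrics.sweeper-*-jar-with-dependencies.jar
    # An unmatched glob is left as the literal pattern, so also check
    # that the single result is a real file.
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one assembled jar in $dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

After a build, you could then run java -jar "$(find_sweeper_jar target)" from the project root.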

License

See the COPYING file.

