Spark Samples

This repository began as a tutorial for Spark users on the Gordon supercomputer at the San Diego Supercomputer Center. I ran out of time before finishing the full set of training material, so it now serves as a place to keep Spark scripts I have written and used.

wordcount-spark.py

This is the canonical word count example, implemented in PySpark.
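For readers who have not seen the pattern, here is a minimal sketch of a word count in PySpark. It is not the contents of wordcount-spark.py. The tokenization rule, the function names, and the top-10 report are assumptions made for illustration.

```python
import re
import sys
from operator import add


def tokenize(line):
    """Split a line into lowercase words (an assumed tokenization rule)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count(sc, path):
    """Return an RDD of (word, count) pairs for the text file at `path`."""
    return (sc.textFile(path)
              .flatMap(tokenize)              # one record per word
              .map(lambda word: (word, 1))    # pair each word with a count of 1
              .reduceByKey(add))              # sum the counts for each word


def main(path):
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="wordcount-sketch")
    try:
        counts = word_count(sc, path)
        # Print the ten most frequent words, highest count first.
        for word, count in counts.takeOrdered(10, key=lambda kv: -kv[1]):
            print(word, count)
    finally:
        sc.stop()


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a hypothetical name such as wordcount_sketch.py, this could be run with `spark-submit wordcount_sketch.py input.txt`, where input.txt is any text file visible to the Spark workers.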

analyze-mmapplypolicy.py

This script analyzes the output of the GPFS mmapplypolicy command, which lists every file on a GPFS cluster along with most of its metadata. As of July 5, 2016, it only works on a pre-filtered form of that output, not on the full mmapplypolicy output.
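The pre-filtered format is not documented here, so the sketch below only shows the general shape such an analysis could take in PySpark. It assumes a purely hypothetical three-column layout of `uid size_bytes path` per line. The layout, the function names, and the per-user aggregation are all illustrative assumptions, not what analyze-mmapplypolicy.py does.

```python
from operator import add


def parse_record(line):
    """Parse one line of the HYPOTHETICAL 'uid size_bytes path' layout.

    Returns (uid, size_bytes), or None for lines that do not fit.
    """
    # The path may contain spaces, so split at most twice.
    parts = line.split(None, 2)
    if len(parts) != 3:
        return None
    uid, size, _path = parts
    try:
        return (uid, int(size))
    except ValueError:
        return None


def bytes_per_user(sc, path):
    """Return an RDD of (uid, total_bytes) pairs for the listing at `path`.

    `sc` is an existing SparkContext; the input layout is assumed, see above.
    """
    return (sc.textFile(path)
              .map(parse_record)
              .filter(lambda rec: rec is not None)   # drop malformed lines
              .reduceByKey(add))                     # total bytes per uid
```

Keeping the line parser separate from the Spark pipeline means the parser can be checked on a few sample lines before it is run against a full file listing.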
