Spark Log Parser

The Spark Log Parser parses unmodified Spark history server event logs, extracting their contents into a compact format that can more readily be used to generate Sync predictions. See the user guides for information on where to find event logs. Related tools and their documentation may also be helpful: client_tools.

Parsed logs contain metadata about your Apache Spark application's execution: in particular, the run time of each task, the amount of data read and written, the amount of memory used, and so on. These logs do not contain sensitive information, such as the data your Apache Spark application is processing. Below is an example of the log parser's output.

[Image: Output of Log Parser]

Installation

Install the package in this repository into your Python 3 environment, e.g.

pip3 install https://github.com/synccomputingcode/spark_log_parser/archive/main.tar.gz
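If the installation succeeds, the spark-log-parser command should be available on your PATH. As a quick sanity check, you can print its usage message (assuming the CLI follows the common convention of supporting a --help flag):

spark-log-parser --help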

Parsing your Spark logs

Step 0: Generate the appropriate Apache Spark history server event log

If you have not already done so, follow the instructions to download the Apache Spark event log.

Step 1: Parse the log to strip away sensitive information

  1. To process a log file, execute the spark-log-parser command with a log file path and a directory in which to store the result, like so:

    spark-log-parser -l <log file location> -r <result directory>

    The parsed file parsed-<log file name> will appear in the result directory.
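    For example, assuming a log file named spark-application-1234.log (a hypothetical name) in the current directory and parsed as the result directory:

    spark-log-parser -l spark-application-1234.log -r parsed

    This would write parsed-spark-application-1234.log to the parsed directory.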

  2. Send Sync Computing the parsed log

    Email the parsed event log to Sync Computing, or upload it to the Sync Auto-tuner.