The Parser for Apache Spark parses unmodified Spark history server event logs, extracting information into a compact format that can more readily be used to generate Sync predictions. See the user guides for information on where to find event logs. Related tools and their documentation may also be helpful: client_tools.
Parsed logs contain metadata about the execution of your Apache Spark application, such as the run time of a task, the amount of data read and written, and the amount of memory used. These logs do not contain sensitive information such as the data that your Apache Spark application is processing. Below is an example of the output of the log parser.
Install the package from this repo into your Python 3 environment, e.g.
pip3 install https://github.com/synccomputingcode/spark_log_parser/archive/main.tar.gz
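To confirm the install worked, a minimal sketch like the following checks whether the spark-log-parser command is reachable. It assumes the package places a command of that name on your PATH; the helper name parser_is_installed is hypothetical and not part of the package.

```python
import shutil


def parser_is_installed(command="spark-log-parser"):
    """Return True if the given command can be found on PATH."""
    # Assumption: installing the package puts a spark-log-parser
    # executable on PATH for the active Python environment.
    return shutil.which(command) is not None
```

If this returns False, the Python 3 environment you installed into may not be the one that is currently active.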
If you have not already done so, complete the instructions to download the Apache Spark event log.
To process a log file, execute the spark-log-parser command with a log file path and a directory in which to store the result, like so:
spark-log-parser -l <log file location> -r <result directory>
The parsed file, parsed-<log file name>, will appear in the result directory.
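If you want to drive the parser from a script, the step above could be wrapped roughly as follows. This is a sketch, not part of the package: the function names are hypothetical, only the -l and -r flags come from the command shown above, and it assumes that "log file name" means the file's base name including any extension.

```python
import subprocess
from pathlib import Path


def build_parser_command(log_path, result_dir):
    """Return the spark-log-parser argument list for one event log."""
    # -l (log file location) and -r (result directory) are the flags
    # shown in the command above; no other flags are assumed.
    return ["spark-log-parser", "-l", str(log_path), "-r", str(result_dir)]


def expected_output_path(log_path, result_dir):
    """Return where the parsed file should appear: parsed-<log file name>."""
    # Assumption: <log file name> is the base name of the input file.
    return Path(result_dir) / f"parsed-{Path(log_path).name}"


def parse_event_log(log_path, result_dir):
    """Run the parser on one log and return the expected output path."""
    # Create the result directory up front, since it is not stated
    # whether the parser creates it.
    Path(result_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the parser exits non-zero.
    subprocess.run(build_parser_command(log_path, result_dir), check=True)
    return expected_output_path(log_path, result_dir)
```

Calling parse_event_log with the path to a downloaded event log and a result directory runs the same command as above and returns the path where the parsed file is expected; check that the file exists before sending it on.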
Send Sync Computing the parsed log
Email Sync Computing (or upload to the Sync Auto-tuner) the parsed event log.