pyspark-plaso

A tool for distributed extraction of timestamps from various files using extractors adapted from the Plaso engine to Apache Spark.

Usage

PySpark Plaso runs in a Docker container and is accessible as a web service via a REST API.

See the project Wiki Pages for details.

Deployment

Use a Prebuilt Docker Image

A prebuilt Docker image is available; see the webapp-prebuilt.yml docker-compose file.
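As a rough sketch, the prebuilt image could be started with docker-compose along the following lines. The location of webapp-prebuilt.yml (assumed here to be the deployment directory) and the helper name start_prebuilt are assumptions for illustration, not something this README specifies; adjust both to your checkout.

```shell
# Assumption: webapp-prebuilt.yml lives in ./deployment; override COMPOSE_FILE if not.
COMPOSE_FILE="${COMPOSE_FILE:-./deployment/webapp-prebuilt.yml}"
# Set COMPOSE="docker compose" to use the Compose v2 plugin instead.
COMPOSE="${COMPOSE:-docker-compose}"

# Hypothetical helper: pull the prebuilt images, then start the services in the background.
start_prebuilt() {
    $COMPOSE -f "$COMPOSE_FILE" pull &&
    $COMPOSE -f "$COMPOSE_FILE" up -d
}
```

Calling start_prebuilt then brings the services up; `docker-compose -f "$COMPOSE_FILE" ps` lists the running containers and `docker-compose -f "$COMPOSE_FILE" down` stops them.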

Build and Deploy

cd ./deployment
# create a Python virtual environment including the required Python packages
./010-make-python-virtualenv.sh
# pack the Python packages into a ZIP file ready to use in PySpark
./020-make-site-packages-zip.sh
# create JAR packages for Java dependencies
./030-make-java-helpers.sh
# run the PySpark Plaso infrastructure as Docker containers by docker-compose
./040-run-docker-webapp.sh

See the project Wiki Pages for details and also the webapp.yml docker-compose file.

Kubernetes

For deployment to Kubernetes, see the Kubernetes resource files.
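A minimal sketch of applying those resource files with kubectl is shown below. The directory holding the resource files is not named here, so it is passed as an argument; the helper name deploy_k8s is hypothetical, and the sketch assumes kubectl is already configured for the target cluster.

```shell
# Override KUBECTL if the binary has a different name or path.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: apply every resource file found in the given directory.
deploy_k8s() {
    # $1: directory with the Kubernetes resource files (location is an assumption)
    if [ -z "${1:-}" ]; then
        echo "usage: deploy_k8s <resource-dir>" >&2
        return 1
    fi
    $KUBECTL apply -f "$1"
}
```

For example, `deploy_k8s path/to/resource-files` creates or updates the resources, and `kubectl get pods` shows whether the pods have started.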

Dependencies

PySpark Plaso:

- uses extractors adapted from the Plaso Project
- runs in Docker containers from the TARZAN Docker Infrastructure Project

Acknowledgements

This work was supported by the Ministry of the Interior of the Czech Republic as part of the project "Integrated platform for analysis of digital data from security incidents" (VI20172020062).

