This application populates a Neo4j database with advisories from the PyPI feed and associates them with packages in the PyPI index.
The database serves as a backend for a larger project that aims to provide a service for identifying whether your application is vulnerable.
This project uses uv to manage packages. The tools currently in use are:
- isort
- mypy
- ruff
The following commands are useful for this project:
# Check the package for linting and code formatting.
uvx ruff format
uvx ruff check
uvx isort .
uvx --with types-PyYAML --with types-requests --with types-defusedxml --with types-python-dateutil mypy pip_security_worker tests
The configuration for this application is stored in environment variables. These can also come from an .env file, the location of which is shown in settings.py.
Two environment variables in this project relate to pip:
- PIP_ADVISORY_DB_URL: The URL for the advisory Git repository.
- PYPI_UPDATE_FEED: The URL for the package update feed provided by pypi.org.
This package relies on access to a Kafka queue. Development has been carried out using the apache/kafka Docker image.
Once the container is running, the following command creates the required topic inside the container:
./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic analyze --partitions 10
For this to function, the following environment variables are required:
- KAFKA_GROUP: The name of the consumer group used to collect tasks. This ensures that multiple analysis tasks can run at the same time.
- KAFKA_TIMEOUT: The timeout, in milliseconds, for the consumer when it retrieves a new task.
- KAFKA_BOOTSTRAP_SERVERS: A list of Kafka bootstrap servers. This can be a single server and should be in the format localhost:9092. Multiple servers can be specified by providing a comma-separated list.
- KAFKA_TOPIC: The name of the topic used for posting tasks to and retrieving tasks from the Kafka server.
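The bootstrap server format described above can be checked before a consumer is created. The following is a minimal sketch using only the standard library; the function name parse_bootstrap_servers is hypothetical and is not taken from this project's code.

```python
def parse_bootstrap_servers(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated host:port list into (host, port) pairs."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and stray whitespace
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    if not servers:
        raise ValueError("no bootstrap servers given")
    return servers


if __name__ == "__main__":
    # A single server and a comma-separated list are both accepted.
    print(parse_bootstrap_servers("localhost:9092"))
    print(parse_bootstrap_servers("kafka-1:9092, kafka-2:9092"))
```

Failing early on a malformed value tends to give a clearer error than a connection timeout later on.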
The resulting dependency tree and links to advisories are stored in a Neo4j database, using the standard neo4j Docker image.
For this to function, the following environment variables are required:
- NEO4J_URL: The URL for the Neo4j database, in the format neo4j://localhost:7687.
- NEO4J_USERNAME: The Neo4j username.
- NEO4J_PASSWORD: The Neo4j password.
This package can also report errors to the Sentry service. To enable this, populate the following environment variable:
- SENTRY_DSN: The full DSN provided by Sentry when setting up a project, in the format:
https://1234567890abcdef1234567890@o123456.ingest.us.sentry.io/1234567890123456
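A DSN in this format is an ordinary URL made up of a public key, an ingest host, and a project ID. The sketch below splits one apart with the standard library as a quick sanity check; the helper name describe_dsn is hypothetical, and this is not how the Sentry SDK itself validates a DSN.

```python
from urllib.parse import urlsplit


def describe_dsn(dsn: str) -> dict[str, str]:
    """Break a DSN of the form scheme://key@host/project_id into its parts."""
    parts = urlsplit(dsn)
    # The project ID is the last path segment.
    project_id = parts.path.rsplit("/", 1)[-1]
    if (
        parts.scheme not in ("http", "https")
        or not parts.username
        or not parts.hostname
        or not project_id
    ):
        raise ValueError("DSN does not look like scheme://key@host/project_id")
    return {
        "public_key": parts.username,
        "host": parts.hostname,
        "project_id": project_id,
    }


if __name__ == "__main__":
    example = "https://1234567890abcdef1234567890@o123456.ingest.us.sentry.io/1234567890123456"
    print(describe_dsn(example))
```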
The following is an example of how a complete .env file should look:
PIP_ADVISORY_DB_URL=https://github.com/pypa/advisory-database.git
PYPI_UPDATE_FEED=https://pypi.org/rss/updates.xml
KAFKA_GROUP=analyze
KAFKA_TIMEOUT=5000
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_TOPIC=analyze
NEO4J_URL=neo4j://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
SENTRY_DSN=https://1234567890abcdef1234567890@o123456.ingest.us.sentry.io/1234567890123456
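To illustrate how values like these might be read, here is a minimal standard-library sketch that parses KEY=VALUE text and merges it with the process environment. It is not the project's settings.py, which may use a different mechanism. The names parse_env_text, load_settings, and REQUIRED are hypothetical. Two assumptions are made here: real environment variables override the file, and every variable except SENTRY_DSN is mandatory.

```python
import os

# Assumption: everything except SENTRY_DSN must be present.
REQUIRED = (
    "PIP_ADVISORY_DB_URL", "PYPI_UPDATE_FEED",
    "KAFKA_GROUP", "KAFKA_TIMEOUT", "KAFKA_BOOTSTRAP_SERVERS", "KAFKA_TOPIC",
    "NEO4J_URL", "NEO4J_USERNAME", "NEO4J_PASSWORD",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def load_settings(env_text: str = "", environ=None) -> dict[str, str]:
    """Merge .env text with the environment and check required names."""
    environ = os.environ if environ is None else environ
    # Assumption: values from the real environment win over the file.
    merged = {**parse_env_text(env_text), **environ}
    missing = [name for name in REQUIRED if not merged.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    wanted = REQUIRED + ("SENTRY_DSN",)
    return {name: merged[name] for name in wanted if name in merged}
```

Passing the file contents as a string keeps the sketch independent of where the .env file lives.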