Skip to content
This repository has been archived by the owner on Oct 26, 2023. It is now read-only.

snyk-labs/snyk-scm-refresh

Repository files navigation

Snyk logo

snyk-scm-refresh

⛔️ THIS REPOSITORY IS ARCHIVED.

This repository is archived and will not receive any updates or accept issues and pull requests. Please make use of the snyk-api-import-tool instead of snyk-scm-refresh. The snyk-api-import tool benefits from longer-term support and covers the majority of use cases that scm-refresh does. You can follow the migration guide to help you make the translation. This repo will be archived as of October 1st 2023.

Description

Keeps Snyk projects in sync with their associated Github repos

For repos with at least 1 project already in Snyk:

  • Detect and import new manifests
  • Remove projects for manifests that no longer exist
  • Update projects when a repo has been renamed
  • Detect and update default branch change (not renaming)
  • Enable Snyk Code analysis for repos
  • Detect deleted repos and log for review

STOP NOW IF ANY OF THE FOLLOWING ARE TRUE

  • Monitoring non-default branches
  • Using an SCM other than Github.com or Github Enterprise Server

Usage

usage: snyk_scm_refresh.py [-h] [--org-id ORG_ID] [--repo-name REPO_NAME] [--sca {on,off}]
                           [--container {on,off}] [--iac {on,off}] [--code {on,off}] [--dry-run]
                           [--skip-scm-validation] [--debug]

optional arguments:
  -h, --help            show this help message and exit
  --org-id ORG_ID       The Snyk Organisation Id found in Organization > Settings. If omitted,
                        process all orgs the Snyk user has access to.
  --repo-name REPO_NAME
                        The full name of the repo to process (e.g. githubuser/githubrepo). If
                        omitted, process all repos in the Snyk org.
  --sca {on,off}        scan for SCA manifests (on by default)
  --container {on,off}  scan for container projects, e.g. Dockerfile (on by default)
  --iac {on,off}        scan for IAC manifests (experimental, off by default)
  --code {off}          code analysis is deprecated with off only option
  --on-archived {ignore,deactivate,delete}
                        Deletes or deactivates projects associated with archived repos (ignore by default)
  --on-unarchived {ignore,reactivate}
                        If there is a deactivated project in Snyk, should the tool reactivate it if the repo is not
                        archived? (Warning: Use with caution, this will reactivate ALL projects associated with a repo)
  --dry-run             Simulate processing of the script without making changes to Snyk
  --skip-scm-validation
                        Skip validation of the TLS certificate used by the SCM
  --audit-large-repos   only query github tree api to see if the response is truncated and 
                        log the result. These are the repos that would have be cloned via this tool
  --debug               Write detailed debug data to snyk_scm_refresh.log for troubleshooting

Sync with defaults

./snyk_scm_refresh.py --org-id=12345

Sync SCA projects only

./snyk_scm_refresh.py --org-id=12345 --container=off

Sync Container projects only

./snyk_scm_refresh.py --org-id=12345 --sca=off --container=on

Deprecated

Snyk Code analysis for repos (Deprecated)

only: ./snyk_scm_refresh.py --org-id=12345 --sca=off --container=off --code=on
defaults + snyk code enable: ./snyk_scm_refresh.py --org-id=12345 --code=on

Dependencies

pip install -r requirements.txt

or

python3 -m pip install -r requirements.txt

Environment

export SNYK_TOKEN=<snyk-token>
export GITHUB_TOKEN=<github-token>
export GITHUB_ENTERPRISE_TOKEN=<github-enterprise-token>
export GITHUB_ENTERPRISE_HOST=<github-enterprise-host>

If GITHUB_TOKEN is set, your Github.com repos will be processed

If GITHUB_ENTERPRISE_TOKEN and GITHUB_ENTERPRISE_HOST are BOTH set, your Github Enterprise Server repos will be processed

ℹ️ If Snyk Github Enterprise Integration type is used for your Github.com repositories, then set GITHUB_ENTERPRISE_HOST=api.github.com

Getting a GitHub token

  1. In GitHub.com browse: https://github.com/settings/tokens/new. Or in GitHub Enterprise select your user icon (top-right), then 'Settings', then 'Developer settings', then 'Personal access tokens'.
  2. Scopes - Public repos do not need a scope. If you want to scan private repos, then you'll need to enable this scope: repo (Full control of private repositories)

Handling self-signed certificates

This tool uses the python requests library, therefore you can point REQUESTS_CA_BUNDLE environment variable to the location of your cert bundle

export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

If you are not able to validate the self-signed certificate, you may skip validation by providing the --skip-scm-validation option.

Instructions

Make sure to use a user API Token that has access to the Snyk Orgs you need to process with the script. A service account will not work for GitHub, which is the only SCM currently supported at this time.

Ensure that your GITHUB_TOKEN or GITHUB_ENTERPRISE_TOKEN has access to the repos contained in the Snyk Orgs in scope If unsure, try one org at a time with --org-id

Recommended: This tool will delete projects from Snyk that are detected as stale or have since been renamed

Use the --dry-run option to verify the execution plan for the first run

Each run generates a set of output files:

File Name Description
snyk-scm-refresh.log debug log output good for troubleshooting
_potential-repo-deletes.csv repo no longer exists
_stale-manifests-deleted.csv monitored manifest files no longer exists
_renamed-manifests-deleted.csv manifests of renamed repos that were removed
_renamed-manifests-pending.csv manifests of renamed repos that were not removed. Only when the import of the repo under the new name is completed are the old ones removed.
_completed-project-imports.csv manifests that were imported during this job run
_updated-project-branches.csv projects with updated default branch
_update-project-branches-errors.csv projects that had an error attempting to update default branch
_repos-skipped-on-error.csv repos skipped due to import error
_manifests-skipped-on-limit.csv manifest projects skipped due to import limit

Handling of large repositories

The primary method used by this tool to retrieve the GIT tree from each repository for the basis of comparison is via the Github API.
For sufficiently large repositories, though, Github truncates the API response. When a truncated Github response is detected when retrieving the GIT tree, this tool will fall back on using the local git if available and configured to perform a shallow clone of the repository's default branch in order to build the tree.

It will use /tmp to perform the git clone and then capture the output of git ls-tree -r

When this situation occurs, you will see the following in the console:

Large repo detected, falling back to cloning. This may take a few minutes ...

image

The truncated GIT tree response is described here. The last known limits are: 100,000 files or 7 MB of response data, whichever is first.

Auditing which repos are considered large

In order to detect which repositories in snyk are subject the tree truncation issue mentioned above, there is another available option --audit-large-repos. This will only query the git tree via API and look for a truncated response, and then log the results to a file snyk-scm-refresh_large-repos-audit-results.csv

To find all the repos based on a Snyk org, use the --org-id parameter in conjunction with --audit-large-repos Optionally you can also supply a repo name to check a single repo by also supplying the --repo-name filter.

Importing manifest limit

There is a set manifest projects import limit per execution. Skipped manifests projects above the limit will be logged to a CSV file. Relaunch snyk_scm_refresh at the next execution schedule to import any skipped projects.