RFC: Registry Spec v1 #1179

Closed
underyx opened this issue Jul 1, 2020 · 2 comments
Labels
stale needs prioritization; label applied by stalebot

Comments

underyx commented Jul 1, 2020

This is meant to be an open discussion for an eventually formalized Semgrep registry standard. This issue is not a deliverable currently.

Why create this issue now, then? 🤔

Two questions emerged as I was working on Semgrep today:

  1. What shorthand should we support in the CLI?
  2. What's the best way to namespace rules and packs down the line?

Instead of writing down my registry design thoughts in the respective issues, I thought it'd be better to collect all registry design thoughts in one place, and reference this from the other two issues.

Glossary

Pattern

Atomic unit for code search, e.g. $X == $X

Rule

Bundles one or more patterns, their relationships (not, or, inside), and some associated metadata (message, severity, etc.)

Registry

A dot-separated directory hierarchy of rules. A rule's location in the hierarchy technically adds metadata to the rule, since where it is stored is categorization information.

Registry Reference

e.g. python.flask.security.no-debug

Snippet

A rule kept outside the registry hierarchy; think of the distinction as similar to a GitHub repo versus a gist. Snippets can be promoted to registry rules.

Snippet Alias

As snippets get random IDs by default, aliases are used to name them. Aliases are mutable to allow updating the underlying rules. Aliasing is mostly invisible to users, who will think of the alias as their snippet's name. For example, we can allow snippet renaming in client apps while, in the background, we 1) clone the snippet with edits to a new ID and 2) update the alias to point to the new ID.
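
The clone-then-repoint flow described above can be sketched as follows. This is a minimal illustration, not Semgrep's actual data model: `SnippetStore`, `create`, `edit_via_alias`, and `resolve` are hypothetical names, and the in-memory dictionaries stand in for whatever storage the registry ends up using.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class SnippetStore:
    """Hypothetical in-memory store; an assumption for illustration only."""

    snippets: dict = field(default_factory=dict)  # snippet ID -> rule text
    aliases: dict = field(default_factory=dict)   # alias -> snippet ID

    def create(self, rule_text):
        """Store a snippet under a fresh random ID and return that ID."""
        snippet_id = secrets.token_urlsafe(4)
        while snippet_id in self.snippets:  # retry on the rare collision
            snippet_id = secrets.token_urlsafe(4)
        self.snippets[snippet_id] = rule_text
        return snippet_id

    def edit_via_alias(self, alias, new_rule_text):
        """Step 1: clone the edited rule to a new ID. Step 2: repoint the alias."""
        new_id = self.create(new_rule_text)
        self.aliases[alias] = new_id
        return new_id

    def resolve(self, alias):
        """Return the rule text the alias currently points to."""
        return self.snippets[self.aliases[alias]]
```

Under this sketch the old snippet ID keeps resolving to the old rule text, so anything that pinned the ID is unaffected by the edit, while anything that uses the alias picks up the new version.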

Pack

A collection of one or more of the following references: registry references, snippet IDs, snippet aliases, and pack IDs. Packs are stored outside a hierarchy, like snippets.
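
Because a pack may reference other packs, resolving one means walking a graph that could contain cycles. A rough sketch of that walk follows, assuming references are written with the p/, r/, and s/ shorthand prefixes from the Shorthand section. `flatten_pack`, the `packs` mapping, and the `r/go.security` entry are hypothetical.

```python
def flatten_pack(pack_ref, packs):
    """Return the non-pack references reachable from pack_ref, in first-seen order.

    `packs` maps a pack reference (assumed to start with "p/") to the list of
    references it contains. Packs already visited are skipped, which guards
    against packs that include each other.
    """
    leaves = []
    visited = set()

    def visit(ref):
        if ref.startswith("p/"):
            if ref in visited:
                return
            visited.add(ref)
            for child in packs[ref]:
                visit(child)
        elif ref not in leaves:
            leaves.append(ref)

    visit(pack_ref)
    return leaves


# Hypothetical pack contents, including a cycle between the two packs.
example_packs = {
    "p/underyx:custom-pack": ["r/python.flask.security", "s/aRsT", "p/gosec"],
    "p/gosec": ["r/go.security", "s/aRsT", "p/underyx:custom-pack"],
}
```

Whether nested packs should be expanded eagerly like this, or kept as references and resolved at scan time, is a design choice this issue leaves open.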

Namespacing

Users can create their own snippet aliases and packs. We use : as in <username>:<item-name> to prevent item name clashes between users.

Moonshot: namespacing in the registry hierarchy 🚀

Maybe we'll also let people have their own categories in the registry, such as python.acmecorp:acme-api.* for Acme Corp.'s internal API framework. In this case, if a category doesn't specify a namespace, we can consider it to be returntocorp: (r2c: later for brevity's sake).

In this case, contrib rules such as nodejsscan or dlint might actually go under dlint:python.security.rule-name or python.dlint:security.rule-name. Only rules from namespaces mentioned in the queries should be used, so --config=python wouldn't implicitly run python.dlint:security.
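
One possible reading of "only rules from namespaces mentioned in the queries should be used" is sketched below. The exact matching semantics are an assumption: a query selects a rule when it is a segment-wise prefix of the rule's path and every namespace appearing in the rule's path also appears in the query. `namespaces_in` and `query_selects` are hypothetical helper names.

```python
def namespaces_in(reference):
    """Collect the namespace part of every `<namespace>:<name>` segment."""
    return {
        segment.split(":", 1)[0]
        for segment in reference.split(".")
        if ":" in segment
    }


def query_selects(query, rule_path):
    """Return True if `query` should run the rule stored at `rule_path`."""
    query_segments = query.split(".")
    rule_segments = rule_path.split(".")
    # The query must be a segment-wise prefix of the rule path.
    if rule_segments[: len(query_segments)] != query_segments:
        return False
    # Namespaced categories are opt-in: the query must mention each one.
    return namespaces_in(rule_path) <= namespaces_in(query)
```

With this reading, `query_selects("python", "python.dlint:security.rule-name")` is false, while `query_selects("python.dlint:security", "python.dlint:security.rule-name")` is true.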

  • : is better than / as request routing will be more robust (imagine a request like POST semgrep.live/s/underyx/my-rule/comments)
  • : is better than . as the registry hierarchy already uses dots
  • : is better than - as GitHub allows dashes in usernames, which is what namespaces will be based on
  • _ was considered, but it doesn't feel much like a namespace separator; it feels more like a replacement for a space character in multi-word names (we already have some registry references using it)
  • As one example, MediaWiki uses : for denoting namespaces, so I expect all tooling to support the : character in the path part of the registry.
  • Writing a regex to find all used namespaces like this is trivial. Searching \b(\w+): on python.acmecorp:acme-api.security.underyx:my-rule (an example from the collapsed moonshot section) yields ['acmecorp', 'underyx'], so we can check that you have write access to both of these namespaces when saving.
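
The regex check from the last bullet, written out in Python as a small sketch. The pattern is the one quoted above; `used_namespaces`, `can_save`, and the idea of passing in a set of writable namespaces are assumptions made for illustration.

```python
import re

# The pattern from the bullet above: a run of word characters followed by ":".
NAMESPACE_RE = re.compile(r"\b(\w+):")


def used_namespaces(reference):
    """Return every namespace mentioned in a reference, in order of appearance."""
    return NAMESPACE_RE.findall(reference)


def can_save(reference, writable_namespaces):
    """Allow saving only if the user can write to every namespace used."""
    return all(ns in writable_namespaces for ns in used_namespaces(reference))


print(used_namespaces("python.acmecorp:acme-api.security.underyx:my-rule"))
# prints ['acmecorp', 'underyx']
```

One caveat with the pattern as written: \w does not match a dash, so a dashed username such as acme-corp would be captured only as corp. If namespaces follow GitHub usernames, a character class along the lines of [\w-]+ may be needed instead.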

URLs

This is how you find the various reference types in the registry:

Reference Type    URL
Pack              semgrep.live/p/underyx:pack-name
Snippet           semgrep.live/aRsT
                  semgrep.live/underyx:rule-name
Registry query    semgrep.live/r/python.flask.security.rule-id
                  semgrep.live/r/python.*.security
                  semgrep.live/r/python
                  semgrep.live/r/python.acmecorp:acme-api.security
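
The issue does not define how registry queries such as python.*.security or python match rules. As a sketch under two stated assumptions (a * stands for exactly one dot-separated segment, and a query matches any rule whose path starts with the query's segments), matching could look like this. `matches_query` is a hypothetical name, and the `audit` category in the usage note is made up.

```python
def matches_query(query, rule_path):
    """Return True if `rule_path` falls under `query`.

    Assumptions (not from the spec): "*" matches exactly one segment, and the
    query only needs to match a prefix of the rule path.
    """
    query_segments = query.split(".")
    rule_segments = rule_path.split(".")
    if len(query_segments) > len(rule_segments):
        return False
    # zip stops at the shorter list, so only the query-length prefix is compared.
    return all(
        q == "*" or q == r
        for q, r in zip(query_segments, rule_segments)
    )
```

For example, `matches_query("python.*.security", "python.flask.security.rule-id")` is true under these assumptions, while `matches_query("python.*.security", "python.flask.audit.rule-id")` is false.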

Shorthand

For use in the CLI, or for easier typed configuration in the web UI, the following shorthand is available:

Prefix    Reference Type    Examples
p/        Pack              p/gosec
                            p/underyx:custom-pack
r/        Registry query    r/python.flask.security.rule-id
                            r/python.*.security
                            r/python
                            r/python.acmecorp:acme-api.security
s/        Snippet           s/aRsT
                            s/underyx:rule-name
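
Putting the shorthand table together with the URL table, a client could expand a shorthand into a reference type and a URL roughly as follows. This is a sketch only: `expand_shorthand` is a hypothetical helper, and the mapping of s/ to a URL with no path prefix is read off the URL table above rather than confirmed anywhere else.

```python
# Prefix -> (reference type, URL prefix), taken from the two tables above.
SHORTHAND = {
    "p": ("pack", "semgrep.live/p/"),
    "r": ("registry query", "semgrep.live/r/"),
    "s": ("snippet", "semgrep.live/"),
}


def expand_shorthand(shorthand):
    """Split `<prefix>/<reference>` and return (reference type, URL)."""
    prefix, separator, reference = shorthand.partition("/")
    if not separator or not reference or prefix not in SHORTHAND:
        raise ValueError(f"unrecognized shorthand: {shorthand!r}")
    reference_type, url_prefix = SHORTHAND[prefix]
    return reference_type, url_prefix + reference
```

Since namespaces use : rather than /, splitting on the first / is enough to separate the prefix from the reference, which is one of the routing benefits argued for in the Namespacing section.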

Data Model

The data model is still changing a lot so I'm not going to document it here. This is the main reason I called this issue an "open discussion" as opposed to a spec to eventually implement.

@underyx underyx changed the title Registry Spec v0 Registry Spec v1 Jul 1, 2020
@underyx underyx changed the title Registry Spec v1 RFC: Registry Spec v1 Jul 1, 2020

stale bot commented Feb 18, 2021

This issue is being marked stale because there hasn't been any activity in 30 days. Please leave a comment if you think this issue is still relevant and should be prioritized, otherwise it will be automatically closed in 7 days (you can always reopen it later).

@stale stale bot added the stale needs prioritization; label applied by stalebot label Feb 18, 2021

stale bot commented Feb 25, 2021

Stale-bot has closed this stale item. Please reopen it if this is in error.

@stale stale bot closed this as completed Feb 25, 2021