Instead of writing down my registry design thoughts in the respective issues, I thought it'd be better to collect all registry design thoughts in one place, and reference this from the other two issues.
Glossary
Pattern
Atomic unit for code search, e.g. $X == $X
Rule
Bundles one or more patterns, their relationships (not, or, inside), and some associated metadata (message, severity, etc.)
Registry
A dot-separated directory hierarchy of rules. A rule's location technically adds metadata to the rule, since where it is stored is categorization information.
Registry Reference
A dot-separated path into the registry hierarchy, e.g. python.flask.security.no-debug
Snippet
A rule kept outside the registry hierarchy; think of the difference as similar to the GitHub repo/gist distinction. Snippets can be promoted to Registry rules.
Snippet Alias
As snippets get random IDs by default, aliases are used to name them. Aliases are mutable so that the underlying rules can be updated. Aliasing is mostly invisible to users, who will think of the alias as their snippet's name. For example, we can allow snippet renaming on client apps, while in the background we 1) clone the snippet with edits to a new ID, and 2) update the alias to point to the new ID.
Pack
A collection of one or more of the following references: registry references, snippet IDs, snippet aliases, and pack IDs. Packs are stored outside a hierarchy, like snippets.
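The alias mechanism described in the glossary above can be sketched as follows. This is only an illustration under assumptions: SnippetStore, its fields, and its method names are hypothetical and are not part of any actual Semgrep data model; the only behavior taken from the description is "clone with edits to a new ID, then repoint the alias".

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class SnippetStore:
    """Hypothetical in-memory store, used only to illustrate aliasing."""

    snippets: dict = field(default_factory=dict)  # snippet ID -> rule text
    aliases: dict = field(default_factory=dict)   # alias -> snippet ID

    def create(self, rule_text: str) -> str:
        """Store a snippet under a fresh random ID and return that ID."""
        snippet_id = secrets.token_urlsafe(4)
        while snippet_id in self.snippets:  # retry on the rare collision
            snippet_id = secrets.token_urlsafe(4)
        self.snippets[snippet_id] = rule_text
        return snippet_id

    def edit_via_alias(self, alias: str, new_rule_text: str) -> str:
        """Clone the aliased snippet with edits, then repoint the alias."""
        if alias not in self.aliases:
            raise KeyError(f"unknown alias: {alias}")
        new_id = self.create(new_rule_text)  # 1) clone with edits to a new ID
        self.aliases[alias] = new_id         # 2) update the alias
        return new_id


store = SnippetStore()
first_id = store.create("pattern: $X == $X")
store.aliases["underyx:my-rule"] = first_id
second_id = store.edit_via_alias("underyx:my-rule", "pattern: $X != $X")
```

In this sketch the old snippet stays reachable under its original ID, so anything that referenced the ID directly keeps seeing the old rule, while anything that referenced the alias sees the edit.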
Namespacing
Users can create their own snippet aliases and packs. We use : as in <username>:<item-name> to prevent item name clashes between users.
Moonshot: namespacing in the registry hierarchy 🚀
Maybe we'll also let people have their own categories in the registry, such as python.acmecorp:acme-api.* for Acme Corp.'s internal API framework. In this case, if a category doesn't specify a namespace, we can consider it to be returntocorp: (r2c: later for brevity's sake).
In this case, contrib rules such as nodejsscan or dlint might actually go under dlint:python.security.rule-name or python.dlint:security.rule-name. Only rules from namespaces mentioned in the queries should be used, so --config=python wouldn't implicitly run python.dlint:security.
Using : as the namespace separator is better than / because request routing will be more robust (imagine a request like POST semgrep.live/s/underyx/my-rule/comments, where the namespace and the item name would be ordinary path segments).
It is better than . because the registry hierarchy already uses dots.
It is better than - because GitHub allows dashes in usernames, which is what namespaces will be based on.
_ was considered, but it doesn't feel much like a namespace separator; it feels more like a replacement for a space character in multi-word names (we already have some registry references using it).
As one example, MediaWiki uses : for denoting namespaces, so I expect all tooling to support the : character in the path part of the registry.
Writing a regex to find all the namespaces used in a reference like this is trivial. Searching for \b(\w+): in python.acmecorp:acme-api.security.underyx:my-rule (an example from the moonshot section) yields ['acmecorp', 'underyx'], so we can check that you have write access to both of these namespaces when saving.
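As a quick check of the regex above, here is a minimal Python sketch. The pattern and the example reference come from the text; find_namespaces, can_save, and the writable set are hypothetical names used only for illustration.

```python
import re

# The pattern quoted above: a run of word characters directly before a colon.
NAMESPACE_RE = re.compile(r"\b(\w+):")


def find_namespaces(reference: str) -> list:
    """Return every namespace mentioned in a reference, in order."""
    return NAMESPACE_RE.findall(reference)


def can_save(reference: str, writable: set) -> bool:
    """Hypothetical permission check: all namespaces must be writable."""
    return all(ns in writable for ns in find_namespaces(reference))


example = "python.acmecorp:acme-api.security.underyx:my-rule"
print(find_namespaces(example))                    # ['acmecorp', 'underyx']
print(can_save(example, {"acmecorp", "underyx"}))  # True
print(can_save(example, {"underyx"}))              # False
```

One caveat: \w does not match a dash, so for a namespace such as acme-corp this pattern would capture only corp. Since namespaces are meant to be based on GitHub usernames, which may contain dashes, a real implementation would probably need a wider character class.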
This is meant to be an open discussion for an eventually formalized Semgrep registry standard. This issue is not a deliverable currently.
URLs
This is how you find the various types in registry:
semgrep.live/p/underyx:pack-name
semgrep.live/aRsT
semgrep.live/underyx:rule-name
semgrep.live/r/python.flask.security.rule-id
semgrep.live/r/python.*.security
semgrep.live/r/python
semgrep.live/r/python.acmecorp:acme-api.security
Shorthand
For use in the CLI or easier configuration via typing on the web UI, the following shorthand is available:
p/ (Pack)
p/gosec
p/underyx:custom-pack
r/ (Registry query)
r/python.flask.security.rule-id
r/python.*.security
r/python
r/python.acmecorp:acme-api.security
s/ (Snippet)
s/aRsT
s/underyx:rule-name
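To make the shorthand concrete, here is a rough sketch of how a client might split a shorthand string into its parts. The three prefixes and the example strings are from the list above; Reference, parse_shorthand, and the decision to leave registry queries unsplit (because they can carry namespaces inside individual segments) are assumptions made for illustration.

```python
from typing import NamedTuple, Optional

# Prefix -> kind, as listed in the shorthand section.
KINDS = {"p": "pack", "r": "registry query", "s": "snippet"}


class Reference(NamedTuple):
    kind: str
    namespace: Optional[str]  # None when no <username>: part is present
    name: str


def parse_shorthand(value: str) -> Reference:
    """Split e.g. 'p/underyx:custom-pack' into kind, namespace and name."""
    prefix, sep, rest = value.partition("/")
    if not sep or prefix not in KINDS or not rest:
        raise ValueError(f"not a valid shorthand: {value!r}")
    kind = KINDS[prefix]
    # Assumption: registry queries are kept whole, since a namespace may
    # appear in any segment (r/python.acmecorp:acme-api.security).
    if kind != "registry query" and ":" in rest:
        namespace, name = rest.split(":", 1)
        return Reference(kind, namespace, name)
    return Reference(kind, None, rest)


print(parse_shorthand("p/underyx:custom-pack"))
print(parse_shorthand("s/aRsT"))
print(parse_shorthand("r/python.acmecorp:acme-api.security"))
```

Whether an un-namespaced pack such as p/gosec should implicitly belong to a default namespace is left open here, as it is in the discussion.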
Data Model
The data model is still changing a lot so I'm not going to document it here. This is the main reason I called this issue an "open discussion" as opposed to a spec to eventually implement.