Support for diagrams in SARIF? #588

Open
davidmalcolm opened this issue May 31, 2023 · 1 comment

@davidmalcolm

For GCC 14 I'm experimenting with the idea of diagnostics being able to have accompanying diagrams; see e.g. https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620277.html

For example, given:

#include <string.h>

void
test_non_ascii ()
{
  char buf[5];
  strcpy (buf, "文字化け");
}

my patched version of GCC's -fanalyzer is able to emit this "unicode art" diagram:

demo-2.c: In function ‘test_non_ascii’:
demo-2.c:7:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_non_ascii’: events 1-2
    |
    |    6 |   char buf[5];
    |      |        ^~~
    |      |        |
    |      |        (1) capacity: 5 bytes
    |    7 |   strcpy (buf, "文字化け");
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) out-of-bounds write from byte 5 till byte 12 but ‘buf’ ends at byte 5
    |
demo-2.c:7:3: note: write of 8 bytes to beyond the end of ‘buf’
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
demo-2.c:7:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[4]’

  ┌─────┬─────┬─────┬────┬────┐┌────┬────┬────┬────┬────┬────┬────┬──────┐
  │ [0] │ [1] │ [2] │[3] │[4] ││[5] │[6] │[7] │[8] │[9] │[10]│[11]│ [12] │
  ├─────┼─────┼─────┼────┼────┤├────┼────┼────┼────┼────┼────┼────┼──────┤
  │0xe6 │0x96 │0x87 │0xe5│0xad││0x97│0xe5│0x8c│0x96│0xe3│0x81│0x91│ 0x00 │
  ├─────┴─────┴─────┼────┴────┴┴────┼────┴────┴────┼────┴────┴────┼──────┤
  │     U+6587      │    U+5b57     │    U+5316    │    U+3051    │U+0000│
  ├─────────────────┼───────────────┼──────────────┼──────────────┼──────┤
  │       文        │      字       │      化      │      け      │ NUL  │
  ├─────────────────┴───────────────┴──────────────┴──────────────┴──────┤
  │                  string literal (type: ‘char[13]’)                   │
  └──────────────────────────────────────────────────────────────────────┘
     │     │     │    │    │     │    │    │    │    │    │    │     │
     │     │     │    │    │     │    │    │    │    │    │    │     │
     v     v     v    v    v     v    v    v    v    v    v    v     v
  ┌─────┬────────────────┬────┐┌─────────────────────────────────────────┐
  │ [0] │      ...       │[4] ││                                         │
  ├─────┴────────────────┴────┤│            after valid range            │
  │  ‘buf’ (type: ‘char[5]’)  ││                                         │
  └───────────────────────────┘└─────────────────────────────────────────┘
  ├─────────────┬─────────────┤├────────────────────┬────────────────────┤
                │                                   │
       ╭────────┴────────╮              ╭───────────┴──────────╮
       │capacity: 5 bytes│              │⚠️  overflow of 8 bytes│
       ╰─────────────────╯              ╰──────────────────────╯

showing that the overflow occurs partway through the UTF-8 encoding of the U+5b57 code point (with colorization on terminal output to distinguish the valid vs invalid parts of the write).
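
To make the "partway through" point concrete, here is a minimal sketch (not GCC's implementation) that walks a UTF-8 string and reports the first code point that does not fit entirely inside a buffer of a given capacity. The helper names `utf8_seq_len` and `first_unfit_code_point` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence introduced by lead byte `b`;
   0 if `b` is a continuation byte or not a valid lead byte. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical helper: return the byte offset of the first code point in
   `s` that does not fit entirely within `capacity` bytes, or strlen(s) if
   every code point fits. `*split` is set to 1 when the buffer boundary
   falls strictly inside that code point's encoding, else 0. */
static size_t first_unfit_code_point(const char *s, size_t capacity, int *split)
{
    assert(s != NULL && split != NULL);
    size_t len = strlen(s);
    size_t i = 0;
    *split = 0;
    while (i < len) {
        size_t n = utf8_seq_len((unsigned char)s[i]);
        if (n == 0)
            n = 1; /* malformed byte: step over it as a single unit */
        if (i + n > capacity) {
            *split = (i < capacity);
            return i;
        }
        i += n;
    }
    return len;
}
```

For the 12-byte UTF-8 encoding of the string in the example and a capacity of 5, this returns offset 3 with `*split` set to 1: the second code point (U+5b57) starts at byte [3], bytes [3] and [4] still land inside `buf`, and byte [5] is the first one out of bounds.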

Right now I'm supporting the diagram in the SARIF output by emitting a location in "relatedLocations", with the diagram as a code block in Markdown within a "markdown" property of a message.
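
Roughly, the shape of that output can be sketched as below. This is an illustrative emitter, not the code in the GCC patch: the function names and the `summary` parameter are assumptions, and only the `message`, `text` and `markdown` property names come from SARIF. It writes one location object suitable for a "relatedLocations" array, with the diagram wrapped in a Markdown code fence.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MD_FENCE "\x60\x60\x60" /* three backticks: a Markdown code fence */

/* Write `s` to `out` with JSON string escaping applied (no surrounding
   quotes). Bytes >= 0x80 pass through, so UTF-8 input stays UTF-8. */
static void json_write_escaped(FILE *out, const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        switch (*p) {
        case '"':  fputs("\\\"", out); break;
        case '\\': fputs("\\\\", out); break;
        case '\n': fputs("\\n", out);  break;
        case '\r': fputs("\\r", out);  break;
        case '\t': fputs("\\t", out);  break;
        default:
            if (*p < 0x20)
                fprintf(out, "\\u%04x", (unsigned)*p);
            else
                fputc(*p, out);
            break;
        }
    }
}

/* Hypothetical emitter: write one SARIF location object whose message
   has a plain-text summary in "text" and the diagram as a fenced code
   block in "markdown". */
static void emit_diagram_location(FILE *out, const char *summary,
                                  const char *diagram)
{
    assert(out != NULL && summary != NULL && diagram != NULL);
    /* Assumption of this sketch: the diagram never contains a run of
       three backticks, which would end the Markdown fence early. */
    assert(strstr(diagram, MD_FENCE) == NULL);

    fputs("{\"message\": {\"text\": \"", out);
    json_write_escaped(out, summary);
    fputs("\", \"markdown\": \"", out);
    json_write_escaped(out, MD_FENCE "\n");
    json_write_escaped(out, diagram);
    json_write_escaped(out, "\n" MD_FENCE);
    fputs("\"}}", out);
}
```

The box-drawing characters need no special handling here, since JSON allows raw UTF-8 inside strings; only quotes, backslashes and control characters such as the newlines between diagram rows are escaped.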

Has there been any discussion of supporting e.g. SVG or other formats for diagrams in SARIF? (or would this need to be generated as some kind of external file?)

@michaelcfanning
Contributor

Hey, David. I think we're finally firing up the SARIF 2.2 engines here. Hope to close on all open items in the next couple of weeks and move on to authoring relevant content for an incremental update.

So there is a property, artifactContent.rendered, intended to help with this scenario. Basically, every region.snippet property can include the literal snippet of objectionable data, or region.snippet.rendered can be populated with some alternate representation, such as your unicode art.

The original SARIF design driver here was to provide alternate visual representations of binary level data flagged by low-level security checkers.

Let me know what you think. A logical, general-purpose way to consume snippet data would be to prefer its 'rendered' data if available, and otherwise fall back to rendering text or binary, as necessary.
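
That fallback order could look something like the following sketch. The struct is a simplified stand-in for artifactContent, not a real SARIF library type, and `choose_snippet` and the enum are hypothetical names; a NULL member is assumed to mean the property is absent.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a SARIF artifactContent object as found in
   region.snippet. A NULL member means the property is absent. */
struct artifact_content {
    const char *text;          /* artifactContent.text */
    const char *binary;        /* artifactContent.binary (base64) */
    const char *rendered_text; /* artifactContent.rendered.text */
};

enum snippet_source {
    SNIPPET_NONE,
    SNIPPET_RENDERED,
    SNIPPET_TEXT,
    SNIPPET_BINARY
};

/* Prefer 'rendered', then fall back to text, then to binary. Stores the
   chosen string in `*chosen` (NULL if nothing is available) and returns
   which property it came from. */
static enum snippet_source choose_snippet(const struct artifact_content *c,
                                          const char **chosen)
{
    assert(c != NULL && chosen != NULL);
    if (c->rendered_text != NULL) {
        *chosen = c->rendered_text;
        return SNIPPET_RENDERED;
    }
    if (c->text != NULL) {
        *chosen = c->text;
        return SNIPPET_TEXT;
    }
    if (c->binary != NULL) {
        *chosen = c->binary;
        return SNIPPET_BINARY;
    }
    *chosen = NULL;
    return SNIPPET_NONE;
}
```

A viewer built this way would show the unicode art whenever a producer supplies it, and would still display the literal source line for tools that only populate the text.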
