Localization and policies #338

Closed
ghost opened this issue Mar 1, 2019 · 4 comments
Labels
2.1.0-CSD.1 Will be fixed in SARIF v2.1.0 CSD.1. e-ballot e-ballot-3 enhancement impact-breaks-consumers impact-breaks-producers merged Changes merged into provisional draft. p1 Priority 1 issue to close resolved-fixed tc-33 Issues to present at SARIF TC33

Comments

@ghost

ghost commented Mar 1, 2019

EBALLOT PROPOSAL

Introduce a new localization mechanism that meets the following requirements:

  1. Don't define a new file type to hold localizations. Instead, define new SARIF objects that can be persisted as external property files (and therefore also inlined in the sarifLog.inlineExternalPropertyFiles property, should issue #321, "Provide mechanism for inlining externalized properties data into the root log", be approved).
  2. It should be easy to switch from one language to another (which suggests that the design should permit -- but not require -- all the localized strings for one language to be stored together).
  3. It should be possible for a log file to include a "partial" set of localized strings, containing just those strings that are referenced elsewhere in the log file.
  4. It should also be possible to ship "comprehensive" sets of strings, that is, strings for all rules and notifications defined by a tool component. Since rule ids are not required to be unique, and since the order (index) in which rules appear in a translation object cannot be enforced, this implies that each rule needs a unique id.

API IMPACT

  • In the toolComponent object:
    • Add a guid property of type string in GUID format.
  • In the reportingDescriptor object:
    • Add a guid property of type string in GUID format.
  • In the run object:
    • Add a translations property of type array of translation objects, which can be externalized.
  • Define a translation object with the following properties:
    • language of type string: a language tag such as "en-US", that is, an ISO 639-1 language code optionally followed by an ISO 3166-1 country code.
    • toolComponentTranslations of type array of toolComponentTranslation objects.
  • Define a toolComponentTranslation object with the following properties:
    • toolComponentGuid of type string in GUID format: matches toolComponent.guid.
    • location of type artifactLocation: where the translation can be obtained.
    • semanticVersion of type string: the semantic version of the tool component for which the translation was made.
    • partialTranslation of type bool: true if this object contains a subset of the strings defined by the tool component.
    • globalMessageStrings of type object with multiformatMessageString-valued properties: its property names are a subset of the names in toolComponent.globalMessageStrings.
    • reportingDescriptors of type array of reportingDescriptorTranslation objects.
    • notificationDescriptors of type array of reportingDescriptorTranslation objects.
  • Define a reportingDescriptorTranslation object with the following properties:
    • id of type string: matches reportingDescriptor.id.
    • guid of type string in GUID format: matches reportingDescriptor.guid.
    • shortDescription of type multiformatMessageString.
    • fullDescription of type multiformatMessageString.
    • messageStrings of type object with multiformatMessageString-valued properties: its property names are a subset of the names in reportingDescriptor.messageStrings.

NOTES

SAMPLE SARIF

{
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "CodeScanner",
          "guid": "<driverGuid>",
          "ruleDescriptors": [
            {
              "id": "CA2101",
              "guid": "<ruleGuid>",
              "defaultConfiguration": {
                "level": "error"
              },
              "messageStrings": {
                "default": {
                  "text": "This is bad: {0}.",
                  "markdown": "This is _bad_: {0}."
                }
              }
            }
          ],
          "notificationDescriptors": [
          ]
        }
      },
      "results": [
        {
          "ruleId": "CA2101",
          "rulePointer": "driver/ruleDescriptors/0",
          "message": {
            "messageId": "default",
            "arguments": [ "42" ]
          }
        }
      ],
      "translations": [
        {
          "language": "en-US",
          "toolComponentTranslations": [
            {
              "toolComponentGuid": "<driverGuid>",
              "location": {
                "uri": "https://example.com/tools/CodeScanner/en-US/1.3.4/translation.sarif.external-property-file"
              },
              "semanticVersion": "1.3.4",
              "partialTranslation": true,
              "fullName": "The localized full name of my application",
              "shortDescription": {
                "text": "A tool for finding bad things."
              },
              "fullDescription": {
                "text": "The best tool for finding bad things. Get your copy today!"
              },
              "globalMessageStrings": {
                "call": {
                  "text": "Function {0} was called.",
                  "markdown": "Function `{0}` was called."
                }
              },
              "ruleDescriptors": [
                {
                  "id": "CA2101",
                  "guid": "<ruleGuid>",
                  "shortDescription": {
                    "text": "This is what happens when you do a bad thing.",
                    "markdown": "This is what happens when you do a _bad_ thing."
                  },
                  "messageStrings": {
                    "default": {
                      "text": "This is bad: {0}.",
                      "markdown": "This is _bad_: {0}."
                    }
                  }
                }
              ],
              "notificationDescriptors": [
              ]
            }
          ]
        }
      ]
    }
  ]
}
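A minimal Python sketch of how a consumer might resolve a translated message from a log shaped like the sample above. The function name find_translated_message is hypothetical, the property names follow the sample (for example "ruleDescriptors" inside the translation), and str.format is assumed to be an adequate stand-in for placeholder substitution such as {0}. The first rule whose id matches is used, and the rule and component are then matched by guid.

```python
from typing import Optional


def find_translated_message(run: dict, result: dict, language: str) -> Optional[str]:
    """Return the translated, formatted message text for `result`, or None."""
    driver = run["tool"]["driver"]
    # Find the rule definition so we can match translations by guid, not by id.
    rule = next(
        (r for r in driver.get("ruleDescriptors", []) if r["id"] == result["ruleId"]),
        None,
    )
    if rule is None:
        return None
    message_id = result["message"]["messageId"]
    args = result["message"].get("arguments", [])
    for translation in run.get("translations", []):
        if translation["language"] != language:
            continue
        for component in translation.get("toolComponentTranslations", []):
            # Only consider translations made for this tool component.
            if component["toolComponentGuid"] != driver["guid"]:
                continue
            for descriptor in component.get("ruleDescriptors", []):
                if descriptor.get("guid") != rule["guid"]:
                    continue
                strings = descriptor.get("messageStrings", {})
                if message_id in strings:
                    # Substitute the result's arguments into the placeholders.
                    return strings[message_id]["text"].format(*args)
    return None
```

A partial translation may simply lack the requested string, in which case the sketch returns None and the caller can fall back to the string defined by the tool component.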
@kupsch

kupsch commented Mar 7, 2019

semanticVersion would be better as localizationVersion

Allow the following optional properties:

  • releaseDate - date of the release
  • translatorUri - URI for the translation (mirrors the corresponding tool property, as the two may not be the same)
  • translatorOrganization - organization producing the translation
  • translationDownloadUri - download location for the translation if different from the tool

@ghost
Author

ghost commented Mar 7, 2019

Microsoft post-ballot recommendation

  • Additional useful property: For consistency with toolComponent.downloadUri, toolComponentTranslation should have a string-valued downloadUri property from which the translation can be obtained.

  • Missing property: We really do need a language property, because it affects the message string lookup order.

    Suppose a tool declares its language as en-US. Then if a viewer wants en-US strings, it should first check to see if the string is inlined in (for example) the result message, and only if not should it look for en-US in run.translations. Whereas if the viewer wants fr-FR strings, it should not look for an inlined string; instead it should immediately look up fr-FR in run.translations.

    I also think language is more a property of run than of tool. It means “the language of the messages emitted into the log file during this run.”
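The lookup order described above could be sketched in Python roughly as follows. The function choose_message_text and its flattened arguments are hypothetical: inline_text stands for a string inlined in the log (for example in the result message), and translated_by_language stands for whatever a viewer has already extracted from run.translations, keyed by language.

```python
from typing import Dict, Optional


def choose_message_text(
    run_language: str,
    wanted_language: str,
    inline_text: Optional[str],
    translated_by_language: Dict[str, str],
) -> Optional[str]:
    """Pick the message text a viewer should display, or None if unavailable."""
    # The inlined string is only trustworthy when the viewer wants the
    # language the run itself was emitted in.
    if wanted_language == run_language and inline_text is not None:
        return inline_text
    # Otherwise (different language, or nothing inlined) consult translations.
    return translated_by_language.get(wanted_language)
```

With a run language of en-US, a viewer asking for fr-FR never sees the inlined string, even when one is present.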

@michaelcfanning michaelcfanning added the tc-33 Issues to present at SARIF TC33 label Mar 8, 2019
harleenkohli added a commit to microsoft/sarif-sdk that referenced this issue Mar 8, 2019
* first cut of the schema changes

* undo typo

* fixing spacing issue in toolcomponent

* space alignment

* new localization mechanism

* updating md file

* post merger manual fixes

* cleanup + typo fix

* rc++++
@ghost ghost added the e-ballot-3 label Mar 19, 2019
@ghost
Author

ghost commented Mar 26, 2019

E-BALLOT #3 PROPOSAL

We introduce a new localization mechanism that allows a log file to include translations into multiple languages. We overload the toolComponent object to allow it to represent translations as well as its existing uses in representing the driver, extensions such as plugins, and taxonomies.

Rather than having all these different kinds of tool components exist in a single, flat list in tool.extensions, we separate them out into their own properties. And since some of these components, such as standard taxonomies, are not really properties of the tool at all, we place them on the run object rather than on the tool object, hence: run.taxonomies and run.translations, both of type toolComponent[].

We take the opportunity to introduce one more type of tool component, the "post-processing configuration override". The scenario here is that you've run a tool and produced a log file, relying on the tool's default set of rule severities (possibly modified with command line switches). But now you need to view the results through the lens of compliance with some corporate policy, which might (for example) require you to treat as errors certain conditions that the tool classifies as warnings. To accomplish this, a post-processing tool can introduce into the log file a toolComponent whose purpose is to redefine the severities of certain rules. So we introduce run.policies of type toolComponent[]. It is an array to allow a SARIF viewer to view a log file through the lenses of multiple policies.
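One way a viewer might compute the severity to display is sketched below in Python. It assumes the precedence stated for the policies property in the schema changes: a selected policy overrides the run-time overrides from the invocation, which in turn override the tool's defaults. The function effective_level and the flattened rule-id-to-level dictionaries are hypothetical simplifications, not the log's actual structure.

```python
from typing import Dict, Optional


def effective_level(
    rule_id: str,
    default_levels: Dict[str, str],
    invocation_overrides: Dict[str, str],
    policy: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the level for `rule_id`, checking the highest-priority source first."""
    # Order matters: policy, then run-time overrides, then tool defaults.
    for source in (policy or {}, invocation_overrides, default_levels):
        if rule_id in source:
            return source[rule_id]
    return None
```

Because the log can carry several policies, a viewer would call this once per policy the user selects, for example effective_level("CA2101", defaults, overrides, corporate_policy).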

Finally, we simplify some property names, for example, toolComponent.ruleDescriptors => rules.

SCHEMA CHANGES

  • In the tool object:

    • Remove the language property (move to run).
  • In the run object:

    • Add a property language of type string, optional, default: "en-US": the language of the localizable strings emitted in this run.
    • Add a property translations of type toolComponent[], optional, minItems: 0, default: [], externalizable: contains translations for other components.
    • Add a property taxonomies of type toolComponent[], optional, minItems: 0, default: [], externalizable: contains standard taxonomies such as CWE; the driver and its extensions can also define their own custom taxonomies.
    • Add a property policies of type toolComponent[], optional, minItems: 0, default: [], externalizable: contains configurations that override both reportingDescriptor.defaultConfiguration (the tool's default severities) and invocation.configurationOverrides (severities established at run-time from the command line).
  • In the toolComponent object:

    • Rename the property ruleDescriptors to rules.
    • Rename the property taxonDescriptors to taxa.
    • Rename the property notificationDescriptors to notifications.
    • Add a property language of type string, required for translations; otherwise optional: the language of the localized strings defined in this component.
    • Add a property contents of type string[] with enumerated values "localizedData" and "nonLocalizedData", unique, default: [ "localizedData", "nonLocalizedData" ]: the kinds of data contained in this object.
    • Add a property isComprehensive of type boolean: true if this object contains a complete definition of the localizable and/or non-localizable data for this component.
    • Add a property localizedDataSemanticVersion, optional, defaults to the semanticVersion property of the component: the semantic version of the localized strings defined in this component; used by components that define translations.
    • Add a property minimumRequiredLocalizedDataSemanticVersion, optional, defaults to the semanticVersion property of the component: the minimum value of localizedDataSemanticVersion required in translations consumed by this component; used by components that consume translations.
    • Add a property associatedComponent of type toolComponentReference, optional: specifies the component for which the current component is a translation or a plugin.
    • Add a property translationMetadata of type translationMetadata, required for a translation, forbidden for other component types.
  • Define a translationMetadata object with the following properties:

    • name of type string, required.
    • fullName of type string, optional.
    • shortDescription of type multiformatMessageString, optional.
    • fullDescription of type multiformatMessageString, optional.
    • downloadUri of type string in uri format, optional.
    • informationUri of type string in uri format, optional.
  • In the result object:

    • Rename the property ruleDescriptorReference to rule.
    • Rename the property taxonomyReferences to taxa.
  • In the notification object:

    • Rename the property associatedRuleDescriptorReference to associatedRule.
    • Rename the property notificationDescriptorReference to descriptor.
  • In the artifact object:

    • In the roles property:
      • Add the values driver, extension, translation, taxonomy, policy, and referencedOnCommandLine.
      • Remove the value toolComponent.
  • In the reportingDescriptor object:

    • Rename the property taxonReferences to taxa.
    • Rename the property optionalTaxonReferences to optionalTaxa.
  • In the reportingDescriptorReference object:

    • Rename the property toolComponentReference to toolComponent.
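For producers or consumers updating existing logs, the renames listed above could be captured as lookup tables, as in this Python sketch. The table names and the rename_properties helper are hypothetical. The helper is shallow: it renames the keys of one object at a time and does not walk nested objects or handle the added and removed properties.

```python
# Old property name -> new property name, per object type in the list above.
TOOL_COMPONENT_RENAMES = {
    "ruleDescriptors": "rules",
    "taxonDescriptors": "taxa",
    "notificationDescriptors": "notifications",
}
RESULT_RENAMES = {
    "ruleDescriptorReference": "rule",
    "taxonomyReferences": "taxa",
}
NOTIFICATION_RENAMES = {
    "associatedRuleDescriptorReference": "associatedRule",
    "notificationDescriptorReference": "descriptor",
}
REPORTING_DESCRIPTOR_RENAMES = {
    "taxonReferences": "taxa",
    "optionalTaxonReferences": "optionalTaxa",
}
REPORTING_DESCRIPTOR_REFERENCE_RENAMES = {
    "toolComponentReference": "toolComponent",
}


def rename_properties(obj: dict, renames: dict) -> dict:
    """Return a copy of `obj` with its top-level keys renamed per `renames`."""
    return {renames.get(key, key): value for key, value in obj.items()}
```

For example, rename_properties(driver, TOOL_COMPONENT_RENAMES) turns a driver's ruleDescriptors array into rules while leaving properties such as name and guid untouched.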

@ghost ghost changed the title Introduce new localization mechanism Localization and post-processed configuration Mar 26, 2019
@michaelcfanning michaelcfanning changed the title Localization and post-processed configuration Localization and policies Mar 27, 2019
ghost pushed a commit that referenced this issue Mar 27, 2019
ghost pushed a commit that referenced this issue Mar 28, 2019
ghost pushed a commit that referenced this issue Mar 28, 2019
ghost pushed a commit that referenced this issue Mar 28, 2019
ghost pushed a commit that referenced this issue Mar 28, 2019
@ghost ghost added change-draft-available merged Changes merged into provisional draft. labels Mar 28, 2019
@ghost ghost self-assigned this Mar 28, 2019
@ghost ghost removed the change-draft-available label Apr 6, 2019
@ghost ghost added resolved-fixed and removed schema-todo labels Apr 6, 2019
@ghost
Author

ghost commented Apr 6, 2019

Approved in e-ballot-3.

@ghost ghost closed this as completed Apr 6, 2019
@ghost ghost mentioned this issue Apr 6, 2019
This issue was closed.