Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should raise error when STIX object's identifier doesn't follow the form object-type--UUIDv4 #188

Closed
win911 opened this issue Jun 11, 2018 · 9 comments
Assignees
Labels
Milestone

Comments

@win911
Copy link

win911 commented Jun 11, 2018

According to 2.5​ Identifier, STIX object's identifier MUST follow the form object-type--UUIDv4.

The regular expression about UUIDv4 should be [0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}.

The following example should raise error but it didn't.

import stix2

bundle_str = """{
    "type": "bundle",
    "id": "bundle--00000000-0000-0000-0000-000000000007",
    "spec_version": "2.0",
    "objects": [
        {
            "type": "indicator",
            "id": "indicator--00000000-0000-0000-0000-000000000001",
            "created": "2017-01-01T12:34:56.000Z",
            "modified": "2017-01-01T12:34:56.000Z",
            "pattern": "[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427e']",
            "valid_from": "2017-01-01T12:34:56Z",
            "labels": [
                "malicious-activity"
            ]
        }
    ]
}"""

bundle = stix2.parse(bundle_str, version="2.0")
@clenk
Copy link
Contributor

clenk commented Jun 11, 2018

Thank you for bringing this to our attention, @win911. I see the problem. We were using uuid.UUID() (here) to check if it is a valid UUID, but we weren't using the version parameter. We'll fix this soon.

@gtback
Copy link
Contributor

gtback commented Jun 12, 2018

A lot of our tests use the not-actually-a-valid-v4-UUID strings to make them more predicatable and readable. Any solution should (for instance) allow IDs that match 00000000-0000-0000-0000-0000[0-9a-fA-F]{8}, or at least have a mode for doing so (a global ALLOW_NON_UUIDV4_IDS that can be set at the beginning of a test run.)

@gtback gtback added the bug label Jun 12, 2018
@gtback gtback added this to the 1.0.2 milestone Jun 12, 2018
@clenk
Copy link
Contributor

clenk commented Jun 12, 2018

I believe our tests could use predictable and valid IDs by merely setting two additional digits (the 4 and the 8):
00000000-0000-4000-8000-0000[0-9a-fA-F]{8}

@packet-rat
Copy link

There is no basis for requiring UUIDv4 in the specification. Any well formed UUID should be acceptable from a functional perspective. The functional requirement is the uniqueness of the UUID. There are some of us who are using UUIDv5 -- Why can't uuid.UUID() suffice? Besides the requirement to change the language to remove the arbitrary v4 constraint?

@clenk
Copy link
Contributor

clenk commented Jun 12, 2018

This library aims to implement the spec as written, which requires UUID version 4. If the specification drops this requirement we can change it here.

@gtback gtback self-assigned this Jun 18, 2018
@gtback gtback modified the milestones: 1.0.2, 1.0.3 Jun 18, 2018
@gtback
Copy link
Contributor

gtback commented Jun 21, 2018

I started to fix this and ran into an interesting question. Our general approach is to "coerce" invalid values into correct values. That means that you can do something like this:

>>> print(stix2.Campaign(name="test", created="20180601", modified="20180602"))
{
    "type": "campaign",
    "id": "campaign--8ba0438a-0e56-4125-ae9e-bd816a0337e8",
    "created": "2018-06-01T00:00:00.000Z",
    "modified": "2018-06-02T00:00:00.000Z",
    "name": "test"
} 

In fact, you can even parse "invalid" dates in this way.

bundle = """{
    "type": "campaign",
    "id": "campaign--8ba0438a-0e56-4125-ae9e-bd816a0337e8",
    "created": "20180601",
    "modified": "20180602",
    "name": "test"
} """
print(stix2.parse(bundle))

and this should return the same thing as the first snippet (with timestamps in the correct ISO format). More generally, it's not necessarily the case that print(stix2.parse(bundle)) will return JSON that is equal to bundle, but it SHOULD be valid STIX 2 JSON, even when the input isn't. python-stix2 is not a validator.

We are violating this in the example that @win911 provided, though. However, if we use uuid.UUID(..., version=4), this will silently rewrite IDs to have valid bits in the right places for a v4 ID. This is arguably much more difficult to notice than rewriting date formats, and also has the potential to silently change IDs, which IMO would be really bad.

I started to write code that looks like this, which also prevents you from using IDs like campaign--8ba0438a0e564125ae9ebd816a0337e8 (no hyphens between the hex digits):

orig_id = value.split('--', 1)[1]
new_id = str(uuid.UUID(orig_id, version=4))
# uuid.UUID will accept several representations. We only want
# it to be 32 hex digits in the standard 8-4-4-4-12 pattern.
if orig_id != new_id:
    raise ValueError()

But then came the question of what "clean" should do. Theoretically, we could rewrite campaign--8ba0438a0e564125ae9ebd816a0337e8 to campaign--8ba0438a-0e56-4125-ae9e-bd816a0337e8, but do we want to? Or should the ID property be a "pass unmodified or reject entirely"?

@chisholm
Copy link
Contributor

Seems like in this case, "version conversion" changes the uuid in an unacceptable way. So I'd say, create the UUID object in a way that does not modify it. To be spec-compliant, if its variant is not uuid.RFC_4122 or version is not 4, produce an error. Otherwise, treat as compliant and return its stringification. Does that make sense?

@clenk
Copy link
Contributor

clenk commented Jun 21, 2018

There's potential issues with both options for clean(), but I think I like raising an error better than rewriting it. Coercing was easier for timestamps because there's already a pretty robust library for parsing them. But for IDs we'd have to guess how to parse whatever values the user might pass in to a UUID.

@gtback
Copy link
Contributor

gtback commented Jun 22, 2018

@chisholm . I agree. My concern is that:

The version argument is optional; if given, the resulting UUID will have its variant and version number set according to RFC 4122, overriding bits in the given hex, bytes, bytes_le, fields, or int.
source.

Obviously, it makes more sense to construct it without passing in the version (which could potentially modify it), and then checking the version and variant attributes. The alternative I had been thinking about was to compare str(UUID(orig)) to str(UUID(orig, version=4)), and error if they weren't the same, but that seems a bit convoluted.

I think I'm ok with rewriting campaign--8ba0438a0e564125ae9ebd816a0337e8 to campaign--8ba0438a-0e56-4125-ae9e-bd816a0337e8, but not with rewriting campaign--00000000-0000-0000-0000-000000000000 to campaign--00000000-0000-4000-8000-000000000000. Just using uuid.UUID() should work for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants