Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

types: Fixes #146, VisibleString character set #147

Merged
merged 3 commits into from
Oct 11, 2023

Conversation

Nicceboy
Copy link
Contributor

@Nicceboy Nicceboy commented Aug 21, 2023

Resolves #146

There is standard compliant character set.

However, there is weird side-effect on UPER tests, and some will fail.
Is the codec somehow using the allowed character list as part of codec?

I could not figure out the reason yet.

For reference, array was generated with following to be correct:

python -c 'print("[{}]".format(", ".join("0x{:02X}".format(i) for i in range(0x20, 0x7F))))'

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Aug 21, 2023

Seems like CHARACTER_WIDTH of the list is used on encoding.
Would it be okay to just replace the expected values with new encoded values?

Edit: Also the whole set in to_index_string or index_map in String constraints. I assume that it is related to packing and might be just safe to replace with new values.

@Nicceboy
Copy link
Contributor Author

I modified the VisibleString output values but there might be other data race condition similar to this issue: ferrilab/bitvec#229

When running cargo test, one test fails

failures:

---- uper::tests::extension_additions stdout ----
thread 'uper::tests::extension_additions' panicked at 'assertion failed: `(left == right)`

Diff < left / right > :
 [
     199,
<    93,
<    57,
<    17,
<    105,
<    82,
<    178,
>    91,
>    53,
>    9,
>    89,
>    50,
>    114,
     7,
     1,
     128,
     5,
     150,
<    154,
<    19,
<    233,
<    84,
>    150,
>    11,
>    217,
>    52,
 ]

', src/uper.rs:785:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    uper::tests::extension_additions

test result: FAILED. 76 passed; 1 failed; 2 ignored; 0 measured; 0 filtered out; finished in 2.62s

But alone it succeeds cargo test --lib uper::tests::extension_additions:

    Finished test [unoptimized + debuginfo] target(s) in 0.05s
     Running unittests src/lib.rs (target/debug/deps/rasn-8b41135ae5a97fd6)

running 1 test
test uper::tests::extension_additions ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 78 filtered out; finished in 0.13s

@XAMPPRocky
Copy link
Collaborator

Seems like CHARACTER_WIDTH of the list is used on encoding.

Both the width and the order are used in UPER.

Would it be okay to just replace the expected values with new encoded values?

Hmm, are you sure these are correct? These are adapted from tests in other ASN.1 projects, so I would expect that those values are the correct ones, and not the new values.

@Nicceboy
Copy link
Contributor Author

Hmm, are you sure these are correct? These are adapted from tests in other ASN.1 projects, so I would expect that those values are the correct ones, and not the new values.

I do not know about UPER codec so it might be very time consuming to verify the code. But I can check if they use similar character sets for VisibleString encoding.

@XAMPPRocky
Copy link
Collaborator

@Nicceboy Yeah I would look at packages like asn1-tools from the python world, they're the most extensive.

@Nicceboy
Copy link
Contributor Author

Seems like asn1rs uses similar set:

pub const VISIBLE_STRING_CHARACTERS: &'static str =
        " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

Also asn1tools:

VISIBLE_STRING = ''.join([chr(v) for v in range(32, 127)]) # 0x20, 0x7f, end value not inclusive

@XAMPPRocky
Copy link
Collaborator

@Nicceboy There must be something different though because their test is the same as the src/uper.rs one and has the original output not the new output.

https://github.com/eerimoq/asn1tools/blob/de25657f7c79100d1ba5312dd7474ff3e0d0ad2e/tests/test_uper.py#L1175-L1177

@Nicceboy
Copy link
Contributor Author

Yeah, I tried it manually and new values are different indeed when compared to asn1tools output.

But since the character set changed, these values are also expected to change, since codec depends on it, so how can they be same with different character sets?

I am trying to see if asn1tools respects those character sets and maybe also test some other tool.

@Nicceboy
Copy link
Contributor Author

I would assume that the new character list is correct, but codecs use them differently, and the asn1tools output is correct. Also asn1.io seems to produce same output.

If we look at the same asn1tools test file, there is also test for ia5_string with identical constraints for the linked VisibleString test ( "D ::= IA5String (SIZE (19..133)) "):

It uses the same character list as the "old" one here, #L1123-L1126.

('D',
 'HejHoppHappHippAbcde',
 b'\x03\x23\x2e\xa9\x1b\xf8\x70\x91\x87\x87\x09\x1a\x78\x70\x83\x8b'
 b'\x1e\x4c\xa0'),

The output is same for smaller VisibleString character list, #L1175-L1177 :

('A',
 'HejHoppHappHippAbcde',
 b'\x03\x23\x2e\xa9\x1b\xf8\x70\x91\x87\x87\x09\x1a\x78\x70\x83\x8b'
 b'\x1e\x4c\xa0'),

So this change should not actually change the output, if we trust these outputs.
Here the output changes when the character list is different.

But finding the difference in UPER implementation might be a bit too time consuming for me right now.

@Nicceboy
Copy link
Contributor Author

I compared asn1tools and rans side by side with debugger, and I think there might a logic problem in rasn when encoding known-multiplier-strings. I also took a look for standard, but this might require confirmation @XAMPPRocky .

The main issue might be on using indexes when they should not be. rasn seems to always use just indexes, but on certain scenarios values should be used directly from the allowed character list.

This has gone unnoticed, because ia5 and VisibleString character list had full ASCII starting from zero, and the encoded index value is the same as the value itself.

Tests are also missing for PrintableString type, which would have caught this.

If you add similar test from asn1tools for PrintableString, it fails:

    #[test]
    fn printable_string() {
        round_trip_with_constraints!(
            uper,
            PrintableString,
            Constraints::new(&[Constraint::Size(Size::new(Bounded::Single(16)).into())]),
            PrintableString::from_bytes("0123456789abcdef".as_bytes()).unwrap(),
            &[0x60, 0xc5, 0x93, 0x36, 0x8d, 0x5b, 0x37, 0x70, 0xe7, 0x0e, 0x2c, 0x79, 0x32, 0xe6]
        );
    }

NumericString type is encoded correctly, because the allowed character list is so short, that is is expected to be encoded by using indexes.

You can see the difference also in asn1tools encode/decode mapping generation:

class NumericString(KnownMultiplierStringType):

    ALPHABET = bytearray(NUMERIC_STRING.encode('ascii'))
    ENCODE_MAP = {v: i for i, v in enumerate(ALPHABET)}
    DECODE_MAP = {i: v for i, v in enumerate(ALPHABET)}
    PERMITTED_ALPHABET = PermittedAlphabet(ENCODE_MAP,
                                           DECODE_MAP)


class PrintableString(KnownMultiplierStringType):

    ALPHABET = bytearray(PRINTABLE_STRING.encode('ascii'))
    ENCODE_MAP = {v: v for v in ALPHABET}
    DECODE_MAP = {v: v for v in ALPHABET}
    PERMITTED_ALPHABET = PermittedAlphabet(ENCODE_MAP,
                                           DECODE_MAP)


class IA5String(KnownMultiplierStringType):

    ALPHABET = bytearray(IA5_STRING.encode('ascii'))
    ENCODE_DECODE_MAP = {v: v for v in ALPHABET}
    PERMITTED_ALPHABET = PermittedAlphabet(ENCODE_DECODE_MAP,
                                           ENCODE_DECODE_MAP)

@Nicceboy Nicceboy changed the title draft: types: Fixes #146 draft: types: Fixes #146, VisibleString character set Aug 27, 2023
@XAMPPRocky
Copy link
Collaborator

@Nicceboy Can we make an issue for fixing that? Just so I don't forget about it.

@Nicceboy
Copy link
Contributor Author

I can try to debug it again since I know a bit more PER now. If I can't figure it out, I will create issue.

@XAMPPRocky
Copy link
Collaborator

@Nicceboy Nice, thank you again for all your contributions to rasn! ❤️

@Nicceboy Nicceboy changed the title draft: types: Fixes #146, VisibleString character set types: Fixes #146, VisibleString character set Oct 11, 2023
@Nicceboy
Copy link
Contributor Author

@XAMPPRocky I think this can be reviewed now. The issue was on not checking whether alphabet list should be indexed or not. By default, both types and constraints were always indexed, but based on the standard, it depends on the highest value in that alphabet group. There are also tests for PrintableString.

@XAMPPRocky
Copy link
Collaborator

Thank you for your PR!

@XAMPPRocky XAMPPRocky merged commit 15f8774 into librasn:main Oct 11, 2023
65 checks passed
@Nicceboy Nicceboy deleted the fix-visiblestring branch October 20, 2023 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid allowed characters for VisibleString type
2 participants