1750 Overhaul of EXPath binary spec #1753

Merged
ndw merged 5 commits into qt4cg:master from michaelhkay:1750-expath-binary-copy-edits
Feb 4, 2025

@michaelhkay
Contributor

@michaelhkay michaelhkay commented Feb 2, 2025

Apart from general copy-editing, the main changes are:

  • A lot more examples, presented in executable markup format (though they are not yet tested)
  • Many functions now have formal equivalents (again, currently untested)
  • Allow underscores and spaces in input to bin:hex, bin:octal, and bin:bin
  • Use type xs:unsignedByte for octet arguments
  • Use an enum() type for the octet-order argument
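
As a rough illustration of the separator change for bin:hex, here is a minimal Python sketch. The helper name `hex_to_octets` is hypothetical, and two behaviours are assumptions rather than statements of the spec: that underscores and spaces are simply ignored wherever they occur, and that an odd number of digits is zero-padded on the left.

```python
def hex_to_octets(text: str) -> bytes:
    """Hypothetical helper: decode a hex string that may contain '_' or ' '."""
    # Assumption: separators are dropped wherever they appear.
    digits = text.replace("_", "").replace(" ", "")
    if any(c not in "0123456789abcdefABCDEF" for c in digits):
        raise ValueError(f"not a hexadecimal string: {text!r}")
    # Assumption: an odd digit count is zero-padded on the left
    # so that the result is a whole number of octets.
    if len(digits) % 2:
        digits = "0" + digits
    return bytes.fromhex(digits)


print(hex_to_octets("FF_FF"))   # two octets, both 0xFF
print(hex_to_octets("1 2 3"))   # treated as "0123"
```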

Fix #1750

@michaelhkay michaelhkay added Editorial Minor typos, wording clarifications, example fixes, etc. Enhancement A change or improvement to an existing feature Tests Needed Tests need to be written or merged labels Feb 2, 2025
@johnlumley
Contributor

I very much like the addition of more precise definitions in the early sections, and the expansion of some of the function rules and notes and additional use of other new 4.0 features such as enum.

Other remarks/issues:

  • The last two ASN encode-ASN-integer() examples are incorrectly shown as decode-ASN-integer() - I did a copy and paste and missed changing the last two.

  • I'm not sure I agree with the remarks on bin:octal('177'), as (recalling my PDP days), 177 octal only has seven, not nine, significant bits, and thus should be representable within a single octet. Perhaps this is a consequence of the rules given which conservatively expand to multiples of 3 bits. I think these rules should be revised (or reverted to something like the original) to permit any valid octal number to be generated.

  • bin:pad-left() needs a change note about $octet as xs:unsignedByte?

  • bin:pad-right() needs $octet as xs:unsignedByte? and a change note

  • The bin:pack-*() functions still have a bin:unknown-significance-order error described

  • G2 - Compatibility change to xs:unsignedByte has the wrong type error code which should be err:XPTY0004

@michaelhkay
Contributor Author

michaelhkay commented Feb 3, 2025

Re bin:octal - the revised spec reflects the current Saxon implementation, I believe. The 1.0 spec says

$in will be effectively zero-padded from the left to generate an integral number of octets.

A strict reading of that says that '177' should be expanded to '00000177' which gives you 24 bits or 3 octets, but that's clearly not what it means to say.

If we strip leading zero bits before padding to an integral number of octets, that would make bin:octal('0') give a zero-length binary value, which doesn't feel right either. And what should '077' give you?

Perhaps the rule we're looking for is: first create a sequence of n*3 bits where n is the string length. Remove a maximum of two leading zero bits. Then add as many leading zero bits as are needed to make the bit length a multiple of 8.

That's mighty complicated, but perhaps the result is more intuitive.
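
The rule proposed above (expand each octal digit to three bits, drop at most two leading zero bits, then left-pad with zero bits to a multiple of 8) can be sketched in Python as follows. The function name `octal_to_octets` is hypothetical; this is an illustration of the proposal as worded in this comment, not the normative spec text or the Saxon implementation.

```python
def octal_to_octets(digits: str) -> bytes:
    """Hypothetical helper sketching the proposed bin:octal rule."""
    if any(c not in "01234567" for c in digits):
        raise ValueError(f"not an octal string: {digits!r}")
    # Step 1: n octal digits become a sequence of n*3 bits.
    bits = "".join(format(int(c, 8), "03b") for c in digits)
    # Step 2: remove a maximum of two leading zero bits.
    strip = 0
    while strip < 2 and strip < len(bits) and bits[strip] == "0":
        strip += 1
    bits = bits[strip:]
    # Step 3: add leading zero bits until the length is a multiple of 8.
    bits = "0" * (-len(bits) % 8) + bits
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


for sample in ["177", "077", "0", "377", "777"]:
    print(sample, octal_to_octets(sample).hex())
```

Under this sketch '177' yields the single octet 0x7F, '077' yields 0x3F, '0' yields one zero octet rather than an empty value, and '777' needs two octets (0x01FF).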

Does anyone actually use octal?

@johnlumley
Contributor

Does anyone actually use octal?

I do, but it's only in a PDP-11 emulator!

@kosek

kosek commented Feb 3, 2025

Thanks, I've read the document and found the following issues:

  • Mike, you should add yourself as an editor
  • in abstract "generate date" -> "generate data"
  • in introduction there is an extraneous ">" at the end of the last list item "Functions to decode or encode strings from within or into binary data>."
  • section 3, the first examples contain strange function prefixes like fn:fn and fn:file. I think just file: should be used there
  • bin:insert-before -- type of $extra parameter should be union with xs:hexBinary not just xs:base64Binary
  • in references link to File Module is broken

@benibela

benibela commented Feb 4, 2025

$in will be effectively zero-padded from the left to generate an integral number of octets.

A strict reading of that says that '177' should be expanded to '00000177' which gives you 24 bits or 3 octets, but that's clearly not what it means to say.

integral number of octets clearly means bytes

'177' becomes 0x7F. '77' becomes 0x3f, and '7' becomes 0x07.

If the length of $in is not divisible by three, it prepends either '0' or '00' to make it divisible by three

@michaelhkay
Contributor Author

If the length of $in is not divisible by three

$in is a character string in which each character represents three bits. I'm sure you don't mean the length of the character string. I guess you mean the length of the bit-string obtained by converting to binary and removing leading zeros? But then you have to decide what to do when $in is the string '0', or '00', etc. Please review my proposal and see if it makes sense to you.

@ndw
Contributor

ndw commented Feb 4, 2025

The CG agreed to merge this PR at meeting 108.

@ndw ndw merged commit 83cb7ab into qt4cg:master Feb 4, 2025
3 checks passed
@benibela

benibela commented Feb 5, 2025

If the length of $in is not divisible by three

$in is a character string in which each character represents three bits. I'm sure you don't mean the length of the character string. I guess you mean the length of the bit-string obtained by converting to binary and removing leading zeros?

I meant the input string. I was sure the function would only accept complete octets like bin:from-octets, 3 characters for 8 bits joined together. Then 177 is 127, 377 is 255, and 777 raises an error for being too large. 1377 (i.e. 001_377) would have been 511
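
For comparison, the group-of-three reading described in this comment can be sketched in Python. The function name `octal_groups_to_octets` is hypothetical, and this reading is not how bin:octal is specified (the reply that follows says so explicitly); the sketch only makes the arithmetic in the comment concrete.

```python
def octal_groups_to_octets(digits: str) -> bytes:
    """Hypothetical helper: each group of three octal digits is one octet."""
    if any(c not in "01234567" for c in digits):
        raise ValueError(f"not an octal string: {digits!r}")
    # Prepend '0' or '00' so that the string length is divisible by three.
    padded = "0" * (-len(digits) % 3) + digits
    out = bytearray()
    for i in range(0, len(padded), 3):
        value = int(padded[i:i + 3], 8)
        if value > 255:
            # A group such as '777' (511) does not fit in a single octet.
            raise ValueError(f"group {padded[i:i + 3]!r} exceeds one octet")
        out.append(value)
    return bytes(out)


print(int.from_bytes(octal_groups_to_octets("177"), "big"))   # 127
print(int.from_bytes(octal_groups_to_octets("377"), "big"))   # 255
print(int.from_bytes(octal_groups_to_octets("1377"), "big"))  # 511
```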

@michaelhkay
Copy link
Contributor Author

The 1.0 specification of bin:octal is pretty vague, but interpreting it as groups of 3 octal digits each mapping to one octet in the result wouldn't give the right answer for the only example in the 1.0 spec. I've added a note in PR1765 that that is NOT the way it works.

ChristianGruen added a commit to qt4cg/qt4tests that referenced this pull request Oct 28, 2025
ChristianGruen added a commit to qt4cg/qt4tests that referenced this pull request Nov 6, 2025
