Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification on constant char encodings #40

Closed
billsegall opened this issue Dec 1, 2016 · 20 comments
Closed

Clarification on constant char encodings #40

billsegall opened this issue Dec 1, 2016 · 20 comments

Comments

@billsegall
Copy link

billsegall commented Dec 1, 2016

This relates to version 1.0 - draft standard - June 16 2016 with the relevant section being 2.7.1.2 and 2.7.2.

My issue is that I'm unable to read a clear interpretation of how I should represent constant char encodings of length 1. e.g.:

<type name="CharConstant" primitiveType="char" presence="constant">P

It's clear that a char encoding without a presence="constant" is a single character and that if presence="constant" with a constant value of length > 1, then we are to use an array.

If the constant value is of length 1, then it would be possible to represent that as an array of length 1 or as a single char.

We might compare this to a string of length 1:

<type name="ascii1" primitiveType="char" semanticType="String" length="1" CharacterEncoding="ASCII"/>

In sbe-tool's Java generator, the character constant of length 1 is represented as a byte array of length 1, whereas the String of length 1 is represented as a single character. I'm happy with this interpretation but I'd like to see the spec be explicit.

It might be nice to be able to have a constant encoding of a character (rather than a String) but I can see you could accomplish that with an enum if it were not possible. Was this the intent?

@adkapur
Copy link

adkapur commented May 25, 2017

Is there any update on this?

Constant char are treated as String array in the real logic generated code. Below is generated code.

public byte crossType(final int index)
{
return CROSSTYPE_VALUE[index];
}

This should have been, no index as it is of length 1.

public byte crossType()
{
return 51;
}

<type description="Cross Type" primitiveType="char" name="CrossType" semanticType="char" length="1" presence="constant">3</type>

@donmendelson
Copy link
Member

The spec has this example of constant character array, so length is not limited to 1:

<type name="EurexMarketID" primitiveType="char" length="4" description="MIC code"
 presence="constant">XEUR</type>

Are you proposing that in the special case length is 1 that the storage type is char rather than char array?

@billsegall
Copy link
Author

billsegall commented May 26, 2017

In the special case of a string of length 1 the real-logic Java generator uses a char rather than a char array.

It might make sense for a string and char to be consistent but I think what's most important is that the spec tells us which is correct so there cannot be a difference between implementations causing incompatibility.

@mjpt777
Copy link

mjpt777 commented May 28, 2017

An option could be that for single char constants then accessors are generated for both the char type and for a string of length 1. For greater than 1 in length then just the string accessors are generated.

@adkapur
Copy link

adkapur commented May 31, 2017

Are you proposing that in the special case length is 1 that the storage type is char rather than char array?

Yes exactly

An option could be that for single char constants then accessors are generated for both the char type and for a string of length 1. For greater than 1 in length then just the string accessors are generated.

Yes correct amen to that

@donmendelson
Copy link
Member

The SBE spec describes an interoperable wire format, not how implementations access fields in memory. That will be necessarily different for different programming languages, and may also vary according to the judgement of implementers. We can make suggestions about accessors, but that would be non-normative.

Getting back to the original issue entered by @billsegall , I think that the needed clarification is that if length=1, then the wire format is for type character (FIX datatype char), described in the spec as "A single US-ASCII character", as opposed to a fixed-length character array, described as "Array of char of specified length, delimited by NUL character if a string is shorter than the length specified for a field". The difference between the two is slight but significant. The length of character is exactly 1 while the length of a fixed-length character array is no more than the length attribute.

@billsegall
Copy link
Author

That clarification would be great!

@mjpt777
Copy link

mjpt777 commented Jun 1, 2017

The challenge with single char vs array is what should be the naming convention for the property? That results with an overload that has to either be consistent for the single char or the array as a string. We choose an array even when length of one because that allows for the constant to later be converted to a encoded value or have a multi character value. The other way round does not work.

@donmendelson
Copy link
Member

I suggest that someone make a proposal for a clarification or convention as a pull request.

@adkapur
Copy link

adkapur commented Jun 1, 2017

We use constants to ensure that FIX semantics continue to be represented, for example a template representing execution report new from exchange to customer will have order status (tag 39) set to constant value of 0

The other use is validation i.e. if exchange accepts only limit orders from customers then order type (tag 40) will be set as constant to 2 in NewOrderSingle

@mjpt777
Copy link

mjpt777 commented Jun 1, 2017

@adkapur The examples you give are integer values. Why do you need the char type of this?

@mjpt777
Copy link

mjpt777 commented Jun 2, 2017

I do not believe the specification can be silent on this issue. If the XML schema will allow a string of more than one character to be a constant then it must be detailed. Arrays of other types are not supported because there is no defined separator character for constants. We discussed this years ago and it was agreed that strings are allowed in constants, and not just single characters.

@mjpt777
Copy link

mjpt777 commented Jun 2, 2017

I've implemented an approach for the Real Logic Java version that operates as follows:

  • If a provided constant value is single character and no length is provided then generate a property accessor which returns a byte, plus generate the methods for accessing a byte array but it is an array of 1.
  • If the provided constant is greater than a single character then generate a property accessor that returns a String, plus generate the methods for accessing a byte array that is the length of the constant value if no length is provided.
  • If a length is provided that is larger than the constant value then a byte array is created for the length specified with the unused elements containing the ASCII byte value for 0, i.e. \0.
  • If a length is provided that is smaller than the constant then an error is raised.

I would recommend a length value is provided to future proof for if the constant is turned into required or optional field and reduce ambiguity.

@adkapur
Copy link

adkapur commented Jun 6, 2017

The examples you give are integer values. Why do you need the char type of this?

Yes correct these are integer but were just meant to illustrate the point. We also have market protect orders designated as OrderType=K and things like restatement messages with ExecType=D and so on and so forth

@adkapur
Copy link

adkapur commented Jun 6, 2017

I've implemented an approach for the Real Logic Java version that operates as follows:
•If a provided constant value is single character and no length is provided then generate a property accessor which returns a byte, plus generate the methods for accessing a byte array but it is an array of 1.
•If the provided constant is greater than a single character then generate a property accessor that returns a String, plus generate the methods for accessing a byte array that is the length of the constant value if no length is provided.
•If a length is provided that is larger than the constant value then a byte array is created for the length specified with the unused elements containing the ASCII byte value for 0, i.e. \0.
•If a length is provided that is smaller than the constant then an error is raised.

This is awesome thanks, for the third condition when length is set to larger than constant value then is a string returned in addition to a byte array with the ASCII byte value for 0, i.e. \0 for the unused portion?

I would recommend a length value is provided to future proof for if the constant is turned into required or optional field and reduce ambiguity

Just wanted to clarify if having a constant field be optional is an oxymoron or not, is this example correct or should constant fields always have presence=constant? Is it fair to surmise that if the constant field is not present then it has the default null value (0) and if it is present then it always has the designated constant value (A)?

<type description="Char with a null value" sinceVersion="1" primitiveType="char" name="charNULL" semanticType="char" presence="optional" nullValue="0">A</type>

adkapur added a commit to adkapur/fix-simple-binary-encoding that referenced this issue Jun 6, 2017
Added clarification for issue FIXTradingCommunity#40 with regards to constant char encodings
adkapur added a commit that referenced this issue Jun 6, 2017
Updated section for constant character encoding to clarify issue #40
@adkapur adkapur mentioned this issue Jun 6, 2017
@adkapur
Copy link

adkapur commented Jun 6, 2017

I suggest that someone make a proposal for a clarification or convention as a pull request.

https://github.com/FIXTradingCommunity/fix-simple-binary-encoding/pull/52

@mjpt777
Copy link

mjpt777 commented Jun 6, 2017

It does not make sense to have something optional and constant. presence must be one of required, optional, or constant according to the spec.

@aSapozhnikov2
Copy link

just tried this with the latest snapshot and I don't think this is working with constant presence, length of 1 and char primitive type.

I have a type defined with the following attributes:
(presence="constant" length="1" primitiveType="char" semanticType="char">C)

However this creates the java code in the accessor:
"return (byte)C;"

because there's no single quotes around the C, this causes compilation failures , removing the length="1" part does fix it to be "return (byte)57;"

@mjpt777
Copy link

mjpt777 commented Aug 29, 2017

@aSapozhnikov2 Can you raise an issue on the Real Logic repo and provide a test case?

@adkapur adkapur mentioned this issue Sep 5, 2017
@donmendelson
Copy link
Member

This issue has been mooted by the acceptance of issue #31. The presence attribute and possible constant value are now only on the field element, not on the type element. Therefore, length will always be specified by the type and cannot be inferred from the length of constant value. As I said before, SBE specifies a wire format, not how implementations access data in memory. However, there is now a clear distinction between a char type with length=1, which may be implemented as access to a single character, versus a type with a greater length, which may be implemented as access to a character array or string class in the programming language of choice.

donmendelson added a commit to donmendelson/fix-simple-binary-encoding that referenced this issue Feb 16, 2018
donmendelson added a commit that referenced this issue May 9, 2018
Clarification on constant char encodings #40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants