validate field names #12

mdavidsaver · 2015-09-23T19:38:21Z

Check for invalid characters in field names (e.g. '.') as they are created. Restrict the character set to ASCII alphanumerics and '_' (can be extended later).

dhickin · 2015-09-23T20:39:23Z

Checking for "."s in field names is reasonable as we've definitely disallowed this to allow getting subfields of structure subfields (timeStamp.nanoseconds, for example).

However, other than that I don't believe we've ever made any specification on the set of allowed characters. I think this is definitely a case where we need to leave time for discussion.

My immediate reaction is that the set of characters is too restrictive. Should "-" be included, for example?

Anyway it's not something that should go into the 4.5 release - definitely, as you say, an enhancement, not a bug fix. I suggest we hold off a merge into master until after 4.5. It will give time to discuss a specification and will make the release simpler.

anjohnson · 2015-09-23T21:09:41Z

I don't disagree with the suggestion that we hold off to allow time for discussion, but there is no technical reason to avoid merging this into master as soon as the discussion has concluded. We use release branches precisely so we can continue doing development on master at the same time as a release is being prepared.

On the subject of what characters to allow in field names, I agree with Michael's idea of starting with the basic set of alphanumerics plus underscore; we can expand later if necessary.

On the implementation, though, wouldn't isascii(c) && isalnum(c) || c == '_' be simpler? In that case c should be an int.
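
For illustration, a minimal sketch of that predicate. The helper name isFieldNameChar is hypothetical, and the ASCII test is written as an explicit range check rather than a call to isascii(), on the assumption that the code must build where only ISO C is available. Taking an int follows the remark above; callers would convert a char through unsigned char first.

```cpp
#include <cctype>

// Hypothetical helper, not taken from the pull request.
// The range test 0..127 stands in for isascii(), which is POSIX rather than ISO C.
inline bool isFieldNameChar(int c)
{
    // && binds tighter than ||; the parentheses only make the grouping explicit.
    // The range test short-circuits, so isalnum() never sees a value outside 0..127.
    return (c >= 0 && c <= 127 && std::isalnum(c)) || c == '_';
}
```

With this sketch, 'a', '7' and '_' are accepted, while '.' and '-' are rejected.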

gregoryraymondwhite · 2015-09-24T11:50:19Z

Parentheses? Hash? "@"? Any reason why not?

ralphlange · 2015-09-24T11:57:11Z

Sounds like a case for a solid religious discussion at the November meeting.
I will be packing enough German umlauts.

anjohnson · 2015-09-24T15:40:10Z

In V3, record field names become C identifiers, thus they must start with a letter or underscore and may be followed by any number of underscores or ASCII alphanumeric characters. The 3.15 server-side filtering code added some specific use of other characters in channel names, which was only possible because those characters could never have been used in field names. If we allow other characters in V4 field names we might reduce the possibility of adding extra functionality at a later date.

mdavidsaver · 2015-09-24T17:13:38Z

@anjohnson I've factored out the range tests as xisalnum() for the present. I generally avoid the is*() functions from ctype.h for network messages to avoid potential problems with the system locale, although in this case that may not be relevant. I do see a note in the ctype.h manpage that isascii() is specified by POSIX, not C89.

I must also admit to having avoided learning about C locale handling, so I'll defer to your judgment on safety and portability.
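
A sketch of what such a locale-independent range test could look like. Only the name xisalnum() appears in this thread; the signature and body below are assumptions for illustration, and they rely on an ASCII-compatible execution character set where the letter ranges are contiguous.

```cpp
// Assumed shape of the helper; only the name xisalnum() comes from the thread.
// Plain range comparisons do not consult the C locale, unlike isalnum().
inline bool xisalnum(char c)
{
    return (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9');
}
```

Because no locale is consulted, a byte such as '\xE4' (a-umlaut in Latin-1) is rejected regardless of the process locale.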

mdavidsaver · 2015-09-24T17:29:50Z

I agree that this should not be in a 4.5 release, however I'd like to merge this now to avoid further inadvertent errors. @arkilic had mongodb names with '.' sneaking in.

mdavidsaver · 2015-09-24T17:35:07Z

... Any reason why not?

None whatsoever.

I would strongly encourage the adoption of an existing spec (C identifier) or convention (EPICS record name).

anjohnson · 2015-09-24T17:38:03Z

@mdavidsaver You're right, isascii() is not available on VxWorks, and without that qualifier the locale would probably adversely affect isalnum().

anjohnson · 2015-11-18T18:11:15Z

The F2F meeting agreed that, for now, field names should be restricted to follow the C89 identifier rules.
This GitHub issue does not propose or discuss limitations on PV names.

Enforce C identifer syntax [A-Za-z_][A-Za-z0-9_]*

field names may not begin with a digit

mdavidsaver · 2015-11-23T20:12:15Z

Updated to implement C89 identifier syntax (regex "[A-Za-z_][A-Za-z0-9_]*").

Strictly speaking, this test should also check and fail for C keywords (e.g. "goto" or "double"), as keywords may not be used as identifiers. I'm inclined not to enforce this technically (just discourage it), as doing so would not be useful unless Java and C++ keywords were also blacklisted. The union of these three sets has 104 entries.

FYI this isn't entirely academic, testFieldBuilder uses "double" as a field name.
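
For reference, a minimal sketch of a check equivalent to the regex "[A-Za-z_][A-Za-z0-9_]*". The function name isValidFieldName is hypothetical and this is not the merged code; it deliberately performs no keyword lookup, so "double" passes.

```cpp
#include <string>

// Hypothetical function; a sketch of the rule [A-Za-z_][A-Za-z0-9_]*.
inline bool isValidFieldName(const std::string& name)
{
    if (name.empty())
        return false; // the pattern requires at least one character
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        const char c = name[i];
        const bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        const bool digit = (c >= '0' && c <= '9');
        // Digits are allowed anywhere except the first position.
        if (!(letter || (digit && i > 0)))
            return false;
    }
    return true;
}
```

Under this sketch "timeStamp" and "_x1" are accepted, while "", "1abc" and "timeStamp.nanoseconds" are rejected.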

mdavidsaver · 2015-11-23T20:15:20Z

Since there was agreement last week, unless someone wants to veto I'll merge this on Wednesday.

anjohnson · 2015-11-23T20:49:06Z

The Perl DBD-file parser in 3.15 has a list of C++ reserved keywords (which might be out of date by now) and prevents their lower-case versions from being used as record field names. This was found to be necessary with the aSub record type's field named NOT which lower-cases to a C++ keyword, so it was impossible to write an aSub record subroutine in C++. The record structure in the generated header file now uses the original (upper-case) name if the lower-case version is reserved, while the parser aborts if the original name is reserved.

I don't see a particular need to do that same check in pvData though; problems would only arise if the name were to be used as a member name in a structure that gets output to a header file.

dhickin · 2015-11-25T00:12:12Z

The request itself looks good.

However, we definitely shouldn't go any further and disallow reserved C keywords. (I hadn't realised this was intended; I assumed the references to C89 identifiers meant using them for the allowed character set.)

Specifying the allowed character set is valid and useful, particularly to avoid issues like the one we hit with "."s. There is no good reason to disallow keywords from any language, and in any case pvData is independent of any language binding. At what point do we stop? Do we ban Python keywords?

The only reason I could possibly see is to avoid the situation where a programmer tries to name a variable to match a field name, but this should be handled by the programmer by choosing a suitable variable name.

There might be an argument for avoiding pvData meta language keywords. However, I don't think this causes an issue, and we have already seen people naturally use double as a field name.

I suggest we just say that (unless we extend at a later date) the allowed field names are "[A-Za-z_][A-Za-z0-9_]*" which is completely clear and avoid specifying in terms of C89.

mdavidsaver · 2015-12-01T01:53:16Z

With a vote of 3-0 I consider that the keywords are out leaving only the simple lexical definition ([A-Za-z_][A-Za-z0-9_]*).

validate field names

f24f565

check for invalid characters in field names (eg '.'). Restrict character set to ascii alpha numeric and '_'.

mdavidsaver added the enhancement label Sep 23, 2015

fail zero length field names

3714be4

mdavidsaver added 2 commits November 23, 2015 14:31

field names may not begin with a digit

f4a00f2

Enforce C identifer syntax [A-Za-z_][A-Za-z0-9_]*

fix testCreateRequest

6641e7f

field names may not begin with a digit

mdavidsaver self-assigned this Nov 23, 2015

mdavidsaver merged commit 6641e7f into epics-base:master Dec 1, 2015

mdavidsaver mentioned this pull request Dec 1, 2015

field name validation epics-rip/pvDataJava#4

Merged

mdavidsaver deleted the validatefieldnames branch February 26, 2016 17:10
