Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate field names #12

Merged
merged 4 commits into from Dec 1, 2015

Conversation

mdavidsaver
Copy link
Member

Check for invalid characters in field names (eg '.') as they are created. Restrict character set to ascii alpha numeric and '_' (can extend later).

check for invalid characters in field names (eg '.').

Restrict character set to ascii alpha numeric and '_'.
@dhickin
Copy link
Contributor

dhickin commented Sep 23, 2015

Checking for "."s in field names is reasonable as we've definitely disallowed this to allow getting subfields of structure subfields (timeStamp.nanoseconds, for example).

However, other than that I don't believe we've ever made any specification on the set of allowed characters. I think this is definitely a case where we need to leave time for discussion.

My immediate reaction is that the set of characters is too restrictive. Should "-" be included, for example?

Anyway it's not something that should go into the 4.5 release - definitely, as you say, an enhancement, not a bug fix. I suggest we hold off a merge into master until after 4.5. It will give time to discuss a specification and will make the release simpler.

@anjohnson
Copy link
Member

I don't disagree with the suggestion that we hold off to allow time for discussion, but there is no technical reason to avoid merging this into master as soon as the discussion has concluded. We use release branches precisely so we can continue doing development on master at the same time as a release is being prepared.

On the subject of what characters to allow in field names, I agree with Michael's idea of starting with the basic set of alpha-numerics plus underscore, we can expand later if necessary.

On the implementation though wouldn't isascii(c) && isalnum(c) || c == '_' be simpler? In that case c should be an int though.

@gregoryraymondwhite
Copy link

Parenthesis? Hash? “@“? Any reason why not?

On Sep 23, 2015, at 11:09 PM, Andrew Johnson notifications@github.com wrote:

I don't disagree with the suggestion that we hold off to allow time for discussion, but there is no technical reason to avoid merging this into master as soon as the discussion has concluded. We use release branches precisely so we can continue doing development on master at the same time as a release is being prepared.

On the subject of what characters to allow in field names, I agree with Michael's idea of starting with the basic set of alpha-numerics plus underscore, we can expand later if necessary.

On the implementation though wouldn't isascii(c) && isalnum(c) || c == '_' be simpler? In that case c should be an int though.


Reply to this email directly or view it on GitHub.

@ralphlange
Copy link
Contributor

Sounds like a case for a solid religious discussion at the November meeting.
I will be packing enough German umlauts.

@anjohnson
Copy link
Member

In V3, record field names become C identifiers, thus they must start with a letter or underscore and may be followed by any number of underscores or ASCII alphanumeric characters. The 3.15 server-side filtering code added some specific use of other characters in channel names, which was only possible because those characters could never have been used in field names. If we allow other characters in V4 field names we might reduce the possibility of adding extra functionality at a later date.

@mdavidsaver
Copy link
Member Author

@anjohnson I've factored out the range tests as xisalnum() for the present. I generally avoid is*() from ctypes.h for network messages to avoid potential problems with system locale, although in this case that may not be relevant. I do see a note in the ctypes.h manpage that isascii() is spec'd by posix, not C89.

I must also admit to having avoided learning about C locale handling, so I'll defer to your judgment on safety and portability.

@mdavidsaver
Copy link
Member Author

I agree that this should not be in a 4.5 release, however I'd like to merge this now to avoid further inadvertent errors. @arkilic had mongodb names with '.' sneaking in.

@mdavidsaver
Copy link
Member Author

... Any reason why not?

None what so ever.

I would strongly encourage the adoption of an existing spec (C identifier) or convention (epics record name).

@anjohnson
Copy link
Member

@mdavidsaver You're right, isascii() is not available on VxWorks, and without that qualifier the locale would probably adversely affect isalnum().

@anjohnson
Copy link
Member

F2F Meeting agrees that for now field names should be restricted to follow the C89 identifier rules.
This github issue does not propose or discuss limitations on PV names.

Enforce C identifer syntax

[A-Za-z_][A-Za-z0-9_]*
field names may not begin with a digit
@mdavidsaver
Copy link
Member Author

Updated to implement C89 identifier syntax (regex "[A-Za-z_][A-Za-z0-9_]*").

Strictly speaking this test should also check and fail for C keywords (eg. "goto" or "double") as keywords may not be used as identifiers. I'm inclined not to enforce this technically (just discourage it) as doing so would not be useful unless java and c++ keywords were also blacklisted. The union of these three sets has 104 entries.

FYI this isn't entirely academic, testFieldBuilder uses "double" as a field name.

@mdavidsaver
Copy link
Member Author

Since there was agreement last week, unless someone wants to veto I'll merge this on Wednesday.

@mdavidsaver mdavidsaver self-assigned this Nov 23, 2015
@anjohnson
Copy link
Member

The Perl DBD-file parser in 3.15 has a list of C++ reserved keywords (which might be out of date by now) and prevents their lower-case versions from being used as record field names. This was found to be necessary with the aSub record type's field named NOT which lower-cases to a C++ keyword, so it was impossible to write an aSub record subroutine in C++. The record structure in the generated header file now uses the original (upper-case) name if the lower-case version is reserved, while the parser aborts if the original name is reserved.

I don't see a particular need to do that same check in pvData though, problems would only arise if the name were to be used as a member name in a structure that gets output to a header file.

@dhickin
Copy link
Contributor

dhickin commented Nov 25, 2015

The request itself looks good.

However we definitely shouldn't go any further in not allowing reserved C keywords. (I hadn't realised this was intended - I assumed the references to C89 identifiers meant using this for the allowed character set).

Specifying the allowed character set is valid and useful, particular to avoid issues like we hit with "."s. There is no good reason to disallow keywords from any language and in any case pvData is independent of any language binding. At what point to we stop (do we ban Python keywords?).

The only reason I could possibly see is to avoid the situation where a programmer tries to name a variable to match a field name, but this should be handled by the programmer by choosing a suitable variable name.

There might be an argument for avoiding pvData meta language keywords. However I don't think this causes an issue and we have seen already people naturally use double as a field name.

I suggest we just say that (unless we extend at a later date) the allowed field names are "[A-Za-z_][A-Za-z0-9_]*" which is completely clear and avoid specifying in terms of C89.

@mdavidsaver
Copy link
Member Author

With a vote of 3-0 I consider that the keywords are out leaving only the simple lexical definition ([A-Za-z_][A-Za-z0-9_]*).

@mdavidsaver mdavidsaver merged commit 6641e7f into epics-base:master Dec 1, 2015
@mdavidsaver mdavidsaver deleted the validatefieldnames branch February 26, 2016 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants