Skip to content

Conversation

@alejandro-perez
Copy link
Contributor

The old escaping code only escapes '"' characters, but RFC4627 also mentions other characters that must be escaped, such as '\n', '/', '\b', etc.

The new code adds support for escaping these characters, and provides a simplified implementation. In particular, instead of iterating the original string twice (one for counting characters, and the other to replace them), only a single iteration is performed always. Memory is allocated as the worst case scenario (i.e. all the characters are to be escaped).

I have used:

case '"':  memcpy(p, "\\\"", 2); p += 2; break;

because I thougth it was clearer than:

case '"': *(p++) = '\\'; *(p++) = '"'; break;

But we can change that if you prefer.

src/environ.c Outdated
case '\r': memcpy(p, "\\r", 2); p += 2; break;
case '\f': memcpy(p, "\\f", 2); p += 2; break;
case '\n': memcpy(p, "\\n", 2); p += 2; break;
default: *p = value[i]; p += 1; break;
Copy link
Contributor

@simo5 simo5 Feb 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change to:

switch (value[i]) {
case '"':
case '/':
[...]
case '\n':
    *p = '\\';
    p++;
    break;
default:
    break;
}
*p = value[i];
p++;

(also keep case aligned to switch, not indented, only the actions are indented).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah also, I am reading RFC 4627 2.5. Strings
As far as I can tell it says to escape all control characters, so perhaps we should just:

if ((value[i] < 0x1F) || (value[i] == '"') || (value[i] == '\\') {
    *p = '\\';
    p++;
}
*p = value[i];
p++;

Copy link
Contributor

@simo5 simo5 Feb 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh we can't just use '\' for all chars, silly me ... then we have a few options, one is to allocate 6x and escape all unrepresentable chars with \uXXXX notation, or just allocate double as you do now, but check bounds, and if we get to fill the string reallocate doubling the buffer.
If we want to keep using the \X notation for common chars then we'd do something like:

switch (value[i]) {
case '"':
case '/':
[...]
case '\n':
    *p = '\\';
    p++;
    break;
default:
    if (value[i] < 0x1F) {
        snprintf(p, "\u%04d", (int)value[i]);
        p += 6;
        continue;
    }
    break;
}
*p = value[i];
p++;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The escaping in my example is always wrong for \n \b etc, get that right :)
The main point here is to escape all control chars < 1F and style.
using memcpy is fine, but do it on a new line and break on the next line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many comments :). First, I wouldn't expect having unrepresentable chars as part of the output of the display_value, since they are not displayable. But, to be safe, we better generate valid JSON regardless the input. I don't really like having to reallocate the buffer taking care of the length. I'd rather allocate the worst case (size*6+1). The memory is associated from the request pool, and hence is to be freed soon.

Let me push another commit and let me know what you think.

Copy link
Contributor

@simo5 simo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please escape all needed chars, see style hints too :)

@alejandro-perez alejandro-perez force-pushed the master branch 2 times, most recently from c0f1767 to 2ef85d8 Compare February 28, 2017 17:29
Copy link
Member

@frozencemetery frozencemetery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits (first one I don't really care about and wouldn't bring up without the second; second one is important for reducing code churn for backport purposes). Otherwise, I'm happy with this.

src/environ.c Outdated
* escaped length will be original length * 6 + 1 in the worst case */
p = escaped_value = apr_palloc(req->pool, disp_value.length * 6 + 1);
for (i = 0, j = 0; i < disp_value.length; i++, j++) {
if ((value[i] < 0x1F) || (value[i] == '"') || (value[i] == '\\')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parens are not needed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and the second one was addressed while I was reviewing, awesome. Ignore this if we have no other comments :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noticed about the change in '\0' by 0 and fixed with force. Shouldn't do that, I know... :)

src/environ.c Outdated
escaped_value[j] = '\0';

/* Make the string NULL terminated */
escaped_value[j] = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three lines shouldn't change from the original.

src/environ.c Outdated
* escaped length will be original length * 6 + 1 in the worst case */
p = escaped_value = apr_palloc(req->pool, disp_value.length * 6 + 1);
for (i = 0, j = 0; i < disp_value.length; i++, j++) {
if ((value[i] < 0x1F) || (value[i] == '"') || (value[i] == '\\')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the if/else? you are alredy doing a switch/case here ... it can be handled in there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. You need if you want to do the

           escaped_value[j] = '\\';
            j++;

only once. Otherwise the default case would comprehend the non-escape case and you cannot do it for the common case (well, I didn't find a way to do it). So I found out that doing first the if to differenciate escape/non-escape and the using the switch/case for selecting the replacement characters was better suited.

Copy link
Contributor

@simo5 simo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use just the switch case not if/else around it, it's redundant

src/environ.c Outdated
break;
default:
apr_snprintf(&escaped_value[j], 6, "u%04d", (int) value[i]);
j += 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right, you should add 5 to j, seem like this has not been tested ...
I think I preferred the previous version where you would increment a pointer and write to it, reades better than setting stuff in escaped_value[j] for some reason ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is exactly the opposite... I've tested it in an independent C program and used valgrind and "jq" utility to check memory and JSON formatting. At first I thougth it was as you propose but
note that the "for" loop will increment j for you.

Nonetheless, let me convert this into the pointer-based version (without rebasing) and see if that looks nicer.

@alejandro-perez
Copy link
Contributor Author

Now I have a comment. I have a lot of

*p = something;
p++;

which can be written as:

*p++ = something;

Which one do you prefer?

@alejandro-perez
Copy link
Contributor Author

I do prefer this last version. I think autoincrents in this case read fairly well.

@simo5
Copy link
Contributor

simo5 commented Feb 28, 2017

In all honesty I think I prefer the initial version based on memcpy(), with the addition of the \u00XX handling.
In this version I really do not like the if/else wrapping the switch/case, something like:

switch (value[i) {
case ..
case...
    memcpy(p, "\\n", 2);
    p += 2;
    break;
default:
    if (value[i] < 0x1F) {
        apr_psnprintf(...);
        p += 6;
    } else {
        *p = value[i];
       p += 1;
    }
    break;
}

@alejandro-perez
Copy link
Contributor Author

Sure, let me do it :).

@alejandro-perez
Copy link
Contributor Author

BTW I used <= 0x1F since it seems value 0x1F should be escaped as well.

Copy link
Contributor

@simo5 simo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,
please squash all in one commit and I'll push

@alejandro-perez
Copy link
Contributor Author

Done

Cover all the cases mentioned in RFC4627.

Reviewed-by: Simo Sorce <simo@redhat.com>
Closes gssapi#132
@simo5
Copy link
Contributor

simo5 commented Feb 28, 2017

Ok, so I made some minor changes and pushed them to your tree.
Changes:

  • Minor style changes
  • Comment was outdated, changed
  • Commit message fixed (subject was too long)

Here is the interdiff:

diff -u b/src/environ.c b/src/environ.c
--- b/src/environ.c
+++ b/src/environ.c
@@ -101,16 +101,16 @@
                                       gss_buffer_desc disp_value)
 {
     /* This function returns a copy (in the pool) of the given gss_buffer_t
-     * where every occurrence of " has been replaced by \". This string is
+     * where some characters are escaped as required by RFC4627. The string is
      * NULL terminated */
-    int i = 0;
-    char *escaped_value = NULL, *p = NULL;
-    char *value = (char*) disp_value.value;
+    char *value = disp_value.value;
+    char *escaped_value = NULL;
+    char *p = NULL;
 
     /* gss_buffer_t are not \0 terminated, but our result will be. Hence,
      * escaped length will be original length * 6 + 1 in the worst case */
     p = escaped_value = apr_palloc(req->pool, disp_value.length * 6 + 1);
-    for (i = 0; i < disp_value.length; i++) {
+    for (size_t i = 0; i < disp_value.length; i++) {
         switch (value[i]) {
         case '"':
             memcpy(p, "\\\"", 2);
@@ -142,7 +142,7 @@
             break;
         default:
             if (value[i] <= 0x1F) {
-                apr_snprintf(p, 7, "\\u%04d", (int) value[i]);
+                apr_snprintf(p, 7, "\\u%04d", (int)value[i]);
                 p += 6;
             } else {
                 *p = value[i];

If you are ok with these changes I'll push.

@alejandro-perez
Copy link
Contributor Author

alejandro-perez commented Feb 28, 2017 via email

@simo5 simo5 merged commit a9bbb84 into gssapi:master Mar 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants