Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logmsg: sanitize sdata keys #1650

Merged
merged 1 commit into from
Aug 25, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
24 changes: 23 additions & 1 deletion lib/logmsg/logmsg.c
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

/*
* Reference/ACK counting for LogMessage structures
Expand Down Expand Up @@ -861,6 +862,27 @@ log_msg_is_tag_by_name(LogMessage *self, const gchar *name)

/* structured data elements */

static void
log_msg_sdata_append_key_escaped(GString *result, const gchar *sstr, gssize len)
{
/* The specification does not have any way to escape keys.
* The goal is to create syntactically valid structured data fields. */
const guchar *ustr = (const guchar *) sstr;

for (gssize i = 0; i < len; i++)
{
if (!isascii(ustr[i]) || ustr[i] == '=' || ustr[i] == ' '
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion:
Not sure how much performance penalty is included due to this, but maybe it would worth build a global constant table: ascii->bool which can tell in one step if escape is needed, instead of the 4 comparison above.

|| ustr[i] == '[' || ustr[i] == ']' || ustr[i] == '"')
{
gchar hex_code[4];
g_sprintf(hex_code, "%%%02X", ustr[i]);
g_string_append(result, hex_code);
}
else
g_string_append_c(result, ustr[i]);
}
}

static void
log_msg_sdata_append_escaped(GString *result, const gchar *sstr, gssize len)
{
Expand Down Expand Up @@ -993,7 +1015,7 @@ log_msg_append_format_sdata(const LogMessage *self, GString *result, guint32 se
if (value)
{
g_string_append_c(result, ' ');
g_string_append_len(result, sdata_param, sdata_param_len);
log_msg_sdata_append_key_escaped(result, sdata_param, sdata_param_len);
g_string_append(result, "=\"");
log_msg_sdata_append_escaped(result, value, len);
g_string_append_c(result, '"');
Expand Down
36 changes: 35 additions & 1 deletion lib/logmsg/tests/test_log_message.c
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ assert_sdata_value_with_seqnum_equals(LogMessage *msg, guint32 seq_num, const gc
GString *result = g_string_sized_new(0);

log_msg_append_format_sdata(msg, result, seq_num);
cr_assert_str_eq(result->str, expected, "SDATA value does not match");
cr_assert_str_eq(result->str, expected, "SDATA value does not match, '%s' vs '%s'", expected, result->str);
g_string_free(result, TRUE);
}

Expand Down Expand Up @@ -332,6 +332,40 @@ Test(log_message, test_local_logmsg_created_with_the_right_flags_and_timestamps)
cr_assert(are_equals, "The timestamps in a LogMessage created by log_msg_new_local() should be equals");
}

Test(log_message, test_sdata_sanitize_keys)
{
LogMessage *msg;
/* These keys looks strange, but JSON object can be parsed to SDATA,
* so the key could contain any character, while the specification
* does not declare any way to encode the the keys, just the values.
* The goal is to have a syntactically valid syslog message. */

msg = log_msg_new_empty();
log_msg_set_value_by_name(msg, ".SDATA.foo.bar[0]", "value[0]", -1);
assert_sdata_value_equals(msg, "[foo bar%5B0%5D=\"value[0\\]\"]");
log_msg_unref(msg);

msg = log_msg_new_empty();
log_msg_set_value_by_name(msg, ".SDATA.foo.bácsi", "bácsi", -1);
assert_sdata_value_equals(msg, "[foo b%C3%A1csi=\"bácsi\"]");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little bit strange to see here that the key is urlencoded, but the value isn't. Why do not we have the same problem with the values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC 5424 says (on page 7):

      STRUCTURED-DATA = NILVALUE / 1*SD-ELEMENT
      SD-ELEMENT      = "[" SD-ID *(SP SD-PARAM) "]"
      SD-PARAM        = PARAM-NAME "=" %d34 PARAM-VALUE %d34
      SD-ID           = SD-NAME
      PARAM-NAME      = SD-NAME
      PARAM-VALUE     = UTF-8-STRING ; characters '"', '\' and
                                     ; ']' MUST be escaped.
      SD-NAME         = 1*32PRINTUSASCII
                        ; except '=', SP, ']', %d34 (")

We do the escaping in value, and any UTF-8 character is accepted there.
In the param name (key) there is no escaping mechanism, and only ASCII characters are allowed, so we have to replace the non-valid characters with valid ones. We did not want to loss data, so sanitizing with a constant character wouldn't be a good idea (for example with _, both "ár" and "űr" would be "_r").

log_msg_unref(msg);

msg = log_msg_new_empty();
log_msg_set_value_by_name(msg, ".SDATA.foo.sp ace", "sp ace", -1);
assert_sdata_value_equals(msg, "[foo sp%20ace=\"sp ace\"]");
log_msg_unref(msg);

msg = log_msg_new_empty();
log_msg_set_value_by_name(msg, ".SDATA.foo.eq=al", "eq=al", -1);
assert_sdata_value_equals(msg, "[foo eq%3Dal=\"eq=al\"]");
log_msg_unref(msg);

msg = log_msg_new_empty();
log_msg_set_value_by_name(msg, ".SDATA.foo.quo\"te", "quo\"te", -1);
assert_sdata_value_equals(msg, "[foo quo%22te=\"quo\\\"te\"]");
log_msg_unref(msg);
}

Test(log_message, test_sdata_value_is_updated_by_sdata_name_value_pairs)
{
LogMessage *msg;
Expand Down