easylogging++: sanitize log payload #6550

moneromooo-monero · 2020-05-17T12:42:24Z

Some of it might be coming from untrusted sources

vtnerd

I started this review but then stopped. Why is this being done for every log message instead of only the offending ones? It requires an additional string copy for every log message. Logging is infrequent (it shouldn't spam too much), but how often is an "unsanitized" (not printable?) message actually logged? And what happens when the user's locale isn't UTF-8?

vtnerd · 2020-05-17T16:14:02Z

src/common/utf8.h

+  }
+
+  template<typename T>
+  inline T utf8canonical(const T &s)


This overload is unnecessary due to the default argument.

vtnerd · 2020-05-17T16:15:37Z

src/common/utf8.h

+{
+
+  template<typename T, typename Transform>
+  inline T utf8canonical(const T &s, Transform t = [](wint_t c)->wint_t { return c; })


Why is this entire function repeated in this header, and the other cpp? Was it supposed to be removed from the other cpp?

Because easylogging++ does not depend on common.

Some independent thoughts:

Why have template type T? Each "copy" of the function uses a single type.

Using/instantiating templated code doesn't create a library dependency.

The mnemonics code already depends on epee and easylogging.

Because the original used string and wipeable_string. I'll remove the template on that one.

Are you telling me I could #include "mnemonics/something" or even "common/something" in easylogging? That'd be gross.

Yes, or the opposite - mnemonics could open something in easylogging.

Doesn't really belong in a logging API though. If you feel strongly about it, I guess I can do it that way.

vtnerd · 2020-05-17T16:18:24Z

external/easylogging++/easylogging++.cc

+        default: throw std::runtime_error("Invalid UTF-8");
+      }
+      *wptr = 0;
+      sc += T(wbuf, bytes);


Just use .append() here. The code already assumes a particular constructor and operator+= interface; append is usually going to be more efficient.
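A minimal sketch of the suggested change, assuming the output type is std::string. The helper name `add_bytes` is hypothetical and only serves to isolate the one line under discussion:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: adds the first n bytes of buf to out.
static void add_bytes(std::string &out, const char *buf, std::size_t n)
{
  // Appends directly, instead of building a temporary first as in
  //   out += std::string(buf, n);
  out.append(buf, n);
}
```

Both forms produce the same result; append(ptr, count) simply skips the temporary string object.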

vtnerd · 2020-05-17T16:22:01Z

external/easylogging++/easylogging++.cc

+void sanitize(std::string &s)
+{
+  s = utf8canonical(s, [](wint_t c)->wint_t {
+    if (c == 9 || c == 10 || c == 13)


What are you trying to do here? std::isprint ? There's also boost::spirit::char_encoding::unicode::isprint which works specifically with unicode points regardless of locale settings. There isn't a quick/guaranteed isprint for unicode in C++11.

OK, it looks like this keeps printable ASCII plus a few whitespace characters? The check might be off for the upper characters; I'd have to look again.

Keeping stuff we want to get in logs, but not control characters.

Comment on what's being allowed?
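For illustration, a commented filter might look like the sketch below. Only the tab/LF/CR test appears in the quoted diff; the rest of the policy (replacing the remaining C0 controls, DEL, and the C1 controls with '?') is an assumption, and the name `filter_codepoint` is hypothetical:

```cpp
#include <cwchar>   // std::wint_t

// Hypothetical named version of the lambda passed to utf8canonical.
// Assumed policy:
//   - keep horizontal tab (9), line feed (10) and carriage return (13),
//     since ordinary log output already contains them
//   - replace every other C0 control (0..31), DEL (127) and the
//     C1 controls (128..159) with '?'
//   - pass all other code points through unchanged
static std::wint_t filter_codepoint(std::wint_t c)
{
  if (c == 9 || c == 10 || c == 13)
    return c;
  if (c < 0x20 || (c >= 0x7f && c <= 0x9f))
    return '?';
  return c;
}
```

Pulling the lambda out into a named function like this also gives the comment a natural place to live.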

moneromooo-monero · 2020-05-17T16:59:10Z

It avoids going through all the code to replace LOG(s) with LOG(sanitize_for_log(s)). Which is annoying, might miss some, and does not ensure future ones don't get added.

moneromooo-monero · 2020-05-17T17:04:38Z

I guess later utf8canonical could be changed to be two pass, first pass checks, second pass happens only if the first pass detected the string should change and makes the change.
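A rough sketch of the two-pass idea, under simplifying assumptions: it works on raw bytes, handles only ASCII control bytes (valid because bytes below 0x80 never occur inside a multi-byte UTF-8 sequence), and leaves out the UTF-8 validation that utf8canonical performs. The names `needs_replacing` and `sanitize_two_pass` are hypothetical:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true for the bytes the assumed policy replaces,
// i.e. ASCII controls other than tab, LF and CR.
static bool needs_replacing(unsigned char b)
{
  if (b == 9 || b == 10 || b == 13)
    return false;
  return b < 0x20 || b == 0x7f;
}

// Hypothetical name. Pass 1 only reads the string; pass 2 writes to it,
// and runs only when pass 1 found at least one offending byte.
static void sanitize_two_pass(std::string &s)
{
  const auto bad = [](char c) {
    return needs_replacing(static_cast<unsigned char>(c));
  };

  // Pass 1: look for the first byte that has to change.
  const auto first_bad = std::find_if(s.begin(), s.end(), bad);
  if (first_bad == s.end())
    return;  // common case: nothing to do, no copy and no write

  // Pass 2: replace offending bytes, starting where pass 1 stopped.
  std::replace_if(first_bad, s.end(), bad, '?');
}
```

In the common case of a clean message, the string is scanned once and never modified or reallocated.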

vtnerd · 2020-05-18T04:11:21Z

external/easylogging++/easylogging++.cc

+{
+    std::string sc = "";
+    size_t avail = s.size();
+    const char *ptr = s.data();


So we're going with the two copies? Ahhh ... well, at least an in-place overwrite is guaranteed to work, since the filtering function returns either the same codepoint or a single-byte codepoint (?).

boost::locale::utf has most of this implemented, and the documentation states linking is not required (header only). This would make an in-place implementation easier to read I think, something like:

    char* wptr = &str[0];
    const char* rptr = wptr;
    char const* const end = rptr + str.size();
    while (rptr != end)
    {
      using utf8 = boost::locale::utf::utf_traits<char>;
      const bool copy = rptr != wptr;
      const auto cp = utf8::decode(rptr, end);
      if (!boost::locale::utf::is_valid_codepoint(cp))
        throw std::runtime_error{...};
      // only works if log_filter returns codepoint equal or
      // less in length (always true currently)
      const auto filtered = filter_for_log(cp);
      if (filtered != cp || copy)
        wptr = utf8::encode(filtered, wptr);
      else
        wptr = rptr;
    }

The switch in the boost implementation is a bit more compact, and contains a few more checks for invalid utf8 sequences. Regardless, the in-place version is doable once the filtering function is static/not mutable.
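The in-place approach can also be sketched without Boost. Everything below is an assumption for illustration: `filter_for_log` uses a guessed policy that returns either its input or ASCII '?', the function names are hypothetical, and the hand-rolled decoder is deliberately minimal (it rejects bad lead bytes, truncated sequences and bad continuation bytes, but not overlong encodings or surrogates, which the Boost decoder checks more thoroughly):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical filter: returns its input unchanged, or ASCII '?'.
static std::uint32_t filter_for_log(std::uint32_t cp)
{
  if (cp == 9 || cp == 10 || cp == 13)
    return cp;
  if (cp < 0x20 || (cp >= 0x7f && cp <= 0x9f))
    return '?';
  return cp;
}

// Length of the UTF-8 sequence started by lead byte b, 0 if b is not a lead.
static std::size_t sequence_length(unsigned char b)
{
  if (b < 0x80) return 1;
  if ((b & 0xe0) == 0xc0) return 2;
  if ((b & 0xf0) == 0xe0) return 3;
  if ((b & 0xf8) == 0xf0) return 4;
  return 0;
}

// Hypothetical name. Rewrites s in place; the write index w never passes
// the read index r because a replacement is never longer than its source.
static void sanitize_in_place(std::string &s)
{
  std::size_t r = 0, w = 0;
  while (r < s.size())
  {
    const unsigned char lead = static_cast<unsigned char>(s[r]);
    const std::size_t len = sequence_length(lead);
    if (len == 0 || len > s.size() - r)
      throw std::runtime_error("Invalid UTF-8");

    // Decode: payload bits of the lead byte, then 6 bits per continuation.
    std::uint32_t cp = (len == 1) ? lead : (lead & (0xffu >> (len + 1)));
    for (std::size_t i = 1; i < len; ++i)
    {
      const unsigned char b = static_cast<unsigned char>(s[r + i]);
      if ((b & 0xc0) != 0x80)
        throw std::runtime_error("Invalid UTF-8");
      cp = (cp << 6) | (b & 0x3f);
    }

    const std::uint32_t filtered = filter_for_log(cp);
    if (filtered != cp)
      s[w++] = static_cast<char>(filtered);  // single ASCII byte
    else
      for (std::size_t i = 0; i < len; ++i)  // keep the original bytes
        s[w++] = s[r + i];
    r += len;
  }
  s.resize(w);  // shrinks only if a multi-byte sequence was replaced
}
```

No second string is allocated; a clean message is copied onto itself byte for byte and keeps its length.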

vtnerd · 2020-05-18T04:12:20Z

external/easylogging++/easylogging++.cc

@@ -2475,6 +2475,100 @@ void DefaultLogDispatchCallback::handle(const LogDispatchData* data) {
  }
 }

+
+template<typename Transform>


The Transform function is also fixed here since this is just a straight copy. Just move the lambda into its own function and call filter_codepoint or whatever from inside the function.

vtnerd · 2020-05-18T04:13:28Z

external/easylogging++/easylogging++.cc

+template<typename Transform>
+static inline std::string utf8canonical(const std::string &s, Transform t = [](wint_t c)->wint_t { return c; })
+{
+    std::string sc = "";


Just use default construction. It might skip std::strlen calls, but there isn't a reason to leave it to chance.

vtnerd · 2020-05-18T04:16:25Z

src/mnemonics/language_base.h

@@ -73,78 +74,11 @@ namespace Language
    return prefix;
  }

-  template<typename T>


I assume this was moved during the initial refactor?

Currently the only new call to this function is using an entirely different copy of the same function. Switching to a templated Transform function also doesn't seem useful. The T template is also (has always been) fixed according to my quick grepping.

vtnerd · 2020-05-18T04:23:15Z

src/common/utf8.h

+{
+
+  template<typename T, typename Transform>
+  inline T utf8canonical(const T &s, Transform t = [](wint_t c)->wint_t { return c; })


Yes, or the opposite - mnemonics could open something in easylogging.

vtnerd · 2020-05-18T05:00:54Z

src/mnemonics/language_base.h

  struct WordHash
  {
    std::size_t operator()(const epee::wipeable_string &s) const
    {
-      const epee::wipeable_string sc = utf8canonical(s);
+      const epee::wipeable_string sc = tools::utf8canonical(s, [](wint_t c) -> wint_t { return std::tolower(c); });


The calls to std::tolower are (and have previously been) invalid, since simplewallet switches to the user's environment locale, which may not be Unicode-compatible. Presumably everyone has been using UTF-8 encodings in their locale, or this would have failed. A comment or GitHub issue? If someone hits this edge case, lots should go wrong in the code, I think.

There's boost::spirit::char_encoding::unicode::tolower (from the same file mentioned previously), but that drags in unicode tables from Boost. I don't see any other easy method, with Boost or C++, as you have to specify some kind of correct locale which is funky and may not be present. Boost locale can convert the entire string, but it only returns std::string.

towlower might work? I just saw it in the man pages; it takes a wint_t and says "in the current locale", with no mention of UTF-8.

towlower is better but still has the same user-environment locale/encoding issue. If the user environment uses Shift_JIS, then all of this is incorrect. That said, the wallet is probably unusable in that environment (i.e. the scope is larger than just this snippet), and no one has filed any bug reports, so presumably (nearly) everyone uses UTF-8.

I commented on it because the CLI output is probably going to be messed up in that situation, and it may not be immediately obvious why.

I'll use towlower then. Whatever improvements can be done later if you find a good way (or boost).
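As a small illustration of that switch, the transform could wrap std::towlower as below. The wrapper name `lower_codepoint` is hypothetical. std::tolower is only defined for values representable as unsigned char (or EOF), so towlower is the better fit for wint_t input, but its result for non-ASCII characters still depends on the current LC_CTYPE locale, as discussed above:

```cpp
#include <cwctype>  // std::towlower, std::wint_t

// Hypothetical wrapper: lowercases one wide character according to the
// current C locale. Treating the argument as a Unicode code point is an
// assumption that holds only where wchar_t stores code points directly.
static std::wint_t lower_codepoint(std::wint_t c)
{
  return std::towlower(c);
}
```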

vtnerd · 2020-05-18T05:22:11Z

external/easylogging++/easylogging++.cc

+void sanitize(std::string &s)
+{
+  s = utf8canonical(s, [](wint_t c)->wint_t {
+    if (c == 9 || c == 10 || c == 13)


Comment on what's being allowed?

moneromooo-monero · 2020-05-18T11:43:48Z

What kind of comment are you after ? Those just make sense as they're already logged.

moneromooo-monero · 2020-05-18T11:44:53Z

I'm just going to ignore the bikeshedding. If you want to change the code to use boost or add micro-optimisations, fine, but I'm not going to do it here.

vtnerd · 2020-05-18T18:38:46Z

I'm just going to ignore the bikeshedding. If you want to change the code to use boost or add micro-optimisations, fine, but I'm not going to do it here.

This isn't bikeshedding or micro-optimizing. The codebase routinely has large functions with many variables instead of smaller re-usable functions. Ideally we move in the other direction. And this PR duplicates entire functions within the same codebase. This PR is to the master branch, not the 0.16 variant - I'm not sure why there's a need to accept this code with some performance impact when an alternative implementation isn't too difficult.

This patch copies and allocates a new string for every log message while a lock is held. Log messages are rare, but the same lock must be acquired for (basically) every error/warning/info log statement to determine whether a message is to be displayed (it's worse if the user bumps up logging in one category).

moneromooo-monero · 2020-05-18T18:48:29Z

This is not new code, in case you did not notice. It was just moved (though I agree the duplication is unfortunate). Stuff like changing the ctor to take out a strlen on a "" is time-wasting, though.

Some of it might be coming from untrusted sources Reported by itsunixiknowthis

moneromooo-monero force-pushed the slog branch from f3544aa to 8c27d5f Compare May 17, 2020 14:31

vtnerd reviewed May 17, 2020

moneromooo-monero force-pushed the slog branch from 8c27d5f to e2287a0 Compare May 17, 2020 16:57

moneromooo-monero force-pushed the slog branch from e2287a0 to db99f61 Compare May 18, 2020 02:08

vtnerd reviewed May 18, 2020

moneromooo-monero force-pushed the slog branch from db99f61 to 74c5c7c Compare May 18, 2020 18:38

moneromooo-monero force-pushed the slog branch from 74c5c7c to 223252e Compare May 18, 2020 18:45

easylogging++: sanitize log payload

ca60d60

Some of it might be coming from untrusted sources Reported by itsunixiknowthis

moneromooo-monero force-pushed the slog branch from 223252e to ca60d60 Compare May 19, 2020 10:56

luigi1111 approved these changes Jul 8, 2020

luigi1111 merged commit ee817e0 into monero-project:master Jul 8, 2020
