Fix errors with sessions & logging related to invalid UTF-8#60
Merged
Conversation
Previously, if the user-supplied User-Agent string contained invalid UTF8 characters this would cause two problems: * The MysqlSession driver (against MySQL 8 in strict mode) would fail to insert the session to the database due to the invalid content in the user_agent string. This caused it to throw an exception, which would cause session_start to fail with "session ID must be a string". * The request logger would output an empty log line - because JSON- encoding the request log entry failed and we were not detecting that. There was also the theoretical risk that the user input could contain non-printable control characters - although not directly an issue, this could cause unexpected behaviour if consuming projects tried to read the database user_agent or parse the log entries without treating this as user input. This fix introduces a new StringSanitiser::ensurePrintableUtf8 method which will silently re-encode ASCII control characters and invalid UTF8 sequences as `�` characters to make them safe for use. That is then used for both the session value and the log value of the User-Agent header. The raw User-Agent is still used for hashing the session ID: the hash function just sees this as a byte sequence so the string encoding is irrelevant. This ensures that sessions created before this commit will still have the same hash as ones used after it, so the change is BC.
Although we now sanitise the user-agent input, there may be other places where an application could be logging messages or context including invalid UTF-8 (exception messages, for example). Rather than lose the whole log entry, have json_encode transliterate these to the � character so that we can get a safe & valid log line.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In particular, that presenting an invalid UTF-8 string as the
User-Agentheader was:Going forward all session user_agent values, and all values going through StackdriverApplicationLogger, will have encoded � characters in place of any invalid UTF-8 characters. Additionally, ASCII control characters (null, bell, etc) that are valid UTF8 but non-printable will also be escaped if present in the user-agent string.