Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mloginfo --queries crash "TypeError: Object of type bytes is not JSON serializable" #764

Closed
bugre opened this issue Dec 31, 2019 · 10 comments · Fixed by #768
Closed

mloginfo --queries crash "TypeError: Object of type bytes is not JSON serializable" #764

bugre opened this issue Dec 31, 2019 · 10 comments · Fixed by #768
Milestone

Comments

@bugre
Copy link
Contributor

bugre commented Dec 31, 2019

Actual/current behavior

Crash when running.

mloginfo --queries ./mongod.log  

# or (with no differente output)
mloginfo --verbose --queries ./mongod.log

Crashes when running with --queries.
But, does NOT crash running with --connections for example. So i think it's related to some specific query parameter or values.

I can confirm that the problem reported at #684 still exist with the latest version of mtools (tested with latest mtools 1.6.0 release, also with develop branch) and python 3.7.6 (before was using 3.6 with same results, upgraded as a trial to fix the issue :), but no.

✔ ~/dsv/mtools/mtools/util [develop|✚ 1…1]
11:05 $ mlogfilter --version
mtools version 1.6.1-dev || Python 3.7.6 (default, Dec 30 2019, 19:38:28)
[Clang 11.0.0 (clang-1100.0.33.16)]

Steps to reproduce the actual/current behavior

Simply run: mloginfo --verbose --queries ./mongod.log generates the crash with this traceback.

Traceback (most recent call last):
  File "/usr/local/bin/mloginfo", line 11, in <module>
    load_entry_point('mtools', 'console_scripts', 'mloginfo')()
  File "$HOME/dsv/mtools/mtools/mloginfo/mloginfo.py", line 100, in main
    tool.run()
  File "$HOME/dsv/mtools/mtools/mloginfo/mloginfo.py", line 95, in run
    section.run()
  File "$HOME/dsv/mtools/mtools/mloginfo/sections/query_section.py", line 81, in run
    pattern=le.pattern, duration=le.duration,
  File "$HOME/dsv/mtools/mtools/util/logevent.py", line 474, in pattern
    self._pattern = self._find_pattern('filter: ')
  File "$HOME/dsv/mtools/mtools/util/logevent.py", line 989, in _find_pattern
    return json2pattern(search_str)
  File "$HOME/dsv/mtools/mtools/util/pattern.py", line 96, in json2pattern
    return json.dumps(doc, sort_keys=True, separators=(', ', ': '))
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable

Environment

Software Version
mtools tested with 1.6.0 release and with 'develop' branch
MongoDB server logs are from MongoDB Atlas 4.0.13 primary server
Operating system running mtools locally on my mac OS X 10.14.6

Others..

I'll try to reduce the log to reasonable size and sanitise it while still getting it to generate the error (now it's around 4GB and has a lot of private data). And if i get the time, try to troubleshoot it a little and report back or suggest a possible fix if i'm able to do it.

@bugre
Copy link
Contributor Author

bugre commented Dec 31, 2019

@stennie any suggested "hidden" command line options that could generate a more verbosely output while processing the log, to help pinpoint the error generating lines?

@stennie stennie added this to the 1.6.1 milestone Dec 31, 2019
@stennie
Copy link
Collaborator

stennie commented Dec 31, 2019

@bugre There aren't any hidden verbosity options, but it would be useful to add some log context for exceptions (I just created #767 for this).

To troubleshoot mloginfo --queries at the moment I'd narrow down the input log based on the stack trace and the % progress before the error is encountered. This issue appears related to extracting queries from log lines matching filter:.

If you are able to share a subset of the log privately (or recreate this with clean data), I could also extrapolate a test case.

Regards,
Stennie

@bugre
Copy link
Contributor Author

bugre commented Dec 31, 2019

Hi @stennie ,
Thanks for getting back. I'm going the dirty way, aka, adding print's to trace the flow. Seems that with some specific query entries the json.load with "object_hook=_decode_pattern_dict" creates the problem.

I'll need a little more time to understand why and propose a solution.

Thanks,
-wm

@bugre
Copy link
Contributor Author

bugre commented Jan 2, 2020

Hi @stennie ,
Happy New Year.
Went back to this after holiday, and found the issue. It's a side effect of py2 and py3 compatibility i think. I've it working for my first need, but would like to understand a little more if you could help me.

Question, what should be the output of filter patter processing for this test string (file: mtools/utils/pattern.py). Specifically looking at the curList: ["Orange", , "XYZ", "Krown"] part.

s = (r'{_id: ObjectId(\'528556616dde23324f233168\'), curList: [ "Orange", "XYZ", "Krown"], allowedSnacks: 1000 }')
print(json2pattern(s))

Should it be:
{"_id": 1, "allowedSnacks": 1, "curList": ["Krown", "XYZ", "Orange"]}

or should it be something like?:
{"_id": 1, "allowedSnacks": 1, "curList": [1]}

Thanks
-wm

@bugre
Copy link
Contributor Author

bugre commented Jan 3, 2020

log entry to reproduce the error and test the fix..

Adding a one line log entry that can be used to generate the error (before fixing) and after applying the PR #768 it should not crash anymore.

2019-12-30T21:09:15.337+0000 I COMMAND  [conn249832] command mydb.plans command: find { find: "plans", filter: { channelsCode: [ "€£", "Crown", "Metal" ], _id: ObjectId('5d284c8df3bcc30052f6764e'), code: "some_code", name: "Some Code", allowedDevices: 1000, allowedSessions: 1000, features: ObjectId('5b61b777498b8450ca170929') }, projection: {}, limit: 1, singleBatch: true, batchSize: 1, returnKey: false, showRecordId: false, $clusterTime: { clusterTime: Timestamp(1577740154, 356), signature: { hash: BinData(0, 80EAD2A7D45751411F0CE7DC7C272E8AE508EFE7), keyId: 6721887357433282561 } }, lsid: { id: UUID("30d40d1f-fa97-410d-88e0-9d687431e4cf") }, $db: "mydb" } planSummary: IXSCAN { _id: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:1340 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } storage:{} protocol:op_query 324ms

@stennie
Copy link
Collaborator

stennie commented Jan 3, 2020

Question, what should be the output of filter patter processing for this test string (file: mtools/utils/pattern.py). Specifically looking at the curList: ["Orange", , "XYZ", "Krown"] part.

s = (r'{_id: ObjectId(\'528556616dde23324f233168\'), curList: [ "Orange", "XYZ", "Krown"], allowedSnacks: 1000 }')
print(json2pattern(s))

Should it be:
{"_id": 1, "allowedSnacks": 1, "curList": ["Krown", "XYZ", "Orange"]}

or should it be something like?:
{"_id": 1, "allowedSnacks": 1, "curList": [1]}

The latter example is the expected output of json2pattern(): a query is reduced to a pattern with placeholders rather than actual values.

I found two other issues, after i fixed the above one (and with the addition of some print on the exception it was easy to find the offending log line).

They also occur on the logfile i'm processing. Can i fix them under this same issue with different commits of must/should each one have a separate issue created first?

Ideally we try to keep unrelated issues in separate commits so a PR with several related commits can be squashed for easy diffing of changes in future. If you are referring to the commits in the PR for JSON encoding and removing six, it would be OK to leave those in (will merge instead of squashing).

Regards,
Stennie

@stennie stennie modified the milestones: 1.6.1, 1.6.2 Jan 3, 2020
@bugre
Copy link
Contributor Author

bugre commented Jan 3, 2020

Thank you for the clarification Stephen.
Latter tonight or during the weekend, i'll look into the code on how to get list patterns reduced as you mention and update this PR.
at
-wm

@niccottrell
Copy link

This line gives the same error and stacktrace without any obvious unicode chars:

2020-01-06T02:35:39.868+0000 I COMMAND [conn137572] command tinkles.tinkles_cyco.web_resource_state command: find { find: "tinkles_cyco.web_resource_state", filter: { urls: { $all: [ "https://surtronic.info/" ] } }, projection: { web_resource_id: 1 }, lsid: { id: UUID("c34f0c66-8423-4a3a-beeb-0d2ede0415e3") }, $clusterTime: { clusterTime: Timestamp(1578278139, 718), signature: { hash: BinData(0, 67BB9AC67D809AD263875805B445C4BCE059D064), keyId: 6736797216142262274 } }, $db: "tinkles", $readPreference: { mode: "primary" } } planSummary: IXSCAN { urls: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:1 nreturned:1 reslen:396 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 32523, timeReadingMicros: 16629 } } protocol:op_msg 103ms

@bugre
Copy link
Contributor Author

bugre commented Jan 6, 2020

Hi @niccottrell ,
Indeed it should, because of the "list" inside the filter that was not completely simplified to '1'. The code from PR #768, does also fix this.

I've tested you'r log line with my proposed PR and it does indeed work, but the resulted simplification is not exactly what i've would have expected. But in this case, the simplification is done by an already existing code in util/pattern.py and not the new code i've added. I'm still not sure what all the result's should be.

namespace                                  operation    pattern                            count    min (ms)    max (ms)    mean (ms)    95%-ile (ms)    sum (ms)    allowDiskUse    example

tinkles.tinkles_cyco.web_resource_state    find         {"urls": {"$all": ["https:1"]}}        1         103         103        103.0             103       103.0    None            {"urls": {"$all": ["https:1"]}}
✔ ~/dsv/mtools [fix/bugre-#764_JSONSerialize_crash_w_bytes|✚ 2⚑ 3]

@stennie , this line '/.+?/\w*' of the def shell2json... does the simplification in this case. Do you know if this is the expected result and/or case where this regex is expected to actuate?

I could change the code i've added for the simplification of cases like "{... "attrib" : ["val1", "val2", "3"],...}" to "{..."atrib" : [1],... }":

FROM:    s, _ = re.subn(r'("\w+"\s*:\s*\[)(.[^\]]+)(]\s*[,}])', '\\1 1 \\3', s)
TO  :    s, _ = re.subn(r'("\S+"\s*:\s*\[)(.[^\]]+)(]\s*[,}])', '\\1 1 \\3', s)

This would simplify the above log entry:

to this    : {"urls": {"$all": [1]}}
instead of : {"urls": {"$all": ["https:1"]}  -  as it's now.

BUT, this also changes entries like the one bellow ('Input'), so that the result will be the shown in "Output" and not the one in "Expect" that is generated today:

  Input : {d: {$gt: 2, $lt: 4}, b: {$gte: 3}, c: {$nin: [1, "foo", "bar"]}, "$or": [{a:1}, {b:1}] }
  Expect: {"$or": [{"a": 1}, {"b": 1}], "b": 1, "c": {"$nin": 1}, "d": 1}
  Output: {"$or": [1], "b": 1, "c": {"$nin": 1}, "d": 1}

bugre added a commit to bugre/mtools that referenced this issue Jan 6, 2020
It's a pattern with list of url '$all : [ "https://xxxxx.xx" ] ..'
See comment at rueckstiess#764 (comment)
@bugre
Copy link
Contributor Author

bugre commented Jan 6, 2020

i've added one more commit ( 5ba18b3), where i change the regex

FROM:    s, _ = re.subn(r'("\w+"\s*:\s*\[)(.[^\]]+)(]\s*[,}])', '\\1 1 \\3', s)
TO  :    s, _ = re.subn(r'("\S+"\s*:\s*\[)(.+)((?<=\")\s*\]\s*[,}])', '\\1 1 \\3', s)

both cases that i mention above (on my last comment) are handled correctly (as i understand).

@niccottrell use case, is simplified to {"urls": {"$all": [1]}} even if there are multiple urls in the list, and for cases like ..."$or": [{a:1}, {b:1}] ... they're not touched anymore.

Finally this also improves the regex and avoids the simplifying error that happened if a "]" was part of a list value. This case is now also correctly handled.

With this, if no other use cases are handled wrongly or need improvements, i would consider this PR finished and ready for your evaluation.

Let me know if there are coding or style adjustments.

stennie pushed a commit that referenced this issue Feb 10, 2020
It's a pattern with list of url '$all : [ "https://xxxxx.xx" ] ..'
See comment at #764 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants