
change regex for hash from md5 to more generic to catch more hashes, and hide email_from as well #2499

Merged
merged 1 commit into sabnzbd:develop on Mar 15, 2023

Conversation

thezoggy
Contributor

@thezoggy commented Mar 10, 2023

Review comment on sabnzbd/api.py (outdated, resolved)
…ex: apikey in rss feed), and hide email_from as well
@thezoggy changed the title from "remove apikey from rss feed uri, and hide email_from as well" to "change regex for hash from md5 to more generic to catch more hashes, and hide email_from as well" on Mar 15, 2023
@mnightingale
Contributor

Have alternatives to regular expressions ever been considered for cleaning logs? They are not particularly easy to decipher or to check for excess matching, and there are bound to eventually be cases not covered.

For instance: use a regular expression to extract http/https URLs, parse them with urllib.parse or similar, then either remove the whole path or query parts, or attempt to strip just the sensitive parts as now?
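
A minimal sketch of that idea, assuming a hypothetical cleanse_line() helper and a deliberately simple URL pattern; this is not what SABnzbd currently does:

import re
from urllib.parse import urlsplit, urlunsplit

# Any http/https URL up to the next whitespace (intentionally simple).
URL_RE = re.compile(r"https?://\S+", re.I)

def _strip_url(match):
    # Keep only scheme and host, dropping the path and query entirely.
    parts = urlsplit(match.group(0))
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))

def cleanse_line(line):
    return URL_RE.sub(_strip_url, line)

print(cleanse_line("Fetching https://indexer.example/rss?t=-2&apikey=secret123 now"))
# Fetching https://indexer.example now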

@thezoggy
Contributor Author

> Have alternatives to regular expressions ever been considered for cleaning logs? They are not particularly easy to decipher or to check for excess matching, and there are bound to eventually be cases not covered.
>
> For instance: use a regular expression to extract http/https URLs, parse them with urllib.parse or similar, then either remove the whole path or query parts, or attempt to strip just the sensitive parts as now?

Not sure that would be good for performance. But I wouldn't be opposed to a more explicit log cleanse like Sonarr does:
https://github.com/Sonarr/Sonarr/blob/develop/src/NzbDrone.Common/Instrumentation/CleanseLogMessage.cs#L13

Just for our purposes, the log cleanse actually does two parts: the log and the config.
For the config bits we can easily just nuke out the values for certain keys.
But in the case of the rss uri, I didn't want to nuke out the whole value, just a piece of it (although we probably could just nuke out the key and force the user to share it explicitly if needed for triage, as the url itself may also not be something people want to share, since some sites are against being mentioned).

Originally I had a regex to handle just that part, but it was forgone in favor of leveraging the existing 'hash' regex to also take care of it. That hash regex is what does the heavy lifting: it catches some apikeys, some passwords, and of course all the 'hashes' in filenames and such. Since it doesn't mask the full length of the hash, some bits remain, so you can still somewhat follow along with the reduced version of the hash.

So now that I've said all that, do we want to keep the current hash match limited to md5 like we had, and have me just nuke out the uri field entirely rather than worrying about masking only a segment of it?
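
To illustrate the trade-off described above, a hedged sketch; the names and exact character classes here are hypothetical and are not the literal patterns used in sabnzbd/api.py:

import re

# Old style: only 32-character hex strings (md5-shaped) get masked.
MD5_ONLY_RE = re.compile(r"\b[a-f0-9]{32}\b", re.I)

# More generic style: any long alphanumeric run (apikeys, passkeys, hashes)
# is caught, but a short prefix is kept so logs can still be followed.
GENERIC_HASH_RE = re.compile(r"\b([a-z0-9]{8})[a-z0-9]{17,}\b", re.I)

# Config style: for known-sensitive keys, just nuke out the whole value
# (email_from is one of the keys this PR hides; the key list is an example).
def hide_config_values(cfg, keys=("email_from",)):
    return {k: ("<removed>" if k in keys else v) for k, v in cfg.items()}

line = "rss uri https://indexer.example/rss?apikey=0123456789abcdef0123456789abcdef"
print(GENERIC_HASH_RE.sub(r"\1<hash>", line))
# rss uri https://indexer.example/rss?apikey=01234567<hash>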

@mnightingale
Contributor

mnightingale commented Mar 15, 2023

I like the way Sonarr breaks them up by application/service; it keeps them individually more readable and easier to maintain.

I think even their first pattern matches query arguments a bit better than SAB does, e.g. ?apikey=... or &apikey=...; as far as I can see SAB is just looking for apikey=.
By requiring it to start with ? or &, I think that would prevent the overmatching.

The URL shared in the forum link is, I think, a newznab indexer, which seems to default to using 'r' for the key, so we should search for ?r=... and &r=... to mask it.

I do think that just matching any run of a-z characters of length 25 on its own is too eager.

edit: I guess I'm suggesting a separate regular expression alongside LOG_INI_HIDE_RE, one more specifically for the query part of URLs.

@mnightingale
Contributor

mnightingale commented Mar 15, 2023

Example of Sonarr's first expression converted to Python, with r= added and an extra key in the test url to check it finds multiple matches.

import re

LOG_URL_RE = re.compile(
    r"""(?<=\?|&)(apikey|token|passkey|auth|authkey|user|uid|api|r|[a-z_]*apikey|account|passwd)=(?P<secret>[^&=""]+?)(?=[ ""&=]|$)""",
    re.I,
)

test = LOG_URL_RE.sub(r"\1=<APIKEY>", "https://api.nzbgeek.info/rss?t=-2&limit=200&dl=1&del=1&r=supersecretkey&apikey=anotherkey")

print(test)
# https://api.nzbgeek.info/rss?t=-2&limit=200&dl=1&del=1&r=<APIKEY>&apikey=<APIKEY>

@Safihre enabled auto-merge (squash) March 15, 2023 17:02
@Safihre disabled auto-merge March 15, 2023 17:03
@Safihre merged commit 895ac56 into sabnzbd:develop Mar 15, 2023
@Safihre
Member

Safihre commented Mar 15, 2023

There's no need to overcomplicate things like this. The current regex satisfies the needs very well.

@thezoggy deleted the log-cleanup branch April 23, 2023 21:22
@github-actions bot locked as resolved and limited conversation to collaborators Sep 19, 2023