
change regex for hash from md5 to more generic to catch more hashes, and hide email_from as well #2499

Merged
merged 1 commit into sabnzbd:develop on Mar 15, 2023

Conversation

thezoggy
Contributor

@thezoggy commented Mar 10, 2023

Review comment on sabnzbd/api.py (outdated, resolved)
…ex: apikey in rss feed), and hide email_from as well
@thezoggy changed the title from "remove apikey from rss feed uri, and hide email_from as well" to "change regex for hash from md5 to more generic to catch more hashes, and hide email_from as well" on Mar 15, 2023
@mnightingale
Contributor

Have alternatives to regular expressions ever been considered for cleaning logs? They are not particularly easy to decipher or to check for excess matching, and there are bound to eventually be cases not covered.

For instance: use a regular expression to extract http/https URLs, parse them with urllib.parse or similar, then either remove the whole path or query parts, or attempt to strip just the sensitive parts as now?
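
A minimal sketch of that idea, assuming a hypothetical cleanse_line() helper and a deliberately simple URL pattern; this is not what SABnzbd currently does:

import re
from urllib.parse import urlsplit, urlunsplit

# Any http/https URL up to the next whitespace (intentionally simple).
URL_RE = re.compile(r"https?://\S+", re.I)

def _strip_url(match):
    # Keep only scheme and host, dropping the path and query entirely.
    parts = urlsplit(match.group(0))
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))

def cleanse_line(line):
    return URL_RE.sub(_strip_url, line)

print(cleanse_line("Fetching https://indexer.example/rss?t=-2&apikey=secret123 now"))
# Fetching https://indexer.example now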

@thezoggy
Contributor Author

> Have alternatives to regular expressions ever been considered for cleaning logs? They are not particularly easy to decipher or to check for excess matching, and there are bound to eventually be cases not covered.
>
> For instance: use a regular expression to extract http/https URLs, parse them with urllib.parse or similar, then either remove the whole path or query parts, or attempt to strip just the sensitive parts as now?

Not sure that would be good for performance. But I wouldn't be opposed to a more explicit log cleanse like Sonarr does:
https://github.com/Sonarr/Sonarr/blob/develop/src/NzbDrone.Common/Instrumentation/CleanseLogMessage.cs#L13

Just for our purposes, the log cleanse actually does two parts: the log and the config.
For the config bits we can easily just nuke out the values for certain keys.
But in the case of the rss uri, I didn't want to nuke out the whole value, just a piece of it (although we probably could just nuke out the key and force the user to share it explicitly if needed for triage, as the url itself may also not be something people want to share, since some sites are against being mentioned).

Originally I had a regex to handle just that part, but it was forgone in favor of leveraging the existing 'hash' regex to also take care of it. That hash regex is what does the heavy lifting: it catches some apikeys, some passwords, and of course all the 'hashes' in filenames and such. Since it doesn't mask the full length of the hash, some bits remain, so you can still somewhat follow along with the reduced version of the hash.

So now that I've said all that, do we want to keep the current hash match limited to md5 like we had, and have me just nuke out the uri field entirely rather than worrying about masking only a segment of it?
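
To illustrate the trade-off described above, a hedged sketch; the names and exact character classes here are hypothetical and are not the literal patterns used in sabnzbd/api.py:

import re

# Old style: only 32-character hex strings (md5-shaped) get masked.
MD5_ONLY_RE = re.compile(r"\b[a-f0-9]{32}\b", re.I)

# More generic style: any long alphanumeric run (apikeys, passkeys, hashes)
# is caught, but a short prefix is kept so logs can still be followed.
GENERIC_HASH_RE = re.compile(r"\b([a-z0-9]{8})[a-z0-9]{17,}\b", re.I)

# Config style: for known-sensitive keys, just nuke out the whole value
# (email_from is one of the keys this PR hides; the key list is an example).
def hide_config_values(cfg, keys=("email_from",)):
    return {k: ("<removed>" if k in keys else v) for k, v in cfg.items()}

line = "rss uri https://indexer.example/rss?apikey=0123456789abcdef0123456789abcdef"
print(GENERIC_HASH_RE.sub(r"\1<hash>", line))
# rss uri https://indexer.example/rss?apikey=01234567<hash>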

@mnightingale
Contributor

mnightingale commented Mar 15, 2023

I like the way Sonarr breaks them up by application/service; it keeps them individually more readable and easier to maintain.

I think even their first pattern matches query arguments a bit better than SAB does, e.g. ?apikey=... or &apikey=...; as far as I can see SAB is just looking for apikey=.
By requiring it to start with ? or &, I think that would prevent the overmatching.

The URL shared in the forum link is, I think, a newznab indexer, which seems to default to using 'r' for the key, so we should search for ?r=... and &r=... to mask it.

I do think that just matching any run of a-z characters of length 25 on its own is too eager.

edit: I guess I'm suggesting a separate regular expression alongside LOG_INI_HIDE_RE, one more specifically for the query part of URLs.

@mnightingale
Contributor

mnightingale commented Mar 15, 2023

Example of Sonarr's first expression converted to Python, with r= added and an extra key in the test url to check it finds multiple matches.

import re

LOG_URL_RE = re.compile(
    r"""(?<=\?|&)(apikey|token|passkey|auth|authkey|user|uid|api|r|[a-z_]*apikey|account|passwd)=(?P<secret>[^&=""]+?)(?=[ ""&=]|$)""",
    re.I,
)

test = LOG_URL_RE.sub(r"\1=<APIKEY>", "https://api.nzbgeek.info/rss?t=-2&limit=200&dl=1&del=1&r=supersecretkey&apikey=anotherkey")

print(test)
# https://api.nzbgeek.info/rss?t=-2&limit=200&dl=1&del=1&r=<APIKEY>&apikey=<APIKEY>

@Safihre enabled auto-merge (squash) March 15, 2023 17:02
@Safihre disabled auto-merge March 15, 2023 17:03
@Safihre merged commit 895ac56 into sabnzbd:develop Mar 15, 2023
@Safihre
Member

Safihre commented Mar 15, 2023

There's no need to overcomplicate things like this. The current regex satisfies the needs very well.

@thezoggy deleted the log-cleanup branch April 23, 2023 21:22
@github-actions bot locked as resolved and limited conversation to collaborators Sep 19, 2023