New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change regex for hash from md5 to more generic to catch more hashes, and hide email_from as well #2499
Conversation
…ex: apikey in rss feed), and hide email_from as well
Have alternatives to regular expression ever been considered for cleaning logs? they are not particularly easy to decipher or identify excess matching and there are bound to eventually be cases not covered. For instance a regular expression to extract http/https URLs, parse them with urllib.parse or similar then either remove the whole path or query parts or attempt to strip sensitive parts as now? |
not sure that would be good for performance to try and do that. but i wouldnt be opposed to more explicit log cleanse like sonarr does: just for our purposes, the log cleanse actually does two parts.. the log and the config. originally i had regex to do just the part but it was forgone to just leverage existing 'hash' to also take care of that. which that hash is what does the heavy lifting and takes care of some apkeys some passwords and then of course all the 'hashes' in filenames and such. since it isnt the full length of the hash there is some bits that remains so can still somewhat follow along with the reduced version of the hash. so now that ive said all that, do we want to keep the current hash match to md5 like we had.. and i just nuked out the |
I like the way Sonarr breaks them up by application/service, it keeps them individually more readable and easier to maintain. I think even their first pattern is matching query arguments a bit better than SAB, e.g. ?apikey=... or &apikey=... - as far as I can see SAB is just looking for apikey= The URL shared on the forum link I think is a newznab indexer which seem to default to using 'r' for the key so we should search for I do think alone just matching any a-z characters of length 25 is too eager. edit: I guess I'm suggesting a separate regular expression to LOG_INI_HIDE_RE which is more specifically for the query part of URLs |
Example of Sonarr' first expression converted to Python, with r= added and an extra one in the test url to check it finds multiple. LOG_URL_RE = re.compile(
r"""(?<=\?|&)(apikey|token|passkey|auth|authkey|user|uid|api|r|[a-z_]*apikey|account|passwd)=(?P<secret>[^&=""]+?)(?=[ ""&=]|$)""",
re.I,
)
test = LOG_URL_RE.sub('\\1=<APIKEY>', 'https://api.nzbgeek.info/rss?t=-2&limit=200&dl=1&del=1&r=supersecretkey&apikey=anotherkey')
print(test) https://api.nzbgeek.info/rss?t=-2&limit=200&dl=1&del=1&r=<APIKEY>&apikey=<APIKEY> |
There's no need to over complicate things like this. The current regex satisfies the needs very well. |
in response to: https://forums.sabnzbd.org/viewtopic.php?p=129264#p129264