SNR-1102: Improve HTML link regex #84
Merged
tsellers-r7 merged 2 commits into rapid7:master on Jun 18, 2020
tsellers-r7 commented on Jun 18, 2020
      to_s.
      encode('UTF-8', invalid: :replace, undef: :replace, replace: '').
    - scan(/<([^>]+)>/m).each do |e|
    + scan(/<([^<>]{1,4096})>/m).each do |e|
Technically, the {1,4096} bound isn't needed to solve the immediate problem. I've included it to put an upper limit on how long the regex engine will spend on a particular string in the event that it runs across data constructed in a particular way. 4096 should be enough for us to extract links in real-world situations.
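The effect of the upper bound can be checked directly. This is a minimal sketch: the two patterns are copied from the diff, while the sample strings and constant names are illustrative only.

```ruby
# Patterns copied from the diff in this PR.
OLD_TAG_RE = /<([^>]+)>/m
NEW_TAG_RE = /<([^<>]{1,4096})>/m

# A tag body of exactly 4096 characters still fits inside the bound...
at_limit = '<' + ('a' * 4096) + '>'
# ...but one character more does not.
too_long = '<' + ('a' * 4097) + '>'

p at_limit.scan(NEW_TAG_RE).first.first.length # => 4096
p too_long.scan(NEW_TAG_RE)                    # => []

# The old, unbounded pattern still extracts the longer body.
p too_long.scan(OLD_TAG_RE).first.first.length # => 4097
```

The trade-off is that a tag whose body exceeds 4096 characters is skipped entirely rather than truncated; per the comment above, that limit is expected to be large enough for real-world links.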
This PR updates the regex in the HTML link extraction code to better handle data consisting of large numbers of repeated `<` characters (e.g. `<<<<`). With the new regex, a candidate match is abandoned as soon as another `<` is encountered, instead of continuing on to look for a `>`. The PR also bumps the version number in preparation for a release.
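The behavior change can be seen with a short Ruby sketch. The two patterns are copied from the diff; `tag_bodies` is a hypothetical helper that only mirrors the `to_s.encode(...).scan(...)` chain shown there and is not dap's actual filter code. The sample HTML and the 5,000-character input are made up for illustration.

```ruby
require 'benchmark'

# Old and new patterns, copied from the diff in this PR.
OLD_TAG_RE = /<([^>]+)>/m
NEW_TAG_RE = /<([^<>]{1,4096})>/m

# Hypothetical helper (not part of dap) mirroring the diff's chain:
# normalize to UTF-8, then return the captured body of each <...> tag.
def tag_bodies(doc, re)
  doc.to_s
     .encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
     .scan(re)
     .map(&:first)
end

html = "<<<<a href='http://example.com/'>link</a>"

# Old pattern: [^>]+ also consumes the extra '<' characters.
p tag_bodies(html, OLD_TAG_RE) # => ["<<<a href='http://example.com/'", "/a"]
# New pattern: a candidate fails at the next '<', so the match starts
# at the last '<' before the tag body.
p tag_bodies(html, NEW_TAG_RE) # => ["a href='http://example.com/'", "/a"]

# Pathological input: many '<' and no '>'. Each old-pattern attempt can
# scan to the end of the string; each new-pattern attempt stops at the
# next '<'. Timings vary by machine and Ruby version.
evil = '<' * 5_000
old_t = Benchmark.realtime { tag_bodies(evil, OLD_TAG_RE) }
new_t = Benchmark.realtime { tag_bodies(evil, NEW_TAG_RE) }
puts format('old: %.4fs  new: %.4fs', old_t, new_t)
```

The benchmark lines only print elapsed times; no particular speedup is claimed here. The deterministic part is the extraction result, which matches the "repeated less than symbol" case in the test output.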
This was tested using rake tests under Ruby 2.4.5 and 2.6.3:

    Dap::Filter::FilterDecodeGquicVersionsResult
      .decode
        testing gquic valid input base64 encoded output from the real world
          returns an hash w/ versions as list of versions
        testing gquic valid input artifical example
          returns an hash w/ versions as list of versions
        testing gquic valid versions with invalid versions
          returns an hash w/ versions as list of versions
        testing valid string but not gquic versions
          returns nil
        testing valid string with Q in it but not gquic versions
          returns nil
        testing gquic empty string input
          returns nil
        testing gquic nil input
          returns nil

    Dap::Filter::FilterDecodeHTTPReply
      .decode
        decoding non-HTTP response
          returns an empty hash
        decoding uncompressed response
          correctly sets status code
          correctly sets status message
          correctly sets body
          correctly extracts http_raw_headers
          extracts Date http header
          extracts Last-Modified http header
        decoding binary response
          correctly sets http_raw_body base64
        decoding gzip compressed response
          correctly decompresses body
        decoding valid chunked responses
          correctly dechunks body
          finds normal headers
          finds trailing headers
        decoding bogus chunked responses
    Skipping impossibly large 255-byte #2 chunk, at offset 14/35
          reads the partial body
    Skipping impossibly large 255-byte #2 chunk, at offset 14/35
          finds normal headers
        decoding truncated, chunked responses
    Skipping impossibly large 6-byte #3 chunk, at offset 35/35
          reads the partial body
    Skipping impossibly large 6-byte #3 chunk, at offset 35/35
          finds normal headers
        decoding responses that are missing the "reason phrase", an RFC anomaly
          decodes anyway

    Dap::Filter::FilterHTMLLinks
      .process
        lowercase
          extracted the correct links
        uppercase
          extracted the correct links
        scattercase
          extracted the correct links
        repeated less than symbol
          extracted the correct links

    Dap::Filter::FilterDecodeLdapSearchResult
      .decode
        testing full ldap response message
          returns Hash as expected
          returns expected value
        testing invalid ldap response message
          returns error message as expected

    Dap::Filter::FilterCopy
      .process
        copy one json field to another
          copies and leaves the original field

    Dap::Filter::FilterFlatten
      .process
        flatten nested json
          has new flattened nested document keys
        ignore unnested keys
          is the same as the original document

    Dap::Filter::FilterExpand
      .process
        expand unnested json
          has new expanded keys
        ignore all but specified unnested json
          has new expanded keys
        ignore nested json
          is the same as the original document

    Dap::Filter::FilterRenameSubkeyMatch
      .process
        with subkeys
          renames keys as expected
        without subkeys
          produces unchanged output without errors

    Dap::Filter::FilterMatchRemove
      .process
        with similar keys
          removes the expected keys

    Dap::Filter::FilterMatchSelect
      .process
        with similar keys
          selects the expected keys

    Dap::Filter::FilterSelect
      .process
        with similar keys
          selects the expected keys

    Dap::Filter::FilterMatchSelectKey
      .process
        with similar keys
          selects the expected keys

    Dap::Filter::FilterMatchSelectValue
      .process
        with similar keys
          selects the expected keys

    Dap::Filter::FilterTransform
      .process
        invalid transform
          fails
        reverse
          ASCII
            is reversed
          UTF-8
            is reversed
        int
          default
            valid int
              is the correct int
            invalid int
              is the correct int
          different base
            is the correct int
        float
          valid float
            is the correct float
          invalid float
            is the correct float
        json
          valid json
            is the correct JSON
          invalid json
            raises on invalid JSON
        stripping
          lstrip
            lstripped
          rstrip
            rstripped
          strip
            stripped

    Dap::Filter::FilterFieldReplace
      .process
        replaced correctly

    Dap::Filter::FilterFieldReplaceAll
      .process
        replaced correctly

    Dap::Filter::FilterFieldSplitPeriod
      .process
        splitting on period boundary
          splits correctly

    Dap::Filter::FilterFieldSplitLine
      .process
        splitting on newline boundary
          splits correctly

    Dap::Filter::FilterDecodeDNSVersionReply
      .decode
        parsing empty string
          returns an empty hash
        parsing a partial response
          returns an empty hash
        parsing TCP DNS response
          returns the correct version
        parsing UDP DNS response
          returns the correct version

    Dap::Input::InputJSON
      .read_record
        decoding input json
          parses values starting with a colon (:) as a string

    Dap::Proto::IPMI::Channel_Auth_Reply
      .valid?
        testing with valid rmcp version and message length
          returns true as expected
        testing with invalid data
          returns false as expected

    Dap::Proto::LDAP
      .decode_elem_length
        testing lengths shorter than 128 bits
          returns a Fixnum
          returns value correctly
        testing lengths greater than 128 bits
          returns a Fixnum
          returns value correctly
        testing with 3 byte length
          returns a Fixnum
          returns value correctly
        testing invalid length
          returns nil as expected
      .split_messages
        testing full message
          returns Array as expected
          returns SearchResultEntry value as expected
          returns SearchResultDone value as expected
        testing invalid message
          returns Array as expected
        testing short message
          returns Array as expected
        testing message length greater than total data length
          returns Array as expected
          returns empty Array as expected
        testing empty ASN.1 Sequence
          returns Array as expected
          returns empty Array as expected
      .parse_ldapresult
        testing valid data
          returns Hash as expected
          returns results as expected
        testing invalid data
          returns Hash as expected
          returns empty Hash as expected
      .parse_messages
        testing SearchResultEntry
          returns Array as expected
          returns SearchResultEntry value as expected
        testing SearchResultDone
          returns Array as expected
          returns SearchResultDone value as expected
        testing SearchResultDone - edge case #1
          returns Array as expected
          returns operationsError as expected
        testing UnhandledTag
          returns Array as expected
          returns UnhandledTag value as expected
        testing empty ASN.1 Sequence
          returns Array as expected
          returns error value as expected

    Dap::Utils::Misc
      .flatten_hash
        with mixed nested data
          flattens properly

    Finished in 0.02675 seconds (files took 0.3554 seconds to load)
    99 examples, 0 failures