Non-printable characters in extractor output are inconsistently escaped #4929

Closed
tovask opened this issue Mar 21, 2024 · 1 comment · Fixed by #4941
Labels
Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

tovask (Contributor) commented Mar 21, 2024

Starting from version 3.2.0, only the newline character (\n, LF, 0x0A) is escaped in the output of the extractors.
This can cause issues. For example, if the extracted data uses CRLF line endings, only the LF (line feed, \n) is escaped, and the unescaped CR (carriage return, \r) messes up the output.

Nuclei version:

3.2.0 - 3.2.2

Current Behavior:

./v3.2.0_nuclei -disable-update-check -u https://httpbin.org/user-agent -t test.yaml

                     __     _
   ____  __  _______/ /__  (_)
  / __ \/ / / / ___/ / _ \/ /
 / / / / /_/ / /__/ /  __/ /
/_/ /_/\__,_/\___/_/\___/_/   v3.2.2

                projectdiscovery.io

[INF] Current nuclei version: v3.2.2 (outdated)
[INF] Current nuclei-templates version:  (latest)
[WRN] Scan results upload to cloud is disabled.
[INF] New templates added in latest release: 0
[INF] Templates loaded for current scan: 1
[INF] Targets loaded for current scan: 1
\n{\n  "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"\n}]

Previous Behavior:

./v3.1.10_nuclei -disable-update-check -u https://httpbin.org/user-agent -t test.yaml

                     __     _
   ____  __  _______/ /__  (_)
  / __ \/ / / / ___/ / _ \/ /
 / / / / /_/ / /__/ /  __/ /
/_/ /_/\__,_/\___/_/\___/_/   v3.1.10

                projectdiscovery.io

[INF] Current nuclei version: v3.1.10 (outdated)
[INF] Current nuclei-templates version:  (latest)
[WRN] Scan results upload to cloud is disabled.
[INF] New templates added in latest release: 0
[INF] Templates loaded for current scan: 1
[INF] Targets loaded for current scan: 1
[test] [http] [info] https://httpbin.org/user-agent ["HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 152\r\nAccess-Control-Allow-Credentials: true\r\nAccess-Control-Allow-Origin: *\r\nContent-Type: application/json\r\nDate: Thu, 21 Mar 2024 15:25:31 GMT\r\nServer: gunicorn/19.9.0\r\n\r\n{\n  \"user-agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.63\"\n}"]

Steps To Reproduce:

The template I used for testing:

id: test
info:
  name: Test
  author: Levente Kovats
  severity: info
http:
  - method: GET
    path:
      - "{{BaseURL}}"
    matchers:
      - type: dsl
        dsl:
          - true
    extractors:
      - type: dsl
        dsl:
          - response

If I redirect the output to a file and open it with a simple text editor, it looks like this:
[image]
It clearly shows that the LF character is escaped (to \n) but the CR is not. The CR moves the console carriage back to the beginning of the line without starting a new line, so the text that follows overwrites the previous line of the data.

Anything else:

I guess it was introduced in PR #4849 for issue #4841.

It's not clear to me what the reason was for no longer escaping all the non-ASCII / non-printable characters.

tovask added the Type: Bug label Mar 21, 2024
tarunKoyalwar self-assigned this Mar 25, 2024
tarunKoyalwar (Member) commented

@tovask, the issue was that the quote function available in Go was also escaping the JSON itself, so it would be a pain for anyone who wants to use or copy that extracted JSON (see: #4849 (comment)). But I think your point is valid as well, so I implemented a workaround where we escape all non-ASCII and ASCII whitespace characters, but not quotes and other JSON reserved characters like ", [, etc. (see: #4941).
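The trade-off described above can be sketched in Go as follows. This is an illustration only, not the code merged in #4941: `escapeNonPrintable` is a hypothetical helper, the choice of `\xNN` / `\uNNNN` notation is an assumption, and so is the decision to leave the plain space character unescaped. It also assumes the quote function mentioned is `strconv.Quote`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// escapeNonPrintable is a hypothetical helper (not the #4941 implementation).
// It escapes ASCII control characters, including CR, LF and TAB, and all
// non-ASCII runes, but leaves printable ASCII such as quotes, brackets and
// braces untouched so that JSON in the output stays copyable.
func escapeNonPrintable(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r == '\n':
			b.WriteString(`\n`)
		case r == '\r':
			b.WriteString(`\r`)
		case r == '\t':
			b.WriteString(`\t`)
		case r < 0x20 || r == 0x7f:
			// Remaining ASCII control characters.
			fmt.Fprintf(&b, `\x%02x`, r)
		case r > 0xffff:
			// Non-ASCII runes outside the Basic Multilingual Plane.
			fmt.Fprintf(&b, `\U%08x`, r)
		case r > 0x7f:
			// Other non-ASCII runes.
			fmt.Fprintf(&b, `\u%04x`, r)
		default:
			// Printable ASCII, including the space and JSON punctuation.
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	raw := "{\r\n  \"user-agent\": \"x\"\r\n}"

	// strconv.Quote escapes control characters but also the double quotes,
	// which makes the extracted JSON awkward to copy.
	fmt.Println(strconv.Quote(raw))

	// The sketch escapes CR and LF but keeps the quotes as they are.
	fmt.Println(escapeNonPrintable(raw))
}
```

One limitation of this sketch: a literal backslash in the input is passed through unchanged, so the escaped output cannot be mapped back to the original bytes unambiguously. That seems acceptable for display purposes, but it would matter if the output were ever parsed again.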
