Allow Unicode output from NSE scripts #1415

Open
bkkgbkjb opened this Issue Dec 14, 2018 · 6 comments

bkkgbkjb commented Dec 14, 2018

I recently used `nmap -n --script=http-title -p80 www.baidu.com` to scan a Chinese website, expecting http-title to return the title of my target.

However, Nmap returns the title in \xHH hex-escaped form (e.g. \xE7\x99\xBE) instead of the expected "百度一下,你就知道".

After adding `print(output_str)` to the original script, I found that printing directly to the terminal works fine.
So perhaps the upper-level code that handles the NSE return value is inconsistent in how it treats UTF-8 characters?
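
To make the symptom concrete, here is a minimal Python sketch of what the escaped form represents. It is not Nmap's actual code: `escape_non_ascii` is a hypothetical helper, and treating the backslash itself as a byte to escape is an assumption made only so the result stays unambiguous.

```python
def escape_non_ascii(raw: bytes) -> str:
    """Keep printable ASCII as-is and render every other byte as \\xHH.

    Hypothetical illustration only; Nmap's real escaping rules may differ.
    """
    out = []
    for b in raw:
        if 0x20 <= b < 0x7F and b != 0x5C:  # printable ASCII except backslash
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02X}")
    return "".join(out)


title = "百度一下,你就知道"
raw = title.encode("utf-8")      # the bytes a UTF-8 page would send
escaped = escape_non_ascii(raw)  # what a byte-wise escaper would print
print(escaped)
```

Each of the eight Chinese characters is three bytes in UTF-8, so every one of them turns into three \xHH groups; the first three groups are the \xE7\x99\xBE quoted above.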

dmiller-nmap commented Dec 14, 2018

Nmap tries to sanitize text output to prevent terminal control characters, homoglyph attacks, or decoding errors ("mojibake") in the terminal. The script receives raw bytes from the network; in order to display these correctly, it would have to detect the correct encoding, decode to Unicode code points, then re-encode to the appropriate encoding for the terminal.

This is something we'd like to do better, but it will probably require including a portable (cross-platform) Unicode library.
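
The three steps described above (detect the encoding, decode to code points, re-encode for the terminal) can be sketched in Python. This is only an illustration under stated assumptions, not how Nmap works or plans to work: the candidate list `CANDIDATE_ENCODINGS` and all function names are hypothetical, trial decoding is a crude stand-in for real encoding detection, and homoglyph attacks are not addressed at all.

```python
import unicodedata

# Assumption: a short, ordered list of encodings to try. Trial decoding can
# "succeed" with the wrong encoding, so this is not reliable detection.
CANDIDATE_ENCODINGS = ("utf-8", "gb2312")


def decode_best_effort(raw: bytes) -> str:
    """Step 1 and 2: guess an encoding and decode to Unicode code points."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing fit: keep ASCII and escape the remaining bytes.
    return raw.decode("ascii", errors="backslashreplace")


def strip_control(text: str) -> str:
    """Escape control (Cc) and format (Cf) characters such as ESC or U+202E."""
    return "".join(
        f"\\u{ord(ch):04X}" if unicodedata.category(ch) in ("Cc", "Cf") else ch
        for ch in text
    )


def render_for_terminal(raw: bytes, terminal_encoding: str = "utf-8") -> str:
    """Step 3: re-encode for the terminal, escaping what it cannot show."""
    text = strip_control(decode_best_effort(raw))
    encoded = text.encode(terminal_encoding, errors="backslashreplace")
    return encoded.decode(terminal_encoding)


print(render_for_terminal("百度".encode("utf-8")))
print(render_for_terminal(b"ok\x1b[31m"))
```

Even this toy version shows why the problem is larger than printing bytes: the result depends on a guess about the remote encoding and on knowing what the local terminal can display.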

bkkgbkjb commented Dec 15, 2018

Thanks for the reply.

Sorry, but I didn't quite catch your meaning.

In my test, http-title.nse does fetch the correct raw bytes of the website title, encoded in UTF-8, via the `http.get` method in the http library (adding `print(output_str)` successfully prints the title to the terminal).

But after http-title.nse returns, Nmap incorrectly converts the title to the hex format.

I personally don't think this has anything to do with "terminal control characters, homoglyph attacks, or decoding errors", since printing results to the terminal is simple and easy work and we'd better not make it too complex (like converting to hex format if it's UTF-8?).

BTW, does your reply suggest that currently it's not possible to solve this problem?

I don't know much about Lua and I understand programming in C/C++ is sometimes awkward

But in 2018, I expect any program to have good consistency with UTF-8 encoded strings

Thanks

p-l- commented Dec 15, 2018

> I personally don't think this has anything to do with "terminal control characters, homoglyph attacks, or decoding errors", since printing results to the terminal is simple and easy work and we'd better not make it too complex (like converting to hex format if it's UTF-8?).

When one of the most active Nmap devs tells you your issue is hard to solve, and takes the time to explain why, in detail, you should probably try to understand what he says before giving your "personal" opinion on the matter. Especially when you have no idea what you are talking about.

> I don't know much about Lua and I understand programming in C/C++ is sometimes awkward

OK...

> But in 2018, I expect any program to have good consistency with UTF-8 encoded strings

In 2018, I expect open-source software users to have good behavior toward open-source developers.

bkkgbkjb commented Dec 15, 2018

OK, OK.

Anyway, I also understand that showing up to complain without knowing much about Nmap is annoying.

That's why I will close this issue, since it does not seem useful for development.

But for version 7.70 of the most renowned port-scanning software, not being able to print UTF-8 correctly is perhaps more annoying for end users.

I hope you can find a way to solve this problem. I will probably look for a personal workaround, like re-parsing the Nmap output and converting it back to UTF-8.
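
A rough Python sketch of that kind of workaround, assuming the escaped bytes were UTF-8 in the first place. The helper name `unescape_hex` and the sample line are hypothetical; the sample is built from the title quoted in this issue rather than copied from real Nmap output.

```python
import re

# One or more consecutive \xHH groups, so multi-byte characters decode together.
_HEX_ESCAPE_RUN = re.compile(r"(?:\\x[0-9A-Fa-f]{2})+")


def unescape_hex(text: str, encoding: str = "utf-8") -> str:
    """Turn runs of \\xHH escapes back into text, assuming `encoding`."""

    def _decode(match: "re.Match[str]") -> str:
        raw = bytes.fromhex(match.group(0).replace("\\x", ""))
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in the assumed encoding: leave the escapes untouched.
            return match.group(0)

    return _HEX_ESCAPE_RUN.sub(_decode, text)


# Hypothetical sample line for illustration.
line = (
    r"|_http-title: \xE7\x99\xBE\xE5\xBA\xA6\xE4\xB8\x80\xE4\xB8\x8B,"
    r"\xE4\xBD\xA0\xE5\xB0\xB1\xE7\x9F\xA5\xE9\x81\x93"
)
print(unescape_hex(line))
```

One limitation of post-processing like this: a page that literally contained the four characters `\xE7` would be indistinguishable from an escaped byte, and any control characters that were escaped for safety get restored along with the text.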

bkkgbkjb closed this Dec 15, 2018

dmiller-nmap commented Dec 17, 2018

@bkkgbkjb I really do appreciate the ideas, though. I tend to be too conservative: if it's not broken, don't fix it. But this is something that could be really useful, and I know @rewanth1997 has been looking into other Unicode-related interfaces in Nmap.

You should check the XML output from the script (-oX option) to see if it contains the raw bytes (XML-escaped, of course) you are looking for. I don't remember at the moment where the escaping is done, so there's a chance it's only done for normal/screen output. The issue with just outputting any bytes the remote system delivers is that they could be in any encoding: GB2312, UTF-16, etc. That would break output on a UTF-8 console.
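
The encoding concern can be illustrated with a short Python sketch. It simulates a UTF-8 console by decoding with replacement characters; the helper name `as_utf8_console` is hypothetical and the two-character string is just an excerpt of the title from this issue.

```python
def as_utf8_console(raw: bytes) -> str:
    """Approximate what a UTF-8 terminal would make of raw bytes.

    Invalid sequences become U+FFFD, the replacement character.
    """
    return raw.decode("utf-8", errors="replace")


title = "百度"
samples = {
    "UTF-8": title.encode("utf-8"),
    "GB2312": title.encode("gb2312"),
    "UTF-16LE": title.encode("utf-16-le"),
}

for label, raw in samples.items():
    shown = as_utf8_console(raw)
    status = "ok" if shown == title else "garbled"
    print(f"{label:8} {raw!r:32} -> {shown!r} ({status})")
```

Only the UTF-8 bytes survive the round trip. The same two characters sent as GB2312 or UTF-16 come out as replacement characters mixed with unrelated letters, which is the mojibake the sanitizing is meant to avoid.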

@p-l- Thanks for the defense, but we can handle some criticism. It may not have been worded in the most diplomatic manner, but there are some valid ideas here that are worth considering.

dmiller-nmap reopened this Dec 17, 2018

dmiller-nmap changed the title from "[Bug?] return UTF-8 chars in nse?" to "Allow Unicode output from NSE scripts" Dec 17, 2018

bkkgbkjb commented Dec 18, 2018

Thanks for your kind reply.

I apologize for any offense in my previous reply.
As a non-native English speaker, it's a little hard for me to manage my tone.
Besides, I personally think converting UTF-8 characters to hex escapes is not practical, as it introduces more trouble.
So in my second reply, I was probably too blunt and hasty.

Anyway, I did a quick test: `nmap -n www.baidu.com -p 443 --script=http-title --script-args="http.useragent='Mozilla/5.0 (X11; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'" -oA test` (please keep the useragent argument, otherwise the website will try to redirect you to http and http-title will fail to grab the title).

test.gnmap doesn't show the script result. test.xml shows it with both hex and HTML escaping. test.nmap shows it with hex escaping only.

I agree there should be escaping in test.xml.
But it would be great if you could remove the hex escaping in both the XML and the normal (.nmap) output.
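
The two layers of escaping seen in test.xml can be separated with a small Python sketch. The XML fragment below is a hypothetical, trimmed-down sample written for illustration, not the actual content of test.xml; it assumes the script result sits in the `output` attribute of a `<script>` element, and `script_outputs` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: hex escapes plus one XML character reference (&#xa;).
SAMPLE_XML = r"""<nmaprun>
  <host>
    <ports>
      <port protocol="tcp" portid="443">
        <script id="http-title" output="\xE7\x99\xBE\xE5\xBA\xA6&#xa;"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def script_outputs(xml_text: str, script_id: str) -> list:
    """Return the output attribute of every <script> with the given id."""
    root = ET.fromstring(xml_text)
    return [
        s.get("output", "")
        for s in root.iter("script")
        if s.get("id") == script_id
    ]


for out in script_outputs(SAMPLE_XML, "http-title"):
    # The XML parser has already undone the XML-level escaping (&#xa; is now
    # a real newline), but the \xHH text is untouched.
    print(repr(out))
```

Parsing undoes only the XML-level escaping, which is why the hex escapes in this sample remain visible afterwards: they are ordinary characters as far as the XML is concerned.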
