Unicode characters in symbols are extracted incorrectly #644

Open
10110111 opened this Issue Feb 27, 2018 · 2 comments

10110111 commented Feb 27, 2018

See the following example C++ program:

double \u00fc() { return 843; }
int main()
{
    \u00fc();
}

We should see the function ü being called (and we do in GDB). EDB instead shows something like ü in its symbol map, and Ã�¼ in the Disassembly and Analysis views.

eteran commented Feb 27, 2018

Interesting... Well this is an encoding issue. Do we assume UTF-8? UTF-16? It looks like EDB is perhaps assuming Latin1 encoding in some places.

I don't know if there is a "right answer" here because there is likely nothing to indicate what the appropriate encoding is. What are your thoughts?

10110111 commented Feb 27, 2018

I suppose we have the following options:

  • Follow the system locale (LC_CTYPE, I suppose)
  • Assume UTF-8 on UNIX-like platforms (since it's the de facto standard there) and UTF-16 on Windows (when we finally support it)

Funnily enough, QtCreator (at least 4.0.3) shows ü in its disassembly view, thus assuming Latin1.
