Custom Charset translating commands #414

Closed
condret opened this issue Dec 3, 2013 · 22 comments

@condret
Member

condret commented Dec 3, 2013

I'm thinking about adding some sT-commands (psT, psTz, wsT, /sT, ./sT) to support unconventional code charts like this one.
Maybe you remember this. It is cumbersome to overwrite the original string when you can only use wx for this instead of w.
I think that there might be a lot of files using unconventional code-charts.

Should this be a cmd-plugin?
Where should I put code-charts? (I think it would be good if they were defined in files in a place like ~/.radare2/ )

@XVilka
Contributor

XVilka commented Dec 3, 2013

Good idea. It can be generalized to work with some ugly kinds of Unicode or other uncommon encodings/string formats.

@radare
Collaborator

radare commented Dec 4, 2013

Someone from mame proposed to me many years ago to add support for configurable character sets. This required a special DSL to describe the encoding and the code necessary to translate between encodings. I think that we can extend this idea into something more general than just pokemon-specific.

We can use a simple char -> hexstring table to specify the conversions between characters. The key is the character in ASCII/UTF-8 on the host side, and the value is the array of bytes corresponding to that char.

To search for an encoded string we can add the /e command which will do the following:

/e hello
"hello" -> encode into an array of bytes (translate each char)
perform the search (case-insensitive searches will not work with this)
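
The two steps above (translate each char, then search for the resulting bytes) can be sketched roughly as follows. This is only an illustration in Python, not radare2 code: the table values are made up, and the names CHARSET, encode and search are hypothetical.

```python
# Hypothetical char -> hexstring table. The byte values are made up for
# illustration and do not correspond to any real game encoding.
CHARSET = {
    "h": "87",
    "e": "84",
    "l": "8b",
    "o": "8e",
}


def encode(text, table):
    """Translate each character of text into its bytes using table."""
    out = bytearray()
    for ch in text:
        if ch not in table:
            raise KeyError(f"no mapping for character {ch!r}")
        out += bytes.fromhex(table[ch])
    return bytes(out)


def search(haystack, text, table):
    """Return every offset where the encoded form of text occurs."""
    needle = encode(text, table)
    if not needle:
        return []
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits
```

Since the match is done on the translated bytes, there is no generic notion of upper and lower case at this level, which is consistent with the caveat about case-insensitive searches.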

A text encoding can also specify the termination character, whether it is wide, and whether it is prefixed by the length, like a TLV or a Pascal/Java string.

To configure that character encoding we need another command. 'te' (types encoding) can be used for that, as an extension of the cparse engine. But IMHO this should be implemented in r_util, because we need it to work with r_search and other lower-level libraries (compared to r_core).
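
As a rough illustration of those three properties, here is a minimal Python sketch. The StringLayout class and extract_units function are hypothetical names, and the choices of a one-byte length prefix and little-endian wide units are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StringLayout:
    """Hypothetical description of how strings are framed in memory."""
    terminator: Optional[int] = 0x00  # code unit that ends the string, if any
    wide: bool = False                # True: 2-byte little-endian code units
    length_prefixed: bool = False     # True: first byte holds the unit count


def extract_units(data, layout):
    """Split raw bytes into the code units of one string."""
    width = 2 if layout.wide else 1
    if layout.length_prefixed:
        # Assumption: a single leading byte gives the number of code units.
        count = data[0]
        body = data[1:1 + count * width]
    else:
        body = data
    units = []
    for i in range(0, len(body) - width + 1, width):
        unit = int.from_bytes(body[i:i + width], "little")
        # Terminators only apply when the length is not given up front.
        if not layout.length_prefixed and unit == layout.terminator:
            break
        units.append(unit)
    return units
```

The resulting code units would then be looked up in the conversion table to obtain printable characters.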

We can use sdb to store that conversion table. We need to do the following:

te -> list all available text encodings
* add/delete a new encoding
* set/unset specific char->hexstr conversion
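
The operations listed above can be sketched over a flat string key/value store. In this Python sketch a plain dict stands in for sdb, and the key naming scheme (enc.NAME and enc.NAME.CODEPOINT), the class name and the method names are all assumptions made for illustration. Encoding names are assumed not to contain a dot.

```python
class EncodingStore:
    """Flat string key/value store standing in for sdb (an assumption)."""

    def __init__(self):
        self.kv = {}

    def add(self, name):
        """Register an encoding; keeps existing entries if already present."""
        self.kv.setdefault(f"enc.{name}", "")

    def delete(self, name):
        """Remove an encoding together with all of its char mappings."""
        root = f"enc.{name}"
        doomed = [k for k in self.kv if k == root or k.startswith(root + ".")]
        for key in doomed:
            del self.kv[key]

    def set_char(self, name, ch, hexstr):
        """Map one character to its byte sequence, given as a hexstring."""
        bytes.fromhex(hexstr)  # raises ValueError on a malformed hexstring
        self.add(name)
        self.kv[f"enc.{name}.{ord(ch):x}"] = hexstr

    def unset_char(self, name, ch):
        """Drop the mapping for one character, if it exists."""
        self.kv.pop(f"enc.{name}.{ord(ch):x}", None)

    def names(self):
        """List all available encodings."""
        return sorted(k[len("enc."):] for k in self.kv if k.count(".") == 1)
```

Keying each character by its code point in hex avoids putting arbitrary characters into the key itself.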

For printing we will use the inverse functions:

pe pokemon  # uses last selected encoding
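
Printing with the inverse mapping could look roughly like the following Python sketch. The table values are made up, the names invert and decode are hypothetical, and trying the longest byte sequence first is a design choice of this example rather than anything decided in this thread.

```python
# Hypothetical char -> hexstring table with made-up byte values.
CHARSET = {"h": "87", "e": "84", "l": "8b", "o": "8e"}


def invert(table):
    """Build the bytes -> char table from a char -> hexstring table."""
    inverse = {}
    for ch, hexstr in table.items():
        key = bytes.fromhex(hexstr)
        if key in inverse:
            raise ValueError(f"{hexstr} is mapped by more than one character")
        inverse[key] = ch
    return inverse


def decode(data, table, unknown="."):
    """Translate raw bytes back to text, trying the longest match first."""
    inverse = invert(table)
    longest = max((len(k) for k in inverse), default=1)
    out = []
    i = 0
    while i < len(data):
        for size in range(min(longest, len(data) - i), 0, -1):
            ch = inverse.get(data[i:i + size])
            if ch is not None:
                out.append(ch)
                i += size
                break
        else:
            # No mapping starts at this offset: emit a placeholder.
            out.append(unknown)
            i += 1
    return "".join(out)
```

Rejecting duplicate byte sequences in invert keeps the encode and decode directions consistent with each other.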

You can write a cmd plugin to make your tests or quick hacks if you need it.

@condret
Member Author

condret commented Dec 5, 2013

Sure, this should not be specific to one charset.

@XVilka
Contributor

XVilka commented Jul 29, 2014

@condret any update on this?

@condret
Member Author

condret commented Mar 19, 2015

Yes, /e is already used now.

@gogo2464
Contributor

gogo2464 commented Mar 22, 2020

Hello. Can you tell me more about the syntax of psT please?

I could also write the psT command. Can you assign this to me, please?

@gogo2464
Contributor

gogo2464 commented Mar 24, 2020

I have finished writing a new psT command to support the Game Boy encoding.

I made an array of structs to store the custom code chart of the Game Boy. I imagine we will want more custom text encodings. Where can I store the arrays?

It is a bit of a problem, which I explored in this issue: #16272.

@radare
Collaborator

radare commented Mar 25, 2020 via email

@gogo2464
Contributor

I noticed your command pe pokemon # uses last selected encoding.

@kazarmy
Contributor

kazarmy commented Mar 25, 2020

I think it's fine that you want to do something like the table in https://en.wikipedia.org/wiki/Code_page_437#Character_set but for your own custom encodings. I think what @radare is saying is that you should use pst instead of psT, where t stands for table (or maybe even te and pe?). You do have to figure out a suitable syntax yourself though, both for the command(s) needed and for the data files. You can look at the implementations of eco and pfo, for example, to see how commands with data files should be implemented.

Don't think this is a task for first-timers but I don't know what you're capable of so good luck!

@gogo2464
Contributor

@kazarmy I want to load values from a file. Should I implement my own file format and write a new function to parse it, or should I use an existing function if there is one?

I inspected the files for pfo (radare2/libr/bin/d) and eco (radare2/libr/cons/d). I saw that there are two different file formats: .h files with C code, for type definitions only, for pfo, and cmds for eco.

I tried to make a .h header with values and load them through a function, but the code failed because the function I used is meant only for rabin2 type definitions and because my type is too complex. I also cannot use cmds for my task.

@kazarmy
Contributor

kazarmy commented Mar 28, 2020

> I inspected the files for pfo (radare2/libr/bin/d) and eco (radare2/libr/cons/d). I saw that there are two different file formats: .h files with C code, for type definitions only, for pfo, and cmds for eco.

If you check e.g. radare2/libr/bin/d/elf64, you'll notice that it's composed of cmds too. This is apparently the recommended way to define data files in r2 if you're doing them from scratch. This does mean that you can also choose to parse data files that are already in some standard format (say CP437.TXT) if it's convenient.
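
As an illustration of the last point, a mapping file in the style of CP437.TXT could be turned into a char -> hexstring table roughly like this. The sketch assumes each data line holds a byte value and a Unicode code point in hex, separated by whitespace, with # starting a comment. The function name parse_mapping is hypothetical, and none of this is radare2 code.

```python
def parse_mapping(text):
    """Parse 'byte<whitespace>codepoint  # comment' lines into a table.

    Returns a dict mapping each character to a two-digit hexstring.
    """
    table = {}
    for line in text.splitlines():
        # Strip comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode mapping
        byte_value = int(fields[0], 16)   # accepts an optional 0x prefix
        code_point = int(fields[1], 16)
        table[chr(code_point)] = f"{byte_value:02x}"
    return table
```

Reusing an existing format like this would avoid inventing a new syntax, at the cost of only covering single-byte encodings.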


> I want to load values from a file. Should I implement my own file format and write a new function to parse it, or should I use an existing function if there is one?

If you check the code for pfo (here) and eco (starting here), you will find that they will eventually call r_core_cmd_file(). Please use this function.


> I also cannot use cmds for my task.

You can propose new cmds and even new api functions within reason.


> I tried to make a .h header with values and load them through a function, but the code failed because the function I used is meant only for rabin2 type definitions and because my type is too complex. I also cannot use cmds for my task.

Header files are probably too complicated for this (with their braces and whatnot). Follow Einstein's dictum that:

"Everything should be made as simple as possible, but not simpler."

and you should be fine.


> (radare2/libr/bin/d) ... (radare2/libr/cons/d)

Btw, thanks for this. Saved me some time in looking for the data files.

@gogo2464
Contributor

Thank you very much @kazarmy! I had not read the part of the radare2 manual on types yet ;). I should have read it, so I did not know about the k command. Now I know it, and I will use it in a file opened by the pse command that I implemented. Once the file has been read, I will programmatically read the radare2 database. I have already seen some functions for that. Goodbye, and wish me good luck. ;)

@kazarmy
Contributor

kazarmy commented Mar 28, 2020

Good luck! Just be prepared to change things depending on what @radare says 😁

@radare
Collaborator

radare commented Mar 28, 2020 via email

@gogo2464
Contributor

gogo2464 commented Mar 29, 2020

@kazarmy Thank you very much. I am currently waiting for @radare's opinion and change requests.

@gogo2464
Contributor

The build has failed because, as explained in the description of the draft pull request, I have generated files. I am just waiting to know whether my encoding folder is in the right place. I can fix this issue. I just want an opinion on it first.

@kazarmy
Contributor

kazarmy commented Mar 30, 2020

> I am just waiting to know whether my encoding folder is in the right place.

See @radare's comment here ... libr/util/ doesn't have a d directory so it should be fine. If you think util is too generic, the location can be changed later if needed without much effect on the user. Right now focus on pushing the pr through.

More comments:

  • Do first, ask for approval later.
  • Do you need to change the encoding dynamically when r2 is running?
  • @radare is the ultimate authority on this. If he wants you to define the charsets in a header file, well ... figure out the performance, maintenance etc. ramifications and defend your decision.

@kazarmy
Contributor

kazarmy commented Mar 30, 2020

And please change the issue title.

@radare radare changed the title from "sT-commands" to "Custom Charset translating commands" Apr 2, 2020
@radare radare removed this from the 9999 milestone Apr 2, 2020
@trufae
Collaborator

trufae commented Jun 11, 2020

This 7yo issue is one of the best examples of why I don't want to automatically close old issues.

@gogo2464
Contributor

gogo2464 commented Jun 11, 2020

@trufae I know that people were talking about the future of radare2, and I was waiting for that. I was also looking for a way to remove the hardcoded size 61 in r_charset_encode_str (var, core->block, len, custom_charset, 61);.

@trufae
Collaborator

trufae commented Apr 25, 2021

I think we can close this issue, and create other ones for the specific stuff that is missing.
thanks @gogo2464 ! good work!

@trufae trufae closed this as completed Apr 25, 2021