Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting non-UTF-8 paths #33

Open
Byron opened this issue Jul 2, 2020 · 8 comments
Open

Supporting non-UTF-8 paths #33

Byron opened this issue Jul 2, 2020 · 8 comments
Labels
discussion Active proposal enhancement New feature or request help wanted Extra attention is needed

Comments

@Byron
Copy link

Byron commented Jul 2, 2020

I love argh for its simplicity, compile performance and leanness, and I find myself migrating more and more projects to it.

Just now it became evident that despite supporting PathBuf in struct fields, it will always assume valid UTF-8 in the arguments it parses. This might be an assumption that holds in Fuchsia, but may make it impossible to use for general purpose linux tools that consume paths in arguments.

I wonder if there is any chance to allow parsing PathBuf fields without making assumptions about their encoding?

Thank you

PS: This question was triggered by this post on reddit/r/rust.

@Freaky
Copy link

Freaky commented Jul 2, 2020

Perhaps os_str_bytes could be of use.

@benbrittain
Copy link
Contributor

Thank you for the kind words!

Like Freaky said, you can likely use std::ffi::OsString instead of a PathBuf. Is that sufficient for your use case?

Fuchsia does indeed have everything as UTF-8, but I'm 100% willing to add more general support.

@ssokolow
Copy link

ssokolow commented Jul 2, 2020

That sounds like a bug. PathBuf is implemented as a newtype around OsString, so why does using it detour through String?

@Freaky
Copy link

Freaky commented Jul 2, 2020

argh is fundamentally &str-driven:

argh/src/lib.rs

Line 188 in e59f2c8

fn from_args(command_name: &[&str], args: &[&str]) -> Result<Self, EarlyExit>;

I mentioned os_str_bytes because it looks useful for argument parsing, though I suspect the simplified approach argh takes makes it less important.

@Byron
Copy link
Author

Byron commented Jul 2, 2020

Thanks so much, @benbrittain, for your openness towards this improvement! With that added correctness, I think this will put argh right into the sweet spot between low-enough compile times, convenience, and low/no binary overhead.
Right now I am actually implementing a tool supporting both structopt and argh CLIs, and it's clear that the declaration of argh is easier to remember and less cluttered, so that opinionatedness does hit a sweet spot for me indeed.

My thoughts are as follows:

  • work with OsStr|OsString instances internally, and use std::env::args_os() instead of std::env::args()
  • when placing an OsStr into into a field that requires UTF-8 encoding, try to convert or fail gracefully
  • when wanting to call FromArgValue or from_str_fn(…), try to convert or fail gracefully
  • introduce a new FromArgValueOS or from_os_str_fn(…) (working titles :D) for those who want to support non-UTF-8 fully

Is there anything fundamental I am missing?
And how much effort do you think it is?

I am asking because maybe we sketch out the implementation a little and attract contributors :).

@benbrittain benbrittain added discussion Active proposal enhancement New feature or request help wanted Extra attention is needed labels Jul 10, 2020
@benbrittain benbrittain changed the title [Q&A] Is supporting non-UTF-8 paths possible? Supporting non-UTF-8 paths Jul 10, 2020
@Byron
Copy link
Author

Byron commented Jul 11, 2020

Hi @benbrittain, I think if I would hear your ideas about my initial thoughts stated above, I would consider giving the implementation a shot. That way I hope to avoid going off in the wrong direction.

Thanks a lot!

@benbrittain
Copy link
Contributor

@Byron Sorry about the slow response! That sounds all sounds well thought out and I'm happy to work with you in moving argh in this direction! My only concern is with increasing binary overhead, so I want to ensure that we avoid that. I added #39 yesterday so that we actually track that

@Byron
Copy link
Author

Byron commented Jul 11, 2020

That sounds great, thank you! Regarding binary overhead, I would think that there is close to none, but then again I don't really know how Argh is working. As long as the code generator won't have to support two versions of everything, i.e. OsStr + str, I am optimistic supporting non-utf8 won't be detrimental.
Let's see how it goes :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Active proposal enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants