treat @ARGV as utf8 string #7

Closed
GHolk opened this issue Aug 4, 2019 · 4 comments · Fixed by #10

Comments

GHolk commented Aug 4, 2019

I ran into a problem when using the command-line tool xpath.
When my query contains a non-ASCII string, like xpath -e '//*[contains(., "早安")]',
xpath matches nothing.
Running perl with the -CA option, which makes perl treat @ARGV as UTF-8, fixes the problem,
so it should be caused by an @ARGV encoding problem.

I am not familiar with Perl, so I am not sure what the best solution
to the @ARGV encoding problem is.
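
To illustrate the mismatch described above, here is a minimal sketch. It assumes the shell passes "早安" as UTF-8 bytes and that the XML text being searched has already been decoded into a character string; both are assumptions for illustration, not something verified against xpath's internals.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Simulate what the shell hands to perl: the UTF-8 bytes of "早安".
my $raw = "\xE6\x97\xA9\xE5\xAE\x89";

# Without -CA, Perl treats the argument as six one-byte characters.
printf "raw length: %d\n", length $raw;

# Decoding turns those bytes into two characters.
my $text = decode('UTF-8', $raw);
printf "decoded length: %d\n", length $text;

# Assumed document text: "早安" as a character string.
my $doc = "\x{65E9}\x{5B89}";
print "raw matches: ",     ($doc eq $raw  ? "yes" : "no"), "\n";
print "decoded matches: ", ($doc eq $text ? "yes" : "no"), "\n";
```

Only the decoded query compares equal to the document text, which is consistent with -CA making the match succeed.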

grr commented May 12, 2022

Read https://perldoc.perl.org/perlunitut#I/O-flow-(the-actual-5-minute-tutorial)

@ARGV is considered input to the program, so the program's author has to decode it explicitly. Perl does not do it automatically.
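
A minimal sketch of that explicit decoding step, assuming the arguments are UTF-8. The helper name `decode_args` is hypothetical and is not part of XML::XPath.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode a list of byte strings as UTF-8.
# FB_CROAK makes malformed input fatal instead of silently
# substituting U+FFFD; LEAVE_SRC keeps the originals untouched.
sub decode_args {
    return map { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) } @_;
}

# Decode the command-line arguments once, at the input boundary.
@ARGV = decode_args(@ARGV);
```

Dying on malformed bytes is a design choice made for this sketch; a tool could equally fall back to the default replacement behaviour.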

shawnw (Contributor) commented May 13, 2022

@grr True for your own programs, but for ones that are bundled as part of a module? They should be doing the decoding. But it's not as easy as just decoding UTF-8, because the input might not even be UTF-8 (cough Windows cough).

See this Stack Overflow question for some ways to do it portably.

shawnw added a commit to shawnw/XML-XPath that referenced this issue May 13, 2022
* Get the encoding used for command line arguments from the environment.
Fixes issue manwar#7.

* Don't assume standard input and output are UTF-8; also get their
encoding from the environment.
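
The approach in the commit message can be sketched roughly as follows. This is not the code from the commit: `locale_encoding` is a hypothetical helper, and the fallback to UTF-8 when the locale's codeset is unavailable or unknown to Encode is an assumption made for this example.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: name of the locale's character encoding,
# falling back to UTF-8 when it cannot be determined.
sub locale_encoding {
    my $codeset = eval { langinfo(CODESET) };
    my $enc = (defined $codeset && length $codeset)
        ? find_encoding($codeset)
        : undef;
    return $enc ? $enc->name : 'UTF-8';
}

my $encoding = locale_encoding();

# Decode arguments and set up standard input and output
# with the same encoding.
@ARGV = map { decode($encoding, $_) } @ARGV;
binmode STDIN,  ":encoding($encoding)";
binmode STDOUT, ":encoding($encoding)";
```

Whether `langinfo(CODESET)` gives a useful answer depends on the platform and locale setup, which is why the sketch keeps a fallback.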

grr commented May 13, 2022

@shawnw, you're right; I overlooked that you were referring to the program and not the module. Note that your fix will require raising MIN_PERL_VERSION to 5.8, because I18N::Langinfo is not dual-life (it ships only with perl and is not available separately):

$ corelist I18N::Langinfo

Data for 2022-04-20
I18N::Langinfo was first released with perl v5.7.3

shawnw (Contributor) commented May 13, 2022

I hope nobody's using a version that outdated, but I'll bump it anyways.
