Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grpc.Tools: Accented characters in nuget path breaks c# grpc tool compiler #17995

Closed
ludydoo opened this issue Feb 9, 2019 · 39 comments
Closed
Assignees
Labels
disposition/help wanted Maintainers do not have enough resources to allocate to this at the moment. Help is appreciated! kind/bug lang/C# priority/P2

Comments

@ludydoo
Copy link

ludydoo commented Feb 9, 2019

@kkm000

1.18.0. C#

Windows 10 Pro x64

Reproduce :

  1. Create a .proto file in a project (Follow "Greeter" example)
  2. Add a nuget.config file and specify the globalPackagesFolder to a path containaing either a space or an accented character (éèàé...)
  3. Restore packages
  4. Build.

What did you expect to see?

Successful build

What did you see instead?

--grpc_out could not find the path specified

"C:\Users\Dré Lot\.nuget\packages\grpc.tools\1.18.0\tools\windows_x64\protoc.exe" --proto_path="C:\Repositories\MyRepo" --grpc_out="C:\Repositories\MyRepo\MyRepo.Server" --plugin=protoc-gen-grpc="C:\Users\Dré Lot\.nuget\packages\grpc.tools\1.18.0\tools\windows_x64\grpc_csharp_plugin.exe" "C:\Repositories\MyRepo\MyRepo\book.proto

This is what the Task executes. If I run this myself in a cmd, it works. For some reason, the task itself fails with --grpc-out The path specified could not be found

@kkm000
Copy link
Contributor

kkm000 commented Feb 11, 2019

Thanks, the repro seems clear. In the command line message that you copy, there is no final quote at the very end of the command. Is this what you are actually seeing, or did it just not make it when copying?

I'll have a look at this, indeed, but, in all fairness, when working with tools originally designed for Linux and ported to Windows later, is it really important to push them to the limit and use paths with spaces and Unicode characters? I am just trying to estimate how to prioritize this issue.

As a workaround, a symlink (or NTFS junction) e. g. C:\Users\dl => C:\Users\Dré Lot would likely take care of this issue. FWIW, personally, I always establish junctions C:\_P => C:\Program Files and C:\_PX => C:\Program Files (x86) first thing after installing the OS... just in case, to stay on the safe side. Saved me a lot of headache with a couple of Python distros over the years. The command is mklink /j, and requires elevation. NTFS in Windows 10 also supports "normal" first class un*xoid symlinks (mklink /d), which do not, if developer mode is enabled (but you'll still likely need to elevate to create a symlink in Users, it's ACL is quite restrictive). Will this workaround help you in the meantime?

@jtattermusch
Copy link
Contributor

@kkm000 are you planning to come up with a fix?

@kkm000
Copy link
Contributor

kkm000 commented Feb 23, 2019

@jtattermusch, is this worth it? What is your feeling?

@jtattermusch
Copy link
Contributor

I'd say not supporting spaces in file names is bad. About not supporting accented characters I feel less strongly, but also something that should really work. How much work is it to fix this?

@kkm000
Copy link
Contributor

kkm000 commented Feb 26, 2019

@jtattermusch, ok, let me have a look. I agree, paths should be quoted everywhere, so spaces must work. The non-ASCII, if an issue, then it's likely in the tools, as .NET is all wchar_t throughout.

I have quite a batch of issues already, should be able to get to them this week.

@jtattermusch jtattermusch changed the title Accented characters or spaces in nuget path breaks c# grpc tool compiler Grpc.Tools: Accented characters or spaces in nuget path breaks c# grpc tool compiler Mar 15, 2019
@jtattermusch
Copy link
Contributor

I think the rootcause for not handling spaces in filenames correctly is the same as in #17661 (the CsharpGeneratorService doesn't mimic protoc output file name mangling accurately).

@kkm000
Copy link
Contributor

kkm000 commented Mar 22, 2019

Sorry @ludydoo, it looks like we cannot fix that. This has to do with the way protoc handles non-ASCII characters.

First, I reproduce your issue, right in the grpc tree (some output trimmed for brevity; the environment variable NUGET_PACKAGES overrides the package dump location):

c:\projects\grpc\examples\csharp\Helloworld\Greeter>chcp 65001
Active code page: 65001
c:\projects\grpc\examples\csharp\Helloworld\Greeter>set NUGET_PACKAGES=C:\temp\Dré Lot\.nuget
c:\projects\grpc\examples\csharp\Helloworld\Greeter>dotnet build -bl
C:\_P\dotnet\sdk\2.1.505\MSBuild.dll -consoleloggerparameters:Summary -distributedlogger:Microsoft.DotNet.Tools.MSBuild.MSBuildLogger,C:\_P\dotnet\sdk\2.1.505\dotnet.dll*Microsoft.DotNet.Tools.MSBuild.MSBuildForwardingLogger,C:\_P\dotnet\sdk\2.1.505\dotnet.dll -maxcpucount -restore -target:Build -verbosity:m /bl .\Greeter.csproj
  Restoring packages for c:\projects\grpc\examples\csharp\Helloworld\Greeter\Greeter.csproj...
  Installing System.Threading.Thread 4.0.0.
  Installing System.Threading.ThreadPool 4.0.10.
  Installing System.Runtime.Loader 4.0.0.
  Installing Grpc.Core 1.18.0.
  Installing Grpc 1.18.0.
  Installing Grpc.Tools 1.18.0.
  Installing Google.Protobuf 3.6.1.
  Generating MSBuild file c:\projects\grpc\examples\csharp\Helloworld\Greeter\obj\Greeter.csproj.nuget.g.props.
  Generating MSBuild file c:\projects\grpc\examples\csharp\Helloworld\Greeter\obj\Greeter.csproj.nuget.g.targets.
  Restore completed in 10.2 sec for c:\projects\grpc\examples\csharp\Helloworld\Greeter\Greeter.csproj.
  --grpc_out: protoc-gen-grpc: The system cannot find the path specified.

C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\build\_protobuf\Google.Protobuf.Tools.targets(263,5): error MSB6006: "C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\tools\windows_x64\protoc.exe" exited with code 1. [c:\projects\grpc\examples\csharp\Helloworld\Greeter\Greeter.csproj]

Build FAILED.

The command as reported in the log fails in a different manner, but this has to do with the way arguments are passed to protoc (more on that later):

c:\projects\grpc\examples\csharp\Helloworld\Greeter>"C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\tools\windows_x64\protoc.exe" --csharp_out=obj\Debug\netstandard1.5 --plugin="protoc-gen-grpc=C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\tools\windows_x64\grpc_csharp_plugin.exe" --grpc_out=obj\Debug\netstandard1.5 --proto_path="C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\build\native\include" --proto_path=..\..\..\protos --dependency_out=obj\Debug\netstandard1.5\50c9e7acc97344d0_helloworld.protodep ../../../protos/helloworld.proto
C:\temp\Dr Lot\.nuget\grpc.tools\1.18.0\build\native\include: warning: directory does not exist.

Note the é is missing from the path in the error message.

Really the arguments are passed using a response file, since protoc accepts spaces in arguments, one per line, without quoting. Let's construct this file, protoc.rsp, save as UTF-8:

--csharp_out=obj\Debug\netstandard1.5
--plugin=protoc-gen-grpc=C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\tools\windows_x64\grpc_csharp_plugin.exe
--grpc_out=obj\Debug\netstandard1.5
--proto_path=C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\build\native\include
--proto_path=..\..\..\protos
--dependency_out=obj\Debug\netstandard1.5\50c9e7acc97344d0_helloworld.protodep
../../../protos/helloworld.proto

and give it a go with protoc

c:\projects\grpc\examples\csharp\Helloworld\Greeter>"C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\tools\windows_x64\protoc.exe" @protoc.rsp
--grpc_out: protoc-gen-grpc: The system cannot find the path specified.

Now the build error reproduces exactly. protoc complains now that it cannot find the file specified in --plugin=protoc-gen-grpc=, apparently for the very same reason: it drops the é from file name.

To confirm that the issue is not with the space in the path, do

c:\projects\grpc\examples\csharp\Helloworld\Greeter>rm -rf obj bin
c:\projects\grpc\examples\csharp\Helloworld\Greeter>set NUGET_PACKAGES=C:\temp\hello world\.nuget
c:\projects\grpc\examples\csharp\Helloworld\Greeter>dotnet build
  Restoring packages for c:\projects\grpc\examples\csharp\Helloworld\Greeter\Greeter.csproj...
  Installing System.Threading.ThreadPool 4.0.10.
  Installing System.Threading.Thread 4.0.0.
  Installing System.Runtime.Loader 4.0.0.
  Installing Grpc.Core 1.18.0.
  Installing Grpc.Tools 1.18.0.
  Installing Grpc 1.18.0.
  Installing Google.Protobuf 3.6.1.
  Generating MSBuild file c:\projects\grpc\examples\csharp\Helloworld\Greeter\obj\Greeter.csproj.nuget.g.props.
  Generating MSBuild file c:\projects\grpc\examples\csharp\Helloworld\Greeter\obj\Greeter.csproj.nuget.g.targets.
  Restore completed in 1.72 sec for c:\projects\grpc\examples\csharp\Helloworld\Greeter\Greeter.csproj.
  Greeter -> c:\projects\grpc\examples\csharp\Helloworld\Greeter\bin\Debug\netstandard1.5\Greeter.dll

Build succeeded.

protoc accepts spaces either on command line, if quoted as logged (I'm using the same executable path; Windows itself is ok with this):

c:\projects\grpc\examples\csharp\Helloworld\Greeter>"C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\tools\windows_x64\protoc.exe" --csharp_out=obj\Debug\netstandard1.5 --plugin="protoc-gen-grpc=C:\temp\hello world\.nuget\grpc.tools\1.18.0\tools\windows_x64\grpc_csharp_plugin.exe" --grpc_out=obj\Debug\netstandard1.5 --proto_path="C:\temp\hello world\.nuget\grpc.tools\1.18.0\build\native\include" --proto_path=..\..\..\protos --dependency_out=obj\Debug\netstandard1.5\50c9e7acc97344d0_helloworld.protodep ../../../protos/helloworld.proto
c:\projects\grpc\examples\csharp\Helloworld\Greeter>

Or in the response file

--csharp_out=obj\Debug\netstandard1.5
--plugin=protoc-gen-grpc=C:\temp\hello world\.nuget\grpc.tools\1.18.0\tools\windows_x64\grpc_csharp_plugin.exe
--grpc_out=obj\Debug\netstandard1.5
--proto_path=C:\temp\hello world\.nuget\grpc.tools\1.18.0\build\native\include
--proto_path=..\..\..\protos
--dependency_out=obj\Debug\netstandard1.5\50c9e7acc97344d0_helloworld.protodep
../../../protos/helloworld.proto

again saved as protoc.rsp:

c:\projects\grpc\examples\csharp\Helloworld\Greeter>"C:\temp\Dré Lot\.nuget\grpc.tools\1.18.0\tools\windows_x64\protoc.exe" @protoc.rsp
c:\projects\grpc\examples\csharp\Helloworld\Greeter>

TL;DR: This is a issue with non-ASCII character in paths, not spaces, and the bug is in protoc, not in gRPC tooling. I believe that fixing it to support Windows filenames properly would be difficult. @jtattermusch, what is your take on this, what do you think?

@jtattermusch
Copy link
Contributor

Looks like the problem with spaces might be at least partially fixed by #18470? (if the space is in the name of the '.proto' file itself, not the path)

@kkm000
Copy link
Contributor

kkm000 commented Mar 24, 2019

@jtattermusch, the problem is not with spaces, but rather with the way protoc's handles non-ASCII characters. This reproduces with protoc from command line. So no, this won't help, that's a different issue.

I have no idea how eager is the protoc team about filing off the corners of the Windows port. There must be an additional, OS-specific layer that does the conversion between OS an ind internal path representation. Besides path syntax, even encoding differs: Windows API use wchar_t representing codepoints (not UTF-16), while protoc uses UTF-8 MBC internally.

@jtattermusch jtattermusch changed the title Grpc.Tools: Accented characters or spaces in nuget path breaks c# grpc tool compiler Grpc.Tools: Accented characters in nuget path breaks c# grpc tool compiler Mar 25, 2019
@stale
Copy link

stale bot commented Sep 21, 2019

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 180 days. It will be closed automatically if no further update occurs in 1 day. Thank you for your contributions!

@stale stale bot closed this as completed Sep 23, 2019
@jtattermusch jtattermusch reopened this Sep 23, 2019
@stale stale bot removed the disposition/stale label Sep 23, 2019
@jtattermusch
Copy link
Contributor

Still an issue.

@joaoportela
Copy link

Still an issue for me too.

  1. Where should I report this issue to have it fixed? (since you guys mentioned that it might need to be fixed by a different team)
  2. What is the recommended workaround?

@jtattermusch
Copy link
Contributor

@kkm000 can you provide a simple repro of the protoc accented characters handling problem, so that we can followup with the protobuft team? Alternatively, feel free to file a protobuf issue directly.

@angel6708
Copy link

just shit!!!!!!! bugs fly everywhere!!

@Genteure
Copy link

Genteure commented Apr 1, 2020

As a workaround:
You can change where the global‑packages is located, link or move files to a path that doesn't contain non-english alphabet (e.g. D:\nuget_packages\packages\)

I would like this to get fixed though.

related:

@jtattermusch jtattermusch added the disposition/help wanted Maintainers do not have enough resources to allocate to this at the moment. Help is appreciated! label May 6, 2020
@jtattermusch
Copy link
Contributor

@Falco20019 would you be interested in picking this issue up, since you already have some experience with Grpc.Tools internals?

@Genteure
Copy link

Genteure commented May 6, 2020

image

It seems appveyor may not be a good choice for this kind of tests, they also have problem processing non-ascii characters. You may want to run a VM locally to test things.

@Falco20019
Copy link
Contributor

Falco20019 commented May 6, 2020

@jtattermusch The problem lies partly within protoc (non-UTF8 characters) and partly in how windows handles response file commands (which MSBuild uses). For my test, I renamed the grpc_csharp_plugin.exe to ágrpc_csharp_plugin.exe. I can execute the following command nicely on CMD directly but not through an RSP file:

test.rsp:

--plugin=protoc-gen-grpc=<path_to_protoc_in_nuget>\ágrpc_csharp_plugin.exe
--grpc_out=grpc_out
test.proto

Cmd:
"<userprofile>\.nuget\packages\grpc.tools\2.28.1\tools\windows_x64\protoc.exe" @test.rsp

As soon as a non-ASCII character is in the RSP file, it does fail. And UTF-16 characters also seem to fail in protoc on CMD additionally... So chinese usernames fail for this reason either way.

EDIT: I found a way to change how we encode the response file here:
https://github.com/grpc/grpc/blob/master/src/csharp/Grpc.Tools/ProtoCompile.cs#L375-L376

According to dotnet/msbuild#4904 this should work on MSBuild 16.5.

Sadly, protoc only accepts UTF-8 without BOM. All other encodings (including Unicode and UTF-32 LE/BE & BOM/no-BOM) fail. I don't understand why it's not accepting á as this should be covered by UTF-8. But I was also not able to get it working with UTF-8 on CMD manually.

@Falco20019
Copy link
Contributor

I thought about it some more tonight. I found the command „chcp“ to change the codepage. Powershell seems to support UTF-32, so maybe after setting it together with the encoding in our code, protoc because able to support those characters. I assume that the processor was just using system default encoding (1252) and tried to interpret the UTF-8 with it.

Will try it out later.

@Falco20019
Copy link
Contributor

@jtattermusch I think I have a clearer picture of the problem now. https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/compiler/command_line_interface.cc#L1326 reads the file using std::ifstream only which does not support multi-characters and BOM in files. This is also why it works on command line but not with command file.

We should use std::wifstream and make sure we also support BOM to get this fixed.

@jtattermusch
Copy link
Contributor

@Falco20019 interesting findings.
I'm not very familiar with MSBuild internals, but it seems that we are using ToolTask.GenerateResponseFileCommands (see https://docs.microsoft.com/en-us/dotnet/api/microsoft.build.utilities.tooltask.generateresponsefilecommands?view=netframework-4.8)

protected override string GenerateResponseFileCommands()
internally to generate the protoc command.

A quick documentation check also reveals that there is ToolTask.GenerateCommandLineCommands method https://docs.microsoft.com/en-us/dotnet/api/microsoft.build.utilities.tooltask.generatecommandlinecommands?view=netframework-4.8. Perhaps we could use it instead?

@Falco20019
Copy link
Contributor

@jtattermusch Should also be possible, but then we would need to do escaping (start/end of arguments) ourselves. I will give it a look.

@danielleiszen
Copy link

danielleiszen commented May 10, 2020

So if I give my full name when I install Windows and then I install VS 2019 I am going to have a global-packages folder with spaces - because I have a first and a last names delimited by spaces, and the default global packages folder is under my user folder. Then I create a grpcService project and try to build it, I will get this error. Because this is what I did.

Don't you think it would be worth at least documenting the workaround step by step (for a year old issue that has not been solved)? Or is it only me with a first name and last name who tries to use grpc?

@Falco20019
Copy link
Contributor

@danielleiszen I'm just trying out the approach using GenerateResponseFileCommands that @jtattermusch suggested.

@Falco20019
Copy link
Contributor

@danielleiszen It seems like this change will help with any UTF-8 chracters and also whitepaces, but will not fix problems for UTF-16 users. As of my current understanding of the protoc side, without changing the strings to wide-strings, it will not be possible to use these paths. I will also look into adding some notes about the workaround. Just waiting for some feedback on the PR first to make sure we want to have it in BUILD-INTEGRATIONS.md.

@joergbattermann
Copy link

Just ran into this problem as well with my username being unchangeable (and 'azuread\jörg' .. with an umlaut) as it is set via Azure AD.

@Falco20019
Copy link
Contributor

@joergbattermann #22916 should fix this for you once merged as german umlauts are fine with UTF-8. I will try and ping Jan on that. I might need to rebase the PR.

@JoaoPinheiro
Copy link

I just ran into this issue as well, either due to a space or the 'ã' character in my name from the windows user folder.

@Falco20019
Copy link
Contributor

@JoaoPinheiro Feel free to ping #22916 as it's implemented but not merged.

@krishna116
Copy link

When I run protoc, even I do not using the argument "--plugin=<key=value>",
the error exist too, please see follow picture for detial.
protoc-error

@Falco20019
Copy link
Contributor

@krishna116 The root cause is protocolbuffers/protobuf#7474 which needs to be solved on protoc.

@tonydnewell
Copy link
Contributor

Summarising the discussion for my benefit (and anyone else revisiting this):

  • non-ascii characters in the path names for the plugins to protoc cause problems - protoc can't find the executables
    • just plugin paths?
  • Grpc.Tools writes out a response file containing the arguments to pass to proto rather than passing each argument separately
    • there is a suggestion that this could be changed to pass the arguments separately but this still has issues:
      • seems only to work for some characters - won't handle all UTF-16 characters so is only a partial solution
      • a response file allows for very long arguments, e.g. long path names
    • the response file is written out as utf-8 without a byte-order-marker (BOM) - this is explicitly encoded as utf-8 in ProtoCompile.cs
  • there is a suggested workaround that involves the user creating a symlink with a ascii-friendly name
  • protoc should fix this (see #7474)

Some notes on a possible protoc fix. Added here incase anyone wants to contribute to the discussion before I experiment locally with a fix:

Looking at the protobuf code, the fix isn't changing from std::string to std::wstring and using std::wistream in CommandLineInterface::ExpandArgumentFile - in fact I think that is not the correct solution. Reading each line of the utf-8 encoded file into a std::string is probably correctly (each line is just a sequence of bytes). It is when the string is used that conversion from utf-8 to std::wstring needs to be done.

io_win32.cc seems to handle conversion from utf8 to std::wstring correctly (in as_windows_path()).

The problem seems to be when calling the plugins in subprocess.cc. Subprocess::Start() uses CreateProcessA instead of CreateProcessW and doesn't do the utf-8 to wstring conversion.

@jtattermusch
Copy link
Contributor

protocolbuffers/protobuf#10200 was merged, which should fix this problem once

  • the fix is released with the next version of protobuf
  • grpc's third_party/protobuf submodule is upgraded to the new protobuf version.

@jtattermusch
Copy link
Contributor

I'm going to close this issue now, let's update this thread once the fix is released as part of Grpc.Tools (see #17995 (comment) for details).

@kimjamia
Copy link

kimjamia commented Jan 3, 2023

I'm still experiencing this in Grpc.Tools 2.51.0. Is that expected?

@tonydnewell
Copy link
Contributor

@kimjamia A release hasn't yet happened that contains the fix.
@jtattermusch do we know when grpc's third_party/protobuf submodule will be updated?

@jtattermusch
Copy link
Contributor

#32136 upgrades our third_party/protobuf dependency to 21.12, which does contain the fix from protocolbuffers/protobuf#10200.
Which means the full fix will likely be in Grpc.Tools 2.53.x once it is released (which hopefully will happen over the next few weeks)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition/help wanted Maintainers do not have enough resources to allocate to this at the moment. Help is appreciated! kind/bug lang/C# priority/P2
Projects
None yet
Development

Successfully merging a pull request may close this issue.