
Cannot connect to gRPC speaking services from VMs #7567

Closed
MrMoose opened this issue Nov 8, 2021 · 6 comments
Labels: priority: p2, type: bug

Comments


MrMoose commented Nov 8, 2021

Does this issue affect the google-cloud-cpp project?
no

What component of google-cloud-cpp is this related to?
Pub/Sub, Logging, and presumably all gRPC-speaking services

Describe the bug
I have an Unreal-based C++ application using a binary Windows build (within https://github.com/MrMoose/CloudConnector) to connect to Google Cloud services, notably Pub/Sub, Logging, and Storage. When using the application from outside Google Cloud (various local Windows PCs), everything works fine. When using the same application on a Windows VM instance inside the cloud, only Storage works. Pub/Sub and Logging do not; none of their calls succeed, as far as we can tell.

I am getting errors suggesting network connectivity timeouts like this:
Retry policy exhausted in CreateSubscription: Empty update

After lengthy analysis and debugging, my hypothesis is that all gRPC-speaking services fail, while Storage (as far as I recall) uses HTTP(S). The reason could be SSL certificates that are missing on the cloud VM but for some reason present on our local machines, but I have no means to back this up.

We are sure the service role includes the necessary permissions.
We have tried using the gcloud command-line tool on the instance to create the subscription, and it works.

Operating system:
Windows Server 2019

What compiler and version are you using?
Visual Studio 2019

What version of google-cloud-cpp are you using?
1.32.1 in binary form included in https://github.com/MrMoose/CloudConnector

Additional context
I seem to remember seeing a similar problem in early tests of the Google Cloud C++ SDK long ago. Back then, gRPC wanted specific root certificates to be installed when used by this project. I am not sure if I ever dealt with that problem, but I have not seen it since. Now this problem seems to be related. Yet timeouts seem to be all we get, with no specific SSL-related errors. No idea if any of this helps.

MrMoose added the priority: p2 and type: bug labels Nov 8, 2021
Author

MrMoose commented Nov 8, 2021

I have more info about this. Meanwhile we were following my hunch and wrapped the code in a standalone impl to test this.
Turns out I was right and it all depends on an environment variable called GRPC_DEFAULT_SSL_ROOTS_FILE_PATH being set to a roots.pem certificate bundle which I happened to still have from my earlier attempts.

Copying this file to the cloud VM and setting the env seems to solve the issue. Yet my client obviously refrains from using certificates out of unknown sources ;-) So, is this a known situation?
Where would I officially get this file from?

Author

MrMoose commented Nov 8, 2021

Answered my own question and found this:
https://github.com/googleapis/google-cloud-cpp/blob/main/google/cloud/bigtable/examples/README.md

So it is indeed documented behavior. Still, considering the effort it took us to get to this point, I wonder whether there is a better solution or whether this environment setting should be featured more prominently.

Member

coryan commented Nov 8, 2021

> I have more info about this. Meanwhile we were following my hunch and wrapped the code in a standalone impl to test this. Turns out I was right and it all depends on an environment variable called GRPC_DEFAULT_SSL_ROOTS_FILE_PATH being set to a roots.pem certificate bundle which I happened to still have from my earlier attempts.

Ack.

> Copying this file to the cloud VM and setting the env seems to solve the issue. Yet my client obviously refrains from using certificates out of unknown sources ;-) So, is this a known situation?

Yes. We document this in the Windows-specific notes for our quick start guides:

https://github.com/googleapis/google-cloud-cpp/tree/main/google/cloud/pubsub/quickstart#windows

and it is documented on the gRPC site too:

https://grpc.io/docs/guides/auth/#supported-auth-mechanisms

> Where would I officially get this file from?

We use https://pki.google.com/roots.pem. I suspect that any certificate bundle from a reputable source (maybe Microsoft has one?) would work too.
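
For readers who would rather set the variable from inside the application than in the system environment, here is a minimal sketch in C++. It assumes a `roots.pem` bundle has already been downloaded and sits somewhere on disk. `SetEnvVar` and `ConfigureGrpcRoots` are hypothetical helper names, not part of google-cloud-cpp or gRPC. It also assumes the variable is set before gRPC first loads its root certificates, so the call should come before any client is created.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Environment variable named in this thread; gRPC reads it to locate
// the PEM bundle with its trusted root certificates.
constexpr char kGrpcRootsEnvVar[] = "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH";

// Hypothetical portable wrapper around the platform call that sets an
// environment variable. Returns true on success.
bool SetEnvVar(const std::string& name, const std::string& value) {
#ifdef _WIN32
  return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
  return setenv(name.c_str(), value.c_str(), /*overwrite=*/1) == 0;
#endif
}

// Hypothetical helper: points gRPC at `pem_path` if the file can be opened.
// Returns false and leaves the environment untouched otherwise, so a
// missing bundle is reported up front rather than as a later timeout.
bool ConfigureGrpcRoots(const std::string& pem_path) {
  std::ifstream pem(pem_path);
  if (!pem.is_open()) return false;
  return SetEnvVar(kGrpcRootsEnvVar, pem_path);
}
```

A caller would invoke `ConfigureGrpcRoots(...)` with its own bundle location at the very start of `main()`, before constructing any Pub/Sub or Logging client, and log a clear error if it returns `false`. The check only confirms that the file is readable; it does not validate the certificates themselves.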

Author

MrMoose commented Nov 8, 2021

Yes, figured this out as you responded, thank you.
I think we can close this one but having to solve this issue for the second time now I still think there should be a way to solve this more permanently than exposing it to the user.
After all, grpc might be a layer too low to give a damn but the google cloud sdk, as I can see it, will only work in the Google Cloud (on Windows) if it is set. Is there no way of baking this certificate into the library somehow?

Member

coryan commented Nov 8, 2021

> I think we can close this one but having to solve this issue for the second time now I still think there should be a way to solve this more permanently than exposing it to the user.

FWIW, I found my original bug report for gRPC:

grpc/grpc#16571

> After all, grpc might be a layer too low

gRPC is probably the right layer. I think the "Right Thing"™ would be for gRPC to use Schannel on Windows. Then the native bundle would work.

> but the google cloud sdk, as I can see it, will only work in the Google Cloud (on Windows) if it is set. Is there no way of baking this certificate into the library somehow?

We could do that, but I think it would break in even harder-to-debug ways. Certificate bundles change over time as root certificates expire, are refreshed, etc. If we had a bundle hard-coded, your initial builds would work, and one day the deployed code would stop working.

I think a better fix may be something like:

https://stackoverflow.com/questions/9507184/can-openssl-on-windows-use-the-system-certificate-store

Author

MrMoose commented Nov 8, 2021

Ah, yes. It's all coming back now. I remember this bug report of yours and how it led me to the solution.
Still, it is very unfortunate that the situation is left like this, especially since they didn't even merge a ready implementation for macOS, which will certainly discourage others from implementing this for Windows.

However, I agree with you that the certs should probably not be hardwired.
I will add a prominent note in CloudConnector, making users aware of this.
Thanks for your help! I'm closing this.

MrMoose closed this as completed Nov 8, 2021
MrMoose added a commit to MrMoose/CloudConnector that referenced this issue Nov 9, 2021