Inserting lots of data to Google BigQuery randomly throws an SSL Exception #10625
Hi @PureKrome, We've discussed this in the team, and we'd like to at least try to reproduce the problem for the sake of completeness, but we suspect we won't want to actually make any changes to how retries are performed... while it may be safe to automatically retry in your particular case, at the point where the network transport has problems we're in unusual territory, and in many cases retrying would be the wrong approach. Could you let us know:
I've tried to reproduce this in https://github.com/jskeet/google-cloud-dotnet/tree/issue-10625 - I've not seen the problem yet. My most recent attempt consisted of:
(The test took about 25 minutes from home.) If you'd be happy to run the repro code in your environment, the results would be interesting.
Thanks @jskeet for showing interest in this issue - I really really appreciate it. I'll try and answer as many of the questions as possible:
I have 7 tables I'm inserting data into. So I just spent the last 2 hours running my app to see when/where it fails and if a specific table is always the cause. Two different tables had the same error at random times. The first one was at rows 83k -> 84k. The 2nd one was at rows 2.433m -> 2.434m. I've also seen it happen at various/random places.
The errors occurred both from my home network and through the office network. Neither has anything special. Home has a pi-hole, but that's DNS. Office has nothing like that. I don't have anything special set up on my dev box here. E.g. no custom firewalls, etc.
Pretty random. Sometimes it's 5-10 mins, other times an hour+. It's usually never longer than an hour; tonight's run was unique in how far I got (from 80k -> 2.4m).
Test 1
Ok. So I've done this and it worked 100% from my home env/location. Here's some data to help explain what I did (so we are both on the same page with no assumptions):
and this was with a simple WIFI connection to home internet (25 up). I had to change the global.json to:
and this was how I did my creds.
Test 2
This time, I'm going to try and use the .. and this worked 💯 fine, like before.
Question:
How could I calculate how large each row is? With the 2x errors that occurred in the run above, …
cheers!
Thanks for the info. Will try with more columns next week, although it's hard to see how that would affect things. I may also try with a delay to deliberately run the test over a long time period without actually having to insert billions of rows.
Just roughly in terms of what the field types are, and how long any strings or byte arrays are - very roughly, as in are we talking about 100 bytes per row or 100K...
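(One rough way to make that estimate - a sketch, not anything the library provides, and the record type here is hypothetical - is to serialize a representative row and count the UTF-8 bytes:)

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical sample row - substitute your actual record type.
var sampleRow = new { Id = "abc123", Name = "example", CreatedOn = DateTime.UtcNow };

// UTF-8 byte count of the JSON payload gives a ballpark per-row size;
// the actual on-the-wire size of the insert request will differ somewhat.
int approxBytes = Encoding.UTF8.GetByteCount(JsonSerializer.Serialize(sampleRow));
Console.WriteLine($"~{approxBytes} bytes per row");
```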
👋🏻 G'Day @jskeet sorry for the slow/late reply. Here's some more info:
So out of the 10mil-odd db rows, here's some data for the first 20k rows. Feels like each poco is just over 200 bytes. The `record` has **21** properties.
Does this help?
You're using …
Kewl - which suggests it's not much data then, versus thousands and thousands of bytes per row. Does the "region" make any difference? Could there be some rogue SSL cert or server in the region I'm pushing the data up to?
The region may well be relevant, although I think the chances of a rogue cert on Google's side are slim. I'll definitely try with australia-southeast2 as well though.
Okay, next test - still in multi-region, US:
- Text fields per row (in addition to ID): 20

(Code is in the branch.) Now starting to run against a dataset in Australia (inserting from London).
:) Yep! I was really fishing hard with that idea 😬
And for a dataset in Australia:
- Text fields per row (in addition to ID): 20

So still no joy in reproducing at the moment :(
@PureKrome: I'm not sure what the next step is at this point. If you can run the tests (which are now checked in at https://github.com/googleapis/google-cloud-dotnet/tree/main/issues/Issue10625) some more to see if the problem is reproduced that way, that would be useful. (I'd suggest using the GOOGLE_APPLICATION_CREDENTIALS environment variable to refer to a JSON file with your service account rather than modifying the code.) I'm on holiday for a week now, so won't be able to run any more tests (although I'll still monitor this issue). Looking at the stack trace again, it's interesting that it's failing when establishing the connection... I'm not sure why that would happen during the run. Unless you're creating a new BigQueryClient for each batch of rows? If you're able to provide a similar standalone program that does reproduce the issue for you, that would be really helpful.
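(For reference, `BigQueryClient` picks up Application Default Credentials automatically, so with that environment variable set no credential code is needed; the project ID below is a placeholder:)

```csharp
// With GOOGLE_APPLICATION_CREDENTIALS pointing at the service account JSON,
// e.g. set in the shell before running:
//   GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json dotnet run
// Application Default Credentials are picked up with no credential code:
using Google.Cloud.BigQuery.V2;

var client = BigQueryClient.Create("my-project-id"); // hypothetical project ID
```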
👋🏻 G'Day Jon!
YES! This was what I was doing, actually! These lines:
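(The actual lines aren't shown here; this is a hypothetical reconstruction of the per-batch pattern described just below, with illustrative names:)

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Google.Cloud.BigQuery.V2;

public class BigQueryUploader
{
    // Hypothetical per-batch method: a brand new client (and table reference)
    // was created for every 1,000-row batch.
    public async Task UploadBatchAsync(IEnumerable<BigQueryInsertRow> rows)
    {
        // Each client owns its own HttpClient, so every call here opens a
        // fresh set of network connections.
        var client = await BigQueryClient.CreateAsync("my-project-id");
        var table = client.GetTable("my_dataset", "my_table");
        await table.InsertRowsAsync(rows);
    }
}
```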
So as you can see here, each time we did a batch of 1000 rows to upload to BQ, I would create the client AND the table references! So now I've changed the code to be something like this:
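(A sketch of that refactored shape, again with illustrative names:)

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Google.Cloud.BigQuery.V2;

public class BigQueryUploader
{
    private readonly BigQueryClient _client;

    public BigQueryUploader(string projectId)
    {
        // Created once; the underlying HttpClient (and its connections) is
        // reused for the lifetime of this singleton.
        _client = BigQueryClient.Create(projectId);
    }

    public Task UploadBatchAsync(IEnumerable<BigQueryInsertRow> rows) =>
        _client.InsertRowsAsync("my_dataset", "my_table", rows);
}
```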
and the instance of this class is a singleton / single instance, in my main app. So .. given this information ... I'm still trying to understand the relationship between the error (which the stack trace suggests happens while establishing the connection) and creating a new client per batch. I'll also give your code another run, as requested. I'll report back.
Each client has its own HttpClient, which would create a separate set of network connections. I suspect you're running out of TCP ports or similar. I strongly suspect this will clear up the problem. I might try making a change to the repro to create a new client for each batch and see if that fixes the problem - then I could document it accordingly.
I tried my code with the client initialization within the loop, and it was still working fine - but I didn't let it do a huge run due to other constraints. It's definitely worth changing your code to only create a single client though. Table references are simple, by contrast - no need to worry about caching/reusing those.
Hi @jskeet - I really hope you're having a rest on your break, so maybe read this reply when you get back 🏖️ 🌞 When I refactored to a single client, I never got the issue again. So it feels like this is not really an issue about INSERTING data - it's really about creating heaps of clients. Maybe something like the sketch below might be used to see if this is an issue with client creation?
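(A hypothetical sketch of such a client-churn repro - project, dataset, table, and field names are all placeholders:)

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Google.Cloud.BigQuery.V2;

class Program
{
    static async Task Main()
    {
        for (int batch = 0; batch < 10_000; batch++)
        {
            // Deliberately create a fresh client per iteration, mirroring
            // the original code's behaviour.
            var client = await BigQueryClient.CreateAsync("my-project-id");

            var rows = Enumerable.Range(0, 1000)
                .Select(i => new BigQueryInsertRow { { "id", $"{batch}-{i}" } });
            await client.InsertRowsAsync("my_dataset", "my_table", rows);

            if (batch % 100 == 0)
            {
                Console.WriteLine($"Completed batch {batch}");
            }
        }
    }
}
```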
Not really answering... but I should check when I'm back whether we cache a service for default credentials, which would make my current test code unsuitable. At the very least, using the same credential for each client will be a bit different. And we don't use self-signed JWTs for BigQuery. |
Hmm... I still can't reproduce this when inserting, even with loads of clients each with a new credential. But it may well depend on precise environmental factors at that point. I'm really glad to hear that using a single client seems to have resolved the problem - but I'll leave this issue open and have another go at reproducing it next week with the code you've suggested above. Just for clarity, does that code reproduce the issues on your machine?
Sorry - I don't understand: which code do you mean causes the SSL issue? I also might see if I can reproduce it by packaging up my original code (1x client per upload-to-bigquery) into a single exe and then running it from some VM in Azure. This way I can see if it's a weird issue from my machine, isolating some hardware out of the equation.
Yup. I haven't been able to reproduce that anywhere yet. Using a single client is a better idea anyway, but at the moment I don't know whether it's really related to this issue.
@PureKrome: Have you seen enough data now with a single client to know whether this has actually fixed the issue? While I still haven't reproduced the problem myself, I can imagine it could be related, which would be good to know in case we see other users with the same issue. It's a shame that I can't reproduce it, but if your code now works, I'd like to close this issue. If you're still seeing the error, we should absolutely continue to pursue it...
👋🏻 G'Day @jskeet - ok, interesting! I tried again running my code on an Azure VM and it didn't seem to error at all. Ran it on my localhost and yeah .. took a while, but I got the error popping up again. So I think it's fair to say one (or more) of the following:
I was expecting the error to pop up when I did it on the Azure VM - but it didn't. I feel really bad about this as I honestly thought it might be a rogue setting somewhere outside of my little bubble. Serious lessons learnt here. I'm going to close this card. Again, Jon - thank you so much for showing some interest in this post. Keep up the awesome work. Thanks, mate.
Please don't feel bad about this at all - I thought it was a great collaboration. Okay, we didn't find anything in the end - but we still ended up with a code improvement for you, and I'll definitely remember this error if it comes up for another customer.
Thanks Jon for the kind words. Yep - an action item here was definitely how to reuse the Client instead of creating a new one for each item to save. So there was some good that came out of this. Thank you again for all your patience and great questions in trying to resolve this. Really appreciated it. 👏🏻 🍰 👍🏻
👋🏻 G'Day!
Problem
An SSL error is randomly thrown while importing a lot of data into Google BigQuery.
Details
I've got a (once-off) maintenance task which is trying to insert about 10mil rows into a Google BigQuery table. The code is nothing too crazy, because this is a once-off import.
Randomly (well, it feels random) an SSL error is thrown, which crashes my app.
I've added in some retrying ability (using Polly for .NET) and we're back on track.
I'm not really sure how to reproduce it, but I've provided some info here that may help. It's happened after 30/40k rows have been pushed up; other times, after hundreds of thousands. With Polly it retries and works again/continues ... until the next random failure, which Polly retries OK, and we rinse/repeat.
Environment details
Here's some sample code I've got which is doing this:
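(The actual sample isn't reproduced here; a hedged sketch of the general shape described above - batched inserts wrapped in a Polly retry and, as emerges in the thread above, a new client per batch - with placeholder names:)

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using Google.Cloud.BigQuery.V2;
using Polly;

class Importer
{
    static async Task ImportAsync(IEnumerable<IReadOnlyList<BigQueryInsertRow>> batches)
    {
        // Retry transient transport failures (e.g. the HttpRequestException
        // wrapping the SSL error) a few times with a short back-off.
        var retry = Policy
            .Handle<HttpRequestException>()
            .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(attempt * 2));

        foreach (var batch in batches)
        {
            await retry.ExecuteAsync(async () =>
            {
                // A new client per batch - the pattern later identified in
                // the thread as the likely culprit.
                var client = await BigQueryClient.CreateAsync("my-project-id");
                await client.InsertRowsAsync("my_dataset", "my_table", batch);
            });
        }
    }
}
```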