MySqlBulkCopy failed when data is too big (over 80,000 rows) #780

Closed
szmcdull opened this issue Mar 20, 2020 · 16 comments

@szmcdull

MySql.Data.MySqlClient.MySqlException (0x80004005): Failed to read the result set.
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
   at MySqlConnector.Protocol.Serialization.SocketByteHandler.DoWriteBytesAsync(ReadOnlyMemory`1 data) in C:\projects\mysqlconnector\src\MySqlConnector\Protocol\Serialization\SocketByteHandler.cs:line 133
   at MySqlConnector.Protocol.Serialization.ProtocolUtility.<WritePacketAsync>g__WritePacketAsyncAwaited|8_0(ValueTask`1 task_, Byte[] buffer_) in C:\projects\mysqlconnector\src\MySqlConnector\Protocol\Serialization\ProtocolUtility.cs:line 548
   at MySqlConnector.Core.ServerSession.SendReplyAsyncAwaited(ValueTask`1 task) in C:\projects\mysqlconnector\src\MySqlConnector\Core\ServerSession.cs:line 802
   at MySqlConnector.Core.ResultSet.ReadResultSetHeaderAsync(IOBehavior ioBehavior) in C:\projects\mysqlconnector\src\MySqlConnector\Core\ResultSet.cs:line 118
   at MySql.Data.MySqlClient.MySqlDataReader.ActivateResultSet() in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlDataReader.cs:line 130
   at MySql.Data.MySqlClient.MySqlDataReader.CreateAsync(CommandListPosition commandListPosition, ICommandPayloadCreator payloadCreator, IDictionary`2 cachedProcedures, IMySqlCommand command, CommandBehavior behavior, IOBehavior ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlDataReader.cs:line 391
   at MySqlConnector.Core.CommandExecutor.ExecuteReaderAsync(IReadOnlyList`1 commands, ICommandPayloadCreator payloadCreator, CommandBehavior behavior, IOBehavior ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\Core\CommandExecutor.cs:line 62
   at MySql.Data.MySqlClient.MySqlCommand.ExecuteNonQueryAsync(IOBehavior ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlCommand.cs:line 220
   at MySql.Data.MySqlClient.MySqlBulkLoader.LoadAsync(IOBehavior ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlBulkLoader.cs:line 116
   at MySql.Data.MySqlClient.MySqlBulkCopy.WriteToServerAsync(IOBehavior ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlBulkCopy.cs:line 167
   at MySql.Data.MySqlClient.MySqlBulkCopy.WriteToServerAsync(IDataReader dataReader, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlBulkCopy.cs:line 99
   at KlineImport.Program.Test() in C:\code\quant\KlineImport\Program.cs:line 252
   at KlineImport.Program.Main2(String[] args) in C:\code\quant\KlineImport\Program.cs:line 273

My code:

            using (var conn = await Common.GetDBAsync())
            {
                var kline = new KlineData("usdt_eth", 1, true);

                using (var stream = File.OpenRead(kline.path))
                using (var tx = new StreamReader(stream))
                using (var reader = new CsvHelper.CsvReader(tx, new CsvConfiguration(CultureInfo.InvariantCulture)))
                using (var rd = new CsvDataReader(reader))
                {
                    //var headers = reader.Context.HeaderRecord;

                    var bcp = new MySqlBulkCopy((MySqlConnection)conn)
                    {
                        DestinationTableName = kline.tableName
                    };
                    //Assume the file headers and table fields have the same names
                    //foreach (var header in headers)
                    //{
                    //    bcp.ColumnMappings.Add(header, header);
                    //}

                    //var i = 0;
                    //while (rd.Read())
                    //{
                    //    i++;
                    //}
                    //Log.Information($"{i} line read");

                    await bcp.WriteToServerAsync(rd);
                }
            }

If I uncomment the rd.Read() loop, the log reads "80473 line read", so the reader is fine. I also tried removing parts of the data from the file to see whether some ill-formed data was causing the error. It turns out the error occurs whenever the file is too big (somewhere around 32,765 rows, I guess), no matter which part of the file is removed.

@bgrainger (Member) commented Mar 20, 2020

It looks like you might be starting with a CSV file; are you able to load it successfully with MySqlBulkLoader?

@szmcdull (Author)

@bgrainger I can successfully import the CSV using mysqlimport

@bgrainger (Member)

I was curious whether a different C#-based approach, MySqlBulkLoader instead of MySqlBulkCopy, works. (MySqlBulkLoader is the "traditional" API that has been implemented for a long time; MySqlBulkCopy is the experimental new API that can stream an IDataReader to MySQL Server.)

@szmcdull (Author)

Oh, I didn't know there was a MySqlBulkLoader, since it is not on your doc site. I found it on dev.mysql.com, applied it to MySqlConnector, and it works.

                    var bcp = new MySqlBulkLoader((MySqlConnection)conn)
                    {
                        TableName = kline.Value.tableName,
                        FieldTerminator = ",",
                        NumberOfLinesToSkip = 1,
                        FileName = kline.Value.path,
                        ConflictOption = MySqlBulkLoaderConflictOption.Ignore,
                    };
                    await bcp.LoadAsync();

@bgrainger (Member)

Thanks for confirming that MySqlBulkLoader works but MySqlBulkCopy doesn't. They mostly use the same underlying code, so it should be possible to investigate and determine why only one succeeds.

I've opened #781 to improve the docs.

bgrainger added the bug label Mar 21, 2020

@bgrainger (Member)

I can't reproduce it yet (with an 82 MB file containing 1.2M rows).

@chester89

I can reproduce it with 30k rows (yes, my table has many columns). I'm using MySQL 5.7.29 with the 0.66.0 driver.

@bgrainger (Member)

I still haven't been able to reproduce the problem. Could anyone who can reproduce this get a Wireshark packet capture (ideally of a large batch that's still small enough to work, and then a batch that's large enough to fail)? CC @MaceWindu

@MaceWindu (Contributor) commented May 30, 2020

Finally found time to check this. I've updated to the recent 8.0.20.0 version and the issue still persists, so it is not version-specific. I'll see if I can grab netflow logs.

Update: I think it is a MySQL issue, not a provider issue.

Update 2: Interestingly, it probably also depends on the target table size: at some batch sizes, it fails not on the first batch but on the second (of the same size).

@bgrainger (Member)

> Update: I think it is a MySQL issue, not a provider issue.

It does seem like MySQL is closing the connection, but there are reports that MySqlBulkLoader works in situations where MySqlBulkCopy doesn't, so it still seems possible there's something that could be changed on the client side to improve robustness.

@MaceWindu (Contributor)

Wireshark logs for 0.69.2.
Test table: https://github.com/linq2db/linq2db/blob/ef947d30b52f7f0012b86ec009c24602efa05ac5/Data/Create%20Scripts/MySql.sql#L329

Inserting 20,060 records always works; 20,061 records always fails.
MySQL version: 8.0.20.0 (Windows)

wireshark.zip

@bgrainger (Member)

The conversation used TLS, so I can't really see anything in the packet capture other than MySQL Server sending RST packets to the client in the middle of the conversation.

Oddly, the successful conversation has 3,096 client packets and 27 server packets; the failing conversation is much shorter: only 133 client packets and 15 server packets. So even though the client intends to send more data, it doesn't get to send most of it.

Is there somehow malformed data in an early packet that makes the server close the connection? Are there any errors logged in MySQL Server when this happens? Can you send me the server's SSL private key (you can email it to [GitHub user name] at gmail) so I can try to decrypt the conversation? Or can you repeat the packet capture, but with SslMode=None?
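
(For anyone capturing an unencrypted trace, here is a minimal sketch of a connection with TLS disabled, assumed to run inside an async method; the server, credentials, and database are placeholders, not values from this issue:)

    // Placeholder server/credentials for a local repro with TLS disabled,
    // so the MySQL protocol is visible in Wireshark.
    var csb = new MySqlConnectionStringBuilder
    {
        Server = "localhost",
        UserID = "test",
        Password = "test",
        Database = "testdb",
        SslMode = MySqlSslMode.None,
    };
    using var connection = new MySqlConnection(csb.ConnectionString);
    await connection.OpenAsync();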

@MaceWindu (Contributor)

Whoops, I actually captured both (with SSL and without SSL) but attached the wrong one. I'll send the proper logs a bit later when I reach my machine.

@MaceWindu (Contributor)

Packet dump without SSL: wireshark-nossl.zip

I don't see anything in the MySQL logs, but after restarting the MySQL server I found that the limit of accepted records changed (it increased, but I didn't check to what number).

@bgrainger (Member)

Oh, the error is very obvious now:

Got a packet bigger than 'max_allowed_packet' bytes

The failing packet capture is sending a packet that's 0x400062 (4,194,402) bytes, so you must be using the MySQL 5.7 default of 4 MB for max_allowed_packet: 4 MB is 4,194,304 bytes, so that packet overshoots the limit by 98 bytes. Going from 20,060 to 20,061 rows must have just crossed the 4 MB boundary.

(That also explains why MySqlBulkLoader doesn't have this problem: it breaks uploaded CSV files into 1 MB chunks.)

To fix this, the packet size that MySqlBulkCopy sends (which also happens to be the limit on the largest row it can send) could be reduced to 1 MB or 2 MB; or, it could query @@max_allowed_packet before starting the copy and reduce the maximum size dynamically.
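
For illustration, a minimal sketch of the dynamic approach, assuming an open MySqlConnection named connection inside an async method (this is not the connector's actual internal code; the headroom and ceiling values are assumptions):

    // Query the server's limit before bulk-copying, and cap the payload
    // size accordingly. Assumed workaround, not MySqlConnector's fix.
    using var cmd = new MySqlCommand("SELECT @@max_allowed_packet;", connection);
    var maxAllowedPacket = Convert.ToInt64(await cmd.ExecuteScalarAsync());

    // Leave headroom below the server limit for protocol overhead;
    // the 1024-byte margin and 16 MB ceiling are illustrative choices.
    var maxPayloadSize = Math.Min(maxAllowedPacket - 1024, 16L * 1024 * 1024);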

@bgrainger (Member)

Fixed in 0.69.3 by feea06b.

Opened #834 to improve the feature.
