
[BUG] myloader of table with json column fails when mydumper uses --load-data #574

Closed
matthewlenz opened this issue Feb 10, 2022 · 34 comments · Fixed by #586, #972 or #974

@matthewlenz

matthewlenz commented Feb 10, 2022

MySQL bug or myloader bug?

** (myloader:12325): CRITICAL **: 11:09:25.761: Error occours between lines: 6 and 8 on file db_name.tbDoc.00000.sql.gz: Invalid JSON text: "Invalid value." at position 0 in value for column 'table_name.data'.

@matthewlenz
Author

matthewlenz commented Feb 10, 2022

I think the issue is this:

SELECT ... INTO OUTFILE escapes control characters inside strings, i.e. a backslash becomes a double backslash.
mydumper isn't escaping them.

That's really the only difference I can see when I compare a .dat file produced via both methods.
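The doubling described above can be sketched as follows. This is a toy illustration (not mydumper's or the server's actual code) of the default OUTFILE behavior per the MySQL manual: with FIELDS ESCAPED BY '\\', the escape character itself, the field terminator (tab), the line terminator (newline) and NUL bytes get a backslash prefix on output.

```python
def outfile_escape(value: str) -> str:
    """Escape one field value roughly the way SELECT ... INTO OUTFILE's defaults do."""
    out = []
    for ch in value:
        if ch == "\\":
            out.append("\\\\")      # backslash becomes double backslash
        elif ch == "\0":
            out.append("\\0")       # NUL is written as the two characters \0
        elif ch in ("\t", "\n"):
            out.append("\\" + ch)   # terminators are prefixed, not replaced
        else:
            out.append(ch)
    return "".join(out)
```

A .dat file produced without this escaping (as mydumper --load-data was doing) will be misread by LOAD DATA whenever the data contains backslashes, tabs, or newlines.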

@davidducos
Member

@matthewlenz, can you test using --fields-escaped-by with a different character?

@davidducos davidducos added the bug label Feb 10, 2022
@davidducos davidducos added this to the Release 0.12.1 milestone Feb 10, 2022
@matthewlenz
Author

@davidducos sorry, not sure what you are asking. Is there a reason why you wouldn't want mydumper's default .dat export to match what MySQL does by default?

@davidducos
Member

davidducos commented Feb 14, 2022

@matthewlenz I was not able to reproduce your issue. Can you provide a test case? And let us know what database version you are using.

@matthewlenz
Author

matthewlenz commented Feb 14, 2022

I found the REAL issue: mydumper --load-data dumps NULL fields as the literal string NULL, while SELECT ... INTO OUTFILE dumps NULL as \N. Edit: I'm running the newest Percona MySQL 8.0.26 release. One other thing, and I don't know if it matters, but I am running latin1 for everything on this db.
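The two formats can be contrasted with a toy example (hypothetical row, not mydumper internals): the word NULL in a .dat file is just a four-character string to LOAD DATA, while the \N marker is what it reads back as SQL NULL.

```python
def render_field(value, null_marker):
    """Render one field for a tab-separated LOAD DATA file."""
    return null_marker if value is None else str(value)

row = (1, "test2", None)
# What mydumper --load-data was writing: the word NULL.
mydumper_style = "\t".join(render_field(v, "NULL") for v in row)
# What SELECT ... INTO OUTFILE writes: the \N marker.
outfile_style = "\t".join(render_field(v, r"\N") for v in row)
```

For a JSON column, LOAD DATA then tries to parse the literal string "NULL" as JSON, which fails; for other types it silently stores the string or the column default instead of SQL NULL.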

@matthewlenz
Author

matthewlenz commented Feb 14, 2022

Actually, I just looked at the data in all the tables. It's just inserting null and table defaults into every record. hmm.

@davidducos
Member

It says that you have a problem in the data column at position 0, if I'm reading the error message correctly.

@matthewlenz
Author

Mydumper without --load-data + myloader = works
Mydumper --load-data + myloader = invalid data and/or outright failures to load various tables.

@davidducos
Member

Hi @matthewlenz, sorry to hear that, but can you provide a test case? Just one small table where this is failing will help me reproduce and fix MyDumper.

@matthewlenz
Author

Yeah, I can try. It doesn't seem to be consistent though, and it's driving me batty. Do you think it has anything to do with the server being set to character-set-server = latin1 and collation-server = latin1_swedish_ci?

@davidducos
Member

If data cannot be inserted, it should be a consistent issue. It might be related to character sets and collations, but I think that once we have a test case we will be able to find the solution.

@davidducos davidducos removed this from the Release 0.12.1 milestone Feb 15, 2022
@matthewlenz
Author

matthewlenz commented Feb 17, 2022

Well. This isn't fixed. Reopening while I do more debugging. I thought it was due to the compression of the .dat files (per the new bug) but it's not.

@matthewlenz matthewlenz reopened this Feb 17, 2022
@matthewlenz
Author

matthewlenz commented Feb 17, 2022

@davidducos see if this causes an issue with your test server:

create table `testjson` ( `ID` int unsigned NOT NULL AUTO_INCREMENT, `somevc` varchar(10) not null default '', `data` json DEFAULT NULL, PRIMARY KEY (`ID`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into `testjson` (`somevc`, `data`) values ('test1', '["arrval0"]'), ('test2', null), ('test3', '{ "keynam0": "keyval0" }');

mydumper  -o test_db -B test_db --load-data -T testjson

myloader -d test_db -s test_db

With mine I get:

** (myloader:23494): CRITICAL **: 10:19:07.029: Error occours between lines: 6 and 8 on file test_db.testjson.00000.sql: Invalid JSON text: "Invalid value." at position 0 in value for column 'testjson.data'.

EDIT:

If I do the following:

perl -pi -e's/NULL$/\\N/' test_db.testjson.00000.dat

then the load is successful.

@matthewlenz
Author

matthewlenz commented Feb 17, 2022

@davidducos even crazier: MySQL only seems to care whether JSON fields use \N instead of NULL:

create table `testjson` ( `ID` int unsigned NOT NULL AUTO_INCREMENT, `someint` int unsigned default null, `somevc` varchar(10) not null default '', `data` json DEFAULT NULL, PRIMARY KEY (`ID`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into `testjson` (`someint`, `somevc`, `data`) values (1, 'test1', '["arrval0"]'), (2, 'test2', null), (null, 'test3', '{ "keynam0": "keyval0" }');

Try myloader without and with:

perl -pi -e's/NULL$/\\N/' test_db.testjson.00000.dat

Keep in mind this only adjusts the json value. Not the int that defaults to null as well.

@davidducos
Member

Oh!!! So, when it is JSON it needs to be \N, and when it is another type it needs to be NULL... that is a simple fix... please confirm and I will do the fix later today.

@matthewlenz
Author

matthewlenz commented Feb 18, 2022

For LOAD DATA, NULL always needs to be \N. I overlooked this, but if you use the string NULL for any field you get inconsistent results.

If you use my example above and let that single 'someint' value remain as NULL in the .dat file, it gets set to 0 instead of NULL in the db.

perl -pi -e's/NULL/\\N/' test_db.testjson.00000.dat

Results in the correct import (note that I removed the end-of-line anchor $ from the regexp).
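The blanket perl substitution above rewrites the word NULL anywhere it appears, including inside string data. A field-aware pass (a hypothetical cleanup helper, assuming the default tab-separated, unenclosed format) only touches whole fields; though if a real string field legitimately contains the value "NULL", the file is simply ambiguous, which is exactly why OUTFILE uses the \N marker in the first place.

```python
def null_fields_to_marker(line: str) -> str:
    """Rewrite whole fields equal to the word NULL as the \\N marker.

    Assumes the default LOAD DATA format: tab-separated, unenclosed fields,
    newline-terminated lines.
    """
    fields = line.rstrip("\n").split("\t")
    return "\t".join(r"\N" if f == "NULL" else f for f in fields) + "\n"
```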

@davidducos
Member

@matthewlenz, Can you check branch https://github.com/mydumper/mydumper/tree/fix_574 ?

@matthewlenz
Author

Oooof. I just looked at a text field that allows NULL as well and it actually put the string "NULL" in the text field rather than setting it to NULL.

@davidducos davidducos linked a pull request Feb 18, 2022 that will close this issue
@davidducos davidducos added this to the Release 0.12.1 milestone Feb 18, 2022
@matthewlenz
Author

That looks right to me. Did you need me to try to build it from source?

@matthewlenz
Author

matthewlenz commented Feb 18, 2022

FYI, according to the MySQL documentation, these are the defaults that MySQL uses for LOAD DATA and SELECT INTO OUTFILE:

FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''

It also states that any escape characters must themselves be escaped for proper importing. That might explain why there are differences in escaping between the mydumper --load-data defaults and the MySQL SELECT INTO OUTFILE defaults.

EDIT: put in code block because it wasn't showing the double backslash on escaped by.
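A rough sketch of how LOAD DATA interprets the default ESCAPED BY '\\' on the way back in, per the MySQL manual (this is an illustration, not the server's parser): \N alone means SQL NULL, a few sequences map to control characters, and any other escaped character is taken literally.

```python
# Escape sequences LOAD DATA recognizes with the default ESCAPED BY '\\'.
_ESCAPES = {"0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a"}

def load_data_unescape(field: str):
    """Decode one field from a LOAD DATA file back to its value (or None)."""
    if field == r"\N":
        return None                             # \N reads back as SQL NULL
    out, i = [], 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            out.append(_ESCAPES.get(nxt, nxt))  # unknown escapes: literal char
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)
```

This round-trip view makes the requirement concrete: a writer that does not double its backslashes, or that writes the word NULL instead of \N, produces a file this reader cannot decode faithfully.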

@davidducos
Member

No worries, I already tested it. BTW, both the string NULL and \N work in my testing environment for non-JSON fields, so I don't get the same behavior as you: #574 (comment)

root@ubuntu-focal:~/issue_574/mydumper# cat dump/issue_574.history16.00000.dat
1	1	61	1	61	2021-12-22 00:55:41	10.00	NULL
2	1	61	1	61	2021-12-22 00:55:41	10.00	hhhhhhhhhhhhhhhhhhhhhh
3	1	61	1	61	2021-12-22 00:55:41	10.00	sssssssssssssssssss
4	1	61	1	61	2021-12-22 00:55:41	10.00	qqqqqqqqqqqqqqq
5	1	61	1	61	2021-12-22 00:55:41	10.00	NULL
6	1	61	1	61	2021-12-22 00:55:41	10.00	nnnnnnnnnnnnnnnnn
7	1	61	1	61	2021-12-22 00:55:41	10.00	bbbbbbbbbbbbbbbbbb
8	1	61	1	61	2021-12-22 00:55:41	10.00	yyyyyyyyyyyyy
9	1	61	1	61	2021-12-22 00:55:41	10.00	bbbbbbbbbbbbbb
10	1	61	1	61	2021-12-22 00:55:41	10.00	NULL
mysql> select * from history16 where h_data is NULL;
+--------+----------+----------+--------+--------+---------------------+----------+--------+
| h_c_id | h_c_d_id | h_c_w_id | h_d_id | h_w_id | h_date              | h_amount | h_data |
+--------+----------+----------+--------+--------+---------------------+----------+--------+
|      1 |        1 |       61 |      1 |     61 | 2021-12-22 00:55:41 |    10.00 | NULL   |
|      5 |        1 |       61 |      1 |     61 | 2021-12-22 00:55:41 |    10.00 | NULL   |
|     10 |        1 |       61 |      1 |     61 | 2021-12-22 00:55:41 |    10.00 | NULL   |
+--------+----------+----------+--------+--------+---------------------+----------+--------+

@matthewlenz
Author

matthewlenz commented Feb 18, 2022

That's really odd. Maybe it's the server character set thing, or a difference in MySQL versions (Percona MySQL 8.0.26-17 for me). Or maybe it's to do with allowing NULL vs defaulting to NULL? What was the schema on that table?

@davidducos
Copy link
Member

Oh god... I'm reviewing https://dev.mysql.com/doc/refman/8.0/en/load-data.html for NULL values, and there are lots of cases depending on the options of LOAD DATA. mydumper only appends those options if they are selected, but I'm not going to add the logic to determine the right value (NULL or \N) that represents NULL values.
So, what I'm going to do is ALWAYS add all the options to LOAD DATA and then set the single correct NULL value. What do you think @matthewlenz?

@matthewlenz
Author

matthewlenz commented Feb 18, 2022

Yeah, it's incredibly confusing. The details are spread across multiple pages as well. Honestly, I think the best solution is to use the same defaults as MySQL and make sure that mydumper/myloader (by default) matches the SELECT INTO OUTFILE defaults. The main differences I see right now are the NULL issue (which you are fixing) and how SELECT INTO OUTFILE defaults to \\ instead of \ for escape sequences inside strings. EDIT: lots of typos

@matthewlenz
Author

matthewlenz commented Dec 7, 2022

** (myloader:13321): CRITICAL **: 12:01:13.166: Thread 2: Error restoring: Cannot create a JSON value from a string with CHARACTER SET 'binary'.

** (myloader:13321): CRITICAL **: 12:01:13.166: Error occurs between lines: 4 and 5 on file db.table.00000.sql.gz: Cannot create a JSON value from a string with CHARACTER SET 'binary'.

@davidducos I thought we'd figured this one out but it appears not.

@matthewlenz
Author

matthewlenz commented Dec 7, 2022

The difference appears to be the quoting, maybe? Do you use the same escaping defaults as MySQL?

mydumper:
{\"event\": {\"during_trans_approve\": {\"td_id\": 11680829}}}

select * into outfile:
{"event": {"during_trans_approve": {"td_id": 11680829}}}

@matthewlenz
Author

Actually this doesn't appear to be related to this bug. You were tracking it on #437

@davidducos
Member

@matthewlenz is this resolved? Has #437 fixed it? Do you need more work on my side?

@matthewlenz
Author

I'll try to load some data that used --load-data tomorrow when I get into the office and report back.

@matthewlenz
Author

Still fails with mydumper 0.13.1-1, built against MySQL 5.7.40-43 with SSL support, with GZIP:

Cannot create a JSON value from a string with CHARACTER SET 'binary'.

@davidducos davidducos linked a pull request Jan 5, 2023 that will close this issue
@davidducos davidducos linked a pull request Jan 13, 2023 that will close this issue
@matthewlenz
Author

@davidducos thanks! I'll check it out next release and report back if I find any issues.

@davidducos
Member

@matthewlenz no need, I finally understood what the issue is... not sure how we are going to be able to sort it out.

@davidducos davidducos reopened this Jan 13, 2023
@matthewlenz
Author

matthewlenz commented Jan 13, 2023

What issue did you identify? Did mydumper always set names to binary when doing a dump, or was that something added more recently?

@davidducos davidducos linked a pull request Jan 13, 2023 that will close this issue
@davidducos
Member

@matthewlenz we need to do the same CONVERT that we do when we create the INSERT INTO statement. I already have a fix for it, actually, it was simpler than I thought.
