Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: fix csv parser #9005

Merged
merged 11 commits into from Jan 15, 2019

Conversation

@imtbkcat
Copy link
Contributor

commented Jan 10, 2019

What problem does this PR solve?

TiDB will panic when encounter irregular csv format, like this.

\N
"\N",
"NULL",
NULL

Test SQL:

create table test (i int default null);
load data local infile 'abc.csv' into table test FIELDS TERMINATED BY ',' enclosed by '"';

This is because csv file above mixed enclosed words and unenclosed words.

another case will cause panic is shown below.

abc,123,
"def",456
hij,"789",

TiDB:

mysql> create table test (str varchar(10) default null, i int default null);
Query OK, 0 rows affected (0.01 sec)

mysql> load data local infile '~/test.csv' into table test FIELDS TERMINATED BY ',' enclosed by '"';
ERROR 2013 (HY000): Lost connection to MySQL server during query

MySQL:

mysql> create table test (str varchar(10) default null, i int default null);
Query OK, 0 rows affected (0.02 sec)

mysql> load data local infile '~/test.csv' into table test FIELDS TERMINATED BY ',' enclosed by '"';
Query OK, 3 rows affected (0.00 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from test;
+------+------+
| str  | i    |
+------+------+
| abc  |  123 |
| def  |  456 |
| hij  |  789 |
+------+------+
3 rows in set (0.00 sec)

What is changed and how it works?

Restructuring getFieldsFromLine function.

Check List

Tests

  • Unit test
  • Manual test

Code changes

  • Has exported function/method change

Side effects

  • Increased code complexity

Related changes

@imtbkcat

This comment has been minimized.

Copy link
Contributor Author

commented Jan 10, 2019

/run-all-tests

@codecov-io

This comment has been minimized.

Copy link

commented Jan 10, 2019

Codecov Report

Merging #9005 into master will increase coverage by 0.01%.
The diff coverage is 82.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9005      +/-   ##
==========================================
+ Coverage   67.16%   67.17%   +0.01%     
==========================================
  Files         371      371              
  Lines       76393    76486      +93     
==========================================
+ Hits        51311    51383      +72     
- Misses      20494    20511      +17     
- Partials     4588     4592       +4
Impacted Files Coverage Δ
executor/load_data.go 81.48% <82.72%> (-0.88%) ⬇️
executor/join.go 77.92% <0%> (-0.52%) ⬇️
executor/distsql.go 72.53% <0%> (-0.47%) ⬇️
executor/executor.go 67.08% <0%> (+0.14%) ⬆️
expression/schema.go 94.95% <0%> (+0.84%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dca815c...0f0b27b. Read the comment docs.

Copy link
Member

left a comment

It is better to add the test cases which this PR could fix.

sep = append(sep, e.FieldsInfo.Enclosed)
sep = append(sep, e.FieldsInfo.Terminated...)
sep = append(sep, e.FieldsInfo.Enclosed)
var (

This comment has been minimized.

Copy link
@jackysp

jackysp Jan 11, 2019

Member

There are so many lines in this function. Please split it into pieces.

@imtbkcat

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2019

/run-all-tests

1 similar comment
@imtbkcat

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2019

/run-all-tests

@imtbkcat

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2019

/run-unit-test

@zz-jason zz-jason requested review from XuHuaiyu and qw4990 Jan 14, 2019
Copy link
Contributor

left a comment

LGTM

fp, err = os.Create(path)
c.Assert(err, IsNil)
c.Assert(fp, NotNil)

This comment has been minimized.

Copy link
@tiancaiamao

tiancaiamao Jan 14, 2019

Contributor

You can os.Remove(path) here immediately.

This comment has been minimized.

Copy link
@imtbkcat

imtbkcat Jan 14, 2019

Author Contributor

The next test case still use this file.

}

func (w *fieldWriter) GetField() (bool, field) {
// the bool return value implies whether is at the end of line.

This comment has been minimized.

Copy link
@tiancaiamao

tiancaiamao Jan 14, 2019

Contributor

s/the/The

whether ?? is at the end of line

w.rollback()
}
}
for {

This comment has been minimized.

Copy link
@tiancaiamao

tiancaiamao Jan 14, 2019

Contributor

Now it's complex enough and more difficult to maintain.
If we meet some error next time, I'll consider use some more general method instead of hard written those things.

w.term = term
}

func (w *fieldWriter) rollback() {

This comment has been minimized.

Copy link
@jackysp

jackysp Jan 14, 2019

Member

It is better to change the function name.

@@ -198,6 +198,11 @@ func (s *testExecSuite) TestGetFieldsFromLine(c *C) {
`"\0\b\n\r\t\Z\\\ \c\'\""`,
[]string{string([]byte{0, '\b', '\n', '\r', '\t', 26, '\\', ' ', ' ', 'c', '\'', '"'})},
},
// Test mixed.
{
`"123",456,"\t7890",abcd`,

This comment has been minimized.

Copy link
@jackysp

jackysp Jan 14, 2019

Member

It is better to add more test cases to guarantee the behavior we support.

Copy link
Member

left a comment

rest LGTM

@imtbkcat imtbkcat force-pushed the imtbkcat:fixloaddata branch from fa7320d to 0f0b27b Jan 15, 2019
@imtbkcat

This comment has been minimized.

Copy link
Contributor Author

commented Jan 15, 2019

/run-all-tests

Copy link
Member

left a comment

LGTM

@jackysp jackysp merged commit 33b4c3e into pingcap:master Jan 15, 2019
4 of 5 checks passed
4 of 5 checks passed
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
ci/circleci Your tests passed on CircleCI!
Details
idc-jenkins-ci-tidb/build Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/check_dev Jenkins job succeeded.
Details
license/cla Contributor License Agreement is signed.
Details
imtbkcat added a commit to imtbkcat/tidb that referenced this pull request Apr 25, 2019
jackysp added a commit that referenced this pull request Apr 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.