lightning/parser: align NULL and ESCAPED BY with LOAD DATA #40909

Merged
merged 11 commits into from Feb 13, 2023

Conversation

Contributor

@lance6716 lance6716 commented Jan 31, 2023

Signed-off-by: lance6716 <lance6716@gmail.com>

What problem does this PR solve?

Issue Number: ref #40499

Problem Summary:

What is changed and how it works?

  • ESCAPED BY: add a new configuration item so the escape character can be changed from the default \
  • NULL: the full behaviour depends on cooperation with the caller. If ESCAPED BY is set (for example \), \N should be treated as NULL. If ENCLOSED BY is set (for example "), an unenclosed NULL in the data file should be treated as NULL. The code is modified so that the caller sets \N, \NULL in the configuration.

Also fix a bug in ReadUntilTerminator.
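
The two NULL rules in the list above can be sketched roughly as follows. This is only an illustration, not the code in this PR: nullLiterals and isNull are hypothetical helpers, and the sketch assumes that an enclosed field is never NULL (the description does not say how an enclosed \N should be treated).

```go
package main

import "fmt"

// nullLiterals builds the raw field values that the caller would configure
// as NULL, following the two rules in the description.
// Hypothetical helper, not a function from this PR.
func nullLiterals(escapedBy, enclosedBy string) []string {
	var lits []string
	if escapedBy != "" {
		lits = append(lits, escapedBy+"N") // e.g. \N when ESCAPED BY is \
	}
	if enclosedBy != "" {
		lits = append(lits, "NULL") // the bare, unenclosed word NULL
	}
	return lits
}

// isNull reports whether a raw field is NULL.
// Assumption: an enclosed (quoted) field is never NULL, so only
// unenclosed fields are compared against the configured literals.
func isNull(raw string, enclosed bool, lits []string) bool {
	if enclosed {
		return false
	}
	for _, lit := range lits {
		if raw == lit {
			return true
		}
	}
	return false
}

func main() {
	lits := nullLiterals(`\`, `"`)
	fmt.Println(isNull(`\N`, false, lits))   // unenclosed \N
	fmt.Println(isNull("NULL", false, lits)) // unenclosed NULL
	fmt.Println(isNull("NULL", true, lits))  // enclosed "NULL"
}
```

Keeping the literal list on the caller side means the parser itself only compares raw fields against whatever it was given, which matches the "caller should set" wording above.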

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: lance6716 <lance6716@gmail.com>
@ti-chi-bot
Member

ti-chi-bot commented Jan 31, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • buchuitoudegou
  • gozssky

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 31, 2023
@lance6716
Contributor Author

/cc @gozssky @dsdashun

Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
@lance6716
Contributor Author

/retest

Signed-off-by: lance6716 <lance6716@gmail.com>
 		},
 	}

 	testCases := []testCase{
 		{
 			input:    `\\`,
-			expected: [][]types.Datum{{nullDatum}},
+			expected: [][]types.Datum{{types.NewStringDatum("")}},
Contributor Author

Note that the behaviour has changed: an empty field inside the delimiters is no longer NULL.

Contributor Author

Oh, I just noticed we have explicitly stated this behaviour in the docs:

Quoting does not affect whether a field is null.

https://docs.pingcap.com/tidb/stable/tidb-lightning-data-source#not-null-and-null

I should add a hidden configuration for LOAD DATA 😂

Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
@lance6716
Contributor Author

/run-integration-br-tests

@lance6716
Contributor Author

/cc @buchuitoudegou @lichunzhu

@@ -303,20 +303,22 @@ func (parser *blockParser) readBlock() error {
 	}
 }

-var unescapeRegexp = regexp.MustCompile(`(?s)\\.`)
+var chunkParserUnescapeRegexp = regexp.MustCompile(`(?s)\\.`)
Contributor

Almost LGTM. I don't quite understand why this regexp has been preserved and will review it later.
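
For readers unfamiliar with the pattern under discussion: (?s)\\. matches a backslash followed by any single character, and the (?s) flag lets the dot also match a newline. A minimal sketch of how such a regexp can drive unescaping is below. The unescape function and its mapping table are assumptions for illustration, not the code in this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the diff: a backslash followed by any one character.
// (?s) makes "." match a newline as well.
var unescapeRe = regexp.MustCompile(`(?s)\\.`)

// unescape replaces each backslash sequence with the character it stands for.
// The mapping below is an illustrative assumption, not the PR's table.
func unescape(s string) string {
	return unescapeRe.ReplaceAllStringFunc(s, func(m string) string {
		// m is always a backslash plus exactly one character.
		switch m[1] {
		case 'n':
			return "\n"
		case 't':
			return "\t"
		case 'r':
			return "\r"
		case '0':
			return "\x00"
		default:
			return m[1:] // unknown escape: drop the backslash, keep the character
		}
	})
}

func main() {
	fmt.Printf("%q\n", unescape(`a\tb`))
	fmt.Printf("%q\n", unescape(`say \"hi\"`))
	fmt.Printf("%q\n", unescape(`C:\\dir`))
}
```

Because matches are found left to right without overlapping, a doubled backslash is consumed as one unit and the character after it is left alone.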

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 12, 2023
@ti-chi-bot
Member

@lance6716: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lance6716
Contributor Author

ping

	csv.EscapedBy = `\`
}
if !csv.BackslashEscape && csv.EscapedBy == `\` {
	csv.EscapedBy = ""
Contributor

Why is csv.EscapedBy set to empty if it is set in the config but backslashEscape is false?

Contributor Author

Currently the default value of EscapedBy is \, and BackslashEscape is hidden with a default value of true. So if BackslashEscape is false, this must be an old-format configuration file that explicitly set BackslashEscape to false, and we should disable escaping.

Contributor

lgtm
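
The compatibility rule discussed in this thread can be sketched in isolation as below. The csvConfig type, defaultCSVConfig and adjust are hypothetical stand-ins that model only the two fields mentioned here and only the branch visible in the diff; the real configuration struct has more fields and more checks.

```go
package main

import "fmt"

// csvConfig is a hypothetical stand-in with only the two fields discussed.
type csvConfig struct {
	BackslashEscape bool   // legacy, hidden option; default true
	EscapedBy       string // new option; default `\`
}

// defaultCSVConfig returns the defaults described in the thread.
func defaultCSVConfig() csvConfig {
	return csvConfig{BackslashEscape: true, EscapedBy: `\`}
}

// adjust applies the rule from the diff: a false BackslashEscape combined
// with the default EscapedBy is read as an old-format file that wants
// escaping disabled.
func (c *csvConfig) adjust() {
	if !c.BackslashEscape && c.EscapedBy == `\` {
		c.EscapedBy = ""
	}
}

func main() {
	old := defaultCSVConfig()
	old.BackslashEscape = false // old-format file turned escaping off
	old.adjust()
	fmt.Printf("EscapedBy=%q\n", old.EscapedBy)

	custom := defaultCSVConfig()
	custom.BackslashEscape = false
	custom.EscapedBy = "|" // a non-default escape character is kept
	custom.adjust()
	fmt.Printf("EscapedBy=%q\n", custom.EscapedBy)
}
```

One consequence of this design, under the assumptions above, is that a file setting EscapedBy to \ explicitly while also setting BackslashEscape to false cannot be told apart from the old-format case, which is the situation the reviewer asked about.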

br/pkg/lightning/config/config.go (outdated, resolved)
br/pkg/lightning/mydump/csv_parser.go (resolved)
br/pkg/lightning/mydump/csv_parser.go (resolved)
Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 13, 2023
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 13, 2023
@lance6716
Contributor Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 79f2d6b

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 13, 2023
@lance6716 lance6716 removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2023
@lance6716
Contributor Author

/merge

@lance6716
Contributor Author

/retest

@lance6716
Contributor Author

/test all

@lance6716
Contributor Author

/retest

@ti-chi-bot ti-chi-bot merged commit 55c8358 into pingcap:master Feb 13, 2023
blacktear23 pushed a commit to blacktear23/tidb that referenced this pull request Feb 15, 2023
Labels
release-note-none size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

4 participants