lightning: not set pos to end if fail to ReadUntil #40232

Merged
merged 14 commits into pingcap:master on Jan 12, 2023

Contributor

@buchuitoudegou buchuitoudegou commented Dec 29, 2022

What problem does this PR solve?

Issue Number: close #40034

Problem Summary:

What is changed and how it works?

  • Do not set pos to the end of the input when readUntil fails

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Member

ti-chi-bot commented Dec 29, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • gozssky
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot added the release-note-none, do-not-merge/work-in-progress, do-not-merge/needs-triage-completed, and size/XXL labels on Dec 29, 2022
@buchuitoudegou
Contributor Author

/run-integration-br-test

@buchuitoudegou
Contributor Author

/component lightning

@ti-chi-bot added the size/XS label and removed the size/XXL label on Dec 29, 2022
@pingcap deleted a comment from ti-chi-bot on Dec 29, 2022
@ti-chi-bot added the size/S label and removed the size/XS label on Dec 29, 2022
@buchuitoudegou
Contributor Author

/run-integration-br-test

@@ -341,7 +342,7 @@ func (parser *CSVParser) readUntil(chars *byteSet) ([]byte, byte, error) {
 			if err == nil {
 				err = io.EOF
 			}
-			parser.pos += int64(len(buf))
+			// parser.pos += int64(len(buf))
Contributor

This changed the semantics of pos. pos means the current file offset. Since we have read to the end, the pos should be set to the end offset.

Contributor Author

just a test. I didn't change it here.

Contributor Author

This requirement comes from import on cloud. We are trying to use the offset reported in the syntax error to show the relevant content of the user's file. However, in an unterminated quote error, the offset always exceeds the range of the file. So I plan to reset the position to the previous one.

If we hit an error, resetting pos seems safe because we will never use it again.
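
To make the idea concrete, here is a minimal sketch of "remember the previous position and go back to it on failure". It is not the Lightning code: miniParser, readQuoted, and errUnterminated are hypothetical names made up for this illustration, and the sketch ignores escaped quotes.

```go
package main

import (
	"bytes"
	"errors"
)

// errUnterminated is a hypothetical error used only in this sketch.
var errUnterminated = errors.New("syntax error: unterminated quoted field")

// miniParser is a hypothetical stand-in for a CSV parser that tracks
// its current byte offset in pos.
type miniParser struct {
	data []byte
	pos  int64
}

// readQuoted expects pos to point at an opening quote and consumes
// through the matching closing quote. If the input ends first, it
// restores pos to where the field began, so the reported offset points
// at the opening quote instead of the end of the input.
// Escaped quotes are not handled (assumption: kept out for brevity).
func (p *miniParser) readQuoted() ([]byte, error) {
	if p.pos >= int64(len(p.data)) || p.data[p.pos] != '"' {
		return nil, errors.New("expected an opening quote at pos")
	}
	prevPos := p.pos // remember where this field started
	p.pos++          // skip the opening quote
	idx := bytes.IndexByte(p.data[p.pos:], '"')
	if idx < 0 {
		p.pos = prevPos // go back instead of leaving pos at EOF
		return nil, errUnterminated
	}
	field := p.data[p.pos : p.pos+int64(idx)]
	p.pos += int64(idx) + 1 // step past the closing quote
	return field, nil
}
```

With the input a,b,c,"xxxxx and pos at the quote, readQuoted fails and pos still points at the quote, which is the offset a user would want to see.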

Contributor

> However, in an unterminated quote error, the offset always exceeds the range of the file.

Why does offset exceed the range of the file? The end offset is also a valid offset.

It's hard to tell which pos is the starting point of syntax error. Suppose we have a line of data: a,b,c,"xxxxx
It obviously has an unterminated quote error. But what is the right pos for the syntax error? Is it the start pos of "xxxxx or the end pos of this line? In my opinion, the end pos is not a wrong answer, because everything is still valid until the end.

I'm not sure how you use the syntax error pos. If you just want to print content near the error, you can print the content around the pos. For example, when pos is 1000, print content[800:1200].
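
A rough sketch of that suggestion, with the window clamped so it still works when pos is at or near the end of the file. contextAround is a hypothetical helper written for this comment, not an existing Lightning function.

```go
package main

// contextAround returns the part of content that lies within radius
// bytes on either side of pos. The window is clamped to the bounds of
// content, so pos == len(content) (the EOF offset) is accepted.
// This is a hypothetical helper for illustration only.
func contextAround(content []byte, pos, radius int) []byte {
	if radius < 0 {
		radius = 0
	}
	if pos < 0 {
		pos = 0
	}
	if pos > len(content) {
		pos = len(content)
	}
	start := pos - radius
	if start < 0 {
		start = 0
	}
	end := pos + radius
	if end > len(content) {
		end = len(content)
	}
	return content[start:end]
}
```

Calling contextAround(content, 1000, 200) on a large enough file gives content[800:1200]; with pos equal to the file size it gives the last radius bytes.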

Contributor Author

We prefer to present the starting point of the unterminated quote (i.e., the first quote). Users have to fix the data files and import them again, so telling them the starting point is more helpful than telling them the end offset of the whole file, which is not very meaningful.

Contributor Author

> Why does offset exceed the range of the file? The end offset is also a valid offset.

It is the position of EOF (e.g., if the file is 35 bytes, the final offset is 35).
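
A small sketch of why that offset is awkward for anyone who wants to show the byte at the reported position. byteAt is a hypothetical helper; it relies only on bytes.Reader.ReadAt from the standard library, which returns io.EOF when the offset is at or past the end of the data.

```go
package main

import "bytes"

// byteAt returns the byte stored at offset off. ok is false when off
// does not address any byte of data, which includes off == len(data):
// ReadAt reads nothing there and returns io.EOF.
// This is a hypothetical helper for illustration only.
func byteAt(data []byte, off int64) (b byte, ok bool) {
	var buf [1]byte
	n, err := bytes.NewReader(data).ReadAt(buf[:], off)
	if err != nil || n != 1 {
		return 0, false
	}
	return buf[0], true
}
```

For a 35-byte file the last readable offset is 34. Offset 35 is a valid end position, but there is no byte there to display.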

@buchuitoudegou changed the title from "WIP lightning: not set pos to end if fail to ReadUntil" to "lightning: not set pos to end if fail to ReadUntil" on Jan 3, 2023
@ti-chi-bot removed the do-not-merge/work-in-progress label on Jan 3, 2023
@buchuitoudegou
Contributor Author

/run-unit-test

@buchuitoudegou
Contributor Author

/run-integration-br-test

Contributor

@sleepymole sleepymole left a comment

Rest LGTM.

br/pkg/lightning/mydump/csv_parser.go (outdated, resolved)
@buchuitoudegou
Contributor Author

/run-integration-br-test

@lance6716
Contributor

lgtm

waiting for the result of BR CI

@joccau
Member

joccau commented Jan 9, 2023

/run-integration-br-test

@joccau joccau closed this Jan 9, 2023
@joccau joccau reopened this Jan 9, 2023
@joccau
Member

joccau commented Jan 9, 2023

/run-integration-br-test

@buchuitoudegou
Contributor Author

/test check-dev2

@lance6716
Contributor

@buchuitoudegou please fix CI (lightning exit code not zero)

@buchuitoudegou
Contributor Author

/run-integration-br-test

@buchuitoudegou
Contributor Author

aborted...
/run-integration-br-test

@buchuitoudegou
Contributor Author

/run-integration-br-test

@buchuitoudegou
Contributor Author

/retest

@buchuitoudegou
Contributor Author

@lance6716 @gozssky PTAL again. The CI has passed.

@ti-chi-bot added the status/LGT1 label on Jan 12, 2023
@ti-chi-bot added the status/LGT2 label and removed the status/LGT1 label on Jan 12, 2023
@buchuitoudegou
Contributor Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 2a49047

@ti-chi-bot added the status/can-merge label on Jan 12, 2023
@ti-chi-bot merged commit eef8438 into pingcap:master on Jan 12, 2023
lance6716 pushed a commit to lance6716/tidb that referenced this pull request Jun 1, 2023
* fix: not set pos to end

* add it

* fix: return to prevPos

* fix: return err

* fix

* fix

* fix it

* fix

* fix: not exit when err

Co-authored-by: Zak Zhao <57036248+joccau@users.noreply.github.com>
@lance6716 lance6716 mentioned this pull request Jun 1, 2023
12 tasks
Labels
component/lightning, release-note-none, size/S, status/can-merge, status/LGT2
Development

Successfully merging this pull request may close these issues.

lightning: error offset exceed range of file
5 participants