Skip to content
This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

mydumper: fix s3 parquet reader #461

Merged
merged 2 commits into from
Nov 10, 2020
Merged

mydumper: fix s3 parquet reader #461

merged 2 commits into from
Nov 10, 2020

Conversation

glorv
Copy link
Contributor

@glorv glorv commented Nov 10, 2020

What problem does this PR solve?

Update dependency parquet-go to include xitongsys/parquet-go#330

close #442

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

Related changes

Copy link
Collaborator

@kennytm kennytm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kennytm
Copy link
Collaborator

kennytm commented Nov 10, 2020

/run-all-tests

https://internal.pingcap.net/idc-jenkins/blue/organizations/jenkins/lightning_ghpr_test/detail/lightning_ghpr_test/2575/pipeline

assuming spurious but should be fixed asap.

[2020-11-10T04:23:40.511Z] FAIL: checksum_test.go:254: checksumSuite.TestGcTTLManagerSingle
[2020-11-10T04:23:40.511Z] 
[2020-11-10T04:23:40.511Z] checksum_test.go:279:
[2020-11-10T04:23:40.511Z]     c.Assert(atomic.LoadInt32(&pdClient.count), Equals, val)
[2020-11-10T04:23:40.511Z] ... obtained int32 = 7
[2020-11-10T04:23:40.511Z] ... expected int32 = 6

@kennytm kennytm added the status/LGT1 One reviewer already commented LGTM (LGTM1) label Nov 10, 2020
Copy link
Member

@overvenus overvenus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@overvenus overvenus added status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2) and removed status/LGT1 One reviewer already commented LGTM (LGTM1) labels Nov 10, 2020
@glorv glorv merged commit 0e8cef2 into master Nov 10, 2020
@glorv glorv deleted the fix-parquet-reader branch November 10, 2020 10:07
lance6716 added a commit to lance6716/tidb-lightning that referenced this pull request Nov 17, 2020
add notes

save work

save work

fix unit test

remove tidbMgr in RestoreController

remove some comments

remove some comments

change logger in SQLWithRetry

revert replace log.Logger to *zap.Logger

dep: update uuid dependency to latest google/uuid (pingcap#452)

* dep: update satori/go.uuid to latest

* fix tests

* change to google/uuid

* fix build

* try fix test

* get familiar with google/uuid

* address comment

tidb-lightning-ctl: change default of -d to 'noop://' (pingcap#453)

also add noop:// to supported storage types (to represent an empty store)

replace tab to space

try another port to fix CI

remove some comment

*: more glue

restore: fix the bug that gc life time ttl does not take effect (pingcap#448)

* fix gc ttl loop

* resolve comment and add tests

fix CI

report info to host TiDB

config: filter out all system schemas by default (pingcap#459)

backend: fix auto random default value for primary key (pingcap#457)

* fix auto generate auto random primary key column

* fix default for auto random primary key

* fix test

* use prev row id for auto random and add a test

* replace chunck with session opt

* fix

* fix

mydumper: fix parquet data parser (pingcap#435)

* fix parquet

* reorder imports

* fix test

* use empty collation

* fix a error and add more test cases

* add pointer type tests

* resolve comments

Co-authored-by: kennytm <kennytm@gmail.com>

address comment

backend/local: use range properties to optimize region range estimate (pingcap#422)

* use range propreties to estimate region range

* post-restore: add optional level for post-restore operations (pingcap#421)

* add optional level for opst-restore operations

* trim leading and suffix '"

* use UnmarshalTOML to unmarshal post restore op level

* resolve comments and fix unit test

* backend/local: do not retry epochNotMatch error when ingest sst (pingcap#419)

* do not retry epochNotMatch error when ingest sst

* add retry ingest for 'Raft raft: proposal dropped' error in ingest

* change some retryable error log level from Error to Warn

* fix nextKey

* add a comment for nextKey

* fix comment and add a unit test

* wrap time.Sleep in select

Co-authored-by: kennytm <kennytm@gmail.com>

* update

* use range properties to optimze region range estimate

* update pebble

* change the default value for batch-size

* add unit tests and reslove comments

* add a comment to range properties test

* add a comment

* add a test for range property with pebble

* rename const variable

Co-authored-by: kennytm <kennytm@gmail.com>

fix pd service id is empty (pingcap#460)

fix s3 parquet reader (pingcap#461)

Co-authored-by: Neil Shen <overvenus@gmail.com>

fix service gc ttl again (pingcap#465)

address comment

mydumper: verify file routing config (pingcap#470)

* fix file routing

* remove useless line

* remove redundant if check

rename a method in interface

save work

try fix CI

could work

change ctx usage

try fix CI

try fix CI

refine function interface

refine some fucntion interface

debug CI

address comment

config: allow four byte-size config to be specified using human-readable units ("100 GiB") (pingcap#471)

* Makefile: add `make finish-prepare` action

* config: accept human-readable size for most byte-related config

e.g. allow `region-split-size = '96M'` in additional to `= 100663296`

(known issue: these values' precisions will be truncated to 53 bits
instead of supporting all 63 bits)

* restore: reduce chance of spurious errors from TestGcTTLManagerSingle

Co-authored-by: glorv <glorvs@163.com>

remove debug log

test: change double type syntax (pingcap#474)

address comment

checkpoint: add glue checkpoint

resolve cycle import

expose Retry

refine

change interface to cope with TiDB

fix SQL string

fix SQL

adjust interface to embedded in TiDB

could import now

reduce TLS

restore: add `glue.Glue` interface and other function (pingcap#456)

* save my work

* add notes

* save work

* save work

* fix unit test

* remove tidbMgr in RestoreController

* remove some comments

* remove some comments

* change logger in SQLWithRetry

* revert replace log.Logger to *zap.Logger

* replace tab to space

* try another port to fix CI

* remove some comment

* *: more glue

* report info to host TiDB

* fix CI

* address comment

* address comment

* rename a method in interface

* save work

* try fix CI

* could work

* change ctx usage

* try fix CI

* try fix CI

* refine function interface

* refine some fucntion interface

* debug CI

* address comment

* remove debug log

* address comment

modify code

add comment

refine some code
lance6716 added a commit to lance6716/tidb-lightning that referenced this pull request Nov 17, 2020
add notes

save work

save work

fix unit test

remove tidbMgr in RestoreController

remove some comments

remove some comments

change logger in SQLWithRetry

revert replace log.Logger to *zap.Logger

dep: update uuid dependency to latest google/uuid (pingcap#452)

* dep: update satori/go.uuid to latest

* fix tests

* change to google/uuid

* fix build

* try fix test

* get familiar with google/uuid

* address comment

tidb-lightning-ctl: change default of -d to 'noop://' (pingcap#453)

also add noop:// to supported storage types (to represent an empty store)

replace tab to space

try another port to fix CI

remove some comment

*: more glue

restore: fix the bug that gc life time ttl does not take effect (pingcap#448)

* fix gc ttl loop

* resolve comment and add tests

fix CI

report info to host TiDB

config: filter out all system schemas by default (pingcap#459)

backend: fix auto random default value for primary key (pingcap#457)

* fix auto generate auto random primary key column

* fix default for auto random primary key

* fix test

* use prev row id for auto random and add a test

* replace chunck with session opt

* fix

* fix

mydumper: fix parquet data parser (pingcap#435)

* fix parquet

* reorder imports

* fix test

* use empty collation

* fix a error and add more test cases

* add pointer type tests

* resolve comments

Co-authored-by: kennytm <kennytm@gmail.com>

address comment

backend/local: use range properties to optimize region range estimate (pingcap#422)

* use range propreties to estimate region range

* post-restore: add optional level for post-restore operations (pingcap#421)

* add optional level for opst-restore operations

* trim leading and suffix '"

* use UnmarshalTOML to unmarshal post restore op level

* resolve comments and fix unit test

* backend/local: do not retry epochNotMatch error when ingest sst (pingcap#419)

* do not retry epochNotMatch error when ingest sst

* add retry ingest for 'Raft raft: proposal dropped' error in ingest

* change some retryable error log level from Error to Warn

* fix nextKey

* add a comment for nextKey

* fix comment and add a unit test

* wrap time.Sleep in select

Co-authored-by: kennytm <kennytm@gmail.com>

* update

* use range properties to optimze region range estimate

* update pebble

* change the default value for batch-size

* add unit tests and reslove comments

* add a comment to range properties test

* add a comment

* add a test for range property with pebble

* rename const variable

Co-authored-by: kennytm <kennytm@gmail.com>

fix pd service id is empty (pingcap#460)

fix s3 parquet reader (pingcap#461)

Co-authored-by: Neil Shen <overvenus@gmail.com>

fix service gc ttl again (pingcap#465)

address comment

mydumper: verify file routing config (pingcap#470)

* fix file routing

* remove useless line

* remove redundant if check

rename a method in interface

save work

try fix CI

could work

change ctx usage

try fix CI

try fix CI

refine function interface

refine some fucntion interface

debug CI

address comment

config: allow four byte-size config to be specified using human-readable units ("100 GiB") (pingcap#471)

* Makefile: add `make finish-prepare` action

* config: accept human-readable size for most byte-related config

e.g. allow `region-split-size = '96M'` in additional to `= 100663296`

(known issue: these values' precisions will be truncated to 53 bits
instead of supporting all 63 bits)

* restore: reduce chance of spurious errors from TestGcTTLManagerSingle

Co-authored-by: glorv <glorvs@163.com>

remove debug log

test: change double type syntax (pingcap#474)

address comment

checkpoint: add glue checkpoint

resolve cycle import

expose Retry

refine

change interface to cope with TiDB

fix SQL string

fix SQL

adjust interface to embedded in TiDB

could import now

reduce TLS

restore: add `glue.Glue` interface and other function (pingcap#456)

* save my work

* add notes

* save work

* save work

* fix unit test

* remove tidbMgr in RestoreController

* remove some comments

* remove some comments

* change logger in SQLWithRetry

* revert replace log.Logger to *zap.Logger

* replace tab to space

* try another port to fix CI

* remove some comment

* *: more glue

* report info to host TiDB

* fix CI

* address comment

* address comment

* rename a method in interface

* save work

* try fix CI

* could work

* change ctx usage

* try fix CI

* try fix CI

* refine function interface

* refine some fucntion interface

* debug CI

* address comment

* remove debug log

* address comment

modify code

add comment

refine some code
glorv pushed a commit that referenced this pull request Nov 23, 2020
* save my work

add notes

save work

save work

fix unit test

remove tidbMgr in RestoreController

remove some comments

remove some comments

change logger in SQLWithRetry

revert replace log.Logger to *zap.Logger

dep: update uuid dependency to latest google/uuid (#452)

* dep: update satori/go.uuid to latest

* fix tests

* change to google/uuid

* fix build

* try fix test

* get familiar with google/uuid

* address comment

tidb-lightning-ctl: change default of -d to 'noop://' (#453)

also add noop:// to supported storage types (to represent an empty store)

replace tab to space

try another port to fix CI

remove some comment

*: more glue

restore: fix the bug that gc life time ttl does not take effect (#448)

* fix gc ttl loop

* resolve comment and add tests

fix CI

report info to host TiDB

config: filter out all system schemas by default (#459)

backend: fix auto random default value for primary key (#457)

* fix auto generate auto random primary key column

* fix default for auto random primary key

* fix test

* use prev row id for auto random and add a test

* replace chunck with session opt

* fix

* fix

mydumper: fix parquet data parser (#435)

* fix parquet

* reorder imports

* fix test

* use empty collation

* fix a error and add more test cases

* add pointer type tests

* resolve comments

Co-authored-by: kennytm <kennytm@gmail.com>

address comment

backend/local: use range properties to optimize region range estimate (#422)

* use range propreties to estimate region range

* post-restore: add optional level for post-restore operations (#421)

* add optional level for opst-restore operations

* trim leading and suffix '"

* use UnmarshalTOML to unmarshal post restore op level

* resolve comments and fix unit test

* backend/local: do not retry epochNotMatch error when ingest sst (#419)

* do not retry epochNotMatch error when ingest sst

* add retry ingest for 'Raft raft: proposal dropped' error in ingest

* change some retryable error log level from Error to Warn

* fix nextKey

* add a comment for nextKey

* fix comment and add a unit test

* wrap time.Sleep in select

Co-authored-by: kennytm <kennytm@gmail.com>

* update

* use range properties to optimze region range estimate

* update pebble

* change the default value for batch-size

* add unit tests and reslove comments

* add a comment to range properties test

* add a comment

* add a test for range property with pebble

* rename const variable

Co-authored-by: kennytm <kennytm@gmail.com>

fix pd service id is empty (#460)

fix s3 parquet reader (#461)

Co-authored-by: Neil Shen <overvenus@gmail.com>

fix service gc ttl again (#465)

address comment

mydumper: verify file routing config (#470)

* fix file routing

* remove useless line

* remove redundant if check

rename a method in interface

save work

try fix CI

could work

change ctx usage

try fix CI

try fix CI

refine function interface

refine some fucntion interface

debug CI

address comment

config: allow four byte-size config to be specified using human-readable units ("100 GiB") (#471)

* Makefile: add `make finish-prepare` action

* config: accept human-readable size for most byte-related config

e.g. allow `region-split-size = '96M'` in additional to `= 100663296`

(known issue: these values' precisions will be truncated to 53 bits
instead of supporting all 63 bits)

* restore: reduce chance of spurious errors from TestGcTTLManagerSingle

Co-authored-by: glorv <glorvs@163.com>

remove debug log

test: change double type syntax (#474)

address comment

checkpoint: add glue checkpoint

resolve cycle import

expose Retry

refine

change interface to cope with TiDB

fix SQL string

fix SQL

adjust interface to embedded in TiDB

could import now

reduce TLS

restore: add `glue.Glue` interface and other function (#456)

* save my work

* add notes

* save work

* save work

* fix unit test

* remove tidbMgr in RestoreController

* remove some comments

* remove some comments

* change logger in SQLWithRetry

* revert replace log.Logger to *zap.Logger

* replace tab to space

* try another port to fix CI

* remove some comment

* *: more glue

* report info to host TiDB

* fix CI

* address comment

* address comment

* rename a method in interface

* save work

* try fix CI

* could work

* change ctx usage

* try fix CI

* try fix CI

* refine function interface

* refine some fucntion interface

* debug CI

* address comment

* remove debug log

* address comment

modify code

add comment

refine some code

* address comment

* add some comments

* fix CI and change CREATE TABLE
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to import parquet file from S3
3 participants