Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: moving uploadSchema into UploadJob #3888

Merged
merged 12 commits into from
Sep 21, 2023

Conversation

fracasula
Copy link
Collaborator

@fracasula fracasula commented Sep 19, 2023

Description

Sub issue for the Warehouse integration tests with -race.

The plan is to incrementally reintroduce the changes in smaller PRs to better test the changes in isolation.

In this Pull Request we are moving the uploadSchema out the schema handle and in the UploadJob.

It is recommended to review this PR commit by commit to simplify the review.

Sidenotes

  • uploadSchema does not need a lock for concurrent writes/reads, see dependency map below
  • E2E build passing here

UploadSchema dependency map

Upload.UploadSchema READS

upload.go

  • exportIdentities - not concurrent with generateUploadSchema()
  • exportUserTables - not concurrent with generateUploadSchema()
  • initTableUploads - not concurrent with either write
  • loadAllTablesExcept, used in exportRegularTables - not concurrent with generateUploadSchema()
  • loadUserTables, used in exportUserTables - not concurrent with generateUploadSchema()
  • RefreshPartitions - not concurrent with generateUploadSchema()
  • run - not concurrent with generateUploadSchema()
  • GetTableSchemaInUpload
    • loadIdentityTables - safe
    • loadUserTables - safe
    • updateSchema - safe
      • loadTable - safe
      • loadUserTables - safe
    • LoadTable - used by loadAllTablesExcept - safe
    • LoadUserTables - used by exportUserTables - safe

internal/loadfiles/loadfiles.go

  • createFromStaging
    • CreateLoadFiles - safe
    • ForceCreateLoadFiles - safe

internal/repo/upload.go

  • scanUpload
    • upload.Get
      • grpc.TriggerWHUpload - safe
    • upload.GetToProcess
      • runUploadJobAllocator - safe

Upload.UploadSchema WRITES

router_identities.go

  • getPendingPopulateIdentitiesLoad
    • router.populateHistoricIdentities() in a go routine, same as below, it's used to create an UploadJob only
  • initPrePopulateDestIdentitiesUpload
    • router.populateHistoricIdentities() in a go routine, same as above, it's used to create an UploadJob only

upload.go

  • generateUploadSchema - fired in UploadJob.run() once so it's OK and it's the only write that we should check against
    concurrent reads

Linear Ticket

< Linear Link >

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@fracasula fracasula force-pushed the feature/pipe-297-move-upload-schema-in-uploadjob branch from 9a422d3 to a4dbf64 Compare September 19, 2023 13:42
@codecov
Copy link

codecov bot commented Sep 19, 2023

Codecov Report

Patch coverage: 92.79% and project coverage change: +0.06% 🎉

Comparison is base (f5b8e7b) 69.20% compared to head (5a71caf) 69.26%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3888      +/-   ##
==========================================
+ Coverage   69.20%   69.26%   +0.06%     
==========================================
  Files         353      353              
  Lines       53021    53018       -3     
==========================================
+ Hits        36694    36724      +30     
+ Misses      14023    13989      -34     
- Partials     2304     2305       +1     
Files Changed Coverage Δ
warehouse/upload.go 75.96% <88.88%> (-0.84%) ⬇️
warehouse/internal/repo/upload.go 89.07% <100.00%> (+0.01%) ⬆️
warehouse/schema.go 100.00% <100.00%> (ø)

... and 9 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fracasula fracasula marked this pull request as ready for review September 19, 2023 15:27
Comment on lines +350 to +351
// TableSchemaDiff returns the diff between the warehouse schema and the upload schema
func (sh *Schema) TableSchemaDiff(tableName string, tableSchema model.TableSchema) whutils.TableSchemaDiff {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are calculating diff, it feels more intuitive to have the first parameter as schema and the second as tableName? wdyt?

Suggested change
// TableSchemaDiff returns the diff between the warehouse schema and the upload schema
func (sh *Schema) TableSchemaDiff(tableName string, tableSchema model.TableSchema) whutils.TableSchemaDiff {
// TableSchemaDiff returns the diff between the warehouse schema and the upload schema
func (sh *Schema) TableSchemaDiff(tableSchema model.TableSchema, tableName string) whutils.TableSchemaDiff {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you just suggesting to have the tableSchema as the first parameter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@cisse21 cisse21 merged commit 2a5547f into master Sep 21, 2023
35 checks passed
@cisse21 cisse21 deleted the feature/pipe-297-move-upload-schema-in-uploadjob branch September 21, 2023 10:59
This was referenced Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants