Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(warehouse): added warehouse handling for s3 with glue and other improvements #2940

Merged
merged 28 commits into from
Mar 2, 2023

Conversation

achettyiitr
Copy link
Member

@achettyiitr achettyiitr commented Feb 5, 2023

Description

  • Added S3 Glue validations.
  • Use the namespace defined by the customer in the destination config. If not present use the default test namespace.
  • Don't use random tables every time while doing validations.
  • Using simple components like objectStorage, connections, createSchema, createAlterTable, fetchSchema, loadTable to satisfy Validate() error instead of coupling it with validation steps.

Improvements

  • Avoid coupling the GRPC module with the validations module.
  • Test coverage for the validations module.

Design changes

Validations

Notion Ticket

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@codecov
Copy link

codecov bot commented Feb 6, 2023

Codecov Report

Patch coverage: 72.47% and project coverage change: +0.33 🎉

Comparison is base (4832630) 52.77% compared to head (54a8a5d) 53.11%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2940      +/-   ##
==========================================
+ Coverage   52.77%   53.11%   +0.33%     
==========================================
  Files         334      332       -2     
  Lines       52216    52313      +97     
==========================================
+ Hits        27558    27784     +226     
+ Misses      23056    22937     -119     
+ Partials     1602     1592      -10     
Impacted Files Coverage Δ
warehouse/admin.go 3.17% <0.00%> (+0.14%) ⬆️
warehouse/api.go 69.89% <0.00%> (+0.20%) ⬆️
warehouse/integrations/bigquery/bigquery.go 2.08% <0.00%> (+<0.01%) ⬆️
warehouse/integrations/testhelper/setup.go 0.00% <0.00%> (ø)
warehouse/upload.go 23.31% <0.00%> (+0.05%) ⬆️
warehouse/utils/utils.go 70.07% <ø> (ø)
warehouse/integrations/redshift/redshift.go 10.34% <26.66%> (+0.49%) ⬆️
warehouse/warehousegrpc.go 70.42% <41.30%> (-9.23%) ⬇️
warehouse/integrations/deltalake/deltalake.go 65.17% <73.68%> (+0.06%) ⬆️
warehouse/validations/validate.go 78.90% <79.94%> (+27.30%) ⬆️
... and 38 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -115,23 +115,25 @@ func (bq *HandleT) DeleteTable(tableName string) (err error) {
return
}

func (bq *HandleT) CreateTable(tableName string, columnMap map[string]string) (err error) {
func (bq *HandleT) CreateTable(tableName string, columnMap map[string]string) error {
Copy link
Member Author

@achettyiitr achettyiitr Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While writing the Unit test, we found that suppression is not happening properly.
The error is getting overridden when we createView if dedup flag is enabled.

}

// Handle error in case of single column
if len(columnsInfo) == 1 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to suppress the errors in case of a single column alter since databricks does support ADD COLUMN IF NOT EXISTS

&validationStep{
ID: 2,
Name: verifyingConnections,
Validator: ct.verifyingConnections,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was too much coupling with steps and validation functions. steps API will return only the steps. Once we get the steps, we call NewValidator to get the validating function.

@@ -1,13 +0,0 @@
package validations_test
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the package does not contain Ginkgo-based tests. Removing it.

return map[string]*validationFunc{
"validate": {
Path: "/validate",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Path is not being used. Removed it.

@@ -94,12 +97,49 @@ func (*warehouseGRPC) TriggerWHUpload(_ context.Context, request *proto.WHUpload
return res, err
}

func (grpc *warehouseGRPC) Validate(_ context.Context, req *proto.WHValidationRequest) (*proto.WHValidationResponse, error) {
handleT := validations.CTHandleT{
Copy link
Member Author

@achettyiitr achettyiitr Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The domain model for the validations package should depend on the destination only, not on the internal representation of the destination like whether tunneling is enabled or not.
Moving the logic to the caller itself

Comment on lines 261 to 262
if CheckAndIgnoreColumnAlreadyExistError(err) {
pkgLogger.Infof("RS: Column already exists for destinationID: %s, tableName: %s with query: %v", rs.Warehouse.Destination.ID, tableName, query)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to suppress the errors in case of a single column alter since Redshift does support ADD COLUMN IF NOT EXISTS

@@ -1,60 +0,0 @@
package validations
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to validate.go

@@ -1,149 +0,0 @@
package validations
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to validate.go

Comment on lines +27 to +54
type objectStorage struct {
destination *backendconfig.DestinationT
}

type connections struct {
manager manager.WarehouseOperations
destination *backendconfig.DestinationT
}

type createSchema struct {
manager manager.WarehouseOperations
}

type createAlterTable struct {
manager manager.WarehouseOperations
table string
}

type fetchSchema struct {
manager manager.WarehouseOperations
destination *backendconfig.DestinationT
}

type loadTable struct {
manager manager.WarehouseOperations
destination *backendconfig.DestinationT
table string
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These satisfies the validator interface to validate the particular step.

Comment on lines 516 to 518
if err = operations.Setup(warehouse, &mockUploader{
dest: dest,
}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we are doing things. Since mockUploader is usually for helping testing doesn't imply a lot about its functionality.

Copy link
Member Author

@achettyiitr achettyiitr Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the complexity of the changes associated with Uploader interface in #2993, I am thinking of working on it separately.
I renamed it to dummyUploader for the current use case.

@achettyiitr achettyiitr requested review from cisse21 and removed request for abhimanyubabbar February 22, 2023 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants