
chore: categorise semi structured datatype for databricks during missing datatype #4771

Merged
cisse21 merged 3 commits into master from chore.limit-semi-structured-datatype on Jun 11, 2024

Conversation

achettyiitr
Member

@achettyiitr achettyiitr commented Jun 6, 2024

Description

  • For Databricks, when a column's datatype is semi-structured (MAP, ARRAY, STRUCT), we currently get the entire nested type definition back as the datatype.
  • Something like STRUCT<activity_type_name STRING, logmessage STRING, clouderrorreporting_googleapis_com_notification_trigger_error_ingestion_time STRING, notificationtype STRING, errorgroupid STRING>.
  • This change categorises such values under their top-level kind (ARRAY, MAP, STRUCT) when reporting missing datatypes, as shown in the sketch below.
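
A minimal Go sketch of the categorisation idea (illustrative only; the helper name and the exact mapping are assumptions, not the code merged in this PR):

package example

import "strings"

// categoriseSemiStructured collapses Databricks' nested type definitions
// (e.g. "STRUCT<logmessage STRING, notificationtype STRING>") into their
// top-level kind, so missing-datatype stats carry a bounded set of values.
func categoriseSemiStructured(columnType string) string {
    upper := strings.ToUpper(columnType)
    for _, kind := range []string{"ARRAY", "MAP", "STRUCT"} {
        if strings.HasPrefix(upper, kind) {
            return kind
        }
    }
    return columnType
}

// categoriseSemiStructured("STRUCT<logmessage STRING>") -> "STRUCT"
// categoriseSemiStructured("array<string>")             -> "ARRAY"
// categoriseSemiStructured("decimal(10,2)")             -> "decimal(10,2)"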

Linear Ticket

  • Resolves PIPE-1173

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

Summary by CodeRabbit

  • New Features

    • Enhanced error reporting with more detailed aggregation in error reports.
    • Improved statistics tracking for various data warehouses including Azure Synapse, BigQuery, Clickhouse, Deltalake, MSSQL, Postgres, Redshift, and Snowflake.
  • Refactor

    • Updated function parameters across multiple integrations to improve consistency and reliability in statistics reporting.
  • Tests

    • Added new test cases for schema creation and validation in Deltalake integration.
    • Refactored cleanup logic in Deltalake integration tests for better maintainability.

@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 6, 2024

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 81.48148% with 5 lines in your changes missing coverage. Please review.

Project coverage is 74.67%. Comparing base (1ff62ff) to head (4e38b50).
Report is 1 commit behind head on master.

Files Patch % Lines
...ehouse/integrations/azure-synapse/azure-synapse.go 0.00% 1 Missing ⚠️
warehouse/integrations/bigquery/bigquery.go 0.00% 1 Missing ⚠️
warehouse/integrations/mssql/mssql.go 0.00% 1 Missing ⚠️
warehouse/integrations/redshift/redshift.go 0.00% 1 Missing ⚠️
warehouse/integrations/snowflake/snowflake.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4771      +/-   ##
==========================================
+ Coverage   74.61%   74.67%   +0.06%     
==========================================
  Files         414      414              
  Lines       48654    48665      +11     
==========================================
+ Hits        36304    36343      +39     
+ Misses       9966     9947      -19     
+ Partials     2384     2375       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@achettyiitr achettyiitr changed the title chore: limit semi structured datatype for databricks during missing datatype chore: categorise semi structured datatype for databricks during missing datatype Jun 6, 2024
@achettyiitr achettyiitr force-pushed the chore.limit-semi-structured-datatype branch from fbeb114 to 362ac77 Compare June 6, 2024 13:49
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between f0cbda3 and 362ac77.

Files selected for processing (1)
  • warehouse/integrations/deltalake/deltalake.go (2 hunks)
Additional comments not posted (1)
warehouse/integrations/deltalake/deltalake.go (1)

363-372: Ensure proper handling of semi-structured data types.

The logic to categorize semi-structured data types (ARRAY, MAP, STRUCT) into uppercase for statistics tracking is implemented correctly. This aligns with the PR's objective to handle semi-structured data types effectively.

@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 6, 2024
@lvrach lvrach force-pushed the master branch 2 times, most recently from 4155fa2 to b124ce4 Compare June 6, 2024 15:13
* chore: update flaky test for error reporting aggregation test

* chore: use ElementsMatch func in require for assertion

* chore: update the logic of forming errors property

* chore: update for better readability

---------

Co-authored-by: Sai Sankeerth <sanpj2292@github.com>
@achettyiitr achettyiitr force-pushed the chore.limit-semi-structured-datatype branch from 362ac77 to 2545612 Compare June 6, 2024 18:27
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 362ac77 and 2545612.

Files selected for processing (12)
  • enterprise/reporting/error_reporting.go (1 hunks)
  • warehouse/integrations/azure-synapse/azure-synapse.go (2 hunks)
  • warehouse/integrations/bigquery/bigquery.go (2 hunks)
  • warehouse/integrations/clickhouse/clickhouse.go (1 hunks)
  • warehouse/integrations/datalake/schema-repository/glue.go (2 hunks)
  • warehouse/integrations/deltalake/deltalake.go (2 hunks)
  • warehouse/integrations/deltalake/deltalake_test.go (19 hunks)
  • warehouse/integrations/mssql/mssql.go (3 hunks)
  • warehouse/integrations/postgres/postgres.go (2 hunks)
  • warehouse/integrations/redshift/redshift.go (1 hunks)
  • warehouse/integrations/snowflake/snowflake.go (1 hunks)
  • warehouse/utils/utils.go (2 hunks)
Files not reviewed due to errors (1)
  • warehouse/utils/utils.go (no review received)
Files skipped from review as they are similar to previous changes (1)
  • warehouse/integrations/deltalake/deltalake.go
Additional comments not posted (12)
warehouse/integrations/datalake/schema-repository/glue.go (1)

106-106: LGTM! Ensure that the stats.Default parameter is consistently used in similar function calls across the codebase.

warehouse/integrations/postgres/postgres.go (1)

471-471: LGTM! Ensure that the pg.stats parameter is consistently used in similar function calls across the codebase.

Verification successful

The search results show that the WHCounterStat function is consistently used with a stats parameter across various files in the codebase. Here are the relevant occurrences:

  • warehouse/integrations/deltalake/deltalake.go
  • warehouse/integrations/snowflake/snowflake.go
  • warehouse/integrations/postgres/postgres.go
  • warehouse/integrations/mssql/mssql.go
  • warehouse/integrations/redshift/redshift.go
  • warehouse/integrations/datalake/schema-repository/glue.go
  • warehouse/integrations/clickhouse/clickhouse.go
  • warehouse/integrations/bigquery/bigquery.go
  • warehouse/integrations/azure-synapse/azure-synapse.go

Each instance includes a stats parameter, confirming that the change is consistently applied.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistent use of `pg.stats` in `WHCounterStat` function calls.

# Test: Search for inconsistent usage. Expect: No occurrences of `WHCounterStat` without `pg.stats`.
rg --type go $'WHCounterStat\([^,]*,[^,]*,[^,]*\)'

Length of output: 51



Script:

#!/bin/bash
# Description: Search for all occurrences of `WHCounterStat` and verify the parameters used.

# Find all occurrences of `WHCounterStat` and display the surrounding lines for context.
rg --type go 'WHCounterStat' -A 2 -B 2

Length of output: 4612

enterprise/reporting/error_reporting.go (1)

543-553: LGTM! The manual iteration over reportGrpKeys and reportsCountMap to populate errs is implemented correctly.
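
A hedged sketch of the pattern described (the surrounding types below are assumptions for illustration; only the idea of iterating reportGrpKeys and looking counts up in reportsCountMap is taken from the comment):

package example

type errorEntry struct {
    Error string `json:"error"`
    Count int64  `json:"count"`
}

// buildErrors walks the group keys in order and pairs each with its count,
// rather than converting the whole map in a single MapToSlice call.
func buildErrors(reportGrpKeys []string, reportsCountMap map[string]int64) []errorEntry {
    errs := make([]errorEntry, 0, len(reportGrpKeys))
    for _, key := range reportGrpKeys {
        errs = append(errs, errorEntry{Error: key, Count: reportsCountMap[key]})
    }
    return errs
}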

warehouse/integrations/mssql/mssql.go (1)

940-940: Ensure proper handling of unrecognized data types.

The addition of ms.stats as an argument in the WHCounterStat function call is consistent with the changes across other warehouse integrations. This change enhances the tracking of missing data types, aligning with the PR's objectives.

warehouse/integrations/azure-synapse/azure-synapse.go (1)

932-932: Ensure proper handling of unrecognized data types.

Verification successful

The code snippet provided shows that unrecognized data types are being handled by adding them to the unrecognizedSchema map with the value warehouseutils.MissingDatatype. Additionally, a counter statistic is being incremented to log the occurrence of a missing data type.

  • The unrecognized data types are stored in a separate schema (unrecognizedSchema).
  • A counter statistic is incremented to log the missing data type.

This indicates that unrecognized data types are indeed being handled appropriately by logging and tracking them.
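
A rough sketch of that pattern with assumed names (the real azure-synapse code differs in detail; missingDatatype stands in for warehouseutils.MissingDatatype, and incrementMissingStat for the WHCounterStat-based counter):

package example

const missingDatatype = "<missing_datatype>" // stand-in sentinel value

// splitSchema maps recognised column types into the schema and records the
// rest in a separate "unrecognized" schema while emitting a stat per miss.
func splitSchema(
    columns map[string]string, // column name -> raw warehouse type
    mapColumnType func(string) (string, bool), // stand-in for the integration's type mapping
    incrementMissingStat func(rawType string), // stand-in for the tagged counter
) (schema, unrecognized map[string]string) {
    schema = map[string]string{}
    unrecognized = map[string]string{}
    for name, rawType := range columns {
        if datatype, ok := mapColumnType(rawType); ok {
            schema[name] = datatype
            continue
        }
        unrecognized[name] = missingDatatype
        incrementMissingStat(rawType)
    }
    return schema, unrecognized
}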

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that all data types are recognized and properly handled.

# Test: Search for unrecognized data types in the schema. Expect: No occurrences of unrecognized data types.
rg --type go $'warehouseutils.MissingDatatype' warehouse/integrations/azure-synapse/

Length of output: 215



Script:

#!/bin/bash
# Extract the relevant code snippet around the identified line to analyze the handling of unrecognized data types.
sed -n '920,940p' warehouse/integrations/azure-synapse/azure-synapse.go

Length of output: 793

warehouse/integrations/bigquery/bigquery.go (2)

20-20: Added import for stats from github.com/rudderlabs/rudder-go-kit/stats.

This change is necessary to support the new functionality of tracking missing data types as described in the PR summary.


825-825: Updated WHCounterStat function call to include stats.Default as an argument.

This change aligns with the PR's objective to enhance statistics tracking by including a default stats instance when reporting missing data types.

warehouse/integrations/redshift/redshift.go (1)

1197-1197: Ensure proper handling of missing data types.

The addition of rs.stats as an argument in the WHCounterStat function call is consistent with the PR's objective to enhance statistics tracking for missing data types. This change should help in better monitoring and reporting of issues related to unrecognized data types in Redshift schemas.

warehouse/integrations/clickhouse/clickhouse.go (1)

1022-1022: Ensure proper handling of unrecognized data types.

This change correctly logs missing data types as specified in the PR summary. It's a good practice to track these occurrences for debugging and data integrity checks.

warehouse/integrations/snowflake/snowflake.go (1)

1419-1419: Update the WHCounterStat function call to include sf.stats.

This change aligns with the PR's objective to enhance statistics tracking by including the stats object in the WHCounterStat function calls. It correctly implements the new signature of WHCounterStat which now requires a stats.Stats parameter.

warehouse/integrations/deltalake/deltalake_test.go (2)

17-17: Added import for memstats from rudder-go-kit.

This import is necessary for the new statistics tracking functionality as per the PR description.
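
A hedged sketch of how an in-memory stats store can back such a test (the metric name and tags are hypothetical, and the memstats/stats API usage is an assumption based on typical rudder-go-kit patterns rather than copied from this PR):

package example_test

import (
    "testing"

    "github.com/rudderlabs/rudder-go-kit/stats"
    "github.com/rudderlabs/rudder-go-kit/stats/memstats"
    "github.com/stretchr/testify/require"
)

func TestMissingDatatypeStat(t *testing.T) {
    // memstats provides an in-memory stats.Stats implementation for assertions.
    store, err := memstats.New()
    require.NoError(t, err)

    // In a real test the code under test would receive `store` wherever it
    // expects a stats.Stats and emit the counter itself; here we emit directly.
    store.NewTaggedStat("missing_datatype", stats.CountType, stats.Tags{"datatype": "STRUCT"}).Increment()

    require.EqualValues(t, 1,
        store.Get("missing_datatype", stats.Tags{"datatype": "STRUCT"}).LastValue())
}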


1222-1236: The cleanup function efficiently handles schema deletion with retries.

This is a robust approach to ensure that resources are cleaned up even if transient errors occur during the cleanup process.

@achettyiitr achettyiitr force-pushed the chore.limit-semi-structured-datatype branch from 2545612 to b810d2b Compare June 6, 2024 18:56
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 6, 2024
@achettyiitr achettyiitr force-pushed the chore.limit-semi-structured-datatype branch from b810d2b to 9426bf2 Compare June 6, 2024 18:58
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 6, 2024
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1ff62ff and 037a79c.

Files selected for processing (13)
  • enterprise/reporting/error_reporting.go (1 hunks)
  • warehouse/integrations/azure-synapse/azure-synapse.go (2 hunks)
  • warehouse/integrations/bigquery/bigquery.go (2 hunks)
  • warehouse/integrations/clickhouse/clickhouse.go (1 hunks)
  • warehouse/integrations/datalake/datalake.go (1 hunks)
  • warehouse/integrations/datalake/schema-repository/glue.go (2 hunks)
  • warehouse/integrations/deltalake/deltalake.go (2 hunks)
  • warehouse/integrations/deltalake/deltalake_test.go (19 hunks)
  • warehouse/integrations/mssql/mssql.go (4 hunks)
  • warehouse/integrations/postgres/postgres.go (2 hunks)
  • warehouse/integrations/redshift/redshift.go (1 hunks)
  • warehouse/integrations/snowflake/snowflake.go (1 hunks)
  • warehouse/utils/utils.go (2 hunks)
Additional comments not posted (18)
warehouse/integrations/datalake/datalake.go (1)

16-16: The addition of the stats import is appropriate for enhanced statistics tracking.

warehouse/integrations/datalake/schema-repository/glue.go (1)

106-106: The update to include stats.Default in the WHCounterStat function call aligns with the PR's objective to standardize statistics tracking across the platform.

warehouse/integrations/postgres/postgres.go (1)

471-471: The update to include pg.stats in the WHCounterStat function call aligns with the PR's objective to standardize statistics tracking across the platform.

warehouse/utils/utils.go (1)

Line range hint 744-755: The modification to include a stats.Stats parameter in the WHCounterStat function enhances flexibility and configurability for statistics tracking.
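
A hedged sketch of the resulting call shape (the body and tag set below are assumptions; only the idea of passing a stats.Stats instance, such as pg.stats or stats.Default, is taken from this PR):

package example

import "github.com/rudderlabs/rudder-go-kit/stats"

// WHCounterStat-style helper: the caller supplies the stats.Stats instance
// instead of the helper reaching for a global itself.
func whCounterStat(statsFactory stats.Stats, name string, tags stats.Tags) stats.Measurement {
    return statsFactory.NewTaggedStat(name, stats.CountType, tags)
}

// Hypothetical call sites:
//   whCounterStat(pg.stats, "missing_datatype", stats.Tags{"datatype": columnType}).Count(1)
//   whCounterStat(stats.Default, "missing_datatype", stats.Tags{"datatype": columnType}).Count(1)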

Verification successful

The modification to include a stats.Stats parameter in the WHCounterStat function has been consistently applied across various warehouse integration files.

  • warehouse/integrations/snowflake/snowflake.go
  • warehouse/integrations/postgres/postgres.go
  • warehouse/integrations/deltalake/deltalake.go
  • warehouse/integrations/mssql/mssql.go
  • warehouse/integrations/redshift/redshift.go
  • warehouse/integrations/clickhouse/clickhouse.go
  • warehouse/integrations/bigquery/bigquery.go
  • warehouse/integrations/datalake/schema-repository/glue.go
  • warehouse/integrations/azure-synapse/azure-synapse.go
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of the new `WHCounterStat` function across different warehouse integrations.

# Test: Search for the function usage across warehouse integrations. Expect: Consistent usage of the new signature.
rg --type go $'WHCounterStat'

Length of output: 2030

warehouse/integrations/mssql/mssql.go (1)

940-940: Good use of the updated WHCounterStat function to track missing data types.

This change aligns with the PR's objective to enhance error reporting by categorizing semi-structured data types and tracking them appropriately.

warehouse/integrations/azure-synapse/azure-synapse.go (1)

932-932: The update to include the stats parameter in the WHCounterStat function call is correctly implemented and aligns with the new function signature. Good job ensuring consistency across the codebase.

warehouse/integrations/bigquery/bigquery.go (1)

825-825: The addition of stats.Default to the WHCounterStat function call aligns with the broader updates across the codebase to standardize statistics tracking. This ensures consistent usage of the stats parameter and likely enhances the configurability and default handling of statistics.

warehouse/integrations/redshift/redshift.go (1)

1197-1197: The update to include stats in the WHCounterStat function call aligns with the PR's objectives to enhance statistics tracking. Ensure that the stats object is properly initialized and integrated across all uses in the system.

Verification successful

The verification confirms that the stats parameter is properly integrated in all WHCounterStat function calls across the system.

  • warehouse/integrations/snowflake/snowflake.go
  • warehouse/integrations/postgres/postgres.go
  • warehouse/integrations/mssql/mssql.go
  • warehouse/integrations/redshift/redshift.go
  • warehouse/integrations/datalake/schema-repository/glue.go
  • warehouse/integrations/deltalake/deltalake.go
  • warehouse/integrations/clickhouse/clickhouse.go
  • warehouse/integrations/bigquery/bigquery.go
  • warehouse/integrations/azure-synapse/azure-synapse.go
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of the `stats` parameter in all `WHCounterStat` function calls across the system.

# Test: Search for the function usage. Expect: All occurrences should include the `stats` parameter.
rg --type go $'WHCounterStat\('

Length of output: 2034

warehouse/integrations/clickhouse/clickhouse.go (1)

1022-1022: The update to include ch.stats in the WHCounterStat function call aligns with the broader initiative to standardize statistics tracking across the platform.

warehouse/integrations/deltalake/deltalake_test.go (9)

1222-1236: The dropSchema function implements robust error handling and retry logic, which is crucial for ensuring clean test environments. Good job on this implementation.


1120-1145: The test case for schema creation when it does not exist is well-structured and effectively validates the functionality of fetching schemas. It's good to see comprehensive checks before and after the operation.


1147-1218: The test case for handling missing datatypes is comprehensive and uses mocks effectively to simulate system behavior. This is crucial for ensuring robust error handling and reporting in production environments.


Line range hint 929-976: The test case for loading data when the schema does not exist is well-implemented, covering crucial error handling paths and ensuring that the system behaves as expected in this failure scenario.


Line range hint 798-844: The test case for handling mismatches in the number of columns during data loading is well-crafted, ensuring that the system can handle and report schema mismatches effectively.


Line range hint 842-888: The test case for handling mismatches in schema during data loading is thorough and ensures that the system can correctly identify and handle schema discrepancies.


Line range hint 886-931: The test case for handling discards during data loading is well-implemented, ensuring that discarded records are handled appropriately and reported correctly.


Line range hint 974-1037: The test case for partition pruning is well-structured and effectively tests the system's ability to handle partition pruning, which is crucial for performance optimization in large datasets.


1085-1220: The test case for fetching schema is comprehensive and effectively tests the system's ability to fetch and recognize schemas correctly, ensuring robust functionality in production environments.

warehouse/integrations/deltalake/deltalake.go Outdated Show resolved Hide resolved
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
@achettyiitr achettyiitr force-pushed the chore.limit-semi-structured-datatype branch from 037a79c to 1e342f9 Compare June 10, 2024 10:53
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1ff62ff and 1e342f9.

Files selected for processing (13)
  • enterprise/reporting/error_reporting.go (1 hunks)
  • warehouse/integrations/azure-synapse/azure-synapse.go (2 hunks)
  • warehouse/integrations/bigquery/bigquery.go (2 hunks)
  • warehouse/integrations/clickhouse/clickhouse.go (1 hunks)
  • warehouse/integrations/datalake/datalake.go (1 hunks)
  • warehouse/integrations/datalake/schema-repository/glue.go (2 hunks)
  • warehouse/integrations/deltalake/deltalake.go (3 hunks)
  • warehouse/integrations/deltalake/deltalake_test.go (19 hunks)
  • warehouse/integrations/mssql/mssql.go (4 hunks)
  • warehouse/integrations/postgres/postgres.go (2 hunks)
  • warehouse/integrations/redshift/redshift.go (1 hunks)
  • warehouse/integrations/snowflake/snowflake.go (1 hunks)
  • warehouse/utils/utils.go (2 hunks)
Files skipped from review as they are similar to previous changes (11)
  • enterprise/reporting/error_reporting.go
  • warehouse/integrations/azure-synapse/azure-synapse.go
  • warehouse/integrations/bigquery/bigquery.go
  • warehouse/integrations/clickhouse/clickhouse.go
  • warehouse/integrations/datalake/schema-repository/glue.go
  • warehouse/integrations/deltalake/deltalake.go
  • warehouse/integrations/mssql/mssql.go
  • warehouse/integrations/postgres/postgres.go
  • warehouse/integrations/redshift/redshift.go
  • warehouse/integrations/snowflake/snowflake.go
  • warehouse/utils/utils.go
Additional comments not posted (4)
warehouse/integrations/datalake/datalake.go (1)

16-16: Ensure the newly imported logger is utilized effectively throughout the file.

This import enhances logging capabilities, which is crucial for debugging and monitoring in production environments. Make sure that the logging statements cover all critical operations and error handling paths to maximize the utility of this addition.

warehouse/integrations/deltalake/deltalake_test.go (3)

17-17: Added import for github.com/rudderlabs/rudder-go-kit/stats/memstats.

This import provides an in-memory stats implementation used to assert the statistics emitted by the tests.


29-29: Added multiple imports for backend configurations and test helpers.

These imports are essential for the expanded testing capabilities and integration with backend configurations.


1222-1236: Function dropSchema is well-implemented for schema cleanup.

This utility function is crucial for ensuring that schemas are properly cleaned up after tests, preventing potential conflicts or resource leaks in subsequent tests.

@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
@achettyiitr achettyiitr force-pushed the chore.limit-semi-structured-datatype branch from 1e342f9 to 4e38b50 Compare June 10, 2024 17:09
@rudderlabs rudderlabs deleted a comment from coderabbitai bot Jun 10, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 211b421 and 4e38b50.

Files selected for processing (13)
  • enterprise/reporting/error_reporting.go (1 hunks)
  • warehouse/integrations/azure-synapse/azure-synapse.go (2 hunks)
  • warehouse/integrations/bigquery/bigquery.go (2 hunks)
  • warehouse/integrations/clickhouse/clickhouse.go (1 hunks)
  • warehouse/integrations/datalake/datalake.go (1 hunks)
  • warehouse/integrations/datalake/schema-repository/glue.go (2 hunks)
  • warehouse/integrations/deltalake/deltalake.go (3 hunks)
  • warehouse/integrations/deltalake/deltalake_test.go (19 hunks)
  • warehouse/integrations/mssql/mssql.go (4 hunks)
  • warehouse/integrations/postgres/postgres.go (2 hunks)
  • warehouse/integrations/redshift/redshift.go (1 hunks)
  • warehouse/integrations/snowflake/snowflake.go (1 hunks)
  • warehouse/utils/utils.go (2 hunks)
Files skipped from review as they are similar to previous changes (12)
  • enterprise/reporting/error_reporting.go
  • warehouse/integrations/azure-synapse/azure-synapse.go
  • warehouse/integrations/bigquery/bigquery.go
  • warehouse/integrations/clickhouse/clickhouse.go
  • warehouse/integrations/datalake/datalake.go
  • warehouse/integrations/datalake/schema-repository/glue.go
  • warehouse/integrations/deltalake/deltalake.go
  • warehouse/integrations/mssql/mssql.go
  • warehouse/integrations/postgres/postgres.go
  • warehouse/integrations/redshift/redshift.go
  • warehouse/integrations/snowflake/snowflake.go
  • warehouse/utils/utils.go
Additional comments not posted (1)
warehouse/integrations/deltalake/deltalake_test.go (1)

491-491: Standardize cleanup operations.
[REFACTOR SUGGESTION]
The repeated use of dropSchema in the cleanup blocks across multiple test cases suggests a pattern that could be abstracted into a helper or set up in a more centralized manner to reduce redundancy and improve maintainability.

- t.Cleanup(func() {
-   dropSchema(t, d.DB.DB, namespace)
- })
+ setupCleanup(t, d, namespace)

And define setupCleanup as:

func setupCleanup(t *testing.T, d *deltalake.Deltalake, namespace string) {
    t.Cleanup(func() {
        if err := dropSchema(t, d.DB.DB, namespace); err != nil {
            t.Error("Failed to drop schema:", err)
        }
    })
}

Also applies to: 516-516, 564-564, 617-617, 668-668, 721-721, 773-773, 798-798, 842-842, 886-886, 929-929, 974-974, 1035-1035

time.Minute,
time.Second,
)
dropSchema(t, db, namespace)

Ensure proper cleanup of resources in tests.

The dropSchema function is called in a cleanup block, which is good practice. However, consider adding error handling within the dropSchema function to log or handle potential failures during the cleanup process.

time.Minute,
time.Second,
)
dropSchema(t, db, namespace)

Consistency in test cleanup.

Similar to the previous comment, ensure that the dropSchema function used in the cleanup process handles potential errors effectively. This consistency in error handling will make the tests more robust.

Comment on lines +1222 to +1236
func dropSchema(t *testing.T, db *sql.DB, namespace string) {
t.Helper()

require.Eventually(t,
func() bool {
_, err := db.Exec(fmt.Sprintf(`DROP SCHEMA %s CASCADE;`, namespace))
if err != nil {
t.Logf("error deleting schema %q: %v", namespace, err)
return false
}
return true
},
time.Minute,
time.Second,
)

Enhance the dropSchema function with error handling.

The dropSchema function currently logs errors but does not return them, which could mask issues during test cleanup. Modify it to return errors so they can be handled or logged by the caller.

- func dropSchema(t *testing.T, db *sql.DB, namespace string) {
+ func dropSchema(t *testing.T, db *sql.DB, namespace string) error {
+   var lastErr error
    require.Eventually(t,
        func() bool {
            _, err := db.Exec(fmt.Sprintf(`DROP SCHEMA %s CASCADE;`, namespace))
            if err != nil {
                t.Logf("error deleting schema %q: %v", namespace, err)
+               lastErr = err
                return false
            }
            return true
        },
        time.Minute,
        time.Second,
    )
+   return lastErr
}

Committable suggestion was skipped due to low confidence.

Contributor

coderabbitai bot commented Jun 10, 2024

Walkthrough

The recent updates span multiple files, primarily enhancing error reporting, statistics tracking, and schema handling in various data warehouse integrations. Key changes include modifications to function signatures, the addition of new methods, and updates to import statements. These enhancements aim to improve the robustness and accuracy of error and statistics handling across different warehouse integrations.

Changes

  • enterprise/reporting/error_reporting.go: Updated aggregate method to manually iterate over reportGrpKeys and reportsCountMap instead of using MapToSlice.
  • warehouse/integrations/azure-synapse/...: Modified parameter passed to WHCounterStat within AzureSynapse struct.
  • warehouse/integrations/bigquery/...: Added import for stats and modified function call in FetchSchema method to use stats.Default.
  • warehouse/integrations/clickhouse/...: Updated WHCounterStat call by adding ch.stats as an argument.
  • warehouse/integrations/datalake/...: Added import for logger from rudder-go-kit.
  • warehouse/integrations/datalake/schema-repository/glue.go: Added import for stats and modified FetchSchema method to include stats.Default.
  • warehouse/integrations/deltalake/...: Added sendStatForMissingDatatype method to handle missing data types in statistics reporting and updated FetchSchema method.
  • warehouse/integrations/deltalake/..._test.go: Refactored cleanup logic in test cases, introduced cleanup function, and added new test cases under Fetch Schema.
  • warehouse/integrations/mssql/...: Added various imports and modified function signature in ProcessColumnValue method.
  • warehouse/integrations/postgres/...: Updated FetchSchema method to include pg.stats in WHCounterStat call.
  • warehouse/integrations/redshift/...: Modified FetchSchema method to include rs.stats in WHCounterStat call.
  • warehouse/integrations/snowflake/...: Updated FetchSchema method to pass sf.stats to WHCounterStat instead of directly referencing &sf.Warehouse.
  • warehouse/utils/utils.go: Modified WHCounterStat function to accept a stats.Stats parameter for creating new tagged statistics.

Sequence Diagram(s)

No sequence diagrams are necessary for these changes as they primarily involve updates to function parameters, imports, and minor logic adjustments without introducing new features or significantly altering control flow.

Poem

In the realm of code so bright,
We tweak and tune with all our might,
Stats and errors, now more clear,
Data flows without a fear.
Warehouses hum a smoother song,
With every change, we move along.
🐇✨



@cisse21 cisse21 merged commit b81ee4a into master Jun 11, 2024
54 checks passed
@cisse21 cisse21 deleted the chore.limit-semi-structured-datatype branch June 11, 2024 05:44