🎉 Source Salesforce: add optional parameters to filter properties within streams #1

Closed
wants to merge 68 commits into from

Conversation

jkaelin
Owner

@jkaelin jkaelin commented Jan 31, 2022

What

Allows a user to exclude specific fields and/or data types from a stream.

Current Issues

There are at least a few issues directly requesting the generic ability to exclude specific fields from streams:

I acknowledge that this PR is specific to the Salesforce Source and does not fully address the above issues.

General Use Cases

Some use cases that may be facilitated by this PR:

  • Exclude sensitive fields or data types from streams in order to comply with legal/privacy regulations
  • Exclude fields or data types which may prohibit the stream from syncing successfully
  • Exclude data types which may trigger the fallback to the non-Bulk API (e.g. base64/object)

Specific Use Case

The immediate use case for this implementation is excluding certain calculated fields within Salesforce objects that may raise exceptions on evaluation. Although these exceptions are effectively masked in the UI, they cause the Salesforce Bulk API to fail with only a generic error message.

How

  • Add two optional string array parameters to the spec
  • Inject evaluation of these user exclusions against the fields returned by generate_schema
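
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the parameter names exclude_fields and exclude_types come from the description, while the spec fragment layout, the field dictionaries with name/type keys, the helper name apply_user_exclusions, and the case-insensitive matching are all assumptions.

```python
from typing import Any, Dict, Iterable, List, Mapping

# Hypothetical spec fragment: two optional string-array parameters.
SPEC_FRAGMENT: Dict[str, Any] = {
    "exclude_fields": {"type": "array", "items": {"type": "string"}},
    "exclude_types": {"type": "array", "items": {"type": "string"}},
}


def apply_user_exclusions(
    fields: Iterable[Mapping[str, str]],
    exclude_fields: Iterable[str] = (),
    exclude_types: Iterable[str] = (),
) -> List[Mapping[str, str]]:
    """Return the fields that survive both name-based and type-based exclusion.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    names = {name.lower() for name in exclude_fields}
    types = {type_name.lower() for type_name in exclude_types}
    return [
        field
        for field in fields
        if field["name"].lower() not in names and field["type"].lower() not in types
    ]
```

With both parameters left empty the field list passes through unchanged, which is the behavior one would expect from optional parameters.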

Recommended reading order

  1. spec.json
    • Added two optional parameters exclude_fields & exclude_types
  2. source.py
    • get_user_excluded_fields & get_user_excluded_types
      • Added two helper functions to evaluate the above parameters
    • generate_streams
      • Pass parameters on to api.py:generate_schema
  3. api.py
    • generate_schema
      • add optional parameters for user exclusions
      • exclude fields from schema based on optional parameters
  4. unit_test.py
    • test_bulk_sync_pagination
      • No functional changes
      • Forced to add the #noqa E203 directive due to conflict between black & flake8 on line 143
    • test_schema_with_user_excluded_fields
      • parameterized unit test for named fields
    • test_schema_with_user_excluded_types
      • parameterized unit test for data types
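
For readers who want a feel for the helpers before opening source.py, here is a hedged sketch. The function names get_user_excluded_fields and get_user_excluded_types are taken from the list above, but their signatures and behavior are assumptions (reading a list from the config and defaulting to empty), and _read_string_list is a hypothetical helper. The table of cases mimics a parameterized test with plain assertions; it exercises these helpers, not the schema tests named above.

```python
from typing import Any, List, Mapping


def _read_string_list(config: Mapping[str, Any], key: str) -> List[str]:
    """Return config[key] as a list of trimmed, non-empty strings ([] if unset)."""
    values = config.get(key) or []
    return [v.strip() for v in values if isinstance(v, str) and v.strip()]


def get_user_excluded_fields(config: Mapping[str, Any]) -> List[str]:
    # Assumed behavior: field names the user asked to drop from every stream.
    return _read_string_list(config, "exclude_fields")


def get_user_excluded_types(config: Mapping[str, Any]) -> List[str]:
    # Assumed behavior: data types the user asked to drop from every stream.
    return _read_string_list(config, "exclude_types")


# Table-driven cases: (config, expected fields, expected types).
CASES = [
    ({}, [], []),
    ({"exclude_fields": ["Calc__c"]}, ["Calc__c"], []),
    ({"exclude_fields": None, "exclude_types": [" base64 ", ""]}, [], ["base64"]),
]

for config, expected_fields, expected_types in CASES:
    assert get_user_excluded_fields(config) == expected_fields
    assert get_user_excluded_types(config) == expected_types
```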

User Impact

  • Users will see two new optional parameters on the New Source page
    • Fields to Exclude
    • Data Types to Exclude
  • Sync logging will explicitly list the fields which are being skipped
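
As a rough idea of what the skipped-field logging could look like, here is a small sketch. The logger name, the function log_skipped_fields, and the message wording are all hypothetical; the actual log format is whatever the connector emits.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger("airbyte")  # logger name is an assumption


def log_skipped_fields(stream_name: str, skipped: Iterable[str]) -> List[str]:
    """Log the user-excluded fields for a stream and return them sorted."""
    names = sorted(skipped)
    if names:
        # One line per stream keeps the sync log easy to scan.
        logger.info(
            "Stream %s: skipping user-excluded fields: %s",
            stream_name,
            ", ".join(names),
        )
    return names
```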

Pre-merge Checklist

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions) -- will do
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

DoNotPanicUA and others added 30 commits January 25, 2022 22:39
* #9554 PR

* format

* incr version for Postgres

* Update docs/integrations/sources/postgres.md

Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>

* test upd

* test upd

* incr ver for mssql as effected source

Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
* add getConfigWithMetadata method

* run gw format
* Address review comments

* Add comments

* Format code

* Update user documentation
Co-authored-by: lmossman <lmossman@users.noreply.github.com>
* add FailureHelper

* add jobPersistence method for writing failure summary

* record source/destination failures and include them in ReplicationOutput and StandardSyncOutput

* handle failures in ConnectionManagerWorkflow, persist them when failing/cancelling an attempt

* rename attempt to attempt_id in FailureHelper

* test that ConnectionManagerWorkflow correctly records failures

* only set failures on ReplicationOutput if a failure actually occurred

* test that source or destination failure results in correct failureReason

* remove cancellation from failure summaries

* formatting, cleanup

* remove failureSummaryForCancellation

* rename failureSource -> failureOrigin, delete retryable, clarify failureType enum values

* actually persist attemptFailureSummary now that column exists

* use attemptNumber instead of attemptId where appropriate

* small fixes

* formatting

* use maybeAttemptId instead of connectionUpdaterInput.getAttemptNumber

* missed rename from failureSource to failureOrigin
…7787)

* Began working on HubSpot Form Submission Connector

* Added Property History Stream

* Added form_guid to as value to form_submissions_stream.

* Finalized the Form Submission Stream

* Added documentation and test config

* Corrected Version Number

* updated version number to 0.1.25

* removed or none worked on tests

* Changed code due to review comments & merges

* readded Propertyhistory after merging

* bump connector version

Co-authored-by: Tino Merl <tino.merl@park-sieben.com>
Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>
* reintroduce window in days, log warning when sampling occurs

* Unit tests

* Documentation update

* Update airbyte-integrations/connectors/source-google-analytics-v4/source_google_analytics_v4/source.py

Co-authored-by: Sergei Solonitcyn <11441558+sergei-solonitcyn@users.noreply.github.com>

* fix the spec

Signed-off-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>

* some mypy fixes

Signed-off-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>

* bump version

* format

* updated spec and def yaml

* Update source.py

Co-authored-by: Sergei Solonitcyn <11441558+sergei-solonitcyn@users.noreply.github.com>
Co-authored-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>
Co-authored-by: auganbay <auganenu@gmail.com>
…nal images (#9634)

* feat: add publish-external command to slash commands to publish external connector images

* fix: publish only stpec to cache

* fix: according to new script changes

* fix: removed tox installation

* fix: version is not read from command
This is forcing the temporal sync workflow to fail. Using a signal didn't worked because the cancelation scope didn't cancel the child workflow.
* fix double with:

* rename env MY_GITHUB_TOKEN
* github schema

* GitHub dockerfile

* formatted

* 🎉Source HubSpot: Adds form_submission and property_history streams (#7787)

* Began working on HubSpot Form Submission Connector

* Added Property History Stream

* Added form_guid to as value to form_submissions_stream.

* Finalized the Form Submission Stream

* Added documentation and test config

* Corrected Version Number

* updated version number to 0.1.25

* removed or none worked on tests

* Changed code due to review comments & merges

* readded Propertyhistory after merging

* bump connector version

Co-authored-by: Tino Merl <tino.merl@park-sieben.com>
Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>

* bump connector version

Co-authored-by: Tino Merl <35485536+tinomerl@users.noreply.github.com>
Co-authored-by: Tino Merl <tino.merl@park-sieben.com>
Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>
* correct spec + bump connector version

* Update Dockerfile
Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com>
* remove datetime from bot-profile channel_msgs

* correct retry-after

* return retry-after

* bump connector version
* Move getRegexTests to python source acceptance test

* Remove unused imports

* Update test template
Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
igrankova and others added 29 commits January 28, 2022 14:37
* Files title/description update for issue # 8952

* Version update for issue # 8952

* Changelogs update for PR #9183

* updated pubsub spec in destination_specs.yaml

Co-authored-by: Vadym Ratniuk <midavadim@yahoo.com>
* Files title/description update for issue # 8951

* Version update for issue # 8951

* Changelogs update for PR #9177

* updated oracle spec in destination_specs.yaml

Co-authored-by: Vadym Ratniuk <midavadim@yahoo.com>
…ma (#9851)

* test_defined_keyword_exist_in_schema added

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
* update connector version

* updated expected_spec.json according to new spec

* fixed expected_spec.json
* Verify catalog in redshift source acceptance test

* Dry code

* Fix tests
* fix pre-commit config

* fix tests

* beatify CHANGELOG.md

Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
* Re-add HTTP Request source docs

* Add to SUMMARY.md
* airbyte-9328: Added Sentry integration to BigQuery and BigQuery denormalized connector.

* airbyte-5050: Added strategy for INSERT ROW.

* airbyte-9328: Added Sentry integration to Snowflake.

* airbyte-9328: Fix Sentry config.

* airbyte-9328: Fixed PR comments.

* airbyte-9328: Fixed PR comments.

* airbyte-9328: Fix PR comments.

* airbyte-9328: Fixed PR comments.

* airbyte-9328: Fixed PR comments.

* airbyte-9328: Fixed PR comments.

* airbyte-9328: Small changes.

* airbyte-9328: Small changes.

* airbyte-9328: Move SENTRY DSN keys to Dockerfiles.

* Use new dsn

* Revert format

* Remove sentry dsn from compose temporarily

* Log sentry event id

* Move sentry to java base

* Remove sentry code from bigquery

* Update dockerfiles

* Fix build

* Update release tag format

* Bump version

* Add env to dockerfiles

* Fix e2e test connector dockerfil

* Fix snowflake bigquery dockerfile

* Mark new versions as unpublished

Co-authored-by: LiRen Tu <tuliren@gmail.com>
Co-authored-by: Liren Tu <tuliren.git@outlook.com>
* Try python 3.7

* Use black 21.12b0

* Add comment

* Update comment

* Remove unused imports from template
* Allow updating workspace names

* Add additional unit test

* Fix code styling

* Update slug as well

* Update indentations

* Pull name update into separate endpoint
…ter than the value in state for incremental streams (#9550)

* 8906 Output only records in which cursor field is greater than the value in state for incremental streams

* 8906 Fix full refresh read for SurveyResponses stream

* 8906 Add tests + update docs

* 8906 Update docs

* 8906 Bump connector's version
* update doc with list of new tables

* add small description for tables
* Add streamr document

* add missing end line

* fix name: streamr > Streamr

Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
* Add Jenkins source from Faros AI to connector catalog

* todos

* Add setup instruction

* Update doc

* Feedback and add to dropdown
Offer users the ability to one click deploy much more quickly on DO

Co-authored-by: rbalsick <rbalsick@gmail.com>
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
Co-authored-by: timroes <timroes@users.noreply.github.com>
@jkaelin jkaelin closed this Jan 31, 2022