Add select_streams option (to be added in the config file) to generate catalog file with some stream(s) pre-selected #9

Open
wants to merge 1 commit into
base: master

Conversation

davicorreiajr

Hi,

First of all, apologies if I should have waited for an issue to be approved before creating this PR. I really need this feature and was going to use my forked repo, but then I thought it might be helpful for everyone. I would love your opinion on this.

Description of change

It seems odd to me that one has to run discovery mode and then somehow (manually, programmatically, or with a tool like singer-discover) edit the catalog file just to select which streams to retrieve data from in the sync step (in this case, these will probably be sheets).

Singer taps are a great tool, and I think they should be as plug-and-play as possible. If you run them as periodic tasks (which is my case), manually changing a file just doesn't make sense. Personally, I also don't like the idea of programmatically changing the catalog file (which, to be honest, I've seen some people do) if there's a way to generate it with the desired streams already selected.

The idea here is to allow an option in the config file so that the discovery step generates catalog.json with one or more streams pre-selected. This way, no intermediate step is needed between discovery and sync to get the data you want.
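To make the idea concrete, here is a rough sketch of how such an option might be read from the config. Only the `select_streams` key comes from this PR; the `get_selected_streams` helper, the placeholder `spreadsheet_id` entry, and the validation behaviour are assumptions for illustration, not the code in this branch.

```python
import json

# Hypothetical config contents. Only "select_streams" is the option proposed
# here; "spreadsheet_id" stands in for whatever settings the tap already needs.
config = {
    "spreadsheet_id": "<your-spreadsheet-id>",
    "select_streams": ["file_metadata", "Sheet 1"],
}


def get_selected_streams(config):
    """Return the stream names listed under select_streams (empty if absent).

    Hypothetical helper: rejects anything that is not a list of strings.
    """
    streams = config.get("select_streams", [])
    if not isinstance(streams, list) or not all(isinstance(s, str) for s in streams):
        raise ValueError("select_streams must be an array of strings")
    return streams


# Show the JSON that would be saved as config.json.
print(json.dumps(config, indent=2))
```

When the option is missing, the helper returns an empty list, so discovery would behave as it does today.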

The solution is based on the fact that:

Relates to #8

Manual QA steps

  • Include the option select_streams in the config file:
    • it should be an array of strings (i.e. the names of the streams);
    • you can use pre-defined streams (e.g. file_metadata) or the names of sheets (e.g. Sheet 1).
  • Run the discovery step (tap-google-sheets --config config.json --discover > catalog.json). It should generate the catalog file with "selected": true in the schema of each stream listed in the select_streams option.
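The expected outcome of these steps can be sketched as below. This is a minimal illustration, not the implementation in this branch: the `mark_selected` function is hypothetical, and the catalog shape (a top-level `streams` list whose entries have `stream` and `schema` keys) is assumed from the usual Singer catalog layout.

```python
def mark_selected(catalog, select_streams):
    """Set "selected": True in the schema of each stream named in select_streams.

    Hypothetical helper; streams not listed are left untouched.
    """
    wanted = set(select_streams or [])
    for entry in catalog.get("streams", []):
        if entry.get("stream") in wanted:
            entry.setdefault("schema", {})["selected"] = True
    return catalog


# Assumed, simplified catalog as discovery might produce it before selection.
catalog = {
    "streams": [
        {"stream": "file_metadata", "schema": {"type": "object"}},
        {"stream": "Sheet 1", "schema": {"type": "object"}},
        {"stream": "Sheet 2", "schema": {"type": "object"}},
    ]
}

marked = mark_selected(catalog, ["file_metadata", "Sheet 1"])

# List the streams that ended up pre-selected.
for entry in marked["streams"]:
    if entry["schema"].get("selected"):
        print(entry["stream"])
```

To check a real catalog.json during QA, load it with `json.load` and confirm that only the streams listed in select_streams carry "selected": true.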

Risks

  • Not sure if select_streams is the best name.

Rollback steps

  • revert this branch

…e catalog file with some stream(s) pre-selected
@cmerrick
Contributor

Hi @davicorreiajr, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

@cmerrick
Contributor

You did it @davicorreiajr!

Thank you for signing the Singer Contribution License Agreement.
