Make extractors usable on their own #19

Closed
MeltyBot opened this issue Jun 25, 2018 · 1 comment

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/20

Originally created by @joshlambert on 2018-06-25 16:54:40


We discussed this in the past, but opted not to do it because of the added complexity of building capabilities we do not immediately need: databases other than Postgres, splitting the extractors into separate projects, etc.

Now that we have more engineering resources on board, and have addressed the near-term needs of our internal data team, I think we should revisit this topic for a few reasons:

  1. Each of these extractors has its own value, and could generate interest on its own. For example, a good SFDC, Zuora, or Netsuite extractor would be useful for the broader community.
  2. It will take some time for us to really make the full meltano experience great, end to end.
  3. While that work is being done, we could start generating interest and critical mass with just the extractors themselves.
  4. Right now, however, there are a few major hurdles in driving usage of these extractors:
  • Our extractors only output to Postgres. There is no support for exporting to a file, or to any other database type. If your EDW runs on BigQuery, we can't help you.
  • There is no SEO for the individual extractors. If you google for "sfdc extract", you aren't going to get a good hit based on the full Meltano readme.
  • Further, the extractors aren't easily usable on their own. They are expected to be used in the context of the full project. For example, there is no canned image; instead, they are pulled down with a git checkout.
  • We currently operate as a monorepo, and it is not user friendly to work on these in isolation. Our issues, MRs, READMEs, etc. all cover a broader scope than the simple, sharp tool of extracting from a source.

There are some downsides:

  1. There will be some work to really "productize" these individually, if we are going with our own system.
  • We should accelerate the work to have the extractors output to an intermediate format, so we can support multiple storage engines (PG, MySQL, BigQuery, Redshift, Snowflake, etc.). We can then build individual loaders for these.
  • We will need to rework the pipelines to build a final image for each extractor, and then update the main CI pipeline.
  2. This work may slow down the effort to productize the full meltano project, for example building the data mapping feature.
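
To make the intermediate-format idea above concrete, here is a minimal sketch of how an extractor and a loader could be decoupled. It assumes JSON Lines as the intermediate format and uses SQLite as a stand-in for a real storage engine. Neither choice comes from this issue, and the function names (`write_records`, `read_records`, `load_into_sqlite`) are hypothetical.

```python
import io
import json
import sqlite3
from typing import Iterable, Iterator, TextIO


def write_records(records: Iterable[dict], out: TextIO) -> int:
    """Extractor side: emit one JSON object per line (assumed format)."""
    count = 0
    for record in records:
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


def read_records(src: TextIO) -> Iterator[dict]:
    """Loader side: parse the intermediate format back into dicts."""
    for line in src:
        line = line.strip()
        if line:
            yield json.loads(line)


def load_into_sqlite(src: TextIO, conn: sqlite3.Connection, table: str) -> int:
    """Hypothetical loader; SQLite stands in for PG, MySQL, BigQuery, etc."""
    rows = list(read_records(src))
    if not rows:
        return 0
    # Column names are taken from the first record; a real loader would
    # need an explicit schema instead of this shortcut.
    columns = sorted(rows[0])
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # The buffer could just as well be a file or a pipe between processes.
    buffer = io.StringIO()
    write_records([{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}], buffer)
    buffer.seek(0)
    connection = sqlite3.connect(":memory:")
    print(load_into_sqlite(buffer, connection, "accounts"))
```

With this split, the extractor only knows about the intermediate format, so supporting another storage engine would mean writing another loader rather than touching the extractor.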