feat(ingest): MongoDB schema inference #2546

Merged: 30 commits, May 14, 2021

Contributor

@kevinhu kevinhu commented May 12, 2021

Adds schema inference capabilities for MongoDB:

  • Detects nested fields (concatenates with '.')
  • Detects whether a field is nullable (since MongoDB is schemaless, a field counts as nullable if it is ever missing from a document)
  • Includes a sample_size option to scan a few documents rather than an entire collection
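
For illustration only, here is a minimal sketch of the three behaviours listed above. It is not the code from this PR: `flatten` and `infer_schema` are hypothetical names, and the sketch works on any iterable of plain dicts (for example, the documents yielded by a pymongo cursor) rather than on a live collection. It assumes "nullable" means "missing from at least one scanned document", as described above.

```python
from collections import Counter
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Optional, Set, Tuple


def flatten(doc: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, type_name) for each field, descending into nested dicts."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path, type(value).__name__
        if isinstance(value, dict):
            # Nested fields are reported with their parent path joined by '.'
            yield from flatten(value, path)


def infer_schema(
    documents: Iterable[Dict[str, Any]],
    sample_size: Optional[int] = None,
) -> Dict[str, Dict[str, Any]]:
    """Infer field paths, observed types, and nullability from (a sample of) documents."""
    if sample_size is not None:
        # Scan only the first `sample_size` documents instead of everything.
        documents = islice(documents, sample_size)

    total = 0
    seen: Counter = Counter()          # path -> number of documents containing it
    types: Dict[str, Set[str]] = {}    # path -> set of observed type names

    for doc in documents:
        total += 1
        for path, type_name in dict(flatten(doc)).items():
            seen[path] += 1
            types.setdefault(path, set()).add(type_name)

    # A field is nullable if at least one scanned document lacks it.
    return {
        path: {"types": sorted(types[path]), "nullable": seen[path] < total}
        for path in seen
    }


if __name__ == "__main__":
    docs = [
        {"_id": 1, "name": "a", "address": {"city": "x", "zip": "1"}},
        {"_id": 2, "name": "b", "address": {"city": "y"}},
        {"_id": 3, "name": "c"},
    ]
    for field, info in infer_schema(docs).items():
        print(field, info)
```

Note that a smaller `sample_size` trades accuracy for speed: a field missing only from unscanned documents would be reported as non-nullable.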

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@kevinhu kevinhu changed the title from "feat(ingest) MongodDB schema inference" to "feat(ingest): MongodDB schema inference" May 12, 2021
@kevinhu kevinhu changed the title from "feat(ingest): MongodDB schema inference" to "feat(ingest): MongoDB schema inference" May 13, 2021
@kevinhu kevinhu marked this pull request as ready for review May 13, 2021 17:29
Collaborator

@hsheth2 hsheth2 left a comment

Looking pretty good so far - just some small things

Collaborator

@hsheth2 hsheth2 left a comment

LGTM with one minor nit

metadata-ingestion/README.md (review thread: outdated, resolved)
Contributor

@shirshanka shirshanka left a comment

LGTM!

@shirshanka shirshanka merged commit 5ab1cbb into datahub-project:master May 14, 2021
3 participants