Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes #12591: add BigTable #15122

Merged
merged 12 commits into from
Feb 13, 2024
Merged

fixes #12591: add BigTable #15122

merged 12 commits into from
Feb 13, 2024

Conversation

sushi30
Copy link
Contributor

@sushi30 sushi30 commented Feb 9, 2024

Describe your changes:

Fixes #12591. Implements the BigTable metadata connector.

Documentation will be submitted in a separate PR.

Summary

  1. Implemented the metadata ingestion connector in metadata.py,connection.py,models.py`.
  2. Added bytes type support for datalake_utils.
  3. Added debug message when records return unsupported types.
  4. Added support for table constraints for NoSQL databases.
  5. Implemented the required jsonschemas.
  6. Implemented ClassConverter`

Type of change:

  • Improvement

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Copy link

github-actions bot commented Feb 9, 2024

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link

github-actions bot commented Feb 9, 2024

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 58%
58.8% (30165/51300) 39.04% (12104/31003) 38.79% (3680/9488)

1. docstrings
2. tests
3. created a Row BaseModel
4. implemented a ClassConverter
Copy link

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@sushi30 sushi30 changed the title feat(connector): add BigTable fixes 12591: add BigTable Feb 12, 2024
Copy link

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

- removed TODO
- changed headers' year to 2024 for new files
- fixed typos
Copy link

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@sushi30 sushi30 changed the title fixes 12591: add BigTable fixes #12591: add BigTable Feb 12, 2024
@sushi30 sushi30 mentioned this pull request Feb 12, 2024
5 tasks
Copy link

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link
Collaborator

@pmbrull pmbrull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome @sushi30! Thanks for the great job. Left just a few comments if you have the time


# ths instances and tables are cached to avoid making redundant requests to the API.
self.instances: Dict[str, Tuple[Instance]] = {}
self.tables: Dict[str, Dict[str, Tuple[Table]]] = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - since this is quite a hard typing, would it make sense to have a custom pydantic model?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the types to be aliased so it is clear how the dictionaries are formed. If you still think a pydantic Model is better suited I am open to modifying but I need an example of how to model it either in the project or some draft.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, works with the aliases 👍

return cls(config, metadata)

def get_configured_database(self) -> Optional[str]:
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I'm just out of context on the NoSQL common class, but a comment/docstring explaining why this is None would be great 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a docstring. this is an edge case for a MultiDB source without a default DB.

project_id = self.context.database
try:
# the first element is a list of instances
# the second element is another collection (seems empty) and I do not know what is its purpose
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough 😅 sometimes these client have weird things

try:
instance = [
i for i in self.instances[project_id] if i.instance_id == schema_name
][0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh this feels a bit magic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abstracted it to a method. should be clearer now.

"type": "service_account",
"projectId": "my-gcp-project",
"privateKeyId": "private_key_id",
"privateKey": "-----BEGIN RSA PRIVATE KEY-----\nMIIEpQIBAAKCAQEAw3vHG9fDIkcYB0xi2Mv4fS2gUzKR9ZRrcVNeKkqGFTT71AVB\nOzgIqYVe8b2aWODuNye6sipcrqTqOt05Esj+sxhk5McM9bE2RlxXC5QH/Bp9zxMP\n/Yksv9Ov7fdDt/loUk7sTXvI+7LDJfmRYU6MtVjyyLs7KpQIB2xBWEToU1xZY+v0\ndRC1NA+YWc+FjXbAiFAf9d4gXkYO8VmU5meixVh4C8nsjokEXk0T/HEItpZCxadk\ndZ7LKUE/HDmWCO2oNG6sCf4ET2crjSdYIfXuREopX1aQwnk7KbI4/YIdlRz1I369\nAz3+Hxlf9lLJVH3+itN4GXrR9yWWKWKDnwDPbQIDAQABAoIBAQC3X5QuTR7SN8iV\niBUtc2D84+ECSmza5shG/UJW/6N5n0Mf53ICgBS4GNEwiYCRISa0/ILIgK6CcVb7\nsuvH8F3kWNzEMui4TO0x4YsR5GH9HkioCCS224frxkLBQnL20HIIy9ok8Rpe6Zjg\nNZUnp4yczPyqSeA9l7FUbTt69uDM2Cx61m8REOpFukpnYLyZGbmNPYmikEO+rq9r\nwNID5dkSeVuQYo4MQdRavOGFUWvUYXzkEQ0A6vPyraVBfolESX8WaLNVjic7nIa3\nujdSNojnJqGJ3gslntcmN1d4JOfydc4bja4/NdNlcOHpWDGLzY1QnaDe0Koxn8sx\nLT9MVD2NAoGBAPy7r726bKVGWcwqTzUuq1OWh5c9CAc4N2zWBBldSJyUdllUq52L\nWTyva6GRoRzCcYa/dKLLSM/k4eLf9tpxeIIfTOMsvzGtbAdm257ndMXNvfYpxCfU\nK/gUFfAUGHZ3MucTHRY6DTkJg763Sf6PubA2fqv3HhVZDK/1HGDtHlTPAoGBAMYC\npdV7O7lAyXS/d9X4PQZ4BM+P8MbXEdGBbPPlzJ2YIb53TEmYfSj3z41u9+BNnhGP\n4uzUyAR/E4sxrA2+Ll1lPSCn+KY14WWiVGfWmC5j1ftdpkbrXstLN8NpNYzrKZwx\njdR0ZkwvZ8B5+kJ1hK96giwWS+SJxJR3TohcQ18DAoGAJSfmv2r//BBqtURnHrd8\nwq43wvlbC8ytAVg5hA0d1r9Q4vM6w8+vz+cuWLOTTyobDKdrG1/tlXrd5r/sh9L0\n15SIdkGm3kPTxQbPNP5sQYRs8BrV1tEvoao6S3B45DnEBwrdVN42AXOvpcNGoqE4\nuHpahyeuiY7s+ZV8lZdmxSsCgYEAolr5bpmk1rjwdfGoaKEqKGuwRiBX5DHkQkxE\n8Zayt2VOBcX7nzyRI05NuEIMrLX3rZ61CktN1aH8fF02He6aRaoE/Qm9L0tujM8V\nNi8WiLMDeR/Ifs3u4/HAv1E8v1byv0dCa7klR8J257McJ/ID4X4pzcxaXgE4ViOd\nGOHNu9ECgYEApq1zkZthEQymTUxs+lSFcubQpaXyf5ZC61cJewpWkqGDtSC+8DxE\nF/jydybWuoNHXymnvY6QywxuIooivbuib6AlgpEJeybmnWlDOZklFOD0abNZ+aNO\ndUk7XVGffCakXQ0jp1kmZA4lGsYK1h5dEU5DgXqu4UYJ88Vttax2W+Y=\n-----END RSA PRIVATE KEY-----\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this is a made-up key 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a real key. but it doesnt open aynthing... it has to be real otherwise the model fails on validation.

1. added missing docstrings.
2. abstracted the _find_instance method.
3. aliased the IDs used in the BigTable connection
Copy link

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link

sonarcloud bot commented Feb 12, 2024

Quality Gate Passed Quality Gate passed for 'open-metadata-ui'

Issues
0 New issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Copy link

sonarcloud bot commented Feb 12, 2024

Quality Gate Passed Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues

Measures
0 Security Hotspots
74.7% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Copy link
Collaborator

@pmbrull pmbrull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sushi30, congrats on the first connector! LGTM

@pmbrull pmbrull merged commit aeb5fbe into main Feb 13, 2024
20 checks passed
@pmbrull pmbrull deleted the big-table-connector branch February 13, 2024 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend documentation Improvements or additions to documentation Ingestion safe to test Add this label to run secure Github workflows on PRs UI UI specific issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigTable connector
4 participants