Auto-create tables and schemas #2090
Conversation
```diff
@@ -91,3 +91,19 @@ class ComponentException(BaseException):

     :param msg: msg for BaseException
     """


 class UnsupportedDatatype(BaseException):
```
Nice
```python
    """A factory for pil image # noqa."""

    @staticmethod
    def check(data: t.Any) -> bool:
```
Would it make sense (in future) to add a function `check=my_checking_function` to `DataType`?
This approach complicates `DataType`; additionally, we need to discover all possible datatypes to implement auto-schema. If we add the check to the datatype itself, we must instantiate the datatype in order to discover it, which is inelegant for types used in functions, such as arrays or torch tensors. Therefore, implementing a factory class method is preferable: only the data types that implement this factory will be discovered.
WDYT?
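The factory-style discovery described above could be sketched roughly as follows. This is an illustrative sketch only: `DataTypeFactory`, `PILImageFactory`, and `infer_datatype` are assumed names for this example, not the actual superduperdb API.

```python
import typing as t


class DataTypeFactory:
    # Subclasses register themselves, so the schema inferrer can walk
    # them without instantiating any datatype up front.
    registry: t.List[t.Type['DataTypeFactory']] = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        DataTypeFactory.registry.append(cls)

    @staticmethod
    def check(data: t.Any) -> bool:
        raise NotImplementedError


class PILImageFactory(DataTypeFactory):
    @staticmethod
    def check(data: t.Any) -> bool:
        # The real code would test for a PIL.Image instance; checking
        # the type name here just keeps the sketch dependency-free.
        return type(data).__name__ == 'Image'


def infer_datatype(data: t.Any) -> t.Optional[t.Type[DataTypeFactory]]:
    # Only datatypes that implement a factory are discoverable.
    for factory in DataTypeFactory.registry:
        if factory.check(data):
            return factory
    return None
```

With this shape, adding auto-schema support for a new datatype means adding one factory subclass, without touching `DataType` itself.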
superduperdb/base/datalayer.py (Outdated)
```diff
@@ -1076,6 +1087,16 @@ def infer_schema(
         """
         return self.databackend.infer_schema(data, identifier)

+    def set_cfg(self, cfg: Config):
```
Why not do this?

```python
@cfg.setter
def cfg(self, value):
    ...
```
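For context, a property-with-setter version of the suggestion might look like the minimal sketch below, on a stand-in `Datalayer` class (the real superduperdb `Datalayer` holds much more state):

```python
class Datalayer:
    def __init__(self, cfg):
        self._cfg = cfg

    @property
    def cfg(self):
        return self._cfg

    @cfg.setter
    def cfg(self, value):
        # Note: attributes derived from the original config (e.g. a
        # compute backend built at construction time) are NOT rebuilt
        # here, which is the risk of mutating cfg after construction.
        self._cfg = value
```

A setter keeps the call site as plain attribute assignment (`db.cfg = new_cfg`) instead of a `set_cfg` method, but it does not by itself solve the problem of derived attributes going stale.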
Also, do we really want to do this? That would mess things up: `self.compute` depends on the config used to build the `Datalayer`. So you set the `cfg` and it doesn't match the attributes of the `db`.
There is an underlying issue here. If we start a `datalayer` by invoking `superduperdb(**new_config)`, the value of the global config is actually incorrect at this point, because it has not been updated with `**new_config`. Therefore, when we send tasks to the compute layer, we should send the config with which we actually constructed the `datalayer`, not the one from the configuration file; otherwise there will be a mismatch between the client side and the compute side.
I believe the correct approach is to launch the `datalayer` in each compute job using the received `CFG`, rather than merely retrieving it from a configuration file.
WDYT?
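The idea above can be sketched as follows: each compute job receives the caller's config and builds its datalayer from it, instead of re-reading a configuration file. `build_datalayer`, `run_compute_job`, and the `Config` fields are assumptions for this sketch, not the real superduperdb API.

```python
import dataclasses


@dataclasses.dataclass
class Config:
    auto_schema: bool = True
    bytes_encoding: str = 'BYTES'


def build_datalayer(cfg: Config) -> dict:
    # Stand-in for constructing a real Datalayer from cfg.
    return {'cfg': cfg}


def run_compute_job(cfg: Config, task):
    # The job builds its datalayer from the CFG it received, so the
    # client side and compute side see the same configuration.
    db = build_datalayer(cfg)
    return task(db)


result = run_compute_job(
    Config(auto_schema=False),
    lambda db: db['cfg'].auto_schema,
)
# result is False: the job used the caller's config, not a file default
```

Serializing the config alongside each submitted task is the key point; any file-based default on the workers is ignored.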
```diff
@@ -262,6 +263,7 @@ class Config(BaseConfig):
     logging_type: LogType = LogType.SYSTEM

     bytes_encoding: BytesEncoding = BytesEncoding.BYTES
+    auto_schema: bool = True
```
Ok.
Description

Related Issues

Checklist

- Did `make unit_testing` and `make integration-testing` run successfully?

Additional Notes or Comments