New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data Store] My Sql - target, source and driver(storey) #2407
Conversation
mlrun/datastore/sources.py
Outdated
engine = db.create_engine(db_path) | ||
metadata = db.MetaData() | ||
connection = engine.connect() | ||
collection = db.Table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why call a table a "collection"? Where does this terminology come from? It looks like the term collection only exists in Oracle's PL/SQL, and it doesn't mean a table there either.
mlrun/datastore/storeySourse.py
Outdated
class SqlDBSourceStorey(storey.sources._IterableSource, storey.sources.WithUUID): | ||
"""Use mongodb collection as input source for a flow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a copy and paste accident.
mlrun/datastore/storeySourse.py
Outdated
from storey.dtypes import _termination_obj | ||
|
||
|
||
class SqlDBSourceStorey(storey.sources._IterableSource, storey.sources.WithUUID): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open a storey PR for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also storeySourse.py
is not a correct file name. Would be storey_source.py
. But the correct resolution is to move this to storey.
mlrun/datastore/storeySourse.py
Outdated
self.collection_name, metadata, autoload=True, autoload_with=engine | ||
) | ||
results = connection.execute(db.select([collection])).fetchall() | ||
df = pandas.DataFrame(results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention is to import pandas as pd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gtopper I know it, but in source.py
in storey you imported it as import pandas
so I did the same, because one day we will merge it.
name=self.name or "SqlTarget", | ||
after=after, | ||
graph_shape="cylinder", | ||
class_name="storey.NoSqlTarget", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the terminology is off when SqlDBTarget
translates to NoSqlTarget
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gtopper NoSqlTarget using the driver(table) for write & read the data.
def _are_mongodb_connection_string_not_set() -> bool: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the test never runs? Also the double negation here is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to be more understandable... and it always run
mlrun/datastore/targets.py
Outdated
db.Table(collection_name, metadata, *columns) | ||
metadata.create_all(engine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's right to create a SQL table inside of the constructor. Table creation should happen just before writing. The way it is now, simple creating a SqlDBTarget
will try to create the SQL table, and that shouldn't happen until the user runs an operation (like ingest
).
mlrun/datastore/targets.py
Outdated
header=True, | ||
table=table, | ||
index_cols=key_columns, | ||
# storage_options=self._get_store().get_storage_options(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code ☠️
mlrun/datastore/targets.py
Outdated
): | ||
import sqlalchemy as db | ||
|
||
# {‘fail’, ‘replace’, ‘append’} # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code ☠️
mlrun/datastore/targets.py
Outdated
@@ -36,6 +36,7 @@ | |||
|
|||
|
|||
class TargetTypes: | |||
mongodb = "mongodb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does mongo have to do with this PR though?
This pr focus in implementation of source, target and driver for SqlDB.
SqlDB target can create or read from exists sql collection. The collection is fixed and can't change is schema in the flow.
SqlDriver is not fully supported in the aggregation query flow.
https://jira.iguazeng.com/browse/ML-2610