Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Data Store] My Sql - target, source and driver(storey) #2407

Closed
wants to merge 120 commits into from

Conversation

davesh0812
Copy link
Contributor

@davesh0812 davesh0812 commented Sep 19, 2022

This pr focus in implementation of source, target and driver for SqlDB.
SqlDB target can create or read from exists sql collection. The collection is fixed and can't change is schema in the flow.
SqlDriver is not fully supported in the aggregation query flow.

https://jira.iguazeng.com/browse/ML-2610

engine = db.create_engine(db_path)
metadata = db.MetaData()
connection = engine.connect()
collection = db.Table(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call a table a "collection"? Where does this terminology come from? It looks like the term collection only exists in Oracle's PL/SQL, and it doesn't mean a table there either.

Comment on lines 23 to 24
class SqlDBSourceStorey(storey.sources._IterableSource, storey.sources.WithUUID):
"""Use mongodb collection as input source for a flow.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a copy and paste accident.

from storey.dtypes import _termination_obj


class SqlDBSourceStorey(storey.sources._IterableSource, storey.sources.WithUUID):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please open a storey PR for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also storeySourse.py is not a correct file name. Would be storey_source.py. But the correct resolution is to move this to storey.

self.collection_name, metadata, autoload=True, autoload_with=engine
)
results = connection.execute(db.select([collection])).fetchall()
df = pandas.DataFrame(results)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention is to import pandas as pd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gtopper I know it, but in source.py in storey you imported it as import pandas so I did the same, because one day we will merge it.

name=self.name or "SqlTarget",
after=after,
graph_shape="cylinder",
class_name="storey.NoSqlTarget",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the terminology is off when SqlDBTarget translates to NoSqlTarget.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gtopper NoSqlTarget using the driver(table) for write & read the data.

Comment on lines 30 to 31
def _are_mongodb_connection_string_not_set() -> bool:
return False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the test never runs? Also the double negation here is confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to be more understandable... and it always run

Comment on lines 1579 to 1580
db.Table(collection_name, metadata, *columns)
metadata.create_all(engine)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's right to create a SQL table inside of the constructor. Table creation should happen just before writing. The way it is now, simple creating a SqlDBTarget will try to create the SQL table, and that shouldn't happen until the user runs an operation (like ingest).

header=True,
table=table,
index_cols=key_columns,
# storage_options=self._get_store().get_storage_options(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code ☠️

):
import sqlalchemy as db

# {‘fail’, ‘replace’, ‘append’} #
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code ☠️

@@ -36,6 +36,7 @@


class TargetTypes:
mongodb = "mongodb"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does mongo have to do with this PR though?

@davesh0812 davesh0812 closed this Jan 3, 2023
@davesh0812
Copy link
Contributor Author

#2869

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants