-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ludwig-based model train and tune support. #935
Conversation
This reverts commit 84efe77.
The training support is ready. Testcases failed are not related to training. I will fix the fuzzy join one. And Jiashen will fix the native executor. We will set up a new circleci test for training. |
python -c "import yaml;f = open('evadb/evadb.yml', 'r+');config_obj = yaml.load(f, Loader=yaml.FullLoader);config_obj['experimental']['ray'] = True;f.seek(0);f.write(yaml.dump(config_obj));f.truncate();" | ||
else | ||
if [ $PY_VERSION != "3.11" ]; then | ||
pip install ".[dev,ludwig,qdrant]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaurav274 @jiashenC @jarulraj To avoid dependency conflict, I have changed the way that circle ci installs the EvaDB. Let me know if you have a different idea. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this as well. Another way to rewrite this using the current infrastructure is when the ray is disabled and the python version is not 3.11. We already have both of those flags. If I understand correctly, we just need to combine https://github.com/georgia-tech-db/evadb/blob/master/.circleci/config.yml#L168 and https://github.com/georgia-tech-db/evadb/blob/master/.circleci/config.yml#L181?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not able to find a way to do when else
in circle ci. I am following the suggestions from https://discuss.circleci.com/t/conditional-steps-if-else-without-code-duplication/45030
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking of the following logic.
// we already have those two
if ray enabled: install ray
if python != 3.11: install qdrant
// we add for this PR
if ray disabled and python != 3.11: install ludwig
We can use nested logic https://circleci.com/docs/configuration-reference/#logic-statements to specify ray disabled
and python != 3.11
in one when statement. Will this work?
python -c "import yaml;f = open('evadb/evadb.yml', 'r+');config_obj = yaml.load(f, Loader=yaml.FullLoader);config_obj['experimental']['ray'] = True;f.seek(0);f.write(yaml.dump(config_obj));f.truncate();" | ||
else | ||
if [ $PY_VERSION != "3.11" ]; then | ||
pip install ".[dev,ludwig,qdrant]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this as well. Another way to rewrite this using the current infrastructure is when the ray is disabled and the python version is not 3.11. We already have both of those flags. If I understand correctly, we just need to combine https://github.com/georgia-tech-db/evadb/blob/master/.circleci/config.yml#L168 and https://github.com/georgia-tech-db/evadb/blob/master/.circleci/config.yml#L181?
inputs, outputs = [], [] | ||
for column in all_column_list: | ||
if column.name in predict_columns: | ||
column.name = column.name + "_predictions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we ensure the UDF always returns something named xxx_predictions
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ludwig appends the _predictions
to the target column. I haven't found a way to specify the output column name for ludwig yet.
.. code-block:: sql | ||
|
||
CREATE UDF IF NOT EXISTS PredictHouseRent FROM | ||
(SELECT * FROM HomeRentals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember you were discussing this. Is *
working now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works. It will consider _row_id
also as an input column. I will fix it in later PRs.
python -c "import yaml;f = open('evadb/evadb.yml', 'r+');config_obj = yaml.load(f, Loader=yaml.FullLoader);config_obj['experimental']['ray'] = True;f.seek(0);f.write(yaml.dump(config_obj));f.truncate();" | ||
else | ||
if [ $PY_VERSION != "3.11" ]; then | ||
pip install ".[dev,ludwig,qdrant]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking of the following logic.
// we already have those two
if ray enabled: install ray
if python != 3.11: install qdrant
// we add for this PR
if ray disabled and python != 3.11: install ludwig
We can use nested logic https://circleci.com/docs/configuration-reference/#logic-statements to specify ray disabled
and python != 3.11
in one when statement. Will this work?
Provide support for the following queries.
👋 Thanks for submitting a Pull Request to EvaDB!
🙌 We want to make contributing to EvaDB as easy and transparent as possible. Here are a few tips to get you started:
👉 Please see our ✅ Contributing Guide for more details.