Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ludwig-based model train and tune support. #935

Merged
merged 36 commits into from
Aug 21, 2023
Merged

Ludwig-based model train and tune support. #935

merged 36 commits into from
Aug 21, 2023

Conversation

xzdandy
Copy link
Collaborator

@xzdandy xzdandy commented Aug 17, 2023

Provide support for the following queries.

CREATE UDF IF NOT EXISTS PredictHouseRent FROM
(SELECT * FROM home_rentals)
TYPE Ludwig
'predict' 'rental_price';
SELECT PredictHouseRent(...) FROM home_rentals;
  • Basic implementation
  • Documentation on the model training feature.
  • Set up circleci for ludwig tests

πŸ‘‹ Thanks for submitting a Pull Request to EvaDB!

πŸ™Œ We want to make contributing to EvaDB as easy and transparent as possible. Here are a few tips to get you started:

  • πŸ” Search existing EvaDB PRs to see if a similar PR already exists.
  • πŸ”— Link this PR to a EvaDB issue to help us understand what bug fix or feature is being implemented.
  • πŸ“ˆ Provide before and after profiling results to help us quantify the improvement your PR provides (if applicable).

πŸ‘‰ Please see our βœ… Contributing Guide for more details.

@xzdandy xzdandy marked this pull request as ready for review August 18, 2023 08:18
@xzdandy
Copy link
Collaborator Author

xzdandy commented Aug 19, 2023

The training support is ready. Testcases failed are not related to training. I will fix the fuzzy join one. And Jiashen will fix the native executor. We will set up a new circleci test for training.

python -c "import yaml;f = open('evadb/evadb.yml', 'r+');config_obj = yaml.load(f, Loader=yaml.FullLoader);config_obj['experimental']['ray'] = True;f.seek(0);f.write(yaml.dump(config_obj));f.truncate();"
else
if [ $PY_VERSION != "3.11" ]; then
pip install ".[dev,ludwig,qdrant]"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaurav274 @jiashenC @jarulraj To avoid dependency conflict, I have changed the way that circle ci installs the EvaDB. Let me know if you have a different idea. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this as well. Another way to rewrite this using the current infrastructure is when the ray is disabled and the python version is not 3.11. We already have both of those flags. If I understand correctly, we just need to combine https://github.com/georgia-tech-db/evadb/blob/master/.circleci/config.yml#L168 and https://github.com/georgia-tech-db/evadb/blob/master/.circleci/config.yml#L181?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not able to find a way to do when else in circle ci. I am following the suggestions from https://discuss.circleci.com/t/conditional-steps-if-else-without-code-duplication/45030

Copy link
Member

@jiashenC jiashenC Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of the following logic.

// we already have those two
if ray enabled: install ray
if python != 3.11: install qdrant

// we add for this PR
if ray disabled and python != 3.11: install ludwig 

We can use nested logic https://circleci.com/docs/configuration-reference/#logic-statements to specify ray disabled and python != 3.11 in one when statement. Will this work?

python -c "import yaml;f = open('evadb/evadb.yml', 'r+');config_obj = yaml.load(f, Loader=yaml.FullLoader);config_obj['experimental']['ray'] = True;f.seek(0);f.write(yaml.dump(config_obj));f.truncate();"
else
if [ $PY_VERSION != "3.11" ]; then
pip install ".[dev,ludwig,qdrant]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this as well. Another way to rewrite this using the current infrastructure is when the ray is disabled and the python version is not 3.11. We already have both of those flags. If I understand correctly, we just need to combine https://github.com/georgia-tech-db/evadb/blob/master/.circleci/config.yml#L168 and https://github.com/georgia-tech-db/evadb/blob/master/.circleci/config.yml#L181?

inputs, outputs = [], []
for column in all_column_list:
if column.name in predict_columns:
column.name = column.name + "_predictions"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we ensure the UDF always returns something named xxx_predictions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ludwig appends the _predictions to the target column. I haven't found a way to specify the output column name for ludwig yet.

.. code-block:: sql

CREATE UDF IF NOT EXISTS PredictHouseRent FROM
(SELECT * FROM HomeRentals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you were discussing this. Is * working now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works. It will consider _row_id also as an input column. I will fix it in later PRs.

python -c "import yaml;f = open('evadb/evadb.yml', 'r+');config_obj = yaml.load(f, Loader=yaml.FullLoader);config_obj['experimental']['ray'] = True;f.seek(0);f.write(yaml.dump(config_obj));f.truncate();"
else
if [ $PY_VERSION != "3.11" ]; then
pip install ".[dev,ludwig,qdrant]"
Copy link
Member

@jiashenC jiashenC Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of the following logic.

// we already have those two
if ray enabled: install ray
if python != 3.11: install qdrant

// we add for this PR
if ray disabled and python != 3.11: install ludwig 

We can use nested logic https://circleci.com/docs/configuration-reference/#logic-statements to specify ray disabled and python != 3.11 in one when statement. Will this work?

@xzdandy xzdandy merged commit e8f3221 into master Aug 21, 2023
21 checks passed
@xzdandy xzdandy deleted the model_train branch August 21, 2023 15:07
@gaurav274 gaurav274 added Integrations 🧩 Pull requests that update an integration Feature Request ✨ New feature or request labels Aug 25, 2023
@gaurav274 gaurav274 added this to the v0.3.2 milestone Aug 25, 2023
@americast americast mentioned this pull request Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request ✨ New feature or request Integrations 🧩 Pull requests that update an integration
Development

Successfully merging this pull request may close these issues.

None yet

3 participants