Feat: change the data preprocessing as an interface version #144

Merged
merged 3 commits into from Nov 24, 2023

qidanrui
Collaborator

@qidanrui qidanrui commented Nov 23, 2023

The Added Feature

With data preprocessing exposed as an interface, users can call our SQL data preprocessor directly with the following code:

# Install the package first (shell command): pip install dbgpt_hub
from dbgpt_hub.data_process import preprocess_sft_data

data_folder = "dbgpt_hub/data"
data_info = [
        {
            "data_source": "spider",
            "train_file": ["train_spider.json", "train_others.json"],
            "dev_file": ["dev.json"],
            "tables_file": "tables.json",
            "db_id_name": "db_id",
            "is_multiple_turn": False,
            "train_output": "spider_train.json",
            "dev_output": "spider_dev.json",
        }
]

preprocess_sft_data(
    data_folder=data_folder,
    data_info=data_info,
)
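Before calling `preprocess_sft_data`, it can help to sanity-check the `data_info` list. The sketch below is not part of `dbgpt_hub`; `check_data_info` and `EXPECTED_KEYS` are hypothetical names, and the key list and types are inferred only from the Spider example above, so the real function may accept or require a different set.

```python
# Hypothetical helper, not part of dbgpt_hub: the expected keys and types
# below are inferred from the Spider example in this PR description.
EXPECTED_KEYS = {
    "data_source": str,
    "train_file": list,
    "dev_file": list,
    "tables_file": str,
    "db_id_name": str,
    "is_multiple_turn": bool,
    "train_output": str,
    "dev_output": str,
}


def check_data_info(data_info):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for index, entry in enumerate(data_info):
        for key, expected_type in EXPECTED_KEYS.items():
            if key not in entry:
                problems.append(f"entry {index}: missing key '{key}'")
            elif not isinstance(entry[key], expected_type):
                problems.append(
                    f"entry {index}: '{key}' should be {expected_type.__name__}, "
                    f"got {type(entry[key]).__name__}"
                )
    return problems


# Same configuration as in the PR description.
spider_info = [
    {
        "data_source": "spider",
        "train_file": ["train_spider.json", "train_others.json"],
        "dev_file": ["dev.json"],
        "tables_file": "tables.json",
        "db_id_name": "db_id",
        "is_multiple_turn": False,
        "train_output": "spider_train.json",
        "dev_output": "spider_dev.json",
    }
]

for problem in check_data_info(spider_info):
    print(problem)
```

If the loop prints nothing, the entry has every key used in the example with the same type as shown there.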

How Do I Test the Feature

  1. Test Environment: Google Colab with T4 GPU
  2. Test Process:
    • Run `poetry install`
    • Run `poetry build` (screenshot attached)
    • Run `poetry run pip install dist/dbgpt_hub-0.2.1-py3-none-any.whl` (screenshot attached)
    • Run the Python code above (screenshot attached)
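The build-and-install steps from the test process can be scripted as follows. This is only a sketch: `build_steps` and `run_steps` are hypothetical helper names, the commands and wheel filename are copied from this PR description, and the script defaults to a dry run that prints the commands without executing them.

```python
import subprocess

# Wheel path as given in the PR description; it changes with the package version.
WHEEL = "dist/dbgpt_hub-0.2.1-py3-none-any.whl"


def build_steps(wheel=WHEEL):
    """Return the three commands from the test process, in order."""
    return [
        ["poetry", "install"],
        ["poetry", "build"],
        ["poetry", "run", "pip", "install", wheel],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in steps:
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_steps(build_steps())
```

Calling `run_steps(build_steps(), dry_run=False)` from the repository root would actually run the commands, assuming `poetry` is installed and on the `PATH`.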

Comments

I have kept Hongyang's previous version so that the current way of running DB-GPT-Hub keeps working. In the future, DB-GPT-Hub can be released as a package on PyPI and Conda, with APIs for data preprocessing and fine-tuning. This is the first step, and I'll keep working toward that goal.

@qidanrui qidanrui marked this pull request as draft November 23, 2023 16:24
wangzaistone
wangzaistone previously approved these changes Nov 24, 2023
Member

@wangzaistone wangzaistone left a comment

so great!

@wangzaistone wangzaistone marked this pull request as ready for review November 24, 2023 02:52
@qidanrui
Collaborator Author

so great!

I'll complete the test on GPU by this weekend. Please do not merge it before the test is completed. Thanks.

@qidanrui qidanrui changed the title from "DRAFT WITHOUT TEST! DONOT MERGE! [change the data preprocessing as an interface version]" to "Feat: change the data preprocessing as an interface version" Nov 24, 2023
@qidanrui qidanrui requested a review from csunny November 24, 2023 14:04
@qidanrui qidanrui added the enhancement New feature or request label Nov 24, 2023
Member

@wangzaistone wangzaistone left a comment

LGTM! GREAT!

@wangzaistone wangzaistone merged commit 28dd6a9 into eosphoros-ai:main Nov 24, 2023
Labels
enhancement New feature or request
3 participants