

Added prompt_template preprocessing param for text features #3298

Merged 6 commits into master on Mar 30, 2023

Conversation

tgaddair (Collaborator):
This parameter is useful when fine-tuning large language models (LLMs), which benefit from additional context that improves the quality of the embeddings they generate. It is particularly valuable when the encoder weights are kept fixed (trainable=false).
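As a rough illustration of the idea (not the exact Ludwig API from this PR), a `prompt_template` in a text feature's preprocessing section would wrap each raw value in extra context before tokenization. The config layout, the `{input}` placeholder name, and the helper function below are all assumptions for demonstration purposes:

```python
# Hypothetical sketch of a config carrying a prompt_template preprocessing
# param on a text input feature. Exact schema and placeholder syntax are
# assumptions, not taken from the PR diff.
config = {
    "input_features": [
        {
            "name": "review",
            "type": "text",
            # Fixed encoder weights: the scenario the PR description calls out.
            "encoder": {"type": "auto_transformer", "trainable": False},
            "preprocessing": {
                # Template applied to each raw value before tokenization.
                "prompt_template": "Classify the sentiment of this review: {input}",
            },
        }
    ],
    "output_features": [{"name": "sentiment", "type": "category"}],
}


def apply_prompt_template(template: str, raw_text: str) -> str:
    """Substitute the raw feature value into the template (illustrative only)."""
    return template.format(input=raw_text)


template = config["input_features"][0]["preprocessing"]["prompt_template"]
print(apply_prompt_template(template, "Great movie, would watch again."))
# Prints: Classify the sentiment of this review: Great movie, would watch again.
```

With a frozen encoder, the template is the only lever for steering the embeddings, which is why the PR highlights the trainable=false case.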

github-actions bot commented Mar 29, 2023

Unit Test Results

6 files (±0), 6 suites (±0); runtime 31m 8s (−1h 21m 22s)
64 tests (−87): 61 passed (−77), 3 skipped (−10), 0 failed (±0)
104 runs (−87): 93 passed (−77), 11 skipped (−10), 0 failed (±0)

Results for commit 491ab56. ± Comparison against base commit 87a56fa.

This pull request removes 139 tests and adds 52. Note that renamed tests count towards both.
tests.integration_tests.test_automl ‑ test_auto_train
tests.integration_tests.test_automl ‑ test_autoconfig_preprocessing_balanced
tests.integration_tests.test_automl ‑ test_autoconfig_preprocessing_imbalanced
tests.integration_tests.test_automl ‑ test_autoconfig_preprocessing_text_image
tests.integration_tests.test_automl ‑ test_create_auto_config[image]
tests.integration_tests.test_automl ‑ test_create_auto_config[multimodal]
tests.integration_tests.test_automl ‑ test_create_auto_config[tabular_large]
tests.integration_tests.test_automl ‑ test_create_auto_config[tabular_small]
tests.integration_tests.test_automl ‑ test_create_auto_config[text]
tests.integration_tests.test_automl ‑ test_create_auto_config_with_dataset_profile
…
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_ray_model_training_with_augmentation_pipeline[preprocessing0-False]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_ray_model_training_with_augmentation_pipeline[preprocessing0-True]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_ray_model_training_with_augmentation_pipeline[preprocessing0-augmentation_pipeline_ops2]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_ray_model_training_with_augmentation_pipeline[preprocessing1-False]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_ray_model_training_with_augmentation_pipeline[preprocessing1-True]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_ray_model_training_with_augmentation_pipeline[preprocessing1-augmentation_pipeline_ops2]
tests.ludwig.automl.test_base_config ‑ test_dataset_info[dask]
tests.ludwig.automl.test_base_config ‑ test_dataset_info[pandas]
tests.ludwig.automl.test_base_config ‑ test_infer_parquet_types
tests.ludwig.automl.test_base_config ‑ test_is_field_boolean[dask]
…
This pull request removes 11 skipped tests and adds 1 skipped test. Note that renamed tests count towards both.
tests.integration_tests.test_class_imbalance_feature ‑ test_imbalance_ray[oversample_minority]
tests.integration_tests.test_class_imbalance_feature ‑ test_imbalance_ray[undersample_majority]
tests.integration_tests.test_horovod ‑ test_horovod_gpu_memory_limit
tests.integration_tests.test_hyperopt_ray_horovod ‑ test_hyperopt_executor_bohb
tests.integration_tests.test_hyperopt_ray_horovod ‑ test_hyperopt_executor_with_metric
tests.integration_tests.test_hyperopt_ray_horovod ‑ test_hyperopt_run_hyperopt
tests.integration_tests.test_ray ‑ test_ray_image_modin
tests.integration_tests.test_ray ‑ test_ray_set_and_vector_outputs[csv]
tests.integration_tests.test_ray ‑ test_ray_set_and_vector_outputs[parquet]
tests.integration_tests.test_ray ‑ test_ray_split
…
tests.ludwig.models.test_training_determinism ‑ test_training_determinism_ray_backend


inverse_vocabulary=metadata[f"{prefix}str2idx"],
tokenizer_type=preprocessing_parameters[f"{prefix}tokenizer"],
Collaborator:
Would you also be able to remove the rest of the backwards compatibility workaround (introduced in #1859)?

tgaddair (Collaborator, Author):
I'm not too familiar with this code tbh, maybe you can take it in a follow-up?

@tgaddair tgaddair merged commit 5e4ceab into master Mar 30, 2023
@tgaddair tgaddair deleted the prompt-templ branch March 30, 2023 16:08