Support combining multiple columns into a single text feature #3426
Conversation
LGTM! Just 2 small comments.
```diff
@@ -169,44 +174,64 @@ def format_input_with_prompt(
 template = DEFAULT_ZERO_SHOT_PROMPT_TEMPLATE
```
OOC: Would a user be able to use the `DEFAULT_ZERO_SHOT_PROMPT_TEMPLATE` for the prompt template within an input feature definition like so?

```yaml
input_features:
  - name: context
    type: text
    preprocessing:
      prompt:
        template: None # Use default.
```
Yes, though I believe in this case they need to provide a `task`.
Yeah, that's right, they'd need to provide a `task`. Something like the sketch below would do it.
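A hedged sketch of such a config (the `task` text here is purely illustrative, not from the PR):

```yaml
input_features:
  - name: context
    type: text
    preprocessing:
      prompt:
        template: null  # Fall back to DEFAULT_ZERO_SHOT_PROMPT_TEMPLATE.
        task: "Answer the question using the provided context."  # Illustrative task text.
```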
```diff
 preprocessing = input_feature_config["preprocessing"]
-if "prompt" in preprocessing and preprocessing["prompt"]["task"] is not None:
+if _has_prompt_section(preprocessing):
```
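A rough sketch of what the helper might check (an assumption on my part; the PR's actual body isn't shown in this hunk). The effect would be to accept any populated `prompt` section instead of insisting on a non-null `task`:

```python
def _has_prompt_section(preprocessing: dict) -> bool:
    # Hypothetical sketch: treat a prompt section with at least one
    # populated key (template, task, ...) as present, rather than
    # requiring a non-null task as the old condition did.
    prompt = preprocessing.get("prompt")
    return bool(prompt) and any(value is not None for value in prompt.values())
```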
How does this work? Are we accepting `prompt` keys at both the local and global level?
The LLM schema requires the prompt at the top level and does not allow feature-level prompts. The ECD schema allows prompts at the feature level, but not at the top level.
```python
if is_few_shot:
    df["context"] = retrieval_model.search(df, backend, k=k, return_data=True)

def generate_prompt_for_row(row):
    kwargs = {col: field_to_dtype[col](row[col]) for col in template_fields}
```
why do we need to cast the values?
If you have a number like `0.1234` in the DF and you want to format it like `{number:.2f}`, it will fail if the number is represented as a string in the DF. So we need to cast it to a float. I can add a comment to this effect.
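A minimal repro of that failure mode (illustrative, not code from the PR):

```python
template = "value: {number:.2f}"

# Fails: ".2f" is a float format code, which str does not support.
try:
    template.format(number="0.1234")
except ValueError as err:
    print(err)  # Unknown format code 'f' for object of type 'str'

# Works once the value is cast to a float first, as the field_to_dtype
# casting in generate_prompt_for_row ensures.
print(template.format(number=float("0.1234")))  # value: 0.12
```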
For LLMs and other text models, it's common to treat multiple inputs as a single string of text to feed into the model as input, rather than construct separate "towers" for each feature. This PR provides a standard way to do this via the `prompt` config.

For ECD:
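A feature-level prompt on a text input, along these lines (a sketch; the exact template string is illustrative, inferred from the column names discussed in this PR):

```yaml
input_features:
  - name: context
    type: text
    preprocessing:
      prompt:
        template: "The {color} {animal} jumped over the {size} {object}."
```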
For LLM:
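And the equivalent with a top-level prompt, as the LLM schema requires (again a sketch; `base_model` and the output feature are illustrative placeholders):

```yaml
model_type: llm
base_model: meta-llama/Llama-2-7b-hf  # Illustrative choice.
prompt:
  template: "The {color} {animal} jumped over the {size} {object}."
input_features:
  - name: context
    type: text
output_features:
  - name: answer  # Illustrative output feature.
    type: text
```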
Here, it is not required that `context` be a valid column of the input dataset, as it will be generated from the `prompt`. However, each of `color`, `animal`, `size`, and `object` is assumed to be a column of the input dataset.