Validation of label column needed for dataset being labeled #178
Comments
@Sardhendu is working on this next.
@rishabh-bhargava Do you have a script handy to reproduce this?
@nihit @rishabh-bhargava Another thing I observed is that we are reading the entire dataset into memory. It would be a good idea to have a
We will likely have to think about label validation for each task separately. The initial example in this issue was for the NER task. For NER, the label column has to follow a certain schema:
{
  "Location": [],
  "Organization": [
    "Kurdistan Workers Party",
    "PKK"
  ],
  "Person": [],
  "Miscellaneous": [
    "Kurdish"
  ]
}
However, when I change the config here and replace label_column value with
Are you able to replicate this?
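A per-task check for the NER schema above might look roughly like the sketch below. The helper name `validate_ner_label` and the `entity_types` parameter are hypothetical and not part of the library. The sketch also assumes that every configured entity type must appear as a key (possibly with an empty list), which the example above suggests but does not confirm.

```python
import json


# Hypothetical helper for illustration only; not an existing library function.
def validate_ner_label(raw, entity_types):
    """Return a list of problems found in one label cell (empty if valid)."""
    try:
        label = json.loads(raw)
    except (TypeError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        return [f"not valid JSON: {exc}"]
    if not isinstance(label, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    # Assumption: each configured entity type must be present as a key.
    for key in entity_types:
        if key not in label:
            problems.append(f"missing entity type: {key}")
    for key, spans in label.items():
        if key not in entity_types:
            problems.append(f"unexpected entity type: {key}")
        elif not isinstance(spans, list) or not all(isinstance(s, str) for s in spans):
            problems.append(f"{key} must be a list of strings")
    return problems


types = ["Location", "Organization", "Person", "Miscellaneous"]
good = json.dumps({
    "Location": [],
    "Organization": ["Kurdistan Workers Party", "PKK"],
    "Person": [],
    "Miscellaneous": ["Kurdish"],
})
print(validate_ner_label(good, types))
print(validate_ner_label('{"Location": []}', types))
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with a cell in a single message.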
When a user attempts to run agent.plan or agent.run on a dataset, we should first validate that any data columns needed for labeling/evaluation are in the correct format. For example, when labeling an NER dataset, we observed an error when the column in the CSV wasn't a proper JSON object, but instead looked like a Python object:
{'Disease': [], 'Chemical': ['Naloxone', 'clonidine']}
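As a rough illustration of the kind of up-front check proposed here, the sketch below scans a CSV and reports rows whose label cell does not parse as a JSON object. The function name `find_invalid_json_rows`, the column names, and the sample data are assumptions made for the example; how such a check would be wired into agent.plan or agent.run is not shown.

```python
import csv
import io
import json


# Hypothetical helper for illustration only; not an existing library function.
def find_invalid_json_rows(csv_text, label_column):
    """Return (row_number, value) pairs whose label cell is not a JSON object."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if label_column not in (reader.fieldnames or []):
        raise ValueError(f"label column {label_column!r} not found in CSV header")

    bad_rows = []
    for row_number, row in enumerate(reader, start=1):
        value = row[label_column]
        try:
            parsed = json.loads(value)
        except (TypeError, ValueError):
            # Single-quoted Python-style dicts end up here.
            bad_rows.append((row_number, value))
            continue
        if not isinstance(parsed, dict):
            bad_rows.append((row_number, value))
    return bad_rows


# Build a small sample: row 1 holds valid JSON, row 2 a Python-style dict string.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "label"])
writer.writerow(["first example", json.dumps({"Disease": [], "Chemical": ["Naloxone", "clonidine"]})])
writer.writerow(["second example", "{'Disease': [], 'Chemical': ['Naloxone', 'clonidine']}"])
sample_csv = buffer.getvalue()

for row_number, value in find_invalid_json_rows(sample_csv, "label"):
    print(f"row {row_number}: label is not a JSON object: {value}")
```

Running a check like this before any labeling starts would let the tool fail early with the offending row numbers instead of surfacing a parsing error partway through a run.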