Label definitions (NER + Textcat) #64

jankrepl · 2023-04-21T15:33:30Z

I really love this tool! Great job:)

If I am not mistaken, this project assumes that providing the names of the labels is enough for the model to understand what each label represents. To use an example from your README:

DISH, INGREDIENT, EQUIPMENT
recipe, feedback, question

However, what if my classification labels are not self-explanatory (even to humans) and require an extra definition of what one means by them? See below a (rather artificial) example for textcat labels:

A, B, C

A - sentence was written by Anna, she is very kind and never gets angry.
B - sentence was written by Bob, he is very creative and likes to make things up.
C - sentence was written by Celine, she likes to keep it short.

Let's assume we have no examples, or too few, for the model to figure out that relationship.

I can think of 2 possible solutions:

1. Replacing the label with the actual definition. Downsides:
   - The number of prompt tokens and generated tokens will increase (-> more money spent)
   - The model is more likely to make a mistake by not generating exactly the same label, which makes the output parsing tricky
2. Having a prompt prefix where one simply copy pastes the definitions and then just continues with the standard prompt
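
To make the parsing concern concrete, here is a minimal sketch of strict label matching. The function name `parse_label` and the normalisation rules are assumptions made for illustration, not something taken from this project. With short labels such as A, B, C an exact match after light cleanup is easy; if the model had to reproduce a full definition, most small deviations would fall through to `None`.

```python
from typing import List, Optional


def parse_label(response: str, labels: List[str]) -> Optional[str]:
    """Return the known label named in the response, or None if it is not an exact match.

    Hypothetical helper: only whitespace, surrounding quotes/periods and letter
    case are normalised before comparing against the allowed labels.
    """
    cleaned = response.strip().strip(".\"'").upper()
    by_upper = {label.upper(): label for label in labels}
    return by_upper.get(cleaned)


labels = ["A", "B", "C"]
print(parse_label(" b.\n", labels))            # short label, slightly messy output
print(parse_label("written by Bob", labels))   # free text does not match any label
```
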

I would be more than happy to hear from you and get your ideas on how to handle this!

Thank you in advance

(@koaning you might be the right person to answer this)

koaning · 2023-04-25T14:04:19Z

Have you seen the setting for the --example-path? Here's an example of what you might send along. It's not exactly what you're asking for, but have you tried it? It might be enough.

There's certainly some prompt engineering that you could do here though, which is stuff we're putting on our roadmap. If you've used a custom template with better results I'd be all ears.

jankrepl · 2023-04-26T17:34:03Z

Thanks for the reply:) Yeah, providing a couple of examples and hoping the model gets it is probably the most straightforward solution.

As I mentioned, my intuition would be to go for

Having a prompt prefix where one simply copy pastes the definitions and then just continues with the standard prompt

In other words, one would let the user define a CSV table

name	definition
A	Definition of label A
B	Definition of label B
C	Definition of label C

and then introduce an option in the CLI, e.g. --label-definition-path, that would point at the table.

Finally the jinja2 template would contain something like

{% if labels %}
Below are definitions of all labels:
{% for label in labels %}
Text:
"""
label: {{ label.name }}
definition: {{ label.definition }}
"""
{% endfor %}
{% endif %}

{# Now one could have the examples section #}

This would not be compatible with the current labels CLI argument, which is a comma-separated list of label names (without definitions).
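
A rough sketch of how the proposal could fit together, using only the standard library. Everything here is an assumption for illustration: `load_definitions` and `render_prefix` are hypothetical helpers, the table is read as tab-separated because that is how it is shown above, and the plain string building is only a stand-in for the Jinja2 block, not the project's actual template code.

```python
import csv
import io

# Hypothetical contents of the file that a --label-definition-path option
# might point at (tab-separated, with a header row).
TABLE = (
    "name\tdefinition\n"
    "A\tDefinition of label A\n"
    "B\tDefinition of label B\n"
    "C\tDefinition of label C\n"
)


def load_definitions(text):
    """Parse the table into a list of {"name": ..., "definition": ...} dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{"name": row["name"], "definition": row["definition"]} for row in reader]


def render_prefix(labels):
    """Build the definitions prefix; return an empty string when there are no labels."""
    if not labels:
        return ""
    lines = ["Below are definitions of all labels:"]
    for label in labels:
        lines.append(f"label: {label['name']}")
        lines.append(f"definition: {label['definition']}")
    return "\n".join(lines) + "\n"


labels = load_definitions(TABLE)
prefix = render_prefix(labels)
print(prefix)
```

The label names stay short in the model's output, so only the prompt grows; the generated tokens and the parsing logic are unaffected.
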

Anyway, I understand that this feature might not be that relevant. I just happen to be dealing with a dataset where the labels need to be defined. Closing the issue.

Thanks for the help:)

koaning · 2023-04-26T18:25:15Z

I'm re-opening this issue because it strikes me as fair feedback and I'd like to have this ticket around as a reminder.

koaning · 2023-08-09T08:28:28Z

The recipes in this repository have since moved to Prodigy and are being maintained there. They will soon even get an upgrade with the advent of spacy-llm support, which features better prompts and multiple LLM providers. That is why we've opted to archive this repo, which is also why I'm closing all the issues.

You can learn more by checking out the large language models section in the docs.

jankrepl changed the title ~~Label definitions~~ Label definitions (NER + Textcat) Apr 21, 2023

jankrepl closed this as completed Apr 26, 2023

koaning reopened this Apr 26, 2023

koaning closed this as completed Aug 9, 2023
