Commit ff07d25

benwtrent and lcawl authored

[ML] show expected model outputs for each nlp task type (#2112)

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: lcawl <lcawley@elastic.co>

1 parent 1f44ff9 commit ff07d25

File tree

1 file changed: +118 −1 lines changed

docs/en/stack/ml/nlp/ml-nlp-model-ref.asciidoc

Lines changed: 118 additions & 1 deletion
@@ -45,7 +45,6 @@ refer to <<ml-nlp-overview>>.
* https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english
* https://huggingface.co/philschmid/distilroberta-base-ner-conll2003

[discrete]
[[ml-nlp-model-ref-text-embedding]]
== Third party text embedding models
@@ -97,3 +96,121 @@ Using `DPREncoderWrapper`:
* https://huggingface.co/valhalla/distilbart-mnli-12-6
* https://huggingface.co/cross-encoder/nli-distilroberta-base
* https://huggingface.co/cross-encoder/nli-roberta-base

[discrete]
== Expected model output

Models used for each NLP task type must output tensors in a specific format to be used in Elasticsearch NLP pipelines.

Here are the expected outputs for each task type.
107+
[discrete]
108+
=== Fill mask expected model output
109+
110+
Fill mask is a specific kind of token classification; it is the base training task of many transformer models.
111+
112+
For the Elastic stack's fill mask NLP task to understand the model output, it must have a specific format. It needs to
113+
be a float tensor with `shape(<number of sequences>, <number of tokens>, <vocab size>)`.
114+
115+
Here is an example with a single sequence `"The capital of [MASK] is Paris"` and with vocabulary
116+
`["The", "capital", "of", "is", "Paris", "France", "[MASK]"]`.
117+
118+
Should output:
119+
120+
[source]
121+
----
122+
[
123+
[
124+
[ 0, 0, 0, 0, 0, 0, 0 ], // The
125+
[ 0, 0, 0, 0, 0, 0, 0 ], // capital
126+
[ 0, 0, 0, 0, 0, 0, 0 ], // of
127+
[ 0.01, 0.01, 0.3, 0.01, 0.2, 1.2, 0.1 ], // [MASK]
128+
[ 0, 0, 0, 0, 0, 0, 0 ], // is
129+
[ 0, 0, 0, 0, 0, 0, 0 ] // Paris
130+
]
131+
]
132+
----
133+
134+
The predicted value here for `[MASK]` is `"France"` with a score of 1.2.
135+
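To make the shape concrete, here is a minimal Python sketch (an illustration only, not how the Elastic Stack consumes the tensor; plain lists stand in for the tensor) that recovers the prediction from the example output above:

```python
vocab = ["The", "capital", "of", "is", "Paris", "France", "[MASK]"]

# The example output tensor, shape (1, 6, 7):
# (1 sequence, 6 tokens, 7 vocabulary entries)
output = [
    [
        [0, 0, 0, 0, 0, 0, 0],                   # The
        [0, 0, 0, 0, 0, 0, 0],                   # capital
        [0, 0, 0, 0, 0, 0, 0],                   # of
        [0.01, 0.01, 0.3, 0.01, 0.2, 1.2, 0.1],  # [MASK]
        [0, 0, 0, 0, 0, 0, 0],                   # is
        [0, 0, 0, 0, 0, 0, 0],                   # Paris
    ]
]

mask_position = 3                  # index of [MASK] in the token sequence
scores = output[0][mask_position]  # one score per vocabulary entry
predicted = vocab[scores.index(max(scores))]
print(predicted)  # France
```

The highest score at the `[MASK]` position (1.2) sits at the vocabulary index of `"France"`, which is why that token is the prediction.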
[discrete]
=== Named entity recognition expected model output

Named entity recognition is a specific token classification task. Each token in the sequence is scored against a specific set of classification labels. The Elastic Stack uses Inside-Outside-Beginning (IOB) tagging, and only the following classification labels are supported: "O", "B_MISC", "I_MISC", "B_PER", "I_PER", "B_ORG", "I_ORG", "B_LOC", "I_LOC".

The `"O"` entity label indicates that the current token is outside any entity.
`"I"` indicates that the token is inside an entity.
`"B"` indicates the beginning of an entity.
`"MISC"` is a miscellaneous entity.
`"LOC"` is a location.
`"PER"` is a person.
`"ORG"` is an organization.

The response format must be a float tensor with `shape(<number of sequences>, <number of tokens>, <number of classification labels>)`.

Here is an example with a single sequence `"Waldo is in Paris"`:

[source]
----
[
  [
    // "O", "B_MISC", "I_MISC", "B_PER", "I_PER", "B_ORG", "I_ORG", "B_LOC", "I_LOC"
    [ 0, 0, 0, 0.4, 0.5, 0, 0.1, 0, 0 ], // Waldo
    [ 1, 0, 0, 0, 0, 0, 0, 0, 0 ],       // is
    [ 1, 0, 0, 0, 0, 0, 0, 0, 0 ],       // in
    [ 0, 0, 0, 0, 0, 0, 0, 0, 1.0 ]      // Paris
  ]
]
----
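As an illustration of how such a tensor maps back to IOB tags (not the Elastic Stack's internal decoding logic; plain lists stand in for the tensor), each token's label is simply the highest-scoring entry in its row:

```python
labels = ["O", "B_MISC", "I_MISC", "B_PER", "I_PER",
          "B_ORG", "I_ORG", "B_LOC", "I_LOC"]
tokens = ["Waldo", "is", "in", "Paris"]

# The example output tensor, shape (1, 4, 9):
output = [
    [
        [0, 0, 0, 0.4, 0.5, 0, 0.1, 0, 0],  # Waldo
        [1, 0, 0, 0, 0, 0, 0, 0, 0],        # is
        [1, 0, 0, 0, 0, 0, 0, 0, 0],        # in
        [0, 0, 0, 0, 0, 0, 0, 0, 1.0],      # Paris
    ]
]

# Pick the highest-scoring label for each token:
tagged = [
    (token, labels[row.index(max(row))])
    for token, row in zip(tokens, output[0])
]
print(tagged)
# [('Waldo', 'I_PER'), ('is', 'O'), ('in', 'O'), ('Paris', 'I_LOC')]
```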
[discrete]
=== Text embedding expected model output

Text embedding allows for semantic embedding of text for dense information retrieval. The output of the model must be the embedding itself, directly, without any additional pooling.

Eland does this wrapping for the aforementioned models. If you supply your own model, it must output the embedding for each inferred sequence.
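To illustrate what such a wrapper has to do (a minimal sketch with made-up numbers, not Eland's actual implementation), one common approach is mean pooling of the last hidden states over the unmasked tokens, so the model's final output is one vector per sequence:

```python
# Hypothetical last-hidden-state output for one sequence of 3 tokens
# with hidden size 2; the third token is padding (attention mask 0).
hidden = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]

# Mean-pool over the real (unmasked) tokens, dimension by dimension:
n_real = sum(mask)
embedding = [
    sum(h[d] * m for h, m in zip(hidden, mask)) / n_real
    for d in range(len(hidden[0]))
]
print(embedding)  # [2.0, 3.0]
```

Because the pooling is baked into the model itself, the tensor Elasticsearch receives is already the embedding, with no post-processing required.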
[discrete]
=== Text classification expected model output

With text classification (for example, in tasks like sentiment analysis), the entire sequence is classified. The output of the model must be a float tensor with `shape(<number of sequences>, <number of classification labels>)`.

Here is an example with two sequences for a binary classification model with the classes "happy" and "sad":

[source]
----
[
  // happy, sad
  [ 0, 1 ], // first sequence
  [ 1, 0 ]  // second sequence
]
----
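For illustration (not the Elastic Stack's internal logic; plain lists stand in for the tensor), each sequence's class is the highest-scoring column of its row:

```python
labels = ["happy", "sad"]

# The example output tensor, shape (2, 2):
# (2 sequences, 2 classification labels)
output = [
    [0.0, 1.0],  # first sequence
    [1.0, 0.0],  # second sequence
]

# Pick the highest-scoring label for each sequence:
predictions = [labels[row.index(max(row))] for row in output]
print(predictions)  # ['sad', 'happy']
```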
[discrete]
=== Zero-shot text classification expected model output

Zero-shot text classification allows text to be classified against arbitrary labels that were not necessarily part of the original training. Each sequence is combined with each label given some hypothesis template, and the model then scores each of these combinations according to `[entailment, neutral, contradiction]`. The output of the model must be a float tensor with `shape(<number of sequences>, <number of labels>, 3)`.

Here is an example with a single sequence classified against 4 labels:

[source]
----
[
  [
    // entailment, neutral, contradiction
    [ 0.5, 0.1, 0.4 ], // first label
    [ 0, 0, 1 ],       // second label
    [ 1, 0, 0 ],       // third label
    [ 0.7, 0.2, 0.1 ]  // fourth label
  ]
]
----
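As an illustration of how such a tensor can be reduced to per-label probabilities (a sketch only; the exact normalization the Elastic Stack applies is not specified here), one common convention is a softmax over the entailment scores across labels:

```python
import math

# The example output tensor, shape (1, 4, 3): each row is
# [entailment, neutral, contradiction] for one candidate label.
output = [
    [
        [0.5, 0.1, 0.4],  # first label
        [0.0, 0.0, 1.0],  # second label
        [1.0, 0.0, 0.0],  # third label
        [0.7, 0.2, 0.1],  # fourth label
    ]
]

# Softmax the entailment column across labels to get probabilities:
entailment = [row[0] for row in output[0]]
total = sum(math.exp(s) for s in entailment)
probs = [math.exp(s) / total for s in entailment]
best_label = probs.index(max(probs))
print(best_label)  # 2 (the third label, which has the strongest entailment)
```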
