@@ -45,7 +45,6 @@ refer to <<ml-nlp-overview>>.
* https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english
* https://huggingface.co/philschmid/distilroberta-base-ner-conll2003

[discrete]
[[ml-nlp-model-ref-text-embedding]]
== Third party text embedding models
@@ -97,3 +96,121 @@ Using `DPREncoderWrapper`:
* https://huggingface.co/valhalla/distilbart-mnli-12-6
* https://huggingface.co/cross-encoder/nli-distilroberta-base
* https://huggingface.co/cross-encoder/nli-roberta-base

[discrete]
== Expected model output

Models used for each NLP task type must output tensors of a specific format to be used in the Elasticsearch NLP pipelines.

Here are the expected outputs for each task type.

[discrete]
=== Fill mask expected model output

Fill mask is a specific kind of token classification; it is the base training task of many transformer models.

For the Elastic Stack's fill mask NLP task to understand the model output, it must have a specific format: a float
tensor with `shape(<number of sequences>, <number of tokens>, <vocab size>)`.

Here is an example with a single sequence `"The capital of [MASK] is Paris"` and the vocabulary
`["The", "capital", "of", "is", "Paris", "France", "[MASK]"]`.

The model should output:

[source]
----
[
  [
    [ 0, 0, 0, 0, 0, 0, 0 ],                   // The
    [ 0, 0, 0, 0, 0, 0, 0 ],                   // capital
    [ 0, 0, 0, 0, 0, 0, 0 ],                   // of
    [ 0.01, 0.01, 0.3, 0.01, 0.2, 1.2, 0.1 ],  // [MASK]
    [ 0, 0, 0, 0, 0, 0, 0 ],                   // is
    [ 0, 0, 0, 0, 0, 0, 0 ]                    // Paris
  ]
]
----

The predicted value here for `[MASK]` is `"France"` with a score of 1.2.

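If you want to confirm that a candidate model produces this shape before importing it, you can run it locally. Here is
a minimal sketch, assuming the `transformers` and `torch` packages are installed; the model name is illustrative only:

[source,python]
----
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Any fill mask model; "bert-base-uncased" is an illustrative choice.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

inputs = tokenizer("The capital of [MASK] is Paris", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Expected: (<number of sequences>, <number of tokens>, <vocab size>),
# for example torch.Size([1, 8, 30522]) for this model and sequence.
print(outputs.logits.shape)
----
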
[discrete]
=== Named entity recognition expected model output

Named entity recognition is a specific token classification task. Each token in the sequence is scored against
a specific set of classification labels. For the Elastic Stack, we use Inside-Outside-Beginning (IOB) tagging. Additionally,
only the following classification labels are supported: "O", "B_MISC", "I_MISC", "B_PER", "I_PER", "B_ORG", "I_ORG", "B_LOC", "I_LOC".

The `"O"` entity label indicates that the current token is outside any entity.
`"I"` indicates that the token is inside an entity.
`"B"` indicates the beginning of an entity.
`"MISC"` is a miscellaneous entity.
`"LOC"` is a location.
`"PER"` is a person.
`"ORG"` is an organization.

The response format must be a float tensor with `shape(<number of sequences>, <number of tokens>, <number of classification labels>)`.

Here is an example with a single sequence `"Waldo is in Paris"`:

[source]
----
[
  [
    // "O", "B_MISC", "I_MISC", "B_PER", "I_PER", "B_ORG", "I_ORG", "B_LOC", "I_LOC"
    [ 0, 0, 0, 0.4, 0.5, 0, 0.1, 0, 0 ],  // Waldo
    [ 1, 0, 0, 0, 0, 0, 0, 0, 0 ],        // is
    [ 1, 0, 0, 0, 0, 0, 0, 0, 0 ],        // in
    [ 0, 0, 0, 0, 0, 0, 0, 0, 1.0 ]       // Paris
  ]
]
----

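To make the mapping from scores to entities concrete, here is a small sketch that picks the highest-scoring label for
each token in the example above. The label order is an assumption for illustration and must match your model's
configuration:

[source,python]
----
import torch

labels = ["O", "B_MISC", "I_MISC", "B_PER", "I_PER",
          "B_ORG", "I_ORG", "B_LOC", "I_LOC"]

# The "Waldo is in Paris" tensor from the example above.
output = torch.tensor([[
    [0, 0, 0, 0.4, 0.5, 0, 0.1, 0, 0],  # Waldo
    [1, 0, 0, 0, 0, 0, 0, 0, 0],        # is
    [1, 0, 0, 0, 0, 0, 0, 0, 0],        # in
    [0, 0, 0, 0, 0, 0, 0, 0, 1.0],      # Paris
]])

# Pick the highest-scoring label for each token.
for token, idx in zip(["Waldo", "is", "in", "Paris"],
                      output[0].argmax(dim=-1)):
    print(token, labels[idx.item()])
# Waldo I_PER
# is O
# in O
# Paris I_LOC
----
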
[discrete]
=== Text embedding expected model output

Text embedding allows for semantic embedding of text for dense information retrieval.
The output of the model must be the embedding itself, without any additional pooling.

Eland applies this wrapping for the models listed above. If you supply your own model, it must output the embedding for
each inferred sequence.

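As an illustration of the kind of wrapping involved, here is a sketch of a module that mean-pools token embeddings into
one embedding per sequence. Mean pooling is an assumption for illustration (use whatever pooling your model was trained
with), and this is not Eland's exact implementation:

[source,python]
----
import torch
from transformers import AutoModel

class MeanPoolingWrapper(torch.nn.Module):
    """Pools per-token embeddings into a single embedding per sequence."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        token_embeddings = self.model(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Average over tokens, ignoring padding positions.
        mask = attention_mask.unsqueeze(-1).float()
        return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# Illustrative model name; any BERT-like encoder works the same way.
wrapped = MeanPoolingWrapper(AutoModel.from_pretrained("bert-base-uncased"))
----
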
[discrete]
=== Text classification expected model output

With text classification (for example, in tasks like sentiment analysis), the entire sequence is classified. The output of
the model must be a float tensor with `shape(<number of sequences>, <number of classification labels>)`.

Here is an example with two sequences for a binary classification model with the classes "happy" and "sad":

[source]
----
[
  [
    // happy, sad
    [ 0, 1 ],  // first sequence
    [ 1, 0 ]   // second sequence
  ]
]
----

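As with the other task types, you can check a candidate model's output shape locally before importing it. A minimal
sketch, again with a purely illustrative model name:

[source,python]
----
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative binary sentiment model.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer(["That makes me happy!", "That makes me sad."],
                   padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Expected: (<number of sequences>, <number of classification labels>),
# here torch.Size([2, 2]).
print(logits.shape)
----
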
[discrete]
=== Zero-shot text classification expected model output

Zero-shot text classification allows text to be classified against arbitrary labels that were not necessarily part of
the original training. Each sequence is combined with each label given some hypothesis template. The model then scores
each of these combinations according to `[entailment, neutral, contradiction]`. The output of the model must be a float
tensor with `shape(<number of sequences>, <number of labels>, 3)`.

Here is an example with a single sequence classified against 4 labels:

[source]
----
[
  [
    // entailment, neutral, contradiction
    [ 0.5, 0.1, 0.4 ],  // first label
    [ 0, 0, 1 ],        // second label
    [ 1, 0, 0 ],        // third label
    [ 0.7, 0.2, 0.1 ]   // fourth label
  ]
]
----
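
To see how the sequence and label combinations are formed, here is a sketch using one of the NLI models listed above.
The hypothesis template, sequence, and labels are illustrative assumptions, and the order of the model's three output
scores depends on its configuration:

[source,python]
----
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cross-encoder/nli-distilroberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

sequence = "Elasticsearch is a search engine."
labels = ["technology", "sports", "politics", "cooking"]
template = "This example is {}."

# One premise/hypothesis pair per label.
inputs = tokenizer([sequence] * len(labels),
                   [template.format(label) for label in labels],
                   padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# For this single sequence: (<number of labels>, 3), i.e. torch.Size([4, 3]).
print(logits.shape)
----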