Description
I found the following issues in the example app at https://github.com/elastic/elasticsearch-labs/blob/main/supporting-blog-content/plagiarism-detection-with-elasticsearch/plagiarism_detection_es.ipynb:
- According to code at https://github.com/elastic/eland/blob/main/eland/ml/pytorch/transformers.py,
    class TransformerModel:
        def __init__(
            self,
            *,
            model_id: str,
            task_type: str,
            es_version: Optional[Tuple[int, int, int]] = None,
            quantize: bool = False,
            access_token: Optional[str] = None,
        ):
The constructor accepts keyword arguments only. I am not sure which eland version the demo uses. I use the latest version:
    $ pip3 list | grep eland
    eland 8.11.1
The call in the code:
tm = TransformerModel(hf_model_id, "text_embedding")
does not work. The working one is:
tm = TransformerModel(model_id=hf_model_id, task_type="text_embedding")
The same applies to this call in the notebook:
tm = TransformerModel(hf_model_id, "text_classification")
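
To illustrate why the positional call fails, here is a minimal sketch that does not import eland. `TransformerModelStandIn` and `try_positional` are hypothetical names; the stand-in only copies the keyword-only signature quoted above (the bare `*` is what forces keyword arguments) and stores two fields, so it says nothing about what the real class does internally.

```python
from typing import Optional, Tuple


class TransformerModelStandIn:
    """Hypothetical stand-in mirroring the keyword-only signature quoted above."""

    def __init__(
        self,
        *,  # everything after the bare * must be passed by keyword
        model_id: str,
        task_type: str,
        es_version: Optional[Tuple[int, int, int]] = None,
        quantize: bool = False,
        access_token: Optional[str] = None,
    ):
        self.model_id = model_id
        self.task_type = task_type


def try_positional(model_id: str, task_type: str) -> Optional[str]:
    """Return the TypeError message from a positional call, or None if it succeeds."""
    try:
        TransformerModelStandIn(model_id, task_type)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `try_positional("some-model", "text_embedding")` returns an error message rather than `None`, while `TransformerModelStandIn(model_id="some-model", task_type="text_embedding")` constructs normally.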
- The pipeline name is wrong in the code:
    client.reindex(
        wait_for_completion=True,
        source={"index": "plagiarism-docs"},
        dest={"index": "plagiarism-checker", "pipeline": "plagiarism-pipeline"},
    )
The notebook never creates a pipeline named "plagiarism-pipeline". The name should be "plagiarism-checker-pipeline".
- The synchronous reindex call does not seem to work at all:
    client.reindex(
        wait_for_completion=True,
        source={"index": "plagiarism-docs"},
        dest={"index": "plagiarism-checker", "pipeline": "plagiarism-pipeline"},
    )
It always returns a "Connection time out" error. I have to change wait_for_completion to False to make it work.
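
For reference, a rough sketch of the workaround: start the reindex with `wait_for_completion=False` and poll the task until it finishes. `reindex_and_wait`, `poll_seconds`, and `max_wait_seconds` are hypothetical names, and the sketch assumes `client` is an elasticsearch-py 8.x client whose `reindex` response contains a `"task"` id and whose `tasks.get` response contains a `"completed"` flag. I have not run this against the notebook's cluster.

```python
import time


def reindex_and_wait(client, source_index, dest_index, pipeline,
                     poll_seconds=5.0, max_wait_seconds=3600.0):
    """Start a reindex as a background task and poll until it completes."""
    # With wait_for_completion=False the call returns right away with a task id,
    # so the HTTP request does not stay open for the whole reindex.
    response = client.reindex(
        wait_for_completion=False,
        source={"index": source_index},
        dest={"index": dest_index, "pipeline": pipeline},
    )
    task_id = response["task"]

    deadline = time.monotonic() + max_wait_seconds
    while True:
        status = client.tasks.get(task_id=task_id)
        if status["completed"]:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"reindex task {task_id} did not finish in time")
        time.sleep(poll_seconds)
```

With the corrected pipeline name, the call would look like `reindex_and_wait(client, "plagiarism-docs", "plagiarism-checker", "plagiarism-checker-pipeline")`.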