Steps to reproduce: set the sampling settings in config.json, then type make all in the terminal. After a while you should get the following error:
Traceback (most recent call last):
  File "/workspaces/rag-experiment-accelerator/01_index.py", line 22, in <module>
    index_dict = run(environment, config, index_config, file_paths)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/rag-experiment-accelerator/rag_experiment_accelerator/run/index.py", line 68, in run
    docs = cluster(docs, config)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/rag-experiment-accelerator/rag_experiment_accelerator/sampling/clustering.py", line 244, in cluster
    df["processed_text"] = df["text"].progress_apply(spacy_tokenizer)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/tqdm/std.py", line 917, in inner
    return getattr(df, df_function)(wrapper, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/pandas/core/series.py", line 4915, in apply
    ).apply()
      ^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/pandas/core/apply.py", line 1427, in apply
    return self.apply_standard()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/pandas/core/apply.py", line 1507, in apply_standard
    mapped = obj._map_values(
             ^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/pandas/core/base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "/home/vscode/.local/lib/python3.11/site-packages/tqdm/std.py", line 912, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/rag-experiment-accelerator/rag_experiment_accelerator/sampling/clustering.py", line 41, in spacy_tokenizer
    mytokens = parser(sentence)
               ^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/spacy/language.py", line 1037, in __call__
    doc = self._ensure_doc(text)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/spacy/language.py", line 1131, in _ensure_doc
    raise ValueError(Errors.E1041.format(type=type(doc_like)))
ValueError: [E1041] Expected a string, Doc, or bytes as input, but got: <class 'dict'>
make: *** [Makefile:30: index] Error 1
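The last frames show spaCy's parser receiving a dict from the "text" column rather than a string. Below is a minimal sketch for checking what the column actually holds and pulling a string out before tokenizing. It assumes the text sits under a key of the dict; the helper names, the "content" key, and the sample rows are all hypothetical and not taken from the repository.

```python
from collections import Counter


def summarize_types(values):
    """Count the Python type names found in an iterable of cell values."""
    return Counter(type(v).__name__ for v in values)


def coerce_to_text(value, text_key="content"):
    """Return a plain string suitable for a tokenizer.

    text_key is a hypothetical field name; adjust it to whatever key
    the sampled documents actually use.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and isinstance(value.get(text_key), str):
        return value[text_key]
    raise TypeError(f"cannot extract text from {type(value).__name__}")


# Hypothetical sample rows standing in for the "text" column.
rows = ["plain text", {"content": "text inside a dict"}]

type_counts = summarize_types(rows)          # which types are present
texts = [coerce_to_text(r) for r in rows]    # strings only, safe to tokenize
```

Running the type summary on the real column first would show whether every row is a dict or only some of them, which may help narrow down where the sampled content changes shape.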
guybartal changed the title from "Sampling fails with ValueError: [E1041] Expected a string, Doc," to "[Bug] Sampling fails with ValueError: [E1041] Expected a string, Doc," on Apr 17, 2024
Closes #487. Fixed a bug with spaCy reading the sampled content. Added
documentation steps and a check to see whether the run is local as opposed
to on AML.
---------
Co-authored-by: Julia Meshcheryakova <juliame@microsoft.com>
Co-authored-by: Ritesh Modi <rimod@microsoft.com>