IndexError: Invalid key: 0 is out of bounds for size 0 when scoring #518

@HuskyDanny

Describe the bug
The error "IndexError: Invalid key: 0 is out of bounds for size 0" is raised partway through evaluation.
I believe this is a corner case: evaluation succeeded on a smaller dataset, but on a longer one it failed partway through.
Ragas version: git+https://github.com/explodinggradients/ragas.git@5f105c08b7579188aea1113334dae6b6a8a15660.
Python version: 3.9

Error trace
Traceback (most recent call last):
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\evaluation.py", line 176, in evaluate
raise e
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\evaluation.py", line 159, in evaluate
results = executor.results()
^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\executor.py", line 118, in results
raise e
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\executor.py", line 114, in results
r = future.result()
^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\concurrent\futures\_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\concurrent\futures\_base.py", line 401, in __get_result
raise self._exception
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\executor.py", line 36, in wrapped_callable
return counter, callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\metrics\base.py", line 75, in score
raise e
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\metrics\base.py", line 71, in score
score = self._score(row=row, callbacks=group_cm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\metrics\_answer_relevance.py", line 136, in _score
return self._calculate_score(response, row)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\metrics\_answer_relevance.py", line 114, in _calculate_score
cosine_sim = self.calculate_similarity(question, gen_questions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\metrics\_answer_relevance.py", line 92, in calculate_similarity
norm = np.linalg.norm(gen_question_vec, axis=1) * np.linalg.norm(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\numpy\linalg\linalg.py", line 2583, in norm
return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.exceptions.AxisError: axis 1 is out of bounds for array of dimension 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\allenpan\repos\autogenProject\rag_evaluation.py", line 95, in <module>
result = evaluate(
^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\evaluation.py", line 178, in evaluate
result = Result(
^^^^^^^
File "<string>", line 6, in __init__
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\ragas\evaluation.py", line 207, in __post_init__
for cn in self.scores[0].keys():
~~~~~~~~~~~^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\datasets\arrow_dataset.py", line 2800, in __getitem__
return self._getitem(key)
^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\datasets\arrow_dataset.py", line 2784, in _getitem
pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\datasets\formatting\formatting.py", line 583, in query_table
_check_valid_index_key(key, size)
File "C:\Users\allenpan\Anaconda\envs\rag_experiment\Lib\site-packages\datasets\formatting\formatting.py", line 526, in _check_valid_index_key
raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 0 is out of bounds for size 0
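
For anyone trying to reproduce this without a full evaluation run: the first exception in the trace can be triggered with NumPy alone. The sketch below only imitates the shape of the norm computation shown at line 92 of the trace; it is not the ragas source, and the function name and inputs are made up for illustration. It assumes the failing row produced an empty list of generated-question embeddings, which NumPy turns into a 1-D array of shape (0,), so `axis=1` is invalid. The later IndexError then appears to be a side effect of building the result from an empty list of scores.

```python
import numpy as np


def calculate_similarity_sketch(question_vec, gen_question_vecs):
    # Hypothetical stand-in for the computation in the traceback.
    q = np.asarray(question_vec).reshape(1, -1)
    g = np.asarray(gen_question_vecs)
    # With an empty list, g has shape (0,), i.e. one dimension only,
    # so asking for the norm along axis 1 raises AxisError.
    norm = np.linalg.norm(g, axis=1) * np.linalg.norm(q, axis=1)
    return np.dot(g, q.T).reshape(-1) / norm


# Normal case: two generated-question embeddings.
ok = calculate_similarity_sketch([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# Assumed failing case: no generated-question embeddings at all.
try:
    calculate_similarity_sketch([1.0, 0.0], [])
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__

print(ok, error_name)
```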

Expected behavior
Evaluation should complete without error regardless of dataset size.
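
One possible way to get that behavior, sketched under the assumption that the crash comes from a row with no generated-question embeddings: guard the similarity step and return NaN for that row instead of raising. `safe_similarity` is a hypothetical helper written for this issue, not an existing ragas function, and how a NaN score should be aggregated is left open.

```python
import numpy as np


def safe_similarity(question_vec, gen_question_vecs):
    # Hypothetical guarded variant; names and behavior are assumptions.
    q = np.asarray(question_vec, dtype=float).reshape(1, -1)
    g = np.asarray(gen_question_vecs, dtype=float)
    if g.size == 0:
        # Nothing to compare against: report a missing score, do not crash.
        return float("nan")
    # Force a 2-D (n_questions, dim) layout so axis=1 is always valid.
    g = g.reshape(-1, q.shape[1])
    norm = np.linalg.norm(g, axis=1) * np.linalg.norm(q, axis=1)
    sims = np.dot(g, q.T).reshape(-1) / norm
    return float(sims.mean())


empty_score = safe_similarity([1.0, 0.0], [])
normal_score = safe_similarity([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(empty_score, normal_score)
```

Returning NaN keeps the other rows' scores intact, so a single bad row would not abort a long evaluation.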

