Describe the bug
Following up on #528 and #550, unfortunately I'm still encountering NaN values sometimes.
Ragas version: v0.1.1
Python version: 3.11
Model: Azure OpenAI endpoint gpt 3.5-turbo-4k
Code to Reproduce
The data itself is not publicly available, but the code is a generic evaluate call:
results = evaluate(
    ragas_ds["eval"],
    metrics=[faithfulness, context_precision, context_recall, answer_similarity],
    llm=azure_model,
    embeddings=azure_embeddings,
)
Error trace
evaluation.py:276: RuntimeWarning: Mean of empty slice
  value = np.nanmean(self.scores[cn])
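For context, the warning itself is straightforward to reproduce in isolation: if every per-row score for a metric is NaN (my assumption being that `self.scores[cn]` holds the per-row scores for metric `cn` and failed rows are stored as NaN), `np.nanmean` both emits "Mean of empty slice" and returns nan. A minimal sketch:

```python
import warnings

import numpy as np

# Stand-in for self.scores[cn] when every row of a metric failed to score
# (assumption about how ragas stores per-row results).
scores = np.array([np.nan, np.nan, np.nan])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = np.nanmean(scores)  # all values are NaN -> nothing left to average

print(value)  # nan
print(str(caught[0].message))  # Mean of empty slice
```

So the warning is a symptom: by the time `np.nanmean` runs, the per-row scores for that metric are already all NaN.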
Expected behavior
Ideally no NaNs, or at least a more precise indication of where the 'empty slice' was encountered.
Additional context
I've run my evaluation set a couple of times; because the NaNs land in different places each run, I suspect it has to do with the LLM output format during metric scoring.
Of the metrics I'm testing, faithfulness seems to throw NaNs more often than the others, but as mentioned it's not consistent.
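To narrow down which rows fail, I've been filtering the per-row scores for NaNs. A sketch of that check (using a hand-built stand-in DataFrame here, on the assumption that the evaluate result can be turned into a pandas frame with one column per metric, e.g. via `results.to_pandas()`):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for results.to_pandas(): one row per sample,
# one column per metric, NaN where scoring failed.
df = pd.DataFrame({
    "question": ["q1", "q2", "q3"],
    "faithfulness": [0.9, np.nan, 0.7],
    "context_precision": [1.0, 0.5, np.nan],
})

# Rows where any metric came back NaN -- candidates for malformed LLM output.
metric_cols = ["faithfulness", "context_precision"]
nan_rows = df[df[metric_cols].isna().any(axis=1)]
print(nan_rows["question"].tolist())  # ['q2', 'q3']
```

Running this across repeated evaluations shows different rows failing each time, which is what makes me suspect the LLM output parsing rather than the data.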
I'm happy to answer any questions that could help clarify the issue.
Regards, Koen