I am wondering why the RAGAS metric "context recall" sometimes differs for the same question, the same context, and the same ground truth.
I reduced my dataset to a single question. The retrieved context is identical in every test because I use the same retrieval model. The answers differ because I test different LLMs for summarization. The ground truth is also the same in all cases.
I get 0.5 for one test case and 1.0 for another. With a different question I get, for example, 0 and 1.
I expect the "context recall" values to be the same in both cases.
The same problem occurs with "context precision".
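As I understand it, the RAGAS documentation defines context recall as the fraction of ground-truth statements that an LLM judge attributes to the retrieved context. The sketch below only illustrates that formula. The function `context_recall` and the `verdicts` lists are hypothetical stand-ins for the judge's per-statement decisions and are not the RAGAS API.

```python
def context_recall(verdicts: list[bool]) -> float:
    """Fraction of ground-truth statements judged attributable to the context.

    `verdicts` is a hypothetical stand-in for the per-statement judgments
    that RAGAS obtains from an LLM judge; here they are hard-coded.
    """
    if not verdicts:
        raise ValueError("ground truth produced no statements")
    # True counts as 1, False as 0, so the sum is the number of attributed statements.
    return sum(verdicts) / len(verdicts)


# Same two-statement ground truth, judged on two separate runs (hypothetical verdicts).
run_a = [True, False]
run_b = [True, True]
print(context_recall(run_a))  # 0.5
print(context_recall(run_b))  # 1.0
```

If the ground truth splits into two statements, the only possible scores are 0, 0.5, and 1.0. That matches the values above. Since the formula itself is deterministic, any run-to-run difference would have to come from the judge's verdicts or from how the ground truth is split into statements.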