I am trying to look deeper into the evaluation results, specifically how the model performs on individual test samples.
I successfully ran the example code provided in README.md and got pretty good results. However, those results are limited to a few high-level metrics.
So I would like to inspect the performance on each test sample, to find some clues about:
1. What do the test samples actually look like to a human?
2. How does the model perform on each test sample, and which movies does it recommend based on the dialog history?
Do you know how I can manually check what the model actually takes as input and produces as output for each test sample?
Thanks in advance!
Sincerely,
For Q1, you can check the data storage directory. The samples are stored as .pkl files, so you may need to convert them to .json or .jsonl to read them easily.
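As a rough illustration of the conversion suggested for Q1, here is a minimal sketch that dumps a pickle file to JSON Lines using only the Python standard library. The function names `to_jsonable` and `pkl_to_jsonl` are hypothetical and not part of the repo, and the sketch assumes the pickle holds a list of records (or a single object); the actual file layout in the data directory may differ. Objects that JSON cannot represent (for example tensors or arrays) are simply written as their `repr` string.

```python
import json
import pickle


def to_jsonable(obj):
    """Recursively convert common Python containers into JSON-friendly values."""
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, bytes):
        return obj.decode("utf-8", errors="replace")
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    # Fallback for anything else (arrays, tensors, custom classes).
    return repr(obj)


def pkl_to_jsonl(pkl_path, jsonl_path):
    """Write each record of a pickled list as one JSON line; return the record count."""
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    # Assumption: the pickle is a list/tuple of samples; otherwise treat it as one record.
    records = list(data) if isinstance(data, (list, tuple)) else [data]
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(to_jsonable(record), ensure_ascii=False) + "\n")
    return len(records)
```

Calling `pkl_to_jsonl("some_file.pkl", "some_file.jsonl")` (placeholder paths) would produce a file with one sample per line that can be opened in any text editor.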
For Q2, you need to clone the repo and add your own code to implement this. For example, you can add a for loop that records each sample's input and output, either where the metrics are computed or just before that step.
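A for loop of the kind described for Q2 might look like the sketch below. It is not the repo's actual evaluation code: the function `log_per_sample`, the sample keys `"context"` and `"label"`, and the shape of `predictions` (one ranked list of item IDs per sample) are all assumptions made for illustration, and would need to be mapped onto whatever the evaluator really receives.

```python
import json


def log_per_sample(samples, predictions, out_path, k=10):
    """Write one JSON line per test sample and return the overall hit rate at k.

    samples:     list of dicts with hypothetical keys "context" and "label"
    predictions: list of ranked item lists, aligned with samples
    """
    hits = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for idx, (sample, ranked) in enumerate(zip(samples, predictions)):
            top_k = list(ranked[:k])
            hit = sample["label"] in top_k  # did the true item make the top k?
            hits += int(hit)
            row = {
                "index": idx,
                "context": sample["context"],  # what the model saw
                "label": sample["label"],      # ground-truth item
                "top_k": top_k,                # what the model recommended
                "hit": hit,
            }
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    # Guard against division by zero on an empty test set.
    return hits / max(len(samples), 1)
```

The resulting file pairs each dialog context with the model's recommendations, so individual successes and failures can be read side by side with the aggregate metric.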