Understanding the prediction_dir format for leaderboard submission #89

jmamath · 2021-09-30T13:48:01Z

I wonder if the log folder used during training is the prediction_dir described in Get Started: Evaluating trained models.

I tried to reproduce the ERM result on a subset of camelyon with the following command:

python examples/run_expt.py --dataset camelyon17 --algorithm ERM --root_dir data --frac 0.1 --log_dir log_erm_01

Training goes well.

But my file camelyon17_split:id_val_seed:0_epoch is empty.

Then I ran the following command:
python examples/evaluate.py log_erm_01 erm_01_output --root-dir data --dataset camelyon17

And I got this:

Traceback (most recent call last):
  File "examples/evaluate.py", line 282, in <module>
    main()
  File "examples/evaluate.py", line 244, in main
    evaluate_benchmark(
  File "examples/evaluate.py", line 136, in evaluate_benchmark
    predictions_file = get_prediction_file(
  File "examples/evaluate.py", line 89, in get_prediction_file
    raise FileNotFoundError(
FileNotFoundError: Could not find CSV or pth prediction file that starts with camelyon17_split:id_val_seed:0.

So my question is: is the log folder the prediction_dir described in Get Started?
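For anyone hitting the same error: judging only from the message in the traceback, the script looks in the given directory for a file whose name starts with a prefix such as camelyon17_split:id_val_seed:0 and ends in .csv or .pth. Below is a minimal sketch of that kind of lookup. It is not the actual code in examples/evaluate.py, and the function name find_prediction_file and its signature are hypothetical.

```python
import os


def find_prediction_file(predictions_dir, prefix):
    """Return the first .csv or .pth file in predictions_dir that starts with prefix.

    Hypothetical helper for illustration only; not part of the WILDS package.
    """
    for name in sorted(os.listdir(predictions_dir)):
        # A file with the right prefix but no .csv/.pth extension is skipped.
        if name.startswith(prefix) and name.endswith((".csv", ".pth")):
            return os.path.join(predictions_dir, name)
    raise FileNotFoundError(
        f"Could not find CSV or pth prediction file that starts with {prefix}."
    )
```

Under this sketch, an extensionless file such as camelyon17_split:id_val_seed:0_epoch would not count as a prediction file, which is consistent with the FileNotFoundError above.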


kohpangwei · 2021-09-30T17:28:10Z

Hi Jean-Michel,

Those log files should be in the log_dir that you specified (i.e., log_erm_01). Is that folder empty?

jmamath · 2021-10-01T10:43:23Z

Hi, thanks for your response.

No, the folder isn't empty. After training it has the following files:

camelyon17_seed:0_epoch:best_model.pth
camelyon17_seed:0_epoch:last_model.pth
camelyon17_split:id_val_seed:0_epoch
camelyon17_split:test_seed:0_epoch
camelyon17_split:val_seed:0_epoch
id_val_algo.csv
id_val_eval.csv
log
test_algo.csv
test_eval.csv
train_algo.csv
train_eval.csv
val_algo.csv
val_eval.csv

However, camelyon17_split:id_val_seed:0_epoch, camelyon17_split:test_seed:0_epoch, and camelyon17_split:val_seed:0_epoch are empty; they are the only empty files.

(EDIT)

I think I get it: once training has finished, we should run the same command with --eval_only True to get the prediction results. So in this specific case it would be:
python examples/run_expt.py --dataset camelyon17 --algorithm ERM --root_dir data --log_dir log_erm_01 --eval_only True

This would produce the files required for the leaderboard submission: https://wilds.stanford.edu/submit/.

Note that in the previous command I did not use --frac, as it would produce predictions on only a fraction of the dataset, and it is not possible to specify such a parameter when evaluating later:
python examples/evaluate.py log_erm_01 erm_01_output --root-dir data --dataset camelyon17

I think that explaining in the README how to get the predictions would help.

The problem I faced is that Windows does not allow colons ":" in filenames, so I used a hack (https://stackoverflow.com/questions/10386344/how-to-get-a-file-in-windows-with-a-colon-in-the-filename) and changed the colons in many of the files and functions that save the model and results. It then became difficult to see where I had used the hacky colon and where the normal colon. Maybe it would be more portable across operating systems to use an underscore "_" in place of the colon ":".
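One way to try the underscore idea without touching every call site would be to route file names through a single helper before saving. This is a minimal sketch under stated assumptions: portable_name is a hypothetical function, not part of WILDS, and it replaces colons only in the final path component so the directory part of the path is left alone.

```python
import os


def portable_name(path, replacement="_"):
    """Replace ':' in the file name only, leaving the directory part untouched.

    Hypothetical helper for illustration; not part of the WILDS package.
    """
    directory, name = os.path.split(path)
    return os.path.join(directory, name.replace(":", replacement))


# Example: a prediction file name from this thread, made colon-free.
safe = portable_name("camelyon17_split:id_val_seed:0_epoch:best_pred.csv")
```

Any script that later reads these files would have to look for the same colon-free names, so the renaming would need to be applied consistently on both the saving and the loading side.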

kohpangwei · 2021-10-16T04:39:59Z

Hi Jean-Michel,

Hmm, you shouldn't need to run it with --eval_only to get the prediction results. The prediction files are for some reason incorrectly named in your case. Instead of

camelyon17_split:id_val_seed:0_epoch
camelyon17_split:test_seed:0_epoch
camelyon17_split:val_seed:0_epoch

it should look like

camelyon17_split:id_val_seed:0_epoch:best_pred.csv
camelyon17_split:id_val_seed:0_epoch:last_pred.csv
camelyon17_split:test_seed:0_epoch:best_pred.csv
camelyon17_split:test_seed:0_epoch:last_pred.csv
camelyon17_split:val_seed:0_epoch:best_pred.csv
camelyon17_split:val_seed:0_epoch:last_pred.csv

Perhaps this is related to the Windows colon issue? I hadn't known about that. Sorry for the trouble. (If it helps, using Windows Subsystem for Linux would allow you to bypass these issues.)
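A quick way to check a log directory against the list above is to build the expected names and report any that are missing or empty. This is only a rough sketch: both functions are hypothetical helpers, and the naming pattern is inferred solely from the file names listed in this comment.

```python
import os
from itertools import product


def expected_prediction_files(dataset, splits, seed, epochs=("best", "last")):
    """Build names like camelyon17_split:val_seed:0_epoch:best_pred.csv."""
    return [
        f"{dataset}_split:{split}_seed:{seed}_epoch:{epoch}_pred.csv"
        for split, epoch in product(splits, epochs)
    ]


def missing_prediction_files(log_dir, names):
    """Return the names that are absent from log_dir or have zero size."""
    missing = []
    for name in names:
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


names = expected_prediction_files("camelyon17", ["id_val", "test", "val"], seed=0)
```

Calling missing_prediction_files("log_erm_01", names) would then list whichever of the six files still need to be produced.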

jmamath closed this as completed Oct 5, 2021
