When test_start_index=0, the first 10 test samples are used instead of the first one per class. This happens because the code checks whether test_false_index is True, and 0 evaluates to False, so an index of 0 is treated as if no index had been given. It can be fixed by using None to indicate that no index should be used. I fixed it here and can open a pull request if you agree that this is the intended behaviour.
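
To illustrate what I mean, here is a minimal sketch of the pitfall. It is not the actual code from the repository: the function names (nth_per_class, pick_buggy, pick_fixed) and the labels list are made up for this example, and I am assuming that a given index selects that sample within each class, while no index falls back to the first 10 samples.

```python
def nth_per_class(labels, n):
    """Return the position of the n-th sample (0-based) of each class."""
    seen = {}    # how many samples of each class we have passed so far
    picked = []
    for i, label in enumerate(labels):
        count = seen.get(label, 0)
        if count == n:
            picked.append(i)
        seen[label] = count + 1
    return picked


def pick_buggy(labels, start_index=None):
    # Truthiness check: 0 is falsy, so start_index=0 is treated as "no index".
    if start_index:
        return nth_per_class(labels, start_index)
    return list(range(min(10, len(labels))))


def pick_fixed(labels, start_index=None):
    # Explicit None check: 0 is now a valid index, None means "no index".
    if start_index is not None:
        return nth_per_class(labels, start_index)
    return list(range(min(10, len(labels))))


# Hypothetical class labels of 12 test samples, 3 classes.
labels = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0, 1, 2]

print(pick_buggy(labels, 0))     # falls through to the first 10 samples
print(pick_fixed(labels, 0))     # first sample of each class
print(pick_fixed(labels, None))  # no index given: first 10 samples
```

With the explicit None check, 0 and "no index" are no longer confused, and every other index behaves as before.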
Best regards
Verena