Hello, I am looking at the pre-trained weights for the MLPerf benchmark configuration on Criteo Terabyte that are provided in the README (link). If I understand correctly, this should be the best checkpoint of the configuration that is run with the script ./bench/run_and_time.sh.
Here is the relevant code snippet:
```python
if args.max_ind_range > 0:
    ln_emb = np.array(
        list(
            map(
                lambda x: x if x < args.max_ind_range else args.max_ind_range,
                ln_emb,
            )
        )
    )
```
Since that config uses `--max-ind-range=40000000`, based on this snippet I was expecting the largest embedding tables (namely, tables 0, 9, 19, 20, 21) to be reduced to exactly 40M rows. However, the lengths of these tensors in the `state_dict` of the downloaded checkpoint are more variable than that:
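(For reference, these per-table row counts can be read directly from the checkpoint with something like the minimal sketch below. It assumes the file is a dict containing a `state_dict` entry whose embedding weights are named `emb_l.<i>.weight`, as in `dlrm_s_pytorch.py`; the filename is only a placeholder.)

```python
# Minimal sketch: print the row count of every embedding table in the checkpoint.
# Assumptions: the checkpoint is the dict saved by dlrm_s_pytorch.py, so the
# weights live under the "state_dict" key as "emb_l.<i>.weight"; the filename
# below is a placeholder for the downloaded file.
import torch

ckpt = torch.load("dlrm_terabyte_mlperf.pt", map_location="cpu")  # placeholder path
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

for name, tensor in state_dict.items():
    if name.startswith("emb_l.") and name.endswith(".weight"):
        table_id = int(name.split(".")[1])
        print(f"table {table_id:2d}: {tensor.shape[0]:>10d} rows x {tensor.shape[1]} dims")
```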
How does the hashing work for this model? It cannot simply be taking the categorical value ID modulo 40M, as in the released PyTorch code. Moreover, some of the smaller embedding tables also seem to have been reduced in size, which suggests additional custom filtering or merging of the categorical values.
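For clarity, this is the plain-modulo scheme I am referring to (a simplified sketch of what I understand the released data loaders to do when `--max-ind-range` is set). Under this scheme, every over-sized table would end up with exactly `max_ind_range` rows, which does not match the sizes in the checkpoint:

```python
# Simplified sketch of plain modulo hashing of categorical IDs, which is what I
# understand the released PyTorch data loaders to do when --max-ind-range is set.
import numpy as np

max_ind_range = 40_000_000

def hash_categorical(ids: np.ndarray) -> np.ndarray:
    # Fold every raw categorical ID into the range [0, max_ind_range).
    return ids % max_ind_range

raw_ids = np.array([12, 39_999_999, 40_000_000, 123_456_789])
print(hash_categorical(raw_ids))  # [12, 39999999, 0, 3456789]
```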
Also, I am not seeing a `test_auc` key in the checkpointed dictionary, despite `--mlperf-logging` being set in `./bench/run_and_time.sh`: what is the test AUC of this pre-trained model?
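For completeness, here is a minimal way to list which metrics are stored next to the weights. The key names below are the ones I would expect from `dlrm_s_pytorch.py`, not guaranteed to be present in the released file, and the filename is again a placeholder:

```python
# Minimal sketch: list the metadata saved alongside the weights and look for test_auc.
# The metric key names checked here are guesses based on dlrm_s_pytorch.py; the
# filename is a placeholder for the downloaded checkpoint.
import torch

ckpt = torch.load("dlrm_terabyte_mlperf.pt", map_location="cpu")  # placeholder path
print([k for k in ckpt if k != "state_dict"])
print({k: ckpt[k] for k in ("test_auc", "test_acc", "test_loss") if k in ckpt})
```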