Generating `parquet` files for `dlrm` takes 1 hour, with only 1 minute spent in writing.

How to reproduce:

- Use the updated branch which is generating parquet data with the proper size: https://github.com/mlcommons/DLIO_local_changes/pull/9
- Generate data for DLRM with 10 processors and 10 files
- Wrap data generation code with cProfile statement in DLIO `main.py` or simply enable `DFTRACER`
- Analyze the results (POSIX calls time or `write_table` time) versus total time
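For the cProfile route, a minimal sketch of the wrapping and the write-versus-total comparison could look like the following. The functions `generate_data` and `write_table` here are hypothetical stand-ins, not DLIO's actual code; in practice the call wrapped in `main.py` would be the real data generation entry point, and the name passed to `cumulative_time` would be whatever write function shows up in the profile.

```python
import cProfile
import pstats
import time


def write_table(rows):
    # Hypothetical stand-in for the parquet write call.
    time.sleep(0.01)
    return len(rows)


def generate_data(num_files=3, rows_per_file=20000):
    # Hypothetical stand-in for the data generation entry point.
    written = 0
    for _ in range(num_files):
        rows = [i * 2 for i in range(rows_per_file)]  # in-memory generation work
        written += write_table(rows)
    return written


def profile_call(func, *args, **kwargs):
    # Run func under cProfile and return its result together with the stats.
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    return result, pstats.Stats(profiler)


def cumulative_time(stats, func_name):
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers).
    return sum(v[3] for k, v in stats.stats.items() if k[2] == func_name)


result, stats = profile_call(generate_data)
write_ct = cumulative_time(stats, "write_table")
total_ct = cumulative_time(stats, "generate_data")
print(f"write_table: {write_ct:.3f}s of {total_ct:.3f}s total")
```

If the write share is as small as reported above, sorting the same stats with `stats.sort_stats("cumulative").print_stats(20)` should show where the remaining time goes.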