The dataset is derived from Fannie Mae’s Single-Family Loan Performance Data, with all rights reserved by Fannie Mae, and is made available here by the RAPIDS team.
Here are preliminary results on an M1 SoC for the one-year and two-year dataset variants:
id | name | run_date | total_time_process | total_time_cpu | max_memory_usage | incremental_memory_usage | power_mW | cpu_mJ | dram_energy_sum | datadir | db |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | summary | 20/02/2023 22:17:15 | 1.7020075409673154 | 3.375095 | 333.140625 | 164.25 | 12412.0 | 15240 | 337 | oneyear | duckdb |
1 | summary | 20/02/2023 22:17:20 | 3.9095819171052426 | 10.747104 | 5862.703125 | 719.53125 | 6795.0 | 23351 | 2186 | oneyear | polars |
2 | summary | 20/02/2023 22:17:25 | 3.9357682089321315 | 18.247276999999997 | 1408.171875 | 352.703125 | 15384.0 | 52811 | 1078 | twoyear | duckdb |
3 | summary | 20/02/2023 22:17:51 | 26.165474832989275 | 56.67650499999999 | 7933.53125 | -662.390625 | 5309.0 | 136470 | 16505 | twoyear | polars |
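
To make the gap between the two engines easier to read off, the sketch below divides the polars `total_time_process` by the duckdb one for each dataset variant. The timings are copied from the table above; the `results` dictionary and the `time_ratio` helper are illustrative names only and are not part of the repository.

```python
# total_time_process values copied from the results table above.
# The ratio is unitless, so no assumption about the time unit is needed.
results = {
    ("oneyear", "duckdb"): 1.7020075409673154,
    ("oneyear", "polars"): 3.9095819171052426,
    ("twoyear", "duckdb"): 3.9357682089321315,
    ("twoyear", "polars"): 26.165474832989275,
}


def time_ratio(results, datadir, db="polars", baseline="duckdb"):
    """Return the time of `db` divided by the time of `baseline` for one dataset."""
    return results[(datadir, db)] / results[(datadir, baseline)]


# One ratio per dataset variant (hypothetical helper, for illustration only).
ratios = {datadir: time_ratio(results, datadir) for datadir in ("oneyear", "twoyear")}

for datadir, ratio in ratios.items():
    print(f"{datadir}: polars took {ratio:.2f}x as long as duckdb")
```

The same helper can be pointed at any other numeric column (for example `cpu_mJ`) by swapping the values in the dictionary.
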
The following script downloads the data and parses it into Parquet files:

    ❯ python prepare.py --help
    Usage: prepare.py [OPTIONS]

    Options:
      --with-id-as-float64 / --without-id-as-float64
                                      [default: without-id-as-float64]
      --years [1|2|4|8|16|17]         Number of years of fannie mae data to
                                      download  [default: 1]
      --datadir TEXT                  directory to download the data
      --help                          Show this message and exit.


    ❯ python run.py --help
    Usage: run.py [OPTIONS]

    Options:
      --powermetrics / --no-powermetrics
                                      Flag to get cpu and power metrics on OSX
                                      [default: no-powermetrics]
      --threads TEXT                  comma seperated list of threads to run e.g.
                                      2,4,8  [default: 8]
      --datadir TEXT                  [default: data]
      --help                          Show this message and exit.
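
As a rough illustration of how the two scripts might be chained, the sketch below assembles the argument lists for `prepare.py` and `run.py` using only the options shown in the help output above. The `build_commands` helper is hypothetical, and it assumes that `run.py` should be given the same `--datadir` that `prepare.py` wrote to, which the help text does not state explicitly.

```python
import sys

# Choices listed for --years in the `prepare.py --help` output.
ALLOWED_YEARS = (1, 2, 4, 8, 16, 17)


def build_commands(years=1, datadir="data", threads=(8,), powermetrics=False):
    """Return (prepare_cmd, run_cmd) as argv lists. Hypothetical helper."""
    if years not in ALLOWED_YEARS:
        raise ValueError(f"--years must be one of {ALLOWED_YEARS}, got {years}")

    prepare_cmd = [
        sys.executable, "prepare.py",
        "--years", str(years),
        "--datadir", datadir,
    ]
    run_cmd = [
        sys.executable, "run.py",
        # --threads takes a comma-separated list, e.g. "2,4,8"
        "--threads", ",".join(str(t) for t in threads),
        # Assumption: run.py reads from the directory prepare.py populated.
        "--datadir", datadir,
    ]
    if powermetrics:
        run_cmd.append("--powermetrics")
    return prepare_cmd, run_cmd


prepare_cmd, run_cmd = build_commands(years=2, datadir="twoyear", threads=(2, 4, 8))
print(" ".join(prepare_cmd[1:]))
print(" ".join(run_cmd[1:]))

# To actually execute them from the repository root (not done here):
#   import subprocess
#   subprocess.run(prepare_cmd, check=True)
#   subprocess.run(run_cmd, check=True)
```

Building the commands as lists rather than shell strings avoids quoting problems if the data directory contains spaces.
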