We should benchmark mimalloc, which is claimed to be about 2x faster than dlmalloc in single-core performance: https://web.dev/articles/scaling-multithreaded-webassembly-applications#mimalloc