Mac vs Nvidia for “good enough” performance? #310
-
Hmmm, maybe never mind. I just found the Geekbench OpenCL benchmarks, and the M2 Ultra gets only about a third of the score of a 4090. So while it does have up to eight times the VRAM available, I imagine it just can't put that to good use at this stage by comparison. I still believe in this goldilocks zone, though.
-
Does anyone have a ready comparison of, e.g., the speed of 13B CodeLlama q4_k_m GGUF on a 4090 vs an M2 Max or M2 Ultra? If not, here are my results on a 4090; maybe someone with an M2 Max or M2 Ultra can compare and update here. I have fully updated Win11, the latest 537.13 drivers, and an RTX 4090. Running 13B q4_k_m, I can set the context up to about 14000 tokens before I get a slowdown due to VRAM usage/swapping; output starts around 60 tokens/second. With 34B q3_k_m, I can set the context up to 10000 tokens; output starts around 35 tok/s.
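If it helps anyone on an M2 Max/Ultra produce directly comparable numbers, here is a minimal timing sketch using the llama-cpp-python bindings. This is an assumption on my part (the thread only mentions GGUF, which llama.cpp and its bindings load); the model path, context size, and prompt below are placeholders, not anything measured above.

```python
# Rough tokens/second measurement sketch (assumes llama-cpp-python is installed
# with GPU support: CUDA build on the 4090, Metal build on Apple Silicon).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="codellama-13b.Q4_K_M.gguf",  # hypothetical local model file
    n_ctx=14000,       # context size to test
    n_gpu_layers=-1,   # offload all layers to the GPU
    verbose=False,
)

prompt = "Write a Python function that parses a CSV file."  # placeholder prompt

start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

Since the same script runs unmodified on CUDA and Metal builds, the resulting tok/s figures from a 4090 and an M2 Max/Ultra should at least be comparing like for like.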
-
I’m in the process of buying a new home workstation. I’m guessing it will be Nvidia-based, but the potential heat and noise output of running the larger models with enough VRAM (and thus multiple GPUs) is scary! On that basis a maxed-out Mac Studio looks very appealing.
It seems to me that a rate of token production fast enough for chat on a good model is likely good enough. Training might be either a rare special event or relegated to larger servers.
Perhaps for workstations there is a goldilocks zone in a graph of heat vs noise vs tokens/sec?
So, without going into details on my exact workflow and thus making any info gathered here less useful for others: are there any comparisons between these latest Mac M2 Ultra machines and Nvidia-based setups?