
@albinahlback (Collaborator) commented Feb 25, 2024

On Skylake with GCC 13.2.1 (the cutoff at which we switch to full multiplication is n = 330):

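Timings for n = 1 to 340 limbs. Both columns are ratios of running times: mul_n / mulhigh is the time of the full n-by-n multiplication divided by the time of the new mulhigh, and mpfr / flint is the time of MPFR's mulhigh divided by the time of the FLINT one. Values above 1 mean that mulhigh (respectively FLINT) is faster.
         mul_n / mulhigh || mpfr / flint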
n =   1:      1.00x      ||     3.19
n =   2:      1.00x      ||     3.61
n =   3:      1.20x      ||     3.31
n =   4:      1.19x      ||     3.04
n =   5:      1.29x      ||     2.90
n =   6:      1.36x      ||     2.63
n =   7:      1.43x      ||     2.47
n =   8:      1.49x      ||     2.31
n =   9:      1.70x      ||     2.11
n =  10:      1.75x      ||     1.99
n =  11:      1.91x      ||     1.96
n =  12:      1.81x      ||     1.77
n =  13:      1.28x      ||     1.17
n =  14:      1.32x      ||     1.16
n =  15:      1.20x      ||     1.14
n =  16:      1.26x      ||     1.13
n =  17:      1.40x      ||     1.13
n =  18:      1.35x      ||     1.11
n =  19:      1.41x      ||     1.09
n =  20:      1.37x      ||     1.09
n =  21:      1.43x      ||     1.08
n =  22:      1.38x      ||     1.06
n =  23:      1.44x      ||     1.07
n =  24:      1.40x      ||     1.07
n =  25:      1.44x      ||     1.07
n =  26:      1.38x      ||     1.06
n =  27:      1.43x      ||     1.07
n =  28:      1.40x      ||     1.07
n =  29:      1.34x      ||     1.06
n =  30:      1.26x      ||     1.06
n =  31:      1.35x      ||     1.06
n =  32:      1.29x      ||     1.05
n =  33:      1.38x      ||     1.05
n =  34:      1.38x      ||     1.04
n =  35:      1.35x      ||     1.05
n =  36:      1.31x      ||     1.05
n =  37:      1.39x      ||     1.05
n =  38:      1.36x      ||     1.04
n =  39:      1.36x      ||     1.07
n =  40:      1.29x      ||     1.03
n =  41:      1.35x      ||     1.14
n =  42:      1.35x      ||     1.10
n =  43:      1.39x      ||     1.09
n =  44:      1.33x      ||     1.09
n =  45:      1.39x      ||     1.10
n =  46:      1.36x      ||     1.09
n =  47:      1.37x      ||     1.09
n =  48:      1.35x      ||     1.09
n =  49:      1.35x      ||     1.09
n =  50:      1.32x      ||     1.05
n =  51:      1.34x      ||     1.07
n =  52:      1.31x      ||     1.06
n =  53:      1.35x      ||     1.07
n =  54:      1.34x      ||     1.07
n =  55:      1.35x      ||     1.07
n =  56:      1.27x      ||     1.09
n =  57:      1.28x      ||     1.05
n =  58:      1.23x      ||     1.11
n =  59:      1.22x      ||     1.07
n =  60:      1.20x      ||     1.14
n =  61:      1.19x      ||     1.08
n =  62:      1.21x      ||     1.14
n =  63:      1.16x      ||     1.06
n =  64:      1.21x      ||     1.14
n =  65:      1.19x      ||     1.07
n =  66:      1.22x      ||     1.07
n =  67:      1.22x      ||     1.05
n =  68:      1.21x      ||     1.03
n =  69:      1.20x      ||     1.07
n =  70:      1.24x      ||     1.05
n =  71:      1.20x      ||     1.05
n =  72:      1.20x      ||     1.07
n =  73:      1.21x      ||     1.04
n =  74:      1.28x      ||     1.06
n =  75:      1.23x      ||     1.00
n =  76:      1.24x      ||     1.03
n =  77:      1.25x      ||     1.03
n =  78:      1.25x      ||     1.01
n =  79:      1.23x      ||     1.01
n =  80:      1.23x      ||     1.03
n =  81:      1.23x      ||     0.99
n =  82:      1.24x      ||     1.00
n =  83:      1.25x      ||     0.98
n =  84:      1.25x      ||     0.99
n =  85:      1.29x      ||     0.99
n =  86:      1.29x      ||     0.98
n =  87:      1.24x      ||     0.99
n =  88:      1.23x      ||     1.01
n =  89:      1.26x      ||     0.98
n =  90:      1.27x      ||     0.99
n =  91:      1.25x      ||     0.97
n =  92:      1.26x      ||     1.00
n =  93:      1.23x      ||     0.96
n =  94:      1.22x      ||     0.96
n =  95:      1.25x      ||     0.98
n =  96:      1.23x      ||     0.99
n =  97:      1.20x      ||     0.95
n =  98:      1.22x      ||     0.94
n =  99:      1.22x      ||     0.95
n = 100:      1.23x      ||     0.96
n = 101:      1.22x      ||     0.92
n = 102:      1.21x      ||     0.98
n = 103:      1.22x      ||     0.98
n = 104:      1.21x      ||     0.96
n = 105:      1.20x      ||     0.97
n = 106:      1.22x      ||     1.02
n = 107:      1.22x      ||     0.98
n = 108:      1.24x      ||     0.99
n = 109:      1.22x      ||     0.97
n = 110:      1.21x      ||     0.98
n = 111:      1.24x      ||     1.00
n = 112:      1.22x      ||     1.00
n = 113:      1.18x      ||     0.99
n = 114:      1.20x      ||     1.02
n = 115:      1.18x      ||     0.99
n = 116:      1.19x      ||     1.00
n = 117:      1.17x      ||     0.99
n = 118:      1.16x      ||     1.01
n = 119:      1.16x      ||     1.02
n = 120:      1.17x      ||     1.06
n = 121:      1.13x      ||     0.99
n = 122:      1.12x      ||     0.96
n = 123:      1.14x      ||     0.97
n = 124:      1.16x      ||     1.05
n = 125:      1.16x      ||     0.96
n = 126:      1.16x      ||     0.98
n = 127:      1.17x      ||     1.03
n = 128:      1.15x      ||     1.05
n = 129:      1.15x      ||     0.98
n = 130:      1.18x      ||     1.00
n = 131:      1.18x      ||     0.97
n = 132:      1.19x      ||     0.99
n = 133:      1.18x      ||     0.98
n = 134:      1.18x      ||     0.95
n = 135:      1.21x      ||     0.98
n = 136:      1.19x      ||     0.97
n = 137:      1.18x      ||     0.97
n = 138:      1.13x      ||     0.93
n = 139:      1.18x      ||     0.97
n = 140:      1.20x      ||     0.98
n = 141:      1.17x      ||     0.99
n = 142:      1.19x      ||     0.98
n = 143:      1.18x      ||     1.01
n = 144:      1.17x      ||     0.98
n = 145:      1.17x      ||     0.97
n = 146:      1.19x      ||     0.97
n = 147:      1.20x      ||     0.97
n = 148:      1.21x      ||     0.97
n = 149:      1.19x      ||     0.98
n = 150:      1.19x      ||     0.97
n = 151:      1.17x      ||     0.96
n = 152:      1.19x      ||     0.97
n = 153:      1.20x      ||     0.94
n = 154:      1.19x      ||     0.96
n = 155:      1.21x      ||     0.96
n = 156:      1.21x      ||     0.95
n = 157:      1.19x      ||     0.96
n = 158:      1.18x      ||     0.96
n = 159:      1.19x      ||     0.97
n = 160:      1.18x      ||     0.99
n = 161:      1.18x      ||     0.94
n = 162:      1.17x      ||     0.98
n = 163:      1.18x      ||     0.93
n = 164:      1.19x      ||     0.97
n = 165:      1.17x      ||     0.95
n = 166:      1.19x      ||     0.94
n = 167:      1.20x      ||     0.97
n = 168:      1.18x      ||     0.96
n = 169:      1.19x      ||     0.94
n = 170:      1.19x      ||     0.94
n = 171:      1.20x      ||     0.94
n = 172:      1.19x      ||     0.96
n = 173:      1.19x      ||     0.97
n = 174:      1.15x      ||     0.94
n = 175:      1.17x      ||     0.97
n = 176:      1.17x      ||     0.98
n = 177:      1.16x      ||     0.96
n = 178:      1.16x      ||     0.95
n = 179:      1.18x      ||     0.95
n = 180:      1.19x      ||     0.96
n = 181:      1.18x      ||     0.96
n = 182:      1.17x      ||     0.96
n = 183:      1.19x      ||     0.95
n = 184:      1.19x      ||     0.98
n = 185:      1.17x      ||     0.95
n = 186:      1.15x      ||     0.92
n = 187:      1.16x      ||     0.93
n = 188:      1.14x      ||     0.94
n = 189:      1.17x      ||     0.94
n = 190:      1.18x      ||     0.96
n = 191:      1.17x      ||     0.99
n = 192:      1.16x      ||     0.99
n = 193:      1.16x      ||     0.96
n = 194:      1.16x      ||     0.95
n = 195:      1.17x      ||     0.96
n = 196:      1.15x      ||     0.96
n = 197:      1.16x      ||     0.96
n = 198:      1.17x      ||     0.97
n = 199:      1.18x      ||     0.99
n = 200:      1.16x      ||     0.98
n = 201:      1.17x      ||     0.97
n = 202:      1.16x      ||     0.95
n = 203:      1.18x      ||     0.96
n = 204:      1.20x      ||     1.00
n = 205:      1.18x      ||     1.01
n = 206:      1.17x      ||     1.01
n = 207:      1.18x      ||     1.00
n = 208:      1.18x      ||     1.03
n = 209:      1.14x      ||     0.96
n = 210:      1.17x      ||     0.95
n = 211:      1.17x      ||     0.95
n = 212:      1.16x      ||     0.98
n = 213:      1.17x      ||     0.97
n = 214:      1.16x      ||     0.96
n = 215:      1.16x      ||     0.93
n = 216:      1.16x      ||     0.98
n = 217:      1.15x      ||     0.90
n = 218:      1.15x      ||     0.89
n = 219:      1.16x      ||     0.90
n = 220:      1.17x      ||     0.92
n = 221:      1.17x      ||     0.93
n = 222:      1.18x      ||     0.94
n = 223:      1.17x      ||     0.93
n = 224:      1.16x      ||     0.96
n = 225:      1.15x      ||     0.94
n = 226:      1.13x      ||     0.92
n = 227:      1.15x      ||     0.93
n = 228:      1.16x      ||     0.94
n = 229:      1.15x      ||     0.96
n = 230:      1.16x      ||     0.96
n = 231:      1.20x      ||     0.97
n = 232:      1.20x      ||     0.97
n = 233:      1.19x      ||     0.99
n = 234:      1.17x      ||     0.97
n = 235:      1.18x      ||     0.97
n = 236:      1.19x      ||     0.97
n = 237:      1.23x      ||     0.99
n = 238:      1.22x      ||     0.99
n = 239:      1.22x      ||     1.00
n = 240:      1.25x      ||     1.01
n = 241:      1.22x      ||     0.97
n = 242:      1.16x      ||     0.94
n = 243:      1.16x      ||     0.95
n = 244:      1.15x      ||     0.96
n = 245:      1.17x      ||     0.95
n = 246:      1.15x      ||     0.93
n = 247:      1.16x      ||     0.95
n = 248:      1.18x      ||     0.96
n = 249:      1.14x      ||     0.93
n = 250:      1.12x      ||     0.92
n = 251:      1.13x      ||     0.92
n = 252:      1.12x      ||     0.93
n = 253:      1.16x      ||     0.93
n = 254:      1.16x      ||     0.93
n = 255:      1.16x      ||     0.96
n = 256:      1.18x      ||     0.97
n = 257:      1.13x      ||     0.92
n = 258:      1.11x      ||     0.90
n = 259:      1.11x      ||     0.91
n = 260:      1.09x      ||     0.90
n = 261:      1.11x      ||     0.91
n = 262:      1.10x      ||     0.91
n = 263:      1.11x      ||     0.90
n = 264:      1.12x      ||     0.91
n = 265:      1.08x      ||     0.88
n = 266:      1.06x      ||     0.88
n = 267:      1.09x      ||     0.89
n = 268:      1.07x      ||     0.88
n = 269:      1.10x      ||     0.89
n = 270:      1.10x      ||     0.90
n = 271:      1.09x      ||     0.89
n = 272:      1.10x      ||     0.90
n = 273:      1.12x      ||     0.88
n = 274:      1.11x      ||     0.88
n = 275:      1.09x      ||     0.88
n = 276:      1.11x      ||     0.89
n = 277:      1.10x      ||     0.88
n = 278:      1.12x      ||     0.88
n = 279:      1.12x      ||     0.89
n = 280:      1.12x      ||     0.92
n = 281:      1.10x      ||     0.88
n = 282:      1.11x      ||     0.93
n = 283:      1.09x      ||     0.92
n = 284:      1.11x      ||     0.93
n = 285:      1.13x      ||     0.93
n = 286:      1.12x      ||     0.92
n = 287:      1.13x      ||     0.93
n = 288:      1.11x      ||     0.92
n = 289:      1.12x      ||     0.91
n = 290:      1.10x      ||     0.90
n = 291:      1.10x      ||     0.91
n = 292:      1.10x      ||     0.90
n = 293:      1.09x      ||     0.89
n = 294:      1.07x      ||     0.88
n = 295:      1.08x      ||     0.90
n = 296:      1.08x      ||     0.90
n = 297:      1.08x      ||     0.88
n = 298:      1.07x      ||     0.89
n = 299:      1.08x      ||     0.90
n = 300:      1.09x      ||     0.89
n = 301:      1.09x      ||     0.89
n = 302:      1.08x      ||     0.88
n = 303:      1.07x      ||     0.89
n = 304:      1.06x      ||     0.89
n = 305:      1.05x      ||     0.88
n = 306:      1.05x      ||     0.87
n = 307:      1.05x      ||     0.86
n = 308:      1.07x      ||     0.87
n = 309:      1.07x      ||     0.87
n = 310:      1.06x      ||     0.86
n = 311:      1.05x      ||     0.87
n = 312:      1.06x      ||     0.88
n = 313:      1.08x      ||     0.87
n = 314:      1.07x      ||     0.87
n = 315:      1.07x      ||     0.90
n = 316:      1.08x      ||     0.90
n = 317:      1.07x      ||     0.90
n = 318:      1.06x      ||     0.90
n = 319:      1.11x      ||     0.90
n = 320:      1.12x      ||     0.90
n = 321:      1.08x      ||     0.89
n = 322:      1.06x      ||     0.87
n = 323:      1.04x      ||     0.87
n = 324:      1.04x      ||     0.86
n = 325:      1.03x      ||     0.84
n = 326:      1.03x      ||     0.86
n = 327:      1.02x      ||     0.86
n = 328:      1.01x      ||     0.86
n = 329:      1.00x      ||     0.85
n = 330:      1.00x      ||     0.85
n = 331:      1.00x      ||     0.82
n = 332:      1.00x      ||     0.83
n = 333:      1.00x      ||     0.83
n = 334:      1.00x      ||     0.84
n = 335:      1.00x      ||     0.83
n = 336:      1.00x      ||     0.84
n = 337:      1.00x      ||     0.81
n = 338:      0.99x      ||     0.82
n = 339:      1.00x      ||     0.82
n = 340:      1.00x      ||     0.84

This is currently slower than MPFR above roughly 80 limbs because the algorithm is not quite the same: MPFR uses the low part of the result array, rp[0...n-1], as scratch space, which we do not have.
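For comparison, here is a portable sketch of the MPFR-style scheme (Mulders' short product), modeled on the structure of MPFR's internal mpfr_mulhigh_n. It is plain C, not the ADX code from this PR, and every name in it is hypothetical. The split k = n - floor(n/4), l = n - k and the basecase threshold are assumptions (MPFR tunes k per n). The output array has 2n limbs: the exact k-by-k product of the high parts fills rp[2l..2n-1], while the two l-limb subproducts are written into the low part rp[0..2l-1], which the caller does not need, and are then added in at rp[n-1]. That low part is the scratch space an n-limb output does not provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Requires GCC/Clang's unsigned __int128 on a 64-bit target. */
typedef unsigned __int128 u128;

/* Hypothetical basecase threshold; MPFR tunes this per machine. */
#define MULHIGH_BASECASE_MAX 8

/* Full schoolbook product: rp[0..2n-1] = {ap, n} * {bp, n}. */
static void mul_n_ref(uint64_t *rp, const uint64_t *ap, const uint64_t *bp, size_t n)
{
    for (size_t i = 0; i < 2 * n; i++)
        rp[i] = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t cy = 0;
        for (size_t j = 0; j < n; j++) {
            u128 p = (u128) ap[i] * bp[j] + rp[i + j] + cy;
            rp[i + j] = (uint64_t) p;
            cy = (uint64_t) (p >> 64);
        }
        rp[i + n] = cy;
    }
}

/* rp[0..n-1] += ap[0..n-1]; returns the carry out (0 or 1). */
static uint64_t add_n_ref(uint64_t *rp, const uint64_t *ap, size_t n)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        u128 s = (u128) rp[i] + ap[i] + cy;
        rp[i] = (uint64_t) s;
        cy = (uint64_t) (s >> 64);
    }
    return cy;
}

/* Basecase short product: only partial products a[i]*b[j] with
   i + j >= n - 1 are formed, so rp[n-1..2n-1] approximates the top
   of the product from below and rp[0..n-2] stays zero. */
static void mulhigh_basecase_ref(uint64_t *rp, const uint64_t *ap, const uint64_t *bp, size_t n)
{
    for (size_t i = 0; i < 2 * n; i++)
        rp[i] = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t cy = 0;
        for (size_t j = n - 1 - i; j < n; j++) {
            u128 p = (u128) ap[i] * bp[j] + rp[i + j] + cy;
            rp[i + j] = (uint64_t) p;
            cy = (uint64_t) (p >> 64);
        }
        rp[i + n] = cy;
    }
}

/* MPFR-style recursion: rp has 2n limbs, the result is rp[n..2n-1], and
   the low part rp[0..2l-1] is reused as scratch for the two subproducts. */
static void mulhigh_ref(uint64_t *rp, const uint64_t *ap, const uint64_t *bp, size_t n)
{
    if (n <= MULHIGH_BASECASE_MAX) {
        mulhigh_basecase_ref(rp, ap, bp, n);
        return;
    }
    size_t l = n / 4, k = n - l;               /* assumed split, k > l */
    mul_n_ref(rp + 2 * l, ap + l, bp + l, k);  /* exact part: rp[2l..2n-1] */
    mulhigh_ref(rp, ap + k, bp, l);            /* high A x low B into rp[l-1..2l-1] */
    uint64_t cy = add_n_ref(rp + n - 1, rp + l - 1, l + 1);
    mulhigh_ref(rp, ap, bp + k, l);            /* low A x high B, reusing the scratch */
    cy += add_n_ref(rp + n - 1, rp + l - 1, l + 1);
    for (size_t i = n + l; cy != 0 && i < 2 * n; i++) {  /* propagate carries */
        u128 s = (u128) rp[i] + cy;
        rp[i] = (uint64_t) s;
        cy = (uint64_t) (s >> 64);
    }
}
```

Because only nonnegative partial products and low limbs are dropped, rp[n..2n-1] in this sketch never exceeds the true high half and falls short of it by at most n in its lowest limb.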

Edit: I also added a function that adds two mpn arrays onto another array in a single pass using interleaved ADCX/ADOX carry chains. This is much faster than calling mpn_add_n twice.
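A portable reference sketch of what such a fused operation computes, assuming its semantics are rp += ap + bp over n limbs. The name add_2_n_ref is hypothetical and this is plain C, not the assembly from this PR. The two carry variables stand in for the CF and OF flags: ADCX updates only CF and ADOX only OF, so the two carry chains are independent and can be interleaved in one loop over the limbs, instead of walking rp twice as two mpn_add_n calls would.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rp[0..n-1] += ap[0..n-1] + bp[0..n-1].
   Returns the total carry out (0, 1 or 2). `cf` and `of` model the
   two independent carry chains (CF for ADCX, OF for ADOX). */
static unsigned add_2_n_ref(uint64_t *rp, const uint64_t *ap,
                            const uint64_t *bp, size_t n)
{
    unsigned cf = 0, of = 0;
    for (size_t i = 0; i < n; i++) {
        /* Chain 1 (ADCX-like): t = rp[i] + ap[i] + cf. */
        uint64_t t = rp[i] + ap[i];
        unsigned c1 = t < ap[i];
        uint64_t t2 = t + cf;
        c1 += t2 < t;          /* at most one of the two steps overflows */
        cf = c1;
        /* Chain 2 (ADOX-like): u = t + bp[i] + of. */
        uint64_t u = t2 + bp[i];
        unsigned c2 = u < bp[i];
        uint64_t u2 = u + of;
        c2 += u2 < u;
        of = c2;
        rp[i] = u2;
    }
    return cf + of;
}
```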

@albinahlback
Collaborator Author

So, assuming we want this in Arb, it is only above about 5000 bits of precision (roughly 80 limbs) that we need to adjust the algorithm in order to outperform MPFR.

@fredrik-johansson
Collaborator

> This is currently slower than MPFR above roughly 80 limbs because the algorithm is not quite the same: MPFR uses the low part of the result array, rp[0...n-1], as scratch space, which we do not have.

Outside of the basecase range we can certainly allocate temporary scratch space too.

@fredrik-johansson
Collaborator

I can try to implement the MPFR algorithm on top of your basecase to compare.

@albinahlback
Collaborator Author

> Outside of the basecase range we can certainly allocate temporary scratch space too.

Indeed!

@albinahlback
Collaborator Author

We should also define a global preprocessor constant for the size at which full multiplication is used, so that callers do not need to worry about the approximation error in that case, if it matters.

Currently this is only available on x86_64 with ADX.
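A minimal sketch of such a constant, assuming a hypothetical macro name and assuming the cutoff is inclusive (n >= 330 takes the full-multiplication path). The value is the Skylake/GCC 13.2.1 cutoff quoted at the top of the thread and would differ per machine. A caller can then drop any error term when the size is at or above the cutoff.

```c
#include <stddef.h>

/* Hypothetical name. 330 is the cutoff measured on Skylake with
   GCC 13.2.1; "n >= cutoff means full multiplication" is an assumption. */
#ifndef MULHIGH_FULL_MUL_CUTOFF
#define MULHIGH_FULL_MUL_CUTOFF 330
#endif

/* Nonzero when an n-limb mulhigh is computed by a full multiplication,
   so the returned high limbs are exact and no error term is needed. */
static inline int mulhigh_is_exact(size_t n)
{
    return n >= MULHIGH_FULL_MUL_CUTOFF;
}
```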
@albinahlback merged commit d8b5320 into flintlib:main on Feb 25, 2024
@albinahlback deleted the mulhigh_generic branch on February 25, 2024, 19:59