GT4Py v1.0.10
Summary of changes since v1.0.9
Cartesian
- New backend dace:cpu_kfirst.
- New experimental features:
  - absolute indexing in K.
  - expose K index.
- Fixes and performance improvements in DaCe backends.
- Improved error messages.
Development
- Enabled DaCe backends on AMD MI300 CI.
Next
See commit history.
All changes
- CI: enable dace tests on beverin by @edopao in #2263
- fix[next][dace]: check for gpu schedule before adding cudaStreamSynchronize by @edopao in #2260
- feat[next]: Transformation pass from runtime to compile-time domains by @SF-N in #2151
- feat[cartesian]: Add backend dace:cpu_kfirst by @romanc in #2255
- fix[cartesian]: dace backend fixes by @romanc in #2259
- refactor[dace]: Removed Duplicated Code by @philip-paul-mueller in #2272
- fix[next][dace]: move cuda-codegen setting to gt4py config by @edopao in #2271
- fix[next][dace]: ensure unique names for conditional blocks and loop regions by @edopao in #2270
- perf[next][dace] Remove scalar views by @iomaganaris in #2166
- bug[next]: Fix non scan projector in global tmp pass by @tehrengruber in #2274
- tests[cartesian]: attempt to stabilize macos daily (uv resolution highest) by @romanc in #2269
- refactor[next]: Refactor static args mechanism by @tehrengruber in #2258
- fix[next]: Fix multi-node scaling issues of FileCache by @tehrengruber in #2261
- build[next]: update dace version by @edopao in #2275
- feat[next][dace]: extend CopyChainRemover to remove full-write copies by @edopao in #2273
- feature[cartesian]: absolute indexing in K (experimental) by @romanc in #2276
- fix[next]: Fix allocator typing by @havogt in #2279
- feat[next]: synchronous and synchronize compilation by @havogt in #2096
- feat[next][dace]: collect sdfg execution time with tasklet by @edopao in #2280
- feat[dace][next]: Added Inline Fuser by @philip-paul-mueller in #2283
- feat[next][dace]: Hooksystem for gt_auto_optimizer() by @philip-paul-mueller in #2291
- feat[next]: flexible metric collections by @egparedes in #2287
- refactor[cartesian]: shiny error messages from the gt_script frontend by @romanc in #2042
- feat[dace][next]: Set GPU Thread Block Size Per Dimension from gt_auto_optimizer() by @philip-paul-mueller in #2298
- cartesian[feat]: Expand push vertical map down to include ForScope by @FlorianDeconinck in #2290
- feat[dace][next]: Updated Splitting Fusion Transformations by @philip-paul-mueller in #2296
- bug[next]: Fix dead-code elimination on some concat_where expressions by @tehrengruber in #2292
- refactor[next]: Cleanup implicit offsets by @havogt in #2299
- feat[bug]: Fix name collision in collapse tuple pass by @tehrengruber in #2301
- refactor[next][dace]: Move zero-origin constant substitution out of lowering module by @edopao in #2303
- doc[next]: Document concat_where domain arg canonicalization by @tehrengruber in #2223
- feat[next]: Prune empty concat where branches pass by @tehrengruber in #2286
- fix[next][dace]: Avoid name conflict between tasklet connector and data node by @edopao in #2306
- build[cartesian]: update DaCe version by @romanc in #2307
- feat[next][dace]: Enable customization of auto-optimize in dace backend workflow by @edopao in #2295
- bug[next]: change flaky wait_for_compilation test by @havogt in #2297
- feat[dace][next]: Demotion of Fields by @philip-paul-mueller in #2288
- perf[next][dace] Avoid unnecessary vertical map splitting by @iomaganaris in #2278
- fix[next]: Fix offset provider access by @tehrengruber in #2310
- bug[next]: Add support for nan and inf literals in GTIR by @tehrengruber in #2308
- feat[next][dace]: Updated gt_auto_optimizer()'s Hook-System by @philip-paul-mueller in #2294
- Revert "fix[next][dace]: Avoid name conflict between tasklet connector and data node" by @edopao in #2317
- perf[next][dace] Set GPU thread block size properly by @iomaganaris in #2313
- fix[next][dace]: Avoid name conflict between tasklet connector and data node by @edopao in #2318
- feat[next]: upgrade tree_map with custom constructor per original type by @havogt in #2285
- build[cartesian]: remove reference to gtpyc from pyproject.toml by @romanc in #2320
- fix[next][dace]: SDFG instrumentation with empty else-branch caused memory leak by @edopao in #2321
- test[cartesian]: remove duplicate dace parsing tests by @romanc in #2325
- feat[cartesian]: utils.warn_experimental_feature() by @romanc in #2324
- refactor[next][dace]: cleanup update_sdfg_args by @edopao in #2312
- refactor[next]: Reduce programs runtime overhead by @egparedes in #2305
- feat[next][dace]: enable overriding of cxx/cuda compiler arguments by @edopao in #2327
- Fix thread block size setting assertion by @iomaganaris in #2331
- feat[next]: Implement faster data pointer retrieval in ndarray-based fields by @egparedes in #2332
- feature[cartesian]: expose iteration index in K (experimental) by @romanc in #2300
- build[dace][next]: Updated DaCe Dependency by @philip-paul-mueller in #2328
- refactor[next][dace]: Use global array instead of return value to collect the SDFG compute time by @edopao in #2333
- Releasing v1.0.10 by @havogt in #2330
Full Changelog: v1.0.9...v1.0.10