GT4Py v1.0.10

@havogt released this 23 Oct 15:11
· 250 commits to main since this release
e7fd361

Summary of changes since v1.0.9

Cartesian

  • New backend dace:cpu_kfirst.
  • New experimental features:
    • absolute indexing in K.
    • expose K index.
  • Fixes and performance improvements in DaCe backends.
  • Improved error messages.

Development

  • Enabled DaCe backends on AMD MI300 CI.

Next

See commit history.

All changes

  • CI: enable dace tests on beverin by @edopao in #2263
  • fix[next][dace]: check for gpu schedule before adding cudaStreamSynchronize by @edopao in #2260
  • feat[next]: Transformation pass from runtime to compile-time domains by @SF-N in #2151
  • feat[cartesian]: Add backend dace:cpu_kfirst by @romanc in #2255
  • fix[cartesian]: dace backend fixes by @romanc in #2259
  • refactor[dace]: Removed Duplicated Code by @philip-paul-mueller in #2272
  • fix[next][dace]: move cuda-codegen setting to gt4py config by @edopao in #2271
  • fix[next][dace]: ensure unique names for conditional blocks and loop regions by @edopao in #2270
  • perf[next][dace]: Remove scalar views by @iomaganaris in #2166
  • bug[next]: Fix non scan projector in global tmp pass by @tehrengruber in #2274
  • tests[cartesian]: attempt to stabilize macos daily (uv resolution highest) by @romanc in #2269
  • refactor[next]: Refactor static args mechanism by @tehrengruber in #2258
  • fix[next]: Fix multi-node scaling issues of FileCache by @tehrengruber in #2261
  • build[next]: update dace version by @edopao in #2275
  • feat[next][dace]: extend CopyChainRemover to remove full-write copies by @edopao in #2273
  • feature[cartesian]: absolute indexing in K (experimental) by @romanc in #2276
  • fix[next]: Fix allocator typing by @havogt in #2279
  • feat[next]: synchronous and synchronize compilation by @havogt in #2096
  • feat[next][dace]: collect sdfg execution time with tasklet by @edopao in #2280
  • feat[dace][next]: Added Inline Fuser by @philip-paul-mueller in #2283
  • feat[next][dace]: Hooksystem for gt_auto_optimizer() by @philip-paul-mueller in #2291
  • feat[next]: flexible metric collections by @egparedes in #2287
  • refactor[cartesian]: shiny error messages from the gt_script frontend by @romanc in #2042
  • feat[dace][next]: Set GPU Thread Block Size Per Dimension from gt_auto_optimizer() by @philip-paul-mueller in #2298
  • cartesian[feat]: Expand push vertical map down to include ForScope by @FlorianDeconinck in #2290
  • feat[dace][next]: Updated Splitting Fusion Transformations by @philip-paul-mueller in #2296
  • bug[next]: Fix dead-code elimination on some concat_where expressions by @tehrengruber in #2292
  • refactor[next]: Cleanup implicit offsets by @havogt in #2299
  • feat[bug]: Fix name collision in collapse tuple pass by @tehrengruber in #2301
  • refactor[next][dace]: Move zero-origin constant substitution out of lowering module by @edopao in #2303
  • doc[next]: Document concat_where domain arg canonicalization by @tehrengruber in #2223
  • feat[next]: Prune empty concat where branches pass by @tehrengruber in #2286
  • fix[next][dace]: Avoid name conflict between tasklet connector and data node by @edopao in #2306
  • build[cartesian]: update DaCe version by @romanc in #2307
  • feat[next][dace]: Enable customization of auto-optimize in dace backend workflow by @edopao in #2295
  • bug[next]: change flaky wait_for_compilation test by @havogt in #2297
  • feat[dace][next]: Demotion of Fields by @philip-paul-mueller in #2288
  • perf[next][dace]: Avoid unnecessary vertical map splitting by @iomaganaris in #2278
  • fix[next]: Fix offset provider access by @tehrengruber in #2310
  • bug[next]: Add support for nan and inf literals in GTIR by @tehrengruber in #2308
  • feat[next][dace]: Updated gt_auto_optimizer()'s Hook-System by @philip-paul-mueller in #2294
  • Revert "fix[next][dace]: Avoid name conflict between tasklet connector and data node" by @edopao in #2317
  • perf[next][dace]: Set GPU thread block size properly by @iomaganaris in #2313
  • fix[next][dace]: Avoid name conflict between tasklet connector and data node by @edopao in #2318
  • feat[next]: upgrade tree_map with custom constructor per original type by @havogt in #2285
  • build[cartesian]: remove reference to gtpyc from pyproject.toml by @romanc in #2320
  • fix[next][dace]: SDFG instrumentation with empty else-branch caused memory leak by @edopao in #2321
  • test[cartesian]: remove duplicate dace parsing tests by @romanc in #2325
  • feat[cartesian]: utils.warn_experimental_feature() by @romanc in #2324
  • refactor[next][dace]: cleanup update_sdfg_args by @edopao in #2312
  • refactor[next]: Reduce programs runtime overhead by @egparedes in #2305
  • feat[next][dace]: enable overriding of cxx/cuda compiler arguments by @edopao in #2327
  • Fix thread block size setting assertion by @iomaganaris in #2331
  • feat[next]: Implement faster data pointer retrieval in ndarray-based fields by @egparedes in #2332
  • feature[cartesian]: expose iteration index in K (experimental) by @romanc in #2300
  • build[dace][next]: Updated DaCe Dependency by @philip-paul-mueller in #2328
  • refactor[next][dace]: Use global array instead of return value to collect the SDFG compute time by @edopao in #2333
  • Releasing v1.0.10 by @havogt in #2330

Full Changelog: v1.0.9...v1.0.10