wsmoses commented Jun 16, 2022

No description provided.

wsmoses requested a review from ftynse June 16, 2022 19:50
wsmoses merged commit a8e3bec into main Jun 21, 2022
wsmoses deleted the rename branch June 21, 2022 04:14
mmoadeli added a commit to InteonCo/Polygeist that referenced this pull request Aug 1, 2022
* LLVM global initialization (llvm#167)

* Add template member test

* Add LLVM global initialization

* Fix sign/zero integer extension (llvm#162)

* bump LLVM (llvm#166)

* bump LLVM

* fix

* more fixes

* fix

* Fix build

* Use actual llvm hash

Co-authored-by: William S. Moses <gh@wsmoses.com>

* Silence compound literals warning (NFC)

* Variable Redeclaration Fixes (llvm#172)

* Non ODR constexpr

* Fix redeclarable decls

* Support addassign binary operation for memrefs (llvm#173)

* Support add assign for memrefs

* Add test case for memref add assign

* Fix constructor of LLVM global (llvm#175)

* Visibility

* Use ctors

* Constructor global llvm

* Constexpr (llvm#176)

* add std initializer

* Fix loop inc bound

* Add type align (llvm#177)

* LLVM Rebase and type size/alignment (llvm#178)

* Add type align

* Add align

* Increase

* Bump LLVM

* Bump DL

* format

* Add canonicalizer

* Fix SCF lowering

* SCF lowering

* Fix format

* OpenMPOpt

* Add OpenMPOpt

* Fix tests

* Allow building

* Fix OpenMP upstream (llvm#180)

* Fix crash when adding malloc/free functions to module (llvm#184)

Happens when two threads try to do it at the same time

* Transform MemcpyToSymbol call ops (llvm#183)

* Remove bad assertion (llvm#186)

* Handle noexcept (llvm#182)

* Fix raising with index cast (llvm#185)

* Fix raising with index cast

* Fix affine raising

* Create for with break handling

* Canonicalize to for with break

* look through

* Fix bug and bump llvm

* Fix rebase

* Fix build

* Fix negative step

* Fix raise bug

* Fix format

* Barrier hoist and similar

* Bump LLVM

* Bump and fix

* Add excluded middle

* Fix raising of integers

* Mem2reg with affine if

* Internal opts

* Parallel interchange

* Fix distribute bug

* Fix format

* Update affine raise

* Fix for raising and distribute

* Fix affine raising

* Handle wrapping if and affine if with barrier

* WIP min cut cache optimisation

* Some fixes

* mincut fixes

* fixes

* Recalculate values after the barrier

* Recalculate vals

* Fix

* Working with mincut optimisation disabled

* recalculation fix

* Load cached values only once

* Do not add barriers prior to for's with only recomputable values preceding

* Lower polygeist cache load to memref load

* Support fors and ifs of different types with preceding recomputable ops

Large scale refactoring to eliminate code duplication

* Fix test

* Support for nested if's with barriers

Properly remap values to cloned recomputations in parallel regions

* For and if interchange fixes

* Support for interchanging while loops

* Refactor recomputable insertion into a separate function

* Fix rebase

* Add test case

* Recognise command line option "distribute.mincut" for cpuify

* Correct handling of recomputed for boundaries/if conditions

* Add test with if bound cache load

* Add missing check for recomputables before while when interchanging

* Remove unneeded code and comments

* Handle some TODO's

* Finalise rebase

* Add unneeded alloca deletions

* Fix test case

* Fix rebase

* Fix tests for alloca scope change

* Fix test allocdist

* Remove wrong comment

* Fix Openmp opt

* Fix barrier memory semantics

* Fix mem2reg bug

* Fix OpenMPOpt interchange and bump if (llvm#191)

* Fix if interchange

* Fix allocation region

* Fix format

* Fix test

* Fix sizecheck

* Fix affine for raising

* Fix canonicalize

* Fix memory issue

* fix idx

* Fix mem2reg switch

* Add Try/catch (llvm#193)

* Add Try

* Fix placement new

* Update clang-mlir.cc

* Add an option for openmp opt (llvm#195)

* Clang-tidy (NFC)

run-clang-tidy -fix tools/mlir-clang/Lib/* -checks=-*,llvm*
run-clang-tidy -fix lib/polygeist/* -checks=-*,llvm*
run-clang-tidy -fix tools/mlir-clang/mlir-clang.cc -checks=-*,llvm*

Checks:

  llvm-header-guard
  llvm-include-order
  llvm-namespace-comment
  llvm-prefer-isa-or-dyn-cast-in-conditionals
  llvm-prefer-register-over-unsigned
  llvm-qualified-auto
  llvm-twine-local

* Parallel LICM and pointer math bugfixes (llvm#194)

* Fix

* Only allow constant rems

* foo

* Fix emit direct callee

* Add fetch add

* No condition for

* Add test

* Fix subclass abi

* New derived handling

* Fix Parallel LICM & inheritance (llvm#196)

* Fix Parallel LICM condition

* Fix ParallelLICM

* Fix inheritance base lowering

* Fix address merged offset of base

* Add tests

* Handle new virtual/pure only

* Stabilize ptraddsubtest

* Fix flag name

* Override cudaGetLastError

* Fix format

* Prefix ABI (llvm#197)

* Prefix ABI

* Fix format

Co-authored-by: William S. Moses <gh@wsmoses.com>

* Fix llvm struct abi field lowering (llvm#198)

* Fix lookup where no base is created in llvm abi

* Add test

* Bump LLVM

* Fix API change

* Fix union

* Add test

* Foreach and prefix malloc/free (llvm#199)

* Don't prefix malloc/free

* C++ Foreach

* Cleanup Mem2Reg and GPU (llvm#204)

* Upgrade mem2reg

* Fix test

* Correct barrier elimination

* Fix affine raising

* Fix affine raise

* Fix affine raising

* Lowering update

* Simplify affine and whilelicm

* Fix affine raise

* Fix infinite loop in memset/cpy to load/store

* Optionally disable loop unroll

* Fix cmpi bug

* Fix allocation location

* Add memcpy behavior

* Properly handle continue

* Do not duplicate already cached values crossing a barrier

* Mem2reg and licm with affine

* Fix unused scf inductive var

* Fix parallel affine raise

* Remove print

* Partially speed up mem2reg

* Reduce max unroll size to 32

* Add rank reduction

* Disable RankReduction, fix for InductiveVarRemoval

* Work around affineexpr bug

* Improve ordering of loop distribute

* Improve if reg2mem

* Make cacheload

* Fix reg2mem for (2)

* Skip recursive load/store

* Now ignoring barriers

* No unnecessary internal store forwarding

* Attempt fix

* Add inner serialize

* Fix set operands

* Fix infinite loop

* Remove unnecessary yield

* Fix inner serialize

* Do not use cacheload when not mincut

* Fix recompute threshold

* Strengthen alias analysis

* No infinite loop

* Add single execution query

* Fix bug

* Fix compile bug

* First pass fix tests

* Bump LLVM

* Cleanup polygeist opt tests

* Fix tests

* Fix format

Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>

* Distribute immediately after wrap (llvm#207)

* Fix barriers getting removed when they are required for interchange

* Fix distributing after wrap

* Make distribute after for respect mincut setting

* Ignore CacheLoadOp's memory effects when checking recomputability

* Use FullyRecomputable function in while wrap and interchange

* Make CacheLoad a no side effect op

* Reuse code for recomputability check

* clang-format

* Remove NoSideEffects from CacheLoad, handle it explicitly

* Fix collectEffect behaviour for CanonicalizeFor

* Fix distributing around the wrong barrier after a wrap

* Forgot to set singleExecution=true for recomputable check before ifs

* Update tests

* Comments

* Do not replicate reads if possible in distribute around barriers

* Fix some resultless operations such as yield not getting recomputed

* Fix op recalculation when distributing around barrier

* Improved check for recomputability in distribute

* Update tests

* clang format

* Bump llvm

- FuncOp namespace
- include path for moveLoopInvariantCode

* Global function code gen fix (llvm#208)

* Call device stub from host code for global functions

* Update tests

* Don't reverse inputs

* Add a test case for cuda global code gen

* Add nocuda{inc,lib} options and fix cuda test

* clang format

* Add barebones cuda header for cuda tests

* Handles basic syntax using ext_vector_type type. (llvm#179)

* Handles basic syntax using ext_vector_type type.
* Improvement is needed to support more complex syntax (such as ext.xyz)

* Refactors implementation of VisitExtVectorElementExpr using CommonArrayLookup.

* Adds test.

* CUDA Stream support (llvm#213)

* CUDA Stream support

* Async lowering [WIP]

* Fix lowering to moccuda

* Convert to malloc/free

* Fix non-async

* Update LLVM

* Fix build

* Install polygeist tools (llvm#216)

* Emit some std:: clang builtin functions (llvm#217)

* Add support for some builtin std:: functions

clang::Builtin::BImove
clang::Builtin::BImove_if_noexcept
clang::Builtin::BIforward
clang::Builtin::BIas_const

* Add test

* Infer location of LLVM source tree (llvm#218)

* Rename mlir-clang to cgeist (llvm#221)

* Rename mlir-clang to cgeist

* Remove todos

* Update build.yml

* update README (llvm#226)

* Handle allocation when allocating struct with new (llvm#227)

* Move tests in 'Verification' (NFC) (llvm#225)

* [BUGFIX] correctly emitting calls with address-of operator (llvm#224)

[BUGFIX] correctly emitting calls with address-of operator

Formatting

moved test to cgeist dir

reduced lit test

reduced lit test

Co-authored-by: pietro.ghiglio <pietro.ghiglio@codeplay.com>

* [BUGFIX] Handling elaborated types in VisitArrayLoopInit (llvm#230)


Co-authored-by: pietro.ghiglio <pietro.ghiglio@codeplay.com>

* Erase op (llvm#231)

* update test for LLVM 15

* update sycl dialect include

* update merge conflict

* fix Unexpectedly Passed errors

* fix virt and cuda failing tests

* Fixes a number of failures after merge.
- Removes unnecessary `// XFAIL: *` lines
- Removes a verify in driver.cc, which was added by Codeplay earlier.

* [FIX] Positional arguments need to be sorted by their order of construction. The merge broke that order

* [FIX] Remove expected failures on polybench's tests

* [CLEAN] Remove debug utilities

Co-authored-by: William Moses <gh@wsmoses.com>
Co-authored-by: lorenzo chelini <l.chelini@icloud.com>
Co-authored-by: Ivan <ivanov.i.aa@m.titech.ac.jp>
Co-authored-by: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Co-authored-by: PietroGhg <38155419+PietroGhg@users.noreply.github.com>
Co-authored-by: pietro.ghiglio <pietro.ghiglio@codeplay.com>
Co-authored-by: Jefferson Le Quellec <jefferson.lequellec@codeplay.com>