remifa

Reduced and mixed precision factorization algorithms.

This repository contains a high-performance implementation of several LU factorization algorithms for NVIDIA GPUs. Some of these algorithms can exploit Tensor Cores, the mixed-precision floating-point units available on the Volta and Turing architectures.

This code was used to obtain the experimental results reported in the article "Mixed Precision LU Factorization on GPU Tensor Cores: Reducing Data Movement and Memory Footprint", co-authored by Florent Lopez and Theo Mary.
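
For readers unfamiliar with blocked LU factorization, the following is a minimal CPU sketch of the general right-looking scheme. It is not taken from this repository: the names `Matrix`, `blocked_lu_nopiv`, and `reconstruction_error` are hypothetical, the code runs in plain single precision, and it assumes no pivoting is needed (for example, a diagonally dominant matrix). In GPU variants of this scheme, step 3 (the trailing matrix update, a matrix-matrix product) is typically the part that maps onto Tensor Cores, with low-precision inputs and higher-precision accumulation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense n-by-n matrix in column-major storage (hypothetical helper type).
struct Matrix {
    std::size_t n;
    std::vector<float> a;
    float& operator()(std::size_t i, std::size_t j) { return a[i + j * n]; }
    float operator()(std::size_t i, std::size_t j) const { return a[i + j * n]; }
};

// In-place right-looking blocked LU without pivoting (assumption: no zero
// pivots occur). On exit, the strictly lower part of A holds L (unit
// diagonal implied) and the upper part, including the diagonal, holds U.
void blocked_lu_nopiv(Matrix& A, std::size_t nb) {
    assert(nb > 0 && A.a.size() == A.n * A.n);
    const std::size_t n = A.n;
    for (std::size_t k = 0; k < n; k += nb) {
        const std::size_t kend = std::min(k + nb, n);

        // 1. Panel factorization: unblocked LU of columns [k, kend).
        for (std::size_t j = k; j < kend; ++j) {
            for (std::size_t i = j + 1; i < n; ++i)
                A(i, j) /= A(j, j);
            for (std::size_t c = j + 1; c < kend; ++c)
                for (std::size_t i = j + 1; i < n; ++i)
                    A(i, c) -= A(i, j) * A(j, c);
        }

        // 2. Triangular solve: U12 = L11^{-1} * A12 (forward substitution
        //    with the unit lower triangular diagonal block).
        for (std::size_t c = kend; c < n; ++c)
            for (std::size_t j = k; j < kend; ++j)
                for (std::size_t i = j + 1; i < kend; ++i)
                    A(i, c) -= A(i, j) * A(j, c);

        // 3. Trailing update: A22 -= L21 * U12. This matrix-matrix product
        //    dominates the cost for large n.
        for (std::size_t c = kend; c < n; ++c)
            for (std::size_t j = k; j < kend; ++j) {
                const float u = A(j, c);
                for (std::size_t i = kend; i < n; ++i)
                    A(i, c) -= A(i, j) * u;
            }
    }
}

// Largest entrywise difference |A - L*U|, where LU holds the packed factors.
float reconstruction_error(const Matrix& A, const Matrix& LU) {
    assert(A.n == LU.n);
    float err = 0.0f;
    for (std::size_t i = 0; i < A.n; ++i)
        for (std::size_t j = 0; j < A.n; ++j) {
            float s = 0.0f;
            for (std::size_t k = 0; k <= std::min(i, j); ++k) {
                const float l = (k == i) ? 1.0f : LU(i, k);  // unit diagonal of L
                s += l * LU(k, j);
            }
            err = std::max(err, std::fabs(s - A(i, j)));
        }
    return err;
}
```

A typical use is to copy a matrix, call `blocked_lu_nopiv` on the copy with a block size such as 2 or 64, and then check `reconstruction_error` against the original. The block size changes only the order of operations, not the factors up to rounding.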
