diff --git a/.github/ISSUE_TEMPLATE/vulnerability.yml b/.github/ISSUE_TEMPLATE/vulnerability.yml index be07e18a75..e264c89fbb 100644 --- a/.github/ISSUE_TEMPLATE/vulnerability.yml +++ b/.github/ISSUE_TEMPLATE/vulnerability.yml @@ -1,4 +1,4 @@ -name: "Vulnerability Report" +name: "\U0001F6A8 Vulnerability Report" description: Report a security vulnerability in our project. title: "[VULNERABILITY]: " labels: ["security, High Priority"] @@ -14,50 +14,50 @@ body: attributes: label: Affected Version(s) description: List the affected versions of the library. - validations: - required: true + validations: + required: true - type: textarea id: severity attributes: label: Severity description: Specify the severity of the vulnerability (e.g., Low/Medium/High/Critical). - validations: - required: true + validations: + required: true - type: textarea id: description attributes: label: Description description: Provide a clear and concise description of the security vulnerability. - validations: - required: true + validations: + required: true - type: textarea id: steps-to-reproduce attributes: label: Steps to Reproduce description: Outline the steps to reproduce the vulnerability, including any relevant code snippets or configuration settings. - validations: - required: true + validations: + required: true - type: textarea id: expected-behavior attributes: label: Expected Behavior description: Explain what you expected to happen when following the steps above. - validations: - required: true + validations: + required: true - type: textarea id: actual-behavior attributes: label: Actual Behavior description: Describe what actually happened when you followed the steps above, highlighting the security issue. - validations: - required: true + validations: + required: true - type: textarea id: impact attributes: label: Impact description: Discuss the potential impact of this vulnerability, including any possible consequences or risks associated with its exploitation. 
- validations: - required: true + validations: + required: true - type: textarea id: proof-of-concept attributes: diff --git a/README.md b/README.md index 062dc9b415..f19e8dd99c 100644 --- a/README.md +++ b/README.md @@ -14,151 +14,165 @@ Heat is a distributed tensor framework for high performance data analytics. [![license: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![PyPI Version](https://img.shields.io/pypi/v/heat)](https://pypi.org/project/heat/) [![Downloads](https://pepy.tech/badge/heat)](https://pepy.tech/project/heat) +[![Anaconda-Server Badge](https://anaconda.org/conda-forge/heat/badges/version.svg)](https://anaconda.org/conda-forge/heat) [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/7688/badge)](https://bestpractices.coreinfrastructure.org/projects/7688) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2531472.svg)](https://doi.org/10.5281/zenodo.2531472) [![Benchmarks](https://img.shields.io/badge/Github--Pages-Benchmarks-2ea44f)](https://helmholtz-analytics.github.io/heat/dev/bench) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) -# Goals +# Table of Contents + - [What is Heat for?](#what-is-heat-for) + - [Features](#features) + - [Getting Started](#getting-started) + - [Installation](#installation) + - [Requirements](#requirements) + - [pip](#pip) + - [conda](#conda) + - [Support Channels](#support-channels) + - [Contribution guidelines](#contribution-guidelines) + - [Resources](#resources) + - [License](#license) + - [Citing Heat](#citing-heat) + - [FAQ](#faq) + - [Acknowledgements](#acknowledgements) -Heat is a flexible and seamless open-source software for high performance data -analytics and machine learning. 
It provides highly optimized algorithms and data -structures for tensor computations using CPUs, GPUs, and distributed cluster -systems on top of MPI. The goal of Heat is to fill the gap between data -analytics and machine learning libraries with a strong focus on single-node -performance, and traditional high-performance computing (HPC). Heat's generic -Python-first programming interface integrates seamlessly with the existing data -science ecosystem and makes it as effortless as using numpy to write scalable -scientific and data science applications. -Heat allows you to tackle your actual Big Data challenges that go beyond the -computational and memory needs of your laptop and desktop. +# What is Heat for? + +Heat builds on [PyTorch](https://pytorch.org/) and [mpi4py](https://mpi4py.readthedocs.io) to provide high-performance computing infrastructure for memory-intensive applications within the NumPy/SciPy ecosystem. + + +With Heat you can: +- port existing NumPy/SciPy code from single-CPU to multi-node clusters with minimal coding effort; +- exploit the entire, cumulative RAM of your many nodes for memory-intensive operations and algorithms; +- run your NumPy/SciPy code on GPUs (CUDA, ROCm, coming up: Apple MPS). + +For an example that highlights the benefits of multi-node parallelism and hardware acceleration, and shows how easily Heat makes them accessible, see our [blog post on truncated SVD of a 200GB data set](https://helmholtz-analytics.github.io/heat/2023/06/16/new-feature-hsvd.html). + +Check out our [coverage tables](coverage_tables.md) to see which NumPy, SciPy, and scikit-learn functions are already supported. + + If you need functionality that is not yet supported: + - [search existing issues](https://github.com/helmholtz-analytics/heat/issues) and make sure to leave a comment if someone else already requested it; + - [open a new issue](https://github.com/helmholtz-analytics/heat/issues/new/choose).
+ + +Check out our [features](#features) and the [Heat API Reference](https://heat.readthedocs.io/en/latest/autoapi/index.html) for a complete list of functionalities. # Features -* High-performance n-dimensional tensors +* High-performance n-dimensional arrays * CPU, GPU, and distributed computation using MPI * Powerful data analytics and machine learning methods -* Abstracted communication via split tensors -* Python API +* Seamless integration with the NumPy/SciPy ecosystem +* Python array API (work in progress) -# Support Channels -We use [GitHub Discussions](https://github.com/helmholtz-analytics/heat/discussions) as a forum for questions about Heat. -If you found a bug or miss a feature, then please file a new [issue](https://github.com/helmholtz-analytics/heat/issues/new/choose). +# Getting Started -# Requirements +Go to [Quick Start](quick_start.md) for a quick overview. For more details, see [Installation](#installation). -Heat requires Python 3.7 or newer. -Heat is based on [PyTorch](https://pytorch.org/). Specifically, we are exploiting -PyTorch's support for GPUs *and* MPI parallelism. For MPI support we utilize -[mpi4py](https://mpi4py.readthedocs.io). Both packages can be installed via pip -or automatically using the setup.py. +**You can test your setup** by running the [`heat_test.py`](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_test.py) script: -# Installation +```shell +mpirun -n 2 python heat_test.py +``` -Tagged releases are made available on the -[Python Package Index (PyPI)](https://pypi.org/project/heat/). 
You can typically -install the latest version with +It should print something like this: -``` -$ pip install heat[hdf5,netcdf] +```shell +x is distributed: True +Global DNDarray x: DNDarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=ht.int32, device=cpu:0, split=0) +Global DNDarray x: +Local torch tensor on rank 0 : tensor([0, 1, 2, 3, 4], dtype=torch.int32) +Local torch tensor on rank 1 : tensor([5, 6, 7, 8, 9], dtype=torch.int32) ``` -where the part in brackets is a list of optional dependencies. You can omit -it, if you do not need HDF5 or NetCDF support. +Our Jupyter Notebook [**Tutorial**](https://github.com/helmholtz-analytics/heat/blob/main/scripts/) illustrates Heat's basics. More tutorials are available [here](https://heat.readthedocs.io/en/latest/tutorials.html). -**It is recommended to use the most recent supported version of PyTorch!** +The complete documentation of the latest version is always deployed on +[Read the Docs](https://heat.readthedocs.io/). -**It is also very important to ensure that the PyTorch version is compatible with the local CUDA installation.** -More information can be found [here](https://pytorch.org/get-started/locally/). -# Hacking + +# Installation -The installation can then be done from the checked-out sources with +## Requirements -``` -$ pip install heat[hdf5,netcdf,dev] -``` +### Basics +- python >= 3.8 +- MPI (OpenMPI, MPICH, Intel MPI, etc.) +- mpi4py >= 3.0.0 +- pytorch >= 1.8.0 -# Getting Started +### Parallel I/O +- h5py +- netCDF4 -TL;DR: [Quick Start](quick_start.md) (Read this to get a quick overview of Heat). +### GPU support +In order to do computations on your GPU(s): +- your CUDA or ROCm installation must match your hardware and its drivers; +- your [PyTorch installation](https://pytorch.org/get-started/locally/) must be compiled with CUDA/ROCm support.
-Check out our Jupyter Notebook [**Tutorial**](https://github.com/helmholtz-analytics/heat/blob/main/scripts/) -right here on GitHub or in the /scripts directory, to learn and understand about the basics and working of Heat. +### HPC systems +On most HPC systems you will not be able to install/compile MPI or CUDA/ROCm yourself. Instead, you will most likely need to load a pre-installed MPI and/or CUDA/ROCm module from the module system. You may even find PyTorch, h5py, or mpi4py provided as (part of) such a module. Note that for optimal performance on GPUs, you need an MPI library that has been compiled with CUDA/ROCm support (so-called "CUDA-aware MPI"). -The complete documentation of the latest version is always deployed on -[Read the Docs](https://heat.readthedocs.io/). -***Try your first Heat program*** +## pip +Install the latest version with -```shell -$ python +```bash +pip install heat[hdf5,netcdf] ``` +where the part in brackets is a list of optional dependencies. You can omit +it if you do not need HDF5 or NetCDF support. -```python ->>> import heat as ht ->>> x = ht.arange(10,split=0) ->>> print(x) -DNDarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=ht.int32, device=cpu:0, split=0) ->>> y = ht.ones(10,split=0) ->>> print(y) -DNDarray([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=ht.float32, device=cpu:0, split=0) ->>> print(x + y) -DNDarray([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.], dtype=ht.float32, device=cpu:0, split=0) -``` +## conda -### Also, you can test your setup by running the [`heat_test.py`](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_test.py) script: +The conda build includes all dependencies **including OpenMPI**. +```bash +conda install -c conda-forge heat +``` -```shell -mpirun -n 2 python heat_test.py -``` +# Support Channels -### It should print something like this: +Go ahead and ask questions on [GitHub Discussions](https://github.com/helmholtz-analytics/heat/discussions).
If you found a bug or are missing a feature, please file a new [issue](https://github.com/helmholtz-analytics/heat/issues/new/choose). You can also get in touch with us on [Mattermost](https://mattermost.hzdr.de/signup_user_complete/?id=3sixwk9okpbzpjyfrhen5jpqfo) (sign up with your GitHub credentials). Once you log in, you can introduce yourself on the `Town Square` channel. -```shell -x is distributed: True -Global DNDarray x: DNDarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=ht.int32, device=cpu:0, split=0) -Global DNDarray x: -Local torch tensor on rank 0 : tensor([0, 1, 2, 3, 4], dtype=torch.int32) -Local torch tensor on rank 1 : tensor([5, 6, 7, 8, 9], dtype=torch.int32) -``` -## Resources: +# Contribution guidelines + +**We welcome contributions from the community! If you want to contribute to Heat, be sure to review the [Contribution Guidelines](contributing.md) and [Resources](#resources) before getting started.** + +We use [GitHub issues](https://github.com/helmholtz-analytics/heat/issues) for tracking requests and bugs; please see [Discussions](https://github.com/helmholtz-analytics/heat/discussions) for general questions and discussion. You can also get in touch with us on [Mattermost](https://mattermost.hzdr.de/signup_user_complete/?id=3sixwk9okpbzpjyfrhen5jpqfo) (sign up with your GitHub credentials). Once you log in, you can introduce yourself on the `Town Square` channel. + +If you’re unsure where to start or how your skills fit in, reach out! You can ask us here on GitHub by leaving a comment on a relevant issue that is already open.
+ +**If you are new to contributing to open source, [this guide](https://opensource.guide/how-to-contribute/) helps explain why, what, and how to get involved.** + + +## Resources * [Heat Tutorials](https://heat.readthedocs.io/en/latest/tutorials.html) * [Heat API Reference](https://heat.readthedocs.io/en/latest/autoapi/index.html) ### Parallel Computing and MPI: -* @davidhenty's [course](https://www.archer2.ac.uk/training/courses/200514-mpi/) +* David Henty's [course](https://www.archer2.ac.uk/training/courses/200514-mpi/) * Wes Kendall's [Tutorials](https://mpitutorial.com/tutorials/) +* Rolf Rabenseifner's [MPI course material](https://www.hlrs.de/training/self-study-materials/mpi-course-material) (including C, Fortran **and** Python via `mpi4py`) ### mpi4py * [mpi4py docs](https://mpi4py.readthedocs.io/en/stable/tutorial.html) * [Tutorial](https://www.kth.se/blogs/pdc/2019/08/parallel-programming-in-python-mpi4py-part-1/) - -# Contribution guidelines - -**We welcome contributions from the community, if you want to contribute to Heat, be sure to review the [Contribution Guidelines](contributing.md) before getting started!** - -We use [GitHub issues](https://github.com/helmholtz-analytics/heat/issues) for tracking requests and bugs, please see [Discussions](https://github.com/helmholtz-analytics/heat/discussions) for general questions and discussion, and You can also get in touch with us on [Mattermost](https://mattermost.hzdr.de/signup_user_complete/?id=3sixwk9okpbzpjyfrhen5jpqfo). You can sign up with your GitHub credentials. Once you log in, you can introduce yourself on the `Town Square` channel. - -Small improvements or fixes are always appreciated; issues labeled as **"good first issue"** may be a good starting point. - -If you’re unsure where to start or how your skills fit in, reach out! You can ask us here on GitHub, by leaving a comment on a relevant issue that is already open. 
- -**If you are new to contributing to open source, [this guide](https://opensource.guide/how-to-contribute/) helps explain why, what, and how to get involved.** - # License Heat is distributed under the MIT license, see our @@ -166,7 +180,9 @@ Heat is distributed under the MIT license, see our # Citing Heat -If you find Heat helpful for your research, please mention it in your publications. You can cite: + + +Please do mention Heat in your publications if it helped your research. You can cite: * Götz, M., Debus, C., Coquelin, D., Krajsek, K., Comito, C., Knechtges, P., Hagemeier, B., Tarnawa, M., Hanselmann, S., Siggel, S., Basermann, A. & Streit, A. (2020). HeAT - a Distributed and GPU-accelerated Tensor Framework for Data Analytics. In 2020 IEEE International Conference on Big Data (Big Data) (pp. 276-287). IEEE, DOI: 10.1109/BigData50022.2020.9378050. @@ -195,6 +211,13 @@ If you find Heat helpful for your research, please mention it in your publicatio doi={10.1109/BigData50022.2020.9378050} } ``` +# FAQ +Work in progress... + + ## Acknowledgements @@ -202,8 +225,11 @@ If you find Heat helpful for your research, please mention it in your publicatio Networking Fund](https://www.helmholtz.de/en/about_us/the_association/initiating_and_networking/) under project number ZT-I-0003 and the Helmholtz AI platform grant.* +*This project has received funding from Google Summer of Code (GSoC) in 2022.* + + ---
- +
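The `heat_test.py` output quoted in the README changes above shows a `split=0` array of ten elements divided between two ranks as `[0, 1, 2, 3, 4]` and `[5, 6, 7, 8, 9]`. The following is a minimal, hypothetical sketch in plain Python (no Heat or MPI needed) of how such a contiguous block partition along the split axis can be computed. The function name `block_bounds` is illustrative only, and the rule that the first `n % size` ranks each receive one extra element is an assumption, not necessarily Heat's exact chunking logic.

```python
def block_bounds(n, size, rank):
    """Return the half-open [start, stop) range owned by `rank` when `n`
    elements are block-distributed over `size` processes.

    Assumption (illustrative only): the first `n % size` ranks each get
    one extra element, so local sizes differ by at most one.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("require size > 0 and 0 <= rank < size")
    base, remainder = divmod(n, size)
    # Every earlier rank contributes `base` elements, plus one extra for
    # each earlier rank that is below the remainder.
    start = rank * base + min(rank, remainder)
    stop = start + base + (1 if rank < remainder else 0)
    return start, stop


if __name__ == "__main__":
    data = list(range(10))
    # Simulate two ranks, mirroring `mpirun -n 2 python heat_test.py`.
    for rank in range(2):
        start, stop = block_bounds(len(data), 2, rank)
        print(f"rank {rank}: {data[start:stop]}")
```

With `size=3`, the same assumed rule gives ranks 0, 1, and 2 the ranges `[0, 4)`, `[4, 7)`, and `[7, 10)`, so no rank holds more than one element more than any other.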
diff --git a/coverage_tables.md b/coverage_tables.md new file mode 100644 index 0000000000..f90dadfba4 --- /dev/null +++ b/coverage_tables.md @@ -0,0 +1,407 @@ +# NumPy Coverage Tables +This file is automatically generated by `./scripts/numpy_coverage_tables.py`. +Please do not edit this file directly, but instead edit `./scripts/numpy_coverage_tables.py` and run it to generate this file. +The following tables show the NumPy functions supported by Heat. +## Table of Contents +1. [NumPy Mathematical Functions](#numpy-mathematical-functions) +2. [NumPy Array Creation](#numpy-array-creation) +3. [NumPy Array Manipulation](#numpy-array-manipulation) +4. [NumPy Binary Operations](#numpy-binary-operations) +5. [NumPy IO Operations](#numpy-io-operations) +6. [NumPy LinAlg Operations](#numpy-linalg-operations) +7. [NumPy Logic Functions](#numpy-logic-functions) +8. [NumPy Sorting Operations](#numpy-sorting-operations) +9. [NumPy Statistical Operations](#numpy-statistical-operations) +10. [NumPy Random Operations](#numpy-random-operations) + +## NumPy Mathematical Functions +[Back to Table of Contents](#table-of-contents) + +| NumPy Mathematical Functions | Heat | +|---|---| +| sin | ✅ | +| cos | ✅ | +| tan | ✅ | +| arcsin | ✅ | +| arccos | ✅ | +| arctan | ✅ | +| hypot | ✅ | +| arctan2 | ✅ | +| degrees | ✅ | +| radians | ✅ | +| unwrap | ❌ | +| deg2rad | ✅ | +| rad2deg | ✅ | +| sinh | ✅ | +| cosh | ✅ | +| tanh | ✅ | +| arcsinh | ✅ | +| arccosh | ✅ | +| arctanh | ✅ | +| round | ✅ | +| around | ❌ | +| rint | ❌ | +| fix | ❌ | +| floor | ✅ | +| ceil | ✅ | +| trunc | ✅ | +| prod | ✅ | +| sum | ✅ | +| nanprod | ✅ | +| nansum | ✅ | +| cumprod | ✅ | +| cumsum | ✅ | +| nancumprod | ❌ | +| nancumsum | ❌ | +| diff | ✅ | +| ediff1d | ❌ | +| gradient | ❌ | +| cross | ✅ | +| trapz | ❌ | +| exp | ✅ | +| expm1 | ✅ | +| exp2 | ✅ | +| log | ✅ | +| log10 | ✅ | +| log2 | ✅ | +| log1p | ✅ | +| logaddexp | ✅ | +| logaddexp2 | ✅ | +| i0 | ❌ | +| sinc | ❌ | +| signbit | ✅ | +| copysign | ✅ | +|
frexp | ❌ | +| ldexp | ❌ | +| nextafter | ❌ | +| spacing | ❌ | +| lcm | ✅ | +| gcd | ✅ | +| add | ✅ | +| reciprocal | ❌ | +| positive | ✅ | +| negative | ✅ | +| multiply | ✅ | +| divide | ✅ | +| power | ✅ | +| subtract | ✅ | +| true_divide | ❌ | +| floor_divide | ✅ | +| float_power | ❌ | +| fmod | ✅ | +| mod | ✅ | +| modf | ✅ | +| remainder | ✅ | +| divmod | ❌ | +| angle | ✅ | +| real | ✅ | +| imag | ✅ | +| conj | ✅ | +| conjugate | ✅ | +| maximum | ✅ | +| max | ✅ | +| amax | ❌ | +| fmax | ❌ | +| nanmax | ❌ | +| minimum | ✅ | +| min | ✅ | +| amin | ❌ | +| fmin | ❌ | +| nanmin | ❌ | +| convolve | ✅ | +| clip | ✅ | +| sqrt | ✅ | +| cbrt | ❌ | +| square | ✅ | +| absolute | ✅ | +| fabs | ✅ | +| sign | ✅ | +| heaviside | ❌ | +| nan_to_num | ✅ | +| real_if_close | ❌ | +| interp | ❌ | +## NumPy Array Creation +[Back to Table of Contents](#table-of-contents) + +| NumPy Array Creation | Heat | +|---|---| +| empty | ✅ | +| empty_like | ✅ | +| eye | ✅ | +| identity | ❌ | +| ones | ✅ | +| ones_like | ✅ | +| zeros | ✅ | +| zeros_like | ✅ | +| full | ✅ | +| full_like | ✅ | +| array | ✅ | +| asarray | ✅ | +| asanyarray | ❌ | +| ascontiguousarray | ❌ | +| asmatrix | ❌ | +| copy | ✅ | +| frombuffer | ❌ | +| from_dlpack | ❌ | +| fromfile | ❌ | +| fromfunction | ❌ | +| fromiter | ❌ | +| fromstring | ❌ | +| loadtxt | ❌ | +| arange | ✅ | +| linspace | ✅ | +| logspace | ✅ | +| geomspace | ❌ | +| meshgrid | ✅ | +| mgrid | ❌ | +| ogrid | ❌ | +| diag | ✅ | +| diagflat | ❌ | +| tri | ❌ | +| tril | ✅ | +| triu | ✅ | +| vander | ❌ | +| mat | ❌ | +| bmat | ❌ | +## NumPy Array Manipulation +[Back to Table of Contents](#table-of-contents) + +| NumPy Array Manipulation | Heat | +|---|---| +| copyto | ❌ | +| shape | ✅ | +| reshape | ✅ | +| ravel | ✅ | +| flat | ❌ | +| flatten | ✅ | +| moveaxis | ✅ | +| rollaxis | ❌ | +| swapaxes | ✅ | +| T | ❌ | +| transpose | ✅ | +| atleast_1d | ❌ | +| atleast_2d | ❌ | +| atleast_3d | ❌ | +| broadcast | ❌ | +| broadcast_to | ✅ | +| broadcast_arrays | ✅ | +| 
expand_dims | ✅ | +| squeeze | ✅ | +| asarray | ✅ | +| asanyarray | ❌ | +| asmatrix | ❌ | +| asfarray | ❌ | +| asfortranarray | ❌ | +| ascontiguousarray | ❌ | +| asarray_chkfinite | ❌ | +| require | ❌ | +| concatenate | ✅ | +| stack | ✅ | +| block | ❌ | +| vstack | ✅ | +| hstack | ✅ | +| dstack | ❌ | +| column_stack | ✅ | +| row_stack | ✅ | +| split | ✅ | +| array_split | ❌ | +| dsplit | ✅ | +| hsplit | ✅ | +| vsplit | ✅ | +| tile | ✅ | +| repeat | ✅ | +| delete | ❌ | +| insert | ❌ | +| append | ❌ | +| resize | ❌ | +| trim_zeros | ❌ | +| unique | ✅ | +| flip | ✅ | +| fliplr | ✅ | +| flipud | ✅ | +| reshape | ✅ | +| roll | ✅ | +| rot90 | ✅ | +## NumPy Binary Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy Binary Operations | Heat | +|---|---| +| bitwise_and | ✅ | +| bitwise_or | ✅ | +| bitwise_xor | ✅ | +| invert | ✅ | +| left_shift | ✅ | +| right_shift | ✅ | +| packbits | ❌ | +| unpackbits | ❌ | +| binary_repr | ❌ | +## NumPy IO Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy IO Operations | Heat | +|---|---| +| load | ✅ | +| save | ✅ | +| savez | ❌ | +| savez_compressed | ❌ | +| loadtxt | ❌ | +| savetxt | ❌ | +| genfromtxt | ❌ | +| fromregex | ❌ | +| fromstring | ❌ | +| tofile | ❌ | +| tolist | ❌ | +| array2string | ❌ | +| array_repr | ❌ | +| array_str | ❌ | +| format_float_positional | ❌ | +| format_float_scientific | ❌ | +| memmap | ❌ | +| open_memmap | ❌ | +| set_printoptions | ✅ | +| get_printoptions | ✅ | +| set_string_function | ❌ | +| printoptions | ❌ | +| binary_repr | ❌ | +| base_repr | ❌ | +| DataSource | ❌ | +| format | ❌ | +## NumPy LinAlg Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy LinAlg Operations | Heat | +|---|---| +| dot | ✅ | +| linalg.multi_dot | ❌ | +| vdot | ✅ | +| inner | ❌ | +| outer | ✅ | +| matmul | ✅ | +| tensordot | ❌ | +| einsum | ❌ | +| einsum_path | ❌ | +| linalg.matrix_power | ❌ | +| kron | ❌ | +| linalg.cholesky | ❌ | +| linalg.qr | ✅ | +| linalg.svd | ❌ | +| 
linalg.eig | ❌ | +| linalg.eigh | ❌ | +| linalg.eigvals | ❌ | +| linalg.eigvalsh | ❌ | +| linalg.norm | ✅ | +| linalg.cond | ❌ | +| linalg.det | ✅ | +| linalg.matrix_rank | ❌ | +| linalg.slogdet | ❌ | +| trace | ✅ | +| linalg.solve | ❌ | +| linalg.tensorsolve | ❌ | +| linalg.lstsq | ❌ | +| linalg.inv | ✅ | +| linalg.pinv | ❌ | +| linalg.tensorinv | ❌ | +## NumPy Logic Functions +[Back to Table of Contents](#table-of-contents) + +| NumPy Logic Functions | Heat | +|---|---| +| all | ✅ | +| any | ✅ | +| isfinite | ✅ | +| isinf | ✅ | +| isnan | ✅ | +| isnat | ❌ | +| isneginf | ✅ | +| isposinf | ✅ | +| iscomplex | ✅ | +| iscomplexobj | ❌ | +| isfortran | ❌ | +| isreal | ✅ | +| isrealobj | ❌ | +| isscalar | ❌ | +| logical_and | ✅ | +| logical_or | ✅ | +| logical_not | ✅ | +| logical_xor | ✅ | +| allclose | ✅ | +| isclose | ✅ | +| array_equal | ❌ | +| array_equiv | ❌ | +| greater | ✅ | +| greater_equal | ✅ | +| less | ✅ | +| less_equal | ✅ | +| equal | ✅ | +| not_equal | ✅ | +## NumPy Sorting Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy Sorting Operations | Heat | +|---|---| +| sort | ✅ | +| lexsort | ❌ | +| argsort | ❌ | +| sort | ✅ | +| sort_complex | ❌ | +| partition | ❌ | +| argpartition | ❌ | +| argmax | ✅ | +| nanargmax | ❌ | +| argmin | ✅ | +| nanargmin | ❌ | +| argwhere | ❌ | +| nonzero | ✅ | +| flatnonzero | ❌ | +| where | ✅ | +| searchsorted | ❌ | +| extract | ❌ | +| count_nonzero | ❌ | +## NumPy Statistical Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy Statistical Operations | Heat | +|---|---| +| ptp | ❌ | +| percentile | ✅ | +| nanpercentile | ❌ | +| quantile | ❌ | +| nanquantile | ❌ | +| median | ✅ | +| average | ✅ | +| mean | ✅ | +| std | ✅ | +| var | ✅ | +| nanmedian | ❌ | +| nanmean | ❌ | +| nanstd | ❌ | +| nanvar | ❌ | +| corrcoef | ❌ | +| correlate | ❌ | +| cov | ✅ | +| histogram | ✅ | +| histogram2d | ❌ | +| histogramdd | ❌ | +| bincount | ✅ | +| histogram_bin_edges | ❌ | +| digitize | ✅ | +## NumPy 
Random Operations +[Back to Table of Contents](#table-of-contents) + +| NumPy Random Operations | Heat | +|---|---| +| random.rand | ✅ | +| random.randn | ✅ | +| random.randint | ✅ | +| random.random_integers | ❌ | +| random.random_sample | ✅ | +| random.ranf | ✅ | +| random.sample | ✅ | +| random.choice | ❌ | +| random.bytes | ❌ | +| random.shuffle | ❌ | +| random.permutation | ✅ | +| random.seed | ✅ | +| random.get_state | ✅ | +| random.set_state | ✅ | diff --git a/doc/images/fzj_logo.svg b/doc/images/fzj_logo.svg index 53868ecb83..3b765373b7 100644 --- a/doc/images/fzj_logo.svg +++ b/doc/images/fzj_logo.svg @@ -1,86 +1,14 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + +image/svg+xml + + + + +Logo_FZ_Juellich_RGB_schutzzone_weiss + + diff --git a/docker/README.md b/docker/README.md index 9202c974c5..aedeee419d 100644 --- a/docker/README.md +++ b/docker/README.md @@ -3,12 +3,12 @@ There is some flexibility to building the Docker images of Heat. Firstly, one can build from the released version taken from PyPI. This will either be -the latest release or the version set through the `--build-arg=HEAT_VERSION=1.2.0` +the latest release or the version set through the `--build-arg=HEAT_VERSION=X.Y.Z` argument. Secondly one can build a docker image from the GitHub sources, selected through `--build-arg=INSTALL_TYPE=source`. The default branch to be built is main, other -branches can be specified using `--build-arg=HEAT_BRANCH=branchname`. +branches can be specified using `--build-arg=HEAT_BRANCH=`. ## General build @@ -18,13 +18,15 @@ The [Dockerfile](./Dockerfile) guiding the build of the Docker image is located directory. It is typically most convenient to `cd` over here and run the Docker build as: ```console -$ docker build --build-args HEAT_VERSION=1.2.2 --PYTORCH_IMG=22.05-py3 -t heat:local . 
+$ docker build --build-arg HEAT_VERSION=X.Y.Z --build-arg PYTORCH_IMG= -t heat . ``` +The Heat image is based on the NVIDIA PyTorch container. You can find existing tags in the [NVIDIA container catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags). + We also offer prebuilt images in our [Package registry](https://github.com/helmholtz-analytics/heat/pkgs/container/heat) from which you can pull existing images: ```console -$ docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8 +$ docker pull ghcr.io/helmholtz-analytics/heat: ``` ### Building for HPC @@ -37,24 +39,24 @@ image also for HPC systems, such as the ones available at [Jülich Supercomputin To use one of the existing images from our registry: - $ apptainer build heat.sif docker://ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8 + $ apptainer build heat.sif docker://ghcr.io/helmholtz-analytics/heat: Building the image can require root access in some systems. If that is the case, we recommend building the image on a local machine, and then upload it to the desired HPC system. If you see an error indicating that there is not enough space, use the --tmpdir flag of the build command.
[Apptainer docs](https://apptainer.org/docs/user/latest/build_a_container.html) -#### SIB (Singularity Image Builder) +#### SIB (Singularity Image Builder) for Apptainer images A simple `Dockerfile` (in addition to the one above) to be used with SIB could look like this: - FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.12_cuda11.7_py3.8 + FROM ghcr.io/helmholtz-analytics/heat: The invocation to build the image would be: - $ sib upload ./Dockerfile heat_1.2.0_torch1.12_cuda11.7_py3.8 - $ sib build --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8 - $ sib download --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8 + $ sib upload ./Dockerfile heat + $ sib build --recipe-name heat + $ sib download --recipe-name heat However, SIB is capable of using just about any available Docker image from any registry, such that a specific Singularity image can be built by simply referencing the @@ -62,7 +64,7 @@ available image. SIB is thus used as a conversion tool. ## Running on HPC - $ singularity run --nv heat_1.2.0_torch.11_cuda11.5_py3.9.sif /bin/bash + $ apptainer run --nv heat /bin/bash $ python Python 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0] :: Anaconda, Inc. on linux @@ -70,12 +72,12 @@ available image. SIB is thus used as a conversion tool. >>> import heat as ht ... -The `--nv` argument to `singularity`enables NVidia GPU support, which is desired for +The `--nv` argument to `apptainer` enables NVIDIA GPU support, which is desired for Heat. ### Multi-node example -The following file can be used as an example to use the singularity file together with SLURM, which allows heat to work in a multi-node environment. +The following file is an example of how to use the Apptainer image together with SLURM, which allows Heat to run in a multi-node environment. ```bash #!/bin/bash @@ -85,5 +87,5 @@ The following file can be used as an example to use the singularity file togethe ...
-srun --mpi="pmi2" singularity exec --nv heat_1.2.0_torch.11_cuda11.5_py3.9.sif bash -c "cd ~/code/heat/examples/lasso; python demo.py" +srun --mpi="pmi2" apptainer exec --nv heat_1.2.0_torch.11_cuda11.5_py3.9.sif bash -c "cd ~/code/heat/examples/lasso; python demo.py" ``` diff --git a/quick_start.md b/quick_start.md index f021393d96..be595da8fd 100644 --- a/quick_start.md +++ b/quick_start.md @@ -6,16 +6,12 @@ No-frills instructions for [new users](#new-users-condaconda-pippip-hpchpc-docke ### `conda` -A Heat conda build is [in progress](https://github.com/helmholtz-analytics/heat/issues/1050). -The script [heat_env.yml](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_env.yml): +The Heat conda build includes all dependencies including OpenMPI. -- creates a virtual environment `heat_env` -- installs all dependencies including OpenMPI using [conda](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) -- installs Heat via `pip` - -``` -conda env create -f heat_env.yml +```shell +conda create --name heat_env conda activate heat_env +conda install -c conda-forge heat ``` [Test](#test) your installation. @@ -34,12 +30,15 @@ pip install heat[hdf5,netcdf] [Test](#test) your installation. +### HPC +Work in progress. + ### Docker Get the docker image from our package repository ``` -docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8 +docker pull ghcr.io/helmholtz-analytics/heat: ``` or build it from our Dockerfile @@ -47,9 +46,11 @@ or build it from our Dockerfile ``` git clone https://github.com/helmholtz-analytics/heat.git cd heat/docker -docker build -t heat:latest . +docker build --build-arg HEAT_VERSION=X.Y.Z --build-arg PYTORCH_IMG= -t heat:latest . ``` +`` should be replaced with an existing version of the official NVIDIA PyTorch container image.
Information and existing tags can be found [here](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). + See [our docker README](https://github.com/helmholtz-analytics/heat/tree/main/docker/README.md) for other details. ### Test @@ -77,7 +78,7 @@ Local torch tensor on rank 1 : tensor([5, 6, 7, 8, 9], dtype=torch.int32) 3. [Fork](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) or, if you have write access, clone the [Heat repository](https://github.com/helmholtz-analytics/heat). -4. Create a virtual environment `heat_dev` with all dependencies via [heat_dev.yml](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_dev.yml). Note that `heat_dev.yml` does not install Heat via `pip` (as opposed to [`heat_env.yml`](#conda) for users). +4. Create a virtual environment `heat_dev` with all dependencies via [heat_dev.yml](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_dev.yml). Note that `heat_dev.yml` does not install Heat.
``` conda env create -f heat_dev.yml diff --git a/scripts/heat_dev.yml b/scripts/heat_dev.yml index 3de812e489..1ca994771f 100644 --- a/scripts/heat_dev.yml +++ b/scripts/heat_dev.yml @@ -3,12 +3,12 @@ channels: - conda-forge - defaults dependencies: - - python=3.9 + - python=3.10 - openmpi - mpi4py - h5py[version='>=2.9',build=mpi*] - netcdf4 - - pytorch=1.13.0 + - pytorch - torchvision - scipy - pre-commit diff --git a/scripts/heat_env.yml b/scripts/heat_env.yml index 9d9130c22f..1d5e1b6dcd 100644 --- a/scripts/heat_env.yml +++ b/scripts/heat_env.yml @@ -3,14 +3,5 @@ channels: - conda-forge - defaults dependencies: - - python=3.9 - - openmpi - - mpi4py - - h5py[version='>=2.9',build=mpi*] - - netcdf4 - - pytorch=1.13.0 - - torchvision - - scipy - - pip - - pip: - - heat + - python=3.10 + - heat diff --git a/scripts/numpy_coverage_tables.py b/scripts/numpy_coverage_tables.py new file mode 100644 index 0000000000..1d4c8cff6a --- /dev/null +++ b/scripts/numpy_coverage_tables.py @@ -0,0 +1,583 @@ +import heat + +numpy_functions = [] + +# List of numpy functions +headers = {"0": "NumPy Mathematical Functions"} +numpy_mathematical_functions = [ + "sin", + "cos", + "tan", + "arcsin", + "arccos", + "arctan", + "hypot", + "arctan2", + "degrees", + "radians", + "unwrap", + "deg2rad", + "rad2deg", + "sinh", + "cosh", + "tanh", + "arcsinh", + "arccosh", + "arctanh", + "round", + "around", + "rint", + "fix", + "floor", + "ceil", + "trunc", + "prod", + "sum", + "nanprod", + "nansum", + "cumprod", + "cumsum", + "nancumprod", + "nancumsum", + "diff", + "ediff1d", + "gradient", + "cross", + "trapz", + "exp", + "expm1", + "exp2", + "log", + "log10", + "log2", + "log1p", + "logaddexp", + "logaddexp2", + "i0", + "sinc", + "signbit", + "copysign", + "frexp", + "ldexp", + "nextafter", + "spacing", + "lcm", + "gcd", + "add", + "reciprocal", + "positive", + "negative", + "multiply", + "divide", + "power", + "subtract", + "true_divide", + "floor_divide", + "float_power", + "fmod", + 
"mod", + "modf", + "remainder", + "divmod", + "angle", + "real", + "imag", + "conj", + "conjugate", + "maximum", + "max", + "amax", + "fmax", + "nanmax", + "minimum", + "min", + "amin", + "fmin", + "nanmin", + "convolve", + "clip", + "sqrt", + "cbrt", + "square", + "absolute", + "fabs", + "sign", + "heaviside", + "nan_to_num", + "real_if_close", + "interp", +] +numpy_functions.append(numpy_mathematical_functions) + +numpy_array_creation = [ + "empty", + "empty_like", + "eye", + "identity", + "ones", + "ones_like", + "zeros", + "zeros_like", + "full", + "full_like", + "array", + "asarray", + "asanyarray", + "ascontiguousarray", + "asmatrix", + "copy", + "frombuffer", + "from_dlpack", + "fromfile", + "fromfunction", + "fromiter", + "fromstring", + "loadtxt", + "arange", + "linspace", + "logspace", + "geomspace", + "meshgrid", + "mgrid", + "ogrid", + "diag", + "diagflat", + "tri", + "tril", + "triu", + "vander", + "mat", + "bmat", +] +numpy_functions.append(numpy_array_creation) +headers[str(len(headers))] = "NumPy Array Creation" + +numpy_array_manipulation = [ + "copyto", + "shape", + "reshape", + "ravel", + "flat", + "flatten", + "moveaxis", + "rollaxis", + "swapaxes", + "T", + "transpose", + "atleast_1d", + "atleast_2d", + "atleast_3d", + "broadcast", + "broadcast_to", + "broadcast_arrays", + "expand_dims", + "squeeze", + "asarray", + "asanyarray", + "asmatrix", + "asfarray", + "asfortranarray", + "ascontiguousarray", + "asarray_chkfinite", + "require", + "concatenate", + "stack", + "block", + "vstack", + "hstack", + "dstack", + "column_stack", + "row_stack", + "split", + "array_split", + "dsplit", + "hsplit", + "vsplit", + "tile", + "repeat", + "delete", + "insert", + "append", + "resize", + "trim_zeros", + "unique", + "flip", + "fliplr", + "flipud", + "reshape", + "roll", + "rot90", +] +numpy_functions.append(numpy_array_manipulation) +headers[str(len(headers))] = "NumPy Array Manipulation" + +numpy_binary_operations = [ + "bitwise_and", + "bitwise_or", + 
"bitwise_xor", + "invert", + "left_shift", + "right_shift", + "packbits", + "unpackbits", + "binary_repr", +] +numpy_functions.append(numpy_binary_operations) +headers[str(len(headers))] = "NumPy Binary Operations" + +numpy_io_operations = [ + # numpy.load + # numpy.save + # numpy.savez_compressed + # numpy.loadtxt + # numpy.savez + # numpy.savetxt + # numpy.genfromtxt + # numpy.fromregex + # numpy.fromstring + # numpy.ndarray.tofile + # numpy.ndarray.tolist + # numpy.array2string + # numpy.array_repr + # numpy.array_str + # numpy.format_float_positional + # numpy.format_float_scientific + # numpy.memmap + # numpy.lib.format.open_memmap + # numpy.set_printoptions + # numpy.get_printoptions + # numpy.set_string_function + # numpy.printoptions + # numpy.binary_repr + # numpy.base_repr + # numpy.DataSource + # numpy.lib.format + "load", + "save", + "savez", + "savez_compressed", + "loadtxt", + "savetxt", + "genfromtxt", + "fromregex", + "fromstring", + "tofile", + "tolist", + "array2string", + "array_repr", + "array_str", + "format_float_positional", + "format_float_scientific", + "memmap", + "open_memmap", + "set_printoptions", + "get_printoptions", + "set_string_function", + "printoptions", + "binary_repr", + "base_repr", + "DataSource", + "format", +] +numpy_functions.append(numpy_io_operations) +headers[str(len(headers))] = "NumPy IO Operations" + +numpy_linalg_operations = [ + # numpy.dot + # numpy.linalg.multi_dot + # numpy.vdot + # numpy.inner + # numpy.outer + # numpy.matmul + # numpy.tensordot + # numpy.einsum + # numpy.einsum_path + # numpy.linalg.matrix_power + # numpy.kron + # numpy.linalg.cholesky + # numpy.linalg.qr + # numpy.linalg.svd + # numpy.linalg.eig + # numpy.linalg.eigh + # numpy.linalg.eigvals + # numpy.linalg.eigvalsh + # numpy.linalg.norm + # numpy.linalg.cond + # numpy.linalg.det + # numpy.linalg.matrix_rank + # numpy.linalg.slogdet + # numpy.trace + # numpy.linalg.solve + # numpy.linalg.tensorsolve + # numpy.linalg.lstsq + # 
numpy.linalg.inv + # numpy.linalg.pinv + # numpy.linalg.tensorinv + "dot", + "linalg.multi_dot", + "vdot", + "inner", + "outer", + "matmul", + "tensordot", + "einsum", + "einsum_path", + "linalg.matrix_power", + "kron", + "linalg.cholesky", + "linalg.qr", + "linalg.svd", + "linalg.eig", + "linalg.eigh", + "linalg.eigvals", + "linalg.eigvalsh", + "linalg.norm", + "linalg.cond", + "linalg.det", + "linalg.matrix_rank", + "linalg.slogdet", + "trace", + "linalg.solve", + "linalg.tensorsolve", + "linalg.lstsq", + "linalg.inv", + "linalg.pinv", + "linalg.tensorinv", +] +numpy_functions.append(numpy_linalg_operations) +headers[str(len(headers))] = "NumPy LinAlg Operations" + +numpy_logic_operations = [ + # numpy.all + # numpy.any + # numpy.isinf + # numpy.isfinite + # numpy.isnan + # numpy.isnat + # numpy.isneginf + # numpy.isposinf + # numpy.iscomplex + # numpy.iscomplexobj + # numpy.isfortran + # numpy.isreal + # numpy.isrealobj + # numpy.isscalar + # numpy.logical_and + # numpy.logical_or + # numpy.logical_not + # numpy.logical_xor + # numpy.allclose + # numpy.isclose + # numpy.array_equal + # numpy.array_equiv + # numpy.greater + # numpy.greater_equal + # numpy.less + # numpy.less_equal + # numpy.equal + # numpy.not_equal + "all", + "any", + "isfinite", + "isinf", + "isnan", + "isnat", + "isneginf", + "isposinf", + "iscomplex", + "iscomplexobj", + "isfortran", + "isreal", + "isrealobj", + "isscalar", + "logical_and", + "logical_or", + "logical_not", + "logical_xor", + "allclose", + "isclose", + "array_equal", + "array_equiv", + "greater", + "greater_equal", + "less", + "less_equal", + "equal", + "not_equal", +] +numpy_functions.append(numpy_logic_operations) +headers[str(len(headers))] = "NumPy Logic Functions" + +numpy_sorting_operations = [ + # numpy.sort + # numpy.lexsort + # numpy.argsort + # numpy.ndarray.sort + # numpy.sort_complex + # numpy.partition + # numpy.argpartition + # numpy.argmax + # numpy.nanargmax + # numpy.argmin + # numpy.nanargmin + # 
numpy.argwhere + # numpy.nonzero + # numpy.flatnonzero + # numpy.where + # numpy.searchsorted + # numpy.extract + # numpy.count_nonzero + "sort", + "lexsort", + "argsort", + "sort", + "sort_complex", + "partition", + "argpartition", + "argmax", + "nanargmax", + "argmin", + "nanargmin", + "argwhere", + "nonzero", + "flatnonzero", + "where", + "searchsorted", + "extract", + "count_nonzero", +] +numpy_functions.append(numpy_sorting_operations) +headers[str(len(headers))] = "NumPy Sorting Operations" + +numpy_statistics_operations = [ + # numpy.ptp + # numpy.percentile + # numpy.nanpercentile + # numpy.quantile + # numpy.nanquantile + # numpy.median + # numpy.average + # numpy.mean + # numpy.std + # numpy.var + # numpy.nanmedian + # numpy.nanmean + # numpy.nanstd + # numpy.nanvar + # numpy.corrcoef + # numpy.correlate + # numpy.cov + # numpy.histogram + # numpy.histogram2d + # numpy.histogramdd + # numpy.bincount + # numpy.histogram_bin_edges + # numpy.digitize + "ptp", + "percentile", + "nanpercentile", + "quantile", + "nanquantile", + "median", + "average", + "mean", + "std", + "var", + "nanmedian", + "nanmean", + "nanstd", + "nanvar", + "corrcoef", + "correlate", + "cov", + "histogram", + "histogram2d", + "histogramdd", + "bincount", + "histogram_bin_edges", + "digitize", +] +numpy_functions.append(numpy_statistics_operations) +headers[str(len(headers))] = "NumPy Statistical Operations" + +# numpy random operations +numpy_random_operations = [ + # numpy.random.rand + # numpy.random.randn + # numpy.random.randint + # numpy.random.random_integers + # numpy.random.random_sample + # numpy.random.ranf + # numpy.random.sample + # numpy.random.choice + # numpy.random.bytes + # numpy.random.shuffle + # numpy.random.permutation + # numpy.random.seed + # numpy.random.get_state + # numpy.random.set_state + "random.rand", + "random.randn", + "random.randint", + "random.random_integers", + "random.random_sample", + "random.ranf", + "random.sample", + "random.choice", + 
"random.bytes", + "random.shuffle", + "random.permutation", + "random.seed", + "random.get_state", + "random.set_state", +] +numpy_functions.append(numpy_random_operations) +headers[str(len(headers))] = "NumPy Random Operations" + +# initialize markdown file +# open the file in write mode (UTF-8, since the tables contain ✅/❌ characters) +f = open("coverage_tables.md", "w", encoding="utf-8") +# write in file +f.write("# NumPy Coverage Tables\n") +f.write("This file is automatically generated by `./scripts/numpy_coverage_tables.py`.\n") +f.write( + "Please do not edit this file directly, but instead edit `./scripts/numpy_coverage_tables.py` and run it to generate this file.\n" +) +f.write("The following tables show the NumPy functions supported by Heat.\n") + +# create Table of Contents +f.write("## Table of Contents\n") +for i, header in enumerate(headers): + f.write(f"{i+1}. [{headers[header]}](#{headers[header].lower().replace(' ', '-')})\n") +f.write("\n") + +for i, function_list in enumerate(numpy_functions): + f.write(f"## {headers[str(i)]}\n") + # Initialize a list to store the rows of the Markdown table + table_rows = [] + + # Check if functions exist in the heat library and create table rows + for func_name in function_list: + if ( + hasattr(heat, func_name) + or hasattr(heat.linalg, func_name.replace("linalg.", "")) + or hasattr(heat.random, func_name.replace("random.", "")) + ): + support_status = "✅" # Green checkmark for supported functions + else: + support_status = "❌" # Red cross for unsupported functions + + table_row = f"| {func_name} | {support_status} |" + table_rows.append(table_row) + + # Create the Markdown table header + table_header = f"| {headers[str(i)]} | Heat |\n|---|---|\n" + + # Combine the header and table rows + markdown_table = table_header + "\n".join(table_rows) + + # write link to table of contents + f.write("[Back to Table of Contents](#table-of-contents)\n\n") + # Write the Markdown table + f.write(markdown_table) + f.write("\n")
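The support check in `numpy_coverage_tables.py` tests each name against `heat`, `heat.linalg` and `heat.random` in turn, so an unprefixed name is also reported as supported if it exists only in `heat.linalg` or `heat.random`. Below is a minimal sketch of a stricter alternative, assuming that dotted names such as `linalg.norm` should resolve attribute by attribute starting from the top-level module. The helpers `is_supported` and `make_table` are hypothetical and not part of the PR, and the `fake_heat` namespace built from the standard-library `math` module stands in for `heat` so the sketch runs without Heat installed.

```python
import math
from types import SimpleNamespace

# Sentinel so that attributes whose value is None still count as present.
_MISSING = object()


def is_supported(module, dotted_name):
    """Return True if `dotted_name` (e.g. "linalg.norm") resolves on `module`.

    Each dot-separated part is looked up with getattr on the result of the
    previous lookup, so "linalg.norm" is only found via module.linalg.norm
    and a bare "norm" is only found via module.norm.
    """
    obj = module
    for part in dotted_name.split("."):
        obj = getattr(obj, part, _MISSING)
        if obj is _MISSING:
            return False
    return True


def make_table(title, module, names):
    """Build a two-column Markdown table marking each name as supported or not."""
    rows = [f"| {title} | Heat |", "|---|---|"]
    for name in names:
        status = "✅" if is_supported(module, name) else "❌"
        rows.append(f"| {name} | {status} |")
    return "\n".join(rows)


# Hypothetical stand-in for `heat`: `sin` at the top level, `norm` only under `linalg`.
fake_heat = SimpleNamespace(sin=math.sin, linalg=SimpleNamespace(norm=math.hypot))
table = make_table("Example", fake_heat, ["sin", "arcsin", "linalg.norm", "norm"])
```

With Heat installed, `is_supported(heat, func_name)` could replace the three-way `hasattr` condition; this is an untested suggestion. The trade-off is that a function Heat exposes only under a submodule would then be reported as unsupported unless the list uses its prefixed name.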