Added documentation #5

Merged
merged 31 commits into from
Apr 15, 2024
6 changes: 6 additions & 0 deletions .github/workflows/Documenter.yml
@@ -13,6 +13,12 @@ jobs:
      - uses: julia-actions/setup-julia@latest
        with:
          version: '1.9.4'
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.x' # Ensures that Python 3.x is used
      - name: Install Python dependencies
        run: python3 -m pip install matplotlib
      - name: Install dependencies
        run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
      - name: Build and deploy
1 change: 1 addition & 0 deletions .gitignore
@@ -14,3 +14,4 @@ movies/
utilites/
notes.md
Manifest.toml
!logo.png
2 changes: 0 additions & 2 deletions Project.toml
@@ -5,8 +5,6 @@ version = "2.0.2"

[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterTools = "35a29f4d-8980-5a13-9543-d66fff28ecb8"
DrWatson = "634d3b9d-ee7a-5ddf-bec9-22491ea816e1"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
155 changes: 22 additions & 133 deletions README.md
@@ -1,147 +1,36 @@
# Distributed Fourier Neural Operators
# ParametricDFNOs.jl

## Description
[![][license-img]][license-status]
[![Documenter](https://github.com/slimgroup/ParametericDFNOs.jl/actions/workflows/Documenter.yml/badge.svg)](https://github.com/slimgroup/ParametericDFNOs.jl/actions/workflows/Documenter.yml)
[![TagBot](https://github.com/slimgroup/ParametericDFNOs.jl/actions/workflows/TagBot.yml/badge.svg)](https://github.com/slimgroup/ParametericDFNOs.jl/actions/workflows/TagBot.yml)

The Fourier Neural Operator (FNO) is a neural network designed to approximate solutions to partial differential equations (PDEs), with applications such as two-phase flow (for example, CO2 plume evolution in carbon capture and storage (CCS)) and atmospheric fields. By transforming inputs to frequency space with spectral convolution operators and exploiting the efficiency of Fourier transforms, FNOs can be significantly faster than traditional simulation methods. This project extends FNOs to run in a distributed fashion so they can handle large-scale, realistic three-dimensional two-phase flow problems.
<!-- [![][zenodo-img]][zenodo-status] -->

We offer support for distributed 2D and 3D time varying problems.
## Getting Started
`ParametricDFNOs.jl` is a Julia scientific library for training Fourier Neural Operators on large-scale data, built on [`ParametricOperators.jl`](https://github.com/slimgroup/ParametricOperators.jl). We support distributed 2D and 3D time-varying problems.

### Dependencies
## Setup

- `Julia 1.8.5`
- MPI distribution.
```julia
julia> using Pkg
julia> Pkg.activate("path/to/your/environment")
julia> Pkg.add("ParametricDFNOs")
```

### Installing
This will add `ParametricDFNOs.jl` as a dependency to your project.

```
git clone https://github.com/turquoisedragon2926/dfno.git
cd dfno
julia --project=.
pkg> instantiate   # press ] at the julia> prompt to enter Pkg mode
julia> using MPI
julia> MPI.install_mpiexecjl()
```
## Documentation

NOTE: Add mpiexecjl to your PATH
Check out the [Documentation](https://slimgroup.github.io/ParametricDFNOs.jl) for more details, or get started by running some [examples](https://github.com/turquoisedragon2926/ParametricDFNOs.jl-Examples)!

### Executing the program on a custom dataset (2D time varying)
1. Open `examples/training/training_2d.jl`
## Issues

2. Update the data reading function.

* We have provided a wrapper for distributed reading if you do not have a data reading function set up. In order to use this, implement the following two functions:

```
function dist_read_x_tensor(file_name, key, indices)
    data = nothing
    h5open(file_name, "r") do file
        dataset = file[key]
        data = dataset[indices...]
    end
    return reshape(data, 1, (size(data)...))
end
```

```
function dist_read_y_tensor(file_name, key, indices)
    data = nothing
    h5open(file_name, "r") do file
        dataset = file[key]
        data = dataset[indices...]
    end
    return reshape(data, 1, (size(data)...))
end
```

* See `examples/perlmutter/data.jl` for an example of how to use this even when your samples are stored across multiple files.

3. Pass the required hyperparameters via `ModelConfig`:

* Line to modify:

```
modelConfig = DFNO_2D.ModelConfig(nblocks=4, partition=partition)
```

* Options:
```
struct ModelConfig
    nx::Int = 64
    ny::Int = 64
    nt::Int = 51
    nc_in::Int = 4
    nc_mid::Int = 128
    nc_lift::Int = 20
    nc_out::Int = 1
    mx::Int = 8
    my::Int = 8
    mt::Int = 4
    nblocks::Int = 4
    dtype::DataType = Float32
    partition::Vector{Int} = [1, 4]
end
```

4. Modify the training configuration parameters as required:

* Line to modify:
```
trainConfig = DFNO_2D.TrainConfig(
    epochs=200,
    x_train=x_train,
    y_train=y_train,
    x_valid=x_valid,
    y_valid=y_valid,
)
```

* Options:
```
struct TrainConfig
    nbatch::Int = 2
    epochs::Int = 1
    seed::Int = 1234
    plot_every::Int = 1
    learning_rate::Float32 = 1f-4
    x_train::Any
    y_train::Any
    x_valid::Any
    y_valid::Any
end
```

5. Execute the program with the required number of workers

```
mpiexecjl --project=./ -n 4 julia examples/training/training_2d.jl
```
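
The read wrappers in step 2 index into a dataset with `indices...` and then prepend a singleton dimension to the result. A minimal sketch of that indexing and reshape, using an in-memory array as a stand-in for the HDF5 dataset (the array shape and the index ranges are illustrative assumptions, not values taken from the project):

```julia
# In-memory stand-in for an HDF5 dataset (assumption: the real wrappers
# read from a file via h5open, as shown in step 2).
dataset = reshape(collect(Float32, 1:(4 * 6 * 3)), 4, 6, 3)

# Hypothetical index ranges that one worker might be asked to read:
# all of dimension 1, the first half of dimension 2, all of dimension 3.
indices = (1:4, 1:3, 1:3)

# Same two operations as in dist_read_x_tensor / dist_read_y_tensor:
# slice the dataset, then add a leading dimension of size 1.
data = dataset[indices...]
x = reshape(data, 1, size(data)...)

println(size(x))
```

Each worker only materializes its own slice, so the full dataset never has to fit in the memory of a single process.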

## Help

Common problems or issues.

```
This section will be updated with common issues and fixes
```
This section will contain common issues and corresponding fixes. Currently, we only support Julia 1.9.

## Authors

[Richard Rex](https://www.linkedin.com/in/richard-rex/) - Georgia Institute of Technology

## Version History

* v2.0.0
* Various bug fixes and memory optimizations
* Ability to scale to $512^3$ across 500 GPUs

* v1.0.0
* Initial working DFNO

## License

This project is licensed under the Creative Commons License - see the LICENSE.md file for details

## Acknowledgments
Richard Rex, [richardr2926@gatech.edu](mailto:richardr2926@gatech.edu) <br/>

This research was carried out with the support of Georgia Research Alliance, Extreme Scale Solutions and partners of the ML4Seismic Center.
[license-status]:LICENSE
<!-- [zenodo-status]:https://doi.org/10.5281/zenodo.6799258 -->
[license-img]:http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat?style=plastic
<!-- [zenodo-img]:https://zenodo.org/badge/DOI/10.5281/zenodo.3878711.svg?style=plastic -->
22 changes: 14 additions & 8 deletions docs/make.jl
@@ -3,19 +3,23 @@ using ParametricDFNOs

makedocs(
sitename = "ParametricDFNOs.jl",
doctest=false, clean=true,
authors="Richard Rex",
format = Documenter.HTML(),
# modules = [ParametricOperators],
modules = [ParametricDFNOs],
pages=[
"Introduction" => "index.md",
"Quick Start" => "quickstart.md",
"Distribution" => "distribution.md",
"Examples" => [
"3D FFT" => "examples/3D_FFT.md",
"Distributed 3D FFT" => "examples/3D_DFFT.md",
"3D Conv" => "examples/3D_Conv.md",
"Distributed 3D Conv" => "examples/3D_DConv.md",
"2D Forward and Gradient" => "examples/simple_2D.md",
"2D Training" => "examples/training_2D.md",
"3D Forward and Gradient" => "examples/simple_3D.md",
"3D Custom Dataset" => "examples/custom_3D.md",
],
"API" => "api.md",
"API" => [
"2D Time varying" => "api/DFNO_2D.md",
"3D Time varying" => "api/DFNO_3D.md",
"Utilities" => "api/UTILS.md",],
"Future Work" => "future.md",
"Citation" => "citation.md"
]
@@ -24,6 +28,8 @@ makedocs(
# Automatically deploy documentation to gh-pages.
deploydocs(
repo = "github.com/slimgroup/ParametricDFNOs.jl.git",
push_preview=true,
devurl = "dev",
devbranch = "release",
devbranch = "master",
versions = ["dev" => "dev", "stable" => "v^"],
)
3 changes: 0 additions & 3 deletions docs/src/api.md

This file was deleted.

61 changes: 61 additions & 0 deletions docs/src/api/DFNO_2D.md
@@ -0,0 +1,61 @@
# 2D Time varying FNO

!!! tip "2D Time varying"
    The implementation allows the discretization along the time dimension to be 1 (a single time step). You can also treat time as any other dimension, so this can also be used as a generic 3D FNO.

## 2D Model

```@autodocs
Modules = [ParametricDFNOs.DFNO_2D]
Order = [:type, :function]
Pages = ["model.jl"]
```
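
As the tip above notes, the time discretization may be a single step. A minimal sketch of what such a configuration could look like, using a hypothetical `SketchModelConfig` that mirrors a few fields from the `ModelConfig` options listed in the earlier README (the stand-in type and the choice `mt = 1` are assumptions for illustration, not the package's API):

```julia
# Hypothetical stand-in for the package's ModelConfig; only a subset of
# the fields from the README listing is reproduced here.
Base.@kwdef struct SketchModelConfig
    nx::Int = 64                       # grid points along x
    ny::Int = 64                       # grid points along y
    nt::Int = 51                       # time steps
    mx::Int = 8                        # Fourier modes kept along x
    my::Int = 8                        # Fourier modes kept along y
    mt::Int = 4                        # Fourier modes kept along t
    partition::Vector{Int} = [1, 4]
end

# Single time step. Assumption: the number of modes kept along time
# should not exceed the number of time steps, so mt is reduced as well.
single_step = SketchModelConfig(nt=1, mt=1)

println(single_step)
```

With the real package, the analogous call would presumably pass `nt=1` to `DFNO_2D.ModelConfig` alongside `partition`; check the API listing for the exact fields.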

## 2D Forward Pass

```@autodocs
Modules = [ParametricDFNOs.DFNO_2D]
Order = [:type, :function]
Pages = ["forward.jl"]
```

## 2D Training

```@autodocs
Modules = [ParametricDFNOs.DFNO_2D]
Order = [:type, :function]
Pages = ["train.jl"]
```

## 2D Data Loading

!!! warning "Critical component"
    See [Data Partitioning](@ref) for instructions on how to set it up properly.

```@autodocs
Modules = [ParametricDFNOs.DFNO_2D]
Order = [:type, :function]
Pages = ["data.jl"]
```

```@autodocs
Modules = [ParametricDFNOs.DFNO_2D.UTILS]
Order = [:type, :function]
Pages = ["utils.jl"]
```

## 2D Plotting

```@autodocs
Modules = [ParametricDFNOs.DFNO_2D]
Order = [:type, :function]
Pages = ["plot.jl"]
```

## 2D Checkpoints

```@autodocs
Modules = [ParametricDFNOs.DFNO_2D]
Order = [:type, :function]
Pages = ["weights.jl"]
```
64 changes: 64 additions & 0 deletions docs/src/api/DFNO_3D.md
@@ -0,0 +1,64 @@
# 3D Time varying FNO

!!! tip "3D Time varying"
    The implementation allows the discretization along the time dimension to be 1 (a single time step). You can also treat time as any other dimension, so this can also be used as a generic 4D FNO.

## 3D Model

```@autodocs
Modules = [ParametricDFNOs.DFNO_3D]
Order = [:type, :function]
Pages = ["model.jl"]
```

## 3D Forward Pass

```@autodocs
Modules = [ParametricDFNOs.DFNO_3D]
Order = [:type, :function]
Pages = ["forward.jl"]
```

## 3D Training

```@autodocs
Modules = [ParametricDFNOs.DFNO_3D]
Order = [:type, :function]
Pages = ["train.jl"]
```

## 3D Data Loading

!!! warning "Critical component"
    See [Data Partitioning](@ref) for instructions on how to set it up properly.

```@autodocs
Modules = [ParametricDFNOs.DFNO_3D]
Order = [:type, :function]
Pages = ["data.jl"]
```

!!! tip "Distributed read for complex storage scenarios"
    See [Custom 3D Time varying FNO](@ref) for an example of how to extend this distributed read to a complex storage scheme.

```@autodocs
Modules = [ParametricDFNOs.DFNO_3D.UTILS]
Order = [:type, :function]
Pages = ["utils.jl"]
```

## 3D Plotting

```@autodocs
Modules = [ParametricDFNOs.DFNO_3D]
Order = [:type, :function]
Pages = ["plot.jl"]
```

## 3D Checkpoints

```@autodocs
Modules = [ParametricDFNOs.DFNO_3D]
Order = [:type, :function]
Pages = ["weights.jl"]
```
24 changes: 24 additions & 0 deletions docs/src/api/UTILS.md
@@ -0,0 +1,24 @@
# Utilities

!!! note "Distributed Loss Function"
    We provide a distributed relative L2 loss, but most distributed loss functions should be straightforward to build with [`ParametricOperators.jl`](https://github.com/slimgroup/ParametricOperators.jl).

```@autodocs
Modules = [ParametricDFNOs.UTILS]
Order = [:type, :function]
Pages = ["utils.jl"]
```
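
For orientation, a relative L2 loss is commonly defined as the norm of the error divided by the norm of the target. The following is a single-process sketch of that common definition only; the function name `relative_l2` is hypothetical, and the package's distributed version may differ in normalization or reduction details:

```julia
using LinearAlgebra

# Single-process relative L2 loss: norm(y_pred - y) / norm(y).
# For multi-dimensional arrays, `norm` treats the array as a flat vector.
relative_l2(y_pred, y) = norm(y_pred .- y) / norm(y)

y      = Float32[3, 4]   # target, norm = 5
y_pred = Float32[3, 0]   # prediction, error = (0, -4), norm = 4

loss = relative_l2(y_pred, y)
println(loss)
```

Because the loss is normalized by the target's norm, it is insensitive to the overall scale of the data, which makes values comparable across datasets.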

### GPU Helpers

```@autodocs
Modules = [ParametricDFNOs.DFNO_2D]
Order = [:type, :function]
Pages = ["DFNO_2D.jl"]
```

```@autodocs
Modules = [ParametricDFNOs.DFNO_3D]
Order = [:type, :function]
Pages = ["DFNO_3D.jl"]
```
Binary file added docs/src/assets/logo.png