# A Tutorial on Community Models and Integrations - PhysicsNeMo

In this tutorial, PhysicsNeMo is used to expand the scope of community models and datasets, such as [The Well](https://github.com/PolymathicAI/the_well), by leveraging physics-informed utilities, optimized model layers, and MLOps best practices. Specifically, the tutorial covers:
1. Loading and evaluating a checkpoint from `The Well`
2. Running a pretrained checkpoint from `The Well` as a PhysicsNeMo user
3. Training community models with PhysicsNeMo
4. Fine-tuning community models with PhysicsNeMo
5. Experimenting with different architectures, both from the community and internal to PhysicsNeMo

## Loading Models and Data from `The Well`

To begin, a model and a dataset from `the_well` are selected for use throughout this example. Here, the [Magnetohydrodynamics dataset](https://github.com/PolymathicAI/the_well/tree/master/datasets/MHD_64) is used. Magnetohydrodynamics (MHD) is the study of the dynamics of electrically conducting fluids such as plasmas.

Note that any other model and dataset combination from `The Well` may be selected instead.

In addition to the data, a pre-trained model with a Tucker-Factorized Fourier Neural Operator (TFNO) architecture is used; it is later converted to a PhysicsNeMo model.

The dataset streaming functionality from `the_well` is used so that the dataset does not need to be downloaded locally. This requires access to Hugging Face.

In [None]:
from the_well.benchmark.models import TFNO
from torchinfo import summary

# Load The Well model
well_model = TFNO.from_pretrained("polymathic-ai/TFNO-MHD_64")

# Have a look at the model summary
summary(well_model, depth=5)

In [None]:
from the_well.data import WellDataset

# Enable streaming the dataset from HuggingFace
# The following line may take a couple of minutes to instantiate the datamodule
dataset = WellDataset(
    well_base_path="hf://datasets/polymathic-ai/",  # access from HF hub
    well_dataset_name="MHD_64",
    well_split_name="train",
)

With the dataset on hand, its features, shape, and size can be explored. `The Well` already provides a script for this, which is left to the reader to explore if desired. A summary is provided below; it is also available on the [dataset card](https://github.com/PolymathicAI/the_well/blob/master/datasets/MHD_64/README.md) online:

**Dimension of discretized data:** 100 timesteps of 64 $\times$ 64 $\times$ 64 cubes.

**Fields available in the data:** Density (scalar field), velocity (vector field), magnetic field (vector field).

**Number of trajectories:** 10 initial conditions $\times$ 10 combinations of parameters = 100 trajectories.

**Estimated size of the ensemble of all simulations:** 71.6 GB.

**Grid type:** uniform grid, cartesian coordinates.

**Initial conditions:** uniform IC.

**Boundary conditions:** periodic boundary conditions.

**Data are stored separated by ($\Delta t$):** 0.01 (arbitrary units).

**Total time range ($t_{min}$ to $t_{max}$):** $t_{min} = 0$, $t_{max} = 1$.

**Spatial domain size ($L_x$, $L_y$, $L_z$):** dimensionless so 64 pixels.

**Set of coefficients or non-dimensional parameters evaluated:** all combinations of $\mathcal{M}_s = \{0.5, 0.7, 1.5, 2.0, 7.0\}$ and $\mathcal{M}_A = \{0.7, 2.0\}$.

**Approximate time and hardware used to generate the data:** Downsampled from `MHD_256` after applying ideal low-pass filter.

**What phenomena of physical interest are captured in the data:** MHD fluid flows in the compressible limit (subsonic and supersonic, sub-Alfvenic and super-Alfvenic).

**How to evaluate a new simulator operating in this space:** Check metrics such as the power spectrum and the two-point correlation function.
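As a rough illustration of the power-spectrum metric mentioned in the summary, the sketch below computes a shell-summed power spectrum of a periodic scalar field on a cubic grid with NumPy. The function name `isotropic_power_spectrum`, the normalization, and the nearest-integer shell binning are assumptions made for this example; they are not taken from `The Well` or from the dataset authors' evaluation code.

```python
import numpy as np


def isotropic_power_spectrum(field):
    """Shell-summed power spectrum of a periodic scalar field on an n^3 grid."""
    n = field.shape[0]
    # Power carried by each Fourier mode, normalized by the number of grid points
    power = np.abs(np.fft.fftn(field) / field.size) ** 2
    # Integer wavenumbers along one axis: 0, 1, ..., n/2 - 1, -n/2, ..., -1
    k1d = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2)
    # Assign each mode to its nearest integer shell and sum the power per shell
    shell = np.rint(k_mag).astype(int).ravel()
    spectrum = np.bincount(shell, weights=power.ravel())
    # Keep shells up to the Nyquist wavenumber only
    return spectrum[: n // 2 + 1]


# Synthetic check: a single sine mode with wavenumber 3 along x
n = 16
x = np.arange(n)
test_field = np.sin(2 * np.pi * 3 * x / n)[:, None, None] * np.ones((n, n, n))
spectrum = isotropic_power_spectrum(test_field)
print(int(np.argmax(spectrum)))
```

Comparing such spectra between a model rollout and the reference simulation, field by field, is one way to see whether small-scale structure is preserved.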

Please cite the associated paper if you use this data in your research:

```
@article{burkhart2020catalogue,
  title={The catalogue for astrophysical turbulence simulations (cats)},
  author={Burkhart, B and Appel, SM and Bialy, S and Cho, J and Christensen, AJ and Collins, D and Federrath, Christoph and Fielding, DB and Finkbeiner, D and Hill, AS and others},
  journal={The Astrophysical Journal},
  volume={905},
  number={1},
  pages={14},
  year={2020},
  publisher={IOP Publishing}
}
```




A single sample can be extracted and inspected. Some notes from `The Well` are shown below. More info on examining their data can be found in [this example](https://github.com/PolymathicAI/the_well/blob/master/docs/tutorials/dataset.ipynb):

The most important elements are `input_fields` and `output_fields`. They represent the time-varying physical fields of the dynamical system and are generally the input and target of our models. For a dynamical system that has 2 spatial dimensions $x$ and $y$, `input_fields` would have a shape $(T_{in}, L_x, L_y, F)$ and `output_fields` would have a shape $(T_{out}, L_x, L_y, F)$. The number of input and output timesteps $T_{in}$ and $T_{out}$ are specified at the instantiation of the dataset with the arguments `n_steps_input` and `n_steps_output`. $L_x$ and $L_y$ are the lengths of the spatial dimensions. $F$ represents the number of physical fields, where vector fields $v = (v_x, v_y)$ and tensor fields $t = (t_{xx}, t_{xy}, t_{yx}, t_{yy})$ are flattened.
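The flattening rule above can be checked with a few lines of plain Python. This is a minimal sketch: the helper names `flattened_field_count` and `sample_shape` are hypothetical, and the choice of 4 input steps and 1 output step is only an illustrative setting of `n_steps_input` and `n_steps_output`. The MHD field list (one scalar and two vector fields in 3D) comes from the dataset summary.

```python
def flattened_field_count(n_scalar, n_vector, n_tensor, n_dims):
    """Number of channels F after flattening vector and rank-2 tensor fields."""
    return n_scalar + n_vector * n_dims + n_tensor * n_dims**2


def sample_shape(n_steps, spatial_shape, n_fields):
    """Shape (T, L_1, ..., L_d, F) of a block of consecutive time steps."""
    return (n_steps, *spatial_shape, n_fields)


# 2D case from the text: one vector field (2 components) and one tensor field (4)
F_2d = flattened_field_count(n_scalar=0, n_vector=1, n_tensor=1, n_dims=2)

# MHD_64: density (scalar), velocity (vector), magnetic field (vector) in 3D
F = flattened_field_count(n_scalar=1, n_vector=2, n_tensor=0, n_dims=3)
input_shape = sample_shape(4, (64, 64, 64), F)   # illustrative n_steps_input=4
output_shape = sample_shape(1, (64, 64, 64), F)  # illustrative n_steps_output=1
print(F_2d, F, input_shape, output_shape)
```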

In [None]:
sample = dataset[0]
for k, v in sample.items():
    print(f"Key: {k.ljust(20)} Shape: {v.shape}")

print(f"Field Names: {dataset.metadata.field_names}")

Based on the model summary above and the order of operations in the TFNO forward pass, the model processes the data as follows:

1. Applying optional positional encoding
2. Sending inputs through a lifting layer to a high-dimensional latent space
3. Applying optional domain padding to the high-dimensional intermediate function representation
4. Applying `n_layers` Fourier/TFNO layers in sequence (SpectralConvolution + skip connections, nonlinearity)
5. Removing the domain padding, if it was applied
6. Projecting the intermediate function representation to the output channels
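The core of this sequence (steps 2, 4, and 6) can be sketched in NumPy for a 1D signal. This is a simplified illustration under several assumptions, not the TFNO implementation from `The Well`: the spectral weights are dense rather than Tucker-factorized, the optional positional encoding and domain padding are omitted, ReLU stands in for the actual nonlinearity, and all names and sizes (`fno_forward`, `init_params`, 3 input channels, width 16) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def spectral_conv_1d(x, weights, n_modes):
    """Mix channels on the lowest n_modes Fourier modes; drop the rest.

    x: (channels_in, n_points); weights: (channels_in, channels_out, n_modes)
    """
    n_points = x.shape[-1]
    x_hat = np.fft.rfft(x, axis=-1)
    out_hat = np.zeros((weights.shape[1], x_hat.shape[-1]), dtype=complex)
    # Per-mode channel mixing; modes above n_modes stay zero (truncation)
    out_hat[:, :n_modes] = np.einsum("im,iom->om", x_hat[:, :n_modes], weights)
    return np.fft.irfft(out_hat, n=n_points, axis=-1)


def fno_forward(x, params, n_modes):
    # Step 2: pointwise lifting to the latent width
    h = params["lift"] @ x
    # Step 4: spectral convolution plus pointwise skip, then a nonlinearity
    for spectral_w, skip_w in params["layers"]:
        h = spectral_conv_1d(h, spectral_w, n_modes) + skip_w @ h
        h = np.maximum(h, 0.0)  # ReLU, chosen here only for simplicity
    # Step 6: pointwise projection to the output channels
    return params["project"] @ h


def init_params(in_ch, out_ch, width, n_layers, n_modes):
    def dense(rows, cols):
        return rng.normal(scale=1.0 / np.sqrt(cols), size=(rows, cols))

    layers = []
    for _ in range(n_layers):
        shape = (width, width, n_modes)
        spectral_w = (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / width
        layers.append((spectral_w, dense(width, width)))
    return {"lift": dense(width, in_ch), "layers": layers,
            "project": dense(out_ch, width)}


params = init_params(in_ch=3, out_ch=2, width=16, n_layers=4, n_modes=8)
x = rng.normal(size=(3, 64))
y = fno_forward(x, params, n_modes=8)
print(y.shape)
```

The pointwise skip path is what lets high-frequency content survive each layer, since the spectral path alone discards every mode above `n_modes`.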


This pretrained model is trained to predict the $T_{out} = 1$ next states given the $T_{in} = 4$ previous states. The input steps are concatenated along their channels, such that the model expects $T_{in} \times F$ channels as input and $T_{out} \times F$ channels as output. Because `WellDataset` is a PyTorch dataset, we can use it conveniently with PyTorch data-loaders.
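The channel concatenation described above can be sketched with NumPy. The function name `stack_time_into_channels` and the channels-first ordering (time index varying slowest) are assumptions made for illustration; the preprocessing inside `The Well` may order or reshape the axes differently. The value $F = 7$ follows from the field list in the dataset summary, and a small grid is used to keep the example light.

```python
import numpy as np


def stack_time_into_channels(fields):
    """(T, L_x, L_y, L_z, F) -> (T * F, L_x, L_y, L_z), channels first."""
    n_steps, n_fields = fields.shape[0], fields.shape[-1]
    # Move the field axis next to the time axis: (T, F, L_x, L_y, L_z)
    channels_first = np.moveaxis(fields, -1, 1)
    # Merge time and field axes so that channel index = t * F + f
    return channels_first.reshape(n_steps * n_fields, *fields.shape[1:-1])


t_in, t_out, n_fields, grid = 4, 1, 7, (8, 8, 8)
inputs = np.arange(t_in * 8 * 8 * 8 * n_fields, dtype=float).reshape(
    t_in, *grid, n_fields
)
targets = np.zeros((t_out, *grid, n_fields))

model_input = stack_time_into_channels(inputs)    # T_in * F = 28 channels
model_target = stack_time_into_channels(targets)  # T_out * F = 7 channels
print(model_input.shape, model_target.shape)
```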

The remainder of this tutorial builds an example that illustrates the points made in [Converting PyTorch Models to PhysicsNeMo Models](https://docs.nvidia.com/deeplearning/physicsnemo/physicsnemo-core/api/physicsnemo.models.html).


In this example, we show how to:
1. Bring a pretrained checkpoint from the community (`The Well`) and run it as a PhysicsNeMo user
2. Train the same model architecture in PhysicsNeMo, showcasing how easy it is to work with any community PyTorch model
3. Experiment with other architectures, which is the true value of PhysicsNeMo

In [None]:
export PYTHONPATH=$PYTHONPATH:/workspace/PhysicsNeMo-git/physicsnemo

In [None]:
### Loading a Pre-trained Model from The Well
```python
from the_well.benchmark.models import FNO

# Load a pre-trained FNO model
model = FNO.from_pretrained("polymathic-ai/FNO-active_matter")
type(model)
```

In [None]:
import torch
from the_well.benchmark.models import FNO
from physicsnemo.models.module import Module
from physicsnemo.models.meta import ModelMetaData
from physicsnemo.registry import ModelRegistry

well_model = FNO.from_pretrained("polymathic-ai/FNO-active_matter")

pytorch_model = well_model.model

pytorch_model.__name__ = "thewell"


# Simple method
physicsnemo_model = Module.from_torch(pytorch_model, meta=ModelMetaData(name="converted_fno"))


# Or Use a PhysicsNeMo-Compatible Model
class PhysicsNeMoFNO(Module):
    def __init__(self, fno_model, metadata=None):
        # Module.__init__ takes the metadata through its `meta` argument
        super().__init__(meta=metadata or ModelMetaData())
        self.fno = fno_model
        
    def forward(self, x):
        # PhysicsNeMo expects specific input/output formats
        # Adapt as needed based on your use case
        return self.fno(x)
    
    @classmethod
    def from_well_model(cls, well_model_name):
        """Convert a Well model to PhysicsNeMo format"""
        # Load the Well model
        if "FNO" in well_model_name:
            from the_well.benchmark.models import FNO
            well_model = FNO.from_pretrained(well_model_name)
        elif "UNet" in well_model_name:
            from the_well.benchmark.models import UNet
            well_model = UNet.from_pretrained(well_model_name)
        else:
            raise ValueError(f"Unknown model type: {well_model_name}")
        
        # Create metadata for PhysicsNeMo optimizations
        meta = ModelMetaData(
            name=well_model_name.replace("polymathic-ai/", "")
        )
        
        # Create PhysicsNeMo model
        return cls(well_model.model, meta)