# Testing a new precision with Megatron-LM

In this tutorial we show how to test a new precision or recipe using Megatron-LM and Nvidia-DLFramework-Inspect.

[Megatron-LM](https://github.com/NVIDIA/Megatron-LM) is a large-scale transformer model framework developed by NVIDIA for training natural language processing (NLP) models with billions of parameters. It is designed to optimize both model training efficiency and scalability, allowing researchers and developers to push the limits of NLP capabilities.

We will show how to test ideas for new precisions and recipes in a few simple steps.


Consider an example recipe for training a GPT-like Transformer:

1. Each weight passed to the GEMM is cast to -1/0/1 with a scaling factor. For the sake of this tutorial, suppose we have implemented a function `utils.zero_one_cast(x: torch.Tensor) -> (torch.Tensor, float)` which returns a `-1/0/1` tensor and a scaling factor as a float.
2. Each input is cast to E4M3 FP8 precision.
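
The tutorial only assumes that `utils.zero_one_cast` exists. As a rough illustration, here is one way such a function could look. The absmean scaling rule used here is an assumption made for this sketch, not something the recipe prescribes.

```python
import torch


def zero_one_cast(x: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Hypothetical ternary cast: returns a -1/0/1 tensor and a float scale."""
    # Assumption: scale by the mean absolute value of the tensor.
    scale = x.abs().mean().clamp(min=1e-8).item()
    # Round to the nearest integer and keep only -1, 0 and 1.
    ternary = torch.clamp(torch.round(x / scale), -1, 1)
    return ternary, scale


w = torch.tensor([[0.9, -0.05, -1.2], [0.3, 0.0, 2.0]])
q, s = zero_one_cast(w)
w_emulated = q * s  # back in high precision, ready for a regular GEMM
```

Multiplying the ternary tensor by the scale gives a high-precision tensor that only takes the values `-s`, `0` and `s`.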

We will show how to <b>emulate</b> the behaviour of such a recipe: tensors are cast to these precisions and then cast back to high precision. The GEMM is then also run in high precision.

Moreover, let's suppose we want to test a few scenarios:
1. Use the new precision for both the forward and the backward pass.
2. Use the new precision only for the forward pass and use FP8 for the backward pass.
3. Use the new precision for both the forward and the backward pass, but run one in every 5 consecutive layers in high precision.

We will present how to implement it using the Transformer Engine.

#### Feature class implementation

Let's look at what the existing `FakeQuantFp8` feature looks like.

```python

@Registry.register_feature(namespace="transformer_engine")
@append_parent_docstring(parent=TEConfigAPIMapper)
class FakeQuantFp8(TEConfigAPIMapper):
    # (...)
    
    @api_method
    def fp8_gemm(self, config, layer_name, gemm, **kwargs):
        return False

    @api_method
    def process_tensor(self, config, layer_name, gemm, **kwargs):
        # (...)
        
        quant_format = config["quant_format"]
        margin = config.get('margin', self._get_margin_default())
        q_tensor = fake_quantize_fp8(kwargs["tensor"], quant_format, margin=margin)
        return q_tensor

```

We will write something similar, but use `utils.zero_one_cast` for the weight:
```python

@Registry.register_feature(namespace="transformer_engine")
@append_parent_docstring(parent=TEConfigAPIMapper)
class NewRecipe(TEConfigAPIMapper):
    # (...)
    
    @api_method
    def fp8_gemm(self, config, layer_name, gemm, tensor_name, **kwargs):
        return False

    @api_method
    def process_tensor(self, config, layer_name, tensor_name, gemm, **kwargs):
        # (...)
        if tensor_name == "weight":
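            # zero_one_cast returns (-1/0/1 tensor, scale); rescale to get a high-precision tensor
            ternary, scale = utils.zero_one_cast(kwargs["tensor"])
            return ternary * scale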
        else:
            quant_format = config["quant_format"]
            margin = config.get('margin', self._get_margin_default())
            q_tensor = fake_quantize_fp8(kwargs["tensor"], quant_format, margin=margin)
            return q_tensor
```

Suppose that our feature is saved in the file `/path/to/feature/new_precision.py`.

#### Integration with Megatron-LM

We have successfully defined our feature, which disables the FP8 GEMM and runs a high-precision GEMM on fake-cast
tensors, emulating the new precision.

Now, let's look at how to use our recipe in Megatron-LM training.
We begin by preparing a `config.yaml` file that describes the experiments for the different scenarios.

```yaml
Experiment1:
  enabled: True # only Experiment1 is enabled; change these flags manually to run another experiment
  layers:
    layer_name_regex_pattern: '.*'
  transformer_engine:
    new_recipe:
      enabled: True
      gemms: [fprop, dgrad, wgrad] # forward and backward
Experiment2:
  enabled: False
  layers:
    layer_name_regex_pattern: '.*'
  transformer_engine:
    new_recipe:
      enabled: True
      gemms: [fprop] # forward
Experiment3:
  enabled: False
  layers:
    layer_name_regex_pattern: '.*[12346789]' # four of every 5 layers
  transformer_engine:
    new_recipe:
      enabled: True
      gemms: [fprop, dgrad, wgrad] # forward and backward

```
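
Before launching a long run, it can help to check which layers a pattern actually selects. The sketch below does this with Python's `re` module. The layer names are hypothetical, and matching the pattern against the whole name is an assumption about how the tool applies it, so adjust both to your setup.

```python
import re


def matching_layers(pattern: str, layer_names: list[str]) -> list[str]:
    """Return the layer names that the whole pattern matches."""
    regex = re.compile(pattern)
    return [name for name in layer_names if regex.fullmatch(name)]


# Hypothetical layer names that end in the layer index.
layer_names = [f"decoder.layers.{i}" for i in range(1, 11)]

selected = matching_layers(r".*[12346789]", layer_names)
skipped = [name for name in layer_names if name not in selected]
print("high precision:", skipped)
```

With these names, layers 5 and 10 are left out of the recipe and therefore stay in high precision.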

1. Provide the path to the directory containing our feature using the `NV_DEBUG_TOOL_FEATURE_DIRS` environment variable.
```bash
export NV_DEBUG_TOOL_FEATURE_DIRS="/path/to/feature"
```

2. Provide the path to the config using the `NV_DEBUG_TOOL_CONFIG` environment variable. If no config is provided, the tool runs with default settings.
```bash
export NV_DEBUG_TOOL_CONFIG="/path/to/config.yaml"
```

3. Use the `NV_DEBUG_TOOL_ENABLED` environment variable to toggle debug mode in Megatron-LM.
```sh
export NV_DEBUG_TOOL_ENABLED=1
```
4. Run the Megatron-LM script for the model you want to train. That's it!
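
If you prefer to launch experiments from Python, the three environment variables can be collected in one helper. This is only a convenience sketch: `build_debug_env` and the script name in the commented-out call are hypothetical, while the variable names are the ones listed in the steps above.

```python
import os


def build_debug_env(feature_dir: str, config_path: str, base_env=None) -> dict:
    """Return a copy of the environment with the debug tool variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["NV_DEBUG_TOOL_FEATURE_DIRS"] = feature_dir
    env["NV_DEBUG_TOOL_CONFIG"] = config_path
    env["NV_DEBUG_TOOL_ENABLED"] = "1"
    return env


env = build_debug_env("/path/to/feature", "/path/to/config.yaml", base_env={})

# Placeholder launch command; replace it with your own Megatron-LM script:
# import subprocess
# subprocess.run(["bash", "your_megatron_script.sh"], env=env, check=True)
```

Passing the environment explicitly keeps the settings of one experiment from leaking into the next one.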