# MetaPulsar Usage Example

This notebook demonstrates how to use MetaPulsar to combine pulsar timing data from multiple PTA collaborations into unified Enterprise pulsar objects.

## Overview

The workflow covers:
1. **Manual single-pulsar data preparation** - Creating data structures by hand
2. **MetaPulsar creation** - Using consistent parameter merging strategies
3. **Enterprise integration** - Working with the resulting pulsar objects
4. **Automated discovery** - Processing multiple pulsars automatically
5. **Reference PTA selection** - Different strategies for choosing reference PTAs

## Key Concepts

- **Reference PTA**: The PTA whose parameters are inherited by the MetaPulsar for merged model components
- **Consistent Strategy**: Merges compatible parameters from different PTAs where possible
- **Component Merging**: Controls which parameter types (astrometry, spindown, binary, dispersion) are merged
- **Parameter Naming**: Merged parameters have no suffix; PTA-specific parameters retain PTA suffixes


In [None]:
import loguru
import sys
from metapulsar import (
    create_metapulsar,
    discover_files,
    discover_layout,
    combine_layouts,
)

# Suppress debug output for cleaner example
loguru.logger.remove()
loguru.logger.add(sys.stdout, level="WARNING")


## Part 1: Manual Single-Pulsar Data Preparation

For single pulsars, manually creating the data structure is often the most transparent and flexible approach. This gives you full control over which PTAs to include and their ordering. It takes some work, though, so more convenient methods are covered later.

### Data Structure

The data structure is a dictionary where:
- **Keys**: PTA names (e.g., 'nanograv_9y', 'epta_dr2')
- **Values**: Lists of file information dictionaries
- **Reference PTA**: The first PTA in the dictionary becomes the reference PTA

### File Information Fields
- `par`: Path to .par file (pulsar parameters)
- `tim`: Path to .tim file (timing observations)  
- `timing_package`: Software used (tempo2/pint)
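
The layout above can be checked before handing it to MetaPulsar. The following is a minimal sketch in plain Python, not part of the MetaPulsar API: the helper `check_file_data` and the placeholder file names are hypothetical, and it only assumes the three fields and the "first key is the reference PTA" rule described above.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("par", "tim", "timing_package")
KNOWN_PACKAGES = ("tempo2", "pint")


def check_file_data(file_data: Dict[str, List[Dict[str, str]]]) -> str:
    """Validate the layout described above and return the reference PTA name.

    Hypothetical helper for illustration; not a MetaPulsar function.
    """
    if not file_data:
        raise ValueError("file_data must contain at least one PTA")
    for pta, entries in file_data.items():
        if not entries:
            raise ValueError(f"{pta}: no file entries")
        for entry in entries:
            missing = [field for field in REQUIRED_FIELDS if field not in entry]
            if missing:
                raise ValueError(f"{pta}: missing fields {missing}")
            if entry["timing_package"] not in KNOWN_PACKAGES:
                raise ValueError(
                    f"{pta}: unknown timing package {entry['timing_package']!r}"
                )
    # Dicts preserve insertion order, so the first key is the reference PTA
    return next(iter(file_data))


# Placeholder file names, for illustration only
example = {
    "epta_dr2": [{"par": "a.par", "tim": "a.tim", "timing_package": "tempo2"}],
    "nanograv_9y": [{"par": "b.par", "tim": "b.tim", "timing_package": "pint"}],
}
reference_pta = check_file_data(example)
print(reference_pta)
```

Because the reference PTA is determined by key order, reordering the dictionary is enough to change it.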


In [None]:
# Manually create a single-pulsar dictionary with three PTAs
# The reference PTA (first in the dictionary) will be used for parameter
# inheritance where appropriate
pulsar_data = {
    # Reference PTA - parameters from this PTA will be inherited by the MetaPulsar
    # for model components that are merged (astrometry, spindown, binary, dispersion)
    "epta_dr2": [
        {
            "par": "../../data/ipta-dr2/EPTA_v2.2/J0613-0200/J0613-0200.par",
            "tim": "../../data/ipta-dr2/EPTA_v2.2/J0613-0200/J0613-0200_all.tim",
            "timing_package": "tempo2",  # Timing package used
        }
    ],
    "nanograv_9y": [
        {
            "par": "../../data/ipta-dr2/NANOGrav_9y/par/J0613-0200_NANOGrav_9yv1.gls.par",
            "tim": "../../data/ipta-dr2/NANOGrav_9y/tim/J0613-0200_NANOGrav_9yv1.tim",
            "timing_package": "pint",
        }
    ],
    "ppta_dr2": [
        {
            "par": "../../data/ipta-dr2/PPTA_dr1dr2/par/J0613-0200_dr1dr2.par",
            "tim": "../../data/ipta-dr2/PPTA_dr1dr2/tim/J0613-0200_dr1dr2.tim",
            "timing_package": "tempo2",
        }
    ],
}


### MetaPulsar Creation with Consistent Strategy

The `consistent` strategy merges parameters from different PTAs where possible, inheriting values from the reference PTA for merged model components. MetaPulsar runs both libstempo/tempo2 and PINT under the hood, but it relies on PINT's full timing-model parsing to decide how to merge the timing models. Model components and parameters that PINT does not know about are automatically preserved and treated as 'detector-specific'.

### Combination Strategy Options
- `consistent`: Merge compatible parameters, inherit reference PTA values
- `composite`: Keep all parameters separate with PTA suffixes

### Default Mergeable Components
- `astrometry`: Position, proper motion, parallax
- `spindown`: Spin frequency and derivatives
- `binary`: Binary orbital parameters (may also use `pulsar_system`)
- `dispersion`: Dispersion measure and derivatives

In principle, any other PINT TimingModel `category` can be used, but the components above make astrophysical sense and are included by default.

#### Consistent is the only "valid" method
A `consistent` MetaPulsar has a single sky position, a single binary model, and a single set of the other relevant astrophysical models: it is a truly unified model. The `composite` combination method instead keeps all parameters separate, so each PTA's dataset for the pulsar gets its own model parameters, which is not physical. That approach is sometimes called the "Borg method" or the "Frankenstat" method in the community.
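
To make the difference between the two strategies concrete, here is a toy sketch of the resulting parameter names. It is not MetaPulsar's implementation: the function `name_parameters`, the input structure, and the parameter lists are hypothetical, and the real merge also has to check that the timing models are compatible, which is omitted here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy input: per PTA, a list of (parameter name, component category)
toy_models: Dict[str, List[Tuple[str, str]]] = {
    "epta_dr2": [("RAJ", "astrometry"), ("F0", "spindown"), ("JUMP1", "phase_jump")],
    "nanograv_9y": [("RAJ", "astrometry"), ("F0", "spindown"), ("JUMP1", "phase_jump")],
}


def name_parameters(
    models: Dict[str, List[Tuple[str, str]]],
    strategy: str,
    combine_components: List[str],
) -> List[str]:
    """Return the sorted parameter names of a toy combined model."""
    names = set()
    for pta, params in models.items():
        for par, category in params:
            if strategy == "consistent" and category in combine_components:
                names.add(par)  # shared across PTAs: no suffix
            else:
                names.add(f"{par}_{pta}")  # PTA-specific: keeps the PTA suffix
    return sorted(names)


consistent = name_parameters(toy_models, "consistent", ["astrometry", "spindown"])
composite = name_parameters(toy_models, "composite", [])
print(consistent)
print(composite)
```

In the toy `consistent` result, `RAJ` and `F0` appear once without a suffix, while the jump stays PTA-specific. In the toy `composite` result, every parameter appears once per PTA.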

In [None]:
# Create MetaPulsar using the 'consistent' strategy
# This merges parameters from different PTAs where possible
# Both PINT and libstempo are run simultaneously under the hood
# We run libstempo in 'sandbox' mode, so our kernel does not crash
metapulsar = create_metapulsar(
    file_data=pulsar_data,
    combination_strategy="consistent",  # Merge compatible parameters
    combine_components=[
        "astrometry",
        "spindown",
        "binary",
        "dispersion",
    ],  # Components to merge
    parfile_output_dir='./parfiles',
    add_dm_derivatives=True,  # Ensure DM1, DM2 are present
)

print(f"Created MetaPulsar: {metapulsar.name}")
print(f"Reference PTA: {list(pulsar_data.keys())[0]}")
print("Combination strategy: consistent")
print("Components merged: astrometry, spindown, binary, dispersion")


## Part 2: MetaPulsar is an Enterprise Pulsar

The resulting MetaPulsar is a fully functional Enterprise pulsar with all standard attributes. It combines data from multiple PTAs into a single unified object.

### Key Features
- **Combined dataset**: All observations combined
- **Merged Astrophysics**: Only a single Astrophysical model
- **PTA-Specific Parameters**: Detector-specific parameters retain PTA suffixes
- **Full Enterprise Compatibility**: Works with all Enterprise analysis tools


In [None]:
# The resulting MetaPulsar is an Enterprise pulsar with all standard attributes
print("MetaPulsar Enterprise attributes:")
print(f"  Name: {metapulsar.name}")
print(f"  Number of pulsars: {len(metapulsar._pulsars)}")
print(f"  PTA names: {list(metapulsar._pulsars.keys())}")
print(f"  Combination strategy: {metapulsar.combination_strategy}")
print(f"  Model components merged: {metapulsar.combine_components}")

# Show some basic Enterprise pulsar attributes
print("\nEnterprise pulsar attributes:")
print(f"  Number of TOAs: {len(metapulsar.toas)}")
print(
    f"  Frequency range: {metapulsar.freqs.min():.2f} - {metapulsar.freqs.max():.2f} MHz"
)
print(f"  Time span: {(metapulsar.toas.max() - metapulsar.toas.min())/86400.0:.1f} days")

print(
    "\nThe MetaPulsar combines data from multiple PTAs into a single Enterprise pulsar."
)
print(
    f"Merged parameters inherit values from the reference PTA ({list(pulsar_data.keys())[0]})."
)
print("PTA-specific parameters retain their original PTA-specific values.")


### Parameter Naming Conventions

MetaPulsar uses a clear naming convention to distinguish between merged and PTA-specific parameters:

- **Merged parameters**: No suffix, inherit reference PTA values
- **PTA-specific parameters**: Retain PTA suffix (e.g., `_nanograv_9y`, `_epta_dr2`)

This allows you to easily identify which parameters are shared across PTAs and which are specific to individual PTAs.


In [None]:
# Demonstrate parameter naming conventions
print("Parameter naming conventions:")
print("Merged parameters (no suffix):")
fitparams = metapulsar.fitpars
# Get PTA names from our data structure
pta_names = list(pulsar_data.keys())
pta_suffixes = [f"_{pta}" for pta in pta_names]

merged_params = [
    p for p in fitparams if not any(suffix in p for suffix in pta_suffixes)
]
for param in merged_params[:5]:  # Show first 5 merged parameters
    print(f"  {param}")

print("\nPTA-specific parameters (retain PTA suffix):")
pta_specific_params = [
    p for p in fitparams if any(suffix in p for suffix in pta_suffixes)
]
for param in pta_specific_params[:5]:  # Show first 5 PTA-specific parameters
    print(f"  {param}")

print("\nThis naming convention allows you to distinguish between:")
print("  - Merged parameters: Inherit reference PTA values, no suffix")
print("  - PTA-specific parameters: Retain original values, keep PTA suffix")


## Part 3: Automated Multi-Pulsar Processing

For processing multiple pulsars, manually creating data structures becomes cumbersome. MetaPulsar provides utility functions based on regex pattern matching for automation.

### Automated Workflow
1. **Discover layouts**: Automatically detect data release directory structures
2. **Combine layouts**: Merge discovered layouts with predefined patterns
3. **Discover files**: Find all pulsar files using pattern matching
4. **Create MetaPulsars**: Process discovered pulsars automatically (limited to 3 for performance)

### Reference PTA Selection Strategies
- **Auto-selection**: Choose PTA with longest timespan per pulsar
- **Global reference**: Use same PTA as reference for all pulsars
- **Manual**: Reorder PTAs for specific pulsars

**Note**: For demonstration purposes, we limit processing to the first 3 pulsars to keep execution time reasonable.
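
The three selection strategies can be sketched in plain Python. This is an illustration only, not MetaPulsar code: the helpers `pick_reference` and `put_reference_first`, the MJD spans, and the fallback to auto-selection when the global reference has no data for a pulsar are all assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (first MJD, last MJD) of the TOAs per PTA for one pulsar
toy_spans: Dict[str, Tuple[float, float]] = {
    "epta_dr2": (50000.0, 56000.0),
    "nanograv_9y": (53000.0, 56500.0),
    "ppta_dr2": (49500.0, 56200.0),
}


def pick_reference(
    spans: Dict[str, Tuple[float, float]],
    global_reference: Optional[str] = None,
) -> str:
    """Use the global reference if available, else the PTA with the longest timespan."""
    if global_reference is not None and global_reference in spans:
        return global_reference
    return max(spans, key=lambda pta: spans[pta][1] - spans[pta][0])


def put_reference_first(file_data: dict, reference: str) -> dict:
    """Return a reordered copy of the dictionary with the reference PTA as first key."""
    rest = {pta: value for pta, value in file_data.items() if pta != reference}
    return {reference: file_data[reference], **rest}


auto_choice = pick_reference(toy_spans)
global_choice = pick_reference(toy_spans, global_reference="nanograv_9y")
reordered = put_reference_first(toy_spans, auto_choice)
print(auto_choice, global_choice, list(reordered))
```

The manual strategy corresponds to calling `put_reference_first` with a PTA of your choice on a single pulsar's dictionary.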


### Directory Layout Discovery

In [None]:
# Directory layout is different for each PTA data release
# We provide a regexp discovery engine that helps. It's not perfect,
# but it works for the PTA data releases we have seen so far.

epta_layout = discover_layout('../../data/ipta-dr2/EPTA_v2.2', name='EPTA dr2')
ppta_layout = discover_layout('../../data/ipta-dr2/PPTA_dr1dr2', name='PPTA dr1dr2')
nanograv_layout = discover_layout('../../data/ipta-dr2/NANOGrav_9y', name='NANOGrav 9y')

# Combine layouts with:
combined_layout = combine_layouts(epta_layout, ppta_layout, nanograv_layout)

In [None]:
# The pre-defined PTA data release regexp patterns are:
from metapulsar import PTA_DATA_RELEASES
PTA_DATA_RELEASES.keys(), PTA_DATA_RELEASES['epta_dr2']

#### File discovery
Those regexp patterns can be fed to the file discovery service. There are some default patterns as well, but they are unlikely to match the exact structure of your data release.
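
As a rough idea of what regex-based discovery involves, the sketch below extracts pulsar names from a list of par-file paths using only the standard library. The pattern `PAR_PATTERN` is hypothetical and is not one of the entries in `PTA_DATA_RELEASES`; the paths are illustrative and shortened relative to the data directory.

```python
import re

# Hypothetical pattern: a J- or B-name at the start of a .par file name
PAR_PATTERN = re.compile(r"(?P<pulsar>[JB]\d{4}[+-]\d{2,4})[^/]*\.par$")

# Illustrative paths only
paths = [
    "EPTA_v2.2/J0613-0200/J0613-0200.par",
    "EPTA_v2.2/J0613-0200/J0613-0200_all.tim",
    "NANOGrav_9y/par/J0613-0200_NANOGrav_9yv1.gls.par",
    "NANOGrav_9y/par/B1855+09_NANOGrav_9yv1.gls.par",
]

found = {}
for path in paths:
    match = PAR_PATTERN.search(path)
    if match:
        # Group the matching par files by the pulsar name in the file name
        found.setdefault(match.group("pulsar"), []).append(path)

for pulsar, par_files in found.items():
    print(pulsar, par_files)
```

Note that this groups files by the name in the file name only; it does not know that a 'B' name and a 'J' name can refer to the same pulsar.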

In [None]:
file_data = discover_files(combined_layout)

In [None]:
# If we want to see a more detailed summary of the PTA data release, we can use the pta_summary function
from metapulsar import pta_summary

pta_summary(file_data)

### Filtering for Pulsars

The `file_data` object is just a dictionary with file path information. The pulsars are _not_ yet matched between PTAs. Matching is done by coordinate, not by name. We provide some utility functions to see which pulsars are in this file structure.
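
Coordinate-based matching can be illustrated with a small standalone sketch. This is not MetaPulsar's algorithm: the helpers `separation_deg` and `match_pulsars`, the PTA labels, the tolerance, and the approximate positions are all assumptions made for the example.

```python
import math

# Approximate J2000 positions (RA, Dec) in degrees, for illustration only
toy_catalog = {
    "pta_a": {"B1855+09": (284.400, 9.720), "J0030+0451": (7.614, 4.861)},
    "pta_b": {"J1857+0943": (284.401, 9.721), "J1939+2135": (294.910, 21.583)},
}


def separation_deg(a, b):
    """Angular separation in degrees between two (RA, Dec) pairs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*a, *b))
    cos_sep = math.sin(dec1) * math.sin(dec2) + math.cos(dec1) * math.cos(
        dec2
    ) * math.cos(ra1 - ra2)
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


def match_pulsars(catalog, tol_deg=0.01):
    """Group (pta, name) entries whose positions agree within tol_deg."""
    groups = []  # each group is a list of (pta, name, position)
    for pta, pulsars in catalog.items():
        for name, pos in pulsars.items():
            for group in groups:
                if separation_deg(group[0][2], pos) < tol_deg:
                    group.append((pta, name, pos))
                    break
            else:
                groups.append([(pta, name, pos)])
    return [[(pta, name) for pta, name, _ in group] for group in groups]


matches = match_pulsars(toy_catalog)
for group in matches:
    print(group)
```

In this toy catalog, `B1855+09` and `J1857+0943` end up in the same group because their positions agree within the tolerance, even though the names differ.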

In [None]:
from metapulsar import get_pulsar_names_from_file_data, filter_file_data_by_pulsars

In [None]:

# Do coordinate-based pulsar matching between PTAs
pulsar_names = get_pulsar_names_from_file_data(file_data)

len(pulsar_names), pulsar_names[:5]

In [None]:
# We filter to only include 3 pulsars
# In this step we can choose both the 'B' and the 'J' name! The names are resolved automagically by coordinates. Honeybadger don't care.
pulsar_selection = ['B1855+09', 'J1939+2135', 'J0030+0451']

filtered_data = filter_file_data_by_pulsars(file_data, pulsar_selection)
filtered_data

## Part 4: Create MetaPulsars for All Discovered Pulsars

We can now create MetaPulsars for all discovered pulsars.
We can either auto-select a reference PTA for each pulsar, or use a specific PTA for all pulsars.

### Reference PTA
The reference PTA is the PTA for which the par-file values are taken as 'reference values'. While these parameters are marginalized over, we do need to gauge them and lock them together. This is done by choosing one of the PTAs, and using those values in the par files. This choice is usually not important for the outcome, especially for millisecond pulsars. However, it's wise to choose the most accurate model for it. Usually, the longest PTA dataset is a good bet.


In [None]:
from metapulsar import create_all_metapulsars

In [None]:
# Alternatively, pass a specific PTA for all pulsars, e.g. reference_pta='NANOGrav 9y'
metapulsars = create_all_metapulsars(filtered_data, reference_pta=None)
# The default options are:
# combination_strategy="consistent",
# combine_components=["astrometry", "spindown", "binary", "dispersion"],
# add_dm_derivatives=True,

# The reference_pta=None option means that the reference PTA will be auto-selected for each pulsar based on observation timespans.

In [None]:
# Now we have a full list of MetaPulsars
# Pulsars by default are listed by their 'B' name if at least one of the PTAs uses that convention
metapulsars

In [None]:
metapulsars['B1855+09'].name