# Advanced Usage of reLAISS

This notebook demonstrates advanced features of the reLAISS library for finding similar astronomical transients.

## Topics Covered
1. Using PCA for dimensionality reduction
2. Creating theorized lightcurves
3. Swapping host galaxies
4. Setting maximum neighbor distances
5. Tweaking ANNOY parameters
6. Finding the optimal number of neighbors

In [None]:
import os
import pandas as pd
import numpy as np
import relaiss

# Create directories for saved figures and for the SFD dust map data
os.makedirs('./figures', exist_ok=True)
os.makedirs('./sfddata-master', exist_ok=True)

## 1. Using PCA for Dimensionality Reduction

PCA projects the feature space onto a smaller set of orthogonal components that retain most of the variance. Searching in fewer dimensions can be faster, and dropping the low-variance components may also reduce noise in the features.
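
To make the idea concrete, here is a minimal NumPy sketch of PCA on a synthetic feature table. It is an illustration only, not reLAISS's internal implementation: the helper `pca_reduce` and the synthetic data are assumptions made for this example, and in practice features are usually standardized before PCA.

```python
import numpy as np

def pca_reduce(features, num_components):
    """Project rows of `features` onto their top principal components (hypothetical helper)."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    projected = centered @ vt[:num_components].T
    return projected, explained[:num_components]

rng = np.random.default_rng(0)
# Synthetic stand-in for a feature table: 200 objects, 30 correlated features
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 30))
features = latent @ mixing + 0.05 * rng.normal(size=(200, 30))

reduced, explained = pca_reduce(features, num_components=5)
print(reduced.shape)
print(f"fraction of variance retained: {explained.sum():.3f}")
```

Because the synthetic features are built from only five latent directions, five components retain almost all of the variance. Real feature tables are rarely this clean, so the retained fraction is worth checking when choosing `num_pca_components`.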

In [None]:
# Initialize client with PCA
client = relaiss.ReLAISS()
client.load_reference(
    path_to_sfd_folder='./sfddata-master',
    use_pca=True,  # Enable PCA
    num_pca_components=20  # Keep 20 components
)

## 2. Creating Theorized Lightcurves

You can create simulated lightcurves for theoretical models and find similar real transients. This is useful for testing theoretical predictions or exploring parameter space.
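
As a sketch of what a model-driven input might look like, the block below builds a toy lightcurve with a smooth rise and decline in two passbands. The function `toy_magnitudes`, its parameters, and the `'g'` passband label are assumptions made for illustration; only the four column names and the `'R'` label come from this notebook.

```python
import numpy as np

def toy_magnitudes(t, t_peak, peak_mag, rise_days, decline_days):
    """Toy model: Gaussian rise in flux before peak, exponential decay after."""
    t = np.asarray(t, dtype=float)
    rise = np.exp(-((t - t_peak) ** 2) / (2.0 * rise_days**2))
    decline = np.exp(-(t - t_peak) / decline_days)
    relative_flux = np.where(t < t_peak, rise, decline)
    # Convert relative flux (1.0 at peak) to magnitudes; fainter means larger
    return peak_mag - 2.5 * np.log10(relative_flux)

rng = np.random.default_rng(42)
mjd = np.linspace(0.0, 100.0, 25)
columns = {"ant_mjd": [], "ant_mag": [], "ant_magerr": [], "ant_passband": []}
for band, peak_mag in (("g", 18.8), ("R", 19.0)):  # 'g' label is an assumption
    mag = toy_magnitudes(mjd, t_peak=20.0, peak_mag=peak_mag,
                         rise_days=8.0, decline_days=30.0)
    magerr = np.full(mjd.size, 0.1)
    columns["ant_mjd"].extend(mjd.tolist())
    # Add Gaussian scatter consistent with the quoted uncertainties
    columns["ant_mag"].extend((mag + rng.normal(0.0, magerr)).tolist())
    columns["ant_magerr"].extend(magerr.tolist())
    columns["ant_passband"].extend([band] * mjd.size)

print({name: len(values) for name, values in columns.items()})
```

Passing `columns` to `pd.DataFrame` gives a table with the same four columns used in this section's example cell.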

In [None]:
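# Create a theorized lightcurve (example).
# The magnitudes are placeholder noise around mag 20, not a physical model;
# substitute your own model's predictions.
rng = np.random.default_rng(0)  # Seeded so the example is reproducible
theorized_lightcurve_df = pd.DataFrame({
    'ant_mjd': np.linspace(0, 100, 50),
    'ant_mag': rng.normal(20, 1, 50),
    'ant_magerr': rng.uniform(0.1, 0.3, 50),
    'ant_passband': ['R'] * 50
})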

# Find neighbors using theorized lightcurve
neighbors_df = client.find_neighbors(
    ztf_object_id=None,  # No real transient
    theorized_lightcurve_df=theorized_lightcurve_df,
    n=5,
    plot=True,
    save_figures=True
)

## 3. Swapping Host Galaxies

You can replace a transient's host galaxy with the host of another object, identified by its ZTF ID, to see how the change affects the similarity search. This is useful for studying how host galaxy properties influence which transients are judged similar.
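
Conceptually, a host swap keeps the lightcurve features of one object and takes the host features from another. The sketch below shows that idea on plain dictionaries. The feature names, the `host_` prefix convention, and the helper `swap_host_features` are all hypothetical, chosen for illustration rather than taken from reLAISS.

```python
def swap_host_features(transient, donor, host_prefix="host_"):
    """Return a copy of `transient` whose host-prefixed features come from `donor`."""
    swapped = dict(transient)
    for name, value in donor.items():
        if name.startswith(host_prefix):
            swapped[name] = value
    return swapped

# Hypothetical feature vectors for two objects
transient = {"lc_peak_mag": 18.7, "lc_rise_days": 12.0,
             "host_color": 0.45, "host_offset": 1.2}
donor = {"lc_peak_mag": 19.9, "lc_rise_days": 20.0,
         "host_color": 0.80, "host_offset": 0.3}

query = swap_host_features(transient, donor)
print(query)
```

The resulting query vector describes a transient that does not exist in the data, which is what makes the comparison with the unswapped search informative.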

In [None]:
# Find neighbors with swapped host
neighbors_df = client.find_neighbors(
    ztf_object_id='ZTF21aaublej',
    host_ztf_id='ZTF21aakqjqv',  # Different host galaxy
    n=5,
    plot=True,
    save_figures=True
)

## 4. Setting Maximum Neighbor Distances

You can set a maximum distance threshold so that only neighbors within that distance of the query are returned. This filters out less similar objects, so the result may contain fewer than the `n` neighbors requested.
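
The effect of a distance cutoff can be sketched in a few lines of plain Python. The object names and distances below are made up for illustration, and `filter_by_distance` is a hypothetical helper, not a reLAISS function.

```python
def filter_by_distance(neighbors, max_dist):
    """Keep only (name, distance) pairs whose distance does not exceed max_dist."""
    return [(name, dist) for name, dist in neighbors if dist <= max_dist]

# Hypothetical search result, sorted by increasing distance
neighbors = [("obj_a", 1.8), ("obj_b", 3.1), ("obj_c", 4.9),
             ("obj_d", 5.4), ("obj_e", 7.2)]

kept = filter_by_distance(neighbors, max_dist=5.0)
print(f"requested {len(neighbors)}, kept {len(kept)}")
```

A suitable threshold depends on the scale of the distances in your own searches, so it helps to inspect a few unfiltered results first.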

In [None]:
# Find neighbors within distance threshold
neighbors_df = client.find_neighbors(
    ztf_object_id='ZTF21aaublej',
    n=10,  # Request more neighbors
    max_neighbor_dist=5.0,  # Maximum allowed distance
    plot=True,
    save_figures=True
)

## 5. Tweaking ANNOY Parameters

You can adjust the search parameters to trade search speed against accuracy. In ANNOY, `search_k` sets how many nodes are inspected during a query: larger values give more accurate results but take longer. Separately, `weight_lc_feats_factor` adjusts the importance of lightcurve features relative to the other features.
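
To see why a feature weight can change which neighbor comes first, consider the toy example below. It assumes the weight acts as a multiplicative scale on the lightcurve features before a Euclidean distance is computed. That is an assumption made for illustration; how reLAISS applies `weight_lc_feats_factor` internally is not shown in this notebook.

```python
import math

def weighted_distance(a, b, n_lc_feats, lc_weight):
    """Euclidean distance after scaling the first n_lc_feats differences by lc_weight."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        scale = lc_weight if i < n_lc_feats else 1.0
        total += (scale * (x - y)) ** 2
    return math.sqrt(total)

def rank(lc_weight):
    """Order the candidates from nearest to farthest for a given lightcurve weight."""
    return sorted(
        candidates,
        key=lambda name: weighted_distance(query, candidates[name], 1, lc_weight),
    )

# Each vector is [lightcurve feature, host feature]; values are made up
query = [0.0, 0.0]
candidates = {"A": [1.0, 3.0], "B": [2.5, 1.5]}

for weight in (1.0, 1.5):
    print(weight, rank(weight))
```

With equal weights, B is nearer because its host feature is closer to the query. Raising the lightcurve weight to 1.5 penalizes B's larger lightcurve difference and moves A to the front.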

In [None]:
# Find neighbors with custom ANNOY parameters
neighbors_df = client.find_neighbors(
    ztf_object_id='ZTF21aaublej',
    n=5,
    search_k=2000,  # More thorough search
    weight_lc_feats_factor=1.5,  # Weight lightcurve features more heavily
    plot=True,
    save_figures=True
)

## 6. Finding the Optimal Number of Neighbors

reLAISS can suggest how many neighbors to use by analyzing the distance curve, that is, how the distance to each neighbor grows as you move down the ranked list. This is useful when you are unsure how many neighbors are appropriate for your analysis.
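
One common way to read such a curve is to look for its "knee", the point where distances start rising sharply. The sketch below finds the point of the sorted distance curve that lies farthest below the straight line joining its endpoints. This is a generic heuristic shown for intuition; the method reLAISS uses for its suggestion may differ, and `suggest_neighbor_count` is a hypothetical helper.

```python
def suggest_neighbor_count(distances):
    """Return how many neighbors lie at or before the knee of a sorted distance curve."""
    n = len(distances)
    if n < 3:
        return n
    first, last = distances[0], distances[-1]
    best_index, best_gap = 0, 0.0
    for i, dist in enumerate(distances):
        # Height of the straight line between the endpoints at position i
        chord = first + (last - first) * i / (n - 1)
        gap = chord - dist
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index + 1

# Made-up distances: five tightly grouped neighbors, then a sharp rise
distances = [1.0, 1.1, 1.2, 1.3, 1.4, 3.0, 4.5, 6.0]
print(suggest_neighbor_count(distances))
```

For these made-up distances the knee falls at the fifth neighbor, just before the jump from 1.4 to 3.0.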

In [None]:
# Find optimal number of neighbors
neighbors_df = client.find_neighbors(
    ztf_object_id='ZTF21aaublej',
    n=20,  # Request more neighbors than needed
    suggest_neighbor_num=True,  # Enable neighbor number suggestion
    plot=True,  # Plot the distance curve
    save_figures=True
)

# The function returns a DataFrame with the neighbors AND suggests the optimal number
# The optimal number is printed to the console and shown in the plot
print("\nNearest Neighbors:")
print(neighbors_df)