# REMIR Light Curve Extraction

Extract multi-band light curves from REMIR pipeline photometry output files.

## How it works

1. Scans all `*_photometry.txt` files inside the `reduced/` folder
2. Parses every header field (OBJECT, DATE-OBS, EXPTIME, filter, image type, zeropoint, RMS, quality flags, limiting magnitude, calibration star count, rejection fraction)
3. Cross-matches the source list against your target coordinates within a configurable radius
4. Reads all source columns including instrumental/catalog magnitudes and quality flags (isolated, crowded, border)
5. Classifies each epoch as **detection** or **upper limit** (limiting magnitude)
6. Outputs a time-sorted CSV and DataFrame ready for plotting

## Input modes

**Single night** — point `INPUT_FOLDER` directly to the pipeline output folder:

```python
INPUT_FOLDER = "/path/to/20260115/proc"
RECURSIVE    = False
```

Expected structure:

```
proc/
└── reduced/
    ├── OGLE-0204_334100_1_H_astro_photometry.txt
    ├── OGLE-0204_334100_2_J_astro_photometry.txt
    └── ...
```

This layout is produced by the usual pipeline run, launched from inside the night folder:

```bash
python remirpipe.py -i . -o proc -s -v -co   # optionally add: -t OBJECT_NAME
```

**Multiple nights** — point `INPUT_FOLDER` to the parent and enable recursion:

```python
INPUT_FOLDER = "/path/to/all_nights"
RECURSIVE    = True
```

Expected structure:

```
all_nights/
├── 20260115/proc/reduced/*.txt
├── 20260116/proc/reduced/*.txt
└── ...
```

Batch-process many nights first:

```bash
cd all_nights/
for dir in */; do
    echo "========== Processing: $dir =========="
    python remirpipe.py -i "$dir" -o "$dir"/proc -s -v -co   # optionally add: -t OBJECT_NAME
done
```

## Source matching

For each photometry file the script finds the **closest source** within `TOLERANCE_ARCSEC` of the target coordinates. All 11 columns from the photometry catalog are read:

| Column | Index | Description |
|--------|-------|-------------|
| `ra` | 0 | Right Ascension [deg] |
| `dec` | 1 | Declination [deg] |
| `x` | 2 | Pixel X position |
| `y` | 3 | Pixel Y position |
| `mag_inst` | 4 | Instrumental magnitude |
| `e_mag_inst` | 5 | Instrumental magnitude error |
| `mag_cat` | 6 | 2MASS catalog magnitude (`-99` if no match) |
| `e_mag_cat` | 7 | 2MASS catalog magnitude error (`-99` if no match) |
| `mag_cal` | 8 | Calibrated magnitude (`mag_inst + ZP`) |
| `e_mag_cal` | 9 | Calibrated magnitude error |
| `flag` | 10 | Quality flag |
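
The closest-source search relies on a flat-sky separation, which is adequate for a match radius of a few arcseconds. The sketch below shows one way to compute it; the function name `flat_sky_separation_arcsec` is hypothetical, and wrapping the RA difference is a precaution for targets near RA = 0°.

```python
import numpy as np


def flat_sky_separation_arcsec(ra, dec, target_ra, target_dec):
    """Approximate angular separation in arcsec; all inputs in degrees."""
    # Wrap the RA difference into [-180, 180) so targets near RA = 0 still match.
    d_ra = (ra - target_ra + 180.0) % 360.0 - 180.0
    # Scale the RA offset by cos(dec): meridians converge towards the poles.
    d_ra = d_ra * np.cos(np.radians(target_dec))
    return np.hypot(d_ra, dec - target_dec) * 3600.0


# A source displaced by 0.5 arcsec in declination from the example target.
sep = flat_sky_separation_arcsec(255.70578, -48.78975 + 0.5 / 3600.0,
                                 255.70578, -48.78975)
print(f"separation = {sep:.3f} arcsec")
```

A source is accepted only if this separation is below `TOLERANCE_ARCSEC`, and the smallest separation wins when several sources qualify.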

### Quality flags

| Flag | Label | Meaning |
|------|-------|---------|
| 0 | `isolated+central` | Best quality — isolated source, away from edges |
| 1 | `crowded` | Nearby neighbours within minimum separation |
| 2 | `border` | Close to image edge |
| 3 | `crowded+border` | Both crowded and near edge — worst quality |

The flag is stored in the output as `flag`, together with two boolean convenience columns:

- `is_crowded` — `True` if flag is 1 or 3
- `is_border` — `True` if flag is 2 or 3
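
The four flag values can be read as a two-bit mask (bit 0 = crowded, bit 1 = border). This is an interpretation of the table above, not something the pipeline documentation states, and `decode_flag` is a hypothetical helper:

```python
FLAG_LABELS = {
    0: "isolated+central",
    1: "crowded",
    2: "border",
    3: "crowded+border",
}


def decode_flag(flag):
    """Return (label, is_crowded, is_border) for a quality flag in 0..3."""
    is_crowded = bool(flag & 1)  # bit 0 is set for flags 1 and 3
    is_border = bool(flag & 2)   # bit 1 is set for flags 2 and 3
    return FLAG_LABELS[flag], is_crowded, is_border


for flag in range(4):
    print(flag, decode_flag(flag))
```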

## Image type

The pipeline tags each photometry file with the image type:

- **`COADD`** — co-added image (inverse-variance weighted mean of N aligned frames). The number of frames is stored in `ncoadd`.
- **`SINGLE`** — individual aligned frame (one dither position).

This lets you filter the light curve to use only coadds (deeper, better S/N) or include single frames (more time resolution).
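
As a minimal sketch, the same selection can be made after extraction on the output table. The rows below are made-up placeholders standing in for the real light curve:

```python
import pandas as pd

# Hypothetical rows; in practice this would be the extracted DataFrame.
lc = pd.DataFrame({
    "MJD": [61055.10, 61055.11, 61055.12],
    "image_type": ["SINGLE", "SINGLE", "COADD"],
    "ncoadd": [None, None, 5],
    "mag_cal": [14.21, 14.25, 14.23],
})

coadds = lc[lc["image_type"] == "COADD"]    # deeper, fewer points
singles = lc[lc["image_type"] == "SINGLE"]  # finer time sampling
print(len(coadds), "coadds,", len(singles), "single frames")
```

Setting `IMAGE_TYPES` in the configuration applies the equivalent cut before cross-matching, so unwanted files are never parsed past their header.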

## Detection vs upper limit logic

```
Source found within TOLERANCE_ARCSEC?
├── YES ─── e_mag_cal ≤ MAX_MAG_ERROR? ─── YES → DETECTION
│                                      └── NO  → UPPER LIMIT (Limiting_mag)
└── NO  ─── Limiting_mag available? ─── YES → UPPER LIMIT (Limiting_mag)
                                    └── NO  → SKIPPED (not in output)
```

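Upper limits have `is_detection = False`, `mag_cal = Limiting_mag`, `e_mag_cal = 0.0`. If a matched source fails the error cut and the header has no limiting magnitude, the row is still kept with `is_detection = False` and its measured `mag_cal` and `e_mag_cal`.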
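
The decision tree can be sketched as a small pure function. The name `classify_epoch` and the returned labels are hypothetical; the notebook itself stores the outcome in `is_detection` rather than in a string:

```python
def classify_epoch(e_mag_cal, limiting_mag, max_mag_error=0.33):
    """Classify one epoch.

    e_mag_cal is None when no source lies within the match radius;
    limiting_mag is None when the header carries no limiting magnitude.
    """
    matched = e_mag_cal is not None
    if matched and e_mag_cal <= max_mag_error:
        return "detection"
    if limiting_mag is not None:
        return "upper_limit"
    # No limiting magnitude available: a noisy match is kept as a
    # non-detection with its measured magnitude, an unmatched epoch is dropped.
    return "noisy_match" if matched else "skipped"


print(classify_epoch(0.05, 17.5))  # clean match
print(classify_epoch(0.50, 17.5))  # match with too large an error
print(classify_epoch(None, 17.5))  # nothing within the match radius
```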

## Output columns

### From the header

| Column | Description |
|--------|-------------|
| `OBJECT` | Target name |
| `DATE_OBS` | Observation timestamp (ISO format) |
| `EXPTIME` | Exposure time [s] |
| `Filter` | Photometric band (J / H / K) |
| `image_type` | `COADD` or `SINGLE` |
| `ncoadd` | Number of coadded frames (only for coadds) |
| `Zeropoint` | Photometric zeropoint [mag] |
| `ZP_err` | Zeropoint uncertainty [mag] |
| `RMS_residuals` | Zeropoint fit RMS [mag] |
| `RMS_quality` | VERY GOOD / GOOD / MEDIUM / POOR / VERY POOR |
| `ZP_check` | Zeropoint consistency flag |
| `Calibration_stars` | Number of 2MASS stars used |
| `Stars_rejected_frac` | Fraction rejected (e.g. 0.036 = 3.6%) |
| `Rejection_quality` | GOOD / MEDIUM / POOR |
| `Limiting_mag` | 3σ limiting magnitude [mag] |

### From the source match

| Column | Description |
|--------|-------------|
| `mag_cal` | Calibrated magnitude (or limiting mag for upper limits) |
| `e_mag_cal` | Magnitude error (0.0 for upper limits) |
| `mag_inst` | Instrumental magnitude |
| `e_mag_inst` | Instrumental magnitude error |
| `mag_cat` | 2MASS catalog magnitude (`-99` if unmatched) |
| `e_mag_cat` | 2MASS catalog error (`-99` if unmatched) |
| `x`, `y` | Pixel position on image |
| `flag` | Quality flag (0/1/2/3) |
| `is_crowded` | `True` if crowded (flag 1 or 3) |
| `is_border` | `True` if near border (flag 2 or 3) |
| `is_detection` | `True` = detection, `False` = upper limit |
| `separation_arcsec` | Distance to target [arcsec] |
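
These columns are enough to draw a light curve. The following is only a sketch with made-up rows and an arbitrary output name (`light_curve.png`); with real data the table would come from `pd.read_csv("photometry_results.csv")`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the extracted light curve.
lc = pd.DataFrame({
    "MJD": [61055.10, 61055.15, 61055.20, 61055.25],
    "Filter": ["J", "J", "H", "H"],
    "mag_cal": [14.21, 16.80, 13.65, 13.70],
    "e_mag_cal": [0.05, 0.0, 0.04, 0.06],
    "is_detection": [True, False, True, True],
})

fig, ax = plt.subplots()
for band, grp in lc.groupby("Filter"):
    det = grp[grp["is_detection"]]
    ul = grp[~grp["is_detection"]]
    # Detections: points with error bars, one legend entry per band.
    ax.errorbar(det["MJD"], det["mag_cal"], yerr=det["e_mag_cal"],
                fmt="o", label=band)
    # Upper limits: downward triangles placed at the limiting magnitude.
    ax.scatter(ul["MJD"], ul["mag_cal"], marker="v")

ax.invert_yaxis()  # smaller magnitude = brighter, so bright goes up
ax.set_xlabel("MJD")
ax.set_ylabel("Calibrated magnitude")
ax.legend()
fig.savefig("light_curve.png")
```

Filtering on `flag == 0` before plotting is a simple way to keep only isolated, central detections.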

### Computed time columns

| Column | Description |
|--------|-------------|
| `datetime` | pandas `Timestamp` parsed from `DATE_OBS` |
| `MJD` | Modified Julian Date |
| `MJD_rel` | Days since first epoch |
| `hours_rel` | Hours since first epoch |
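
The relation between these columns can be illustrated with the standard library alone. This sketch is a stand-in for the `astropy.time.Time` conversion used by the notebook: it ignores leap seconds, and the helper `iso_to_mjd` and the timestamps are hypothetical.

```python
from datetime import datetime

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 corresponds to 1858-11-17T00:00:00


def iso_to_mjd(date_obs):
    """Convert an ISO timestamp to MJD, ignoring leap seconds."""
    delta = datetime.fromisoformat(date_obs) - MJD_EPOCH
    return delta.total_seconds() / 86400.0


stamps = ["2026-01-15T03:00:00", "2026-01-15T09:00:00", "2026-01-16T03:00:00"]
mjd = [iso_to_mjd(s) for s in stamps]
mjd_rel = [m - min(mjd) for m in mjd]    # days since first epoch
hours_rel = [d * 24.0 for d in mjd_rel]  # hours since first epoch

for s, m, h in zip(stamps, mjd, hours_rel):
    print(f"{s}  MJD = {m:.4f}  hours_rel = {h:.2f}")
```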

In [None]:
"""
Extract a light curve from REMIR pipeline photometry files.

Searches *_photometry.txt in reduced/ folders, cross-matches each
source list against a target position, and returns a time-sorted DataFrame.

Set RECURSIVE = False for a single night folder, True for multi-night trees.
Optionally filters by OBJECT name(s) from the photometry header.
"""

import os
import glob

import numpy as np
import pandas as pd
from astropy.time import Time


# =============================================================================
# Configuration — edit these for your target
# =============================================================================

INPUT_FOLDER     = "path/to/your/folder"   # single night: the proc/ dir
                                            # multi night:  the parent dir
RECURSIVE        = False                    # False = single folder
                                            # True  = recurse into subfolders

TARGET_RA        = 255.70578               # target Right Ascension  [deg]
TARGET_DEC       = -48.78975               # target Declination      [deg]
TOLERANCE_ARCSEC = 1.0                     # cross-match radius      [arcsec]
MAX_MAG_ERROR    = 0.33                    # errors above this → upper limit
OUTPUT_CSV       = "photometry_results.csv"

# Filter by OBJECT keyword.
#   TARGET_OBJECTS = ["GX_339-4"]           → keep only this object
#   TARGET_OBJECTS = ["SN2026acd", "M31"]   → keep two objects
#   TARGET_OBJECTS = None                   → accept everything
TARGET_OBJECTS   = None

# Filter by image type.
#   IMAGE_TYPES = ["COADD"]                → coadds only (deeper, fewer points)
#   IMAGE_TYPES = ["SINGLE"]               → single frames only (more time resolution)
#   IMAGE_TYPES = ["COADD", "SINGLE"]      → both
#   IMAGE_TYPES = None                     → accept everything (same as both)
IMAGE_TYPES      = None


# =============================================================================
# Parser
# =============================================================================

def parse_photometry_file(filepath, target_ra, target_dec,
                          tolerance_arcsec, max_mag_error=0.33,
                          target_objects=None, image_types=None):
    """Parse one *_photometry.txt and return metadata + matched photometry.

    Returns None if the file cannot be read, has no OBJECT match (when
    *target_objects* is set), image type doesn't match (when *image_types*
    is set), or contains no usable photometry.
    """
    rec = dict(
        filename=os.path.basename(filepath),
        # ── header fields ──
        OBJECT=None, DATE_OBS=None, EXPTIME=None, Filter=None,
        image_type=None, ncoadd=None,
        Zeropoint=None, ZP_err=None, RMS_residuals=None, RMS_quality=None,
        ZP_check=None, Calibration_stars=None, Stars_rejected_frac=None,
        Rejection_quality=None, Limiting_mag=None,
        # ── source match fields ──
        x=None, y=None,
        mag_inst=None, e_mag_inst=None,
        mag_cat=None, e_mag_cat=None,
        mag_cal=None, e_mag_cal=None,
        flag=None, is_crowded=False, is_border=False,
        is_detection=False, separation_arcsec=None,
    )

    try:
        with open(filepath, "r") as fh:
            lines = fh.readlines()
    except OSError:
        return None

    # ── header metadata ──────────────────────────────────────────────────
    for line in lines:
        if not line.startswith("#"):
            break
        if "OBJECT:" in line:
            rec["OBJECT"] = line.split("OBJECT:")[1].strip()
        elif "DATE-OBS:" in line:
            rec["DATE_OBS"] = line.split("DATE-OBS:")[1].strip()
        elif "EXPTIME:" in line:
            try:
                rec["EXPTIME"] = float(
                    line.split("EXPTIME:")[1].replace("s", "").strip())
            except ValueError:
                pass
        elif "Filter:" in line:
            rec["Filter"] = line.split("Filter:")[1].strip()
        elif "Image type:" in line:
            itype = line.split("Image type:")[1].strip()
            if itype.startswith("COADD"):
                rec["image_type"] = "COADD"
                try:
                    rec["ncoadd"] = int(
                        itype.split("(")[1].split("frame")[0].strip())
                except (ValueError, IndexError):
                    pass
            else:
                rec["image_type"] = "SINGLE"
        elif "Zeropoint:" in line and "+/-" in line:
            try:
                zp, ze = line.split("Zeropoint:")[1].strip().split("+/-")
                rec["Zeropoint"] = float(zp.strip())
                rec["ZP_err"] = float(ze.replace("mag", "").strip())
            except (ValueError, IndexError):
                pass
        elif "RMS residuals:" in line:
            try:
                rec["RMS_residuals"] = float(
                    line.split("RMS residuals:")[1].replace("mag", "").strip())
            except ValueError:
                pass
        elif "RMS quality:" in line:
            rec["RMS_quality"] = line.split("RMS quality:")[1].strip()
        elif "ZP_check:" in line:
            rec["ZP_check"] = line.split("ZP_check:")[1].strip()
        elif "Calibration stars:" in line:
            try:
                rec["Calibration_stars"] = int(
                    line.split("Calibration stars:")[1].strip())
            except ValueError:
                pass
        elif "Stars rejected:" in line:
            try:
                pct = line.split("(")[1].split("%")[0]
                rec["Stars_rejected_frac"] = float(pct) / 100.0
            except (ValueError, IndexError):
                pass
        elif "Rejection quality:" in line:
            rec["Rejection_quality"] = line.split("Rejection quality:")[1].strip()
        elif "MagLim" in line:
            try:
                rec["Limiting_mag"] = float(
                    line.split(":")[1].replace("mag", "").strip())
            except (ValueError, IndexError):  # IndexError: line has no ":"
                pass

    # ── OBJECT filter (early exit) ───────────────────────────────────────
    if target_objects and rec["OBJECT"] not in target_objects:
        return None

    # ── Image type filter (early exit) ───────────────────────────────────
    if image_types and rec["image_type"] not in image_types:
        return None

    # ── find closest source within tolerance ─────────────────────────────
    cos_dec  = np.cos(np.radians(target_dec))
    best_sep = np.inf

    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.strip().split()
        if len(cols) < 11:
            continue
        try:
            ra  = float(cols[0])
            dec = float(cols[1])
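            # Wrap the RA offset into [-180, 180) deg so targets near RA = 0 match.
            d_ra = (ra - target_ra + 180.0) % 360.0 - 180.0
            sep = np.hypot(d_ra * cos_dec * 3600,
                           (dec - target_dec) * 3600)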
            if sep < tolerance_arcsec and sep < best_sep:
                best_sep                 = sep
                rec["x"]                 = float(cols[2])
                rec["y"]                 = float(cols[3])
                rec["mag_inst"]          = float(cols[4])
                rec["e_mag_inst"]        = float(cols[5])
                rec["mag_cat"]           = float(cols[6])
                rec["e_mag_cat"]         = float(cols[7])
                rec["mag_cal"]           = float(cols[8])
                rec["e_mag_cal"]         = float(cols[9])
                rec["flag"]              = int(cols[10])
                rec["is_crowded"]        = rec["flag"] in (1, 3)
                rec["is_border"]         = rec["flag"] in (2, 3)
                rec["is_detection"]      = True
                rec["separation_arcsec"] = sep
        except (ValueError, IndexError):
            continue

    # ── demote noisy detections to upper limits ──────────────────────────
    if rec["is_detection"] and rec["e_mag_cal"] > max_mag_error:
        rec["is_detection"] = False

    # ── fill non-detections with limiting magnitude ──────────────────────
    if not rec["is_detection"] and rec["Limiting_mag"] is not None:
        rec["mag_cal"]   = rec["Limiting_mag"]
        rec["e_mag_cal"] = 0.0

    return rec


# =============================================================================
# Collect photometry files
# =============================================================================

if RECURSIVE:
    pattern = os.path.join(INPUT_FOLDER, "**", "reduced", "*_photometry.txt")
    phot_files = sorted(glob.glob(pattern, recursive=True))
else:
    pattern = os.path.join(INPUT_FOLDER, "reduced", "*_photometry.txt")
    phot_files = sorted(glob.glob(pattern))

print(f"Target : RA = {TARGET_RA:.6f}°  DEC = {TARGET_DEC:.6f}°  "
      f"tolerance = {TOLERANCE_ARCSEC}″")
print(f"Mode   : {'recursive' if RECURSIVE else 'single folder'}")
if TARGET_OBJECTS:
    print(f"OBJECT filter : {TARGET_OBJECTS}")
if IMAGE_TYPES:
    print(f"Image type    : {IMAGE_TYPES}")
print(f"Found {len(phot_files)} photometry files\n")


# =============================================================================
# Parse and cross-match
# =============================================================================

rows = []
for fp in phot_files:
    rec = parse_photometry_file(fp, TARGET_RA, TARGET_DEC,
                                TOLERANCE_ARCSEC, MAX_MAG_ERROR,
                                TARGET_OBJECTS, IMAGE_TYPES)
    if rec and rec["DATE_OBS"] and rec["mag_cal"] is not None:
        rows.append(rec)

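df = pd.DataFrame(rows)
if df.empty:
    # Without this guard the column lookups below fail with a KeyError.
    raise RuntimeError("No usable photometry found: check INPUT_FOLDER, "
                       "the target coordinates and the OBJECT/image-type filters.")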


# =============================================================================
# Add time columns and save
# =============================================================================

df["datetime"]  = pd.to_datetime(df["DATE_OBS"])
df["MJD"]       = df["DATE_OBS"].apply(
    lambda x: Time(x, format="isot").mjd if x else np.nan)
df["MJD_rel"]   = df["MJD"] - df["MJD"].min()       # days since first epoch
df["hours_rel"] = df["MJD_rel"] * 24.0                # hours since first epoch

df = df.sort_values("MJD").reset_index(drop=True)
df.to_csv(OUTPUT_CSV, index=False)


# =============================================================================
# Summary
# =============================================================================

n_det = df["is_detection"].sum()
print(f"Detections   : {n_det}")
print(f"Upper limits : {len(df) - n_det}")
print(f"Objects      : {sorted(df['OBJECT'].dropna().unique())}")
print(f"Filters      : {sorted(df['Filter'].dropna().unique())}")

if "image_type" in df.columns and df["image_type"].notna().any():
    n_coadd  = (df["image_type"] == "COADD").sum()
    n_single = (df["image_type"] == "SINGLE").sum()
    print(f"Image types  : {n_coadd} coadds, {n_single} single frames")

print(f"Time span    : MJD {df['MJD'].min():.4f} – {df['MJD'].max():.4f}  "
      f"({df['MJD_rel'].max():.2f} days)")
print(f"Saved to     : {OUTPUT_CSV}\n")

# Flags summary
if "flag" in df.columns and df["flag"].notna().any():
    det = df[df["is_detection"]]
    n_clean  = (det["flag"] == 0).sum()
    n_crowd  = det["is_crowded"].sum()
    n_border = det["is_border"].sum()
    print("Detection quality flags:")
    print(f"  Isolated+central (flag=0)  : {n_clean}")
    print(f"  Crowded          (flag 1/3): {n_crowd}")
    print(f"  Border           (flag 2/3): {n_border}\n")

display(df[["datetime", "MJD", "MJD_rel", "OBJECT", "Filter",
            "image_type", "ncoadd", "EXPTIME",
            "mag_cal", "e_mag_cal", "is_detection", "flag",
            "is_crowded", "is_border", "RMS_quality"]])