### Extracting Solar Flare Events from 2014

This code filters a CSV file to include only solar flare events that occurred in the year 2014 and saves the result to a new file. It also ensures the output directory exists before writing the file.

---

**Key Steps:**

1. **Import Libraries**  
   `pandas` is used for data manipulation, and `os` is used to create directories if they don't exist.

2. **Ensure Output Directory Exists**  
   Before saving the filtered file, the script checks if the directory exists and creates it if needed.

3. **Load the CSV File**  
   The input file is read with date parsing enabled for the `event_starttime` and `event_endtime` columns.

4. **Filter by Year (2014)**  
   The dataset is filtered to retain only rows where the flare event started in the year 2014.

5. **Save the Filtered Data**  
   The filtered DataFrame is saved to the specified output path without row indices.

---

In [None]:
import pandas as pd
import os

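def extract_2014_events(csv_path, output_path):
    # Create the output directory only if the path includes one;
    # os.makedirs raises an error when given an empty string.
    output_dir = os.path.dirname(output_path)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)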
    
    # Load the data and parse the dates
    df = pd.read_csv(csv_path, parse_dates=['event_starttime', 'event_endtime'])

    # Filter for all of 2014
    events_2014 = df[df['event_starttime'].dt.year == 2014]

    # Save to new CSV
    events_2014.to_csv(output_path, index=False)
    print(f"Saved {len(events_2014)} events from 2014 to {output_path}")

# Example usage
extract_2014_events(
    "flare_summary_final/flare_hek_peaks_data.csv",
    "flare_summary_final/flare_selection/flare_hek_peaks_data_2014.csv"
)
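
The function above relies on `parse_dates` producing real datetime columns. If a column contains values pandas cannot parse, `read_csv` can leave it as plain strings, and the `.dt.year` accessor then fails. The sketch below shows one defensive alternative using `pd.to_datetime(..., errors="coerce")`. The three-row sample is hypothetical, and only the column names are taken from the real file.

```python
import io
import pandas as pd

# Hypothetical sample data; the last row has a deliberately broken timestamp.
sample_csv = io.StringIO(
    "event_starttime,fl_peakflux\n"
    "2013-12-31T23:50:00,1.2\n"
    "2014-01-01T00:05:00,3.4\n"
    "not-a-date,5.6\n"
)

df = pd.read_csv(sample_csv)

# errors="coerce" turns unparseable timestamps into NaT instead of raising.
df["event_starttime"] = pd.to_datetime(df["event_starttime"], errors="coerce")

# NaT has no year, so the comparison is False and the broken row is excluded.
events_2014 = df[df["event_starttime"].dt.year == 2014]
print(events_2014)
```

Only the row that starts on 2014-01-01 survives the filter. Whether silently dropping unparseable rows is acceptable depends on the dataset, so counting the `NaT` values before filtering may be worthwhile.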

### Extracting and Processing the Strongest Daily Flares with JSOC-Compatible Recordsets

This function identifies the strongest solar flare per day from a CSV dataset, filters by optional instrument criteria, computes observation windows with buffer times, and generates JSOC-compatible recordset strings for each event. It also ensures that output directories exist and prints the resulting recordsets to the screen.

---

**Key Features:**

1. **Directory Creation**  
   Ensures the target directory for the output file exists using `os.makedirs`.

2. **Data Loading**  
   Loads the input CSV and parses the `event_starttime` and `event_endtime` columns as datetime objects.

3. **Peak Flux Cleaning**  
   Converts `fl_peakflux` to numeric and removes invalid or missing values.

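4. **Flexible Filtering** *(Optional)*  
   Allows user-defined filtering by:
   - `obs_observatory` (e.g., "SDO"), via the `observatory` argument
   - `obs_instrument` (e.g., "AIA"), via the `instrument` argument
   - `obs_channelid` (e.g., "131", "94"), via the `wavelengths` list

   The separate `wavelength` argument does not filter rows; it only sets the channel written into each recordset string.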

5. **Daily Selection**  
   Groups flares by date and selects the one with the highest peak flux for each day.

6. **Buffer Times and JSOC-Compatible Windows**  
   Adds pre-flare and post-flare buffer windows (default = 178 minutes) and floors both boundaries to the whole minute (`dt.floor('min')`) for compatibility with JSOC data queries.

7. **Duration and Recordset String Construction**  
   Calculates duration for each event and builds a JSOC recordset string in the format:

   `aia.lev1_euv_12s[START_TIME/DURATIONm][WAVELENGTH]`

8. **Column Reordering**  
   Moves all key time columns to the front of the CSV for readability:

   `jsoc_start_time, pre_flare_start_time, event_starttime, event_peaktime, event_endtime, post_flare_end_time, jsoc_end_time`

9. **Output**  
   - Saves the result to CSV.
   - Prints each generated JSOC recordset to the screen.
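
The peak flux cleaning and daily selection steps listed above can be tried in isolation on a small in-memory table. This is a minimal sketch: the four rows are hypothetical values chosen for illustration, and only the column names `event_starttime` and `fl_peakflux` come from the dataset.

```python
import pandas as pd

# Hypothetical sample: two flares on 1 January, two on 2 January,
# one of which has an unusable flux value.
df = pd.DataFrame({
    "event_starttime": pd.to_datetime([
        "2014-01-01T02:00:00",
        "2014-01-01T18:30:00",
        "2014-01-02T07:15:00",
        "2014-01-02T09:40:00",
    ]),
    "fl_peakflux": ["120.5", "980.0", "n/a", "45.0"],
})

# Non-numeric flux values become NaN and are then dropped.
df["fl_peakflux"] = pd.to_numeric(df["fl_peakflux"], errors="coerce")
df = df.dropna(subset=["fl_peakflux"])

# Group by calendar date and keep the row with the highest flux per group.
df = df.assign(flare_date=df["event_starttime"].dt.date)
strongest = df.loc[df.groupby("flare_date")["fl_peakflux"].idxmax()]
print(strongest[["event_starttime", "fl_peakflux"]])
```

Because `idxmax` returns index labels, `df.loc[...]` recovers the complete rows rather than just the maximum values. Note that the row with `"n/a"` is removed before grouping, so the second day is represented by its only remaining flare.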

In [None]:
import pandas as pd
import os
from datetime import timedelta

def extract_strongest_flares_per_day(
    csv_path,
    output_path,
    observatory="any",
    instrument="any",
    wavelengths=None,
    buffer_minutes=178,
    wavelength="131"
):
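    # Ensure output directory exists (skipped when the path has no directory part,
    # because os.makedirs raises an error on an empty string)
    output_dir = os.path.dirname(output_path)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)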

    # Load data
    df = pd.read_csv(csv_path, parse_dates=['event_starttime', 'event_endtime'])

    # Ensure fl_peakflux is numeric
    df['fl_peakflux'] = pd.to_numeric(df['fl_peakflux'], errors='coerce')
    df = df.dropna(subset=['fl_peakflux'])

    # Apply filters
    if observatory.lower() != "any":
        df = df[df['obs_observatory'] == observatory]
    if instrument.lower() != "any":
        df = df[df['obs_instrument'] == instrument]
    if wavelengths:
        df = df[df['obs_channelid'].astype(str).isin([str(w) for w in wavelengths])]

    # Extract flare date (copy first so the new column is not written to a filtered view)
    df = df.copy()
    df['flare_date'] = df['event_starttime'].dt.date

    # Strongest per day
    strongest = df.loc[df.groupby('flare_date')['fl_peakflux'].idxmax()].copy()
    strongest = strongest.drop(columns=['flare_date'])

    # Compute JSOC-compatible buffer times
    strongest['pre_flare_start_time'] = strongest['event_starttime'] - timedelta(minutes=buffer_minutes)
    strongest['post_flare_end_time'] = strongest['event_endtime'] + timedelta(minutes=buffer_minutes)
    strongest['jsoc_start_time'] = strongest['pre_flare_start_time'].dt.floor('min')
    strongest['jsoc_end_time'] = strongest['post_flare_end_time'].dt.floor('min')

    # Calculate duration and recordsets
    strongest['duration'] = strongest['jsoc_end_time'] - strongest['jsoc_start_time']
    strongest['duration'] = strongest['duration'].dt.total_seconds() // 60  # <-- This updates the column to minutes
    strongest['recordset'] = strongest.apply(lambda row: (
        f"aia.lev1_euv_12s[{row['jsoc_start_time'].strftime('%Y-%m-%dT%H:%M:%S')}/"
        f"{int(row['duration'])}m][{wavelength}]"
    ), axis=1)


    # Ensure proper ordering of time columns at the front
    time_cols = [
        "jsoc_start_time",
        "pre_flare_start_time",
        "event_starttime",
        "event_peaktime" if "event_peaktime" in strongest.columns else None,
        "event_endtime",
        "post_flare_end_time",
        "jsoc_end_time"
    ]
    time_cols = [col for col in time_cols if col]  # Remove None if peaktime is missing
    other_cols = [col for col in strongest.columns if col not in time_cols]
    reordered = time_cols + other_cols
    strongest = strongest[reordered]

    # Save to CSV
    strongest.to_csv(output_path, index=False)
    print(f"Saved strongest flares per day with JSOC recordsets to: {output_path}")

    # Print recordsets
    print("\nJSOC Recordsets:\n")
    for r in strongest['recordset']:
        print(r)

# Example usage
extract_strongest_flares_per_day(
    csv_path="flare_summary_final/flare_selection/flare_hek_peaks_data_2014.csv",
    output_path="flare_summary_final/flare_selection/strongest_flares_2014_SDO_AIA_131.csv",
    observatory="SDO",
    instrument="AIA",
    wavelengths=["131"],
    buffer_minutes=178,
    wavelength="131"
)
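
The window arithmetic inside the function can be checked by hand for a single event. The sketch below repeats the same steps (subtract and add the buffer, floor to the minute, convert the difference to whole minutes) for one hypothetical flare. The start and end times are made up for illustration, and the channel `131` mirrors the example call above.

```python
import pandas as pd

# Hypothetical flare times chosen for illustration.
event_start = pd.Timestamp("2014-01-01T18:30:42")
event_end = pd.Timestamp("2014-01-01T18:50:10")
buffer = pd.Timedelta(minutes=178)

# Same operations as in the function: apply the buffer, then floor to the minute.
jsoc_start = (event_start - buffer).floor("min")  # 15:32:42 becomes 15:32:00
jsoc_end = (event_end + buffer).floor("min")      # 21:48:10 becomes 21:48:00

# Window length in whole minutes.
duration_min = int((jsoc_end - jsoc_start).total_seconds() // 60)

recordset = (
    f"aia.lev1_euv_12s[{jsoc_start.strftime('%Y-%m-%dT%H:%M:%S')}/"
    f"{duration_min}m][131]"
)
print(recordset)
```

Flooring the start time extends the pre-flare buffer slightly, while flooring the end time trims up to 59 seconds from the post-flare buffer (10 seconds in this case). If the full post-flare buffer matters, `ceil("min")` on the end time would be one possible alternative; that is a suggestion, not what the function above does.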