# Solar Flare Analysis: Data Exploration

This notebook demonstrates how to load, visualize, and preprocess GOES XRS solar flare data.

## Setup and Imports

In [None]:
import os
import sys
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Add the project root to the path
project_root = os.path.abspath('..')
if project_root not in sys.path:
    sys.path.append(project_root)

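# Import project modules
from src.data_processing.data_loader import load_goes_data, preprocess_xrs_data, remove_background
from src.visualization.plotting import plot_xrs_time_series
# Later cells reference `settings` (DATA_DIR, BACKGROUND_PARAMS), which is not imported here.
# Import it from wherever the project defines it, for example:
# from config import settings  # hypothetical module path, adjust to the project layout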

## Loading GOES XRS Data

GOES XRS data is typically provided in NetCDF (.nc) format. Here we'll load a sample file.

### How to obtain GOES XRS data

You can download GOES XRS data from the NOAA NCEI repository:
- Visit: https://www.ncei.noaa.gov/data/goes-space-environment-monitor/access/
- Navigate to the directory for your desired GOES satellite (e.g., GOES-16)
- Find the XRS data files (e.g., avg1m, avg5m)

For this tutorial, we'll use a sample data file from the repository.

In [None]:
# Locate a sample GOES XRS data file
data_dir = settings.DATA_DIR
sample_files = [f for f in os.listdir(data_dir) if f.endswith('.nc')]

if not sample_files:
    print("No .nc files found. Please download GOES XRS data.")
    print("You can download GOES XRS data from: https://www.ncei.noaa.gov/data/goes-space-environment-monitor/access/")
else:
    print(f"Found {len(sample_files)} .nc files:")
    for i, file in enumerate(sample_files):
        print(f"  {i+1}. {file}")
    
    # Use the first file for demonstration
    data_file = os.path.join(data_dir, sample_files[0])
    print(f"\nUsing {data_file} for demonstration")

In [None]:
# Load the sample data file (data stays None if the previous cell found no file)
data = load_goes_data(data_file) if 'data_file' in locals() else None

# If no data could be loaded, show how to download a file
if data is None:
    print("\nTo download GOES XRS data using Python, you can use the following code:")
    print("""
    import requests
    import os
    
    # Example URL for GOES-16 XRS 1-minute average data for June 1, 2022
    # (illustrative; confirm the exact path in the NCEI directory listing before use)
    url = 'https://www.ncei.noaa.gov/data/goes-space-environment-monitor/access/avg1m/2022/06/goes16/csv/g16_xrs_avg1m_20220601_20220601.nc'
    
    # Download the file
    response = requests.get(url)
    if response.status_code == 200:
        os.makedirs(os.path.join('..', 'data'), exist_ok=True)
        with open(os.path.join('..', 'data', 'goes16_xrs_sample.nc'), 'wb') as f:
            f.write(response.content)
        print('Downloaded the file successfully')
    else:
        print(f'Failed to download: {response.status_code}')
    """)

## Exploring the Data Structure

Let's examine the structure of the GOES XRS data:

In [None]:
if data is not None:
    # Print dataset information (Dataset.info() writes to stdout and returns None)
    print("Dataset information:")
    data.info()
    
    print("\nDataset dimensions:")
    for dim, size in data.sizes.items():
        print(f"  {dim}: {size}")
    
    print("\nVariables:")
    for var in data.data_vars:
        print(f"  {var}: {data[var].shape} - {data[var].attrs.get('long_name', '')}")

## Preprocessing the Data

Now, let's preprocess the XRS data for both A and B channels:

In [None]:
if data is not None:
    # Process channel A (0.05-0.4 nm) data
    df_a = preprocess_xrs_data(data, channel='A', remove_bad_data=True, interpolate_gaps=True)
    
    # Process channel B (0.1-0.8 nm) data
    df_b = preprocess_xrs_data(data, channel='B', remove_bad_data=True, interpolate_gaps=True)
    
    # Display sample of preprocessed data
    print("Sample of preprocessed channel A data:")
    display(df_a.head())
    
    print("\nSample of preprocessed channel B data:")
    display(df_b.head())
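
The internals of `preprocess_xrs_data` are not shown in this notebook. As a rough illustration of what the `remove_bad_data` and `interpolate_gaps` options could involve, the sketch below masks non-physical samples and fills short gaps with pandas. The function name `clean_flux`, the `max_gap` parameter, and the rule that non-positive or non-finite values count as bad data are assumptions made for this example, not the project's actual implementation. The input values are synthetic.

```python
import numpy as np
import pandas as pd


def clean_flux(series: pd.Series, max_gap: int = 5) -> pd.Series:
    """Mask non-physical flux values and interpolate short gaps (illustrative only)."""
    cleaned = series.astype(float)
    # Assumption: X-ray flux must be finite and strictly positive,
    # so anything else (fill values, NaN, inf) is treated as bad data.
    bad = ~np.isfinite(cleaned) | (cleaned <= 0)
    cleaned = cleaned.mask(bad)
    # Interpolate linearly in time, filling at most `max_gap` consecutive
    # missing samples and only between valid neighbours.
    return cleaned.interpolate(method="time", limit=max_gap, limit_area="inside")


# Synthetic 1-minute series with one fill value and a two-sample gap
index = pd.date_range("2022-06-01", periods=6, freq="1min")
raw = pd.Series([1e-7, -9999.0, 3e-7, np.nan, np.nan, 6e-7], index=index)
cleaned = clean_flux(raw)
print(cleaned)
```

Because `limit=max_gap` caps the number of consecutive samples that are filled, long outages are not fully bridged by interpolated values.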

## Visualizing the Data

Let's visualize both A and B channels of the XRS data:

In [None]:
if data is not None and 'df_a' in locals() and 'df_b' in locals():
    # Plotting XRS-A data
    fig_a = plot_xrs_time_series(df_a, 'xrsa', title='GOES XRS A (0.05-0.4 nm) Data', log_scale=True)
    plt.tight_layout()
    plt.show()
    
    # Plotting XRS-B data
    fig_b = plot_xrs_time_series(df_b, 'xrsb', title='GOES XRS B (0.1-0.8 nm) Data', log_scale=True)
    plt.tight_layout()
    plt.show()

## Removing Background Flux

To isolate flare events, we subtract an estimate of the slowly varying background solar flux from the XRS-B signal:

In [None]:
if data is not None and 'df_b' in locals():
    # Remove background flux from B channel data
    df_b_no_bg = remove_background(
        df_b, 
        window_size=settings.BACKGROUND_PARAMS['window_size'],
        quantile=settings.BACKGROUND_PARAMS['quantile']
    )
    
    # Plot original data, background, and background-subtracted data
    plt.figure(figsize=(12, 8))
    
    plt.semilogy(df_b.index, df_b['xrsb'], 'b-', label='Original XRS-B Flux')
    plt.semilogy(df_b_no_bg.index, df_b_no_bg['xrsb_background'], 'g-', label='Background')
    plt.semilogy(df_b_no_bg.index, df_b_no_bg['xrsb_no_background'], 'r-', label='Background-subtracted')
    
    plt.grid(True, which='both', linestyle='--', alpha=0.5)
    plt.ylabel('Flux (W/m²)')
    plt.title('GOES XRS-B Background Removal')
    plt.legend()
    plt.tight_layout()
    plt.show()
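
The `window_size` and `quantile` parameters suggest a rolling-quantile background estimate, although the implementation of `remove_background` is not shown here. The following is a minimal sketch of that idea, not the project's code. It assumes a centered rolling window and clips negative residuals at zero. The helper name `rolling_quantile_background` is hypothetical, the output column names mirror those used in the cell above, and the data are synthetic.

```python
import numpy as np
import pandas as pd


def rolling_quantile_background(flux: pd.Series, window_size: int, quantile: float) -> pd.DataFrame:
    """Estimate a background as a rolling low quantile and subtract it (illustrative only)."""
    # A low quantile over a window longer than a flare tracks the quiet level,
    # because the flare samples occupy only the upper part of the distribution.
    background = flux.rolling(window=window_size, center=True, min_periods=1).quantile(quantile)
    # Assumption: residuals below the background are clipped to zero.
    residual = (flux - background).clip(lower=0)
    return pd.DataFrame({
        "xrsb": flux,
        "xrsb_background": background,
        "xrsb_no_background": residual,
    })


# Synthetic signal: quiet level of 1e-7 W/m² with a 5-minute bump at 1e-5 W/m²
index = pd.date_range("2022-06-01", periods=60, freq="1min")
flux = pd.Series(1e-7, index=index)
flux.iloc[28:33] = 1e-5

result = rolling_quantile_background(flux, window_size=21, quantile=0.1)
print(result.iloc[26:35])
```

The window needs to be long compared with the flare duration. If flare samples fill more than the chosen quantile fraction of the window, the estimated background rises with the flare and part of the event is subtracted away.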

## Summary

In this notebook, we've demonstrated:

1. How to load GOES XRS data from NetCDF files
2. How to preprocess the data, including handling bad data points and interpolating gaps
3. How to visualize the data for both XRS-A and XRS-B channels
4. How to remove the background solar flux to isolate flare events

In the next notebook, we'll explore traditional flare detection techniques.