<a href="https://colab.research.google.com/github/sofiapapadron/The-DESC-ELAsTiCC-Challenge/blob/main/Loading_and_Processing_ELAsTiCC_Datasets_from_GitHub.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **The DESC ELAsTiCC Challenge**

# **0. Business understanding**

About ELAsTiCC

The purpose of ELAsTiCC ("Extended LSST Astronomical Time-series Classification Challenge") is to spur the creation and testing of an end-to-end real-time pipeline for time-domain science. The challenge starts with a simulation of ~5 million detected events that includes ~50 million alerts. These alerts will be streamed from LSST to brokers, who will classify the events and send new alerts with classifications back to DESC. A talk about ELAsTiCC given at the LSSTC Enabling Science Broker Workshop in 2021 can be found on YouTube. Two posters on ELAsTiCC given at conferences can be found below on this page.

For discussion or questions about the challenge, use the #elasticc-comms channel on the DESC Slack.

The first ELAsTiCC campaign ran from September 2022 until early January 2023. Metrics and diagnostics from that campaign can be found on the ELAsTiCC page of the DESC TOM (login required).

The second ELAsTiCC campaign (dubbed ELAsTiCC2) ran from mid-November to mid-December 2023, streaming alerts at ~3× the rate of the first campaign. Diagnostics and some metrics from that campaign can be found on the ELAsTiCC2 page of the DESC TOM (login required).

There is a new GitHub repository for ELAsTiCC-related code and information: LSSTDESC/elasticc.



# **1. Import libraries**


In [2]:
%pip install requests wget astropy pandas




In [3]:
import gzip
import os
import shutil

import pandas as pd
import requests
import wget
from astropy.io import fits
from astropy.table import Table

# **2. Data loading**

In [4]:
# URL of the ELASTICC_TRAIN_AGN.txt file on GitHub (replace with your own GitHub URL).
# Note: the `token` query parameter is likely a temporary access token and may no longer be valid.
github_url = "https://raw.githubusercontent.com/sofiapapadron/The-DESC-ELAsTiCC-Challenge/main/Automated_URL_Extraction/subdirectory_links/ELASTICC_TRAIN_AGN.txt?token=GHSAT0AAAAAACWAYALQHD6AS6OMY4IK45OOZXCFDTQ"

# Send a GET request to fetch the file contents from GitHub
response = requests.get(github_url, timeout=30)

In [8]:
if response.status_code == 200:
    # Read the URLs from the file fetched from GitHub, skipping blank lines
    urls = [line.strip() for line in response.text.splitlines() if line.strip()]

    # Lists that collect the header (info) and photometry (phot) DataFrames
    info_dfs = []
    phot_dfs = []

    # Iterate over each URL
    for url in urls:
        if 'HEAD' in url:
            # Read the header file
            try:
                data = Table.read(url)
                df = data.to_pandas()
                info_dfs.append(df)
            except Exception as e:
                print(f"Error reading {url}: {e}")
        elif 'PHOT' in url:
            # Read the photometry file
            try:
                data = Table.read(url)
                df = data.to_pandas()
                phot_dfs.append(df)
            except Exception as e:
                print(f"Error reading {url}: {e}")

    # Concatenate all DataFrames into one for info and one for phot
    info = pd.concat(info_dfs, ignore_index=True) if info_dfs else pd.DataFrame()
    phot = pd.concat(phot_dfs, ignore_index=True) if phot_dfs else pd.DataFrame()

    # Save the DataFrames as CSV files
    info.to_csv('AGN_info.csv', index=False)
    phot.to_csv('AGN_phot.csv', index=False)

    print("Files 'AGN_info.csv' and 'AGN_phot.csv' created successfully.")

else:
    print(f"Error reading the file from GitHub: {response.status_code}")


Files 'AGN_info.csv' and 'AGN_phot.csv' created successfully.
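
The HEAD/PHOT branching in the cell above can be isolated into a small helper, which makes it easy to check without touching the network. The sketch below is only an illustration: `split_head_phot` is a hypothetical name, not part of the notebook, and it assumes that file names carry a `_HEAD.` or `_PHOT.` marker, as the NERSC file names in this notebook do. It inspects only the last path component, so a directory name containing `HEAD` would not cause a misclassification.

```python
import posixpath
from urllib.parse import urlparse


def split_head_phot(lines):
    """Group URLs into HEAD and PHOT lists based on the file name only.

    Hypothetical helper: assumes file names contain "_HEAD." or "_PHOT.".
    """
    head_urls, phot_urls = [], []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines in the listing
        # Look only at the last path component, not the whole URL
        file_name = posixpath.basename(urlparse(url).path)
        if "_HEAD." in file_name:
            head_urls.append(url)
        elif "_PHOT." in file_name:
            phot_urls.append(url)
    return head_urls, phot_urls


base = ("https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/"
        "TRAINING_SAMPLES/ELASTICC_TRAIN_AGN/")
listing = [
    base + "ELASTICC_TRAIN_NONIaMODEL0-0001_HEAD.FITS.gz",
    base + "ELASTICC_TRAIN_NONIaMODEL0-0001_PHOT.FITS.gz",
    "",  # blank line, ignored
]
head_urls, phot_urls = split_head_phot(listing)
print(len(head_urls), len(phot_urls))
```

With the two lists in hand, the `Table.read(url).to_pandas()` loop only needs to be written once and applied to each list.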


In [5]:

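# Check whether the request succeeded
if response.status_code == 200:
    urls = [line.strip() for line in response.text.splitlines() if line.strip()]
else:
    # Raise instead of calling exit(), which would shut down the notebook kernel
    raise RuntimeError(f'Error fetching the file from GitHub: {response.status_code}')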

In [None]:
# Create a folder to store the downloaded files
os.makedirs('fits_files', exist_ok=True)

# Download the files
for url in urls:
    print(f'Downloading {url}...')
    wget.download(url, out='fits_files/')

# Decompress the .gz files next to the originals
for file_name in os.listdir('fits_files'):
    if file_name.endswith('.gz'):
        with gzip.open(f'fits_files/{file_name}', 'rb') as f_in:
            with open(f'fits_files/{file_name[:-3]}', 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)

print("Process finished")

Downloading https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/TRAINING_SAMPLES/ELASTICC_TRAIN_AGN/ELASTICC_TRAIN_NONIaMODEL0-0001_HEAD.FITS.gz...
Downloading https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/TRAINING_SAMPLES/ELASTICC_TRAIN_AGN/ELASTICC_TRAIN_NONIaMODEL0-0001_PHOT.FITS.gz...
Downloading https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/TRAINING_SAMPLES/ELASTICC_TRAIN_AGN/ELASTICC_TRAIN_NONIaMODEL0-0002_HEAD.FITS.gz...
Downloading https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/TRAINING_SAMPLES/ELASTICC_TRAIN_AGN/ELASTICC_TRAIN_NONIaMODEL0-0002_PHOT.FITS.gz...
Downloading https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/TRAINING_SAMPLES/ELASTICC_TRAIN_AGN/ELASTICC_TRAIN_NONIaMODEL0-0003_HEAD.FITS.gz...
Downloading https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/TRAINING_SAMPLES/ELASTICC_TRAIN_AGN/ELASTICC_TRAIN_NONIaMODEL0-0003_PHOT.FITS.gz...
Downloading https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/TRAINI
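
Because the download cell is long-running, it may be rerun after an interruption. A minimal sketch of a decompression step that skips files already expanded is shown below. The helper name `gunzip_all` is hypothetical, and the demo writes throwaway bytes into a temporary folder rather than a real FITS file.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_all(folder):
    """Decompress every *.gz file in `folder`, skipping files already done."""
    written = []
    for gz_path in sorted(Path(folder).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        if out_path.exists():
            continue  # already decompressed on a previous run
        with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        written.append(out_path)
    return written


# Demo on a temporary folder with placeholder content (not a real FITS file)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "demo_HEAD.FITS.gz"
    with gzip.open(sample, "wb") as f:
        f.write(b"placeholder bytes")
    first = gunzip_all(tmp)   # decompresses demo_HEAD.FITS
    second = gunzip_all(tmp)  # nothing left to do
    print([p.name for p in first], second)
```

In the notebook this would be called as `gunzip_all('fits_files')` in place of the decompression loop.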

In [None]:
# Lists that collect the data
head_data = []
phot_data = []

# Read and process the FITS files (sorted for a reproducible order)
for file_name in sorted(os.listdir('fits_files')):
    if file_name.endswith('.FITS'):
        with fits.open(f'fits_files/{file_name}') as hdul:
            # Extract the table, assuming the data live in extension 1.
            # Going through astropy's Table converts the big-endian FITS
            # columns to native byte order, which pandas expects.
            df = Table(hdul[1].data).to_pandas()

            # Determine whether this is a HEAD or a PHOT file
            if 'HEAD' in file_name:
                head_data.append(df)
            elif 'PHOT' in file_name:
                phot_data.append(df)

print("Process finished")
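
The download log above suggests that files come in HEAD/PHOT pairs sharing a numbered prefix, while `os.listdir` returns names in arbitrary order. The sketch below pairs the names explicitly so that both lists are built in the same order. `pair_head_phot` is a hypothetical helper, and the `_HEAD.FITS` / `_PHOT.FITS` suffixes are assumed from the file names shown in this notebook.

```python
def pair_head_phot(file_names):
    """Return sorted (head, phot) name pairs that share the same prefix."""
    heads, phots = {}, {}
    for name in file_names:
        if name.endswith("_HEAD.FITS"):
            heads[name[: -len("_HEAD.FITS")]] = name
        elif name.endswith("_PHOT.FITS"):
            phots[name[: -len("_PHOT.FITS")]] = name
    # Keep only prefixes that have both files, in a reproducible order
    return [(heads[p], phots[p]) for p in sorted(heads) if p in phots]


names = [
    "ELASTICC_TRAIN_NONIaMODEL0-0002_PHOT.FITS",
    "ELASTICC_TRAIN_NONIaMODEL0-0001_HEAD.FITS",
    "ELASTICC_TRAIN_NONIaMODEL0-0002_HEAD.FITS",
    "ELASTICC_TRAIN_NONIaMODEL0-0001_PHOT.FITS",
    "ELASTICC_TRAIN_NONIaMODEL0-0003_HEAD.FITS",  # no PHOT partner in this list
]
pairs = pair_head_phot(names)
for head_name, phot_name in pairs:
    print(head_name, "<->", phot_name)
```

Unpaired files, such as one left behind by an interrupted download, are dropped rather than silently mixed into the output.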

In [None]:
# Concatenate the data
head_dataset = pd.concat(head_data, ignore_index=True)
phot_dataset = pd.concat(phot_data, ignore_index=True)

# Save the datasets as CSV files
head_dataset.to_csv('head_data.csv', index=False)
phot_dataset.to_csv('phot_data.csv', index=False)

print("Files processed and saved as head_data.csv and phot_data.csv.")
