## 1. Methods for Accessing HEASARC Data  
In this notebook, we will explore various methods to access and download data from the HEASARC (High Energy Astrophysics Science Archive Research Center) archive. While our focus will be on Swift satellite data, the approaches demonstrated here can be applied to data from any HEASARC-hosted mission. The methods covered in this notebook are adapted from the [sciserver_cookbooks](https://github.com/HEASARC/sciserver_cookbooks) GitHub repository.  

#### a) Online Access via Swift Master Catalog
The Swift Master Catalog is a web-based interface that allows users to browse and download Swift mission data. It provides a user-friendly way to search for observations, view metadata, and download datasets directly through the HEASARC website. This method is ideal for quick, manual queries or exploring the data interactively.  
Access the [Swift Master Catalog](https://heasarc.gsfc.nasa.gov/w3browse/swift/swiftmastr.html)

#### b) HEASARC TAP Service (Table Access Protocol)

The TAP service is a standardized Virtual Observatory protocol that enables programmatic access to astronomical catalogs and data tables. Through TAP, users can execute SQL-like queries to retrieve data from the HEASARC archives in a flexible and automated manner. This is particularly useful for large-scale data mining, batch queries, or integrating HEASARC data access into software pipelines.  
Learn more: [TAP Service](https://www.ivoa.net/documents/TAP/20190626/PR-TAP-1.1-20190626.html), [HEASARC API](https://heasarc.gsfc.nasa.gov/docs/archive/apis.html)

#### c) Xamin's API
Xamin is an advanced query and data mining tool provided by HEASARC that supports complex data searches and cross-matching across multiple catalogs. Its API allows users to submit queries programmatically and retrieve data in various formats. Xamin’s capabilities make it a powerful resource for in-depth data exploration and multi-wavelength studies.  
Access [Xamin](https://heasarc.gsfc.nasa.gov/xamin/)

## 2. Module Imports

Standard Library Modules:
- **`sys`**: Commonly used for handling command-line arguments, exiting programs, and interacting with the Python runtime environment.
- **`os`**: Offers a way to interact with the operating system. Useful for file and directory manipulation, accessing environment variables, and constructing portable file paths.
- **`glob`**: Enables Unix-style pathname pattern matching (e.g., `*.fits`). Often used to find all files in a directory that match a specific pattern.
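
As a quick illustration of how `os` and `glob` work together, the sketch below creates a temporary directory with a few empty placeholder files and lists only those matching `*.fits`. The helper name `find_fits_files` and the file names are made up for this example.

```python
import glob
import os
import tempfile

def find_fits_files(directory):
    """Return the sorted base names of all files ending in .fits in `directory`."""
    pattern = os.path.join(directory, "*.fits")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))

# Demonstrate on a throwaway directory with hypothetical file names
with tempfile.TemporaryDirectory() as tmpdir:
    for fname in ("image_a.fits", "image_b.fits", "notes.txt"):
        open(os.path.join(tmpdir, fname), "w").close()
    fits_files = find_fits_files(tmpdir)

print(fits_files)
```

Building the pattern with `os.path.join` keeps the code portable across operating systems.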

Non-Standard Modules:

> These modules are not part of the Python Standard Library and must be installed separately (e.g., using `pip install pyvo requests astropy numpy pandas`).
- **`pyvo`**: A client library for accessing Virtual Observatory (VO) services such as TAP, SIAP, and others. Ideal for querying astronomical catalogs programmatically using IVOA standards. [Learn more](https://pyvo.readthedocs.io/en/latest/)
- **`requests`**: An HTTP client library. Used here to send queries to the Xamin API and retrieve the response text. [Learn more](https://requests.readthedocs.io/)
- **`astropy`**: A comprehensive library for astronomy and astrophysics. Includes utilities for celestial coordinate transformations (`SkyCoord`), FITS file handling, time conversions, and more. [Learn more](https://www.astropy.org/)
- **`numpy`**: A core library for numerical computing in Python. Provides powerful tools for array operations, linear algebra, mathematical functions, and statistical analysis. [Learn more](https://numpy.org/)
- **`pandas`**: A powerful data analysis and manipulation library. Offers data structures like `DataFrame` for handling tabular data, making it ideal for data wrangling and exploration. [Learn more](https://pandas.pydata.org/)

In [2]:
import sys
import os
import glob

import pyvo
import requests
from astropy.coordinates import SkyCoord
from astropy.table import Table
from astropy.time import Time

import numpy as np
import pandas as pd

## 3. Swift Master Catalog  
Link: https://heasarc.gsfc.nasa.gov/w3browse/swift/swiftmastr.html

## 4. HEASARC TAP service

We can use the Virtual Observatory interfaces to the HEASARC to find the data we're interested in. Specifically, we want to look at the observation tables. So first we get a list of all the tables HEASARC serves and then look for the ones related to Swift.

### 4.1. Find the Tables
We begin by querying the Virtual Observatory (VO) service registry. HEASARC provides a TAP service that connects to the same backend used by the Xamin web interface. This means it accesses the same underlying database used in HEASARC's online catalog.

In [3]:
tap_services = pyvo.regsearch(servicetype='tap', keywords=['heasarc'])

Next, we query the first TAP service we found to retrieve a list of all the tables it offers at HEASARC.

In [4]:
heasarc_tables = tap_services[0].service.tables

The HEASARC service contains many tables. Here, we want to list only those tables related to the Swift mission.

In [5]:
for tablename in heasarc_tables.keys():
    if "swift" in tablename:
        print(" {:20s} {}".format(tablename, heasarc_tables[tablename].description))

 swift2sxps           Swift-XRT Point Source Catalog (2SXPS)
 swiftbalog           Swift BAT Instrument Log
 swiftft              Swift Serendipitous Survey in Deep XRT GRB Fields (SwiftFT)
 swiftgrb             Swift Gamma Ray Bursts Catalog
 swiftgrbba           Swift Gamma Ray Burst Compilation by Burst Advocate
 swiftguano           Swift Gamma-ray Urgent Archiver for Novel Opportunities (GUANO)
 swiftmastr           Swift Master Catalog
 swifttdrss           Swift TDRSS Messages
 swiftuvlog           Swift UVOT Instrument Log
 swiftxrlog           Swift XRT Instrument Log
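
If the same filtering is needed for other missions, it can be wrapped in a small helper. The sketch below is a minimal, offline version: `find_tables` is a hypothetical name, and `sample_tables` is a plain dictionary standing in for the real table listing (the entry `othertable` is invented).

```python
def find_tables(tables, keyword):
    """Return {name: description} for table names containing `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return {name: desc for name, desc in tables.items() if keyword in name.lower()}

# Stand-in for the real listing: table names mapped to description strings
sample_tables = {
    "swiftmastr": "Swift Master Catalog",
    "swiftxrlog": "Swift XRT Instrument Log",
    "othertable": "placeholder entry for illustration",
}

swift_tables = find_tables(sample_tables, "swift")
for name, desc in swift_tables.items():
    print(" {:20s} {}".format(name, desc))
```

With the `heasarc_tables` object from the cells above, each value is a table object rather than a string, so the description would be read from its `.description` attribute as in the loop shown earlier.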


Viewing Columns in the Swift Master Catalog Table (`swiftmastr`): This code lists all the columns in the swiftmastr table along with their descriptions, giving an overview of the available data fields.

In [6]:
for column in heasarc_tables['swiftmastr'].columns:
    print("{:20s} {}".format(column.name, column.description))

att_flag             Attitude Flag
orig_obsid           Observation Number as Originally Assigned (Orig_Target_ID + Orig_Obs_Segment)
af_insaa             Time in SAA Per Sequence, According to the As-Flown Timeline (s)
bat_no_masktag       Number of Sources with Mask-Tag Rate within This Observation
orig_target_id       Trigger Number as Originally Assigned
software_version     Software Version Used in the Pipeline
saa_fraction         Fraction of Observation Spent in the South Atlantic Anomaly (SAA)
"__z_ra_dec"         System unit vector column
processing_version   Version of Processing Script
xrt_expo_pc          Effective Exposure on Source of Photon Counting XRT Mode
bat_exposure         Effective Exposure on Source Considering All BAT Event and Survey Modes
start_time           Start Time of the Observation
uvot_expo_wh         Effective Exposure on Source of UVOT Filter WHITE
bii                  Galactic Latitude (Pointing Position)
uvot_exposure        Effective Exposure on S

### 4.2. Build a Search Query
Querying Swift Master Catalog for Observations of OJ 287:  
First, we get the sky coordinates of the object OJ 287 using its name. Then, we construct an ADQL query to retrieve relevant observation data from the `swiftmastr` table. This query retrieves observation details from the Swift Master Catalog for sources located within 0.1 degrees of OJ 287 that have positive XRT exposure times, sorted by the observation start time.

**ADQL**: ADQL (Astronomical Data Query Language) is a SQL-like language specifically designed for querying astronomical databases and Virtual Observatory services, enabling complex spatial and temporal queries on celestial data.  
See the [NAVO website](https://heasarc.gsfc.nasa.gov/vo/summary/python.html) for more information on how to use these services with Python and how to construct [ADQL](https://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html) queries for catalog searches. You can also find more detail on using these services in the [NASA Virtual Observatory workshop tutorials (NAVO)](https://nasa-navo.github.io/navo-workshop/).

In [7]:
# Get the coordinate for OJ 287
pos = SkyCoord.from_name("OJ 287")

query = """SELECT name, obsid, ra, dec, start_time, processing_date, xrt_exposure, uvot_exposure, bat_exposure, archive_date, target_id, xrt_expo_wt, xrt_expo_pc, uvot_expo_uu, uvot_expo_bb, uvot_expo_vv, uvot_expo_w1, uvot_expo_w2, uvot_expo_m2 
    FROM public.swiftmastr as cat 
    where 
    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1 
    and 
    cat.xrt_exposure > 0 order by cat.start_time
    """.format(pos.ra.deg, pos.dec.deg)

In [8]:
results = tap_services[0].search(query).to_table()
results

name,obsid,ra,dec,start_time,processing_date,xrt_exposure,uvot_exposure,bat_exposure,archive_date,target_id,xrt_expo_wt,xrt_expo_pc,uvot_expo_uu,uvot_expo_bb,uvot_expo_vv,uvot_expo_w1,uvot_expo_w2,uvot_expo_m2
,,deg,deg,d,d,s,s,s,d,,s,s,s,s,s,s,s,s
object,object,float64,float64,float64,float64,float64,float64,float64,int32,int32,float64,float64,float64,float64,float64,float64,float64,float64
OJ287,00035011001,133.69442,20.13596,53510.2687731481,56961.0,3793.15000,3796.45100,3958.00000,53521,35011,69.25500,3714.38900,0.00000,0.00000,238.20100,31.01400,3304.76800,222.46800
OJ287,00035011002,133.69491,20.11982,53516.5069560185,56961.0,131.41200,2786.06600,2959.00000,53527,35011,77.63800,52.44200,238.06800,170.66800,238.06800,475.06700,950.07800,714.11700
OJ287,00035011003,133.68470,20.09330,53528.5652893518,56979.0,1256.68700,0.00000,0.00000,53539,35011,3.04500,1253.64200,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000
OJ287,00035905001,133.73301,20.10491,54055.6730555556,57180.0,1995.27700,1979.07000,2012.00000,54066,35905,1.97200,1993.30500,0.00000,0.00000,0.00000,275.52600,851.77200,851.77200
OJ287,00035905002,133.66860,20.12275,54056.0145949074,57178.0,2374.77700,2693.83800,2879.00000,54067,35905,35.51300,2339.26400,229.09700,229.10600,229.10600,459.04200,920.13100,627.35600
OJ287,00035905003,133.72354,20.12954,54057.7507060185,57180.0,3176.70300,3210.54200,3326.00000,54068,35905,17.52500,3159.17800,0.00000,0.00000,0.00000,996.45200,1107.10400,1106.98600
OJ287,00030901001,133.70688,20.11405,54165.1368171296,57192.0,2678.01900,2579.34300,2761.00000,54176,30901,16.42100,2661.59800,222.12600,222.10400,222.09300,444.07500,890.10300,578.84200
OJ287,00030901002,133.69812,20.14249,54169.0416782407,57191.0,2017.08200,1923.52400,2068.00000,54180,30901,7.86900,2009.21300,168.09800,168.10900,168.07500,337.04500,671.07700,411.12000
OJ287,00030901003,133.69688,20.11187,54177.7336458333,57195.0,127.85800,123.72300,158.00000,54188,30901,10.51200,117.34600,0.00000,0.00000,0.00000,123.72300,0.00000,0.00000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
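
To reuse this kind of cone search for other targets or tables, the query string can be assembled by a small function. The following is a sketch under stated assumptions: `build_cone_query` and its parameters are hypothetical names, and the coordinates in the example are rounded values used only for illustration (the notebook resolves the position with `SkyCoord.from_name`).

```python
def build_cone_query(table, columns, ra_deg, dec_deg, radius_deg,
                     extra_conditions=(), order_by=None):
    """Assemble an ADQL cone-search query string for a table with ra/dec columns."""
    query = (
        f"SELECT {', '.join(columns)} "
        f"FROM public.{table} AS cat "
        f"WHERE CONTAINS(POINT('ICRS', cat.ra, cat.dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )
    # Append any additional constraints, e.g. a positive exposure time
    for condition in extra_conditions:
        query += f" AND {condition}"
    if order_by:
        query += f" ORDER BY cat.{order_by}"
    return query

# Illustrative call with rounded coordinates
example_query = build_cone_query(
    "swiftmastr", ["name", "obsid", "start_time"],
    133.70, 20.11, 0.1,
    extra_conditions=["cat.xrt_exposure > 0"],
    order_by="start_time",
)
print(example_query)
```

The resulting string can be passed to the TAP service's `search` method in the same way as the hand-written query.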


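The archive paths for Swift observations are grouped by the year and month of the observation, so we convert the start time (MJD) of one observation to a `YYYY_MM` string and build its download URL.

In [31]:
def mjd_to_year_month(mjd):
    """Convert an MJD to the 'YYYY_MM' string used in the archive directory names."""
    y, m, *_ = Time(mjd, format='mjd').iso.split('-')
    return f"{y}_{m}"

# Select the row for one observation and read its obsid and start time (MJD)
obsid_row = results[results['obsid'] == "00035011001"]
obsid_val = obsid_row['obsid'][0]
start_time = obsid_row['start_time'][0]
obs_month = mjd_to_year_month(start_time)

# Use a single slash after the host name
downlink = f"https://heasarc.gsfc.nasa.gov/FTP/swift/data/obs/{obs_month}/{obsid_val}/"
downlink

'https://heasarc.gsfc.nasa.gov/FTP/swift/data/obs/2005_05/00035011001/'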
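
The same conversion can be done with the standard library alone, which is handy in scripts that do not otherwise need `astropy`. This is a minimal sketch: `swift_obs_url` is a hypothetical helper, and the `YYYY_MM/obsid/` directory pattern is assumed from the example above rather than taken from archive documentation. The arithmetic ignores leap seconds, which does not affect the month.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 corresponds to 1858-11-17 00:00

def swift_obs_url(obsid, start_mjd,
                  base="https://heasarc.gsfc.nasa.gov/FTP/swift/data/obs"):
    """Build the archive directory URL for a Swift observation (assumed pattern)."""
    start = MJD_EPOCH + timedelta(days=float(start_mjd))
    return f"{base}/{start:%Y_%m}/{obsid}/"

# Values taken from the first row of the query results
url = swift_obs_url("00035011001", 53510.2687731481)
print(url)
```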

## 5. Xamin's API

An alternative method to access the data is to use the Xamin API specifically. [Xamin](https://heasarc.gsfc.nasa.gov/xamin/) is the main web portal for accessing HEASARC data, and it offers an API that can be used to query the same tables.  
The base URL for the Xamin query servlet is `https://heasarc.gsfc.nasa.gov/xamin/QueryServlet?`, which we query using the `requests` module.  

In [31]:
url = "https://heasarc.gsfc.nasa.gov/xamin/QueryServlet?products&"
params = {"table": "swiftmastr",
        "object": "OJ 287",
        "resultmax": "1000"}

result = requests.get(url, params)
# result.text.split('\n')[0:2]
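
To see roughly what URL is being requested, the parameters can be encoded by hand with the standard library. This sketch makes no network call; `build_xamin_url` is a hypothetical helper, it uses only the three parameters shown above, and the string it produces is close to, but not necessarily identical to, what `requests` assembles internally.

```python
from urllib.parse import urlencode

XAMIN_BASE = "https://heasarc.gsfc.nasa.gov/xamin/QueryServlet?products&"

def build_xamin_url(table, target, resultmax=1000):
    """Return a full query URL for the given table and target name."""
    params = {"table": table, "object": target, "resultmax": str(resultmax)}
    # urlencode escapes spaces and other special characters in the values
    return XAMIN_BASE + urlencode(params)

xamin_url = build_xamin_url("swiftmastr", "OJ 287")
print(xamin_url)
```

Printing the URL is a convenient way to check a query in a browser before running it in a loop.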

The raw text is converted into a clean pandas DataFrame.

In [36]:
lines = result.text.split('\n')
data_lines = [line for line in lines if line.strip() and not line.startswith('#')]
header = data_lines[0].split('|')
rows = [line.split('|') for line in data_lines[1:]]
df = pd.DataFrame(rows, columns=header)
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip() if col.dtype == "object" else col)
df['obsid'] = df['obsid'].ffill()
print(df.head(10))

                                                name        obsid  \
0                                     saa-cold-103-0  00074481018   
1  >  query?table=swiftbalog&constraint=obsid='00...  00074481018   
2  >  query?table=swiftxrlog&constraint=obsid='00...  00074481018   
3        >  /FTP/swift/data/obs/2007_07/00074481018/  00074481018   
4                                     saa-cold-103-0  00074481008   
5  >  query?table=swiftbalog&constraint=obsid='00...  00074481008   
6  >  query?table=swiftxrlog&constraint=obsid='00...  00074481008   
7        >  /FTP/swift/data/obs/2007_02/00074481008/  00074481008   
8                                     saa-cold-103-0  00074481007   
9  >  query?table=swiftbalog&constraint=obsid='00...  00074481007   

            ra          dec           start_time      processing_date  \
0  20 50 12.19  -89 59 26.2  2007-07-08T03:43:01  2015-07-16T00:00:00   
1         None         None                 None                 None   
2         None       
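
The parsing steps above can also be collected into a reusable function. The version below is a sketch using only the standard library and returning a list of dictionaries instead of a DataFrame; `parse_pipe_table` is a hypothetical name, and the sample text is invented to mimic the general shape of a pipe-delimited response, not real archive output.

```python
def parse_pipe_table(text):
    """Parse pipe-delimited text into row dictionaries, skipping blank and '#' lines."""
    lines = [ln for ln in text.split("\n") if ln.strip() and not ln.startswith("#")]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        values = [v.strip() for v in ln.split("|")]
        rows.append(dict(zip(header, values)))
    return rows

# Made-up sample: a comment line, a header, two data rows, and a blank line
sample = """# comment line
name | obsid | start_time
target-a | 00000000001 | 2007-07-08T03:43:01

target-b | 00000000002 | 2007-02-01T00:00:00
"""

rows = parse_pipe_table(sample)
print(rows[0]["obsid"])
```

Keeping `obsid` as a string preserves its leading zeros, which matter when the value is later used in a directory path.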

In [46]:
obsid_row = df[df['obsid'] == '00074481018']
obsid_row

Unnamed: 0,name,obsid,ra,dec,start_time,processing_date,xrt_exposure,uvot_exposure,bat_exposure,archive_date,...,__w_swiftmastr_link_swiftuvlog,__p_swiftmastr_swift_obs_root,__p_swiftmastr_swift_obs_uvot_filter,__p_swiftmastr_link_swiftxrlog_obsid,__p_swiftmastr_link_swiftbalog_obsid,__p_swiftmastr_link_swifttdrss_target_id,__w_swiftmastr_link_swifttdrss,__p_swiftmastr_point_bib_table,__p_swiftmastr_point_bib_id,__w_swiftmastr_point_bib
0,saa-cold-103-0,74481018,20 50 12.19,-89 59 26.2,2007-07-08T03:43:01,2015-07-16T00:00:00,247.11,0.0,490.0,2007-07-19,...,False,/FTP/swift/data/obs/2007_07/00074481018/,u*,74481018.0,74481018.0,74481.0,False,swiftmastr,74481.0,False
1,> query?table=swiftbalog&constraint=obsid='00...,74481018,,,,,,,,,...,,,,,,,,,,
2,> query?table=swiftxrlog&constraint=obsid='00...,74481018,,,,,,,,,...,,,,,,,,,,
3,> /FTP/swift/data/obs/2007_07/00074481018/,74481018,,,,,,,,,...,,,,,,,,,,


In [48]:
obsid_ftp = obsid_row['__p_swiftmastr_swift_obs_root']
obsid_ftp

0    /FTP/swift/data/obs/2007_07/00074481018/
1                                        None
2                                        None
3                                        None
Name: __p_swiftmastr_swift_obs_root, dtype: object

In [53]:
# The path already starts with '/', so no extra slash is added after the host name
downlink = f"https://heasarc.gsfc.nasa.gov{obsid_ftp.iloc[0]}"
downlink

'https://heasarc.gsfc.nasa.gov/FTP/swift/data/obs/2007_07/00074481018/'
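
Because the path column already begins with a slash and most of its entries are empty, one way to build the link robustly is to pick the first usable path and combine it with the host using `urllib.parse.urljoin`. This is a sketch only: `first_archive_url` is a hypothetical helper, and the list below copies the column values shown above.

```python
from urllib.parse import urljoin

HOST = "https://heasarc.gsfc.nasa.gov/"

def first_archive_url(paths):
    """Return the full URL for the first non-empty path in `paths`, or None."""
    for path in paths:
        # Skip missing entries, whether stored as None or as the string "None"
        if path and path != "None":
            return urljoin(HOST, path)
    return None

paths = ["/FTP/swift/data/obs/2007_07/00074481018/", None, None, None]
full_url = first_archive_url(paths)
print(full_url)
```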