# Jupyter Notebook for VO data discovery
I'll experiment here with the pyvo library and different methods of data discovery.

Let's start by discovering scanned photographic plates via the ObsCore table and bulk downloading them using datalink.


In [2]:
print("Hello World!")
import astropy
import pyvo

Hello World!


Connect to the GAVO TAP service

In [3]:
tap_url = "http://dc.g-vo.org/tap"
tap_service = pyvo.dal.TAPService(tap_url)
tap_service

TAPService(baseurl : 'http://dc.g-vo.org/tap', description : 'None')

Query the ObsCore table for all images containing our source

In [5]:
query = """
SELECT *
FROM ivoa.Obscore
WHERE dataproduct_type = 'image'
  AND CONTAINS(
        POINT('ICRS', 156.01124674021, 48.14753874736),
        s_region
      ) = 1
  AND NOT obs_collection LIKE '%RASS%'
ORDER BY t_min
"""
result = tap_service.search(query)
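
The query above hard-codes the source position. A minimal sketch of building the same ADQL text for an arbitrary position could look like this; the helper name `build_obscore_image_query` and its parameters are hypothetical and not part of pyvo.

```python
def build_obscore_image_query(ra_deg, dec_deg, exclude_collection_pattern=None):
    """Return an ADQL query for ObsCore images whose footprint contains a point.

    `exclude_collection_pattern` is an optional LIKE pattern, e.g. '%RASS%'.
    This helper is an illustration only; it is not a pyvo function.
    """
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError(f"ra_deg out of range: {ra_deg}")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError(f"dec_deg out of range: {dec_deg}")
    lines = [
        "SELECT *",
        "FROM ivoa.Obscore",
        "WHERE dataproduct_type = 'image'",
        f"  AND CONTAINS(POINT('ICRS', {ra_deg!r}, {dec_deg!r}), s_region) = 1",
    ]
    if exclude_collection_pattern is not None:
        # Double single quotes so the pattern stays a valid ADQL string literal
        escaped = exclude_collection_pattern.replace("'", "''")
        lines.append(f"  AND NOT obs_collection LIKE '{escaped}'")
    lines.append("ORDER BY t_min")
    return "\n".join(lines)


query_text = build_obscore_image_query(156.01124674021, 48.14753874736, "%RASS%")
print(query_text)
```

The returned string can be passed to `tap_service.search(...)` in the same way as the literal query.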

Convert the TAPResults into an astropy table.
Convert t_min and t_max from MJD into ISOT datetime strings and add them as new columns.
Save the table.

In [5]:
# from astropy.time import Time
#
# table = result.to_table()
# table["date_min"] = Time(table['t_min'], format="mjd").isot
# table["date_max"] = Time(table['t_max'], format="mjd").isot
# table.write("table.txt", format="ascii", overwrite=True)
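
For a quick sanity check of the MJD values without astropy, the conversions can be sketched with the standard library alone. This is a swapped-in simplification, not what `astropy.time.Time` does internally: it ignores time scales and leap seconds, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# MJD 0 corresponds to 1858-11-17T00:00:00
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_isot(mjd):
    """Convert a Modified Julian Date to an ISO 8601 string (millisecond precision)."""
    return (MJD_EPOCH + timedelta(days=float(mjd))).isoformat(timespec="milliseconds")


def mjd_to_jyear(mjd):
    """Convert a Modified Julian Date to a Julian epoch year (J2000.0 is MJD 51544.5)."""
    return 2000.0 + (float(mjd) - 51544.5) / 365.25


print(mjd_to_isot(51544.0))
print(mjd_to_jyear(51544.5))
```

The Julian year is the quantity used further down in this notebook to tag downloaded files.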




In [8]:
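from pathlib import Path

def build_filename(download_dir_name, content_type, url, suffix=''):
    """Build a local file path from a URL, a MIME type and an optional name suffix."""
    download_dir = Path(download_dir_name)
    download_dir.mkdir(parents=True, exist_ok=True)
    # Derive the file extension from the MIME type
    if content_type == "application/fits":
        ext = ".fits"
    elif content_type.startswith("image/"):
        ext = "." + content_type.split("/", 1)[1]
    else:
        ext = ""
    url_path = Path(url)
    # Put the suffix before the extension; appending it after an existing
    # extension would make with_suffix() drop it again
    stem = f"{url_path.stem}_{suffix}" if suffix else url_path.stem
    # Keep the extension from the URL when the MIME type gives none
    return download_dir / (stem + (ext or url_path.suffix))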
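
The filename logic relies on how pathlib splits the last component of a URL-like string. A short sketch of that behaviour, using two identifiers that appear in this notebook's outputs, might help when the names come out unexpectedly. `PurePosixPath` is used here only so that the result does not depend on the operating system.

```python
from pathlib import PurePosixPath

# URL-like strings taken from this notebook's outputs
plate_url = "https://www.plate-archive.org/files/DR3/scans/HAM-AG/AG03152_y.fits"
dasch_id = "ivo://org.gavo.dc/~?dasch/q/i00811"

plate = PurePosixPath(plate_url)
print(plate.name, plate.stem, plate.suffix)

dasch = PurePosixPath(dasch_id)
print(dasch.name, repr(dasch.suffix))

# Text appended after an existing extension becomes part of the suffix,
# so with_suffix() replaces the appended text as well
tagged = PurePosixPath(plate.name + "_1888")
print(tagged.suffix)
print(tagged.with_suffix(".fits"))

# Building the name from the stem keeps the tag
stem_tagged = PurePosixPath(f"{plate.stem}_1888").with_suffix(".fits")
print(stem_tagged)
```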



Download an image using datalink with the specified semantics

In [37]:
import requests

def download_by_semantics(datalink, suffix='', semantics='#this', download_dir_name='plates'):
    """Download the first link with the given semantics, unless the file already exists."""
    dl_list = list(datalink.bysemantics(semantics))
    if len(dl_list) == 0:
        raise ValueError(f"datalink has no rows with semantics {semantics}")
    link = dl_list[0]
    url = link["access_url"]
    content_type = link["content_type"].lower()

    filename = build_filename(download_dir_name=download_dir_name, content_type=content_type, url=url, suffix=suffix)
    if filename.exists():
        print(f"Skipping {filename.name} (already exists)")
        return

    print(f"Downloading {filename} ...")
    r = requests.get(url)
    r.raise_for_status()  # stop if error
    with open(filename, "wb") as f:
        f.write(r.content)
    print(f"{filename} saved")
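
Whole plates can be large (the datalink table further down lists a content_length of about 1 GB), and the "already exists" check treats any file with the right name as complete. A possible refinement, sketched here under the assumption that the data arrive as an iterable of byte chunks, is to write to a temporary file and rename it only at the end. The helper name `write_chunks` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_chunks(chunks, destination):
    """Write an iterable of byte chunks to `destination` via a temporary file.

    The temporary file is renamed only after all chunks are written, so an
    interrupted download leaves no partial file under the final name.
    Returns the number of bytes written.
    """
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        os.replace(tmp_name, destination)
    except BaseException:
        # Remove the partial file, then re-raise the original error
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return written


# With requests, the chunks could come from a streamed response, e.g.:
#   with requests.get(url, stream=True) as r:
#       r.raise_for_status()
#       write_chunks(r.iter_content(chunk_size=1 << 20), filename)
```

Streaming also avoids holding the whole response in memory, which `r.content` does.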


Apply *download_by_semantics* to previews. We try the #preview-image semantics first, expecting better quality, and fall back to #preview if that fails.

In [55]:
from astropy.time import Time

for res in list(result)[:5]:
    dl = res.getdatalink()
    # Use the observation year (from t_min) as a filename suffix
    jyear = Time(res["t_min"], format="mjd").jyear
    jyear_str = f'{jyear:.0f}'
    try:
        download_by_semantics(dl, semantics='#preview-image', suffix=jyear_str)
    except (requests.exceptions.HTTPError, ValueError) as e:
        print(e)
        print(f'Downloading {dl} by semantics #preview-image failed')
        try:
            download_by_semantics(dl, semantics='#preview', suffix=jyear_str)
        except (requests.exceptions.HTTPError, ValueError) as e:
            # ValueError is raised when there is no #preview row either
            print(e)
            print(f'Downloading {dl} by semantics #preview failed')


Skipping b01941_1888.jpeg (already exists)
Skipping i00601_1890.jpeg (already exists)
Skipping i00811_1890.jpeg (already exists)
Skipping i01126_1890.jpeg (already exists)
Skipping i02347_1891.jpeg (already exists)


Download #this (most likely a downgraded FITS image)

In [56]:
from astropy.time import Time

for res in list(result)[:3]:
    dl = res.getdatalink()
    jyear = Time(res["t_min"], format="mjd").jyear
    jyear_str = f'{jyear:.0f}'
    try:
        download_by_semantics(dl, semantics='#this', suffix=jyear_str)
    except (requests.exceptions.HTTPError, ValueError) as e:
        print(e)
        print(f'Downloading {dl} by semantics #this failed')


Skipping b01941_1888.fits (already exists)
Skipping i00601_1890.fits (already exists)
Skipping i00811_1890.fits (already exists)


Here we will touch on **SODA** functionality.
Quite often we don't need to download the **whole** image. Instead, we can cut out the FOI **on the server side**.

By the way, data centers may provide a downgraded (binned) FITS file under the #this semantics while hiding the full-resolution image under #coderived. That is reasonable from the data center's point of view: it saves resources on casual downloads. But when we search for transients or perform photometry, we do need a *high-quality* image, though usually not the *whole* plate.

Let's take a closer look at the first datalink table from our query results

In [6]:
res_first = list(result)[0]
dl = res_first.getdatalink()
t = dl.to_table()
t.show_in_notebook()
t

ID,access_url,service_def,error_message,description,semantics,content_type,content_length (byte),local_semantics,content_qualifier
object,object,object,object,object,object,object,int64,object,object
ivo://org.gavo.dc/~?applause/q/DR3/scans/HAM-AG/AG03152_y.fits,https://www.plate-archive.org/files/DR3/scans/HAM-AG/AG03152_y.fits,,,The main data product,#this,image/fits,1037645824,,
ivo://org.gavo.dc/~?applause/q/DR3/scans/HAM-AG/AG03152_y.fits,https://www.plate-archive.org/files/DR3/previews/HAM-AGH/AG03152_pre.jpg,,,A preview or thumbnail of the data,#preview,,--,,


We are lucky: this datalink contains a link with #proc semantics, and its **description** says that this is precisely what we need. Unfortunately, we are not able to choose the cutout size this time, as mentioned in the description, but this is not the case for all datalinks (indeed, some of them do not provide SODA functionality at all).

To explore SODA functionality, we need to know the URL of the endpoint for submitting SODA requests (where the methods live) and the input parameters. We can extract all of this from the datalink table metadata and the datalink table itself. Exploring the corresponding VOTable (XML) helps to figure out where these things live.

In [57]:
from astropy.io.votable import writeto

# Explore datalink table resources:
for resource in dl.votable.resources:
    for param in resource.params:
        print(param)
# Save the datalink VOTable to disk for inspection
writeto(dl.votable, "datalink.vot")


<PARAM ID="accessURL" arraysize="*" datatype="char" name="accessURL" ucd="meta.ref.url" value="http://dc.g-vo.org/dasch/q/dl/dlget"/>
<PARAM ID="standardID" arraysize="*" datatype="char" name="standardID" value="ivo://ivoa.net/std/soda#sync-1.0"/>


Take a look at the datalink. This way we can extract some of the parameters we see in the TOPCAT "Table Parameters" window (I hope there are special methods in pyvo to hide these under-the-hood things). First of all, we need to figure out the input parameters:

In [59]:
vot = dl.votable  # pyvo.dal.adhoc.DatalinkResults VOTable
input_params_tmp = {}
# Iterate over all resources
for res in vot.resources:
    # Iterate over all GROUPs in the resource
    for group_tmp in res.groups:
        if group_tmp.ID == "inputParams":
            for item_tmp in group_tmp.iter_fields_and_params():
                print(f'{item_tmp.name=}, {item_tmp.ucd=}, {item_tmp.value=}')
                print(f'{item_tmp.description=}')
                input_params_tmp[item_tmp.name] = item_tmp.value


item.name='ID', item.ucd='meta.id;meta.main', item.value='ivo://org.gavo.dc/~?dasch/q/i00811'
item.description='The publisher DID of the dataset of interest'
item.name='POS', item.ucd='phys.angArea;obs', item.value=''
item.description='SIAv2-compatible cutout specification'
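
To see where these values live in the XML itself, here is a sketch that parses a hand-written, simplified service descriptor with the standard library. The fragment is an assumption modelled on the outputs above (no XML namespace, most attributes omitted); it is not the actual document returned by the service. As far as I can tell, the DataLink standard identifies the group by `name="inputParams"`, while this notebook matches on `ID`, so the sketch accepts either.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment (illustration only, not real service output)
SERVICE_DESCRIPTOR = """
<RESOURCE type="meta" utype="adhoc:service">
  <PARAM name="accessURL" datatype="char" arraysize="*"
         value="http://dc.g-vo.org/dasch/q/dl/dlget"/>
  <PARAM name="standardID" datatype="char" arraysize="*"
         value="ivo://ivoa.net/std/soda#sync-1.0"/>
  <GROUP name="inputParams">
    <PARAM name="ID" ucd="meta.id;meta.main"
           value="ivo://org.gavo.dc/~?dasch/q/i00811"/>
    <PARAM name="POS" ucd="phys.angArea;obs" value=""/>
  </GROUP>
</RESOURCE>
"""


def parse_service_descriptor(xml_text):
    """Return (service params, input params) as two name -> value dicts."""
    resource = ET.fromstring(xml_text)
    # Direct PARAM children describe the service itself
    service = {p.get("name"): p.get("value") for p in resource.findall("PARAM")}
    inputs = {}
    for group in resource.findall("GROUP"):
        # Accept either attribute, since this notebook matches on ID
        if "inputParams" in (group.get("name"), group.get("ID")):
            for p in group.findall("PARAM"):
                inputs[p.get("name")] = p.get("value")
    return service, inputs


service_params, input_params_demo = parse_service_descriptor(SERVICE_DESCRIPTOR)
print(service_params)
print(input_params_demo)
```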


The SODA endpoint is here:

In [60]:
proc_link_tmp = next(dl.bysemantics("#proc"))

Try to call the SODA service

In [62]:
soda_url_tmp = proc_link_tmp.access_url
contents_type_tmp = proc_link_tmp.content_type
ra_deg_tmp = 156.01124674021
dec_deg_tmp = 48.14753874736
radius_deg_tmp = 0.5
params_tmp = {
    "ID": input_params_tmp["ID"],
    "POS": f'CIRCLE {ra_deg_tmp} {dec_deg_tmp} {radius_deg_tmp}'
}
r = requests.get(soda_url_tmp, params=params_tmp)
r

<Response [200]>
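
The request above amounts to a plain GET with two query parameters. A small sketch of how they are assembled and URL-encoded, using only the standard library, is shown below. The helper name `soda_circle_params` is hypothetical; the endpoint and dataset ID are the ones printed earlier in this notebook.

```python
from urllib.parse import urlencode


def soda_circle_params(dataset_id, ra_deg, dec_deg, radius_deg):
    """Build the ID and POS parameters for a circular cutout request."""
    if radius_deg <= 0:
        raise ValueError("radius_deg must be positive")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("dec_deg must be within [-90, 90]")
    return {
        "ID": dataset_id,
        # Circle syntax: centre RA, centre Dec and radius, all in degrees
        "POS": f"CIRCLE {ra_deg} {dec_deg} {radius_deg}",
    }


demo_params = soda_circle_params(
    "ivo://org.gavo.dc/~?dasch/q/i00811", 156.01124674021, 48.14753874736, 0.5
)
demo_url = "http://dc.g-vo.org/dasch/q/dl/dlget?" + urlencode(demo_params)
print(demo_url)
```

`requests.get(url, params=...)` performs the same encoding step, so the printed URL is mainly useful for debugging or for pasting into a browser.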

Check and save result:

In [None]:
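r.raise_for_status()
# Print the reported content type as a quick check of what the service returned
content_type_tmp = r.headers['Content-Type']
print(f'{content_type_tmp=}')
with open('tmp.tmp', "wb") as f:
    f.write(r.content)
print('file is ready')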
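
A status of 200 does not by itself prove that the payload is an image. One cheap extra check, sketched here as an optional step, is to look at the first bytes: a standard FITS file starts with the `SIMPLE` keyword card. The function name `looks_like_fits` is hypothetical and the sample payloads are made up for the demonstration.

```python
def looks_like_fits(payload):
    """Return True if `payload` (bytes) starts like a FITS primary header."""
    # The first 80-character card of a standard FITS file holds the SIMPLE keyword
    return payload[:9] == b"SIMPLE  ="


# Made-up payloads for demonstration
fits_like = b"SIMPLE  =                    T / conforms to FITS standard".ljust(80)
xml_like = b"<?xml version='1.0'?><VOTABLE/>"

print(looks_like_fits(fits_like))
print(looks_like_fits(xml_like))
```

In this notebook the check could be applied to `r.content` before writing the file.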

Putting it all together

**The Algorithm**:
1. First, try to download a cutout around the FOI.
2. If that fails, download the whole FITS image (the link with #this semantics).
3. If that fails, download #preview-image.
4. If that fails, download #preview.
5. If that fails, well, screw it.
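
The fallback chain can also be written as a flat loop over attempts instead of nested try blocks. The sketch below shows one way to do that with stand-in functions; `first_success` and the stubs are hypothetical, and in this notebook the attempts would wrap `download_cutout` and `download_by_semantics`.

```python
def first_success(attempts, recoverable=(Exception,)):
    """Run callables in order until one succeeds.

    `attempts` is a sequence of (label, callable) pairs. Returns the label of
    the first attempt that did not raise, or None if all of them failed.
    Exceptions not listed in `recoverable` propagate to the caller.
    """
    for label, attempt in attempts:
        try:
            attempt()
        except recoverable as exc:
            print(f"{label} failed: {exc!r}")
        else:
            return label
    return None


# Stand-ins for the real download functions
def failing_cutout():
    raise ValueError("no #proc link")


def working_download():
    return None  # pretend the download worked


winner = first_success([
    ("#proc cutout", failing_cutout),
    ("#this", working_download),
    ("#preview-image", failing_cutout),
])
print(winner)
```

With real downloads, `recoverable` would be the tuple `(requests.exceptions.HTTPError, StopIteration, ValueError)`, so that unexpected errors still surface.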

In [6]:
def download_cutout(dtl, ra_deg, dec_deg, radius_deg):
    """Request a circular SODA cutout via the #proc link and save it as a FITS file."""
    # next() raises StopIteration if the datalink has no #proc link
    proc_link = next(dtl.bysemantics("#proc"))
    input_params = {}
    for resource in dtl.votable.resources:
        # Iterate over all GROUPs in the resource
        for group in resource.groups:
            if group.ID == "inputParams":
                for item in group.iter_fields_and_params():
                    input_params[item.name] = item.value

    soda_url = proc_link.access_url

    params = {
        "ID": input_params["ID"],
        "POS": f'CIRCLE {ra_deg} {dec_deg} {radius_deg}'
    }

    suffix = f'cutout_{ra_deg:.0f}_{dec_deg:.0f}'
    print(suffix)
    cutout_name = build_filename(download_dir_name="plates", content_type="application/fits", url=input_params["ID"], suffix=suffix)
    # Check for an existing file before announcing a download
    if cutout_name.exists():
        print(f"Skipping {cutout_name.name} (already exists)")
        return
    print(f"Downloading {cutout_name} ...")
    r = requests.get(soda_url, params=params)
    r.raise_for_status()
    with open(cutout_name, "wb") as f:
        f.write(r.content)
    print(f'{cutout_name} is ready')

In [53]:
from astropy.time import Time

foi_ra_deg = 156.01124674021
foi_dec_deg = 48.14753874736
foi_radius_deg = 0.5

for res in result:
    dl = res.getdatalink()
    # Use the observation year (from t_min) as a filename suffix
    jyear = Time(res["t_min"], format="mjd").jyear
    jyear_str = f'{jyear:.0f}'

    # Try cutout:
    try:
        download_cutout(dl, foi_ra_deg, foi_dec_deg, foi_radius_deg)
    except (requests.exceptions.HTTPError, StopIteration, ValueError) as e:
        # Either there is no #proc link or the SODA request itself failed
        print(f"Cutout via #proc failed: {e!r}")
        # Well, try to get the whole image
        print("Trying to download whole fits image by #this semantics")
        try:
            download_by_semantics(dl, semantics='#this', suffix=jyear_str)
        except (requests.exceptions.HTTPError, StopIteration, ValueError) as e:
            print(e)
            print(f'Downloading {dl} by semantics #this failed')
            # We still have alternatives
            print('Trying to download preview in the best quality with #preview-image semantics')
            try:
                download_by_semantics(dl, semantics='#preview-image', suffix=jyear_str)
            except (requests.exceptions.HTTPError, StopIteration, ValueError) as e:
                print(e)
                print(f'Downloading {dl} by semantics #preview-image failed')
                # I doubt we will be happy with thumbnails, but it seems this is the last chance
                try:
                    download_by_semantics(dl, semantics='#preview', suffix=jyear_str)
                except (requests.exceptions.HTTPError, StopIteration, ValueError) as e:
                    print(e)
                    print(f'Downloading {dl} by semantics #preview failed')
                    # Well, to hell with it, let's move on


cutout_156_48
Skipping b01941_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i00601_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i00811_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i01126_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i02347_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i05998_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i05999_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i06043_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i06162_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i08266_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i08359_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i08382_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i08395_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i12127_cutout_156_48.fits (already exists)
cutout_156_48
Skipping i12306_cutout_156_48.fits (already exists)
cutout_156