<h1>Virtual Observatory: Overview</h1>

The **Virtual Observatory** (VO) is the vision that astronomical datasets and other resources should work as a seamless whole. Many projects and data centres worldwide are working towards this goal. The International Virtual Observatory Alliance (IVOA) is an organisation that debates and agrees the technical standards needed to make the VO possible. It also acts as a focus for VO aspirations, a framework for discussing and sharing VO ideas and technology, and a body for promoting and publicising the VO. (http://www.ivoa.net/)


Useful for:
* browsing images,
* searching catalogs,
* finding available data for an object.

The VO can be seen as a **kind of club of data services that all follow the same rules**. Tools and web portals that understand these rules can fetch data from these services and know what to do with the data when they get it. The advantage of standardised services is that **you don't have to go to lots of different web pages and learn a different interface for each one**.

<h2>Standard data formats</h2>

VO services are expected to return standard data formats, so that tools can make sensible use of the data. Generally this means either industry-standard formats such as JPG, or well-known astronomical formats such as FITS. However, you will also come across a specialised VO format for tables of information, known as **VOTable**. All VO tools know how to deal with VOTables, so you don't need to understand their structure; the main advantage is that VOTable carries more flexible descriptive metadata than, for example, CSV files or FITS tables.

Another VO improvement for data tables is that any table column has an associated "**Universal Content Descriptor**" or **UCD**, which tells software using the table what kind of quantity is in that column. For example, a table column named "alpha" would be ambiguous on its own, but if its UCD is given as "pos.eq.ra" then the software knows that it holds a Right Ascension.
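
To make the role of UCDs concrete, here is a minimal sketch that reads column metadata from a tiny hand-written VOTable using only the Python standard library. The table content, the column names and the variable names are invented for illustration; in practice a VO tool or library does this parsing for you.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written VOTable (illustrative content, not from a real service).
VOTABLE = """
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="alpha" datatype="double" unit="deg" ucd="pos.eq.ra"/>
      <FIELD name="delta" datatype="double" unit="deg" ucd="pos.eq.dec"/>
      <DATA><TABLEDATA>
        <TR><TD>161.265</TD><TD>-59.684</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

NS = {"v": "http://www.ivoa.net/xml/VOTable/v1.3"}
root = ET.fromstring(VOTABLE.strip())

# Each FIELD element describes one column: its name, unit and UCD.
fields = root.findall(".//v:FIELD", NS)
ucd_by_column = {f.get("name"): f.get("ucd") for f in fields}
unit_by_column = {f.get("name"): f.get("unit") for f in fields}

for name, ucd in ucd_by_column.items():
    print(f"{name}: ucd={ucd}, unit={unit_by_column[name]}")
```

The column name "alpha" says little by itself, but the attached UCD lets software identify it as a right ascension without guessing.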

Main Data Services:

* **Cone search** services offer the simplest access to astronomical catalogues. The input is a sky position and a radius. The return is the subset of the catalogue within that radius.

* **Table Access Protocol** (**TAP**) services offer more flexible access to data tables, along the lines astronomers have become used to in making queries to databases like those of SDSS or UKIDSS. The input is a query in Astronomical Data Query Language (ADQL -- http://docs.g-vo.org/adql-gaia/html/), which is basically a standardised version of SQL. The return is a data table.  

* **Simple Image Access Protocol** (**SIAP**) services offer access to pixel data. The input is a position and a size. If the SIAP service is a "cut-out" service, the return will be an image centred at the requested position, with the requested size. If it is an "Atlas" service, which holds a collection of standard-sized data frames, then the size is used to look for frames centred within that distance of the requested position, and the whole data frame(s) are returned.

* **Simple Spectral Access Protocol** (**SSAP**) services provide access to spectra. The input is a position and size. Like the "atlas" version of image access, the return will be any spectra whose target positions are within the stated distance of the requested position.
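
At the protocol level, a cone search is an HTTP GET request that carries the position and radius in decimal degrees as the RA, DEC and SR parameters. The sketch below only builds such a URL and does not contact any service. The helper name `cone_search_url` is hypothetical, the coordinates are approximate values for Eta Carinae, and the base URL is simply one cone search address used as an example.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone search request URL (RA, DEC, SR in decimal degrees)."""
    params = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{params}"


# Example service address; approximate ICRS position of Eta Carinae.
BASE_URL = "http://dc.zah.uni-heidelberg.de/arihip/q/cone/scs.xml"
url = cone_search_url(BASE_URL, 161.265, -59.684, 0.5)
print(url)
```

A VO-aware tool sends a request of this kind and receives a VOTable in return, which is why the same client code can talk to any service that follows the standard.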


<h2>VO-based Applications</h2>

* **TOPCAT - Tool for OPerations on Catalogues And Tables** (http://www.star.bris.ac.uk/~mbt/topcat/): interactive graphical viewer and editor for tabular data. Useful for the analysis and manipulation of source catalogues and other tables.
* **STILTS - Starlink Tables Infrastructure Library Tool Set** (http://www.star.bris.ac.uk/~mbt/stilts/): the command-line counterpart of the GUI table analysis tool TOPCAT.
* **Aladin** (https://aladin.u-strasbg.fr/aladin.gml): interactive sky atlas to visualize digitized astronomical images or full surveys, superimpose entries from astronomical catalogues or databases, and interactively access related data and information from the Simbad database, the VizieR service and other archives for all known astronomical objects in the field.
* **VOSA - VO Sed Analyzer** (http://svo2.cab.inta-csic.es/theory/vosa/): tool to query photometric catalogs, read user photometry tables, calculate synthetic photometry, model the observed SED, generate HR diagrams, and estimate the mass and age of the sources.

(FYI: interesting tutorials available at this link: https://www.asterics2020.eu/dokuwiki/doku.php?id=open:wp4:school4:program)




<h1>PyVO</h1>

PyVO lets you find and retrieve astronomical data available from archives that support standard IVOA virtual observatory service protocols. PyVO is built on top of the widely used Astropy package (Astropy Collaboration et al., 2013), an integrated set of astronomically-oriented modules.
This allows users to discover and download data and then process and analyze it with the robust capabilities of Astropy.
PyVO is installable via pip.



<h2>Requirements</h2>
* numpy (> 1.4.0)
* astropy
* requests


<h2>Getting Started</h2>


Initialization:



In [93]:
%matplotlib inline
import pyvo

In general, we should define a **service**, then perform a **query** with parameters specific to the service type, and store the **results**.


<h2>Services</h2>
There are five types of services available, with different purposes but a similar interface.


<h3>Table Access Protocol</h3>
Unlike the other services, this one works with tables that are queried in an SQL-like language called ADQL instead of through predefined search constraints.





In [94]:
service = pyvo.dal.TAPService("http://simbad.u-strasbg.fr:80/simbad/sim-tap")

In [95]:
resultset = service.search("SELECT * FROM basic")



In [96]:
resultset


<Table masked=True length=50000>
    coo_bibcode     coo_err_angle coo_err_maj ... sp_type update_date   vlsr 
                         deg          mas     ...                            
       object           int16       float32   ...  object    object   float64
------------------- ------------- ----------- ... ------- ----------- -------
2018yCat.1345....0G            90      0.0283 ...          2018-06-29      --
2018yCat.1345....0G            90      0.0255 ...          2018-06-29      --
2018yCat.1345....0G            90      0.0295 ...          2018-06-29      --
2018yCat.1345....0G            90      0.0825 ...          2018-06-30      --
2003yCat.2246....0C            83        80.0 ...          2011-06-27      --
2018yCat.1345....0G            90      0.0259 ...          2018-06-29      --
2018yCat.1345....0G            90       0.027 ...          2018-06-29      --
2018yCat.1345....0G            90      0.0233 ...          2018-06-29      --
2018yCat.1345....0G            

<h3>Simple Image Access</h3>

Basic queries are done with the _pos_ and _size_ parameters described in Astrometrical parameters, where size gives the extent of the rectangular region centred on pos.

In [97]:
from astropy.coordinates import SkyCoord 
from astropy.units import Quantity

pos = SkyCoord.from_name('Eta Carinae')
size = Quantity(0.5, unit="deg")
sia_service = pyvo.dal.SIAService("http://dc.zah.uni-heidelberg.de/hppunion/q/im/siap.xml")
sia_results = sia_service.search(pos=pos, size=size)
sia_results



<Table masked=True length=8>
                                     accref                                      ...
                                                                                 ...
                                     object                                      ...
-------------------------------------------------------------------------------- ...
http://dc.zah.uni-heidelberg.de/getproduct/boydende/data/fits/HAR081_006454.fits ...
http://dc.zah.uni-heidelberg.de/getproduct/boydende/data/fits/HAR081_006478.fits ...
http://dc.zah.uni-heidelberg.de/getproduct/boydende/data/fits/HAR081_006479.fits ...
http://dc.zah.uni-heidelberg.de/getproduct/boydende/data/fits/HAR081_006481.fits ...
http://dc.zah.uni-heidelberg.de/getproduct/boydende/data/fits/HAR081_006498.fits ...
http://dc.zah.uni-heidelberg.de/getproduct/boydende/data/fits/HAR081_006507.fits ...
http://dc.zah.uni-heidelberg.de/getproduct/boydende/data/fits/HAR081_006510.fits ...
http://dc.zah.uni-heidelberg.de/getp

It is possible to select the format that we want to receive, either graphics (JPG, PNG, GIF) or metadata.
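
In version 1 of the Simple Image Access protocol this choice is carried by the FORMAT request parameter, alongside POS and SIZE. The following sketch assembles such a request URL without sending it. `sia_request_url` is a hypothetical helper, the coordinates are approximate values for Eta Carinae, and which formats a given service actually supports should be checked against the service itself.

```python
from urllib.parse import urlencode


def sia_request_url(base_url, ra_deg, dec_deg, size_deg, image_format="ALL"):
    """Build a SIA v1 style request URL with POS, SIZE and FORMAT."""
    params = {
        "POS": f"{ra_deg},{dec_deg}",  # position as "ra,dec" in degrees
        "SIZE": size_deg,              # region size in degrees
        "FORMAT": image_format,        # e.g. ALL, GRAPHIC, METADATA, image/fits
    }
    # Keep "," and "/" readable instead of percent-encoding them.
    query = urlencode(params, safe=",/")
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


SIA_URL = "http://dc.zah.uni-heidelberg.de/hppunion/q/im/siap.xml"
graphic_url = sia_request_url(SIA_URL, 161.265, -59.684, 0.5, "GRAPHIC")
fits_url = sia_request_url(SIA_URL, 161.265, -59.684, 0.5, "image/fits")
print(graphic_url)
print(fits_url)
```

Asking for GRAPHIC restricts the answer to display formats such as JPG, PNG or GIF, while METADATA asks the service to describe itself rather than return images.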




<h3>Simple Spectrum Access</h3>

Access to (one-dimensional) spectra resembles image access, with some subtle differences:

The size parameter is called diameter here, and hence the search region is always circular with pos as center:

In [98]:
ssa_service = pyvo.dal.SSAService("http://www.isdc.unige.ch/vo-services/lc")
ssa_results = ssa_service.search(pos=pos, diameter=size)
ssa_results



<Table masked=True length=5>
                Title                 Instrument ... Bandpass  Access format  
                                                 ...                          
                object                  object   ...  object       object     
------------------------------------- ---------- ... -------- ----------------
     V band OMC magnitude light Curve        omc ... INTEGRAL application/fits
17.8-80.0 keV ISGRI count light curve      isgri ... INTEGRAL application/fits
 80.0-250 keV ISGRI count light curve      isgri ... INTEGRAL application/fits
  3.0-10.2 keV JEMX count light curve       jemx ... INTEGRAL application/fits
 10.2-34.9 keV JEMX count light curve       jemx ... INTEGRAL application/fits
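
The circular search region can be pictured with a small geometric check: a target falls inside the region when its angular separation from pos is at most half the diameter. The sketch below uses the haversine formula with plain degrees. The function names are hypothetical and a real service applies its own geometry, so treat this purely as an illustration of the semantics.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def in_circular_region(center, target, diameter_deg):
    """True if target lies within diameter_deg / 2 of center."""
    return angular_separation_deg(*center, *target) <= diameter_deg / 2.0


# Approximate ICRS position of Eta Carinae, with a 0.5 degree diameter.
center = (161.265, -59.684)
print(in_circular_region(center, (161.265, -59.5), 0.5))  # 0.184 deg away
print(in_circular_region(center, (161.265, -59.3), 0.5))  # 0.384 deg away
```

The same test with a radius instead of half a diameter describes the region used by a cone search.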

<h3>Simple Cone Search</h3>

The Simple Cone Search returns results – typically catalog entries – within a circular region on the sky defined by the parameters pos (again, ICRS) and radius:

In [99]:
scs_service = pyvo.dal.SCSService('http://dc.zah.uni-heidelberg.de/arihip/q/cone/scs.xml')
scs_results = scs_service.search(pos=pos, radius=size)
scs_results

<Table masked=True length=5>
        _r        hipno      raj2000      ...   pmraHIP      pmdeHIP   
       deg                     deg        ...   deg / yr     deg / yr  
     float64      object     float64      ...   float32      float32   
----------------- ------ ---------------- ... ------------ ------------
0.125481911011817  52558 161.187612383333 ... -2.01389e-06  7.47222e-07
0.273291439427868  52562   161.1964495375 ... -2.14167e-06  9.05556e-07
0.387423868771207  52806 161.934655158333 ... -4.31389e-06 -3.33333e-08
 0.44762655677728  52827 162.022529929167 ... -2.10556e-06  7.52778e-07
0.458205458134352  52488 160.967739020833 ... -1.81944e-06  5.63889e-07

<h2>Queries</h2>

More complex queries require some knowledge of ADQL. However, some examples and prompts are provided in TOPCAT. A TAP query can also be used to retrieve cone-search information.
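
As a sketch of that last point, a cone search can be written in ADQL with the CONTAINS, POINT and CIRCLE geometry functions. The helper below only builds the query string. `adql_cone_query` is a hypothetical name, the default column names `ra` and `dec` are assumed to match the SIMBAD `basic` table queried in this document, and other services may name their columns differently.

```python
def adql_cone_query(table, ra_deg, dec_deg, radius_deg,
                    ra_col="ra", dec_col="dec", top=10):
    """Return an ADQL query selecting rows within radius_deg of a position."""
    return (
        f"SELECT TOP {int(top)} * FROM {table} "
        f"WHERE 1=CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


# Approximate ICRS position of Eta Carinae, 0.05 degree radius.
query = adql_cone_query("basic", 161.265, -59.684, 0.05)
print(query)
```

The resulting string can be passed to the `search` method of a TAP service object in the same way as the simpler query shown earlier; the TOP clause keeps the result small while experimenting.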


<h3>Uploads</h3>
Some TAP services allow you to upload your own tables to make them accessible in queries.

For this, the various query methods have an uploads keyword, which accepts a dictionary mapping table names to content.

The mechanism behind this parameter is smart enough to distinguish between various types of content: a str pointing to a local file or a file-like object, a Table or DALResults for an inline upload, or a URL str pointing to a remote resource.

The uploaded tables will be available as TAP_UPLOAD.name.



In [100]:
import pandas as pd
data = [['11 Com', 185.179, 17.793, 'HIP 60202'],
['11 UMi', 229.275, 71.824, 'HIP 74793'],
['14 And', 352.823, 39.236, 'HIP 116076'],
['14 Her', 242.601, 43.818, 'HIP 79248'],
['16 Cyg', 295.467, 50.518, 'HIP 96901'],
['18 del', 314.608, 10.839, 'HIP 103527'],
['1RXS1609', 242.376, -21.083, ''],
['24 Boo', 217.158, 49.845, 'HIP 70791'],
['24 Sex', 155.868, -0.902, 'HIP 50887'],
['2MASS J02192210-3925225', 34.842, -39.423, ''],
['2MASS J04414489+2301513', 70.437, 23.031, ''],
['2MASS 1207', 181.889, -39.548, '2MASS J12073346-3932539,2MASSW J1207334-393254,USNO-B1.0 0504-00258166,TWA 27,WDS J12076-3933AB,NAME 2M1207,** CVN   12,NAME 2M1207A,DENIS J120733.4-393254,WISE J120733.42-393254.2,2M 1207-39 b,2MASSW J1207334-393254 b,2MASS1207-3932 B,2M1207B,2M1207 b,2M 1207 b,2MASS J12073346-3932539 b'],
['2MASS 1938+4603', 294.636, 46.066, ''],
['2MASS J21402931+1625183', 325.122, 16.422, ''],
['2MASS J22362452+4751425', 339.102, 47.862, ''],
['30 Ari', 39.241, 24.648, 'HIP 12184'],
['4 UMa', 130.053, 64.328, 'HIP 42527'],
['42 Dra', 276.496, 65.563, 'HIP 90344'],
['47 UMa', 164.867, 40.43, 'HIP 53721']]

c=pd.DataFrame(data,columns=["Host","ra","dec","alias"])
c
c.to_csv("table.csv")


Turn that into an Astropy table:

In [101]:
import astropy.table as tab
local_table = tab.Table.from_pandas(c[['Host','ra','dec','alias']])



Select service: SIMBAD TAP SERVICE:

In [102]:
s="http://simbad.u-strasbg.fr:80/simbad/sim-tap"
service = pyvo.dal.TAPService(s)    

In [103]:
query="SELECT t.*,  basic.main_id, basic.dec as dec_2, basic.ra as ra_2 FROM TAP_UPLOAD.t1 as t \
LEFT OUTER JOIN ident ON ident.id = t.Host LEFT OUTER JOIN basic ON ident.oidref = basic.oid"
        

response = service.run_sync(query,uploads={"t1": local_table},timeout=None)
table=response.table
table



Host,ra,dec,alias,main_id,dec_2,ra_2
,,,,,deg,deg
object,float64,float64,object,object,float64,float64
11 Com,185.179,17.793,HIP 60202,* 11 Com,17.792872016363056,185.17927325102127
11 UMi,229.275,71.824,HIP 74793,* 11 UMi,71.82389929736556,229.27454806843
14 And,352.823,39.236,HIP 116076,* 14 And,39.23619735870278,352.82255302216294
14 Her,242.601,43.818,HIP 79248,* 14 Her,43.81763853737972,242.6013136477292
16 Cyg,295.467,50.518,HIP 96901,* 16 Cyg,50.52544444444444,295.45454166666667
18 del,314.608,10.839,HIP 103527,* 18 Del,10.839285026460557,314.60805686290246
1RXS1609,242.376,-21.083,,,--,--
24 Boo,217.158,49.845,HIP 70791,* g Boo,49.8448503045835,217.1575543882676
24 Sex,155.868,-0.902,HIP 50887,* 24 Sex,-0.9022436559763889,155.86820579804623
2MASS J02192210-3925225,34.842,-39.423,,2MASS J02192210-3925225,-39.422922298465274,34.842127084335


In [104]:
table=table.filled(0)
table=table[table['ra_2']==0]
table

Host,ra,dec,alias,main_id,dec_2,ra_2
,,,,,deg,deg
object,float64,float64,object,object,float64,float64
1RXS1609,242.376,-21.083,,,0.0,0.0
2MASS 1207,181.889,-39.548,"2MASS J12073346-3932539,2MASSW J1207334-393254,USNO-B1.0 0504-00258166,TWA 27,WDS J12076-3933AB,NAME 2M1207,** CVN 12,NAME 2M1207A,DENIS J120733.4-393254,WISE J120733.42-393254.2,2M 1207-39 b,2MASSW J1207334-393254 b,2MASS1207-3932 B,2M1207B,2M1207 b,2M 1207 b,2MASS J12073346-3932539 b",,0.0,0.0
2MASS 1938+4603,294.636,46.066,,,0.0,0.0
30 Ari,39.241,24.648,HIP 12184,,0.0,0.0


In [105]:
local_table = table[['Host','ra','dec','alias']]
query="SELECT tc.*, db.main_id, db.dec as dec_2,db.ra as ra_2,db.otype_txt  FROM basic AS db  \
JOIN TAP_UPLOAD.t1 AS tc ON 1=CONTAINS(POINT('ICRS',db.ra, db.dec),   CIRCLE('ICRS',tc.ra, tc.dec ,0.005))"
        
service = pyvo.dal.TAPService(s)    
response = service.run_sync(query,uploads={"t1": local_table},timeout=None)
tablenew=response.table
tablenew



Host,ra,dec,alias,main_id,dec_2,ra_2,otype_txt
,,,,,deg,deg,
object,float64,float64,object,object,float64,float64,object
1RXS1609,242.376,-21.083,,NAME 1RXS1609 b,-21.08304035577167,242.3762817466054,Pl
1RXS1609,242.376,-21.083,,1RXS J160929.1-210524,-21.08304035577167,242.3762817466054,pr*
2MASS 1207,181.889,-39.548,"2MASS J12073346-3932539,2MASSW J1207334-393254,USNO-B1.0 0504-00258166,TWA 27,WDS J12076-3933AB,NAME 2M1207,** CVN 12,NAME 2M1207A,DENIS J120733.4-393254,WISE J120733.42-393254.2,2M 1207-39 b,2MASSW J1207334-393254 b,2MASS1207-3932 B,2M1207B,2M1207 b,2M 1207 b,2MASS J12073346-3932539 b",TWA 27,-39.54833333333333,181.88944583333333,BD*
2MASS 1207,181.889,-39.548,"2MASS J12073346-3932539,2MASSW J1207334-393254,USNO-B1.0 0504-00258166,TWA 27,WDS J12076-3933AB,NAME 2M1207,** CVN 12,NAME 2M1207A,DENIS J120733.4-393254,WISE J120733.42-393254.2,2M 1207-39 b,2MASSW J1207334-393254 b,2MASS1207-3932 B,2M1207B,2M1207 b,2M 1207 b,2MASS J12073346-3932539 b",TWA 27B,-39.54844444444444,181.8895833333333,BD*
2MASS 1938+4603,294.636,46.066,,Kepler-451b,46.06642754353695,294.63588507183835,Pl?
2MASS 1938+4603,294.636,46.066,,2MASS J19383146+4603452,46.0625554007124,294.6311159937204,*
2MASS 1938+4603,294.636,46.066,,Kepler-451,46.0664275435369,294.6358850718383,HS*
30 Ari,39.241,24.648,HIP 12184,* 30 Ari Bb,24.648056290960557,39.24060370514417,Pl
30 Ari,39.241,24.648,HIP 12184,* 30 Ari B,24.648056290960557,39.24060370514417,**


In [106]:
local_table = table[['Host','ra','dec','alias']]
query="SELECT tc.*, db.main_id, db.dec as dec_2,db.ra as ra_2, db.otype_txt FROM basic AS db  \
JOIN TAP_UPLOAD.t1 AS tc ON 1=CONTAINS(POINT('ICRS',db.ra, db.dec),   CIRCLE('ICRS',tc.ra, tc.dec ,0.005)) \
WHERE db.otype_txt != 'Pl' AND db.otype_txt != 'Pl?'"
        
service = pyvo.dal.TAPService(s)    
response = service.run_sync(query,uploads={"t1": local_table},timeout=None)
table=response.table
table





Host,ra,dec,alias,main_id,dec_2,ra_2,otype_txt
,,,,,deg,deg,
object,float64,float64,object,object,float64,float64,object
1RXS1609,242.376,-21.083,,1RXS J160929.1-210524,-21.08304035577167,242.3762817466054,pr*
2MASS 1207,181.889,-39.548,"2MASS J12073346-3932539,2MASSW J1207334-393254,USNO-B1.0 0504-00258166,TWA 27,WDS J12076-3933AB,NAME 2M1207,** CVN 12,NAME 2M1207A,DENIS J120733.4-393254,WISE J120733.42-393254.2,2M 1207-39 b,2MASSW J1207334-393254 b,2MASS1207-3932 B,2M1207B,2M1207 b,2M 1207 b,2MASS J12073346-3932539 b",TWA 27,-39.54833333333333,181.88944583333333,BD*
2MASS 1207,181.889,-39.548,"2MASS J12073346-3932539,2MASSW J1207334-393254,USNO-B1.0 0504-00258166,TWA 27,WDS J12076-3933AB,NAME 2M1207,** CVN 12,NAME 2M1207A,DENIS J120733.4-393254,WISE J120733.42-393254.2,2M 1207-39 b,2MASSW J1207334-393254 b,2MASS1207-3932 B,2M1207B,2M1207 b,2M 1207 b,2MASS J12073346-3932539 b",TWA 27B,-39.54844444444444,181.8895833333333,BD*
2MASS 1938+4603,294.636,46.066,,2MASS J19383146+4603452,46.0625554007124,294.6311159937204,*
2MASS 1938+4603,294.636,46.066,,Kepler-451,46.0664275435369,294.6358850718383,HS*
30 Ari,39.241,24.648,HIP 12184,* 30 Ari B,24.648056290960557,39.24060370514417,**


<h2>Resultsets and Records</h2>

Resultsets contain primarily tabular data and might also provide binary datasets and/or access to additional data services.

To obtain the names of the columns in a service response, write:

In [107]:
print(sia_results.fieldnames)

('accref', 'mime', 'accsize', 'centerAlpha', 'centerDelta', 'imageTitle', 'instId', 'dateObs', 'nAxes', 'pixelSize', 'pixelScale', 'refFrame', 'wcs_equinox', 'wcs_projection', 'wcs_refPixel', 'wcs_refValues', 'wcs_cdmatrix', 'bandpassId', 'bandpassUnit', 'bandpassRefval', 'bandpassHi', 'bandpassLo', 'pixflags', 'coverage', 'object', 'datalink_url')


Rich metadata equivalent to what is found in VOTables (including unit, ucd, utype, and xtype) is available through the getdesc() method:



In [108]:
print(sia_results.getdesc('object').ucd)

meta.name


Iterating over a resultset gives the rows in the result:

In [109]:
print(scs_results.fieldnames)
for row in scs_results:
     print(row['hipno'])

('_r', 'hipno', 'raj2000', 'dej2000', 'pmra', 'pmde', 'err_ra', 'err_pmra', 'err_de', 'err_pmde', 'parallax', 'vrad', 'mv', 'km', 'kbin', 'kae', 'raLTP', 'deLTP', 'pmraLTP', 'pmdeLTP', 'raSTP', 'deSTP', 'pmraSTP', 'pmdeSTP', 'raHIP', 'deHIP', 'pmraHIP', 'pmdeHIP')
b'52558'
b'52562'
b'52806'
b'52827'
b'52488'


As with general numpy arrays, accessing individual columns via names gives an array of all of their values:

In [110]:
print(response.fieldnames)
column = response['main_id']
column

('Host', 'ra', 'dec', 'alias', 'main_id', 'dec_2', 'ra_2', 'otype_txt')


masked_array(data=[b'1RXS J160929.1-210524', b'TWA 27', b'TWA 27B',
                   b'2MASS J19383146+4603452', b'Kepler-451',
                   b'*  30 Ari B'],
             mask=[False, False, False, False, False, False],
       fill_value='?',
            dtype=object)

whereas integers retrieve the rows:

In [111]:
row = response[0]
row

(b'1RXS1609', 242.376, -21.083, b'', b'1RXS J160929.1-210524', -21.08304035577167, 242.3762817466054, b'pr*')

and both combined give a single value:

In [112]:
value = response['main_id', 0]
value

b'1RXS J160929.1-210524'