# Harley Wood School for Astronomy 2019 

<img src="https://research.smp.uq.edu.au/asa2019/static/asa19/img/HWSA2019-logo.png" width=300>

## Part I - Good Code Etiquette or how to make your code more effective and efficient

In this part of the workshop we will look at example code that reproduces HR diagrams using Gaia data.

<img src="https://www.cosmos.esa.int/documents/29201/1666086/kine_all.png/8b9de0b4-8eb1-ad73-0922-9bf323687f6e?t=1524224828914" width=400>

The Gaia Hertzsprung-Russell diagrams above show Gaia absolute magnitude versus GBP-GRP colour as a function of the stars' tangential velocity (VT). They use Gaia DR2 stars with relative parallax uncertainty better than 10% and low extinction (E(B-V)<0.015), together with astrometric and photometric quality filters. The colour scale represents the square root of the density.


## Table of Contents

1. [Downloading the data](#Downloading-the-data)
2. [Cleaning the data](#Cleaning-the-data)
3. [Plotting the HR diagram](#Plotting-the-HR-diagram)


### Required libraries

This notebook uses several Python packages that come standard with the [Anaconda Python distribution](http://continuum.io/downloads). The primary libraries that we'll be using are:

* **astropy**
* **astroquery**
* **numpy**
* **pandas**
* **matplotlib**
* **seaborn**

To make sure you have all of the packages you need, install them with `conda`:

    conda install [package name]
    conda install -c astropy astroquery
    
`conda` may ask you to update some of the packages if you don't have the most recent version. Allow it to do so.

Alternatively, you can install the packages with [pip](https://pip.pypa.io/en/stable/installing/) (a Python package manager):

    pip install [package name]
    
Be sure to restart your kernel if you had to install new packages.
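
A quick way to confirm that the installation worked is to ask Python whether each package can be found. The sketch below is only an illustration: `find_missing` is a hypothetical helper (not part of any of the listed packages), and it uses the standard-library `importlib.util.find_spec`, which looks a package up without importing it.

```python
import importlib.util

# Packages listed under "Required libraries" above
REQUIRED = ["astropy", "astroquery", "numpy", "pandas", "matplotlib", "seaborn"]

def find_missing(packages):
    """Return the names in `packages` that Python cannot locate (hypothetical helper)."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Any name reported as missing can then be installed with `conda` or `pip` as described above.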

# Downloading the data

We can download data from Gaia using the astroquery library. Specifically, we use the Table Access Protocol (TAP) specified by the International Virtual Observatory Alliance.

[TAP astroquery docs](https://astroquery.readthedocs.io/en/latest/utils/tap.html)

[Gaia Tap examples](https://gaia.aip.de/cms/documentation/tap-interface/)

In [1]:
#List available tables
from astroquery.utils.tap.core import TapPlus

gaia = TapPlus(url="https://gaia.aip.de/tap")
tables = gaia.load_tables()
for table in (tables):
    print(table.get_qualified_name())



Created TAP+ (v1.0.1) - Connection:
	Host: gaia.aip.de
	Use HTTPS: True
	Port: 443
	SSL Port: 443
Retrieving tables...
Parsing tables...
Done.
gdr2.gaia_source
gdr2.sso_observation
gdr2.sso_source
gdr2.vari_cepheid
gdr2.vari_classifier_class_definition
gdr2.vari_classifier_definition
gdr2.vari_classifier_result
gdr2.vari_long_period_variable
gdr2.vari_rotation_modulation
gdr2.vari_rrlyrae
gdr2.vari_short_timescale
gdr2.vari_time_series_statistics
gdr2.epoch_photometry
gdr2.dr1_neighbourhood
gdr2.allwise_best_neighbour
gdr2.allwise_neighbourhood
gdr2.apassdr9_best_neighbour
gdr2.apassdr9_neighbourhood
gdr2.gsc23_best_neighbour
gdr2.gsc23_neighbourhood
gdr2.hipparcos2_best_neighbour
gdr2.hipparcos2_neighbourhood
gdr2.panstarrs1_best_neighbour
gdr2.panstarrs1_neighbourhood
gdr2.ppmxl_best_neighbour
gdr2.ppmxl_neighbourhood
gdr2.ravedr5_best_neighbour
gdr2.ravedr5_neighbourhood
gdr2.sdssdr9_best_neighbour
gdr2.sdssdr9_neighbourhood
gdr2.tmass_best_neighbour
gdr2.tmass_neighbourhood
gdr2.ty

In [4]:
# Load DR2 source table and check columns
from astroquery.utils.tap.core import TapPlus

gaia = TapPlus(url="http://gea.esac.esa.int/tap-server/tap")
table = gaia.load_table('gaiadr2.gaia_source')
print("Number of columns = {}".format(len(table.columns)))
for column in (table.columns):
    print(column.name)

Created TAP+ (v1.0.1) - Connection:
	Host: gea.esac.esa.int
	Use HTTPS: False
	Port: 80
	SSL Port: 443
Retrieving table 'gaiadr2.gaia_source'
Parsing table 'gaiadr2.gaia_source'...
Done.
Number of columns = 96
solution_id
designation
source_id
random_index
ref_epoch
ra
ra_error
dec
dec_error
parallax
parallax_error
parallax_over_error
pmra
pmra_error
pmdec
pmdec_error
ra_dec_corr
ra_parallax_corr
ra_pmra_corr
ra_pmdec_corr
dec_parallax_corr
dec_pmra_corr
dec_pmdec_corr
parallax_pmra_corr
parallax_pmdec_corr
pmra_pmdec_corr
astrometric_n_obs_al
astrometric_n_obs_ac
astrometric_n_good_obs_al
astrometric_n_bad_obs_al
astrometric_gof_al
astrometric_chi2_al
astrometric_excess_noise
astrometric_excess_noise_sig
astrometric_params_solved
astrometric_primary_flag
astrometric_weight_al
astrometric_pseudo_colour
astrometric_pseudo_colour_error
mean_varpi_factor_al
astrometric_matched_observations
visibility_periods_used
astrometric_sigma5d_max
frame_rotator_object_type
matched_observations
dupli

**WARNING:** This query takes a long time. Please load the data from the file given to you by the instructors.

In [None]:
# # Download gaia dr 2 source table, save to disk
# gaia = TapPlus(url="http://gea.esac.esa.int/tap-server/tap")
# job = gaia.launch_job_async("select * from gaiadr2.gaia_source order by source_id", dump_to_file=True)
# print(job)

In [None]:
# # return result of query 
# t = job.get_results()
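
The `select *` query above is slow partly because it requests every column of every source. As a hedged sketch, assuming only a handful of columns are needed, a narrower ADQL query can be assembled as a plain string first. `build_gaia_query` is a hypothetical helper written for this example; it is not part of astroquery. The column names come from the column listing and the table in this notebook, and the cut `parallax_over_error > 10` corresponds to a relative parallax uncertainty better than 10%.

```python
def build_gaia_query(columns, table="gaiadr2.gaia_source",
                     min_parallax_snr=None, limit=None):
    """Assemble an ADQL query string (hypothetical helper, not part of astroquery)."""
    # "TOP n" is the ADQL way of limiting the number of returned rows
    top = "TOP {} ".format(int(limit)) if limit is not None else ""
    query = "SELECT {}{} FROM {}".format(top, ", ".join(columns), table)
    if min_parallax_snr is not None:
        query += " WHERE parallax_over_error > {}".format(min_parallax_snr)
    return query

query = build_gaia_query(
    ["source_id", "parallax", "phot_g_mean_mag", "bp_rp"],
    min_parallax_snr=10,
    limit=1000,
)
print(query)
```

The resulting string could be passed to `gaia.launch_job_async(query)` in place of the full-table query. Starting with a small `limit` makes it cheap to check that the query is valid before requesting more rows.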

# Cleaning the data

The aim of this section is to make sure we have useful data, i.e.:
- remove NaN values
- calculate the absolute magnitude
- subset into different location or velocity bins

The columns we are interested in are:

|Name            |Type    |UCD                     |Unit            |Description     |
|----------------|--------|------------------------|----------------|----------------|
|bp_rp           |float   |phot.color              |Magnitude[mag]  |BP - RP colour  |
|bp_g            |float   |phot.color              |Magnitude[mag]  |BP - G colour   |
|g_rp            |float   |phot.color              |Magnitude[mag]  |G - RP colour   |
|radial_velocity |double  |spect.dopplerVeloc.opt  |Velocity[km/s]  |Radial velocity |
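
The velocity bins mentioned above can be built from the tangential velocity, which follows from the total proper motion and the parallax as v_T = 4.74047 × μ / ϖ (μ in mas/yr, ϖ in mas, v_T in km/s). The sketch below is an illustration only: the data values are made up, and the bin edges and labels are arbitrary choices for this example, not the ones used for the figure.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only; these are not real Gaia measurements
toy = pd.DataFrame({
    "parallax": [10.0, 5.0, 2.0],    # mas
    "pmra": [30.0, -40.0, 100.0],    # mas/yr
    "pmdec": [40.0, 30.0, 0.0],      # mas/yr
})

K = 4.74047  # converts (mas/yr) / mas to km/s
pm_total = np.hypot(toy["pmra"], toy["pmdec"])   # total proper motion in mas/yr
toy["v_tan"] = K * pm_total / toy["parallax"]    # tangential velocity in km/s

# Bin edges and labels are assumptions chosen for this example
edges = [0, 40, 150, np.inf]
labels = ["slow", "intermediate", "fast"]
toy["v_bin"] = pd.cut(toy["v_tan"], bins=edges, labels=labels)

print(toy[["v_tan", "v_bin"]])
```

Each bin can then be selected with an ordinary boolean mask, for example `toy[toy["v_bin"] == "fast"]`.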

In [None]:
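%%time
# Read the full VOTable. %%time is used rather than %%timeit because variables
# assigned inside a %%timeit cell are not kept, and `t` is needed in later cells.
from astropy.io.votable import parse_single_table
table = parse_single_table("async_20190630210155.vot")

# Convert to an astropy Table, using column names rather than IDs
t = table.to_table(use_names_over_ids=True)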


In [None]:
%%timeit
#read data 
from astropy.io.votable import parse_single_table
columns = ['phot_g_mean_mag', 'parallax']
table = parse_single_table("async_20190630210155.vot", columns=columns)
print("Done reading table")
#t = table.to_table(use_names_over_ids=True)


In [None]:
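%%time
# Convert the astropy Table to a pandas DataFrame.
# %%time (not %%timeit) is used so that `df` stays defined for the cells below.
df = t.to_pandas()

# Check the data frame
df.head()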
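
The first cleaning step listed above is removing NaN values. A minimal sketch of how this might look with pandas is given below; the toy rows are made up for illustration, and the choice of which columns must be present is an assumption for this example.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; these are not real Gaia measurements
toy = pd.DataFrame({
    "phot_g_mean_mag": [12.1, np.nan, 15.3, 9.8],
    "parallax": [4.2, 1.1, np.nan, -0.3],
    "bp_rp": [0.8, 1.2, 0.5, np.nan],
})

# Columns that must be present for a star to appear on the HR diagram
needed = ["phot_g_mean_mag", "parallax", "bp_rp"]

# Drop rows with NaN in any needed column, then keep positive parallaxes only
clean = toy.dropna(subset=needed)
clean = clean[clean["parallax"] > 0].reset_index(drop=True)

print(len(toy), "rows before,", len(clean), "rows after")
```

Resetting the index matters for the loop-based cells that follow: they combine `enumerate` with `df.loc[c, ...]`, which only works when the index labels run from 0 to N-1 without gaps.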

In [None]:
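%%timeit
# First attempt: calculate absolute magnitude and distance row by row (slow)
import numpy as np
from math import log10  # import only what is needed; "from math import *" fails inside %%timeit

# Initialise as float columns so NaN can be stored without a dtype change
df['mg'] = np.nan
df['dist'] = np.nan

for c, v in enumerate(df['phot_g_mean_mag']):
    # Assumes df has the default 0..N-1 index, so position c is also the label
    p = df.loc[c, 'parallax']  # parallax in mas
    if p > 0:
        df.loc[c, 'mg'] = v + 5 * log10(p) - 10
        df.loc[c, 'dist'] = 1000 / p  # distance in pc
    else:
        df.loc[c, 'mg'] = np.nan
        df.loc[c, 'dist'] = np.nan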
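
The expression `v + 5 * log10(p) - 10` used above follows from the distance modulus when the parallax $\varpi$ is given in milliarcseconds, as it is in the Gaia catalogue. A short derivation, ignoring extinction and writing $G$ for the apparent magnitude `phot_g_mean_mag`:

$$
d = \frac{1000}{\varpi}\ \mathrm{pc}
$$

$$
M_G = G - 5\log_{10}(d) + 5 = G - 5\left(3 - \log_{10}\varpi\right) + 5 = G + 5\log_{10}\varpi - 10
$$

The logarithm is only defined for $\varpi > 0$, which is why non-positive parallaxes are set to NaN.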



In [None]:
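# Second attempt: the same row-by-row loop, wrapped in a function so it can be timed with %timeit
def second_attempt_at_abs_mag_and_dist(df):
    import numpy as np
    from math import log10  # imported here so the function does not rely on an earlier cell

    # Initialise as float columns so NaN can be stored without a dtype change
    df['mg2'] = np.nan
    df['dist2'] = np.nan

    for c, v in enumerate(df['phot_g_mean_mag']):

        p = df.loc[c, 'parallax']
        if p > 0:
            df.loc[c, 'mg2'] = v + 5 * log10(p) - 10
            df.loc[c, 'dist2'] = 1000 / p
        else:
            df.loc[c, 'mg2'] = np.nan
            df.loc[c, 'dist2'] = np.nan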

#%timeit second_attempt_at_abs_mag_and_dist(df)

In [None]:
# Third attempt: list comprehensions instead of row-wise .loc assignment
def third_attempt_at_abs_mag_and_dist(df):
    import numpy as np
    import math

    # .values is an attribute (a NumPy array), not a method, so it must not be called
    apparent_mags = list(df['phot_g_mean_mag'].values)
    parallax = list(df['parallax'].values)
    abs_mags = [mag + 5*math.log10(p) - 10 if p > 0 else np.nan for mag, p in zip(apparent_mags, parallax)]
    dists = [1000.0/p if p > 0 else np.nan for p in parallax]

    df['mg3'] = abs_mags
    df['dist3'] = dists

%timeit third_attempt_at_abs_mag_and_dist(df)

In [None]:
# Fourth attempt: NumPy arrays as input, but still looping in Python
def fourth_attempt_at_abs_mag_and_dist(df):
    import numpy as np

    apparent_mags = df['phot_g_mean_mag'].to_numpy()
    parallax = df['parallax'].to_numpy()
    abs_mags = [mag + 5*np.log10(p) - 10 if p > 0 else np.nan for mag, p in zip(apparent_mags, parallax)]
    dists = [1000.0/p if p > 0 else np.nan for p in parallax]

    df['mg4'] = abs_mags
    df['dist4'] = dists

%timeit fourth_attempt_at_abs_mag_and_dist(df)

In [None]:
# Fifth attempt: fully vectorised, then mask the invalid entries
def fifth_attempt_at_abs_mag_and_dist(df):
    import numpy as np

    apparent_mags = df['phot_g_mean_mag'].to_numpy()
    parallax = df['parallax'].to_numpy()

    # Non-positive parallaxes trigger RuntimeWarnings in log10 and the division;
    # silence them here because those entries are set to NaN just below
    with np.errstate(divide='ignore', invalid='ignore'):
        abs_mags = apparent_mags + 5.0*np.log10(parallax) - 10
        dist = 1000.0/parallax

    bad_inds = (~np.isfinite(parallax) | (parallax <= 0))
    abs_mags[bad_inds] = np.nan
    dist[bad_inds] = np.nan

    df['mg5'] = abs_mags
    df['dist5'] = dist

%timeit fifth_attempt_at_abs_mag_and_dist(df)

In [None]:
# Sixth attempt: compute only where the parallax is finite and positive
def sixth_attempt_at_abs_mag_and_dist(df):
    import numpy as np

    apparent_mags = df['phot_g_mean_mag'].to_numpy()
    parallax = df['parallax'].to_numpy()
    
    abs_mags = np.full_like(apparent_mags, np.nan)
    dist = np.full_like(parallax, np.nan)

    good_inds = (np.isfinite(parallax) & (parallax > 0))
    abs_mags[good_inds] = apparent_mags[good_inds] + 5.0*np.log10(parallax[good_inds]) - 10
    dist[good_inds] = 1000.0/parallax[good_inds]

    df['mg6'] = abs_mags
    df['dist6'] = dist

%timeit sixth_attempt_at_abs_mag_and_dist(df)

In [None]:
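def add_abs_mag_and_distance(df):
    import numpy as np
    # Replace non-positive parallaxes with NaN so that both new columns are NaN there,
    # consistent with the earlier attempts (otherwise dist would be negative or inf)
    parallax = df['parallax'].where(df['parallax'] > 0)
    df['optim_abs_mag'] = df['phot_g_mean_mag'] + 5*np.log10(parallax) - 10
    df['optim_dist'] = 1000.0/parallax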
       
%timeit  add_abs_mag_and_distance(df)
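
Speed is only useful if the fast version still gives the same answer. The sketch below shows one way to check this on a small synthetic input that includes the awkward cases (negative, zero and NaN parallax). The two functions are simplified stand-ins written for this example, not the notebook functions themselves.

```python
import numpy as np

def abs_mag_loop(mag, parallax):
    """Reference version: explicit Python loop, NaN for non-positive parallax."""
    out = []
    for m, p in zip(mag, parallax):
        out.append(m + 5 * np.log10(p) - 10 if p > 0 else np.nan)
    return np.array(out)

def abs_mag_vectorised(mag, parallax):
    """Vectorised version: compute only where the parallax is finite and positive."""
    mag = np.asarray(mag, dtype=float)
    parallax = np.asarray(parallax, dtype=float)
    out = np.full_like(mag, np.nan)
    good = np.isfinite(parallax) & (parallax > 0)
    out[good] = mag[good] + 5.0 * np.log10(parallax[good]) - 10
    return out

# Synthetic inputs covering normal, negative, zero and NaN parallaxes
mag = np.array([10.0, 12.5, 15.0, 8.0, 11.0])
parallax = np.array([100.0, 1.0, -0.5, 0.0, np.nan])

loop_result = abs_mag_loop(mag, parallax)
fast_result = abs_mag_vectorised(mag, parallax)

# equal_nan=True treats NaN in the same position as a match
print("Results agree:", np.allclose(loop_result, fast_result, equal_nan=True))
```

The same kind of comparison can be run between any two of the `mg*` columns in `df` after the corresponding cells have been executed.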

In [None]:
%load_ext line_profiler
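
Once the extension is loaded, a function can be profiled line by line with the `%lprun` magic, for example `%lprun -f sixth_attempt_at_abs_mag_and_dist sixth_attempt_at_abs_mag_and_dist(df)`. Outside a notebook, a coarser function-level profile is available from the standard-library `cProfile` module. The sketch below uses `cProfile` as a swapped-in alternative to `line_profiler`; `slow_abs_mag` is a hypothetical stand-in for the loop-based attempts, and the input values are made up.

```python
import cProfile
import io
import math
import pstats

def slow_abs_mag(mags, parallaxes):
    """Hypothetical stand-in for the loop-based attempts above."""
    return [m + 5 * math.log10(p) - 10 if p > 0 else float("nan")
            for m, p in zip(mags, parallaxes)]

# Made-up inputs, large enough for the profile to register some time
mags = [12.0] * 10000
parallaxes = [2.0] * 10000

profiler = cProfile.Profile()
profiler.enable()
result = slow_abs_mag(mags, parallaxes)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists how often each function was called and how much time was spent in it, which is usually enough to decide where a line-level profile is worth running.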

In [None]:
df.shape

In [6]:
import astropy
astropy.__version__

'3.2.1'