# Using pandas accessors

## What are pandas accessors?

A pandas DataFrame provides a rich set of features that make data manipulation and exploration easy.
However, because it is built for general-purpose use, it lacks many domain-specific operations.
It is tempting to subclass `pandas.DataFrame` to add these as methods, but this is generally not recommended.
Instead, pandas provides an interface for registering custom accessors.

A custom accessor on a pandas DataFrame or Series provides a namespace under which you can define your own methods that manipulate your specific data. For gapipes, this namespace is `g`.

When you import `gapipes`, it registers a custom accessor `g` on all DataFrames and Series.
Under this `g` namespace we can add our own methods, for example one that creates an astropy coordinates object from the given table (DataFrame).
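
To make the registration mechanism concrete, here is a minimal sketch using pandas' `register_dataframe_accessor` decorator. The accessor name `demo` and the `distance_pc` property are hypothetical and chosen only for illustration; this is not how gapipes implements `g`.

```python
import pandas as pd


# "demo" is a hypothetical namespace used for illustration only.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    @property
    def distance_pc(self):
        # Naive distance estimate: parallax in mas -> distance in pc.
        return 1000.0 / self._obj["parallax"]


df = pd.DataFrame({"parallax": [5.0, 10.0]})
dist = df.demo.distance_pc
print(dist.tolist())
```

Once the class is registered, every DataFrame in the session exposes the `demo` namespace, so no subclassing is needed.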

Let's quickly fetch some random Gaia data.

In [1]:
%matplotlib inline
import gapipes as gp

d = gp.gaia.query("""
select *
from gaiadr2.gaia_source
where
  1=contains(point('', ra, dec),
             circle('', 130.226, 19.665, 1))
  and parallax between 4.613 and 7.312
""")

In [24]:
print(f"{len(d)} rows, {len(d.columns)} columns")

651 rows, 96 columns


Under the namespace `g`, `.icrs` and `.galactic` properties will
create and return astropy's `ICRS` and `Galactic` coordinate objects.

In [29]:
icrs = d.g.icrs    # astropy.coordinates.ICRS
galactic = d.g.galactic
print(type(icrs), type(galactic))

<class 'astropy.coordinates.builtin_frames.icrs.ICRS'> <class 'astropy.coordinates.builtin_frames.galactic.Galactic'>


To build the covariance matrices for all sources in the DataFrame, call `make_cov()`:

In [13]:
cov = d.g.make_cov()
print(cov.shape)

(651, 3, 3)
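
As a rough illustration of how one 3×3 matrix per source could be assembled, the sketch below assumes the three dimensions are parallax, pmra and pmdec, and builds each matrix from the Gaia error and correlation columns. Both that assumption and the function name `make_cov_sketch` are mine; check the `make_cov` docstring for what gapipes actually does.

```python
import numpy as np
import pandas as pd


def make_cov_sketch(df):
    """Stack one 3x3 covariance matrix per row (hypothetical helper)."""
    # Assumed parameter order; not confirmed by the gapipes source.
    names = ["parallax", "pmra", "pmdec"]
    cov = np.zeros((len(df), 3, 3))
    for i, a in enumerate(names):
        # Diagonal: variance = error squared.
        cov[:, i, i] = df[f"{a}_error"].to_numpy() ** 2
        for j in range(i + 1, 3):
            b = names[j]
            # Off-diagonal: correlation * sigma_a * sigma_b.
            off = (df[f"{a}_{b}_corr"] * df[f"{a}_error"] * df[f"{b}_error"]).to_numpy()
            cov[:, i, j] = off
            cov[:, j, i] = off
    return cov


# Toy values, not real Gaia measurements.
toy = pd.DataFrame({
    "parallax_error": [0.2, 0.1],
    "pmra_error": [0.3, 0.2],
    "pmdec_error": [0.1, 0.4],
    "parallax_pmra_corr": [0.5, 0.0],
    "parallax_pmdec_corr": [-0.2, 0.0],
    "pmra_pmdec_corr": [0.1, 0.0],
})
toy_cov = make_cov_sketch(toy)
print(toy_cov.shape)
```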


The same method is defined for pandas Series as well.

In [17]:
source = d.iloc[0]
cov = source.g.make_cov()
print(cov.shape)
print(cov)

(3, 3)
[[ 0.03407061  0.03978523 -0.01119716]
 [ 0.03978523  0.0693536  -0.01736009]
 [-0.01119716 -0.01736009  0.01528202]]


For a single source (a `pandas.Series`), you can quickly look up its position in Simbad with `.open_simbad()`, which opens your browser on a Simbad position search.

In [None]:
source.g.open_simbad()

Here are all the public methods and properties available under `g`, first for DataFrames and then for Series.

In [20]:
print('\n'.join(name for name in dir(d.g) if not name.startswith('_')))

correct_brightsource_pm
distmod
galactic
icrs
make_cov
plot_xyz_icrs
vdec
vdec_error
vra
vra_error


In [23]:
print('\n'.join(name for name in dir(d.iloc[0].g) if not name.startswith('_')))

distmod
icrs
make_cov
open_simbad


The accessor is attached to _any_ DataFrame or Series, including slices of the original data, which can save a lot of time in exploratory data analysis.

In [33]:
subset = d.iloc[:5]
subset.g.icrs

<ICRS Coordinate: (ra, dec, distance) in (deg, deg, pc)
    [(130.12157181, 18.79627847, 181.87376145),
     (130.05526312, 18.72393993, 190.20501526),
     (130.13982678, 18.67443893, 188.26622276),
     (130.24423701, 18.67500043, 152.59152456),
     (130.06723764, 18.79161546, 209.60959948)]
 (pm_ra_cosdec, pm_dec, radial_velocity) in (mas / yr, mas / yr, km / s)
    [(-60.7075949 , -52.76753264,         nan),
     (-35.80150307, -12.40964744,         nan),
     (-37.06980874, -11.88689953, 29.54114051),
     (-33.0232434 ,  -5.58953023, 36.23858577),
     (-22.05408719, -19.82573781,  9.80905085)]>
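
Because a registered accessor is reachable from every DataFrame, including ones that lack the columns it needs, it can help to validate the input when the accessor is constructed. The sketch below shows that pattern with a hypothetical accessor named `astro_demo`; the required column list is an assumption for illustration, not gapipes' actual behavior.

```python
import pandas as pd


# "astro_demo" is a hypothetical accessor name for illustration only.
@pd.api.extensions.register_dataframe_accessor("astro_demo")
class AstroDemoAccessor:
    REQUIRED = ("ra", "dec", "parallax")  # assumed required columns

    def __init__(self, pandas_obj):
        # Fail early if the DataFrame cannot support the accessor's methods.
        missing = [c for c in self.REQUIRED if c not in pandas_obj.columns]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        self._obj = pandas_obj

    @property
    def n_sources(self):
        return len(self._obj)


full = pd.DataFrame({
    "ra": [130.1, 130.2, 130.3],
    "dec": [18.8, 18.7, 18.6],
    "parallax": [5.5, 5.3, 6.5],
})

# A slice is still a DataFrame, so the accessor is available on it too.
subset = full.iloc[:2]
n_subset = subset.astro_demo.n_sources

# A DataFrame without the required columns is rejected on first access.
try:
    full[["ra"]].astro_demo
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(n_subset, error_message)
```

Raising at construction time means the error appears on the first `.astro_demo` access rather than deep inside a later method call.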

Check out each method/property docstring or API documentation for details.