In [None]:
# Load packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [6]:
df = pd.read_csv("data/hipparcos-voidmain.csv")
df.head()

Unnamed: 0,Catalog,HIP,Proxy,RAhms,DEdms,Vmag,VarFlag,r_Vmag,RAdeg,DEdeg,...,Survey,Chart,Notes,HD,BD,CoD,CPD,(V-I)red,SpType,r_SpType
0,H,1,,00 00 00.22,+01 05 20.4,9.1,,H,0.000912,1.089013,...,S,,,224700.0,B+00 5077,,,0.66,F5,S
1,H,2,,00 00 00.91,-19 29 55.8,9.27,,G,0.003797,-19.498837,...,,,,224690.0,B-20 6688,,,1.04,K3V,4
2,H,3,,00 00 01.20,+38 51 33.4,6.61,,G,0.005008,38.859286,...,S,,,224699.0,B+38 5108,,,0.0,B9,S
3,H,4,,00 00 02.01,-51 53 36.8,8.06,,H,0.008382,-51.893546,...,S,,,224707.0,,,P-52 12237,0.43,F0V,2
4,H,5,,00 00 02.39,-40 35 28.4,8.55,,H,0.009965,-40.591224,...,,,,224705.0,,C-41 15372,P-41 9991,0.95,G8III,2


In [7]:
print(df.columns)

Index(['Catalog', 'HIP', 'Proxy', 'RAhms', 'DEdms', 'Vmag', 'VarFlag',
       'r_Vmag', 'RAdeg', 'DEdeg', 'AstroRef', 'Plx', 'pmRA', 'pmDE',
       'e_RAdeg', 'e_DEdeg', 'e_Plx', 'e_pmRA', 'e_pmDE', 'DE:RA', 'Plx:RA',
       'Plx:DE', 'pmRA:RA', 'pmRA:DE', 'pmRA:Plx', 'pmDE:RA', 'pmDE:DE',
       'pmDE:Plx', 'pmDE:pmRA', 'F1', 'F2', '---', 'BTmag', 'e_BTmag', 'VTmag',
       'e_VTmag', 'm_BTmag', 'B-V', 'e_B-V', 'r_B-V', 'V-I', 'e_V-I', 'r_V-I',
       'CombMag', 'Hpmag', 'e_Hpmag', 'Hpscat', 'o_Hpmag', 'm_Hpmag', 'Hpmax',
       'HPmin', 'Period', 'HvarType', 'moreVar', 'morePhoto', 'CCDM', 'n_CCDM',
       'Nsys', 'Ncomp', 'MultFlag', 'Source', 'Qual', 'm_HIP', 'theta', 'rho',
       'e_rho', 'dHp', 'e_dHp', 'Survey', 'Chart', 'Notes', 'HD', 'BD', 'CoD',
       'CPD', '(V-I)red', 'SpType', 'r_SpType'],
      dtype='object')


# Important fields

https://heasarc.gsfc.nasa.gov/W3Browse/all/hipparcos.html

Most photometric catalogs list each object's magnitude in every passband (filter) that was observed. The difference between the magnitudes in two passbands, such as B-V, is a color index. It indicates the star's color and therefore its temperature. This catalog already provides several color indices (the B-V and V-I columns), so fewer features need to be derived by hand.
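A minimal sketch of both points, assuming `df` was loaded as above. It derives one color index by hand (Tycho B_T - V_T from the BTmag and VTmag columns; the new column name BT-VT is our own choice) and measures how often the precomputed B-V and V-I values are present. The small `demo` frame holds made-up numbers for illustration only.

```python
import numpy as np
import pandas as pd

def add_tycho_color(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with a B_T - V_T color index column.

    NaN in either magnitude propagates, so stars missing a Tycho
    magnitude simply get no color.
    """
    out = frame.copy()
    out["BT-VT"] = out["BTmag"] - out["VTmag"]
    return out

def color_coverage(frame: pd.DataFrame, cols=("B-V", "V-I")) -> pd.Series:
    """Fraction of rows with a non-missing value in each color column."""
    return frame[list(cols)].notna().mean()

# Made-up values for illustration; in the notebook, pass `df` instead.
demo = pd.DataFrame({
    "BTmag": [9.6, 10.5, np.nan],
    "VTmag": [9.1, 9.4, 6.6],
    "B-V": [0.48, 1.00, np.nan],
    "V-I": [0.55, np.nan, 0.00],
})
demo = add_tycho_color(demo)
print(demo[["BTmag", "VTmag", "BT-VT"]])
print(color_coverage(demo))
```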

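The field descriptions below are copied from the HEASARC Hipparcos Catalog documentation linked above. HEASARC's field names differ from the CSV column names (for example, Parallax corresponds to the Plx column), and a few fields, such as Name, LII and BII, have no column in this CSV.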

Name
Name of the star in the recommended format for Hipparcos stars, created by concatenating the prefix 'HIP ' and the Hip_Number identifier from the original catalog. Note that entries in the Hipparcos (HIP) Catalog have exactly the same identifiers as in the Hipparcos Input Catalog (HIC).
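The CSV has a HIP column but no Name column. A sketch that rebuilds the recommended designation; the helper name `hip_name` is hypothetical:

```python
import pandas as pd

def hip_name(hip_numbers: pd.Series) -> pd.Series:
    """Build 'HIP <n>' designations from the integer HIP column."""
    # astype(int) guards against HIP having been read as float;
    # it raises if any HIP value is missing.
    return "HIP " + hip_numbers.astype(int).astype(str)

# In the notebook: df["Name"] = hip_name(df["HIP"])
print(hip_name(pd.Series([1, 2, 3])).tolist())
```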

RA
Right ascension in the specified equinox for epoch J1991.25. This was given in the ICRS reference system (J2000 equator) in the original Hipparcos Catalog, and thus equinox 2000 should be specified to avoid inaccuracies due to the non-rigorous HEASARC coordinate precession algorithm. This parameter was given to a truncated precision of 0.01 seconds of time in the original Hipparcos Catalog. If the 'precise' RA is desired, one should use the value of the parameter RA_deg which contains the complete RA in decimal degrees.
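The sexagesimal RAhms strings can be converted back to degrees to see this precision limit in practice. A minimal sketch with a hypothetical helper `hms_to_deg`, applied to the first row shown in `df.head()`:

```python
def hms_to_deg(hms: str) -> float:
    """Convert an 'HH MM SS.ss' right-ascension string to decimal degrees.

    One hour of RA is 15 degrees, so degrees = 15 * (h + m/60 + s/3600).
    """
    h, m, s = (float(part) for part in hms.split())
    return 15.0 * (h + m / 60.0 + s / 3600.0)

# First row of df.head() above: RAhms = "00 00 00.22", RAdeg = 0.000912
print(hms_to_deg("00 00 00.22"))
```

The result, about 0.000917 degrees, differs slightly from the RAdeg value of 0.000912, which is why RAdeg is the better column for precise work.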

Dec
Declination in specified equinox for epoch J1991.25. This was given in the ICRS reference system (J2000 equator) in the original Hipparcos Catalog, and thus equinox 2000 should be specified to avoid inaccuracies due to the non-rigorous HEASARC coordinate precession algorithm. This parameter was given to a truncated precision of 0.1 arcseconds in the original Hipparcos Catalog. If the 'precise' declination is desired, one should use the value of the parameter Dec_deg which contains the complete declination in decimal degrees.
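Declination strings need one extra step: the sign must be read from the string itself, because a value such as '-00 30 00' has a degree field of zero. A sketch with a hypothetical helper `dms_to_deg`:

```python
def dms_to_deg(dms: str) -> float:
    """Convert a '+DD MM SS.s' declination string to decimal degrees.

    The sign is taken from the leading character rather than from the
    degree field, so '-00 30 00' correctly becomes -0.5.
    """
    dms = dms.strip()
    sign = -1.0 if dms.startswith("-") else 1.0
    d, m, s = (float(part) for part in dms.lstrip("+-").split())
    return sign * (d + m / 60.0 + s / 3600.0)

# Second row of df.head() above: DEdms = "-19 29 55.8", DEdeg = -19.498837
print(dms_to_deg("-19 29 55.8"))
```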

LII
Galactic longitude.

BII
Galactic latitude.
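These galactic coordinates are not among the CSV columns printed above, but they can be computed from RAdeg and DEdeg. The sketch below uses the standard spherical-trigonometry conversion with the J2000 galactic-pole constants; treating those constants as valid in ICRS is an approximation that is fine for plotting but not for precision astrometry. For rigorous work, astropy's SkyCoord (its `.galactic` attribute) is the usual choice. The function name is hypothetical.

```python
import numpy as np

# J2000 galactic-pole constants (assumed adequate for ICRS at this precision).
RA_NGP = np.radians(192.85948)   # RA of the north galactic pole
DEC_NGP = np.radians(27.12825)   # Dec of the north galactic pole
L_NCP = np.radians(122.93192)    # galactic longitude of the north celestial pole

def equatorial_to_galactic(ra_deg, dec_deg):
    """Return (l, b) in degrees for RA/Dec in degrees (scalars or arrays)."""
    ra = np.radians(ra_deg)
    dec = np.radians(dec_deg)
    d_ra = ra - RA_NGP
    sin_b = (np.sin(dec) * np.sin(DEC_NGP)
             + np.cos(dec) * np.cos(DEC_NGP) * np.cos(d_ra))
    b = np.arcsin(np.clip(sin_b, -1.0, 1.0))
    y = np.cos(dec) * np.sin(d_ra)
    x = (np.sin(dec) * np.cos(DEC_NGP)
         - np.cos(dec) * np.sin(DEC_NGP) * np.cos(d_ra))
    l = L_NCP - np.arctan2(y, x)
    return np.degrees(l) % 360.0, np.degrees(b)

# In the notebook (the new column names are our choice):
# df["LII"], df["BII"] = equatorial_to_galactic(df["RAdeg"], df["DEdeg"])
print(equatorial_to_galactic(266.40499, -28.93617))  # near the Galactic Center
```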

HIP_Number
The Hipparcos Catalog running number, which is the same as that in the Hipparcos Input Catalog. With a few exceptions, the entries are ordered by increasing HIP number, which essentially follows right ascension (equinox J2000) regardless of declination.

Prox_10asec
A proximity flag that gives a coarse indication of nearby objects within 10 arcseconds of the star's position. If non-blank, it indicates that there are one or more distinct Hipparcos ('H') or Tycho ('T') Catalog entries within that radius; if both 'H' and 'T' apply, 'H' is the adopted value.

Vmag
The magnitude in Johnson V band, given to a precision of 0.01 magnitudes in the original Hipparcos Catalog.

Var_Flag
A coarse variability flag which indicates if the entry (or one of the components in the case of a multiple system) is variable in its Hipparcos magnitude Hip_mag at the level of:

       1: < 0.06mag ; 2: 0.06-0.6mag ; 3: >0.6mag
  

Vmag_Source
The source of the V magnitude:

       G:  ground-based multicolor photometry, either directly in or
           reduced to the Johnson UBV system
       H:  Hipparcos magnitude Hip_mag, combined with information on the
           color index (either V-I or BT_mag-VT_mag), in combination with
           the luminosity class
       T:  Tycho photometry, i.e., VT_mag and BT_mag-VT_mag
        :  no data available
  

RA_Deg
The right ascension expressed in degrees for epoch J1991.25 (JD2448349.0625 (TT)) in the ICRS (International Celestial Reference System, consistent with J2000) reference system, and given to a precision of 10⁻⁸ degrees in the original Hipparcos Catalog. There are 263 cases where these fields are missing (no astrometric solution could be found).

Dec_Deg
The declination expressed in degrees for epoch J1991.25 (JD2448349.0625 (TT)) in the ICRS (International Celestial Reference System, consistent with J2000) reference system, and given to a precision of 10⁻⁸ degrees in the original Hipparcos Catalog. There are 263 cases where these fields are missing (no astrometric solution could be found).
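A sketch for separating entries with and without an astrometric solution, assuming `pd.read_csv` read the missing RAdeg/DEdeg values as NaN. The helper name and the demo rows are hypothetical; on the full catalog, the description above suggests 263 rows should land in the second group, though that is not verified here.

```python
import numpy as np
import pandas as pd

def split_by_astrometry(frame: pd.DataFrame):
    """Split rows into (with_solution, without_solution) using RAdeg/DEdeg."""
    missing = frame["RAdeg"].isna() | frame["DEdeg"].isna()
    return frame[~missing], frame[missing]

# Hypothetical three-row example; in the notebook, pass `df` instead.
demo = pd.DataFrame({"HIP": [1, 2, 3],
                     "RAdeg": [0.000912, np.nan, 0.005008],
                     "DEdeg": [1.089013, np.nan, 38.859286]})
good, bad = split_by_astrometry(demo)
print(len(good), len(bad))
```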

Astrom_Ref_Dbl
Reference flag for astrometric parameters of double and multiple systems. This flag indicates that the astrometric parameters refer to:

    A, B etc: the letter indicates the specified component of a double
              or multiple system
           *: the photocentre of a double or multiple system included in
              Part C of the Double and Multiple Systems Annex
           +: the centre of mass: for such an entry, an orbit is given in
              Part O of the Double and Multiple Systems Annex
  

Parallax
The trigonometric parallax pi in units of milliarcseconds: thus to calculate the distance D in parsecs, D = 1000/pi. The estimated parallax is given for every star, even if it appears to be insignificant or negative.
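Because parallaxes can be insignificant or negative, D = 1000/pi should not be applied blindly. A sketch with a quality cut; the 20% relative-error threshold is a common rule of thumb we chose, not a catalog recommendation, and the function name is hypothetical.

```python
import numpy as np

def parallax_distance(plx_mas, e_plx_mas, max_rel_err=0.2):
    """Distance in parsecs, D = 1000 / plx, with a simple quality cut.

    Returns NaN where the parallax is non-positive or missing, or where
    its relative error e_plx / plx exceeds max_rel_err.
    """
    plx = np.asarray(plx_mas, dtype=float)
    e_plx = np.asarray(e_plx_mas, dtype=float)
    dist = np.full(plx.shape, np.nan)
    with np.errstate(divide="ignore", invalid="ignore"):
        ok = (plx > 0) & (e_plx / plx <= max_rel_err)
    dist[ok] = 1000.0 / plx[ok]
    return dist

# In the notebook: df["dist_pc"] = parallax_distance(df["Plx"], df["e_Plx"])
print(parallax_distance([10.0, -1.2, 2.0], [0.5, 1.0, 1.0]))
```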

PM_RA
The proper motion component in the RA direction expressed in milliarcseconds per Julian year (mas/yr), and given with respect to the ICRS reference system: mu_RA* = mu_RA x cos (declination).

PM_Dec
The proper motion component in the declination direction expressed in milliarcseconds per Julian year (mas/yr), and given with respect to the ICRS reference system.
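Since pmRA already includes the cos(declination) factor, the two components combine directly into a total proper motion, and with the parallax they give a tangential velocity. A sketch with hypothetical helper names; the constant 4.74047 is 1 AU/yr expressed in km/s.

```python
import numpy as np

K = 4.74047  # km/s corresponding to 1 AU/yr

def total_proper_motion(pm_ra_cosdec, pm_dec):
    """Total proper motion in mas/yr from pmRA (already times cos dec) and pmDE."""
    return np.hypot(pm_ra_cosdec, pm_dec)

def tangential_velocity(pm_total_mas, plx_mas):
    """Tangential velocity in km/s: v_t = K * mu[arcsec/yr] * D[pc].

    With mu in mas/yr and D = 1000/plx (plx in mas), the factors of 1000
    cancel, giving v_t = K * mu_mas / plx_mas.
    """
    mu = np.asarray(pm_total_mas, dtype=float)
    plx = np.asarray(plx_mas, dtype=float)
    return K * mu / plx

# In the notebook:
# mu = total_proper_motion(df["pmRA"], df["pmDE"])
# vt = tangential_velocity(mu, df["Plx"])  # only meaningful for good parallaxes
print(total_proper_motion(3.0, 4.0))
```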

RA_Error
The standard error in the Right Ascension given at the catalog epoch, J1991.25, and expressed in milliarcseconds: sigma_RA* = sigma_RA x cos (declination).

Dec_Error
The standard error in the declination given at the catalog epoch, J1991.25, and expressed in milliarcseconds.

Parallax_Error
The standard error in the parallax given in milliarcseconds.

PM_RA_Error
The standard error in the proper motion component in the RA direction expressed in milliarcseconds per Julian year (mas/yr): sigma_mu_RA* = sigma_mu_RA x cos (declination).

PM_Dec_Error
The standard error in the proper motion component in the declination direction expressed in milliarcseconds per Julian year (mas/yr), sigma_mu_declination.

Crl_Dec_RA
The correlation coefficient of declination over RA, expressed as a real number (the printed catalog gives it in per cent).

Crl_Plx_RA
The correlation coefficient of parallax over RA, expressed as a real number (the printed catalog gives it in per cent).

Crl_Plx_Dec
The correlation coefficient of parallax over declination, expressed as a real number (the printed catalog gives it in per cent).

Crl_Pmra_RA
The correlation coefficient of proper motion in RA over RA, expressed as a real number (the printed catalog gives it in per cent).

Crl_Pmra_Dec
The correlation coefficient of proper motion in RA over declination, expressed as a real number (the printed catalog gives it in per cent).

Crl_Pmra_Plx
The correlation coefficient of proper motion in RA over parallax, expressed as a real number (the printed catalog gives it in per cent).

Crl_Pmdec_RA
The correlation coefficient of proper motion in declination over RA, expressed as a real number (the printed catalog gives it in per cent).

Crl_Pmdec_Dec
The correlation coefficient of proper motion in declination over declination, expressed as a real number (the printed catalog gives it in per cent).

Crl_Pmdec_Plx
The correlation coefficient of proper motion in declination over parallax, expressed as a real number (the printed catalog gives it in per cent).

Crl_Pmdec_Pmra
The correlation coefficient of proper motion in declination over proper motion in RA, expressed as a real number (the printed catalog gives it in per cent).
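The standard errors and correlation coefficients together define the covariance matrix of the five astrometric parameters, C_ij = rho_ij * sigma_i * sigma_j. A sketch that builds it for one row, assuming the CSV correlation columns (DE:RA, Plx:RA, ...) are plain coefficients in [-1, 1] as described above. Following the RA_Error and Dec_Error descriptions, e_RAdeg and e_DEdeg are taken to be in milliarcseconds despite the 'deg' in their names. The function name is hypothetical.

```python
import numpy as np

# Parameter order: RA*, Dec, parallax, pmRA*, pmDec (CSV error columns).
ERR_COLS = ["e_RAdeg", "e_DEdeg", "e_Plx", "e_pmRA", "e_pmDE"]
# Correlation columns keyed by (i, j) index pairs into that order.
CORR_COLS = {
    (1, 0): "DE:RA",
    (2, 0): "Plx:RA", (2, 1): "Plx:DE",
    (3, 0): "pmRA:RA", (3, 1): "pmRA:DE", (3, 2): "pmRA:Plx",
    (4, 0): "pmDE:RA", (4, 1): "pmDE:DE", (4, 2): "pmDE:Plx",
    (4, 3): "pmDE:pmRA",
}

def astrometric_covariance(row) -> np.ndarray:
    """5x5 covariance matrix C_ij = rho_ij * sigma_i * sigma_j for one star.

    `row` can be a dict or a pandas row such as df.iloc[0].
    Units follow the error columns (mas and mas/yr).
    """
    sigma = np.array([row[c] for c in ERR_COLS], dtype=float)
    corr = np.eye(5)
    for (i, j), col in CORR_COLS.items():
        corr[i, j] = corr[j, i] = row[col]
    return corr * np.outer(sigma, sigma)

# In the notebook: C = astrometric_covariance(df.iloc[0])
```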

Reject_Percent
The percentage of data that had to be rejected in order to obtain an acceptable solution.

Quality_Fit
The goodness-of-fit statistic: this number indicates how well the astrometric solution fits the accepted data (i.e., excluding the rejected data). For good fits, it should approximately follow a normal distribution with zero mean and unit standard deviation, so values exceeding about +3 indicate a bad fit to the data.
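A sketch of a fit-quality cut. It assumes the CSV column F2 holds Quality_Fit (and F1 holds Reject_Percent), based on where those columns sit in the column list; the threshold of 3 follows the description above. The function name is hypothetical.

```python
import numpy as np

def good_astrometric_fit(frame, max_f2=3.0):
    """Boolean mask of rows whose goodness-of-fit statistic is <= max_f2.

    NaN compares as False, so rows without a statistic are excluded.
    """
    f2 = np.asarray(frame["F2"], dtype=float)
    return f2 <= max_f2

# In the notebook: df_good = df[good_astrometric_fit(df)]
print(good_astrometric_fit({"F2": [0.5, 3.5, float("nan"), -1.2]}))
```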

BT_Mag
The mean magnitude in the Tycho photometric system, B_T.

BT_Mag_Error
The standard error of the B_T magnitude, BT_mag.

VT_Mag
The mean magnitude in the Tycho photometric system, V_T.

VT_Mag_Error
The standard error of the V_T magnitude, VT_mag.

BT_Mag_Ref_Dbl
A reference flag for BT_mag and VT_mag which indicates, for non-single stars, the component measured in Tycho photometry, or indicates that several components have been directly measured together by Tycho, or have had their Tycho data combined. The flag takes the following values:

   A, B, etc. : the Tycho photometry refers to the designated Hipparcos
                Catalog component
            * : the Tycho photometry refers to all components of the
                relevant Hipparcos entry
            - : the Tycho photometry refers to a single-pointing triple or
                quadruple system, for which only a close pair has been
                observed by Tycho, the other components being too faint
                to be detected by Tycho
  

BV_Color
The (B-V) color index in, or reduced to, the Johnson UBV system.

BV_Color_Error
The standard error of the (B-V) color index, BV_Color.

BV_Mag_Source
The source of the (B-V) color index, BV_Color:

            G: indicates that it was taken from ground-based observations
            T: indicates that it was determined from the transformed Tycho
               (B_T-V_T) data
             : indicates that no data are available
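Source 'T' means the B-V value was derived from Tycho B_T - V_T. For illustration only, the widely quoted linear approximation is sketched below; the catalog values were not necessarily produced this way, so expect small differences. The function name is hypothetical.

```python
import numpy as np

def tycho_to_johnson(bt, vt):
    """Approximate Johnson V and B-V from Tycho B_T and V_T.

    Uses the linear relations
        V     = V_T - 0.090 * (B_T - V_T)
        B - V = 0.850 * (B_T - V_T)
    intended for roughly -0.2 < B_T - V_T < 1.8.
    """
    bt = np.asarray(bt, dtype=float)
    vt = np.asarray(vt, dtype=float)
    color_t = bt - vt
    return vt - 0.090 * color_t, 0.850 * color_t

# In the notebook: v_est, bv_est = tycho_to_johnson(df["BTmag"], df["VTmag"])
print(tycho_to_johnson(10.0, 9.0))
```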
  

VI_Color
The (V-I) color index in Cousins' photometric system; it represents the best available (V-I) value at the time of the Hipparcos Catalog publication.

VI_Color_Error
The standard error in the (V-I) color index, VI_Color.

VI_Color_Source
The source of the (V-I) color index, VI_Color (see Section 1.3, Appendix 5 of the published Hipparcos Catalog for full details):

       'A'        :for an observation of V-I in Cousins' system;
       'B' to 'K' :when V-I derived from measurements in other
                   bands/photoelectric systems
       'L' to 'P' :when V-I derived from Hipparcos and Star Mapper
                   photometry
       'Q'        :for long-period variables
       'R' to 'T' :when colours are unknown
  

Mag_Ref_Dbl
A reference flag for the (B-V) and (V-I) color indices and the V magnitude Vmag (and all their standard errors) which is set to '*' when they refer to the combined light of double or multiple systems which are otherwise resolved by the main mission astrometry and photometry.

HIP_Mag
The median magnitude H_P in the Hipparcos photometric system, and defined on the basis of the accepted observations (or field transits) for a given star. Note that the Hipparcos magnitude could not be determined for 14 stars.

HIP_Mag_Error
The standard error of the median magnitude H_P.

Scat_HIP_Mag
The scatter of the H_P observations.

N_Obs_HIP_Mag
The number of H_P observations: this is the number of photometric observations (or field transits) used for the construction of the median, standard error, and scatter in H_P.

HIP_Mag_Ref_Dbl
A reference flag for the Hipparcos photometric parameters. For a double or multiple entry, this flag indicates that the photometry refers to:

   A, B, etc. : the specified component of a double or multiple system
            * : combined photometry of a double system, corrected for
                attenuation by the detector's instantaneous field of view
                profile response
            - : combined photometry of a double system, NOT corrected for
                attenuation by the detector's instantaneous field of view
                profile response
  

HIP_Mag_Max
The observed magnitude at maximum luminosity. This is defined as the 5th percentile of the epoch photometry.

HIP_Mag_Min
The observed magnitude at minimum luminosity. This is defined as the 95th percentile of the epoch photometry.
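The difference between these two columns gives a robust 5th-95th percentile amplitude. A sketch with a hypothetical helper; note that the CSV header capitalises the two columns differently, Hpmax and HPmin.

```python
import numpy as np

def hp_amplitude(frame):
    """5th-95th percentile amplitude of the Hp epoch photometry, in mag.

    `frame` is anything indexable by column name (a DataFrame or a dict).
    Fainter means a larger magnitude, so HPmin (minimum luminosity) is the
    larger number and the difference should be non-negative.
    """
    hp_min = np.asarray(frame["HPmin"], dtype=float)
    hp_max = np.asarray(frame["Hpmax"], dtype=float)
    return hp_min - hp_max

demo = {"Hpmax": [8.10, 9.00], "HPmin": [8.50, 9.02]}  # made-up values
print(hp_amplitude(demo))
```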

Var_Period
The variability period, or a provisional estimate of such a period, derived on the basis of the Hipparcos data (possibly in combination with ground-based observations) and expressed in days, with a precision of 0.01 days.

HIP_Var_Type
The variability type: the sources of scatter in the photometric data are various, and this flag indicates the origin of the extra scatter, which may be astrophysical, or, in some cases, instrumental. See Section 1.3, Appendix 2 of the published Hipparcos Catalog for a more detailed description. Amongst astrophysical sources of variability, this parameter only distinguishes between 'M' (micro-variables), 'P' (periodic variables), and 'U' (unsolved variables). Further variability details for the periodic or unsolved variables are included in the Variability Annex. The flag takes the following values:

       C : no variability detected ("constant")
       D : duplicity-induced variability
       M : possibly micro-variable, with amplitude < 0.03 mag (stars
           classified with high confidence as micro-variable are flagged U)
       P : periodic variable
       R : the V-I colour index was revised during the variability analysis
       U : unsolved variable which does not fall in the other categories;
           this class also includes irregular or semi-regular variables,
           and possibly variables with amplitude of about 0.03 mag or more
         : a blank indicates that the entry could not be classified as
           variable or constant with any degree of certainty
  

Var_Data_Annex
A Variability Annex flag indicating the existence of additional tabular data in the Variability Annex, where '1' means that additional data are provided in a table of periodic variables, and '2' means that additional data are provided in a table of 'unsolved' variables.

Var_Curv_Annex
A Variability Annex flag indicating the existence of a light curve, or a folded light curve, in the Variability Annex, where 'A' means the light curve is folded, and 'B' or 'C' mean that the light curve is NOT folded.

CCDM_ID
The Catalog of Components of Double and Multiple Stars (CCDM) identifier.

CCDM_History
The historical status of the CCDM identifier. The flag takes the following values:

       H : system determined as double or multiple by the Hipparcos
           observations, and was previously unknown as double or multiple
       I : system previously identified as multiple, as given in Annex 1
           of the Hipparcos Input Catalog (HIC)
       M : miscellaneous (system had been previously identified, after
           publication of the HIC, using other more recently available
           catalogs and compilations)
  

CCDM_N_Entries
The number of separate catalog entries with the same CCDM identifier.

CCDM_N_Comp
The number of components into which the entry was resolved as a result of the satellite observations and data reductions.

Dbl_Mult_Annex
The Double and Multiple Systems Annex flag. This indicates that further details of this system are given in one of the 5 (mutually exclusive) parts of the Double and Multiple Systems Annex labelled as follows:

       C : solutions for the components
       G : acceleration or higher order terms
       O : orbital solutions
       V : variability-induced movers (apparent motion arises from variability
           of one of the components of a double system)
       X : stochastic solution (probably astrometric binaries of short period)
  

Astrom_Mult_Source
A flag for the source of the absolute astrometry. This parameter qualifies the source of the astrometric parameters for some of the entries with a value of 'C' for the parameter Dbl_Mult_Annex. The values are as follows:

       P : primary target of a 2- or 3-pointing system
       F : secondary or tertiary of a 2- or 3-pointing 'fixed' system
           (common parallax and proper motions)
       I : secondary or tertiary of a 2- or 3-pointing 'independent'
           system (no constraints on parallax or proper motions)
       L : secondary or tertiary of a 2- or 3-pointing 'linear' system
           (common parallax)
       S : astrometric parameters from 'single-star merging' process.
  

Dbl_Soln_Qual
A solution quality flag which indicates the reliability of the double or multiple star solution, and is set for all entries in Part C of the Double and Multiple Systems Annex. The flags can be understood as follows:

       A: 'good', or reliable solution
       B: 'fair', or moderately reliable solution
       C: 'poor', or less reliable solution
       D: uncertain solution
       S: suspected non-single, i.e., possible double or multiple,
          although no significant or convincing non-single star solution
          was found
  

Dbl_Ref_ID
Component designation for the double star parameters Dbl_Theta, Dbl_Rho, etc. The first letter gives the 'reference' component and the second letter gives the subsidiary component. For the Hipparcos observations, the reference component is always the brighter component (in median H_P), so the magnitude difference between the components (Diff_HIP_Mag) is always positive.

Dbl_Theta
The rounded value of the position angle between the components specified in the Dbl_Ref_ID field, expressed in degrees (measured in the usual sense, counterclockwise from North).

Dbl_Rho
The rounded value of the angular separation between the components specified in the Dbl_Ref_ID field, expressed in arcseconds.
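Position angle and separation can be turned into the companion's offset on the sky. A sketch, assuming the CSV columns theta and rho hold Dbl_Theta and Dbl_Rho, and using the convention that position angle runs from North through East. The function name is hypothetical.

```python
import numpy as np

def companion_offset(theta_deg, rho_arcsec):
    """Offset of the companion from the reference component, in arcsec.

    Returns (d_ra_cosdec, d_dec): d_ra_cosdec = rho * sin(theta)
    (positive = East) and d_dec = rho * cos(theta) (positive = North).
    """
    theta = np.radians(theta_deg)
    return rho_arcsec * np.sin(theta), rho_arcsec * np.cos(theta)

# In the notebook: dx, dy = companion_offset(df["theta"], df["rho"])
print(companion_offset(90.0, 2.0))
```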

Rho_Error
The standard error of the angular separation, Dbl_Rho, given in arcseconds.

Diff_HIP_Mag
The Hipparcos magnitude difference of the components specified in the Dbl_Ref_ID field, expressed in magnitudes.

Dhip_Mag_Error
The standard error of the Hipparcos magnitude difference, expressed in magnitudes.

Survey_Star
A flag indicating a 'survey' star. The 'survey' was the basic list of bright stars added to and merged with the total list of proposed stars, to provide a stellar sample (almost) complete to well-defined limits. A flag 'S' indicates that the entry is contained within this 'survey', whose limiting magnitude is a function of the star's spectral type and galactic latitude b and is defined by:

     V <= 7.9 + 1.1 x |sin b| for spectral types earlier or equal to G5
     V <= 7.3 + 1.1 x |sin b| for spectral types later than G5
  

If no spectral data were available, the break was taken at (B-V) = 0.8 mag.
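A sketch of the survey limit test with hypothetical helper names. The CSV has no galactic latitude column, so b must be computed separately from RAdeg/DEdeg. The spectral-type parsing is a simplification that only recognises standard OBAFGKM types and otherwise falls back to the (B-V) = 0.8 mag break; treating exactly 0.8 as early is our assumption.

```python
import math
import re

_SPTYPE_RE = re.compile(r"\s*([OBAFGKM])(\d(?:\.\d+)?)?")

def is_early_type(sptype, bv):
    """True if the star counts as 'G5 or earlier', False if later.

    Reads the leading MK class letter and subclass from strings such as
    'F5', 'K3V' or 'G8III'. If that is inconclusive (missing or
    non-standard type, or a bare 'G'), falls back to B-V <= 0.8.
    Returns None if neither source is usable.
    """
    match = _SPTYPE_RE.match(sptype) if isinstance(sptype, str) else None
    if match:
        letter, sub = match.group(1), match.group(2)
        if letter in "OBAF":
            return True
        if letter in "KM":
            return False
        if sub is not None:  # a G star with a subclass
            return float(sub) <= 5.0
    if bv is not None and not math.isnan(bv):
        return bool(bv <= 0.8)
    return None

def survey_limit(b_deg, early):
    """Limiting V magnitude of the survey at galactic latitude b (degrees)."""
    base = 7.9 if early else 7.3
    return base + 1.1 * abs(math.sin(math.radians(b_deg)))

def within_survey_limit(vmag, b_deg, sptype, bv=None):
    """True/False if V is within the limit, None if the type is unknown."""
    early = is_early_type(sptype, bv)
    if early is None:
        return None
    return vmag <= survey_limit(b_deg, early)

print(within_survey_limit(8.0, 90.0, "F5"))
print(within_survey_limit(8.0, 0.0, "K3V"))
```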

ID_Chart
A flag indicating an identification chart. Where identification of a star using ground-based telescopes might prove difficult or ambiguous, identification charts were constructed and are available in Volume 13 of the printed catalog. Charts correspond to the object observed by the satellite (i.e., at the position given in this catalog), even if it was not the intended target. The flag takes the following values: 'D' for charts produced directly from the STScI Digitized Sky Survey (776 entries) or 'G' for charts constructed from the Guide Star Catalog (10877 entries).

Notes
A flag indicating a note is given at the end of the volume(s) in the printed catalog. The flag has the following meaning:

       D : double and multiple systems note only (Volume 10)
       G : general note only (Volumes 5-9)
       P : photometric (including variability) notes only (Volume 11)
       W : D + P only
       X : D + G only
       Y : G + P only
       Z : D + G + P
  

HD_ID
HD/HDE/HDEC identifier (CDS Catalog <III 135>).

BD_ID
Bonner Durchmusterung (BD) identifier (CDS Catalogs <I 119>, <I 122>). BD identifiers, unlike the CoD and CPD identifiers, may carry a suffix letter for additional stars, i.e., stars with suffixes 'A', 'B', 'P', or 'S': these stars were added to the BD Catalog after the original numbering was made, and such suffixes do not imply that the entry is a component of a double or multiple system.

CoD_ID
Cordoba Durchmusterung (CoD) identifier (CDS Catalog <I 114>).

CPD_ID
Cape Photographic Durchmusterung (CPD) identifier (CDS Catalog <I 108>).

VI_Color_Reduct
The (V-I) color index used for the photometric processing (not necessarily the same as the 'final' value given in the parameter VI_Color).

Spect_Type
The MK or HD spectral type acquired from ground-based compilations and primarily taken from the Hipparcos Input Catalog, with some updates, especially for variable stars.

Spect_Type_Source
The source of the spectral type. The flag indicates the source as follows:

   1 : Michigan catalogue for the HD stars, vol. 1 (Houk+, 1975) <III/31>
   2 : Michigan catalogue for the HD stars, vol. 2 (Houk, 1978) <III/51>
   3 : Michigan Catalogue for the HD stars, vol. 3 (Houk, 1982) <III/80>
   4 : Michigan Catalogue for the HD stars, vol. 4 (Houk+, 1988) <III/133>
   G : updated after publication of the HIC <I/196>
   K : General Catalog of Variable Stars, 4th Ed. (Kholopov+ 1988) <II/139>
   S : SIMBAD database at http://cdsweb.u-strasbg.fr/Simbad.html
   X : Miscellaneous
     : A blank entry has no corresponding information.
  