In [47]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Read in the data

In [59]:
hyg = pd.read_csv("hygdata_v3.csv")

#### Identify the column names
Some of these column names are ambiguous, and some we won't use.  Let's first rename them so we know what we're talking about!

In [60]:
# List out the column names
print(list(hyg))

['id', 'hip', 'hd', 'hr', 'gl', 'bf', 'proper', 'ra', 'dec', 'dist', 'pmra', 'pmdec', 'rv', 'mag', 'absmag', 'spect', 'ci', 'x', 'y', 'z', 'vx', 'vy', 'vz', 'rarad', 'decrad', 'pmrarad', 'pmdecrad', 'bayer', 'flam', 'con', 'comp', 'comp_primary', 'base', 'lum', 'var', 'var_min', 'var_max']


In [61]:
# Make a new list with more intuitive names
new_names = ["ID", "HipparcosID", "HenryDraperID", "HarvardRevisedID", "GlieseID", "BayerFlamsteed", "ProperName", 
             "RA", "Dec", "Distance", "ProperMotion(RA)", "ProperMotion(Dec)", "RadialVelocity", "Magnitude",
             "AbsoluteMagnitude", "SpectralType", "ColorIndex", "X", "Y", "Z", "Vx", "Vy", "Vz", "RA(radians)",
             "Dec(radians)", "ProperMotionRA(radians)", "ProperMotionDec(radians)", "BayerDesignation",
             "FlamsteedNumber", "Constellation", "CompanionID", "PrimaryCompanion", "BaseName", "Luminosity",
             "VariableStarID", "VariableMinMagnitude", "VariableMaxMagnitude"]
hyg.columns = new_names
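
Assigning a list to `hyg.columns` only works if the list matches the file's column order exactly. A more defensive alternative, sketched here, is an explicit old-to-new mapping passed to `DataFrame.rename`. The two-column `demo` frame, its values, and the partial `name_map` are illustrative stand-ins, not the real dataset or the full mapping.

```python
import pandas as pd

# Tiny stand-in for the HYG table (made-up values, only two of the 37 columns)
demo = pd.DataFrame({"hip": [1.0, 2.0], "dist": [10.0, 20.0]})

# Explicit old -> new mapping, so column order no longer matters
name_map = {"hip": "HipparcosID", "dist": "Distance"}

# errors="raise" makes pandas fail loudly if a key in name_map is not a column
renamed = demo.rename(columns=name_map, errors="raise")
print(list(renamed.columns))
```

With a mapping, a misspelled or missing source column raises a `KeyError` instead of silently mislabeling the data.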

#### Understanding the column names
So what does it all mean?  Not everyone has an undergraduate degree in astrophysics, you know!  Here is a description of each column, adapted from [astronexus' GitHub repository](https://github.com/astronexus/HYG-Database):

- **ID:** The database primary key.  
- **HipparcosID:** The star's ID in the Hipparcos catalog, if known.  
- **HenryDraperID:** The star's ID in the Henry Draper catalog, if known.  
- **HarvardRevisedID:** The star's ID in the Harvard Revised catalog, which is the same as its number in the Yale Bright Star Catalog.  
- **GlieseID:** The star's ID in the third edition of the Gliese Catalog of Nearby Stars.  
- **BayerFlamsteed:** The Bayer / Flamsteed designation, primarily from the Fifth Edition of the Yale Bright Star Catalog. This is a combination of the Bayer designation and Flamsteed number.  The Flamsteed number, if present, is given first; then a three-letter abbreviation for the Bayer Greek letter; the Bayer superscript number, if present; and finally, the three-letter constellation abbreviation. Thus Alpha Andromedae has the field value "21Alp And", and Kappa1 Sculptoris (no Flamsteed number) has "Kap1Scl".    
- **ProperName:** A common name for the star, such as "Barnard's Star" or "Sirius", taken primarily from the Hipparcos project's web site, which lists representative names for the 150 brightest stars and many of the 150 closest stars. A few names have been added to this list from catalogs that are now mostly forgotten (e.g., Lalande, Groombridge, and Gould ["G."]) but whose designations are still the best-known names for certain nearby stars.  
- **RA, Dec:** The star's right ascension and declination, for epoch and equinox 2000.0.
- **Distance:** The star's distance in parsecs, the most common unit in astrometry. To convert parsecs to light years, multiply by 3.262. A value >= 100000 indicates missing or dubious (e.g., negative) parallax data in Hipparcos.  
- **ProperMotion(RA), ProperMotion(Dec):** The star's proper motion in right ascension and declination, in milliarcseconds per year.  
- **RadialVelocity:** The star's radial velocity in km/sec, where known.  
- **Magnitude:** The star's apparent visual magnitude.  
- **AbsoluteMagnitude:** The star's absolute visual magnitude (its apparent magnitude from a distance of 10 parsecs).  
- **SpectralType:** The star's spectral type, if known.  
- **ColorIndex:** The star's color index (blue magnitude - visual magnitude), where known.  
- **X, Y, Z:** The Cartesian coordinates of the star, in a system based on the equatorial coordinates as seen from Earth. +X is in the direction of the vernal equinox (at epoch 2000), +Z towards the north celestial pole, and +Y in the direction of R.A. 6 hours, declination 0 degrees.  
- **Vx, Vy, Vz:** The Cartesian velocity components of the star, in the same coordinate system described immediately above. They are determined from the proper motion and the radial velocity (when known). The velocity unit is parsecs per year.  These are small values (around 1 millionth of a parsec per year)!  
- **RA(radians), Dec(radians), ProperMotionRA(radians), ProperMotionDec(radians):** The positions in radians, and proper motions in radians per year.  
- **BayerDesignation:** The Bayer designation as a distinct value.  
- **FlamsteedNumber:** The Flamsteed number as a distinct value.  
- **Constellation:** The standard constellation abbreviation.  A useful follow-up would be a lookup table that maps each abbreviation to the full constellation name, the number of stars it contains, and its location in the sky.  
- **CompanionID, PrimaryCompanion, BaseName:** Identifies a star in a multiple star system. CompanionID = ID of companion star, PrimaryCompanion = ID of primary star for this component, and BaseName = catalog ID or name for this multi-star system. Currently only used for Gliese stars.  
- **Luminosity:** Star's luminosity as a multiple of Solar luminosity.  
- **VariableStarID:** Star's standard variable star designation, when known.  
- **VariableMinMagnitude, VariableMaxMagnitude:** Star's approximate magnitude range, for variables. This value is based on the Hp magnitudes for the range in the original Hipparcos catalog, adjusted to the V magnitude scale to match the "Magnitude" field.  
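
Some of the relationships described in this list can be checked numerically. The sketch below uses a made-up three-row frame (not real catalog values) with the renamed columns. It assumes the standard distance-modulus relation $M = m - 5\log_{10}(d/10)$ for absolute magnitude and the usual spherical-to-Cartesian conversion for X, Y, Z; the `LightYears`, `AbsMagCheck`, and `*_check` column names are hypothetical additions, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows: two stars at 10 parsecs and one with the "missing parallax" sentinel
demo = pd.DataFrame({
    "Distance": [10.0, 10.0, 100000.0],
    "Magnitude": [5.0, 2.0, 9.0],
    "RA(radians)": [0.0, np.pi / 2, 0.0],
    "Dec(radians)": [0.0, 0.0, np.pi / 2],
})

# Distances >= 100000 flag missing or dubious parallax, so treat them as NaN
dist = demo["Distance"].where(demo["Distance"] < 100000)

# Parsecs to light years (factor taken from the column description)
demo["LightYears"] = dist * 3.262

# Absolute magnitude: the apparent magnitude the star would have at 10 parsecs
demo["AbsMagCheck"] = demo["Magnitude"] - 5 * np.log10(dist / 10)

# Cartesian position: +X toward RA 0h, +Y toward RA 6h, +Z toward the north pole
ra, dec = demo["RA(radians)"], demo["Dec(radians)"]
demo["X_check"] = dist * np.cos(dec) * np.cos(ra)
demo["Y_check"] = dist * np.cos(dec) * np.sin(ra)
demo["Z_check"] = dist * np.sin(dec)

print(demo[["LightYears", "AbsMagCheck", "X_check", "Y_check", "Z_check"]])
```

On the real table, comparing `AbsMagCheck` with `AbsoluteMagnitude` and the `*_check` columns with `X`, `Y`, `Z` is a quick sanity test that the renamed columns line up with the right data.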

#### Remove some data
A number of observations near the end either have no HipparcosID or contain very little data overall.  Let's remove every row without a HipparcosID, except for the first observation (the Sun!).

In [62]:
# Row positions where HipparcosID is missing
hipparcos_null = np.where(hyg["HipparcosID"].isnull())[0]
# Position 0 is the Sun, so skip it and drop the rest.
# drop() expects index labels, so translate the positions through hyg.index.
hyg = hyg.drop(hyg.index[hipparcos_null[1:]], axis=0)
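
The same filter can also be written as a boolean mask, which avoids converting row positions into index labels. This is only a sketch on a made-up four-row frame; `demo`, `keep`, and `cleaned` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Made-up frame: row 0 plays the role of the Sun (no HipparcosID)
demo = pd.DataFrame({
    "ProperName": ["Sol", None, None, None],
    "HipparcosID": [np.nan, 1.0, np.nan, 2.0],
})

# Keep rows that have a HipparcosID, plus the very first row (the Sun)
keep = demo["HipparcosID"].notnull() | (demo.index == demo.index[0])
cleaned = demo[keep]
print(cleaned)
```

The mask reads as a statement of what to keep rather than what to drop.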

#### Export a new .csv
Now that everything is much more readable, let's write a new .csv to work with.

In [66]:
hyg.to_csv("hygClean.csv")

PermissionError: [Errno 13] Permission denied
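
`[Errno 13] Permission denied` on a write commonly means that the target file is open in another program (a spreadsheet application, for example) or that the directory is not writable; the truncated traceback above does not say which. One possible workaround, sketched under that assumption, is to fall back to a different file name. `write_csv_with_fallback` and the `hygClean_copy.csv` name are hypothetical, and the demo writes to a temporary directory instead of the notebook's working directory.

```python
import os
import tempfile

import pandas as pd


def write_csv_with_fallback(df, path, fallback):
    """Write df to path; on PermissionError, write to fallback instead.

    Returns the path that was actually written.
    """
    try:
        df.to_csv(path)
        return path
    except PermissionError:
        # The first target is locked or not writable, so try the second one
        df.to_csv(fallback)
        return fallback


# Made-up two-row frame standing in for the cleaned table
demo = pd.DataFrame({"ID": [0, 1]})
out_dir = tempfile.mkdtemp()
written = write_csv_with_fallback(
    demo,
    os.path.join(out_dir, "hygClean.csv"),
    os.path.join(out_dir, "hygClean_copy.csv"),
)
print(written)
```

If the fallback path is also not writable, the second `PermissionError` propagates, which is usually what you want: it signals a directory-level problem, not a locked file.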