### Stars of the Hipparcos Catalogue
#### Writing efficient Python code for large data sets

Astronomers and astrophysicists work with many different data sets obtained through ground-based and space-based telescopes. These data sets can be very large, in some cases exceeding a million gigabytes of information.

The file hip_sp.csv is a shortened version of the main Hipparcos catalog. It contains 118218 rows and 11 columns, selected from the 78 columns of the main catalog. The selected columns are:

<ul>    
<li> Hip_No -- unique Hipparcos number </li>
<li> Alpha (h, m, s) and Delta (d, m, s) -- right ascension and declination, the star's coordinates on the sky </li>
<li> Vmag -- visual magnitude, a measure of the star's apparent brightness </li>
<li> B-V and V-I -- color indices, which indicate the star's color </li>
<li> Plx -- trigonometric parallax in milliarcseconds </li>
<li> e_Plx -- standard error of Plx in milliarcseconds </li>
<li> Var_period -- variability period in days (for variable stars) </li>
<li> Var_type -- type of variability </li>
<li> Spectral_type -- spectral type of the object, an indicator of the stellar temperature and color </li>
</ul>
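
The Alpha and Delta columns are stored as sexagesimal strings, so they usually need converting to decimal degrees before any numerical work. The following is a minimal sketch of that conversion, assuming the strings always have three space-separated fields as in the rows shown in this notebook. The helper names `hms_to_degrees` and `dms_to_degrees` are hypothetical and not part of the original notebook.

```python
def hms_to_degrees(alpha: str) -> float:
    """Convert a right ascension string 'hh mm ss.ss' to decimal degrees."""
    hours, minutes, seconds = (float(part) for part in alpha.split())
    # 24 hours of right ascension span 360 degrees, so 1 hour = 15 degrees.
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dms_to_degrees(delta: str) -> float:
    """Convert a declination string '+dd mm ss.s' to decimal degrees."""
    text = delta.strip()
    # Take the sign from the string itself so that '-00 30 00.0' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = (float(part) for part in text.lstrip("+-").split())
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


# Coordinates of the star with Hip_No 4, as listed in the catalog file.
ra_deg = hms_to_degrees("00 00 02.01")
dec_deg = dms_to_degrees("-51 53 36.8")
print(round(ra_deg, 6), round(dec_deg, 6))
```

Reading the sign separately matters for declinations between -1 and 0 degrees, where `float("-00")` would silently lose the minus sign.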

### Importing data 

In [1]:
import numpy as np
import pandas as pd

# Path to the shortened Hipparcos catalog
file = '../data/hip_sp.csv'

new_column_names = ['Hip_No', 'Alpha', 'Delta', 'Vmag', 'Plx', 'e_Plx',
                    'B-V', 'V-I', 'Var_period', 'Var_type', 'Spectral_type']

# header=None: no row is used as a header; usecols skips column 0
hip_sp1 = pd.read_csv(file, header=None, sep=',',
                      usecols=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
                      names=new_column_names,
                      low_memory=False)

# Columns that should be numeric; entries that cannot be parsed become NaN
col_list = ['Vmag', 'Plx', 'e_Plx', 'B-V', 'V-I', 'Var_period']

for col in col_list:
    hip_sp1[col] = pd.to_numeric(hip_sp1[col], errors='coerce')

hip_sp1.head(10)

| | Hip_No | Alpha | Delta | Vmag | Plx | e_Plx | B-V | V-I | Var_period | Var_type | Spectral_type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 00 00 00.22 | +01 05 20.4 | 9.1 | 3.54 | 1.39 | 0.482 | 0.55 | | | F5 |
| 1 | 2 | 00 00 00.91 | -19 29 55.8 | 9.27 | 21.9 | 3.1 | 0.999 | 1.04 | | C | K3V |
| 2 | 3 | 00 00 01.20 | +38 51 33.4 | 6.61 | 2.81 | 0.63 | -0.019 | 0.0 | | C | B9 |
| 3 | 4 | 00 00 02.01 | -51 53 36.8 | 8.06 | 7.75 | 0.97 | 0.37 | 0.43 | | | F0V |
| 4 | 5 | 00 00 02.39 | -40 35 28.4 | 8.55 | 2.87 | 1.11 | 0.902 | 0.9 | | | G8III |
| 5 | 6 | 00 00 04.35 | +03 56 47.4 | 12.31 | 18.8 | 4.99 | 1.336 | 1.55 | | | M0V: |
| 6 | 7 | 00 00 05.41 | +20 02 11.8 | 9.64 | 17.74 | 1.3 | 0.74 | 0.79 | | C | G0 |
| 7 | 8 | 00 00 06.55 | +25 53 11.3 | 9.05 | 5.17 | 1.95 | 1.102 | 3.92 | 327.5 | P | M6e-M8.5e Tc |
| 8 | 9 | 00 00 08.48 | +36 35 09.4 | 8.59 | 4.81 | 0.99 | 1.067 | 1.03 | | C | G5 |
| 9 | 10 | 00 00 08.70 | -50 52 01.5 | 8.59 | 10.76 | 1.1 | 0.489 | 0.56 | | | F6V |
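
The import cell relies on `pd.to_numeric(..., errors='coerce')` to turn unparseable entries into missing values. The short sketch below shows that behavior in isolation. The input strings are hypothetical and are not taken from hip_sp.csv, whose actual placeholder for missing values is not shown here.

```python
import pandas as pd

# Hypothetical raw column read as strings: two numbers and two unparseable tokens.
raw = pd.Series(["9.10", "21.9", "n/a", "missing"])

# errors='coerce' replaces anything that cannot be parsed with NaN
# instead of raising a ValueError.
clean = pd.to_numeric(raw, errors="coerce")

print(clean.dtype)
print(clean.isna().sum())
```

Because NaN is a floating-point value, a coerced column containing missing entries ends up as `float64`, which matches the dtypes reported for the numeric columns of `hip_sp1`.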


In [2]:
hip_sp1.info(verbose = True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118218 entries, 0 to 118217
Data columns (total 11 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Hip_No         118218 non-null  int64  
 1   Alpha          118218 non-null  object 
 2   Delta          118218 non-null  object 
 3   Vmag           118217 non-null  float64
 4   Plx            117955 non-null  float64
 5   e_Plx          117955 non-null  float64
 6   B-V            116937 non-null  float64
 7   V-I            116943 non-null  float64
 8   Var_period     2541 non-null    float64
 9   Var_type       118218 non-null  object 
 10  Spectral_type  118218 non-null  object 
dtypes: float64(6), int64(1), object(4)
memory usage: 9.9+ MB
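
The `+` in the reported memory usage indicates a lower bound: by default `info()` does not count the Python strings held in `object` columns. One possible way to measure and reduce the footprint is sketched below on a small synthetic frame. The frame `demo`, its values, and the choice of `float32` and `category` are assumptions made for illustration, not steps from the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few catalog columns; the values are synthetic.
rng = np.random.default_rng(0)
n_rows = 10_000
demo = pd.DataFrame({
    "Vmag": rng.uniform(0.0, 13.0, n_rows),
    "Plx": rng.normal(5.0, 3.0, n_rows),
    "Var_type": rng.choice(["C", "P", "U", " "], n_rows),
})

# deep=True also counts the strings stored in text columns,
# which the default (shallow) estimate leaves out.
before = demo.memory_usage(deep=True).sum()

compact = demo.copy()
# float32 halves the storage of float64, keeping about 7 significant digits.
for col in ["Vmag", "Plx"]:
    compact[col] = compact[col].astype("float32")
# A column with only a few distinct labels is stored more compactly as a category.
compact["Var_type"] = compact["Var_type"].astype("category")

after = compact.memory_usage(deep=True).sum()
print(f"{before / 1e6:.2f} MB -> {after / 1e6:.2f} MB")
```

Whether `float32` is acceptable depends on the precision the analysis needs, so it is worth checking against the quoted errors (for example e_Plx) before downcasting real catalog columns.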
