# Exploring the Hipparcos catalog using Python and pandas


This notebook demonstrates how Python and Pandas can be used to explore and analyse data presented in the form of a catalog. 

The main Hipparcos catalog is the result of the three-and-a-half-year Hipparcos satellite mission, and it was released in 1997. Although the main catalog, with over 110 000 stars, is small compared to the final database of 2 million objects, it is one of the first modern sources of high-precision measurements of stellar positions on the sky. It provides stellar coordinates with an accuracy of about one thousandth of an arcsecond (one milliarcsecond).

In this notebook, we will explore some of the basic features of the main Hipparcos catalog. All data files and the documentation explaining the table columns (Readme.txt) can be downloaded from https://www.cosmos.esa.int/web/hipparcos/catalogues. We have combined the data for the entire sky into one file, named hip_main.dat.

The hip_main.dat file contains 118218 rows and 77 columns. We will select 9 columns from the file and import them into a data frame. The selected columns are:
            Hip_No -- unique Hipparcos number
            Alpha & Delta -- right ascension and declination, the stellar coordinates on the sky
            Vmag -- visual magnitude, a measure of the apparent stellar brightness
            B-V and V-I -- color indices, which indicate the star's color
            Var_period and Var_type -- parameters for variable stars: the period in days and the type of variability
            Spectral_type -- spectral type of the object, which reflects the stellar temperature and color

1. Importing Python libraries

In [11]:
import numpy as np
import pandas as pd

2. Reading input data file

To read the main Hipparcos catalog file hip_main.dat, we will use the pandas function read_csv, specifying which columns to read and passing the new column names as a Python list.

In [12]:
#file path and the file name
file ='/Users/Ljiljana/Documents/Projects/HipparcosProject/hip_main.dat'

#list of the column names
new_column_names=['Hip_No', 'Alpha', 'Delta', 'Vmag', 'B-V', 'V-I', 'Var_period', 'Var_type', 'Spectral_type']
Hip= pd.read_csv(file, header=None, sep='|',
                usecols=[1,3,4,5,37,40,51,52,76],
                names=new_column_names)

#printing the first lines of the hip_main.dat
print(Hip.head(5))

   Hip_No        Alpha        Delta  Vmag     B-V   V-I Var_period Var_type  \
0       1  00 00 00.22  +01 05 20.4   9.1   0.482  0.55                       
1       2  00 00 00.91  -19 29 55.8  9.27   0.999  1.04                   C   
2       3  00 00 01.20  +38 51 33.4  6.61  -0.019  0.00                   C   
3       4  00 00 02.01  -51 53 36.8  8.06   0.370  0.43                       
4       5  00 00 02.39  -40 35 28.4  8.55   0.902  0.90                       

  Spectral_type  
0  F5            
1  K3V           
2  B9            
3  F0V           
4  G8III         
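
The same read_csv call can be tried without the catalog file. The following is a minimal sketch that assumes a made-up, pipe-delimited sample held in memory (the rows are invented, not real Hipparcos records); it shows how usecols selects fields by their 0-based position and how names relabels the selected fields.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited sample; the values are invented for illustration
sample = io.StringIO(
    "H|1|x|9.10\n"
    "H|2|x|9.27\n"
    "H|3|x|6.61\n"
)

# Read only fields 1 and 3 (0-based positions) and give them readable names
demo = pd.read_csv(sample, header=None, sep='|',
                   usecols=[1, 3], names=['Hip_No', 'Vmag'])
print(demo)
```

The list passed to names must have one entry per selected field, in the order the fields appear in the file.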


3. Changing data types

The numerical columns Vmag, B-V, V-I and Var_period are initially of the object (string) type. To perform numerical manipulations with the data frame, we will convert them using pd.to_numeric(). The info output shows that the Hip data frame has 118218 rows and 9 columns. It also shows that some columns have missing values: Var_period, for example, has only 2541 non-null entries.

In [13]:
#changing data types; with errors='coerce', values that cannot be parsed are set to NaN
Hip['Vmag'] = pd.to_numeric(Hip['Vmag'],  errors='coerce')
Hip['B-V'] = pd.to_numeric(Hip['B-V'],  errors='coerce')
Hip['V-I'] = pd.to_numeric(Hip['V-I'],  errors='coerce')
Hip['Var_period'] = pd.to_numeric(Hip['Var_period'],  errors='coerce')

#printing data frame info (info() prints the summary itself and returns None)
Hip.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118218 entries, 0 to 118217
Data columns (total 9 columns):
Hip_No           118218 non-null int64
Alpha            118218 non-null object
Delta            118218 non-null object
Vmag             118217 non-null float64
B-V              116937 non-null float64
V-I              116943 non-null float64
Var_period       2541 non-null float64
Var_type         118218 non-null object
Spectral_type    118218 non-null object
dtypes: float64(4), int64(1), object(4)
memory usage: 8.1+ MB
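
To see what errors='coerce' does in isolation, here is a small sketch on an invented Series of strings (the values are assumptions chosen for illustration, with a blank field standing in for a missing measurement).

```python
import pandas as pd

# Hypothetical raw values as read from a text file: numbers stored as strings,
# with a blank field standing in for a missing measurement
raw = pd.Series(['9.10', '9.27', '      ', '6.61'])

# errors='coerce' converts anything that cannot be parsed to NaN instead of raising
vmag = pd.to_numeric(raw, errors='coerce')

print(vmag.dtype)          # float64
print(vmag.isna().sum())   # number of values that could not be parsed
```

Counting the NaN values after the conversion is a quick way to check how many entries were blank or malformed.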


4. Finding the brightest and the faintest star in the catalog

The brightest star in the night sky is Sirius (Vmag=-1.4). Sorting the data frame by Vmag shows that Hipparcos observed Sirius: it appears in the catalog as Hip_No 32349 with Vmag=-1.44.

In [14]:
#Sorting by 'Vmag': smaller magnitudes are brighter; NaN values are placed last
print(Hip.sort_values(by='Vmag').head(1))   #brightest star
print(Hip.sort_values(by='Vmag', ascending=False).head(1))  #faintest star

       Hip_No        Alpha        Delta  Vmag    B-V   V-I  Var_period  \
32324   32349  06 45 09.25  -16 42 47.3 -1.44  0.009 -0.02         NaN   

      Var_type Spectral_type  
32324        U  A0m...        
       Hip_No        Alpha        Delta   Vmag  B-V  V-I  Var_period Var_type  \
70015   70079  14 20 28.21  -44 31 56.3  14.08  NaN  NaN         NaN            

      Spectral_type  
70015                
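
Because a smaller magnitude means a brighter star, the brightest and faintest entries can also be located without sorting the whole table. The sketch below uses idxmin and idxmax on an invented four-row data frame; the Hip_No values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-catalog; the values are invented for illustration
stars = pd.DataFrame({
    'Hip_No': [101, 102, 103, 104],
    'Vmag':   [8.06, -1.44, np.nan, 14.08],
})

# Smaller magnitude means a brighter star, so the brightest row is the minimum.
# idxmin/idxmax skip NaN values by default.
brightest = stars.loc[stars['Vmag'].idxmin()]
faintest = stars.loc[stars['Vmag'].idxmax()]

print(brightest['Hip_No'], brightest['Vmag'])
print(faintest['Hip_No'], faintest['Vmag'])
```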


5. Calculating the number of variable stars in the catalog

After exploring the hip_main.dat file and Readme.txt, we can see that many stars have no value in the variability type field, and most have no value in the period field. Some stars are marked with the letter 'P', indicating that they are periodic variables (their brightness changes at regular intervals), or with the letters 'M' and 'U', pointing to other types of variability.

To explore variable stars, we will group the data by the Var_type column and filter it.

In [15]:
#grouping variable stars according to their type
var_groups=Hip.groupby('Var_type')['Hip_No'].count()
print(var_groups)

#total number of stars in the catalog
star_number=len(Hip)

#counting the number of periodic & other types of variables
var=Hip[Hip['Var_type'].isin(['P', 'U', 'M'])].count()

#percentage of variable stars
ratio=round(var.Hip_No/star_number*100, 1)

print('Number of variable stars:',var.Hip_No, '...',ratio, '%')

Var_type
     46596
C    46552
D    12361
M     1045
P     2708
R     1172
U     7784
Name: Hip_No, dtype: int64
Number of variable stars: 11537 ... 9.8 %
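
The counting logic can be checked on a handful of values. This is a minimal sketch on an invented Series of variability flags (a single space is assumed to stand for "no flag"); it tallies the flags with value_counts and uses isin to test for several flags at once.

```python
import pandas as pd

# Hypothetical variability flags; a single space stands for "no flag"
flags = pd.Series(['C', 'P', ' ', 'U', 'M', 'C', 'D', ' '], name='Var_type')

# value_counts tallies every flag, similar to grouping and counting
print(flags.value_counts())

# isin builds one boolean mask for several flags at once
is_variable = flags.isin(['P', 'U', 'M'])
n_variable = int(is_variable.sum())           # True counts as 1
percent = round(is_variable.mean() * 100, 1)  # the mean of a mask is a fraction

print('Number of variable stars:', n_variable, '...', percent, '%')
```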


6. Finding the number of bluish stars in the catalog

Stars with a negative B-V color index are hot, blue stars. The filter below counts stars with B-V between -0.5 and 0.

In [7]:
#Filtering 'B-V' column: stars with -0.5 < B-V < 0
blue_number=(Hip[(Hip['B-V'] <0) &(Hip['B-V'] > -0.5)]).count()
#total number of rows, including stars without a B-V value
star_number=len(Hip)
ratio = round(blue_number.Hip_No/star_number*100, 1)

print('Number of blue stars:', blue_number.Hip_No, '...',ratio, '%')

Number of blue stars: 6894 ... 5.8 %
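
Two details of this filter are worth illustrating: comparisons with NaN evaluate to False, and the percentage depends on whether stars without a B-V value are kept in the denominator. The sketch below uses invented color indices, and the names share_of_all and share_of_measured are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical B-V color indices; NaN marks a star with no measurement
bv = pd.Series([-0.02, 0.48, np.nan, -0.15, 1.00, -0.60])

# Comparisons with NaN evaluate to False, so unmeasured stars drop out of the mask
is_blue = (bv < 0) & (bv > -0.5)
n_blue = int(is_blue.sum())

# Share relative to all rows (NaN rows included in the denominator)
share_of_all = round(n_blue / len(bv) * 100, 1)
# Share relative to stars that actually have a B-V value
share_of_measured = round(n_blue / bv.count() * 100, 1)

print(n_blue, share_of_all, share_of_measured)
```

The cell above divides by the total number of rows, which corresponds to share_of_all in this sketch.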


7. A function for sorting stars by Delta angle 


In [16]:
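#input parameters: Series (or array) of Delta strings such as '-19 29 55.8' & number of stars
def sort_stars(Dec_arr, No_stars):
    north_stars=0
    south_stars=0
    for i in range(0,No_stars):
        ar_list=Dec_arr[i].split()
        new_ar=[float(x) for x in ar_list]
        #the sign is read from the string: float('-00') is -0.0, which would test as >= 0
        sign=-1 if ar_list[0].startswith('-') else 1
        #degrees, arcminutes and arcseconds combined into signed decimal degrees
        dec_deg=sign*((new_ar[2]/60+new_ar[1])/60+abs(new_ar[0]))
        if dec_deg >= 0:
            north_stars=north_stars+1
        else:
            south_stars=south_stars+1
    return (north_stars, south_stars)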
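
As an alternative to looping over the rows, the conversion to decimal degrees can be sketched with pandas string methods. The example below runs on invented Delta strings, assumed to follow the 'sDD MM SS.s' layout shown in the catalog. It takes the sign from the text itself, so that a declination such as '-00 30 00.0' is counted as southern.

```python
import pandas as pd

# Hypothetical Delta strings in the 'sDD MM SS.s' layout
delta = pd.Series(['+01 05 20.4', '-19 29 55.8', '-00 30 00.0', '+00 00 00.0'])

# Split into degrees, arcminutes and arcseconds, then convert to floats
parts = delta.str.split(expand=True).astype(float)

# Take the sign from the text: '-00' parses to -0.0, which is not < 0
sign = delta.str.strip().str.startswith('-').map({True: -1.0, False: 1.0})

# Signed decimal degrees
dec_deg = sign * (parts[0].abs() + parts[1] / 60 + parts[2] / 3600)

north_stars = int((dec_deg >= 0).sum())
south_stars = int((dec_deg < 0).sum())
print(north_stars, south_stars)
```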

8. Calculating the number of the northern and southern stars in the catalog

Depending on its declination, a star is best observed from the northern or the southern hemisphere. To count the two groups, we split the stars according to their Delta coordinate (given in degrees, arcminutes and arcseconds). Northern stars have Delta between 0 and +90 degrees, while southern stars have Delta between 0 and -90 degrees.

In [17]:
#selecting the 'Delta' column as a pandas Series of strings
delta_arr=Hip.Delta 
print(delta_arr.head(5))

#counting the total number of stars
total_stars=delta_arr.count()   

north_stars,south_stars=sort_stars(delta_arr,total_stars)

0    +01 05 20.4
1    -19 29 55.8
2    +38 51 33.4
3    -51 53 36.8
4    -40 35 28.4
Name: Delta, dtype: object


9. Printing the results

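These are some of the interesting features of the Hipparcos catalog. The split between the two hemispheres is close to even; the exact counts depend on how stars with Delta between -1 and 0 degrees are classified, since their degrees field reads as -00.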

In [122]:
print("Total number of stars in the Hipparcos catalog:", total_stars)
print("Number of the northern stars:", north_stars,'...',round(north_stars/total_stars*100, 2), '%')
print("Number of the southern stars:", south_stars, '...',round(south_stars/total_stars*100, 2),'%')


Total number of stars in the Hipparcos catalog: 118218
Number of the northern stars: 58251 ... 49.27 %
Number of the southern stars: 59967 ... 50.73 %
