# Exploring Hipparcos catalog using Python and Pandas


In this notebook, we will explore some of the basic features of the main Hipparcos catalog such as: the number of stars observed, range of stellar magnitudes, type of stars, number of variable stars etc.. 

All data files and the documentation explaining table columns, (Readme.txt), can be imported from <a  href="https://www.cosmos.esa.int/web/hipparcos/catalogues"> Esa website </a>. We have combined data for the entire sky in one file, named hip_main.dat.

The <b>hip_main.dat</b> contains 118218 rows and 78 columns. We will select 9 columns from the file and import them into a data frame. The selected columns are: 
<ul>
      <li> Hip_No -- unique Hipparcos number </li>
      <li> Alpha & Delta -- right ascension and declination represent the stellar coordinates </li>
      <li> Vmag -- visual magnitude as a measure of the apparent stellar brightness </li> 
      <li> B-V and V-I -- color indexes indicate star's color </li>
      <li> Var_period -- a period (in days) for variable stars </li>
      <li> Var_type -- type of variability </li>
      <li> Spectral_type -- a spectral type of an object represent stellar temperature and color. </li>
</ul>

### 1. Importing Python libraries

In [1]:
import numpy as np
import pandas as pd

### 2. Reading input data file

To read the main Hipparcos catalog file hip_main.dat, we will use the panda's function read_csv and specify what columns to read and the column names as a python list.

In [2]:
#file path and the file name
file ='data/hip_main.dat'

#list of the column names
new_column_names=['Hip_No', 'Alpha', 'Delta', 'Vmag', 'B-V', 'V-I', 'Var_period', 'Var_type', 'Spectral_type']
Hip= pd.read_csv(file, header=None, sep='|',
                usecols=[1,3,4,5,37,40,51,52,76],
                names=new_column_names, low_memory=False)

#printing the first lines of the hip_main.dat
Hip.head(5)

Unnamed: 0,Hip_No,Alpha,Delta,Vmag,B-V,V-I,Var_period,Var_type,Spectral_type
0,1,00 00 00.22,+01 05 20.4,9.1,0.482,0.55,,,F5
1,2,00 00 00.91,-19 29 55.8,9.27,0.999,1.04,,C,K3V
2,3,00 00 01.20,+38 51 33.4,6.61,-0.019,0.0,,C,B9
3,4,00 00 02.01,-51 53 36.8,8.06,0.37,0.43,,,F0V
4,5,00 00 02.39,-40 35 28.4,8.55,0.902,0.9,,,G8III


### 3. Changing data types

To perform numerical manipulations with the data frame, we will change the data type for the numerical columns using pd.to_numeric(). By default, the columns are of the object (string) type. The info shows that the Hip data frame has 118218 rows and 9 columns. We can also notice that some columns have missing values.

In [3]:
#changing data types, with errors='coerce' invalid parsing is set to NaN
col_list=['Vmag', 'B-V', 'V-I', 'Var_period']

for  col in col_list:
    Hip[col]= pd.to_numeric(Hip[col],  errors='coerce')
    
#printing data frame info
Hip.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118218 entries, 0 to 118217
Data columns (total 9 columns):
Hip_No           118218 non-null int64
Alpha            118218 non-null object
Delta            118218 non-null object
Vmag             118217 non-null float64
B-V              116937 non-null float64
V-I              116943 non-null float64
Var_period       2541 non-null float64
Var_type         118218 non-null object
Spectral_type    118218 non-null object
dtypes: float64(4), int64(1), object(4)
memory usage: 8.1+ MB


### 4. Finding the brightest and the faintest star in the catalog

The brightest star in the sky is Sirius (Vmag=-1.4). After examining data frame, we can notice that Hipparcos observed and collected some data about Sirius!

In [4]:
#the brightest observed star with the Hipparcos
max_magnitude=Hip.nsmallest(1,'Vmag')
display(max_magnitude)

#the faintest observed star with the Hipparcos
min_magnitude=Hip.nlargest(1,'Vmag')
display(min_magnitude)

Unnamed: 0,Hip_No,Alpha,Delta,Vmag,B-V,V-I,Var_period,Var_type,Spectral_type
32324,32349,06 45 09.25,-16 42 47.3,-1.44,0.009,-0.02,,U,A0m...


Unnamed: 0,Hip_No,Alpha,Delta,Vmag,B-V,V-I,Var_period,Var_type,Spectral_type
70015,70079,14 20 28.21,-44 31 56.3,14.08,,,,,


In [5]:
#Range of stellar magnitudes observed with Hipparcos:
print('Vmag:', max_magnitude.iloc[0]['Vmag'], ' to ', min_magnitude.iloc[0]['Vmag'])

Vmag: -1.44  to  14.08


### 5. Calculating the number of variable stars in the catalog

After exploring the Hip_main.dat file and Readme.txt, we can see  that most stars do not have values in the variability type and period fields. Some stars are marked with the letter 'P' indicating that they are periodic variables (changing their brightness in regular intervals) or letters 'M' and 'U' pointing to the other types of variability. 

To explore variable stars,  we will group and filter the Var_type and Var_period columns. 

In [6]:
#grouping variable stars according to their type
var_groups=Hip.groupby('Var_type')['Hip_No'].count()
display(var_groups)

#total number of stars in the catalog
star_number=Hip['Hip_No'].value_counts(dropna=False).sum()

#counting the number of periodic & other types of variables
var=Hip[(Hip['Var_type'] =='P') | (Hip['Var_type'] == 'U') | (Hip['Var_type'] == 'M') ].count()

#counting ratio
ratio=round(var.Hip_No/star_number*100, 1)

print ('Number of variable stars:',var[0], '...',ratio, '%')

Var_type
     46596
C    46552
D    12361
M     1045
P     2708
R     1172
U     7784
Name: Hip_No, dtype: int64

Number of variable stars: 11537 ... 9.8 %


### 6. Finding the number of bluish stars in the catalog

Stars with the negative colour index B-V are hot, blue stars.

In [7]:
#filtering 'B-V' column
blue_number=(Hip[(Hip['B-V'] <0) &(Hip['B-V'] > -0.5)]).count()
star_number=Hip['B-V'].value_counts(dropna=False).sum()
ratio = round(blue_number.Hip_No/star_number*100, 1)

print('Number of blue stars:', blue_number.Hip_No, '...',ratio, '%')

Number of blue stars: 6894 ... 5.8 %


### 7. Converting the declination angle Delta

In [8]:
#converting Delta angle to degrees

def convert_delta(d,m,s):
    deg=round((s/60+m)/60+abs(d), 2)
    if d < 0:
        return -deg
    else:
        return deg

### 8. Calculating the number of the northern and southern stars in the catalog

Some stars can be observed from the northern or the southern hemisphere. To find this, we grouped stars into two groups according to their Delta coordinates (expressed in angular units). Northern stars have Delta between 0 and +90 degrees, while southern stars have Delta between 0 and -90 degrees. 

In [9]:
#selecting Delta column from the Hip dataframe
delta=Hip[['Delta']]

#counting the total number of stars
total_stars=delta.count()  

#splitting Delta column & changing the data type
delta=delta['Delta'].str.split(' ', expand=True)
delta.columns=['deg', 'min', 'sec']
delta=delta.astype({'deg':float, 'min':float, 'sec': float})

#converting deg, min, sec to degrees
delta['Delta_deg']= delta.apply(
     lambda row: convert_delta(row['deg'], row['min'], row['sec']),
     axis=1)
display(delta.head(5))

#filter Delta_deg column for southern and northern stars
delta_south=delta.loc[delta['Delta_deg']<0]
south_stars=round(delta_south['Delta_deg'].count()/total_stars *100, 2)

delta_north=delta.loc[delta['Delta_deg']>=0]
north_stars=round(delta_north['deg'].count()/total_stars *100, 2)


Unnamed: 0,deg,min,sec,Delta_deg
0,1.0,5.0,20.4,1.09
1,-19.0,29.0,55.8,-19.5
2,38.0,51.0,33.4,38.86
3,-51.0,53.0,36.8,-51.89
4,-40.0,35.0,28.4,-40.59


### 9. Printing the results

These are some of the interesting features of the Hipparcos catalog...

In [10]:

print("Total number of stars in Hipparcos catalogue: %d" %total_stars)
print("Number of Hipparcos northern stars: %5.2f" %north_stars,'%')
print("Number of Hipparcos southern stars: %5.2f" %south_stars,'%')

Total number of stars in Hipparcos catalogue: 118218
Number of Hipparcos northern stars: 49.27 %
Number of Hipparcos southern stars: 50.73 %
