### Writing efficient Python code

Fast and reliable computational tools and algorithms are extremely important when working with large data sets. In Python, this is achieved by using tools such as pandas DataFrames and NumPy arrays, and by writing readable, efficient code in the so-called Pythonic way.

This notebook is an extension of the <b>Stars-Of-Hipparcos-catalog</b> project. It demonstrates several additional Python tools for writing readable code with a fast runtime and minimal memory usage.

The same ten Hipparcos columns used in the <b>Stars-Of-Hipparcos-catalog</b> notebook are used in this notebook. The apparent stellar magnitude is read directly from the catalog (Vmag), and the absolute stellar magnitude (Mv) is calculated from it in the same way as in the previous notebook.

### Importing and preparing data

In [1]:
#importing Python libraries
import numpy as np
import pandas as pd

file = '../data/hip_sp.csv'

#defining column names
new_column_names = ['Hip_No', 'Alpha', 'Delta','Vmag','Plx', 'e_Plx', 'B-V','e_B-V', 'ccdm_h','Spectral_type']

#importing data
hip_all_stars = pd.read_csv(file, header = None, sep =',',
                usecols = [1,2,3,4,5,6,7,8,9,10],  
                names = new_column_names,
                low_memory = False)

#changing column types
col_list = ['Vmag', 'Plx', 'e_Plx', 'B-V', 'e_B-V']
for col in col_list:
    hip_all_stars[col] = pd.to_numeric(hip_all_stars[col], errors = 'coerce')

#selecting only single stars with no ccdm_h flag
df = hip_all_stars.loc[hip_all_stars['ccdm_h']==' ']

#dropping ccdm_h column
hip = df.drop(['ccdm_h'], axis = 1) 

#displaying DataFrame
hip.head(5)

| | Hip_No | Alpha | Delta | Vmag | Plx | e_Plx | B-V | e_B-V | Spectral_type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 00 00 00.22 | +01 05 20.4 | 9.1 | 3.54 | 1.39 | 0.482 | 0.025 | F5 |
| 1 | 2 | 00 00 00.91 | -19 29 55.8 | 9.27 | 21.9 | 3.1 | 0.999 | 0.002 | K3V |
| 3 | 4 | 00 00 02.01 | -51 53 36.8 | 8.06 | 7.75 | 0.97 | 0.37 | 0.009 | F0V |
| 4 | 5 | 00 00 02.39 | -40 35 28.4 | 8.55 | 2.87 | 1.11 | 0.902 | 0.013 | G8III |
| 5 | 6 | 00 00 04.35 | +03 56 47.4 | 12.31 | 18.8 | 4.99 | 1.336 | 0.02 | M0V: |
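
The type conversion in the cell above relies on `pd.to_numeric` with `errors = 'coerce'`, which turns unparseable entries into NaN instead of raising an error. The following is a minimal sketch of that behavior; the `raw` Series and its values are hypothetical and are not taken from the Hipparcos file.

```python
import pandas as pd

# Hypothetical raw values, as they might appear in a column read as text
raw = pd.Series(['9.10', '3.54', 'unknown', '---'])

# Entries that cannot be parsed as numbers become NaN instead of raising an error
clean = pd.to_numeric(raw, errors='coerce')

print(clean.dtype)         # numeric dtype after conversion
print(clean.isna().sum())  # number of entries that could not be parsed
```

Rows with NaN in a numeric column can then be excluded by ordinary comparisons, since any comparison with NaN evaluates to False.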


In [2]:
#selecting stars with positive Plx and with the relative errors in Plx < 0.80
hip_stars = hip.loc[(hip['Plx']>0) & (hip['e_Plx']/hip['Plx'].abs()<0.80)].copy()

hip_stars['Mv'] = hip_stars['Vmag'] + 5 - 5*np.log10(1000/hip_stars['Plx'])

#rounding numbers
hip_stars = hip_stars.round({'Mv': 2})

print(hip_stars.shape)

(85480, 10)
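
The `Mv` column follows the distance modulus relation Mv = Vmag + 5 - 5 log10(d), with the distance d in parsecs obtained as 1000/Plx, which assumes the parallax is given in milliarcseconds. As a rough check, the sketch below applies the same formula to the first star of the table above (Vmag = 9.1, Plx = 3.54); the helper name `absolute_magnitude` is hypothetical and introduced only for illustration.

```python
import numpy as np

def absolute_magnitude(vmag, plx_mas):
    """Absolute magnitude from apparent magnitude and parallax (assumed in mas)."""
    distance_pc = 1000.0 / plx_mas             # distance in parsecs
    return vmag + 5 - 5 * np.log10(distance_pc)

# First star of the table: Vmag = 9.1, Plx = 3.54
mv_first = absolute_magnitude(9.1, 3.54)
print(round(mv_first, 2))

# The same function works element-wise on NumPy arrays or pandas Series
mv_array = absolute_magnitude(np.array([9.1, 9.27]), np.array([3.54, 21.9]))
print(np.round(mv_array, 2))
```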


### How to eliminate loops

Explicit Python loops over the rows of a DataFrame are slow and usually take more lines of code than needed. The cells below compare a row-by-row loop with several ways of avoiding it.

In [3]:
%%time
#for loop approach

star_list = hip_stars[['Spectral_type', 'Mv']]

suntype_stars = []
for i,j in star_list.iterrows():
    if 'G2V' in j['Spectral_type']:
        suntype_stars.append(j['Mv'])

print('List of the absolute magnitudes for Sun-type stars:', suntype_stars[:2], '...')
print(len(suntype_stars))  

List of the absolute magnitudes for Sun-type stars: [4.36, 3.26] ...
551
Wall time: 13.2 s


In [4]:
%%time
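#boolean-mask approach (vectorized string comparison)
#note: the exact match on 'G2V' selects fewer stars than the substring test used in the loop above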

from statistics import mean
df_sp = hip_stars[['Spectral_type', 'Mv']]

df_sunlike = df_sp[df_sp['Spectral_type'].str.strip() == 'G2V']
print(len(df_sunlike))

sunlike_avg = mean(df_sunlike['Mv'])
print('The average absolute magnitude of the sun-like stars:', 
      round(sunlike_avg, 2))

462
The average absolute magnitude of the sun-like stars: 4.02
Wall time: 87.5 ms
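
For comparison, a list comprehension can replace the `iterrows()` loop directly by iterating over the two columns with `zip`. The sketch below uses a small hypothetical DataFrame named `stars` in place of `hip_stars`; its values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for hip_stars, with padded spectral types
stars = pd.DataFrame({
    'Spectral_type': ['G2V   ', 'K3V   ', 'G2V   ', 'G2IV  ', 'F5    '],
    'Mv': [4.36, 5.97, 4.80, 3.26, 1.85],
})

# List comprehension over two columns, without calling iterrows()
sunlike_mv = [mv for sp, mv in zip(stars['Spectral_type'], stars['Mv'])
              if sp.strip() == 'G2V']

# Equivalent boolean-mask selection, as used in the cell above
sunlike_mask = stars.loc[stars['Spectral_type'].str.strip() == 'G2V', 'Mv'].tolist()

print(sunlike_mv)
print(sunlike_mv == sunlike_mask)
```

Both forms use an exact match after stripping the padding, so they select the same rows. A substring test such as `'G2V' in sp` would also match longer strings that contain G2V.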


In [5]:
%%time
#NumPy approach

numpy_sunlike= np.array(df_sunlike['Mv'])
print(len(numpy_sunlike))

sunlike_avg = numpy_sunlike.mean()
print('The average absolute magnitude of the sun-like stars:', 
      round(sunlike_avg, 2))

462
The average absolute magnitude of the sun-like stars: 4.02
Wall time: 984 µs


### Using NumPy arrays

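For numerical calculations on a whole set of numbers, NumPy arrays are usually much faster than Python lists, because operations are applied element-wise in compiled code instead of in a Python loop.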

In [6]:
%%time
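#list of integer values from 1 to 359
#note: np.cos and np.sin interpret their input as radians; angles in degrees need np.deg2rad first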

alpha_list = [*range(1,360,1)]
alpha_np = np.array(alpha_list)
alpha_np_c = np.cos(alpha_np)*np.sin(alpha_np)
print(alpha_np_c[0:10])

[ 0.45464871 -0.37840125 -0.13970775  0.49467912 -0.27201056 -0.26828646
  0.49530368 -0.14395166 -0.37549362  0.45647263]
Wall time: 998 µs
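
NumPy trigonometric functions expect radians, so angles expressed in degrees should be converted before use. The following is a minimal sketch of the same calculation with an explicit conversion, assuming the values 1 to 359 are meant as degrees; the variable names are hypothetical.

```python
import numpy as np

# Angles from 1 to 359, assumed to be in degrees
alpha_deg = np.arange(1, 360)

# Convert to radians before applying trigonometric functions
alpha_rad = np.deg2rad(alpha_deg)
product = np.cos(alpha_rad) * np.sin(alpha_rad)

# Sanity check with the identity cos(x)*sin(x) = 0.5*sin(2x)
check = 0.5 * np.sin(2 * alpha_rad)
print(np.allclose(product, check))
```

`np.arange` builds the array directly, which avoids creating an intermediate Python list.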


### Combining objects

We will combine the stellar absolute magnitudes with the stellar spectral types using the built-in zip function, which is more concise and more efficient than combining two objects with an explicit loop.

In [7]:
%%time

Mv_list = hip_stars['Mv']
Sp_list = hip_stars['Spectral_type']

star_infos_zip = zip(Mv_list, Sp_list)
star_infos_zip_list = [* star_infos_zip]

print(type(star_infos_zip_list))
print(star_infos_zip_list[0:3])

<class 'list'>
[(1.85, 'F5          '), (5.97, 'K3V         '), (2.51, 'F0V         ')]
Wall time: 44.7 ms
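
To illustrate what `zip` replaces, the sketch below builds the same pairs with an index-based loop and with `zip`, then splits them apart again. The three values are the ones shown in the output above with the padding removed; the variable names are hypothetical.

```python
# First three stars from the output above, padding removed
mv_values = [1.85, 5.97, 2.51]
sp_types = ['F5', 'K3V', 'F0V']

# Index-based loop: more lines and an explicit counter
pairs_loop = []
for i in range(len(mv_values)):
    pairs_loop.append((mv_values[i], sp_types[i]))

# zip produces the same pairs in a single expression
pairs_zip = list(zip(mv_values, sp_types))
print(pairs_zip == pairs_loop)

# zip with the unpacking operator splits the pairs back into two tuples
mv_back, sp_back = zip(*pairs_zip)
print(mv_back, sp_back)
```

Note that `zip` stops at the shorter input, so both sequences should have the same length.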
