# Writing efficient Python code 

In this project, we will explore how to write Python programs that are readable and, at the same time, efficient: fast to run and economical in memory usage. 

A part of the main Hipparcos catalog was extracted into the Hip_Sp.csv file.  
Hip_Sp.csv contains four columns: 
<ul>
      <li> Hip_No -- unique Hipparcos number </li>
      <li> Vmag -- visual magnitude as a measure of stellar apparent brightness </li> 
      <li> Mv -- absolute visual magnitude, a measure of the intrinsic stellar brightness, calculated from the Hipparcos apparent visual magnitude (Vmag) and the Hipparcos measured parallax (Plx) </li>
      <li> Spectral_type -- spectral classification: a letter-and-number class that indicates stellar temperature (color), optionally followed by a luminosity class such as V (main sequence) or III (giant) </li>
</ul>
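The Mv column can be reproduced from the other two quantities with the distance-modulus relation. The sketch below assumes the parallax is a positive value in milliarcseconds (the unit used by the Hipparcos catalog), so that the distance in parsecs is 1000 / Plx and M = m + 5 log10(Plx) - 10. The function name absolute_magnitude is hypothetical and not part of the original notebook, and the Plx column itself is not included in Hip_Sp.csv.

```python
import math

def absolute_magnitude(vmag: float, plx_mas: float) -> float:
    """Absolute magnitude from apparent magnitude and parallax.

    Assumes plx_mas is a positive parallax in milliarcseconds, so the
    distance in parsecs is d = 1000 / plx_mas and M = m - 5*log10(d / 10).
    """
    if plx_mas <= 0:
        raise ValueError("parallax must be positive")
    return vmag + 5 * math.log10(plx_mas) - 10

# A star at 10 pc (parallax 100 mas) has M equal to its apparent magnitude.
print(absolute_magnitude(9.10, 100.0))
```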

### Importing data 

In [1]:
%%time
import numpy as np
import pandas as pd

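file = '../data/Hip_Sp.csv'
new_column_names = ['Hip_No', 'Vmag', 'Mv', 'Spectral_type']
# header=0 replaces the file's header row with new_column_names;
# usecols=[1, 2, 3, 4] skips column 0 (presumably an index saved with the file)
hip_sp = pd.read_csv(file, header=0, sep=',',
                     usecols=[1, 2, 3, 4],
                     names=new_column_names)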
hip_sp.head(5)

Wall time: 795 ms


|   | Hip_No | Vmag | Mv | Spectral_type |
|---|---|---|---|---|
| 0 | 1 | 9.1 | 1.845016 | F5 |
| 1 | 2 | 9.27 | 5.972221 | K3V |
| 2 | 3 | 6.61 | -1.146468 | B9 |
| 3 | 4 | 8.06 | 2.506509 | F0V |
| 4 | 5 | 8.55 | 0.839409 | G8III |


### Pythonic vs. non-Pythonic code

How many stars in the Hip_Sp.csv file are more luminous than the Sun, given that the absolute magnitude of the Sun is 4.83? Because the magnitude scale is inverted (smaller values mean brighter stars), all stars in the catalog with an absolute magnitude Mv less than 4.83 are more luminous than the Sun. To answer the question, we therefore count the values in the Mv column of the hip_sp data frame that are below 4.83.  
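The reason the threshold works is the logarithmic magnitude scale: a difference of 5 magnitudes is a factor of 100 in brightness. A minimal sketch of the relation L / L_sun = 10^(0.4 (4.83 - Mv)), using the solar value from the text; since no bolometric correction is applied, this is only a ratio of visual luminosities. The names SUN_MV and luminosity_ratio are illustrative.

```python
SUN_MV = 4.83  # absolute visual magnitude of the Sun, as given in the text

def luminosity_ratio(mv: float) -> float:
    """Visual luminosity of a star relative to the Sun (L / L_sun).

    Every 5 magnitudes correspond to a factor of 100 in brightness,
    and smaller magnitudes mean brighter stars.
    """
    return 10 ** (0.4 * (SUN_MV - mv))

# Mv values of HIP 1 and HIP 2 from the table above
for mv in (1.845016, 5.972221):
    print(mv, luminosity_ratio(mv) > 1)
```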

In [2]:
%%time

#Non-Pythonic Way

star_list = []
for i in range(0,len(hip_sp['Mv'])):
    mag = hip_sp['Mv'][i]
    if mag < 4.83:
        star_list.append(mag)

print(len(star_list))

104597
Wall time: 571 ms


In [3]:
%%time

#Pythonic Way

star_list = [mag for mag in hip_sp['Mv'] if mag < 4.83]

print(len(star_list))

104597
Wall time: 19 ms
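
The list comprehension still runs a loop in Python. A further step, not timed in this notebook, is to let NumPy compare the whole array at once and count the True values; with the DataFrame, the equivalent one-liner would be (hip_sp['Mv'] < 4.83).sum(). The sketch below uses made-up toy magnitudes, not catalog data.

```python
import numpy as np

SUN_MV = 4.83

# Toy absolute magnitudes (illustrative values, not catalog data)
mv = np.array([1.85, 5.97, -1.15, 2.51, 0.84, 6.20])

# The comparison is evaluated element-wise in compiled code and
# returns a boolean array; count_nonzero counts the True entries.
brighter_than_sun = mv < SUN_MV
n_brighter = np.count_nonzero(brighter_than_sun)
print(n_brighter)  # 4
```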


### Examining runtime 

To select the most efficient code, we will examine its runtime with IPython's %timeit magic command, which is based on Python's timeit module. %timeit executes a statement many times and reports the mean and standard deviation per loop; the -r option sets the number of runs and the -n option the number of loops per run. By contrast, the %%time cell magic measures the wall-clock time of a single execution, so its result can be affected by whatever else the computer is doing at the moment. 
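Outside IPython, the same measurements can be made with the timeit module directly. A minimal sketch; the statement and setup strings are illustrative, and because timeit.repeat() returns one total time per run, dividing by number gives the time per loop that %timeit reports.

```python
import timeit

setup = "data = list(range(100_000))"
stmt = "[x for x in data if x < 50_000]"

# repeat=2, number=10 mirrors %timeit -r2 -n10: 2 runs of 10 loops each
run_totals = timeit.repeat(stmt, setup=setup, repeat=2, number=10)
per_loop = [total / 10 for total in run_totals]
print(f"best of 2 runs: {min(per_loop) * 1e3:.2f} ms per loop")
```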

In [4]:
import timeit

%timeit star_list = [mag for mag in hip_sp['Mv'] if mag < 4.83]

%timeit -r2 -n10 star_list = [mag for mag in hip_sp['Mv'] if mag < 4.83]

13.2 ms ± 79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.7 ms ± 20.4 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)


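For example, we can compare the time it takes to create a list with the literal syntax [] and with Python's built-in function list(). Note that the two statements below do not produce the same object: [hip_sp['Mv']] builds a one-element list that holds the whole Series, whereas list(hip_sp['Mv']) copies every value of the Series into a new list, so the large difference in the timings mostly reflects that extra work rather than the choice of syntax.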

In [5]:
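# builds a one-element list containing the Series object itself
%timeit -r2 -n10 Mv_list1 = [hip_sp['Mv']]

# builds a new list containing every value of the Series
%timeit -r2 -n10 Mv_list2 = list(hip_sp['Mv'])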

3.53 µs ± 2.03 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
10.1 ms ± 39 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
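
A like-for-like comparison needs both sides to produce the same list. A minimal sketch with the timeit module (sizes and names are arbitrary choices): it compares the literal [] with list() for an empty list, and a list comprehension with list() for copying an existing sequence. No timings are claimed here; they depend on the machine.

```python
import timeit

# Empty list: the literal [] compiles to a single instruction, while
# list() must look up the name 'list' and then call it.
t_literal = timeit.timeit("[]", number=1_000_000)
t_call = timeit.timeit("list()", number=1_000_000)

# Copying a sequence: both statements build the same list of values.
setup = "values = [0.5 * i for i in range(10_000)]"
t_comp = timeit.timeit("[v for v in values]", setup=setup, number=200)
t_list = timeit.timeit("list(values)", setup=setup, number=200)

print(f"[] {t_literal:.4f} s vs list() {t_call:.4f} s")
print(f"comprehension {t_comp:.4f} s vs list(values) {t_list:.4f} s")
```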


### List of Hipparcos numbers for different stars 

Let's create a list of Hipparcos numbers and an indexed list of absolute magnitudes using Python's built-in functions range() and enumerate() together with the unpacking operator *.

In [6]:
%%time
hip_id_list = list(hip_sp['Hip_No'])

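# unpack a range into a list; range() stops before its end value,
# so the last Hip_No itself is not included here
hip_id_list1 = [*range(1, hip_id_list[-1])]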
print(len(hip_id_list1))

118321
Wall time: 13 ms
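
Because range(start, stop) stops before stop, an extra 1 is needed to include the end value. A small sketch with a hypothetical largest ID:

```python
last_id = 5  # hypothetical largest ID

ids_excl = [*range(1, last_id)]      # stop is excluded
ids_incl = [*range(1, last_id + 1)]  # +1 includes last_id itself

print(ids_excl)  # [1, 2, 3, 4]
print(ids_incl)  # [1, 2, 3, 4, 5]
print(ids_incl == list(range(1, last_id + 1)))  # True
```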


In [7]:
%%time
mag_list = list(hip_sp['Mv'])

indexed_list = [* enumerate(mag_list, 1)]
print(indexed_list[0])

(1, 1.8450163101289387)
Wall time: 24 ms
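
enumerate() numbers the elements with a counter whose start is set by the second argument. That counter is a row number, not necessarily a Hipparcos number: if the IDs in the file have gaps, pairing the magnitudes with the ID column via zip() keeps the real IDs. A small sketch with toy values and hypothetical IDs:

```python
mags = [1.85, 5.97, -1.15]
hip_ids = [1, 2, 4]  # hypothetical IDs with a gap

# enumerate() numbers the rows, starting at 1
by_position = [*enumerate(mags, 1)]
# zip() keeps the actual IDs, even when they are not contiguous
by_id = [*zip(hip_ids, mags)]

print(by_position)  # [(1, 1.85), (2, 5.97), (3, -1.15)]
print(by_id)        # [(1, 1.85), (2, 5.97), (4, -1.15)]
```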


### Rounding values using DataFrames

In [8]:
%%time

hip_sp2 = hip_sp.round({'Mv': 2})
print(hip_sp2.head(5))

   Hip_No  Vmag    Mv Spectral_type
0       1  9.10  1.85  F5          
1       2  9.27  5.97  K3V         
2       3  6.61 -1.15  B9          
3       4  8.06  2.51  F0V         
4       5  8.55  0.84  G8III       
Wall time: 12 ms


In [9]:
%%time

Mv_list = round(hip_sp['Mv'], 2)
print(Mv_list[0:5])

0    1.85
1    5.97
2   -1.15
3    2.51
4    0.84
Name: Mv, dtype: float64
Wall time: 14 ms
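
Both cells above use vectorized rounding provided by pandas. For comparison, a sketch of the same rounding done element by element with the built-in round() in a list comprehension, and in one call with np.round(); the inputs are the first five Mv values as printed in the table above (six decimals).

```python
import numpy as np

mv = [1.845016, 5.972221, -1.146468, 2.506509, 0.839409]

# Element by element, in a Python-level loop
rounded_loop = [round(m, 2) for m in mv]

# One vectorized call on the whole array
rounded_np = np.round(np.array(mv), 2)

print(rounded_loop)  # [1.85, 5.97, -1.15, 2.51, 0.84]
print(rounded_np)
```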


### Using NumPy arrays 

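NumPy arrays are usually the most efficient way to apply a calculation to a large set of numbers: a function such as np.cos() operates on the whole array in compiled code, without a Python-level loop. 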

In [10]:
%%time
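# right ascension values 1..359; note that np.cos() and np.sin() expect radians,
# so these numbers are treated as radians here (use np.radians() to convert degrees)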

alpha_list = [*range(1,360,1)]
alpha_np = np.array(alpha_list)
alpha_np_c = np.cos(alpha_np)*np.sin(alpha_np)
print(alpha_np_c[0:10])

[ 0.45464871 -0.37840125 -0.13970775  0.49467912 -0.27201056 -0.26828646
  0.49530368 -0.14395166 -0.37549362  0.45647263]
Wall time: 1 ms
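
If the values are meant as right ascension in degrees, they have to be converted to radians before calling np.cos() and np.sin(), for example with np.radians(). A minimal sketch that also checks the vectorized result against a plain loop over the math module:

```python
import math
import numpy as np

alpha_deg = np.arange(1, 360)      # 1, 2, ..., 359 degrees
alpha_rad = np.radians(alpha_deg)  # convert degrees to radians

# Vectorized: one call per function for the whole array
vec = np.cos(alpha_rad) * np.sin(alpha_rad)

# Same calculation with a Python loop, for comparison
loop = [math.cos(math.radians(a)) * math.sin(math.radians(a)) for a in range(1, 360)]

print(np.allclose(vec, loop))  # True
print(vec[44])                 # value at 45 degrees, approximately 0.5
```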


### Combining objects

We will combine the list of stellar absolute magnitudes with the list of stellar spectral types and look for the most efficient way to combine the two objects. Using the built-in zip() function is more efficient than appending the pairs in a for loop. 

In [11]:
%%time

Mv_list = hip_sp['Mv']
Sp_list = hip_sp['Spectral_type']

star_infos = []
for i, magnitude in enumerate(Mv_list):
    star_infos.append((magnitude, Sp_list[i]))

print(type(star_infos)) 
print(star_infos[0:3])

<class 'list'>
[(1.8450163101289387, 'F5          '), (5.972220574200591, 'K3V         '), (-1.1464684004746015, 'B9          ')]
Wall time: 354 ms


In [12]:
%%time

Mv_list = hip_sp['Mv']
Sp_list = hip_sp['Spectral_type']

star_infos_zip = zip(Mv_list, Sp_list)
star_infos_zip_list = [* star_infos_zip]

print(type(star_infos_zip_list))
print(star_infos_zip_list[0:3])

<class 'list'>
[(1.8450163101289387, 'F5          '), (5.972220574200591, 'K3V         '), (-1.1464684004746015, 'B9          ')]
Wall time: 27 ms
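
zip() returns a lazy iterator rather than a list, which is why the cell above unpacks it with [* ...]. A small sketch with toy values showing that zip() stops at the shorter input and that the iterator can be consumed only once:

```python
mags = [1.85, 5.97, -1.15]
types = ['F5', 'K3V', 'B9', 'F0V']  # one element longer on purpose

pairs = zip(mags, types)
print(type(pairs).__name__)  # zip
first_pass = [*pairs]
second_pass = [*pairs]       # the iterator is already exhausted

print(first_pass)   # [(1.85, 'F5'), (5.97, 'K3V'), (-1.15, 'B9')]
print(second_pass)  # []
```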


### Counting and grouping in Python

In this step, we will find the fastest way to count how many stars belong to each spectral type. We have 113759 stars from the Hipparcos catalog. First, we will count with a standard loop that fills a dictionary, and then with Counter from the collections module, a dictionary subclass designed for counting. Note that the spectral types in the file are padded with trailing spaces, as the outputs below show.

In [13]:
%%time

#Counting using loop

Sp_list = hip_sp['Spectral_type']

spectral_groups = {}
for spectral_type in Sp_list:
    if spectral_type not in spectral_groups:
        spectral_groups[spectral_type] = 1
    else:
        spectral_groups[spectral_type] += 1

#printing first three spectral groups
print(list(spectral_groups.items())[:3])

[('F5          ', 3801), ('K3V         ', 213), ('B9          ', 1499)]
Wall time: 27 ms


In [14]:
%%time

#Counting using counter 

Sp_list = hip_sp['Spectral_type']

from collections import Counter

#create instance of counter
counter_dict = Counter(Sp_list)

#printing first three spectral groups
print(list(counter_dict.items())[:3])
#printing the most common spectral groups 
print(counter_dict.most_common(3))

[('F5          ', 3801), ('K3V         ', 213), ('B9          ', 1499)]
[('K0          ', 8303), ('G5          ', 5892), ('A0          ', 4811)]
Wall time: 14 ms
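
Because the raw spectral types carry trailing spaces, the same type would be counted twice if its padding varied. A sketch that strips the strings before counting and shortens the manual loop with dict.get(); the list of types is made up:

```python
from collections import Counter

raw_types = ['K0   ', 'G5  ', 'K0', 'A0    ', 'K0  ', 'G5']  # toy data

# Strip the padding first so that identical types are grouped together
clean = [t.strip() for t in raw_types]

# Manual loop, shortened with dict.get(key, default)
counts = {}
for t in clean:
    counts[t] = counts.get(t, 0) + 1

# Counter does the same counting in one call
counter = Counter(clean)

print(counts)                  # {'K0': 3, 'G5': 2, 'A0': 1}
print(counter.most_common(1))  # [('K0', 3)]
```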


### Finding common spectral types between two lists

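What is the best way to find common objects in two lists? Converting both lists to sets and intersecting them is much faster than comparing every element of one list against the other list, because set lookups use hashing instead of scanning. Let's find the spectral types that occur both among the first 50000 stars of the catalog and among the remaining stars. 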

In [15]:
%%time

# note: [50001:] starts one row after [:50000] ends, so row 50000 falls in neither list
list_1 = hip_sp['Spectral_type'][:50000]
list_2 = hip_sp['Spectral_type'][50001:]

set_1 = set(list_1)
set_2 = set(list_2)

common_stars = set_1.intersection(set_2)
print(len(common_stars))

1320
Wall time: 9 ms
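
For comparison, a sketch of the list-based approach that the set replaces: for every element of one list, scan the other. Each in test on a list is a linear scan, so the cost grows with the product of the two lengths, while building two sets and intersecting them is roughly linear. The toy lists are hypothetical.

```python
list_a = ['F5', 'K3V', 'B9', 'F5', 'G8III']
list_b = ['B9', 'M2', 'F5', 'A0']

# List-based: each `in` scans list_b (and common_loop) from the start
common_loop = []
for t in list_a:
    if t in list_b and t not in common_loop:  # skip duplicates by hand
        common_loop.append(t)

# Set-based: hash lookups, duplicates removed automatically
common_set = set(list_a) & set(list_b)

print(sorted(common_loop))  # ['B9', 'F5']
print(sorted(common_set))   # ['B9', 'F5']
```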


### Finding difference and union between two lists

With the two sets from the previous step we can, for example, find the spectral types that occur only in the first list and not in the second, or, in one line, collect every spectral type from both lists without repeating the types they have in common.

In [16]:
%%time

diff_list = set_1.difference(set_2)
print(len(diff_list))

unique_list = set_1.union(set_2)
print(len(unique_list))

1030
3871
Wall time: 1 ms
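
The set methods also have operator forms. A sketch with two small hypothetical sets: - is difference(), | is union(), & is intersection(), and ^ (symmetric_difference()) gives the types found in exactly one of the two sets.

```python
set_a = {'F5', 'K3V', 'B9'}
set_b = {'B9', 'M2'}

print(sorted(set_a - set_b))  # ['F5', 'K3V']             same as set_a.difference(set_b)
print(sorted(set_a | set_b))  # ['B9', 'F5', 'K3V', 'M2'] same as set_a.union(set_b)
print(sorted(set_a & set_b))  # ['B9']                    same as set_a.intersection(set_b)
print(sorted(set_a ^ set_b))  # ['F5', 'K3V', 'M2']       in exactly one of the two sets
```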


### Finding an element in a list

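What is the fastest way to search for an element in a list of 113759 objects? We will show below that a membership test with the in operator is faster on a set than on an ordinary list or a tuple: a set locates an element through its hash, while a list or tuple is scanned element by element until a match is found. 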

In [17]:
list_1 = list(hip_sp['Spectral_type'])
 
new_list = [i.strip(' ') for i in list_1]
print(type(new_list))

%timeit 'A2' in new_list

<class 'list'>
167 ns ± 2.26 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [18]:
new_object = tuple([i.strip(' ') for i in list_1])
print(type(new_object))

%timeit 'A2' in new_object

<class 'tuple'>
180 ns ± 10.3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [19]:
new_object = set([i.strip(' ') for i in list_1])
print(type(new_object))

%timeit 'A2' in new_object

<class 'set'>
45 ns ± 0.355 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
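
Because in stops at the first match, the list and tuple timings above depend on where 'A2' sits; their short duration suggests it appears near the beginning of the catalog. A sketch with synthetic strings (size and contents are arbitrary) that searches for an absent element, which forces a full scan of the list while the set still needs only a hash lookup:

```python
import timeit

items = [f"type{i}" for i in range(100_000)]  # synthetic strings
items_set = set(items)

missing = "A2"  # not in the synthetic data, so the list is scanned to the end

t_list = timeit.timeit(lambda: missing in items, number=100)
t_set = timeit.timeit(lambda: missing in items_set, number=100)

print(f"list: {t_list / 100 * 1e6:.1f} µs per lookup")
print(f"set : {t_set / 100 * 1e6:.3f} µs per lookup")
```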


### How to eliminate loops

Below we compare several ways of eliminating explicit loops, which are slow in Python and take more lines of code than needed. As an example, we select all stars that have the same spectral type as our Sun, G2V. 

In [35]:
%%time
#for loop approach

star_list = hip_sp[['Spectral_type', 'Mv']]

suntype_stars = []
for i, j in star_list.iterrows():
    # substring test: also matches longer spectral types that contain 'G2V'
    if 'G2V' in j['Spectral_type']:
        suntype_stars.append(j['Mv'])

print('List of absolute magnitudes for Sun-type stars:', suntype_stars[:2], '...')
print(len(suntype_stars))  

List of absolute magnitudes for Sun-type stars: [4.364230575944209, 4.375772199531408] ...
691
Wall time: 7.6 s


In [63]:
%%time
# pandas boolean-indexing approach (exact match after stripping, no explicit loop)

from statistics import mean
df_sp = hip_sp[['Spectral_type', 'Mv']]

df_sunlike = df_sp[df_sp['Spectral_type'].str.strip() == 'G2V']
print(len(df_sunlike))

sunlike_avg = mean(df_sunlike['Mv'])
print('Average absolute magnitude of sun-like stars:', 
      round(sunlike_avg, 2))

568
Average absolute magnitude of sun-like stars: 3.95
Wall time: 49.4 ms


In [75]:
%%time
#NumPy approach

numpy_sunlike = np.array(df_sunlike['Mv'])
print(len(numpy_sunlike))

sunlike_avg = numpy_sunlike.mean()
print('Average absolute magnitude of sun-like stars:', 
      round(sunlike_avg, 2))

568
Average absolute magnitude of sun-like stars: 3.95
Wall time: 0 ns
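
The NumPy cell above still relies on the pandas filter for the selection. A sketch of the whole selection done with NumPy arrays alone, using np.char.strip() to remove the padding and a boolean mask to pick the G2V stars; the arrays are toy values, not catalog data. It also shows why a substring test finds more stars than an exact match, as in the loop and pandas cells above.

```python
import numpy as np

spectral = np.array(['G2V   ', 'K0    ', 'G2V', 'G2V+K1 ', 'A0  '])
mv = np.array([4.36, 5.90, 3.60, 4.38, 0.60])

# Exact match after stripping, like .str.strip() == 'G2V' in pandas
mask = np.char.strip(spectral) == 'G2V'

# Substring match, like 'G2V' in s in the for-loop cell
substring = np.char.find(spectral, 'G2V') >= 0

sunlike_mv = mv[mask]
print(substring.sum(), mask.sum())  # 3 2
print(sunlike_mv.mean())            # mean of 4.36 and 3.60
```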
