
# Optimizing Code: Common Books
Here's the code your coworker wrote to find the book ids that appear in both `books-published-last-two-years.txt` and `all-coding-books.txt`, which gives a list of recent coding books.

In [11]:
import time
import pandas as pd
import numpy as np
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [24]:
with open('/content/books-published-last-two-years.txt', 'r') as f: 
    recent_books = f.read().split('\n')

In [22]:
with open('/content/all-coding-books.txt', 'r') as f:
    coding_books = f.read().split('\n')

In [26]:
start = time.time()
recent_coding_books = []

for book in recent_books:
    if book in coding_books:
        recent_coding_books.append(book)

print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))

96
Duration: 16.106045961380005 seconds


### Tip #1: Use vector operations over loops when possible

Use NumPy's `intersect1d` function to get the intersection of the `recent_books` and `coding_books` lists. It converts both inputs to arrays and returns the sorted, unique values found in both.

In [28]:
start = time.time()
recent_coding_books = np.intersect1d(recent_books, coding_books)
print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))

96
Duration: 0.034322261810302734 seconds


In [30]:
print(np.intersect1d(recent_books, coding_books))


['1219701' '1258335' '1264806' '1473766' '1694425' '1713507' '1715546'
 '1900178' '1901264' '1962694' '2009541' '2038925' '2239694' '2439487'
 '2442952' '2462622' '2644909' '2645238' '2706358' '2920394' '2986045'
 '2989078' '3036263' '3066256' '3172199' '3264002' '3290103' '3349989'
 '3517640' '3783712' '4069963' '4137576' '4245126' '4281481' '4580997'
 '4623179' '4717544' '4959393' '4976621' '4993512' '5205726' '5353921'
 '5406308' '5764540' '5766722' '5890905' '5951873' '6005218' '6163266'
 '6445882' '6495493' '6522620' '6595167' '6599509' '6637024' '6889040'
 '6964516' '6975356' '6977874' '7144292' '7148530' '7170269' '7201791'
 '7231742' '7286175' '7286871' '7308127' '7356628' '7401186' '7406586'
 '7531095' '7663370' '7668560' '7689591' '7804101' '7804836' '7852176'
 '7955543' '8196889' '8255889' '8502866' '8558628' '8604850' '8621688'
 '8819824' '8873515' '8879982' '8897482' '8919160' '9180837' '9193737'
 '9255617' '9348635' '9443002' '9497646' '9624309']
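
To make the behavior of `np.intersect1d` easier to see, here is a minimal sketch on made-up ids (the sample lists are hypothetical, not values from the data files). It illustrates that the result is sorted and that duplicates in either input are dropped.

```python
import numpy as np

# Hypothetical ids for illustration only; not taken from the data files.
recent_sample = ['300', '100', '200', '100']
coding_sample = ['200', '400', '100']

# intersect1d accepts plain lists and returns a sorted array of unique values.
common = np.intersect1d(recent_sample, coding_sample)
print(common)
```

Note that the result is a NumPy array rather than a list; call `common.tolist()` if a plain Python list is needed.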


### Tip #2: Know your data structures and which methods are faster
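Convert `recent_books` to a set and use its `intersection` method to get the elements it shares with `coding_books`. Set lookups are hash-based, so each membership check takes roughly constant time on average instead of scanning the whole list.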

In [29]:
start = time.time()
recent_coding_books = set(recent_books).intersection(coding_books)
print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))

96
Duration: 0.00873255729675293 seconds
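
The same idea can be applied to the original loop. The sketch below uses made-up ids (not values from the data files) and hypothetical variable names. It builds the set once, keeps the order of the first list with a list comprehension, and shows that `intersection` accepts a plain list as its argument.

```python
# Hypothetical ids for illustration only; not taken from the data files.
recent_sample = ['300', '100', '200', '500']
coding_sample = ['200', '400', '100']

# Build the set once so each membership test is a hash lookup.
coding_set = set(coding_sample)

# Same logic as the original loop, preserving the order of recent_sample.
ordered_common = [book for book in recent_sample if book in coding_set]

# set.intersection accepts any iterable; sets are unordered,
# so sort the result when a deterministic order is needed.
sorted_common = sorted(set(recent_sample).intersection(coding_sample))

print(ordered_common)
print(sorted_common)
```

The list-comprehension form may be preferable when the input order or duplicate entries matter, since a set keeps neither.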
