# CSV + API

In this reboot, we are going to use:

- The [Goodreads books](https://www.kaggle.com/jealousleopard/goodreadsbooks) dataset from Kaggle.
- The [Open Library Books API](https://openlibrary.org/dev/docs/api/books)

The goal of this livecode is to load the data from a CSV, then loop over the rows to enrich each one with information such as:

- List of subjects (Science, Humor, Travel, etc.)
- The cover URL of the book
- Other information you'd find useful in the JSON API

First, download the CSV into the local folder:

In [None]:
#!curl -L https://gist.githubusercontent.com/ssaunier/351b17f5a7a009808b60aeacd1f4a036/raw/books.csv > books.csv

## Loading the `DataFrame`

Then import the usual suspects!

In [1]:
import pandas as pd
import requests
import numpy as np

In [2]:
books_df = pd.read_csv('books.csv')

In [3]:
books_df.head()

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
0,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,0439785960,9780439785969,eng,652,1944099,26249
1,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,0439358078,9780439358071,eng,870,1996446,27613
2,3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,0439554934,9780439554930,eng,320,5629932,70390
3,4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,0439554896,9780439554893,eng,352,6267,272
4,5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,4.55,043965548X,9780439655484,eng,435,2149872,33964


In [4]:
books_df.drop(columns=['bookID', 'text_reviews_count', 'ratings_count', 'isbn'], inplace=True)

In [5]:
books_df.head(3)

Unnamed: 0,title,authors,average_rating,isbn13,language_code,# num_pages
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,9780439785969,eng,652
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,9780439358071,eng,870
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,9780439554930,eng,320


## Exploring the API

In [6]:
'https://openlibrary.org/api/books?bibkeys=ISBN:9780439785969&format=json&jscmd=data'

'https://openlibrary.org/api/books?bibkeys=ISBN:9780439785969&format=json&jscmd=data'

In [7]:
books_df['isbn13'][0]

9780439785969

In [11]:
url = 'https://openlibrary.org/api/books'

isbn13 = 9780439785970

params = {
    'bibkeys': f'ISBN:{isbn13}',
    'format': 'json',
    'jscmd': 'data'
}

requests.get(url, params=params).json()

{}
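
The response is empty because no book matches the requested key. The ISBN in this request ends in `70`, while the one read from the `DataFrame` ends in `69`. Since the last digit of an ISBN-13 is a check digit, a quick local check shows the altered number is not a valid ISBN at all. A minimal sketch, assuming a helper we name `is_valid_isbn13` (our own name, not part of `requests` or Open Library):

```python
def is_valid_isbn13(isbn13):
    """Return True if the ISBN-13 check digit is consistent (hypothetical helper)."""
    digits = [int(char) for char in str(isbn13) if char.isdigit()]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ... over the 13 digits;
    # a valid ISBN-13 gives a weighted sum that is a multiple of 10
    total = sum(digit * (3 if position % 2 else 1)
                for position, digit in enumerate(digits))
    return total % 10 == 0

print(is_valid_isbn13(9780439785969))  # ISBN from the first row -> True
print(is_valid_isbn13(9780439785970))  # ISBN used in the request above -> False
```

A check like this only catches malformed numbers. A well-formed ISBN can still be missing from Open Library, so the empty-response case has to be handled either way.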

In [19]:
def fetch_cover_url(isbn13):
    """Return the large cover URL for an ISBN, or '' if none is found."""
    url = 'https://openlibrary.org/api/books'
    key = f'ISBN:{isbn13}'

    params = {
        'bibkeys': key,
        'format': 'json',
        'jscmd': 'data'
    }

    response = requests.get(url, params=params).json()
    # An unknown ISBN gives an empty dict, so fall back to '' at every level
    return response.get(key, {}).get('cover', {}).get('large', '')

In [17]:
fetch_cover_url('9780439785969')

'https://covers.openlibrary.org/b/id/9326654-L.jpg'

In [20]:
for index, row in books_df.head(15).iterrows():
    print(fetch_cover_url(row['isbn13']))

https://covers.openlibrary.org/b/id/9326654-L.jpg
https://covers.openlibrary.org/b/id/12025650-L.jpg
https://covers.openlibrary.org/b/id/7572543-L.jpg
https://covers.openlibrary.org/b/id/10301720-L.jpg
https://covers.openlibrary.org/b/id/10580458-L.jpg
https://covers.openlibrary.org/b/id/278981-L.jpg
https://covers.openlibrary.org/b/id/742235-L.jpg
https://covers.openlibrary.org/b/id/279436-L.jpg
https://covers.openlibrary.org/b/id/12617870-L.jpg

https://covers.openlibrary.org/b/id/10176291-L.jpg
https://covers.openlibrary.org/b/id/8769632-L.jpg
https://covers.openlibrary.org/b/id/7892560-L.jpg
https://covers.openlibrary.org/b/id/12725608-L.jpg
https://covers.openlibrary.org/b/id/6815851-L.jpg
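
Besides the cover, the introduction lists subjects as a field worth collecting. The sketch below pulls both from an already-decoded response, so it runs without a network call. The helper name `extract_book_info` and the `sample_response` payload are illustrative assumptions: the payload only mimics the shape we expect from `jscmd=data` (a `cover` dict and a `subjects` list of dicts with a `name` key) and its values are made up, so inspect a real response before relying on it.

```python
def extract_book_info(response, isbn13):
    """Return the cover URL and subject names for one ISBN (hypothetical helper)."""
    book = response.get(f"ISBN:{isbn13}", {})
    cover_url = book.get("cover", {}).get("large", "")
    # Assumption: each subject is a dict with a "name" key
    subjects = [subject.get("name", "") for subject in book.get("subjects", [])]
    return {"cover_url": cover_url, "subjects": subjects}

# Made-up payload that mimics the assumed response shape
sample_response = {
    "ISBN:0000000000000": {
        "cover": {"large": "https://example.com/cover-L.jpg"},
        "subjects": [{"name": "Fantasy"}, {"name": "Magic"}],
    }
}

print(extract_book_info(sample_response, "0000000000000"))
print(extract_book_info(sample_response, "1111111111111"))  # unknown ISBN: empty values
```

Returning a dict keeps the door open for the other fields you find useful in the JSON, without changing the function's callers.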


## Adding `cover_url` on the `DataFrame`

In [52]:
books_df['cover_url'] = None

In [53]:
books_df.head()

Unnamed: 0,title,authors,average_rating,isbn13,language_code,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,9780439785969,eng,652,
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,9780439358071,eng,870,
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,9780439554930,eng,320,
3,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,9780439554893,eng,352,
4,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,4.55,9780439655484,eng,435,


### Iterrows

In [54]:
%%time

for index, row in books_df.head(15).iterrows():
    if row["cover_url"] is None:
        isbn = row['isbn13']
        print(f"Fetching cover for {row['title']}")
        cover_url = fetch_cover_url(isbn)
        # Store "" when no cover was found, so the row is not fetched again
        books_df.loc[index, "cover_url"] = cover_url or ""

Fetching cover for Harry Potter and the Half-Blood Prince (Harry Potter  #6)
Fetching cover for Harry Potter and the Order of the Phoenix (Harry Potter  #5)
Fetching cover for Harry Potter and the Sorcerer's Stone (Harry Potter  #1)
Fetching cover for Harry Potter and the Chamber of Secrets (Harry Potter  #2)
Fetching cover for Harry Potter and the Prisoner of Azkaban (Harry Potter  #3)
Fetching cover for Harry Potter Boxed Set  Books 1-5 (Harry Potter  #1-5)
Fetching cover for Unauthorized Harry Potter Book Seven News: "Half-Blood Prince" Analysis and Speculation
Fetching cover for Harry Potter Collection (Harry Potter  #1-6)
Fetching cover for The Ultimate Hitchhiker's Guide: Five Complete Novels and One Story (Hitchhiker's Guide to the Galaxy  #1-5)
Fetching cover for The Ultimate Hitchhiker's Guide to the Galaxy
Fetching cover for The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide to the Galaxy  #1)
Fetching cover for The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide t

In [55]:
books_df.head(15)

Unnamed: 0,title,authors,average_rating,isbn13,language_code,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,9780439785969,eng,652,https://covers.openlibrary.org/b/id/9326654-L.jpg
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,9780439358071,eng,870,https://covers.openlibrary.org/b/id/12025650-L...
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,9780439554930,eng,320,https://covers.openlibrary.org/b/id/7572543-L.jpg
3,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,9780439554893,eng,352,https://covers.openlibrary.org/b/id/10301720-L...
4,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,4.55,9780439655484,eng,435,https://covers.openlibrary.org/b/id/10580458-L...
5,Harry Potter Boxed Set Books 1-5 (Harry Potte...,J.K. Rowling-Mary GrandPré,4.78,9780439682589,eng,2690,https://covers.openlibrary.org/b/id/278981-L.jpg
6,"Unauthorized Harry Potter Book Seven News: ""Ha...",W. Frederick Zimmerman,3.69,9780976540601,en-US,152,https://covers.openlibrary.org/b/id/742235-L.jpg
7,Harry Potter Collection (Harry Potter #1-6),J.K. Rowling,4.73,9780439827607,eng,3342,https://covers.openlibrary.org/b/id/279436-L.jpg
8,The Ultimate Hitchhiker's Guide: Five Complete...,Douglas Adams,4.38,9780517226957,eng,815,https://covers.openlibrary.org/b/id/12617870-L...
9,The Ultimate Hitchhiker's Guide to the Galaxy,Douglas Adams,4.38,9780345453747,eng,815,


### Apply -> Pandas DataFrame

In [56]:
books_df['cover_url'] = None

In [59]:
books_df.head(3)

Unnamed: 0,title,authors,average_rating,isbn13,language_code,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,9780439785969,eng,652,
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,9780439358071,eng,870,
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,9780439554930,eng,320,


In [64]:
%%time
books_df['cover_url'] = books_df.head(15).apply(lambda row: fetch_cover_url(row['isbn13']), axis=1)

CPU times: user 250 ms, sys: 38.1 ms, total: 288 ms
Wall time: 21.2 s


In [65]:
books_df.head(15)

Unnamed: 0,title,authors,average_rating,isbn13,language_code,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,9780439785969,eng,652,https://covers.openlibrary.org/b/id/9326654-L.jpg
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,9780439358071,eng,870,https://covers.openlibrary.org/b/id/12025650-L...
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,9780439554930,eng,320,https://covers.openlibrary.org/b/id/7572543-L.jpg
3,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,9780439554893,eng,352,https://covers.openlibrary.org/b/id/10301720-L...
4,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,4.55,9780439655484,eng,435,https://covers.openlibrary.org/b/id/10580458-L...
5,Harry Potter Boxed Set Books 1-5 (Harry Potte...,J.K. Rowling-Mary GrandPré,4.78,9780439682589,eng,2690,https://covers.openlibrary.org/b/id/278981-L.jpg
6,"Unauthorized Harry Potter Book Seven News: ""Ha...",W. Frederick Zimmerman,3.69,9780976540601,en-US,152,https://covers.openlibrary.org/b/id/742235-L.jpg
7,Harry Potter Collection (Harry Potter #1-6),J.K. Rowling,4.73,9780439827607,eng,3342,https://covers.openlibrary.org/b/id/279436-L.jpg
8,The Ultimate Hitchhiker's Guide: Five Complete...,Douglas Adams,4.38,9780517226957,eng,815,https://covers.openlibrary.org/b/id/12617870-L...
9,The Ultimate Hitchhiker's Guide to the Galaxy,Douglas Adams,4.38,9780345453747,eng,815,


### Real comparison: `iterrows` vs `apply`

In [66]:
books_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13719 entries, 0 to 13718
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   title           13719 non-null  object 
 1   authors         13719 non-null  object 
 2   average_rating  13719 non-null  float64
 3   isbn13          13719 non-null  int64  
 4   language_code   13719 non-null  object 
 5   # num_pages     13719 non-null  int64  
 6   cover_url       15 non-null     object 
dtypes: float64(1), int64(2), object(4)
memory usage: 750.4+ KB


In [67]:
%%time
for index, row in books_df.iterrows():
    books_df.loc[index, 'average_rating'] = row['average_rating'] + 1

CPU times: user 6.79 s, sys: 29.7 ms, total: 6.82 s
Wall time: 6.81 s


In [68]:
books_df.head(3)

Unnamed: 0,title,authors,average_rating,isbn13,language_code,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,5.56,9780439785969,eng,652,https://covers.openlibrary.org/b/id/9326654-L.jpg
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,5.49,9780439358071,eng,870,https://covers.openlibrary.org/b/id/12025650-L...
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,5.47,9780439554930,eng,320,https://covers.openlibrary.org/b/id/7572543-L.jpg


In [69]:
%%time
books_df['average_rating'] = books_df.apply(lambda x: x['average_rating']+1, axis=1)

CPU times: user 327 ms, sys: 0 ns, total: 327 ms
Wall time: 321 ms


In [70]:
6.81*1000

6810.0

In [71]:
6810 / 321

21.214953271028037

In [None]:
books_df.head(3)

### Map -> Pandas Series

In [72]:
books_df['average_rating']

0        6.56
1        6.49
2        6.47
3        6.41
4        6.55
         ... 
13714    5.82
13715    5.72
13716    5.53
13717    6.43
13718    6.29
Name: average_rating, Length: 13719, dtype: float64

In [73]:
type(books_df['average_rating'])

pandas.core.series.Series

In [74]:
%%time
books_df['average_rating'] = books_df['average_rating'].map(lambda x: x+1)

CPU times: user 9.56 ms, sys: 0 ns, total: 9.56 ms
Wall time: 8.13 ms
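
`map` still calls a Python function once per value. For simple arithmetic, pandas can also apply the operation to the whole `Series` at once. A minimal sketch on a synthetic `Series` (the `ratings` values below stand in for the Goodreads column and are not read from the CSV), checking that both approaches give the same result:

```python
import pandas as pd

# Synthetic ratings standing in for books_df['average_rating']
ratings = pd.Series([4.56, 4.49, 4.47, 4.41, 4.55])

mapped = ratings.map(lambda x: x + 1)  # one Python call per value
vectorized = ratings + 1               # one operation on the whole Series

print(mapped.equals(vectorized))
```

To compare speeds on the full column, wrap each line in its own `%%time` cell, as in the cells above.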


In [78]:
'hello'.upper()

'HELLO'

In [79]:
books_df['title']

0        Harry Potter and the Half-Blood Prince (Harry ...
1        Harry Potter and the Order of the Phoenix (Har...
2        Harry Potter and the Sorcerer's Stone (Harry P...
3        Harry Potter and the Chamber of Secrets (Harry...
4        Harry Potter and the Prisoner of Azkaban (Harr...
                               ...                        
13714                                       M Is for Magic
13715                                         Black Orchid
13716                          InterWorld (InterWorld  #1)
13717                                  The Faeries' Oracle
13718                        The World of The Dark Crystal
Name: title, Length: 13719, dtype: object

In [81]:
books_df['title'].map(str.upper)

0        HARRY POTTER AND THE HALF-BLOOD PRINCE (HARRY ...
1        HARRY POTTER AND THE ORDER OF THE PHOENIX (HAR...
2        HARRY POTTER AND THE SORCERER'S STONE (HARRY P...
3        HARRY POTTER AND THE CHAMBER OF SECRETS (HARRY...
4        HARRY POTTER AND THE PRISONER OF AZKABAN (HARR...
                               ...                        
13714                                       M IS FOR MAGIC
13715                                         BLACK ORCHID
13716                          INTERWORLD (INTERWORLD  #1)
13717                                  THE FAERIES' ORACLE
13718                        THE WORLD OF THE DARK CRYSTAL
Name: title, Length: 13719, dtype: object

## Calling the API with multiple ISBNs at a time

In [82]:
isbns = [9780439785969, 9780439358071, 9780439554930]
[f"ISBN:{isbn}" for isbn in isbns]

['ISBN:9780439785969', 'ISBN:9780439358071', 'ISBN:9780439554930']

In [83]:
",".join([f"ISBN:{isbn}" for isbn in isbns])

'ISBN:9780439785969,ISBN:9780439358071,ISBN:9780439554930'

In [84]:
def fetch_books(isbns):
    url = "https://openlibrary.org/api/books"
    bibkeys = ",".join([f"ISBN:{isbn}" for isbn in isbns])
    params = {
        'bibkeys': bibkeys,
        'format': 'json',
        'jscmd': 'data'
    }
    response = requests.get(url, params=params).json()
    return response

In [92]:
np.array_split(books_df[['title']].head(20), 5)[4]

Unnamed: 0,title
16,In a Sunburned Country
17,I'm a Stranger Here Myself: Notes on Returning...
18,The Lost Continent: Travels in Small Town America
19,Neither Here nor There: Travels in Europe


In [93]:
books_df.set_index("isbn13", inplace=True)

In [94]:
%%time

for group in np.array_split(books_df.head(100), 5): # 5 groups of 20 books
    books = fetch_books(list(group.index))
    for isbn_code, book in books.items():
        # split on ":" rather than strip("ISBN:"), which removes characters, not a prefix
        isbn = int(isbn_code.split(":")[1])
        books_df.loc[isbn, "cover_url"] = book.get("cover", {}).get("large", "")

CPU times: user 204 ms, sys: 978 µs, total: 205 ms
Wall time: 14.6 s
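
`np.array_split` divides the rows into a fixed number of groups. An alternative is to fix the group size instead, so that each request carries at most a set number of ISBNs however many rows there are. A minimal sketch in plain Python (the helper names `chunked` and `build_bibkeys` are ours, and no request is sent):

```python
def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_bibkeys(isbns):
    """Build the comma-separated `bibkeys` value, as done in fetch_books."""
    return ",".join(f"ISBN:{isbn}" for isbn in isbns)

isbns = [9780439785969, 9780439358071, 9780439554930, 9780439554893, 9780439655484]

# Three groups: two of size 2, then one with the remaining ISBN
for group in chunked(isbns, 2):
    print(build_bibkeys(group))
```

With the `DataFrame` indexed by `isbn13`, `chunked(list(books_df.index), 20)` would yield the same kind of groups as the loop above, assuming 20 ISBNs per request is a size the API accepts.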


In [95]:
books_df.head(100)

isbn13,title,authors,average_rating,language_code,# num_pages,cover_url
9780439785969,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,7.56,eng,652,https://covers.openlibrary.org/b/id/9326654-L.jpg
9780439358071,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,7.49,eng,870,https://covers.openlibrary.org/b/id/12025650-L...
9780439554930,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,7.47,eng,320,https://covers.openlibrary.org/b/id/7572543-L.jpg
9780439554893,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,7.41,eng,352,https://covers.openlibrary.org/b/id/10301720-L...
9780439655484,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,7.55,eng,435,https://covers.openlibrary.org/b/id/10580458-L...
...,...,...,...,...,...,...
9780451528612,Anna Karenina,Leo Tolstoy-David Magarshack-Priscilla Meyer,7.04,eng,960,https://covers.openlibrary.org/b/id/295745-L.jpg
9780140449174,Anna Karenina,Leo Tolstoy-Richard Pevear-Larissa Volokhonsky...,7.04,eng,837,
9780822001836,CliffsNotes on Tolstoy's Anna Karenina,Marianne Sturman-Leo Tolstoy,6.89,eng,80,https://covers.openlibrary.org/b/id/6951821-L.jpg
9781593080273,Anna Karenina,Leo Tolstoy-Amy Mandelker-Constance Garnett,7.04,eng,803,https://covers.openlibrary.org/b/id/869620-L.jpg
