# NYT Books: Finding the popular best-sellers

## Introduction 

In this lab, we will compile a list of books that were best-sellers during the summer of 2014. To get this list, we will use the New York Times [Books API](https://developer.nytimes.com/books_api.json).

---------------------------

## Step 1. Acquiring access to the API

Like most APIs, the NYT API requires each developer to have a private (secret) key in order to use its services. This lets the NYT throttle the number of requests being issued. According to their website, the limit is 5,000 requests per day. You can register [here](http://developer.nytimes.com/apps/register). When asked, request a key for the *Books API*. Your key will appear on the next web page. Save it in a text file named _key.txt_, in the same folder as this notebook.

Now that we have downloaded our key, we should read it into a variable so that we can use it in the rest of this notebook.

In [3]:
key = ""
with open('key.txt','r') as f:
    key = f.readline().strip()

if len(key) > 0:
    print("Successfully retrieved API key")

Successfully retrieved API key


Let's make a sample request to check that everything works fine. We will retrieve the names of NYT best-seller lists.

Before we generate the request string, we need to read the [README](https://developer.nytimes.com/books_api.json#/README). According to [the documentation](https://developer.nytimes.com/books_api.json), the request must follow this URI structure:
```
http://api.nytimes.com/svc/books/{version}/lists/names[.response_format]?api-key={your-API-key}
```
We need to replace *{version}* with *v3*, set *response_format* to *json*, and include our secret API key.
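The substitution described above can be wrapped in a small helper. This is only a sketch: the function name `names_url` and its parameters are hypothetical, and the URL template is copied from the documentation excerpt quoted above.

```python
def names_url(api_key, response_format="json", version="v3"):
    """Build the request URL for the names of the best-seller lists."""
    # The template follows the URI structure quoted from the documentation.
    template = "http://api.nytimes.com/svc/books/{version}/lists/names.{fmt}?api-key={key}"
    return template.format(version=version, fmt=response_format, key=api_key)


# "YOUR-KEY" is a placeholder, not a real key.
print(names_url("YOUR-KEY"))
print(names_url("YOUR-KEY", response_format="xml"))
```

Keeping the response format as a parameter makes it easy to switch between JSON and XML without retyping the URL.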

In [5]:
import requests

response = requests.get("http://api.nytimes.com/svc/books/v3/lists/names.json?api-key=%s"%(key))
print(response)

<Response [200]>


We can try to print the response that NYT returns.

In [8]:
print(response.text)

{"status":"OK","copyright":"Copyright (c) 2016 The New York Times Company.  All Rights Reserved.","num_results":53,"results":[{"list_name":"Combined Print and E-Book Fiction","display_name":"Combined Print & E-Book Fiction","list_name_encoded":"combined-print-and-e-book-fiction","oldest_published_date":"2011-02-13","newest_published_date":"2016-09-11","updated":"WEEKLY"},{"list_name":"Combined Print and E-Book Nonfiction","display_name":"Combined Print & E-Book Nonfiction","list_name_encoded":"combined-print-and-e-book-nonfiction","oldest_published_date":"2011-02-13","newest_published_date":"2016-09-11","updated":"WEEKLY"},{"list_name":"Hardcover Fiction","display_name":"Hardcover Fiction","list_name_encoded":"hardcover-fiction","oldest_published_date":"2008-06-08","newest_published_date":"2016-09-11","updated":"WEEKLY"},{"list_name":"Hardcover Nonfiction","display_name":"Hardcover Nonfiction","list_name_encoded":"hardcover-nonfiction","oldest_published_date":"2008-06-08","newest_publi

The raw response is hard to read. Instead, we can decode it as JSON and use the `json` library to pretty-print it.

In [7]:
import json
print(json.dumps(response.json(), indent=2))

{
  "num_results": 53,
  "status": "OK",
  "results": [
    {
      "updated": "WEEKLY",
      "newest_published_date": "2016-09-11",
      "display_name": "Combined Print & E-Book Fiction",
      "list_name": "Combined Print and E-Book Fiction",
      "list_name_encoded": "combined-print-and-e-book-fiction",
      "oldest_published_date": "2011-02-13"
    },
    {
      "updated": "WEEKLY",
      "newest_published_date": "2016-09-11",
      "display_name": "Combined Print & E-Book Nonfiction",
      "list_name": "Combined Print and E-Book Nonfiction",
      "list_name_encoded": "combined-print-and-e-book-nonfiction",
      "oldest_published_date": "2011-02-13"
    },
    {
      "updated": "WEEKLY",
      "newest_published_date": "2016-09-11",
      "display_name": "Hardcover Fiction",
      "list_name": "Hardcover Fiction",
      "list_name_encoded": "hardcover-fiction",
      "oldest_published_date": "2008-06-08"
    },
    {
      "updated": "WEEKLY",
      "newest_published_date":

Now, this is much better! We can easily see that the response consists of a response status, the number of results and a list of the best-seller lists. For each of these lists, we get information about its name, its update frequency, its lifetime and its codename. 
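To make that structure concrete, here is a small sketch that reads individual fields. The `sample` dictionary is a hand-trimmed copy of the output above (two of the 53 lists), so it runs without calling the API; with a live response you would presumably start from `response.json()` instead.

```python
# Hand-trimmed copy of the response printed above (only two lists kept).
sample = {
    "status": "OK",
    "num_results": 53,
    "results": [
        {
            "list_name": "Combined Print and E-Book Fiction",
            "list_name_encoded": "combined-print-and-e-book-fiction",
            "oldest_published_date": "2011-02-13",
            "newest_published_date": "2016-09-11",
            "updated": "WEEKLY",
        },
        {
            "list_name": "Hardcover Fiction",
            "list_name_encoded": "hardcover-fiction",
            "oldest_published_date": "2008-06-08",
            "newest_published_date": "2016-09-11",
            "updated": "WEEKLY",
        },
    ],
}

# Top-level fields: response status and number of results.
print(sample["status"], sample["num_results"])

# Fields of a single list: codename, update frequency and lifetime.
first = sample["results"][0]
print(first["list_name_encoded"], first["updated"])
print(first["oldest_published_date"], "to", first["newest_published_date"])
```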

Instead of JSON, we can also set the response type to be XML.

In [97]:
response = requests.get("http://api.nytimes.com/svc/books/v3/lists/names.xml?api-key=%s"%(key))
print(response.text[:500])

<?xml version="1.0" encoding="UTF-8"?>
<result_set><status>OK</status><copyright>Copyright (c) 2016 The New York Times Company.  All Rights Reserved.</copyright><num_results>53</num_results><results><result><list_name>Combined Print and E-Book Fiction</list_name><display_name><![CDATA[Combined Print & E-Book Fiction]]></display_name><list_name_encoded>combined-print-and-e-book-fiction</list_name_encoded><oldest_published_date>2011-02-13</oldest_published_date><newest_published_date>2016-09-11</n


As before, we can use a library to print the XML in a readable way.

In [11]:
import xml.dom.minidom

xml_parser = xml.dom.minidom.parseString(response.text)
pretty_response = xml_parser.toprettyxml()

print(pretty_response)

<?xml version="1.0" ?>
<result_set>
	<status>OK</status>
	<copyright>Copyright (c) 2016 The New York Times Company.  All Rights Reserved.</copyright>
	<num_results>53</num_results>
	<results>
		<result>
			<list_name>Combined Print and E-Book Fiction</list_name>
			<display_name>
<![CDATA[Combined Print & E-Book Fiction]]>			</display_name>
			<list_name_encoded>combined-print-and-e-book-fiction</list_name_encoded>
			<oldest_published_date>2011-02-13</oldest_published_date>
			<newest_published_date>2016-09-11</newest_published_date>
			<updated>WEEKLY</updated>
		</result>
		<result>
			<list_name>Combined Print and E-Book Nonfiction</list_name>
			<display_name>
<![CDATA[Combined Print & E-Book Nonfiction]]>			</display_name>
			<list_name_encoded>combined-print-and-e-book-nonfiction</list_name_encoded>
			<oldest_published_date>2011-02-13</oldest_published_date>
			<newest_published_date>2016-09-11</newest_published_date>
			<updated>WEEKLY</updated>
		</result>
		<result>
			<list

-----------------

## Step 2. Parsing the responses

In this section, we practice some of the basic Python tools that you have learned so far and the powerful string handling methods that Python offers. Our goal is to be able to pick the interesting parts of the response and transform them into a format that will be useful to us.

Our first task will be to isolate the names of all the best-seller lists of the NYT. Fill in the rest of the `print_names_from_XML()` function that reads the XML response and prints all these names.

Hint: Our _pretty_ formatter puts each tag on a separate line. You may want to read the documentation of the [`split()`](https://docs.python.org/3/library/stdtypes.html#str.split), [`strip()`](https://docs.python.org/3/library/stdtypes.html#str.strip) and [`startswith()`](https://docs.python.org/3/library/stdtypes.html#str.startswith) methods.
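One possible way to combine the three string methods is sketched below. It is not the only solution: the helper name `names_from_pretty_xml` is hypothetical, and `pretty_sample` is a hand-written string in the same shape as the pretty-printed output above, so the code runs without a request.

```python
# Hand-written sample shaped like the pretty-printed XML shown earlier.
pretty_sample = """<?xml version="1.0" ?>
<result_set>
    <results>
        <result>
            <list_name>Hardcover Fiction</list_name>
            <updated>WEEKLY</updated>
        </result>
        <result>
            <list_name>Hardcover Nonfiction</list_name>
            <updated>WEEKLY</updated>
        </result>
    </results>
</result_set>"""


def names_from_pretty_xml(pretty_text):
    """Collect the text between <list_name> and </list_name> on each line."""
    open_tag, close_tag = "<list_name>", "</list_name>"
    names = []
    for line in pretty_text.split("\n"):
        line = line.strip()  # drop the indentation added by the formatter
        if line.startswith(open_tag) and line.endswith(close_tag):
            # Slice away the opening and closing tags.
            names.append(line[len(open_tag):-len(close_tag)])
    return names


print(names_from_pretty_xml(pretty_sample))
```

This approach relies on each `list_name` element sitting on its own line, which is exactly what the hint points out.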

In [43]:
def print_names_from_XML(response):
    """Prints the names of all the best-seller lists that are in the response.
    
    Parameters:
        response: Response object
        The response object that is a result of a get request for the names of the
        best-selling lists from the Books API. 
    
    """   
    xml_parser = xml.dom.minidom.parseString(response.text)
    pretty_response = xml_parser.toprettyxml()
    
    # Fill-in the code that prints the list names

In [42]:
response = requests.get("http://api.nytimes.com/svc/books/v3/lists/names.xml?api-key=%s"%(key))
print_names_from_XML(response)

Combined Print and E-Book Fiction
Combined Print and E-Book Nonfiction
Hardcover Fiction
Hardcover Nonfiction
Trade Fiction Paperback
Mass Market Paperback
Paperback Nonfiction
E-Book Fiction
E-Book Nonfiction
Hardcover Advice
Paperback Advice
Advice How-To and Miscellaneous
Chapter Books
Childrens Middle Grade
Childrens Middle Grade E-Book
Childrens Middle Grade Hardcover
Childrens Middle Grade Paperback
Paperback Books
Picture Books
Series Books
Young Adult
Young Adult E-Book
Young Adult Hardcover
Young Adult Paperback
Hardcover Graphic Books
Paperback Graphic Books
Manga
Combined Print Fiction
Combined Print Nonfiction
Animals
Business Books
Celebrities
Crime and Punishment
Culture
Education
Espionage
Expeditions Disasters and Adventures
Fashion Manners and Customs
Food and Fitness
Games and Activities
Hardcover Business Books
Health
Humor
Indigenous Americans
Relationships
Paperback Business Books
Family
Hardcover Political Books
Race and Civil Rights
Religion Spirituality and Fait

Can you do the same thing using the ElementTree XML API (see [XML tree and elements](https://docs.python.org/3/library/xml.etree.elementtree.html#xml-tree-and-elements) and [XPath support](https://docs.python.org/3/library/xml.etree.elementtree.html#xpath-support))?
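As a starting point, the sketch below shows how `findall()` with a path expression selects nested elements. The `xml_sample` string is a hand-written stand-in for the real response, reduced to the tags that matter here.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the XML response, reduced to a few tags.
xml_sample = """<result_set>
  <status>OK</status>
  <results>
    <result><list_name>Hardcover Fiction</list_name></result>
    <result><list_name>Hardcover Nonfiction</list_name></result>
  </results>
</result_set>"""

root = ET.fromstring(xml_sample)  # root is the <result_set> element

# The path is relative to the root: results -> result -> list_name.
names = [node.text for node in root.findall("./results/result/list_name")]
print(names)
```

Unlike the line-by-line approach, this does not depend on how the XML happens to be formatted.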

In [44]:
import xml.etree.ElementTree as ET
def XPath_print_names_from_XML(response):
    """Prints the names of all the best-seller lists that are in the response.
    
    Parameters:
        response: Response object
        The response object that is a result of a get request for the names of the
        best-selling lists from the Books API. 
    
    """     
    # Fill-in the code that prints the list names

Can you do the same thing for the JSON response? Notice that a JSON object is basically a dictionary.
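The sketch below illustrates the "JSON object is a dictionary" point on a hand-written JSON string (an assumption made so the code runs offline; the real response carries more fields per list).

```python
import json

# Hand-written JSON text with the same top-level keys as the real response.
json_text = (
    '{"status": "OK", "num_results": 2, "results": ['
    '{"list_name": "Hardcover Fiction"}, '
    '{"list_name": "Hardcover Nonfiction"}]}'
)

data = json.loads(json_text)  # decoding yields a plain Python dict
print(type(data).__name__)

# "results" is a list of dicts, one per best-seller list.
for entry in data["results"]:
    print(entry["list_name"])
```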

In [58]:
def print_names_from_JSON(response):
    """Prints the names of all the best-seller lists that are in the response.
    
    Parameters:
        response: Response object
        The response object that is a result of a get request for the names of the
        best-selling lists from the Books API. 
    
    """     
    # Fill-in the code that prints the list names

In [59]:
response = requests.get("http://api.nytimes.com/svc/books/v3/lists/names.json?api-key=%s"%(key))
print_names_from_JSON(response)

Combined Print and E-Book Fiction
Combined Print and E-Book Nonfiction
Hardcover Fiction
Hardcover Nonfiction
Trade Fiction Paperback
Mass Market Paperback
Paperback Nonfiction
E-Book Fiction
E-Book Nonfiction
Hardcover Advice
Paperback Advice
Advice How-To and Miscellaneous
Chapter Books
Childrens Middle Grade
Childrens Middle Grade E-Book
Childrens Middle Grade Hardcover
Childrens Middle Grade Paperback
Paperback Books
Picture Books
Series Books
Young Adult
Young Adult E-Book
Young Adult Hardcover
Young Adult Paperback
Hardcover Graphic Books
Paperback Graphic Books
Manga
Combined Print Fiction
Combined Print Nonfiction
Animals
Business Books
Celebrities
Crime and Punishment
Culture
Education
Espionage
Expeditions Disasters and Adventures
Fashion Manners and Customs
Food and Fitness
Games and Activities
Hardcover Business Books
Health
Humor
Indigenous Americans
Relationships
Paperback Business Books
Family
Hardcover Political Books
Race and Civil Rights
Religion Spirituality and Fait

Let's try something more complicated. Pick your favorite list. Your task is to print the titles of the books that were best-sellers on the list you picked during the week of July 1st, 2014.

Notice: If you read the API documentation carefully, you will see that 
>the service returns _20_ results at a time. Use the offset parameter to page through the results.

The total number of books that you should be expecting is returned as `num_results`. It is easier to handle the response if you are working with JSON, so prefer it over the XML.
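The paging arithmetic can be sketched independently of the API. The helper name `page_offsets` is hypothetical; the page size of 20 comes from the documentation quote above.

```python
def page_offsets(num_results, page_size=20):
    """Return the offset values needed to fetch num_results items page by page."""
    return list(range(0, num_results, page_size))


print(page_offsets(25))  # [0, 20]
print(page_offsets(53))  # [0, 20, 40]
```

A typical pattern is to issue the first request with offset 0, read `num_results` from it, and then request the remaining offsets.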

In [133]:
# Write your code here   

Number of books: 25

TOP SECRET TWENTY-ONE by Janet Evanovich
THE SILKWORM by Robert Galbraith
ALL FALL DOWN by Jennifer Weiner
MR. MERCEDES by Stephen King
THE GOLDFINCH by Donna Tartt
SHATTERED by Kevin Hearne
HARDLINE by Meredith Wild
THE ARRANGEMENT 15 by H. M. Ward
WRITTEN IN MY OWN HEART'S BLOOD by Diana Gabaldon
UNLUCKY 13 by James Patterson and Maxine Paetro
THE HUSBAND'S SECRET by Liane Moriarty
SECOND WATCH by J. A. Jance
THE TARGET by David Baldacci
FIELD OF PREY by John Sandford
THE OLD BLUE LINE by J. A. Jance
ALL THE LIGHT WE CANNOT SEE by Anthony Doerr
TERMINAL CITY by Linda Fairstein
A GAME OF THRONES: FIVE-BOOK SET by George R. R. Martin
ORPHAN TRAIN by Christina Baker Kline
WALLBANGER by Alice Clayton
GUIDEBOOK TO MURDER by Lynn Cahoon
BETTER WHEN HE'S BAD by Jay Crownover
GONE GIRL by Gillian Flynn
THE ONE AND ONLY by Emily Giffin
THE NEIGHBOR by Dean Koontz


Perfect! By now you should know how to _navigate_ the responses of the API.

## Step 3. Putting it all together

We are now ready to tackle our original problem: compiling a summary of the best-sellers over a period of 2 months.

First, we need to become comfortable working with dates. Since we want to issue requests that span a period of 2 months, we need to be able to advance a day automatically, without keeping track of how many days each month has. To this end, we will use the `datetime` library. Here is an example:

In [96]:
import datetime

now = datetime.datetime.now()
print("Now:", now)
print("Now (only date):", now.date())
print("Tomorrow:", now + datetime.timedelta(days=1))
print("Now (formatted):", now.strftime("%d:%m:%Y"))

new_year = "01-01-2017"
new_year_date = datetime.datetime.strptime(new_year, "%m-%d-%Y")
print("Parsed", new_year_date.date())

Now: 2016-09-08 14:29:27.446358
Now (only date): 2016-09-08
Tomorrow: 2016-09-09 14:29:27.446358
Now (formatted): 08:09:2016
Parsed 2017-01-01


For a better look at the documentation, you can check [here](http://pymotw.com/3/datetime/) and [here](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior).
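Building on the example above, here is a minimal sketch of walking through a date range with `timedelta`. The helper name `dates_between` and its `step_days` parameter are illustrative and not part of the lab.

```python
import datetime


def dates_between(start, end, step_days=1):
    """Yield dates from start to end (inclusive), step_days apart."""
    current = start
    while current <= end:
        yield current
        current += datetime.timedelta(days=step_days)


# Crossing a month boundary needs no special handling.
for day in dates_between(datetime.date(2014, 6, 29), datetime.date(2014, 7, 2)):
    print(day.strftime("%Y-%m-%d"))
```

Because `timedelta` does the calendar arithmetic, the loop never has to know that June has 30 days.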

The basic component of our project will be a function that takes as input a date and a list name, executes as many requests to the Books API as needed to get the list of books for that day and returns the list together with the date of its publication.

To return more than one element from a function (as a tuple), we write
```python
def foo():
    return "foo", 42
```
and then
```python
r = foo()
print(r[0]) # "foo"
print(r[1]) # 42
```
or
```python
txt, num = foo()
print(txt) # "foo"
print(num) # 42
```

Write a function that, given a list name and a date, returns a tuple with the books that were best-sellers for that date and the date on which the list was published by the NYT.

In [141]:
import datetime
import time

def get_books(date, list_name):
    """Returns a tuple containing the list of books and the publication date of the list
    
    Parameters:
        date: datetime
            The day for which we want to check the best-selling list.
    
        list_name: string
            The name of the best-selling list that we want to check. This needs to follow
            the Books API guidelines, e.g. 'hardcover-fiction'.
    
    Returns:
        books_set: set
            The set of books that were best-sellers according to NYT.
        
        published_date: datetime
            The date on which the list was published.
            
    """

Note that our free API key is limited to 8 QPS (queries per second). If we send queries faster than this limit, we will get back an error instead of an answer. A naive way to avoid this is to wait $1/8=0.125$ seconds after each query. The command for this is:
```python
time.sleep(0.125)
```
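The sketch below shows where such a pause could sit in a loop of requests. The names `call_throttled` and `fake_request` are hypothetical; `fake_request` stands in for a real API call so that the example runs without network access.

```python
import time


def call_throttled(func, arguments, delay=0.125):
    """Call func once per argument, sleeping `delay` seconds after each call."""
    results = []
    for argument in arguments:
        results.append(func(argument))
        time.sleep(delay)  # wait before the next call to stay under the limit
    return results


# Stand-in for a real request; it only formats its argument.
def fake_request(offset):
    return "page starting at %d" % offset


print(call_throttled(fake_request, [0, 20]))
```

Sleeping after every call is simple but slightly wasteful, since it also waits after the last request.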

Let's now test our function:

In [142]:
date = datetime.date(2014,7,1)
list_name = "hardcover-fiction"
book_list, book_date = get_books(date, list_name)

print(book_list)
print()
print("Published on", book_date)

{'THE INVENTION OF WINGS', 'THE ONE AND ONLY', 'TOP SECRET TWENTY-ONE', "WRITTEN IN MY OWN HEART'S BLOOD", 'MIDNIGHT IN EUROPE', 'ALL FALL DOWN', 'ROGUES', 'NANTUCKET SISTERS', 'THE MATCHMAKER', 'THE VACATIONERS', 'THE HURRICANE SISTERS', 'CHINA DOLLS', 'SKIN GAME', "THE HUSBAND'S SECRET", 'FIELD OF PREY', 'THE SILKWORM', 'SHATTERED', 'MR. MERCEDES', 'GHOST SHIP', 'ALL THE LIGHT WE CANNOT SEE', 'UNLUCKY 13', 'TERMINAL CITY', 'NATCHEZ BURNING', 'THE TARGET', 'THE GOLDFINCH'}

Published on 2014-07-06 00:00:00


Great! The final step is to write a function that takes a time window and a list name, and returns a dictionary with the books that were best-sellers as keys and the number of weeks that they were in the list as values.
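The counting step can be sketched on made-up data before touching the API. Everything here is an assumption for illustration: the titles are placeholders, the helper name `count_weeks` is hypothetical, and the input mimics the `(set of titles, published date)` tuples that `get_books()` is meant to return. The sketch also assumes that several days in the same week report the same published date, so each published list is counted only once.

```python
from collections import Counter
import datetime

# Made-up results of three get_books() calls; the first two fall in the same week.
results = [
    ({"BOOK A", "BOOK B"}, datetime.date(2014, 7, 6)),
    ({"BOOK A", "BOOK B"}, datetime.date(2014, 7, 6)),
    ({"BOOK A", "BOOK C"}, datetime.date(2014, 7, 13)),
]


def count_weeks(results):
    """Count, per title, the number of distinct published lists it appears on."""
    counts = Counter()
    seen_dates = set()
    for titles, published_date in results:
        if published_date in seen_dates:
            continue  # this weekly list was already counted
        seen_dates.add(published_date)
        counts.update(titles)  # add one week for every title on the list
    return dict(counts)


print(count_weeks(results))
```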

In [149]:
import datetime

def most_popular(start_date, end_date, list_name):
    """Returns the best-selling books in the given time window and the number of weeks each spent on the list
    
    Parameters:
        start_date: datetime
            The first day to check.
        
        end_date: datetime
            The last day to check.
            
        list_name: string
            The name of the best-selling list that we want to check. This needs to follow
            the Books API guidelines, e.g. 'hardcover-fiction'.
    
    Returns:
        books_dict: dictionary
            Dictionary of book titles with the number of weeks on the requested NYT list.
    """ 

Again, let's test our function. It might take a while to run (because of the QPS limit).

In [150]:
start_date = datetime.date(2014,6,1)
end_date = datetime.date(2014,8,31)
list_name = "hardcover-fiction"

books_dict = most_popular(start_date, end_date, list_name)
for book in books_dict:
    print(book, ":", books_dict[book])

THE SNOW QUEEN : 1
NANTUCKET SISTERS : 1
FAST TRACK : 2
THE ONE AND ONLY : 8
CUT AND THRUST : 3
TOP SECRET : 2
TOP SECRET TWENTY-ONE : 9
CALIFORNIA : 3
THE INVENTION OF WINGS : 14
WRITTEN IN MY OWN HEART'S BLOOD : 10
ROBERT LUDLUM'S THE BOURNE ASCENDANCY : 1
A PERFECT LIFE : 4
SIXTH GRAVE ON THE EDGE : 1
PAW AND ORDER : 1
THE MAGICIAN'S LAND : 2
COP TOWN : 3
A SHIVER OF LIGHT : 2
THE KRAKEN PROJECT : 2
THE BOOK OF LIFE : 5
DELICIOUS! : 3
I AM PILGRIM : 1
THE LINCOLN MYTH : 4
CLOSE YOUR EYES, HOLD HANDS : 2
BORN OF FURY : 1
A LONG TIME GONE : 1
LANDLINE : 1
THE DEAD WILL TELL : 1
ROBERT B. PARKER'S CHEAP SHOT : 1
WILLIAM SHAKESPEARE'S THE JEDI DOTH RETURN : 1
THE HEIST : 5
THE KILL SWITCH : 3
SIGHT UNSEEN : 1
THE HUSBAND'S SECRET : 14
SNIPER'S HONOR : 2
I'VE GOT YOU UNDER MY SKIN : 2
FIELD OF PREY : 6
DAYS OF RAGE : 1
THE SMOKE AT DAWN : 3
FOR ALL TIME : 1
TOM CLANCY: SUPPORT AND DEFEND : 4
SHATTERED : 1
MR. MERCEDES : 11
GHOST SHIP : 4
LOVE LETTERS : 1
SUMMER HOUSE WITH SWIMMING POOL :