## Live Code: Python Sets

In this example we'll play around with responses from the [New York Times Books API](https://developer.nytimes.com/docs/books-product/1/overview) to demonstrate the use of __set operations__ (stay tuned for week 6 to learn more about APIs).

With the Books API we can access data from the NY Times Bestseller List.
The Books API has a service that returns the best sellers for a specified date and list name.
The request requires two parameters: {publishing date} and {list}.

We'll look at the following categories:
* Hardcover Fiction
* Hardcover Nonfiction
* Paperback Trade Fiction
* Paperback Nonfiction

These lists are updated weekly, so we'll compare the lists of the current and the previous week.

In the first part of this code we'll create sets of titles for each category and week. In the second part we'll use set operations to get insights about the bestsellers.

Things that we can find out:
- which books have stayed in the top 15 compared to the previous week? 
- which titles are newcomers?
- ...
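
The first two questions above map directly onto set operations. A minimal sketch, using two small made-up sets (the titles and variable names are placeholders, not API results):

```python
# hypothetical example data: two tiny "weekly lists" (not real API results)
last_week = {'BOOK A', 'BOOK B', 'BOOK C'}
this_week = {'BOOK B', 'BOOK C', 'BOOK D'}

# titles that appear on both lists have stayed
stayed = this_week & last_week

# titles that appear only on this week's list are newcomers
newcomers = this_week - last_week

print(sorted(stayed))     # ['BOOK B', 'BOOK C']
print(sorted(newcomers))  # ['BOOK D']
```

Sets are unordered, so the sketch sorts the results before printing to get a stable output.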

### Generating Sets

In [1]:
# import requests and json libraries
import requests
import json

# this function will make requests to the Books API
# and generate sets of bestsellers for different lists
# by passing 'date' as an argument, we can later call this function 
# several times for the lists of the current and the previous weeks
def generateSets(date):
    
    # if you want to play around with the API, please make your own key at https://developer.nytimes.com/
    authorized_key = "QftZeSssSfBqTSFet3RBaTE9inc3iWAw"
    # create list of the categories we want to access:
    categories = ['hardcover-fiction', 'hardcover-nonfiction', 'paperback-nonfiction', 'trade-fiction-paperback']
    
    """ This is an excerpt of the data structure the API will return:      
{(...)
 (...)
 'results': {(...)
     (...)
     'books': [{(...)
         (...)
         'title': 'LITTLE FIRES EVERYWHERE',
         'contributor': 'by Celeste Ng',
    
    """
    
    # our goal is to create a set for each of the above categories, 
    # containing the title of the top 15 books
    
    # step 1: 
    # declare a global variable
    global bestseller_titles 
    
    # create an empty, nested list (one list for each category)
    bestseller_titles = [[],[],[],[]] # they will hold information from the 'title' key 
    
    # step 2: 
    # populate those lists in a nested while loop:
                
    """ PSEUDO CODE: 
    
# iterate through list in 'bestseller_titles': 
n = 0
while n < number of lists in 'bestseller_titles'(4)
    call the api_url, and pass category[n]
    get the response, and store as json 
    
    # access the 'books' key, define 'path' in json structure
    books = data['results']['books']
    # iterate through titles in 'books':
    j = 0
    while j < number of books in 'books':
        add books[j]['title'] to bestseller_titles[n]
        j += 1
        
    n += 1 
    
    """

    n = 0 # create variable 'n' to count
    while n < len(bestseller_titles): # for each empty list 'bestseller_titles'
        # call the API-url
        # use string formatting to insert the date (e.g. 'current'), the category (with index 'n'), and the API key
        api_url = "https://api.nytimes.com/svc/books/v3/lists/{}/{}.json?api-key={}".format(date, categories[n], authorized_key)

        # call the API with requests
        response = requests.get(api_url)
        # create a variable called 'data' to hold the json formatted result
        data = response.json()

        # define the 'path' inside the json structure
        books = data['results']['books']
        
        # then iterate through 'titles' in 'books':
        j = 0 # create variable 'j' to count
        # while 'j' is smaller than the number of books
        while j < len(books):
            # add the title to the 'nth' list in 'bestseller_titles'
            bestseller_titles[n].append(books[j]['title'])
            j += 1 # count +1

        n += 1 # count +1
    
    # step 3:
    # print the populated lists as a sanity check
    print(bestseller_titles)
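
As an alternative sketch, the title-extraction step could be written as a small function that returns a set directly instead of filling a global list. The helper name `titles_from_response` and the sample dictionary are assumptions made for illustration; only the `results`, `books`, `title` path is taken from the data excerpt above.

```python
def titles_from_response(data):
    """Return the set of book titles found in one parsed API response."""
    books = data['results']['books']
    # a set comprehension collects every title exactly once
    return {book['title'] for book in books}


# hand-written stand-in for a parsed response (not real API output)
sample = {
    'results': {
        'books': [
            {'title': 'LITTLE FIRES EVERYWHERE', 'contributor': 'by Celeste Ng'},
            {'title': 'EXAMPLE TITLE', 'contributor': 'by Example Author'},
        ]
    }
}

print(sorted(titles_from_response(sample)))
# ['EXAMPLE TITLE', 'LITTLE FIRES EVERYWHERE']
```

Returning the set lets the caller decide which variable to store it in, so results for different weeks cannot overwrite each other.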

In [2]:
# call the generateSets() function 
# with 'date' = 'current' to receive this week's bestseller list
generateSets('current')
print(len(bestseller_titles))

4


In [3]:
# create a set from each nested list
hc_fiction_jun21 = set(bestseller_titles[0]) 
hc_nonfiction_jun21 = set(bestseller_titles[1])
pb_nonfiction_jun21 = set(bestseller_titles[2])
pb_fiction_jun21 = set(bestseller_titles[3])

print('Hardcover Fiction, June 21:\n', hc_fiction_jun21)
print('\nHardcover Nonfiction, June 21:\n', hc_nonfiction_jun21)
print('\nPaperback Nonfiction, June 21:\n', pb_nonfiction_jun21)
print('\nPaperback Fiction, June 21:\n', pb_fiction_jun21)

Hardcover Fiction, June 21:

Hardcover Nonfiction, June 21:
 {'ME AND WHITE SUPREMACY', 'UNITED STATES OF SOCIALISM', 'OUR TIME IS NOW', 'BETWEEN THE WORLD AND ME', 'COUNTDOWN 1945', 'THE SPLENDID AND THE VILE', 'UNTAMED', 'HOW TO BE AN ANTIRACIST', 'THE DEFICIT MYTH', "I'M STILL HERE", 'EDUCATED', 'BECOMING', 'MY VANISHING COUNTRY', 'FORTITUDE', 'THE MAMBA MENTALITY'}

Paperback Nonfiction, June 21:
 {'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA?', 'THE COLOR OF LAW', 'JUST MERCY', 'THE NEW JIM CROW', 'WAKING UP WHITE', "WHY I'M NO LONGER TALKING TO WHITE PEOPLE ABOUT RACE", 'THE GREAT INFLUENZA', 'THE COLOR OF COMPROMISE', 'WHITE RAGE', 'BORN A CRIME', 'STAMPED FROM THE BEGINNING', 'THE BODY KEEPS THE SCORE', 'WHITE FRAGILITY', 'RAISING WHITE KIDS', 'SO YOU WANT TO TALK ABOUT RACE'}

Paperback Fiction, June 21:
 {'THE BLUEST EYE', 'THE WOMAN IN THE WINDOW', 'ELEANOR OLIPHANT IS COMPLETELY FINE', 'THE FAMILY UPSTAIRS', 'THE NIGHTINGALE', 'THE TATTOOIST OF AUSCHWITZ', 

In [4]:
# call the generateSets() function again
# with 'date' = '2020-06-14' to receive last week's bestseller list
generateSets('2020-06-14')



In [5]:
# create a set from each nested list
hc_fiction_jun14 = set(bestseller_titles[0]) 
hc_nonfiction_jun14 = set(bestseller_titles[1]) 
pb_nonfiction_jun14 = set(bestseller_titles[2]) 
pb_fiction_jun14 = set(bestseller_titles[3]) 

print('Hardcover Fiction, June 14:\n', hc_fiction_jun14)
print('\nHardcover Nonfiction, June 14:\n', hc_nonfiction_jun14)
print('\nPaperback Nonfiction, June 14:\n', pb_nonfiction_jun14)
print('\nPaperback Fiction, June 14:\n', pb_fiction_jun14)

Hardcover Fiction, June 14:

Hardcover Nonfiction, June 14:
 {'ME AND WHITE SUPREMACY', 'AMERICAN CRUSADE', 'THE SPLENDID AND THE VILE', 'HOLLYWOOD PARK', 'PLAGUE OF CORRUPTION', 'UNTAMED', 'HOW TO BE AN ANTIRACIST', 'THE CHIFFON TRENCHES', 'BREATH', 'EDUCATED', 'HIDDEN VALLEY ROAD', 'BECOMING', 'MY VANISHING COUNTRY', 'FORTITUDE', 'THE MAMBA MENTALITY'}

Paperback Nonfiction, June 14:
 {'THE COLOR OF LAW', 'JUST MERCY', 'THE NEW JIM CROW', 'BRAIDING SWEETGRASS', 'THE GREAT INFLUENZA', 'GRIT', 'THINKING, FAST AND SLOW', 'OUTLIERS', 'BORN A CRIME', 'THE BODY KEEPS THE SCORE', 'A WOMAN OF NO IMPORTANCE', 'WHITE FRAGILITY', 'UNORTHODOX', 'SAPIENS', 'SO YOU WANT TO TALK ABOUT RACE'}

Paperback Fiction, June 14:
 {'THE OVERSTORY', 'THE WOMAN IN THE WINDOW', 'BEACH READ', 'THE NIGHTINGALE', 'THE TATTOOIST OF AUSCHWITZ', 'CALL ME BY YOUR NAME', 'CIRCE', 'THIS TENDER LAND', 'THE BOOK WOMAN OF TROUBLESOME CREEK', 'BEFORE WE WERE YOURS', 'LITTLE FIRES EVERYWHERE', 'CITY OF GIRLS', 'A GENTLEMAN I

### Set Operations

Now that we have declared multiple sets of books, let's make use of set operations to get insights about the bestsellers.

In [6]:
# create an intersection function

def intersection(A, B):
    inter = A & B
    print(inter)

intersection(pb_nonfiction_jun21, pb_nonfiction_jun14)

{'THE COLOR OF LAW', 'JUST MERCY', 'THE NEW JIM CROW', 'THE GREAT INFLUENZA', 'BORN A CRIME', 'THE BODY KEEPS THE SCORE', 'WHITE FRAGILITY', 'SO YOU WANT TO TALK ABOUT RACE'}
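
The `&` operator only works when both operands are sets. The method form `intersection()` also accepts other iterables, such as a list. A short sketch with placeholder titles (the variable names are made up for this example):

```python
week_a = {'BOOK A', 'BOOK B', 'BOOK C'}       # placeholder titles
week_b_list = ['BOOK B', 'BOOK C', 'BOOK D']  # a plain list, not a set

# the method form accepts any iterable
common = week_a.intersection(week_b_list)
print(sorted(common))  # ['BOOK B', 'BOOK C']

# the operator form raises a TypeError for a non-set operand
try:
    week_a & week_b_list
except TypeError:
    operator_failed = True
else:
    operator_failed = False
print(operator_failed)  # True
```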


In [7]:
# create a difference function

def difference(A, B):
    diff = A - B
    print(diff)
   
difference(pb_nonfiction_jun21, pb_nonfiction_jun14)

{'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA?', 'WAKING UP WHITE', "WHY I'M NO LONGER TALKING TO WHITE PEOPLE ABOUT RACE", 'THE COLOR OF COMPROMISE', 'WHITE RAGE', 'STAMPED FROM THE BEGINNING', 'RAISING WHITE KIDS'}
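
Unlike intersection, difference depends on the order of the operands: `A - B` and `B - A` answer different questions. A minimal sketch with placeholder titles; the variable names are assumptions for illustration:

```python
this_week = {'BOOK B', 'BOOK C', 'BOOK D'}  # placeholder titles
last_week = {'BOOK A', 'BOOK B', 'BOOK C'}

# on this week's list but not on last week's
newcomers = this_week - last_week

# on last week's list but no longer on this week's
dropouts = last_week - this_week

# symmetric difference: titles on exactly one of the two lists
changed = this_week ^ last_week

print(sorted(newcomers))  # ['BOOK D']
print(sorted(dropouts))   # ['BOOK A']
print(sorted(changed))    # ['BOOK A', 'BOOK D']
```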


In [8]:
# create a union function 

def union(A, B):
    # use a separate name so the result does not shadow the function 'union'
    combined = A.union(B)
    print(combined)
    
union(pb_nonfiction_jun21, pb_nonfiction_jun14)

{'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA?', 'THE COLOR OF LAW', 'THE GREAT INFLUENZA', 'WHITE RAGE', 'A WOMAN OF NO IMPORTANCE', 'WHITE FRAGILITY', 'RAISING WHITE KIDS', 'SAPIENS', 'JUST MERCY', 'THE NEW JIM CROW', 'WAKING UP WHITE', 'BRAIDING SWEETGRASS', "WHY I'M NO LONGER TALKING TO WHITE PEOPLE ABOUT RACE", 'GRIT', 'THINKING, FAST AND SLOW', 'THE COLOR OF COMPROMISE', 'OUTLIERS', 'BORN A CRIME', 'STAMPED FROM THE BEGINNING', 'THE BODY KEEPS THE SCORE', 'UNORTHODOX', 'SO YOU WANT TO TALK ABOUT RACE'}
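
A union counts shared titles only once. In the output above, the two weekly sets hold 15 titles each and share 8, so the union holds 15 + 15 - 8 = 22 titles. The same relationship can be checked on any two sets; the sketch below uses placeholder titles:

```python
A = {'BOOK A', 'BOOK B', 'BOOK C'}  # placeholder titles
B = {'BOOK B', 'BOOK C', 'BOOK D'}

union_size = len(A | B)
# size of the union = sizes of both sets minus the shared elements
expected_size = len(A) + len(B) - len(A & B)

print(union_size)     # 4
print(expected_size)  # 4
```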


In [9]:
# perform an operation on more than two sets

all_fiction = pb_fiction_jun14 | pb_fiction_jun21 | hc_fiction_jun21 | hc_fiction_jun14
print(all_fiction)
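
When the sets are stored in a list, chaining `|` by hand becomes tedious. One possible approach is to unpack the list into `union()` or `intersection()`. The list name `weekly_sets` and the titles are placeholders for this sketch:

```python
# hypothetical list of sets, e.g. one set per week
weekly_sets = [
    {'BOOK A', 'BOOK B', 'BOOK C'},
    {'BOOK B', 'BOOK C', 'BOOK D'},
    {'BOOK C', 'BOOK D', 'BOOK E'},
]

# union of all sets; starting from an empty set also works for an empty list
all_titles = set().union(*weekly_sets)

# titles present in every set (needs at least one set in the list)
in_every_week = set.intersection(*weekly_sets)

print(sorted(all_titles))     # ['BOOK A', 'BOOK B', 'BOOK C', 'BOOK D', 'BOOK E']
print(sorted(in_every_week))  # ['BOOK C']
```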

