## Live Code: Python Sets

In this example we'll work with responses from the [New York Times Books API](https://developer.nytimes.com/docs/books-product/1/overview) to demonstrate __set operations__ (stay tuned for week 6 to learn more about APIs). The Books API provides information about The New York Times bestseller lists. We'll look at the following categories, which are updated weekly: 
* Hardcover Fiction
* Hardcover Nonfiction
* Paperback Trade Fiction
* Paperback Nonfiction

In the first part of this code we'll create sets of titles for each category and week. In the second part we'll use set operations to get insights about the bestsellers: Which books have stayed in the top 15 compared to the previous week? Which titles are newcomers?
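
As a quick preview of the second part, here is a minimal sketch of those two questions on made-up data. The titles and the names `this_week` and `last_week` are placeholders for illustration, not results from the API.

```python
# Toy data: placeholder titles, not real bestseller lists
last_week = {"BOOK A", "BOOK B", "BOOK C"}
this_week = {"BOOK B", "BOOK C", "BOOK D"}

# Intersection: titles on both lists (they stayed)
stayed = this_week & last_week

# Difference: titles on this week's list only (newcomers)
newcomers = this_week - last_week

print(sorted(stayed))     # ['BOOK B', 'BOOK C']
print(sorted(newcomers))  # ['BOOK D']
```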

### Generating Sets

In [1]:
# import the requests library - needed to make an API call
# (response.json() parses the result, so the json module is not needed)
import requests

# this function will make requests to the nytimes Books API
def generateSets(date):
    
    authorized_key = "QftZeSssSfBqTSFet3RBaTE9inc3iWAw" # if you want to play around with the API, please make your own key at https://developer.nytimes.com/
    # to make a request with this API, we need to choose the specific bestseller lists we want to access 
    categories = ['hardcover-fiction', 'hardcover-nonfiction', 'paperback-nonfiction', 'trade-fiction-paperback']
    
    # our goal is to create a set for each of these categories, containing the respective books' titles and authors
    
    # step 1: 
    # declare two global variables
    global bestseller_titles # this one will be populated with information from the 'title' key 
    global bestseller_authors # this one will be populated with information from the 'contributor' key 
    # initialize them as nested lists (one inner list per category)
    bestseller_titles = [[],[],[],[]] 
    bestseller_authors = [[],[],[],[]] 
    
    # step 2: 
    # populate those lists in a while loop:
    n = 0 # create variable 'n' to count
    while n < len(bestseller_titles): # while 'n' is smaller than number of empty lists in 'bestseller_titles'
        # build the API URL
        # use str.format() to insert the date (e.g. 'current'), the category (at index 'n'), and the API key
        api_url = "https://api.nytimes.com/svc/books/v3/lists/{}/{}.json?api-key={}".format(date, categories[n], authorized_key)

        # call the API with requests
        response = requests.get(api_url)
        # create a variable called 'data' to hold the json formatted result
        data = response.json()
        
        """ This is an excerpt of the data structure the API returns:      
{(...)
 (...)
 'results': {(...)
     (...)
     'books': [{(...)
         (...)
         'title': 'LITTLE FIRES EVERYWHERE',
         'contributor': 'by Celeste Ng',
        
        """

        # we want to access the information stored under the key 'results'
        # 'results' maps to a dictionary that contains the 'books' list
        # follow that path to get the list of books
        books = data['results']['books']
        
        # then read 'title' and 'contributor' from each entry in 'books':
        j = 0 # create variable 'j' to count
        # while 'j' is smaller than the number of books
        while j < len(books):
            # add the title to the 'nth' list in 'bestseller_titles'
            bestseller_titles[n].append(books[j]['title'])
            # add the contributor to the 'nth' list in 'bestseller_authors'
            bestseller_authors[n].append(books[j]['contributor'])
            j += 1 # count +1

        n += 1 # count +1
    
    # step 3:
    # print the populated lists as a sanity check
    print(bestseller_titles)
    print('\n', bestseller_authors)

In [2]:
# call the generateSets() function 
# with 'date' = 'current' to receive this week's bestseller list
generateSets('current')


 [['by Brit Bennett', 'by Delia Owens', 'by Lucy Foley', 'by John Grisham', 'by Emily Giffin', 'by Michael Connelly', 'by Nora Roberts', 'by Stephen King', 'by Jennifer Weiner', 'by David Baldacci', 'by Jeanine Cummins', 'by Megha Majumdar', 'by Alex Michaelides', 'by Emma Straub', 'by Kiley Reid'], ['by Ibram X. Kendi', 'by Glennon Doyle', 'by Ta-Nehisi Coates', 'by Michelle Obama', 'by Erik Larson', 'by Layla F. Saad', "by Dinesh D'Souza", 'by Tara Westover', 'by James Nestor', 'by Bakari Sellers', 'by Rutger Bregman', 'by Kobe Bryant', 'by Judy Mikovits and Kent Heckenlively', 'by Eric Cervini', 'by Malcolm Gladwell'], ['by Ijeoma Oluo', 'by Robin DiAngelo', 'by Richard Rothstein', 'by Michelle Alexander', 'by Bryan Stevenson', 'by Ibram X. Kendi', 'by Beverly Tatum', 'by Trevor Noah', 'by Jennifer Harvey', 'by Carol Anderson', 'by Bessel van der Kolk', 'by Debby Irving', 'by Brittney Cooper', 'by John M. Barry', 'by Resmaa Menakem'], ['by Celeste Ng', 'by Sally Rooney', 'by Lisa J
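
Because `generateSets()` stores its results in global variables, every call overwrites the lists from the previous call. One possible alternative, sketched here under assumptions, is a helper that takes one parsed response and returns a set. The name `extract_books` is hypothetical, and `sample` is a hand-written stand-in for `response.json()` that follows the excerpt shown in the function above.

```python
def extract_books(data):
    """Return a set of 'TITLE by Author' strings from one parsed API response."""
    books = data["results"]["books"]
    # 'contributor' already starts with 'by ', so a single space joins the parts
    return {book["title"] + " " + book["contributor"] for book in books}

# Hand-written stand-in for response.json(), following the excerpt above
sample = {
    "results": {
        "books": [
            {"title": "LITTLE FIRES EVERYWHERE", "contributor": "by Celeste Ng"},
        ]
    }
}

print(extract_books(sample))  # {'LITTLE FIRES EVERYWHERE by Celeste Ng'}
```

Returning the set lets the caller store each week under its own name instead of relying on shared state.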

In [3]:
# create a set from each nested list
# join each title with its author, using a list comprehension and the zip() function
# -> set([template for item1, item2 in zip(list1, list2)])
hc_fiction_jun21 = set([i + ' ' + j for i, j in zip(bestseller_titles[0], bestseller_authors[0])]) 
hc_nonfiction_jun21 = set([i + ' ' + j for i, j in zip(bestseller_titles[1], bestseller_authors[1])])
pb_nonfiction_jun21 = set([i + ' ' + j for i, j in zip(bestseller_titles[2], bestseller_authors[2])])
pb_fiction_jun21 = set([i + ' ' + j for i, j in zip(bestseller_titles[3], bestseller_authors[3])])

print('Hardcover Fiction, June 21:\n', hc_fiction_jun21)
print('\nHardcover Nonfiction, June 21:\n', hc_nonfiction_jun21)
print('\nPaperback Nonfiction, June 21:\n', pb_nonfiction_jun21)
print('\nPaperback Fiction, June 21:\n', pb_fiction_jun21)

Hardcover Fiction, June 21:

Hardcover Nonfiction, June 21:
 {'THE SPLENDID AND THE VILE by Erik Larson', 'ME AND WHITE SUPREMACY by Layla F. Saad', 'HOW TO BE AN ANTIRACIST by Ibram X. Kendi', "UNITED STATES OF SOCIALISM by Dinesh D'Souza", 'EDUCATED by Tara Westover', 'PLAGUE OF CORRUPTION by Judy Mikovits and Kent Heckenlively', 'BECOMING by Michelle Obama', 'BETWEEN THE WORLD AND ME by Ta-Nehisi Coates', 'BREATH by James Nestor', "THE DEVIANT'S WAR by Eric Cervini", 'MY VANISHING COUNTRY by Bakari Sellers', 'UNTAMED by Glennon Doyle', 'THE MAMBA MENTALITY by Kobe Bryant', 'TALKING TO STRANGERS by Malcolm Gladwell', 'HUMANKIND by Rutger Bregman'}

Paperback Nonfiction, June 21:
 {'THE COLOR OF LAW by Richard Rothstein', 'WHITE FRAGILITY by Robin DiAngelo', 'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA? by Beverly Tatum', 'THE NEW JIM CROW by Michelle Alexander', 'WAKING UP WHITE by Debby Irving', 'JUST MERCY by Bryan Stevenson', 'BORN A CRIME by Trevor Noah', 'THE GR
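
The four assignments in this cell repeat the same pattern. A minimal sketch of how that pattern could be wrapped in a helper follows. The name `combine` is hypothetical, and the two sample entries are copied from the hardcover nonfiction output above.

```python
def combine(titles, authors):
    """Pair each title with its author and return the joined strings as a set."""
    # zip() yields (title, author) tuples; the braces build a set directly
    return {title + " " + author for title, author in zip(titles, authors)}

titles = ["EDUCATED", "BECOMING"]
authors = ["by Tara Westover", "by Michelle Obama"]

print(sorted(combine(titles, authors)))
# ['BECOMING by Michelle Obama', 'EDUCATED by Tara Westover']
```

Note that `zip()` stops at the shorter input, so both lists need the same length for every book to be included.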

In [4]:
# call the generateSets() function again
# with 'date' = '2020-06-14' to receive last week's bestseller list
generateSets('2020-06-14')


 [['by Delia Owens', 'by Nora Roberts', 'by Michael Connelly', 'by John Grisham', 'by Stephen King', 'by Jennifer Weiner', 'by Jeanine Cummins', 'by David Baldacci', 'by Emma Straub', 'by James Patterson and Maxine Paetro', 'by Alex Michaelides', 'by Jojo Moyes', 'by Sue Monk Kidd', 'by Clive Cussler and Robin Burcell', 'by Scott Turow'], ['by Glennon Doyle', 'by Judy Mikovits and Kent Heckenlively', 'by Michelle Obama', 'by Erik Larson', 'by Ibram X. Kendi', 'by James Nestor', 'by Tara Westover', 'by Mikel Jollett', 'by Pete Hegseth', 'by Layla F. Saad', 'by Dan Crenshaw', 'by Kobe Bryant', 'by Robert Kolker', 'by André Leon Talley', 'by Bakari Sellers'], ['by Robin DiAngelo', 'by John M. Barry', 'by Michelle Alexander', 'by Bessel van der Kolk', 'by Ijeoma Oluo', 'by Bryan Stevenson', 'by Sonia Purnell', 'by Deborah Feldman', 'by Trevor Noah', 'by Yuval Noah Harari', 'by Robin Wall Kimmerer', 'by Angela Duckworth', 'by Malcolm Gladwell', 'by Richard Rothstein', 'by Daniel Kahneman']

In [5]:
# create a set from each nested list
# join each title with its author, using a list comprehension and the zip() function
# -> set([template for item1, item2 in zip(list1, list2)])
hc_fiction_jun14 = set([i + ' ' + j for i, j in zip(bestseller_titles[0], bestseller_authors[0])]) 
hc_nonfiction_jun14 = set([i + ' ' + j for i, j in zip(bestseller_titles[1], bestseller_authors[1])])
pb_nonfiction_jun14 = set([i + ' ' + j for i, j in zip(bestseller_titles[2], bestseller_authors[2])])
pb_fiction_jun14 = set([i + ' ' + j for i, j in zip(bestseller_titles[3], bestseller_authors[3])])

print('Hardcover Fiction, June 14:\n', hc_fiction_jun14)
print('\nHardcover Nonfiction, June 14:\n', hc_nonfiction_jun14)
print('\nPaperback Nonfiction, June 14:\n', pb_nonfiction_jun14)
print('\nPaperback Fiction, June 14:\n', pb_fiction_jun14)

Hardcover Fiction, June 14:

Hardcover Nonfiction, June 14:
 {'THE SPLENDID AND THE VILE by Erik Larson', 'ME AND WHITE SUPREMACY by Layla F. Saad', 'HOW TO BE AN ANTIRACIST by Ibram X. Kendi', 'PLAGUE OF CORRUPTION by Judy Mikovits and Kent Heckenlively', 'EDUCATED by Tara Westover', 'FORTITUDE by Dan Crenshaw', 'HIDDEN VALLEY ROAD by Robert Kolker', 'BECOMING by Michelle Obama', 'THE CHIFFON TRENCHES by André Leon Talley', 'BREATH by James Nestor', 'AMERICAN CRUSADE by Pete Hegseth', 'HOLLYWOOD PARK by Mikel Jollett', 'MY VANISHING COUNTRY by Bakari Sellers', 'UNTAMED by Glennon Doyle', 'THE MAMBA MENTALITY by Kobe Bryant'}

Paperback Nonfiction, June 14:
 {'SAPIENS by Yuval Noah Harari', 'THE COLOR OF LAW by Richard Rothstein', 'WHITE FRAGILITY by Robin DiAngelo', 'A WOMAN OF NO IMPORTANCE by Sonia Purnell', 'GRIT by Angela Duckworth', 'UNORTHODOX by Deborah Feldman', 'THE NEW JIM CROW by Michelle Alexander', 'THINKING, FAST AND SLOW by Daniel Kahneman', 'JUST MERCY by Bryan Stevens

### Set Operations

Now that we have declared multiple sets of books, let's make use of set operations to get insights about the bestsellers.

In [6]:
# create an intersection function to find the books that show up in both sets
def intersection(A , B): 
    inter = set(A) & set(B)
    print('A & B\nFollowing books match your criteria:\n{}\n'.format(inter))

# call the function
# show paperback nonfiction titles that were on both this week's and last week's bestseller lists
intersection(pb_nonfiction_jun21, pb_nonfiction_jun14)

A & B
Following books match your criteria:
{'THE COLOR OF LAW by Richard Rothstein', 'WHITE FRAGILITY by Robin DiAngelo', 'THE NEW JIM CROW by Michelle Alexander', 'JUST MERCY by Bryan Stevenson', 'THE GREAT INFLUENZA by John M. Barry', 'BORN A CRIME by Trevor Noah', 'SO YOU WANT TO TALK ABOUT RACE by Ijeoma Oluo', 'THE BODY KEEPS THE SCORE by Bessel van der Kolk'}
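
The function above prints its result, so the intersection cannot be reused afterwards. A small sketch of a variant that returns the set instead is shown here. The name `stayed_on_list` is hypothetical, and the sample sets are a hand-picked subset of the entries shown above.

```python
def stayed_on_list(current, previous):
    """Return the entries that appear in both sets."""
    return current & previous

current = {"JUST MERCY by Bryan Stevenson", "WHITE RAGE by Carol Anderson"}
previous = {"JUST MERCY by Bryan Stevenson", "GRIT by Angela Duckworth"}

both = stayed_on_list(current, previous)
print(len(both), both)  # 1 {'JUST MERCY by Bryan Stevenson'}

# The method form gives the same result as the & operator
print(current.intersection(previous) == both)  # True
```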



In [7]:
# create a difference function
def difference(A , B): 
    diff = set(A) - set(B)
    print('A - B\nFollowing books match your criteria:\n{}\n'.format(diff))

# call the function
# show this week's newcomers in the paperback nonfiction category
difference(pb_nonfiction_jun21, pb_nonfiction_jun14)

A - B
Following books match your criteria:
{'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA? by Beverly Tatum', 'WAKING UP WHITE by Debby Irving', 'ELOQUENT RAGE by Brittney Cooper', 'RAISING WHITE KIDS by Jennifer Harvey', 'STAMPED FROM THE BEGINNING by Ibram X. Kendi', 'WHITE RAGE by Carol Anderson', "MY GRANDMOTHER'S HANDS by Resmaa Menakem"}
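
Unlike intersection, difference depends on the order of the operands. The sketch below, on a hand-picked subset of the entries above, shows that swapping the operands gives the books that dropped off the list, and that the `^` operator (symmetric difference) gives both groups at once. The variable names are chosen for illustration.

```python
current = {"JUST MERCY by Bryan Stevenson", "WHITE RAGE by Carol Anderson"}
previous = {"JUST MERCY by Bryan Stevenson", "GRIT by Angela Duckworth"}

newcomers = current - previous   # on this week's list only
dropouts = previous - current    # on last week's list only
changed = current ^ previous     # on exactly one of the two lists

print(newcomers)  # {'WHITE RAGE by Carol Anderson'}
print(dropouts)   # {'GRIT by Angela Duckworth'}
print(changed == newcomers | dropouts)  # True
```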



In [8]:
# create a union function to show two categories combined
def union(A , B): 
    combined = set(A) | set(B) # use a name that does not shadow the function 'union'
    print('A | B\nFollowing books match your criteria:\n{}\n'.format(combined))

# call the function
# show paperback nonfiction titles of this and last week combined
union(pb_nonfiction_jun21, pb_nonfiction_jun14)

A | B
Following books match your criteria:
{'THE COLOR OF LAW by Richard Rothstein', 'WHITE FRAGILITY by Robin DiAngelo', 'A WOMAN OF NO IMPORTANCE by Sonia Purnell', 'GRIT by Angela Duckworth', 'UNORTHODOX by Deborah Feldman', 'THINKING, FAST AND SLOW by Daniel Kahneman', 'JUST MERCY by Bryan Stevenson', 'WHITE RAGE by Carol Anderson', "MY GRANDMOTHER'S HANDS by Resmaa Menakem", 'SAPIENS by Yuval Noah Harari', 'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA? by Beverly Tatum', 'THE NEW JIM CROW by Michelle Alexander', 'WAKING UP WHITE by Debby Irving', 'BORN A CRIME by Trevor Noah', 'THE GREAT INFLUENZA by John M. Barry', 'BRAIDING SWEETGRASS by Robin Wall Kimmerer', 'SO YOU WANT TO TALK ABOUT RACE by Ijeoma Oluo', 'ELOQUENT RAGE by Brittney Cooper', 'OUTLIERS by Malcolm Gladwell', 'RAISING WHITE KIDS by Jennifer Harvey', 'STAMPED FROM THE BEGINNING by Ibram X. Kendi', 'THE BODY KEEPS THE SCORE by Bessel van der Kolk'}
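
A union counts shared entries only once, which is why the combined set above has 22 entries rather than 30: the 8 titles from the intersection appear on both lists. A minimal sketch of that size check on placeholder data (the titles are made up):

```python
current = {"BOOK B", "BOOK C", "BOOK D"}
previous = {"BOOK A", "BOOK B", "BOOK C"}

combined = current | previous
shared = current & previous

# Shared entries are counted once in the union
print(len(combined))                               # 4
print(len(current) + len(previous) - len(shared))  # 4
```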



In [9]:
# Show ALL nonfiction bestsellers, current and last week combined
all_nonfiction = pb_nonfiction_jun21 | pb_nonfiction_jun14 | hc_nonfiction_jun21 | hc_nonfiction_jun14
print(all_nonfiction)

{"UNITED STATES OF SOCIALISM by Dinesh D'Souza", 'HOW TO BE AN ANTIRACIST by Ibram X. Kendi', 'PLAGUE OF CORRUPTION by Judy Mikovits and Kent Heckenlively', 'THE COLOR OF LAW by Richard Rothstein', 'WHITE FRAGILITY by Robin DiAngelo', 'A WOMAN OF NO IMPORTANCE by Sonia Purnell', 'GRIT by Angela Duckworth', 'THINKING, FAST AND SLOW by Daniel Kahneman', 'BECOMING by Michelle Obama', 'FORTITUDE by Dan Crenshaw', "THE DEVIANT'S WAR by Eric Cervini", 'AMERICAN CRUSADE by Pete Hegseth', 'MY VANISHING COUNTRY by Bakari Sellers', 'WHITE RAGE by Carol Anderson', 'ME AND WHITE SUPREMACY by Layla F. Saad', 'EDUCATED by Tara Westover', 'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA? by Beverly Tatum', 'THE NEW JIM CROW by Michelle Alexander', 'WAKING UP WHITE by Debby Irving', 'BETWEEN THE WORLD AND ME by Ta-Nehisi Coates', 'BORN A CRIME by Trevor Noah', 'SO YOU WANT TO TALK ABOUT RACE by Ijeoma Oluo', 'ELOQUENT RAGE by Brittney Cooper', 'OUTLIERS by Malcolm Gladwell', 'STAMPED FROM
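
When the number of sets grows, chaining `|` gets long. A sketch of an alternative that unpacks a list of sets into `set().union()` is shown below. The list `weekly_sets` and its titles are placeholders for illustration.

```python
# Placeholder sets standing in for several weeks or categories
weekly_sets = [
    {"BOOK A", "BOOK B"},
    {"BOOK B", "BOOK C"},
    {"BOOK C", "BOOK D"},
]

# union() accepts any number of iterables; * unpacks the list into arguments
all_books = set().union(*weekly_sets)

print(sorted(all_books))  # ['BOOK A', 'BOOK B', 'BOOK C', 'BOOK D']
```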