# STA 220 Assignment 2

Due __February 20__ by __11:59pm__. Submit your work by uploading it to Gradescope through Canvas.

Please rename this file to "H2_Lastname_Firstname_srnr", where srnr are the last four digits of your student ID number, and export it as a PDF file.

The objective of this assignment is to solidify your understanding of Scraping and XML.

Instructions:

1. Provide your solutions in new cells following each exercise description. Create as many new cells as necessary. Use code cells for your Python scripts and Markdown cells for explanatory text or answers to non-coding questions.

2. Prioritize code readability. Just as in writing a book, the clarity of each line matters. Adopt the __one-statement-per-line__ rule. If you have a lengthy code statement, consider breaking it into multiple lines for clarity. Note you can use `'''` to start and end strings in Python that are written over multiple lines.

3. To help understand and maintain code, you should add comments to explain your code. Use the hash symbol (#) to start writing a comment.

4. Submit your final work as a __.pdf__ file on __Gradescope__. To convert your .ipynb file into one of these formats, navigate to "File", select "Download as", and then choose either "PDF via LaTeX" or "HTML". If "PDF via LaTeX" does not work for you, export to "HTML", and then use Chrome to print the .html file into PDF. Gradescope only accepts PDF files.

5. This assignment will be graded on your proficiency in programming. Be sure to demonstrate your abilities and submit your own, correct and readable solutions. 
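
As an illustration of instruction 2, the following minimal sketch shows one way to break a long statement across lines and to write a multi-line string with `'''`. The variable names `example_url` and `example_note` are made up for this illustration.

```python
# One statement per line; a long string is split inside parentheses.
# Adjacent string literals are joined by Python automatically.
example_url = (
    "https://books.toscrape.com/"
    "catalogue/category/books/classics_6/index.html"
)

# A triple-quoted string may span several lines.
example_note = '''This string
covers two lines.'''

# Count the lines of the multi-line string.
line_count = len(example_note.splitlines())
```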

## Setting

We will scrape the website [books.toscrape.com](https://books.toscrape.com) and use an XML parser to get the information. You may also use BeautifulSoup4 instead. The following packages may be useful:

In [41]:
import requests
import lxml.html as lx
import re
import pandas as pd
import time

Furthermore, you may want to declare some variables up front. Feel free to adjust them (in particular the headers):

In [3]:
base_url = 'https://books.toscrape.com/'
headers = {
    'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
}

## Exercise 1 [5 Points]

### 1a) [2 Points]

#### Task

Write a function `get_categories` (no arguments) that returns a dictionary consisting of:
- Keys: the Book categories (such as Travel, History) that can be found on the left side of the page `https://books.toscrape.com/index.html`
- Values: the _relative_ links to the page that lists all books of the category. The relative link should start with 'catalogue/category' and end with 'index.html'
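
The idea of mapping category names to links can be tried offline first. The sketch below parses a small, hand-written HTML fragment; the fragment is hypothetical and only mimics the assumed nesting of the sidebar (a `div` with class `side_categories` that contains nested lists), so the XPath may need adjusting for the real page.

```python
import lxml.html as lx

# Hypothetical fragment imitating the assumed sidebar structure.
sample_html = """
<html><body>
<div class="side_categories">
  <ul>
    <li><a href="catalogue/category/books_1/index.html">Books</a>
      <ul>
        <li><a href="catalogue/category/books/travel_2/index.html">
            Travel
        </a></li>
        <li><a href="catalogue/category/books/mystery_3/index.html">Mystery</a></li>
      </ul>
    </li>
  </ul>
</div>
</body></html>
"""

doc = lx.fromstring(sample_html)

# Select only the links of the inner list, i.e. the categories themselves.
anchors = doc.xpath('//div[contains(@class, "side_categories")]//ul/li/ul/li/a')

sample_categories = {}
for anchor in anchors:
    # text_content() keeps surrounding whitespace, so strip it.
    name = anchor.text_content().strip()
    sample_categories[name] = anchor.get("href")
```

Note that the outer "Books" link is not matched, because the XPath only descends into the inner `ul`.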

#### Solution START

In [None]:
def get_categories():
    #URL of the landing page, whose sidebar lists the categories
    url = "https://books.toscrape.com/index.html"
    #download the page
    result = requests.get(url)
    #raise an exception if the server reports an HTTP error
    result.raise_for_status()
    #parse the whole page; the sidebar itself is selected by the XPath below
    sidebar = lx.fromstring(result.content)

    #Create an empty dictionary called categories
    categories = {}

    # for loop that gets the categories from the sidebar
    for i in sidebar.xpath('//div[contains(@class,"side_categories")]//ul/li/ul/li/a'):
        name = i.text_content().strip()
        href = (i.get("href") or "").strip()

        # make the path relative
        while href.startswith("../"):
            href = href[3:]
        href = href.lstrip("/")
        # make sure they start with 'catalogue/category' and end with 'index.html'
        if href.startswith("catalogue/category/") and href.endswith("index.html"):
            categories[name] = href
    
    return categories

#### Solution END

Please run the following code to get full credit:

In [8]:
categories = get_categories()
len(categories)

50

#### Example

In [9]:
pd.DataFrame.from_dict(categories, orient = 'index').head()

Unnamed: 0,0
Travel,catalogue/category/books/travel_2/index.html
Mystery,catalogue/category/books/mystery_3/index.html
Historical Fiction,catalogue/category/books/historical-fiction_4/...
Sequential Art,catalogue/category/books/sequential-art_5/inde...
Classics,catalogue/category/books/classics_6/index.html


### 1b) [1 Point]

#### Task

Write a function `get_books_from_page` that takes a `url` as argument and returns a list of links to the books found on this page (without following the "next" button). The `url` is the link to a page that you reach by clicking on one of the categories, e.g., [https://books.toscrape.com/catalogue/category/books/classics_6/index.html](https://books.toscrape.com/catalogue/category/books/classics_6/index.html).
The function should return `None` if the page does not contain any books. (See the examples below.)
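
Links on a category page are typically relative to that page. As a hedged sketch, `urllib.parse.urljoin` from the standard library can resolve such a link against the page URL. The `relative_href` below is an assumed form of the links on the site; please check the actual `href` values in the page source.

```python
from urllib.parse import urljoin

# URL of the page on which the link was found.
page_url = "https://books.toscrape.com/catalogue/category/books/art_25/index.html"

# Assumed form of a relative link on that page (three levels up).
relative_href = "../../../wall-and-piece_971/index.html"

# urljoin resolves each "../" against the directory of page_url.
absolute_url = urljoin(page_url, relative_href)
```

Compared with removing `"../"` by string replacement, this approach does not depend on how many levels the link climbs.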

#### Solution START

In [10]:
def get_books_from_page(url: str):
    #request the page; return None if the server answers with an HTTP error (e.g. 404)
    try:
        result = requests.get(url, timeout=30)
        result.raise_for_status()
    except requests.exceptions.HTTPError:
        return None

    #Get the links
    links = lx.fromstring(result.content)
    hrefs = links.xpath('//article[contains(@class,"product_pod")]//h3/a/@href')
    if not hrefs:
        return None
    #define the base url
    base = "https://books.toscrape.com/catalogue/"
    #Make the empty list
    list_of_links = []

    for href in hrefs:
        #Create the full url and add it to the list
        clean = href.replace("../", "")
        full_url = base + clean
        list_of_links.append(full_url)

    return list_of_links

#### Solution END

Please run the following code to get full credit:

In [11]:
get_books_from_page('https://books.toscrape.com/catalogue/category/books/art_25/index.html')

['https://books.toscrape.com/catalogue/wall-and-piece_971/index.html',
 'https://books.toscrape.com/catalogue/feathers-displays-of-brilliant-plumage_695/index.html',
 'https://books.toscrape.com/catalogue/art-and-fear-observations-on-the-perils-and-rewards-of-artmaking_559/index.html',
 'https://books.toscrape.com/catalogue/the-new-drawing-on-the-right-side-of-the-brain_550/index.html',
 'https://books.toscrape.com/catalogue/history-of-beauty_521/index.html',
 'https://books.toscrape.com/catalogue/the-story-of-art_500/index.html',
 'https://books.toscrape.com/catalogue/the-art-book_490/index.html',
 'https://books.toscrape.com/catalogue/ways-of-seeing_94/index.html']

In [12]:
get_books_from_page('https://books.toscrape.com/catalogue/category/books/art_25/page-2.html') is None

True

#### Example

In [13]:
get_books_from_page('https://books.toscrape.com/catalogue/category/books/classics_6/index.html')[:5]

['https://books.toscrape.com/catalogue/the-secret-garden_413/index.html',
 'https://books.toscrape.com/catalogue/the-metamorphosis_409/index.html',
 'https://books.toscrape.com/catalogue/the-pilgrims-progress_353/index.html',
 'https://books.toscrape.com/catalogue/the-hound-of-the-baskervilles-sherlock-holmes-5_348/index.html',
 'https://books.toscrape.com/catalogue/little-women-little-women-1_331/index.html']

In [14]:
get_books_from_page('https://books.toscrape.com/catalogue/category/books/classics_6/page-3.html') is None

True

### 1c) [2 Points]

#### Task

Write a function `get_all_books_of_category` that takes a string `category` as argument and does the following:
- Use the dictionary `categories` from 1a) to look up the link to the first page of the category (ending with `index.html`).
- Call the function `get_books_from_page` for the first page of the category and store the result as a list `book_list`.
- Loop `i` from 2 to 10:
  - Call the function `get_books_from_page` for the i-th page of the category (ending with `page-i.html`) and add the result to `book_list`.
  - Stop the loop if the function returns `None`. In particular, the loop shall not try to access page $i$ if page $i-1$ already returned `None`.

Afterwards, the function shall return the list `book_list` that contains the URLs of all books of this category. (See examples below.)
You may print a statement reporting how many pages were found for the category.

Note that the return must be a list whose elements are urls (strings). In particular, it must not be a list of lists!

Remark: The difference between 1c) and 1b) is that here we want to get all books of one category while for 1b) we had to get all books that were listed on one of the pages of a category. Thus, 1c) is more or less applying 1b) to all pages of a category.
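
The loop logic described above can be checked without any network access. The sketch below uses a hypothetical `fake_fetch` function and a made-up two-page `fake_site` in place of `get_books_from_page`; it only illustrates the early stop and the use of `extend` to keep the result flat.

```python
def collect_pages(fetch_page, last_page=10):
    """Gather items from pages 1..last_page, stopping at the first None."""
    items = fetch_page(1)
    if items is None:
        return []
    items = list(items)
    for i in range(2, last_page + 1):
        page_items = fetch_page(i)
        # Stop immediately; later pages are never requested.
        if page_items is None:
            break
        # extend() adds the elements themselves, so no list of lists arises.
        items.extend(page_items)
    return items


# Hypothetical site with two pages; any other page number yields None.
fake_site = {1: ["a", "b"], 2: ["c"]}
requested = []


def fake_fetch(i):
    requested.append(i)
    return fake_site.get(i)


result = collect_pages(fake_fetch)
```

Here `requested` records which pages were asked for, which makes it easy to confirm that the loop stops after the first missing page.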

#### Solution START

In [15]:
def get_all_books_of_category(category: str):
    #Use the dictionary `categories` from 1a) to look up the link to the first page of the category
    categories = get_categories() 
    if category not in categories:
        raise KeyError(f"Unknown category: {category}")

    #Create the links
    base_url = "https://books.toscrape.com/"
    relative_link = categories[category]                    
    full_url = base_url + relative_link

    #Call the function `get_books_from_page` for the first page of the category and store the result as a list `book_list`.
    book_list = get_books_from_page(full_url)         
    if book_list is None:
        print(f"Found 0 pages for category '{category}'.")
        return []
    #Page 1 was found, so start the page count at 1
    page_number = 1
    #Build a URL template for the follow-up pages (page-2.html, page-3.html, ...)
    page_url_template = full_url.replace("index.html", "page-{}.html")
    #Loop through pages 2-10
    for i in range(2, 11):
        page_url = page_url_template.format(i)
        page_books = get_books_from_page(page_url)
        #stop at the first missing page
        if page_books is None:
            break
        #otherwise add the books of this page and count the page
        book_list.extend(page_books) 
        page_number += 1
    #print the number of pages found for the category
    print(f"Found {page_number} page(s) for category '{category}'.")
    return book_list


#### Solution END

Please run the following code to get full credit:

In [16]:
get_all_books_of_category('Art')

Found 1 page(s) for category 'Art'.


['https://books.toscrape.com/catalogue/wall-and-piece_971/index.html',
 'https://books.toscrape.com/catalogue/feathers-displays-of-brilliant-plumage_695/index.html',
 'https://books.toscrape.com/catalogue/art-and-fear-observations-on-the-perils-and-rewards-of-artmaking_559/index.html',
 'https://books.toscrape.com/catalogue/the-new-drawing-on-the-right-side-of-the-brain_550/index.html',
 'https://books.toscrape.com/catalogue/history-of-beauty_521/index.html',
 'https://books.toscrape.com/catalogue/the-story-of-art_500/index.html',
 'https://books.toscrape.com/catalogue/the-art-book_490/index.html',
 'https://books.toscrape.com/catalogue/ways-of-seeing_94/index.html']

#### Example

In [17]:
mystery = get_all_books_of_category('Mystery')

Found 2 page(s) for category 'Mystery'.


In [18]:
mystery[:5]

['https://books.toscrape.com/catalogue/sharp-objects_997/index.html',
 'https://books.toscrape.com/catalogue/in-a-dark-dark-wood_963/index.html',
 'https://books.toscrape.com/catalogue/the-past-never-ends_942/index.html',
 'https://books.toscrape.com/catalogue/a-murder-in-time_877/index.html',
 'https://books.toscrape.com/catalogue/the-murder-of-roger-ackroyd-hercule-poirot-4_852/index.html']

## Exercise 2 [5 Points]

Consider the following code snippet:
```python
    response = requests.get(url, headers = headers)
    response.raise_for_status()
    response.encoding = "utf-8"
    html = lx.fromstring(response.text)
```
where url is the url (as a string) to the page of one book, e.g. `url = 'https://books.toscrape.com/catalogue/unicorn-tracks_951/index.html'`.

The following functions shall take the object `html` as described above as argument.
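
One plausible reason for setting `response.encoding = "utf-8"` explicitly: if a server does not declare a charset for a text response, `requests` may fall back to ISO-8859-1, and characters such as the pound sign are then decoded incorrectly. The sketch below reproduces this effect with plain byte strings; whether the site actually omits the charset is an assumption.

```python
# The price text as it would travel over the network, encoded in UTF-8.
raw_bytes = "£18.78".encode("utf-8")

# Decoding with the wrong charset turns the two-byte pound sign into two characters.
wrong = raw_bytes.decode("iso-8859-1")

# Decoding with UTF-8 restores the original text.
right = raw_bytes.decode("utf-8")
```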

Please run the following code to get full credit:

In [19]:
url = 'https://books.toscrape.com/catalogue/unicorn-tracks_951/index.html'
response = requests.get(url, headers = headers)
response.raise_for_status()
response.encoding = "utf-8"
html = lx.fromstring(response.text)

### 2a) [1 Point]

#### Task

Write a function `get_rating` that gets the `html` (as described above) as argument and returns the number of stars (as integer) the book's rating has. For this, you may use the following dictionary:

In [None]:
stars = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

#### Solution START

In [None]:
def get_rating(html):
    #Given the stars dictionary
    stars = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

    # Select the element that contains the star rating
    rating_elem = html.xpath('//p[contains(@class, "star-rating")]')
    #return None if no rating
    if not rating_elem:
        return None

    # Get all class names and find the one that matches the stars dictionary
    rate = rating_elem[0].get("class").split()

    #for every class name in rate: if it matches a key in the stars dict, return the value of that key
    for i in rate:
        if i in stars:
            return stars[i]

    return None

#### Solution END

Please run the following code to get full credit:

In [24]:
get_rating(html)

3

### 2b) [1 Point]

#### Task

Write a function `get_book_title` that gets the `html` (as described above) as argument and returns the book's title (as string).

#### Solution START

In [25]:
def get_book_title(html):
    #parse out the title
    title = html.xpath('//div[contains(@class, "product_main")]//h1/text()')
    if not title:
        return None
    #return the title as a string
    return title[0]

#### Solution END

Please run the following code to get full credit:

In [26]:
get_book_title(html)

'Unicorn Tracks'

### 2c) [1 Point]

#### Task

Write a function `get_stock` that gets the `html` (as described above) as argument and returns how many books are still available (the stock). Note that the return must be an integer, not a string.
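
A regular expression is one possible way to pull the number out of the availability text. The sketch below assumes the text has the form `In stock (16 available)`, as in the table of the book page; the helper name `parse_stock` is hypothetical.

```python
import re


def parse_stock(availability):
    """Return the number in '(N available)' as int, or None if it is absent."""
    # \d+ captures the digits between '(' and ' available)'.
    match = re.search(r"\((\d+)\s+available\)", availability)
    if match is None:
        return None
    return int(match.group(1))


stock_a = parse_stock("In stock (16 available)")
stock_b = parse_stock("\n    In stock (3 available)\n")
stock_c = parse_stock("Out of stock")
```

Because `re.search` scans the whole string, surrounding whitespace does not need to be stripped beforehand.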

#### Solution START

In [28]:
def get_stock(html):
    #parse out the stock
    text = html.xpath('//p[contains(@class, "instock")]/text()')

    if not text:
        return None

    # Strip whitespace from each piece and join them into one clean string
    availability = " ".join(t.strip() for t in text)

    # Extract the number of available books and return it as an integer
    try:
        number = availability.split("(")[1].split()[0]
        return int(number)
    except (IndexError, ValueError):
        return None

#### Solution END

Please run the following code to get full credit:

In [29]:
get_stock(html)

16

In [30]:
type(get_stock(html))

int

### 2d) [2 Points]

#### Task

The following task is meant to combine the previous work/functions to a meaningful output.

Write a function `get_book_info` that takes a URL (as string) to one of the book pages as argument and does the following:
- Uses the requests module and an xml parser to get the xml-parsed html code of the page.
- Gets the title `title` of the book
- Calculates how many books are available (`stock`)
- Gets the rating `rating` of the book
- Uses `pd.read_html` to read the one table of the book page that contains information like UPC/Tax/Number of reviews and stores the result as a pandas DataFrame called `table`.
- Adds the following entries to the DataFrame: 'Rating': `rating`, 'Title': `title` and 'Stock': `stock`.
- Sets the column consisting of the descriptions (like 'Rating', 'Title', 'UPC', etc) as index of the DataFrame.
- Returns the DataFrame.

For this task, you may use the functions you defined in earlier tasks.
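
The table steps can be rehearsed on a small stand-in DataFrame. The sketch below assumes that the table read from the page has two columns labeled `0` and `1`; the values are invented. It appends the extra rows with `pd.concat`, which is one of several possible ways to add rows.

```python
import pandas as pd

# Stand-in for the table read from the page (values are made up).
table = pd.DataFrame([["UPC", "abc123"], ["Tax", "£0.00"]])

# Extra rows in the same two-column layout.
extra = pd.DataFrame([["Rating", 3], ["Title", "Example Book"], ["Stock", 16]])

# Stack both pieces and renumber the rows.
table = pd.concat([table, extra], ignore_index=True)

# Use the description column (label 0) as index.
table = table.set_index(0)

stock_value = table.loc["Stock", 1]
```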

#### Solution START

In [36]:
def get_book_info(url: str):
    # Uses the requests module and an xml parser to get the xml-parsed html code of the page.
    result = requests.get(url, timeout=30)
    result.raise_for_status()
    result.encoding = "utf-8"
    info = lx.fromstring(result.text)
    #Gets the title `title` of the book
    title = get_book_title(info)
    #Calculates how many books are available (`stock`)
    stock = get_stock(info)
    #Gets the rating `rating` of the book
    rating = get_rating(info)

    # Uses `pd.read_html` to read the one table of the book 
    # page that contains information like UPC/Tax/Number of reviews and 
    # stores the result as a pandas DataFrame called `table`.
    table = pd.read_html(url)[0]   

    # Adds the following entries to the DataFrame: 
    # 'Rating': `rating`, 'Title': `title` and 'Stock': `stock`.
    table.loc[len(table)] = ["Rating", int(rating) if rating is not None else None]
    table.loc[len(table)] = ["Title", title]
    table.loc[len(table)] = ["Stock", int(stock) if stock is not None else None]

    # Sets the column consisting of the descriptions (like 'Rating', 'Title', 'UPC', etc) as index of the DataFrame.
    table = table.set_index([0])
    return table



#### Solution END

Please run the following code to get full credit:

In [37]:
get_book_info("https://books.toscrape.com/catalogue/salt_731/index.html")

0,1
UPC,86cbddb61ea78bb7
Product Type,Books
Price (excl. tax),£46.78
Price (incl. tax),£46.78
Tax,£0.00
Availability,In stock (14 available)
Number of reviews,0
Rating,4
Title,salt.
Stock,14


#### Example

In [38]:
get_book_info("https://books.toscrape.com/catalogue/unicorn-tracks_951/index.html")

0,1
UPC,7ae099f3898e0209
Product Type,Books
Price (excl. tax),£18.78
Price (incl. tax),£18.78
Tax,£0.00
Availability,In stock (16 available)
Number of reviews,0
Rating,3
Title,Unicorn Tracks
Stock,16


## Exercise 3 [5 Points]

### 3a) [1 Point]

#### Task

Write a function `get_all_books` (no arguments) that does the following:
- Gets all categories using a function from Exercise 1 and, for each category `c`, does the following:
  - Applies the function `get_all_books_of_category` to `c`.
  - For each link `l` to a book, applies the function `get_book_info` and adds one more row, ['Category': `c`], to the returned DataFrame.
- Concatenates all such DataFrames into one single DataFrame.

Afterwards, create a DataFrame `df` that is the return of the function `get_all_books`. Consider using the time module if necessary. 
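
How several one-column info tables become one wide DataFrame can be sketched without scraping. The two tables below are invented stand-ins for the return value of `get_book_info` (index named `0`, one column labeled `1`), and `info_to_row` is a hypothetical helper.

```python
import pandas as pd


def info_to_row(info, category):
    """Turn a one-column info table into a dict with an added category."""
    row = info[1].to_dict()
    row["Category"] = category
    return row


labels = pd.Index(["UPC", "Price (incl. tax)"], name=0)

# Invented stand-ins for two scraped book tables.
info_a = pd.DataFrame({1: ["a1", "£10.00"]}, index=labels)
info_b = pd.DataFrame({1: ["b2", "£20.00"]}, index=labels)

rows = [info_to_row(info_a, "Travel"), info_to_row(info_b, "Art")]

# Each dictionary becomes one row; the keys become the columns.
books = pd.DataFrame(rows)
```

Collecting plain dictionaries and building the DataFrame once at the end tends to be faster than growing a DataFrame inside the loop.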

#### Solution START

In [43]:
def get_all_books():
    #Gets all categories using a function from Exercise 1
    categories = get_categories()  
    #List that collects one dictionary per book
    rows = []
    # Apply the function `get_all_books_of_category` to every category `c`
    for c in categories.keys():
        book_links = get_all_books_of_category(c) 

        #For each link to one book `l`, it applies the function `get_book_info`
        for l in book_links:
            #Avoid server issues
            time.sleep(0.5)
            #get the book info
            info_df = get_book_info(l)
            #Skip if no info
            if info_df is None or info_df.empty:
                continue

            #Convert info_df to a dictionary
            values = info_df.iloc[:, 0]
            book_dict = values.to_dict()

            # adds one more line to the returned DataFrame consisting of ['Category': `c`]
            book_dict["Category"] = c

            rows.append(book_dict)

    df = pd.DataFrame(rows)

    return df

# Build df
df = get_all_books()


Found 1 page(s) for category 'Travel'.
Found 2 page(s) for category 'Mystery'.
Found 2 page(s) for category 'Historical Fiction'.
Found 4 page(s) for category 'Sequential Art'.
Found 1 page(s) for category 'Classics'.
Found 1 page(s) for category 'Philosophy'.
Found 2 page(s) for category 'Romance'.
Found 1 page(s) for category 'Womens Fiction'.
Found 4 page(s) for category 'Fiction'.
Found 2 page(s) for category 'Childrens'.
Found 1 page(s) for category 'Religion'.
Found 6 page(s) for category 'Nonfiction'.
Found 1 page(s) for category 'Music'.
Found 8 page(s) for category 'Default'.
Found 1 page(s) for category 'Science Fiction'.
Found 1 page(s) for category 'Sports and Games'.
Found 4 page(s) for category 'Add a comment'.
Found 3 page(s) for category 'Fantasy'.
Found 1 page(s) for category 'New Adult'.
Found 3 page(s) for category 'Young Adult'.
Found 1 page(s) for category 'Science'.
Found 1 page(s) for category 'Poetry'.
Found 1 page(s) for category 'Paranormal'.
Found 1 page(s) f

#### Solution END

In [44]:
df

Unnamed: 0,UPC,Product Type,Price (excl. tax),Price (incl. tax),Tax,Availability,Number of reviews,Rating,Title,Stock,Category
0,a22124811bfa8350,Books,£45.17,£45.17,£0.00,In stock (19 available),0,2,It's Only the Himalayas,19,Travel
1,ce60436f52c5ee68,Books,£49.43,£49.43,£0.00,In stock (15 available),0,4,Full Moon over Noah’s Ark: An Odyssey to Mount...,15,Travel
2,f9705c362f070608,Books,£48.87,£48.87,£0.00,In stock (14 available),0,3,See America: A Celebration of Our National Par...,14,Travel
3,1809259a5a5f1d8d,Books,£36.94,£36.94,£0.00,In stock (8 available),0,2,Vagabonding: An Uncommon Guide to the Art of L...,8,Travel
4,a94350ee74deaa07,Books,£37.33,£37.33,£0.00,In stock (7 available),0,3,Under the Tuscan Sun,7,Travel
...,...,...,...,...,...,...,...,...,...,...,...
995,2b5054a4192e9b06,Books,£52.65,£52.65,£0.00,In stock (14 available),0,4,Why the Right Went Wrong: Conservatism--From G...,14,Politics
996,3968e3fbf4695d7c,Books,£56.86,£56.86,£0.00,In stock (12 available),0,1,Equal Is Unfair: America's Misguided Fight Aga...,12,Politics
997,bb8245f52c7cce8f,Books,£36.58,£36.58,£0.00,In stock (15 available),0,1,Amid the Chaos,15,Cultural
998,88c21fcd38e2486e,Books,£19.19,£19.19,£0.00,In stock (15 available),0,5,Dark Notes,15,Erotica


### 3b) [1 Point]

#### Task

Use the DataFrame `df` to determine how many books the page `books.toscrape.com` has. Add a new column `Price` to the DataFrame that contains the Price (incl. tax) as float (without the currency).
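
As a hedged alternative to removing the currency symbol by hand, the numeric part can be extracted with a regular expression. The sample prices below are made up; the third one imitates a wrongly decoded pound sign to show that the pattern ignores any prefix.

```python
import pandas as pd

# Made-up price strings; the last one has a garbled currency symbol.
prices = pd.Series(["£45.17", "£49.43", "Â£36.94"])

# Capture digits, a dot, and digits; expand=False returns a Series.
numeric = prices.str.extract(r"(\d+\.\d+)", expand=False).astype(float)

price_list = numeric.tolist()
```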

#### Solution START

In [45]:
#Determine the number of books
num_books = len(df)
print(f"Number of books on books.toscrape.com: {num_books}")

#Add the Price column
df["Price"] = (
    df["Price (incl. tax)"]
    #remove the currency
      .str.replace("£", "", regex=False)
      #make it a float
      .astype(float)
)

Number of books on books.toscrape.com: 1000


#### Solution END

Please run the following code to get full credit:

In [46]:
df.head(10)

Unnamed: 0,UPC,Product Type,Price (excl. tax),Price (incl. tax),Tax,Availability,Number of reviews,Rating,Title,Stock,Category,Price
0,a22124811bfa8350,Books,£45.17,£45.17,£0.00,In stock (19 available),0,2,It's Only the Himalayas,19,Travel,45.17
1,ce60436f52c5ee68,Books,£49.43,£49.43,£0.00,In stock (15 available),0,4,Full Moon over Noah’s Ark: An Odyssey to Mount...,15,Travel,49.43
2,f9705c362f070608,Books,£48.87,£48.87,£0.00,In stock (14 available),0,3,See America: A Celebration of Our National Par...,14,Travel,48.87
3,1809259a5a5f1d8d,Books,£36.94,£36.94,£0.00,In stock (8 available),0,2,Vagabonding: An Uncommon Guide to the Art of L...,8,Travel,36.94
4,a94350ee74deaa07,Books,£37.33,£37.33,£0.00,In stock (7 available),0,3,Under the Tuscan Sun,7,Travel,37.33
5,cc1936a9f4e93477,Books,£44.34,£44.34,£0.00,In stock (7 available),0,2,A Summer In Europe,7,Travel,44.34
6,48736df57e7bec9f,Books,£30.54,£30.54,£0.00,In stock (6 available),0,1,The Great Railway Bazaar,6,Travel,30.54
7,9e60929f521fa280,Books,£56.88,£56.88,£0.00,In stock (6 available),0,4,A Year in Provence (Provence #1),6,Travel,56.88
8,366a236aa1ea6f07,Books,£23.21,£23.21,£0.00,In stock (3 available),0,1,The Road to Little Dribbling: Adventures of an...,3,Travel,23.21
9,747cf7fca2ccdbd4,Books,£38.95,£38.95,£0.00,In stock (3 available),0,3,Neither Here nor There: Travels in Europe,3,Travel,38.95


#### Example

In [47]:
df.head(5)

Unnamed: 0,UPC,Product Type,Price (excl. tax),Price (incl. tax),Tax,Availability,Number of reviews,Rating,Title,Stock,Category,Price
0,a22124811bfa8350,Books,£45.17,£45.17,£0.00,In stock (19 available),0,2,It's Only the Himalayas,19,Travel,45.17
1,ce60436f52c5ee68,Books,£49.43,£49.43,£0.00,In stock (15 available),0,4,Full Moon over Noah’s Ark: An Odyssey to Mount...,15,Travel,49.43
2,f9705c362f070608,Books,£48.87,£48.87,£0.00,In stock (14 available),0,3,See America: A Celebration of Our National Par...,14,Travel,48.87
3,1809259a5a5f1d8d,Books,£36.94,£36.94,£0.00,In stock (8 available),0,2,Vagabonding: An Uncommon Guide to the Art of L...,8,Travel,36.94
4,a94350ee74deaa07,Books,£37.33,£37.33,£0.00,In stock (7 available),0,3,Under the Tuscan Sun,7,Travel,37.33


### 3c) [3 Points]

#### Task

Group the DataFrame `df` of the last task by category and do the following:
- Provide a pandas Series `books_per_category` that lists all categories (as keys) and the number of books of this category (as value).
- Provide a DataFrame `avg_price_per_category` that reports the average book price per category.
- Provide another DataFrame `books_to_order` that contains the $20$ most expensive books whose
  1. stock is less than $10$ AND
  2. rating is at least three stars
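
The three requested objects can be prototyped on a toy DataFrame before running them on `df`. The data below are invented, and `nlargest` is used here as one possible way to select the most expensive rows (with 2 instead of 20 rows, to fit the toy data).

```python
import pandas as pd

# Invented toy data with the same column names as in the task.
toy = pd.DataFrame({
    "Category": ["Art", "Art", "Travel", "Travel"],
    "Price": [10.0, 30.0, 50.0, 20.0],
    "Stock": [3, 12, 5, 9],
    "Rating": [4, 5, 2, 3],
})

# Number of books per category (a Series indexed by category).
counts = toy.groupby("Category").size()

# Average price per category (a DataFrame with two columns).
means = toy.groupby("Category", as_index=False)["Price"].mean()

# Both conditions must hold, hence the element-wise "&".
mask = (toy["Stock"] < 10) & (toy["Rating"] >= 3)

# The most expensive books among the filtered rows.
top = toy[mask].nlargest(2, "Price")
```

The parentheses around each condition are required, because `&` binds more tightly than the comparison operators.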

#### Solution START

In [65]:
# Make sure the needed columns are numeric
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")
df["Stock"] = pd.to_numeric(df["Stock"], errors="coerce")
df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")


# Provide a pandas Series `books_per_category` 
# that lists all categories (as keys) and the number of 
# books of this category (as value).
books_per_category = df.groupby("Category").size()
books_per_category.name = "UPC"

# Provide a DataFrame `avg_price_per_category` that 
# reports the average book price per category.
avg_price_per_category = (
    df.groupby("Category", as_index=False)["Price"]
      .mean()
      .sort_values("Category")
)

#Provide another DataFrame `books_to_order` that contains the 20 most expensive books 
# whose stock is less than 10 AND rating is at least three stars
books_to_order = (
    df.loc[(df["Stock"] < 10) & (df["Rating"] >= 3)]
      .sort_values("Price", ascending=False)
      .head(20)
      .reset_index(drop=True)
)

#### Solution END

Please run the following code to get full credit:

In [66]:
avg_price_per_category

Unnamed: 0,Category,Price
0,Academic,13.12
1,Add a comment,35.796418
2,Adult Fiction,15.36
3,Art,38.52
4,Autobiography,37.053333
5,Biography,33.662
6,Business,32.46
7,Childrens,32.638276
8,Christian,42.496667
9,Christian Fiction,34.385


In [67]:
type(avg_price_per_category)

pandas.DataFrame

In [68]:
books_per_category

Category
Academic                1
Add a comment          67
Adult Fiction           1
Art                     8
Autobiography           9
Biography               5
Business               12
Childrens              29
Christian               3
Christian Fiction       6
Classics               19
Contemporary            3
Crime                   1
Cultural                1
Default               152
Erotica                 1
Fantasy                48
Fiction                65
Food and Drink         30
Health                  4
Historical              2
Historical Fiction     26
History                18
Horror                 17
Humor                  10
Music                  13
Mystery                32
New Adult               6
Nonfiction            110
Novels                  1
Paranormal              1
Parenting               1
Philosophy             11
Poetry                 19
Politics                3
Psychology              7
Religion                7
Romance                35
Sci

In [69]:
books_to_order.shape

(20, 12)

In [70]:
books_to_order

Unnamed: 0,UPC,Product Type,Price (excl. tax),Price (incl. tax),Tax,Availability,Number of reviews,Rating,Title,Stock,Category,Price
0,9cc207168a03470d,Books,£59.99,£59.99,£0.00,In stock (4 available),0,3,The Perfect Play (Play by Play #1),4,Romance,59.99
1,07e6810fd3236bda,Books,£59.98,£59.98,£0.00,In stock (5 available),0,3,Last One Home (New Beginnings #1),5,Fiction,59.98
2,6478ccb4416e6a5d,Books,£59.92,£59.92,£0.00,In stock (6 available),0,5,The Barefoot Contessa Cookbook,6,Food and Drink,59.92
3,9c4d061c1e2fe6bf,Books,£59.71,£59.71,£0.00,In stock (4 available),0,3,The Bone Hunters (Lexy Vaughan & Steven Macaul...,4,Thriller,59.71
4,60376aa71be66083,Books,£59.45,£59.45,£0.00,In stock (6 available),0,4,The Man Who Mistook His Wife for a Hat and Oth...,6,Nonfiction,59.45
5,c53d9fefcda371e9,Books,£59.04,£59.04,£0.00,In stock (3 available),0,5,Life Without a Recipe,3,Autobiography,59.04
6,6e712ea24e77bd96,Books,£58.99,£58.99,£0.00,In stock (1 available),0,3,Listen to Me (Fusion #1),1,Romance,58.99
7,4fd0a2a350f016e6,Books,£58.87,£58.87,£0.00,In stock (9 available),0,4,Unlimited Intuition Now,9,Default,58.87
8,612369a5947a012e,Books,£58.81,£58.81,£0.00,In stock (5 available),0,5,Approval Junkie: Adventures in Caring Too Much,5,Autobiography,58.81
9,63e20a0f98218a87,Books,£58.75,£58.75,£0.00,In stock (1 available),0,4,Myriad (Prentor #1),1,Fantasy,58.75


#### Example

In [71]:
avg_price_per_category.head()

Unnamed: 0,Category,Price
0,Academic,13.12
1,Add a comment,35.796418
2,Adult Fiction,15.36
3,Art,38.52
4,Autobiography,37.053333


In [72]:
books_per_category.head()

Category
Academic          1
Add a comment    67
Adult Fiction     1
Art               8
Autobiography     9
Name: UPC, dtype: int64

In [73]:
books_to_order.head()

Unnamed: 0,UPC,Product Type,Price (excl. tax),Price (incl. tax),Tax,Availability,Number of reviews,Rating,Title,Stock,Category,Price
0,9cc207168a03470d,Books,£59.99,£59.99,£0.00,In stock (4 available),0,3,The Perfect Play (Play by Play #1),4,Romance,59.99
1,07e6810fd3236bda,Books,£59.98,£59.98,£0.00,In stock (5 available),0,3,Last One Home (New Beginnings #1),5,Fiction,59.98
2,6478ccb4416e6a5d,Books,£59.92,£59.92,£0.00,In stock (6 available),0,5,The Barefoot Contessa Cookbook,6,Food and Drink,59.92
3,9c4d061c1e2fe6bf,Books,£59.71,£59.71,£0.00,In stock (4 available),0,3,The Bone Hunters (Lexy Vaughan & Steven Macaul...,4,Thriller,59.71
4,60376aa71be66083,Books,£59.45,£59.45,£0.00,In stock (6 available),0,4,The Man Who Mistook His Wife for a Hat and Oth...,6,Nonfiction,59.45
