# Goodreads
Goodreads is the world’s largest site for readers and book recommendations. Our mission is to help people find and share books they love. Goodreads launched in January 2007.

## A Few Things You Can Do On Goodreads

* See what books your friends are reading.
* Track the books you're reading, have read, and want to read.
* Check out your personalized book recommendations. Our recommendation engine analyzes 20 billion data points to give suggestions tailored to your literary tastes.
* Find out if a book is a good fit for you from our community’s reviews.


## Code
Below you can find the code. In this example I build a function that fetches a book's details, such as its rating and description, by title and author.

In [None]:
import requests
import json
import xml.etree.ElementTree as ET
import config
from bs4 import BeautifulSoup

api_key = config.api_key
secret_key = config.secret_key
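
The `config` module imported above is presumably a small local file that keeps the keys out of the notebook. A minimal sketch of what such a `config.py` could look like is shown below; the environment variable names `GOODREADS_API_KEY` and `GOODREADS_SECRET_KEY` are hypothetical and not part of the original project.

```python
# config.py (hypothetical contents; the variable names below are assumptions)
import os

# Read the keys from the environment so they are never committed with the code.
# An empty string is used when a variable is not set.
api_key = os.environ.get('GOODREADS_API_KEY', '')
secret_key = os.environ.get('GOODREADS_SECRET_KEY', '')
```

With a file like this next to the notebook, `import config` exposes `config.api_key` and `config.secret_key` as used above.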

### Test connection
Let us test the connection and check that we receive a response when requesting a specific title.

In [None]:
response = requests.get('http://www.goodreads.com/book/title.xml', params={'key': api_key, 'title': 'The Sixth Man', 'author': 'Andre Iguodala'})

In [None]:
print(response.content.decode('utf-8'))
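
Titles and author names usually contain spaces and sometimes characters such as `&`, which have a special meaning in a query string. The sketch below uses only the standard library to show how the query could be encoded before it is sent; the helper name `build_title_url` is made up for this illustration.

```python
from urllib.parse import urlencode

def build_title_url(title, author, key):
    """Build the book/title.xml URL with a properly encoded query string."""
    # urlencode escapes spaces as '+' and percent-encodes reserved characters.
    query = urlencode({'key': key, 'title': title, 'author': author})
    return 'http://www.goodreads.com/book/title.xml?' + query

print(build_title_url('The Sixth Man', 'Andre Iguodala', 'KEY'))
# http://www.goodreads.com/book/title.xml?key=KEY&title=The+Sixth+Man&author=Andre+Iguodala
```

Passing the same dictionary to `requests.get` through its `params` argument achieves the same encoding without building the URL by hand.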

### Review the response
Looking at the response from Goodreads, we can see that it is XML. I found the ElementTree package especially useful for processing XML responses.
We decode the response body as UTF-8 and parse it with ElementTree to get our root element.
```python
root = ET.fromstring(response.content.decode('utf-8'))
```
Running
```python
list(root)
```
shows that there are two main elements.
```bash
[<Element 'Request' at 0x00000198BE1368B8>,
 <Element 'book' at 0x00000198BE136D18>]
```
The `Request` element contains information about authentication and the method used, as in any response. The `book` element, however, contains everything that we need. I recommend focusing only on this one, so I will overwrite the root element with it.
```python
root = root.find('book')
```
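
The same steps can be tried offline on a small hand-written document. The XML below is only a stand-in shaped like the response described above (the root tag name and the contents of `Request` are assumptions), so no API key is needed to run it.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a Goodreads response (an assumption, not real output).
SAMPLE_XML = """<GoodreadsResponse>
  <Request>
    <authentication>true</authentication>
    <method>book_title</method>
  </Request>
  <book>
    <id>1</id>
    <title>Sample Title</title>
  </book>
</GoodreadsResponse>"""

sample_root = ET.fromstring(SAMPLE_XML)

# Iterating over an element yields its direct children.
child_tags = [child.tag for child in sample_root]
print(child_tags)  # ['Request', 'book']

# Narrow the tree down to the book element, as done with the real response.
sample_book = sample_root.find('book')
print(sample_book.find('title').text)  # Sample Title
```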


In [None]:
root = ET.fromstring(response.content.decode('utf-8'))
root = root.find('book')

### Check the possible tags
Now we need to check which tags are available and what data we can extract.
```python
list(root)
```
We can see the list of nodes:
```bash
[<Element 'id' at 0x000001CA305127C8>,
 <Element 'title' at 0x000001CA30512778>,
 <Element 'isbn' at 0x000001CA305125E8>,
 <Element 'isbn13' at 0x000001CA305126D8>,
 <Element 'asin' at 0x000001CA30512688>,
 <Element 'kindle_asin' at 0x000001CA30512638>,
 <Element 'marketplace_id' at 0x000001CA305124A8>,
 <Element 'country_code' at 0x000001CA30512598>,
 <Element 'image_url' at 0x000001CA30512548>,
 <Element 'small_image_url' at 0x000001CA305124F8>,
 <Element 'publication_year' at 0x000001CA30512368>,
 <Element 'publication_month' at 0x000001CA30512458>,
 <Element 'publication_day' at 0x000001CA30512408>,
 <Element 'publisher' at 0x000001CA305123B8>,
 <Element 'language_code' at 0x000001CA30512228>,
 <Element 'is_ebook' at 0x000001CA30512318>,
 <Element 'description' at 0x000001CA305122C8>,
 <Element 'work' at 0x000001CA30512278>,
 <Element 'average_rating' at 0x000001CA3050BB38>,
 <Element 'num_pages' at 0x000001CA3050BAE8>,
 <Element 'format' at 0x000001CA3050BA98>,
 <Element 'edition_information' at 0x000001CA3050BA48>,
 <Element 'ratings_count' at 0x000001CA3050B9F8>,
 <Element 'text_reviews_count' at 0x000001CA3050B9A8>,
 <Element 'url' at 0x000001CA3050B958>,
 <Element 'link' at 0x000001CA3050B908>,
 <Element 'authors' at 0x000001CA3050B868>,
 <Element 'reviews_widget' at 0x000001CA3050B408>,
 <Element 'popular_shelves' at 0x000001CA3051ECC8>,
 <Element 'book_links' at 0x000001CA3051EF48>,
 <Element 'buy_links' at 0x000001CA30505138>,
 <Element 'series_works' at 0x000001CA305110E8>]
```
To get the value of a node we also need to read its `text` attribute; otherwise we receive only the `Element` object itself.
```python
title = root.find('title').text
```
Some of these nodes are nested. We can reach a nested value by finding the parent tag and then searching inside it, e.g.:
```python
year = root.find('work').find('original_publication_year').text
```
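
Chained `find(...).text` calls raise an `AttributeError` as soon as one of the tags is missing. As a possible alternative, ElementTree's `findtext` accepts a path and a default value. The sketch below runs on a hand-written sample element, which is an assumption and not real Goodreads output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample standing in for a real <book> element (an assumption).
sample_book = ET.fromstring(
    "<book>"
    "<title>Sample Title</title>"
    "<description></description>"
    "<work><original_publication_year>2019</original_publication_year></work>"
    "</book>"
)

# A path such as 'work/original_publication_year' walks into nested elements.
year = sample_book.findtext('work/original_publication_year')

# An element that exists but has no text yields an empty string.
description = sample_book.findtext('description')

# A missing element yields the default instead of raising an error.
isbn = sample_book.findtext('isbn', default='unknown')

print(year, repr(description), isbn)  # 2019 '' unknown
```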

Let us start with extraction of the book rating and its title.

In [None]:
name = root.find('title').text
rating = root.find('average_rating').text
print(name, rating)

After testing it, we can wrap it in a function **get_book_rating** that takes one parameter: the book element.
We can then do the same for the description and the publication year.

In [None]:
def get_book_rating(book):
    title = book.find('title').text
    rating = book.find('average_rating').text
    return title, rating

In [None]:
get_book_rating(root)
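
Note that `.text` always returns a string, so the rating comes back as text such as `'3.87'`. If a number is needed, a variant along these lines could convert it; the function name `get_book_rating_numeric` and the sample element are made up for this sketch.

```python
import xml.etree.ElementTree as ET

def get_book_rating_numeric(book):
    """Like get_book_rating, but return the rating as a float (or None)."""
    title = book.findtext('title')
    rating_text = book.findtext('average_rating')
    try:
        rating = float(rating_text)
    except (TypeError, ValueError):
        # TypeError: the tag is missing; ValueError: the text is not a number.
        rating = None
    return title, rating

# Hand-written sample element (an assumption, not real Goodreads output).
sample_book = ET.fromstring(
    '<book><title>Sample Title</title><average_rating>3.87</average_rating></book>'
)
print(get_book_rating_numeric(sample_book))  # ('Sample Title', 3.87)
```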

In [None]:
def get_book_description(title, author):
    # Pass the query as params so requests URL-encodes the title and author.
    params = {'key': api_key, 'title': title, 'author': author}
    response = requests.get('http://www.goodreads.com/book/title.xml', params=params)
    root = ET.fromstring(response.content.decode('utf-8'))
    # Return the title and description of the first <book> element found.
    for book in root.findall('book'):
        title = book.find('title').text
        description = book.find('description').text
        return title, description

In [None]:
get_book_description('the cambridge mysteries', 'Barbara Cleverly')

In [None]:
def get_book_year(book):
    # Expects the <book> element itself, like get_book_rating.
    title = book.find('title').text
    # The original publication year is nested inside the <work> element.
    year = book.find('work').find('original_publication_year').text
    return title, year

In [None]:
get_book_year(root)

While playing with the above code, we may find that the description contains HTML formatting such as `<b>`, as well as newline characters (`\n`). This is why I use bs4 to extract the plain text from the description. In some cases an error is thrown because the tag is empty, which is why I also wrapped the extraction in try-except.
```python
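    try:
        # Name the parser explicitly so bs4 does not have to guess one.
        soup = BeautifulSoup(response.find('description').text, 'html.parser')
        description = soup.get_text().replace('\n','')
    except Exception:
        # Raised, for example, when the description tag is empty.
        description = None
        print('There was an error while processing the description')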
```
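
For comparison, the same kind of cleanup can be sketched with the standard library alone. This is not the bs4 approach used in this notebook but a swapped-in alternative based on `html.parser`; the names `_TextCollector` and `strip_html` are made up for the illustration.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collect only the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(text):
    """Return text without HTML tags and newlines, or None for empty input."""
    if not text:
        return None
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any text still buffered by the parser
    return ''.join(collector.parts).replace('\n', '')

print(strip_html('<b>Bold</b> start.<br />\nNext line'))  # Bold start.Next line
```

Checking for an empty value up front replaces the try-except in this variant, since an empty description is the expected failure case.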

### Final function
Below you can find the final functions and a sample call.

In [None]:
def get_response(title, author, api_key=api_key):
    """Query Goodreads for a book, based on the provided title and author.
    Return the <book> element of the response as an ElementTree element.
    By default it uses the API key saved under the api_key variable."""
    # Pass the query as params so requests URL-encodes the title and author.
    params = {'key': api_key, 'title': title, 'author': author}
    response = requests.get('http://www.goodreads.com/book/title.xml', params=params)
    root = ET.fromstring(response.content.decode('utf-8'))
    root = root.find('book')

    return root

def get_book_data(response):
    """Fetch the most important data from the book element passed as response.
    Return strings: title, authors, rating, number of ratings, year of original publication,
    number of pages, description and link to the cover image."""
    title = response.find('title').text
    rating = response.find('average_rating').text
    try:
        # The description contains HTML markup; keep only its text.
        soup = BeautifulSoup(response.find('description').text, 'html.parser')
        description = soup.get_text().replace('\n','')
    except Exception:
        # Raised, for example, when the description tag is empty.
        description = None
        print('There was an error while processing the description')
    image_url = response.find('image_url').text
    # Publication year and ratings count are nested inside the <work> element.
    year = response.find('work').find('original_publication_year').text
    ratings_count = response.find('work').find('ratings_count').text
    pages = response.find('num_pages').text
    # Collect the name of every author listed for the book.
    authors = []
    for author in response.find('authors'):
        authors.append(author.find('name').text)
    # Join the names directly so apostrophes in names are preserved.
    str_authors = ', '.join(authors)

    return title, str_authors, rating, ratings_count, year, pages, description, image_url

In [None]:
root = get_response('The Cambridge Mysteries', 'Barbara Cleverly')
get_book_data(root)
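
Because `get_book_data` returns a plain tuple, the caller has to remember the order of the fields. One possible way to make the result easier to work with is to pair the values with field names. The names `FIELDS` and `as_record` and the sample values below are made up for this sketch.

```python
# Field names in the same order as the tuple returned by get_book_data.
FIELDS = ('title', 'authors', 'rating', 'ratings_count',
          'year', 'pages', 'description', 'image_url')

def as_record(book_data):
    """Turn the tuple returned by get_book_data into a dictionary."""
    return dict(zip(FIELDS, book_data))

# Made-up values standing in for a real result.
sample_data = ('Sample Title', 'First Author, Second Author', '3.87', '1200',
               '2019', '320', 'A short description.', 'http://example.com/cover.jpg')
record = as_record(sample_data)
print(record['title'], record['rating'])  # Sample Title 3.87
```

With a real response this would be called as `as_record(get_book_data(root))`.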