### Part I: Working with HTML, XML, and JSON
##### Read a JSON file into a dataframe

In [1]:
#import pandas and json into python
import pandas as pd
import json

In [2]:
#read the json data from the web page into a dataframe
#(named books_json so that it does not shadow the imported json module)
books_json = pd.read_json('https://raw.githubusercontent.com/gegeli638/DAV-5400/master/books.json')
#rearrange the columns
books_json = books_json[['name','author','isbn','publish_date']]
#check the results
books_json

Unnamed: 0,name,author,isbn,publish_date
0,Harry Potter,J.K. Rowling,9788700631625,1997
1,Computer Programming for Kids and Other Beginners,"[Warren Sande, Carter Sande]",9783446422278,2009
2,Python for Kids,Jason.R.Briggs,9781457185533,2009


##### Read an HTML file into a dataframe

In [3]:
#open the web page containing the html data
tables = pd.read_html('https://raw.githubusercontent.com/gegeli638/DAV-5400/master/books.html')
#the first item in the list is a data frame
html = tables[0]
#rearrange the columns
html = html[['name','author','isbn','publish_date']]
#check the results
html

Unnamed: 0,name,author,isbn,publish_date
0,Harry Potter,J.K.Rowling,9788700631625,1997
1,Computer Programming for Kids and Other Beginners,"Warren Sander,Carter Sande",9783446422278,2009
2,Python for Kids,Jason.R.Briggs,9781457185533,2009


##### Read an XML file into a dataframe

In [4]:
#load the urllib.request module so that we can download a file from a web path
import urllib.request
#load the objectify module from the lxml library
from lxml import objectify
#download the web page containing the data set to a local temporary file
path, headers = urllib.request.urlretrieve('https://raw.githubusercontent.com/gegeli638/DAV-5400/master/books.xml')
#objectify.parse() is then used to parse the downloaded file
parsed = objectify.parse(path)
#now get a reference to the root node of the XML file
root = parsed.getroot()

In [5]:
#define an empty list that will be used to store the parsed data
data = []
#iterating over root.book yields each <book> element in the XML data
for elt in root.book:
    el_data = {}
    #store each child element's tag together with its Python value
    for child in elt.iterchildren():
        el_data[child.tag] = child.pyval
    data.append(el_data)

In [6]:
#build a dataframe from the parsed data
xml = pd.DataFrame(data)
#rearrange the columns
xml = xml[['name','author','isbn','publish_date']]
#check the results
xml

Unnamed: 0,name,author,isbn,publish_date
0,Harry Potter,J K. Rowling,9788700631625,1997
1,Computer Programming for Kids and Other Beginners,"Warren Sande, Carter Sande",9783446422278,2009
2,Python for Kids,Jason.R.Briggs,9781457185533,2009

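The element-by-element loop above can also be tried offline. The following is a minimal sketch, not the notebook's own method: it swaps `lxml.objectify` for the standard library's `xml.etree.ElementTree`, and the inline `SAMPLE_XML` string is an assumed layout (a root element holding `<book>` children) filled with two rows from the output above. The helper name `books_to_records` is hypothetical. ElementTree returns every value as a string, whereas `pyval` converts numeric text to numbers.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the real books.xml may use different element names or nesting.
SAMPLE_XML = """
<books>
  <book>
    <name>Harry Potter</name>
    <author>J K. Rowling</author>
    <isbn>9788700631625</isbn>
    <publish_date>1997</publish_date>
  </book>
  <book>
    <name>Python for Kids</name>
    <author>Jason.R.Briggs</author>
    <isbn>9781457185533</isbn>
    <publish_date>2009</publish_date>
  </book>
</books>
"""

def books_to_records(xml_text):
    """Return one dict per <book> element, keyed by child tag name."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for book in root.findall("book"):
        # Each child element becomes one key/value pair (values stay strings).
        records.append({child.tag: child.text for child in book})
    return records

records = books_to_records(SAMPLE_XML)
print(records[0]["name"])
```

Passing `records` to `pd.DataFrame` would give the same kind of table as the cell above, with string values instead of numbers in the `isbn` and `publish_date` columns.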

##### Each of the three sources is now loaded into a dataframe. The three dataframes have the same columns and rows, but the author values differ slightly in formatting.
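A closer look at the outputs above shows that the author values vary in spacing and punctuation, and that the JSON source holds a list for the second book. As a minimal sketch (the helper name `normalize_author` is hypothetical, and dropping everything except letters and digits is an assumed rule, not part of the original notebook), the values can be reduced to a comparable key:

```python
import re

def normalize_author(value):
    """Reduce an author value (string or list of strings) to a comparable key."""
    # A list of names, as in the JSON output, is joined into one string first.
    if isinstance(value, (list, tuple)):
        value = ",".join(value)
    # Keep only lowercase letters and digits so spacing and punctuation are ignored.
    return re.sub(r"[^0-9a-z]", "", value.lower())

# Author values copied from the three outputs above.
json_authors = ["J.K. Rowling", ["Warren Sande", "Carter Sande"], "Jason.R.Briggs"]
html_authors = ["J.K.Rowling", "Warren Sander,Carter Sande", "Jason.R.Briggs"]
xml_authors = ["J K. Rowling", "Warren Sande, Carter Sande", "Jason.R.Briggs"]

for j, h, x in zip(json_authors, html_authors, xml_authors):
    keys = {normalize_author(j), normalize_author(h), normalize_author(x)}
    print(j, "->", "match" if len(keys) == 1 else "differs")
```

With these values, the first and third rows match after normalization, while the second row still differs because the HTML output spells the first author "Warren Sander".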


### Part II: Working with Web APIs


- First, sign up on the New York Times web site and create an account.
- Choose one of the available APIs and get an API key.
- I chose the Books API, and this is my API key: ftk7RSDL8GkRdRBaOAscaoHpgV5nkOYk.
- In 'Try this API', get a URL and send a request to the API.
- Through this API, read the JSON data into Python and then transform it into a dataframe.
- Now we can search for any book we want at the 'Enter the book:' prompt.
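The request step in the list above can be sketched without writing the key into the code. This is a minimal sketch under assumptions: the helper `build_reviews_url` and the environment variable name `NYT_API_KEY` are hypothetical, while the endpoint and the `title` and `api-key` parameters are the ones used in this notebook. `urllib.parse.urlencode` escapes spaces and other special characters in the title.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://api.nytimes.com/svc/books/v3/reviews.json"

def build_reviews_url(title, api_key):
    """Return the reviews URL with the title and key URL-encoded."""
    return BASE_URL + "?" + urlencode({"title": title, "api-key": api_key})

# NYT_API_KEY is a hypothetical variable name; "demo-key" is a placeholder, not a real key.
api_key = os.environ.get("NYT_API_KEY", "demo-key")
url = build_reviews_url("Where the Crawdads Sing", api_key)
print(url)
```

Reading the key from the environment keeps it out of a shared notebook, and the resulting URL can be passed to `requests.get` as usual.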

In [7]:
import requests

book=input('Enter the book:')

#submit our request via the developer.nytimes api
#note the inclusion of the required API key
#passing params lets requests URL-encode titles that contain spaces or symbols
res=requests.get('https://api.nytimes.com/svc/books/v3/reviews.json',
                 params={'title': book, 'api-key': 'ftk7RSDL8GkRdRBaOAscaoHpgV5nkOYk'})

#test if this request works
#print(res)

#read in json data and transform it into a dataframe
book_name = pd.DataFrame(res.json()['results'])
book_name

Enter the book:1q84


Unnamed: 0,book_author,book_title,byline,isbn13,publication_dt,summary,uri,url,uuid
0,Haruki Murakami,1Q84,JANET MASLIN,"[9780307476463, 9780307593313, 9780307957023, ...",2011-11-10,"In “1Q84,” the Japanese novelist Haruki Muraka...",nyt://book/00000000-0000-0000-0000-000000000000,http://www.nytimes.com/2011/11/10/books/1q84-b...,00000000-0000-0000-0000-000000000000
1,Haruki Murakami,1Q84,KATHRYN SCHULZ,"[9780307476463, 9780307593313, 9780307957023, ...",2011-11-06,Haruki Murakami has translated Raymond Chandle...,nyt://book/00000000-0000-0000-0000-000000000000,http://www.nytimes.com/2011/11/06/books/review...,00000000-0000-0000-0000-000000000000


- With this Books API, we can also use another method to get a dataframe.
- Set the list type to Hardcover Fiction, then use the URL and API key to send the request.
- Read the JSON data into Python via this API and transform it into a dataframe.

In [8]:
import requests

#submit our request via the developer.nytimes api
#note the inclusion of the required API key
res=requests.get('https://api.nytimes.com/svc/books/v3/lists.json?list=Hardcover%20Fiction&api-key=ftk7RSDL8GkRdRBaOAscaoHpgV5nkOYk')

#test if this request works
#print(res)

#read in json data and transform it into a dataframe
book_name = pd.DataFrame(res.json()['results'])
book_name

Unnamed: 0,amazon_product_url,asterisk,bestsellers_date,book_details,dagger,display_name,isbns,list_name,published_date,rank,rank_last_week,reviews,weeks_on_list
0,https://www.amazon.com/Minute-Midnight-Atlee-P...,0,2019-11-23,"[{'title': 'A MINUTE TO MIDNIGHT', 'descriptio...",0,Hardcover Fiction,"[{'isbn10': '1538761602', 'isbn13': '978153876...",Hardcover Fiction,2019-12-08,1,0,"[{'book_review_link': '', 'first_chapter_link'...",1
1,https://www.amazon.com/Where-Crawdads-Sing-Del...,0,2019-11-23,"[{'title': 'WHERE THE CRAWDADS SING', 'descrip...",0,Hardcover Fiction,"[{'isbn10': '0735219095', 'isbn13': '978073521...",Hardcover Fiction,2019-12-08,2,4,"[{'book_review_link': '', 'first_chapter_link'...",64
2,https://www.amazon.com/Guardians-Novel-John-Gr...,0,2019-11-23,"[{'title': 'THE GUARDIANS', 'description': 'Cu...",0,Hardcover Fiction,"[{'isbn10': '0385544189', 'isbn13': '978038554...",Hardcover Fiction,2019-12-08,3,2,"[{'book_review_link': '', 'first_chapter_link'...",6
3,https://www.amazon.com/Twisted-Twenty-Six-Step...,0,2019-11-23,"[{'title': 'TWISTED TWENTY-SIX', 'description'...",0,Hardcover Fiction,"[{'isbn10': '0399180192', 'isbn13': '978039918...",Hardcover Fiction,2019-12-08,4,1,"[{'book_review_link': '', 'first_chapter_link'...",2
4,https://www.amazon.com/Blue-Moon-Jack-Reacher-...,0,2019-11-23,"[{'title': 'BLUE MOON', 'description': 'Jack R...",0,Hardcover Fiction,"[{'isbn10': '0399593543', 'isbn13': '978039959...",Hardcover Fiction,2019-12-08,5,3,"[{'book_review_link': '', 'first_chapter_link'...",4
5,https://www.amazon.com/Clancy-Code-Honor-Jack-...,0,2019-11-23,"[{'title': 'TOM CLANCY: CODE OF HONOR', 'descr...",0,Hardcover Fiction,"[{'isbn10': '0525541721', 'isbn13': '978052554...",Hardcover Fiction,2019-12-08,6,0,"[{'book_review_link': '', 'first_chapter_link'...",1
6,https://www.amazon.com/Dutch-House-Novel-Ann-P...,0,2019-11-23,"[{'title': 'THE DUTCH HOUSE', 'description': '...",0,Hardcover Fiction,"[{'isbn10': '0062963678', 'isbn13': '978006296...",Hardcover Fiction,2019-12-08,7,6,[{'book_review_link': 'https://www.nytimes.com...,9
7,https://www.amazon.com/Night-Renée-Ballard-Har...,0,2019-11-23,"[{'title': 'THE NIGHT FIRE', 'description': 'H...",0,Hardcover Fiction,"[{'isbn10': '0316485616', 'isbn13': '978031648...",Hardcover Fiction,2019-12-08,8,5,"[{'book_review_link': '', 'first_chapter_link'...",5
8,https://www.amazon.com/Institute-Novel-Stephen...,0,2019-11-23,"[{'title': 'THE INSTITUTE', 'description': 'Ch...",0,Hardcover Fiction,"[{'isbn10': '1982110562', 'isbn13': '978198211...",Hardcover Fiction,2019-12-08,9,8,[{'book_review_link': 'https://www.nytimes.com...,11
9,https://www.amazon.com/Olive-Again-Novel-Eliza...,0,2019-11-23,"[{'title': 'OLIVE, AGAIN', 'description': 'In ...",0,Hardcover Fiction,"[{'isbn10': '0812996542', 'isbn13': '978081299...",Hardcover Fiction,2019-12-08,10,7,[{'book_review_link': 'https://www.nytimes.com...,6

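In the output above, `book_details` holds a list of dictionaries, so the titles are buried inside a single column. Below is a minimal sketch of lifting them out, using two rows abbreviated from that output. The helper name `flatten_rows` is hypothetical, and only fields that are fully visible in the output are included.

```python
def flatten_rows(results):
    """Return one flat dict per list entry, lifting the title out of book_details."""
    flat = []
    for row in results:
        # book_details is a list; fall back to an empty dict if it is missing or empty.
        details = row.get("book_details") or [{}]
        flat.append({
            "rank": row.get("rank"),
            "title": details[0].get("title"),
            "weeks_on_list": row.get("weeks_on_list"),
        })
    return flat

# Two rows abbreviated from the output above.
sample_results = [
    {"rank": 1, "weeks_on_list": 1, "book_details": [{"title": "A MINUTE TO MIDNIGHT"}]},
    {"rank": 2, "weeks_on_list": 64, "book_details": [{"title": "WHERE THE CRAWDADS SING"}]},
]

for item in flatten_rows(sample_results):
    print(item["rank"], item["title"], item["weeks_on_list"])
```

The same function could be applied to `res.json()['results']` before building the dataframe, assuming every row follows the structure shown in the output.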