# Benjamin Freund
# Week 12 Assignment

## Part 1 - Working with HTML, XML, and JSON

### Links to Books on Amazon

The first task in this assignment was to choose three books on a single subject. I chose Bitcoin. Below are the Amazon links to the three books I selected.

- [Bitcoin, Blockchain, and Cryptoassets: A Comprehensive Introduction](https://www.amazon.com/Bitcoin-Blockchain-Cryptoassets-Comprehensive-Introduction/dp/0262539160/ref=asc_df_0262539160/?tag=hyprod-20&linkCode=df0&hvadid=459726176530&hvpos=&hvnetw=g&hvrand=15873989128630039395&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=9067609&hvtargid=pla-942088286373&psc=1)
- [A brief introduction to Bitcoin: Educational, valuable and deep insights](https://www.amazon.com/brief-introduction-Bitcoin-Educational-valuable/dp/1706937768/ref=asc_df_1706937768/?tag=hyprod-20&linkCode=df0&hvadid=459440273404&hvpos=&hvnetw=g&hvrand=15873989128630039395&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=9067609&hvtargid=pla-974191075669&psc=1)
- [Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction](https://www.amazon.com/Bitcoin-Cryptocurrency-Technologies-Comprehensive-Introduction/dp/0691171696/ref=sr_1_1?dchild=1&keywords=bitcoin+goldfeder&qid=1606937977&s=books&sr=1-1)

### HTML

First, I had to create an HTML file that holds the pertinent information about each of the three books listed above. I chose to include the title, the author(s), the book cover type (paperback or hardcover), and the price on Amazon. The HTML file itself can be accessed via GitHub [here](https://raw.githubusercontent.com/freundb3/AIM-5001/master/books.html).

Below, I read the HTML file into a Pandas dataframe.

In [1]:
# Importing the Pandas library
import pandas as pd

# Storing the URL to my HTML file on GitHub
url = 'https://raw.githubusercontent.com/freundb3/AIM-5001/master/books.html'

# Using the read_html function to read url into a Pandas dataframe
books = pd.read_html(url)

# Indexing the first value (the dataframe itself) from books
books_html = books[0]

# Displaying books_html
books_html

| Unnamed: 0_level_0 | Title | Author(s) | Book Cover | Amazon Price |
|---|---|---|---|---|
| Unnamed: 0_level_1 | Bitcoin, Blockchain, and Cryptoassets: A Comprehensive Introduction | Fabian Schar, Aleksander Berentsen | Paperback | $49.88 |
| Unnamed: 0_level_2 | A brief introduction to Bitcoin: Educational, valuable and deep insights | Cosmin Novac | Paperback | $9.99 |
| Unnamed: 0_level_3 | Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction | Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller, Steven Goldfeder | Hardcover | $27.18 |
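
To make the mapping from an HTML table to tabular records more concrete, here is a minimal sketch that uses only Python's standard library instead of `pd.read_html`. The markup in `SAMPLE_HTML` is hypothetical (the real books.html may be structured differently), and `TableParser` is an illustrative helper, not part of Pandas. The cell values are taken from the output above.

```python
from html.parser import HTMLParser

# Hypothetical markup: an assumption about how such a table could be written.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Book Cover</th><th>Amazon Price</th></tr>
  <tr><td>A brief introduction to Bitcoin: Educational, valuable and deep insights</td><td>Paperback</td><td>$9.99</td></tr>
  <tr><td>Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction</td><td>Hardcover</td><td>$27.18</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Every table row starts a new list of cells
            self.rows.append([])
        elif tag in ("th", "td"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        # Text outside of a cell (e.g., whitespace between tags) is ignored
        if self._in_cell:
            self.rows[-1][-1] += data


parser = TableParser()
parser.feed(SAMPLE_HTML)

# Treating the first row as the header and the remaining rows as data
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

A list of dictionaries like `records` is the same kind of input that `pd.DataFrame` accepts, so the header cells would become the column names.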


### XML

Next, I had to create an XML file that holds the same information about the same books. The XML file itself can be accessed via GitHub [here](https://raw.githubusercontent.com/freundb3/AIM-5001/master/books.xml).

Below, I read the XML file into a Pandas dataframe.

In [2]:
# Importing urllib.request
import urllib.request

# Importing objectify from the lxml library
from lxml import objectify

# Downloading the XML file to a temporary local path (the response headers are not needed)
path, _ = urllib.request.urlretrieve('https://raw.githubusercontent.com/freundb3/AIM-5001/master/books.xml')

# Parsing the file; passing the path lets lxml open and close the file itself
parsed = objectify.parse(path)

# Getting the root node of parsed
root = parsed.getroot()

# Creating an empty list to hold one dictionary per book
data = []

# Looping through each book element and mapping its child tag names to their values
for elt in root.book:
    el_data = {}
    for child in elt.iterchildren():
        el_data[child.tag] = child.pyval
    data.append(el_data)
    
# Creating a dataframe of data
books_xml = pd.DataFrame(data)

# Displaying books_xml
books_xml

| | title | authors | book_cover | Amazon_price |
|---|---|---|---|---|
| 0 | Bitcoin, Blockchain, and Cryptoassets: A Compr... | Fabian Schar, Aleksander Berentsen | Paperback | $49.88 |
| 1 | A brief introduction to Bitcoin: Educational, ... | Cosmin Novac | Paperback | $9.99 |
| 2 | Bitcoin and Cryptocurrency Technologies: A Com... | Arvind Narayanan, Joseph Bonneau, Edward Felte... | Hardcover | $27.18 |
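
The same loop over book elements can be sketched with the standard library's `xml.etree.ElementTree` in place of `lxml.objectify`, which makes it runnable offline. This is a swapped-in technique, not the method used above. The sample document is hypothetical: the `<books>` root name is an assumption, while the child tag names are taken from the column names in the output above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the root tag name is an assumption
SAMPLE_XML = """
<books>
  <book>
    <title>A brief introduction to Bitcoin: Educational, valuable and deep insights</title>
    <authors>Cosmin Novac</authors>
    <book_cover>Paperback</book_cover>
    <Amazon_price>$9.99</Amazon_price>
  </book>
  <book>
    <title>Bitcoin, Blockchain, and Cryptoassets: A Comprehensive Introduction</title>
    <authors>Fabian Schar, Aleksander Berentsen</authors>
    <book_cover>Paperback</book_cover>
    <Amazon_price>$49.88</Amazon_price>
  </book>
</books>
"""

# Parsing the string and getting the root element
root = ET.fromstring(SAMPLE_XML.strip())

# Building one dictionary per <book>, keyed by the child tag names
xml_records = [
    {child.tag: child.text for child in book}
    for book in root.findall("book")
]
print(xml_records)
```

Unlike `pyval` in `lxml.objectify`, `child.text` always returns a string, so any numeric fields would need to be converted explicitly.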


### JSON

Finally, I had to create a JSON file that housed the same information about the same books. The JSON file itself can be accessed via GitHub [here](https://raw.githubusercontent.com/freundb3/AIM-5001/master/books.json).

Below, I read the JSON file into a Pandas dataframe.

In [3]:
# Reading the JSON file into a Pandas dataframe
books_json = pd.read_json('https://raw.githubusercontent.com/freundb3/AIM-5001/master/books.json')

# Displaying books_json
books_json

| | Title | Author(s) | Book Cover | Amazon Price |
|---|---|---|---|---|
| 0 | Bitcoin, Blockchain, and Cryptoassets: A Compr... | Fabian Schar, Aleksander Berentsen | Paperback | $49.88 |
| 1 | A brief introduction to Bitcoin: Educational, ... | Cosmin Novac | Paperback | $9.99 |
| 2 | Bitcoin and Cryptocurrency Technologies: A Com... | Arvind Narayanan, Joseph Bonneau, Edward Felte... | Hardcover | $27.18 |
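
As a rough illustration of why the JSON file needs so little code, here is a sketch that parses a records-style JSON string with the standard library's `json` module. The layout of `SAMPLE_JSON` (a list with one object per book) is an assumption, since the actual books.json may be organized differently. The keys and values are taken from the output above.

```python
import json

# Hypothetical layout: one JSON object per book
SAMPLE_JSON = """
[
  {"Title": "A brief introduction to Bitcoin: Educational, valuable and deep insights",
   "Author(s)": "Cosmin Novac",
   "Book Cover": "Paperback",
   "Amazon Price": "$9.99"},
  {"Title": "Bitcoin, Blockchain, and Cryptoassets: A Comprehensive Introduction",
   "Author(s)": "Fabian Schar, Aleksander Berentsen",
   "Book Cover": "Paperback",
   "Amazon Price": "$49.88"}
]
"""

# Parsing the JSON text into a list of dictionaries
json_records = json.loads(SAMPLE_JSON)

# The keys of each object play the role of the column names
columns = list(json_records[0])
print(columns)
```

Because each object already pairs a field name with a value, no explicit loop over elements is needed, in contrast to the XML case.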


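Comparing the three outputs, the dataframe generated from the HTML file is clearly different from the dataframes generated from the XML and JSON files. The HTML-based dataframe lacks the integer row index (0 to 2) that the XML-based and JSON-based dataframes have; its rows are instead labeled "Unnamed: 0_level_1" through "Unnamed: 0_level_3", and its titles and author lists are displayed in full rather than truncated. The XML-based and JSON-based dataframes differ from each other only in their column names: the XML version uses the tag names (title, authors, book_cover, Amazon_price), while the JSON version uses the same headers as the HTML file.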
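
One way to check that the XML and JSON sources differ only in labeling, not in content, is to rename the XML keys and compare the records directly. The sketch below does this in plain Python on a single record copied from the outputs above. `XML_TO_HEADER` and `rename_keys` are hypothetical helpers introduced for illustration. With dataframes, `DataFrame.rename(columns=...)` would play the same role.

```python
# Mapping from the XML tag names to the headers used in the HTML and JSON files
XML_TO_HEADER = {
    "title": "Title",
    "authors": "Author(s)",
    "book_cover": "Book Cover",
    "Amazon_price": "Amazon Price",
}

TITLE = "A brief introduction to Bitcoin: Educational, valuable and deep insights"

# One record as it appears in the XML-based and JSON-based dataframes
xml_rows = [{"title": TITLE, "authors": "Cosmin Novac",
             "book_cover": "Paperback", "Amazon_price": "$9.99"}]
json_rows = [{"Title": TITLE, "Author(s)": "Cosmin Novac",
              "Book Cover": "Paperback", "Amazon Price": "$9.99"}]


def rename_keys(rows, mapping):
    """Returns copies of the rows with their keys renamed according to mapping."""
    return [{mapping.get(key, key): value for key, value in row.items()}
            for row in rows]


# After renaming, the two sources should hold identical records
same_content = rename_keys(xml_rows, XML_TO_HEADER) == json_rows
print(same_content)
```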

## Part 2 - Working with Web APIs

The task in this part of the homework was to take JSON data from one of the New York Times' APIs and store it in a Pandas dataframe that is suitable for analysis. I chose the Books API to look at the current New York Times hardcover fiction bestsellers. Below is the code I ran to perform this task.

In [4]:
# Import the requests library
import requests

# Import json_normalize from the pandas library
from pandas import json_normalize

# Storing the url in api_url
api_url = 'https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=PjMi3WHp1q5ol9ERwqGX54ESDSOfBun1'

# Sending a GET request to the API
resp = requests.get(api_url)

# Raising an error if the request failed (e.g., an invalid API key)
resp.raise_for_status()

# Parsing the JSON response body into a Python dictionary
api_data = resp.json()

# Using json_normalize to take the books key and create a Pandas dataframe with the information within it
books_api_full = json_normalize(api_data['results'], record_path = 'books', errors = 'ignore', record_prefix = '_')

# Choosing which columns to include for analysis in the dataframe
books_api = books_api_full[['_title', '_author', '_description', '_rank', '_rank_last_week', '_weeks_on_list', '_primary_isbn10', '_primary_isbn13']]

# Getting rid of the underscores before the column names
books_api.columns = ['title', 'author', 'description', 'rank', 'rank_last_week', 'weeks_on_list', 'primary_isbn10', 'primary_isbn13']

# Displaying books_api
books_api

| | title | author | description | rank | rank_last_week | weeks_on_list | primary_isbn10 | primary_isbn13 |
|---|---|---|---|---|---|---|---|---|
| 0 | READY PLAYER TWO | Ernest Cline | In a sequel to “Ready Player One,” Wade Watts ... | 1 | 0 | 1 | 1524761338 | 9781524761332 |
| 1 | DEADLY CROSS | James Patterson | The 28th book in the Alex Cross series. An inv... | 2 | 0 | 1 | 0316420255 | 9780316420259 |
| 2 | THE RETURN | Nicholas Sparks | A doctor serving in the Navy in Afghanistan go... | 3 | 5 | 9 | 1538728575 | 9781538728574 |
| 3 | A TIME FOR MERCY | John Grisham | The third book in the Jake Brigance series. A ... | 4 | 3 | 7 | 0385545967 | 9780385545969 |
| 4 | DAYLIGHT | David Baldacci | The F.B.I. agent Atlee Pine’s search for her t... | 5 | 2 | 2 | 1538761696 | 9781538761694 |
| 5 | THE AWAKENING | Nora Roberts | The first book in the Dragon Heart Legacy seri... | 6 | 0 | 1 | 1250272610 | 9781250272614 |
| 6 | THE LAW OF INNOCENCE | Michael Connelly | The sixth book in the Mickey Haller series. Ha... | 7 | 4 | 3 | 0316485624 | 9780316485623 |
| 7 | RHYTHM OF WAR | Brandon Sanderson | The fourth book in the Stormlight Archive seri... | 8 | 1 | 2 | 0765326388 | 9780765326386 |
| 8 | THE SENTINEL | Lee Child and Andrew Child | Jack Reacher intervenes on an ambush in Tennes... | 9 | 7 | 5 | isbn10 mus | 9781984818461 |
| 9 | THE VANISHING HALF | Brit Bennett | The lives of twin sisters who run away from a ... | 10 | 12 | 26 | 0525536299 | 9780525536291 |
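
To show what the extraction step does without calling the API, here is a plain-Python sketch of the same selection on a trimmed sample payload. The nesting (`results`, then `books`) follows the code above, and the two sample records reuse values from the output. The sample itself is a hypothetical cut-down of the real response, and `select_books` is an illustrative helper rather than part of Pandas or the API.

```python
# Trimmed, hypothetical sample that mimics the shape of the API response
sample_payload = {
    "results": {
        "books": [
            {"title": "READY PLAYER TWO", "author": "Ernest Cline", "rank": 1,
             "rank_last_week": 0, "weeks_on_list": 1,
             "primary_isbn13": "9781524761332"},
            {"title": "DEADLY CROSS", "author": "James Patterson", "rank": 2,
             "rank_last_week": 0, "weeks_on_list": 1,
             "primary_isbn13": "9780316420259"},
        ]
    }
}

# Fields to keep for analysis
KEEP = ["title", "author", "rank", "weeks_on_list"]


def select_books(payload, keep):
    """Returns one dictionary per book, restricted to the requested fields."""
    return [{field: book[field] for field in keep}
            for book in payload["results"]["books"]]


selected = select_books(sample_payload, KEEP)
print(selected)
```

Selecting the fields by name in this way skips the prefix-then-rename step, at the cost of not flattening any nested fields the way `json_normalize` does.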
