### API key Generation
### URL: https://developer.nytimes.com/get-started
##### Register apps

To register an app:

1. Select My Apps from the user drop-down.
2. Click + New App to create a new app.
3. Enter a name and description for the app in the New App dialog.
4. Click Create.
5. Click the APIs tab.
6. Click the access toggle to enable or disable access to an API product from the app.

##### Access the API keys

1. Select My Apps from the user drop-down.
2. Click the app in the list.
3. View the API key on the App Details tab.
4. Confirm that the status of the API key is Approved.

In [1]:
## importing the required libraries
import requests
import pandas as pd
from pandas import json_normalize  ## flattens nested JSON into a dataframe (pandas >= 1.0; pandas.io.json.json_normalize is deprecated)
import json
from json2xml import json2xml  ## third-party package: pip install json2xml

In [2]:
# Generate an API key as described above and store it in apiKeyDetail.txt in the current folder
with open("./apiKeyDetail.txt", "r") as keyFile:
    apiKey = keyFile.readline().strip()  # strip() drops the trailing newline
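Note that `readline()` on its own keeps the trailing newline, which then becomes part of the `api-key` value in every request. A minimal sketch of a more defensive loader follows; the helper `load_api_key` and the environment variable `NYT_API_KEY` are hypothetical names chosen for this example, not something the NYT API defines.

```python
import os
from pathlib import Path


def load_api_key(path="./apiKeyDetail.txt", env_var="NYT_API_KEY"):
    """Return the NYT API key, preferring an environment variable over the file.

    The env_var name and this lookup order are assumptions made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    key_file = Path(path)
    if not key_file.exists():
        raise FileNotFoundError(f"Put your API key in {key_file} or set {env_var}")
    # strip() removes the trailing newline that readline() would keep
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{key_file} is empty")
    return key
```

Keeping the key out of the notebook this way also keeps it out of anything posted to GitHub.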

### Part I (18 points): Working with HTML, XML, and JSON
Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats. Write Python code, using your packages of choice, to load the information from each of the three sources into separate PANDAS data frames. Are the three data frames identical? Your deliverable is the three source files and the Python code. Post the three source files to GitHub and package your Python code within a Jupyter notebook (along with your code for Part II below) and post it to GitHub as well.

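The assignment asks for each file to be loaded into its own data frame and for the frames to be compared. A minimal sketch of that step, assuming the three hand-made files describe the same books with the same three columns; the inline strings are hypothetical stand-ins for `books.html`, `books.xml`, and `books.json`, and the XML element names `books`/`book` are assumptions. It uses only standard-library parsers plus pandas, because `pd.read_html` and the default parser of `pd.read_xml` need extra packages such as `lxml`.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

import pandas as pd

# Hypothetical stand-ins for the three hand-made files (same two books in each)
HTML_SRC = """<table>
<tr><th>title</th><th>author</th><th>publisher</th></tr>
<tr><td>MOST BLESSED OF THE PATRIARCHS</td><td>Annette Gordon-Reed; Peter S Onuf</td><td>Liveright</td></tr>
<tr><td>#GIRLBOSS</td><td>Sophia Amoruso</td><td>Portfolio/Penguin/Putnam</td></tr>
</table>"""

XML_SRC = """<books>
<book><title>MOST BLESSED OF THE PATRIARCHS</title><author>Annette Gordon-Reed; Peter S Onuf</author><publisher>Liveright</publisher></book>
<book><title>#GIRLBOSS</title><author>Sophia Amoruso</author><publisher>Portfolio/Penguin/Putnam</publisher></book>
</books>"""

JSON_SRC = """[
{"title": "MOST BLESSED OF THE PATRIARCHS", "author": "Annette Gordon-Reed; Peter S Onuf", "publisher": "Liveright"},
{"title": "#GIRLBOSS", "author": "Sophia Amoruso", "publisher": "Portfolio/Penguin/Putnam"}
]"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def html_to_df(text):
    parser = TableParser()
    parser.feed(text)
    header, *body = parser.rows  # the first row holds the column names
    return pd.DataFrame(body, columns=header)


def xml_to_df(text):
    root = ET.fromstring(text)
    # one dict per <book>, keyed by each child's tag name
    return pd.DataFrame([{child.tag: child.text for child in book} for book in root])


def json_to_df(text):
    return pd.DataFrame(json.loads(text))


df_html, df_xml, df_json = html_to_df(HTML_SRC), xml_to_df(XML_SRC), json_to_df(JSON_SRC)
print(df_html.equals(df_xml) and df_xml.equals(df_json))
```

Whether the frames come out identical depends on choices made while writing the files: if, for example, the JSON file stores the authors as a list while the HTML table stores them as one string, `equals` returns `False` even though the books match. Reading the real files only means passing `open(...).read()` to the same functions.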

In [3]:
apiBooks = 'https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json'

In [4]:
# Call the Books API and return the parsed JSON response as a dictionary
def returnJsonFile():
    # passing the key through params lets requests URL-encode it
    dataApi = requests.get(apiBooks, params={'api-key': apiKey})
    dataApi.raise_for_status()  # stop early on a bad key or rate limiting
    return dataApi.json()

# Fetch the response and flatten its 'results' list into a dataframe
def readBookData():
    dataJson = returnJsonFile()
    return json_normalize(dataJson['results'])

## Generic helper: write a string (e.g. the output of to_html or to_json) to fileName.extFile
def write2File(dfWrite, fileName='book', extFile='json'):
    with open(fileName + "." + extFile, "w", encoding="utf-8") as nFile:
        nFile.write(dfWrite)

In [5]:
dfBooks = readBookData()

In [6]:
dfBooks.head()

| | title | description | contributor | author | contributor_note | price | age_group | publisher | isbns | ranks_history | reviews |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | "I GIVE YOU MY BODY ..." | The author of the Outlander novels gives tips ... | by Diana Gabaldon | Diana Gabaldon | | 0.0 | | Dell | [{'isbn10': '0399178570', 'isbn13': '978039917... | [{'primary_isbn10': '0399178570', 'primary_isb... | [{'book_review_link': '', 'first_chapter_link'... |
| 1 | "MOST BLESSED OF THE PATRIARCHS" | A character study that attempts to make sense ... | by Annette Gordon-Reed and Peter S. Onuf | Annette Gordon-Reed and Peter S Onuf | | 0.0 | | Liveright | [{'isbn10': '0871404427', 'isbn13': '978087140... | [{'primary_isbn10': '0871404427', 'primary_isb... | [{'book_review_link': '', 'first_chapter_link'... |
| 2 | #ASKGARYVEE | The entrepreneur expands on subjects addressed... | by Gary Vaynerchuk | Gary Vaynerchuk | | 0.0 | | HarperCollins | [{'isbn10': '0062273124', 'isbn13': '978006227... | [{'primary_isbn10': '0062273124', 'primary_isb... | [{'book_review_link': '', 'first_chapter_link'... |
| 3 | #GIRLBOSS | An online fashion retailer traces her path to ... | by Sophia Amoruso | Sophia Amoruso | | 0.0 | | Portfolio/Penguin/Putnam | [{'isbn10': '039916927X', 'isbn13': '978039916... | [{'primary_isbn10': '1591847931', 'primary_isb... | [{'book_review_link': '', 'first_chapter_link'... |
| 4 | #IMOMSOHARD | | by Kristin Hensley and Jen Smedley | Kristin Hensley and Jen Smedley | | 0.0 | | HarperOne | [{'isbn10': '006285769X', 'isbn13': '978006285... | [{'primary_isbn10': '006285769X', 'primary_isb... | [{'book_review_link': '', 'first_chapter_link'... |


In [7]:
# test for one particular title
#dfBook2authors = dfBooks[dfBooks['title'] == '"MOST BLESSED OF THE PATRIARCHS"'][['title','author','contributor','publisher']]

In [8]:
# If the author field contains " and ", assume the book has more than one author (na=False skips missing authors)
dfBook2authors = dfBooks[dfBooks['author'].str.contains(' and ', na=False)][['title','author','contributor','publisher']]
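Checking for `' and '` catches two-author strings like the ones below but not lists written as `A, B and C`. A rough sketch of counting names instead, assuming authors are separated by commas and/or `and`; `count_authors` is a hypothetical helper, and it will miscount names that themselves contain a comma (such as a trailing `, Jr.`).

```python
import re

import pandas as pd

# Separators between names: ", and ", " and ", or a plain comma
_SEPARATORS = re.compile(r",\s*and\s+|\s+and\s+|,\s*")


def count_authors(author):
    """Return the number of names in an author string (0 for missing values)."""
    if not isinstance(author, str) or not author.strip():
        return 0
    return len([name for name in _SEPARATORS.split(author) if name.strip()])


# The last two values are hypothetical, to show a three-name list and a missing author
authors = pd.Series([
    "Diana Gabaldon",
    "Annette Gordon-Reed and Peter S Onuf",
    "A. Writer, B. Writer and C. Writer",
    None,
])
print(authors.map(count_authors).tolist())
```

With such a helper, `dfBooks[dfBooks['author'].map(count_authors) > 1]` would select the multi-author books.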

In [9]:
dfBook2authors

| | title | author | contributor | publisher |
|---|---|---|---|---|
| 1 | "MOST BLESSED OF THE PATRIARCHS" | Annette Gordon-Reed and Peter S Onuf | by Annette Gordon-Reed and Peter S. Onuf | Liveright |
| 4 | #IMOMSOHARD | Kristin Hensley and Jen Smedley | by Kristin Hensley and Jen Smedley | HarperOne |
| 5 | #NEVERAGAIN | David Hogg and Lauren Hogg | by David Hogg and Lauren Hogg | Random House |


In [10]:
# keep only the books with more than one author and convert them to an HTML table
htmlFile =dfBook2authors.to_html()

In [11]:
write2File(htmlFile, 'book','html')

In [12]:
# saving the file into html table
#html_file = open("book.html", "w")
#html_file.write(htmlFile)
#html_file.close()

In [13]:
## select the columns to convert to XML and JSON (all books, not only the multi-author ones)
dfBooksAll = dfBooks[['title','author','contributor','publisher']]

### The function below builds an XML string from a pandas dataframe and is attached to `pd.DataFrame` as `to_xml`, so it can be called directly on any dataframe

In [14]:
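from xml.sax.saxutils import escape, quoteattr

def to_xml(df, filename=None, mode='w'):
    # Serialize one row as an <item> with one <field> per column;
    # escape()/quoteattr() keep characters such as &, < and " from breaking the XML
    def row_to_xml(row):
        xml = ['  <item>']
        for col_name, value in row.items():
            xml.append('    <field name={0}>{1}</field>'.format(quoteattr(str(col_name)), escape(str(value))))
        xml.append('  </item>')
        return '\n'.join(xml)
    # a well-formed XML document needs a single root element around all items
    res = '<data>\n' + '\n'.join(df.apply(row_to_xml, axis=1)) + '\n</data>'

    if filename is None:
        return res
    with open(filename, mode, encoding='utf-8') as f:
        f.write(res)

# pandas >= 1.3 ships its own DataFrame.to_xml; this assignment replaces it with the function above
pd.DataFrame.to_xml = to_xml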
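To confirm that the generated string is well-formed, and to load it back into a dataframe as Part I asks, the standard library's `xml.etree.ElementTree` is enough. A minimal sketch, assuming the `<item>`/`<field name=...>` layout produced by `to_xml`, wrapped in a single root element (here `<data>`); the sample string and its `Rock &amp; Roll` title are hypothetical, chosen to show an escaped character coming back unescaped.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample in the <item>/<field name="..."> layout used by to_xml
SAMPLE = """<data>
  <item>
    <field name="title">#IMOMSOHARD</field>
    <field name="publisher">HarperOne</field>
  </item>
  <item>
    <field name="title">Rock &amp; Roll</field>
    <field name="publisher">Liveright</field>
  </item>
</data>"""


def fields_xml_to_df(text):
    """Turn each <item> into a row, using every field's name attribute as the column."""
    root = ET.fromstring(text)  # raises ET.ParseError if the XML is not well-formed
    rows = [{field.get("name"): field.text for field in item.iter("field")}
            for item in root.iter("item")]
    return pd.DataFrame(rows)


df_back = fields_xml_to_df(SAMPLE)
print(df_back)

# Two top-level elements without a root are not well-formed, so parsing fails
try:
    fields_xml_to_df("<item/><item/>")
except ET.ParseError as err:
    print("not well-formed:", err)
```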

In [15]:
# convert all books (dfBooksAll) to XML
xmlFile =dfBooksAll.to_xml()

In [16]:
write2File(xmlFile, 'book','xml')

In [17]:
# equivalent manual way to save the XML file
#xml_file = open("book.xml", "w")
#xml_file.write(xmlFile)
#xml_file.close()

In [18]:
# extracting the data in json format
jsonFile =dfBooksAll.to_json()

In [19]:
write2File(jsonFile, 'book','json')
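For a DataFrame, `to_json()` defaults to `orient='columns'`, which keys each value by column and then by row label; that is the `{"title": {"0": ..., "1": ...}}` shape visible in the `json2xml` output further down. `orient='records'` instead writes one object per row, which is closer to how a hand-written `books.json` usually looks. A small comparison on a hypothetical two-row frame:

```python
import json

import pandas as pd

# Hypothetical two-book frame using two of the columns of dfBooksAll
books = pd.DataFrame({
    "title": ["#ASKGARYVEE", "#GIRLBOSS"],
    "author": ["Gary Vaynerchuk", "Sophia Amoruso"],
})

by_column = books.to_json()                  # default orient='columns'
by_record = books.to_json(orient="records")  # one JSON object per row

print(by_column)
print(by_record)

# The records layout rebuilds the same frame directly
print(pd.DataFrame(json.loads(by_record)).equals(books))
```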

In [20]:
# equivalent manual way to save the JSON file
#json_file = open("book.json", "w")
#json_file.write(jsonFile)
#json_file.close()

In [21]:
## another way to generate XML: the json2xml package
## !pip install json2xml

In [22]:
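# json2xml expects parsed data (a dict), not a JSON string; the recorded output below came from
# passing the raw string, which is why the whole string sits inside <all> instead of being converted
xmlFile2 = json2xml.Json2xml(json.loads(jsonFile)).to_xml()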

In [23]:
xmlFile2

'<all>{"title":{"0":"\\"I GIVE YOU MY BODY ...\\"","1":"\\"MOST BLESSED OF THE PATRIARCHS\\"","2":"#ASKGARYVEE","3":"#GIRLBOSS","4":"#IMOMSOHARD","5":"#NEVERAGAIN","6":"$100 STARTUP","7":"$20 PER GALLON","8":"\'57, Chicago","9":"\'ROCK OF AGES: \'\'ROLLING STONE\'\' HISTORY OF ROCK AND ROLL\'","10":"\'THE HIGH ROAD TO CHINA: GEORGE BOGLE, THE PANCHEN LAMA AND THE FIRST BRITISH EXPEDITION TO TIBET\'","11":"\'TIL DEATH","12":"\'TIL DEATH DO US PART","13":"\'Til Faith Do Us Part: How Interfaith Marriage is Transforming America","14":"\'TIS THE SEASON","15":"------, THAT\'S DELICIOUS","16":"...and the Horse He Rode In On: The People V. Kenneth Starr","17":".HACK G.U.   , VOL. 5","18":"1 Ragged Ridge Road","19":"1,000 PLACES TO SEE BEFORE YOU DIE"},"author":{"0":"Diana Gabaldon","1":"Annette Gordon-Reed and Peter S Onuf","2":"Gary Vaynerchuk","3":"Sophia Amoruso","4":"Kristin Hensley and Jen Smedley","5":"David Hogg and Lauren Hogg","6":"Chris Guillebeau","7":"Christopher Steiner","8":"Stev

### Part II (12 points): Working with Web APIs
The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com. You’ll need to start by signing up for an API key.
Your task is to then choose one of the New York Times APIs and construct an interface in Python to read JSON data accessible via the API and transform that data into a Pandas data frame that is suitable for use in data analysis work.

###### The API key is read from apiKeyDetail.txt (loaded earlier) rather than hard-coded in the notebook, for security

# Reading Data from the Movie Reviews API

In [24]:
apiMovieReview = 'https://api.nytimes.com/svc/movies/v2/reviews/search' ##search.json?query=godfather&api-key=yourkey

##### The function below reads data from the API and returns the parsed JSON
##### It takes a movie name as a parameter; if none is passed, it defaults to "godfather", the example used on the API website
The function sends the movie name as the search query together with the API key, requests the URL, and returns the response as a Python dictionary.

In [25]:
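def readDataJson(searchMovie='godfather'):
    # requests URL-encodes the query and key, so titles with spaces work as well
    dataApi = requests.get(apiMovieReview + '.json',
                           params={'query': searchMovie, 'api-key': apiKey})
    dataApi.raise_for_status()  # fail loudly on a bad key or rate limiting
    return dataApi.json()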
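Building the query by string concatenation sends the search term unencoded, so a term containing a space, `&`, or `#` can produce a broken or different request. A minimal sketch using the standard library's `urllib.parse.urlencode`, which percent-encodes each value; passing a dict to the `params=` argument of `requests.get` applies comparable encoding. `build_review_url` is a hypothetical helper and `DEMO_KEY` a placeholder, not a real key.

```python
from urllib.parse import urlencode

BASE = "https://api.nytimes.com/svc/movies/v2/reviews/search.json"


def build_review_url(query, api_key):
    # urlencode percent-encodes reserved characters and turns spaces into '+'
    return BASE + "?" + urlencode({"query": query, "api-key": api_key})


print(build_review_url("the godfather", "DEMO_KEY"))
print(build_review_url("R&B #1", "DEMO_KEY"))
```

Unencoded, the second term would split at `&` into two parameters and everything after `#` would be treated as a URL fragment.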

In [26]:
# readDataJson already returns the response parsed into a Python dictionary
jsonMovie = readDataJson('Titanic') # example 1: search for "Titanic"; reviews of every matching movie are returned

In [27]:
type(jsonMovie)

dict

In [28]:
# json_normalize flattens the 'results' list of dictionaries into a dataframe (nested fields such as link become link.type, link.url, ...)
dfMovie = json_normalize(jsonMovie['results'])
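`json_normalize` flattens nested objects into dotted column names, which is where the `link.type`, `link.url`, and `link.suggested_link_text` columns in the output below come from. A minimal sketch on a hypothetical, trimmed record shaped like one entry of `jsonMovie['results']`, assuming pandas 1.0 or later (where `json_normalize` is available as `pd.json_normalize`):

```python
import pandas as pd

# Hypothetical, trimmed record shaped like one entry of jsonMovie['results']
results = [{
    "display_title": "Titanic",
    "mpaa_rating": "PG-13",
    "link": {"type": "article", "suggested_link_text": "Read the review"},
}]

flat = pd.json_normalize(results)
print(list(flat.columns))

# sep changes the separator used when joining nested keys
print(list(pd.json_normalize(results, sep="_").columns))
```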

In [29]:
dfMovie.head() # showing 5 records from the data

| | display_title | mpaa_rating | critics_pick | byline | headline | summary_short | publication_date | opening_date | date_updated | multimedia | link.type | link.url | link.suggested_link_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Titanic Town | | 0 | STEPHEN HOLDEN | Titanic Town (Movie) | Belfast housewife turns peace activist. Beauti... | 2000-09-01 | | 2017-11-02 04:16:14 | | article | http://www.nytimes.com/2000/09/01/movies/film-... | Read the New York Times Review of Titanic Town |
| 1 | The Chambermaid on the Titanic | | 1 | Stephen Holden | Chambermaid on the Titanic, The (Movie) | French foundry worker eroticizes his one-night... | 1998-08-14 | 1998-08-14 | 2017-11-02 04:17:55 | | article | http://www.nytimes.com/1998/08/14/movies/film-... | Read the New York Times Review of The Chamberm... |
| 2 | Titanic | PG-13 | 1 | Janet Maslin | Titanic (Movie) | For once, a much-touted event movie is exactly... | 1997-12-19 | 1997-12-19 | 2017-11-02 04:17:54 | | article | http://www.nytimes.com/1997/12/19/movies/film-... | Read the New York Times Review of Titanic |
| 3 | Raise the Titanic | PG | 0 | JANET MASLIN | Raise the Titanic (Movie) | No, don't! Adventureless salvage job. | 1980-08-01 | 1980-08-01 | 2017-11-02 04:17:30 | | article | http://www.nytimes.com/1980/08/01/archives/rai... | Read the New York Times Review of Raise the Ti... |


### Example 2: Reading the Most Popular API

## Three example endpoints
- https://api.nytimes.com/svc/mostpopular/v2/emailed/7.json?api-key=yourkey
- https://api.nytimes.com/svc/mostpopular/v2/shared/1/facebook.json?api-key=yourkey
- https://api.nytimes.com/svc/mostpopular/v2/viewed/1.json?api-key=yourkey

In [30]:
apiMostPopular = 'https://api.nytimes.com/svc/mostpopular/v2/' ## base URL shared by the emailed, shared and viewed endpoints

In [31]:
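def readmostPopularJson(jsonDtl, sharedType='', period='1'):
    # jsonDtl selects the list:
    #   emailed - most emailed articles
    #   shared  - most shared articles (sharedType picks the platform, e.g. 'facebook')
    #   viewed  - most viewed articles
    # period is the look-back window in days; str() accepts both 1 and '1'
    if jsonDtl == 'shared' and sharedType:
        jsonDtl = jsonDtl + '/' + str(period) + '/' + sharedType
    else:
        jsonDtl = jsonDtl + '/' + str(period)
    dataApi = requests.get(apiMostPopular + jsonDtl + '.json', params={'api-key': apiKey})
    dataApi.raise_for_status()
    return json_normalize(dataApi.json()['results'])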
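The three example URLs differ only in their path, so the path logic can be pulled out and checked without calling the API. A minimal sketch; `build_most_popular_url` is hypothetical, and the set of accepted periods (1, 7, or 30 days) is an assumption based on the NYT Most Popular API documentation rather than something this notebook tests. The API key would still be passed separately, for example through `params=`.

```python
BASE = "https://api.nytimes.com/svc/mostpopular/v2/"
KINDS = {"emailed", "shared", "viewed"}
PERIODS = {1, 7, 30}  # assumption: look-back periods (in days) accepted by the API


def build_most_popular_url(kind, period=1, share_type=""):
    """Return the endpoint URL (without the api-key parameter) for one list."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {sorted(KINDS)}")
    if int(period) not in PERIODS:
        raise ValueError(f"period must be one of {sorted(PERIODS)}")
    path = f"{kind}/{int(period)}"
    if kind == "shared" and share_type:
        path += f"/{share_type}"  # e.g. shared/1/facebook
    return BASE + path + ".json"


print(build_most_popular_url("emailed", 7))
print(build_most_popular_url("shared", "1", "facebook"))
```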

In [32]:
## most shared if we pass 1st parameter as shared
dfShared = readmostPopularJson('shared','facebook','1')

In [33]:
dfShared.head()

| | url | adx_keywords | subsection | share_count | count_type | column | eta_id | section | id | asset_id | ... | abstract | published_date | source | updated | des_facet | org_facet | per_facet | geo_facet | media | uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.nytimes.com/2019/11/27/nyregion/hu... | Agriculture and Farming;Hull-O Farms (Durham, ... | | 1 | SHARED-FACEBOOK | | 0 | New York | 100000006832232 | 100000006832232 | ... | The aging owners of a Catskills farm say it “h... | 2019-11-27 | The New York Times | 2019-11-29 16:52:08 | [AGRICULTURE AND FARMING] | [HULL-O FARMS (DURHAM, NY), FAMILY BUSINESS, P... | | [CATSKILLS (NYS AREA)] | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/713ed717-5920-5223-a5fa-c7d43b0b... |
| 1 | https://www.nytimes.com/2019/11/27/opinion/tha... | Pilgrims (Plymouth, Mass);Native Americans;Tha... | | 2 | SHARED-FACEBOOK | | 0 | Opinion | 100000006849251 | 100000006849251 | ... | Before you fill your plate, please remember wh... | 2019-11-27 | The New York Times | 2019-11-29 18:15:19 | [PILGRIMS (PLYMOUTH, MASS), NATIVE AMERICANS, ... | [INDIGENOUS PEOPLE] | | [PLYMOUTH (MASS), UNITED STATES] | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/315b0797-8d77-5170-943a-6ef4adb0... |
| 2 | https://www.nytimes.com/2019/11/29/us/politics... | Presidential Election of 2020;Harris, Kamala D... | politics | 3 | SHARED-FACEBOOK | | 0 | U.S. | 100000006848078 | 100000006848078 | ... | Ms. Harris is the only 2020 Democrat who has f... | 2019-11-29 | The New York Times | 2019-11-30 04:35:46 | [PRESIDENTIAL ELECTION OF 2020] | [PRIMARIES AND CAUCUSES] | [HARRIS, KAMALA D] | | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/70bf6582-a21c-5d54-8207-6db42dcc... |
| 3 | https://www.nytimes.com/2019/11/29/health/new-... | Therapy and Rehabilitation;Elderly;Nursing Hom... | | 4 | SHARED-FACEBOOK | The New Old Age | 0 | Health | 100000006844259 | 100000006844259 | ... | Medicare revamped its reimbursement policy for... | 2019-11-29 | The New York Times | 2019-11-29 10:02:01 | [THERAPY AND REHABILITATION, ELDERLY, NURSING ... | [MEDICARE, ELDER CARE, PHYSICAL THERAPY, HOME ... | | | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/0ce7be61-25e3-5af0-b0dc-900fd2b0... |
| 4 | https://www.nytimes.com/2019/11/28/arts/dance/... | Dancing;Blacks;New York City Ballet;School of ... | dance | 5 | SHARED-FACEBOOK | | 0 | Arts | 100000006841890 | 100000006841890 | ... | This year, for the first time, New York City B... | 2019-11-28 | The New York Times | 2019-11-29 16:32:21 | [DANCING, BLACKS] | [NEW YORK CITY BALLET, SCHOOL OF AMERICAN BALLET] | [NEBRES, CHARLOTTE] | | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/499fed06-7340-5d7f-b3fa-1124ebe4... |

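The `media` column above still holds a list of dictionaries per article. `json_normalize` can expand such a nested list into one row per item with `record_path`, repeating the article-level fields named in `meta`. A minimal sketch on hypothetical, trimmed records; only the `type` and `subtype` keys visible in the output above, plus a caption field (truncated there), are included, and real responses carry more fields.

```python
import pandas as pd

# Hypothetical, trimmed records shaped like the 'results' list of a Most Popular response
results = [
    {"id": 1, "section": "Opinion",
     "media": [{"type": "image", "subtype": "photo", "caption": "first"}]},
    {"id": 2, "section": "Arts",
     "media": [{"type": "image", "subtype": "photo", "caption": "second"},
               {"type": "image", "subtype": "photo", "caption": "third"}]},
]

# One row per media item, with the parent article's id and section repeated
media = pd.json_normalize(results, record_path="media", meta=["id", "section"])
print(media)
```

An article whose `media` list is empty contributes no rows to this frame.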

In [34]:
## most viewed if we pass 1st parameter as viewed
dfViewed = readmostPopularJson('viewed','','1')

In [35]:
dfViewed.head()

| | url | adx_keywords | column | section | byline | type | title | abstract | published_date | source | id | asset_id | views | des_facet | org_facet | per_facet | geo_facet | media | uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.nytimes.com/2019/11/29/us/politics... | Presidential Election of 2020;Harris, Kamala D... | | U.S. | By JONATHAN MARTIN, ASTEAD W. HERNDON and ALEX... | Article | How Kamala Harris’s Campaign Unraveled | Ms. Harris is the only 2020 Democrat who has f... | 2019-11-29 | The New York Times | 100000006848078 | 100000006848078 | 1 | [PRESIDENTIAL ELECTION OF 2020] | [PRIMARIES AND CAUCUSES] | [HARRIS, KAMALA D] | | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/70bf6582-a21c-5d54-8207-6db42dcc... |
| 1 | https://www.nytimes.com/2019/11/27/opinion/tha... | Pilgrims (Plymouth, Mass);Native Americans;Tha... | | Opinion | By CHARLES M. BLOW | Article | The Horrible History of Thanksgiving | Before you fill your plate, please remember wh... | 2019-11-27 | The New York Times | 100000006849251 | 100000006849251 | 2 | [PILGRIMS (PLYMOUTH, MASS), NATIVE AMERICANS, ... | [INDIGENOUS PEOPLE] | | [PLYMOUTH (MASS), UNITED STATES] | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/315b0797-8d77-5170-943a-6ef4adb0... |
| 2 | https://www.nytimes.com/interactive/2019/books... | | | Books | | Interactive | 100 Notable Books of 2019 | The year’s notable fiction, poetry and nonfict... | 2019-11-25 | The New York Times | 100000006844418 | 100000006844418 | 3 | | | | | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://interactive/d999ea9c-6a29-51f3-bcf3-6c1b... |
| 3 | https://www.nytimes.com/2019/11/27/nyregion/hu... | Agriculture and Farming;Hull-O Farms (Durham, ... | | New York | By COREY KILGANNON | Article | After 240 Years and 7 Generations, Forced to S... | The aging owners of a Catskills farm say it “h... | 2019-11-27 | The New York Times | 100000006832232 | 100000006832232 | 4 | [AGRICULTURE AND FARMING] | [HULL-O FARMS (DURHAM, NY), FAMILY BUSINESS, P... | | [CATSKILLS (NYS AREA)] | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/713ed717-5920-5223-a5fa-c7d43b0b... |
| 4 | https://www.nytimes.com/2019/11/29/world/europ... | Bridges and Tunnels;Attacks on Police;Great Br... | | World | By MARK LANDLER and MEGAN SPECIA | Article | Stabbings Around London Bridge Kill 2 in ‘Terr... | The authorities said that several people were ... | 2019-11-29 | The New York Times | 100000006851429 | 100000006851429 | 5 | [BRIDGES AND TUNNELS, ATTACKS ON POLICE] | | | [GREAT BRITAIN, LONDON (ENGLAND)] | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/e9ab3f9d-2b1c-5e8c-9c85-d7beff2c... |


In [36]:
## most emailed if we pass 1st parameter as emailed
dfEmailed = readmostPopularJson('emailed','','1')

In [37]:
dfEmailed.head()

| | url | adx_keywords | subsection | email_count | count_type | column | eta_id | section | id | asset_id | ... | abstract | published_date | source | updated | des_facet | org_facet | per_facet | geo_facet | media | uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.nytimes.com/2019/11/27/opinion/tha... | Pilgrims (Plymouth, Mass);Native Americans;Tha... | | 1 | EMAILED | | 0 | Opinion | 100000006849251 | 100000006849251 | ... | Before you fill your plate, please remember wh... | 2019-11-27 | The New York Times | 2019-11-29 18:15:19 | [PILGRIMS (PLYMOUTH, MASS), NATIVE AMERICANS, ... | [INDIGENOUS PEOPLE] | | [PLYMOUTH (MASS), UNITED STATES] | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/315b0797-8d77-5170-943a-6ef4adb0... |
| 1 | https://www.nytimes.com/2019/11/28/opinion/bra... | Brain;Vagus Nerve;Heart;Barrett, Lisa Feldman ... | | 2 | EMAILED | | 0 | Opinion | 100000006849074 | 100000006849074 | ... | You are not just thinking with your brain. | 2019-11-28 | The New York Times | 2019-11-29 22:03:48 | [BRAIN] | [VAGUS NERVE, HEART] | [BARRETT, LISA FELDMAN (1963- ), PORGES, STEPH... | | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/3c4c308d-cf86-5598-9455-0348c83c... |
| 2 | https://www.nytimes.com/2019/11/28/opinion/tru... | Thanksgiving Day;Trump, Donald J;Presidential ... | | 3 | EMAILED | | 0 | Opinion | 100000006847913 | 100000006847913 | ... | My brother dishes on Trump, impeachment and th... | 2019-11-28 | The New York Times | 2019-11-29 22:03:48 | [THANKSGIVING DAY, PRESIDENTIAL ELECTION OF 2020] | | [TRUMP, DONALD J] | | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/abe360d5-8eb6-5d2a-8614-700b0d19... |
| 3 | https://www.nytimes.com/2019/11/27/movies/the-... | Movies;The Irishman (Movie);I Heard You Paint ... | | 4 | EMAILED | | 0 | Movies | 100000006831916 | 100000006831916 | ... | The movie hits Netflix on Wednesday. Here’s a ... | 2019-11-27 | The New York Times | 2019-11-29 16:25:03 | [MOVIES, ORGANIZED CRIME, ORGANIZED LABOR] | [NETFLIX INC, INTERNATIONAL BROTHERHOOD OF TEA... | [HOFFA, JAMES R, SHEERAN, FRANK] | | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/eb29ac92-fdfe-54e1-a632-0de564bd... |
| 4 | https://www.nytimes.com/2019/11/27/magazine/63... | Children and Childhood;Television;Documentary ... | | 5 | EMAILED | | 0 | Magazine | 100000006838494 | 100000006838494 | ... | In 1964, with “Seven Up!” Michael Apted stumbl... | 2019-11-27 | The New York Times | 2019-11-29 23:08:54 | [CHILDREN AND CHILDHOOD] | [TELEVISION, DOCUMENTARY FILMS AND PROGRAMS] | [APTED, MICHAEL] | [GREAT BRITAIN] | [{'type': 'image', 'subtype': 'photo', 'captio... | nyt://article/fd549a1d-cba8-5269-bf74-6faa71eb... |
