# `Python JSON Module & Parsing Example`

# <font color=red>Mr Fugu Data Science</font>

# (◕‿◕✿)

# Purpose & Outcome:

+ `Part 01`: Get an idea of how to use the basics of the `json` module
    + Understand how to open a file: plain `open()` vs. `with open`
    + Deal with JSON files and strings
    + Load and dump files

+ `Part 02`: Parse some news from an API that returns JSON data
    + Connect to a news API for Google News
        + Parse the data
            + A datetime column and 1 nested column to handle

In [None]:
import json
import pandas as pd
from collections import defaultdict
import requests
import datetime

In [None]:

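# the file closes automatically when the with-block ends
with open('nytimes_api.json', 'r') as js:
    k = json.load(js)  # decode the JSON file into Python objects (a list of article dicts)

# print the URL of every article
for i in k:
    print(i['url'])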
    

https://www.nytimes.com/2020/09/03/us/michael-reinoehl-arrest-portland-shooting.html
https://www.nytimes.com/2020/08/28/movies/chadwick-boseman-dead.html
https://www.nytimes.com/2020/08/30/us/portland-trump-rally-shooting.html
https://www.nytimes.com/2020/08/29/health/coronavirus-testing.html
https://www.nytimes.com/2020/08/28/nyregion/nyc-tenants-rnc-video-trump.html
https://www.nytimes.com/2020/09/03/nyregion/daniel-prude-police-rochester.html
https://www.nytimes.com/2020/08/27/us/kyle-rittenhouse-kenosha-shooting-video.html
https://www.nytimes.com/2020/08/28/world/europe/greece-girl-unicorn-rescue.html
https://www.nytimes.com/2020/08/30/nyregion/nyc-suburbs-housing-demand.html
https://www.nytimes.com/2020/09/04/us/after-6-murder-trials-and-nearly-24-years-charges-dropped-against-curtis-flowers.html
https://www.nytimes.com/2020/09/02/health/covid-19-vaccine-cdc-plans.html
https://www.nytimes.com/2020/08/31/world/europe/coronavirus-covid-spain-second-wave.html
https://www.nytimes.com/

# Difference between plain `open()` and `with open`

When you use:

```python
file_to_open = open('some file', 'r')
read_file = file_to_open.read()
# now do some task, type type, have fun...
file_to_open.close()
```

Notice that you have to close the file yourself once you are done with it (calculations, parsing, or whatever). If an error is raised before `close()` runs, the file stays open.

Now, if you use this next method, the file closes automatically when the block ends, even if an error occurs, so you don't have to remember. This is good for most people because, like me, we are lazy or forgetful.

```python
with open('some fun fancy file here', 'r') as myfile:
    file_to_read = myfile.read()
    # then do some stuff with your file here
```

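A quick way to convince yourself: the sketch below writes a throwaway temporary file (created with `tempfile`, purely for the demo) and checks each file object's `closed` attribute. The `with` block closes the file even when an exception is raised inside it, while the manual `open()`/`close()` pattern only gets that guarantee if you add `try`/`finally` yourself.

```python
import os
import tempfile

# make a throwaway file so the demo does not touch any real data
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'w') as writer:
    writer.write('hello')
print(writer.closed)  # True: closed as soon as the with-block ended

try:
    with open(path, 'r') as reader:
        text = reader.read()
        raise ValueError('something went wrong mid-task')
except ValueError:
    pass
print(reader.closed)  # True: closed even though an error was raised inside

# the manual open()/close() version needs try/finally to be just as safe
manual = open(path, 'r')
try:
    manual_text = manual.read()
finally:
    manual.close()

os.remove(path)
```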
# We can specify the mode we want when we open the file:

Think of it like this: **open(*file*, *mode*)**

There are different *modes* to put here:

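`Read`: `r` (the default; fails if the file does not exist)

`Write`: `w` (creates the file, or empties an existing one first)

`Append`: `a` (creates the file if needed; writes go to the end)

`Create`: `x` (exclusive creation; fails if the file already exists)

`Binary`: `b` (bytes instead of text; combine it with another mode)

These are the basics; there are variations, of course:

`Read binary`: `rb`

`Write binary`: `wb`

`Reading & Writing`: `+` (open for updating, e.g. `r+` or `w+`)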

Mix and match them to your heart's content for your needs.

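To see the modes side by side, here is a small sketch that runs inside a temporary folder (the file name `notes.txt` is just a placeholder). It shows `x` refusing to overwrite, `a` keeping existing content, `w` starting over, and `rb` returning bytes instead of a string.

```python
import os
import tempfile

# work inside a temporary folder so nothing real gets overwritten
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'notes.txt')

with open(path, 'x') as f:      # 'x': create; fails if the file already exists
    f.write('first line\n')

exists_error = False
try:
    open(path, 'x')
except FileExistsError:
    exists_error = True         # the second 'x' refuses to clobber the file

with open(path, 'a') as f:      # 'a': keep existing content, write at the end
    f.write('second line\n')

with open(path, 'r') as f:      # 'r': read as text (str)
    lines = f.read().splitlines()
print(lines)  # ['first line', 'second line']

with open(path, 'w') as f:      # 'w': empty the file, then write from scratch
    f.write('replaced\n')

with open(path, 'rb') as f:     # 'rb': read raw bytes instead of str
    raw = f.read()
print(type(raw))  # <class 'bytes'>

# clean up the demo file and folder
os.remove(path)
os.rmdir(workdir)
```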
`----------------------------------`

# json.dump vs json.dump`s`

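The term to know here is `Serialization`: *encoding* an object or data structure into a format that can be stored and used later.

+ `json.dump`: the method without the `'s'`; writes the encoded JSON to a file (any writable file-like object)

Whereas:

+ `json.dumps`: encodes a Python object into a JSON-formatted Python string (the 's' is for string). Only JSON-compatible types (see the conversion table below) work out of the box; anything else, such as a `datetime`, raises a `TypeError`.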

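A minimal sketch of the difference, using a made-up record. `io.StringIO` stands in for a real file opened with `'w'`, so nothing is written to disk. The last part shows what happens with a type JSON does not know, and the standard library's `default=` hook as one way around it.

```python
import datetime
import io
import json

record = {'title': 'Example headline', 'views': 42, 'tags': ['tech', 'news']}

# json.dumps: Python object -> JSON-formatted str
as_text = json.dumps(record)
print(as_text)  # {"title": "Example headline", "views": 42, "tags": ["tech", "news"]}

# json.dump: Python object -> written into a file-like object
buffer = io.StringIO()  # stands in for open('some_file.json', 'w')
json.dump(record, buffer, indent=2)
print(buffer.getvalue())

# not every Python object is JSON-serializable out of the box
try:
    json.dumps({'when': datetime.datetime(2020, 8, 31)})
except TypeError as err:
    error_message = str(err)  # e.g. "Object of type datetime is not JSON serializable"

# default= is called for unknown types; str turns the datetime into text
fixed = json.dumps({'when': datetime.datetime(2020, 8, 31)}, default=str)
print(fixed)  # {"when": "2020-08-31 00:00:00"}
```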
`-------------------------------`

Btw, we all know that this dump is very similar to the funny English term 💩 Muahahah.
Just as funny is the database term shard, which reminds us of ... Yes, I know, computer science can have some humor once in a while.

`Conversion Table`: JSON equivalents of Python objects when encoding/decoding

| JSON   | Python                        |
|--------|-------------------------------|
| object | dict                          |
| array  | list, tuple (decodes as list) |
| string | str                           |
| number | int, float                    |
| true   | True                          |
| false  | False                         |
| null   | None                          |

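The table is not perfectly symmetric. This small round trip on a made-up dictionary shows the Python-to-JSON spellings (`true`, `null`) and two things that do not come back unchanged: a tuple returns as a list, and non-string dictionary keys return as strings.

```python
import json

original = {
    'title': 'Example',
    'scores': (1, 2.5),   # a tuple
    'active': True,
    'missing': None,
}

encoded = json.dumps(original)
print(encoded)
# {"title": "Example", "scores": [1, 2.5], "active": true, "missing": null}

decoded = json.loads(encoded)
print(decoded)
# {'title': 'Example', 'scores': [1, 2.5], 'active': True, 'missing': None}

# the tuple comes back as a list, so this round trip is not exact
print(decoded == original)  # False

# JSON object keys are always strings, so int keys come back as str
int_keys = json.loads(json.dumps({1: 'a'}))
print(int_keys)  # {'1': 'a'}
```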

`-----------------------------------------------`

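# `json.load & json.loads`: the decoding part (deserialization)

+ `json.load`: reads JSON from a file (file-like object) and decodes it into Python objects
+ `json.loads`: decodes JSON that is already in a Python string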



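A minimal sketch of both decoders on the same made-up JSON text (the URL is a placeholder). `io.StringIO` plays the role of the open file, so it behaves like `json.load(open('nytimes_api.json'))` without needing the real file. Malformed input raises `json.JSONDecodeError`, which is a subclass of `ValueError`.

```python
import io
import json

raw_text = '[{"url": "https://example.com/a", "section": "U.S."}]'

# json.loads: parse JSON that is already in a str
from_string = json.loads(raw_text)

# json.load: parse JSON by reading from a file-like object
from_file = json.load(io.StringIO(raw_text))  # stands in for an opened .json file

print(from_string == from_file)  # True
print(from_string[0]['url'])     # https://example.com/a

# malformed input raises json.JSONDecodeError
decode_failed = False
try:
    json.loads('{"url": ')
except json.JSONDecodeError:
    decode_failed = True
```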
# Reading a JSON file into a DataFrame with plain `open()`

In [None]:
ny_times_dta = open('nytimes_api.json')

# decode the JSON data and read it into a DataFrame
pp = pd.DataFrame(json.load(ny_times_dta))

# plain open() does not close the file for us, so close it ourselves
ny_times_dta.close()

# `DataFrame conversion: pandas can write and read JSON too, another way instead of the open/with open method`

In [None]:
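from io import StringIO

ny_df = pd.DataFrame(k)

# convert the DataFrame to a JSON string, one object per row
cc = ny_df.to_json(orient='records')

# read the JSON string back in; newer pandas wants literal JSON wrapped in StringIO
pd.read_json(StringIO(cc)).head()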

| | uri | url | id | asset_id | source | published_date | updated | section | subsection | nytdsection | ... | byline | type | title | abstract | des_facet | org_facet | per_facet | geo_facet | media | eta_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | nyt://article/f0510da8-1ef8-5442-a909-8af53b7d... | https://www.nytimes.com/2020/09/03/us/michael-... | 100000007321101 | 100000007321101 | New York Times | 2020-09-03 | 2020-09-05 10:04:00 | U.S. | | u.s. | ... | By Hallie Golden, Mike Baker and Adam Goldman | Article | Suspect in Fatal Portland Shooting Is Killed b... | Law enforcement agents killed Michael Forest R... | [Murders, Attempted Murders and Homicides, Ant... | [Patriot Prayer] | [Reinoehl, Michael (1972-2020), Danielson, Aar... | [Portland (Ore)] | [{'type': 'image', 'subtype': 'photo', 'captio... | 0 |
| 1 | nyt://article/607123ea-14ba-5f9c-ab43-7d8b6c7a... | https://www.nytimes.com/2020/08/28/movies/chad... | 100000007314593 | 100000007314593 | New York Times | 2020-08-28 | 2020-08-31 10:07:14 | Movies | | movies | ... | By Reggie Ugwu and Michael Levenson | Article | ‘Black Panther’ Star Chadwick Boseman Dies of ... | The actor also played groundbreaking figures l... | [Deaths (Obituaries), Movies, Actors and Actre... | [] | [Boseman, Chadwick] | [] | [{'type': 'image', 'subtype': 'photo', 'captio... | 0 |
| 2 | nyt://article/6bff4972-07cc-5b20-bd16-39f9cf19... | https://www.nytimes.com/2020/08/30/us/portland... | 100000007315198 | 100000007315198 | New York Times | 2020-08-30 | 2020-09-05 10:05:01 | U.S. | | u.s. | ... | By Mike Baker | Article | One Person Dead in Portland After Clashes Betw... | A man affiliated with a right-wing group was s... | [George Floyd Protests (2020), Demonstrations,... | [Patriot Prayer] | [] | [Portland (Ore), Oregon] | [{'type': 'image', 'subtype': 'photo', 'captio... | 0 |
| 3 | nyt://article/0487a919-ec10-5bf5-8f65-449c7a78... | https://www.nytimes.com/2020/08/29/health/coro... | 100000007294406 | 100000007294406 | New York Times | 2020-08-29 | 2020-09-01 21:09:22 | Health | | health | ... | By Apoorva Mandavilli | Article | Your Coronavirus Test Is Positive. Maybe It Sh... | The usual diagnostic tests may simply be too s... | [Coronavirus (2019-nCoV), Tests (Medical), Con... | [Centers for Disease Control and Prevention, F... | [] | [] | [{'type': 'image', 'subtype': 'photo', 'captio... | 0 |
| 4 | nyt://article/7e66f291-6167-5d78-b942-4937278f... | https://www.nytimes.com/2020/08/28/nyregion/ny... | 100000007313944 | 100000007313944 | New York Times | 2020-08-28 | 2020-09-03 11:03:02 | New York | | new york | ... | By Matthew Haag | Article | N.Y.C. Tenants Say They Were Tricked Into Appe... | “I am not a Trump supporter,” one of the tenan... | [Public and Subsidized Housing, Republican Nat... | [Housing Authority (NYC), Housing and Urban De... | [Patton, Lynne M, de Blasio, Bill, Trump, Dona... | [New York City] | [{'type': 'image', 'subtype': 'photo', 'captio... | 0 |

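The `orient` argument controls the shape of the JSON that `to_json` writes. Here is a small sketch on a made-up two-row DataFrame (the column names are placeholders). It compares `'records'`, which matches the list-of-dicts layout of `nytimes_api.json`, with the default `'columns'` layout. When reading the JSON back, pass the same `orient` that was used to write it.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'title': ['A', 'B'], 'views': [1, 2]})

# one JSON object per row: the same shape as a list of article dicts
records = df.to_json(orient='records')
print(records)  # [{"title":"A","views":1},{"title":"B","views":2}]

# default orient='columns': one object per column, keyed by the row index
columns = df.to_json()
print(columns)  # {"title":{"0":"A","1":"B"},"views":{"0":1,"1":2}}

# reading it back with the matching orient
round_trip = pd.read_json(StringIO(records), orient='records')
print(round_trip)
```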

# Part 02: `Google News API`

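This is what I used: https://newsapi.org/s/google-news-api

The parameters you can use are listed here (just scroll down the page to see them). Note that this page covers the `top-headlines` endpoint; the `everything` endpoint used below has its own page in the same docs.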

https://newsapi.org/docs/endpoints/top-headlines


In [None]:

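# API key from newsapi.org (don't share notebooks with your real key in them)
api_k='Put the api key here'

# build the request URL for the 'everything' endpoint
url = ('http://newsapi.org/v2/everything?'
       'q=Technology&'
       'from=2020-08-18&'
       'sortBy=popularity&'
       'pageSize=60&'
       'apiKey='+api_k)

response = requests.get(url)
response.raise_for_status()  # raise an error for 4xx/5xx responses (e.g. a bad API key)

# decode the JSON body into a Python dict
response.json()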

{'status': 'ok',
 'totalResults': 97294,
 'articles': [{'source': {'id': None, 'name': 'Lifehacker.com'},
   'author': 'Joel Cunningham',
   'title': "How to Turn Off Alexa's Creepy 'Whisper Mode'",
   'description': 'I love my smart speaker—as much as one can ever love a piece of privacy-stealing technology that only exists to gather information about you, I suppose—but that doesn’t mean I don’t find many things about it creepy, in a dystopian sort of way. And one of the …',
   'url': 'https://lifehacker.com/how-to-turn-off-alexas-creepy-whisper-mode-1844908094',
   'urlToImage': 'https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/xnejlf6wytytpdfjhlxc.jpg',
   'publishedAt': '2020-08-31T20:30:00Z',
   'content': 'I love my smart speakeras much as one can ever love a piece of privacy-stealing technology that only exists to gather information about you, I supposebut that doesnt mean I dont find many things abou… [+1

`------------------`

`Endpoint`: this decides what kind of results you get back.
+ Options: here I am using `everything`, which searches all articles rather than only the top headlines (those come from the `top-headlines` endpoint). This will be good if we want to do some news analysis later.

**`'http://newsapi.org/v2/everything?'`**

**`q`**: the keyword or phrase to search for

**`'from=2020-08-18&'`**: the earliest publish date to include; how far back you can search is limited on the free plan, but it is good enough for practice anyway

`'sortBy=popularity&'`: sort the results by popularity

`'pageSize=60&'`: how many articles to return per page

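String concatenation works, but it is easy to drop an `&` or leave a space in `q` unescaped. A hedged alternative is to keep the parameters in a dict and let the standard library's `urllib.parse.urlencode` build the query string. With `requests`, passing the same dict as `params=` to `requests.get` does this for you. The names `demo_key`, `demo_params`, and `demo_url` are placeholders for this sketch, and nothing is sent over the network.

```python
from urllib.parse import urlencode

demo_key = 'Put the api key here'   # placeholder, not a real key

demo_params = {
    'q': 'Technology',
    'from': '2020-08-18',
    'sortBy': 'popularity',
    'pageSize': 60,
    'apiKey': demo_key,
}

# urlencode joins key=value pairs with '&' and escapes spaces (as '+') and symbols
demo_url = 'http://newsapi.org/v2/everything?' + urlencode(demo_params)
print(demo_url)
```

Note how the spaces in the placeholder key come out as `+`. Hand-written concatenation would have left them raw.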
In [None]:
# the article records live under the 'articles' key of the response
data = response.json()
p = data['articles']

pd.DataFrame(p).head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | {'id': None, 'name': 'Lifehacker.com'} | Joel Cunningham | How to Turn Off Alexa's Creepy 'Whisper Mode' | I love my smart speaker—as much as one can eve... | https://lifehacker.com/how-to-turn-off-alexas-... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-08-31T20:30:00Z | I love my smart speakeras much as one can ever... |
| 1 | {'id': None, 'name': 'Lifehacker.com'} | Elizabeth Yuko | How to Avoid Getting a Last-Minute Booking Blo... | Airbnb is cracking down on parties. They are n... | https://lifehacker.com/how-to-avoid-getting-a-... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-09-06T14:00:00Z | Airbnb is cracking down on parties. They are n... |
| 2 | {'id': None, 'name': 'Lifehacker.com'} | Beth Skwarecki on Vitals, shared by Beth Skwar... | Tackle a Hill Head-On | Did you find a new trail to run or hike for la... | https://vitals.lifehacker.com/tackle-a-hill-he... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-09-11T16:30:00Z | Did you find a new trail to run or hike for la... |
| 3 | {'id': None, 'name': 'Lifehacker.com'} | David Murphy | How Do I Share a Hard Drive on My Home Network? | I will never fault someone for asking a tech q... | https://lifehacker.com/how-do-i-share-a-hard-d... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-08-21T13:00:00Z | I will never fault someone for asking a tech q... |
| 4 | {'id': 'engadget', 'name': 'Engadget'} | Richard Lawler | Watch Elon Musk's Neuralink reveal live at 6PM ET | Last year Neuralink launched with lofty promis... | https://www.engadget.com/neuralink-bmi-livestr... | https://o.aolcdn.com/images/dims?resize=1200%2... | 2020-08-28T21:48:56Z | Last year Neuralink launched with lofty promis... |


+ `We can read in the data, but there is some formatting we need to take care of`: each `source` value is a nested dictionary, and `publishedAt` is still a plain string

In [None]:

g = defaultdict(list)

for article in p:
    for key, value in article.items():
        # values of 'source' are dictionaries: we only want the 'name'
        if key == 'source':
            g[key].append(value['name'])

        # take everything else as-is
        else:
            g[key].append(value)

pd.DataFrame(g).head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | Lifehacker.com | Joel Cunningham | How to Turn Off Alexa's Creepy 'Whisper Mode' | I love my smart speaker—as much as one can eve... | https://lifehacker.com/how-to-turn-off-alexas-... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-08-31T20:30:00Z | I love my smart speakeras much as one can ever... |
| 1 | Lifehacker.com | Elizabeth Yuko | How to Avoid Getting a Last-Minute Booking Blo... | Airbnb is cracking down on parties. They are n... | https://lifehacker.com/how-to-avoid-getting-a-... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-09-06T14:00:00Z | Airbnb is cracking down on parties. They are n... |
| 2 | Lifehacker.com | Beth Skwarecki on Vitals, shared by Beth Skwar... | Tackle a Hill Head-On | Did you find a new trail to run or hike for la... | https://vitals.lifehacker.com/tackle-a-hill-he... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-09-11T16:30:00Z | Did you find a new trail to run or hike for la... |
| 3 | Lifehacker.com | David Murphy | How Do I Share a Hard Drive on My Home Network? | I will never fault someone for asking a tech q... | https://lifehacker.com/how-do-i-share-a-hard-d... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-08-21T13:00:00Z | I will never fault someone for asking a tech q... |
| 4 | Engadget | Richard Lawler | Watch Elon Musk's Neuralink reveal live at 6PM ET | Last year Neuralink launched with lofty promis... | https://www.engadget.com/neuralink-bmi-livestr... | https://o.aolcdn.com/images/dims?resize=1200%2... | 2020-08-28T21:48:56Z | Last year Neuralink launched with lofty promis... |

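pandas also provides `pd.json_normalize` (a top-level function since pandas 1.0). It flattens nested dictionaries into dotted column names such as `source.name`, which is handy when there are several nested columns. Here is a sketch on two made-up records shaped like the `articles` list above (the author names are invented). It then keeps only the name under the original `source` label, matching the loop's result.

```python
import pandas as pd

# two made-up records shaped like the News API 'articles' list
articles = [
    {'source': {'id': None, 'name': 'Lifehacker.com'}, 'author': 'A. Writer',
     'publishedAt': '2020-08-31T20:30:00Z'},
    {'source': {'id': 'engadget', 'name': 'Engadget'}, 'author': 'B. Writer',
     'publishedAt': '2020-08-28T21:48:56Z'},
]

# json_normalize flattens nested dicts into dotted column names
flat = pd.json_normalize(articles)
print(sorted(flat.columns))
# ['author', 'publishedAt', 'source.id', 'source.name']

# keep just the name, under the original column label
flat = flat.drop(columns='source.id').rename(columns={'source.name': 'source'})
print(flat['source'].tolist())  # ['Lifehacker.com', 'Engadget']
```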

# We can take `publishedAt` and convert it directly from the DF

But it is stored in a strange format: an ISO 8601 string such as `2020-08-31T20:30:00Z` (the trailing `Z` means UTC), not a date.

In [None]:
# evaluate the type: it is a str, NOT a date
# (only the last line of a cell is displayed, so wrap this in print() to see it)
type(pd.DataFrame(g)['publishedAt'][0])

oo = []
for i in pd.DataFrame(g)['publishedAt']:
    # pay attention to the format: the trailing 'Z' (UTC) is matched literally
    oo.append(datetime.datetime.strptime(i, '%Y-%m-%dT%H:%M:%SZ'))

# verify that the column entries are converted
type(pd.DataFrame(oo)[0][0])

pandas._libs.tslibs.timestamps.Timestamp

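A literal `Z` in the `strptime` format parses fine, but it throws away the fact that the time is UTC: the result is a *naive* datetime with no timezone attached. If that matters later (for example, when comparing against local times), here is a sketch of the standard-library alternative. `%z` accepts `Z` as UTC on Python 3.7+. On the pandas side, `pd.to_datetime(news_parse['publishedAt'])` on the original string column should likewise parse the whole column in one call and keep it UTC-aware.

```python
import datetime

stamp = '2020-08-31T20:30:00Z'

# literal 'Z' in the format: parses fine, but the result is naive (no tzinfo)
naive = datetime.datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ')
print(naive, naive.tzinfo)   # 2020-08-31 20:30:00 None

# %z reads 'Z' as UTC (Python 3.7+), giving a timezone-aware datetime
aware = datetime.datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%S%z')
print(aware)                 # 2020-08-31 20:30:00+00:00

print(aware.tzinfo == datetime.timezone.utc)    # True
print(aware.replace(tzinfo=None) == naive)      # True
```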
In [None]:
news_parse = pd.DataFrame(g)

# overwrite the column with our parsed datetimes (same length and row order)
news_parse['publishedAt'] = oo

news_parse.head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | Lifehacker.com | Joel Cunningham | How to Turn Off Alexa's Creepy 'Whisper Mode' | I love my smart speaker—as much as one can eve... | https://lifehacker.com/how-to-turn-off-alexas-... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-08-31 20:30:00 | I love my smart speakeras much as one can ever... |
| 1 | Lifehacker.com | Elizabeth Yuko | How to Avoid Getting a Last-Minute Booking Blo... | Airbnb is cracking down on parties. They are n... | https://lifehacker.com/how-to-avoid-getting-a-... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-09-06 14:00:00 | Airbnb is cracking down on parties. They are n... |
| 2 | Lifehacker.com | Beth Skwarecki on Vitals, shared by Beth Skwar... | Tackle a Hill Head-On | Did you find a new trail to run or hike for la... | https://vitals.lifehacker.com/tackle-a-hill-he... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-09-11 16:30:00 | Did you find a new trail to run or hike for la... |
| 3 | Lifehacker.com | David Murphy | How Do I Share a Hard Drive on My Home Network? | I will never fault someone for asking a tech q... | https://lifehacker.com/how-do-i-share-a-hard-d... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-08-21 13:00:00 | I will never fault someone for asking a tech q... |
| 4 | Engadget | Richard Lawler | Watch Elon Musk's Neuralink reveal live at 6PM ET | Last year Neuralink launched with lofty promis... | https://www.engadget.com/neuralink-bmi-livestr... | https://o.aolcdn.com/images/dims?resize=1200%2... | 2020-08-28 21:48:56 | Last year Neuralink launched with lofty promis... |


# `Or we can fix both columns while iterating over the JSON directly`

In [None]:
b = defaultdict(list)

for i in p:
    for key, value in i.items():  # items() yields (key, value) pairs

        # keep only the source's name
        if key == 'source':
            b[key].append(value['name'])

        # format date
        elif key == 'publishedAt':
            b[key].append(datetime.datetime.strptime(value, '%Y-%m-%dT%H:%M:%SZ'))

        # take everything else as-is
        else:
            b[key].append(value)

pd.DataFrame(b).head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | Lifehacker.com | Joel Cunningham | How to Turn Off Alexa's Creepy 'Whisper Mode' | I love my smart speaker—as much as one can eve... | https://lifehacker.com/how-to-turn-off-alexas-... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-08-31 20:30:00 | I love my smart speakeras much as one can ever... |
| 1 | Lifehacker.com | Elizabeth Yuko | How to Avoid Getting a Last-Minute Booking Blo... | Airbnb is cracking down on parties. They are n... | https://lifehacker.com/how-to-avoid-getting-a-... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-09-06 14:00:00 | Airbnb is cracking down on parties. They are n... |
| 2 | Lifehacker.com | Beth Skwarecki on Vitals, shared by Beth Skwar... | Tackle a Hill Head-On | Did you find a new trail to run or hike for la... | https://vitals.lifehacker.com/tackle-a-hill-he... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-09-11 16:30:00 | Did you find a new trail to run or hike for la... |
| 3 | Lifehacker.com | David Murphy | How Do I Share a Hard Drive on My Home Network? | I will never fault someone for asking a tech q... | https://lifehacker.com/how-do-i-share-a-hard-d... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-08-21 13:00:00 | I will never fault someone for asking a tech q... |
| 4 | Engadget | Richard Lawler | Watch Elon Musk's Neuralink reveal live at 6PM ET | Last year Neuralink launched with lofty promis... | https://www.engadget.com/neuralink-bmi-livestr... | https://o.aolcdn.com/images/dims?resize=1200%2... | 2020-08-28 21:48:56 | Last year Neuralink launched with lofty promis... |

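The `if`/`elif` chain grows with every special column. One hedged alternative is a dict that maps a column name to a converter function. The names `CONVERTERS` and `parse_articles` below are hypothetical, and the sample records are made up in the shape of the `articles` list.

```python
import datetime
from collections import defaultdict

# hypothetical mapping: column name -> function applied to that column's value
CONVERTERS = {
    'source': lambda src: src['name'],
    'publishedAt': lambda s: datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%SZ'),
}

def parse_articles(records):
    """Collect article fields into columns, converting the special ones."""
    columns = defaultdict(list)
    for record in records:
        for key, value in record.items():
            convert = CONVERTERS.get(key)
            columns[key].append(convert(value) if convert else value)
    return dict(columns)

# made-up records shaped like the News API response
sample = [
    {'source': {'id': None, 'name': 'Lifehacker.com'}, 'title': 'First',
     'publishedAt': '2020-08-31T20:30:00Z'},
    {'source': {'id': 'engadget', 'name': 'Engadget'}, 'title': 'Second',
     'publishedAt': '2020-08-28T21:48:56Z'},
]

parsed = parse_articles(sample)
print(parsed['source'])          # ['Lifehacker.com', 'Engadget']
print(parsed['publishedAt'][0])  # 2020-08-31 20:30:00
```

Adding another special column then means adding one entry to the dict. `pd.DataFrame(parse_articles(p))` should give the same table as the loop above.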

`------------------`

# <font color=red>LIKE</font>, Share &

# <font color=red>SUB</font>scribe

# <font size=7>◔̯◔</font>