In [2]:
import json

# open the file in a context manager so it is closed once loading finishes
with open("hn_2014.json") as file:
    hn = json.load(file)

print(type(hn))

<class 'list'>


Let's find out how many objects are in the list, and the type of the first object.

In [3]:
print(len(hn))
print(type(hn[0]))

35806
<class 'dict'>


Our data set contains 35,806 dictionary objects, each representing a Hacker News story. Let's look at the keys of the first dictionary.

In [4]:
hn[0].keys()

dict_keys(['author', 'numComments', 'points', 'url', 'storyText', 'createdAt', 'tags', 'createdAtI', 'title', 'objectId'])

In [5]:
hn[0]["title"]

'Are we getting too Sassy? Weighing up micro-optimisation vs. maintainability'

In [6]:
def del_key(dict_, key):
    # create a copy so we don't
    # modify the original dict
    modified_dict = dict_.copy()
    del modified_dict[key]
    return modified_dict
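
Note that del_key raises a KeyError if the key is missing, and that dict.copy() makes a shallow copy. As an alternative, here is a minimal sketch that builds the new dictionary with a comprehension and silently skips a missing key. The name drop_key and the sample story are hypothetical and not part of the original notebook.

```python
def drop_key(dict_, key):
    # build a new dict without `key`; the original is left untouched,
    # and no KeyError is raised when `key` is absent
    return {k: v for k, v in dict_.items() if k != key}

# hypothetical sample record, shaped like one Hacker News story
story = {"title": "Example story", "points": 3, "createdAtI": 1401350870}
cleaned = drop_key(story, "createdAtI")

print(cleaned)
print("createdAtI" in story)  # the original dict still has the key
```

Whether a missing key should be an error or be ignored depends on how much you trust the input, so treat this as one option rather than a replacement.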

In [11]:
hn_clean = [del_key(d, 'createdAtI') for d in hn]
for i in range(0,3):
    print(hn_clean[i],"\n")

{'author': 'dragongraphics', 'numComments': 0, 'points': 2, 'url': 'http://ashleynolan.co.uk/blog/are-we-getting-too-sassy', 'storyText': '', 'createdAt': '2014-05-29T08:07:50Z', 'tags': ['story', 'author_dragongraphics', 'story_7815238'], 'title': 'Are we getting too Sassy? Weighing up micro-optimisation vs. maintainability', 'objectId': '7815238'} 

{'author': 'jcr', 'numComments': 0, 'points': 1, 'url': 'http://spectrum.ieee.org/automaton/robotics/home-robots/telemba-telepresence-robot', 'storyText': '', 'createdAt': '2014-05-29T08:05:58Z', 'tags': ['story', 'author_jcr', 'story_7815234'], 'title': 'Telemba Turns Your Old Roomba and Tablet Into a Telepresence Robot', 'objectId': '7815234'} 

{'author': 'callum85', 'numComments': 0, 'points': 1, 'url': 'http://online.wsj.com/articles/apple-to-buy-beats-1401308971', 'storyText': '', 'createdAt': '2014-05-29T08:05:06Z', 'tags': ['story', 'author_callum85', 'story_7815230'], 'title': 'Apple Agrees to Buy Beats for $3 Billion', 'objectId

Let's extract the url value from each dictionary in hn_clean and print the first five.

In [22]:
urls = [item["url"] for item in hn_clean]
for i in range(0,5):
    print(urls[i])

http://ashleynolan.co.uk/blog/are-we-getting-too-sassy
http://spectrum.ieee.org/automaton/robotics/home-robots/telemba-telepresence-robot
http://online.wsj.com/articles/apple-to-buy-beats-1401308971
http://alexsblog.org/2014/05/29/dont-wait-for-inspiration/
http://techcrunch.com/2014/05/28/hackerone-get-9m-in-series-a-funding-to-build-bug-tracking-bounty-programs/


Let's count the number of stories that have comments. Note that the "numComments" value in each dictionary of hn_clean is an integer, and it is 0 when a story has no comments. We'll first build the filtered list and print the first ten titles.

In [25]:
has_comments = [d for d in hn_clean if d["numComments"]>0]
for i in range(10):
    print(has_comments[i]["title"])

Five Super Successful Tech Pivots
Gi Bike: The light, full-size, electric, folding bike
For Hire: Dedicated Young Man With Down Syndrome
World War II in the Pacific, narrated by my grandpa
The NSA can't remotely turn on all phones
Chrome Logger
My Response to the Accusation that Nerds are Misogynistic
Microsoft Lays out the Future of Internet Explorer
The Soylent Revolution Will Not Be Pleasurable
Pure 0.5.0 – Get Started with Grids


In [30]:
print(len(has_comments), " total titles with at least one comment")

9279  total titles with at least one comment
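
If only the count is needed, the intermediate list can be skipped. The following is a small sketch using sum() over a generator expression; the three-item stories list is made-up sample data standing in for hn_clean.

```python
# made-up sample data standing in for hn_clean
stories = [
    {"title": "A", "numComments": 0},
    {"title": "B", "numComments": 4},
    {"title": "C", "numComments": 1},
]

# add 1 for every story with at least one comment, without building a list
num_with_comments = sum(1 for d in stories if d["numComments"] > 0)
print(num_with_comments)
```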


Let's count how many stories have more than 1000 points and display their titles.

In [40]:
thousand_points = [d for d in hn_clean if d["points"]>1000]
num_thousand_points = len(thousand_points)
print(num_thousand_points) 

8


In [41]:
for row in thousand_points:
    print(row["points"], " points", ": ", row["title"])

1297  points :  Microsoft Open Sources C# Compiler
1192  points :  Elon Musk: To the People of New Jersey
2732  points :  2048
1095  points :  The Face Behind Bitcoin?
1054  points :  Facebook Buying WhatsApp for $16B in Cash and Stock Plus $3B in RSUs
1958  points :  Today is The Day We Fight Back
1062  points :  Mystery signal from a helicopter
1522  points :  Wozniak: “Actually, the movie was largely a lie about me”
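
The titles above appear in the order they occur in the list. To rank them by points instead, one option is sorted() with reverse=True. This is a sketch that uses three of the rows printed above as sample data; the names sample and ranked are introduced here only for illustration.

```python
# three of the rows shown above, used as sample data
sample = [
    {"points": 1297, "title": "Microsoft Open Sources C# Compiler"},
    {"points": 2732, "title": "2048"},
    {"points": 1958, "title": "Today is The Day We Fight Back"},
]

# sort by the "points" value, highest first
ranked = sorted(sample, key=lambda d: d["points"], reverse=True)
for row in ranked:
    print(row["points"], "points:", row["title"])
```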


Let's find the story that has the greatest number of comments. First let's write a small function to help print JSON objects nicely.

In [43]:
def jprint(obj):
    # create a formatted string of the Python JSON object
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)

In [44]:
def key_function(json_dict):
    return json_dict["numComments"]

most_comments = max(hn_clean, key=key_function)

jprint(most_comments)

{
    "author": "platz",
    "createdAt": "2014-04-03T19:02:53Z",
    "numComments": 1208,
    "objectId": "7525198",
    "points": 889,
    "storyText": null,
    "tags": [
        "story",
        "author_platz",
        "story_7525198"
    ],
    "title": "Brendan Eich Steps Down as Mozilla CEO",
    "url": "https://blog.mozilla.org/blog/2014/04/03/brendan-eich-steps-down-as-mozilla-ceo/"
}
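
The helper key_function can also be replaced by operator.itemgetter from the standard library, which builds the same kind of lookup function. A minimal sketch, with a short sample list (values taken from the output shown in this notebook) standing in for hn_clean:

```python
from operator import itemgetter

# short sample standing in for hn_clean
sample = [
    {"author": "dragongraphics", "numComments": 0},
    {"author": "platz", "numComments": 1208},
    {"author": "001sky", "numComments": 1},
]

# itemgetter("numComments") returns a function equivalent to
# lambda d: d["numComments"]
most_commented = max(sample, key=itemgetter("numComments"))
print(most_commented["author"], most_commented["numComments"])
```

If several stories tie for the maximum, max() returns the first one it encounters.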


Next, let's sort the items in our JSON list alphabetically by author and print every 250th item.

In [55]:
temp = sorted(hn_clean, key=lambda x: x["author"])
print(len(temp))
for i in range(0, len(temp), 250):
    jprint(temp[i])

35806
{
    "author": "001sky",
    "createdAt": "2014-05-28T10:24:46Z",
    "numComments": 1,
    "objectId": "7809196",
    "points": 1,
    "storyText": "see also: S&amp;P Assigns &#x27;B-&#x27; Rating to Tesla (TSLA); Notes &#x27;Vulnerable&#x27; Business Risk Profile<p>http:&#x2F;&#x2F;www.streetinsider.com&#x2F;Credit+Ratings&#x2F;S%26P+Assigns+B-+Rating+to+Tesla+%28TSLA%29%3B+Notes+Vulnerable+Business+Risk+Profile&#x2F;9525763.html",
    "tags": [
        "story",
        "author_001sky",
        "story_7809196"
    ],
    "title": "S&P Rates Tesla Debt as 'Junk' \u2013 Update",
    "url": "http://online.wsj.com/article/BT-CO-20140527-712847.html"
}
{
    "author": "ALee",
    "createdAt": "2014-02-23T02:09:00Z",
    "numComments": 0,
    "objectId": "7284623",
    "points": 3,
    "storyText": "",
    "tags": [
        "story",
        "author_ALee",
        "story_7284623"
    ],
    "title": "Uber and Lyft plan Houston invasion by offering services for free",
    "url": "http
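
One thing to keep in mind: sorted() compares strings by code point, so digits and uppercase letters come before lowercase ones, which is why "001sky" and "ALee" appear first. If a case-insensitive order is wanted, the key can lowercase the author first. A small sketch follows; the author "Zed" is a hypothetical name added to make the difference visible.

```python
# "Zed" is a hypothetical author; the others appear in this notebook
sample = [
    {"author": "jcr"},
    {"author": "ALee"},
    {"author": "Zed"},
    {"author": "001sky"},
]

# default ordering: code points, so uppercase sorts before lowercase
by_code_point = sorted(sample, key=lambda x: x["author"])
# case-insensitive ordering: compare lowercased names
case_insensitive = sorted(sample, key=lambda x: x["author"].lower())

print([d["author"] for d in by_code_point])
print([d["author"] for d in case_insensitive])
```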

Next, find the item in our JSON list with the smallest age:
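
The stories have no "age" key, so the sketch below assumes that age means the time elapsed since "createdAt", which makes the item with the smallest age the most recently created story. It parses the timestamps with datetime.strptime and uses three records from the output shown earlier as sample data; the helper name created_at is hypothetical.

```python
from datetime import datetime

def created_at(story):
    # parse timestamps such as "2014-05-29T08:07:50Z"
    return datetime.strptime(story["createdAt"], "%Y-%m-%dT%H:%M:%SZ")

# three records from the earlier output, standing in for hn_clean
sample = [
    {"author": "jcr", "createdAt": "2014-05-29T08:05:58Z"},
    {"author": "dragongraphics", "createdAt": "2014-05-29T08:07:50Z"},
    {"author": "callum85", "createdAt": "2014-05-29T08:05:06Z"},
]

# smallest age == latest creation time (assumption stated above)
newest = max(sample, key=created_at)
print(newest["author"], newest["createdAt"])
```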