# Guided Project: Exploring Hacker News Posts
from Dataquest

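## Overview
This project analyzes posts on Hacker News. Hacker News is a site similar to Reddit, where users can comment and vote on submitted posts (Quora may be a site with a similar feel). The analysis focuses on posts whose titles start with `Ask HN` or `Show HN`.

`Ask HN` marks a post that asks the Hacker News community a question, and `Show HN` marks a post that shows the community a project or product.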

In [1]:
from csv import reader

In [2]:
# Open the file in a context manager so it is closed after reading
with open('./data/hacker_news.csv', newline='') as f:
    hn = list(reader(f))

## Columns
column | description
---|---
id| The unique identifier from Hacker News for the post
title| The title of the post
url| The URL that the post links to, if the post has a URL
num_points| The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
num_comments| The number of comments that were made on the post
author| The username of the person who submitted the post
created_at| The date and time at which the post was submitted

In [3]:
# Remove the header row
headers = hn.pop(0)
print('Header')
print(headers)

Header
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


In [4]:
# Print a few sample records
for row in hn[0:5]:
    print(row)

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']
['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']
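
Every field returned by `csv.reader` is a string, so the numeric and date columns need converting before any arithmetic. The following is a minimal sketch, assuming all `created_at` values follow the `month/day/year hour:minute` pattern seen in the sample rows. The `parse_row` helper and the `DATE_FORMAT` constant are hypothetical names, not part of the original notebook.

```python
from datetime import datetime

# Assumed format, inferred from sample values such as '8/4/2016 11:52'
DATE_FORMAT = '%m/%d/%Y %H:%M'

def parse_row(row):
    """Hypothetical helper: convert one raw CSV row into typed values."""
    post_id, title, url, num_points, num_comments, author, created_at = row
    return {
        'id': int(post_id),
        'title': title,
        'url': url,
        'num_points': int(num_points),
        'num_comments': int(num_comments),
        'author': author,
        'created_at': datetime.strptime(created_at, DATE_FORMAT),
    }

# First sample record shown above
sample = ['12224879', 'Interactive Dynamic Video',
          'http://www.interactivedynamicvideo.com/',
          '386', '52', 'ne0phyte', '8/4/2016 11:52']
parsed = parse_row(sample)
print(parsed['num_comments'], parsed['created_at'])
```

If some rows turn out to use a different date format, `datetime.strptime` raises `ValueError`, which makes such rows easy to spot.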


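## Splitting the list of posts
Split the posts into three lists: those whose titles start with `Ask HN`, those whose titles start with `Show HN`, and everything else.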

In [5]:
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    if row[1].lower().startswith('ask hn'):
        ask_posts.append(row)
    elif row[1].lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print(ask_posts[0])
print(show_posts[0])
print(other_posts[0])

['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55']
['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
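
The case-insensitive prefix test used for the split can also be factored into a small function, which makes it easy to check against individual titles. This is only a sketch; `classify_title` is a hypothetical helper name, and the titles are taken from the output above.

```python
def classify_title(title):
    """Hypothetical helper: return 'ask', 'show', or 'other' for a post title."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Titles from the first row of each list printed above
titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform',
    'Interactive Dynamic Video',
]
labels = [classify_title(t) for t in titles]
print(labels)
```

Lowercasing before the comparison means variants such as `ASK HN` or `ask hn` land in the same group.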


## Counting the comments on each kind of post

### Number of comments on Ask HN posts

In [11]:
total_ask_comments = 0
for row in ask_posts:
    # num_comments is the fifth column (index 4)
    total_ask_comments += int(row[4])
# Compute the average once, after the loop has finished
avg_ask_comments = total_ask_comments / len(ask_posts)
print('number of total ask comments is ' + str(total_ask_comments))
print('ask posts length is ' + str(len(ask_posts)))
print('comment average of posts is ' + str(avg_ask_comments))

number of total ask comments is 24483
ask posts length is 1744
comment average of posts is 14.038417431192661


### Number of comments on Show HN posts

In [12]:
total_show_comments = 0
for row in show_posts:
    # num_comments is the fifth column (index 4)
    total_show_comments += int(row[4])
# Compute the average once, after the loop has finished
avg_show_comments = total_show_comments / len(show_posts)
print('number of total show comments is ' + str(total_show_comments))
print('show posts length is ' + str(len(show_posts)))
print('comment average of posts is ' + str(avg_show_comments))

number of total show comments is 11988
show posts length is 1162
comment average of posts is 10.31669535283993


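Ask HN posts receive 14.04 comments on average, while Show HN posts receive 10.32. Ask posts pose questions and Show posts present (announce) something, so it seems plausible that questions, which invite answers, draw more comments.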
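
Since the same total-then-divide calculation is applied to both groups, it can be written once as a function. The sketch below assumes the comment count sits at index 4, as in the column list above. `average_comments` is a hypothetical helper name, and the two demo rows are the first two sample records printed earlier, not a full group.

```python
def average_comments(posts, comment_index=4):
    """Hypothetical helper: mean of the num_comments column, 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comment_index]) for row in posts)
    return total / len(posts)

# The first two sample records shown earlier (52 and 10 comments)
demo_rows = [
    ['12224879', 'Interactive Dynamic Video',
     'http://www.interactivedynamicvideo.com/',
     '386', '52', 'ne0phyte', '8/4/2016 11:52'],
    ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time',
     'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
     '39', '10', 'josep2', '1/26/2016 19:30'],
]
demo_average = average_comments(demo_rows)
print(demo_average)
```

With the notebook's lists in scope, `average_comments(ask_posts)` and `average_comments(show_posts)` should reproduce the two averages reported above.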