# "Ask HN" or "Show HN": Which Type of Post Is More Attractive to Users on Hacker News?

Hacker News is a popular website where users submit posts. Two common types of posts have titles that begin with "Ask HN" or "Show HN".

In this project, we analyze a randomly collected sample of Hacker News posts and compare the two types of posts. We want to answer the following questions:
- Do `Ask HN` or `Show HN` posts receive more comments on average?
- Do posts created at a certain hour receive more comments on average?

The Hacker News posts data can be downloaded from [here](https://www.kaggle.com/hacker-news/hacker-news-posts).

## Opening and Exploring the Data

In [1]:
from csv import reader

# Read the whole CSV into a list of rows; the file is closed when the block ends.
with open("HN_posts_year_to_Sep_26_2016.csv") as opened_file:
    hn = list(reader(opened_file))
hn_header = hn[0]  # column names
hn = hn[1:]        # data rows only

def explore(hn, start=0, end=4):
    for row in hn[start:end]:
        print(row, "\n")
    print("\n")
        
print(hn_header)
explore(hn)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'] 

['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'] 

['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'] 

['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'] 





## Separating the Data into "Ask HN", "Show HN", or other Posts

In [7]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

explore(ask_posts)
explore(show_posts)
explore(other_posts)

['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'] 

['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'] 

['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'] 

['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'] 



['12578335', 'Show HN: Finding puns computationally', 'http://puns.samueltaylor.org/', '2', '0', 'saamm', '9/26/2016 0:36'] 

['12578182', 'Show HN: A simple library for complicated animations', 'https://christinecha.github.io/choreographer-js/', '1', '0', 'christinecha', '9/26/2016 0:01'] 

['12578098', 'Show HN: WebGL visualization of DNA sequences', 'http://grondilu.github.io/dna.html', '1', '0', 'grondilu', '9/25/2016 23:44'] 

['12577991', 'Show HN: Pomodoro-centric, heirarchical project management with ES6 modules', '
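
The prefix test used in the loop above can be isolated into a small helper, which makes it easy to check on a few titles before running it over the whole dataset. The following is a minimal sketch: `classify_title` and the sample titles are hypothetical names and data introduced only for illustration, not part of the original notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()  # make the comparison case-insensitive
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Made-up titles, used only to exercise the function.
sample_titles = [
    "Ask HN: Example question",
    "SHOW HN: Example project",
    "An article that mentions Ask HN in passing",
]
labels = [classify_title(t) for t in sample_titles]
print(labels)
```

Because the check uses `startswith`, a title that merely mentions "Ask HN" somewhere in the middle is counted as an other post.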

## Get Average Comments of a Type of Post

In [8]:
def get_avg_comments(hn):
    total_comments = 0
    for row in hn:
        total_comments += int(row[4])
    avg_comments = total_comments / len(hn)
    return round(avg_comments, 1)


print("The average number of comments per 'Ask HN' post was", get_avg_comments(ask_posts))
print("The average number of comments per 'Show HN' post was", get_avg_comments(show_posts))

The average number of comments per 'Ask HN' post was 10.4
The average number of comments per 'Show HN' post was 4.9


According to this result, "Ask HN" posts receive roughly twice as many comments on average as "Show HN" posts (10.4 versus 4.9).
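
To make the averaging step concrete, here is a minimal sketch on three made-up rows that follow the dataset's column order. The helper `average_column` and the sample rows are assumptions added for illustration; the helper takes the column index as a parameter so the same code could serve for comments (index 4) or points (index 3), and it returns `None` for an empty list instead of dividing by zero.

```python
# Hypothetical rows in the dataset's column order:
# [id, title, url, num_points, num_comments, author, created_at]
sample_rows = [
    ["1", "Ask HN: Example question one", "", "4", "7", "user_a", "9/26/2016 2:53"],
    ["2", "Ask HN: Example question two", "", "6", "3", "user_b", "9/26/2016 1:17"],
    ["3", "Ask HN: Example question three", "", "1", "0", "user_c", "9/25/2016 22:57"],
]

def average_column(rows, index):
    """Return the mean of an integer column, or None for an empty list."""
    if not rows:
        return None
    total = sum(int(row[index]) for row in rows)
    return round(total / len(rows), 1)

avg_comments = average_column(sample_rows, 4)  # (7 + 3 + 0) / 3
avg_points = average_column(sample_rows, 3)    # (4 + 6 + 1) / 3
print(avg_comments, avg_points)
```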

## Clean Time Format and Get Ranking of Comments by Hour

In [9]:
import datetime as dt

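def clean_date(date_str):
    # Timestamps look like "9/26/2016 3:26", i.e. month/day/year hour:minute.
    date_list = date_str.split()
    month, day, year = date_list[0].split("/")
    hour, minute = date_list[1].split(":")
    # Zero-pad month, day, and hour so every timestamp has the same width.
    return "{}/{}/{} {}:{}".format(month.zfill(2), day.zfill(2), year, hour.zfill(2), minute)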

def hour_comments(hn):
    result_list = []
    for row in hn:
        result_list.append([row[6], int(row[4])])
    
    counts_by_hour = {}
    comments_by_hour = {}
    for r in result_list:
        time_dt = dt.datetime.strptime(clean_date(r[0]), "%m/%d/%Y %H:%M")    
        time_h = time_dt.strftime("%H")
        
        if time_h not in counts_by_hour:
            counts_by_hour[time_h] = 1
            comments_by_hour[time_h] = r[1]
        else:
            counts_by_hour[time_h] += 1
            comments_by_hour[time_h] += r[1]
    return counts_by_hour, comments_by_hour

ask_counts_by_hour, ask_comments_by_hour = hour_comments(ask_posts)
print(ask_counts_by_hour)
print(ask_comments_by_hour)
print("\n")

def get_avg_by_hour_list(counts, comments):
    avg_by_hour = []
    for c in counts:
        avg_by_hour.append([c, (comments[c]/counts[c])])
    return avg_by_hour
ask_avg_by_hour = get_avg_by_hour_list(ask_counts_by_hour, ask_comments_by_hour)
print("[hour, average comments]")
print(ask_avg_by_hour, "\n")

def swap_list(list_of_lists):
    output_list = []
    for row in list_of_lists:
        output_list.append([row[1], row[0]])
    return output_list

def sort_lists_by_hour(avg_by_hour, name_index):
    avg_by_hour_swapped = swap_list(avg_by_hour)
    sorted_avg_hour = sorted(avg_by_hour_swapped, reverse=True)
    for ranked_by_hour in sorted_avg_hour[:5]:
        hour_dt = dt.datetime.strptime(ranked_by_hour[1], "%H")
        hour_h = hour_dt.strftime("%H:%M")
        print("{}: {:.2f} average {} per post".format(hour_h, ranked_by_hour[0], name_index))
    print("\n")

print("Ranking: the average number of comments per 'Ask HN' post by hour")
sort_lists_by_hour(ask_avg_by_hour, "comments")

print("Ranking: the average number of comments per 'Show HN' post by hour")
show_counts, show_comments = hour_comments(show_posts)
sort_lists_by_hour(get_avg_by_hour_list(show_counts, show_comments), "comments")

print("Ranking: the average number of comments per 'Other' post by hour")
other_counts, other_comments = hour_comments(other_posts)
sort_lists_by_hour(get_avg_by_hour_list(other_counts, other_comments), "comments")

{'02': 269, '01': 282, '22': 383, '21': 518, '19': 552, '17': 587, '15': 646, '14': 513, '13': 444, '11': 312, '10': 282, '09': 222, '07': 226, '03': 271, '23': 343, '20': 510, '16': 579, '08': 257, '00': 301, '18': 614, '12': 342, '04': 243, '06': 234, '05': 209}
{'02': 2996, '01': 2089, '22': 3372, '21': 4500, '19': 3954, '17': 5547, '15': 18525, '14': 4972, '13': 7245, '11': 2797, '10': 3013, '09': 1477, '07': 1585, '03': 2154, '23': 2297, '20': 4462, '16': 4466, '08': 2362, '00': 2277, '18': 4877, '12': 4234, '04': 2360, '06': 1587, '05': 1838}


[hour, average comments]
[['02', 11.137546468401487], ['01', 7.407801418439717], ['22', 8.804177545691905], ['21', 8.687258687258687], ['19', 7.163043478260869], ['17', 9.449744463373083], ['15', 28.676470588235293], ['14', 9.692007797270955], ['13', 16.31756756756757], ['11', 8.96474358974359], ['10', 10.684397163120567], ['09', 6.653153153153153], ['07', 7.013274336283186], ['03', 7.948339483394834], ['23', 6.696793002915452], ['20', 8.7
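
The group-by-hour, average, and rank steps above can also be written more compactly. The sketch below is one possible alternative, not the notebook's method: it uses `collections.defaultdict` for the running totals and `sorted` with a `key` function instead of swapping list columns. The function name `average_by_hour` and the sample pairs are made up for illustration. It also passes the raw timestamps straight to `strptime`, which accepts months, days, and hours without a leading zero for these format codes.

```python
from collections import defaultdict
import datetime as dt

# Made-up (created_at, value) pairs in the dataset's timestamp format.
sample = [
    ("9/26/2016 3:26", 4),
    ("9/25/2016 3:05", 8),
    ("9/25/2016 15:40", 30),
]

def average_by_hour(pairs):
    """Map each two-digit hour string to the mean value of its posts."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for created_at, value in pairs:
        hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
        counts[hour] += 1
        totals[hour] += value
    return {hour: totals[hour] / counts[hour] for hour in counts}

averages = average_by_hour(sample)
# Rank hours from highest to lowest average.
top = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for hour, avg in top:
    print("{}:00: {:.2f} average comments per post".format(hour, avg))
```

Sorting on `item[1]` keeps each `(hour, average)` pair in its natural order, so no separate swap step is needed before ranking.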

## Get Average Points of a Type of Post

In [11]:
def get_avg_points(hn):
    total_points = 0
    for row in hn:
        total_points += int(row[3])
    return round((total_points / len(hn)), 1)

print("Average points of ask posts:", get_avg_points(ask_posts))
print("Average points of show posts:", get_avg_points(show_posts))
print("Average points of other posts:", get_avg_points(other_posts))

Average points of ask posts: 11.3
Average points of show posts: 14.8
Average points of other posts: 15.2


## Get Ranking of Points by Hour

In [6]:
def get_index_by_hour(hn, index):
    result_list = []
    for row in hn:
        result_list.append([row[6], int(row[index])])
    
    counts_by_hour = {}
    index_by_hour = {}
    for r in result_list:
        time_dt = dt.datetime.strptime(clean_date(r[0]), "%m/%d/%Y %H:%M")    
        time_h = time_dt.strftime("%H")
        
        if time_h not in counts_by_hour:
            counts_by_hour[time_h] = 1
            index_by_hour[time_h] = r[1]
        else:
            counts_by_hour[time_h] += 1
            index_by_hour[time_h] += r[1]
    return counts_by_hour, index_by_hour

print("Ranking: the average points per 'Ask HN' post by hour")
counts_by_hour, points_by_hour = get_index_by_hour(ask_posts, 3)
sort_lists_by_hour(get_avg_by_hour_list(counts_by_hour, points_by_hour), name_index = "points")

print("Ranking: the average points per 'Show HN' post by hour")
counts_by_hour, points_by_hour = get_index_by_hour(show_posts, 3)
sort_lists_by_hour(get_avg_by_hour_list(counts_by_hour, points_by_hour), name_index = "points")

print("Ranking: the average points per 'Other' post by hour")
counts_by_hour, points_by_hour = get_index_by_hour(other_posts, 3)
sort_lists_by_hour(get_avg_by_hour_list(counts_by_hour, points_by_hour), name_index = "points")

Ranking: the average points per 'Ask HN' post by hour
15:00: 21.64 average points per post
13:00: 17.93 average points per post
12:00: 13.58 average points per post
10:00: 13.44 average points per post
17:00: 12.19 average points per post


Ranking: the average points per 'Show HN' post by hour
12:00: 20.91 average points per post
11:00: 19.26 average points per post
13:00: 17.02 average points per post
19:00: 16.06 average points per post
06:00: 15.99 average points per post


Ranking: the average points per 'Other' post by hour
02:00: 16.71 average points per post
12:00: 16.70 average points per post
11:00: 16.29 average points per post
00:00: 16.12 average points per post
13:00: 16.02 average points per post


