## Submissions to Hacker News

### Questions vs answers, frequency of comments

##### Luca Vehbiu, 02-05-2019

To reinforce the lessons learned in working with *strings*, *instances*, and *dates and times*, this project uses the [Hacker News](https://www.kaggle.com/hacker-news/hacker-news-posts) posts dataset.

This analysis compares two types of posts, **Ask HN** and **Show HN**, to determine the following:

   * Do **Ask HN** or **Show HN** receive more comments on average?
   * Do posts created at a certain time receive more comments on average?
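
As a minimal sketch of how the first question could be approached, the snippet below filters posts by title prefix and averages their comment counts. The `average_comments` helper and the sample rows are hypothetical and made up for illustration; the only assumption about the real data is the column order (title at index 1, `num_comments` at index 4).

```python
def average_comments(rows, prefix):
    """Mean number of comments for rows whose title starts with `prefix`."""
    prefix = prefix.lower()
    counts = [int(row[4]) for row in rows if row[1].lower().startswith(prefix)]
    # Guard against an empty group to avoid dividing by zero
    return sum(counts) / len(counts) if counts else 0.0


# Hypothetical sample rows, not taken from the real dataset
sample = [
    ["1", "Ask HN: How do you learn?", "", "5", "10", "alice", "8/16/2016 9:55"],
    ["2", "Show HN: My side project", "", "3", "4", "bob", "8/16/2016 10:05"],
    ["3", "ask hn: Favorite editor?", "", "2", "20", "carol", "8/17/2016 15:30"],
    ["4", "An unrelated story", "", "9", "7", "dave", "8/17/2016 15:45"],
]

ask_avg = average_comments(sample, "ask hn")
show_avg = average_comments(sample, "show hn")
print(ask_avg, show_avg)
```

Lowercasing both the title and the prefix keeps the match case-insensitive, which is the same idea the notebook applies with `title.lower()`.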


In [1]:
#Read the data
import csv

#the with block closes the file once the rows are loaded
with open("hacker_news.csv") as f:
    hn = list(csv.reader(f))

#row with column names
headers = hn[0]

#rest of data
hn = hn[1:]

headers

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

In [2]:
#Separate the data into three lists

ask = []    #Ask HN posts
post = []   #Show HN posts
other = []  #all other posts


for row in hn:
    title = row[1]
    title = title.lower() #to avoid any mistakes with uppercase
    if title.startswith('ask hn'):
        ask.append(row)
    elif title.startswith('show hn'):
        post.append(row)
    else:
        other.append(row)
        
print(len(ask))
print(len(post))
print(len(other))

1744
1162
17194


In [9]:
#Find which gets more comments on average
total_ask_comment = 0
total_post_comm = 0

#loop for ask posts
for row in ask:
    total_ask_comment += int(row[4])

avg_ask = total_ask_comment / len(ask)
print("Average for Ask HN:", avg_ask)

for row in post:
    total_post_comm += int(row[4])

avg_post = total_post_comm / len(post)

print("Average for Show HN:", avg_post)

Average for Ask HN: 14.038417431192661
Average for Show HN: 10.31669535283993


Ask HN posts receive more comments on average (about 14 per post versus about 10 for Show HN), so the rest of the analysis focuses on them. Next, we'll determine whether **ask posts created at a certain time** are more likely **to attract comments**. We'll use the following steps to perform this analysis:

   1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
   2. Calculate the average number of comments ask posts receive by hour created.
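
The two steps above can be sketched in a single pass over the data. This is only an illustration under stated assumptions: the `average_by_hour` helper and the sample pairs are hypothetical, and the date format is assumed to be `%m/%d/%Y %H:%M`, as in the `created_at` column.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = "%m/%d/%Y %H:%M"  # assumed format of the created_at column


def average_by_hour(pairs):
    """Return {hour: mean comments} for (created_at, num_comments) pairs."""
    counts = defaultdict(int)    # step 1: posts created in each hour
    comments = defaultdict(int)  # step 1: comments received in each hour
    for created_at, num_comments in pairs:
        hour = dt.datetime.strptime(created_at, DATE_FORMAT).strftime("%H")
        counts[hour] += 1
        comments[hour] += num_comments
    # step 2: average comments per post for each hour
    return {hour: comments[hour] / counts[hour] for hour in counts}


# Hypothetical sample pairs, not taken from the real dataset
sample_pairs = [
    ("8/16/2016 9:55", 6),
    ("9/26/2016 9:10", 2),
    ("8/17/2016 15:30", 20),
]
hourly = average_by_hour(sample_pairs)
print(hourly)
```

Using `defaultdict(int)` removes the need to check whether an hour has been seen before; a plain dictionary with an explicit `if`/`else` works equally well.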


In [14]:
#Work with the dates column
import datetime as dt
result_list = []

for row in ask:
    created_at = row[6]
    nr = int(row[4])
    result_list.append([created_at, nr])

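#Both dictionaries are keyed by the hour of day ("00" to "23")
counts_by_row = {}    #number of ask posts per hour
comments_by_row = {}  #total comments per hour

l_format = "%m/%d/%Y %H:%M"  #format of the created_at column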

for row in result_list:
    nr = row[1]
    date = row[0]
    date = dt.datetime.strptime(date, l_format).strftime("%H")
    if date not in counts_by_row:
        counts_by_row[date] = 1
        comments_by_row[date] = nr
    else:
        counts_by_row[date] += 1
        comments_by_row[date] += nr
    
    


In [36]:
#Average number of comments per hour of the day
avg_hour = []

#use `hour` as the loop variable so the `post` list is not overwritten
for hour in comments_by_row:
    avg_hour.append([hour, comments_by_row[hour] / counts_by_row[hour]])


In [50]:
swap = []

#Put the average first so that sorting orders the pairs by average
for hour, avg in avg_hour:
    swap.append([avg, hour])

swap = sorted(swap, reverse=True)
print("Top 5 hours for Ask comments:", swap[:5])

Top 5 hours for Ask comments: [[38.5948275862069, '15'], [23.810344827586206, '02'], [21.525, '20'], [16.796296296296298, '16'], [16.009174311926607, '21']]


In [51]:
for avg, hr in swap[:6]:
    #hr is already a zero-padded hour string such as "15"
    print("{}: {:.2f} average comments per post".format(hr, avg))

15: 38.59 average comments per post
02: 23.81 average comments per post
20: 21.52 average comments per post
16: 16.80 average comments per post
21: 16.01 average comments per post
13: 14.74 average comments per post


[[16.796296296296298, '16'],
 [6.746478873239437, '22'],
 [13.233644859813085, '14'],
 [9.022727272727273, '06'],
 [23.810344827586206, '02'],
 [16.009174311926607, '21'],
 [10.08695652173913, '05'],
 [7.852941176470588, '07'],
 [7.985294117647059, '23'],
 [7.170212765957447, '04'],
 [14.741176470588234, '13'],
 [8.127272727272727, '00'],
 [13.440677966101696, '10'],
 [5.5777777777777775, '09'],
 [10.8, '19'],
 [11.46, '17'],
 [10.25, '08'],
 [38.5948275862069, '15'],
 [21.525, '20'],
 [9.41095890410959, '12'],
 [11.383333333333333, '01'],
 [7.796296296296297, '03'],
 [11.051724137931034, '11'],
 [13.20183486238532, '18']]