# The Best Hour of the Day to Post on Hacker News: Analyzing the Posts' Data

This project aims to analyze a data set of submissions to the popular technology site Hacker News. According to [Wikipedia](https://en.wikipedia.org/wiki/Hacker_News), Hacker News is a social news website focusing on computer science and entrepreneurship. It is run by Y Combinator, Paul Graham's investment fund and startup incubator.

The data set we'll use is available [here](https://www.kaggle.com/hacker-news/hacker-news-posts) and has almost 300,000 rows, each row representing a post. It includes the following columns:

   * id: the unique identifier of the post

   * title: the title of the post

   * url: the URL of the item being linked to (empty for posts without a link)

   * num_points: the number of points the post received (upvotes minus downvotes)

   * num_comments: the number of comments the post received

   * author: the username of the account that made the post

   * created_at: the date and time the post was made (US Eastern Time)
 
For this project, we are particularly interested in posts whose titles begin with *Ask HN* and *Show HN*. *Ask HN* posts ask the community a question. *Show HN* posts show the community something, such as a project, a product, or just something the author finds interesting enough to share. Our goal is to determine whether posts created at a particular time of day receive more interaction than posts created at other times.

**Disclaimer: This is a guided project from DataQuest's "Python for Data Science: Intermediate" course, developed for learning purposes. Although it may look like other projects made for the same reason, this project has some features of its own that I implemented. The DataQuest platform provides guidance for the project, but no solution was available whatsoever. Every line in this project was typed by me.**


We'll begin by opening the `.csv` file.

```python
from csv import reader

# Read the whole CSV file into a list of rows; the first row is the header.
with open('HN_posts_year_to_Sep_26_2016.csv', encoding='utf8', newline='') as opened_file:
    hn_file = list(reader(opened_file))

hn_header = hn_file[0]
hn_data = hn_file[1:]
```
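The cells below index each row by position: `post[1]` for the title, `post[3]` for points, `post[4]` for comments, and `post[-1]` for the timestamp. The following sketch shows one way to look those positions up from the header row instead of hard-coding them. The helper name `column_indexes` is hypothetical and not part of the original project.

```python
def column_indexes(header):
    """Map each column name in a CSV header row to its position."""
    return {name: index for index, name in enumerate(header)}


# Header of the Hacker News file, as printed by the exploration cell.
header = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
col = column_indexes(header)

print(col['title'], col['num_points'], col['num_comments'], col['created_at'])
# prints: 1 3 4 6
```

With `col = column_indexes(hn_header)`, the expression `post[col['num_comments']]` is equivalent to `post[4]`, and it keeps working if the column order ever changes.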

We'll write a function that makes the data set easier to explore. It prints a given range of rows and, if requested, the total number of rows and columns.

```python
def explore_data(data_set, start, end, rows_columns=False):
    """Print rows start..end-1 of data_set and, optionally, its dimensions."""
    for row in data_set[start:end]:
        print(row)
        print('\n')  # leaves two blank lines between rows
    if rows_columns:
        print('Total number of columns: ', len(data_set[0]))
        print('Total number of rows: ', len(data_set))


print(hn_header)
print('\n')
explore_data(hn_data, 0, 6, True)
```

```
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']


['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']


['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']


['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']


['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']


['12578975
```

Since we are looking for Ask HN and Show HN posts, we will now use a `for` loop to find them in the data set. Posts whose titles begin with Ask HN will be added to the `ask` list. Posts whose titles begin with Show HN will be added to the `show` list. All other posts will be added to the `other` list. Titles are lowercased before the check, so the match is case-insensitive.

```python
ask = []
show = []
other = []
for post in hn_data:
    # Normalize the title so the prefix check is case-insensitive.
    title = post[1].strip().lower()
    if title.startswith('ask hn'):
        ask.append(post)
    elif title.startswith('show hn'):
        show.append(post)
    else:
        other.append(post)

print('Total number of Ask HN posts: ', len(ask))
print('Total number of Show HN posts: ', len(show))
print('Total number of other posts: ', len(other))
```

```
Total number of Ask HN posts:  9139
Total number of Show HN posts:  10158
Total number of other posts:  273822
```


We can see that the majority of posts are neither Ask HN nor Show HN.  This does not affect our goal, though.

Let's quickly explore these lists.

```python
explore_data(ask, 0, 6, True)
print('\n')
explore_data(show, 0, 6, True)
print('\n')
explore_data(other, 0, 6, True)
```

```
['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53']


['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17']


['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57']


['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48']


['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2', '00taffe', '9/25/2016 21:50']


['12576946', 'Ask HN: How hard would it be to make a cheap, hackable phone?', '', '2', '1', 'hkt', '9/25/2016 19:30']


Total number of columns:  7
Total number of rows:  9139


['12578335', 'Show HN: Finding puns computationally', 'http://puns.samueltaylor.org/', '2', '0', 'saamm', '9/26/2016 0:36']


['12578182', 'Show HN: A simple library for complicated animations', 'https://christinecha.github.io/choreograp
```

We'll now calculate the average number of comments for Ask HN and Show HN posts. For each list, we will add up the number of comments on every post and then divide this total by the number of posts in the list.

```python
# Calculating the average for Ask HN
total_ask_comments = 0
for post in ask:
    num_comments = int(post[4])
    total_ask_comments += num_comments

avg_ask_comments = total_ask_comments / len(ask)
print('Average Ask HN post comments: {:.2f}'.format(avg_ask_comments))

# Calculating the average for Show HN
total_show_comments = 0
for post in show:
    num_comments = int(post[4])
    total_show_comments += num_comments

avg_show_comments = total_show_comments / len(show)
print('Average Show HN post comments: {:.2f}'.format(avg_show_comments))
```

```
Average Ask HN post comments: 10.39
Average Show HN post comments: 4.89
```
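The two loops above differ only in the list they read. Here is a minimal sketch of a helper that computes the mean of one numeric column for any list of rows. The name `average_column` is hypothetical. The sample rows are the first two Ask HN rows shown earlier.

```python
def average_column(rows, index):
    """Return the mean of int(row[index]) over rows, or 0.0 for an empty list."""
    if not rows:
        return 0.0
    return sum(int(row[index]) for row in rows) / len(rows)


# First two Ask HN rows from the exploration output; column 4 is num_comments.
sample = [
    ['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'],
    ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'],
]
print('{:.2f}'.format(average_column(sample, 4)))  # prints 5.00
```

Under this sketch, `average_column(ask, 4)` and `average_column(show, 4)` would replace the two loops, and passing index 3 averages points instead of comments.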


We can see that Ask HN posts receive about twice as many comments as Show HN posts (10.39 versus 4.89 on average). Users seem to be more interested in answering people's questions than in interacting with what others want to show.

## Analyzing Ask HN posts

We'll begin by analyzing the Ask HN posts separately and walk the reader through the process. Then we'll do the same for the Show HN posts and for the other posts, but in a more straightforward way.

Let's determine whether Ask HN posts created at a certain time are more likely to attract comments. We'll loop through the Ask HN list, retrieve the time each post was created and the number of comments it received, and add this pair to a new list.

```python
ask_hours = []
for post in ask:
    temp_list = [post[-1], int(post[4])]
    ask_hours.append(temp_list)

print(ask_hours)
```

```
[['9/26/2016 2:53', 7], ['9/26/2016 1:17', 3], ['9/25/2016 22:57', 0], ['9/25/2016 22:48', 3], ['9/25/2016 21:50', 2], ['9/25/2016 19:30', 1], ['9/25/2016 19:22', 22], ['9/25/2016 17:55', 3], ['9/25/2016 15:48', 0], ['9/25/2016 15:35', 13], ['9/25/2016 15:28', 0], ['9/25/2016 14:43', 0], ['9/25/2016 14:17', 3], ['9/25/2016 13:08', 2], ['9/25/2016 11:27', 2], ['9/25/2016 10:51', 0], ['9/25/2016 10:47', 6], ['9/25/2016 9:04', 97], ['9/25/2016 7:09', 4], ['9/25/2016 3:00', 1], ['9/24/2016 23:04', 0], ['9/24/2016 22:02', 7], ['9/24/2016 21:18', 2], ['9/24/2016 20:58', 0], ['9/24/2016 19:57', 1], ['9/24/2016 19:02', 0], ['9/24/2016 17:55', 0], ['9/24/2016 17:27', 1], ['9/24/2016 16:50', 0], ['9/24/2016 16:03', 5], ['9/24/2016 15:29', 66], ['9/24/2016 14:03', 1], ['9/24/2016 10:10', 11], ['9/24/2016 8:46', 7], ['9/24/2016 8:39', 1], ['9/24/2016 8:38', 1], ['9/24/2016 8:28', 1], ['9/24/2016 3:36', 3], ['9/24/2016 0:21', 2], ['9/23/2016 23:38', 6], ['9/23/2016 23:35', 6], ['9/23/2016 22:13', 4
```

As you can see, we have a huge list of small lists. Each small list contains the creation time and the number of comments of a post, in this case an Ask HN post.

Next, we'll use the `strptime` and `strftime` methods of the `datetime` class to extract only the hour at which each post was created. Each hour of the day will become a key in two dictionaries:
   
   * The first one will contain the number of posts created in each hour; and
   * The second one will contain the number of comments that posts created in that hour received.
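As an example, consider the first Ask HN timestamp, `'9/26/2016 2:53'`. It parses with the format string `'%m/%d/%Y %H:%M'` even though the month and hour are not zero-padded. `strftime('%H')` then returns the zero-padded hour string used as the dictionary key.

```python
import datetime as dt

created = dt.datetime.strptime('9/26/2016 2:53', '%m/%d/%Y %H:%M')
print(created)                 # 2016-09-26 02:53:00
print(created.strftime('%H'))  # 02
print(created.hour)            # 2 (an int, if numeric keys are preferred)
```

Because `'%H'` always produces two digits, the string keys `'00'` to `'23'` sort in the same order as the hours they represent.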

```python
import datetime as dt

ask_posts_hour = {}     # hour -> number of Ask HN posts created in that hour
ask_comments_hour = {}  # hour -> total comments on those posts

for created_at, comments in ask_hours:
    # Parse the timestamp, then keep only the two-digit hour ('00'-'23').
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    if hour in ask_posts_hour:
        ask_posts_hour[hour] += 1
        ask_comments_hour[hour] += comments
    else:
        ask_posts_hour[hour] = 1
        ask_comments_hour[hour] = comments

print(ask_posts_hour)
print(ask_comments_hour)
```

```
{'02': 269, '01': 282, '22': 383, '21': 518, '19': 552, '17': 587, '15': 646, '14': 513, '13': 444, '11': 312, '10': 282, '09': 222, '07': 226, '03': 271, '23': 343, '20': 510, '16': 579, '08': 257, '00': 301, '18': 614, '12': 342, '04': 243, '06': 234, '05': 209}
{'02': 2996, '01': 2089, '22': 3372, '21': 4500, '19': 3954, '17': 5547, '15': 18525, '14': 4972, '13': 7245, '11': 2797, '10': 3013, '09': 1477, '07': 1585, '03': 2154, '23': 2297, '20': 4462, '16': 4466, '08': 2362, '00': 2277, '18': 4877, '12': 4234, '04': 2360, '06': 1587, '05': 1838}
```
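The `if hour in ...`/`else` branch initializes a key the first time an hour appears. An alternative is `collections.defaultdict(int)`, which starts every missing key at 0. The function name `tally_by_hour` is hypothetical. Its input is the first four `[created_at, num_comments]` pairs of `ask_hours` printed earlier.

```python
import datetime as dt
from collections import defaultdict


def tally_by_hour(hour_rows):
    """Count posts and sum a value per hour from [timestamp, value] pairs."""
    posts_per_hour = defaultdict(int)   # missing hours start at 0
    totals_per_hour = defaultdict(int)
    for created_at, value in hour_rows:
        hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
        posts_per_hour[hour] += 1
        totals_per_hour[hour] += value
    return dict(posts_per_hour), dict(totals_per_hour)


posts, comments = tally_by_hour([
    ['9/26/2016 2:53', 7], ['9/26/2016 1:17', 3],
    ['9/25/2016 22:57', 0], ['9/25/2016 22:48', 3],
])
print(posts)     # {'02': 1, '01': 1, '22': 2}
print(comments)  # {'02': 7, '01': 3, '22': 3}
```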


We'll use the function below to display the results in descending order.

```python
def display_table(table):
    """Print a dict's key : value pairs sorted by value, largest first."""
    # Sorting (value, key) pairs also breaks ties by key, in descending order.
    table_sorted = sorted(((value, key) for key, value in table.items()), reverse=True)
    for value, key in table_sorted:
        print(key, ':', value)


print('Hour : Number of posts')
display_table(ask_posts_hour)
print('\n')
print('Hour : Number of comments')
display_table(ask_comments_hour)
```

```
Hour : Number of posts
15 : 646
18 : 614
17 : 587
16 : 579
19 : 552
21 : 518
14 : 513
20 : 510
13 : 444
22 : 383
23 : 343
12 : 342
11 : 312
00 : 301
10 : 282
01 : 282
03 : 271
02 : 269
08 : 257
04 : 243
06 : 234
07 : 226
09 : 222
05 : 209


Hour : Number of comments
15 : 18525
13 : 7245
17 : 5547
14 : 4972
18 : 4877
21 : 4500
16 : 4466
20 : 4462
12 : 4234
19 : 3954
22 : 3372
10 : 3013
02 : 2996
11 : 2797
08 : 2362
04 : 2360
23 : 2297
00 : 2277
03 : 2154
01 : 2089
05 : 1838
06 : 1587
07 : 1585
09 : 1477
```


We can see that more posts are created between 15 and 16 o'clock than in any other hour, and those posts also received the most comments. It's interesting to notice that the period between 13 and 14 o'clock ranks only ninth in posts created but second in comments received.

Let's calculate the average number of comments per post for each hour. This makes it easier to see at which time of day you should create a post to receive more comments.

```python
ask_avg_comments_hour = {}
for key in ask_comments_hour:
    ask_avg_comments_hour[key] = round(ask_comments_hour[key] / ask_posts_hour[key], 2)
display_table(ask_avg_comments_hour)
```

```
15 : 28.68
13 : 16.32
12 : 12.38
02 : 11.14
10 : 10.68
04 : 9.71
14 : 9.69
17 : 9.45
08 : 9.19
11 : 8.96
22 : 8.8
05 : 8.79
20 : 8.75
21 : 8.69
03 : 7.95
18 : 7.94
16 : 7.71
00 : 7.56
01 : 7.41
19 : 7.16
07 : 7.01
06 : 6.78
23 : 6.7
09 : 6.65
```
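Each value above is an hour's total comment count divided by the number of posts created in that hour. The example below uses the totals printed earlier for posts created between 15 and 16 o'clock (key `'15'`) and between 13 and 14 o'clock (key `'13'`):

$$
\bar{c}_{15} = \frac{18525}{646} \approx 28.68, \qquad \bar{c}_{13} = \frac{7245}{444} \approx 16.32
$$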


Clearly, the best time of day to create an Ask HN post, if you want your question to receive lots of answers, is between 15 and 16 o'clock. Posting between 12 and 14 o'clock is also a good option.

We'll now use an adapted version of the `display_table()` function to display this data in a more readable format.

```python
def adapted_display_table(table):
    """Like display_table(), but prints one readable sentence per hour."""
    table_sorted = sorted(((value, key) for key, value in table.items()), reverse=True)
    for value, key in table_sorted:
        print('{}:00: {} average comments per post.\n'.format(key, value))

adapted_display_table(ask_avg_comments_hour)
```

```
15:00: 28.68 average comments per post.

13:00: 16.32 average comments per post.

12:00: 12.38 average comments per post.

02:00: 11.14 average comments per post.

10:00: 10.68 average comments per post.

04:00: 9.71 average comments per post.

14:00: 9.69 average comments per post.

17:00: 9.45 average comments per post.

08:00: 9.19 average comments per post.

11:00: 8.96 average comments per post.

22:00: 8.8 average comments per post.

05:00: 8.79 average comments per post.

20:00: 8.75 average comments per post.

21:00: 8.69 average comments per post.

03:00: 7.95 average comments per post.

18:00: 7.94 average comments per post.

16:00: 7.71 average comments per post.

00:00: 7.56 average comments per post.

01:00: 7.41 average comments per post.

19:00: 7.16 average comments per post.

07:00: 7.01 average comments per post.

06:00: 6.78 average comments per post.

23:00: 6.7 average comments per post.

09:00: 6.65 average comments per post.
```



## Repeating the process for Show HN and other posts



We'll now repeat this whole process for the Show HN and other posts. Since you are already familiar with the process, we will do it in a more straightforward way.

First, we will go through the whole process for the Show HN posts and end up with the average number of comments for each hour of the day.

```python
show_hours = []
for post in show:
    temp_list = [post[-1], int(post[4])]  # [created_at, num_comments]
    show_hours.append(temp_list)

show_posts_hour = {}
show_comments_hour = {}

for created_at, comments in show_hours:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    if hour in show_posts_hour:
        show_posts_hour[hour] += 1
        show_comments_hour[hour] += comments
    else:
        show_posts_hour[hour] = 1
        show_comments_hour[hour] = comments

show_avg_comments_hour = {}
for key in show_comments_hour:
    show_avg_comments_hour[key] = round(show_comments_hour[key] / show_posts_hour[key], 2)
```

Now let's do the same for the other posts.

```python
other_hours = []
for post in other:
    temp_list = [post[-1], int(post[4])]  # [created_at, num_comments]
    other_hours.append(temp_list)

other_posts_hour = {}
other_comments_hour = {}

for created_at, comments in other_hours:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    if hour in other_posts_hour:
        other_posts_hour[hour] += 1
        other_comments_hour[hour] += comments
    else:
        other_posts_hour[hour] = 1
        other_comments_hour[hour] = comments

other_avg_comments_hour = {}
for key in other_comments_hour:
    other_avg_comments_hour[key] = round(other_comments_hour[key] / other_posts_hour[key], 2)
```
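The Show HN and other-post cells repeat the Ask HN code with only the input list changed. The points analysis later changes only the column index. Below is one function that covers all of these cases. The names `average_by_hour` and `value_index` are hypothetical. The three sample rows are copied from the Ask HN output shown earlier; two of them fall in the same hour, so that hour's values are averaged.

```python
import datetime as dt


def average_by_hour(posts, value_index, ndigits=2):
    """Average int(row[value_index]) per creation hour ('00'-'23').

    Assumes each row ends with an 'm/d/Y H:M' timestamp, as in hn_data.
    """
    counts = {}  # hour -> number of posts
    totals = {}  # hour -> summed value
    for post in posts:
        hour = dt.datetime.strptime(post[-1], '%m/%d/%Y %H:%M').strftime('%H')
        counts[hour] = counts.get(hour, 0) + 1
        totals[hour] = totals.get(hour, 0) + int(post[value_index])
    return {hour: round(totals[hour] / counts[hour], ndigits) for hour in counts}


sample = [
    ['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'],
    ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'],
    ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'],
]
print(average_by_hour(sample, 4))  # comments: {'02': 7.0, '22': 1.5}
print(average_by_hour(sample, 3))  # points:   {'02': 4.0, '22': 1.0}
```

Called as `average_by_hour(show, 4)` or `average_by_hour(other, 3)`, it should reproduce the dictionaries built by the explicit loops, since it uses the same parsing and the same two-decimal rounding.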

We'll now take advantage of the `display_table()` function to print the results for each of the lists.

```python
print('Ask HN')
display_table(ask_avg_comments_hour)
print('\n')
print('Show HN')
display_table(show_avg_comments_hour)
print('\n')
print('Other')
display_table(other_avg_comments_hour)
```

```
Ask HN
15 : 28.68
13 : 16.32
12 : 12.38
02 : 11.14
10 : 10.68
04 : 9.71
14 : 9.69
17 : 9.45
08 : 9.19
11 : 8.96
22 : 8.8
05 : 8.79
20 : 8.75
21 : 8.69
03 : 7.95
18 : 7.94
16 : 7.71
00 : 7.56
01 : 7.41
19 : 7.16
07 : 7.01
06 : 6.78
23 : 6.7
09 : 6.65


Show HN
12 : 6.99
07 : 6.68
11 : 6.0
08 : 5.6
14 : 5.52
13 : 5.43
02 : 5.15
04 : 5.04
19 : 5.02
18 : 4.94
16 : 4.71
06 : 4.71
09 : 4.67
00 : 4.65
15 : 4.57
23 : 4.53
03 : 4.53
17 : 4.25
20 : 4.16
21 : 4.09
01 : 4.07
22 : 3.85
10 : 3.8
05 : 3.44


Other
12 : 7.59
11 : 7.37
02 : 7.18
13 : 7.15
05 : 6.79
00 : 6.61
09 : 6.58
04 : 6.56
10 : 6.48
18 : 6.46
01 : 6.46
17 : 6.44
03 : 6.43
14 : 6.4
15 : 6.39
19 : 6.35
08 : 6.28
16 : 6.19
06 : 6.19
07 : 6.05
23 : 6.01
20 : 5.92
21 : 5.9
22 : 5.84
```


We'll also use the adapted version of the `display_table()` function to print the results in a more readable manner.

```python
print('Ask HN')
adapted_display_table(ask_avg_comments_hour)
print('\n')
print('Show HN')
adapted_display_table(show_avg_comments_hour)
print('\n')
print('Other')
adapted_display_table(other_avg_comments_hour)
```

```
Ask HN
15:00: 28.68 average comments per post.

13:00: 16.32 average comments per post.

12:00: 12.38 average comments per post.

02:00: 11.14 average comments per post.

10:00: 10.68 average comments per post.

04:00: 9.71 average comments per post.

14:00: 9.69 average comments per post.

17:00: 9.45 average comments per post.

08:00: 9.19 average comments per post.

11:00: 8.96 average comments per post.

22:00: 8.8 average comments per post.

05:00: 8.79 average comments per post.

20:00: 8.75 average comments per post.

21:00: 8.69 average comments per post.

03:00: 7.95 average comments per post.

18:00: 7.94 average comments per post.

16:00: 7.71 average comments per post.

00:00: 7.56 average comments per post.

01:00: 7.41 average comments per post.

19:00: 7.16 average comments per post.

07:00: 7.01 average comments per post.

06:00: 6.78 average comments per post.

23:00: 6.7 average comments per post.

09:00: 6.65 average comments per post.



Show
```

Apparently, the hour of posting makes a big difference only for Ask HN posts. If you have a question for the Hacker News community, the best time by far to submit it is between 15 and 16 o'clock, as posts created in this period receive almost 30 comments on average. Submitting a question between 12 and 14 o'clock is also good, as these periods rank second and third in average comments.
The next two periods still show a considerable difference in the number of comments: posts created between 2 and 3 o'clock and between 10 and 11 o'clock receive an average of about 11 comments.

If the goal of a submission is to show the community something using Show HN, the time at which the post is created does not make such a great difference. The best time would be between 12 and 13 o'clock, but posts created in this period receive, on average, only about 3.5 comments more than posts created in the period with the lowest average (between 5 and 6 o'clock).

If your post is neither an Ask HN nor a Show HN post, the hour of the day at which you create it makes even less difference. The average ranges only from 7.59 comments per post at the best time to 5.84 comments per post at the worst.
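The comparisons above come down to the gap between the best and worst hour in each table. The small sketch below computes that gap. The helper `best_worst_gap` is hypothetical. The dictionary is the Show HN comment table printed above.

```python
def best_worst_gap(table):
    """Return (best_hour, worst_hour, gap) for a dict of hour -> average."""
    best = max(table, key=table.get)
    worst = min(table, key=table.get)
    return best, worst, round(table[best] - table[worst], 2)


# Show HN average comments per post by hour, copied from the output above.
show_avg_comments_hour = {
    '12': 6.99, '07': 6.68, '11': 6.0, '08': 5.6, '14': 5.52, '13': 5.43,
    '02': 5.15, '04': 5.04, '19': 5.02, '18': 4.94, '16': 4.71, '06': 4.71,
    '09': 4.67, '00': 4.65, '15': 4.57, '23': 4.53, '03': 4.53, '17': 4.25,
    '20': 4.16, '21': 4.09, '01': 4.07, '22': 3.85, '10': 3.8, '05': 3.44,
}
print(best_worst_gap(show_avg_comments_hour))  # ('12', '05', 3.55)
```

Applied to the Other comment table, the same helper should report hours `'12'` and `'22'` with a gap of 1.75.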

## Analyzing points per post

We'll determine whether creating a post at a particular time of day affects the number of points it receives. The number of points is the number of upvotes a post received minus the number of downvotes.

For this task, we'll repeat the same process that was done to find the number of comments per period of time, but now, instead of retrieving the number of comments from each post in the database, we'll retrieve the number of points. The rest of the process is exactly the same.



```python
ask_points = []
for post in ask:
    temp_list = [post[-1], int(post[3])]
    ask_points.append(temp_list)

print(ask_points)
```

```
[['9/26/2016 2:53', 4], ['9/26/2016 1:17', 6], ['9/25/2016 22:57', 1], ['9/25/2016 22:48', 1], ['9/25/2016 21:50', 5], ['9/25/2016 19:30', 2], ['9/25/2016 19:22', 22], ['9/25/2016 17:55', 2], ['9/25/2016 15:48', 1], ['9/25/2016 15:35', 12], ['9/25/2016 15:28', 1], ['9/25/2016 14:43', 2], ['9/25/2016 14:17', 3], ['9/25/2016 13:08', 1], ['9/25/2016 11:27', 1], ['9/25/2016 10:51', 1], ['9/25/2016 10:47', 9], ['9/25/2016 9:04', 276], ['9/25/2016 7:09', 5], ['9/25/2016 3:00', 1], ['9/24/2016 23:04', 2], ['9/24/2016 22:02', 1], ['9/24/2016 21:18', 5], ['9/24/2016 20:58', 2], ['9/24/2016 19:57', 5], ['9/24/2016 19:02', 1], ['9/24/2016 17:55', 1], ['9/24/2016 17:27', 2], ['9/24/2016 16:50', 1], ['9/24/2016 16:03', 2], ['9/24/2016 15:29', 247], ['9/24/2016 14:03', 3], ['9/24/2016 10:10', 4], ['9/24/2016 8:46', 2], ['9/24/2016 8:39', 2], ['9/24/2016 8:38', 2], ['9/24/2016 8:28', 2], ['9/24/2016 3:36', 9], ['9/24/2016 0:21', 2], ['9/23/2016 23:38', 9], ['9/23/2016 23:35', 7], ['9/23/2016 22:13', 
```

Each small list in the huge list above contains the moment of creation of the post and its number of points.

We'll now use dictionaries to store the number of posts and number of points for each hour of the day.


```python
ask_posts_hour = {}   # hour -> number of Ask HN posts
ask_points_hour = {}  # hour -> total points on those posts

for created_at, points in ask_points:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    if hour in ask_posts_hour:
        ask_posts_hour[hour] += 1
        ask_points_hour[hour] += points
    else:
        ask_posts_hour[hour] = 1
        ask_points_hour[hour] = points

print(ask_posts_hour)
print(ask_points_hour)
```

```
{'02': 269, '01': 282, '22': 383, '21': 518, '19': 552, '17': 587, '15': 646, '14': 513, '13': 444, '11': 312, '10': 282, '09': 222, '07': 226, '03': 271, '23': 343, '20': 510, '16': 579, '08': 257, '00': 301, '18': 614, '12': 342, '04': 243, '06': 234, '05': 209}
{'02': 2944, '01': 2662, '22': 3601, '21': 5042, '19': 4782, '17': 7155, '15': 13978, '14': 5390, '13': 7962, '11': 2856, '10': 3789, '09': 1763, '07': 2040, '03': 2539, '23': 2616, '20': 4491, '16': 5970, '08': 2744, '00': 2835, '18': 6850, '12': 4643, '04': 2650, '06': 2030, '05': 2046}
```


Let's use the `display_table()` function to print the dictionary in descending order.

```python
print('Hour : Number of posts')
display_table(ask_posts_hour)
print('\n')
print('Hour : Number of points')
display_table(ask_points_hour)
```

```
Hour : Number of posts
15 : 646
18 : 614
17 : 587
16 : 579
19 : 552
21 : 518
14 : 513
20 : 510
13 : 444
22 : 383
23 : 343
12 : 342
11 : 312
00 : 301
10 : 282
01 : 282
03 : 271
02 : 269
08 : 257
04 : 243
06 : 234
07 : 226
09 : 222
05 : 209


Hour : Number of points
15 : 13978
13 : 7962
17 : 7155
18 : 6850
16 : 5970
14 : 5390
21 : 5042
19 : 4782
12 : 4643
20 : 4491
10 : 3789
22 : 3601
02 : 2944
11 : 2856
00 : 2835
08 : 2744
01 : 2662
04 : 2650
23 : 2616
03 : 2539
05 : 2046
07 : 2040
06 : 2030
09 : 1763
```


We'll now calculate the average of points per period of time.

```python
ask_avg_points_hour = {}
for key in ask_points_hour:
    ask_avg_points_hour[key] = round(ask_points_hour[key] / ask_posts_hour[key], 2)
display_table(ask_avg_points_hour)
```

```
15 : 21.64
13 : 17.93
12 : 13.58
10 : 13.44
17 : 12.19
18 : 11.16
02 : 10.94
04 : 10.91
08 : 10.68
14 : 10.51
16 : 10.31
05 : 9.79
21 : 9.73
01 : 9.44
00 : 9.42
22 : 9.4
03 : 9.37
11 : 9.15
07 : 9.03
20 : 8.81
06 : 8.68
19 : 8.66
09 : 7.94
23 : 7.63
```


Let's use a second adaptation of the `display_table()` function to print the results in a more readable way.

```python
def adapted2_display_table(table):
    """Like display_table(), but prints one readable sentence per hour."""
    table_sorted = sorted(((value, key) for key, value in table.items()), reverse=True)
    for value, key in table_sorted:
        print('{}:00: {} average points per post.\n'.format(key, value))

adapted2_display_table(ask_avg_points_hour)
```

```
15:00: 21.64 average points per post.

13:00: 17.93 average points per post.

12:00: 13.58 average points per post.

10:00: 13.44 average points per post.

17:00: 12.19 average points per post.

18:00: 11.16 average points per post.

02:00: 10.94 average points per post.

04:00: 10.91 average points per post.

08:00: 10.68 average points per post.

14:00: 10.51 average points per post.

16:00: 10.31 average points per post.

05:00: 9.79 average points per post.

21:00: 9.73 average points per post.

01:00: 9.44 average points per post.

00:00: 9.42 average points per post.

22:00: 9.4 average points per post.

03:00: 9.37 average points per post.

11:00: 9.15 average points per post.

07:00: 9.03 average points per post.

20:00: 8.81 average points per post.

06:00: 8.68 average points per post.

19:00: 8.66 average points per post.

09:00: 7.94 average points per post.

23:00: 7.63 average points per post.
```
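`display_table()`, `adapted_display_table()` and `adapted2_display_table()` differ only in the line they print. The sketch below is a single function that takes the line format as a parameter. The names `format_table` and `template` are hypothetical. The function returns the lines instead of printing them, which makes it easy to test. It keeps the original ordering: by value, then by hour, both descending.

```python
def format_table(table, template='{key} : {value}'):
    """Return one formatted line per entry, sorted by value (ties by key), largest first."""
    pairs = sorted(((value, key) for key, value in table.items()), reverse=True)
    return [template.format(key=key, value=value) for value, key in pairs]


# A three-hour excerpt of the Ask HN average points table above.
excerpt = {'13': 17.93, '15': 21.64, '12': 13.58}

for line in format_table(excerpt):
    print(line)
for line in format_table(excerpt, '{key}:00: {value} average points per post.'):
    print(line)
```

The default template reproduces `display_table()`'s `15 : 21.64` lines, and the second call reproduces the sentence style of the adapted versions.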



As we can see, the top three hours are the same for the average number of comments and the average number of points per post. Ask HN posts created between 12 and 13 o'clock, between 13 and 14 o'clock, and between 15 and 16 o'clock tend to receive the most interaction.
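The claim that the top three hours coincide can be checked with a set comparison. The helper `top_hours` is hypothetical. Each dictionary holds only the five highest entries of the corresponding Ask HN table above, which is enough to find the top three.

```python
def top_hours(table, n=3):
    """Return the set of the n hours with the highest averages."""
    return set(sorted(table, key=table.get, reverse=True)[:n])


# Five highest entries of each Ask HN table printed above.
ask_avg_comments_top = {'15': 28.68, '13': 16.32, '12': 12.38, '02': 11.14, '10': 10.68}
ask_avg_points_top = {'15': 21.64, '13': 17.93, '12': 13.58, '10': 13.44, '17': 12.19}

shared = top_hours(ask_avg_comments_top) & top_hours(ask_avg_points_top)
print(sorted(shared))  # ['12', '13', '15']
```

With the full dictionaries, `top_hours(ask_avg_comments_hour) == top_hours(ask_avg_points_hour)` should evaluate to `True`.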

Let's repeat this process for Show HN posts and the other posts. We'll begin with the Show HN posts.

```python
show_points = []
for post in show:
    temp_list = [post[-1], int(post[3])]  # [created_at, num_points]
    show_points.append(temp_list)

show_posts_hour = {}
show_points_hour = {}

for created_at, points in show_points:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    if hour in show_posts_hour:
        show_posts_hour[hour] += 1
        show_points_hour[hour] += points
    else:
        show_posts_hour[hour] = 1
        show_points_hour[hour] = points

show_avg_points_hour = {}
for key in show_points_hour:
    show_avg_points_hour[key] = round(show_points_hour[key] / show_posts_hour[key], 2)
```


And now the other posts.

```python
other_points = []
for post in other:
    temp_list = [post[-1], int(post[3])]  # [created_at, num_points]
    other_points.append(temp_list)

other_posts_hour = {}
other_points_hour = {}

for created_at, points in other_points:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    if hour in other_posts_hour:
        other_posts_hour[hour] += 1
        other_points_hour[hour] += points
    else:
        other_posts_hour[hour] = 1
        other_points_hour[hour] = points

other_avg_points_hour = {}
for key in other_points_hour:
    other_avg_points_hour[key] = round(other_points_hour[key] / other_posts_hour[key], 2)

print(other_posts_hour)
print(other_points_hour)
```

```
{'03': 6649, '02': 6977, '01': 7391, '00': 8391, '23': 9720, '22': 11657, '21': 13568, '20': 14920, '19': 15929, '18': 17406, '17': 18363, '16': 18790, '15': 18043, '14': 16929, '13': 14874, '12': 11876, '11': 9638, '10': 9130, '09': 8528, '08': 7930, '07': 7338, '06': 6954, '05': 6155, '04': 6666}
{'03': 102256, '02': 116600, '01': 117605, '00': 135285, '23': 142910, '22': 166800, '21': 200616, '20': 205674, '19': 248023, '18': 268580, '17': 277696, '16': 275203, '15': 262514, '14': 238981, '13': 238248, '12': 198322, '11': 157031, '10': 138270, '09': 125720, '08': 119660, '07': 109629, '06': 103757, '05': 96617, '04': 104052}
```


Let's print the results.

```python
print('Ask HN')
display_table(ask_avg_points_hour)
print('\n')
print('Show HN')
display_table(show_avg_points_hour)
print('\n')
print('Other')
display_table(other_avg_points_hour)
```

```
Ask HN
15 : 21.64
13 : 17.93
12 : 13.58
10 : 13.44
17 : 12.19
18 : 11.16
02 : 10.94
04 : 10.91
08 : 10.68
14 : 10.51
16 : 10.31
05 : 9.79
21 : 9.73
01 : 9.44
00 : 9.42
22 : 9.4
03 : 9.37
11 : 9.15
07 : 9.03
20 : 8.81
06 : 8.68
19 : 8.66
09 : 7.94
23 : 7.63


Show HN
12 : 20.91
11 : 19.26
13 : 17.02
19 : 16.06
06 : 15.99
23 : 15.86
00 : 15.55
18 : 15.14
14 : 15.09
08 : 14.68
16 : 14.34
07 : 14.0
04 : 13.95
15 : 13.94
21 : 13.93
17 : 13.88
22 : 13.33
10 : 13.32
20 : 13.23
02 : 13.22
09 : 12.46
01 : 11.87
05 : 10.66
03 : 10.52


Other
02 : 16.71
12 : 16.7
11 : 16.29
00 : 16.12
13 : 16.02
01 : 15.91
05 : 15.7
04 : 15.61
19 : 15.57
18 : 15.43
03 : 15.38
10 : 15.14
17 : 15.12
08 : 15.09
07 : 14.94
06 : 14.92
21 : 14.79
09 : 14.74
23 : 14.7
16 : 14.65
15 : 14.55
22 : 14.31
14 : 14.12
20 : 13.79
```


```python
print('Ask HN')
adapted2_display_table(ask_avg_points_hour)
print('\n')
print('Show HN')
adapted2_display_table(show_avg_points_hour)
print('\n')
print('Other')
adapted2_display_table(other_avg_points_hour)
```

```
Ask HN
15:00: 21.64 average points per post.

13:00: 17.93 average points per post.

12:00: 13.58 average points per post.

10:00: 13.44 average points per post.

17:00: 12.19 average points per post.

18:00: 11.16 average points per post.

02:00: 10.94 average points per post.

04:00: 10.91 average points per post.

08:00: 10.68 average points per post.

14:00: 10.51 average points per post.

16:00: 10.31 average points per post.

05:00: 9.79 average points per post.

21:00: 9.73 average points per post.

01:00: 9.44 average points per post.

00:00: 9.42 average points per post.

22:00: 9.4 average points per post.

03:00: 9.37 average points per post.

11:00: 9.15 average points per post.

07:00: 9.03 average points per post.

20:00: 8.81 average points per post.

06:00: 8.68 average points per post.

19:00: 8.66 average points per post.

09:00: 7.94 average points per post.

23:00: 7.63 average points per post.



Show HN
12:00: 20.91 average points per post
```

For Show HN posts, the gap in average points between the best hour and the worst hour is not as big as it is for Ask HN posts. Still, the best time to create a post to show something is between 12 and 13 o'clock, as Show HN posts created in this period receive, on average, the most comments and the most points.

For other posts, the hour still does not make much difference, since the gap between the best and the worst period is less than three points.
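Absolute gaps depend on the scale of each table. One way to compare the tables on a common footing is the ratio between the best and worst hourly averages; this is an extra illustration, not part of the original analysis. The numbers below are the best and worst point averages from the tables above. `best_to_worst_ratio` is a hypothetical name.

```python
def best_to_worst_ratio(best, worst):
    """How many times larger the best hourly average is than the worst."""
    return round(best / worst, 2)


# (best, worst) average points per post, read from the tables above.
point_extremes = {
    'Ask HN': (21.64, 7.63),
    'Show HN': (20.91, 10.52),
    'Other': (16.71, 13.79),
}
for label, (best, worst) in point_extremes.items():
    print(label, best_to_worst_ratio(best, worst))
# Ask HN 2.84
# Show HN 1.99
# Other 1.21
```

On this measure, the best hour is worth almost three times the worst for Ask HN posts, about twice for Show HN posts, and only about 1.2 times for other posts. This matches the ordering described above.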


## Conclusion

In this project we went through a data set containing data from almost 300,000 Hacker News submissions. Our goal was to determine if there's a particular moment of the day in which creating posts would draw more attention to the post.

We are now able to conclude that for Ask HN posts there are definitely better times of day to submit a post. For Show HN posts, there is still at least one best time of day to submit something, but the difference is not as big as for Ask HN posts. For the other posts, the difference between posting at a particular time and at any other time of day is so small that we need not take it into account.

Finally, these results can be useful if you want to show a product or a project and have it draw some attention, or if you need to ask a question and it is important that it receives a lot of answers. In those scenarios, you should definitely consider that some times of day are better than others for creating your post.
