# Best time to create a post to reach the biggest audience

### In the 21st century, thanks to the internet, influencers and content makers can reach their followers all around the world in a matter of seconds. However, because of today's high digital competition, news feeds have become overloaded, and even high-quality content risks going unseen.

### Our friends from [news.ycombinator.com](https://news.ycombinator.com) face the same problem. Users who create posts on the forum usually discuss important business-related topics and, of course, want to get responses as soon as possible.

#### To help users get the most attention, we investigated whether there is a correlation between:

1. the time a post was published
2. the number of comments it received.

The data set was taken from [Kaggle's Hacker News Posts data set](https://www.kaggle.com/hacker-news/hacker-news-posts/version/1).

#### Let's see what we got:

# Opening and formatting the data set

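```python
# Opening the data set and reading every row into a list of lists.
# The "with" block closes the file automatically once it has been read.
from csv import reader

with open('HN_posts_year_to_Sep_26_2016.csv', newline='', encoding='utf-8') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)
```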
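A plain `line.split(',')` would break any title that contains a comma, which is why `csv.reader` is used above: it treats a double-quoted field as a single value. A minimal sketch on a hypothetical two-line sample (made up for illustration, in the same column layout as the data set), with `io.StringIO` standing in for the file:

```python
import csv
import io

# Hypothetical sample in the data set's column layout; the title contains
# commas, so the field is wrapped in double quotes as CSV requires.
sample = (
    'id,title,url,num_points,num_comments,author,created_at\n'
    '1,"Ask HN: Tabs, spaces, or both?",,3,5,someone,9/25/2016 21:50\n'
)

rows = list(csv.reader(io.StringIO(sample)))
naive = sample.splitlines()[1].split(',')

print(len(rows[1]))  # 7: csv.reader keeps the quoted title as one field
print(rows[1][1])    # Ask HN: Tabs, spaces, or both?
print(len(naive))    # 9: a plain split cuts the title into pieces
```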

```python
# Checking the header
header = hn[0]
print(header)
```

```
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
```
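The rest of the notebook refers to columns by position (`row[4]` for `num_comments`, `row[-1]` for `created_at`). A minimal sketch of turning the header into a name-to-index lookup so those positions are easier to read; `col` is a hypothetical helper name, and the data row is one of the "Ask HN" rows printed later in this notebook:

```python
# The header row printed above, copied as a literal so the sketch is self-contained
header = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Map each column name to its position in a row
col = {name: index for index, name in enumerate(header)}

# One "Ask HN" row from the data set, used for illustration
row = ['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7',
       'Sevrene', '9/26/2016 2:53']

print(col['num_comments'])            # 4, the index used as row[4] in the analysis
print(int(row[col['num_comments']]))  # 7
print(row[col['created_at']])         # 9/26/2016 2:53
```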


```python
# Checking the first four data rows (hn[0] is still the header)
for row in hn[1:5]:
    print(row)
```

```
['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']
['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']
['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']
['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']
```


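```python
# Removing the header row for more comfortable navigation.
# The check makes the cell safe to re-run: slicing a second time would drop a data row.
if hn[0] == header:
    hn = hn[1:]
```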

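```python
# Checking whether the header row was removed as expected
print(hn[0])
```

```
['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']
```

The row printed here is the second data row of the earlier output rather than the first one (`12579008`), which suggests the header-removal cell was run twice in this session and one data row was dropped. That row is neither an "Ask HN" nor a "Show HN" post, so the counts of those posts below are unaffected; only the count of other posts is one lower.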


# Two special types of posts: "Ask HN" and "Show HN"
### Let's investigate which of them is created more frequently and which one receives more comments

```python
# Counting the "Ask" and "Show" posts by their case-insensitive title prefix
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
```

```
9139
10158
273821
```
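The split above rests on a single rule: lower-case the title, then check its prefix. The same rule, wrapped in a hypothetical `post_type` helper, can be tried on titles from the outputs above plus one made-up lower-case title:

```python
def post_type(title):
    """Hypothetical helper: classify a post by its case-insensitive title prefix."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


titles = [
    'Ask HN: How do you pass on your work when you die?',
    'Show HN: Finding puns computationally',
    'SQLAR  the SQLite Archiver',
    'ask hn: lower-case prefixes count too',  # hypothetical title
]

for t in titles:
    print(post_type(t), '|', t)
```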


```python
print(ask_posts[0:5])
```

```
[['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'], ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'], ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'], ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'], ['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2', '00taffe', '9/25/2016 21:50']]
```


```python
print(show_posts[0:5])
```

```
[['12578335', 'Show HN: Finding puns computationally', 'http://puns.samueltaylor.org/', '2', '0', 'saamm', '9/26/2016 0:36'], ['12578182', 'Show HN: A simple library for complicated animations', 'https://christinecha.github.io/choreographer-js/', '1', '0', 'christinecha', '9/26/2016 0:01'], ['12578098', 'Show HN: WebGL visualization of DNA sequences', 'http://grondilu.github.io/dna.html', '1', '0', 'grondilu', '9/25/2016 23:44'], ['12577991', 'Show HN: Pomodoro-centric, heirarchical project management with ES6 modules', 'https://github.com/jakebian/zeal', '2', '0', 'dbranes', '9/25/2016 23:17'], ['12577142', 'Show HN: Jumble  Essays on the go #PaulInYourPocket', 'https://itunes.apple.com/us/app/jumble-find-startup-essay/id1150939197?ls=1&mt=8', '1', '1', 'ryderj', '9/25/2016 20:06']]
```


```python
# Calculating the total number of comments for all "Ask" posts
total_ask_comments = 0

for row in ask_posts:
    num_comments = int(row[4])  # num_comments is stored as text
    total_ask_comments += num_comments

print(total_ask_comments)
```

```
94986
```


```python
# Calculating the average number of comments per "Ask" post
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)
```

```
10.393478498741656
```


```python
# Calculating the total number of comments for all "Show" posts
total_show_comments = 0

for row in show_posts:
    num_comments = int(row[4])  # num_comments is stored as text
    total_show_comments += num_comments

print(total_show_comments)
```

```
49633
```


```python
# Calculating the average number of comments per "Show" post
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)
```

```
4.886099625910612
```
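Both averages follow the same pattern: sum the `num_comments` column (stored as text, hence `int`) and divide by the number of posts. A sketch of that pattern as one reusable function; `average_comments` is a hypothetical name, and the sample rows are the first five "Ask HN" posts printed earlier, trimmed to their first five columns:

```python
def average_comments(posts, comments_index=4):
    """Hypothetical helper: mean number of comments per post.

    Assumes num_comments sits at comments_index (column 4 in this data set).
    An empty list raises ZeroDivisionError, just like the cells above.
    """
    return sum(int(row[comments_index]) for row in posts) / len(posts)


# The first five "Ask HN" rows printed above, trimmed to id, title, url, num_points, num_comments
sample_ask = [
    ['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7'],
    ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3'],
    ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0'],
    ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3'],
    ['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2'],
]

print(average_comments(sample_ask))  # (7 + 3 + 0 + 3 + 2) / 5 = 3.0
```

On the full data this gives the same values as the cells above, e.g. `average_comments(ask_posts)` and `average_comments(show_posts)`.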


# Findings:
### "Ask" posts receive roughly twice as many comments as "Show" posts on average (10.39 vs 4.89 comments per post).
### From this point on, we'll work only with "Ask" posts, as they attract more users' attention.

```python
# Creating a list of [creation time, number of comments] pairs, one per "Ask" post
result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

print(len(result_list))
print(result_list[0:3])
```

```
9139
[['9/26/2016 2:53', 7], ['9/26/2016 1:17', 3], ['9/25/2016 22:57', 0]]
```


```python
# Creating 2 dictionaries, keyed by hour of the day ('00' to '23'), which store:
# 1. the number of "Ask" posts created during each hour
# 2. the total number of comments those posts received
import datetime as dt

counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list:
    time = row[0]
    comments = row[1]
    date_object = dt.datetime.strptime(time, date_format)
    hour = date_object.strftime('%H')  # zero-padded hour, e.g. '02'

    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments

print(counts_by_hour)
print(comments_by_hour)
```

```
{'02': 269, '01': 282, '22': 383, '21': 518, '19': 552, '17': 587, '15': 646, '14': 513, '13': 444, '11': 312, '10': 282, '09': 222, '07': 226, '03': 271, '23': 343, '20': 510, '16': 579, '08': 257, '00': 301, '18': 614, '12': 342, '04': 243, '06': 234, '05': 209}
{'02': 2996, '01': 2089, '22': 3372, '21': 4500, '19': 3954, '17': 5547, '15': 18525, '14': 4972, '13': 7245, '11': 2797, '10': 3013, '09': 1477, '07': 1585, '03': 2154, '23': 2297, '20': 4462, '16': 4466, '08': 2362, '00': 2277, '18': 4877, '12': 4234, '04': 2360, '06': 1587, '05': 1838}
```
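The two dictionaries can also be built without the `if ... not in` branch by using `collections.defaultdict`, which starts every new key at 0. This is an alternative sketch, not the code that produced the results above, run on the first three pairs of `result_list` shown earlier:

```python
import datetime as dt
from collections import defaultdict

# The first three [created_at, num_comments] pairs printed earlier
result_list = [['9/26/2016 2:53', 7], ['9/26/2016 1:17', 3], ['9/25/2016 22:57', 0]]

# defaultdict(int) returns 0 for a missing key, so "+= 1" works on first sight of an hour
counts_by_hour = defaultdict(int)
comments_by_hour = defaultdict(int)

for created_at, comments in result_list:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime('%H')
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += comments

print(dict(counts_by_hour))    # {'02': 1, '01': 1, '22': 1}
print(dict(comments_by_hour))  # {'02': 7, '01': 3, '22': 0}
```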


```python
# Calculating the average number of comments per post for each hour of creation.
# Both dictionaries were filled together, so they always share the same keys.
avg_by_hour = []

for hour in comments_by_hour:
    avg = comments_by_hour[hour] / counts_by_hour[hour]
    avg_by_hour.append([hour, avg])

print(avg_by_hour)
```

```
[['02', 11.137546468401487], ['01', 7.407801418439717], ['22', 8.804177545691905], ['21', 8.687258687258687], ['19', 7.163043478260869], ['17', 9.449744463373083], ['15', 28.676470588235293], ['14', 9.692007797270955], ['13', 16.31756756756757], ['11', 8.96474358974359], ['10', 10.684397163120567], ['09', 6.653153153153153], ['07', 7.013274336283186], ['03', 7.948339483394834], ['23', 6.696793002915452], ['20', 8.749019607843136], ['16', 7.713298791018998], ['08', 9.190661478599221], ['00', 7.5647840531561465], ['18', 7.94299674267101], ['12', 12.380116959064328], ['04', 9.7119341563786], ['06', 6.782051282051282], ['05', 8.794258373205741]]
```
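Because `counts_by_hour` and `comments_by_hour` share their keys, the averages can also be computed in a single dictionary comprehension. A sketch using three hours whose totals are copied from the outputs above:

```python
# Three hours copied from the counts_by_hour and comments_by_hour output above
counts_by_hour = {'15': 646, '13': 444, '02': 269}
comments_by_hour = {'15': 18525, '13': 7245, '02': 2996}

# One entry per hour: total comments divided by number of posts
avg_by_hour = {hour: comments_by_hour[hour] / counts_by_hour[hour] for hour in counts_by_hour}

for hour, avg in avg_by_hour.items():
    print(hour, round(avg, 2))  # 15 28.68, then 13 16.32, then 02 11.14
```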


```python
# Swapping the positions of the average and the hour so that sorting
# the lists compares the averages first
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(swap_avg_by_hour)
```

```
[[11.137546468401487, '02'], [7.407801418439717, '01'], [8.804177545691905, '22'], [8.687258687258687, '21'], [7.163043478260869, '19'], [9.449744463373083, '17'], [28.676470588235293, '15'], [9.692007797270955, '14'], [16.31756756756757, '13'], [8.96474358974359, '11'], [10.684397163120567, '10'], [6.653153153153153, '09'], [7.013274336283186, '07'], [7.948339483394834, '03'], [6.696793002915452, '23'], [8.749019607843136, '20'], [7.713298791018998, '16'], [9.190661478599221, '08'], [7.5647840531561465, '00'], [7.94299674267101, '18'], [12.380116959064328, '12'], [9.7119341563786, '04'], [6.782051282051282, '06'], [8.794258373205741, '05']]
```


```python
# Sorting the results by average number of comments, in descending order
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print(sorted_swap[:5])
```

```
[[28.676470588235293, '15'], [16.31756756756757, '13'], [12.380116959064328, '12'], [11.137546468401487, '02'], [10.684397163120567, '10']]
```
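Swapping each pair only serves to make `sorted` compare the averages. Passing a `key` function gives the same ordering without building a second list, and because the hour strings are already zero-padded, `f"{hour}:00"` produces the same label as a `strptime`/`strftime` round trip. A sketch on a few pairs copied from `avg_by_hour`:

```python
# A few [hour, average] pairs copied from avg_by_hour above
avg_by_hour = [['02', 11.137546468401487], ['15', 28.676470588235293],
               ['13', 16.31756756756757], ['05', 8.794258373205741]]

# key= makes sorted() compare the averages (index 1) directly
top_hours = sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg in top_hours[:3]:
    print(f"{hour}:00: {avg:.2f} average comments per post")
# 15:00: 28.68 average comments per post
# 13:00: 16.32 average comments per post
# 02:00: 11.14 average comments per post
```

One small difference: with `key=`, hours with equal averages keep their original order (Python's sort is stable), whereas the swapped version breaks such ties by comparing the hour strings.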


```python
# Displaying the top 5 results of our investigation
date_format = "%H"

for row in sorted_swap[:5]:
    comments = row[0]
    time = row[1]
    date_object = dt.datetime.strptime(time, date_format)
    hours_date_object = date_object.strftime('%H:%M')
    print("{}: {:.2f} average comments per post".format(hours_date_object, comments))
```

```
15:00: 28.68 average comments per post
13:00: 16.32 average comments per post
12:00: 12.38 average comments per post
02:00: 11.14 average comments per post
10:00: 10.68 average comments per post
```


# Result:

## As we can see, "Ask" posts created between 15:00 and 15:59 US Eastern Time received the most comments on average:


- 15:00: 28.68 average comments per post
- 13:00: 16.32 average comments per post
- 12:00: 12.38 average comments per post
- 02:00: 11.14 average comments per post
- 10:00: 10.68 average comments per post



# Conclusion:

#### Our investigation suggests that, to reach the biggest audience with an "Ask HN" post, you should submit it between 15:00 and 16:00 US Eastern Time, the hour whose posts received the most comments on average.

#### The main aim of this research was to show a procedure for finding the most effective time to submit a post. Keep in mind that the research is based on [news.ycombinator.com](https://news.ycombinator.com) data from the year up to 26 September 2016; to obtain up-to-date results, repeat it on the most recent data set available. Also note that these results won't necessarily apply to other websites such as Facebook or Twitter: each platform requires its own research based on recent data.
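Since the analysis should be repeated on newer data, the steps above can be packaged into one reusable function. This is a sketch, not the code used for the results above: `best_hours_for_ask_posts` and `load_rows` are hypothetical names, and the sketch assumes a newer export keeps the same columns (title in column 1, `num_comments` in column 4, `created_at` in column 6) and the same date format. The usage example runs it on the five "Ask HN" rows printed earlier:

```python
import csv
import datetime as dt
from collections import defaultdict


def best_hours_for_ask_posts(rows, top_n=5, date_format="%m/%d/%Y %H:%M"):
    """Hypothetical helper: rank hours by average comments per "Ask HN" post.

    Assumes this data set's column layout and rows without a header.
    """
    counts = defaultdict(int)
    comments = defaultdict(int)
    for row in rows:
        if not row[1].lower().startswith('ask hn'):
            continue  # keep only "Ask HN" posts
        hour = dt.datetime.strptime(row[6], date_format).strftime('%H')
        counts[hour] += 1
        comments[hour] += int(row[4])
    averages = [(hour, comments[hour] / counts[hour]) for hour in counts]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:top_n]


def load_rows(path):
    """Hypothetical helper: read a CSV export with the same columns and drop its header."""
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.reader(f))[1:]


# The five "Ask HN" rows printed earlier in this notebook
sample = [
    ['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'],
    ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'],
    ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'],
    ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'],
    ['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2', '00taffe', '9/25/2016 21:50'],
]

for hour, avg in best_hours_for_ask_posts(sample):
    print(f"{hour}:00: {avg:.2f} average comments per post")
# 02:00: 7.00, 01:00: 3.00, 21:00: 2.00, 22:00: 1.50

# With a newer export (hypothetical file name):
# best_hours_for_ask_posts(load_rows('newer_hn_posts.csv'))
```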
