# Exploring Hacker News Posts

In this project I compare different types of posts from [Hacker News](https://www.kaggle.com/hacker-news/hacker-news-posts), a popular site where technology-related posts are voted and commented on. The two types of posts I'll explore first are Ask HN and Show HN posts.

Users submit Ask HN posts to ask the Hacker News community a specific question. Likewise, users submit Show HN posts to show the Hacker News community a project, product, or just generally something interesting.

All remaining posts are simply called "other" posts for the sake of this analysis, and are compared against the Ask HN and Show HN posts.

I am specifically comparing these types of posts to determine the following:

* Do Ask HN or Show HN posts receive more comments on average?
* Do posts created at a certain time receive more comments on average?
* Do Ask HN or Show HN posts receive more points on average?
* Do posts created at a certain time receive more points on average?
* How do the average comments on other posts compare?
* How do the average points on other posts compare?

It should be noted that the data set I am working with was reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments, and then randomly sampling from the remaining submissions.

---

## Reading the file

---

First I'll read in the data and separate the header row from the rest of the rows.

In [212]:
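from csv import reader

# Open the file in a with-block so it is closed once the rows are read.
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))

hn_headers = hn[0]  # the first row holds the column names
hn = hn[1:]         # the remaining rows are the posts

print(hn_headers)
print(hn[:5])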

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


We can see that each row contains the post's id, title, URL, number of points, number of comments, author, and creation date. The analysis below uses the `title`, `num_points`, `num_comments`, and `created_at` columns.
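The code in the rest of this notebook refers to columns by position (for example `row[1]` for the title). As a minimal sketch, the header row can be turned into a name-to-position lookup so the intent of each index is easier to read. The name `col` is my own and is not used elsewhere in the notebook; the header list and sample row are copied from the output above.

```python
# Header row and one data row, copied from the printed output above.
hn_headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
row = ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/',
       '386', '52', 'ne0phyte', '8/4/2016 11:52']

# Map each column name to its position in a row (hypothetical helper name).
col = {name: position for position, name in enumerate(hn_headers)}

print(row[col['title']])
print(int(row[col['num_comments']]))  # values are read as strings, so convert before doing math
print(row[col['created_at']])
```

The positions this produces (1 for the title, 3 for points, 4 for comments, 6 for the creation date) are the same indexes used in the cells below.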

---

## Extracting Ask HN and Show HN posts

---

First, I'll identify posts whose titles begin with Ask HN or Show HN and place them in two separate lists, with all remaining posts in a third list. Separating the data makes it easier to analyze in the following steps.

In [213]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn :
    title = row[1]
    if title.lower().startswith('ask hn') :
        ask_posts.append(row)
    elif title.lower().startswith('show hn') :
        show_posts.append(row)
    else :
        other_posts.append(row)

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


In [214]:
print(ask_posts[:5])

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']]


In [215]:
print(show_posts[:5])

[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'], ['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11'], ['10872799', 'Show HN: GeoScreenshot  Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', '1', '9', 'kpsychwave', '1/9/2016 20:45']]
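The prefix test used to split the posts can also be written as a small function, which makes it easy to check on a few titles. This is only a sketch of the same rule; the function name `classify_title` and the `groups` dictionary are my own, and the titles are taken from the outputs above.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the (case-insensitive) title prefix."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]

# Group the titles by type, mirroring the three lists built above.
groups = {'ask': [], 'show': [], 'other': []}
for title in titles:
    groups[classify_title(title)].append(title)

print({kind: len(items) for kind, items in groups.items()})
```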


Let's start by exploring the number of comments for each type of post.

---

# Comments

---

---

## Calculating the Average Number of Comments for Ask HN and Show HN Posts

---

Now that the ask posts and show posts are separated into different lists, I'll calculate the average number of comments each type of post receives.

In [216]:
total_ask_comments = 0

for post in ask_posts :
    num_comments = post[4]
    num_comments = int(num_comments)
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments / len(ask_posts)

print(avg_ask_comments)

14.038417431192661


In [217]:
total_show_comments = 0

for post in show_posts :
    num_comments = post[4]
    num_comments = int(num_comments)
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments / len(show_posts)

print(avg_show_comments)

10.31669535283993


On average, ask posts in the sample receive approximately 14 comments, whereas show posts receive approximately 10. Since ask posts receive more comments on average, I'll focus the rest of the comment analysis on these posts.
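The two averaging loops above differ only in the list they iterate over. A possible refactoring, shown here as a sketch, moves the calculation into one function that takes the list and the column position. The name `average_column` is hypothetical, and the two sample rows are copied from the ask posts printed earlier, so the result here is not the average for the full data set.

```python
def average_column(posts, index):
    """Mean of the integer values stored as strings at position `index` in each row."""
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    return sum(int(post[index]) for post in posts) / len(posts)

sample_ask_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6',
     'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
     '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]

print(average_column(sample_ask_posts, 4))  # comments: (6 + 29) / 2
print(average_column(sample_ask_posts, 3))  # points: (2 + 28) / 2
```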

---

## Finding the Number of Ask Posts and Comments by Hour Created

---

Next, I'll determine whether an ask post receives more comments when it is created at a certain time. To do this I will:
* Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
* Calculate the average number of comments ask posts receive by hour created.

In [218]:
# calculating the amount of ask posts and comments by hour created

import datetime as dt

result_list = []

for post in ask_posts :
    created_at = post[6]
    num_comments = post[4]
    num_comments = int(num_comments)
    result_list.append([created_at, num_comments])

counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list :
    date = row[0]
    comments = row[1]
    hour = dt.datetime.strptime(date, date_format).strftime("%H")
    
    if hour not in counts_by_hour :
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else :
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments

print(counts_by_hour)
print(comments_by_hour)

{'12': 73, '07': 34, '22': 71, '16': 108, '05': 46, '10': 59, '06': 44, '14': 107, '19': 110, '17': 100, '18': 109, '21': 109, '11': 58, '00': 55, '13': 85, '23': 68, '08': 48, '20': 80, '15': 116, '03': 54, '04': 47, '09': 45, '01': 60, '02': 58}
{'12': 687, '07': 267, '22': 479, '16': 1814, '05': 464, '10': 793, '06': 397, '14': 1416, '19': 1188, '17': 1146, '18': 1439, '21': 1745, '11': 641, '00': 447, '13': 1253, '23': 543, '08': 492, '20': 1722, '15': 4477, '03': 421, '04': 337, '09': 251, '01': 683, '02': 1381}
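The `if hour not in counts_by_hour` check above handles the first post seen in each hour. One alternative, sketched here under my own naming, is `collections.defaultdict`, which starts every missing key at zero so the branch is not needed. The first two sample pairs come from the ask posts printed earlier; the third is a made-up pair added so that one hour contains two posts.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = "%m/%d/%Y %H:%M"

def totals_by_hour(pairs):
    """Return (counts, totals) keyed by zero-padded hour for (created_at, value) pairs."""
    counts = defaultdict(int)  # missing hours start at 0
    totals = defaultdict(int)
    for created_at, value in pairs:
        hour = dt.datetime.strptime(created_at, DATE_FORMAT).strftime("%H")
        counts[hour] += 1
        totals[hour] += value
    return dict(counts), dict(totals)

sample = [
    ("8/16/2016 9:55", 6),     # from the ask posts shown earlier
    ("11/22/2015 13:43", 29),  # from the ask posts shown earlier
    ("8/17/2016 9:05", 4),     # hypothetical pair, not in the data set
]

counts, totals = totals_by_hour(sample)
print(counts)
print(totals)
```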


---

## Calculating the Average Number of Comments for Ask HN Posts by Hour

---

Next, I'll calculate the average number of comments for posts created during each hour of the day. 

In [219]:
# Calculate the average amount of comments ask posts created at each hour of the day receive.

avg_by_hour = []

for hour in comments_by_hour :
    avg_by_hour.append([hour, (comments_by_hour[hour]/counts_by_hour[hour])])
    
avg_by_hour

[['12', 9.41095890410959],
 ['07', 7.852941176470588],
 ['22', 6.746478873239437],
 ['16', 16.796296296296298],
 ['05', 10.08695652173913],
 ['10', 13.440677966101696],
 ['06', 9.022727272727273],
 ['14', 13.233644859813085],
 ['19', 10.8],
 ['17', 11.46],
 ['18', 13.20183486238532],
 ['21', 16.009174311926607],
 ['11', 11.051724137931034],
 ['00', 8.127272727272727],
 ['13', 14.741176470588234],
 ['23', 7.985294117647059],
 ['08', 10.25],
 ['20', 21.525],
 ['15', 38.5948275862069],
 ['03', 7.796296296296297],
 ['04', 7.170212765957447],
 ['09', 5.5777777777777775],
 ['01', 11.383333333333333],
 ['02', 23.810344827586206]]

---

## Sorting and Printing Values from a List of Lists for Comments on Ask Posts

---

In [220]:
swap_avg_by_hour = []

for row in avg_by_hour :
    swap_avg_by_hour.append(
    [row[1], row[0]])
swap_avg_by_hour

[[9.41095890410959, '12'],
 [7.852941176470588, '07'],
 [6.746478873239437, '22'],
 [16.796296296296298, '16'],
 [10.08695652173913, '05'],
 [13.440677966101696, '10'],
 [9.022727272727273, '06'],
 [13.233644859813085, '14'],
 [10.8, '19'],
 [11.46, '17'],
 [13.20183486238532, '18'],
 [16.009174311926607, '21'],
 [11.051724137931034, '11'],
 [8.127272727272727, '00'],
 [14.741176470588234, '13'],
 [7.985294117647059, '23'],
 [10.25, '08'],
 [21.525, '20'],
 [38.5948275862069, '15'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [5.5777777777777775, '09'],
 [11.383333333333333, '01'],
 [23.810344827586206, '02']]

In [221]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
sorted_swap

[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.46, '17'],
 [11.383333333333333, '01'],
 [11.051724137931034, '11'],
 [10.8, '19'],
 [10.25, '08'],
 [10.08695652173913, '05'],
 [9.41095890410959, '12'],
 [9.022727272727273, '06'],
 [8.127272727272727, '00'],
 [7.985294117647059, '23'],
 [7.852941176470588, '07'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '09']]
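The swap step above exists only so that `sorted` compares the averages before the hours. As an alternative sketch, `sorted` accepts a `key` function, which sorts the original `[hour, average]` pairs directly without building a second list. The four pairs below are copied from the `avg_by_hour` output above; the variable name `sorted_avg` is my own.

```python
# A few [hour, average] pairs copied from the avg_by_hour output above.
avg_by_hour = [
    ['09', 5.5777777777777775],
    ['15', 38.5948275862069],
    ['20', 21.525],
    ['02', 23.810344827586206],
]

# Sort on the second element (the average), largest first.
sorted_avg = sorted(avg_by_hour, key=lambda row: row[1], reverse=True)

for hour, avg in sorted_avg[:3]:
    print("{}:00: {:.2f} average comments per post".format(hour, avg))
```

One difference to be aware of: with the swapped lists, ties in the average are broken by the hour string, whereas the `key` version keeps tied rows in their original order.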

In [222]:
print("Top 5 Hours for 'Ask HN' Posts Comments")
for avg, hour in sorted_swap[:5] :
    print(
    "{}: {:.2f} average comments per post".format(
        dt.datetime.strptime(hour,"%H").strftime("%H:%M"),avg
        )
    )

Top 5 Hours for 'Ask HN' Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


According to the data set documentation, the timezone used is Eastern Time in the US. The hour that receives the most comments per post on average is 15:00, or 3:00 pm EST, with an average of 38.59 comments per post. 
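The 12-hour labels quoted in the text (such as 3:00 pm for 15:00) can be produced from the hour strings with a little arithmetic. The sketch below uses a hypothetical helper, `to_12_hour`, and does not perform any time zone conversion; it only reformats the hour.

```python
def to_12_hour(hour):
    """Convert a 24-hour string such as '15' into a label such as '3:00 pm'."""
    h = int(hour)
    suffix = "am" if h < 12 else "pm"
    h12 = h % 12 or 12  # hours 0 and 12 are both shown as 12
    return "{}:00 {}".format(h12, suffix)

for hour in ['15', '02', '20', '16', '21']:
    print(hour, "->", to_12_hour(hour))
```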

Now let's start exploring the number of points for each type of post.

---

# Points

---

---

## Calculating the Average Number of Points for Ask HN and Show HN Posts

---

Below I'll calculate the average number of points each type of post receives.

In [223]:
total_ask_points = 0

for post in ask_posts :
    num_points = post[3]
    num_points = int(num_points)
    total_ask_points += num_points
    
avg_ask_points = total_ask_points / len(ask_posts)

print(avg_ask_points)

15.061926605504587


In [224]:
total_show_points = 0

for post in show_posts :
    num_points = post[3]
    num_points = int(num_points)
    total_show_points += num_points
    
avg_show_points = total_show_points / len(show_posts)

print(avg_show_points)

27.555077452667813


On average, show posts in the sample receive approximately 28 points, whereas ask posts receive approximately 15. Since show posts receive more points on average, I'll focus the rest of the points analysis on these posts.

---

## Finding the Number of Show Posts and Points by Hour Created

---

Next, I'll determine whether a show post receives more points when it is created at a certain time. To do this I will:
* Calculate the number of show posts created in each hour of the day, along with the number of points received.
* Calculate the average number of points show posts receive by hour created.

In [225]:
# calculating the amount of show posts and points by hour created

import datetime as dt

result_list = []

for post in show_posts :
    created_at = post[6]
    num_points = post[3]
    num_points = int(num_points)
    result_list.append([created_at, num_points])

counts_by_hour = {}
points_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list :
    date = row[0]
    points = row[1]
    hour = dt.datetime.strptime(date, date_format).strftime("%H")
    
    if hour not in counts_by_hour :
        counts_by_hour[hour] = 1
        points_by_hour[hour] = points
    else :
        counts_by_hour[hour] += 1
        points_by_hour[hour] += points

print(counts_by_hour)
print(points_by_hour)

{'12': 61, '17': 93, '22': 46, '16': 93, '05': 19, '10': 36, '11': 44, '14': 86, '18': 61, '07': 26, '00': 31, '19': 55, '21': 47, '06': 16, '13': 99, '23': 36, '08': 34, '20': 60, '15': 78, '03': 27, '04': 26, '02': 30, '01': 28, '09': 30}
{'12': 2543, '17': 2521, '22': 1856, '16': 2634, '05': 104, '10': 681, '11': 1480, '14': 2187, '18': 2215, '07': 494, '00': 1173, '19': 1702, '21': 866, '06': 375, '13': 2438, '23': 1526, '08': 519, '20': 1819, '15': 2228, '03': 679, '04': 386, '02': 340, '01': 700, '09': 553}


---

## Calculating the Average Number of Points for Show HN Posts by Hour

---

Next, I'll calculate the average number of points for posts created during each hour of the day. 

In [226]:
# Calculate the average amount of points show posts created at each hour of the day receive.

avg_by_hour = []

for hour in points_by_hour :
    avg_by_hour.append([hour, (points_by_hour[hour]/counts_by_hour[hour])])
    
avg_by_hour

[['12', 41.68852459016394],
 ['17', 27.107526881720432],
 ['22', 40.34782608695652],
 ['16', 28.322580645161292],
 ['05', 5.473684210526316],
 ['10', 18.916666666666668],
 ['11', 33.63636363636363],
 ['14', 25.430232558139537],
 ['18', 36.31147540983606],
 ['07', 19.0],
 ['00', 37.83870967741935],
 ['19', 30.945454545454545],
 ['21', 18.425531914893618],
 ['06', 23.4375],
 ['13', 24.626262626262626],
 ['23', 42.388888888888886],
 ['08', 15.264705882352942],
 ['20', 30.316666666666666],
 ['15', 28.564102564102566],
 ['03', 25.14814814814815],
 ['04', 14.846153846153847],
 ['02', 11.333333333333334],
 ['01', 25.0],
 ['09', 18.433333333333334]]

---

## Sorting and Printing Values from a List of Lists for Points on Show Posts

---

In [227]:
swap_avg_by_hour = []

for row in avg_by_hour :
    swap_avg_by_hour.append(
    [row[1], row[0]])
swap_avg_by_hour

[[41.68852459016394, '12'],
 [27.107526881720432, '17'],
 [40.34782608695652, '22'],
 [28.322580645161292, '16'],
 [5.473684210526316, '05'],
 [18.916666666666668, '10'],
 [33.63636363636363, '11'],
 [25.430232558139537, '14'],
 [36.31147540983606, '18'],
 [19.0, '07'],
 [37.83870967741935, '00'],
 [30.945454545454545, '19'],
 [18.425531914893618, '21'],
 [23.4375, '06'],
 [24.626262626262626, '13'],
 [42.388888888888886, '23'],
 [15.264705882352942, '08'],
 [30.316666666666666, '20'],
 [28.564102564102566, '15'],
 [25.14814814814815, '03'],
 [14.846153846153847, '04'],
 [11.333333333333334, '02'],
 [25.0, '01'],
 [18.433333333333334, '09']]

In [228]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
sorted_swap

[[42.388888888888886, '23'],
 [41.68852459016394, '12'],
 [40.34782608695652, '22'],
 [37.83870967741935, '00'],
 [36.31147540983606, '18'],
 [33.63636363636363, '11'],
 [30.945454545454545, '19'],
 [30.316666666666666, '20'],
 [28.564102564102566, '15'],
 [28.322580645161292, '16'],
 [27.107526881720432, '17'],
 [25.430232558139537, '14'],
 [25.14814814814815, '03'],
 [25.0, '01'],
 [24.626262626262626, '13'],
 [23.4375, '06'],
 [19.0, '07'],
 [18.916666666666668, '10'],
 [18.433333333333334, '09'],
 [18.425531914893618, '21'],
 [15.264705882352942, '08'],
 [14.846153846153847, '04'],
 [11.333333333333334, '02'],
 [5.473684210526316, '05']]

In [229]:
print("Top 5 Hours for 'Show HN' Posts Points")
for avg, hour in sorted_swap[:5] :
    print(
    "{}: {:.2f} average points per post".format(
        dt.datetime.strptime(hour,"%H").strftime("%H:%M"),avg
        )
    )

Top 5 Hours for 'Show HN' Posts Points
23:00: 42.39 average points per post
12:00: 41.69 average points per post
22:00: 40.35 average points per post
00:00: 37.84 average points per post
18:00: 36.31 average points per post


The hour that receives the most points per post on average is 23:00, or 11:00 pm EST, with an average of 42.39 points per post.
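The counting, averaging, and sorting steps are the same for comments and points and for every post type, so they could be collected into a single function. The following is a sketch of that idea rather than the code used to produce the results in this notebook: `top_hours` and its parameters are hypothetical names. The first two sample rows are copied from the show posts printed earlier, and the third is a made-up row so that one hour has two posts to average.

```python
import datetime as dt

def top_hours(posts, value_index, n=5, date_index=6, date_format="%m/%d/%Y %H:%M"):
    """Return the n (hour, average) pairs with the highest average of the chosen column."""
    counts = {}
    totals = {}
    for post in posts:
        hour = dt.datetime.strptime(post[date_index], date_format).strftime("%H")
        counts[hour] = counts.get(hour, 0) + 1
        totals[hour] = totals.get(hour, 0) + int(post[value_index])
    averages = [(hour, totals[hour] / counts[hour]) for hour in counts]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:n]

sample_show_posts = [
    ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/',
     '747', '102', 'dhotson', '11/29/2015 22:46'],
    ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm',
     'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'],
    # Hypothetical row, not in the data set, added to share the 22:00 hour.
    ['0', 'Show HN: hypothetical example row', '', '3', '0', 'nobody', '1/1/2016 22:10'],
]

for hour, avg in top_hours(sample_show_posts, value_index=3):
    print("{}:00: {:.2f} average points per post".format(hour, avg))
```

Passing `value_index=4` instead would rank the hours by average comments.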

Now let's explore the number of comments for other posts.

---

# Other

---

In [230]:
print(other_posts[:5])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


---

## Calculating the Average Number of Comments for Other Posts

---

Here I'll calculate the average number of comments other posts receive.

In [231]:
total_other_comments = 0

for post in other_posts :
    num_comments = post[4]
    num_comments = int(num_comments)
    total_other_comments += num_comments
    
avg_other_comments = total_other_comments / len(other_posts)

print(avg_other_comments)

26.8730371059672


On average, other posts in the sample receive approximately 27 comments.

---

## Finding the Number of Other Posts and Comments by Hour Created

---

Next, I'll determine whether other posts receive more comments when they are created at a certain time. To do this I will:
* Calculate the number of other posts created in each hour of the day, along with the number of comments received.
* Calculate the average number of comments other posts receive by hour created.

In [232]:
# calculating the amount of other posts and comments by hour created

import datetime as dt

result_list = []

for post in other_posts :
    created_at = post[6]
    num_comments = post[4]
    num_comments = int(num_comments)
    result_list.append([created_at, num_comments])

counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list :
    date = row[0]
    comments = row[1]
    hour = dt.datetime.strptime(date, date_format).strftime("%H")
    
    if hour not in counts_by_hour :
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else :
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments

print(counts_by_hour)
print(comments_by_hour)

{'12': 789, '06': 408, '22': 758, '11': 660, '05': 388, '10': 591, '16': 1101, '14': 958, '18': 1084, '17': 1169, '19': 980, '21': 874, '07': 448, '00': 611, '13': 918, '23': 674, '08': 496, '20': 911, '15': 1040, '03': 407, '04': 454, '09': 534, '01': 500, '02': 441}
{'12': 23944, '06': 8714, '22': 17635, '11': 19532, '05': 9768, '10': 15728, '16': 27959, '14': 30973, '18': 29186, '17': 32727, '19': 26167, '21': 20635, '07': 12010, '00': 16544, '13': 28363, '23': 16592, '08': 13405, '20': 21080, '15': 30700, '03': 10918, '04': 10953, '09': 14732, '01': 11536, '02': 12254}


---

## Calculating the Average Number of Comments for Other Posts by Hour

---

Next, I'll calculate the average number of comments for posts created during each hour of the day. 

In [233]:
# Calculate the average amount of comments other posts created at each hour of the day receive.

avg_by_hour = []

for hour in comments_by_hour :
    avg_by_hour.append([hour, (comments_by_hour[hour]/counts_by_hour[hour])])
    
avg_by_hour

[['12', 30.34727503168568],
 ['06', 21.357843137254903],
 ['22', 23.265171503957784],
 ['11', 29.593939393939394],
 ['05', 25.175257731958762],
 ['10', 26.612521150592215],
 ['16', 25.394187102633968],
 ['14', 32.33089770354906],
 ['18', 26.924354243542435],
 ['17', 27.99572284003422],
 ['19', 26.701020408163266],
 ['21', 23.60983981693364],
 ['07', 26.808035714285715],
 ['00', 27.076923076923077],
 ['13', 30.896514161220043],
 ['23', 24.617210682492583],
 ['08', 27.026209677419356],
 ['20', 23.13940724478595],
 ['15', 29.51923076923077],
 ['03', 26.825552825552826],
 ['04', 24.125550660792953],
 ['09', 27.588014981273407],
 ['01', 23.072],
 ['02', 27.786848072562357]]

---

## Sorting and Printing Values from a List of Lists for Comments on Other Posts

---

In [234]:
swap_avg_by_hour = []

for row in avg_by_hour :
    swap_avg_by_hour.append(
    [row[1], row[0]])
swap_avg_by_hour

[[30.34727503168568, '12'],
 [21.357843137254903, '06'],
 [23.265171503957784, '22'],
 [29.593939393939394, '11'],
 [25.175257731958762, '05'],
 [26.612521150592215, '10'],
 [25.394187102633968, '16'],
 [32.33089770354906, '14'],
 [26.924354243542435, '18'],
 [27.99572284003422, '17'],
 [26.701020408163266, '19'],
 [23.60983981693364, '21'],
 [26.808035714285715, '07'],
 [27.076923076923077, '00'],
 [30.896514161220043, '13'],
 [24.617210682492583, '23'],
 [27.026209677419356, '08'],
 [23.13940724478595, '20'],
 [29.51923076923077, '15'],
 [26.825552825552826, '03'],
 [24.125550660792953, '04'],
 [27.588014981273407, '09'],
 [23.072, '01'],
 [27.786848072562357, '02']]

In [235]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
sorted_swap

[[32.33089770354906, '14'],
 [30.896514161220043, '13'],
 [30.34727503168568, '12'],
 [29.593939393939394, '11'],
 [29.51923076923077, '15'],
 [27.99572284003422, '17'],
 [27.786848072562357, '02'],
 [27.588014981273407, '09'],
 [27.076923076923077, '00'],
 [27.026209677419356, '08'],
 [26.924354243542435, '18'],
 [26.825552825552826, '03'],
 [26.808035714285715, '07'],
 [26.701020408163266, '19'],
 [26.612521150592215, '10'],
 [25.394187102633968, '16'],
 [25.175257731958762, '05'],
 [24.617210682492583, '23'],
 [24.125550660792953, '04'],
 [23.60983981693364, '21'],
 [23.265171503957784, '22'],
 [23.13940724478595, '20'],
 [23.072, '01'],
 [21.357843137254903, '06']]

In [236]:
print("Top 5 Hours for Other Posts Comments")
for avg, hour in sorted_swap[:5] :
    print(
    "{}: {:.2f} average comments per post".format(
        dt.datetime.strptime(hour,"%H").strftime("%H:%M"),avg
        )
    )

Top 5 Hours for Other Posts Comments
14:00: 32.33 average comments per post
13:00: 30.90 average comments per post
12:00: 30.35 average comments per post
11:00: 29.59 average comments per post
15:00: 29.52 average comments per post


The hour that receives the most comments per post on average is 14:00, or 2:00 pm EST, with an average of 32.33 comments per post. 

Finally, let's explore the number of points for other posts.

---

## Calculating the Average Number of Points for Other Posts

---

Here I'll calculate the average number of points other posts receive.

In [237]:
total_other_points = 0

for post in other_posts :
    num_points = post[3]
    num_points = int(num_points)
    total_other_points += num_points
    
avg_other_points = total_other_points / len(other_posts)

print(avg_other_points)

55.4067698034198


On average, other posts in the sample receive approximately 55 points. 
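With all six overall averages computed, a small table makes them easier to compare side by side. The sketch below hard-codes the values printed earlier in this notebook instead of recomputing them; the dictionary layout and variable names are my own.

```python
# Overall averages copied from the outputs printed earlier in this notebook.
averages = {
    'Ask HN':  {'comments': 14.038417431192661, 'points': 15.061926605504587},
    'Show HN': {'comments': 10.31669535283993,  'points': 27.555077452667813},
    'Other':   {'comments': 26.8730371059672,   'points': 55.4067698034198},
}

print("{:<8} {:>9} {:>7}".format("type", "comments", "points"))
for kind, stats in averages.items():
    print("{:<8} {:>9.2f} {:>7.2f}".format(kind, stats['comments'], stats['points']))

# Post type with the highest overall average for each measure.
best_comments = max(averages, key=lambda kind: averages[kind]['comments'])
best_points = max(averages, key=lambda kind: averages[kind]['points'])
print("Most comments on average:", best_comments)
print("Most points on average:", best_points)
```

These are averages over all hours; the hour-by-hour results can rank the post types differently, as the 15:00 figure for ask posts shows.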

---

## Finding the Number of Other Posts and Points by Hour Created

---

Next, I'll determine whether other posts receive more points when they are created at a certain time. To do this I will:
* Calculate the number of other posts created in each hour of the day, along with the number of points received.
* Calculate the average number of points other posts receive by hour created.

In [238]:
# calculating the amount of other posts and points by hour created

import datetime as dt

result_list = []

for post in other_posts :
    created_at = post[6]
    num_points = post[3]
    num_points = int(num_points)
    result_list.append([created_at, num_points])

counts_by_hour = {}
points_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list :
    date = row[0]
    points = row[1]
    hour = dt.datetime.strptime(date, date_format).strftime("%H")
    
    if hour not in counts_by_hour :
        counts_by_hour[hour] = 1
        points_by_hour[hour] = points
    else :
        counts_by_hour[hour] += 1
        points_by_hour[hour] += points

print(counts_by_hour)
print(points_by_hour)

{'12': 789, '06': 408, '22': 758, '11': 660, '05': 388, '10': 591, '16': 1101, '14': 958, '18': 1084, '17': 1169, '19': 980, '21': 874, '07': 448, '00': 611, '13': 918, '23': 674, '08': 496, '20': 911, '15': 1040, '03': 407, '04': 454, '09': 534, '01': 500, '02': 441}
{'12': 45287, '06': 18864, '22': 38079, '11': 37995, '05': 19387, '10': 35746, '16': 59655, '14': 59191, '18': 58459, '17': 67777, '19': 58811, '21': 43149, '07': 25461, '00': 35718, '13': 57398, '23': 35068, '08': 26830, '20': 41218, '15': 62964, '03': 23167, '04': 22549, '09': 28802, '01': 25303, '02': 25786}


---

## Calculating the Average Number of Points for Other Posts by Hour

---

Next, I'll calculate the average number of points for posts created during each hour of the day. 

In [239]:
# Calculate the average amount of points other posts created at each hour of the day receive.

avg_by_hour = []

for hour in points_by_hour :
    avg_by_hour.append([hour, (points_by_hour[hour]/counts_by_hour[hour])])
    
avg_by_hour

[['12', 57.3979721166033],
 ['06', 46.23529411764706],
 ['22', 50.236147757255935],
 ['11', 57.56818181818182],
 ['05', 49.96649484536083],
 ['10', 60.4839255499154],
 ['16', 54.182561307901906],
 ['14', 61.78601252609603],
 ['18', 53.928966789667896],
 ['17', 57.97861420017109],
 ['19', 60.01122448979592],
 ['21', 49.369565217391305],
 ['07', 56.832589285714285],
 ['00', 58.4582651391162],
 ['13', 62.525054466230934],
 ['23', 52.02967359050445],
 ['08', 54.09274193548387],
 ['20', 45.24478594950604],
 ['15', 60.542307692307695],
 ['03', 56.92137592137592],
 ['04', 49.66740088105727],
 ['09', 53.93632958801498],
 ['01', 50.606],
 ['02', 58.471655328798185]]

---

## Sorting and Printing Values from a List of Lists for Points on Other Posts

---

In [240]:
swap_avg_by_hour = []

for row in avg_by_hour :
    swap_avg_by_hour.append(
    [row[1], row[0]])
swap_avg_by_hour

[[57.3979721166033, '12'],
 [46.23529411764706, '06'],
 [50.236147757255935, '22'],
 [57.56818181818182, '11'],
 [49.96649484536083, '05'],
 [60.4839255499154, '10'],
 [54.182561307901906, '16'],
 [61.78601252609603, '14'],
 [53.928966789667896, '18'],
 [57.97861420017109, '17'],
 [60.01122448979592, '19'],
 [49.369565217391305, '21'],
 [56.832589285714285, '07'],
 [58.4582651391162, '00'],
 [62.525054466230934, '13'],
 [52.02967359050445, '23'],
 [54.09274193548387, '08'],
 [45.24478594950604, '20'],
 [60.542307692307695, '15'],
 [56.92137592137592, '03'],
 [49.66740088105727, '04'],
 [53.93632958801498, '09'],
 [50.606, '01'],
 [58.471655328798185, '02']]

In [241]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
sorted_swap

[[62.525054466230934, '13'],
 [61.78601252609603, '14'],
 [60.542307692307695, '15'],
 [60.4839255499154, '10'],
 [60.01122448979592, '19'],
 [58.471655328798185, '02'],
 [58.4582651391162, '00'],
 [57.97861420017109, '17'],
 [57.56818181818182, '11'],
 [57.3979721166033, '12'],
 [56.92137592137592, '03'],
 [56.832589285714285, '07'],
 [54.182561307901906, '16'],
 [54.09274193548387, '08'],
 [53.93632958801498, '09'],
 [53.928966789667896, '18'],
 [52.02967359050445, '23'],
 [50.606, '01'],
 [50.236147757255935, '22'],
 [49.96649484536083, '05'],
 [49.66740088105727, '04'],
 [49.369565217391305, '21'],
 [46.23529411764706, '06'],
 [45.24478594950604, '20']]

In [242]:
print("Top 5 Hours for Other Posts Points")
for avg, hour in sorted_swap[:5] :
    print(
    "{}: {:.2f} average points per post".format(
        dt.datetime.strptime(hour,"%H").strftime("%H:%M"),avg
        )
    )

Top 5 Hours for Other Posts Points
13:00: 62.53 average points per post
14:00: 61.79 average points per post
15:00: 60.54 average points per post
10:00: 60.48 average points per post
19:00: 60.01 average points per post


The hour that receives the most points per post on average is 13:00, or 1:00 pm EST, with an average of 62.53 points per post. The top three hours (13:00, 14:00, and 15:00) are consecutive, so the early afternoon as a whole performs well.

---

## Data Visualization

---

[Visualizations](https://github.com/leahrowland86/Dataquest-Guided-Project-2/tree/master/Visualizations)

![ask_average-comments.png](Visualizations/ask_average-comments.png)

![show_average-points.png](Visualizations/show_average-points.png)

![other_average-comments.png](Visualizations/other_average-comments.png)

![other_average-points.png](Visualizations/other_average-points.png)

![ask_other_average-comments.png](Visualizations/ask_other_average-comments.png)

![show_other_average-points.png](Visualizations/show_other_average-points.png)

---

## Conclusion

---

In this project, I analyzed Ask HN and Show HN posts to determine which type of post and which creation time receive the most comments and points on average, and then compared the results with the averages for other posts. Based on the analysis, to maximize the number of comments a post receives, I would recommend creating an Ask HN post between 15:00 and 16:00 (3:00 pm - 4:00 pm EST), the hour with the highest average in the sample (38.59 comments per post). To maximize the number of points a post receives, I would recommend creating an uncategorized post between 13:00 and 16:00 (1:00 pm - 4:00 pm EST), when other posts average more than 60 points.

However, it should be noted that the data set I analyzed excluded posts without any comments, so these averages describe only posts that received at least one comment.