# Guided Project: Exploring Hacker News Posts

### We're specifically interested in posts with titles that begin with either Ask HN or Show HN

### We'll compare these two types of posts to determine the following:

 * Do Ask HN or Show HN posts receive more comments on average?
 * Do posts created at a certain time receive more comments on average?

In [59]:
from csv import reader
from datetime import datetime as dt, timezone, timedelta

In [2]:
# Open the file in a with block so it is closed once the rows are loaded
with open('hacker_news.csv') as file:
    hn = list(reader(file))

In [3]:
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [4]:
headers = hn[0]
hn = hn[1:]

In [5]:
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
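
The cells that follow address columns by position (`row[1]`, `post[4]`, `post[6]`). As a minimal sketch, the header row can instead be turned into a name-to-index lookup so those positions do not have to be memorised. The names `sample_csv`, `sample_rows`, `sample_headers` and `col` are hypothetical, and the two data rows are copied from the output above as a stand-in for `hacker_news.csv`.

```python
import csv
import io

# Two rows copied from the notebook output, standing in for hacker_news.csv.
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "10301696,Note by Note: The Making of Steinway L1037 (2007),"
    "http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0,8,2,walterbell,9/30/2015 4:12\n"
)

sample_rows = list(csv.reader(sample_csv))
sample_headers = sample_rows[0]

# Map each column name to its position in a row.
col = {name: index for index, name in enumerate(sample_headers)}

first_post = sample_rows[1]
print(first_post[col['title']], int(first_post[col['num_comments']]))
```

With this lookup, `col['num_comments']` and `col['created_at']` resolve to the same positions (4 and 6) that the notebook uses directly.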


In [6]:
print(hn[:5])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [7]:
# Extracting Ask HN and Show HN Posts

ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1].lower() #title
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('ask posts length:', len(ask_posts))
print('show posts length:', len(show_posts))
print('other posts length:', len(other_posts))

ask posts length: 1744
show posts length: 1162
other posts length: 17194


#### Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [8]:
total_ask_comments = 0
for post in ask_posts:
    total_ask_comments += int(post[4])

total_show_comments = 0
for post in show_posts:
    total_show_comments += int(post[4])

avg_ask_comments = total_ask_comments / len(ask_posts)
avg_show_comments = total_show_comments / len(show_posts)

print('ask comments average:',avg_ask_comments)
print('show comments average:',avg_show_comments)

ask comments average: 14.038417431192661
show comments average: 10.31669535283993


Do show posts or ask posts receive more comments on average? 
 * Ask posts received more comments on average: about 14.04 per post, compared with about 10.32 for show posts.

### Finding the Number of Ask Posts and Comments by Hour Created
#### We'll determine if ask posts created at a certain time are more likely to attract comments. We'll use the following steps to perform this analysis:
 * Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
 * Calculate the average number of comments ask posts receive by hour created.
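
The two steps above can also be folded into one function. The following is only a sketch under stated assumptions: `comments_by_hour_summary`, `DATE_FORMAT` and `sample_pairs` are hypothetical names, and only the first sample pair comes from the notebook. Because the counters are created inside the function, calling it again cannot accumulate totals from an earlier run.

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H:%M"  # same format as the created_at column

def comments_by_hour_summary(pairs):
    """Return {hour: average comments} for (created_at, num_comments) pairs."""
    counts = defaultdict(int)      # posts created in each hour
    comments = defaultdict(int)    # comments received by posts created in each hour
    for created_at, num_comments in pairs:
        hour = datetime.strptime(created_at, DATE_FORMAT).hour
        counts[hour] += 1
        comments[hour] += num_comments
    return {hour: comments[hour] / counts[hour] for hour in counts}

sample_pairs = [
    ('11/22/2015 13:43', 29),  # taken from the notebook
    ('8/4/2016 13:05', 11),    # hypothetical
    ('9/30/2015 4:12', 2),     # hypothetical
]
averages = comments_by_hour_summary(sample_pairs)
print(averages)
```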

In [9]:
result_list = []
for post in ask_posts:
    result_list.append([post[6], int(post[4])])

In [10]:
print(result_list[1])
# date format: %m/%d/%Y %H:%M

['11/22/2015 13:43', 29]


In [11]:
counts_by_hour = {}
comments_by_hour = {}

In [16]:
#dt_obj = dt.strptime(result_list[1][0], "%m/%d/%Y %H:%M")
#print(result_list[1][0])
#print(dt_obj)
#print(dt_obj.hour)

In [17]:
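# Reset both dictionaries here so that re-running this cell does not double the totals
counts_by_hour = {}
comments_by_hour = {}

for info in result_list:
    dt_str = info[0]
    dt_obj = dt.strptime(dt_str, "%m/%d/%Y %H:%M")
    hour = dt_obj.hour

    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += info[1]
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = info[1]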

In [23]:
print("total posts by hour")
print(counts_by_hour)

total posts by hour
{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}


In [24]:
print("total comments by hour")
print(comments_by_hour)

total comments by hour
{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}


### Calculating the Average Number of Comments for Ask HN Posts by Hour
#### Divide the total comments by the total posts for each hour

In [32]:
avg_by_hour = []
for hour, num_comments in comments_by_hour.items():
    avg = num_comments / counts_by_hour[hour]
    avg_by_hour.append([hour, avg])

In [35]:
print(avg_by_hour)

[[9, 5.5777777777777775], [13, 14.741176470588234], [10, 13.440677966101696], [14, 13.233644859813085], [16, 16.796296296296298], [23, 7.985294117647059], [12, 9.41095890410959], [17, 11.46], [15, 38.5948275862069], [21, 16.009174311926607], [20, 21.525], [2, 23.810344827586206], [18, 13.20183486238532], [3, 7.796296296296297], [5, 10.08695652173913], [19, 10.8], [1, 11.383333333333333], [22, 6.746478873239437], [8, 10.25], [4, 7.170212765957447], [0, 8.127272727272727], [6, 9.022727272727273], [7, 7.852941176470588], [11, 11.051724137931034]]


### Sorting and Printing Values from a List of Lists

In [37]:
swap_avg_by_hour = []
for avg in avg_by_hour:
    swap_avg_by_hour.append([avg[1], avg[0]])
    
print(swap_avg_by_hour)

[[5.5777777777777775, 9], [14.741176470588234, 13], [13.440677966101696, 10], [13.233644859813085, 14], [16.796296296296298, 16], [7.985294117647059, 23], [9.41095890410959, 12], [11.46, 17], [38.5948275862069, 15], [16.009174311926607, 21], [21.525, 20], [23.810344827586206, 2], [13.20183486238532, 18], [7.796296296296297, 3], [10.08695652173913, 5], [10.8, 19], [11.383333333333333, 1], [6.746478873239437, 22], [10.25, 8], [7.170212765957447, 4], [8.127272727272727, 0], [9.022727272727273, 6], [7.852941176470588, 7], [11.051724137931034, 11]]


In [39]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
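
As an alternative to swapping the columns before sorting, `sorted` accepts a `key` function that selects the value to sort on. The sketch below assumes a hypothetical `sample_avg_by_hour` list holding four of the `[hour, average]` pairs printed earlier, rounded for readability.

```python
# Four [hour, average] pairs from the notebook output, rounded.
sample_avg_by_hour = [[9, 5.58], [15, 38.59], [2, 23.81], [20, 21.525]]

# Sort on the average (second element) without building a swapped copy.
top = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg in top[:3]:
    print("{:02d}:00 {:.2f} average comments per post".format(hour, avg))
```

One difference from the swapped-list approach: when two hours have the same average, the swapped lists fall back to comparing the hour, whereas the `key` version keeps the tied pairs in their original order.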

In [67]:
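brazil_time_zone = timezone(timedelta(hours=-3))
print("Brazil time zone: ", brazil_time_zone)
print("Top 5 Hours for Ask Posts Comments")
for info in sorted_swap[:5]:
    avg = info[0]
    hour = info[1]
    # Mark the parsed hour as UTC explicitly (an assumption about created_at);
    # astimezone() on a naive datetime would use the machine's local timezone instead.
    hour_dt = dt.strptime(str(hour), "%H").replace(tzinfo=timezone.utc)
    hour_dt_brazil = hour_dt.astimezone(brazil_time_zone)
    print("{} {:.2f} average comments per post".format(hour_dt_brazil.time(), avg))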

Brazil time zone:  UTC-03:00
Top 5 Hours for Ask Posts Comments
12:00:00 38.59 average comments per post
23:00:00 23.81 average comments per post
17:00:00 21.52 average comments per post
13:00:00 16.80 average comments per post
18:00:00 16.01 average comments per post


#### During which hours should you create a post to have a higher chance of receiving comments?
 * In the Brazil time zone (UTC-03:00): 12:00, 23:00 and 17:00, with about 38.59, 23.81 and 21.52 average comments per post.
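
The same conversion can be applied to a full `created_at` value, which also shows how the date can roll back across midnight. This is a sketch, not the notebook's method: `to_brazil_time`, `ASSUMED_SOURCE_TZ` and `BRAZIL_TZ` are hypothetical names, treating `created_at` as UTC is an assumption, and the second timestamp is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

ASSUMED_SOURCE_TZ = timezone.utc            # assumption: created_at is recorded in UTC
BRAZIL_TZ = timezone(timedelta(hours=-3))   # fixed UTC-03:00 offset, as in the notebook

def to_brazil_time(created_at):
    """Parse a created_at string and convert it to the fixed UTC-03:00 offset."""
    naive = datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    aware = naive.replace(tzinfo=ASSUMED_SOURCE_TZ)  # attach the assumed source timezone
    return aware.astimezone(BRAZIL_TZ)

converted = to_brazil_time('11/22/2015 13:43')   # timestamp from the notebook
rolled_back = to_brazil_time('1/1/2016 2:15')    # hypothetical timestamp shortly after midnight
print(converted.strftime("%m/%d/%Y %H:%M"))
print(rolled_back.strftime("%m/%d/%Y %H:%M"))
```

A fixed offset ignores daylight saving time, so a named zone would be needed if historical clock changes matter.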

#### Next steps:
 * Determine if show or ask posts receive more points on average.
 * Determine if posts created at a certain time are more likely to receive more points.
 * Compare your results to the average number of comments and points other posts receive.
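
A possible starting point for these next steps is sketched below. It groups posts into ask, show and other, then averages both points and comments per group. The names `post_group`, `average_points_and_comments` and `sample_posts` are hypothetical; the first three sample rows are invented and the last one is taken from the dataset preview above.

```python
from collections import defaultdict

POINTS_COL, COMMENTS_COL = 3, 4  # positions of 'num_points' and 'num_comments'

def post_group(title):
    """Classify a title as 'ask', 'show' or 'other', as in the extraction cell."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

def average_points_and_comments(posts):
    """Return {group: (average points, average comments)}."""
    totals = defaultdict(lambda: [0, 0, 0])  # points, comments, post count
    for post in posts:
        entry = totals[post_group(post[1])]
        entry[0] += int(post[POINTS_COL])
        entry[1] += int(post[COMMENTS_COL])
        entry[2] += 1
    return {group: (points / n, comments / n)
            for group, (points, comments, n) in totals.items()}

sample_posts = [
    ['1', 'Ask HN: hypothetical question', '', '4', '10', 'a', '1/1/2016 10:00'],
    ['2', 'Ask HN: another question', '', '6', '20', 'b', '1/1/2016 11:00'],
    ['3', 'Show HN: hypothetical project', '', '12', '3', 'c', '1/2/2016 9:00'],
    ['4', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/',
     '386', '52', 'ne0phyte', '8/4/2016 11:52'],
]
summary = average_points_and_comments(sample_posts)
print(summary)
```

Passing `hn` instead of `sample_posts` would give the real per-group averages for the dataset.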