**The project** analyzes and compares the different kinds of posts on the Hacker News web platform. [**Hacker News**](https://news.ycombinator.com/news) (sometimes abbreviated as **HN**) is a social news website focused on computer science and entrepreneurship.

The site hosts a community of interested users who ask questions, show their own work and discuss both.

The main analysis is limited to Ask HN posts and compares the average number of comments per post for each hour of the day.

The goal is to find out whether some hours of the day attract more comments than others, which would suggest the best time of day to post a question.


---




1. Read the CSV file into a list of rows

```python
from csv import reader

# Read every row of the data set (header included) into a list of lists.
# The with block closes the file; newline='' is what the csv module expects.
with open('/content/hacker_news.csv', newline='') as open_file:
    hn = list(reader(open_file))
print(hn[:5])
```

```
[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]
```
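As a possible alternative to `csv.reader`, `csv.DictReader` consumes the header row itself and lets each field be read by column name (`row['num_comments']` instead of `row[4]`). The sketch below is a hypothetical variant, not the code used in this project: the helper name `read_posts` is made up, and it reads from an in-memory sample built from rows printed above, so it runs without the real file.

```python
import csv
import io

# Header plus two data rows copied from the output above.
SAMPLE = """id,title,url,num_points,num_comments,author,created_at
12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,3,1,hswarna,6/17/2016 0:01
"""

def read_posts(file_obj):
    """Return one dict per data row, keyed by the names in the header row."""
    return list(csv.DictReader(file_obj))

# For the real data set (hypothetical usage):
# with open('/content/hacker_news.csv', newline='') as f:
#     posts = read_posts(f)
posts = read_posts(io.StringIO(SAMPLE))
print(len(posts), posts[0]['title'], int(posts[0]['num_comments']))
```

With dictionaries there is no separate header-stripping step, at the cost of slightly more memory per row.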


2. Separate the header row from the data rows

```python
# The first row holds the column names; the remaining rows are the posts.
headers = hn[0]
hn = hn[1:]
print(headers)
print('\n')
print(hn[:5])
```

```
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
```


3. Split the posts into Ask HN posts, Show HN posts and all other posts, and print the size of each list.

```python
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    # Titles are compared in lower case so "Ask HN" and "ASK HN" both match.
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print('Number of Ask HN posts : ', len(ask_posts))
print('Number of Show HN posts : ', len(show_posts))
print('Number of other posts : ', len(other_posts))
```

```
Number of Ask HN posts :  1744
Number of Show HN posts :  1162
Number of other posts :  17194
```
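The same split can be written as a small function whose results are tallied with `collections.Counter`. This is a minimal sketch: the function name `classify` and the sample titles are hypothetical, and the prefix rules are the same ones used in the cell above.

```python
from collections import Counter

def classify(title):
    """Return 'ask', 'show' or 'other' from the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Hypothetical titles for illustration.
titles = [
    'Ask HN: How do you back up your laptop?',
    'Show HN: A tiny static site generator',
    'Interactive Dynamic Video',
    'ASK HN: Best books on compilers?',
]
counts = Counter(classify(t) for t in titles)
print(counts)
```

In the notebook, `Counter(classify(row[1]) for row in hn)` would give the three list sizes in a single pass, although the separate lists are still needed for the later steps.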


4. Calculate the average number of comments for Ask HN and Show HN posts. Ask HN posts receive more comments on average than Show HN posts (14.04 vs. 10.32), so questions appear to draw more discussion than showcased work.

```python
# Average number of comments per Ask HN post (num_comments is column 4).
total_ask_comments = 0
for row in ask_posts:
    total_ask_comments += int(row[4])
avg_ask_comments = total_ask_comments / len(ask_posts)
print("Average ask hn comments : ", round(avg_ask_comments, 2))

# Same calculation for Show HN posts.
total_show_comments = 0
for row in show_posts:
    total_show_comments += int(row[4])
avg_show_comments = total_show_comments / len(show_posts)
print("Average show hn comments : ", round(avg_show_comments, 2))
```

```
Average ask hn comments :  14.04
Average show hn comments :  10.32
```
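Both loops follow the same pattern, so they could share one helper. A minimal sketch, assuming rows keep `num_comments` at index 4 as in the header printed in step 2; the name `average_comments` and the sample rows are hypothetical. Unlike the loops above, it returns 0.0 for an empty list instead of raising `ZeroDivisionError`.

```python
# Index of 'num_comments' in the header row printed in step 2.
NUM_COMMENTS = 4

def average_comments(posts):
    """Mean of the num_comments column, or 0.0 when the list is empty."""
    if not posts:
        return 0.0
    return sum(int(row[NUM_COMMENTS]) for row in posts) / len(posts)

# Hypothetical rows with the same column layout as the data set.
sample = [
    ['1', 'Ask HN: First question', '', '5', '52', 'alice', '8/4/2016 11:52'],
    ['2', 'Ask HN: Second question', '', '3', '10', 'bob', '1/26/2016 19:30'],
]
print(average_comments(sample))  # (52 + 10) / 2
```

In the notebook, `round(average_comments(ask_posts), 2)` computes the same value as the first loop above.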


5. Count the Ask HN posts, and sum their comments, for each hour of the day in which they were created

```python
import datetime as dt

# Keep only the creation time and comment count of each Ask HN post.
result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

# Count posts and sum comments per hour of the day ('00' to '23').
counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"
for row in result_list:
    created_at = row[0]
    comments = row[1]
    hour = dt.datetime.strptime(created_at, date_format).strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments
print(counts_by_hour)
print('\n')
print(comments_by_hour)
```

```
{'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}


{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
```
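`collections.defaultdict(int)` removes the need for the `if ... not in ...` branch, and `datetime.strptime(...).hour` yields the hour as an integer. A sketch using the same `[created_at, num_comments]` shape as `result_list`; the helper name `tally_by_hour` and the entries are hypothetical, and note that the keys here are integers rather than the zero-padded strings produced above.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = "%m/%d/%Y %H:%M"

def tally_by_hour(entries):
    """Return (posts per hour, comments per hour) for [created_at, num_comments] pairs."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, num_comments in entries:
        # %H also accepts single-digit hours such as '2:05'.
        hour = dt.datetime.strptime(created_at, DATE_FORMAT).hour
        counts[hour] += 1
        comments[hour] += num_comments
    return counts, comments

# Hypothetical entries in the same shape as result_list.
entries = [['8/4/2016 15:10', 30], ['6/17/2016 15:45', 10], ['1/26/2016 2:05', 4]]
hour_counts, hour_comments = tally_by_hour(entries)
print(dict(hour_counts), dict(hour_comments))
```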


6. Calculate the average number of comments per post for each hour of the day.

```python
# Average comments per post for each hour: total comments / number of posts.
avg_by_hour = []
for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])
for row in avg_by_hour:
    hour = row[0]
    average = row[1]
    print("Average # of comments in ", hour, "hrs is ", round(average, 2))
```

```
Average # of comments in  09 hrs is  5.58
Average # of comments in  13 hrs is  14.74
Average # of comments in  10 hrs is  13.44
Average # of comments in  14 hrs is  13.23
Average # of comments in  16 hrs is  16.8
Average # of comments in  23 hrs is  7.99
Average # of comments in  12 hrs is  9.41
Average # of comments in  17 hrs is  11.46
Average # of comments in  15 hrs is  38.59
Average # of comments in  21 hrs is  16.01
Average # of comments in  20 hrs is  21.52
Average # of comments in  02 hrs is  23.81
Average # of comments in  18 hrs is  13.2
Average # of comments in  03 hrs is  7.8
Average # of comments in  05 hrs is  10.09
Average # of comments in  19 hrs is  10.8
Average # of comments in  01 hrs is  11.38
Average # of comments in  22 hrs is  6.75
Average # of comments in  08 hrs is  10.25
Average # of comments in  04 hrs is  7.17
Average # of comments in  00 hrs is  8.13
Average # of comments in  06 hrs is  9.02
Average # of comments in  07 hrs is  7.85
Average # of comments in  11 hrs is  11.05
```
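The hours above come out in the order they first appear in the file. A dict comprehension can compute every average in one expression, and sorting the zero-padded hour strings prints them in clock order. A sketch with two hours whose totals are copied from the step 5 output; the names `sample_counts`, `sample_comments` and `sample_avg` are hypothetical and chosen so they do not overwrite the notebook's own variables.

```python
# Totals for hours '15' and '02' copied from the step 5 output.
sample_counts = {'15': 116, '02': 58}
sample_comments = {'15': 4477, '02': 1381}

# Average comments per post, keyed by hour.
sample_avg = {hour: sample_comments[hour] / sample_counts[hour] for hour in sample_counts}

# Zero-padded strings sort in clock order ('02' before '15').
for hour in sorted(sample_avg):
    print(f"Average # of comments in {hour} hrs is {sample_avg[hour]:.2f}")
```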

7. To put the hours in order, the cell below sorts them by average comments per post in descending order and prints the top 5 hours for Ask posts.

```python
# Swap to [average, hour] so that sorting compares the averages first.
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print("Sorted swap:", sorted_swap)
print('Top 5 Hours for Ask Posts Comments Across All Days of Week ')
time_format = "%H"
for row in sorted_swap[:5]:
    comments = row[0]
    hour = row[1]
    hour1 = dt.datetime.strptime(hour, time_format).strftime("%H:%M")
    print(hour1, round(comments, 2), ' average comments per post')
```

```
Sorted swap: [[38.5948275862069, '15'], [23.810344827586206, '02'], [21.525, '20'], [16.796296296296298, '16'], [16.009174311926607, '21'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [13.20183486238532, '18'], [11.46, '17'], [11.383333333333333, '01'], [11.051724137931034, '11'], [10.8, '19'], [10.25, '08'], [10.08695652173913, '05'], [9.41095890410959, '12'], [9.022727272727273, '06'], [8.127272727272727, '00'], [7.985294117647059, '23'], [7.852941176470588, '07'], [7.796296296296297, '03'], [7.170212765957447, '04'], [6.746478873239437, '22'], [5.5777777777777775, '09']]
Top 5 Hours for Ask Posts Comments Across All Days of Week 
15:00 38.59  average comments per post
02:00 23.81  average comments per post
20:00 21.52  average comments per post
16:00 16.8  average comments per post
21:00 16.01  average comments per post
```
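Swapping each pair is only needed so that `sorted` compares the averages first. Passing a `key` function sorts the original `[hour, average]` pairs directly, and `heapq.nlargest` returns just the top *n* without sorting the whole list. A sketch using a few pairs copied from the output above; `sample_avg` is a hypothetical name.

```python
import heapq
from operator import itemgetter

# [hour, average] pairs copied from the sorted output above.
sample_avg = [['09', 5.5777777777777775], ['15', 38.5948275862069], ['02', 23.810344827586206],
              ['20', 21.525], ['16', 16.796296296296298], ['21', 16.009174311926607],
              ['13', 14.741176470588234]]

# Sort by the second element (the average), highest first.
ranked = sorted(sample_avg, key=itemgetter(1), reverse=True)
top_5 = heapq.nlargest(5, sample_avg, key=itemgetter(1))
assert top_5 == ranked[:5]

for hour, average in top_5:
    print(f"{hour}:00 {average:.2f} average comments per post")
```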


**Conclusion**

From the cell above we can conclude the following: Ask HN posts created at 15:00 (3pm) receive by far the most comments on average (38.59 per post), followed by 02:00 (2am), 20:00 (8pm), 16:00 (4pm) and 21:00 (9pm). These averages measure how much **discussion** a question attracts, not how many questions are posted at that hour. The result needs careful reading, though: the hours are given in the **UTC+0 time zone**, and since this is a global platform, people from all over the world read it and join the discussion. The same UTC hour therefore falls in a different part of the day, or night, depending on where a reader lives.
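To see what a given UTC hour means elsewhere, it can be shifted by a UTC offset. A minimal sketch, assuming the timestamps are UTC as stated above; the helper name `local_hour` is hypothetical, and the fixed offsets ignore daylight saving time (a named zone via `zoneinfo.ZoneInfo` would be needed for exact conversions).

```python
from datetime import datetime, timedelta, timezone

def local_hour(utc_hour, offset_hours):
    """Convert an hour of day given in UTC to the hour at a fixed UTC offset."""
    moment = datetime(2016, 1, 1, utc_hour, tzinfo=timezone.utc)
    return moment.astimezone(timezone(timedelta(hours=offset_hours))).hour

# Illustrative fixed offsets (no daylight saving).
offsets = {'UTC-5': -5, 'UTC+1': 1, 'UTC+8': 8}
for label, offset in offsets.items():
    print(f"15:00 UTC is {local_hour(15, offset):02d}:00 at {label}")
```

For example, the 15:00 UTC peak corresponds to 10:00 at UTC-5 but 23:00 at UTC+8.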
Let's see the analysis based on the days of the week.



---



```python
# Repeat the analysis, grouping the Ask HN posts by day of the week.
counts_by_day = {}
comments_by_day = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list:
    entry_date = row[0]
    comments = row[1]
    day = dt.datetime.strptime(entry_date, date_format).strftime("%A")
    if day in counts_by_day:
        counts_by_day[day] += 1
        comments_by_day[day] += comments
    else:
        counts_by_day[day] = 1
        comments_by_day[day] = comments

# Average comments per post for each day.
avg_by_day = []
for day in counts_by_day:
    avg_by_day.append([day, comments_by_day[day] / counts_by_day[day]])

print('\n')
print(avg_by_day)

# Swap to [average, day] and sort in descending order of the average.
swap_avg_by_day = []
for row in avg_by_day:
    swap_avg_by_day.append([row[1], row[0]])

sorted_swap_day = sorted(swap_avg_by_day, reverse=True)
print("Sorted swap day:", sorted_swap_day)
print('\n')

for row in sorted_swap_day:
    comments = row[0]
    day = row[1]
    print(day, round(comments, 2), ' average comments per post')
```

```
[['Tuesday', 10.59375], ['Sunday', 19.290123456790123], ['Monday', 12.592982456140351], ['Thursday', 13.125984251968504], ['Saturday', 15.636842105263158], ['Friday', 17.55719557195572], ['Wednesday', 12.431972789115646]]
Sorted swap day: [[19.290123456790123, 'Sunday'], [17.55719557195572, 'Friday'], [15.636842105263158, 'Saturday'], [13.125984251968504, 'Thursday'], [12.592982456140351, 'Monday'], [12.431972789115646, 'Wednesday'], [10.59375, 'Tuesday']]


Sunday 19.29  average comments per post
Friday 17.56  average comments per post
Saturday 15.64  average comments per post
Thursday 13.13  average comments per post
Monday 12.59  average comments per post
Wednesday 12.43  average comments per post
Tuesday 10.59  average comments per post
```
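Sorting day names gives alphabetical rather than calendar order. `datetime.weekday()` returns 0 for Monday through 6 for Sunday, which makes a numeric key that sorts in calendar order; `calendar.day_name` turns it back into a name for display. A sketch with hypothetical entries in the `result_list` shape; the helper name `average_by_weekday` is made up for illustration.

```python
import calendar
import datetime as dt
from collections import defaultdict

DATE_FORMAT = "%m/%d/%Y %H:%M"

def average_by_weekday(entries):
    """Average comments per post keyed by weekday number (0 = Monday)."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, num_comments in entries:
        weekday = dt.datetime.strptime(created_at, DATE_FORMAT).weekday()
        counts[weekday] += 1
        comments[weekday] += num_comments
    return {day: comments[day] / counts[day] for day in counts}

# Hypothetical entries: 8/4/2016 was a Thursday, 8/7/2016 a Sunday.
entries = [['8/4/2016 11:52', 10], ['8/4/2016 15:00', 20], ['8/7/2016 9:30', 40]]
for day, average in sorted(average_by_weekday(entries).items()):
    print(calendar.day_name[day], round(average, 2), 'average comments per post')
```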


Average comments per post are **highest** from Friday to Sunday: Sunday (19.29), Friday (17.56) and Saturday (15.64). This seems plausible, since people probably have more free time at the end of the week for extracurricular activities and discussions on the Hacker News platform.

Based on this research, here is my advice on when to post a question on Hacker News. The best days are **Sunday and Friday**, which had the highest average number of comments per Ask HN post. Across all days combined, questions posted in the **15:00 and 16:00** hours and in the **20:00 and 21:00** hours (UTC) received the most comments on average. Having that in mind, consider posting on a **Sunday or Friday afternoon or evening**, so that your post reaches a larger group of enthusiasts and has a better chance of starting an open discussion. Keep in mind that the day and hour averages were computed separately, so this combination is a suggestion rather than a measured result.
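The recommendation combines averages by day with averages by hour, each computed over all posts. Whether a particular day and hour (say, Friday afternoon) really attracts more comments could be checked by grouping on the (weekday, hour) pair. A sketch with hypothetical entries; the helper name `average_by_day_hour` and its `min_posts` filter are assumptions, not part of the original analysis. With 1744 Ask posts spread over 168 day-hour combinations, each cell holds only about ten posts on average, so a single busy thread can dominate a cell, which is what `min_posts` is meant to guard against.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = "%m/%d/%Y %H:%M"

def average_by_day_hour(entries, min_posts=1):
    """Average comments per post for each (weekday name, hour) with at least min_posts posts."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, num_comments in entries:
        moment = dt.datetime.strptime(created_at, DATE_FORMAT)
        key = (moment.strftime("%A"), moment.hour)
        counts[key] += 1
        comments[key] += num_comments
    return {key: comments[key] / counts[key] for key in counts if counts[key] >= min_posts}

# Hypothetical entries: 8/5/2016 was a Friday.
entries = [['8/5/2016 15:05', 12], ['8/5/2016 15:40', 4], ['8/5/2016 20:10', 9]]
averages = average_by_day_hour(entries)
for (day, hour), average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{day} {hour:02d}:00 {average:.2f}")
```

In the notebook, `average_by_day_hour(result_list, min_posts=5)` would be one way to list the best-performing day and hour combinations.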

Thank you for your attention.