# How to Grab Attention on Hacker News

In this simple guided project, we find the most popular posts on the [Hacker News website](https://news.ycombinator.com/) by exploring and evaluating its public data set.

In [None]:
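from csv import reader

# Read every row into memory; the context manager closes the file afterwards
with open("hacker_news.csv", newline="") as opened_file:
    hn = list(reader(opened_file))

# Separate the header row from the data rows
headers, hn = hn[0], hn[1:]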

In [None]:
print(headers, "\n")
for i in range(5):
    print(hn[i], "\n")

In [None]:
ask_posts, show_posts, other_posts = [], [], []
for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

In [None]:
print("Number of posts that belong to each category:")
print(f"Ask: {len(ask_posts)}")
print(f"Show: {len(show_posts)}")
print(f"Other: {len(other_posts)}")

In [None]:
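def avg_comments(type_post):
    # Guard against an empty category to avoid dividing by zero
    if not type_post:
        return 0.0

    total_comments = 0
    for row in type_post:
        total_comments += int(row[4])  # column 4 holds the comment count

    # Return the mean directly instead of shadowing the function name
    return total_comments / len(type_post)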

In [None]:
print(f"Average number of comments on Ask HN posts: {avg_comments(ask_posts):.2f}")
print(f"Average number of comments on Show HN posts: {avg_comments(show_posts):.2f}")

As the averages show, Ask HN posts receive more comments per post than Show HN posts. This suggests that the site is not only a place to keep up with news, but also a good place to get specific questions answered.

This result also points to the direction we'll explore further in our analysis. The prominence of Ask HN posts alone doesn't give us enough insight into the topic, though: a post's popularity can also be considerably influenced by other factors, such as its `number of points` and the time it was created.
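As a rough illustration of the points factor mentioned above, here is a minimal sketch of averaging any numeric column for a group of posts. It assumes the points are stored at column index 3; the notebook itself only confirms index 4 for comments and index 6 for the creation time. The helper `avg_column`, the two constants, and the sample rows are hypothetical and exist only to show the idea.

```python
def avg_column(posts, col_index):
    """Return the mean of an integer column across rows, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    return sum(int(row[col_index]) for row in posts) / len(posts)


NUM_POINTS = 3    # ASSUMPTION: column index of the points value
NUM_COMMENTS = 4  # matches row[4], used for comments in this notebook

# Hypothetical rows for illustration only, not taken from the data set
sample_ask_posts = [
    ["1", "Ask HN: Example question A", "", "10", "4", "user_a", "8/16/2016 9:55"],
    ["2", "Ask HN: Example question B", "", "20", "6", "user_b", "11/22/2015 13:43"],
]

print(f"Average points: {avg_column(sample_ask_posts, NUM_POINTS):.2f}")
print(f"Average comments: {avg_column(sample_ask_posts, NUM_COMMENTS):.2f}")
```

With the real data, the same helper could be called on `ask_posts` and `show_posts` once the column index has been checked against `headers`.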

In [None]:
import datetime as dt
result_list = []
for row in ask_posts:
    created_at = row[6]
    n_comments = int(row[4])
    result_list.append([created_at, n_comments])

counts_by_hour, comments_by_hour = {}, {}
for row in result_list:
    date = row[0]
    datetime_obj = dt.datetime.strptime(date, "%m/%d/%Y %H:%M")
    hour = datetime_obj.strftime("%H")
    n_comments = row[1]
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = n_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += n_comments

In [None]:
avg_by_hour = []
for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])
avg_by_hour = sorted(avg_by_hour)
for row in avg_by_hour:
    print(f"{row[0]}h : {row[1]:.2f}")

In [None]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
    
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print("Top 5 Hours for Ask HN Post Comments:\n")
for row in sorted_swap[:5]:
    print(f"{row[1]}:00: {row[0]:.2f} average comments per post")
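
The swap step is one way to sort by the average. As an alternative sketch, `sorted` accepts a `key` function, so the `[hour, average]` pairs can be ranked directly without building a second list. The values in `sample_avg_by_hour` are made up for illustration and are not results from the data set.

```python
# Hypothetical [hour, average] pairs, for illustration only
sample_avg_by_hour = [["00", 2.5], ["01", 7.0], ["02", 4.0]]

# Sort by the average (second element), highest first
top_hours = sorted(sample_avg_by_hour, key=lambda row: row[1], reverse=True)

for hour, avg in top_hours[:2]:
    print(f"{hour}:00: {avg:.2f} average comments per post")
```

Applied to the real `avg_by_hour` list, the same call would produce the ranking without the intermediate `swap_avg_by_hour` list.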