
# Exploring Hacker News Posts
In this project we will analyze user-submitted stories (AKA "posts") on Hacker News. We are interested in posts whose titles begin with `ASK HN` or `SHOW HN`.
  * `ASK HN` posts are submitted by users in order to ask the Hacker News community a question.
  * `SHOW HN` posts are used to share a project, a product or something else of interest to the Hacker News community. 
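As a minimal sketch of how the two post types can be told apart, the title prefix can be checked case-insensitively. The helper name `classify_title` and the sample titles are hypothetical and used only for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Hypothetical titles, not taken from the dataset
for sample in ["Ask HN: How do you learn SQL?", "Show HN: My weekend project", "algorithmic music"]:
    print(sample, "->", classify_title(sample))
```

Lowercasing the title first means that `Ask HN`, `ASK HN`, and `ask hn` are all treated the same way.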

Our goal is to answer the following questions:
* Which type of post receives more comments on average, `ASK HN` or `SHOW HN`?
* Do posts submitted at a particular time receive more comments on average?


In [None]:
import pandas as pd

path = 'https://github.com/StephaniePC1/ThisIsWhatIDoNow/raw/master/197_419_bundle_archive%20(1).zip'
hackernews = pd.read_csv(path, compression='zip')
# Load the data with pandas, reading the zipped CSV straight from the URL

In [None]:
hackernews.head()

Unnamed: 0,id,title,url,num_points,num_comments,author,created_at
0,12579008,You have two days to comment if you want stem ...,http://www.regulations.gov/document?D=FDA-2015...,1,0,altstar,9/26/2016 3:26
1,12579005,SQLAR the SQLite Archiver,https://www.sqlite.org/sqlar/doc/trunk/README.md,1,0,blacksqr,9/26/2016 3:24
2,12578997,What if we just printed a flatscreen televisio...,https://medium.com/vanmoof/our-secrets-out-f21...,1,0,pavel_lishin,9/26/2016 3:19
3,12578989,algorithmic music,http://cacm.acm.org/magazines/2011/7/109891-al...,1,0,poindontcare,9/26/2016 3:16
4,12578979,How the Data Vault Enables the Next-Gen Data W...,https://www.talend.com/blog/2016/05/12/talend-...,1,0,markgainor1,9/26/2016 3:14


In [None]:
hn = [hackernews.columns.values.tolist()] + hackernews.values.tolist()
# Convert the pandas DataFrame to a list of lists, with the header row first

In [None]:
headers = hn[0]
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


In [None]:
hn = hn[1:]
print(hn[:5])

[[12579008, 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', 1, 0, 'altstar', '9/26/2016 3:26'], [12579005, 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', 1, 0, 'blacksqr', '9/26/2016 3:24'], [12578997, 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', 1, 0, 'pavel_lishin', '9/26/2016 3:19'], [12578989, 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', 1, 0, 'poindontcare', '9/26/2016 3:16'], [12578979, 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', 1, 0, 'markgainor1', '9/26/2016 3:14']]


In [None]:
ask_posts=[]
show_posts=[]
other_posts=[]

In [None]:
for post in hn:
    title = post[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(post)
    elif title.lower().startswith("show hn"):
        show_posts.append(post)
    else:
        other_posts.append(post)

In [None]:
print("The number of ask hn posts: ", len(ask_posts))
print("The number of show hn posts: ", len(show_posts))
print("The number of other posts: ", len(other_posts))
# Printed text associated with the solution

The number of ask hn posts:  9139
The number of show hn posts:  10158
The number of other posts:  273822


In [None]:
total_ask_comments = 0

In [None]:
for post in ask_posts:
    total_ask_comments += int(post[4])

In [None]:
# Average comments per Ask HN post, rounded to two decimal places
avg_ask_comments = round(total_ask_comments / len(ask_posts), 2)
print(total_ask_comments)

94986


In [None]:
print(avg_ask_comments)

10.39


In [None]:
total_show_comments = 0

In [None]:
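# Sum the comments on Show HN posts. The loop must iterate over show_posts:
# iterating over ask_posts reproduces the Ask HN total (94986), and
# 94986 / 10158 = 9.35, so the recorded Show HN average needs a re-run.
for post in show_posts:
    total_show_comments += int(post[4])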

In [None]:
# Average comments per Show HN post, rounded to two decimal places
avg_show_comments = round(total_show_comments / len(show_posts), 2)

In [None]:
print(avg_show_comments)

9.35


After separating the `ASK HN` posts from the `SHOW HN` posts, we calculated the average number of comments for each type of post.

 *  On average, `ASK HN` posts receive more comments: about **10.39** comments per post.
 
 *  `SHOW HN` posts receive about **9.35** comments per post.
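The per-type average can also be sketched as a single helper applied to each list, which avoids repeating the accumulation loop for every post type. The function name `average_comments` and the sample rows are hypothetical; the only assumption carried over from the notebook is that `num_comments` sits at index 4 of each row.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments per post, rounded to two decimals."""
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(post[comments_index]) for post in posts)
    return round(total / len(posts), 2)

# Hypothetical rows in the notebook's column order:
# [id, title, url, num_points, num_comments, author, created_at]
sample_ask = [
    [1, "Ask HN: first question", "", 1, 4, "user_a", "9/26/2016 3:26"],
    [2, "Ask HN: second question", "", 2, 7, "user_b", "9/26/2016 15:10"],
]
print(average_comments(sample_ask))
```

Calling the same function with `ask_posts` and then `show_posts` keeps the two calculations symmetric.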

Now we will analyse the `ASK HN` posts to find the time of day (or night) at which this type of post receives the most comments.
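A compact sketch of the idea: parse each `created_at` string, extract the hour, and accumulate totals and counts per hour. The sample rows and the function name `average_comments_by_hour_sketch` are hypothetical; the date format matches the `created_at` values shown earlier.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = "%m/%d/%Y %H:%M"

def average_comments_by_hour_sketch(rows):
    """Map each hour ('00'..'23') to the mean comments of posts created in that hour."""
    totals = defaultdict(int)   # comments summed per hour
    counts = defaultdict(int)   # posts counted per hour
    for created_at, num_comments in rows:
        hour = dt.datetime.strptime(created_at, DATE_FORMAT).strftime("%H")
        totals[hour] += num_comments
        counts[hour] += 1
    return {hour: round(totals[hour] / counts[hour], 2) for hour in totals}

# Hypothetical rows in the form [created_at, num_comments]
sample_rows = [
    ["9/26/2016 3:26", 4],
    ["9/26/2016 3:05", 6],
    ["8/16/2016 15:12", 30],
]
print(average_comments_by_hour_sketch(sample_rows))
```

Using `defaultdict(int)` removes the need for a separate branch the first time an hour is seen.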

In [None]:
import datetime as dt

In [None]:
result_list=[]

In [None]:
for post in ask_posts:
    result_list.append([post[6], int(post[4])])

In [None]:
comments_by_hour = {}
counts_by_hour = {}

In [None]:
date_time = "%m/%d/%Y %H:%M"
for each_row in result_list:
    date = each_row[0]
    comment = each_row[1]
    time = dt.datetime.strptime(date, date_time).strftime("%H")
    if time in counts_by_hour:
        comments_by_hour[time] += comment
        counts_by_hour[time] += 1
    else:
        comments_by_hour[time] = comment
        counts_by_hour[time] = 1

In [None]:
counts_by_hour

{'00': 301,
 '01': 282,
 '02': 269,
 '03': 271,
 '04': 243,
 '05': 209,
 '06': 234,
 '07': 226,
 '08': 257,
 '09': 222,
 '10': 282,
 '11': 312,
 '12': 342,
 '13': 444,
 '14': 513,
 '15': 646,
 '16': 579,
 '17': 587,
 '18': 614,
 '19': 552,
 '20': 510,
 '21': 518,
 '22': 383,
 '23': 343}

In [None]:
comments_by_hour

{'00': 2277,
 '01': 2089,
 '02': 2996,
 '03': 2154,
 '04': 2360,
 '05': 1838,
 '06': 1587,
 '07': 1585,
 '08': 2362,
 '09': 1477,
 '10': 3013,
 '11': 2797,
 '12': 4234,
 '13': 7245,
 '14': 4972,
 '15': 18525,
 '16': 4466,
 '17': 5547,
 '18': 4877,
 '19': 3954,
 '20': 4462,
 '21': 4500,
 '22': 3372,
 '23': 2297}

In [None]:
average_comments_by_hour = []

for hour in comments_by_hour:
    average_comments_by_hour.append([hour, round(comments_by_hour[hour] / counts_by_hour[hour] ,2)])
# Rounded

In [None]:
avg_by_hour=average_comments_by_hour
avg_by_hour

[['02', 11.14],
 ['01', 7.41],
 ['22', 8.8],
 ['21', 8.69],
 ['19', 7.16],
 ['17', 9.45],
 ['15', 28.68],
 ['14', 9.69],
 ['13', 16.32],
 ['11', 8.96],
 ['10', 10.68],
 ['09', 6.65],
 ['07', 7.01],
 ['03', 7.95],
 ['23', 6.7],
 ['20', 8.75],
 ['16', 7.71],
 ['08', 9.19],
 ['00', 7.56],
 ['18', 7.94],
 ['12', 12.38],
 ['04', 9.71],
 ['06', 6.78],
 ['05', 8.79]]

In [None]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

In [None]:
swap_avg_by_hour

[[11.14, '02'],
 [7.41, '01'],
 [8.8, '22'],
 [8.69, '21'],
 [7.16, '19'],
 [9.45, '17'],
 [28.68, '15'],
 [9.69, '14'],
 [16.32, '13'],
 [8.96, '11'],
 [10.68, '10'],
 [6.65, '09'],
 [7.01, '07'],
 [7.95, '03'],
 [6.7, '23'],
 [8.75, '20'],
 [7.71, '16'],
 [9.19, '08'],
 [7.56, '00'],
 [7.94, '18'],
 [12.38, '12'],
 [9.71, '04'],
 [6.78, '06'],
 [8.79, '05']]

In [None]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

In [None]:
print("Top 5 Hours for Ask Posts Comments")
print("\n")

for avg, hour in sorted_swap[:5]:
    print("{}: {:.2f} average comments per post".format(
            dt.datetime.strptime(hour, "%H").strftime("%H:%M"),avg))

Top 5 Hours for Ask Posts Comments


15:00: 28.68 average comments per post
13:00: 16.32 average comments per post
12:00: 12.38 average comments per post
02:00: 11.14 average comments per post
10:00: 10.68 average comments per post


In [None]:
diff=28.68-16.32
print(diff)
print((diff/16.32)*100)
# Simple calculations for write up

12.36
75.73529411764706


Our analysis indicates that `ASK HN` posts created at 15:00 (3:00 PM EST) receive the most comments on average. Because we are currently on EDT, this corresponds to 4:00 PM local time until November.
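The EST-to-EDT shift can be illustrated with fixed UTC offsets from the standard library. This is only a sketch: it assumes, as the write-up does, that the timestamps are in EST, and the date used is arbitrary because only the hour matters.

```python
import datetime as dt

# Fixed UTC offsets: EST is UTC-5, EDT is UTC-4
EST = dt.timezone(dt.timedelta(hours=-5), name="EST")
EDT = dt.timezone(dt.timedelta(hours=-4), name="EDT")

# Arbitrary date; the peak hour from the analysis, interpreted as EST
peak_est = dt.datetime(2016, 9, 26, 15, 0, tzinfo=EST)
peak_edt = peak_est.astimezone(EDT)

print(peak_est.strftime("%H:%M %Z"))  # 15:00 EST
print(peak_edt.strftime("%H:%M %Z"))  # 16:00 EDT
```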

There is a large gap between the highest-averaging hour and the second-highest: posts created at 15:00 average approximately 12 more comments than posts created at 13:00 (28.68 versus 16.32), which is a 76% increase.
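The gap and the percentage can be derived from the ranked averages instead of being typed in by hand. In this sketch the `top_hours` values are copied from the "Top 5 Hours" output above, and the helper name `percent_increase` is hypothetical.

```python
# (average, hour) pairs copied from the "Top 5 Hours" output, highest first
top_hours = [(28.68, "15"), (16.32, "13"), (12.38, "12"), (11.14, "02"), (10.68, "10")]

def percent_increase(new, old):
    """Return how much larger `new` is than `old`, as a percentage of `old`."""
    return (new - old) / old * 100

(best_avg, best_hour), (second_avg, second_hour) = top_hours[:2]
gap = best_avg - second_avg

print(f"{best_hour}:00 leads {second_hour}:00 by {gap:.2f} comments "
      f"({percent_increase(best_avg, second_avg):.0f}% more)")
# 15:00 leads 13:00 by 12.36 comments (76% more)
```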