# Exploring Hacker News Posts

In this project I compare user engagement across posts in the [Hacker News Posts](https://www.kaggle.com/hacker-news/hacker-news-posts) dataset. I focus on Ask HN and Show HN posts, compare them with all other posts, and answer the following questions:

### Do Ask HN or Show HN posts receive more comments on average?

### Do posts submitted at certain times receive more comments on average?

#### To accomplish this I will do the following:

1. Explore the Data
2. Clean the Data
3. Analyze the Data
4. Draw Conclusions from the Analysis

### 1. Explore the Data

In [1]:
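import csv

# The context manager closes the file; newline='' is what the csv module recommends.
with open('HN.csv', newline='') as f:
    hn = list(csv.reader(f))
hn[:10]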

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12579008',
  'You have two days to comment if you want stem cells to be classified as your own',
  'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018',
  '1',
  '0',
  'altstar',
  '9/26/2016 3:26'],
 ['12579005',
  'SQLAR  the SQLite Archiver',
  'https://www.sqlite.org/sqlar/doc/trunk/README.md',
  '1',
  '0',
  'blacksqr',
  '9/26/2016 3:24'],
 ['12578997',
  'What if we just printed a flatscreen television on the side of our boxes?',
  'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43',
  '1',
  '0',
  'pavel_lishin',
  '9/26/2016 3:19'],
 ['12578989',
  'algorithmic music',
  'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext',
  '1',
  '0',
  'poindontcare',
  '9/26/2016 3:16'],
 ['12578979',
  'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake',
  'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94',
  

### 2. Clean the Data

In [2]:
headers = hn[0]
hn = hn[1:]
print(headers)
print('\n')
print(hn[:10])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'], ['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14'], ['12578975', '
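
The counts in this dataset are read as strings, so every later step has to call `int()` on them. As a minimal sketch, the header split and the type conversion could be done in one pass. The helper name `load_rows` and the in-memory sample are assumptions made for illustration; the two sample rows mirror the first rows shown above.

```python
import csv
import io

# Hypothetical in-memory sample that mirrors the first rows of the dataset.
SAMPLE_CSV = """id,title,url,num_points,num_comments,author,created_at
12579008,You have two days to comment if you want stem cells to be classified as your own,http://www.regulations.gov/document?D=FDA-2015-D-3719-0018,1,0,altstar,9/26/2016 3:26
12579005,SQLAR  the SQLite Archiver,https://www.sqlite.org/sqlar/doc/trunk/README.md,1,0,blacksqr,9/26/2016 3:24
"""


def load_rows(handle):
    """Return (headers, rows) with the two numeric columns converted to int."""
    reader = csv.reader(handle)
    headers = next(reader)  # the first row holds the column names
    points_idx = headers.index("num_points")
    comments_idx = headers.index("num_comments")
    rows = []
    for row in reader:
        row[points_idx] = int(row[points_idx])
        row[comments_idx] = int(row[comments_idx])
        rows.append(row)
    return headers, rows


headers, rows = load_rows(io.StringIO(SAMPLE_CSV))
print(headers)
print(rows[0][4], type(rows[0][4]).__name__)
```

With the real file, the same helper would be called on the handle returned by `open('HN.csv')` instead of the `io.StringIO` sample.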

### 3. Analyze the Data

#### Extract Ask HN and Show HN Posts

In [3]:
ask_hn = []
show_hn = []
other_hn = []

for post in hn:
    title = post[1]
    if title.lower().startswith("ask hn"):
        ask_hn.append(post)
    elif title.lower().startswith("show hn"):
        show_hn.append(post)
    else:
        other_hn.append(post)
        

print(len(ask_hn))
print(len(show_hn))
print(len(other_hn))

9139
10158
273822


![](https://user-images.githubusercontent.com/50001708/85905687-e9108800-b7c0-11ea-91dc-8dfe6b05acb6.png)
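
The prefix check used in the extraction step can also be written as a small function, which makes it easy to try on individual titles. This is only a sketch: `classify_title` is a hypothetical helper, and every sample title except "algorithmic music" is invented for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on a case-insensitive prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Hypothetical titles for illustration only.
sample_titles = [
    "Ask HN: How do you learn a new codebase?",
    "Show HN: A tiny CSV viewer",
    "algorithmic music",
    "ASK HN: Is anyone still using Perl?",
]

groups = {"ask": [], "show": [], "other": []}
for title in sample_titles:
    groups[classify_title(title)].append(title)

for kind, titles in groups.items():
    print(kind, len(titles))
```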

#### Calculate Total Number of Comments for Ask HN, Show HN and Other HN Posts

In [4]:
t_ask = 0

for post in ask_hn:
    t_ask += int(post[4])
    
avg_ask = t_ask / len(ask_hn)


t_show = 0

for post in show_hn:
    t_show += int(post[4])
    
avg_show = t_show / len(show_hn)


t_other = 0

for post in other_hn:
    t_other += int(post[4])
    
avg_other = t_other / len(other_hn)

print(t_ask)
print(t_show)
print(t_other)

94986
49633
1768142


![](https://user-images.githubusercontent.com/50001708/85905686-e9108800-b7c0-11ea-9fb5-74ef9af1e809.png)
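
The three loops above repeat the same calculation for each group. As a sketch, one hypothetical helper, `comment_stats`, could return both the total and the average for any list of rows. The sample rows are invented and only the `num_comments` column (index 4) matters here.

```python
def comment_stats(posts, comments_idx=4):
    """Return (total, average) comments for a list of rows."""
    total = sum(int(post[comments_idx]) for post in posts)
    # Guard against an empty group to avoid dividing by zero.
    average = total / len(posts) if posts else 0.0
    return total, average


# Hypothetical rows in the same column order as the dataset.
sample_posts = [
    ["1", "Ask HN: a", "", "3", "10", "user_a", "9/26/2016 3:26"],
    ["2", "Ask HN: b", "", "1", "0", "user_b", "9/26/2016 3:24"],
    ["3", "Ask HN: c", "", "2", "5", "user_c", "9/26/2016 3:19"],
]

total, average = comment_stats(sample_posts)
print(total, average)
```

In the notebook this would be called once each for `ask_hn`, `show_hn`, and `other_hn`.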

#### Calculate Average Number of Comments for Ask HN, Show HN and Other HN Posts

In [5]:
print(avg_ask)
print(avg_show)
print(avg_other)

10.393478498741656
4.886099625910612
6.4572678601427205


![](https://user-images.githubusercontent.com/50001708/85905685-e877f180-b7c0-11ea-96f7-1a86a1a0d7d1.png)

#### Calculate the Share of Ask HN Posts

In [6]:
print('Ask HN Posts as a Fraction of All Posts')
print(len(ask_hn) / (len(ask_hn) + len(show_hn) + len(other_hn)))

Ask HN Posts as a Fraction of All Posts
0.03117846335447378
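
The value printed above is a fraction, not a percentage. A short sketch of the same arithmetic, using the three post counts printed in the extraction step and formatting the result as a percentage:

```python
# Counts taken from the output of the extraction step.
n_ask, n_show, n_other = 9139, 10158, 273822

total_posts = n_ask + n_show + n_other
ask_share = n_ask / total_posts

print(total_posts)
# The '%' format type multiplies by 100 and appends a percent sign.
print("{:.2%}".format(ask_share))
```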


#### Find the Number of Ask HN Posts and Comments by Hour Created

Because Ask HN posts receive more comments on average than Show HN and other posts, the rest of the analysis focuses on Ask HN posts.

In [7]:
import datetime as dt

# Keep only the creation timestamp and the comment count of each Ask HN post.
result_l = []

for post in ask_hn:
    result_l.append(
        [post[6], int(post[4])]
    )

comments_by_h = {}  # hour -> total comments
counts_by_h = {}    # hour -> number of posts
date_format = "%m/%d/%Y %H:%M"

for each_r in result_l:
    date = each_r[0]
    comment = each_r[1]
    # Parse the timestamp and keep the zero-padded hour, e.g. '03'.
    hour = dt.datetime.strptime(date, date_format).strftime("%H")
    if hour in counts_by_h:
        comments_by_h[hour] += comment
        counts_by_h[hour] += 1
    else:
        comments_by_h[hour] = comment
        counts_by_h[hour] = 1

comments_by_h

{'02': 2996,
 '01': 2089,
 '22': 3372,
 '21': 4500,
 '19': 3954,
 '17': 5547,
 '15': 18525,
 '14': 4972,
 '13': 7245,
 '11': 2797,
 '10': 3013,
 '09': 1477,
 '07': 1585,
 '03': 2154,
 '23': 2297,
 '20': 4462,
 '16': 4466,
 '08': 2362,
 '00': 2277,
 '18': 4877,
 '12': 4234,
 '04': 2360,
 '06': 1587,
 '05': 1838}
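
The dictionary keys above come from parsing each `created_at` string. A minimal sketch of that single step, using the timestamp of the first row in the dataset, shows how the format string maps onto the text and why the keys are zero-padded:

```python
import datetime as dt

DATE_FORMAT = "%m/%d/%Y %H:%M"

# Timestamp copied from the first row of the dataset.
created_at = "9/26/2016 3:26"

# strptime turns the text into a datetime object.
parsed = dt.datetime.strptime(created_at, DATE_FORMAT)
# strftime("%H") formats the hour as a two-digit string.
hour_key = parsed.strftime("%H")

print(parsed)
print(hour_key)
```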

#### Calculate Average Number of Comments for Ask HN Posts by Hour

In [8]:
avg_by_h = []

for hr in comments_by_h:
    avg_by_h.append([hr, comments_by_h[hr] / counts_by_h[hr]])

avg_by_h

[['02', 11.137546468401487],
 ['01', 7.407801418439717],
 ['22', 8.804177545691905],
 ['21', 8.687258687258687],
 ['19', 7.163043478260869],
 ['17', 9.449744463373083],
 ['15', 28.676470588235293],
 ['14', 9.692007797270955],
 ['13', 16.31756756756757],
 ['11', 8.96474358974359],
 ['10', 10.684397163120567],
 ['09', 6.653153153153153],
 ['07', 7.013274336283186],
 ['03', 7.948339483394834],
 ['23', 6.696793002915452],
 ['20', 8.749019607843136],
 ['16', 7.713298791018998],
 ['08', 9.190661478599221],
 ['00', 7.5647840531561465],
 ['18', 7.94299674267101],
 ['12', 12.380116959064328],
 ['04', 9.7119341563786],
 ['06', 6.782051282051282],
 ['05', 8.794258373205741]]

#### Sort and Print Values from a List of Lists

In [9]:
swap_avg_by_h = []

for row in avg_by_h:
    swap_avg_by_h.append([row[1], row[0]])
    
print(swap_avg_by_h)

sorted_swap = sorted(swap_avg_by_h, reverse=True)

sorted_swap

[[11.137546468401487, '02'], [7.407801418439717, '01'], [8.804177545691905, '22'], [8.687258687258687, '21'], [7.163043478260869, '19'], [9.449744463373083, '17'], [28.676470588235293, '15'], [9.692007797270955, '14'], [16.31756756756757, '13'], [8.96474358974359, '11'], [10.684397163120567, '10'], [6.653153153153153, '09'], [7.013274336283186, '07'], [7.948339483394834, '03'], [6.696793002915452, '23'], [8.749019607843136, '20'], [7.713298791018998, '16'], [9.190661478599221, '08'], [7.5647840531561465, '00'], [7.94299674267101, '18'], [12.380116959064328, '12'], [9.7119341563786, '04'], [6.782051282051282, '06'], [8.794258373205741, '05']]


[[28.676470588235293, '15'],
 [16.31756756756757, '13'],
 [12.380116959064328, '12'],
 [11.137546468401487, '02'],
 [10.684397163120567, '10'],
 [9.7119341563786, '04'],
 [9.692007797270955, '14'],
 [9.449744463373083, '17'],
 [9.190661478599221, '08'],
 [8.96474358974359, '11'],
 [8.804177545691905, '22'],
 [8.794258373205741, '05'],
 [8.749019607843136, '20'],
 [8.687258687258687, '21'],
 [7.948339483394834, '03'],
 [7.94299674267101, '18'],
 [7.713298791018998, '16'],
 [7.5647840531561465, '00'],
 [7.407801418439717, '01'],
 [7.163043478260869, '19'],
 [7.013274336283186, '07'],
 [6.782051282051282, '06'],
 [6.696793002915452, '23'],
 [6.653153153153153, '09']]

![](https://user-images.githubusercontent.com/50001708/85905684-e877f180-b7c0-11ea-8d8d-d0907894ef9d.png)
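
Swapping the columns is one way to sort by the average. As an alternative sketch, `sorted` accepts a `key` function, so the `[hour, average]` pairs can be sorted in their original order. The four pairs below are copied from the averages computed earlier.

```python
# A few [hour, average] pairs copied from the averages by hour.
avg_by_hour = [
    ["02", 11.137546468401487],
    ["15", 28.676470588235293],
    ["13", 16.31756756756757],
    ["09", 6.653153153153153],
]

# Sort on the second element (the average) without building a swapped list.
by_avg_desc = sorted(avg_by_hour, key=lambda row: row[1], reverse=True)

for hour, avg in by_avg_desc:
    print(hour, round(avg, 2))
```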

#### Top 5 Hours for 'Ask HN' Comments

In [10]:
print("Top 5 Hours for 'Ask HN' Comments")
for avg, hr in sorted_swap[:5]:
    print(
        "{}: {:.2f} average comments per post".format(
            dt.datetime.strptime(hr, "%H").strftime("%H:%M"), avg
        )
    )

Top 5 Hours for 'Ask HN' Comments
15:00: 28.68 average comments per post
13:00: 16.32 average comments per post
12:00: 12.38 average comments per post
02:00: 11.14 average comments per post
10:00: 10.68 average comments per post
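
The hours above are printed in 24-hour form. As a sketch, a hypothetical helper `to_12_hour` could turn an hour key into a 12-hour label. It uses plain arithmetic rather than `strftime("%p")`, whose AM/PM text depends on the locale.

```python
def to_12_hour(hour_key):
    """Convert a zero-padded 24-hour key such as '15' to a label such as '3:00 pm'."""
    hour = int(hour_key)
    suffix = "am" if hour < 12 else "pm"
    # 0 and 12 are both shown as 12 on a 12-hour clock.
    display = hour % 12 or 12
    return "{}:00 {}".format(display, suffix)


# Hour keys of the top five rows printed above.
for key in ["15", "13", "12", "02", "10"]:
    print(key, "->", to_12_hour(key))
```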


### 4. Conclusion

I analyzed user engagement for Ask HN, Show HN, and other posts using the [Hacker News Posts](https://www.kaggle.com/hacker-news/hacker-news-posts) dataset. My findings are as follows:

#### Do Ask HN or Show HN posts receive more comments on average?

Ask HN posts receive 10.39 comments on average. In comparison, Show HN posts receive 4.89 and other posts receive 6.46 comments per post. Ask HN posts therefore receive more comments on average than both Show HN and other posts, even though they make up only about 3% of the posts in this dataset.

#### Do posts submitted at certain times receive more comments on average?

For this question I looked only at Ask HN posts, because they receive the most comments on average. Within that group the answer is yes: posts created at certain hours receive more comments on average. The five hours with the highest average number of comments per Ask HN post are:

| Hour     | Average Comments per Post |
|:--------:|:-------------------------:|
| 3:00 pm  | 28.68                     |
| 1:00 pm  | 16.32                     |
| 12:00 pm | 12.38                     |
| 2:00 am  | 11.14                     |
| 10:00 am | 10.68                     |

Based on these results, I recommend submitting an `Ask HN` post during one of these hours for the best chance of high user engagement, as measured by comments.

A natural next step is to find the hours at which Show HN and other posts receive the most comments. This would help users who want to know when different types of posts tend to receive more engagement.
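
One way to extend the hourly analysis to other post types is to wrap it in a function that accepts any list of rows. The sketch below assumes the same column order as the dataset. The name `avg_comments_by_hour` and the sample Show HN rows are hypothetical, and it uses `collections.defaultdict` in place of the explicit `if`/`else` used earlier.

```python
import datetime as dt
from collections import defaultdict


def avg_comments_by_hour(posts, date_idx=6, comments_idx=4,
                         date_format="%m/%d/%Y %H:%M"):
    """Return [(hour, average_comments), ...] sorted by average, highest first."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for post in posts:
        hour = dt.datetime.strptime(post[date_idx], date_format).strftime("%H")
        totals[hour] += int(post[comments_idx])
        counts[hour] += 1
    averages = [(hour, totals[hour] / counts[hour]) for hour in totals]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


# Hypothetical Show HN rows for illustration only.
sample_show_hn = [
    ["1", "Show HN: a", "", "5", "8", "user_a", "9/26/2016 14:05"],
    ["2", "Show HN: b", "", "2", "2", "user_b", "9/26/2016 14:40"],
    ["3", "Show HN: c", "", "1", "3", "user_c", "9/25/2016 9:15"],
]

for hour, avg in avg_comments_by_hour(sample_show_hn):
    print("{}:00 {:.2f}".format(hour, avg))
```

In the notebook, calling it with `show_hn` or `other_hn` would produce the same kind of ranking as the one computed for `ask_hn`.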