# Exploring Hacker News Posts
In this project, we'll work with a data set of submissions to the popular technology site [Hacker News](https://news.ycombinator.com/), where user-submitted stories (known as "posts") are voted and commented upon, similar to Reddit. You can find the data set on [Kaggle](https://www.kaggle.com/hacker-news/hacker-news-posts).


We're specifically interested in posts whose titles begin with either Ask HN or Show HN. Our goal is to compare these two types of posts to determine the following:
* Do Ask HN or Show HN posts receive more comments on average?
* Do posts created at a certain time receive more comments on average?


## 1. Loading Data
We start by reading the "hacker_news.csv" file in as a list of lists, then display the first five rows (the header row plus four posts).

In [1]:
# read the CSV file as a list of lists
from csv import reader
read_obj = open('hacker_news.csv')
csv_reader = reader(read_obj)
hn = list(csv_reader)
read_obj.close()  # the rows are already in memory, so the file can be closed
hn[:5]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

The above data set contains the following columns:

1. id: post id
2. title: title of the post
3. url: post url
4. num_points: number of points on the post
5. num_comments: number of comments on the post
6. author: author of the post
7. created_at: the date and time the post was created
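The loading step can also be written with a context manager, so the file is closed automatically once the rows have been read. The sketch below is a minimal illustration, not the notebook's own code: `read_rows` and `SAMPLE_CSV` are hypothetical names, the in-memory sample stands in for the real file, and the `encoding='utf-8'` argument shown in the comment is an assumption about how the file is encoded.

```python
import csv
import io

# Hypothetical one-post sample standing in for hacker_news.csv.
SAMPLE_CSV = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,"
    "http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
)

def read_rows(file_obj):
    """Return every CSV row from an open file object as a list of lists."""
    return list(csv.reader(file_obj))

# With the real file, the context manager closes it automatically:
#     with open('hacker_news.csv', newline='', encoding='utf-8') as f:
#         hn = read_rows(f)
with io.StringIO(SAMPLE_CSV) as f:
    sample_rows = read_rows(f)

print(sample_rows[0])  # header row
print(sample_rows[1])  # first post
```

Every value comes back as a string, which is why the later steps convert num_comments with `int()` before doing arithmetic.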


## 2. Removing Headers from a List of Lists

In order to analyze our data, we need to first remove the row containing the column headers. Let's remove that first row next.

* Extract the first row of data, and assign it to the variable headers.
* Remove the first row from hn, then display headers.
* Display the first five rows of hn to verify that the header row was removed properly.

In [2]:
headers = hn[0]
hn = hn[1:]

print("Display the headers:")
print(headers)

print("Display the first five rows:")
print(hn[:5])

Display the headers:
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
Display the first five rows:
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2',

## 3.  Extracting Ask HN and Show HN Posts

We will separate posts beginning with Ask HN and Show HN into two different lists next. We'll do the following:

* Create three empty lists called ask_posts, show_posts, and other_posts.
* Loop through each row in hn.
* Get the title at index 1 in each row and assign it to the variable title.
* If the lowercase version of title starts with ask hn, append the row to ask_posts.
* Else if the lowercase version of title starts with show hn, append the row to show_posts.
* Otherwise, append the row to other_posts.
* Check the number of posts in ask_posts, show_posts, and other_posts.

In [3]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print("number of ask posts: " + str(len(ask_posts)))
print("number of show posts: " + str(len(show_posts)))
print("number of other posts: " + str(len(other_posts)))


number of ask posts: 1744
number of show posts: 1162
number of other posts: 17194
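The classification rule above can be isolated in a small helper, which makes the case-insensitive prefix check easy to try on individual titles. This is only a sketch: `classify_title` is a hypothetical name, and the sample titles are made up for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Made-up titles for illustration only.
sample_titles = [
    'Ask HN: How do you learn a new language?',
    'SHOW HN: My weekend project',
    'Interactive Dynamic Video',
]
labels = [classify_title(title) for title in sample_titles]
print(labels)  # ['ask', 'show', 'other']
```

Lowercasing the title first means that variants such as "ASK HN" or "show hn" land in the same group as the standard capitalization.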


## 4. Calculating the Average Number of Comments for Ask HN and Show HN Posts
Next, let's determine if ask posts or show posts receive more comments on average.

First, define a function that does the following:
* Set total_comments to 0.
* Use a for loop to iterate over the posts list.
  * Get the num_comments element from index 4 in each row.
  * Add this value to total_comments.
* Compute the average number of comments per post and return the value, truncated to a whole number.

Use the above function to get the average number of comments for ask posts and for show posts, and print the results.

In [4]:
def avg_comments(posts):
    # Sum the num_comments column (index 4), converting each value from str to int
    total_comments = 0
    for row in posts:
        total_comments += int(row[4])
    average = total_comments / len(posts)
    # int() truncates the fractional part of the average
    return int(average)

avg_ask_comments = avg_comments(ask_posts)
print("Average number of ask comments: " + str(avg_ask_comments))

avg_show_comments = avg_comments(show_posts)
print("Average number of show comments: " + str(avg_show_comments))

Average number of ask comments: 14
Average number of show comments: 10


On average, ask posts receive about 4 more comments than show posts (14 versus 10, with both averages truncated to whole numbers). Since ask posts are more likely to receive comments, we'll focus our remaining analysis on just these posts.
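Because the function returns `int(...)` of the average, the fractional part is dropped before the two groups are compared. A minimal sketch of the difference between truncating and rounding is shown below; `mean_comments` is a hypothetical helper and the two rows are made-up data in the data set's column order.

```python
def mean_comments(posts, comments_index=4):
    """Return the untruncated mean of the comment counts in posts."""
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Made-up rows for illustration; only index 4 (num_comments) matters here.
sample_posts = [
    ['1', 'Ask HN: a', '', '5', '10', 'user_a', '8/4/2016 11:52'],
    ['2', 'Ask HN: b', '', '3', '19', 'user_b', '8/4/2016 12:10'],
]
exact = mean_comments(sample_posts)
print(exact)            # 14.5
print(int(exact))       # 14 (fraction dropped)
print(round(exact, 2))  # 14.5
```

Reporting the rounded value instead of the truncated one would give a slightly more precise comparison between ask and show posts.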


## 5. Finding the Number of Ask Posts and Comments by Hour Created

We'll determine if ask posts created at a certain time are more likely to attract comments. We'll use the following steps to perform this analysis:

* Calculate the number of ask posts created in each hour of the day.
* Calculate the number of comments received by ask posts created in each hour of the day.
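Both steps depend on extracting the hour from the created_at string. The short sketch below shows that parsing step in isolation, using two created_at values from the rows displayed earlier; the format string assumes every timestamp follows the month/day/year hour:minute layout seen in those rows.

```python
import datetime as dt

created_at = '8/4/2016 11:52'  # value from the first post shown earlier
parsed = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')

print(parsed.hour)            # 11 (an int)
print(parsed.strftime('%H'))  # 11 (a zero-padded string)

# A single-digit hour shows the difference between the two forms.
midnight = dt.datetime.strptime('6/17/2016 0:01', '%m/%d/%Y %H:%M')
print(midnight.hour, midnight.strftime('%H'))  # 0 00
```

Using `strftime('%H')` yields keys such as '09', which sort correctly as strings, whereas `.hour` yields plain integers.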

In [5]:
import datetime as dt

# Iterate over ask_posts and append to result_list a two-element list:
#    the created_at value and the number of comments on the post.
result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

# Loop through each row of result_list and tally, for each hour,
# the number of ask posts created and the comments they received.
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    comment_num = row[1]  # already an int
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hr = date.strftime("%H")  # zero-padded hour string, e.g. '09'
    if hr not in counts_by_hour:
        counts_by_hour[hr] = 1
        comments_by_hour[hr] = comment_num
    else:
        counts_by_hour[hr] += 1
        comments_by_hour[hr] += comment_num

counts_by_hour
# comments_by_hour

{'09': 45,
 '13': 85,
 '10': 59,
 '14': 107,
 '16': 108,
 '23': 68,
 '12': 73,
 '17': 100,
 '15': 116,
 '21': 109,
 '20': 80,
 '02': 58,
 '18': 109,
 '03': 54,
 '05': 46,
 '19': 110,
 '01': 60,
 '22': 71,
 '08': 48,
 '04': 47,
 '00': 55,
 '06': 44,
 '07': 34,
 '11': 58}
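The same tally can be written without the explicit "key not seen yet" branch by using `collections.defaultdict`. This is an alternative sketch rather than the notebook's method: `tally_by_hour` is a hypothetical name and the three sample pairs are made up.

```python
from collections import defaultdict
import datetime as dt

def tally_by_hour(pairs):
    """Return (counts, comments) dicts keyed by zero-padded hour string."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, num_comments in pairs:
        parsed = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')
        hour = parsed.strftime('%H')
        counts[hour] += 1          # missing keys start at 0
        comments[hour] += num_comments
    return dict(counts), dict(comments)

# Made-up [created_at, num_comments] pairs for illustration.
sample_pairs = [
    ['8/4/2016 11:52', 6],
    ['8/5/2016 11:03', 4],
    ['8/5/2016 9:15', 1],
]
sample_counts, sample_comments = tally_by_hour(sample_pairs)
print(sample_counts)    # {'11': 2, '09': 1}
print(sample_comments)  # {'11': 10, '09': 1}
```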

## 6. Calculating the Average Number of Comments for Ask HN Posts by Hour

In [6]:
# Calculate the average number of comments each post received, by hour:
#  avg_by_hour = comments_by_hour / counts_by_hour

avg_by_hour = []
for hr in counts_by_hour:
    avg = comments_by_hour[hr] / counts_by_hour[hr]
    avg_by_hour.append([hr, round(avg, 2)])

avg_by_hour

[['09', 5.58],
 ['13', 14.74],
 ['10', 13.44],
 ['14', 13.23],
 ['16', 16.8],
 ['23', 7.99],
 ['12', 9.41],
 ['17', 11.46],
 ['15', 38.59],
 ['21', 16.01],
 ['20', 21.52],
 ['02', 23.81],
 ['18', 13.2],
 ['03', 7.8],
 ['05', 10.09],
 ['19', 10.8],
 ['01', 11.38],
 ['22', 6.75],
 ['08', 10.25],
 ['04', 7.17],
 ['00', 8.13],
 ['06', 9.02],
 ['07', 7.85],
 ['11', 11.05]]
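As a small worked example of the per-hour formula (total comments divided by number of posts), the sketch below applies it to two hours with made-up totals, written as a list comprehension instead of a loop.

```python
# Made-up totals for two hours, for illustration only.
sample_counts = {'09': 3, '15': 2}     # posts created in each hour
sample_comments = {'09': 10, '15': 7}  # comments on those posts

sample_avg = [
    [hour, round(sample_comments[hour] / sample_counts[hour], 2)]
    for hour in sample_counts
]
print(sample_avg)  # [['09', 3.33], ['15', 3.5]]
```

An hour with many posts but few comments ends up with a low average, so the ranking by average can differ from the ranking by post count.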


## 7. Sorting and Printing Values from a List of Lists

Let's finish by sorting the list of lists and printing the five highest values in a format that's easier to read.

In [7]:
# Swap the first and second elements of each row in avg_by_hour,
# and save the result to the list swap_avg_by_hour
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
print(swap_avg_by_hour)


# Sort swap_avg_by_hour in descending order (in place);
# sorted_swap refers to the same, now sorted, list
swap_avg_by_hour.sort(reverse=True)
sorted_swap = swap_avg_by_hour

print("Top 5 Hours for Ask Posts Comments")
template = "{}:00: {:.2f} average comments per post."
for row in sorted_swap[:5]:
    print(template.format(row[1], row[0]))

[[5.58, '09'], [14.74, '13'], [13.44, '10'], [13.23, '14'], [16.8, '16'], [7.99, '23'], [9.41, '12'], [11.46, '17'], [38.59, '15'], [16.01, '21'], [21.52, '20'], [23.81, '02'], [13.2, '18'], [7.8, '03'], [10.09, '05'], [10.8, '19'], [11.38, '01'], [6.75, '22'], [10.25, '08'], [7.17, '04'], [8.13, '00'], [9.02, '06'], [7.85, '07'], [11.05, '11']]
Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post.
02:00: 23.81 average comments per post.
20:00: 21.52 average comments per post.
16:00: 16.80 average comments per post.
21:00: 16.01 average comments per post.


Based on the above analysis, the hours in which ask posts receive the most comments on average are 15:00, 02:00, 20:00, 16:00, and 21:00.
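The swap step exists only so that the average becomes the first element and therefore drives the sort. An alternative sketch, not the approach used above, is to sort on the second element directly with a key function. The four rows below are a subset of the avg_by_hour output shown earlier.

```python
from operator import itemgetter

# Subset of the avg_by_hour values shown earlier: [hour, average comments].
sample_avg_by_hour = [['09', 5.58], ['15', 38.59], ['02', 23.81], ['20', 21.52]]

# Sort by the average (index 1), highest first, without swapping columns.
top_hours = sorted(sample_avg_by_hour, key=itemgetter(1), reverse=True)[:3]

template = "{}:00: {:.2f} average comments per post."
for hour, average in top_hours:
    print(template.format(hour, average))
# 15:00: 38.59 average comments per post.
# 02:00: 23.81 average comments per post.
# 20:00: 21.52 average comments per post.
```

`sorted` returns a new list, so the original order of the input is left untouched.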

## Conclusion
The best time to create an Ask HN post is between 3:00 pm and 4:00 pm EST, the hour in which ask posts received the most comments on average (38.59 per post).