Exploratory data analysis in Python: strings, dates and times, OOP

gaiaengineer/hacker_news_posts

Exploring Hacker News Posts

Project Description

In this project, I compare two types of posts from the popular site Hacker News to determine:

  • Which type receives more comments on average?
  • Do posts created at a certain time receive more comments on average?

The types of posts I'm interested in are Ask HN (created to ask the community a question) and Show HN (created to show the community a project you've made).
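
The first question can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: it assumes the two post types can be told apart by a case-insensitive title prefix (`Ask HN` / `Show HN`), and the helper names and sample rows are made up for the example.

```python
def classify(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (assumed rule)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


def average_comments(posts):
    """Take (title, num_comments) pairs and return {post type: mean comments}."""
    totals = {}
    counts = {}
    for title, num_comments in posts:
        kind = classify(title)
        totals[kind] = totals.get(kind, 0) + num_comments
        counts[kind] = counts.get(kind, 0) + 1
    return {kind: totals[kind] / counts[kind] for kind in totals}


# Hypothetical rows, for illustration only (not taken from the dataset).
sample_posts = [
    ("Ask HN: How do you learn a new codebase?", 12),
    ("ask hn: Best books of the year?", 8),
    ("Show HN: A tiny static site generator", 5),
    ("Some unrelated article", 3),
]

averages = average_comments(sample_posts)
print(averages)
```

Lowercasing the title before matching keeps the comparison robust to inconsistent capitalization in titles.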

Data Set

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

The original dataset can be found on Kaggle. For this project, it was downsampled from almost 300,000 rows to approximately 20,000 rows by removing all submissions that didn't receive any comments and then randomly sampling from the remaining submissions.
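
The downsampling described above could look roughly like the sketch below. It is only an illustration of the two steps (drop posts without comments, then sample at random); the function name, the fixed seed, and the toy rows are assumptions, and the procedure actually used to produce the reduced dataset is not documented here.

```python
import random


def downsample(rows, sample_size, seed=0):
    """Drop rows with no comments, then draw a random sample without replacement."""
    commented = [row for row in rows if row["num_comments"] > 0]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(commented, min(sample_size, len(commented)))


# Hypothetical rows, for illustration only.
toy_rows = [
    {"id": 1, "num_comments": 0},
    {"id": 2, "num_comments": 3},
    {"id": 3, "num_comments": 0},
    {"id": 4, "num_comments": 5},
    {"id": 5, "num_comments": 1},
    {"id": 6, "num_comments": 2},
]

sampled = downsample(toy_rows, sample_size=3, seed=42)
print(len(sampled))
```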

Column Descriptions

  • id: the unique identifier from Hacker News for the post
  • title: the title of the post
  • url: the URL that the post links to, if the post has a URL
  • num_points: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
  • num_comments: the number of comments on the post
  • author: the username of the person who submitted the post
  • created_at: the date and time of the post's submission
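
With `created_at` and `num_comments`, the second question (do posts created at a certain time receive more comments?) comes down to grouping by hour. The sketch below is hedged: the timestamp format `%m/%d/%Y %H:%M` is an assumption about how `created_at` is stored, and the function name and sample rows are hypothetical.

```python
import datetime as dt

# Assumed layout of created_at, e.g. "8/4/2016 11:52"; adjust if the file differs.
ASSUMED_FORMAT = "%m/%d/%Y %H:%M"


def average_comments_by_hour(rows, fmt=ASSUMED_FORMAT):
    """Take (created_at, num_comments) pairs and return {hour: mean comments}."""
    totals = {}
    counts = {}
    for created_at, num_comments in rows:
        hour = dt.datetime.strptime(created_at, fmt).hour
        totals[hour] = totals.get(hour, 0) + num_comments
        counts[hour] = counts.get(hour, 0) + 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


# Hypothetical rows, for illustration only.
sample_rows = [
    ("8/4/2016 11:52", 6),
    ("1/26/2016 11:05", 10),
    ("6/23/2016 22:20", 4),
]

by_hour = average_comments_by_hour(sample_rows)
print(by_hour)
```

Sorting the result by average instead of by hour would surface the hours with the most comments per post.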

Technologies

  • Python:
    • data analysis: working with strings, OOP (Object-Oriented Programming), working with dates and times
  • Jupyter Notebook
