Week 18 Homework Assignment - All the News That's Fit to Scrape
mdimg
models
public/assets
router
scraper
views
.gitignore
LICENSE.md
README.md
mongo-config.json
package.json
server.js


news-scraper - Week 18 Homework: Node, Express, MongoDB, Mongoose, Cheerio, and Handlebars


Overview

A "scraper" application that requests a page from a predetermined "news" site and then extracts (scrapes) the desired content. The extracted content is saved in a MongoDB database using Mongoose. The application can also list previously extracted "issues", display them, and allow the user to create or delete comments for a specific "item".

Usage

This application has been deployed to Heroku as per the assignment, but it can also be run locally.

Running from Heroku

At the time this assignment was submitted, the application had been deployed to Heroku. The Heroku link is provided in the homework submission.

NOTE: The :3000 port selection is not necessary on Heroku; the PORT environment variable contains the port number to be used.

Current Heroku Issues

Application Crash

As of 2017-02-24 there have been some problems with running the application on Heroku. It takes multiple attempts over time to get the application to run. From what I can decipher from the Heroku log, the cause appears to be a connection problem with the Mongo database hosted on mLab.
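
The log captures at the end of this document show the process exiting when the connection error is rethrown. One possible mitigation, shown here only as a sketch, is to retry the connection a few times before giving up. `connectWithRetry` and its parameters are hypothetical names, not part of this application; `connect` stands in for whatever function opens the database connection and is assumed to return a promise.

```javascript
// Hypothetical helper: calls `connect` until it resolves or `attempts` runs out.
// `connect` is assumed to return a promise (for example, a wrapper around the
// application's database connection call).
function connectWithRetry(connect, attempts, delayMs) {
  return connect().catch((error) => {
    if (attempts <= 1) {
      throw error; // out of attempts: surface the last error to the caller
    }
    console.log(`connection failed (${error.message}), retrying in ${delayMs} ms`);
    // Wait, then try again with one fewer attempt remaining.
    return new Promise((resolve) => setTimeout(resolve, delayMs))
      .then(() => connectWithRetry(connect, attempts - 1, delayMs));
  });
}
```

A caller would pass its own connect function, for example `connectWithRetry(openDatabase, 5, 2000)`, where `openDatabase` is likewise a placeholder name.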

Misrepresentation of Time

I've noticed that on Heroku the time stamps created for the Mongo documents are off by 6 hours. Code is in place that should convert an ISO-8601 time string to a local one. When the application is run locally the time stamps are correct. It should be noted that the problem might be caused by mLab, which may have a system time that is off.
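
Another possible explanation, which has not been confirmed for this application, is that the conversion runs on the server: Heroku dynos use UTC by default, so "local" time there differs from that of a development machine in another zone. The sketch below formats an ISO-8601 string for an explicitly named zone instead of relying on the host's setting. `formatInZone` is a hypothetical helper, `America/Chicago` is only an example zone (six hours behind UTC in February), and the code assumes a Node build with full ICU data.

```javascript
// Hypothetical helper: formats an ISO-8601 string as "YYYY-MM-DD hh:mm:ss"
// in the named IANA time zone, independent of the host's own time zone.
function formatInZone(isoString, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    second: '2-digit',
    hourCycle: 'h23', // 24-hour clock, 00 through 23
  }).formatToParts(new Date(isoString));

  // Look up one named part (year, month, ...) and pad it to two digits.
  const get = (type) => parts.find((part) => part.type === type).value.padStart(2, '0');

  return `${get('year')}-${get('month')}-${get('day')} ` +
         `${get('hour')}:${get('minute')}:${get('second')}`;
}

console.log(formatInZone('2017-02-25T07:44:12.000Z', 'UTC'));
console.log(formatInZone('2017-02-25T07:44:12.000Z', 'America/Chicago'));
```

The two calls print the same instant six hours apart, which matches the size of the offset described above.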

URLs

The following URLs are recognized by the server and will serve pages -

  • https://deployed-server/ - displays the index page
  • https://deployed-server/index - displays the index page

No other paths are intended for direct access via the browser.

Navigation

Within the navigation bar the following items are clickable -

Navigation Bar

Screenshots

Index Page

Index Page

News Item List Page

This page is reached by choosing an issue on the index page and clicking "List Scraped Items".

Issue Item List Page

Item View Page

Clicking on the title of a news item will render that item. Please note that not all news items have images.

Item View Page

Comments

While viewing an item, comments can be created or deleted. In this application comment deletion has been implemented as a deleted flag in the comment document. This was done with future development in mind, where, as an exercise, comments could be restored.
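
The deleted-flag approach can be sketched with plain objects. The field names `text` and `deleted` and the three function names are assumptions made for illustration; the application's actual Mongoose schema may differ.

```javascript
// Hypothetical comment shape: { text, deleted }.

// Returns a copy flagged as deleted; the text is kept so it can be restored later.
function softDelete(comment) {
  return Object.assign({}, comment, { deleted: true });
}

// Returns a copy with the deleted flag cleared.
function restore(comment) {
  return Object.assign({}, comment, { deleted: false });
}

// Only comments that have not been flagged are shown to the user.
function visibleComments(comments) {
  return comments.filter((comment) => !comment.deleted);
}
```

Because nothing is removed from storage, restoring a comment only requires clearing the flag.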

Create or Delete Comments

Heroku Deployment

Deployment to Heroku is, for the most part, straightforward and easy to accomplish. However, several key steps should be noted:

  • For this application it is necessary to set up a database on Heroku and modify the config/config.json file to use the correct credentials. MongoDB is the database chosen for this application.

  • Edit your package.json file (after it's been created with npm init) so that it contains -

    "engines": {
       "node": "6.9.4"
     }

Where "node": "6.9.4" indicates the version of Node that you want your application to use. This application uses the same version that was used locally during development.

  • The listening port requires some special consideration. Although it is configured, that value will not be used when running on Heroku. The following links provided useful information -

Provided the necessary information for editing the package.json file:

https://devcenter.heroku.com/articles/deploying-nodejs

The top answer provided the details needed for managing the port number :

http://stackoverflow.com/questions/31092538/heroku-node-js-error-r10-boot-timeout-web-process-failed-to-bind-to-port-w

  • The deployment steps I used are -
  1. heroku login
  2. heroku create
  3. git push heroku master

Heroku is now ready to serve the application. For subsequent file modifications after the initial deployment, once the changes have been committed and pushed to Git, only step 3 is required.

Don't forget to heroku logout when done!

Errors on Heroku

Heroku logs the output from the server application, and the log can be viewed from the Heroku dashboard. This log is useful when troubleshooting issues on Heroku.
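
The router lines in the log (see Heroku Log Captures at the end of this document) are made up of key=value fields, so a copied log can be filtered with a short script. The sketch below is one way to do that; `parseRouterLine` is a hypothetical helper, not part of this application.

```javascript
// Hypothetical helper: parses the key=value fields of a Heroku router
// log line into a plain object. All values are returned as strings.
function parseRouterLine(line) {
  const fields = {};
  const pattern = /(\w+)=("[^"]*"|\S+)/g;
  let match;
  while ((match = pattern.exec(line)) !== null) {
    // Strip the surrounding double quotes from values such as path="...".
    fields[match[1]] = match[2].replace(/^"|"$/g, '');
  }
  return fields;
}

const sampleLine = '2017-02-25T07:44:07.259148+00:00 heroku[router]: at=info ' +
  'method=GET path="/favicon.ico" dyno=web.1 connect=1ms service=3ms status=204 bytes=253';
console.log(parseRouterLine(sampleLine).status);
```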

Other Heroku Behavior

When a Node server application is deployed on Heroku several things will happen -

  1. Heroku will start the application as specified in the package.json file.
  2. When the server application runs, Heroku assigns a port to it, so the server application must read the PORT environment variable.
  3. If the application is idle for a period of time Heroku will kill the process. Upon the next connection it will start the application again. The only visible side effect is that the first page load after Heroku has killed the process takes a little longer.
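
Item 2 can be sketched as follows. `resolvePort` is a hypothetical helper name; the fallback of 3000 is the local port mentioned earlier in this document.

```javascript
// Returns the port the server should listen on.
// `env` is normally process.env; `fallback` is used when PORT is not set,
// which is the usual case when running locally.
function resolvePort(env, fallback) {
  const parsed = parseInt(env.PORT, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

const port = resolvePort(process.env, 3000);
console.log(`Server would listen on port ${port}`);
// With Express, this value would then be passed to app.listen(port).
```

Reading the value once at startup keeps the same code working both locally and on Heroku without editing configuration between the two.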

Designing the Pages

For this application I took a different approach than usual to designing the layout of the application's pages, content, and controls. I used the Pencil drawing application to create pseudo wire-frames. A wire-frame was created for each of the pages, and indicated things like -

  • Bootstrap row boundaries.
  • GET/POST paths and post data.
  • Handlebars data object key names and types.
  • On-page content locations and types.

The wire-frames were a good place to start and to keep track of the paths and data, and I found myself referring back to them and altering them as needed. They don't precisely represent what I ended up with, but I expected that.

Wire Frame Examples

Wireframe

Wireframe

Wireframe

My Toolbox

For this assignment I used the following development tools running on Windows -

  • Node.js - for running the applications.
  • Visual Studio Code - for debugging code running in Node, and as a secondary editor.
  • Notepad++ - primary editor.
  • Postman - a GUI API development tool.
  • GitKraken - a GUI Git tool.
  • Markdown Edit - a WYSIWYG tool for editing markdown files.
  • AstroGrep - a GUI grep program.
  • Evolus Pencil - for creating the diagrams used in this document.
  • Paint.NET - for editing the screen captures used in this document.

Heroku Log Captures

Capture 1

2017-02-25T07:44:06.833304+00:00 heroku[router]: at=info method=GET path="/assets/img/scraper-yellow.png" host=shrouded-reaches-39610.herokuapp.com request_id=4dc9450e-003e-4b76-8c53-fe752ad29595 fwd="73.210.229.7" dyno=web.1 connect=1ms service=3ms status=200 bytes=4007

2017-02-25T07:44:07.259148+00:00 heroku[router]: at=info method=GET path="/favicon.ico" host=shrouded-reaches-39610.herokuapp.com request_id=9d148e88-a794-4068-83bb-22d4658346f5 fwd="73.210.229.7" dyno=web.1 connect=1ms service=3ms status=204 bytes=253

2017-02-25T07:44:07.255142+00:00 app[web.1]: standard.js - favicon.ico request, responding with 204

2017-02-25T07:44:12.006545+00:00 app[web.1]: Mongoose Error:  Error: connection timeout

2017-02-25T07:44:12.006556+00:00 app[web.1]:     at Db.<anonymous> (/app/node_modules/mongoose/lib/drivers/node-mongodb-native/connection.js:169:17)

2017-02-25T07:44:12.006557+00:00 app[web.1]:     at emitTwo (events.js:106:13)

2017-02-25T07:44:12.006558+00:00 app[web.1]:     at Db.emit (events.js:191:7)

2017-02-25T07:44:12.006558+00:00 app[web.1]:     at Server.listener (/app/node_modules/mongodb/lib/db.js:1798:14)

2017-02-25T07:44:12.006559+00:00 app[web.1]:     at emitOne (events.js:96:13)

2017-02-25T07:44:12.006560+00:00 app[web.1]:     at Server.emit (events.js:188:7)

2017-02-25T07:44:12.006560+00:00 app[web.1]:     at Server.<anonymous> (/app/node_modules/mongodb/lib/server.js:274:14)

2017-02-25T07:44:12.006561+00:00 app[web.1]:     at emitOne (events.js:96:13)

2017-02-25T07:44:12.006564+00:00 app[web.1]:     at Server.emit (events.js:188:7)

2017-02-25T07:44:12.006565+00:00 app[web.1]:     at Pool.<anonymous> (/app/node_modules/mongodb-core/lib/topologies/server.js:335:12)

2017-02-25T07:44:12.006565+00:00 app[web.1]:     at emitOne (events.js:96:13)

2017-02-25T07:44:12.006566+00:00 app[web.1]:     at Pool.emit (events.js:188:7)

2017-02-25T07:44:12.006566+00:00 app[web.1]:     at Connection.<anonymous> (/app/node_modules/mongodb-core/lib/connection/pool.js:270:12)

2017-02-25T07:44:12.006567+00:00 app[web.1]:     at Connection.g (events.js:291:16)

2017-02-25T07:44:12.006568+00:00 app[web.1]:     at emitTwo (events.js:106:13)

2017-02-25T07:44:12.006569+00:00 app[web.1]:     at Connection.emit (events.js:191:7)

2017-02-25T07:44:12.007769+00:00 app[web.1]: /app/models/index.js:41

2017-02-25T07:44:12.007770+00:00 app[web.1]:     throw error;

2017-02-25T07:44:12.007771+00:00 app[web.1]:     ^

2017-02-25T07:44:12.007771+00:00 app[web.1]: 

2017-02-25T07:44:12.007772+00:00 app[web.1]: Error: connection timeout

2017-02-25T07:44:12.007773+00:00 app[web.1]:     at Db.<anonymous> (/app/node_modules/mongoose/lib/drivers/node-mongodb-native/connection.js:169:17)

2017-02-25T07:44:12.007773+00:00 app[web.1]:     at emitTwo (events.js:106:13)

2017-02-25T07:44:12.007774+00:00 app[web.1]:     at Db.emit (events.js:191:7)

2017-02-25T07:44:12.007775+00:00 app[web.1]:     at Server.listener (/app/node_modules/mongodb/lib/db.js:1798:14)

2017-02-25T07:44:12.007775+00:00 app[web.1]:     at emitOne (events.js:96:13)

2017-02-25T07:44:12.007776+00:00 app[web.1]:     at Server.emit (events.js:188:7)

2017-02-25T07:44:12.007776+00:00 app[web.1]:     at Server.<anonymous> (/app/node_modules/mongodb/lib/server.js:274:14)

2017-02-25T07:44:12.007777+00:00 app[web.1]:     at emitOne (events.js:96:13)

2017-02-25T07:44:12.007778+00:00 app[web.1]:     at Server.emit (events.js:188:7)

2017-02-25T07:44:12.007778+00:00 app[web.1]:     at Pool.<anonymous> (/app/node_modules/mongodb-core/lib/topologies/server.js:335:12)

2017-02-25T07:44:12.007779+00:00 app[web.1]:     at emitOne (events.js:96:13)

2017-02-25T07:44:12.007780+00:00 app[web.1]:     at Pool.emit (events.js:188:7)

2017-02-25T07:44:12.007780+00:00 app[web.1]:     at Connection.<anonymous> (/app/node_modules/mongodb-core/lib/connection/pool.js:270:12)

2017-02-25T07:44:12.007781+00:00 app[web.1]:     at Connection.g (events.js:291:16)

2017-02-25T07:44:12.007781+00:00 app[web.1]:     at emitTwo (events.js:106:13)

2017-02-25T07:44:12.007782+00:00 app[web.1]:     at Connection.emit (events.js:191:7)

2017-02-25T07:44:12.021387+00:00 app[web.1]: 

2017-02-25T07:44:12.032307+00:00 app[web.1]: npm ERR! Linux 3.13.0-105-generic

2017-02-25T07:44:12.032610+00:00 app[web.1]: npm ERR! argv "/app/.heroku/node/bin/node" "/app/.heroku/node/bin/npm" "start"

2017-02-25T07:44:12.032852+00:00 app[web.1]: npm ERR! node v6.9.4

2017-02-25T07:44:12.033044+00:00 app[web.1]: npm ERR! npm  v3.10.10

2017-02-25T07:44:12.033251+00:00 app[web.1]: npm ERR! code ELIFECYCLE

2017-02-25T07:44:12.033815+00:00 app[web.1]: npm ERR! news-scraper@0.1.0 start: `node server.js`

2017-02-25T07:44:12.034123+00:00 app[web.1]: npm ERR! Exit status 1

2017-02-25T07:44:12.034322+00:00 app[web.1]: npm ERR! 

2017-02-25T07:44:12.034460+00:00 app[web.1]: npm ERR! Failed at the news-scraper@0.1.0 start script 'node server.js'.

2017-02-25T07:44:12.034589+00:00 app[web.1]: npm ERR! Make sure you have the latest version of node.js and npm installed.

2017-02-25T07:44:12.034721+00:00 app[web.1]: npm ERR! If you do, this is most likely a problem with the news-scraper package,

2017-02-25T07:44:12.034869+00:00 app[web.1]: npm ERR! not with npm itself.

2017-02-25T07:44:12.035145+00:00 app[web.1]: npm ERR! Tell the author that this fails on your system:

2017-02-25T07:44:12.035828+00:00 app[web.1]: npm ERR!     node server.js

2017-02-25T07:44:12.036033+00:00 app[web.1]: npm ERR! You can get information on how to open an issue for this project with:

2017-02-25T07:44:12.036333+00:00 app[web.1]: npm ERR!     npm bugs news-scraper

2017-02-25T07:44:12.036483+00:00 app[web.1]: npm ERR! Or if that isn't available, you can get their info via:

2017-02-25T07:44:12.036859+00:00 app[web.1]: npm ERR!     npm owner ls news-scraper

2017-02-25T07:44:12.037000+00:00 app[web.1]: npm ERR! There is likely additional logging output above.

2017-02-25T07:44:12.061453+00:00 app[web.1]: 

2017-02-25T07:44:12.061720+00:00 app[web.1]: npm ERR! Please include the following file with any support request:

2017-02-25T07:44:12.061844+00:00 app[web.1]: npm ERR!     /app/npm-debug.log

2017-02-25T07:44:12.196541+00:00 heroku[web.1]: State changed from up to crashed

2017-02-25T07:44:12.197616+00:00 heroku[web.1]: State changed from crashed to starting

2017-02-25T07:44:12.167348+00:00 heroku[web.1]: Process exited with status 1

2017-02-25T07:44:14.784966+00:00 heroku[web.1]: Starting process with command `npm start`

2017-02-25T07:44:18.678996+00:00 app[web.1]: 

2017-02-25T07:44:18.679029+00:00 app[web.1]: > news-scraper@0.1.0 start /app

2017-02-25T07:44:18.679030+00:00 app[web.1]: > node server.js

2017-02-25T07:44:18.679031+00:00 app[web.1]: 

2017-02-25T07:44:20.397676+00:00 app[web.1]: connection success

2017-02-25T07:44:20.867372+00:00 heroku[web.1]: State changed from starting to up

2017-02-25T07:44:20.802022+00:00 app[web.1]: Server - listening on port 3664

2017-02-25T07:44:20.802071+00:00 app[web.1]: Server - IDLE - waiting for the first connection

2017-02-25T07:44:20.802127+00:00 app[web.1]: ================================================