
BlogRank

BlogRank is your go-to source for discovering programming knowledge from bloggers. Independent bloggers power the world of development, often disseminating the best new ideas, practices, and even novel language constructs that the creators of a technology never anticipated. The core idea of BlogRank is that the articles you most want to see are the ones vetted by other independent bloggers: the more often a blog post is cited by other authors, the higher we rank it. You can also search by author; authors are ranked according to their h-index, as inspired by the world of academia.
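
As a concrete illustration of the two ranking ideas, here is a minimal, hypothetical sketch (not the project's actual ranking code): posts are ordered by how many other posts link to them, and an author's h-index is the largest h such that h of their posts have at least h inbound links each.

// Illustrative sketch only; the real ranking lives in the index service.
// Rank posts by inbound citation count.
const rankPosts = (posts) =>
  [...posts].sort((a, b) => b.inLinks.length - a.inLinks.length);

// h-index: the largest h such that the author has h posts
// with at least h inbound links each.
const hIndex = (inLinkCounts) => {
  const sorted = [...inLinkCounts].sort((a, b) => b - a);
  let h = 0;
  while (h < sorted.length && sorted[h] >= h + 1) h++;
  return h;
};

// Example: posts cited 10, 4, 3, 1 and 0 times give an h-index of 3.
console.log(hIndex([10, 4, 3, 1, 0])); // 3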

Our search engine is powered by the graph data structure you see visualized in the background, along with a web crawler and an indexing service running behind the scenes. The project was built with React, Node.js, and PostgreSQL.

Team

  • Product Owner: Amir Bandeali
  • Scrum Master: Nick Olszowy
  • Development Team Members: Amir Bandeali, Nick Olszowy, Pete Herbert

Table of Contents

  1. Usage
  2. Requirements
  3. Development
    1. Installing Dependencies
    2. Server-side
    3. Client-side
    4. Worker Service and Index Service
    5. Roadmap
  4. Contributing
  5. API

Usage

Just enter your search term or phrase and see what we give you back! When you click on a search result, the app shows you relevant information about it, chiefly the blog posts that have cited the result you are viewing. This is useful knowledge: while you are visiting a website you can never see the pages that link to it, only the links from that page outwards. Having this information helps guide your search toward relevant, well-written material.

Requirements

  • Node (v6.6 or higher)
  • PostgreSQL (v9.5 or higher)

Development

In order to start developing, there are several steps to take. First, you should have a local PostgreSQL server up and running with a database named testgraph. See this page to get started with PostgreSQL locally. From there, you'll want to use brew or another package manager to install the grunt command-line interface and the mocha command-line interface if you don't have them already.

Once you have a working postgres server up, move on to installing dependencies.

Installing Dependencies

Clone the client repo into the top level of the server repo. Then, from within the server directory, and again within the client directory, run:

npm install

That's it!

Server-side

To develop server-side, run npm run start:dev from within the server directory. This initializes nodemon, which starts the server connected to your local postgres DB and watches the files for changes. Postman is a very useful app for testing routes.
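
If you'd rather check a route from the command line than from Postman, a quick sketch like the following works from a node script or REPL; the port here is an assumption, so use whatever your local server logs on startup.

// Hit a route on the local dev server; localhost:3000 is an assumed port.
const http = require('http');

http.get('http://localhost:3000/api/stats', (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => console.log(res.statusCode, body));
}).on('error', console.error);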

Client-side

ES6 syntax and JSX notation must be transpiled to vanilla JavaScript. To achieve this we use Babel within grunt to transpile and browserify files into the compiled directory. To transpile the files just once, run grunt in the terminal within the client directory. To watch the files and re-transpile on every change, run grunt watch.
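
The actual Gruntfile lives in the client repo; as a rough, hypothetical sketch of the setup described above (grunt-babel and grunt-browserify writing into the compiled directory, plus grunt-contrib-watch), it might look something like this, with task and path names chosen for illustration only.

// Illustrative Gruntfile sketch; see the client repo for the real config.
module.exports = function (grunt) {
  grunt.initConfig({
    babel: {
      options: { presets: ['es2015', 'react'] }, // ES6 + JSX
      dist: { files: [{ expand: true, src: ['src/**/*.js'], dest: 'compiled/' }] }
    },
    browserify: {
      dist: { files: { 'compiled/bundle.js': ['compiled/src/**/*.js'] } }
    },
    watch: {
      scripts: { files: ['src/**/*.js'], tasks: ['babel', 'browserify'] }
    }
  });

  grunt.loadNpmTasks('grunt-babel');
  grunt.loadNpmTasks('grunt-browserify');
  grunt.loadNpmTasks('grunt-contrib-watch');
  grunt.registerTask('default', ['babel', 'browserify']);
};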

Worker Service and Index Service

This is where the magic happens. These files are invoked by CRON jobs on the deployed server, and as such they export their functionality. If you want to test the services on your local DB, go into the top-level file for each (startCrawlers.js and main.js, respectively) and uncomment the code at the bottom that invokes the main function. You will then be able to run either one with the node terminal command. Note that before the crawler will work, you must load the whitelist into the database with the loadWL.js file. Finally, the worker must find posts and the index service must populate the query lists before your local client will be able to search for anything.
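
The pattern described above looks roughly like the following at the bottom of startCrawlers.js and main.js; only the export-plus-commented-invocation idea comes from this README, the code itself is illustrative.

// Exported so the deployed CRON jobs can require() and run the service.
module.exports = main;

// Uncomment the line below to run the service directly against your
// local DB, e.g. with `node startCrawlers.js` or `node main.js`:
// main();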

The crawler accepts a few arguments. If the crawler ever crashes, or you press Ctrl-C in the terminal, the crawler's current queue will be written to a JSON file. You can restart the crawler with this queue using the --continue argument. Additionally, if you wish to interactively add to the whitelist used by your local crawler, run the crawler with the --add argument (see the sketch after this list). This gives you a prompt-driven mode in which randomly chosen sites are presented to you. You have four options at any given link:

  • y to add the site to the whitelist
  • n to not add it and move on to the next link in the queue
  • a to add this base URL to the list of URLs that you know are not blogs and can be filtered out automatically
  • e to exit interactive mode and continue crawling with the new links you have amassed
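
A hypothetical sketch of how these flags might be read (the real crawler's argument handling may differ):

// Read the optional crawler flags from the command line.
const args = process.argv.slice(2);

if (args.includes('--continue')) {
  // Resume from the queue JSON written when the crawler last exited.
}
if (args.includes('--add')) {
  // Enter the interactive whitelist prompt (y / n / a / e).
}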

Roadmap

View the project roadmap here.

Contributing

See CONTRIBUTING.md for contribution guidelines.

API

Search Posts or Authors

Fetches an array of posts or authors that best match the provided tags and page number. Also returns a count of the total number of matching posts or authors in our database.

  • URL

    /api/posts or /api/authors

  • Method:

    GET

  • URL Params

    Required:

    tags=[string] example: tags=["javascript", "node"]

    Optional:

    page=number default=1

  • Post Response:

    • Code: 200
      Content:
    {
      "results": [
        {
          "url": ...,
          "postId": ...,
          "inLinks": [
            ...
          ],
          "title": ...,
          "oldTags": [
            ...
          ],
          "author": ...,
          "publishDate": ...,
          "rank": ...,
          "createdAt": ...,
          "updatedAt": ...
        },
        ...
      ],
      "count": ...
    }

  • Author Response:

    • Code: 200
      Content:
    {
      "results": [
        {
          "id": ...,
          "name": ...,
          "hIndex": ...,
          "createdAt": ...,
          "updatedAt": ...,
          "posts": [
            ...
          ]
        },
        ...
      ],
      "count": ...
    }

  • Sample Call:
  $.ajax({
    url: 'blogrank.io/api/posts?tags=["javascript"]&page=2',
    method: 'GET',
    success: data => console.log(data)
  });

Fetch Post Inlinks

Fetches an array of posts that link to the post with the given id.

  • URL

    /api/posts/:id

  • Method:

    GET

  • URL Params

    Required:

    id=number

  • Response:

    • Code: 200
      Content:
      [
        {
          "url": ...,
          "postId": ...,
          "inLinks": [
            ...
          ],
          "title": ...,
          "oldTags": [
            ...
          ],
          "author": ...,
          "publishDate": ...,
          "rank": ...,
          "createdAt": ...,
          "updatedAt": ...
        },
        ...
      ]

  • Sample Call:
  $.ajax({
    url: 'blogrank.io/api/posts/22882',
    method: 'GET',
    success: data => console.log(data)
  });

Get Database Statistics

Returns a JSON object with database statistics.

  • URL

    /api/stats

  • Method:

    GET

  • Response:

    • Code: 200
      Content:
      {
        "posts": ...,
        "connected": ...,
        "authors": ...
      }

  • Sample Call:
  $.ajax({
    url: 'blogrank.io/api/stats',
    method: 'GET',
    success: data => console.log(data)
  });
