A dashboard showing data on boardgames collected from boardgamegeek.com

robinzigmond/bgg_data_dashboard

Board game data dashboard

Overview

This is a dashboard displaying data on boardgames, which has been gathered from the BoardGameGeek website (henceforth referred to as BGG - for those who are unaware of it, you can think of it as being for modern board games what IMDB is for movies), using BGG's XML API.

It is deployed on Heroku at https://boardgame-dashboard.herokuapp.com.

This project will be my submitted project for Stream 2 (Back End Development) of the Code Institute's Full Stack Developer course. The brief was to create a data dashboard by uploading data to a backend database and then delivering it to the frontend in JSON format, using Javascript libraries to make interactive charts. (NB this is a paraphrase, not an "official" statement.)

Technologies/libraries used

Back End

The backend code is written in Python, and uses (very simple features of) the Flask framework.

The data is stored in a MongoDB collection.

The Python libraries used are listed in the requirements.txt file. Other than Flask and its dependencies, and pymongo for the database handling, I have used the following 3rd-party libraries:

  • The boardgamegeek2 module is used to access the BGG API.
  • BeautifulSoup is used in conjunction with the lxml HTML parser to gather some information from the main BGG site in order to select the right games to query the API about.
  • gunicorn is necessary for deployment to Heroku, and python-rq (together with its dependencies) is necessary to create the job queue which allows the data-updating script to run when scheduled.

Front End

The following Javascript and CSS libraries have been used for this project:

  • dc.js is the main library used to display the charts and other interactive dashboard elements, as well as to filter the data based on the user's selections within those charts. Unfortunately, it seems that dc.js is not maintained in a very orderly way, with features being removed as often as they are added. As a result, I have not used any one version of the code. I started with v2.0.2 (the latest release at the time of writing), which crucially defines the beginSlice and endSlice methods for the dataTable class - but when I discovered that the code for the selectMenu class was unaccountably missing from this (having been in earlier versions which I had been using in earlier work on this project), I simply copied the code from http://dc-js.github.io/dc.js/docs/html/select-menu.js.html and pasted it into the 2.0.2 code!
  • d3.js is used by dc.js in order to actually draw the charts. Version 3.5.3 has been used here - note that dc.js has still not been updated to work with v4 of d3.js. I have made no direct use of d3, since most of that is handled internally by dc, although very occasionally a d3 method or object has to be specified to pass to one of dc's methods.
  • crossfilter.js is also used internally by dc.js in order to quickly "slice up" the data according to whatever properties are specified, and to put on and remove filters in response to the user's actions. Unlike with d3, a reasonable knowledge of crossfilter is required when using dc.js, to know how to specify "dimensions" and "groups" to get the charts drawn correctly. Note that I have used crossfilter v1.4.0 - this is the first version of crossfilter to support dimensions whose values are arrays. This feature was absolutely necessary for my row charts for mechanics, categories, designers and publishers.
  • dc.css is the companion CSS file for dc.js, and is required in order for the charts to display correctly.
  • intro.js was used to construct the dashboard tutorial (seen by clicking the "show me how this works!" button at the top of the screen). introjs.css is required in addition, and I also included introjs-dark.css as my choice of intro theme from the options available.
  • Finally, I have used icons from Font Awesome on the table pagination buttons - and of course a couple of fonts from Google Fonts.
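To illustrate what an array-valued dimension means for the row charts, here is a minimal sketch in Python rather than in crossfilter itself. The sample records, the `mechanics` field name and the `top_values` helper are all hypothetical; the point is only that a game with several mechanics contributes to the count of each of them, and that only the most common values are kept for display.

```python
from collections import Counter

# Hypothetical records: each game lists several mechanics, much as an
# array-valued dimension would see them.
games = [
    {"name": "Game A", "mechanics": ["Dice Rolling", "Set Collection"]},
    {"name": "Game B", "mechanics": ["Set Collection", "Worker Placement"]},
    {"name": "Game C", "mechanics": ["Set Collection"]},
]


def top_values(records, field, limit=10):
    """Count every array element across all records and keep the most common."""
    counts = Counter()
    for record in records:
        # A record with several values adds one to each value's count.
        counts.update(record[field])
    return counts.most_common(limit)


top_mechanics = top_values(games, "mechanics")
```

With the sample data above, "Set Collection" comes first with a count of 3, and the other two mechanics each have a count of 1.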

Notes on my own code

I have chosen not to use any libraries other than the ones listed above - and in particular, I have not used Bootstrap because I wanted to have a go at producing a responsive layout on my own, without relying on Bootstrap's grid system. (It was also good practice to style things like the navbar and the buttons myself, rather than using one of Bootstrap's ready-made classes.) Although there are a few minor issues remaining (mostly on mobile Safari), I believe that I have largely achieved this, by relying on flexbox to organise the layout. I have also used one further "trick", which was suggested by a fellow Code Institute student, of including 2 separate versions of the year chart - one a bar chart, and the other a row chart - and using a media query to only display whichever one is appropriate for the user's device. The bar chart is what I consider the "primary" version, but it is too wide for phone screens (and for many tablets in portrait mode), often resulting not just in scrolling but in the left edge of the chart actually getting cut off. So it is "replaced" (on the screen) by a row chart version, which is much taller but narrow enough to fit on the smallest screens in common use.

The main dashboard code is in the dashboard.js file. Although much of this is simply passing parameters into the ready-made objects and methods of crossfilter.js and dc.js, there were a few issues which I had to solve which were specific to this project. One was that I quickly decided that I wanted the table of selected games to have full pagination and also to be re-orderable just by clicking the appropriate column. dc.js does not give an easy interface to implement pagination - but it does at least make it possible through the beginSlice and endSlice methods. There is still a fair bit of Javascript code which needs to be explicitly written to make it work - this is shown in the example at https://github.com/dc-js/dc.js/blob/master/web/examples/table-pagination.html, and I took this code as a starting point, while extending it to include the option to go directly to the first or last page, as well as making it interact nicely with the other dashboard elements. The code for the re-ordering ended up being quite extensive, even though it is straightforward in principle, mainly because the table is redrawn with each click, so the event listeners and CSS classes have to be added again every time such an interaction occurs.
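The arithmetic behind the first/previous/next/last buttons can be sketched as follows. This is a Python illustration only, not the JavaScript in dashboard.js; the function names and the clamping behaviour are assumptions, and the returned pair simply plays the role of the values that would be handed to beginSlice and endSlice.

```python
def last_page_offset(page_size, total):
    """Offset of the first row on the final page (0 when there are no rows)."""
    if total == 0:
        return 0
    return ((total - 1) // page_size) * page_size


def page_bounds(offset, page_size, total):
    """Clamp the offset to a valid page and return (begin, end) slice bounds."""
    begin = max(0, min(offset, last_page_offset(page_size, total)))
    end = min(begin + page_size, total)
    return begin, end


def move(offset, action, page_size, total):
    """Return the new offset after a 'first', 'prev', 'next' or 'last' click."""
    targets = {
        "first": 0,
        "prev": offset - page_size,
        "next": offset + page_size,
        "last": last_page_offset(page_size, total),
    }
    begin, _ = page_bounds(targets[action], page_size, total)
    return begin
```

For example, with 45 rows and 10 rows per page, `move(20, "last", 10, 45)` gives an offset of 40, and `page_bounds(40, 10, 45)` gives the slice `(40, 45)`. Clicking "next" on the last page or "prev" on the first leaves the offset unchanged.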

One problem that came up repeatedly when testing the table ordering was that fields that should have been ordered as numbers were in fact being treated as strings (so that 999 was being considered bigger than 1000, and so on). For a long time I had no idea where this was coming from, since I had explicitly converted all the values to numbers - until I realised that it was down to d3.nest (which is called by dc.js when making the table) automatically converting all values to strings. I solved this by modifying the ordering functions in a very basic way (converting the values, after they had been through d3.nest, back to numbers before passing them to d3.ascending or d3.descending).
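The same effect is easy to reproduce outside d3. The sketch below is in Python, not the JavaScript used in the project, and the `numeric_order` helper is a hypothetical stand-in for the modified ordering functions: values that have been turned into strings sort character by character unless they are converted back to numbers for the comparison.

```python
# Values as they might look after being converted to strings.
values_after_nesting = ["1000", "999", "85"]

# Plain string ordering compares character by character, so "1000" sorts first.
as_strings = sorted(values_after_nesting)  # ['1000', '85', '999']


def numeric_order(values, descending=False):
    """Order string values by their numeric meaning instead of their characters."""
    return sorted(values, key=float, reverse=descending)


as_numbers = numeric_order(values_after_nesting)  # ['85', '999', '1000']
```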

I have also added "reset buttons" for each of the row charts, which are needed because otherwise it is easily possible to "lose" one's selection and have no way to undo it short of refreshing the page (and thus resetting everything, after a wait of several seconds). This of course is only an issue because I am displaying just the 10 most popular selections for each chart - this in turn was necessary because the total number of possibilities ranges from a few dozen to several hundred, making having permanent rows for each option completely unworkable.

Finally, the "trick" I mentioned above of having 2 separate year charts and only displaying one, depending on the user's screen size, had an unexpected consequence. When viewing on a tablet (of the appropriate dimensions - I first observed this on my wife's iPad Mini), rotating the screen from portrait to landscape or vice versa changes which of the charts is displayed. This should not be a problem - but it was, because the selections were not carried over from one chart to the other, which would confuse the user who expects to see the "same" chart in a different orientation. (The same effect happens on a desktop or laptop if the user decides to resize the window.) The solution to this was again quite simple in the end (keep track of which of the 2 charts the user has most recently interacted with, then when the user resizes or rotates, pass the filters from that chart to the other one and redraw it), but it was a problem that caused me to go down several blind alleys before arriving at that solution.
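The bookkeeping described here can be sketched in a few lines. This is a Python model of the idea only, with hypothetical class and method names; the real charts are dc.js objects, and the redraw step is left out.

```python
class YearChart:
    """Stand-in for one of the two year charts; it only remembers its filters."""

    def __init__(self, name):
        self.name = name
        self.filters = []


class ChartPair:
    """Tracks which chart was used last and copies its filters on resize."""

    def __init__(self, bar, row):
        self.bar = bar
        self.row = row
        self.last_used = bar  # assume the bar chart until the user acts

    def record_interaction(self, chart, filters):
        """Called whenever the user changes the selection on either chart."""
        chart.filters = list(filters)
        self.last_used = chart

    def on_resize(self):
        """Pass the filters from the last-used chart to the other one."""
        other = self.row if self.last_used is self.bar else self.bar
        other.filters = list(self.last_used.filters)
        return other  # the chart that would now be redrawn
```

In use, `record_interaction` would be called from each chart's filter event, and `on_resize` from the window's resize or orientation-change handler.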

As far as the backend of the project is concerned, I did have a few issues with getting the data from the BGG API. Although I eventually discovered the source of the problem myself, readers might want to check this BGG thread in which I described my problem to others, because I did receive some helpful responses, and it is a good insight into the rather peculiar nature of this API and how others deal with it.

Testing

I have done no formal unit-testing of the Javascript code - mainly because I find it hard to imagine what kind of formal tests would help for a project which is to a large extent about the user interface, and in which most of the "calculation" parts of the code are already handled by libraries (dc.js, d3.js and crossfilter.js).

But I have extensively tested the functionality by the simple method of trying things out (which was how I noticed each of the issues mentioned above) - and I have also tested the page across several devices and web browsers, mainly to check that the layout was displayed correctly. On my laptop I have tested the site in Firefox, Google Chrome and Microsoft Edge, and found it fine in all of them. I have not been able to do as much mobile testing as I would have liked - I have done so on my wife's iPad Mini (running Safari 9) and iPhone 6 (Safari 10). The display on the iPad is still not as good as I would like (in landscape mode, the very left edge of the year chart is cut off the screen, in a way which is noticeable even though it doesn't affect the functionality too badly) - but it appears to now work fine on the iPhone, after I altered the flex-shrink and flex-basis values to ensure that Safari did not try to make the 2 main "columns" (one containing the bar/row charts, the other containing the pie chart and table) overlap.

I would love to test the site properly on an Android phone and tablet. Unfortunately, although I have an Android phone (Sony Xperia), it is very old and is running Android 4, which did not implement flexbox properly at all (something which I read was soon fixed in subsequent Android versions). So I can draw no conclusions from the fact that the display simply doesn't work at all here, and do not want to expend what would be a considerable amount of effort to fix this, when extremely few users will have a mobile operating system/browser which is quite so out of date!

[Update December 2017 - since writing this I have acquired a new Android phone, running Android 7. This actually came with Chrome installed and not the Android browser - and I am happy to report that the layout works fine, apart from a slightly-too-wide main screen which makes the navigation links have to be scrolled to, and some of the introJS popups having their left edge slightly shaved off by the edge of the screen. These flaws are certainly annoying and I would like to fix them in time, but they do not affect the main use of the website.]

Deployment

Basic deployment of the dashboard to Heroku was straightforward, using the mLab add-on (free "sandbox" tier) as the remote MongoDB instance.

It was less straightforward to get the data-fetching script (upload_bgg_data.py) to run every 24 hours as intended. I could of course simply run it from my own PC whenever wanted - which is what I did to get the initial data when I first deployed to Heroku - but I would prefer to have it be a regular event, both to make sure that it happens even if I am unavailable (or my PC has problems), and so that users can rely with some confidence on the update happening at a certain time. For this reason I also added the (free) Heroku scheduler to the app.

But my upload script typically runs for around 1 hour on my own PC - due mostly to the need to pause between API calls in order to avoid them being "throttled" by BGG (I found by experimentation that 10 seconds works fine, but that shorter waits do not). And Heroku's documentation specifies that the scheduler should not be used for tasks which run for more than a few minutes. Although I tried it anyway, I found that the task "idled" and shut down after just a few minutes. So I realised that I would need to use a "worker" dyno to perform this task, and only use the scheduler to add a job to the worker's "queue" - and I have implemented this, following more or less exactly the example given at https://devcenter.heroku.com/articles/python-rq. (Note that this also necessitated another free addon: RedisToGo.)
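The pause between API calls can be sketched like this. It is a minimal illustration, not the code in upload_bgg_data.py: `fetch_all` and its `fetch_one` argument are hypothetical names, with `fetch_one` standing in for whatever call retrieves a single game from the API, and the `sleep` parameter exists only so that the wait can be replaced when trying the function out.

```python
import time


def fetch_all(game_ids, fetch_one, pause_seconds=10, sleep=time.sleep):
    """Fetch each game in turn, waiting between calls to avoid throttling."""
    results = {}
    failed = []
    for index, game_id in enumerate(game_ids):
        if index > 0:
            # Wait before every call except the first one.
            sleep(pause_seconds)
        try:
            results[game_id] = fetch_one(game_id)
        except Exception:
            # A real script would catch the API library's specific error;
            # here any failure just records the id for a later retry.
            failed.append(game_id)
    return results, failed
```

The function returns the successfully fetched games together with the ids that failed, so that the caller can decide whether to retry them or give up for the day.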

Note that, although the script to upload the data runs once per day (currently from 7pm UK time), it is not guaranteed to run successfully each day. The most common problem is that the API becomes unresponsive for long periods - as of December 2017 I am now allowing 4 hours for the script to run, but typically this does not allow for more than 2 failed API calls (out of 100 required) because it is usually about an hour before the script is allowed access again. And not infrequently there can indeed be more than 2 failures encountered. And even if the API is behaving fine, Heroku shuts down dynos briefly once per day, at a time which is difficult to predict - if this happens when the process is running, as I have seen a few times, there is no way to save it (short of manually restarting via the Heroku scheduler).

Unfortunately, even this didn't allow the process to run smoothly - because if using free dynos on Heroku, the web dyno (which actually serves the webpage, via gunicorn in the case of a Python app as here) is set to "sleep" after half an hour with no requests. And, despite some (older) pages of the Heroku documentation implying otherwise, worker dynos also go to sleep when the same app's web dyno does. This of course means that my worker dyno will not be able to carry out its task, unless I can ensure there is web traffic to the site at the same time. Although it would be possible to do this in an automated way, I decided that I may as well pay the relatively modest (although certainly not trivial) $14/month for 2 "hobby" dynos (1 web and 1 worker), because this will also ensure much better performance in the event that my site gets a reasonable amount of web traffic. Which I certainly hope it will :) And I can always switch back to free dynos if it becomes apparent that there is no significant gain to the upgrade.
