SYRAS: A Systematic Review Assistant
This project is in LATE ALPHA
What is this project?
The aim of this project is to provide a set of tools to help undertake scientific systematic reviews more easily and quickly, so that they are more likely to be performed at all.
Some scientists, such as Ben Goldacre, believe systematic reviews are one of the most important steps forward for progressing science itself: "Systematic reviews are one of the great ideas of modern thought. They should be celebrated." [Bad Science, p. 99]. Academic organisations such as The Cochrane Collaboration already provide rules, guidance, services and tools, but due to gaps in this support during the lengthy review process, systematic reviews are still perceived to be so difficult or laborious that they are not performed as often as they should be.
There are commercial offerings such as Covidence and Zotero which offer a well-established range of functionality, some specialised to particular fields. While these are certainly powerful tools, commercial products are sometimes challenging to acquire during research projects. The aims of this project are therefore:
- to provide free and open software to the science community
- to develop intelligent assistants to automate the laborious aspects of collation and screening
- to establish a community of open-source developers to broaden the creativity and support base
What are Systematic Reviews?
"Systematic reviews are a type of literature review that collects and critically analyzes multiple research studies or papers, using methods that are selected before one or more research questions are formulated, and then finding and analyzing studies that relate to and answer those questions in a structured methodology. They are designed to provide a complete, exhaustive summary of current literature relevant to a research question. Systematic reviews of randomized controlled trials are key in the practice of evidence-based medicine, and a review of existing studies is often quicker and cheaper than embarking on a new study." https://en.wikipedia.org/wiki/Systematic_review
Systematic reviews can provide the data needed for a meta-analysis, or they can be used as a preparatory stage of any research project to assess the current state of a specific scientific topic.
A typical scenario might be summarised as follows:
1. Journal database article search (sourcing articles by keyword, reference/citations)
2. Systematic Review Process (filtering thousands of articles down to dozens)
3. Data extraction (of experimental methods, results/statistics)
4. Meta-analysis and/or further research.
We are focussing on step 2, which itself has a fairly complex set of stages. As mentioned above, Cochrane provides good tools and services for steps 1 and 4, but only guidance on how to do steps 2 and 3, which are left to each researcher to perform.
Phase 1 is to implement a basic web application which can run a complete review, including article data import, the screening process, collaboration and result data export. The application must have a "plugin" architecture to enable future additions.
Phase 2 is to research and develop potential solutions to the perceived roadblocks. For example, could screening 5,000 documents be assisted by a natural-language AI? Or could the initial citation/reference searches be improved?
Phase 3 is to widen the project out to collaboration with international, academic developers who will have their own ideas and challenges.
The project was originally initiated around 2016 by Dr Nic Badcock and team (Dept. Cognitive Science, Macquarie University, Sydney, Australia). While performing many systematic reviews, they developed their own software in R and Matlab. The R libraries help to automate the complex task of ingesting and processing articles exported from various online journal database archives. The Matlab GUI allows researchers to screen thousands of articles, keeping track of ratings and comments, while keeping the researcher focused and productive. This first version has been used successfully in several collaborative projects.
In early 2017 Pip Jones (programmer, scipilot.org) teamed up with Nic with the idea of adding an AI assistant to the screening process. After considering the scalability of installing and supporting the Matlab-based GUI, and the limitations of Matlab itself for this purpose, it was decided to first build a web-based GUI compatible with the existing R libraries and data-file formats. The Matlab GUI's information architecture was duplicated in an Express JS prototype application with a wireframe front-end.
In 2017 a small research fund was granted by Macquarie University to help kickstart the web project.
An initial machine learning (ML) assistant has been added to the web-based GUI after extensive evaluation of various classification techniques. This is implemented as a separate REST API backed by a Python application which utilises the SciKit-Learn libraries. The API provides classification and search services using fairly standard vectorisation models: term frequency weighting (TF-IDF), principal component analysis for dimensionality reduction (PCA/LSI/LSA) and nearest-neighbour (KNN) correlation. The REST API can be used independently of the web application.
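The pipeline described above can be sketched with SciKit-Learn. The sample abstracts, parameters and variable names below are purely illustrative, not the actual SYRAS implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

# Illustrative article abstracts standing in for an imported corpus.
abstracts = [
    "randomised controlled trial of a reading intervention in children",
    "meta-analysis of dyslexia screening instruments",
    "fMRI study of auditory cortex activation in adults",
    "systematic review of phonics-based reading interventions",
]

# Vectorise the abstracts into TF-IDF term weights.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(abstracts)

# Reduce dimensionality with truncated SVD (the LSA/LSI step).
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)

# Index the reduced vectors for nearest-neighbour (KNN) search.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X_reduced)

# Rank stored articles by similarity to a query abstract.
query = svd.transform(tfidf.transform(["review of reading interventions"]))
distances, indices = nn.kneighbors(query)
print(abstracts[indices[0][0]])
```

The same fitted vectoriser and SVD must be reused for queries, so that query vectors land in the same reduced space as the indexed corpus.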
While the initial ML algorithm in place (in the beta version) does have a little special sauce in the ranking algorithm, it predominantly uses very basic, standard techniques. This is because, after a long evaluation study, I found there was not much improvement in classification accuracy (F-score) in this specific problem domain. I felt it was best to keep it as simple and performant as possible, without some of the quirks exhibited by the more complex algorithms.
There is huge and exciting scope for improvement in the assistant concept, but it’s not necessarily in the classification and search accuracy of the algorithms. Buy me a beer if you want to know more.
Last updated 06/2018
Currently the web application is in alpha at v0.8, heading for v0-alpha1, where it should be ready for a small research project to use in a low-commitment trial.
An initial roadmap has been established.
- v0.8 was the major addition of a Machine Learning component to assist with pre-classifying documents in the screening workflow. A body of research was carried out to prove and select the underlying classification and search algorithms. These have been connected to the main front-end and are in trial in v0.8.
- v0.9 added article import formats from RIS, Endnote and PubMed, thanks to the similar Bond Uni project.
- v0.10 improved the user messages from background processes via real-time push notifications.
- v0.11 added basic document de-duping during the import, plus some cosmetic tweaks.
- v0.12 saw the addition of GDPR-esque data controls and import/export improvements.
- v0.13 sees the addition of "Article Sets" to provide project workflows, and a nicer "end of screening" experience.
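The de-duplication added in v0.11 isn't detailed here; a minimal sketch of one common approach, matching on a normalised title (purely illustrative of the idea, not the SYRAS implementation), might look like:

```python
import re

def normalise(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def dedupe(articles):
    """Keep the first article seen for each normalised title; drop later duplicates."""
    seen = set()
    unique = []
    for article in articles:
        key = normalise(article["title"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique

articles = [
    {"id": 1, "title": "A Systematic Review of X."},
    {"id": 2, "title": "a systematic review of x"},  # duplicate of id 1
    {"id": 3, "title": "An Unrelated Study"},
]
print([a["id"] for a in dedupe(articles)])  # → [1, 3]
```

Real-world imports usually also compare DOIs or author/year fields, since titles alone can collide or vary between databases.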
See the section below on Assistants.
Express JS Prototype
- Express JS (4.15)
- Node.js (v8, ES6)
- MongoDB (3.6)
- Mongoose (4.10)
- Zurb Foundation (6.4)
- Gulp, NPM
- JQuery (minimal)
- Supervisor (dev)
I intend to migrate from Zurb to Semantic UI, possibly Vue.js and Vuetify.
Note: see the Docker repository for general information on the environment.
```shell
cd express
npm install
cd src
npm install
```
Add this to your dev shell (profile or IDE env):
```shell
export NODE_ENV=development
export SYRAS_SECRETS_PATH="APPPATH/etc/config/dev_syras.bcup"
export SYRAS_SECRETS_PASSWORD="DevSecretIsSecret"
```
Run both the back-end app and front-end build with:
Visit the app at https://0.0.0.0:3443/
To avoid local certificate warnings: use the built-in local SSL certificate, add the host name syras.local to your /etc/hosts file, and add the CA.cert to your trusted roots (e.g. Keychain). Then develop on https://syras.local:3443/.
The "Document Prediction Assistant" is the first plug-in helper. It uses the SysRev Document Search REST API which is included in the associated scipilot/sysrev-assist-python repository.
Note: configuration of the connection to the Document Search REST API is still work-in-progress.
There is a Docker composition which can install the entire system including the initial Python Document API and SSL via Lets Encrypt. This is currently in a separate repository, and built via Docker Hub automated builds.
1. Make a Docker host (e.g. a DigitalOcean droplet)
2. Register the DNS and wait for it to propagate
3. Run:

```shell
git clone https://github.com/scipilot/sysrev-assist-docker
cd sysrev-assist-docker
bin/install
```
While this software is open-source, the original works, ideas and concepts belong to Macquarie University, Sydney, Australia (www.mq.edu.au).
While the software itself can be freely redistributed (as per the terms of the licence below), attribution must be retained to the original developers at Macquarie University and to the valuable investment of time and money by the University.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.