This README file has the following sections:
- What is the Research Software Directory?
- How do I enter data into an instance of the Research Software Directory?
- Documentation for developers
- Documentation for maintainers
What is the Research Software Directory?
The Research Software Directory is a content management system that is tailored to software.
The idea is that institutes for whom research software is an important output can run their own instance of the Research Software Directory. The system is designed to be flexible enough to allow for different data sources, database schemas, and so on. By default, the Research Software Directory is set up to collect data from GitHub, Zenodo, Zotero, as well as Medium blogs.
For each software package, a product page can be created on the Research Software Directory if the software is deemed useful to others. Here is an example of what a product page may look like:
While the content shown on the product page can be completely customized, by default it includes a Mentions section, which can be used to characterize the context in which the software exists. The context may include links to scientific papers, but is certainly broader than that: for example, there may be links to web applications that demonstrate the use of the software, there may be links to videos on YouTube, tutorials on readthedocs.io or Jupyter notebooks, or there may be links to blog posts; really, anything that helps visitors decide if the software could be useful for them.
The Research Software Directory improves findability of software packages, partly because it provides metadata that helps search engines understand what the software is about, but more importantly because of the human-centered text snippets that must be provided for each software package. After all, discovery of a software package is often not so much about finding it but knowing that you found it.
How do I enter data into an instance of the Research Software Directory?
The process is described here.
Documentation for developers
Try it out locally
Basically, the steps to get a copy of https://research-software.nl running locally (including data) are as follows:
- Fork this repo to your own GitHub organization or GitHub profile and clone it
- Start the complete stack using
For details, see below.
Make sure you have a Linux computer with
git installed. Other operating systems might work but we develop exclusively
on Linux based systems. You can find the installation instructions for each tool
You'll need a minimum of about 3 GB free disk space to store the images, containers and volumes that we will be making.
Optionally, add yourself to the
docker group following the instructions
documentation assumes that you did).
Try it out, step 1/3: Fork and clone
Fork button on
fork to your own GitHub organization or GitHub profile, then:
git clone https://github.com/<your-github-organization>/research-software-directory.git
Try it out, step 2/3: Configure
The Research Software Directory is configured using a file with environment variables called rsd-secrets.env. An example config file (rsd-secrets.env.example) is available; use it as a starting point.
cd research-software-directory
cp rsd-secrets.env.example rsd-secrets.env
The config file has some placeholder values (changeme); they must be set by editing the rsd-secrets.env file. Below are instructions on how to get the different tokens and keys.
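To keep track of which values still need attention, a small helper can list the variables that are still set to the placeholder. This is a minimal sketch, not part of the repository: the function name list_placeholders is hypothetical, and it assumes that rsd-secrets.env holds one NAME=value assignment per line (optionally prefixed with export).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Assumption: the config file has one NAME=value assignment per line,
# optionally prefixed with "export", and unset values contain "changeme".
list_placeholders() {
    local file="${1:-rsd-secrets.env}"
    # Keep assignment lines whose value still contains "changeme",
    # then print only the variable name.
    grep -E '^(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*=.*changeme' "$file" \
        | sed -E 's/^(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/'
}
```

Running list_placeholders rsd-secrets.env after each of the steps below should print a progressively shorter list.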
These environment variables are used for authenticating a user, such that they can be granted access to the admin interface to create, read, update, and delete items in the Research Software Directory.
These are the steps to assign values:
- Go to https://github.com/settings/developers
- Click the New OAuth App button
- Under Application name, write something like The Research Software Directory's admin interface on localhost
- Under Homepage URL, fill in some URL, for example, let it point to this readme on GitHub. Although it is a required field, its value is not used as far as I can tell.
- Optionally add a description. This is especially useful if you have multiple OAuth apps
- The most important setting is the value for Authorization callback url. Set it to http://localhost/auth/get_jwt for now. We will revisit AUTH_GITHUB_CLIENT_SECRET in the section about deployment below
- Assign the Client ID as value for AUTH_GITHUB_CLIENT_ID and assign the Client Secret as value for AUTH_GITHUB_CLIENT_SECRET
Data is entered into the Research Software Directory via the admin interface. Set AUTH_GITHUB_ORGANIZATION to the name of the GitHub organization whose members should be allowed access to the admin interface. Most likely, it is the name of the organization where you forked this repository to.
Note: members should make their membership of the GitHub organization public. Go to https://github.com/orgs/<your-github-organization>/people to see which users are members of <your-github-organization>, and whether their membership is public or not.
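Public membership can also be checked from a terminal. The sketch below uses GitHub's REST endpoint for checking public organization membership, which answers with status 204 for a public member and 404 otherwise. The function names are hypothetical, the organization and user names are yours to fill in, and the second function needs curl and network access.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.

# Translate the HTTP status returned by the endpoint into a message:
# 204 means public member, 404 means not a member or membership is private.
describe_membership() {
    case "$1" in
        204) echo "public member" ;;
        404) echo "not a public member" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}

# Query the endpoint for one user (requires curl and network access).
check_public_membership() {
    local org="$1" user="$2" status
    status=$(curl -s -o /dev/null -w '%{http_code}' \
        "https://api.github.com/orgs/${org}/public_members/${user}")
    describe_membership "$status"
}
```

For example, check_public_membership <your-github-organization> <github-username> prints one of the three messages.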
To query GitHub's API programmatically, we need an access token. Here's how you can get one:
- Go to https://github.com/settings/tokens
- Click Generate new token
- Under Token description, fill in something like Key to programmatically retrieve information from GitHub's API
- Verify that all scopes are unchecked
- Use token as value for
When getting the references data from Zotero, this environment variable
determines which library on Zotero is going to be harvested. Go to
https://www.zotero.org/groups/ to see which Zotero groups you are a member of.
If you click on the
Group library link there, the URL will change to
1689348 is the value you need to assign to
To query Zotero's API programmatically, we need an API key. Here's how you can get one:
Create new private key
- Type a description of the key, e.g. API key to access library X on Zotero
- Under Personal library, make sure only Allow library access is checked.
Default group permissions, choose
Specific groups, check
Per group permissions
Read only for the group that you want to harvest your references data from; verify that any other groups are set to
- Click the Save Key button at the bottom of the page.
- On the Key Created page, you will see a string of random characters, something like bhCJSBCcjzptBvd3fvliYOoE. This is the key; assign it to
This environment variable is used for making a daily backup of the database with software, people, projects, etc. As it is typically only used during deployment, leave its value like it is for now; we will revisit it in the section about deployment below.
JWT_SECRET is simply a string of random characters. You can generate one yourself using the openssl command line tool, as follows:
openssl rand -base64 32
Assign the result to
These environment variables are not relevant when you're running your instance locally. Leave their values as they are in rsd-secrets.env.example for the time being. We will revisit them in the section about deployment.
# add the environment variables from rsd-secrets.env to the current terminal:
source rsd-secrets.env
# start the full stack using docker-compose:
docker-compose --project-name rsd up --build
# shorthand: docker-compose -p rsd up --build
After the Research Software Directory instance is up and running, we want to start harvesting data from external sources such as GitHub, Zotero, Zenodo, etc. To do so, open a new terminal and run
source rsd-secrets.env
docker-compose --project-name rsd exec harvesting python app.py harvest all
You should see some feedback in the newly opened terminal. Once the harvest all task finishes, several database collections should have been updated, but we still need to combine the data from those separate collections into one document that we can feed to the frontend. This is done with the resolve task, as follows:
docker-compose --project-name rsd exec harvesting python app.py resolve
By default, the resolve task runs every fifth minute anyway, so you could just wait for a bit, until you see some output scroll by that is generated by the rsd-harvesting container, something like:
rsd-harvesting | 2018-07-11 10:30:02,990 cache_software [INFO] processing Xenon command line interface
rsd-harvesting | 2018-07-11 10:30:03,013 cache_software [INFO] processing Xenon gRPC server
rsd-harvesting | 2018-07-11 10:30:03,036 cache_software [INFO] processing xtas
rsd-harvesting | 2018-07-11 10:30:03,059 cache_software [INFO] processing boatswain
rsd-harvesting | 2018-07-11 10:30:03,080 cache_software [INFO] processing Research Software Directory
rsd-harvesting | 2018-07-11 10:30:03,122 cache_software [INFO] processing cffconvert
rsd-harvesting | 2018-07-11 10:30:03,149 cache_software [INFO] processing sv-callers
Open a web browser to verify that everything works as it should.
- http://localhost should show a local instance of the Research Software Directory
- http://localhost/admin should show the Admin interface to the local instance of the Research Software Directory
- http://localhost/api/software should show a JSON representation of all software in the local instance of the Research Software Directory
- http://localhost/software/xenon should show a product page (here: Xenon) in the local instance of the Research Software Directory
- http://localhost/api/software/xenon should show a JSON representation of a product (here: Xenon) in the local instance of the Research Software Directory
- http://localhost/graphs should show you some integrated statistics of all the packages in the local instance of the Research Software Directory
- http://localhost/oai-pmh?verb=ListRecords&metadataPrefix=datacite4 should return an XML document with metadata about all the packages that are in the local instance of the Research Software Directory, in DataCite 4 format.
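The same checks can be run from a terminal instead of a browser. The sketch below prints the HTTP status code for each of the URLs listed above; the function names rsd_urls and check_rsd are hypothetical, and check_rsd assumes that curl is installed and the stack is running. A status of 200 suggests the page is being served; it does not verify the content.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.

# Print the URLs from the list above, one per line, for a given base URL.
rsd_urls() {
    local base="${1:-http://localhost}"
    local path
    for path in / /admin /api/software /software/xenon /api/software/xenon \
        /graphs '/oai-pmh?verb=ListRecords&metadataPrefix=datacite4'; do
        echo "${base}${path}"
    done
}

# Request each URL and print its HTTP status code next to it
# (requires curl and a running instance).
check_rsd() {
    local url
    rsd_urls "$1" | while read -r url; do
        printf '%s  %s\n' "$(curl -s -o /dev/null -w '%{http_code}' "$url")" "$url"
    done
}
```

Calling check_rsd without arguments checks http://localhost; pass another base URL to check a deployed instance.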
Customize your instance of the Research Software Directory
Let's say you followed the steps above, and have a running instance of the Research Software Directory. Now it is time to start customizing your Research Software Directory. We have prepared some FAQs for customizations that are common. For example, you can read up on the following topics:
- How do I change the colors?
- How do I change the font?
- How do I change the logo?
- How do I change when data collection scripts run?
- How do I empty the database?
- How do I make changes to the admin interface?
- How do I add properties to the data schema?
It is suggested that you first do one or more of:
Then, learn how to add properties to the schema:
Finally, learn how to empty the database, such that you can replace the sample data with your own:
General workflow when making changes
After making your changes, here's how you get to see them:
Go to the terminal where you started
Use Ctrl+C to stop the running instance of Research Software Directory
Check which docker containers you have with:
docker-compose --project-name rsd ps
# shorthand: docker-compose -p rsd ps
For example, mine says:
docker-compose -p rsd ps
Name                 Command                          State      Ports
----------------------------------------------------------------------
rsd-admin            sh -c rm -rf /build/* && c ...   Exit 0
rsd-authentication   /bin/sh -c gunicorn --prel ...   Exit 0
rsd-backend          /bin/sh -c gunicorn --prel ...   Exit 0
rsd-database         /mongo.sh --bind_ip 0.0.0.0      Exit 137
rsd-frontend         /bin/sh -c sh -c "mkdir -p ...   Exit 0
rsd-nginx-ssl        /bin/sh -c /start.sh             Exit 137
rsd-reverse-proxy    /bin/sh -c nginx -g 'daemo ...   Exit 137
rsd-harvesting       /bin/sh -c crond -d7 -f          Exit 137
Use docker-compose rm to delete containers by their service name, for example:
docker-compose --project-name rsd rm frontend
# shorthand: docker-compose -p rsd rm frontend
List all docker images on your system:
Note that image names consist of whatever you entered as --project-name, followed by _, followed by the service name. Remove as follows:
docker rmi rsd_frontend
Make changes to the source code of the service whose container and image you just removed
Rebuild containers as necessary, using:
docker-compose --project-name rsd build frontend
docker-compose --project-name rsd up frontend
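The remove-and-rebuild commands from this workflow can be collected in one function. This is a sketch under assumptions rather than a supported script: the names run and rebuild_service are hypothetical, the image name follows the project_service pattern described above, and the running stack is assumed to have been stopped already. Setting DRY_RUN=1 prints the commands instead of executing them, which is useful for checking what would happen.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove the container and image of one service, then rebuild and start it.
# Arguments: service name, optional project name (defaults to rsd).
rebuild_service() {
    local service="$1" project="${2:-rsd}"
    run docker-compose --project-name "$project" rm "$service"
    run docker rmi "${project}_${service}"
    run docker-compose --project-name "$project" build "$service"
    run docker-compose --project-name "$project" up "$service"
}
```

For example, DRY_RUN=1 rebuild_service frontend lists the four commands for the frontend service; make your source code changes before running it without DRY_RUN.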
Make your instance available to others by hosting it online (deployment)
Amazon Web Services (AWS) is an online service provider that offers all kinds of services relating to compute, storage, and hosting. The Netherlands eScience Center uses AWS to run their instance of the Research Software Directory. This section describes how to deploy your own customized instance of the Research Software Directory to AWS.
Go to https://aws.amazon.com/console/. Once there, you'll see something like:
Create a free account if you don't already have one, and subsequently click
Sign In to the Console.
Once in the console, you'll be presented with an overview of all the services that Amazon Web Services has to offer:
It's easy to get lost in this plethora of services, but for running an instance of the Research Software Directory, you'll only need 3 of them:
- EC2: this is where we will run your customized instance of the Research Software Directory and host it online; jump to the EC2 section
- IAM: we use this to create a user with limited privileges, so we don't have to use root credentials when we don't have to; jump to the IAM section
- S3: this is where we will store our daily backups; jump to the S3 section
In the All Services overview, click EC2 or use this link
Click the blue
Scroll down to where it says
Ubuntu Server 18.04 LTS, click
Choose instance type
Proceed in the wizard by clicking Next until you get to Configure Security Group. It should already have one rule listed. However, the rule should be stricter, because it currently allows SSH connections from any IP. Click the Source dropdown button, select
Now click the blue Review and Launch button in the lower right corner
In the Review screen, click the blue Launch button in the lower right corner to bring the instance up
Create a new key pair, try to give it a meaningful name, e.g.
Download Key Pair, save the
~/.ssh on your local machine, then click Launch Instances (it takes a moment to initialize).
On your local machine, open a terminal and go to ~/.ssh. Change the permissions of the key file to octal 400 (readable only by user):
chmod 400 <the keyfile>
Verify that the .ssh directory itself has octal permission 700 (readable, writable, and executable by user only).
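Both permission checks can be scripted. The sketch below is an illustration, not part of the repository: the function name check_ssh_perms is hypothetical, and it relies on the GNU version of stat as found on Linux (the -c option differs on other systems).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Assumption: GNU stat (Linux), where -c '%a' prints the octal mode.
check_ssh_perms() {
    local keyfile="$1"
    local dir key_mode dir_mode
    dir=$(dirname "$keyfile")
    key_mode=$(stat -c '%a' "$keyfile")
    dir_mode=$(stat -c '%a' "$dir")
    # The key file should be readable by the user only.
    [ "$key_mode" = "400" ] || { echo "key file has mode $key_mode, expected 400"; return 1; }
    # The directory holding it should be accessible by the user only.
    [ "$dir_mode" = "700" ] || { echo "directory has mode $dir_mode, expected 700"; return 1; }
    echo "permissions OK"
}
```

Call it as check_ssh_perms ~/.ssh/<the keyfile>; it reports the first mismatch it finds and returns a non-zero exit status in that case.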
Go back to Amazon, click
Make a note of your instance's public IPv4, e.g.
On your own machine use a terminal to log in to your instance
ssh -i path-to-the-keyfile ubuntu@<your-instance-public-ip>
Once logged in to the remote machine, install docker-compose, then add user ubuntu to the group docker, same as before (see section Documentation for developers above).
Make a new directory and change into it:
cd ~
mkdir rsd
cd rsd
The machine should have git installed; use it to git clone your customized Research Software Directory instance into the current directory as follows:
git clone https://github.com/<your-github-organization>/research-software-directory.git .
(Note the dot at the end)
Open a new terminal and secure-copy your local rsd-secrets.env file to the Amazon machine as follows:
cd <where rsd-secrets.env is>
scp -i path-to-the-keyfile ./rsd-secrets.env \
ubuntu@<your-instance-public-ip>:/home/ubuntu/rsd/rsd-secrets.env
Follow the instructions above to make a second key pair (AUTH_GITHUB_CLIENT_ID and AUTH_GITHUB_CLIENT_SECRET). However, let this one's Authorization callback url be https:// plus your instance's IPv4 plus /auth/get_jwt. Update the Amazon copy of rsd-secrets.env according to the new client ID and secret.
Start the Research Software Directory instance with:
cd ~/rsd
source rsd-secrets.env
docker-compose --project-name rsd up --build &
On your local machine, open a new terminal. Connect to the Amazon instance, run the harvesters, and resolve the foreign keys:
ssh -i path-to-the-keyfile ubuntu@<your-instance-public-ip>
cd ~/rsd
source rsd-secrets.env
docker-compose --project-name rsd exec harvesting \
python app.py harvest all
docker-compose --project-name rsd exec harvesting \
python app.py resolve
At this point we should have a world-reachable, custom instance of the Research
Software Directory running at
if we go there using a browser like Firefox or Google Chrome, we get a warning
that the connection is not secure.
To fix this, we need to configure the security credentials, but this in turn requires us to claim a domain and configure a DNS record. There are free services available that you can use for this, e.g. https://noip.com. Here's how:
Go to https://noip.com, sign up and log in.
Under My services, find
Add a hostnamebutton
Choose your free (sub)domain name, e.g. I chose
Fill in the IP address of your Amazon machine. In my case,
https://myrsd.ddns.net will serve as an alias for
Once you have the (sub)domain name, update SSL_DOMAINS in the file rsd-secrets.env on your Amazon instance (leave out the https:// part, as well as anything after the .org or whatever you may have).
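Stripping a URL down to the bare domain name can be done with shell parameter expansion. The function name domain_from_url below is hypothetical; the sketch only removes the scheme and the path, and makes no attempt to handle ports or credentials in the URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Reduce a URL to its host name by dropping the scheme and the path.
domain_from_url() {
    local host="${1#*://}"   # drop "https://" (or any other scheme), if present
    host="${host%%/*}"       # drop everything from the first slash onward
    echo "$host"
}
```

For example, domain_from_url https://myrsd.ddns.net/software/xenon prints myrsd.ddns.net, which is the form to use for SSL_DOMAINS.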
Fill in your e-mail for
Finally, revisit your OAuth app here https://github.com/settings/developers, and replace the Amazon IP address in the Authorization callback url with your freshly minted domain name.
Now, stop the Research Software Directory if it is still running with Ctrl+C or docker-compose -p rsd stop.
Update the environment variables by sourcing your secrets again:
cd ~/rsd
source rsd-secrets.env
Start the Research Software Directory back up:
cd ~/rsd
docker-compose -p rsd up
Pointing your browser to your (sub)domain name should now show your instance of the Research Software Directory (although be aware that it sometimes takes a while before the domain name resolves to the IP address).
- In the All Services overview, click IAM or use this link https://console.aws.amazon.com/iam.
- In the menu on the left, click
- Click the Create New Group button.
- Name the group
- When asked to attach a (security) policy, use the search bar to find AmazonS3FullAccess and check its checkbox.
- Click the Next step button in the lower right corner.
- Review your group, go back if need be. When you're ready, click the Create Group button in the lower right corner.
- Now you should be presented with a group, but the group is still empty; there are no users.
- In the menu on the left, click
- Click the Add user button in the top left corner.
- Choose your user name. I chose to call mine rsd-backup-maker. For this user, check the checkbox labeled Programmatic access. This user won't need AWS Management Console access, so leave that box unchecked.
- In the lower right corner, click the
Add user to group, and make user rsd-backup-maker a member of group
- In the lower right corner, click the Next: Tags button. We don't need to assign any tags, so proceed to the next page by clicking Next: Review. Go back if you need to, but if everything looks OK, click Create User. You will be presented with the new user's credentials. Download the CSV file now; we'll use the Access key ID and the Secret access key later to set up the backup mechanism.
In the All Services overview, click S3 or use this link
create a bucket with a random name (bucket names must be globally unique; websites like https://www.random.org/strings/ are useful to get a random string)
in that bucket, make a directory, e.g.
The backup service contains a program (xenon-cli) that can copy to a range of storage providers. You can use it to make daily backups of the MongoDB database, and store the backups on Amazon's S3. For this, configure the environment variable BACKUP_CMD as follows (naturally, you'll need to use a different location, username, and password; see explanation below):
BACKUP_CMD='xenon filesystem s3 \
--location http://s3-us-west-2.amazonaws.com/nyor-yiwy-fepm-dind/ \
--username AKIAJ52LWSUUKATRQZ2A \
--password xQ3ezZLKN7XcxIwRko2xkKhV9gdJ5etA4OyLbXN/ \
upload rsd-backup.tar.gz /rsd-backups/rsd-backup-$BACKUP_DATE.tar.gz'
- The bucket name is
nyor-yiwy-fepm-dind. It is physically located in zone
- We access the bucket as a limited-privileges IAM user, for whom we
created an access key (it has been deactivated since). The Access key ID is
AKIAJ52LWSUUKATRQZ2A, and its corresponding Secret access key is
- The variable BACKUP_DATE is set by the backup script (see /backup/backup.sh); no need to change this for your application.
- rsd-backup.tar.gz is the name of the backup archive as it is called inside the container; no need to change this for your application.
- /rsd-backups/rsd-backup-$BACKUP_DATE.tar.gz is the path inside the bucket. It includes the date to avoid overwriting previously existing archives; no need to change this for your application.
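To see how the parts explained above fit together, the command string can be assembled from its components. This is a sketch only: the function name make_backup_cmd is hypothetical, and it assumes that the location URL always follows the same pattern as the example above (http://s3-<region>.amazonaws.com/<bucket>/). The reference to BACKUP_DATE is deliberately left unexpanded, because the backup script sets it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Arguments: region, bucket name, Access key ID, Secret access key.
# Assumption: the location URL has the same shape as in the example above.
make_backup_cmd() {
    local region="$1" bucket="$2" key_id="$3" secret="$4"
    # The format string is single-quoted, so $BACKUP_DATE stays literal here
    # and is only filled in later by the backup script.
    printf 'xenon filesystem s3 --location http://s3-%s.amazonaws.com/%s/ --username %s --password %s upload rsd-backup.tar.gz /rsd-backups/rsd-backup-$BACKUP_DATE.tar.gz\n' \
        "$region" "$bucket" "$key_id" "$secret"
}
```

The printed string is what goes between the single quotes of BACKUP_CMD in rsd-secrets.env; printing it first makes it easier to spot a wrong bucket name or key before a backup fails.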
Test the setup by stopping the Research Software Directory on Amazon, updating BACKUP_CMD, and starting it again:
# ssh into the remote machine
cd rsd
docker-compose -p rsd stop
# update BACKUP_CMD by editing the rsd-secrets.env file
source rsd-secrets.env
docker-compose -p rsd up
Wait until the Research Software Directory is up and running again, then open a second terminal and
# ssh into the remote machine
cd rsd
docker-compose -p rsd exec backup /bin/sh
/app # /bin/sh backup.sh
Documentation for maintainers
It is sometimes helpful to visualize the structure in the
Use https://github.com/pmsipilot/docker-compose-viz to generate a png image.
docker run --rm -it --name dcv -v $(pwd):/input pmsipilot/docker-compose-viz render -m image --output-file=docs/images/docker-compose.png docker-compose.yml
Making a release
Write the release notes
Generate the metadata file for Zenodo using cffconvert.
pip install --user cffconvert
cffconvert --outputformat zenodo --ignore-suspect-keys --outfile .zenodo.json
# git add, commit, and push everything
Make sure that everything is pushed
cd $(mktemp -d)
git clone https://github.com/research-software-directory/research-software-directory.git
cd research-software-directory
Follow the notes from the 'For developers' section above, and verify that it all works as it should.
Use the Draft a new release button here to make a release.
Pulling in changes from upstream using a three-way merge
Set UPSTREAM and DOWNSTREAM to the different sources you want to three-way merge between, e.g.
cd $(mktemp -d)
mkdir left middle right
cd left && git clone $UPSTREAM . && cd -
cd middle && git clone $DOWNSTREAM . && git branch develop && git checkout develop && cd -
cd right && git clone $DOWNSTREAM . && cd -
meld left middle right &
You should only make changes to the middle one. When you're done making your changes,
git add <the files>
git commit
git push origin develop