| layout | title | zenodo_link | questions | objectives | time_estimation | key_points | contributors | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
tutorial_hands_on |
Reference Data with CVMFS without Ansible |
|
1h |
|
Overview
{:.no_toc}
CernVM-FS is a distributed filesystem designed for sharing read-only data across the globe. In the Galaxy Project we use it to share data that many Galaxy servers need, namely:
- Reference data
  - Genome sequences for hundreds of useful species
  - Indices for the genome sequences
  - Various bioinformatic tool indices for the available genomes
- Tool containers
  - Singularity containers of everything stored in Biocontainers (a bioinformatic tool container repository). You get these for free every time you build a Bioconda recipe/package for a tool.
- Others too.
From the CERN website:
"The CernVM File System provides a scalable, reliable and low-maintenance software distribution service. It was developed to assist High Energy Physics (HEP) collaborations to deploy software on the worldwide-distributed computing infrastructure used to run data processing applications. CernVM-FS is implemented as a POSIX read-only file system in user space (a FUSE module). Files and directories are hosted on standard web servers and mounted in the universal namespace /cvmfs."
-- https://cernvm.cern.ch/portal/filesystem {: .quote}
A slideshow presentation on this subject can be found [here]({% link topics/admin/tutorials/cvmfs/slides.html %}). More details on the usegalaxy.org (Galaxy Main's) reference data setup and CVMFS system can be found here.
There are two sections to this exercise. The first shows you how to use Ansible to set up and configure CVMFS for Galaxy. The second shows you how to do everything manually. We recommend the Ansible method; the manual method is included here mainly to give a more in-depth understanding of what is happening.
If you really want to perform all these tasks manually, go here; otherwise, just follow along.
Agenda
- TOC {:toc}
{: .agenda}
CVMFS and Galaxy without Ansible
{% icon comment %} Manual version of Ansible Commands
If you wish to achieve the same result as the Ansible playbook, but by running each step by hand, follow these instructions. Otherwise, you have already done everything below and do not need to redo it. {: .comment}
We are going to set up a CVMFS mount of the Galaxy reference data repository on our machines. To do this we have to install and configure the CVMFS client, and then mount the appropriate CVMFS repository using the publicly available keys.
{% icon hands_on %} Hands-on: Installing the CVMFS Client
On your remote machine, we first need to install the CERN software apt repository, and then the CVMFS client and config utility:

    sudo apt install lsb-release
    wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
    sudo dpkg -i cvmfs-release-latest_all.deb
    rm -f cvmfs-release-latest_all.deb
    sudo apt-get update
    sudo apt install cvmfs cvmfs-config

Now we need to run the CVMFS setup script:

    sudo cvmfs_config setup

{: .hands_on}
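
As a quick sanity check after the installation step, the following sketch reports which of the expected commands are not yet on the `PATH`. The helper name `missing_commands` is made up for this illustration; `cvmfs2` (the FUSE client binary) and `cvmfs_config` are the commands the packages above are expected to provide.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# cvmfs2 is the client binary, cvmfs_config the configuration utility.
missing=$(missing_commands cvmfs2 cvmfs_config)
if [ -n "$missing" ]; then
    echo "Not installed yet:" $missing
else
    echo "CVMFS client commands found"
fi
```

If anything is reported as missing, re-run the `apt` commands above before continuing.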
Configuring CVMFS
The configuration is not complex for CVMFS:
{% icon hands_on %} Hands-on: Configuring CVMFS
Create a `/etc/cvmfs/default.local` file with the following contents:

    CVMFS_REPOSITORIES="data.galaxyproject.org"
    CVMFS_HTTP_PROXY="DIRECT"
    CVMFS_QUOTA_LIMIT="500"
    CVMFS_CACHE_BASE="/srv/cvmfs/cache"
    CVMFS_USE_GEOAPI=yes

This tells CVMFS to mount the Galaxy reference data repository, to use a specific location for the cache (limited to 500 MB in size), and to use the instance's geo-location to choose the best CVMFS repo server to connect to. In the Ansible version of this tutorial, the `cvmfs_quota_limit` role variable controls the cache size limit; here you set `CVMFS_QUOTA_LIMIT` directly.

Create a `/etc/cvmfs/domain.d/galaxyproject.org.conf` file with the following contents:

    CVMFS_SERVER_URL="http://cvmfs1-tacc0.galaxyproject.org/cvmfs/@fqrn@;http://cvmfs1-iu0.galaxyproject.org/cvmfs/@fqrn@;http://cvmfs1-psu0.galaxyproject.org/cvmfs/@fqrn@;http://galaxy.jrc.ec.europa.eu:8008/cvmfs/@fqrn@;http://cvmfs1-mel0.gvl.org.au/cvmfs/@fqrn@;http://cvmfs1-ufr0.galaxyproject.eu/cvmfs/@fqrn@"

This is a list of the available stratum 1 servers that have this repo.

Create a `/etc/cvmfs/keys/data.galaxyproject.org.pub` file with the following contents:

    -----BEGIN PUBLIC KEY-----
    MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5LHQuKWzcX5iBbCGsXGt
    6CRi9+a9cKZG4UlX/lJukEJ+3dSxVDWJs88PSdLk+E25494oU56hB8YeVq+W8AQE
    3LWx2K2ruRjEAI2o8sRgs/IbafjZ7cBuERzqj3Tn5qUIBFoKUMWMSIiWTQe2Sfnj
    GzfDoswr5TTk7aH/FIXUjLnLGGCOzPtUC244IhHARzu86bWYxQJUw0/kZl5wVGcH
    maSgr39h1xPst0Vx1keJ95AH0wqxPbCcyBGtF1L6HQlLidmoIDqcCQpLsGJJEoOs
    NVNhhcb66OJHah5ppI1N3cZehdaKyr1XcF9eedwLFTvuiwTn6qMmttT/tHX7rcxT
    owIDAQAB
    -----END PUBLIC KEY-----

Make a directory for the cache files:

    sudo mkdir /srv/cvmfs

{: .hands_on}
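
The `@fqrn@` token in `CVMFS_SERVER_URL` is a placeholder for the fully qualified repository name, and the semicolons separate alternative stratum 1 servers. The CVMFS client performs this substitution itself; the sketch below only illustrates the idea. The function name `expand_server_urls` is invented for this example, and only two of the servers listed above are used to keep it short.

```shell
#!/bin/sh
# Hypothetical helper: expand a CVMFS_SERVER_URL-style template for one
# repository. Splits the template on ';' and replaces every @fqrn@ with
# the repository name, printing one URL per line.
expand_server_urls() {
    template="$1"
    repo="$2"
    printf '%s\n' "$template" | tr ';' '\n' | sed "s|@fqrn@|$repo|g"
}

# Two of the stratum 1 entries from the configuration above.
urls="http://cvmfs1-tacc0.galaxyproject.org/cvmfs/@fqrn@;http://cvmfs1-iu0.galaxyproject.org/cvmfs/@fqrn@"
expand_server_urls "$urls" "data.galaxyproject.org"
```

Each printed URL is the location a client would contact for `data.galaxyproject.org` on that server.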
Testing it out
Probe the connection.
{% icon hands_on %} Hands-on: Testing it out
Run `sudo cvmfs_config probe data.galaxyproject.org`

{% icon question %} Question
What does it output?
{% icon solution %} Solution
`OK`

If this doesn't return `OK` then you may need to restart autofs: `sudo systemctl restart autofs`
{: .solution }
{: .question}
Change directory into `/cvmfs/` and list the files in that folder.

{% icon question %} Question
What do you see?
{% icon solution %} Solution
You should see nothing, as CVMFS uses `autofs` in order to mount paths only upon request. Once you `cd` into the directory, autofs will automatically mount the repository and files will be listed.
{: .solution }
{: .question}
Change directory into `/cvmfs/data.galaxyproject.org/`. Have a browse through the contents. You'll see `.loc` files, genomes and indices.

And just like that we all have access to all the reference genomes and associated tool indices, thanks to the Galaxy Project, IDC, and Nate's hard work!
{% icon tip %} Contributing Reference Genomes
If you are developing a new tool and want to add a reference genome, we recommend you talk to us on Gitter. You can also look at one of the tools that uses reference data and copy its approach. If you are creating entirely new location files, you will need to write a data manager. {: .tip}
{: .hands_on}
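
To make the browsing step repeatable, here is a minimal sketch that lists `.loc` files near the top of the repository. The function name `list_loc_files` and the default depth of 3 are assumptions chosen for illustration; the depth limit simply keeps `find` from walking the entire repository, and the location files on a given system may sit deeper.

```shell
#!/bin/sh
# Hypothetical helper: list .loc files under a directory, descending at
# most a few levels (default 3, an arbitrary choice for this sketch).
list_loc_files() {
    root="$1"
    depth="${2:-3}"
    find "$root" -maxdepth "$depth" -type f -name '*.loc' | sort
}

repo_root="/cvmfs/data.galaxyproject.org"
# Testing the path is itself an access, so autofs mounts the repository
# on demand if the client is configured.
if [ -d "$repo_root" ]; then
    list_loc_files "$repo_root"
else
    echo "$repo_root is not available on this machine"
fi
```

Pass a larger second argument, for example `list_loc_files "$repo_root" 5`, to search deeper.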
Look at the repository
To configure Galaxy to use the CVMFS references we have just set up, see [the Ansible tutorial]({% link topics/admin/tutorials/cvmfs/tutorial.md %}#configuring-galaxy-to-use-the-cvmfs-references).