Template Description

Ivo edited this page Jul 18, 2017 · 11 revisions

Description and usage

  • The repository contains three templates: samples, mixs, and full (a combination of the previous two).
  • Currently only the full template is supported by the GFBio molecular brokerage service. All documentation hereafter refers to the full template.
  • We accept comma (,), tab (\t), or semicolon (;) as the field separator. Choose only one, and make sure you do not use that character within the content of the fields (e.g. if you separate the fields with a comma, do not use commas in the sample description).
  • Only the first 10 fields are strictly mandatory for starting the submission. Our curation team will contact you if problems occur.
  • Do not remove empty optional fields.
  • Leave empty fields empty; do not use a placeholder notation (e.g. 'NA', '-').
  • You can add extra fields at the end of the template.
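The separator and empty-field rules above can be checked before you submit a file. The following is a minimal sketch in Python and is not part of the GFBio tooling: the function name `check_template_rows` is hypothetical, and the set of placeholder values contains only the two examples mentioned above.

```python
import csv
import io

# Placeholder notations that should not appear in empty fields.
# 'NA' and '-' are the examples from the guidance above; extend as needed.
PLACEHOLDERS = {"NA", "-"}


def check_template_rows(text, delimiter=","):
    """Return a list of problems found in a delimited template.

    `text` is the full file content and `delimiter` is the single
    separator chosen for the file (comma, tab or semicolon).
    """
    if delimiter not in {",", "\t", ";"}:
        raise ValueError("delimiter must be a comma, tab or semicolon")

    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    for line_no, row in enumerate(rows[1:], start=2):
        # A differing field count usually means the separator was used
        # inside a field, or an empty optional field was removed.
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            if value.strip() in PLACEHOLDERS:
                problems.append(
                    f"line {line_no}: field '{name}' uses placeholder '{value.strip()}'"
                )
    return problems
```

Calling `check_template_rows(open("full.csv").read(), delimiter=";")` (the filename is made up) returns an empty list when every row has as many fields as the header and no field holds a placeholder. The line numbers in the messages assume that no field contains an embedded line break.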

Field descriptions

  • MIxS fields are marked as optional because you can start a submission without them and add them later (if available). We strongly recommend complying with the MIxS standard, and we support our users in achieving compliance.
| field name | mandatory | description |
| --- | --- | --- |
| sample_title | yes | A unique label for each of your samples, preferably one you can use to map to any other data (e.g. environmental measurements, experimental conditions). |
| taxon_id | yes | The numeric taxon ID according to NCBI Taxonomy (search or browse the taxonomy to find it), e.g. the ID of one of the ecological metagenomes. |
| sample_description | no (recommended) | Sample titles are often short and uninformative. Therefore, we strongly recommend adding a short sample description (e.g. 'surface water sample from '). It can combine the most important features of the sample: imagine what information would help you determine at a glance whether a sample is interesting for you. |
| sequencing_platform | yes | The full name of the sequencing machine, e.g. "Illumina MiSeq". A full list of available values will be provided as soon as possible. |
| library_strategy | yes | E.g. "AMPLICON" for community analysis with marker genes like 16S rRNA. A full list of available values will be provided as soon as possible. |
| library_source | yes | E.g. "METAGENOMIC" or "METATRANSCRIPTOMIC" for community-based analyses. A full list of available values will be provided as soon as possible. |
| library_selection | yes | E.g. for amplicon studies, use "PCR". |
| library_layout | yes | Whether the sequence reads are single-end or paired-end (allowed values: "single" or "paired"). |
| nominal_length | yes (conditional) | A single number denoting the expected insert size. This field is only mandatory for paired-end sequencing (i.e. the value of the library_layout column is "paired"). See https://www.ebi.ac.uk/fg/annotare/help/seq_lib_spec.html for more information. |
| forward_read_file_name | yes | The complete filename of the forward read, as you upload it through the input interface or make it available through a file-sharing platform. |
| forward_read_file_checksum | no | You can calculate the checksum of the read files (e.g. using md5sum on Linux) and provide it here. We can use the value to check the integrity of the files after the transfer. |
| reverse_read_file_name | no (conditional) | This field is mandatory if you have paired-end sequences (library_layout is "paired"). |
| reverse_read_file_checksum | no | You can calculate the checksum of the read files (e.g. using md5sum on Linux) and provide it here. We can use the value to check the integrity of the files after the transfer. |
| checksum_method | no | If you provide a checksum for any of your files, we highly recommend also providing the method you used (allowed values: "MD5" or "SHA-256"). |
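The checksum fields and the paired-end conditions described above can be sketched in Python as follows. This is an illustration under assumptions, not the brokerage service's own validation: the helper names `file_checksum` and `missing_conditional_fields` are hypothetical, and a row is assumed to be a dict keyed by the field names listed above.

```python
import hashlib

# Map the allowed checksum_method values to hashlib algorithm names.
ALGORITHMS = {"MD5": "md5", "SHA-256": "sha256"}


def file_checksum(path, method="MD5"):
    """Compute the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(ALGORITHMS[method])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_conditional_fields(row):
    """List the conditionally mandatory fields that are empty in `row`.

    Only the paired-end rule is checked: nominal_length and
    reverse_read_file_name are required when library_layout is 'paired'.
    """
    if row.get("library_layout", "").strip().lower() != "paired":
        return []
    return [
        name
        for name in ("nominal_length", "reverse_read_file_name")
        if not row.get(name, "").strip()
    ]
```

With the default method, the value returned by `file_checksum` should match the first column printed by `md5sum` for the same file. Reading in chunks keeps memory use constant even for large read files.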

MIxS and optional parameters

| field name | mandatory | description |
| --- | --- | --- |
| investigation type | no | The type of material you sequenced. The value is one of: eukaryote, bacteria_archaea, plasmid, virus, organelle, metagenome, mimarks-survey, mimarks-specimen. Tip: the correct value for amplicon studies is 'mimarks-survey'. |
| environmental package | no | The MIxS environmental package (e.g. 'water', 'sediment'). |
| collection date | no | In ISO 8601 format, e.g. '2016-01-18'. |
| geographic location (latitude) | no | The latitude in decimal degrees (WGS84), e.g. '32.4567'. |
| geographic location (longitude) | no | The longitude in decimal degrees (WGS84), e.g. '111.0034'. |
| geographic location (depth) | no | Depth in meters (do not include the unit). |
| geographic location (elevation) | no | Elevation in meters, e.g. if your sample was taken atop a mountain. If you want to express the distance from the surface to the bottom of a water body, use "total depth water column". |
| total depth water column | no | Distance from the water surface to the bottom in meters (do not include the unit). |
| geographic location (country and/or sea) | no | The country or sea where the sample was taken. The value must be from the INSDC country list. |
| environment (biome) | no | EnvO biome term in the format 'forest biome [ENVO:01000174]'. Browse EnvO biome terms at OLS. |
| environment (material) | no | EnvO environmental material term in the format 'sea water [ENVO:00002149]'. Browse EnvO material terms at OLS. |
| environment (feature) | no | EnvO environmental feature term(s) in the format ''. Browse EnvO feature terms at OLS. |
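The formats given for the date, coordinate, and EnvO fields lend themselves to a quick local check. The sketch below is one possible way to do this in Python and rests on several assumptions: the function name `check_mixs_values` is hypothetical, a row is a dict keyed by the field names above, only full calendar dates such as '2016-01-18' are expected, and the EnvO pattern is inferred from the two examples in the descriptions. The feature field is skipped because it may hold more than one term.

```python
import re
from datetime import date

# Terms written as 'label [ENVO:01000174]', as in the examples above.
# Assumption: the identifier is 'ENVO:' followed by digits only.
ENVO_TERM = re.compile(r"^.+ \[ENVO:\d+\]$")


def check_mixs_values(row):
    """Return a list of problems for the optional MIxS fields in `row`."""
    problems = []

    value = row.get("collection date", "").strip()
    if value:
        try:
            date.fromisoformat(value)
        except ValueError:
            problems.append(f"collection date '{value}' is not a valid ISO 8601 date")

    # Latitude must lie within +/-90 and longitude within +/-180 degrees.
    for name, limit in (
        ("geographic location (latitude)", 90.0),
        ("geographic location (longitude)", 180.0),
    ):
        value = row.get(name, "").strip()
        if not value:
            continue
        try:
            number = float(value)
        except ValueError:
            problems.append(f"{name} '{value}' is not a decimal number")
            continue
        if not -limit <= number <= limit:
            problems.append(
                f"{name} '{value}' must be between {-limit:g} and {limit:g}"
            )

    for name in ("environment (biome)", "environment (material)"):
        value = row.get(name, "").strip()
        if value and not ENVO_TERM.match(value):
            problems.append(f"{name} '{value}' is not in the form 'label [ENVO:id]'")

    return problems
```

Empty fields are ignored, in line with the rule that optional fields may stay empty. The check covers format only; it does not confirm that an EnvO identifier exists or that a country name is on the INSDC list.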