
FCS adaptor output

Eric Tvedte edited this page Apr 17, 2024 · 2 revisions

Expected outputs for run_fcsadaptor.sh:

  • cleaned_sequences/*.fa.gz: cleaned sequences file.
  • combined.calls.jsonl: final FCS-adaptor report (JSON file format).
  • fcs.log: auto-generated, empty file.
  • fcs_adaptor.log: log file for the FCS-adaptor run.
  • fcs_adaptor_report.txt: final FCS-adaptor report (TSV file format).
  • logs.jsonl: auto-generated, empty file.
  • pipeline_args.yaml: YAML file of the parameters specified for the FCS-adaptor run (BLAST db, input FASTA).
  • skipped_trims.jsonl: JSON file of internal adaptor hits skipped by cleanup.
  • validate_fasta.txt: report of any formatting issues with the input FASTA; empty if the input FASTA is valid.
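
As a quick sanity check after a run, the list above can be turned into a small script. This is a minimal sketch, not part of FCS-adaptor: `check_outputs` and `EXPECTED_FILES` are hypothetical names, and the only facts used are the file names listed above and the statement that validate_fasta.txt is empty when the input FASTA is valid.

```python
from pathlib import Path

# File names taken from the list above; the cleaned FASTA is matched by glob.
EXPECTED_FILES = [
    "combined.calls.jsonl",
    "fcs.log",
    "fcs_adaptor.log",
    "fcs_adaptor_report.txt",
    "logs.jsonl",
    "pipeline_args.yaml",
    "skipped_trims.jsonl",
    "validate_fasta.txt",
]


def check_outputs(output_dir):
    """Return (missing_files, fasta_is_valid) for an output directory.

    Hypothetical helper: it only checks that the documented files exist.
    """
    out = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (out / name).is_file()]
    if not list(out.glob("cleaned_sequences/*.fa.gz")):
        missing.append("cleaned_sequences/*.fa.gz")
    report = out / "validate_fasta.txt"
    # An empty validate_fasta.txt means no formatting issues were reported.
    fasta_is_valid = report.is_file() and report.stat().st_size == 0
    return missing, fasta_is_valid
```

Calling `check_outputs("path/to/output_dir")` returns the names that were not found and whether the FASTA validation report is empty.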

Expected outputs for fcs.py clean genome:

FCS-adaptor report

A final report of recommended actions from FCS-adaptor is provided in the file fcs_adaptor_report.txt.

The following table lists the column numbers (first column), the corresponding column headers (second column), and an example value for each column (third column):

1:      accession     seq_00001
2:      length        230276
3:      action        ACTION_TRIM
4:      range         1..58
5:      name          CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00360.1:Illumina PCR Primer

  • Column 1: A seq-id (sequence ID) for a whole sequence, as found in the input FASTA.
  • Column 2: Length of the entire sequence in Column 1. Only a portion may be identified as contaminant, according to the range column.
  • Column 3: The recommended action. Action values are as follows:
    • ACTION_EXCLUDE: Remove the entire sequence.
    • ACTION_TRIM: Remove the identified range at the beginning or end of the sequence.
  • Column 4: Start and end coordinates for the identified contamination. If only a portion of the sequence is identified as contaminant, these values indicate the range that should be removed.
  • Column 5: The matched synthetic sequence identified by FCS-adaptor. See here for the sequences contained in the FCS-adaptor database.
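
The five columns above can be read into structured records with a few lines of Python. This is a minimal sketch under stated assumptions: the file is tab-separated (it is described above as TSV), header or comment lines start with `#`, and a range field may hold several comma-separated `start..end` pairs. `AdaptorCall`, `parse_range`, and `parse_report` are hypothetical names, not part of FCS-adaptor.

```python
import csv
from typing import NamedTuple


class AdaptorCall(NamedTuple):
    accession: str
    length: int
    action: str
    ranges: list  # list of (start, end) tuples taken from the range column
    name: str


def parse_range(text):
    """Parse '1..58' (or, by assumption, '1..58,200..230') into tuples."""
    ranges = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        start, end = part.split("..")
        ranges.append((int(start), int(end)))
    return ranges


def parse_report(lines):
    """Yield AdaptorCall records from an iterable of report lines."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip short/blank rows and header lines (assumed to start with '#').
        if len(row) < 5 or row[0].startswith("#"):
            continue
        accession, length, action, range_text, name = (f.strip() for f in row[:5])
        yield AdaptorCall(accession, int(length), action,
                          parse_range(range_text), name)
```

For a file on disk, `with open("fcs_adaptor_report.txt", newline="") as fh: calls = list(parse_report(fh))` would collect all rows.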

Interpreting Outputs

FCS-adaptor uses the following rules to determine calls in fcs_adaptor_report.txt:

  • If adaptors are found at the beginning or end of the sequence, the matching span is reported as "ACTION_TRIM" and is removed in the cleaned_sequences/*.fa.gz output.
  • If adaptors are found within 100 bp of either end of the sequence, the span to trim is extended to the end of the contig. If additional adaptors are found within 100 bp of the proposed trim range, the trim span is transitively extended to cover the additional hits. These spans are reported as "ACTION_TRIM" and are removed in the cleaned_sequences/*.fa.gz output.
  • If adaptors are found more than 100 bp from either end of the sequence, the matching span is reported as "ACTION_TRIM", but the internal span is not removed in the cleaned_sequences/*.fa.gz output.
  • If adaptors are found more than 100 bp from either end of the sequence but 50 bp or less from each other, the spans are joined and reported as "ACTION_TRIM", but the internal span is not removed in the cleaned_sequences/*.fa.gz output.
  • If more than 75% of the sequence matches the adaptors, the whole sequence is reported as "ACTION_EXCLUDE" and is removed in the cleaned_sequences/*.fa.gz output.
  • If less than 200 bp of the sequence remains unmatched to the adaptors, the whole sequence is reported as "ACTION_EXCLUDE" and is removed in the cleaned_sequences/*.fa.gz output.
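
The rules above can be sketched as a small function to make the thresholds concrete. This is an illustration only, not the FCS-adaptor implementation. It assumes 1-based inclusive coordinates, treats "within 100 bp" and "50 bp or less" as inclusive gaps, measures "matches the adaptors" as the number of bases covered by at least one hit, and applies the exclusion checks before the trim rules. `Call` and `call_spans` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    action: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    removed: bool  # True if the span is removed from the cleaned output


def call_spans(length: int, hits: List[Tuple[int, int]],
               end_window: int = 100, join_gap: int = 50,
               exclude_fraction: float = 0.75,
               min_unmatched: int = 200) -> List[Call]:
    hits = sorted(hits)
    if not hits:
        return []

    # Bases covered by at least one hit (overlaps are counted once).
    covered, last_end = 0, 0
    for start, end in hits:
        start = max(start, last_end + 1)
        if end >= start:
            covered += end - start + 1
            last_end = end
    if covered > exclude_fraction * length or length - covered < min_unmatched:
        return [Call("ACTION_EXCLUDE", 1, length, True)]

    # Left end: extend the trim while the next hit is within end_window.
    left_end = 0
    for start, end in hits:
        if start - left_end - 1 > end_window:
            break
        left_end = max(left_end, end)
    remaining = [h for h in hits if h[0] > left_end]

    # Right end: same idea, walking inward from the last base.
    right_start = length + 1
    for start, end in sorted(remaining, key=lambda h: h[1], reverse=True):
        if right_start - end - 1 > end_window:
            break
        right_start = min(right_start, start)
    internal = [h for h in remaining if h[1] < right_start]

    calls = []
    if left_end > 0:
        calls.append(Call("ACTION_TRIM", 1, left_end, True))
    # Internal hits: join neighbours separated by join_gap or less.
    merged: List[List[int]] = []
    for start, end in internal:
        if merged and start - merged[-1][1] - 1 <= join_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    calls.extend(Call("ACTION_TRIM", s, e, False) for s, e in merged)
    if right_start <= length:
        calls.append(Call("ACTION_TRIM", right_start, length, True))
    return calls
```

For example, `call_spans(230276, [(1, 58)])` yields a single removed `ACTION_TRIM` call for 1..58, matching the example row in the report table. Internal calls come back with `removed=False`, mirroring the statement that internal spans are reported but not removed.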

Use fcs.py clean genome as described in the FCS-adaptor Quickstart to automatically clean all adaptor contaminant spans. See Separated cleaned and contaminated sequences for information on how fcs.py clean genome handles adaptor report calls.

FCS-adaptor cleaning report

A successful fcs.py clean genome run prints a summary of the cleaning actions, for example:

Applied 11 actions; 522 bps dropped; 0 bps hardmasked.
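
When runs are scripted, the summary line can be captured and turned into numbers. This is a minimal sketch that assumes the line always has the exact shape of the example above; `parse_cleaning_summary` is a hypothetical helper, and how the fcs.py output is captured is left to the caller.

```python
import re

# Pattern derived from the single example line shown above (an assumption).
SUMMARY_RE = re.compile(
    r"Applied (\d+) actions; (\d+) bps dropped; (\d+) bps hardmasked\."
)


def parse_cleaning_summary(text):
    """Return the three counts as a dict, or None if no summary line is found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    actions, dropped, hardmasked = map(int, match.groups())
    return {
        "actions": actions,
        "bps_dropped": dropped,
        "bps_hardmasked": hardmasked,
    }
```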

Separated cleaned and contaminated sequences

fcs.py clean genome performs the following actions on FCS-adaptor reports to separate "clean" from "contaminated" sequences:

  • ACTION_EXCLUDE: whole sequences are removed from clean.fasta and sent to contam.fasta.
  • ACTION_TRIM: the beginning or end of the sequence is removed from clean.fasta. The removed span is not sent to contam.fasta.
  • FIX: the internal contamination range is masked in clean.fasta at the range defined by start-pos>end-pos. This action is not assigned automatically by run_fcsadaptor.sh; the user must substitute it for internal ACTION_TRIM ranges where appropriate. The masked range is not sent to contam.fasta.
  • SPLIT: the sequence in clean.fasta is split at the internal contamination range defined by start-pos>end-pos. This action is synonymous with ACTION_TRIM for internal ranges.
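
Because FIX has to be substituted by hand, a short script can do the substitution on a copy of the report before cleaning. This is a minimal sketch under stated assumptions: the report is tab-separated with the five columns described earlier, header lines start with `#`, and a range counts as internal when it touches neither position 1 nor the last base. `mark_internal_as_fix` is a hypothetical helper. Rows that mix end and internal ranges are left unchanged, and whether masking is appropriate for a given hit remains the user's decision.

```python
def mark_internal_as_fix(lines):
    """Return report lines with ACTION_TRIM replaced by FIX on internal-only rows."""
    out = []
    for line in lines:
        newline = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        # Pass through headers, short rows, other actions, and empty ranges.
        if (line.startswith("#") or len(fields) < 5
                or fields[2] != "ACTION_TRIM" or not fields[3].strip()):
            out.append(line)
            continue
        length = int(fields[1])
        ranges = [
            tuple(int(x) for x in part.split(".."))
            for part in fields[3].split(",")
        ]
        # Internal only if no range touches the first or last base.
        if all(start > 1 and end < length for start, end in ranges):
            fields[2] = "FIX"
            line = "\t".join(fields) + newline
        out.append(line)
    return out
```

Writing the result to a new file keeps the original fcs_adaptor_report.txt intact for comparison.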