v0.3.0 stability update part 2 #40

Merged
merged 96 commits into from
Jun 21, 2022

ktmeaton
Owner

Part 2 because I forgot to rebase properly the first time 😅

Major Changes

  1. Default parameters have been updated! Please regenerate your profiles/builds with:

    bash scripts/create_profile.sh --data data/custom
  2. Rule outputs are now in sub-directories for a cleaner results directory.

  3. The in-text report (report.pptx) statistics are no longer cumulative counts of all sequences. Instead, they will match the reporting period in the accompanying plots.
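
The difference between cumulative counts and reporting-period counts can be illustrated with a minimal sketch. The function name `count_in_period`, the record layout, and the dates below are hypothetical and are not taken from the pipeline's code; they only show the idea of restricting a statistic to the same window as the plots.

```python
from datetime import date


def count_in_period(records, start, end):
    """Count records whose date falls within [start, end], inclusive."""
    return sum(1 for r in records if start <= r["date"] <= end)


# Hypothetical sequences with collection dates (illustrative data only).
records = [
    {"strain": "A", "date": date(2022, 3, 1)},
    {"strain": "B", "date": date(2022, 5, 20)},
    {"strain": "C", "date": date(2022, 6, 10)},
]

# Old behaviour (as described): count every sequence ever seen.
cumulative = len(records)

# New behaviour (as described): count only sequences in the reporting period.
in_period = count_in_period(records, date(2022, 5, 1), date(2022, 6, 30))
```

With this data, the cumulative statistic covers all three sequences, while the reporting-period statistic covers only the two that fall inside the window.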

Bug Fixes

  1. Improve subtree collapse efficiency (Collapsing subtrees is inefficient and slow #35).
  2. Improve subtree aesthetics and filters (Add better filters to the Auspice JSON output #20).
  3. Fix issues rendering as float (Issue number is a float rather than a string in linelists and reports #29).
  4. Explicitly control the dimensions of plots for PowerPoint embedding.
  5. Remove hard-coded extra_cols (column "gisaid_epi_isl" not existed in file #26).
  6. Fix mismatch in lineages plot and description (Troubleshoot missing designated lineages in controls report #21).
  7. Downstream steps no longer fail if there are no recombinant sequences (Downstream steps fail if Nextclade finds no recombinants #7).
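
As an illustration of the float-rendering fix (#29), the sketch below normalizes an issue number to a string. It assumes the value arrives as a float such as `29.0`, which typically happens when a numeric column containing missing values is parsed as floating point. The helper name `format_issue` is hypothetical and is not the pipeline's actual implementation.

```python
import math


def format_issue(value):
    """Render an issue number as a string without a trailing '.0'.

    Missing values (None, empty string, NaN) become an empty string.
    Non-numeric strings raise ValueError, since float() cannot parse them.
    """
    if value is None or value == "":
        return ""
    number = float(value)
    if math.isnan(number):
        return ""
    if number.is_integer():
        return str(int(number))
    return str(value)
```

Applying a helper like this before writing linelists and reports keeps the issue column as text rather than as a float.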

Workflow

  1. Add new rule usher_columns to augment the base usher metadata.
  2. Add new script parents.py, plots, and report slide to summarize recombinant sequences by parent.
  3. Make rules plot and report more dynamic with regard to plot creation.
  4. Exclude the reference genome from alignment until faToVcf.
  5. Include the log path and expected outputs in the message for each rule.
  6. Use sub-functions to better control optional parameters.
  7. Make sure all rules write to a log if possible (Rules with empty log files #34).
  8. Convert all rule inputs to snakemake rule variables.
  9. Create and document a create_profile.sh script.
  10. Implement the --low-memory mode parameter within the script usher_metadata.sh.
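
One way to read "use sub-functions to better control optional parameters" is a small helper that builds a command-line fragment and skips anything that is unset. The sketch below is an assumption about the general pattern, not the workflow's actual code; the helper name `optional_args` and the parameter names `example_value` and `example_flag` are hypothetical.

```python
def optional_args(**params):
    """Build a CLI argument string, skipping parameters that are unset.

    - None, False, and "" are treated as "not provided" and omitted.
    - True produces a bare flag (e.g. --example-flag).
    - Any other value produces a flag followed by the value.
    """
    parts = []
    for name, value in params.items():
        if value is None or value is False or value == "":
            continue
        flag = "--" + name.replace("_", "-")
        if value is True:
            parts.append(flag)
        else:
            parts.extend([flag, str(value)])
    return " ".join(parts)


# Hypothetical parameters: one unset, one boolean switch, one valued option.
args = optional_args(unset_param=None, example_flag=True, example_value=10)
```

Keeping this logic in one function means a rule's shell command does not need to special-case every optional parameter.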

Data

  1. Create new controls datasets:

    • controls-negatives
    • controls-positives
    • controls
  2. Add versions to genbank_accessions for controls.

Programs

  1. Upgrade UShER to v0.5.4 (possibly this was done in a prior version).
  2. Remove taxonium and chronumental from the conda env.

Parameters

  1. Add parameters to control whether negatives and false_positives should be excluded:

    • exclude_negatives: false
    • false_positives: false
  2. Add new optional param max_placements to rule linelist.

  3. Remove --show-private-mutations from debug_args of rule sc2rf.

  4. Add optional param --sc2rf-dir to sc2rf to enable execution outside of the sc2rf directory.

  5. Add params --output-csv and --output-ansi to the wrapper scripts/sc2rf.sh.

  6. Remove params nextclade_ref and custom_ref from rule nextclade.

  7. Change --breakpoints to 0-10 in sc2rf.

Continuous Integration

  1. Re-rename tutorial action to pipeline, and add different jobs for different profiles:

    • Tutorial
    • Controls (Positive)
    • Controls (Negative)
    • Controls (All)

Output

  1. Output new _historical plots and slides for plotting all data over time.

  2. Output new file parents.tsv to summarize recombinant sequences by parent.

  3. Order the colors/legend of the stacked bar plots by number of sequences.

  4. Include lineage and cluster id in filepaths of largest plots and tables.

  5. Rename the linelist output:

    • linelist.tsv
    • positives.tsv
    • negatives.tsv
    • false_positives.tsv
    • lineages.tsv
    • parents.tsv
  6. The report.xlsx now includes the following tables:

    • lineages
    • parents
    • linelist
    • positives
    • negatives
    • false_positives
    • summary
    • issues
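
Ordering the colors/legend of a stacked bar plot by number of sequences amounts to sorting the categories by their counts. A minimal sketch follows, assuming the categories are lineage labels with one entry per sequence; the function name `order_by_count` and the sample labels are illustrative only and do not come from the pipeline.

```python
from collections import Counter


def order_by_count(labels):
    """Return unique labels sorted by descending count, ties broken alphabetically."""
    counts = Counter(labels)
    return sorted(counts, key=lambda label: (-counts[label], label))


# Hypothetical lineage assignments, one per sequence.
lineages = ["XE", "XD", "XE", "XF", "XE", "XD"]
legend_order = order_by_count(lineages)
```

The resulting order can then be passed to the plotting code so that the most frequent category is drawn and listed first.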

Katherine Eaton added 29 commits June 21, 2022 10:31
@ktmeaton ktmeaton merged commit e8eda40 into master Jun 21, 2022