Merge pull request #197 from nextstrain/ingest-tutorial-followup
Fill in missing code-block language
joverlee521 committed Apr 19, 2024
2 parents ba9ced1 + 4de29e9 commit e9721fc
Showing 4 changed files with 35 additions and 35 deletions.
4 changes: 2 additions & 2 deletions src/learn/augur-to-auspice.rst
@@ -718,9 +718,9 @@ Select (discrete) colorings are available for filtering in Auspice (both
via the sidebar UI and listed in the footer) if they are defined in the
auspice-config JSONs filters list:

-.. code-block::
+.. code-block:: json
-"filters": ["country", "region", ...]
+"filters": ["country", "region", "..."]
Additionally, each mutation and strain name will be automatically
available in Auspice’s sidebar UI for filtering.
@@ -36,9 +36,9 @@ This tutorial will only focus on using the guide to set up the ingest workflow.
3. Follow the `GitHub guide to download the new repository <https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository>`_.
4. Change directory to your new pathogen repository

-.. code-block::
+.. code-block:: console
-cd <new-pathogen-repository>
+$ cd <new-pathogen-repository>
Decide on data source
=====================
@@ -66,9 +66,9 @@ You can decide whether NCBI Datasets include sufficient data for your pathogen b
1. Add your pathogen's NCBI taxonomy ID to the ``ncbi_taxon_id`` parameter in the ``ingest/defaults/config.yaml`` config file.
2. Dump the uncurated metadata by running

-.. code-block::
+.. code-block:: console
-nextstrain build ingest dump_ncbi_dataset_report
+$ nextstrain build ingest dump_ncbi_dataset_report
3. Inspect the generated file ``ingest/data/ncbi_dataset_report_raw.tsv``
4. If there are other fields in the raw file that you would like to include in the workflow,
@@ -92,7 +92,7 @@ the `NCBI Entrez <https://www.ncbi.nlm.nih.gov/books/NBK25501/>`_ tool to downlo
4. Switch the `Snakemake ruleorder <https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#handling-ambiguous-rules>`_
within the ``ingest/rules/fetch_from_ncbi.smk`` file.

-.. code-block::
+.. code-block:: python
ruleorder: format_ncbi_datasets_ndjson < parse_genbank_to_ndjson
@@ -276,7 +276,7 @@ Geolocation rules

Geolocation rules are defined in a TSV file with the format

-.. code-block::
+.. code-block:: none
region/country/division/location<\t>region/country/division/location
@@ -289,14 +289,14 @@ If there are rules that can be applied across multiple locations, then a wildcar

Let's say you have the following locations in your NDJSON

-.. code-block::
+.. code-block:: none
{“region”: “North America”, “country”: “United States”, “division”: “New York”, “location”: “Buffalo”}
{“region”: “North America”, “country”: “United States”, “division”: “New York”, “location”: “New York”}
And you provide these geolocation rules

-.. code-block::
+.. code-block:: none
North America/United States/New York/New York North America/United States/New York/New York City
North America/United States/New York/* North America/United States/New York State/*
@@ -308,7 +308,7 @@ The third rule has wildcards for both division and location, so it will correct

Running through the ``ingest/vendored/apply-geolocation-rules`` script should produce the following

-.. code-block::
+.. code-block:: none
{“region”: “North America”, “country”: “USA”, “division”: “New York State”, “location”: “Buffalo”}
{“region”: “North America”, “country”: “USA”, “division”: “New York State”, “location”: “New York City”}
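
The wildcard behaviour described in this hunk can be sketched in Python. This is only an illustration, not the code of ``ingest/vendored/apply-geolocation-rules``: the function names, the "first rule in file order that changes the record wins, then repeat" strategy, and the third rule (``North America/United States/*/*`` to ``North America/USA/*/*``, which is not visible in the hunks shown here) are all assumptions.

```python
import json

WILDCARD = "*"
FIELDS = ("region", "country", "division", "location")


def parse_rule(line):
    """Split one TSV rule line into (pattern, replacement) tuples of four fields."""
    raw, annotated = line.rstrip("\n").split("\t")
    return tuple(raw.split("/")), tuple(annotated.split("/"))


def matches(pattern, values):
    """A pattern field matches if it is a wildcard or equals the record's value."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, values))


def apply_rules(record, rules, max_passes=10):
    """Apply rules repeatedly until none changes the record (or max_passes is hit)."""
    values = tuple(record[field] for field in FIELDS)
    for _ in range(max_passes):
        for pattern, replacement in rules:
            if matches(pattern, values):
                # A wildcard in the replacement keeps the record's current value.
                updated = tuple(
                    v if r == WILDCARD else r for r, v in zip(replacement, values)
                )
                if updated != values:
                    values = updated
                    break  # restart from the first rule with the updated values
        else:
            break  # no rule changed anything, so the record is stable
    return {**record, **dict(zip(FIELDS, values))}


# The first two rules come from the hunk above; the third is an assumed example.
RULES = [
    parse_rule(line)
    for line in [
        "North America/United States/New York/New York\t"
        "North America/United States/New York/New York City",
        "North America/United States/New York/*\t"
        "North America/United States/New York State/*",
        "North America/United States/*/*\tNorth America/USA/*/*",  # assumption
    ]
]

RECORDS = [
    {"region": "North America", "country": "United States",
     "division": "New York", "location": "Buffalo"},
    {"region": "North America", "country": "United States",
     "division": "New York", "location": "New York"},
]

for record in RECORDS:
    print(json.dumps(apply_rules(record, RULES)))
```

Under these assumptions both records end up with ``country`` set to ``USA`` and ``division`` set to ``New York State``, and only the second record's ``location`` becomes ``New York City``. The vendored script may resolve rule precedence differently.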
@@ -337,7 +337,7 @@ User annotations

The user annotations are defined in a TSV file with the format

-.. code-block::
+.. code-block:: none
id<\t>field<\t>value
@@ -347,14 +347,14 @@ The ``value`` is the value you are trying to add to the NDJSON record.

Let's say you have the following NDJSON records

-.. code-block::
+.. code-block:: none
{“accession”: “AAAAA”, “country”: “United States”, “division”: “New York”, “location”: “Buffalo”}
{“accession”: “BBBBB”, “country”: “United States”, “division”: “New York”, “location”: “Buffalo”}
And you provide these user annotations

-.. code-block::
+.. code-block:: none
AAAAA age 10
BBBBB age 12
@@ -365,7 +365,7 @@ third annotation overwrites the existing ``location`` field for the record ``BBB

Running through the ``ingest/vendored/merge-user-metadata`` script should produce the following:

-.. code-block::
+.. code-block:: none
{“accession”: “AAAAA”, “country”: “United States”, “division”: “New York”, “location”: “Buffalo”, “age”: 10}
{“accession”: “BBBBB”, “country”: “United States”, “division”: “New York”, “location”: “Niagara Falls”, “age”: 12}
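
The ``id<\t>field<\t>value`` merge described in this hunk can be sketched in Python. This is a hedged illustration rather than the code of ``ingest/vendored/merge-user-metadata``: the function name and the ``id_field`` parameter are hypothetical, and the third annotation row is reconstructed from the surrounding text and the expected output, since it is not visible in the hunk.

```python
import csv
import io


def merge_annotations(records, annotations_tsv, id_field="accession"):
    """Return copies of records with per-id annotations added or overwritten."""
    by_id = {}
    for row in csv.reader(io.StringIO(annotations_tsv), delimiter="\t"):
        # Skip blank, malformed, or comment lines (an assumption about the format).
        if len(row) != 3 or row[0].startswith("#"):
            continue
        record_id, field, value = row
        by_id.setdefault(record_id, {})[field] = value
    # Annotation values win over existing values for the same field.
    return [{**record, **by_id.get(record[id_field], {})} for record in records]


RECORDS = [
    {"accession": "AAAAA", "country": "United States",
     "division": "New York", "location": "Buffalo"},
    {"accession": "BBBBB", "country": "United States",
     "division": "New York", "location": "Buffalo"},
]

# The last row is reconstructed from the text, not copied from the hunk.
ANNOTATIONS = (
    "AAAAA\tage\t10\n"
    "BBBBB\tage\t12\n"
    "BBBBB\tlocation\tNiagara Falls\n"
)

for merged in merge_annotations(RECORDS, ANNOTATIONS):
    print(merged)
```

In this sketch every annotation value stays a string, so ``age`` comes out as ``"10"`` rather than the number shown in the expected output above; how the vendored script handles value types is not shown in this diff.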
@@ -448,7 +448,7 @@ config file to include the Nextclade rules from ``ingest/rules/nextclade.smk`` a
1. Add your Nextclade dataset name to the ``nextclade.dataset_name`` parameter
2. Run the ingest workflow with the additional config file

-.. code-block::
+.. code-block:: bash
nextstrain build ingest --configfile defaults/nextclade_config.yaml
6 changes: 3 additions & 3 deletions src/tutorials/running-a-phylogenetic-workflow.rst
@@ -22,7 +22,7 @@ Download the example Zika pathogen repository

:term:`Pathogen workflows<phylogenetic workflow>` are stored in :term:`pathogen repositories<pathogen repository>` (version-controlled folders) to track changes over time. Download the `example Zika pathogen repository <https://github.com/nextstrain/zika-tutorial>`_.

-.. code-block::
+.. code-block:: console
$ git clone https://github.com/nextstrain/zika-tutorial
Cloning into 'zika-tutorial'...
@@ -37,7 +37,7 @@ Run the workflow

Run the workflow with the :term:`Nextstrain CLI`.

-.. code-block::
+.. code-block:: console
$ nextstrain build --cpus 1 zika-tutorial/
Building DAG of jobs...
@@ -53,7 +53,7 @@ Visualize results

View the resulting :term:`phylogenetic dataset` using Nextstrain's visualizations.

-.. code-block::
+.. code-block:: console
$ nextstrain view zika-tutorial/auspice/
——————————————————————————————————————————————————————————————————————————————
32 changes: 16 additions & 16 deletions src/tutorials/using-a-pathogen-repo/running-an-ingest-workflow.rst
@@ -27,7 +27,7 @@ Download the Zika repository
All pathogen ingest workflows are stored in :term:`pathogen repositories<pathogen repository>` (version-controlled folders) to track changes over time.
Download the `Zika repository <https://github.com/nextstrain/zika>`_.

-.. code-block::
+.. code-block:: console
$ git clone https://github.com/nextstrain/zika
Cloning into 'zika'...
@@ -44,13 +44,13 @@ the downloaded data into a format suitable for :term:`phylogenetic workflows <ph

1. Change directory to the Zika pathogen repository downloaded in the previous step

-.. code-block::
+.. code-block:: console
$ cd zika
2. Run the default ingest workflow with the :term:`Nextstrain CLI`.

-.. code-block::
+.. code-block:: console
$ nextstrain build ingest
Using profile profiles/default and workflow specific profile profiles/default for setting default command line arguments.
@@ -91,34 +91,34 @@ you can download the uncurated NCBI data.

1. Enter an interactive Nextstrain shell to be able to run the NCBI Datasets CLI commands without installing them separately.

-.. code-block::
+.. code-block:: console
$ nextstrain shell .
2. Create the ``ingest/data`` directory if it doesn't already exist.

-.. code-block::
+.. code-block:: console
$ mkdir -p ingest/data
3. Download the dataset with the pathogen NCBI taxonomy ID.

-.. code-block::
+.. code-block:: console
$ datasets download virus genome taxon <taxon-id> \
--filename ingest/data/ncbi_dataset.zip
4. Extract and format the metadata as a TSV file for easy inspection

-.. code-block::
+.. code-block:: console
$ dataformat tsv virus-genome \
--package ingest/data/ncbi_dataset.zip \
> ingest/data/raw_metadata.tsv
5. Exit the Nextstrain shell to return to your usual shell environment.

-.. code-block::
+.. code-block:: console
$ exit
@@ -133,13 +133,13 @@ If you wanted this field to be included in your outputs, you could perform the f

1. Create a new build config directory ``ingest/build-configs/tutorial/``

-.. code-block::
+.. code-block:: console
$ mkdir ingest/build-configs/tutorial
2. Copy the default config to ``ingest/build-configs/tutorial/config.yaml``

-.. code-block::
+.. code-block:: console
$ cp ingest/defaults/config.yaml ingest/build-configs/tutorial/config.yaml
@@ -168,7 +168,7 @@ Any of the config parameters can be overridden in a custom config file.

4. Run the ingest workflow again with the custom config file.

-.. code-block::
+.. code-block:: console
$ nextstrain build ingest --configfile build-configs/tutorial/config.yaml --forceall
Using profile profiles/default and workflow specific profile profiles/default for setting default command line arguments.
@@ -190,7 +190,7 @@ We'll walk through an example customization that joins additional metadata to th

1. Create an additional metadata file ``ingest/build-configs/tutorial/additional-metadata.tsv``

-.. code-block::
+.. code-block:: none
genbank_accession column_A column_B column_C
AF013415 AAAAA BBBBB CCCCC
@@ -207,7 +207,7 @@ We'll walk through an example customization that joins additional metadata to th
2. Create a new rules file ``ingest/build-configs/tutorial/merge-metadata.smk``

-.. code-block::
+.. code-block:: python
rule merge_metadata:
input:
@@ -234,7 +234,7 @@ default ``?`` value in the new columns.

3. Add the following to the custom config file ``ingest/build-configs/tutorial/config.yaml``

-.. code-block::
+.. code-block:: yaml
custom_rules:
- build-configs/tutorial/merge-metadata.smk
@@ -243,7 +243,7 @@ The ``custom_rules`` config tells the ingest workflow to include your custom rul

4. Run the ingest workflow again with the customized rule.

-.. code-block::
+.. code-block:: console
$ nextstrain build ingest merge_metadata --configfile build-configs/tutorial/config.yaml
Using profile profiles/default and workflow specific profile profiles/default for setting default command line arguments.
@@ -261,7 +261,7 @@ Next steps
* Run the `zika phylogenetic workflow <https://github.com/nextstrain/zika/tree/main/phylogenetic>`_ with new ingested data as input
by running

-.. code-block::
+.. code-block:: console
$ mv ingest/results/* phylogenetic/data/
$ nextstrain build phylogenetic