SierraPy

The SierraPy package contains a client and a command-line program for the HIVDB Sierra GraphQL Web service.

Installation

You can install this package with pip, the Python package installer, using the command below:

pip install sierrapy

To install the latest (unstable) version from GitHub:

pip install -e "git+https://github.com/hivdb/sierra-client.git#egg=sierrapy&subdirectory=python"

Usage

Once installed, the sierrapy command is available from the command line. The command currently supports four methods (fasta, seqreads, mutations, and patterns) for fetching JSON-format drug-resistance reports from the backend service.

Select a virus

By default, SierraPy sends queries to our HIV-1 analysis server. This behavior can be changed by specifying --virus after the sierrapy command:

# For HIV-2 analysis
sierrapy --virus HIV2 ...

# For SARS-CoV-2 analysis
sierrapy --virus SARS2 ...

Specify GraphQL entry-point

By default, SierraPy sends queries to the Sierra production servers maintained by the HIVDB team. Users who host their own Sierra server must specify the entry-point of their Sierra GraphQL service. For example:

sierrapy --url http://localhost:8080/WebApplications/rest/graphql ...

Input Sequences (FASTA File)

This method corresponds to the HIVDB "Input sequences" tab. It accepts any number of files and sequences, limited only by your computer's resources. The input FASTA files should contain at least one HIV/SIV pol DNA sequence.

You can pass one or more FASTA-format files to the sierrapy fasta method. Use the following command to output the result to your console:

sierrapy fasta fasta1.fasta fasta2.fasta

You may redirect the output to a file using the -o or --output parameter:

sierrapy fasta fasta1.fasta fasta2.fasta -o output.json

GraphQL allows users to customize the structure of the output result by defining the query. The sierrapy fasta method accepts an optional parameter -q or --query which allows users to define a custom query fragment on the SequenceAnalysis object. An example and the default query can be found in the "fragments" folder. Once you have your custom query, assuming it is saved at path/to/your/query/file.gql, use this command to get the customized result:

sierrapy fasta fasta1.fasta fasta2.fasta -q path/to/your/query/file.gql

For further information on how to write queries in GraphQL, please visit graphql.org/learn. For the API reference and a playground for the HIVDB GraphQL service, please visit hivdb.stanford.edu/page/graphiql.
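
When the fasta method is driven from a script, it can help to assemble the command line programmatically. The sketch below is one possible way to do that. The helper name build_fasta_command is hypothetical and not part of SierraPy; it only uses the flags described above (--virus, --url, -q, and -o) and does not call any SierraPy Python API.

```python
import shlex
from typing import List, Optional, Sequence


def build_fasta_command(
    fasta_files: Sequence[str],
    output: Optional[str] = None,
    query: Optional[str] = None,
    virus: Optional[str] = None,
    url: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `sierrapy fasta` (hypothetical helper)."""
    if not fasta_files:
        raise ValueError("at least one FASTA file is required")
    argv = ["sierrapy"]
    # Global options are placed right after the program name,
    # mirroring the --virus and --url examples in this README.
    if virus is not None:
        argv += ["--virus", virus]
    if url is not None:
        argv += ["--url", url]
    argv.append("fasta")
    argv += list(fasta_files)
    if query is not None:
        argv += ["-q", query]
    if output is not None:
        argv += ["-o", output]
    return argv


argv = build_fasta_command(
    ["fasta1.fasta", "fasta2.fasta"], output="output.json", virus="HIV2"
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(argv))
# To actually run it (requires sierrapy to be installed):
# import subprocess; subprocess.run(argv, check=True)
```

Building a list rather than a single string avoids quoting problems when file paths contain spaces.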

Sharding

By default, SierraPy stores the results of every 100 sequences in a single JSON file to prevent memory exhaustion. You can override this behavior by passing the --sharding parameter to the command. It is safe to increase the --sharding value to 200 or even 500. However, as the value increases, the JSON file will become increasingly difficult to read or write with most popular JSON parsers:

sierrapy fasta fasta1.fasta fasta2.fasta --sharding 200

You can also disable the sharding mechanism completely by using the --no-sharding flag. This will prevent the addition of a suffix to the output JSON file.

sierrapy fasta fasta1.fasta fasta2.fasta --no-sharding
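
If you keep sharding enabled, a downstream script may need to recombine the shards. The following is a minimal sketch of that step, assuming each shard file holds a JSON array of per-sequence results; that layout is an assumption, so inspect your own output first. The helper merge_shards is hypothetical, and the shard paths are passed in explicitly because the exact suffix naming is not specified here.

```python
import json
from pathlib import Path
from typing import Any, Iterable, List


def merge_shards(paths: Iterable[Path]) -> List[Any]:
    """Concatenate the records of several shard files, in the order given.

    Assumes (not confirmed by this README) that every shard is a JSON array.
    """
    merged: List[Any] = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            records = json.load(handle)
        if not isinstance(records, list):
            raise ValueError(f"{path} does not contain a JSON array")
        merged.extend(records)
    return merged
```

Note that merging brings back the memory cost that sharding is meant to avoid, so for very large runs it may be better to process one shard at a time.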

Input Sequence Reads (CodFreq File)

This method corresponds to the HIVDB "Input sequence reads" tab. It accepts a list of CodFreq files or directories containing CodFreq files. A JSON-format report is generated for each CodFreq file when the analysis completes.

sierrapy seqreads path/to/codfreq/dir/ additional.codfreq.gz

The reports are placed in the same directory as the input CodFreq file and named with the suffix ".report.json".
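
Because the reports are written next to the inputs, a script can collect them by suffix after the run. This is a small sketch of that idea; the helper names find_reports and load_reports are hypothetical, and the code relies only on the ".report.json" suffix mentioned above, not on anything else about the file names or the report contents.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def find_reports(directory: str) -> List[Path]:
    """Return files ending in ".report.json" directly inside `directory`."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(".report.json")
    )


def load_reports(directory: str) -> Dict[str, Any]:
    """Map each report file name to its parsed JSON content."""
    return {
        p.name: json.loads(p.read_text(encoding="utf-8"))
        for p in find_reports(directory)
    }
```

For example, load_reports("path/to/codfreq/dir/") would return one entry per report found in that directory; subdirectories are not searched in this sketch.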

Users can customize the output by defining the GraphQL query. The -q or --query parameter specifies a query fragment on the SequenceReadsAnalysis object. An example and the default query are located in the "fragments" folder.

Input Mutations

This method corresponds to the HIVDB "Input mutations" tab. It accepts PR, RT, and/or IN mutations based on the HIV-1 subtype B consensus. The mutation format is flexible. Here is a list of examples of valid mutations:

PR:E35E_D, PRE35_, PR:35Insertion, and PR35ins are all valid insertions at PR codon 35.
RT:T67-, RT67Deletion, RT67d, and RT67del are all valid deletions at RT codon 67.
IN:M50MI is a mutation at IN codon 50 that contains a mixture.
IN:M50* is a mutation at IN codon 50 that is a stop codon.
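
To make the structure of these strings concrete, here is a simplified sketch that splits the colon-separated form (gene, optional reference amino acid, codon position, observed residues) into its parts. It is an illustration only and is not the parser used by Sierra: the function name parse_mutation and the regular expression are assumptions, and the looser spellings such as PR35ins or RT67Deletion are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    gene: str       # "PR", "RT", or "IN"
    reference: str  # reference amino acid, or "" if omitted
    position: int   # codon position
    residues: str   # observed residues, e.g. "MI", "-", "*", "E_D"


# Hypothetical pattern covering only the GENE:[Ref]Pos<Residues> form.
_MUTATION_RE = re.compile(r"^(PR|RT|IN):([A-Z]?)(\d+)([A-Z_*-]+)$")


def parse_mutation(text: str) -> Optional[Mutation]:
    """Return the parts of a mutation string, or None if it does not match."""
    match = _MUTATION_RE.match(text.strip())
    if match is None:
        return None
    gene, reference, position, residues = match.groups()
    return Mutation(gene, reference, int(position), residues)


for example in ["PR:E35E_D", "RT:T67-", "IN:M50MI", "IN:M50*"]:
    print(example, parse_mutation(example))
```

A string the sketch cannot parse may still be accepted by the service, so a None result here should not be treated as a validation failure.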

You can pass one or more mutations to the sierrapy mutations method. Use the following command to output the result to your console:

sierrapy mutations PR:E35E_D RT:T67- IN:M50MI

You may redirect the output to a file using the -o or --output parameter:

sierrapy mutations PR:E35E_D RT:T67- IN:M50MI -o output.json

You can also specify a custom query fragment on the MutationsAnalysis object. Use the -q or --query parameter, as shown for the fasta method, to retrieve the customized result.

Input Patterns

A pattern is a set (list) of mutations. With this method, you can analyze mutations derived from different samples at the same time. The method accepts one or more files containing mutations. Each row in a file represents a pattern. Here is an example of a file containing two patterns:

> patient 1
RT:M41L + RT:M184V + RT:L210W + RT:T215Y
> patient 2
PR:L24I + PR:M46L + PR:I54V + PR:V82A

These delimiters are supported: commas (,), plus signs (+), semicolons (;), whitespace, and tabs. The output of this method is a list of MutationsAnalysis objects, in the same order as the input.
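
If your patterns live in a script or spreadsheet export, a pattern file in the layout shown above can be generated rather than typed by hand. The sketch below is one way to do so; the helper name format_patterns is hypothetical, and it uses the " + " delimiter and the "> name" header lines from the example file.

```python
from typing import Dict, List


def format_patterns(patterns: Dict[str, List[str]]) -> str:
    """Render named mutation lists as pattern-file text.

    Each entry becomes a "> name" header line followed by one row
    of mutations joined with " + ". Dict insertion order is kept.
    """
    lines: List[str] = []
    for name, mutations in patterns.items():
        if not mutations:
            raise ValueError(f"pattern {name!r} has no mutations")
        lines.append(f"> {name}")
        lines.append(" + ".join(mutations))
    return "\n".join(lines) + "\n"


text = format_patterns({
    "patient 1": ["RT:M41L", "RT:M184V", "RT:L210W", "RT:T215Y"],
    "patient 2": ["PR:L24I", "PR:M46L", "PR:I54V", "PR:V82A"],
})
print(text, end="")
```

The resulting text can be written to a file and passed to sierrapy patterns.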

Here is an example command. It outputs the JSON result to the console:

sierrapy patterns /path/to/pattern/file.txt

This one outputs the JSON result to a file:

sierrapy patterns /path/to/pattern/file.txt -o output.json

A custom query fragment on the MutationsAnalysis object can also be specified with the -q or --query parameter, as described in the sections above. This method also supports the same --sharding parameter described for the fasta method.

Donation

If you find SierraPy useful and wish to donate to the HIVDB team, you can do so through the Stanford Make a Gift form. Your contribution will be greatly appreciated.
