This document walks you through visualizing your rFon1D outputs from ortho_seqs using the rf1d-viz CLI command.
Note: rf1d-viz assumes that you have already run orthogonal-polynomial on the dataset. For instructions on running orthogonal-polynomial, view the tutorial here.
Before running rf1d-viz, you will need the following:
- The {trait_file_name}_regressions.npz file that is returned from orthogonal-polynomial.
- The rf1d form of the alphabet input.
When you run orthogonal-polynomial, the CLI will output the following text towards the beginning:
rf1d form of alphabet input:
The line immediately below it is the rf1d form of the alphabet input.
- The molecule type of the sequence (typically DNA or protein).
- What the phenotype values represent.
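If the orthogonal-polynomial output was captured as text, the rf1d form of the alphabet input can be pulled out programmatically. The following is a minimal Python sketch, not part of ortho_seqs: the function name extract_rf1d_alphabet is hypothetical, and it assumes the marker line appears exactly as quoted above and that the value sits on the next non-empty line.

```python
from typing import Optional

# Marker text as printed by orthogonal-polynomial (quoted in this tutorial).
MARKER = "rf1d form of alphabet input:"


def extract_rf1d_alphabet(cli_output: str) -> Optional[str]:
    """Return the first non-empty line after the marker line, or None."""
    lines = cli_output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == MARKER:
            # Assumption: the rf1d form is the next non-empty line.
            for candidate in lines[i + 1:]:
                if candidate.strip():
                    return candidate.strip()
    return None


if __name__ == "__main__":
    captured = "rf1d form of alphabet input:\nSYG,R,z,n\n"
    print(extract_rf1d_alphabet(captured))
```

The returned string can be passed unchanged to --alphbt_input.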
rf1d-viz requires the following flags, many of which have counterparts in orthogonal-polynomial:
--filename
This will be the {trait_file_name}_regressions.npz file that is returned from orthogonal-polynomial.
--alphbt_input
This will be the rf1d form of the alphabet input.
--molecule
This is the molecule type.
--phenotype
This is what the phenotype values represent (for example, ELISA). It will be used for labelling the graphs.
--out_dir
This is where you want the graphs stored. Note: the path must exist prior to running rf1d-viz.
--action
This is where you specify what kind of visualization you want. The current options are:
barplot - This will create a barplot of the rFon1D values, grouped by site and alphabet input. This is called automatically when you run orthogonal-polynomial.
density - This will create a density plot of the rFon1D values.
summary - This will print the number of sites and dimensions, the alphabet input, and the molecule, and then call sort (another rf1d-viz action, explained in further detail below). This is called automatically in orthogonal-polynomial, and the output will be saved to the out_dir as summary.txt.
heatmap - This will create a heatmap of the rFon1D values, grouped by site and alphabet input.
boxplot - This will create a boxplot of the rFon1D values.
sort - This will print out the top 10 rFon1D values by magnitude, including the rFon1D value, the site, and the group it belongs to. This will be saved to the out_dir as sort.txt.
ALL - This will produce a barplot, histogram, heatmap, and boxplot in a single run.
Note: For now, you will need to close the first graph once it displays on your computer for the rest of the graphs to run.
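Because the --out_dir path must exist before rf1d-viz runs, it can help to create it up front. Below is a minimal Python sketch using only the standard library; the helper name ensure_out_dir is hypothetical and not part of ortho_seqs.

```python
from pathlib import Path


def ensure_out_dir(out_dir: str) -> Path:
    """Create out_dir (and any missing parent directories) if needed."""
    path = Path(out_dir)
    # exist_ok=True makes this a no-op when the directory is already there.
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    print(ensure_out_dir("docs/source/tutorial_outputs"))
```

Calling this before rf1d-viz avoids a failed run caused by a missing output directory.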
As with orthogonal-polynomial, you will run rf1d-viz in your CLI, starting with the keyword ortho_seq, but now followed by rf1d-viz instead of orthogonal-polynomial. The general format is
ortho_seq rf1d-viz filename --alphbt_input --molecule --phenotype --out_dir --action
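where filename is the value described under --filename; in the examples in this tutorial it is passed as the first argument, without the flag name.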
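The general format can also be assembled in a script. The sketch below only builds the argument list from the flags described in this tutorial and does not run anything; the function name build_rf1d_viz_command is hypothetical, and the example values are the ones used for the Sidhu dataset in this tutorial.

```python
import shlex
from typing import List


def build_rf1d_viz_command(
    filename: str,
    alphbt_input: str,
    molecule: str,
    phenotype: str,
    out_dir: str,
    action: str,
) -> List[str]:
    """Return the rf1d-viz command as a list of arguments."""
    return [
        "ortho_seq", "rf1d-viz", filename,  # filename is positional
        "--alphbt_input", alphbt_input,
        "--molecule", molecule,
        "--phenotype", phenotype,
        "--out_dir", out_dir,
        "--action", action,
    ]


if __name__ == "__main__":
    cmd = build_rf1d_viz_command(
        "docs/source/tutorial_outputs/Sidhu_regressions.npz",
        "SYG,R,z,n",
        "protein",
        "ELISA",
        "docs/source/tutorial_outputs",
        "barplot",
    )
    # Print a copy-pasteable command line; pass cmd to subprocess.run
    # instead if you want to execute it from Python.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids quoting problems if a path contains spaces.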
The example uses the Sidhu dataset, the same dataset used in the orthogonal-polynomial tutorial. Recall that the input for orthogonal-polynomial was:
ortho_seq orthogonal-polynomial ortho_seq_code/Sidhu/Sidhu.xlsx --molecule protein --poly_order first --out_dir docs/source/tutorial_outputs --alphbt_input SYG,R --min_pct 40 --pheno_name ELISA
The regression file used for rf1d-viz will therefore be called
Sidhu_regressions.npz
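The naming pattern {trait_file_name}_regressions.npz can be expressed as a small helper. This is a sketch under two assumptions inferred from this example rather than confirmed by ortho_seqs: that trait_file_name is the input file name without its extension, and that the file is written into the --out_dir given to orthogonal-polynomial. The function name regressions_path is hypothetical.

```python
from pathlib import Path


def regressions_path(trait_file: str, out_dir: str) -> Path:
    """Assumed location: <out_dir>/<trait file stem>_regressions.npz."""
    stem = Path(trait_file).stem  # "Sidhu.xlsx" -> "Sidhu"
    return Path(out_dir) / f"{stem}_regressions.npz"


if __name__ == "__main__":
    path = regressions_path(
        "ortho_seq_code/Sidhu/Sidhu.xlsx",
        "docs/source/tutorial_outputs",
    )
    print(path.as_posix())
```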
Using the CLI output, we obtain
rf1d form of alphabet input:
SYG,R,z,n
which reveals that the rf1d form of the alphabet input is SYG,R,z,n.
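The rf1d form is a comma-separated list of groups. As a quick sanity check, the string can be split in Python; the summary output shown later in this tutorial lists the same four entries and reports four dimensions, although treating the number of groups as the number of dimensions is an inference from this example. The function name split_alphabet_groups is hypothetical.

```python
from typing import List


def split_alphabet_groups(alphbt_input: str) -> List[str]:
    """Split a comma-separated alphabet input into its groups."""
    return [group for group in alphbt_input.split(",") if group]


if __name__ == "__main__":
    groups = split_alphabet_groups("SYG,R,z,n")
    print(groups)       # ['SYG', 'R', 'z', 'n']
    print(len(groups))  # 4
```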
With these in mind, the CLI input for rf1d-viz for a barplot will be
ortho_seq rf1d-viz docs/source/tutorial_outputs/Sidhu_regressions.npz --alphbt_input SYG,R,z,n --molecule protein --phenotype ELISA --out_dir docs/source/tutorial_outputs --action barplot
This command will reproduce the graph that orthogonal-polynomial generates automatically, which looks like
Notice how the y axis is labelled with the phenotype name specified.
The CLI input for rf1d-viz for a density plot will be
ortho_seq rf1d-viz docs/source/tutorial_outputs/Sidhu_regressions.npz --alphbt_input SYG,R,z,n --molecule protein --phenotype ELISA --out_dir docs/source/tutorial_outputs --action density
The graph looks like
Run summary with
ortho_seq rf1d-viz docs/source/tutorial_outputs/Sidhu_regressions.npz --alphbt_input SYG,R,z,n --molecule protein --phenotype ELISA --out_dir docs/source/tutorial_outputs --action summary
The output will be
rf1d Object:
Number of sites: 19
Number of dimensions: 4
Alphabet input: ['SYG', 'R', 'z', 'n']
Molecule: protein
Phenotype represents ELISA values
Image output directory: docs/source/tutorial_outputs
Highest rFon1D magnitudes:
-1.3014 Site: 0 Key: SYG
1.3014 Site: 0 Key: R
1.1394 Site: 8 Key: R
1.1394 Site: 10 Key: R
1.1229 Site: 9 Key: z
1.1229 Site: 12 Key: z
1.1229 Site: 8 Key: z
1.1229 Site: 13 Key: R
1.0344 Site: 16 Key: z
-0.9606 Site: 10 Key: z
The CLI input for rf1d-viz for a heatmap will be
ortho_seq rf1d-viz docs/source/tutorial_outputs/Sidhu_regressions.npz --alphbt_input SYG,R,z,n --molecule protein --phenotype ELISA --out_dir docs/source/tutorial_outputs --action heatmap
The graph looks like
The CLI input for rf1d-viz for a boxplot will be
ortho_seq rf1d-viz docs/source/tutorial_outputs/Sidhu_regressions.npz --alphbt_input SYG,R,z,n --molecule protein --phenotype ELISA --out_dir docs/source/tutorial_outputs --action boxplot
The graph looks like
Lastly, this is the input for sort:
ortho_seq rf1d-viz docs/source/tutorial_outputs/Sidhu_regressions.npz --alphbt_input SYG,R,z,n --molecule protein --phenotype ELISA --out_dir docs/source/tutorial_outputs --action sort
The output will be
-1.3014 Site: 0 Key: SYG
1.3014 Site: 0 Key: R
1.1394 Site: 8 Key: R
1.1394 Site: 10 Key: R
1.1229 Site: 9 Key: z
1.1229 Site: 12 Key: z
1.1229 Site: 8 Key: z
1.1229 Site: 13 Key: R
1.0344 Site: 16 Key: z
-0.9606 Site: 10 Key: z
As you can see, this prints out the second half of the summary output, since summary calls sort.
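For further analysis, the sort output can be parsed into structured records. The sketch below assumes each line has the form shown above (a value, then "Site:", then "Key:", separated by whitespace); the exact spacing in sort.txt may differ, so the pattern tolerates any amount of whitespace. The names SortEntry and parse_sort_output are hypothetical and not part of ortho_seqs.

```python
import re
from typing import List, NamedTuple


class SortEntry(NamedTuple):
    value: float  # rFon1D value
    site: int     # site index as printed
    key: str      # alphabet group, e.g. "SYG"


# Matches lines such as "-1.3014 Site: 0 Key: SYG".
LINE_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+Site:\s*(\d+)\s+Key:\s*(\S+)\s*$")


def parse_sort_output(text: str) -> List[SortEntry]:
    """Parse sort output lines; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            value, site, key = match.groups()
            entries.append(SortEntry(float(value), int(site), key))
    return entries


if __name__ == "__main__":
    sample = "-1.3014 Site: 0 Key: SYG\n1.3014 Site: 0 Key: R\n"
    for entry in parse_sort_output(sample):
        print(entry)
```

Since non-matching lines are skipped, the same function can be applied to the summary output to recover only its sorted values.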