MVTest - GWAS Analysis
**********************
* Installation
* Install with PIP
* Manual Installation
* System Requirements
* Running Unit Tests
* Virtual Env
* Miniconda
* What is MVtest?
* Documentation
* Command-Line Arguments
* mvmany Helper script
* The Default Template
* Command Line Arguments
* Development Notes
* MVtest authors
* Changelog
Installation
************
libGWAS requires python 3.7.x as well as the following libraries:
* NumPy (version 1.16.2 or later) www.numpy.org
* SciPY (version 1.3.0 or later) www.scipy.org
* pytabix (version 0.1 or later) https://pypi.org/project/pytabix/
* bgen-reader (version 3.0.6 or later) https://pypi.org/project/bgen-reader/
libGWAS’s installation will attempt to install these required
components for you; however, it requires that you have write
permission to the installation directory. If you are using a shared
system and lack the privileges needed to install libraries and
software yourself, please see the Miniconda or Virtual Env sections
below for instructions on setting up your own Python environment,
which will exist entirely under your own control.
Installation
===================
To install libGWAS, simply clone the sources using the following command:
$ *git clone https://github.com/edwards-lab/libGWAS*
Or you may visit the website and download the tarball directly from
github: https://github.com/edwards-lab/libGWAS
Once you have downloaded the software, simply extract the contents and
run the following command to install it:
$ *pip install .*
If no errors are reported, it should be installed and ready to use.
**Regarding Python 2** libGWAS has switched entirely to Python 3,
with no attempt to remain compatible with Python 2, because
bgen-reader no longer supports Python 2 and its end of life is only a
few months away as of this writing.
As such, if you wish to use Python 2, you will need to install an
older version of libGWAS.
System Requirements
===================
Aside from the library dependencies, libGWAS’s requirements depend
largely on the number of SNPs and individuals being analyzed as well
as the data format being used. In general, GWAS sized datasets will
require several gigabytes of memory when using the traditional
pedigree format, however, even 10s of thousands of subjects can be
analyzed with less than 1 gigabyte of RAM when the data is formatted
as transposed pedigree or PLINK’s default bed format.
Otherwise, it is recommended that the system be run on a Unix-like
system such as Linux or OS X, but it should work under Windows as
well (though we can’t offer support for running libGWAS under
Windows).
Running Unit Tests
==================
libGWAS comes with a unit test suite which can be run prior to
installation. To run the tests, simply run the following command from
within the root directory of the extracted archive’s contents:
$ *pytest*
If no errors are reported, then MVtest should run correctly on your
system.
For example:
```
$ pytest
================================================================ test session starts =================================================================
platform linux -- Python 3.9.12, pytest-7.4.0, pluggy-1.2.0
rootdir: /mnt/d/common/dev/mvtest/ACCRE/mvtest-sim/libGWAS
collected 185 items
libgwas/tests/bed_parser_test.py ............... [ 8%]
libgwas/tests/test_bgen_parser.py ... [ 9%]
libgwas/tests/test_boundary.py ............... [ 17%]
libgwas/tests/test_impute_parser.py ...................... [ 29%]
libgwas/tests/test_libbasics.py .. [ 30%]
libgwas/tests/test_locus.py .... [ 32%]
libgwas/tests/test_mach_parser.py ............... [ 41%]
libgwas/tests/test_pedigree_parser.py ................................... [ 60%]
libgwas/tests/test_phenocovar.py ................................. [ 77%]
libgwas/tests/test_transped_parser.py ........................... [ 92%]
libgwas/tests/test_vcf_parser.py .............. [100%]
================================================================ 185 passed in 12.58s ================================================================
```
Virtual Env
===========
Virtual Env is a powerful tool for Python programmers and end users
alike, as it allows users to deploy different versions of Python
applications without root access to the machine.
Because libGWAS requires Python 3.7, you’ll need to ensure that your
machine’s Python version is in compliance. Virtual Env basically uses
the system version of Python, but creates a user-owned environment
wrapper, allowing users to install libraries easily without
administrative rights to the machine.
For a helpful introduction to VirtualEnv, please have a look at the
tutorial: http://www.simononsoftware.com/virtualenv-tutorial/
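As a minimal sketch (directory names are illustrative), setting up a
user-owned environment for libGWAS might look like:

```shell
# Create a user-owned Python environment; no administrative
# rights are required. The directory name is just an example.
python3 -m venv "$HOME/libgwas-env"

# Activate it for the current shell session.
. "$HOME/libgwas-env/bin/activate"

# From inside the extracted libGWAS source directory, install
# into the environment as usual:
#   pip install .
```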
Miniconda
=========
Miniconda is a minimal version of the package manager used by the
Anaconda python distribution. It makes it easy to create local
installations of python with the latest versions of the common
scientific libraries for users who don’t have root access to their
target machines. Basically, when you use miniconda, you’ll be
installing your own version of Python into a directory under your
control which allows you to install anything else you need without
having to submit a helpdesk ticket for administrative assistance.
Unlike pip, the folks behind the conda distribution provide binary
downloads of its selected library components. As such, only the most
popular libraries, such as pip, NumPy, and SciPy, are supported by
conda itself. However, these do not require compilation and may be
easier to install than with pip alone. I have experienced difficulty
installing SciPy through pip and setuptools on our cluster here at
Vanderbilt due to non-standard paths for certain required components,
but Miniconda always comes through.
First, download and install the appropriate version of Miniconda from
the project website. Please be sure to choose the Python 3 version:
http://conda.pydata.org/miniconda.html
During installation, please allow it to update your PATH information.
If you prefer not to always use this version of Python in the future,
simply tell it not to update your .bashrc file and note the
instructions for loading and unloading your new Python environment.
Please note that even if you chose to update your .bashrc file, you
will need to follow the directions for loading the changes into your
current shell.
Once those changes have taken effect, install pip and SciPy:
$ *conda install pip scipy*
Installing SciPy will also force the installation of NumPy, which is
also required for running MVtest.
Once that has been completed successfully, you should be ready to
follow the standard instructions for installing mvtest.
What is MVtest?
***************
*TODO: Write some background information about the application and
it’s scientific basis.*
Documentation
=============
Documentation for MVtest is still under construction. However, the
application provides reasonable inline help using standard unix help
arguments:
> *mvtest.py -h*
or
> *mvtest.py --help*
In general, overlapping functionality should mimic that of PLINK.
Command-Line Arguments
======================
Command line arguments used by MVtest often mimic those used by
PLINK, except where there is no matching functionality (or the
functionality differs significantly).
For the parameters listed below, when a parameter requires a value,
the value must follow the argument with a single space separating the
two (no ‘=’ signs). For flags with no specified value, passing the
flag indicates that the condition is to be “activated”.
When there is no value listed in the “Type” column, the arguments are
*off* by default and *on* when the argument is present (i.e. by
default, compression is turned off except when the flag,
--compressed, has been provided).
Getting help
------------
-h, --help
Show this help message and exit.
-v
Print version number
Input Data
----------
MVtest attempts to mimic the interface for PLINK where appropriate.
All input files should be whitespace delimited. For text based allelic
annotations, 1|2 and A|C|G|T annotation is sufficient. All data must
be expressed as alleles, not as genotypes (except for IMPUTE output,
which is a specialized format that is very different from the other
forms).
For Pedigree, Transposed Pedigree, and PLINK binary pedigree files,
using the PREFIX arguments is sufficient and recommended if your
files follow the standard naming conventions.
Pedigree Data
~~~~~~~~~~~~~
Pedigree data is fully supported, however it is not recommended. When
loading pedigree data, MVtest must load the entire dataset into memory
prior to analysis, which can result in a substantial amount of memory
overhead that is unnecessary.
Flags like --no-pheno and --no-sex can be used in any combination,
supporting pedigree files with highly flexible column structures.
--file <prefix>
(filename prefix) Prefix for .ped and .map files
--ped <filename>
PLINK compatible .ped file
--map <filename>
PLINK compatible .map file
--map3
Map file has only 3 columns
--no-sex
Pedigree file doesn’t have column 5 (sex)
--no-parents
Pedigree file doesn’t have columns 3 and 4 (parents)
--no-fid
Pedigree file doesn’t have column 1 (family ID)
--no-pheno
Pedigree file doesn’t have column 6 (phenotype)
--liability
Pedigree file has column 7 (liability)
PLINK Binary Pedigree
~~~~~~~~~~~~~~~~~~~~~
This format represents the most efficient storage for large GWAS
datasets, and can be used directly by MVtest. In addition to minimal
memory overhead, PLINK-style .bed files will also run very quickly,
due to the efficient disk layout.
--bfile <prefix>
(filename prefix) <prefix> for .bed, .bim and .fam files
--bed <filename>
Binary Ped file(.bed)
--bim <filename>
Binary Ped marker file (.bim)
--fam <filename>
Binary Ped family file (.fam)
Transposed Pedigree Data
~~~~~~~~~~~~~~~~~~~~~~~~
Transposed Pedigree data is similar to standard pedigree data except
that it is organized with SNPs as rows instead of individuals. This
allows MVtest to run its analysis without loading the entire dataset
into memory.
--tfile <prefix>
Prefix for .tped and .tfam files
--tped <filename>
Transposed Pedigree file (.tped)
--tfam <filename>
Transposed Pedigree Family file (.tfam)
Pedigree/Transposed Pedigree Common Flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, Pedigree and Transposed Pedigree data is assumed to be
uncompressed. However, MVtest can directly use gzipped data files if
they have the extension .tgz with the addition of the --compressed
argument.
--compressed
Indicate that ped/tped files have been compressed with gzip and are
named with extensions such as .ped.tgz and .tped.tgz
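For example (the file name and contents are hypothetical), an
existing transposed pedigree can be compressed into the expected
naming scheme with gzip:

```shell
# Compress a transposed pedigree with gzip, using the .tped.tgz
# naming convention that the --compressed flag expects.
# "study" is an illustrative file prefix.
printf 'rs1 1 0 10500 A A A C\n' > study.tped
gzip -c study.tped > study.tped.tgz
```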
IMPUTE output
~~~~~~~~~~~~~
MVtest doesn’t call genotypes when performing analysis, and it allows
users to define which model to use when analyzing the data. Because
there is no specific location for the chromosome within the input
files, MVtest requires that users provide the chromosome, the impute
input file, and the corresponding .info file for each imputed output.
Due to the huge number of expected loci, MVtest allows users to
specify an offset and file count for analysis. This is to allow users
to run multiple jobs simultaneously on a cluster and work individually
on separate impute region files. Users can segment those regions even
further using standard MVtest region selection as well.
By default, all imputed data is assumed to be compressed using gzip.
Default naming convention is for impute data files to end in .gen.gz
and the info files to have the same name except for the end being
replaced by .info.
--impute <filename>
File containing list of impute output for analysis
--impute-fam <filename>
File containing family details for impute data
--impute-offset <integer>
Impute file index (1 based) to begin analysis
--impute-count <integer>
Number of impute files to process (for this node). Defaults to all
remaining.
--impute-uncompressed
Indicate that the impute input is not gzipped, but plain text
--impute-encoding
(additive,dominant or recessive)
Genetic model to be used when analyzing imputed data.
--impute-info-ext <extension>
Portion of the filename that denotes the info filename
--impute-gen-ext <extension>
Portion of the filename that denotes the gen filename
--impute-info-thresh <float>
Threshold for filtering imputed SNPs with poor ‘info’ values
IMPUTE File Input
~~~~~~~~~~~~~~~~~
When performing an analysis on IMPUTE output, users must provide a
single file which lists each of the gen files to be analyzed. This
plain text file contains 2 (or optionally 3) columns for each gen
file:
+------------------+----------------+----------------------------------+
| **Chromosome**   | **Gen File**   | **.info <filename> (optional)**  |
+==================+================+==================================+
| N                | <filename>     | <filename>                       |
+------------------+----------------+----------------------------------+
| …                | …              | …                                |
+------------------+----------------+----------------------------------+
The 3rd column is only required if your .info files and .gen files are
not the same except for the <extension>.
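For illustration, a three-column IMPUTE file list might look like
this (all file names are hypothetical):

```
22 chr22.part1.gen.gz chr22.part1.impute.info
22 chr22.part2.gen.gz chr22.part2.impute.info
```

Each line names one gen file; the chromosome in column 1 is required
because the gen files themselves don’t record it.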
MACH output
~~~~~~~~~~~
Users can analyze data imputed with MACH. Because most situations
require many files, the format is a single file which contains either
pairs of dosage/info files, or, if the two files share the same
filename except for extensions, one dosage file per line.
Important: MACH doesn’t provide anywhere to store chromosomes and
positions. Users may wish to embed this information into the first
column inside the .info file. Doing so will allow MVtest to recognize
those values and populate the corresponding fields in the report.
To use this feature, users must use the --mach-chrpos flag, and the
ID column inside the .info file must be formatted in the following
way: chr:pos (optionally :rsid). When the --mach-chrpos flag is used,
MVtest will fail when it encounters IDs that aren’t in this format;
there must be at least 2 ‘fields’ (i.e. there must be at least one
“:” character). When processing MACH imputed data without this
special encoding of IDs, MVtest will be unable to recognize
positions. As a result, unless the --mach-chrpos flag is present,
MVtest will exit with an error if the user attempts to use positional
filters such as --from-bp, --chr, etc.
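For illustration, with --mach-chrpos the ID values in the first
column of the .info file would look like this (positions and rsids
are hypothetical):

```
1:10583:rs58108140
1:10611
22:16050408:rs149201999
```

The rsid field is optional, but every ID must contain at least one
“:” separating chromosome and position.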
When running MVtest using MACH dosage on a cluster, users can
instruct a given job to analyze data from a portion of the files
contained within the MACH dosage file list by changing the
--mach-offset and --mach-count arguments. By default, the offset
starts with 1 (the first file in the dosage list) and runs all it
finds. However, to split the jobs up to analyze three dosage files
per job, one might set those values to --mach-offset 1 --mach-count 3
or --mach-offset 4 --mach-count 3, depending on which job is being
defined.
In order to minimize memory requirements, MACH dosage files can be
loaded incrementally such that only N loci are stored in memory at a
time. This can be controlled using the --mach-chunk-size argument.
The larger this number is, the faster MVtest will run (fewer reads
from file), but the more memory is required.
--mach <filename>
File containing list of dosages, one per line. Optionally, lines
may contain the info names as well (separated by whitespace) if the
two <filename>s do not share a common base name.
--mach-offset <integer>
Index into the MACH file to begin analyzing
--mach-count <integer>
Number of dosage files to analyze
--mach-uncompressed
By default, MACH input is expected to be gzip compressed. If data
is plain text, add this flag. *It should be noted that dosage and
info files should be either both compressed or both uncompressed.*
--mach-chunk-size <integer>
Due to the individual orientation of the data, large dosage files
are parsed in chunks in order to minimize excessive memory during
loading
--mach-info-ext <extension>
Indicate the <extension> used by the mach info files
--mach-dose-ext <extension>
Indicate the <extension> used by the mach dosage files
--mach-min-rsquared <float>
Indicate the minimum threshold for the r-squared value from the
.info files required for analysis.
--mach-chrpos
When set, MVtest expects IDs from the .info file to be in the
format chr:pos:rsid (rsid is optional). This will allow the report
to contain positional details, otherwise, only the RSID column will
have a value which will be the contents of the first column from
the .info file
MACH File Input
~~~~~~~~~~~~~~~
When running an analysis on MACH output, users must provide a single
file which lists each dosage file and (optionally) the matching .info
file. This file is a simple text file with either 1 column (the
dosage filename) or 2 columns (the dosage filename followed by the
info filename, separated by whitespace).
The 2nd column is only required if the filenames aren’t identical
except for the extension.
+--------------------------------+----------------------------------------+
| **Col 1 (dosage <filename>)**  | **Col 2 (optional info <filename>)**   |
+================================+========================================+
| <filename>.dose                | <filename>.info                        |
+--------------------------------+----------------------------------------+
| …                              | …                                      |
+--------------------------------+----------------------------------------+
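For illustration, a MACH file list mixing both forms might look like
this (all file names are hypothetical):

```
chr1.chunk1.dose.gz chr1.chunk1.extra.info.gz
chr1.chunk2.dose.gz
```

The second line omits the info filename, which only works when the
dosage and info files share the same base name and differ only by
extension.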
Phenotype/Covariate Data
~~~~~~~~~~~~~~~~~~~~~~~~
Phenotypes and Covariate data can be found inside either the standard
pedigree headers or within special PLINK style covariate files. Users
can specify phenotypes and covariates using either header names (if a
header exists in the file) or by 1 based column indices. An index of 1
actually means the first variable column, not the first column. In
general, this will be the 3rd column, since columns 1 and 2 reference
FID and IID.
--pheno <filename>
File containing phenotypes. Unless --all-pheno is present, the user
must provide either the index(es) or label(s) of the phenotypes to
be analyzed.
--mphenos LIST
Column number(s) for phenotype to be analyzed if number of columns
> 1. Comma separated list if more than one is to be used.
--pheno-names LIST
Name for phenotype(s) to be analyzed (must be in the --pheno file).
Comma separated list if more than one is to be used.
--covar <filename>
File containing covariates
--covar-numbers LIST
Comma-separated list of covariate indices
--covar-names LIST
Comma-separated list of covariate names
--sex
Use sex from the pedigree file as a covariate
--missing-phenotype CHAR
Encoding for missing phenotypes as can be found in the data.
--all-pheno
When present, MVtest will run an analysis for each phenotype found
inside the phenotype file.
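For illustration, a PLINK-style phenotype file with a header might
look like this (column names and values are hypothetical):

```
FID  IID  BMI   LDL
fam1 ind1 24.7  102.3
fam2 ind1 31.2  145.0
```

Such a file could then be supplied with --pheno and a phenotype
selected by name via --pheno-names BMI, or by index via --mphenos 1
(index 1 being the first variable column after FID and IID).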
Restricting regions for analysis
--------------------------------
When specifying a range of positions for analysis, a chromosome must
be present. If a chromosome is specified but is not accompanied by a
range, the entire chromosome will be used. Only one range can be
specified per run.
In general, when specifying region limits, --chr must be defined
unless using generic MACH input (which doesn’t define a chromosome
number nor position, in which case positional restrictions do not
apply).
--snps LIST
Comma-delimited list of SNP(s): rs1,rs2,rs3-rs6
--chr <integer>
Select Chromosome. If not selected, all chromosomes are to be
analyzed.
--from-bp <integer>
SNP range start
--to-bp <integer>
SNP range end
--from-kb <integer>
SNP range start
--to-kb <integer>
SNP range end
--from-mb <integer>
SNP range start
--to-mb <integer>
SNP range end
--exclude LIST
Comma-delimited list of rsids to be excluded
--remove LIST
Comma-delimited list of individuals to be removed from analysis.
This must be in the form of family_id:individual_id
--maf <float>
Minimum MAF allowed for analysis
--max-maf <float>
MAX MAF allowed for analysis
--geno <integer>
MAX per-SNP missing for analysis
--mind <integer>
MAX per-person missing
--verbose
Output additional data details in final report
mvmany Helper script
********************
In addition to the analysis program, mvtest.py, a helper script,
mvmany.py is also included and can be used to split large jobs into
smaller ones suitable for running on a compute cluster. Users simply
run mvmany.py just like they would run mvtest.py but with a few
additional parameters, and mvmany.py will build multiple job scripts
to run the jobs on multiple nodes. It records most arguments passed to
it and will write them to the scripts that are produced.
It is important to note that mvmany.py simply generates cluster
scripts and does not submit them.
The Default Template
====================
When mvmany.py is first run, it will generate a copy of the default
template inside the user’s home directory named .mvmany.template.
This template is used to define the job details that will be written
to each of the job scripts. By default, the template is configured
for the SLURM cluster software, but it can easily be changed to work
with any cluster software that works similarly to the SLURM job
manager, such as TORQUE/PBS or SGE.
In addition to replacing the preprocessor definitions to work with
different cluster manager software, the user can also add
user-specific definitions, such as email notifications or account
specification, giving the user the options necessary to run the
software under many different system configurations.
Example Template (SLURM)
------------------------
An example template might look like the following:
```
#!/bin/bash
#SBATCH --job-name=$jobname
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=$memory
#SBATCH --time=$walltime
#SBATCH --error $logpath/$jobname.e
#SBATCH --output $respath/$jobname.txt
cd $pwd
$body
```
It is important to note that this block of text contains a mix of
SLURM preprocessor settings (such as #SBATCH --job-name) as well as
variables which will be replaced with appropriate values (such as
$jobname being replaced with a string of text which is unique to that
particular job). Each cluster type has its own syntax for setting the
necessary variables, and it is assumed that the user will know how to
correctly edit the default template to suit their needs.
Example TORQUE Template
-----------------------
For instance, to use these scripts on a TORQUE based cluster, one
might update ~/.mvmany.template to the following:
```
#!/bin/bash
#PBS -N $jobname
#PBS -l nodes=1
#PBS -l ppn=1
#PBS -l mem=$memory
#PBS -l walltime=$walltime
#PBS -e $logpath/$jobname.e
#PBS -o $respath/$jobname.txt
cd $pwd
$body
```
Please note that not all SLURM settings have a direct mapping to PBS
settings and that it is up to the user to understand how to properly
configure their cluster job headers.
In general, the user should ensure that each of the variables are
properly defined so that the corresponding values will be written to
the final job scripts. The following variables are replaced based on
the job that is being performed and the parameters passed to the
program by the user (or their default values):
+-----------------------------------+-----------------------------------------------+
| **Variable**                      | **Purpose**                                   |
+===================================+===============================================+
| $jobname                          | Unique name for the current job               |
+-----------------------------------+-----------------------------------------------+
| $memory (2G)                      | Amount of memory to provide each job          |
+-----------------------------------+-----------------------------------------------+
| $walltime (3:00:00)               | Amount of time to be assigned to each job     |
+-----------------------------------+-----------------------------------------------+
| $logpath                          | Directory specified for writing logs          |
+-----------------------------------+-----------------------------------------------+
| $respath                          | Directory specified for writing results       |
+-----------------------------------+-----------------------------------------------+
| $pwd                              | Current working directory when mvmany is run  |
+-----------------------------------+-----------------------------------------------+
| $body                             | Statements of execution                       |
+-----------------------------------+-----------------------------------------------+
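mvmany.py’s actual substitution code isn’t shown here, but the
replacement of these variables can be illustrated with Python’s
string.Template, which uses the same $name syntax; the per-job values
below are hypothetical:

```python
from string import Template

# A fragment of a job-script template, using the same $variables
# listed in the table above.
template = Template(
    "#SBATCH --job-name=$jobname\n"
    "#SBATCH --mem=$memory\n"
    "#SBATCH --time=$walltime\n"
)

# Hypothetical per-job values; mvmany.py derives these from its
# command-line arguments or their defaults.
script = template.substitute(jobname="mvtest-chr1", memory="2G",
                             walltime="3:00:00")
print(script)
```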
Command Line Arguments
======================
mvmany.py exposes the following additional arguments for use when
running the script.
--mv-path PATH
Set path to mvtest.py if it’s not in PATH
--logpath PATH
Path to location of job’s error output
--res-path PATH
Path to location of job's results
--script-path PATH
Path for writing script files
--template FILENAME
Specify a template other than the default
--snps-per-job INTEGER
Specify the number of SNPs to be run at one time
--mem STRING
Specify the amount of memory to be requested for each job
--wall-time
Specify amount of time to be requested for each job
The option --mem is dependent on the type of input that is being used
as well as the configurable options in effect. The user should
perform basic test runs to determine proper settings for their jobs.
By default, 2G is used, which is generally more than adequate for
binary pedigrees, IMPUTE, and transposed pedigrees. Others will vary
greatly based on the size of the dataset and the settings being used.
The option --wall-time is largely machine dependent but will vary
based on the actual dataset’s size and the completeness of the data.
Users should perform spot tests to determine reasonable values. By
default, the requested wall-time is 3 days, which is sufficient for a
GWAS dataset but probably not sufficient for an entire whole-exome
dataset; the time required will depend on just how many SNPs are
being analyzed by any given node.
In general, mvmany.py accepts all arguments that mvtest.py accepts,
with the exception of those that are more appropriately defined by
mvmany.py itself. These include the following arguments:
--chr
--snps
--from-bp
--to-bp
--from-kb
--to-kb
--from-mb
--to-mb
To see a comprehensive list of the arguments that mvmany.py can use,
simply ask the program itself:
mvmany.py --help
Users can have mvmany split certain types of jobs up into pieces and
can specify how many independent commands to be run per job. At this
time, mvmany.py assumes that imputation data is already split into
fragments and doesn’t support running parts of a single file on
multiple nodes.
The results generated can be manually merged once all nodes have
completed execution.
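The merge itself can be as simple as concatenating the per-node
result files while keeping only the first file’s header row; the
result files and their layout below are hypothetical:

```shell
# Create two illustrative per-node result fragments, each
# repeating the same single header line.
printf 'SNP\tP\nrs1\t0.01\n' > job1.txt
printf 'SNP\tP\nrs2\t0.05\n' > job2.txt

# Keep the header once, then append only the data rows.
head -n 1 job1.txt > combined.txt
for f in job1.txt job2.txt; do
    tail -n +2 "$f" >> combined.txt
done
```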
Changelog
=========
libGWAS.py: 1.0.0 released
* Migrated library out from MVtest in preparation for release of new analysis program
libGWAS.py: 1.1.0
* Added support for bgen and vcf file formats