Check sex of samples using BAM files, k-means clustering, and the number of reads in the sex chromosomes of samples.
--samplesDir
Path to a directory containing BAM files of all samples or subdirectories of the BAM files.--cores
Number of cores to use for multiprocessing. Default is 1.--sampleInfo
Path to a sample info .csv file. Must containName
andSex
columns for each sample.
- Calculate sex chromosome ratios for each sample.
- For k-means clustering,
k = 2
to represent the male and female sexes. All sex chromosome read ratios are divided into two clusters.
- If the value of one cluster center is greater than two the value of the second cluster center, then both male and female samples exist. The sex of samples will be predicted as
M
orF
. - Otherwise, the sex prediction will be recorded as
all M or all F
.
If there is a mismatch between the predicted sex of a sample and its recorded sex in the sample info file, the Mismatch
column in the output file will record Mismatch
. Otherwise, if the predicted sex is the same as the sample info sex, a single period .
will be recorded in the Mismatch
column.
sex_checker_[current date and time].print
Stores all print statements.sex_checker_output_[current date and time].txt
Output of sex checker in tab-delimited format.
Sample_name | ChrX_reads | ChrY_reads | ChrY:ChrX_ratio | ChrY:ChrX_percent | Predicted_sex | Sample_info_sex | Mismatch |
---|---|---|---|---|---|---|---|
Sample_1 | 4890507 | 20417 | 0.004174823 | 0.41748 | F | F | . |
Sample_2 | 5550573 | 24946 | 0.004494311 | 0.44943 | F | M | Mismatch |
Sample_3 | 2990996 | 356739 | 0.119270972 | 11.9271 | M | M | . |