
Markers in 'RegionFile' are not in 'GenoFile'. The chunk is empty (Set Based Tests; Step 2) #85

Open
surakshavinod opened this issue Mar 27, 2023 · 1 comment


@surakshavinod

Hi, when I run Step 2 for set-based tests, an empty output file is generated.

The following is printed to the log repeatedly as each chunk is analyzed, so the output ends up as a file containing only the header (I've masked the actual numbers of markers, but they are in the thousands):

Start extracting marker-level information from 'groupFile' of ./groupfile/test19.txt ....
indexChunk is  0
indexChunk  0
nRegions  1355
x markers in 'RegionFile' are not in 'GenoFile'.
Read in  100  region(s) from the group file.
The chunk is empty
x1 markers in 'RegionFile' are not in 'GenoFile'.
Read in  100  region(s) from the group file.
The chunk is empty
.
.
.
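The log line says that some marker IDs listed in the group file have no exact match among the variant IDs of the genotype file. Below is a minimal Python sketch for listing those markers per region. It assumes PLINK input (variant IDs in column 2 of the `.bim` file) and a group file in which each region's markers sit on a line of the form `REGION var ID1 ID2 ...`. The function names and the script name are hypothetical.

```python
import sys
from typing import Dict, Iterable, List, Set


def bim_ids(lines: Iterable[str]) -> Set[str]:
    """Variant IDs from PLINK .bim lines (column 2 of 6)."""
    return {f[1] for f in (line.split() for line in lines) if len(f) >= 2}


def group_markers(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map region -> marker IDs from group-file lines.

    Assumption: markers are on lines 'REGION var ID1 ID2 ...';
    other lines (e.g. 'REGION anno ...') are ignored.
    """
    regions: Dict[str, List[str]] = {}
    for line in lines:
        f = line.split()
        if len(f) >= 3 and f[1] == "var":
            regions.setdefault(f[0], []).extend(f[2:])
    return regions


def missing_markers(group_lines: Iterable[str],
                    bim_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Regions whose markers are not found verbatim among the .bim IDs."""
    known = bim_ids(bim_lines)
    report: Dict[str, List[str]] = {}
    for region, markers in group_markers(group_lines).items():
        absent = [m for m in markers if m not in known]
        if absent:
            report[region] = absent
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python check_ids.py groupfile.txt geno.bim
    with open(sys.argv[1]) as g, open(sys.argv[2]) as b:
        for region, absent in missing_markers(g, b).items():
            print(f"{region}\t{len(absent)} missing, e.g. {absent[0]}")
```

The comparison is an exact string match, because that is the kind of mismatch the message describes. Causes can include rsIDs in the `.bim` versus chr:pos:ref:alt in the group file, a `chr` prefix on only one side, or REF and ALT in a different order.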

Log file:

Loading required package: RhpcBLASctl
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /gpfs/data/user/suraksha/.conda/envs/RSAIGE/lib/libopenblasp-r0.3.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.14.2 optparse_1.7.1 RhpcBLASctl_0.21-247.1
[4] SAIGE_1.1.6.3

loaded via a namespace (and not attached):
[1] compiler_4.1.3 Matrix_1.4-1 Rcpp_1.0.7 getopt_1.20.3
[5] grid_4.1.3 RcppParallel_5.1.5 lattice_0.20-45

The group file is in the right format (chr:pos:ref:alt), so I'm unsure where I'm going wrong. It's probably worth mentioning that for Step 2 I'm using the same PLINK files that I used as input for Step 1. Is this where I'm making a mistake?

Thanks,
Suraksha

@evatosco

evatosco commented Aug 29, 2023

Hi,

I encountered the same error message and eventually found that not all of the file paths I passed as input to Step 2 were correct. The error message is not very informative about this.
In my case, the command was pointing to an old Step 1 output file from early trials.
You may want to check your file paths too, especially if you have several copies of the same file in different directories.
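One way to follow this advice is to resolve every input path before launching Step 2 and print where it really points and when it was last modified, which makes a stale copy from an earlier trial easy to spot. This is a minimal sketch. The labels are free-form placeholders, not SAIGE's actual argument names, and `check_inputs` is a hypothetical helper.

```python
import os
import sys
import time
from typing import Dict, List


def check_inputs(paths: Dict[str, str]) -> List[str]:
    """Print each input's resolved path and last-modified time.

    Returns the labels whose path does not point to an existing file.
    """
    missing: List[str] = []
    for label, path in paths.items():
        # realpath also resolves symlinks, so you see the copy actually used
        full = os.path.realpath(os.path.expanduser(path))
        if os.path.isfile(full):
            stamp = time.ctime(os.path.getmtime(full))
            print(f"{label}: {full} (last modified {stamp})")
        else:
            print(f"{label}: MISSING {full}")
            missing.append(label)
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_inputs.py LABEL=PATH [LABEL=PATH ...]
    # e.g. group=./groupfile/test19.txt (labels are placeholders)
    pairs: Dict[str, str] = {}
    for arg in sys.argv[1:]:
        label, _, path = arg.partition("=")
        pairs[label] = path
    sys.exit(1 if check_inputs(pairs) else 0)
```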

Hope it helps!

--

Update: I later found a crucial detail. The variant names in the group file must exactly match the variant IDs in the PLINK files (which are rsIDs by default). I used this command:

  # Rewrite every variant ID as chr:pos:ref:alt so it matches the group file
  plink2 \
    --bfile input_pheno_sex_covars \
    --make-bed --allow-no-sex \
    --set-all-var-ids @:#:\$r:\$a \
    --out ${input_dir}/${input_vcf}_pheno_sex_covars_recodID

This uses plink2 to rewrite the ID field of every variant in the PLINK files (in my experience, PLINK 1.9 does not work for this). The group file I created already used the same ID format (chr:pos:ref:alt, with fields separated by colons). It worked for me.
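To make the renaming concrete, here is a sketch of the same transformation applied directly to a `.bim` file. It works because a PLINK 1 fileset stores variant IDs only in the `.bim`, while the `.bed` holds genotypes in variant order. The sketch assumes that, as with plink2 reading `--bfile` input, allele 2 (column 6) is treated as REF and allele 1 (column 5) as ALT. That allele convention is our assumption, the function names are hypothetical, and the plink2 command above remains the route that was reported to work.

```python
import sys
from typing import Iterable, Iterator


def recode_bim_line(line: str) -> str:
    """Rewrite one .bim line so its ID becomes CHR:POS:REF:ALT.

    Assumption: column 6 (allele 2) is REF and column 5 (allele 1) is ALT,
    mirroring the '@:#:$r:$a' template.
    """
    chrom, _old_id, cm, pos, a1, a2 = line.split()
    new_id = f"{chrom}:{pos}:{a2}:{a1}"
    return "\t".join([chrom, new_id, cm, pos, a1, a2])


def recode_bim(lines: Iterable[str]) -> Iterator[str]:
    """Yield recoded .bim lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield recode_bim_line(line)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python recode_bim.py in.bim out.bim
    # Then copy the matching .bed and .fam to the same prefix as out.bim.
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for rec in recode_bim(src):
            dst.write(rec + "\n")
```

After recoding, the check sketched for the log message should report no missing markers if the group file uses the same chr:pos:ref:alt convention.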

Hope it helps (again!)
