Odd sized kinship file #55

Closed
pjotrp opened this Issue Jul 9, 2017 · 8 comments

Contributor

pjotrp commented Jul 9, 2017

When running the new test testCenteredRelatednessMatrixK, I get different output files on two build setups. One is the correct size, but the other has 1940 rows instead of 1410. The extra rows are the same as the tail of the earlier output, so somehow the output buffer gets overwritten. I'll look into this more closely.

Contributor

pjotrp commented Jul 9, 2017

Oh yes, the other tools that use K don't seem to mind, because they don't read the full file. So this may have been happening before without affecting the results.

Collaborator

pcarbo commented Jul 10, 2017

@pjotrp Should I close this? It seems this is not a bug after all.

Contributor

pjotrp commented Jul 10, 2017

No, I need to figure out what happened first.

Contributor

pjotrp commented Jul 21, 2017

I'm getting the same bug on my laptop. The test on the master branch fails with:

testCenteredRelatednessMatrixK
Reading Files ... 
## number of total individuals = 1940
## number of analyzed individuals = 1410
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs = 12226
## number of analyzed SNPs = 10768
Calculating Relatedness Matrix ... 
Reading SNPs  ==================================================100.00%
## total computation time = 0.35778 min 
ASSERT:expected:<24.9799> but was:<29.691>

and

wc -l test/output/mouse_hs1940.cXX.txt 
1940 test/output/mouse_hs1940.cXX.txt

It appears to output one row per individual in total, rather than one per analyzed individual. @pcarbo, what is the size of your output file?
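A relatedness matrix K has one row and one column per individual, so besides counting lines with `wc -l`, a quick sanity check is whether the file is square. The following is a minimal sketch, assuming a whitespace-delimited matrix file with no header line and a trailing newline on the last row; `check_square` is a hypothetical helper, not part of GEMMA.

```shell
#!/bin/sh
# check_square (hypothetical helper, not part of GEMMA): compare the number
# of rows in a whitespace-delimited matrix file with the number of fields on
# its first line. Assumes no header and a trailing newline on the last row.
check_square() {
  # tr strips the leading blanks that some wc implementations print
  rows=$(wc -l < "$1" | tr -d ' ')
  cols=$(head -n 1 "$1" | wc -w | tr -d ' ')
  if [ "$rows" -eq "$cols" ]; then
    echo "square: $rows x $cols"
  else
    echo "not square: $rows rows, $cols columns"
    return 1
  fi
}

# Toy 3 x 3 matrix with made-up values, tab-delimited
tmp=$(mktemp)
printf '1\t0.2\t0.1\n0.2\t1\t0.3\n0.1\t0.3\t1\n' > "$tmp"
check_square "$tmp"   # prints: square: 3 x 3
rm -f "$tmp"
```

Run against the real output (e.g. `check_square test/output/mouse_hs1940.cXX.txt`), it reports the row and column counts side by side instead of the row count alone.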

Collaborator

pcarbo commented Jul 21, 2017

@pjotrp I get an output file mouse_hs1940.cXX.txt with 1940 lines. I think the "number of analyzed individuals" is misleading (and confusing!) because it refers to the phenotype data, which is irrelevant for computing the relatedness matrix. If you look at the phenotype file, you will see that there are indeed 530 missing phenotype values, and 1940 - 530 = 1410:

$ cut -f 1 mouse_hs1940.pheno.txt | grep NA | wc -l
530

In any case, this doesn't explain the error you are getting.
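The relationship between the total, missing, and analyzed counts can be reproduced with a small shell helper. This is a sketch, assuming one individual per line, the phenotype in column 1, and missing values written as `NA`; `count_analyzed` is a hypothetical name. Unlike the `grep NA` one-liner above, it matches `NA` as a whole field, so a value that merely contains those letters is not counted.

```shell
#!/bin/sh
# count_analyzed (hypothetical helper, not part of GEMMA): given a phenotype
# file with one individual per line and the phenotype in column 1, report
# how many individuals there are, how many are "NA", and how many remain.
count_analyzed() {
  total=$(wc -l < "$1" | tr -d ' ')
  # -x matches the whole field, so values such as "NaN" are not counted;
  # grep -c still prints 0 but exits 1 when nothing matches, hence "|| true"
  missing=$(cut -f 1 "$1" | grep -cx 'NA' || true)
  echo "total=$total missing=$missing analyzed=$((total - missing))"
}

# Toy phenotype file with made-up values: 5 individuals, 2 missing
tmp=$(mktemp)
printf '1.2\nNA\n0.7\nNA\n-0.3\n' > "$tmp"
count_analyzed "$tmp"   # prints: total=5 missing=2 analyzed=3
rm -f "$tmp"
```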

Contributor

pjotrp commented Jul 22, 2017

Cool, though I could swear I have seen a smaller K file. Anyway, we are getting different answers on different systems, so I am looking into that.

Contributor

pjotrp commented Jul 26, 2017

It turns out that the output files are identical, but awk gives different results on different machines! Not a GEMMA problem; I'll fix the test.

pjotrp closed this Jul 26, 2017

pjotrp added a commit to genenetwork/GEMMA that referenced this issue Jul 26, 2017

pjotrp referenced this issue Jul 26, 2017 in pull request Tests #60 (merged)

Collaborator

pcarbo commented Jul 26, 2017

@pjotrp Interesting, thanks!
