HiC-Pro 3 breaks ICE-normalized matrices and sparseToDense.py #433

Closed
ChenfuShi opened this issue Apr 19, 2021 · 8 comments
@ChenfuShi

Hi,
I updated to HiC-Pro version 3.0.0, but now the ICE-normalized matrices are no longer compatible with the sparseToDense.py script.

The ICE matrices seem to have a new format. The old format had 3 columns (binX, binY and counts); the new format has 3 rows, and all the values seem to be floating-point numbers.
This affects only the ICE-normalized matrices, not the raw matrices, so I imagine it is caused by a bug in the ICE normalization script.
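One quick way to tell the two layouts apart before passing a file to sparseToDense.py is to look at the number of fields on the first few lines: the expected layout has exactly 3 fields per line, while the broken one has only 3 (very long) lines. The helper below is a minimal sketch; the name `detect_layout` is hypothetical and is not part of HiC-Pro or iced.

```python
from itertools import islice


def detect_layout(lines, max_lines=4):
    """Guess the layout of a sparse contact-matrix file (hypothetical helper).

    Returns "columns" for the expected one-triplet-per-line layout
    (binX, binY, count), "rows" for the transposed layout made of three
    long lines, or "unknown" otherwise. `lines` is any iterable of text
    lines, e.g. an open file handle.
    """
    non_empty = (ln for ln in lines if ln.strip())
    # Only inspect the first few lines: that is enough to separate the layouts.
    field_counts = [len(ln.split()) for ln in islice(non_empty, max_lines)]
    if field_counts and all(n == 3 for n in field_counts):
        return "columns"  # note: a tiny 3x3 file is ambiguous and lands here
    if len(field_counts) == 3 and len(set(field_counts)) == 1 and field_counts[0] > 3:
        return "rows"
    return "unknown"
```

Usage: `with open("data/sample_1000000_iced.matrix") as fh: print(detect_layout(fh))`.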

The file is too big to upload, but if you need it I can upload it somewhere else.

Thanks!

@nservant
Owner

Hi,
The iced version you are using is buggy! Please update the package.
Sorry for that.
N

@ChenfuShi
Author

I was using 0.5.4, which is the version set in your environment.yml file!
I'll update it and see what happens.

@nservant
Owner

Yes, my fault. I'll update it. Sorry
N

@ChenfuShi
Author

Thanks!

@nservant
Owner

Of note, another user reported to me that the latest iced version, 0.5.8, may also produce different output compared to previous versions.
The first two columns are expected to be one-based indices, while in 0.5.8 they are zero-based.
If you upgrade to 0.5.8 and observe the same thing, please let me know so that I can contact the iced developer.
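If a file does turn out to have zero-based bin indices, shifting the first two columns by one restores the one-based convention described above. This is a minimal sketch with a hypothetical helper name (`to_one_based`); it assumes the 3-column layout and should only be applied after checking that the indices really start at 0, since running it on an already one-based file would shift every bin.

```python
def to_one_based(lines):
    """Yield matrix lines with both bin-index columns shifted from 0-based to 1-based.

    Hypothetical helper. Assumes one "binX binY value" triplet per line.
    Indices written as floats (e.g. "0.0") are converted to integers; the
    value column is passed through unchanged.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        bin_x, bin_y, value = fields
        yield f"{int(float(bin_x)) + 1}\t{int(float(bin_y)) + 1}\t{value}\n"
```

Usage: read the zero-based file line by line and pass the generator to `writelines` on the output handle.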
Thanks

@ChenfuShi
Author

Yes, it looks like they are 0-based.

ChenfuShi reopened this on Jun 7, 2021
@esebesty

esebesty commented Jun 28, 2021

I ran into this issue while trying to convert the ICE-normalized matrix to the format needed by TopDom. It looks like a simple matrix transpose in R on the ICE-normalized matrix solves the issue.

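library("data.table")

# Read the three-row iced output (row 1: binX, row 2: binY, row 3: values)
icematrix   <- fread(file = "data/sample_1000000_iced.matrix", header = FALSE)
# Transpose so that each line holds one binX, binY, value triplet
icematrix_t <- t(icematrix)
icematrix_d <- as.data.frame(icematrix_t)

# Write tab-separated, with no quotes, header or row names
write.table(icematrix_d, file = "data/sample_1000000_iced_corr.matrix", sep = "\t", quote = FALSE,
            col.names = FALSE, row.names = FALSE)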

The new matrix looks like this:

1       1       3590.51297981505
2       1       1465.5227894145
3       1       270.030342746284
4       1       158.666018009291
5       1       90.5154390031429
6       1       56.8238616628458

and it is accepted by the sparseToDense.py script.
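For anyone without R at hand, the same transpose can be sketched in Python. This mirrors the `t()` workaround above under the same assumption (row 1 is binX, row 2 is binY, row 3 holds the normalized values); the helper name `transpose_rows_layout` is hypothetical. It reads all three rows into memory, which is what the R version does as well.

```python
def transpose_rows_layout(lines):
    """Turn the 3-row layout back into one "binX<TAB>binY<TAB>value" line per entry.

    Hypothetical helper mirroring the R t() workaround. Bin indices are
    written as integers; values keep their original text representation.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    if len(rows) != 3 or len({len(r) for r in rows}) != 1:
        raise ValueError("expected exactly three rows of equal length")
    bin_x, bin_y, values = rows
    for x, y, v in zip(bin_x, bin_y, values):
        yield f"{int(float(x))}\t{int(float(y))}\t{v}\n"
```

Usage: open the iced file, then write `transpose_rows_layout(fh)` to a new file with `writelines`. If the indices also turn out to be zero-based, they still need to be shifted by one afterwards.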

BTW, it might be useful to add an option to the script to produce an N-by-(3+N) or N-by-(4+N) output matrix, where the first 3 or 4 columns are the chr, from, to coordinates, or id, chr, from, to.
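The suggested N-by-(3+N) output can be sketched as follows. This is not an existing sparseToDense.py option; it assumes a BED-like bin list with one "chr start end id" line per bin whose ids match the 1-based matrix indices (the layout HiC-Pro uses for its `_abs.bed` files, to the best of my knowledge), and it assumes the sparse file stores only one triangle of a symmetric matrix, so each entry is mirrored. The helper name is hypothetical.

```python
def dense_with_coordinates(matrix_lines, bed_lines):
    """Build N rows of [chr, from, to, v_1, ..., v_N] (hypothetical helper).

    Assumes `bed_lines` has one "chr start end id" line per bin, with ids
    matching the 1-based indices in `matrix_lines` (binX, binY, value),
    and that the sparse matrix is symmetric and stored as one triangle.
    """
    bins = [ln.split() for ln in bed_lines if ln.strip()]
    position = {int(b[3]): k for k, b in enumerate(bins)}  # bin id -> row/col index
    n = len(bins)
    dense = [[0.0] * n for _ in range(n)]
    for line in matrix_lines:
        fields = line.split()
        if not fields:
            continue
        i = position[int(float(fields[0]))]
        j = position[int(float(fields[1]))]
        value = float(fields[2])
        dense[i][j] = value
        dense[j][i] = value  # symmetric matrix: fill the mirrored cell too
    # Prepend chr, from, to; add b[3] in front for the N-by-(4+N) variant.
    return [[b[0], int(b[1]), int(b[2]), *row] for b, row in zip(bins, dense)]
```

Each returned row can then be written out tab-separated. For genome-wide maps at high resolution the dense matrix grows as N², so this is only practical per chromosome or at coarse bin sizes.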

@nservant
Owner

Fixed in iced 0.5.9, which has been added to the conda environment of HiC-Pro 3.1.0.
