HiCPotts

A Bayesian framework for detection of enriched Hi‑C interactions and experimental biases in Hi-C data

1  Why HiCPotts?

Hi‑C interaction counts are

sparse (most bin pairs are zero),
over‑dispersed and often zero‑inflated, and
spatially correlated along the genome.

HiCPotts deals with these challenges by combining

Layer	Details
Spatial layer	Potts model (Wu 1982) with interaction parameter (\gamma) to capture neighbourhood dependence.
Count layer	Mixture of three components (“noise”, “signal”, “false‑positive”) modelled with Poisson
Bias regression	Genomic distance, GC content, TE density and chromatin accessibility are covariates in a log‑linear model for the mean.
Inference	Metropolis‑within‑Gibbs MCMC updates betas, (\gamma), zero‑inflation (\theta) and dispersion size (if NB/ZINB).

2  Installation

## released version (once on Bioconductor)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HiCPotts")

Requirements: R ≥ 4.2, C++17 compiler, plus
Rcpp, RcppArmadillo, parallel.

The optional genome support uses BSgenome.Dmelanogaster.UCSC.dm6 (only needed to compute GC content from raw coordinates, hence moved to Suggests):

BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")

3  Quick start

library(HiCPotts)


## 1  Load long‑format interaction table
df <- read.csv("hic_interactions.csv")  # start, end, interactions, GC, ACC, TES

## 2  Convert to N×N matrices
prep   <- process_data(df, N = 40, standardization_y = FALSE)
x_vars <- prep$x_vars
y_list <- prep$y                      # list of 40×40 matrices

## 3  Run the three‑component MCMC
res <- run_chain_betas(
  N            = 40,
  iterations   = 5000,
  x_vars       = x_vars,
  y_sim_list   = y_list,
  use_data_prior = TRUE,
  dist         = "ZINB",
  size_initial = c(1, 1, 1), # initial NB size for the 3 comps
  thetap       = 0.5,        # initial θ
  mc_cores     = 1
)

## 4  Posterior component probabilities
probs <- compute_HMRFHiC_probabilities(
  data        = df,
  chain_betas = results,                    # full list, NOT results[["chains"]]
  iterations  = 2000,
  dist        = "ZINB"
)
head(probs)

graph TD
  A[get_data] --> B[process_data]
  B --> C[run_chain_betas]
  C --> D[Posterior summaries + probability maps]

Key exported functions

Function	What it does
`process_data()`	Converts long‑format counts + covariates to lists of $N \times N$ matrices (counts optionally scaled 1–500).
`run_chain_betas()`	Parallel front‑end to `run_metropolis_MCMC_betas()` for multiple matrices; always uses three components.
`run_metropolis_MCMC_betas()`	Core Metropolis‑within‑Gibbs sampler.
`compute_HMRFHiC_probabilities()`	Converts posterior mean betas into per‑pixel probabilities for components 1, 2, 3.
`prior_combined()`, `likelihood_combined()`, `posterior_combined()`	Compute log‑priors, log‑likelihoods and log‑posteriors.
`pred_combined()`	Calculates log‑linear predictors $\lambda$.
`Neighbours_combined()` (C++)	Counts matching neighbours for the Potts model.
`pz_123()` (C++)	Site‑wise component log‑posterior calculation.

mcmc1 <- res[[1]]

## trace of γ
plot(mcmc1$gamma, type = "l", col = "#1f77b4",
     main = "Potts interaction γ", ylab = "γ")

## posterior means of βs for component 1
colMeans(mcmc1$chains[[1]][-(1:2500), ])

Advanced options Distribution choice – dist = "Poisson", "NB" or "ZIP".

Fixed vs. data‑driven priors – set use_data_priors = FALSE in the sampler and supply user_fixed_priors.

Proposal tuning – edit the sd_values vectors inside run_metropolis_MCMC_betas().

set.seed(1)
N <- 10
fake <- data.frame(
  start        = rep(1:N, each = N),
  end          = rep(1:N, N),
  interactions = rpois(N*N, 5),
  GC           = runif(N*N),
  ACC          = runif(N*N),
  TES          = runif(N*N)
)

prep   <- process_data(fake, N)
res    <- run_chain_betas(prep$y, N = N, iterations = 100,
                          x_vars = prep$x_vars,
                          thetap = 0.5, size_initial = c(1,1,1),
                          dist   = "Poisson", mc_cores = 1)

Another example is using the test_data2.csv file inside the folder; inst/extdata

mydata=read.csv("~/inst/extdata/test_data2.csv")

N=20
thetap=0.6

iterations=2000


# Assuming mydata is a data frame
colnames(mydata)[colnames(mydata) %in% c("start.i.", "start.i", "start")] <- "start"
colnames(mydata)[colnames(mydata) %in% c("end.j.", "end.j", "end")] <- "end"
colnames(mydata)[colnames(mydata) %in% c("Acc", "ACC")] <- "ACC"
colnames(mydata)[colnames(mydata) %in% c("GC", "Gc")] <- "GC"
colnames(mydata)[colnames(mydata) %in% c("Tes", "TEs", "TES")] <- "TES"
colnames(mydata)[colnames(mydata) %in% c("interactions", "interaction")] <- "interactions"



scaled_data<-process_data(mydata, N, scale_max = 500, standardization_y = TRUE)


chains=run_chain_betas(
  N,
  gamma_prior=0.3,
  iterations=iterations,
  x_vars=scaled_data[["x_vars"]],
  y=scaled_data[["y"]],
  use_data_priors=TRUE,
  dist="Poisson",
  distance_metric="manhattan",
  mc_cores = 22
)

4  Important changes from 1.0.0 -> 1.0.1

The 1.0.1 patch release fixes a number of statistical and plumbing bugs surfaced by external review. Most user code will continue to run unchanged. Highlights:

Component 2 and 3 likelihoods now use every data point assigned to the component, and does not force stop if a component was not assigned any data during iteration.
ZIP/ZINB zero probability written for more clarify to $\theta + (1-\theta)\exp(-\mu)$.
Neighbour counting in Neighbours_combined() now returns counts in ${0,1,2,3,4}$ as intended.
ABC update for $\gamma$ simulates synthetic data with the proposed $\gamma$ with a more refined distribution.
MH acceptance for regression coefficients no longer includes a "proposal density" term; the symmetric Gaussian random-walk proposal cancels correctly.
Data-driven priors are more deterministic given the same (parameters, data, assignments) and refined distribution.
compute_HMRFHiC_probabilities() no longer silently caps interactions at 500; pass max_interactions = 500 to restore the old behaviour.
Windows: mc_cores > 1 emits a warning and falls back to 1 (previously silently).
.hic input to get_data() now raises an informative error (previously failed inside h5ls). See NEWS for the full list.

5. Feedback and bug reports

Please file issues at https://github.com/igosungithub/HiCPotts/issues.

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
R		R
inst		inst
man		man
src		src
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
NEWS		NEWS
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HiCPotts

1  Why HiCPotts?

2  Installation

3  Quick start

4  Important changes from 1.0.0 -> 1.0.1

5. Feedback and bug reports

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

HiCPotts

1 Why HiCPotts?

2 Installation

3 Quick start

4 Important changes from 1.0.0 -> 1.0.1

5. Feedback and bug reports

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

1  Why HiCPotts?

2  Installation

3  Quick start

4  Important changes from 1.0.0 -> 1.0.1

Packages