Skip to content

Commit

Permalink
fix: execute all chunks in centros workflow
Browse files Browse the repository at this point in the history
  • Loading branch information
js2264 committed Jan 17, 2024
1 parent 756e02a commit 95680c7
Show file tree
Hide file tree
Showing 13 changed files with 29 additions and 65 deletions.
Binary file removed inst/pages/images/20230322181000.png
Binary file not shown.
Binary file removed inst/pages/images/20230322181300.png
Binary file not shown.
Binary file removed inst/pages/images/20230324125800.png
Binary file not shown.
Binary file removed inst/pages/images/20230327182802.png
Binary file not shown.
Binary file removed inst/pages/images/20230403134800.png
Binary file not shown.
Binary file removed inst/pages/images/20230523141745.jpg
Binary file not shown.
Binary file removed inst/pages/images/20230523144800.png
Binary file not shown.
Binary file removed inst/pages/images/20230523152700.png
Binary file not shown.
Binary file removed inst/pages/images/20230523153300.png
Binary file not shown.
Binary file removed inst/pages/images/20230523180000.png
Binary file not shown.
Binary file removed inst/pages/images/20230523180300.png
Binary file not shown.
79 changes: 22 additions & 57 deletions inst/pages/workflow-centros.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -34,31 +34,30 @@ We leverage two yeast datasets in this notebook.

## Importing Hi-C data and plotting contact matrices

```{r eval = FALSE}
```{r}
library(HiContactsData)
library(HiContacts)
library(purrr)
library(ggplot2)
hics <- list(
'G1' = import('/home/rsg/repos/OHCA-data/S288c_G1.mcool', resolution = 4000),
'G2M' = import('/home/rsg/repos/OHCA-data/S288c_G2M.mcool', resolution = 4000)
'G1' = import(HiContactsData('yeast_g1', 'mcool'), format = 'cool', resolution = 4000),
'G2M' = import(HiContactsData('yeast_g2m', 'mcool'), format = 'cool', resolution = 4000)
)
imap(hics, ~ plotMatrix(
.x, use.scores = 'balanced', limits = c(-4, -1), caption = FALSE
) + ggtitle(.y))
```

![](images/20230523141745.jpg)

We can visually appreciate that inter-chromosomal interactions, notably
between centromeres, are less prominent in G2/M.

## Checking P(s) and cis/trans interactions ratio

```{r eval = FALSE}
```{r}
library(dplyr)
pairs <- list(
'G1' = PairsFile('/home/rsg/repos/OHCA-data/S288c_G1.pairs'),
'G2M' = PairsFile('/home/rsg/repos/OHCA-data/S288c_G2M.pairs')
'G1' = PairsFile(HiContactsData('yeast_g1', 'pairs')),
'G2M' = PairsFile(HiContactsData('yeast_g2m', 'pairs'))
)
ps <- imap_dfr(pairs, ~ distanceLaw(.x, by_chr = TRUE) |>
mutate(sample = .y)
Expand All @@ -69,12 +68,10 @@ plotPsSlope(ps, ggplot2::aes(x = binned_distance, y = slope, group = interaction
scale_color_manual(values = c('black', 'red'))
```

![](images/20230523144800.png)

This confirms that interactions in cells synchronized in G2/M are enriched
for 10-30kb-long interactions.

```{r eval = FALSE}
```{r}
ratios <- imap_dfr(hics, ~ cisTransRatio(.x) |> mutate(sample = .y))
ggplot(ratios, aes(x = chr, y = trans_pct, fill = sample)) +
geom_col() +
Expand All @@ -83,37 +80,34 @@ ggplot(ratios, aes(x = chr, y = trans_pct, fill = sample)) +
facet_grid(~sample)
```

![](images/20230523152700.png)

We can also highlight that trans (inter-chromosomal) interactions are proportionally
decreasing in G2/M-synchronized cells.

## Centromere virtual 4C profiles

```{r eval = FALSE}
```{r}
data(centros_yeast)
v4c_centro <- imap_dfr(hics, ~ virtual4C(.x, resize(centros_yeast[2], 8000)) |>
v4c_centro <- imap_dfr(hics, ~ virtual4C(.x, GenomicRanges::resize(centros_yeast[2], 8000)) |>
as_tibble() |>
mutate(sample = .y) |>
filter(seqnames == 'IV')
)
ggplot(v4c_centro, aes(x = start, y = score, colour = sample)) +
geom_line() +
ggplot(v4c_centro, aes(x = start, y = score, fill = sample)) +
geom_area() +
theme_bw() +
labs(
x = "chrIV position",
y = "Contacts with chrII centromere",
title = "Interaction profile of chrII centromere"
)
) +
coord_cartesian(ylim = c(0, 0.015))
```

![](images/20230523153300.png)

## Aggregated 2D signal over all pairs of centromeres

We can start by computing all possible pairs of centromeres.

```{r eval = FALSE}
```{r}
centros_pairs <- lapply(1:length(centros_yeast), function(i) {
lapply(1:length(centros_yeast), function(j) {
S4Vectors::Pairs(centros_yeast[i], centros_yeast[j])
Expand All @@ -125,28 +119,11 @@ centros_pairs <- lapply(1:length(centros_yeast), function(i) {
centros_pairs <- centros_pairs[anchors(centros_pairs, 'first') != anchors(centros_pairs, 'second')]
centros_pairs
## GInteractions object with 240 interactions and 0 metadata columns:
## seqnames1 ranges1 seqnames2 ranges2
## <Rle> <IRanges> <Rle> <IRanges>
## [1] I 151583-151641 --- II 238361-238419
## [2] I 151583-151641 --- III 114322-114380
## [3] I 151583-151641 --- IV 449879-449937
## [4] I 151583-151641 --- V 152522-152580
## [5] I 151583-151641 --- VI 147981-148039
## ... ... ... ... ... ...
## [236] XVI 556255-556313 --- XI 440229-440287
## [237] XVI 556255-556313 --- XII 151366-151424
## [238] XVI 556255-556313 --- XIII 268222-268280
## [239] XVI 556255-556313 --- XIV 628588-628646
## [240] XVI 556255-556313 --- XV 326897-326955
## -------
## regions: 16 ranges and 0 metadata columns
## seqinfo: 17 sequences (1 circular) from R64-1-1 genome
```

Then we can aggregate the Hi-C signal over each pair of centromeres.

```{r eval = FALSE}
```{r}
aggr_maps <- purrr::imap(hics, ~ {
aggr <- aggregate(.x, centros_pairs, maxDistance = 1e999)
plotMatrix(
Expand All @@ -155,43 +132,33 @@ aggr_maps <- purrr::imap(hics, ~ {
caption = FALSE
) + ggtitle(.y)
})
## Going through preflight checklist...
## Parsing the entire contact matrix as a sparse matrix...
## Modeling distance decay...
## Filtering for contacts within provided targets...
## Going through preflight checklist...
## Parsing the entire contact matrix as a sparse matrix...
## Modeling distance decay...
## Filtering for contacts within provided targets...
cowplot::plot_grid(plotlist = aggr_maps, nrow = 1)
```

![](images/20230523180300.png)

## Aggregated 1D interaction profile of centromeres

One can generalize the previous virtual 4C plot, by extracting the interaction profile
between all possible pairs of centromeres in each dataset.

```{r eval = FALSE}
```{r}
df <- map_dfr(1:{length(centros_yeast)-1}, function(i) {
centro1 <- resize(centros_yeast[i], fix = 'center', 8000)
centro1 <- GenomicRanges::resize(centros_yeast[i], fix = 'center', 8000)
map_dfr({i+1}:length(centros_yeast), function(j) {
centro2 <- resize(centros_yeast[j], fix = 'center', 80000)
gi <- GInteractions(centro1, centro2)
centro2 <- GenomicRanges::resize(centros_yeast[j], fix = 'center', 80000)
gi <- InteractionSet::GInteractions(centro1, centro2)
imap_dfr(hics, ~ .x[gi] |>
interactions() |>
as_tibble() |>
mutate(
sample = .y,
center = center2 - start(resize(centro2, fix = 'center', 1))
center = center2 - start(GenomicRanges::resize(centro2, fix = 'center', 1))
) |>
select(sample, seqnames1, seqnames2, center, balanced)
)
})
})
p <- ggplot(df, aes(x = center/1e3, y = balanced)) +
ggplot(df, aes(x = center/1e3, y = balanced)) +
geom_line(aes(group = interaction(seqnames1, seqnames2)), alpha = 0.03, col = "black") +
geom_smooth(col = "red", fill = "red") +
theme_bw() +
Expand All @@ -203,8 +170,6 @@ p <- ggplot(df, aes(x = center/1e3, y = balanced)) +
facet_grid(~sample)
```

![](images/20230523180000.png)

# Session info {-}

::: {.callout-note collapse="true"}
Expand Down
15 changes: 7 additions & 8 deletions inst/pages/workflow-yeast.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -40,18 +40,17 @@ We leverage seven yeast datasets in this notebook. They are all available from S

:::

## Recovering data from SRA

The easiest for this is to directly fetch files from SRA from their FTP server. We can do so using the base `download.file` function.

::: {.callout-note}

The next two code chunks illustrate how to do download and process Hi-C reads from SRA, but
they are not actually executed when rendering this website as it would take a
significant amount of time.
This notebook does not execute code when it's rendered. Instead, the results
and the plots have been locally computed and manually embedded in the notebook.

:::

## Recovering data from SRA

The easiest for this is to directly fetch files from SRA from their FTP server. We can do so using the base `download.file` function.

```{r eval = FALSE}
# !! This code is not actually executed !!
dir.create('data')
Expand Down Expand Up @@ -413,7 +412,7 @@ results
```

The `results()` output is not very informative as it is. It requires a little
bit of reformating to be able to extract valuable insights from it.
bit of reformatting to be able to extract valuable insights from it.

```{r eval = FALSE}
df <- left_join(results@hic_table, results(results)) |>
Expand Down

0 comments on commit 95680c7

Please sign in to comment.