# Analysis of mouse single cell hematopoietic populations - Combine ATAC-Seq replicates

__Author__: Elisabeth F. Heuston

## Purpose

Single cell transcriptional and clustering analysis of LSK, CMP, MEP, and GMP data presented in Heuston et al., 2021  

This notebook contains the workflow to analyze ATAC-Seq data and pertains to:
* Figures 5B, 5C, 5D
* Supplemental Figure 7D


Combine BED file replicates using the __Save-peak method__ 
1. Generate BED files using HOMER
2. Create mergedBed file for each replicate  
3. Using intersectBed, keep peaks from mergedBed that are in both replicates


Raw data for this publication are available to download from the GEO Project GSE168260 at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE168260  

__kentUtils__ documentation is available at https://github.com/ucscGenomeBrowser/kent


## Workbook setup

## Create MergeBed Files

In [1]:
cd /Users/heustonef/Desktop/scRNAanalysis_codetest/ATAC/Replicates/

bedTag=".bed"
catTag="_cat.bed"
sortTag="_sort.bed"
mergeTag="_merge.bed"
multiBedTag="_multiIntersect.txt"


for bedFile in C11*.bed C10*.bed C3*.bed C17*.bed
do
	sortBed -i "$bedFile" > "${bedFile/.bed/_sort.bed}"
	echo "Sorted $bedFile"
done

for replicate in C11 C10 C3 C17
do
	cat "$replicate"*"$bedTag" > "$replicate$catTag"
	sortBed -i "$replicate$catTag" > "$replicate$sortTag"
	mergeBed -i "$replicate$sortTag" > "$replicate$mergeTag"
	echo "Finished generating ""$replicate$mergeTag"
done



Sorted C11r2.bed
Sorted C11r3.bed
Sorted C10r2.bed
Sorted C10r3.bed
Sorted C3r1.bed
Sorted C3r2.bed
Sorted C17r1.bed
Sorted C17r2.bed
Finished generating C11_merge.bed
Finished generating C10_merge.bed
Finished generating C3_merge.bed
Finished generating C17_merge.bed


In [2]:
rm *_cat.bed

In [3]:
ls C11* C10* C3* C17*

C10_merge.bed	C10r3_sort.bed	C11r3.bed	C17r1_sort.bed	C3r1.bed
C10_sort.bed	C11_merge.bed	C11r3_sort.bed	C17r2.bed	C3r1_sort.bed
C10r2.bed	C11_sort.bed	C17_merge.bed	C17r2_sort.bed	C3r2.bed
C10r2_sort.bed	C11r2.bed	C17_sort.bed	C3_merge.bed	C3r2_sort.bed
C10r3.bed	C11r2_sort.bed	C17r1.bed	C3_sort.bed


#### C11 

In [4]:
ls C11*

C11_merge.bed	C11r2.bed	C11r3.bed
C11_sort.bed	C11r2_sort.bed	C11r3_sort.bed


In [5]:
intersectBed -u -a C11_merge.bed -b C11r2_sort.bed > temp.bed
intersectBed -u -a temp.bed -b C11r3_sort.bed > C11.bed

In [6]:
cp C11.bed /Users/heustonef/Desktop/scRNAanalysis_codetest/ATAC/

#### C3 

In [7]:
ls C3*

C3_merge.bed	C3r1.bed	C3r2.bed
C3_sort.bed	C3r1_sort.bed	C3r2_sort.bed


In [8]:
intersectBed -u -a C3_merge.bed -b C3r1_sort.bed > C10.bed
intersectBed -u -a C10.bed -b C3r2_sort.bed > C3.bed

In [9]:
cp C3.bed /Users/heustonef/Desktop/scRNAanalysis_codetest/ATAC/

#### C17 <a class='anchor' id='C17'></a>

In [10]:
ls C17*

C17_merge.bed	C17r1.bed	C17r2.bed
C17_sort.bed	C17r1_sort.bed	C17r2_sort.bed


In [11]:
intersectBed -u -a C17_merge.bed -b C17r1_sort.bed > temp.bed
intersectBed -u -a temp.bed -b C17r2_sort.bed > C17.bed

In [12]:
cp C17.bed /Users/heustonef/Desktop/scRNAanalysis_codetest/ATAC/

#### C10 <a class='anchor' id='C10'></a>

In [13]:
ls C10*

C10.bed		C10_sort.bed	C10r2_sort.bed	C10r3_sort.bed
C10_merge.bed	C10r2.bed	C10r3.bed


In [14]:
intersectBed -u -a C10_merge.bed -b C10r2_sort.bed > temp.bed
intersectBed -u -a temp.bed -b C10r3_sort.bed > C10.bed

In [15]:
cp C10.bed /Users/heustonef/Desktop/scRNAanalysis_codetest/ATAC/

# Bigwig AverageOverBed <a class='anchor' id='bwaob'></a>

Combine all hematopoietic populations we have:<br>
>CFUE845<br>
>CFUE853<br>
>CFUMk847<br>
>CFUMk855<br>
>CMP842<br>
>CMP850<br>
>C11r2<br>
>C11r3<br>
>C3r1<br>
>C3r2<br>
>ERY846<br>
>ERY854<br>
>GMP843<br>
>GMP851<br>
>LSK987<br>
>LSK1196<br>
>MK848<br>
>MK856<br>
>C17r1<br>
>C17r2<br>
>C10r2<br>
>C10r3<br>

Files needed:<br>
* Bed file for _each replicate_ to generate allATAC_merge.bed4
* BW file for _each replicate_ for bigWigAverageOverBed script

In [16]:
cd /Users/heustonef/Desktop/scRNAanalysis_codetest/ATAC/Replicates

In [17]:
cat C*.bed ERY* GMP* LSK* MK* > allATAC_cat-multicol.bed

In [18]:
awk -v OFS='\t' '{print $1,$2,$3}' allATAC_cat-multicol.bed > allATAC_cat.bed

In [19]:
sortBed -i allATAC_cat.bed > allATAC_sort.bed

In [20]:
mergeBed -i allATAC_sort.bed > allATAC_merge.bed

In [21]:
awk -v OFS='\t' '{print $0, NR}' allATAC_merge.bed > allATAC_merge.bed4 # create peak_ID column

In [22]:
mv allATAC_merge.bed4 ../
rm allATAC*.bed

In [23]:
cd ../

In [24]:
for replicate in *.bw
do
    tabfile="${replicate/.bw/.tab}"
    bedfile="${replicate/.bw/.bed}"
    echo bigWigAverageOverBed $replicate allATAC_merge.bed4 $tabfile -bedOut=$bedfile
    bigWigAverageOverBed $replicate allATAC_merge.bed4 $tabfile -bedOut=$bedfile
done

bigWigAverageOverBed C10r2.mm10.bw allATAC_merge.bed4 C10r2.mm10.tab -bedOut=C10r2.mm10.bed
processing chromosomes.....................
bigWigAverageOverBed C10r3.mm10.bw allATAC_merge.bed4 C10r3.mm10.tab -bedOut=C10r3.mm10.bed
processing chromosomes.....................
bigWigAverageOverBed C11r2.mm10.bw allATAC_merge.bed4 C11r2.mm10.tab -bedOut=C11r2.mm10.bed
processing chromosomes.....................
bigWigAverageOverBed C11r3.mm10.bw allATAC_merge.bed4 C11r3.mm10.tab -bedOut=C11r3.mm10.bed
processing chromosomes.....................
bigWigAverageOverBed C17r1.bw allATAC_merge.bed4 C17r1.tab -bedOut=C17r1.bed
processing chromosomes.....................
bigWigAverageOverBed C17r2.bw allATAC_merge.bed4 C17r2.tab -bedOut=C17r2.bed
processing chromosomes.....................
bigWigAverageOverBed C3r1.bw allATAC_merge.bed4 C3r1.tab -bedOut=C3r1.bed
processing chromosomes.....................
bigWigAverageOverBed C3r2.bw allATAC_merge.bed4 C3r2.tab -bedOut=C3r2.bed
processing chromosomes