# Rotational positioning

The goal is to classify nucleosomes by the strength of their rotational signal, defined as the ratio of the number of reads mapping to rotationally outward positions to the total number of reads mapping within the dyad.
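
As a rough illustration of this ratio, the sketch below computes the score for a single nucleosome from per-offset read counts. The function name `rotational_score`, the input layout (a mapping from offset relative to the dyad to read count) and the set of outward offsets are all assumptions made for this example. The notebook itself delegates the computation to `annotate_midpoints` in `scripts.utils`, whose definition of outward positions is not shown here.

```python
def rotational_score(reads_by_offset, outward_offsets):
    """Return (score, outward_reads, total_reads) for one nucleosome.

    reads_by_offset: dict mapping offset from the dyad (bp) -> read count.
    outward_offsets: offsets treated as rotationally outward
                     (hypothetical; supplied by the caller).
    """
    outward = set(outward_offsets)
    total_reads = sum(reads_by_offset.values())
    outward_reads = sum(
        count for offset, count in reads_by_offset.items() if offset in outward
    )
    if total_reads == 0:
        # The ratio is undefined when no reads fall in the window.
        return None, 0, 0
    return outward_reads / total_reads, outward_reads, total_reads


# Toy input: offsets and counts are made up for illustration only.
example_reads = {-10: 3, -5: 1, 0: 4, 10: 2}
example_outward = {-10, 0, 10}
score, outward_reads, total_reads = rotational_score(example_reads, example_outward)
```

Under this definition a score of 1 means that every read in the window falls on an outward position.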

This analysis has been performed for two species:

- [H. sapiens](#human)
- [A. thaliana](#thali)

Before running this notebook, you must first run the notebooks in the following folder: nucleosomes. Some external data also needs to be downloaded. Each section gives further details.

## H. sapiens <a id="human"></a>

For H. sapiens, two files are computed according to the rotational positioning of the nucleosomes:
``low_rotational_dyads.bed.gz`` and ``high_rotational_dyads.bed.gz`` (the dyads with a score of 1).

Perform a liftover from hg18 to hg19 and intersect with the final nucleosomes.
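
Before the intersection, the cell that follows uses `awk` to widen each dyad interval by 58 bp on both sides and to attach an identifier built from the original coordinates. The Python sketch below mirrors that one step for a single BED line. The function name `dyad_window` and the example coordinates are illustrative assumptions and are not part of the pipeline.

```python
def dyad_window(bed_line, flank=58):
    """Widen a dyad interval by `flank` bp per side and tag it with an ID.

    Mirrors the awk expression:
    print $1, $2-58, $3+58, $1 "_" $2 "_" $3, $2, $3
    """
    chrom, start, end = bed_line.rstrip("\n").split("\t")[:3]
    start, end = int(start), int(end)
    dyad_id = f"{chrom}_{start}_{end}"
    # Keep the original coordinates as the last two columns.
    return [chrom, start - flank, end + flank, dyad_id, start, end]


# Made-up dyad used only to show the output layout.
row = dyad_window("chr1\t1000\t1001")
```

The resulting six columns correspond to the first six names (`chr`, `start`, `end`, `ID`, `s1`, `e1`) used when the intersected file is read in the classification step.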

In [None]:
%%bash --out output1 --err error1

mapping=${PWD}/../nucleosomes/sapiens

mkdir -p sapiens
cd sapiens

# Lift over the midpoints from hg18 to hg19
source activate env_crossmap
CrossMap.py bed ${mapping}/hg18ToHg19.over.chain.gz ${mapping}/mnase_mids_combined_147.bed.gz \
    mnase_mids_combined_147_hg19_unsorted.bed
gzip -f mnase_mids_combined_147_hg19_unsorted.bed
source deactivate
source activate env_nucperiod

zcat mnase_mids_combined_147_hg19_unsorted.bed.gz | sort -k1,1 -k2,2n | \
    gzip > mnase_mids_combined_147_hg19.bed.gz

zcat ${mapping}/dyads.bed.gz | \
    awk '{OFS="\t"}{print $1, $2-58, $3+58, $1 "_" $2 "_" $3, $2, $3}' | \
    intersectBed -a stdin -b  mnase_mids_combined_147_hg19.bed.gz -wo -sorted |
    gzip > intersected_dyads_midpoints.tsv.gz

Classify the nucleosomes.
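
The selection logic used in the next cell can be sketched on a toy table: the high set contains every nucleosome with a score of 1, and the low set contains the same number of nucleosomes taken from the bottom of the score ranking. The sketch uses plain Python lists instead of pandas. The helper name `split_low_high` and the toy records are assumptions for illustration.

```python
def split_low_high(records):
    """Split records (dicts with 'chr', 'pos2', 'score_rot') into low/high sets."""
    # High set: all nucleosomes with a rotational score of exactly 1.
    high = [r for r in records if r["score_rot"] == 1]
    # Low set: as many nucleosomes as the high set, lowest scores first.
    by_score = sorted(records, key=lambda r: r["score_rot"])
    low = by_score[:len(high)]

    def genomic_order(record):
        return (record["chr"], record["pos2"])

    # Both sets are written out sorted by chromosome and position.
    return sorted(low, key=genomic_order), sorted(high, key=genomic_order)


# Made-up scores used only to illustrate the split.
toy = [
    {"chr": "chr1", "pos2": 500, "score_rot": 1.0},
    {"chr": "chr1", "pos2": 100, "score_rot": 0.2},
    {"chr": "chr2", "pos2": 300, "score_rot": 0.7},
    {"chr": "chr1", "pos2": 900, "score_rot": 1.0},
    {"chr": "chr2", "pos2": 50, "score_rot": 0.1},
]
low, high = split_low_high(toy)
```

Taking equal-sized sets keeps the low and high groups directly comparable in downstream analyses.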

In [None]:
from os import path

import pandas as pd

from scripts.utils import annotate_midpoints

ws="sapiens"
mapping_ws="../mapping/data/sapiens"

midpoints = path.join(ws, 'intersected_dyads_midpoints.tsv.gz')
nucleosomes = path.join(mapping_ws, 'dyads.bed.gz')

output_cols = ['chr', 'pos1', 'pos2', 'score_rot', 'reads_in', 'total_reads']

df_midpoints = pd.read_csv(midpoints, sep='\t',
                 names=['chr', 'start', 'end', 'ID', 's1', 'e1', 'chr2', 'bedpos', 'pos2', 'id2', 'reads', 'overlapp'])

df = annotate_midpoints(df_midpoints)

# count the nucleosomes with rotational score equal to 1
length_high = len(df[df['score_rot'] == 1])

# sort values by score
df.sort_values(by='score_rot', ascending=True, inplace=True)

# write the lowest-scoring nucleosomes (as many as have score 1) and the score-1 nucleosomes
df.iloc[0:length_high].sort_values(by=['chr', 'pos2']).to_csv(
    path.join(ws, 'low_rotational_dyads.bed.gz'),
    sep='\t', index=False,
    header=False,
    compression='gzip',
    columns=output_cols)
df[df['score_rot'] == 1].sort_values(by=['chr', 'pos2']).to_csv(
    path.join(ws, 'high_rotational_dyads.bed.gz'),
    sep='\t', index=False,
    header=False, compression='gzip',
    columns=output_cols)

## A. thaliana <a id="thali"></a>

For A. thaliana we have only computed the nucleosomes with a rotational score equal to 1 (``score1_rotational_dyads.bed.gz``). Note that the input is the set of nucleosomes before filtering out the ones in genic regions.

In [None]:
%%bash --out output2 --err error2

source activate env_nucperiod
mapping=${PWD}/../nucleosomes/thaliana

mkdir -p thaliana
cd thaliana

zcat ${mapping}/SRR1536143_dyads.bed.gz |sort -k1,1 -k2,2n |\
    gzip > SRR1536143_dyads_sorted.bed.gz
    
zcat ${mapping}/SRR1536143_dyads_stringency.bed.gz | sort -k1,1 -k2,2n |\
    gzip > SRR1536143_dyads_stringency_sorted.bed.gz

zcat SRR1536143_dyads_stringency_sorted.bed.gz | \
    awk '{OFS="\t"}{print $1, $2-58, $3+58, $1 "_" $2 "_" $3, $2, $3}' |\
    intersectBed -a stdin -b  SRR1536143_dyads_sorted.bed.gz -wo -sorted |\
    gzip > intersected_dyads_midpoints_all.tsv.gz

Classify the nucleosomes.

In [None]:
from os import path

import pandas as pd

from scripts.utils import annotate_midpoints

ws="thaliana"
midpoints = path.join(ws, 'intersected_dyads_midpoints_all.tsv.gz')

output_cols = ['chr', 'pos1', 'pos2', 'score_rot', 'reads_in', 'total_reads']

df_midpoints = pd.read_csv(midpoints, sep='\t',
                 names=['chr', 'start', 'end', 'ID', 's1', 'e1', 'chr2', 'bedpos', 'pos2', 'reads', 'overlapp'])
df = annotate_midpoints(df_midpoints)

# count the nucleosomes with rotational score equal to 1
length_high = len(df[df['score_rot'] == 1])

# sort values by score
df.sort_values(by='score_rot', ascending=True, inplace=True)

df[df['score_rot'] == 1].sort_values(by=['chr', 'pos2']).to_csv(
    path.join(ws, 'score1_rotational_dyads.gz'),
    sep='\t', index=False,
    header=False, compression='gzip',
    columns=output_cols)