
The repository corresponding to the Nature Communications article Shahein et al. 2022, "Systematic analysis of low-affinity transcription factor binding site clusters in vitro and in vivo establishes their functional relevance".

eukaryoting/systematic_analysis_of_low-affinity_clusters

Software used for processing, analysis, and plotting of data from:

Systematic analysis of low-affinity transcription factor binding site clusters in vitro and in vivo establishes their functional relevance


Required dependencies can be found in the Requirements.txt file.

There are four Jupyter notebooks:

in-vivo-analysis_Zif268-Pho4.ipynb

Contains scripts used for the in vivo analysis. It can be run independently of the other three notebooks.

in-vitro-analysis_Zif268.ipynb

Contains scripts used for the in vitro analysis of Zif268 data. For the section that relates gene expression to mean occupancy, it requires data generated from the in-vivo-analysis_Zif268-Pho4.ipynb notebook.

in-vitro-analysis_Pho4.ipynb

Contains scripts used for the in vitro analysis of Pho4 data. It can be run independently of the other three notebooks.

in-vitro-summary-plotting_Zif268-Pho4.ipynb

Contains scripts and plotting functionality to generate summary plots used in the manuscript, for both Zif268 and Pho4 data. Accordingly, the Zif268 and Pho4 in-vitro-analysis notebooks should be run first.


The cells of each notebook can be run from start to finish. Jupyter Lab is recommended, and the notebooks can be run in the order in which they are listed above.
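
The dependencies between the notebooks described above can be written down as a small graph and sorted, as a minimal sketch. The `DEPENDS_ON` mapping and the `run_order` helper are illustrative names that are not part of this repository; only the notebook file names come from the list above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical mapping: each notebook -> notebooks whose output it needs.
# The Zif268 in vitro notebook only needs the in vivo output for the section
# relating gene expression to mean occupancy.
DEPENDS_ON = {
    "in-vivo-analysis_Zif268-Pho4.ipynb": set(),
    "in-vitro-analysis_Zif268.ipynb": {"in-vivo-analysis_Zif268-Pho4.ipynb"},
    "in-vitro-analysis_Pho4.ipynb": set(),
    "in-vitro-summary-plotting_Zif268-Pho4.ipynb": {
        "in-vitro-analysis_Zif268.ipynb",
        "in-vitro-analysis_Pho4.ipynb",
    },
}


def run_order(depends_on):
    """Return one valid execution order, with dependencies first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for notebook in run_order(DEPENDS_ON):
        print(notebook)
```

Any order produced this way runs the summary-plotting notebook last, which matches the listing order above.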

There is virtually no install time required.

Raw data is available in this repository, along with some intermediate data, so most plotting scripts can be run and their plots viewed without re-running Markov chain Monte Carlo (MCMC). MCMC is the most time-consuming and computationally intensive part of the code: a run with 10,000 steps generally takes a few hours. Because of this, running the full code with all of the different models takes roughly a full day.
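
The idea of reusing intermediate data instead of repeating a long MCMC run can be sketched as a generic load-or-compute helper. This is only an illustration under stated assumptions: `load_or_compute`, the pickle format, and any cache path passed to it are hypothetical, and the notebooks may store and load their intermediate data differently.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the cached result if it exists, otherwise compute and cache it.

    cache_path: hypothetical location of an intermediate result file.
    compute:    zero-argument callable that performs the expensive step
                (for example, an MCMC run).
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Reuse the stored result and skip the expensive computation.
        with cache_path.open("rb") as fh:
            return pickle.load(fh)

    result = compute()
    # Create the parent directory if needed, then store the result.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

With a pattern like this, deleting the cached file is enough to force the expensive step to run again.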

The expected output from the code is generally explained with the detailed headers (markdown cells) found throughout the code.


Binding site information:

All in vitro DNA targets are 90 bp in length.

Pho4

"X" stands for non-specific DNA designed not to bind the transcription factor (TF).
"S" represents a strong binding site.
"M" represents a weak binding site.
"W" represents a very weak binding site.
Ex: M1, M2, and M3 are different members of the weak class of binding sites.
Two different notations are used. Either all non-primer regions are specified:
Ex: S1XXXX represents the DNA target with only the single consensus binding
site in the position furthest from the chip's surface, with the remaining DNA designed not to bind the TF.
Or, in bracket notation, a (gap distance) is specified, corresponding to non-specific
base pairs.

Zif268

"A" represents a binding site, without referring to its affinity class.
A11 represents the consensus, strong binding site.
(The S naming is not used for Zif268)
Everything else is similar to Pho4, with the additional convention
that negative gap distances can be specified for binding sites that share
common base pairs (similar to $\Delta$ in the manuscript).
Ex: A11(-3)A11 refers to two neighboring consensus binding sites that share
three base pairs (overlapping).
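
The naming conventions above can be illustrated with a small parser that splits a target name into site labels and gap distances. This is a minimal sketch, not code from the notebooks: `parse_target` and the `TOKEN` pattern are hypothetical, and the pattern assumes every site label is one capital letter followed by optional digits, which is inferred only from the examples given here.

```python
import re

# Assumed grammar: a site label such as S1, X, M2 or A11,
# or a gap distance in parentheses such as (5) or (-3).
TOKEN = re.compile(r"(?P<site>[A-Z]\d*)|\((?P<gap>-?\d+)\)")


def parse_target(name):
    """Split a target name into ("site", label) and ("gap", distance) tokens."""
    tokens = []
    pos = 0
    while pos < len(name):
        match = TOKEN.match(name, pos)
        if match is None:
            raise ValueError(f"unrecognized text at position {pos} in {name!r}")
        if match.group("site") is not None:
            tokens.append(("site", match.group("site")))
        else:
            # Negative distances mean the neighboring sites overlap.
            tokens.append(("gap", int(match.group("gap"))))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(parse_target("S1XXXX"))
    print(parse_target("A11(-3)A11"))
```

For `A11(-3)A11` this yields two `A11` site tokens separated by a gap of -3, that is, two consensus sites overlapping by three base pairs.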
