This is the repository for the code used in the paper "The transcription factor Xrp1 is required for PERK-mediated antioxidant gene induction in Drosophila" by Brian Brown et al., published in eLife (https://elifesciences.org/articles/74047#content).
The code in this repository scans regulatory DNA sequences for potential transcription factor binding sites. The DNA-binding domain of a transcription factor binds an optimal sequence, and also binds sequences that differ from that optimal sequence at specific positions in the binding site. The position frequency matrix or position weight matrix of the transcription factor quantifies this variation. These matrices allow the code to predict binding affinity by producing a "binding score" for any given DNA sequence, with a higher binding score corresponding to higher putative affinity. The top binding scores (those above a chosen percentage of the optimal binding score) are presented in a graph, displayed within the DNA sequence, and printed alongside their percentage of the maximum binding score. There are two ways to run the code with a different position frequency or weight matrix. The matrix can be replaced manually in "gstd1_thor.py" (and other parameters, such as the cutoff score, adjusted), as was done for the transcription factor Xrp1 in the eLife paper cited above. Alternatively, "template_code.py" can be run; it asks the user to input a matrix along with the other parameters.
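The scoring idea described above can be illustrated with a minimal sketch. It is not the code from "gstd1_thor.py": the matrix `TOY_PFM` is an invented 4-position example (not the ATF4 or Xrp1 matrix), the function names are hypothetical, and the log-odds conversion with a pseudocount and a uniform background is an assumed scoring scheme that may differ from the one used in the repository.

```python
import math

BASES = "ACGT"

# Hypothetical position frequency matrix: counts of each base at 4 positions.
# Invented for illustration only; NOT a matrix from the paper.
TOY_PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 0, 0, 7],
    "T": [0, 9, 0, 0],
}


def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert counts to log2-odds weights (assumed scheme, uniform background)."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[b][i] for b in BASES) + 4 * pseudocount
        column = {
            b: math.log2(((pfm[b][i] + pseudocount) / total) / background)
            for b in BASES
        }
        pwm.append(column)
    return pwm


def scan(sequence, pfm, cutoff=0.8):
    """Return (start, site, percent_of_max) for windows scoring >= cutoff * max."""
    pwm = pfm_to_pwm(pfm)
    width = len(pwm)
    # Score of the optimal sequence: the best base at every position.
    max_score = sum(max(column.values()) for column in pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguous symbols
        score = sum(column[base] for column, base in zip(pwm, window))
        fraction = score / max_score
        if fraction >= cutoff:
            hits.append((start, window, round(100 * fraction, 1)))
    return hits


if __name__ == "__main__":
    # Start positions are 0-based offsets into the scanned sequence.
    for start, site, percent in scan("GGATCGTTATCACC", TOY_PFM, cutoff=0.8):
        print(start, site, percent)
```

Lowering `cutoff` reports weaker candidate sites as well, which corresponds to adjusting the cutoff score mentioned above. This sketch scans only the given strand; scanning the reverse complement would be a separate step.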
The Python file "gstd1_thor.py" contains the code specifically relevant to the paper (using the ATF4 position frequency matrix). The CSV file "gstd1_thor.csv" is the spreadsheet imported by "gstd1_thor.py" and contains the sequences that the code scans. The PDF "gstd1_thor_output.pdf" contains the output generated when "gstd1_thor.py" is run. The Python file "template_code.py" contains a more user-friendly, generalized version of the code in "gstd1_thor.py".