# Demonstrating `bendIt` running in Jupyter

The standalone version of `bendIt` is installed and ready to run in this launched Jupyter environment. (If this notebook doesn't seem active, you can launch it from [here](https://github.com/fomightez/bendit-binder).)

If you want the basic usage block and an example of using it on the command line, go to [this notebook](basic_bendit_commandline.ipynb).

This notebook will illustrate a realistic workflow where a number of sequences in a multi-sequence fasta file are analyzed with `bendIt`.  This should serve to show the advantage of using the standalone version of bend.it over [the bend.it Server](http://pongor.itk.ppke.hu/dna/bend_it.html#/bendit_form) for processing more than a few sequences, and touch upon some of the benefits of having `bendIt` working in the Jupyter environment.

------

<div class="alert alert-block alert-warning">
<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!</p>

<p>
    Some tips:
    <ul>
        <li>Code cells have boxes around them.</li>
        <li>To run a code cell either click the <i class="fa-play fa"></i> icon on the menu bar above, or click on the cell and then hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook.</li>
        <li>While a cell is running a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running the asterisk will be replaced with a number.</li>
        <li>In most cases you'll want to start from the top of notebook and work your way down running each cell in turn. Later cells might depend on the results of earlier ones.</li>
        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>
    </ul>
</p>
</div>

----

### STEP 1 : UPLOAD SEQUENCES.

This demonstration scenario will analyze the properties of a series of sequence cassettes flanked by defined sequences. The curvature and bendability predictions for each resulting sequence will be plotted.

Click on and drag a file listing sequences in FASTA format from your computer into the file browser window to the left of this text.  
When the file is correctly dragged into the pane, a dashed, gray outline will appear and you can release your mouse button.

**TO RUN THE DEMO WITH A PROVIDED FASTA FILE: DON'T DRAG ANYTHING IN AND JUST GO AHEAD AND RUN THE CELLS BELOW. IT WILL USE THE FASTA FILE FROM THE bendit test folder ALREADY PRESENT HERE IF YOU DON'T UPLOAD ONE.**

Change the file extension to `.fa` or `.fasta` (`.faa`, `.fas`, and `.fsa` also work) if it isn't one of these already. To do that, right-click on the file name in the file navigation panel to the left, and select `Rename`.

You can also drag in more FASTA files and each one will be processed and treated as a separate sample set.
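
As a rough illustration of how the uploaded files could be picked up by extension, here is a minimal sketch. The function name `find_fasta_files` and the case-insensitive matching are assumptions made for this example; the actual script may locate files differently.

```python
from pathlib import Path

# Extensions mentioned above as recognized for FASTA input.
FASTA_EXTENSIONS = {".fa", ".fasta", ".faa", ".fas", ".fsa"}

def find_fasta_files(directory="."):
    """Return the FASTA files found directly in `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        # Assumption: extensions are compared case-insensitively.
        if p.is_file() and p.suffix.lower() in FASTA_EXTENSIONS
    )

# Each file found would be treated as its own sample set.
for fasta_path in find_fasta_files():
    print(fasta_path.name)
```

Running this in a scratch cell is a quick way to check which files in the working directory carry one of the listed extensions before you start the analysis.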

The sample set designation(s) will be derived from the sequence file name(s). What gets used to mark the end of the 'sample set' name can be adjusted below. For example, by default the file `A5_seq_set.fa` will yield the 'sample set' name `A5`. You may therefore wish to rename your file(s) at this time so that the names produce the sample set designations you want.
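
The naming rule described above can be sketched in a few lines. The helper `derive_sample_set_name` is a hypothetical name used only for illustration; it is not necessarily how the script implements the rule.

```python
from pathlib import Path

def derive_sample_set_name(fasta_file_name, end_character="_"):
    """Return the file-name text that precedes the first `end_character`."""
    stem = Path(fasta_file_name).stem       # "A5_seq_set.fa" -> "A5_seq_set"
    return stem.split(end_character, 1)[0]  # keep only the text before the delimiter

print(derive_sample_set_name("A5_seq_set.fa"))  # A5
```

If the delimiter does not occur in the file name, this sketch falls back to the whole name without its extension.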

Run the following code cells to process the sequence(s) to make the plot(s). Change any settings you need to as described.    
There are three ways to run a cell if you are not familiar with the JupyterLab interface.

- You can run the cell by clicking on it and pressing the `run` button, shaped like a triangle heading towards the right, that is on the utility bar above this notebook.

- Click on the cell to select it, and then under the `Run` menu above, choose `Run Selected Cells`.

- Click on the cell to select it, and type `Shift-Enter`, that is, hold down the shift key while pressing the enter key.

### STEP 2 : CHOOSE `bendIt` SETTINGS.

There are several options that can be set for running `bendIt`. They come with inherent defaults that can be seen by running `!bendIt --help` as a cell in this notebook.

Several of these are set to alternative settings using the following assignments.


Edit the text in the cell below to better reflect your choice of window for analysis, if you prefer. 

In [None]:
window_size = 3

In addition to predicting curvature, `bendIt` will also report G+C content, complexity, or a prediction of bendability. Here, we set this to `bendability`. To change it, edit the text between the quotes to `G+Ccontent` or `complexity`.

In [None]:
report_with_curvature = "bendability"

For the analysis, a series of sequences defined in the FASTA file will be analyzed in the context of defined flanking sequences. The cell below assigns the sequences that flank the sequence cassette that will be substituted successively and analyzed. If, instead of this swappable cassette scenario, you already have a multi-sequence FASTA file containing the sequences you wish to analyze, you can set the bracket sequences to nothing by changing the settings for the bracket sequences in the cell below to the following:

```python
#if you don't want defined flanking sequences added, use this code:
up_bracket = ""
down_bracket = ""
```

In [None]:
up_bracket = "gtaaaacgacggccagcatggaggtacaa"
down_bracket = "gggaggtacttccatggtcatagctgtt"
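
To make the cassette scenario concrete, the sketch below shows one way the sequence handed to `bendIt` could be assembled for each entry. It assumes the upstream bracket is simply prepended and the downstream bracket appended; `add_flanks` and the example cassette are hypothetical and only for illustration.

```python
up_bracket = "gtaaaacgacggccagcatggaggtacaa"
down_bracket = "gggaggtacttccatggtcatagctgtt"

def add_flanks(cassette, up=up_bracket, down=down_bracket):
    """Return the cassette placed between the upstream and downstream brackets."""
    # Assumption: plain concatenation, with no change of case or orientation.
    return up + cassette + down

cassette = "acgtacgtacgt"  # hypothetical cassette sequence
full_sequence = add_flanks(cassette)
print(full_sequence)
```

With both brackets set to empty strings, the same function returns the cassette unchanged, which matches the multi-sequence FASTA scenario described above.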

### STEP 2 (CONTINUED) : ADJUST SAMPLE SET HANDLING SETTINGS.

The script that will run here will use the file name of the uploaded multi-sequence FASTA file to determine a 'sample set name' to refer to the entire set of sequences and corresponding results. The text in the file name prior to the first occurrence of the character assigned to `character_to_mark_set_name_end` below (the default is an underscore) will be used as the name of the sample set. If you want to change the delimiter, edit `character_to_mark_set_name_end` in the pertinent cell later in this section.  
If you'd like to override that process altogether and designate a specific name yourself, edit the following cell so that `sample_set_name_extract_auto` is instead assigned `False`, and then add a line below it where you assign `sample_set_name` to the name you want to specify, as in the following, where you'd replace `Name_here` with the text to actually use:

```python
sample_set_name_extract_auto = False
sample_set_name = "Name_here"
```

(Because input data provided by users of the standalone version of bendIt deviated slightly from data-handling best practices by mixing the overarching label for the sample set in with the entries in the multi-FASTA file, the script will check whether a label for the sample set has been placed on the first line, above the listing of sequence entries, and will use that as the sample set name in that case. That is an alternative way to provide the sample set name.)
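
The first-line check described in the note above might look something like the following. This is only a sketch under the assumption that any non-blank text before the first `>` entry counts as the label; `read_set_label` is a hypothetical helper, not the script's own function.

```python
def read_set_label(fasta_lines):
    """Return a label line that precedes the first '>' entry, or None if there is none."""
    for line in fasta_lines:
        stripped = line.strip()
        if not stripped:
            continue        # skip blank lines at the top of the file
        if stripped.startswith(">"):
            return None     # the first content line is an ordinary FASTA entry
        return stripped     # text before any entry: treat it as the set label
    return None             # empty input

example_lines = ["A5 cassette series", ">cassette_01 first insert", "acgtacgt"]
print(read_set_label(example_lines))  # A5 cassette series
```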

In [None]:
sample_set_name_extract_auto = True

In [None]:
character_to_mark_set_name_end = "_"

By default, the first 'word' in the description line of each cassette sequence will be used as the individual sample name. If you'd rather specify a different delimiter for the individual sample names, change what is in the quotes on the following line to alter the `character_to_mark_individual_sample_name_end` setting.

In [None]:
character_to_mark_individual_sample_name_end = " "
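
As an illustration of the 'first word' rule, the sketch below extracts a sample name from a FASTA description line. The helper `individual_sample_name` and the example description line are hypothetical.

```python
def individual_sample_name(description_line, end_character=" "):
    """Return the description-line text that precedes the first `end_character`."""
    text = description_line.lstrip(">").strip()  # drop the FASTA '>' marker
    return text.split(end_character, 1)[0]

print(individual_sample_name(">cassette_01 synthetic test insert"))  # cassette_01
```

Passing a different `end_character`, for example `"|"`, mirrors changing the setting in the cell above.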

In order to accommodate a user's request to make the resulting plots look similar to what is seen in Excel when graphing with the 'Smooth Lines' setting, the plots are 'smoothed'. If you don't want that, set `smooth_plot_curves` to `False` on the line below.

In [None]:
smooth_plot_curves = True
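
For readers curious what 'smoothing' a plotted curve can involve, one common approach is to interpolate extra points between the data points with a spline, so that the drawn line bends gently through each value. The sketch below uses a Catmull-Rom spline in plain Python. This is an assumption chosen for illustration and is not necessarily the method the script uses; `catmull_rom` and the example points are hypothetical.

```python
def catmull_rom(points, samples_per_segment=10):
    """Return a denser list of (x, y) points along a Catmull-Rom spline."""
    if len(points) < 2:
        return list(points)
    # Repeat the end points so the first and last segments have neighbors.
    padded = [points[0]] + list(points) + [points[-1]]
    smooth = []
    for i in range(1, len(padded) - 2):
        p0, p1, p2, p3 = padded[i - 1], padded[i], padded[i + 1], padded[i + 2]
        for step in range(samples_per_segment):
            t = step / samples_per_segment
            # Standard Catmull-Rom basis, applied to x and y separately.
            smooth.append(tuple(
                0.5 * (2 * b + (c - a) * t
                       + (2 * a - 5 * b + 4 * c - d) * t ** 2
                       + (-a + 3 * b - 3 * c + d) * t ** 3)
                for a, b, c, d in zip(p0, p1, p2, p3)
            ))
    smooth.append(tuple(points[-1]))  # close the curve on the final data point
    return smooth

curve = catmull_rom([(0, 0), (1, 1), (2, 0)], samples_per_segment=4)
print(len(curve))  # 9
```

The interpolated curve still passes through every original data point, so the predicted values themselves are not altered; only the line drawn between them changes.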

In order to accommodate a user's request to allow a complex sample naming scheme in the description line, there is a sanitization step early on to avoid issues in bendIt when processing files with `/`, `|`, and parentheses; later, the script tries to substitute the offending characters back into the plot title, relying on the pattern. Every effort is made to restrict the substitution to the presumed user pattern; however, set `show_date_with_slashes_in_plot_title` to `False` on the line below if dash characters are showing up as forward slashes in your plot title, if your underscores are showing as `|`, or if `+` characters are being converted to parentheses. Likewise, if you are using the offending characters as part of the text that becomes the sample names, you can edit the script's conditional code block that begins `if show_date_with_slashes_in_plot_title` to reverse the sanitizing step for the plot title, following the built-in example as a guide.

In [None]:
show_date_with_slashes_in_plot_title = True
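
The forward sanitization step can be pictured roughly as below. The character mapping is inferred from the symptoms described above (`/` becoming `-`, `|` becoming `_`, and parentheses becoming `+`); the actual script may differ, and `sanitize_name` and the example name are hypothetical.

```python
# Mapping inferred from the text above; treat it as an assumption.
SANITIZE_TABLE = str.maketrans({"/": "-", "|": "_", "(": "+", ")": "+"})

def sanitize_name(name):
    """Replace characters that are troublesome in file names with safer ones."""
    return name.translate(SANITIZE_TABLE)

print(sanitize_name("5/12/21|rep(1)"))  # 5-12-21_rep+1+
```

Note that the reverse step cannot be a simple character swap: a `+` could stand for either parenthesis, and a `-` or `_` may have been in the original name. That is why the script has to rely on the expected naming pattern when restoring the plot title.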

The pipeline here generates plots of the data using Python. This is meant to produce plots closer to presentation or publication quality from each analyzed sequence without the need for further post-processing. Those familiar with the bendit server may know that you can choose to have the server output 'raw' gnuplot-generated plots of the data. The standalone version running here generates those 'raw gnuplots' by default. The setting below keeps the 'raw gnuplot' output as part of the collected output. However, once you establish that the pipeline results in the same plots as the bendit server, you may wish to not collect these plots as part of the output in order to keep file sizes smaller. You can then change the following cell to read:

```python
include_gnuplots = False
```

It is purposefully set to `True` below by default to encourage you to verify that your data indeed gives the same results before and after any post-processing in this pipeline, and to allow direct comparison to the plots produced by the bendit server.

In [None]:
include_gnuplots = True

Now with the sequences uploaded and the settings assigned, you are ready to begin the analysis by `bendIt`.

### STEP 3. ANALYZE THE SEQUENCES WITH `bendIt`.

To start analyzing the sequences, run the next cell. This will take some time to run; however, feedback will be provided at several steps. When completed, results will be shown and below you'll be given options to collect the produced data.

(For those interested in making a custom workflow that uses bendIt, you'll want to examine the script to try to adapt it to your needs.)

In [None]:
%run bendIt_analysis.ipy

### STEP 4. REVIEW THE RESULTS AND COLLECT FILES.

The animations should be shown just above with labels for each below the gif.

Before you run things again to select a different animation to produce, you'll want to bring any worthy results from the remote session to your own computer. **This session will go stale without any activity in 10 minutes**, and so this is a **very important step if you don't want to run things again**.

If you only produced a handful of animations, you can right-click on the name of the animation in the file browser panel on the left side of this browser window and select `Download` from the menu option.

Be sure to slide the border of the pane to the right if the names are being cut off.

If you produced a lot of animations, you'll want to get the (zipped) archive in which all the animations produced for that run have been packed up into a single file for convenience. The name was given above for any run that produced more than one file, so find that file (it ends in `.zip`) and download it. The notes above will also tell you how to unpack it later if you are not familiar with this type of file.
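
If you have Python on your own computer, one way to unpack a downloaded `.zip` archive is with the standard-library `zipfile` module. This is a minimal sketch; the function name, the destination folder, and the archive name in the usage comment are all hypothetical.

```python
import zipfile

def unpack_archive(zip_path, destination="unpacked_results"):
    """Extract every file in the archive into `destination` and list the names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(destination)
        return archive.namelist()

# Usage, replacing the placeholder with the archive name reported above:
# unpack_archive("name_of_your_archive.zip")
```

Most operating systems will also unpack a `.zip` file if you double-click it or right-click it and choose an extract option.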

### STEP 5. RESET THINGS OR ADJUST SETTINGS AND RUN THROUGH AGAIN?

Not quite what you needed? Or need more? Or need a different animation choice?

If you want to use the same PyMOL session files, you can just scroll back up to the top and adjust the settings and run the cells again. 

If you only need some of the same session files, you can use the file browser pane at the left to remove those files you don't need by right-clicking on them and selecting `Delete`.

If you need to start with a different set of PyMOL session files, you can make a new cell with the '+' button above and paste the following code in that cell and run it to delete all the current PyMOL session files.

```python
!rm -rf *.pse
```

Then you can select `Kernel` > `Restart` from the menu and begin the process again by uploading new PyMOL `.pse` files and working through again.

Enjoy.

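In [None]:
import time

def keep_session_alive():
    # Print a dot, then wait. With the 10-minute idle limit noted above,
    # this is presumably meant to keep the session registering activity.
    print('.')
    time.sleep(480)  # 8 minutes x 60 seconds

# Runs until the kernel is interrupted or restarted.
while True:
    keep_session_alive()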