# Demonstration of the `summarize_all_nts_even_ambiguous_present_in_FASTA.py` script

If you'd like an active Jupyter session to run this notebook, launch one by clicking [here](https://mybinder.org/v2/gh/fomightez/cl_sq_demo-binder/master?filepath=index.ipynb), and then select 'Demo of script to summarize all nts, even ambiguous, appearing in FASTA file' from the available notebooks listed there.  
Otherwise, a static version of this notebook renders more nicely [here](https://nbviewer.org/github/fomightez/cl_sq_demo-binder/blob/master/notebooks/demo%20summarize_all_nts_even_ambiguous_present_in_FASTA.ipynb).

<div class="alert alert-block alert-warning">
<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!</p>

<p>
    Some tips:
    <ul>
        <li>Code cells have boxes around them. When you hover over them a <i class="fa-step-forward fa"></i> icon appears.</li>
        <li>To run a code cell either click the <i class="fa-step-forward fa"></i> icon, or click on the cell and then hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook.</li>
        <li>While a cell is running a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running the asterisk will be replaced with a number.</li>
        <li>In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.</li>
        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>
    </ul>
</p>
</div>

You'll need the current version of the script to run this notebook, and the next cell will get that. (Remember that if you want to make things more reproducible when you use the script with your own data, you'll want to edit calls such as this to fetch a specific version of the script. How to do this is touched upon in the comment below [here](https://stackoverflow.com/a/48587645/8508004).)
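
As a rough sketch of what pinning a version could look like: raw GitHub URLs accept a commit SHA or tag in place of the branch name. The `COMMIT` value and the `pinned_raw_url` helper below are hypothetical placeholders for illustration only; substitute a real commit from the repository history.

```python
# Hypothetical placeholder: replace with a real commit SHA (or tag) from the
# fomightez/sequencework repository history.
COMMIT = "0123456789abcdef0123456789abcdef01234567"


def pinned_raw_url(ref: str) -> str:
    """Build the raw.githubusercontent.com URL for the script at a given git ref."""
    return (
        "https://raw.githubusercontent.com/fomightez/sequencework/"
        f"{ref}/Extract_Details_or_Annotation/"
        "summarize_all_nts_even_ambiguous_present_in_FASTA.py"
    )


# Passing "master" reproduces the unpinned URL used in the next cell;
# passing a commit SHA fixes the download to that exact version.
print(pinned_raw_url(COMMIT))
```

In a notebook cell, the result could then be handed to curl with IPython's brace expansion, for example `!curl -O {pinned_raw_url(COMMIT)}`.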

In [1]:
!curl -O https://raw.githubusercontent.com/fomightez/sequencework/master/Extract_Details_or_Annotation/summarize_all_nts_even_ambiguous_present_in_FASTA.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 31034  100 31034    0     0   159k      0 --:--:-- --:--:-- --:--:--  158k


In [2]:
%pip install rich

Collecting rich
  Downloading rich-12.6.0-py3-none-any.whl (237 kB)
     |████████████████████████████████| 237 kB 4.6 MB/s            
Collecting commonmark<0.10.0,>=0.9.0
  Downloading commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
     |████████████████████████████████| 51 kB 6.7 MB/s             
Installing collected packages: commonmark, rich
Successfully installed commonmark-0.9.1 rich-12.6.0
Note: you may need to restart the kernel to use updated packages.


## Display Usage / Help Block

In [3]:
%run summarize_all_nts_even_ambiguous_present_in_FASTA.py -h

usage: summarize_all_nts_even_ambiguous_present_in_FASTA.py
       [-h] [-cs] [-mc] SEQUENCE_FILE

summarize_all_nts_even_ambiguous_present_in_FASTA.py takes a sequence file
(FASTA-format) and summarizes the nts in the sequence(s). Assumes multi-FASTA,
but single sequence entry is fine, too. When running on the command line, it
will print out a summary table of counts of nucleotides and other character in
each sequence and totals. When calling the main function it will, by default,
return a dataframe with this information. Only valid for DNA sequences; script
has no step checking for data type, and so you are responsible for verifying
appropriate input. By default, the summary is case-insensitive; however, a
flag can be added at the time of calling the script so that it will tally the
lowercase & uppercase letters separately in the summary data table. You can
also choose to have the summary table display in the terminal in a simpler,
color-less form. **** Script by Wayne Decatur (fomig

To read more about this script beyond that and what is covered below, see [here](https://github.com/fomightez/sequencework/tree/master/AdjustFASTA_or_FASTQ).

-----

## Basic use example set #1: Using from the command line (or equivalent / similar)

### Preparing for usage example

In [4]:
#write example FASTA to file
s = '''>evoli
atctgatctggggcgaaatgagactgatctgatctggtctgtggcgQqQ*
>smer
atctgaatctgagactatatgagactgatctgatctgctctgaagc
'''

!echo "{s}" > sequence.fa
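
If you would rather not rely on the shell for this step, the same file can be written from plain Python. This is only an alternative sketch; the variable name `fasta_text` is an arbitrary choice here, and the file content matches the example above.

```python
from pathlib import Path

# Same two example records as above, including the non-nucleotide
# characters (Q, q, *) at the end of the first sequence.
fasta_text = """>evoli
atctgatctggggcgaaatgagactgatctgatctggtctgtggcgQqQ*
>smer
atctgaatctgagactatatgagactgatctgatctgctctgaagc
"""

# Write the text to sequence.fa in the current working directory.
Path("sequence.fa").write_text(fasta_text)
```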

### Run the script

In [5]:
%%bash
python summarize_all_nts_even_ambiguous_present_in_FASTA.py sequence.fa

┏━━━━━━━┳━━━━┳━━━━┳━━━━┳━━━━┳━━━━━┳━━━━━┳━━━┳━━━━━━━━━━━┳━━━━━┓
┃       ┃  A ┃  T ┃  C ┃  G ┃   Q ┃   * ┃ N ┃ Total_nts ┃ % N ┃
┡━━━━━━━╇━━━━╇━━━━╇━━━━╇━━━━╇━━━━━╇━━━━━╇━━━╇━━━━━━━━━━━╇━━━━━┩
│ evoli │  9 │ 13 │  8 │ 16 │ 3.0 │ 1.0 │ 0 │        50 │ 0.0 │
│  smer │ 13 │ 14 │  9 │ 10 │ 0.0 │ 0.0 │ 0 │        46 │ 0.0 │
│ TOTAL │ 22 │ 27 │ 17 │ 26 │ 3.0 │ 1.0 │ 0 │        96 │ 0.0 │
└───────┴────┴────┴────┴────┴─────┴─────┴───┴───────────┴─────┘


2 sequences provided in the sequence file.



Note that the table summarizing the nts present **will have color when run in an actual terminal**. (Assuming it is a modern terminal that supports at least a few colors.)

That can be simulated in a notebook, like so:

In [6]:
%run summarize_all_nts_even_ambiguous_present_in_FASTA.py sequence.fa

2 sequences provided in the sequence file.



This is like using the `%%bash` magic shown above, but with color! This works because `%run` is special and integrates well with Jupyter.

However, those used to working in a terminal may prefer no color. That is possible while using the full-featured `%run` to call the script by adding the `--mono` flag:

In [7]:
%run summarize_all_nts_even_ambiguous_present_in_FASTA.py --mono sequence.fa

2 sequences provided in the sequence file.



That `--mono` flag can be abbreviated `-mc`.

You can use a different flag to make the accounting of the letters in the sequence case-sensitive, like so:

In [8]:
%run summarize_all_nts_even_ambiguous_present_in_FASTA.py -cs sequence.fa

2 sequences provided in the sequence file.



The full version of that flag, if you want to write it out, would be `--case_sensitive`.  
If you want the case-sensitive summary, I'd suggest also collecting the 'default', case-insensitive summary table, as I think they complement each other well.
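
To make the difference between the two modes concrete, here is a minimal sketch of case-insensitive versus case-sensitive tallying. It is not the script's actual implementation; `parse_fasta` and `tally` are hypothetical helper names used only for this illustration.

```python
from collections import Counter

EXAMPLE = """>evoli
atctgatctggggcgaaatgagactgatctgatctggtctgtggcgQqQ*
>smer
atctgaatctgagactatatgagactgatctgatctgctctgaagc
"""


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The whole header line (minus '>') is used as the record name.
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def tally(seq, case_sensitive=False):
    """Count every character; fold to uppercase unless case_sensitive is set."""
    return Counter(seq if case_sensitive else seq.upper())


for name, seq in parse_fasta(EXAMPLE):
    print(name, "case-insensitive:", dict(tally(seq)))
    print(name, "case-sensitive:  ", dict(tally(seq, case_sensitive=True)))
```

For `evoli`, the case-insensitive tally groups all three `Q`/`q` characters together, while the case-sensitive tally separates two `Q` from one `q`, which is why the two summaries complement each other.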

------

## Basic use example set #2: Use the main function via import

This is very useful when working in a Jupyter notebook and building the script into a pipeline or workflow.

Prepare first by importing the main function from the script into the notebook environment.

In [9]:
from summarize_all_nts_even_ambiguous_present_in_FASTA import summarize_all_nts_even_ambiguous_present_in_FASTA

(That call will look redundant; however, it actually means *from the file* `summarize_all_nts_even_ambiguous_present_in_FASTA.py`  *import the* `summarize_all_nts_even_ambiguous_present_in_FASTA()` *function*.)

With the main function imported into the namespace, we are ready to call it. The one required argument is the sequence file. Optionally, you can make the tally case-sensitive with `case_sensitive=True`.

The function will return a dataframe.

In [10]:
df = summarize_all_nts_even_ambiguous_present_in_FASTA("sequence.fa",case_sensitive=True)
df

2 sequences provided in the sequence file.



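|       | a  | t  | c  | g  | Q   | q   | *   | N | n | Total_nts | % N | % n | % N&n |
|-------|----|----|----|----|-----|-----|-----|---|---|-----------|-----|-----|-------|
| evoli | 9  | 13 | 8  | 16 | 2.0 | 1.0 | 1.0 | 0 | 0 | 50        | 0.0 | 0.0 | 0.0   |
| smer  | 13 | 14 | 9  | 10 | 0.0 | 0.0 | 0.0 | 0 | 0 | 46        | 0.0 | 0.0 | 0.0   |
| TOTAL | 22 | 27 | 17 | 26 | 2.0 | 1.0 | 1.0 | 0 | 0 | 96        | 0.0 | 0.0 | 0.0   |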


The `mono` setting is moot when using the main function inside Python. It only applies when running the script in a terminal or a terminal-equivalent setting.


----

Enjoy!

Upload your own sequence files to any running Jupyter session and adapt the commands in this notebook to summarize the nts within them. Edit the notebook or copy the necessary cells to make the script work with your own data.

----
### ADVANCED DEVELOPMENT NOTE

If you are editing the script (***ATYPICAL***) and testing changes by importing the main function here in this Jupyter notebook, you'll need to run the following code after each edit to trigger import of the updated version of the function. Otherwise, short of restarting the kernel, Python considers the module already imported, and a repeated import statement will not pick up your changes.

In [11]:
# Run this to have new code reflected in the version of the function in memory within the notebook namespace
import importlib
import summarize_all_nts_even_ambiguous_present_in_FASTA
importlib.reload(summarize_all_nts_even_ambiguous_present_in_FASTA)
from summarize_all_nts_even_ambiguous_present_in_FASTA import summarize_all_nts_even_ambiguous_present_in_FASTA
# reload approach above from https://stackoverflow.com/a/11724154/8508004

----
