# Fasta2Structure use anywhere WITH NO LOCAL INSTALLATIONS

Fasta2Structure prepares data in the specific format required by STRUCTURE, the population genetics research program developed by [Pritchard et al. 2000](https://pubmed.ncbi.nlm.nih.gov/10835412/). See more about that software [here](https://web.stanford.edu/group/pritchardlab/structure.html), with [the current version of the STRUCTURE software (v. 2.3.4) and documentation available here](https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure.html).  
Fasta2Structure is described in [Bessa-Silva 2024 'Fasta2Structure: a user-friendly tool for converting multiple aligned FASTA files to STRUCTURE format'](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-024-05697-7).

This notebook demonstrates using a modified version of the Fasta2Structure script, `improved_Fasta2Structure.py`, in Jupyter and on the command line. The latter means it will be useful almost anywhere, such as on remote machines or computer clusters. There is also the original Tkinter-based Python script by Adam Bessa that you can run on your desktop, if you prefer. To try that in a virtual desktop without needing to install anything on your own system, go [here](https://gist.github.com/fomightez/e65761a066f56cbbc4c9b5b882c87380) and find a step-by-step guide. It will not be as easy to use as the examples that I walk through in this Jupyter notebook file.

Importantly, what is here will demonstrate using the improved Fasta2Structure script on the command line, right in your favorite web browser, **without the need to install or do anything on your own system**.  
The section below mentions that you can still use the original in the GUI and links to a resource where you can step through doing that with no installations on your own machine. I want to point out, though, that you can use the improved version of the script in that manner, too. This repo won't step through using the GUI with that script; however, you can easily do that by following the steps to use the original script and substituting the appropriate script file at that point in the instructions. I still emphasize the original in the instructions referenced below for two prosaic reasons: (1) I originally wrote the instructions before the improved version existed, and (2) the improved version builds on the original code and has not yet had the extent of review & usage of the original, so I want to make it clear how one can easily test them in the same situation if concerned.

-----

##### Absolutely need to try out the version currently in AdamBessa's Fasta2Structure repo & yet rather not touch your system?

You can still try the original GUI-based (Tkinter) software presently available at https://github.com/AdamBessa/Fasta2Structure without installing anything on your computer. You can go [here](https://gist.github.com/fomightez/e65761a066f56cbbc4c9b5b882c87380) and find a step-by-step guide to using a remote virtual desktop to test the Fasta2Structure script. However, you'll find it isn't as convenient as what is provided here.
    
----
    


#### General Usage

You can easily see the general usage information by running with the `--help` flag, which can be shortened as `-h` as seen below.

In [1]:
%run improved_Fasta2Structure.py -h

usage: improved_Fasta2Structure.py [-h] INPUT_FASTA [INPUT_FASTA ...]

improved_Fasta2Structure.py converts Multiple Aligned FASTA Files to STRUCTURE Format.
It can be used in two ways:
1. GUI mode: When run without arguments in a desktop environment, it launches a GUI.
2. Command-line mode: Used with arguments to process FASTA files directly.

Command-line mode means it can run where used on a remote server without a grahical 
display serving Jupyter (headless) or integrated into workflow managment tools like
Snakemake & NextFlow.

Command-line usage examples:
- Single multi-sequence FASTA file: python improved_Fasta2Structure.py my_fasta.fa
- Multiple FASTA files: python improved_Fasta2Structure.py file1.fa file2.fa file3.fa 
- Directory with FASTA files: python improved_Fasta2Structure.py path/to/fasta/directory

Jupyter usage examples for situations where graphical display not connected or opting for text-based only:
- Single multi-sequence FASTA file: %run improved_Fasta2Structure

If the script is called in a situation where it cannot connect to a graphical display to launch the GUI for selecting the files to act on, such as when used on a remote headless server like here, the usage information will also print if the script is called and no arguments are supplied indicating the files to process.  
You can demonstrate that if you wish by editing the cell above to remove the `-h` and then executing the command `%run improved_Fasta2Structure.py` alone.
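The behavior described above can be pictured as a small decision: arguments mean command-line mode, no arguments plus a reachable display means GUI mode, and no arguments without a display means printing the usage text. The following is only a sketch of that logic and not the script's actual code. The names `display_available` and `choose_mode` are hypothetical, and checking the `DISPLAY`/`WAYLAND_DISPLAY` environment variables is an assumed heuristic for detecting a display on Linux.

```python
import os
import sys


def display_available(environ=None, platform=None):
    """Heuristic guess at whether a graphical display can be reached."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        # X11 and Wayland sessions normally export one of these variables.
        return bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    # Assumption: Windows and macOS desktops usually have a display.
    return True


def choose_mode(file_args, gui_ok):
    """Return 'cli', 'gui', or 'usage' following the behavior described above."""
    if file_args:
        return "cli"
    return "gui" if gui_ok else "usage"


print(choose_mode(["my_fasta.fa"], gui_ok=False))
print(choose_mode([], gui_ok=True))
print(choose_mode([], gui_ok=display_available({}, "linux")))
```

On a headless server with no file arguments, the last call lands in the 'usage' branch, which matches what you see when running `%run improved_Fasta2Structure.py` alone here.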

## Practical Usage Examples on the command line and in Jupyter

I point to how to use the software the GUI way at the end of the header section, just above the 'General Usage' section.  
The rest of this notebook will cover using the improved script in Jupyter/command line situations.

#### Scenario: Use individual filepaths/filenames

This example will be more detailed than the following scenarios; however, you can use these same approaches to explore the results of the additional scenarios.

In [2]:
%run improved_Fasta2Structure.py Example_data/Datasets/ITS.fas Example_data/Datasets/trnD-trnT.fas Example_data/Datasets/trnH-trnK.fas 

Converted files saved as: Structure.str


You'll see it say, '`Converted files saved as: Structure.str`'. When given multiple input files, the script saves the result with a generic file name, `Structure.str`.

(I'll add here a reminder that if that command wasn't already written out in the cell above, you could get help writing the file paths by using the '`Tab`' key to trigger autocompletion on the parts of the paths. For example, after `%run improved_Fasta2Structure.py ` start writing `Exa` and then hit the '`Tab`' key twice and you'll see it autocomplete to `Example_data/`. You can keep doing that with each part.)

Let's demonstrate the result is saved as a generic file name by listing the current files.

In [3]:
ls

binder/             Fasta2Structure_Windows/     log.log        tests/
Example_data/       improved_Fasta2Structure.py  README.md
Fasta2Structure.py  index.ipynb                  Structure.str


Next we'll rename `Structure.str` to a distinguishable name before running the script again, because the script may well save different data under the same name. (We'll do the same with the `log.log` that also gets saved when the script executes, although the script gives no feedback about it.) But first I want to point out that the `%run improved_Fasta2Structure.py` command above is the version for running this in Jupyter. If you were to run the equivalent command purely on the command line, in say a terminal/console, the command would be along the lines of the following on your system:

```shell
python improved_Fasta2Structure.py Example_data/Datasets/ITS.fas Example_data/Datasets/trnD-trnT.fas Example_data/Datasets/trnH-trnK.fas 
```
(It is more rare these days because we are farther from the days where Python 2 and 3 were both in play, yet on some machines you'll need `python3` instead of `python`.)

You can run that variation here as well, but for doing that in a Jupyter cell you need to add an exclamation point in front of the `python`. I stick with using the `%run` here in the demonstration notebook because you get a better experience with other, more complex scripts in Jupyter using `%run` and so it is good to be familiar with the best practice for calling scripts in a running Jupyter notebook.
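If you would rather launch the script from inside Python code, for instance from another script, the standard library's `subprocess` module offers a third route. The following is a minimal sketch under that assumption; `build_command` is a hypothetical helper, and the run is skipped when the script is not found in the working directory.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script, inputs):
    """Build the argument list for running a script with the current interpreter."""
    # sys.executable avoids guessing between `python` and `python3`.
    return [sys.executable, str(script), *map(str, inputs)]


script = Path("improved_Fasta2Structure.py")
inputs = [
    "Example_data/Datasets/ITS.fas",
    "Example_data/Datasets/trnD-trnT.fas",
    "Example_data/Datasets/trnH-trnK.fas",
]
command = build_command(script, inputs)

if script.exists():
    # Only run when the script is actually present in the working directory.
    completed = subprocess.run(command, capture_output=True, text=True)
    print(completed.stdout)
else:
    print("Script not found; the command would have been:", command)
```

Passing the arguments as a list means no shell is involved, so file names containing spaces do not need quoting.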

Let's return to renaming the two output files so it is clear when & where they originated before we run the script again and clobber that output with what may well be different data under the same name. Running the next cell will do that for the two output files. 

In [4]:
!mv Structure.str all_three_examples_called_individually_Structure.str
!mv log.log all_three_examples_called_individually_log.log
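The same renaming can be done in Python, which may be handy where `mv` is unavailable or when you want to repeat the step with different prefixes. This is a sketch only; `rename_outputs` is a hypothetical helper, and it assumes the two outputs are always named `Structure.str` and `log.log`, as seen above. The demonstration runs on placeholder files in a temporary directory so it does not touch your real results.

```python
import tempfile
from pathlib import Path


def rename_outputs(prefix, directory="."):
    """Rename Structure.str and log.log in `directory` by adding `prefix`."""
    renamed = []
    for name in ("Structure.str", "log.log"):
        source = Path(directory) / name
        if source.exists():
            target = source.with_name(f"{prefix}_{name}")
            source.rename(target)
            renamed.append(target.name)
    return renamed


# Demonstrate on placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Structure.str", "log.log"):
        (Path(tmp) / name).write_text("placeholder\n")
    print(rename_outputs("demo_run", tmp))
```

In the notebook's working directory, `rename_outputs("all_three_examples_called_individually")` would be the counterpart of the two `!mv` commands.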

And we can confirm that worked by listing the files again.

In [5]:
ls 

all_three_examples_called_individually_log.log
all_three_examples_called_individually_Structure.str
binder/
Example_data/
Fasta2Structure.py
Fasta2Structure_Windows/
improved_Fasta2Structure.py
index.ipynb
README.md
tests/


Let's examine the top lines of the main result:

In [6]:
!head all_three_examples_called_individually_Structure.str

AgMRJ10_1 3  1  3  0  0  2  2  1  3  2  3  3  3  2  2  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  0  1  0  2  0  1  1  2  0  3  2  2  2  0  2  2  0  3  -9  2  3  0  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ10_2 3  1  3  0  0  2  2  1  3  2  3  3  3  2  2  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  0  3  2  2  2  0  2  2  0  3  -9  2  3  0  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ12_1 3  1  3  0  0  2  2  1  3  2  3  3  3  2  1  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  3  3  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ12_2 3  1  3  0  0  2  2  1  3  2  3  3  3  2  1  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  3  3  2  2  2  0  2  2  0  3  3  2

Oddly, that looks similar to what the original script author provides as an example result in [`Example_data/Results/Structure.str`](https://github.com/AdamBessa/Fasta2Structure/blob/b88da439a0dca47ef94ac1f1cd8ffe0f30703ce0/Example_data/Results/Structure.str), but there are two issues that I describe in an issue posted [here](https://github.com/AdamBessa/Fasta2Structure/issues/4). I built in tests to make sure the `improved_Fasta2Structure.py` script gives the same result as when I run the GUI with a few input examples, including what results [from that situation when I use the GUI version](https://github.com/AdamBessa/Fasta2Structure/assets/4700990/4ea587b2-fdba-4755-9b68-309db41488c1) at this time. So for now, if the top of yours looks like the following, then it should be correct as far as I can vouch for at this time.

```text
AgMRJ10_1 3  1  3  0  0  2  2  1  3  2  3  3  3  2  2  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  0  1  0  2  0  1  1  2  0  3  2  2  2  0  2  2  0  3  -9  2  3  0  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ10_2 3  1  3  0  0  2  2  1  3  2  3  3  3  2  2  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  0  3  2  2  2  0  2  2  0  3  -9  2  3  0  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ12_1 3  1  3  0  0  2  2  1  3  2  3  3  3  2  1  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  3  3  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ12_2 3  1  3  0  0  2  2  1  3  2  3  3  3  2  1  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  3  3  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ14_1 3  1  3  0  0  2  2  1  3  2  3  3  3  2  2  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  0  1  0  2  0  1  1  2  0  3  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ14_2 3  1  3  0  0  2  2  1  3  2  3  3  3  2  2  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  0  1  0  2  0  1  1  2  0  3  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ17_1 3  1  3  0  0  2  2  1  3  2  3  3  3  2  1  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  3  3  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ17_2 3  1  3  0  0  2  2  1  3  2  3  3  3  2  1  3  0  1  2  2  0  2  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  3  3  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ19_1 3  1  3  0  0  2  2  1  3  2  3  3  3  2  2  3  0  1  2  2  0  1  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  0  3  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
AgMRJ19_2 3  1  3  0  0  2  2  1  3  2  3  3  3  2  2  3  0  1  2  2  0  1  3  2  1  2  0  2  1  1  1  3  3  1  0  2  0  1  1  2  0  1  2  2  2  0  2  2  0  3  3  2  3  2  1  3  1  1  2  1  1  1  0  2  0  1  2  2  3  2  2  2  1  0  2  3  2  1  0  0  3  1  3  3  -9  1  0  1  1 
```
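Beyond eyeballing the top lines, a few lines of Python can check that every row carries the same number of values. This is a sketch under two assumptions: each row is a label followed by whitespace-separated integer codes, as in the lines shown above, and `-9` marks missing data. The helper names are hypothetical.

```python
MISSING = "-9"  # assumption: -9 marks missing data in this output


def summarize_structure_line(line):
    """Return (label, number of values, number of missing values) for one row."""
    label, *values = line.split()
    return label, len(values), values.count(MISSING)


def rows_have_equal_length(lines):
    """Return True when every non-blank row has the same number of values."""
    counts = {summarize_structure_line(line)[1] for line in lines if line.strip()}
    return len(counts) == 1


sample_rows = [
    "AgMRJ10_1 3  1  3  0  -9  2",
    "AgMRJ10_2 3  1  3  0  -9  1",
]
print(summarize_structure_line(sample_rows[0]))
print(rows_have_equal_length(sample_rows))
```

To apply it to the real result, pass the lines of the file, for example `rows_have_equal_length(open("all_three_examples_called_individually_Structure.str"))`.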

And the log for that:

In [7]:
!cat all_three_examples_called_individually_log.log

root - INFO - 3 FASTA files selected.
root - INFO - Variable sites for Example_data/Datasets/ITS.fas: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 73, 78, 93, 96, 102, 106, 110, 122, 126, 131, 132, 141, 159, 178, 179, 180, 184, 200, 297, 391, 424, 447, 478, 492, 495, 496, 500, 501, 530, 574, 579, 612, 630, 633, 640, 647, 648, 649, 650, 651]
root - INFO - Variable sites for Example_data/Datasets/trnD-trnT.fas: [107, 292, 297, 366, 477, 592, 610, 627, 645, 721]
root - INFO - Variable sites for Example_data/Datasets/trnH-trnK.fas: [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 14, 15, 16, 18, 23, 28, 54, 61, 91, 122, 162, 268, 324, 507, 701, 777, 786]


The developer of the original script `Fasta2Structure.py`, which uses a GUI to select the files, provided the results that should be obtained from the example data. Let's compare that to your results.

In [8]:
!cat Example_data/Results/log.log

root - INFO - 3 FASTA files selected.
root - INFO - Variable sites for C:/Users/adam-/OneDrive/�rea de Trabalho/Artigo_BMC/Exemple-Data/Avicennia-ITS_Phase.fas: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 73, 78, 93, 96, 102, 106, 110, 122, 126, 131, 132, 141, 159, 178, 179, 180, 184, 200, 297, 391, 424, 447, 478, 492, 495, 496, 500, 501, 530, 574, 579, 612, 630, 633, 640, 647, 648, 649, 650, 651]
root - INFO - Variable sites for C:/Users/adam-/OneDrive/�rea de Trabalho/Artigo_BMC/Exemple-Data/Avicennia-trnD-trnT_ediphase.fas: [107, 292, 297, 366, 477, 592, 610, 627, 645, 721]
root - INFO - Variable sites for C:/Users/adam-/OneDrive/�rea de Trabalho/Artigo_BMC/Exemple-Data/Avicennia-trnH-trnK_editphase.fas: [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 14, 15, 16, 18, 23, 28, 54, 61, 91, 122, 162, 268, 324, 507, 701, 777, 786]


That looks sort of like what we got for `all_three_examples_called_individually_log.log`; however, it would be best to validate precisely that it is the same.

Running the next cell will compare the elements in each of the lists of 'Variable sites' to see if they match.

In [9]:
#compare numbers in the Variable sites list for provided `Example_data/Results/log.log` vs. all_three_examples_called_individually_log.log
# File paths
original_file_path = "Example_data/Results/log.log"
new_file_path = "all_three_examples_called_individually_log.log"
def parse_number_FASTA_selected(file_path):
    with open(file_path, 'r', encoding='utf-8', errors='replace') as file:
        log_content=file.read()
        return log_content.split("FASTA files selected.",1)[0].split()[-1].strip()

def extract_tag(file_path, process_number):
    with open(file_path, 'r', encoding='utf-8', errors='replace') as file:
        log_content=file.read()
        if 'Avicennia-' in log_content:
            return log_content.split(".fas",process_number)[process_number-1].split("Avicennia-")[1].split("_",1)[0]+".fas"
        else:
            return log_content.split(".fas",process_number)[process_number-1].split("/")[-1]+".fas"
def parse_variable_sites_info(file_path,process_number):
    with open(file_path, 'r', encoding='utf-8', errors='replace') as file:
        log_content=file.read()
        return "["+log_content.split("[",process_number)[process_number].split("]")[0]+"]"
def compare_log_info(file1_path, file2_path):
    # parse out number of files selected
    sel1 = parse_number_FASTA_selected(file1_path)
    sel2 = parse_number_FASTA_selected(file2_path)
    assert sel1 == sel2, f"Files have different number of FASTA files selected: {sel1} vs {sel2}"
    # make dictionary of variable sites with tags in FASTA file names as keys
    tag_and_variable_sites_f1 = {}
    tag_and_variable_sites_f2 = {}
    for i in range(int(sel1)):
        tag1 = extract_tag(file1_path,i+1)
        tag_and_variable_sites_f1[tag1] = parse_variable_sites_info(file1_path, i+1)
        tag2 = extract_tag(file2_path,i+1)
        tag_and_variable_sites_f2[tag2] = parse_variable_sites_info(file2_path, i+1)
    # Now iterate on the dictionary and compare the variable sites for each tag.
    for fastafile,vs1 in tag_and_variable_sites_f1.items():
        print(f"Comparison for sites related to the `{fastafile}` file:")
        vs2 = tag_and_variable_sites_f2[fastafile]
        if vs1 == vs2:
            print(f"  No deviations found.")
        else:
            orig_set = set([int(x) for x in vs1[1:].split("]")[0].split(", ")])
            new_set = set([int(x) for x in vs2[1:].split("]")[0].split(", ")])
            diff_orig = orig_set - new_set
            diff_new = new_set - orig_set
            if diff_orig:
                print(f"  Sites in original but not in new result: {sorted(diff_orig)}")
            if diff_new:
                print(f"  Sites in new result but not in original: {sorted(diff_new)}")
# Compare sites
compare_log_info(original_file_path, new_file_path)

Comparison for sites related to the `ITS.fas` file:
  No deviations found.
Comparison for sites related to the `trnD-trnT.fas` file:
  No deviations found.
Comparison for sites related to the `trnH-trnK.fas` file:
  No deviations found.


For each it should say, `No deviations found.`
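For readers who prefer a more compact parse, the 'Variable sites' lines can also be pulled out with a regular expression and turned into real Python lists. This is an alternative sketch and not the code used in the cell above; it assumes each log line has the form `Variable sites for <path>: [<numbers>]`, as in the logs shown, and `variable_sites_by_file` is a hypothetical name.

```python
import ast
import re

# Path is matched lazily up to the ": [" that introduces the list.
LINE_PATTERN = re.compile(r"Variable sites for (?P<path>.+?): (?P<sites>\[[^\]]*\])")


def variable_sites_by_file(log_text):
    """Map each FASTA path in a log to its list of variable-site indices."""
    return {
        match["path"]: ast.literal_eval(match["sites"])
        for match in LINE_PATTERN.finditer(log_text)
    }


sample_log = (
    "root - INFO - 2 FASTA files selected.\n"
    "root - INFO - Variable sites for data/a.fas: [0, 5, 9]\n"
    "root - INFO - Variable sites for C:/tmp/b.fas: [107, 292]\n"
)
print(variable_sites_by_file(sample_log))
```

Because the values come back as lists of integers, two logs can be compared with ordinary `==` or set operations once the keys are aligned.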

#### Scenario: Use an ipywidgets-based GUI to make submitting the command more convenient

The original software `Fasta2Structure.py` runs on your desktop and makes a simple GUI that lets you select files to feed the script and then does the conversion once you have selected those files.  
Needing the GUI to run on your desktop can be limiting, yet it makes the tool user-friendly because users don't have to write out each file path. Using ipywidgets, that convenience can be added on top of `improved_Fasta2Structure.py` if users prefer.

The code below will show everything in `Example_data/Datasets` in a widget that allows multiple file names to be selected at the same time.

In [1]:
import os
import ipywidgets as widgets
directory = 'Example_data/Datasets'
file_info = [(f, os.path.join(directory, f)) for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f))]
style = {'description_width': 'initial'}
file_selector = widgets.SelectMultiple(
    options=[(basename, full_path) for basename, full_path in file_info],
    description='Select files in box to right to select:',
    disabled=False,
    layout={'width': 'max-content'},
    style = style # to make description show up entirely; based on https://stackoverflow.com/a/72721538/8508004
)
display(file_selector)

SelectMultiple(description='Select files in box to right to select:', layout=Layout(width='max-content'), opti…

Click on the file names to make selections. Here you want to select all three of them.

One way to select all three files is to click on the bottom one and then hold down shift while selecting the rest.  

Note this [from the documentation about that widget](https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20List.html#selectmultiple):

>"Multiple values can be selected with shift and/or ctrl (or command) pressed and mouse clicks or arrow keys."

Then run the next cell to show you have that as your set of selected files.

In [2]:
selected_files = (" ").join(list(file_selector.value))
print(selected_files)

Example_data/Datasets/ITS.fas Example_data/Datasets/trnD-trnT.fas Example_data/Datasets/trnH-trnK.fas


Now run the script targeting those three files by running the next cell.

In [3]:
%run improved_Fasta2Structure.py {selected_files}

Converted files saved as: Structure.str


(Note that I could simply have run `%run improved_Fasta2Structure.py {(" ").join(list(file_selector.value))}` and skipped the previous cell, but I wanted to make clear what is happening.)

In [4]:
!mv Structure.str all_three_examples_selected_by_file_selector_Structure.str
!mv log.log all_three_examples_selected_by_file_selector_log.log

Let's see what result files we have at this point.

In [5]:
ls

all_three_examples_called_individually_log.log
all_three_examples_called_individually_Structure.str
all_three_examples_selected_by_file_selector_log.log
all_three_examples_selected_by_file_selector_Structure.str
binder/
directory_utilized_Structure.str
Example_data/
Fasta2Structure.py
Fasta2Structure_Windows/
improved_Fasta2Structure.py
index.ipynb
None_Structure.str
README.md
tests/


I'll leave it to you to investigate and see if the results in this section are the same as above.
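One quick way to carry out that check is the standard library's `filecmp` module. The sketch below wraps it in a hypothetical `same_content` helper and demonstrates it on throwaway files rather than on the real results.

```python
import filecmp
import tempfile
from pathlib import Path


def same_content(path_a, path_b):
    """Return True when two files are byte-for-byte identical."""
    # shallow=False forces a content comparison rather than a stat() check.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demonstrate on placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.str"
    second = Path(tmp) / "second.str"
    third = Path(tmp) / "third.str"
    first.write_text("AgMRJ10_1 3  1  3\n")
    second.write_text("AgMRJ10_1 3  1  3\n")
    third.write_text("AgMRJ10_1 3  1  0\n")
    print(same_content(first, second))
    print(same_content(first, third))
```

For the files made so far, the call would be `same_content("all_three_examples_called_individually_Structure.str", "all_three_examples_selected_by_file_selector_Structure.str")`.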

I would argue that this section provides a more user-friendly version of the `Fasta2Structure.py` because it allows a GUI to help choose the files, but can be run pretty much anywhere.

You should also take time to re-run the cells in this section and see that, although I guided you through choosing all three FASTA files using the widget, you can select only one or two if you prefer. Things will run much the same and you can be sure you are specifying the files correctly without much typing. I'll point out though that because you can use Tab

The last two sections above turned out to be processing all the FASTA files in that directory. There is an easier way to indicate this type of situation when calling the script.

#### Scenario: Use a directory

One of the improvements to the script is that you can point the `improved_Fasta2Structure.py` script at a directory and it will process the FASTA files it recognizes in the directory. (Note that their extensions must match the expected ones; presently `.fa`, `.fasta`, and `.fas` are handled. I'd be happy to add more, or users can edit the script around line 170 or so to add their preferred options without waiting.)

This section steps through demonstrating use of a directory.

In [1]:
%run improved_Fasta2Structure.py Example_data/Datasets/

Converted files saved as: Structure.str


Note that this produces the same result as `%run improved_Fasta2Structure.py Example_data/Datasets/ITS.fas Example_data/Datasets/trnD-trnT.fas Example_data/Datasets/trnH-trnK.fas` because those are the files in that directory and that is the order in which they get passed, based on sorting. (If that order isn't what you want, you'd have to specify the file paths as arguments in the order you need.)
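To preview which files a directory would contribute, and in what order, the selection can be imitated in a few lines. This is a sketch and not the script's own code: `fasta_files_in` is a hypothetical name, the extension set is the one named in this section, and both the case-insensitive matching and the plain name sort are assumptions about how the script behaves.

```python
import tempfile
from pathlib import Path

FASTA_EXTENSIONS = {".fa", ".fasta", ".fas"}  # extensions named in this section


def fasta_files_in(directory):
    """Return FASTA files directly inside `directory`, sorted by name."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        # Assumption: extension matching ignores case.
        if path.is_file() and path.suffix.lower() in FASTA_EXTENSIONS
    )


# Demonstrate on empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("trnH-trnK.fas", "ITS.fas", "notes.txt", "trnD-trnT.fas"):
        (Path(tmp) / name).touch()
    print([path.name for path in fasta_files_in(tmp)])
```

Calling `fasta_files_in("Example_data/Datasets")` in this notebook's working directory would list the candidates before you run the script.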

That situation also saves the result with a generic file name, `Structure.str`. Again, rename it & the log file to distinguish them before running the script another time.

In [2]:
!mv Structure.str directory_utilized_Structure.str
!mv log.log directory_utilized_log.log

In [3]:
ls

all_three_examples_called_individually_log.log
all_three_examples_called_individually_Structure.str
all_three_examples_selected_by_file_selector_log.log
all_three_examples_selected_by_file_selector_Structure.str
binder/
directory_utilized_log.log
directory_utilized_Structure.str
Example_data/
Fasta2Structure.py
Fasta2Structure_Windows/
improved_Fasta2Structure.py
index.ipynb
None_Structure.str
README.md
tests/


#### Scenario: Use a directory in conjunction with an ipywidgets / ipyfilechooser-provided GUI 

`Fasta2Structure.py` originally earned its 'user-friendly' tag because it had a GUI. Jupyter offers the option to add a user interface for making directory selections as well, and because it is flexible, the improved script can easily be adapted. Next we'll demonstrate combining the process presented in the last section with a GUI.

In [1]:
import ipywidgets as widgets
from ipyfilechooser import FileChooser
# Create and display a FileChooser widget
fc = FileChooser('.')
fc.show_only_dirs = True # Create a FileChooser widget that can only choose directories
display(fc)

FileChooser(path='/home/jovyan', filename='', title='', show_hidden=False, select_desc='Select', change_desc='…

To use that, click on 'Select', and then navigate to `Example_data` and then to the `Datasets` subfolder and then click '`Select`' again to lock in the selection & end the selection process.

With that process completed, go ahead and run the next cell to actually process all the files in the specified directory at the same time as if they were selected. 

In [2]:
selected_directory = fc.selected
%run improved_Fasta2Structure.py {selected_directory}

Converted files saved as: Structure.str


I could have shortened the two lines in the cell above to the following:

```python
%run improved_Fasta2Structure.py {fc.selected}
```

I kept it long to make what is occurring clearer for those less familiar with Python, who may not yet be comfortable with attribute notation like `fc.selected`.
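If the cell is run before a directory has been locked in with 'Select', the value handed to the script may not be a usable path. A small guard can catch that first. This is a sketch that assumes `fc.selected` is `None` until a selection is confirmed; `require_directory` is a hypothetical helper, and it takes the value as a plain argument so it can be tried without the widget.

```python
from pathlib import Path


def require_directory(selected):
    """Return `selected` as a Path, or raise if nothing usable was chosen."""
    if selected is None:
        raise ValueError("No directory selected yet; click 'Select' in the widget first.")
    path = Path(selected)
    if not path.is_dir():
        raise ValueError(f"Not a directory: {path}")
    return path


# Try the guard with a missing selection and with the current directory.
for candidate in (None, "."):
    try:
        print("OK:", require_directory(candidate))
    except ValueError as error:
        print("Problem:", error)
```

In the notebook you could write `selected_directory = require_directory(fc.selected)` before the `%run` line.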

To better set up to allow making sure this worked as well as the last section, we need to rename the produced files again.

In [3]:
!mv Structure.str directory_GUI_Structure.str
!mv log.log directory_GUI_log.log

Review the files present now.

In [4]:
ls

all_three_examples_called_individually_log.log
all_three_examples_called_individually_Structure.str
all_three_examples_selected_by_file_selector_log.log
all_three_examples_selected_by_file_selector_Structure.str
binder/
directory_GUI_log.log
directory_GUI_Structure.str
directory_utilized_log.log
directory_utilized_Structure.str
Example_data/
Fasta2Structure.py
Fasta2Structure_Windows/
improved_Fasta2Structure.py
index.ipynb
None_Structure.str
README.md
tests/


Feel free to go ahead and examine the files made in these last two sections to see that they have much the same content and only really differ in how the file paths are specified, as full versus relative.
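Since the two logs are expected to differ only in the directory part of each path, one way to compare them is to reduce every logged path to its file name first. This is a sketch assuming the `Variable sites for <path>: [...]` line format seen in the logs above and forward-slash paths; `normalize_log` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath


def normalize_log(log_text):
    """Replace each logged FASTA path with its file name only."""

    def shorten(match):
        return "Variable sites for " + PurePosixPath(match.group(1)).name + ":"

    # The lookahead keeps the list itself untouched.
    return re.sub(r"Variable sites for (.+?):(?= \[)", shorten, log_text)


relative = "root - INFO - Variable sites for Example_data/Datasets/ITS.fas: [0, 1]\n"
absolute = "root - INFO - Variable sites for /home/jovyan/Example_data/Datasets/ITS.fas: [0, 1]\n"
print(normalize_log(absolute), end="")
print(normalize_log(relative) == normalize_log(absolute))
```

Reading `directory_utilized_log.log` and `directory_GUI_log.log` and comparing their normalized text would show whether anything besides the paths differs.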

Also note that we didn't do anything very different here compared with the prior section. We simply added in using a GUI to make selecting the directory in the command more convenient because we didn't have to look up & write out the full path in the command ourselves. Overall, the process enacted by the script is exactly the same.

#### Troubleshooting

Note that if the log file, `log.log`, which should be produced when using `improved_Fasta2Structure.py`, stops being generated, you should first try restarting the kernel and sticking with `%run`; see [here](https://stackoverflow.com/a/48005958/8508004) for more about that. If that fails, you can try changing the `%run` to `!python`.  
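One possible explanation for a missing log, offered as an assumption because it depends on how the script sets up logging: `%run` executes inside the kernel's own Python process, and Python's `logging.basicConfig` does nothing when the root logger already has handlers. The sketch below reproduces that general behavior with temporary files; it does not inspect or modify the script, and `force=True` requires Python 3.8 or later.

```python
import logging
import os
import tempfile

root = logging.getLogger()
saved_handlers = root.handlers[:]  # remember whatever was configured
saved_level = root.level

with tempfile.TemporaryDirectory() as tmp:
    first_log = os.path.join(tmp, "first.log")
    second_log = os.path.join(tmp, "second.log")

    # Simulate a kernel whose root logger was already configured earlier.
    root.handlers = [logging.NullHandler()]

    # With a handler already present, this call is silently ignored.
    logging.basicConfig(filename=first_log, level=logging.INFO)
    logging.info("not written anywhere")
    ignored = not os.path.exists(first_log)

    # force=True (Python 3.8+) replaces the existing handlers.
    logging.basicConfig(filename=second_log, level=logging.INFO, force=True)
    logging.info("written to second.log")
    forced = os.path.exists(second_log)

    # Close the file handler before the temporary directory is removed.
    for handler in root.handlers[:]:
        handler.close()
        root.removeHandler(handler)

# Put the root logger back the way it was found.
root.handlers = saved_handlers
root.setLevel(saved_level)
print(ignored, forced)
```

Restarting the kernel gives a fresh process with no handlers on the root logger, and `!python` always starts a new process, which fits the two remedies suggested above.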

