# Practical Python Programming for Biologists
Author: Dr. Daniel Pass | www.CompassBioinformatics.com

---

# Your code, on the command line

To make your Python code runnable from the command line there are a few main steps. The code itself should be equally comfortable on a local computer or a Jupyter notebook, and most of these steps are for the computer environment.

- Create the Python script: Write your Python code in a file with a .py extension - it is not technically essential but good practice.

- Define the shebang line: The first line of your script should be a shebang line, which tells the system which interpreter to run the script with. This ensures that the correct Python version is invoked when the script is run directly. For example:
```
#!/usr/bin/env python3
```
A fixed path such as ```#!/usr/bin/python``` also works, but the location of the interpreter can vary depending on your computer.

- Add executable permissions (Mac/Linux only): If you are working on a Unix-like system (macOS and Linux), you need to make your Python script executable by assigning the appropriate permission. Use the chmod command to set the executable permission: ```chmod +x script.py```

- Run the script: In the terminal, move to the directory where your Python script is located and run it using ```./script.py```. If you have set up the shebang line correctly you shouldn't need to specify python, but you can always do so if required: ```python script.py```
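
As a rough sketch of where these steps lead, a complete file might look like the one below. The file name `script.py`, the function `sequence_length`, and the example sequence are placeholders chosen for illustration. Printing `sys.executable` is simply a convenient way to see which interpreter the shebang line (or the `python` command) actually picked.

```python
#!/usr/bin/env python3
# script.py: a placeholder name for a minimal command-line script
import sys


def sequence_length(sequence):
    """Return the number of characters (bases) in a sequence string."""
    return len(sequence)


# Show which interpreter the shebang line (or the python command) selected
print("Running with:", sys.executable)

# Made-up sequence, only here so that the script prints something
example_sequence = "ATGCATGCAA"
print("Sequence length:", sequence_length(example_sequence))
```

After ```chmod +x script.py```, running ```./script.py``` and ```python script.py``` should both print the same sequence length, although the interpreter path may differ between the two.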

## Exercise - Run your code

Let's take the code that you developed for the Day1-Project and turn it into a command-line program. Using that exact code, follow the steps above to create and run your script.

Note: You may not have the matplotlib library installed, or the environment set up to put the graph on screen. To install matplotlib (if required), use this command on the command line:
```
pip install matplotlib
```
However, depending on how Python is installed on your computer, you may need a variation, e.g.:
```
pip3 install matplotlib
```

Note 2: You can replace ```plt.show()``` with ```plt.savefig("my_plot.png")``` to save the figure as a file instead of displaying it on screen if your environment isn't set up for that (for example, most servers don't have graphical output).


# Argparse

The argparse module in Python provides a powerful way to handle command-line arguments and options. It allows you to define the command-line interface for your script, specify the expected arguments and parameters, and automatically generate help messages.

You may not always need it (you could instead edit the variables inside your code each time), but it lets your code run like a stand-alone program whose settings are easy to change.

## Exercise - Create your first program - again

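Here are the three main stages for creating parameter-controlled code: initialise the parser, define the arguments, and collect them. The code will not run cleanly inside Jupyter, because ```parse_args()``` reads the command line that started the notebook kernel rather than your own options, so copy it into a script file and test running it with different parameter options. Don't forget to add a shebang line and run chmod!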

Try running your script with just -h. It hasn't been defined, but it is a built-in parameter.

In [None]:
import argparse

# Initialise the parser class
parser = argparse.ArgumentParser(description='Description of your script')

# Define some options/arguments/parameters
parser.add_argument('-i', '--input', help='Path to input file')
parser.add_argument('-o', '--output', help='Path to output file', default='my_output.txt')

# Collect the supplied arguments into a Namespace object (access them as args.input, args.output)
args = parser.parse_args()

print(args)
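
For quick experiments, including inside a notebook, `parse_args` also accepts an explicit list of strings in place of the real command line. The sketch below reuses the parser defined above and shows that the result is a `Namespace` whose options are read as attributes. The file name passed to `-i` is only an example value.

```python
import argparse

parser = argparse.ArgumentParser(description='Description of your script')
parser.add_argument('-i', '--input', help='Path to input file')
parser.add_argument('-o', '--output', help='Path to output file', default='my_output.txt')

# Pass a list of strings instead of reading the real command line.
# This is equivalent to running: ./script.py -i co1_sequences.fasta
args = parser.parse_args(['-i', 'co1_sequences.fasta'])

# Each option becomes an attribute named after its long form
print(args.input)    # co1_sequences.fasta
print(args.output)   # my_output.txt (the default, as -o was not given)

# vars() turns the Namespace into an ordinary dictionary if you prefer one
print(vars(args))
```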

Let us once more return to our favourite activity and calculate sequence length and GC% as a useful piece of functionality. Here is some code that you can add to your program (although you are welcome to write it again yourself!).

## Exercise - Adding parameters
- Combine the code below with your argparse code above and test run it with the ```co1_sequences.fasta``` file
- Add an additional argument for "minimum length" and set the default to 900.
- Use an if statement to only print the information line for sequences longer than the minimum length (900 by default). Then use your new parameter to output only sequences over 1000.

In [None]:
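from Bio import SeqIO  # Biopython, needed for SeqIO.parse

# Report the length and GC percentage of every sequence in the input FASTA file
for seq_record in SeqIO.parse(args.input, 'fasta'):
    seq_len = len(seq_record)
    GC = (seq_record.seq.count("G") + seq_record.seq.count("C")) / seq_len * 100
    print("Sequence", seq_record.id, "has length", seq_len, "and GC of", str(round(GC, 2)) + "%")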
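
Values typed on the command line arrive as text, so a numeric option normally needs a `type=` setting before it can be compared with a number. The sketch below illustrates the pattern with a hypothetical `--min-gc` option and a few made-up sequences in place of a FASTA file, so the minimum length exercise is still left for you to do.

```python
import argparse

parser = argparse.ArgumentParser(description='Filter sequences by GC content')
# type=float converts the text from the command line into a number;
# --min-gc is a hypothetical option used only for this illustration
parser.add_argument('-g', '--min-gc', type=float, default=40.0,
                    help='Minimum GC percentage to report')

# An explicit list stands in for the command line: ./script.py -g 50
args = parser.parse_args(['-g', '50'])

# Made-up sequences in place of a FASTA file
sequences = {'seq1': 'ATGCGCGC', 'seq2': 'ATATATGC', 'seq3': 'GGCCGGTA'}

passed = []
for seq_id, seq in sequences.items():
    gc = (seq.count('G') + seq.count('C')) / len(seq) * 100
    # The hyphen in --min-gc becomes an underscore in the attribute name
    if gc >= args.min_gc:
        passed.append(seq_id)
        print("Sequence", seq_id, "has GC of", str(round(gc, 2)) + "%")
```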


# Subprocess - Including other programs in your python code

Many bioinformatics packages are run as external command-line tools, so being able to call them from Python is crucial for building them into your data analysis workflows. The ```subprocess``` library allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

First, let's run a basic command to make a directory, and then check your local files to see that it was created.

In [5]:
import subprocess

command = "mkdir test_directory"
subprocess.run(command, shell=True)

CompletedProcess(args='mkdir test_directory', returncode=0)
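
The `CompletedProcess` object shown above can be stored in a variable and inspected. As a minimal sketch, the example below checks its `returncode`. It uses the Python interpreter itself (via `sys.executable`) as a stand-in external program, purely so that the example runs on any system.

```python
import subprocess
import sys

# sys.executable is the path to the running Python interpreter; it is used
# here only as a stand-in for an external program that exists on every system
command = [sys.executable, "-c", "print('hello from another process')"]
result = subprocess.run(command)

# A return code of 0 conventionally means the program finished successfully
if result.returncode == 0:
    print("Command succeeded")
else:
    print("Command failed with return code", result.returncode)
```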

Alternatively, it can be important to capture the output of the command you are running. We also include the ```text=True``` parameter so that the output is returned as ordinary text (a Python string); without it the output comes back as raw bytes (try removing that parameter to check).

In [7]:
import subprocess

command = "ls -l"
process = subprocess.run(command, shell=True, capture_output=True, text=True)
print(process.stdout)

total 8
drwxr-xr-x 1 root root 4096 May 19 13:32 sample_data
drwxr-xr-x 2 root root 4096 May 23 11:48 test_directory
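
To see what `text=True` changes, the sketch below runs the same command twice and compares the type of the captured output. The Python interpreter is again used as a stand-in external program so that the example does not depend on `ls` being available.

```python
import subprocess
import sys

# The Python interpreter stands in for an external program here
command = [sys.executable, "-c", "print('ATGC')"]

as_bytes = subprocess.run(command, capture_output=True)
as_text = subprocess.run(command, capture_output=True, text=True)

print(type(as_bytes.stdout))  # <class 'bytes'>
print(type(as_text.stdout))   # <class 'str'>

# Text output can be used directly with normal string methods
print(as_text.stdout.strip())
```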



We can also build the command from a list of separate arguments, in which case ```shell=True``` is not needed

In [None]:
# Example 2: Passing the command as a list of arguments
command = ["grep", "pattern", "file.txt"]
process = subprocess.run(command, capture_output=True, text=True)
print(process.stdout)
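
In the list form each element is handed to the program as exactly one argument, which matters for values such as file names containing spaces. If you already have a command written as a single string, the standard library's `shlex.split` can turn it into such a list. This is a small sketch, and the file name is hypothetical.

```python
import shlex

# A file name containing a space (hypothetical) must stay as one argument
command_string = "grep pattern 'my file.txt'"

# shlex.split follows shell-style quoting rules when splitting the string
command = shlex.split(command_string)
print(command)
```

The resulting list can then be passed directly to ```subprocess.run(command, capture_output=True, text=True)```.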



In [None]:
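# Example 3: Running a command and passing input/capturing output
# input= sends text to the program's standard input. Note that this command
# names its query file with -query and writes results to result.txt with -out,
# so input_data is probably not what gets searched and stdout will likely be empty.
command = "blastp -query query.fasta -db database.fasta -out result.txt"
input_data = ">sequence1\nATGCATGC\n>sequence2\nGCTAGCTA"
process = subprocess.run(command, input=input_data, shell=True, capture_output=True, text=True)
print(process.stdout)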

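# Example 4: Handling errors and exceptions
command = ["invalid_command"]
try:
    process = subprocess.run(command, capture_output=True, text=True)
    process.check_returncode()  # Raises CalledProcessError if the exit status is non-zero
except FileNotFoundError:
    # Raised when the program itself cannot be found (no shell is involved here)
    print("Command not found:", command[0])
except subprocess.CalledProcessError as e:
    # Raised by check_returncode() when the program ran but reported failure
    print("An error occurred:", e.stderr)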
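
As a self-contained sketch of the same two ideas, the example below feeds text to a program through `input=` and then uses `check=True`, which asks `subprocess.run` to raise `CalledProcessError` itself when a command fails. Both "programs" are one-line Python commands used as stand-ins for real tools, so the example does not require BLAST to be installed.

```python
import subprocess
import sys

# Stand-in program: counts FASTA header lines read from standard input
count_headers = "import sys; print(sys.stdin.read().count('>'))"
input_data = ">sequence1\nATGCATGC\n>sequence2\nGCTAGCTA"

process = subprocess.run([sys.executable, "-c", count_headers],
                         input=input_data, capture_output=True, text=True)
print("Headers counted:", process.stdout.strip())

# Stand-in program that always fails with exit status 3
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
try:
    # check=True raises CalledProcessError for any non-zero exit status
    subprocess.run(failing, capture_output=True, text=True, check=True)
    failed_code = None
except subprocess.CalledProcessError as e:
    failed_code = e.returncode
    print("Command failed with return code", failed_code)
```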