# Execute Shell Commands In Python



## Execute Shell Commands with '!'

In [1]:
# To execute a shell command directly in a Jupyter Notebook, put a "!" before the command
!echo This is a shell command

This is a shell command


In [2]:
# another example
!ls -lh

total 120
-rw-r--r--  1 hq  staff    31K Apr 24 23:56 execute_shell_commands_in_python.ipynb
-rw-r--r--  1 hq  staff    26K Apr 23 16:26 explor_jupyter_notebook_environment.ipynb


## Use the %%bash magic command to turn a cell into a shell cell

In [3]:
%%bash

echo All commands in this cell are considered shell commands
ls -hl
pwd

All commands in this cell are considered shell commands
total 120
-rw-r--r--  1 hq  staff    31K Apr 24 23:56 execute_shell_commands_in_python.ipynb
-rw-r--r--  1 hq  staff    26K Apr 23 16:26 explor_jupyter_notebook_environment.ipynb
/Users/hq/Documents/pkg/py_genome_sci_book/analysis/python_basic


## Subprocess package
The subprocess package allows you to execute shell commands within Python. It provides two APIs for executing commands: `subprocess.run` and `subprocess.Popen`. Here I only introduce `subprocess.run`. Since `subprocess.Popen` is a lower-level API, it will be introduced later in an advanced section.

In [4]:
# import the run function and the PIPE constant directly
from subprocess import run, PIPE

In [5]:
# The sleep command waits X seconds before returning.
# We will use it as a stand-in for any other command;
# you can imagine replacing sleep with another command such as salmon quant
!sleep 1

### Run a command just like in the shell

In [6]:
return_object = run('sleep 1', shell=True)  # I executed a command here
return_object

CompletedProcess(args='sleep 1', returncode=0)

Let's take a look at what is returned by the run function

In [7]:
# it is an instance of the subprocess.CompletedProcess class
type(return_object)

subprocess.CompletedProcess

The subprocess.CompletedProcess class has several attributes and one method you need to know

In [8]:
for i in dir(return_object):
    if i.startswith('__'):
        # names that start with __ are skipped; you don't need to understand them here,
        # we will talk about them in later advanced sections.
        continue
    print(i)

args
check_returncode
returncode
stderr
stdout


Let's write a function to print them out

In [9]:
def print_return_obj(return_obj):
    print(f"""The args was: {return_obj.args}
The returncode was: {return_obj.returncode}
The stderr information was: {return_obj.stderr}
The stdout information was: {return_obj.stdout}""")

In [10]:
print_return_obj(return_object)

The args was: sleep 1
The returncode was: 0
The stderr information was: None
The stdout information was: None


## Gather the stdout and stderr

In [11]:
# now let's use a compound command that prints something to stderr and stdout

# There are three commands:
# 1. sleep for 1 second so you can feel it running
# 2. echo something to stdout; the ">&1" redirects the output to stdout (already the default for echo)
# 3. echo something to stderr; the ">&2" redirects the output to stderr

!sleep 1; echo "some stdout information" >&1; echo "some stderr information" >&2

some stdout information
some stderr information


### By default, stdout and stderr are not captured

In [12]:
three_command = 'sleep 1; echo "some stdout information" >&1; echo "some stderr information" >&2'

If you execute the command with run() like this, stdout and stderr are not stored in the returned object

In [13]:
return_obj = run(three_command, shell=True)
print_return_obj(return_obj)

The args was: sleep 1; echo "some stdout information" >&1; echo "some stderr information" >&2
The returncode was: 0
The stderr information was: None
The stdout information was: None


### Capture the stderr or stdout with PIPE

In [14]:
return_obj = run(three_command, shell=True, stderr=PIPE, stdout=PIPE)
print_return_obj(return_obj)

The args was: sleep 1; echo "some stdout information" >&1; echo "some stderr information" >&2
The returncode was: 0
The stderr information was: b'some stderr information\n'
The stdout information was: b'some stdout information\n'


Now you can see the information, but stdout and stderr are bytes rather than strings

In [15]:
type(return_obj.stderr)

bytes

To get strings, you can provide the `encoding` parameter. Remember that I explained bytes vs. strings on the [File I/O page](https://hq-1.gitbook.io/essential-python-for-genome-science/data-cleaning/file-i-o).

In [16]:
return_obj = run(three_command, shell=True, 
                 stderr=PIPE, stdout=PIPE, encoding='utf8')
print_return_obj(return_obj)

The args was: sleep 1; echo "some stdout information" >&1; echo "some stderr information" >&2
The returncode was: 0
The stderr information was: some stderr information

The stdout information was: some stdout information



In [17]:
type(return_obj.stderr)

str
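
As a side note, Python 3.7 and later offer a shorthand for the two steps above. The sketch below assumes such a Python version and a POSIX shell; `capture_output=True` stands in for `stdout=PIPE, stderr=PIPE`, and `text=True` decodes the captured bytes with the default encoding instead of an explicit `encoding='utf8'`. The variable names are only for illustration.

```python
from subprocess import run

# Same idea as the cells above: one line goes to stdout, one to stderr.
command = 'echo "some stdout information"; echo "some stderr information" >&2'

# capture_output=True is shorthand for stdout=PIPE, stderr=PIPE (Python 3.7+).
# text=True returns str instead of bytes, decoded with the default encoding.
result = run(command, shell=True, capture_output=True, text=True)

print(repr(result.stdout))
print(repr(result.stderr))
```

If you need a specific encoding rather than the platform default, passing `encoding='utf8'` as in the cells above is still the explicit choice.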

## Check return code

The definition of "success" based on the return code is very simple: if the return code is 0, the command finished successfully; otherwise, it failed.

By default, subprocess.run will not raise any error if the command has a non-zero return code. However, you can change this by setting check=True

In [18]:
# In this command, I deliberately return a non-zero return code
non_zero_return_code_command = 'sleep 1; exit 1'

return_obj = run(non_zero_return_code_command, shell=True, 
                 stderr=PIPE, stdout=PIPE, encoding='utf8')
print_return_obj(return_obj)

# there is no error, because check=False by default. But the returncode was 1

The args was: sleep 1; exit 1
The returncode was: 1
The stderr information was: 
The stdout information was: 
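
The `check_returncode` method listed earlier gives a middle ground: you can run the command without `check=True`, inspect the result, and only then decide to raise. A minimal sketch, assuming a POSIX shell; the command `exit 3` and the variable `caught_code` are made up for the illustration.

```python
from subprocess import run, CalledProcessError

# check=False by default, so a non-zero return code does not raise here.
result = run('exit 3', shell=True)
print('returncode:', result.returncode)

caught_code = None
try:
    # Raises CalledProcessError if returncode is non-zero, otherwise does nothing.
    result.check_returncode()
except CalledProcessError as error:
    caught_code = error.returncode
    print('check_returncode() raised, returncode was', caught_code)
```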


In [19]:
# If we add check=True

return_obj = run(non_zero_return_code_command, shell=True, check=True,
                 stderr=PIPE, stdout=PIPE, encoding='utf8')
print_return_obj(return_obj)

# we got a CalledProcessError, and you know the command failed.

CalledProcessError: Command 'sleep 1; exit 1' returned non-zero exit status 1.

### Use try except to catch the error and do something about it

When you get an error, it's not the end of the world. You can catch the error and do something to deal with it, for example, try to rerun the command automatically, or delete the temporary files to avoid leaving incomplete results

In [20]:
# In order to catch this special error defined in the subprocess package, you need to import it first
from subprocess import CalledProcessError

try:
    return_obj = run(non_zero_return_code_command, shell=True, check=True,
                     stderr=PIPE, stdout=PIPE, encoding='utf8')
    print_return_obj(return_obj)
except CalledProcessError as error:
    # the error also contains information about the process
    print('The command has returncode:', error.returncode)
    print('Now we can do something about this error, like delete temporary files or try to rerun it.')
    print('Once you have done the clean up, you can still raise the error to alert users')
    raise  # re-raise the same error that was caught


The command has returncode: 1
Now we can do something about this error, like delete temporary files or try to rerun it.
Once you have done the clean up, you can still raise the error to alert users


CalledProcessError: Command 'sleep 1; exit 1' returned non-zero exit status 1.
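
The "rerun the command automatically" idea mentioned above could be sketched roughly as follows. `run_with_retry`, `max_attempts` and `wait_seconds` are hypothetical names invented for this example, not part of the subprocess package, and a real pipeline would probably also clean up temporary files between attempts.

```python
import time
from subprocess import run, PIPE, CalledProcessError


def run_with_retry(command, max_attempts=3, wait_seconds=0.1):
    """Hypothetical helper: rerun a shell command until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run(command, shell=True, check=True,
                       stderr=PIPE, stdout=PIPE, encoding='utf8')
        except CalledProcessError as error:
            print(f'Attempt {attempt} failed with returncode {error.returncode}')
            if attempt == max_attempts:
                raise  # give up and let the caller see the last error
            time.sleep(wait_seconds)  # short pause before the next attempt


result = run_with_retry('echo done')
print(result.stdout)
```

Retrying only makes sense for failures that can go away on their own (for example a busy network drive); a command that fails because of a wrong argument will fail the same way on every attempt.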

## Provide command list when shell=False

All the above commands were provided as a string, with shell=True. This is actually not the default way to use run(). The default is shell=False, and then a command with arguments has to be provided as a list rather than a single string

In [21]:
simple_command = 'sleep 1'

# we set shell=False, which is the default
return_obj = run(simple_command, shell=False, check=True,
                 stderr=PIPE, stdout=PIPE, encoding='utf8')
print_return_obj(return_obj)

# This FileNotFoundError does not mean that sleep is missing. With shell=False, the whole string
# 'sleep 1' is treated as the name of a single program, and no such program exists.
# When shell=False, run() expects the command to be provided as a list

FileNotFoundError: [Errno 2] No such file or directory: 'sleep 1': 'sleep 1'

In [22]:
simple_command_list = ['sleep', '1']

return_obj = run(simple_command_list, shell=False, check=True,
                 stderr=PIPE, stdout=PIPE, encoding='utf8')
print_return_obj(return_obj)

# Now it's OK

The args was: ['sleep', '1']
The returncode was: 0
The stderr information was: 
The stdout information was: 


## OPTIONAL - Why is shell=False the default?

shell=False is the default because of some [security considerations](https://docs.python.org/3.8/library/subprocess.html#security-considerations). This is something package developers need to pay attention to. However, if you only execute commands that you wrote yourself, it's OK to use shell=True. Following the official Python documentation may still be a good choice; it only requires you to provide your command as a list.
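
To make the security concern concrete, here is a small sketch of what can go wrong when part of the command comes from somewhere you do not control. The file name `untrusted_name` is invented for the example, and a POSIX system with an `echo` program is assumed. `shlex.quote` is a standard library function that quotes a string so the shell treats it as a single word.

```python
import shlex
from subprocess import run, PIPE

# Pretend this string came from a user, or from a file name you did not create.
untrusted_name = 'sample.txt; echo INJECTED'

# shell=True with plain string formatting: the shell sees the ";" and runs a second command.
unsafe = run(f'echo {untrusted_name}', shell=True, stdout=PIPE, encoding='utf8')

# shell=False with a list: the whole string is passed to echo as one argument.
safe_list = run(['echo', untrusted_name], stdout=PIPE, encoding='utf8')

# shell=True with shlex.quote: the string is quoted, so the shell treats it as one word.
safe_quoted = run(f'echo {shlex.quote(untrusted_name)}', shell=True,
                  stdout=PIPE, encoding='utf8')

print(repr(unsafe.stdout))       # two lines: the injected echo also ran
print(repr(safe_list.stdout))    # one line: the text is printed as-is
print(repr(safe_quoted.stdout))  # one line: same as the list form
```

Here the injected command is a harmless `echo`, but the same mechanism would run anything placed after the `;`.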

Another drawback of shell=False is that you cannot use shell features such as the UNIX pipe or redirection, like in the examples above. To use them without a shell, you generally need the lower-level API subprocess.Popen, which will be explained in a later section.

Otherwise, you need to set shell=True.

In [27]:
import shlex  # shlex.split is a clever splitter for shell commands

command_with_redirect = 'echo "some stderr information" >&2'
return_obj = run(shlex.split(command_with_redirect), shell=False, check=True,
                 stderr=PIPE, stdout=PIPE, encoding='utf8')
print_return_obj(return_obj)

# The output should be in stderr, so this is not the intended result.
# Without a shell, ">&2" is not a redirect; it is passed to echo as a plain argument.

The args was: ['echo', 'some stderr information', '>&2']
The returncode was: 0
The stderr information was: 
The stdout information was: some stderr information >&2



In [28]:
# if shell=True
return_obj = run(command_with_redirect, shell=True, check=True,
                 stderr=PIPE, stdout=PIPE, encoding='utf8')
print_return_obj(return_obj)

# See the difference? The information is now printed to stderr rather than stdout

The args was: echo "some stderr information" >&2
The returncode was: 0
The stderr information was: some stderr information

The stdout information was: 
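
For simple cases there is also a workaround that stays with `run()` and shell=False: capture the output of one command and hand it to the next one through the `input` parameter. This is only a sketch of the idea, assuming `echo` and `tr` are available as programs on your system. Unlike a real shell pipe, the first command finishes and its whole output is held in memory before the second one starts, so it is not a replacement for subprocess.Popen with large outputs.

```python
from subprocess import run, PIPE

# Roughly what the shell would do for:  echo "some information" | tr a-z A-Z
first = run(['echo', 'some information'],
            stdout=PIPE, encoding='utf8', check=True)

# The captured stdout of the first command becomes the stdin of the second.
second = run(['tr', 'a-z', 'A-Z'], input=first.stdout,
             stdout=PIPE, encoding='utf8', check=True)

print(second.stdout)
```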


## Take home message

- subprocess.run() is the most common API for running shell commands in Python.
- Set stderr=PIPE and stdout=PIPE to capture the information printed to these two system file handles.
- Set encoding='utf8' to make sure you get strings rather than bytes from stderr and stdout.
- returncode == 0 means the job succeeded; anything else means failure. If check=True, a non-zero return code triggers subprocess.CalledProcessError.
- try ... except ... allows you to catch an error and deal with it.
- By default, shell=False and you need to provide the command as a list; redirects and pipes will not work.
- When shell=True, you can provide the command as a string, and any command works just like in the shell.