# Subprocess

You can write code to perform many tasks in Python. However, it is often much easier to use tools that someone else has already made instead of writing everything yourself. In a previous week, we covered how you can use Python packages and modules to reuse code written by yourself or others in any Python script. But what about when you want to run other command line applications that live outside of Python? In those cases you can use the Python module `subprocess` to run those command line applications and store their output.

## run

[The subprocess module](https://docs.python.org/3/library/subprocess.html) includes a range of functionality for creating and checking the output of processes. For this demonstration, we are going to stick with the basic functionality of `subprocess.run()`. There are other functions which provide more control and options, but `run()` is typically going to be all you need. 

`run()` takes a list of arguments as well as any optional settings you want to provide. It runs the command made by combining the provided arguments, waits for the command to finish executing, and then returns an instance of a class called `CompletedProcess`, which has a range of attributes that allow you to interact with the output of the command.

A basic `subprocess.run()` command looks like this (note the command to run is a `list` of `str` instances):

In [1]:
import subprocess

result = subprocess.run(["ls"])

print(result)

subprocess.ipynb
CompletedProcess(args=['ls'], returncode=0)


The `CompletedProcess` printout tells us that the command `ls` was run and produced an exit code (called `returncode` here) of 0, meaning it ran successfully.

However, the `CompletedProcess` itself doesn't contain the contents of our directory. That's because, by default, `subprocess.run()` does not capture the command's output: anything `ls` prints goes straight to standard output, and all we get back is whether the command worked or not (via the returncode). In the case of `ls`, that isn't what we want. We'll come back to that. First, let's show that `subprocess.run()` is actually running the command it claims to be.

In [2]:
result = subprocess.run(["touch", "a_file.txt"])

print(result)

CompletedProcess(args=['touch', 'a_file.txt'], returncode=0)


Again, we can see a description of our command as well as a return code. If you look at the contents of your current directory, you will also see the product of the executed command, "a_file.txt".
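
If you would rather check from within Python than look at the directory yourself, the following is a minimal sketch of one way to do it. The temporary directory and the variable names `target` and `created` are assumptions made for this illustration so that nothing is left behind in your working directory.

```python
import subprocess
import tempfile
from pathlib import Path

# Work in a throwaway directory (an assumption for this sketch)
# so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "a_file.txt"

    # Same command as above, but pointing at the temporary location.
    result = subprocess.run(["touch", str(target)])

    # Confirm that the command really created the file.
    created = target.exists()

print(result.returncode, created)
```

A returncode of 0 together with `created` being `True` indicates that the command both reported success and produced the file.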

## Getting stdout and stderr

So, `subprocess.run()` does indeed run the command you give it. What about getting the output of the command? To do that you just need to tell `subprocess.run()` to capture the output of the command. You can do that with the aptly named option `capture_output`.

In [3]:
result = subprocess.run(["ls"], capture_output=True)

print(result)

CompletedProcess(args=['ls'], returncode=0, stdout=b'a_file.txt\nsubprocess.ipynb\n', stderr=b'')


Now our `CompletedProcess` instance has two new attributes, `stdout` and `stderr`. We can see that both of those attributes have values associated with them that look like what we would expect. However, there is something weird about them: they look like `str` instances in that they have quotes around them, but they have the letter "b" before the first quote. That "b" tells you that what looks like a `str` is actually a `bytes` object. We can see that by printing the `type()` of the stdout.

In [4]:
print(type(result.stdout))

<class 'bytes'>


You can read a bit about `bytes` objects [in the Python docs](https://docs.python.org/3/library/stdtypes.html#bytes). We won't discuss them here as we won't be using them and they are outside the scope of this course.

If you want the stdout to be interpreted as a `str` instead of a `bytes` object, you can turn on that setting in the `subprocess.run()` call using the `text` option.

In [5]:
result = subprocess.run(["ls"], capture_output=True, text=True)

print(result)

CompletedProcess(args=['ls'], returncode=0, stdout='a_file.txt\nsubprocess.ipynb\n', stderr='')


We can now interact with the stdout of the called process just like any other Python `str` object.

In [6]:
stdout = result.stdout

print(type(stdout))

print(stdout.split())

<class 'str'>
['a_file.txt', 'subprocess.ipynb']
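
One thing to keep in mind when splitting output like this: `split()` breaks on any whitespace, while `splitlines()` breaks only on line breaks. The sketch below uses a made-up stdout string (the file name `my notes.txt` is hypothetical) to show the difference when a file name contains a space.

```python
# Hypothetical stdout from `ls`, where one file name contains a space.
stdout = "a_file.txt\nmy notes.txt\nsubprocess.ipynb\n"

# split() separates on every run of whitespace, including the space
# inside "my notes.txt".
print(stdout.split())

# splitlines() separates on line breaks only, so each name stays whole.
print(stdout.splitlines())
```

For output that has one item per line, `splitlines()` is usually the safer choice.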


The `stderr` attribute works the same way. In the above outputs, stderr is empty because `ls` ran successfully. However, if we run `ls` in such a way that it fails, the stderr attribute will contain the error message.

In [7]:
result = subprocess.run(["ls", "not_a_path"], capture_output=True, text=True)

print(result)

CompletedProcess(args=['ls', 'not_a_path'], returncode=2, stdout='', stderr="ls: cannot access 'not_a_path': No such file or directory\n")
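
Note that a failing command does not raise a Python exception by default; you only find out by inspecting the returncode. If you would prefer your script to stop when a command fails, `subprocess.run()` also accepts `check=True`, which raises `subprocess.CalledProcessError` on a non-zero returncode. A minimal sketch (the variable name `failed` is just for illustration, and the exact returncode and message depend on your version of `ls`):

```python
import subprocess

failed = None
try:
    # check=True turns a non-zero returncode into an exception.
    subprocess.run(
        ["ls", "not_a_path"], capture_output=True, text=True, check=True
    )
except subprocess.CalledProcessError as error:
    # The exception carries the returncode and the captured output.
    failed = error
    print("return code:", error.returncode)
    print("stderr:", error.stderr.strip())
```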


## shell

When we use subprocess as above to run command line applications, subprocess executes the command directly, without going through a shell. If we instead want our shell to run the command, in order to gain access to shell-specific functionality such as shell variables, globs, and pipes, we need to tell subprocess to use shell mode. Note that shell mode is often strongly discouraged, mainly because a command string built from untrusted input can be made to run arbitrary commands. However, if you are only running your own scripts with your own inputs, the danger is no different than if you were typing the commands into your terminal. Basically, don't run `rm` commands without taking great care to ensure they won't delete all of your stuff.

First, what happens if we try to pipe the output of a command to another command?

In [8]:
result = subprocess.run(["echo", "hello", "|", "sed", "'s/ell/arib/'"], capture_output=True, text=True)

print(result)

CompletedProcess(args=['echo', 'hello', '|', 'sed', "'s/ell/arib/'"], returncode=0, stdout="hello | sed 's/ell/arib/'\n", stderr='')


That stdout shows that `echo` received the rest of the pipeline as its arguments and simply printed them. That's not what we wanted. What about with shell mode?

In [9]:
result = subprocess.run("echo hello | sed 's/ell/arib/'", capture_output=True, text=True, shell=True)

print(result)

CompletedProcess(args="echo hello | sed 's/ell/arib/'", returncode=0, stdout='haribo\n', stderr='')


As you can see, to use shell mode we needed to make two changes. First, the command needed to be a single `str` rather than a `list` of `str`. Second, we needed to provide the argument `shell=True`. Once we did that, it worked as expected.
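
If you ever build a shell-mode command string from a Python variable, one way to reduce the risk mentioned above is to quote the value with `shlex.quote()` from the standard library. The sketch below is only an illustration: the value of `word` is a deliberately awkward, made-up input.

```python
import shlex
import subprocess

# Hypothetical value that would run a second command if left unquoted.
word = "hello; echo oops"

# shlex.quote() wraps the value so the shell treats it as one argument.
command = f"echo {shlex.quote(word)} | sed 's/ell/arib/'"

result = subprocess.run(command, capture_output=True, text=True, shell=True)

print(result.stdout)
```

Because the value is quoted, the `; echo oops` part is printed by `echo` as ordinary text instead of being run as a separate command.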

## Providing input to subprocess calls

Sometimes you will want to use subprocess to run a command that depends on some variable in your Python environment. There are two ways you might want to do this: by using your variable as part of the command, or by providing the contents of your variable as input, as if it were the stdin to the command. Let's take a look at how both of those would work.

In [10]:
path = "../"

result = subprocess.run(["ls", path], capture_output=True, text=True)

print(result)

CompletedProcess(args=['ls', '../'], returncode=0, stdout='10\n2\n3\n5\n6\n7\n8\n', stderr='')


As you can see, the variable's content was incorporated into the command. This approach is useful whenever you want to programmatically run shell commands using data from within your Python script.

The other way you can insert data from your Python session into subprocess calls is as the stdin. This can be done using the `input` option.
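
A useful side effect of passing the command as a list is that each element reaches the program as exactly one argument, so values containing spaces need no extra quoting. The following is a small sketch of that; the temporary directory, its `"my dir "` prefix, and the file names are all made up for the illustration.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical scratch directory whose name deliberately contains a space.
with tempfile.TemporaryDirectory(prefix="my dir ") as tmp:
    for name in ["b.txt", "a.txt"]:
        (Path(tmp) / name).touch()

    # The whole path is a single list element, so the space is not a problem.
    result = subprocess.run(["ls", tmp], capture_output=True, text=True)

names = result.stdout.splitlines()
print(names)
```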

In [11]:
data = "col1\tcol2\tcol3\n"

result = subprocess.run(["cut", "-f1"], capture_output=True, text=True, input=data)

print(result)

CompletedProcess(args=['cut', '-f1'], returncode=0, stdout='col1\n', stderr='')


As you can see, we used the `cut` command to extract just the first column of our data. We didn't need to provide `cut` a filepath as input because it received the data on its stdin from subprocess.
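
The `input` option also offers one way to reproduce a simple pipeline without shell mode: run the first command, then hand its captured stdout to the second command. The sketch below reuses the `echo` and `sed` example from earlier; the variable names `first` and `second` are just for illustration.

```python
import subprocess

# Step 1: run the left-hand side of the pipeline and capture its output.
first = subprocess.run(["echo", "hello"], capture_output=True, text=True)

# Step 2: feed that output to the right-hand side as its stdin.
# No shell is involved, so the sed expression needs no extra quotes.
second = subprocess.run(
    ["sed", "s/ell/arib/"], capture_output=True, text=True, input=first.stdout
)

print(second.stdout)
```

This waits for the first command to finish before starting the second, which is fine for small outputs like this one.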