# Python and command line

Accessing the command line through Python.


## Command line and bash

- A command-line interface (shell / terminal) processes commands to a computer program in the form of lines of text. 

- The program which handles the interface is called a command-line interpreter. 

We are using the *bash* language in a Unix shell (Linux / macOS terminal).
P.S. bash, like Python, is an interpreted programming language, i.e. it is executed directly by an interpreter, without compiling the program first.

- Windows and Unix command-line interpreters are different. That's why some bash commands don't work in Windows cmd.


You can open a terminal from Jupyter:

> Menu -> New -> Terminal 

We already know that command-line commands can be executed in a Jupyter Notebook by prefixing them with `!`. This is a Jupyter feature.

Similarly, we can access the shell in Google Colab:

In [None]:
# print working directory
!pwd

In [None]:
# ls lists all files in the directory
!ls blala

In [None]:
# username
!whoami

In [None]:
# echo is bash-analog of print
!echo "Preved!"

In [None]:
# get the location of the program
!which python

In [None]:
# pip is the package installer for Python; conda is an alternative package manager (if it is installed)
!pip install pandas
!conda install -c anaconda pandas

In [None]:
# download file from web
!wget  https://files.rcsb.org/download/4OGS.pdb

In [None]:
# download seminar files
!wget https://github.com/litvinanna/intro_to_prog/raw/main/command_line/uniprot.zip

In [None]:
# uncompress archive
!unzip uniprot.zip

In [None]:
# view the first 10 lines (head's default; use `head -n 5` for 5 lines)
!head uniprot/P02144.txt

However, to incorporate commands into loops or to pass arguments to them, it is more convenient to use `subprocess` or `os`.

## Subprocess and os

In [None]:
import os, subprocess

You can execute terminal commands using `os.system`. It runs the command and returns the exit status of the process; 0 means success.

In [None]:
# l - long output, h - human-readable
!ls -lh uniprot/

In [None]:
os.system('ls -lh uniprot/')

In [None]:
# copy file
os.system('cp uniprot/P02144.txt P02144.txt')

In [None]:
!ls -l not_existing directory

In [None]:
os.system('ls -l not_existing directory')
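
Since `os.system` only reports a status, a script usually has to test that value itself. Below is a minimal sketch of such a check, assuming a Unix shell where `ls` and `/dev/null` are available; the directory name is hypothetical and only needs to be absent.

```python
import os

# A command that succeeds: the status is 0.
ok_status = os.system("ls > /dev/null")

# A command that fails: the status is non-zero.
# On Unix this value is an encoded wait status, not the plain exit code.
bad_status = os.system("ls -l no_dir_with_this_name 2> /dev/null")

print("succeeded:", ok_status == 0)
print("failed:", bad_status != 0)
```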

`subprocess.call` takes a list consisting of the command and its arguments.

In a notebook the command's output does not appear in the cell: `call` does not capture it. What `call` returns is the exit code (the _returncode_; 0 means "everything is ok"), not the output.

In [None]:
subprocess.call(["ls", "-l"]) 

In [None]:
# this returns a non-zero exit code because there is no such directory
subprocess.call(["ls", "-l", "no_dir_with_this_name"]) 

In [None]:
# touch - update last modified time if file exists, if not - create it
filename = 'file.txt'
subprocess.call(["touch", filename]) 
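
Passing the command as a list makes it easy to run it inside a loop with a different argument each time. The following is a small sketch of that idea, assuming a Unix system with `touch`; the file names are hypothetical, and a temporary directory is used so that nothing is left behind.

```python
import os
import subprocess
import tempfile

names = ["a.txt", "b.txt", "c.txt"]  # hypothetical file names

with tempfile.TemporaryDirectory() as tmpdir:
    codes = []
    for name in names:
        path = os.path.join(tmpdir, name)
        # Each list element reaches the program as exactly one argument.
        codes.append(subprocess.call(["touch", path]))
    created = sorted(os.listdir(tmpdir))

print(codes)    # one exit code per call
print(created)  # the files that were created
```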

To pass the whole command as a string, not a list, use `shell=True`:

In [None]:
subprocess.call("touch file2.txt", shell=True) 

In [None]:
command = 'echo'
arg = '"Meow ^_^"'
out = 'file.txt'

cmd = ' '.join([command, arg, '>', out])
print(cmd)

subprocess.call(cmd, shell=True) 

In [None]:
# see file contents
!cat file.txt

In [None]:
pdb = '7B2D.pdb'
term = '"REMARK 500"'
out = 'file2.txt'

# separate commands by ;
# grep is used to search for words and patterns in text files
# '>' redirects the output to a file
cmd = ' '.join(['wget', 'https://files.rcsb.org/download/'+pdb, ';', 
                'grep', term, pdb, '>', out])
print(cmd)

subprocess.call(cmd, shell=True) 

In [None]:
!cat file2.txt
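
Joining strings into a shell command works as long as the pieces contain no characters that the shell treats specially (spaces, quotes, `;`, `>`). One possible safeguard, not used in the cells above, is `shlex.quote` from the standard library. The sketch below assumes a Unix shell; the text is a made-up example.

```python
import shlex
import subprocess

text = "Meow ^_^; it's fine"  # contains a quote and a ';'

# shlex.quote wraps the value so that the shell reads it as one literal word.
cmd = "echo " + shlex.quote(text)
print(cmd)

code = subprocess.call(cmd, shell=True)

# shlex.split parses the string the way a POSIX shell would split it.
parts = shlex.split(cmd)
print(parts)
```

Without the quoting, the shell would treat everything after `;` as a second command.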

To check the exit code and raise an error when the command fails, use `check_call`. It raises `subprocess.CalledProcessError` if the exit code is not 0:

In [None]:
subprocess.check_call(["ls", "-l", "blabla"])
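
The raised exception can be caught like any other Python exception. Here is a minimal sketch, assuming that the directory `no_dir_with_this_name` (a hypothetical name) does not exist:

```python
import subprocess

try:
    # stderr is discarded so that only our own message is shown.
    subprocess.check_call(["ls", "-l", "no_dir_with_this_name"],
                          stderr=subprocess.DEVNULL)
    failed = False
except subprocess.CalledProcessError as err:
    failed = True
    # err.returncode holds the non-zero exit code of the command.
    print("Command failed with exit code", err.returncode)
```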

#### Exercise 

Write a function that takes a Uniprot accession number and downloads the Uniprot file using `wget` if it doesn't exist in the directory `uniprot/`, and prints "File exists." if the file exists. 

Link pattern for Uniprot file download: https://rest.uniprot.org/uniprotkb/P05067.txt

The file should be downloaded to the `uniprot/` directory. To specify the directory, use the `-P` flag of wget: `wget url-to-file -P directory-name`.

How the function should work:

```Python
getUniprot('P02144') 
File exists.

getUniprot('P42212')
Downloaded P42212. 
```





In [None]:
# YOUR CODE HERE


### Save output of executed command

In [None]:
ls = os.popen('ls -lh').read()
print(ls)

In [None]:
# returns a bytes object
subprocess.check_output(["echo", "Hello World!"])

As you can see, `check_output` returns a bytes object. To convert it to a string, use `.decode('utf-8')`:

In [None]:
first_line = subprocess.check_output("head -1 file2.txt", shell=True)
first_line.decode('utf-8')

In [None]:
grep = ' '.join(["grep", "'REMARK 465'", "7B2D.pdb"])
out = subprocess.check_output(grep, shell=True)
print(out.decode('utf-8'))
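
As an alternative that is not used in the cells above, `subprocess.run` (Python 3.7+ for the arguments shown) can capture the output and decode it in one step. A short sketch, assuming a Unix system with `echo`:

```python
import subprocess

# capture_output=True collects stdout and stderr;
# text=True decodes them to str, so no .decode() is needed.
result = subprocess.run(["echo", "Hello World!"],
                        capture_output=True, text=True)

print(result.returncode)    # exit code of the command
print(result.stdout)        # captured standard output as a string
print(repr(result.stderr))  # captured standard error as a string
```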

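The same can be done using `.Popen().communicate()`. `Popen` is the lower-level interface: it starts the process and returns immediately, `.communicate()` then waits for it and collects the output, and no exception is raised on a non-zero exit code. See details, if interested: https://docs.python.org/3/library/subprocess.html#subprocess.Popen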

In [None]:
subprocess.Popen(grep, shell=True, stdout=subprocess.PIPE)\
          .communicate()[0].decode('utf-8')
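
`.communicate()` returns a pair: the standard output and the standard error. The sketch below collects both for a command that fails, assuming a Unix system and a hypothetical directory name that does not exist.

```python
import subprocess

proc = subprocess.Popen(["ls", "-l", "no_dir_with_this_name"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Waits for the process to finish and returns (stdout, stderr) as bytes.
out, err = proc.communicate()

print(proc.returncode)      # non-zero, but no exception is raised
print(err.decode("utf-8"))  # the error message printed by ls
```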

In [None]:
!ls

#### Exercise 

Write a function that takes the PDB ID of the structure file and returns the method of experimental structure determination. 

The experimental method is stored in the line that starts with "EXPDTA". There is only one such line in the file.

Use `grep` in your function.

How the function should work:

```Python
getExpMethod('7B2D') 
X-RAY DIFFRACTION
```

In [None]:
# YOUR CODE HERE



#### Exercise

1. Install [STRIDE](http://webclu.bio.wzw.tum.de/stride/), the program for protein secondary structure assignment:

```bash
wget http://webclu.bio.wzw.tum.de/stride/stride.tar.gz # download source code
mkdir stride # make directory stride
tar -xzf stride.tar.gz -C stride # extract archive to the directory stride
cd stride; make # go to folder stride and compile the program using make
```

After installation the program will be available as `./stride/stride`.

2. Write a function that takes the name of the protein structure file and the program mode (possible modes are 'seq' or 'ss'). If the mode is 'seq' (sequence), it runs stride with the flag `-q` and saves the output with the name of the input file and extension '.fasta'. If the mode is 'ss' (secondary structure), it runs stride with the flag `-o` and saves the output with the name of the input file and extension '.ss'. 

How the function should work:

```Python
# will produce file 4OGS.fasta
runStride('4OGS.pdb', 'seq')

# will produce file 4OGS.ss
runStride('4OGS.pdb', 'ss')
```

In [None]:
# check that stride works
!./stride/stride 

In [None]:
# try different flags
!./stride/stride -q 4OGS.pdb

In [None]:
# YOUR CODE HERE



## * Jupyter from cluster

**1. Log in to cluster.**

`ssh username@cluster_ip` \
`ssh m.pak@10.30.194.138`

**2. Run Jupyter.**

`jupyter notebook --no-browser --port=9999`

Since there is no browser on the cluster, we set `--no-browser`. For `--port` you may specify any number XXXX. The default is 8888 (when you launch Jupyter on your own computer, it opens in a browser window at the address localhost:8888). If the port is occupied (someone else has launched Jupyter on it), you will be assigned another port: 

<img src='https://github.com/litvinanna/intro_to_prog/raw/main/command_line/1.png'>

**3. Redirect the port to your browser.**

Open the terminal/command line on your computer (not cluster!) and run the following command:

`ssh -L localhost:[port for your browser]:localhost:[port you requested on cluster] username@10.30.194.138`

`-L` stands for local. If you have no other Jupyter session open, you may use port 8888 for convenience. If it is occupied, use any other port YYYY. 

Example:

`ssh -L localhost:8888:localhost:9999 m.pak@10.30.194.138`

If you are asked "Are you sure you want to continue connecting (yes/no)?" type yes and press Enter. Then enter your cluster password.
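
To make the order of the two ports explicit, the command can be assembled from its parts. This is only an illustrative sketch: `tunnel_command` is a hypothetical helper, not part of any library, and it only builds the string without running it.

```python
def tunnel_command(local_port, remote_port, user, host):
    """Build the ssh port-forwarding command as a string.

    local_port  - port you will open in the browser on your computer
    remote_port - port on which Jupyter was started on the cluster
    """
    return (f"ssh -L localhost:{local_port}:localhost:{remote_port} "
            f"{user}@{host}")


cmd = tunnel_command(8888, 9999, "m.pak", "10.30.194.138")
print(cmd)
```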

**4. Open localhost:YYYY in your browser.**

You will be asked for a token. Copy the token from terminal and paste into the browser field. Jupyter Notebook on cluster will be launched in your browser.

<img src='https://github.com/litvinanna/intro_to_prog/raw/main/command_line/3.png' width='400px'>
<img src='https://github.com/litvinanna/intro_to_prog/raw/main/command_line/4.png'>

<hr>

To see the list of launched notebooks:

`jupyter notebook list`

To shut down the notebook server, press Ctrl + C or run the command `jupyter notebook stop <port number>`.

<hr>

You may launch Jupyter inside a `screen` session so that you can leave the session in the terminal and still use Jupyter from the cluster in your browser. The `-S` flag sets the name of your screen.

`screen -S jp`

To leave the screen and get back to the main session, press `Ctrl + A` and then `D`. To see the list of your screens, use `screen -ls`. To re-enter the screen, type `screen -r jp`.