### Python notebooks

* interactive
* contain code and presentation
* facilitate collaboration
* easy to write and test code
* provide quick results
* easy to display graphs

#### Start coding

In [2]:
"""
    Gene - A class for demonstration purposes.
    The class has 2 attributes:
    - symbol - text (str) - the gene symbol
    - snp_no - numeric (int) - the number of SNPs known for the gene
    
    The class allows for the update of the numeric attribute.
    - update_snp_no updates snp_no by a given additional number of SNPs
"""
class Gene:
    def __init__(self, gene_symbol = "Gene Symbol", snp_number = 0):
        self.symbol = gene_symbol
        self.snp_no = snp_number
        
    def __str__(self):
        return f"Gene object: Gene symbol = '{self.symbol}', Number of SNPs = {self.snp_no}"
        
    def __repr__(self):
        return f"Gene('{self.symbol}',{self.snp_no})"
    
    def update_snp_no(self, additional_snps = 0):
        """
        Add parameter value to snp_no.

        Keyword arguments:
        int: additional_snps - the number to add (0)
        
        Returns:
        int: updated snp_no
        """         
        old_value = self.snp_no
        try:
            self.snp_no = self.snp_no + additional_snps
        except TypeError: 
            self.snp_no = self.snp_no + 1
            print(f"'{additional_snps}' is not a numeric value, we added 1 because at least one new SNP was found.")
        finally:
            print(f"Old value was {old_value}, new value is {self.snp_no}")
        return self.snp_no

In [3]:
# Explore gene
g1 = Gene()

In [4]:
g1

Gene('Gene Symbol',0)

In [5]:
g1.symbol = "EGR"

In [6]:
g1

Gene('EGR',0)

In [7]:
g1.update_snp_no("t")

't' is not a numeric value, we added 1 because at least one new SNP was found.
Old value was 0, new value is 1


1

In [8]:
g1

Gene('EGR',1)

In [9]:
g1.update_snp_no(5)

Old value was 1, new value is 6


6
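A cell that ends with `g1` shows `Gene('EGR',0)` because the notebook display uses `__repr__`, while `print()` and `str()` use `__str__`. The following is a minimal sketch of that difference; `MiniGene` is a hypothetical, trimmed-down stand-in for `Gene` used only for this illustration.

```python
class MiniGene:
    """Trimmed-down stand-in for Gene, for illustration only."""

    def __init__(self, gene_symbol="Gene Symbol", snp_number=0):
        self.symbol = gene_symbol
        self.snp_no = snp_number

    def __str__(self):
        # Human-readable form, used by print() and str()
        return f"Gene object: Gene symbol = '{self.symbol}', Number of SNPs = {self.snp_no}"

    def __repr__(self):
        # Unambiguous form, used by the notebook display and repr()
        return f"Gene('{self.symbol}',{self.snp_no})"


g = MiniGene("EGR", 6)
as_text = str(g)
as_repr = repr(g)
in_list = str([g])  # containers use repr() for their elements

print(as_text)
print(as_repr)
print(in_list)
```

Keeping `__repr__` close to the constructor call makes the displayed value easy to paste back into a cell.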

#### Do more coding

In [11]:
"""
    EnhancedGene - A class for demonstration purposes.
    The class extends the class Gene with the methods:
    - update_symbol - updates symbol
    - update_snps - updates the snp number given a list of new SNPs
    
"""
class EnhancedGene(Gene):
    
    def update_symbol(self, new_symbol = ""):
        """
        Change symbol to new_symbol

        Keyword arguments:
        str: new_symbol - the string to replace the gene symbol, should contain test ("")
        
        Returns:
        str: updated gene symbol
        """        
        old_value = self.symbol
        try:
            self.symbol = new_symbol
            index = self.symbol.index("test")
        except (TypeError, AttributeError):
            # a non-string value such as an int has no index() method (AttributeError)
            self.symbol = str(old_value) + " " + str(new_symbol)
            print(f"'{new_symbol}' is not a string, we made the conversion and added it")
        except ValueError:
            print(f"'{new_symbol}' does not contain 'test', we added 'test' to it")
            self.symbol = self.symbol + " test"

        finally:
            print(f"Old value was '{old_value}', new value is '{self.symbol}'")
        return self.symbol
    

    def update_snps(self, snp_list = []):
        """
        Add parameter snp_list length to snp_no.
        """
        old_snp_no = self.snp_no
        try:
            self.snp_no += len(snp_list)
        except TypeError:
            print("We did not change the SNP no, no collection of SNPs was provided!")
        else:
            print("SNP no updated!")        
        finally:
            print(f"Old value for the SNP no was {old_snp_no}, new value is {self.snp_no}.")
        return self.snp_no
        



In [12]:
#Explore 
g2 = EnhancedGene("TP53", 20)
g2

Gene('TP53',20)

In [13]:
g2.update_symbol("test EGFR")

Old value was 'TP53', new value is 'test EGFR'


'test EGFR'

In [14]:
g2

Gene('test EGFR',20)

In [15]:
g2.update_snps(5)

We did not change the SNP no, no collection of SNPs was provided!
Old value for the SNP no was 20, new value is 20.


20

In [16]:
g2.update_snps(["rs1", "rs2", "rs3"])

SNP no updated!
Old value for the SNP no was 20, new value is 23.


23
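`g2` is displayed as `Gene('TP53',20)` although it is an `EnhancedGene`, because the inherited `__repr__` hard-codes the class name. One possible adjustment is to build the string from `type(self).__name__`. The sketch below uses the hypothetical stand-in classes `BaseGene` and `ChildGene`, not the notebook's own classes.

```python
class BaseGene:
    def __init__(self, gene_symbol="Gene Symbol", snp_number=0):
        self.symbol = gene_symbol
        self.snp_no = snp_number

    def __repr__(self):
        # type(self).__name__ resolves to the actual (sub)class name at runtime
        return f"{type(self).__name__}({self.symbol!r}, {self.snp_no})"


class ChildGene(BaseGene):
    """Subclass that inherits __repr__ unchanged."""


base_repr = repr(BaseGene("TP53", 20))
child_repr = repr(ChildGene("TP53", 20))

print(base_repr)
print(child_repr)
```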

### From exploration work to production

### Python scripts

In [17]:
# All commands prefixed with ! can be run without the ! in a terminal (Git Bash or Anaconda console)
# If you use the terminal, navigate to the same folder as the notebook
# Create a script file test.py
# A script is a .py file with Python code
# The first line is a comment line (the "shebang") that tells the operating system this is a Python script and which interpreter to use to run it
#   #!/usr/bin/python3

# This will work on Mac or Linux machines; Windows users can create the file using the JupyterLab menu
# and then add the print statement to it
#!touch test.py
#!echo '#!/usr/bin/python3' > test.py
#!echo 'print("This is a python script")' >> test.py

In [1]:
#!which python
# ! - allows you to run bash commands in the notebook
!type python

python is /opt/anaconda3/bin/python


In [2]:
#!which python

/opt/anaconda3/bin/python


In [3]:
#!which python3
!type python3

python3 is /opt/anaconda3/bin/python3


In [4]:
!python --version

Python 3.8.3


In [5]:
!python3 --version

Python 3.8.3


In [6]:
# run/execute script

!python test.py

This is a python script


In [7]:
!python3 test.py

This is a python script


In [21]:
# change permissions to add execute (x) permission for the user (u)

!chmod u+x test.py

In [10]:
try:
    number_var = 2
    if 2 in number_var:
        number_var = 4
except TypeError:
    number_var = 5
    print(number_var)
else:
    number_var = number_var + 7
    print(number_var)

5


In [11]:
number_var

5

In [12]:
number_var = 2
2 in number_var

TypeError: argument of type 'int' is not iterable

In [9]:
!./test.py

This is a python script


In [24]:
#import a module

import test

This is a python script


#### Adding a function

In [14]:
def test_function(no = 0):
    print("This is a function in a python script")
    return no + 1

In [16]:
test_function(5)

This is a function in a python script


6

In [1]:
# add a test function, restart kernel (round arrow menu button), import
# import the functionality implemented in our python script/module

import test as t

This is a python script


In [2]:
dir(t)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'test_function']

In [4]:
t.test_function(3)

This is a function in a python script


4

In [1]:
# add a test_variable and set a value, restart kernel, import
import test as t
dir(t)

This is a python script


['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'test_function',
 'test_variable']

In [2]:
t.test_variable

202

In [3]:
import test as t

### __main__ — Top-level script environment

'__main__' is the name of the scope in which top-level code executes. A module’s __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt.

A module can discover whether or not it is running in the main scope by checking its own __name__, which allows a common idiom for conditionally executing code in a module when it is run as a script or with python -m but not when it is imported.

```python
if __name__ == "__main__":
    # execute only if run as a script
    main()  # function that contains the code to execute
```

https://docs.python.org/3/library/__main__.html
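The effect of `__name__` can be observed without leaving the notebook. The sketch below writes a one-line script to a temporary folder and executes it twice with `runpy.run_path`, once under the name `__main__` and once under a module-like name. The file name `name_demo.py` is an assumption made for this illustration.

```python
import os
import runpy
import tempfile

# One-line script whose result depends on the name it is executed under
source = 'result = "run as script" if __name__ == "__main__" else "imported as " + __name__\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "name_demo.py")
    with open(path, "w") as handle:
        handle.write(source)

    # run_name sets the value of __name__ seen by the executed file
    as_script = runpy.run_path(path, run_name="__main__")["result"]
    as_module = runpy.run_path(path, run_name="name_demo")["result"]

print(as_script)
print(as_module)
```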

In [None]:
list.__name__

In [None]:
def main():
    test_variable = 10
    print(f'The test variable value is {test_variable}')

In [None]:
main()

In [1]:
# add main, the if statement, restart kernel, import
import test as t

This is a python script


In [2]:
dir(t)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'main',
 'test_function',
 'test_variable']

In [3]:
t.__name__

'test'

In [5]:
t.main()

test
This is a function in a python script
11


In [4]:
!python test.py

This is a python script
__main__
This is a function in a python script
11


#### `sys.argv`

The list of command line arguments passed to a Python script. argv[0] is the script name (it is operating system dependent whether this is a full pathname or not). <br>
If the command was executed using the -c command line option to the interpreter, argv[0] is set to the string '-c'. <br>
If no script name was passed to the Python interpreter, argv[0] is the empty string.

The Python `sys` module provides access to the command-line arguments through `sys.argv`.

`sys.argv` is the list of all command-line arguments, each stored as a string.<br>
`len(sys.argv)` is the number of command-line arguments, including the script name.

Add to the script

```python
import sys

print('Number of arguments:', len(sys.argv))
print('Argument List:', str(sys.argv))
```

In [6]:
!python test.py

This is a python script
Number of arguments: 1
Argument List: ['test.py']


#### Give some arguments

In [7]:
!python test.py [1,2,4] message 1

This is a python script
Number of arguments: 4
Argument List: ['test.py', '[1,2,4]', 'message', '1']


In [8]:
#!which python3
!type python3

python3 is /opt/anaconda3/bin/python3


```import numpy as np```

In [9]:
"[1, 2 , 3]"

'[1, 2 , 3]'

In [10]:
[1, 2 , 3]


[1, 2, 3]

In [11]:
import numpy as np
np.array([1,2,3])

array([1, 2, 3])

In [12]:
# Making an array out of a string containing a list

import numpy as np
np.array("[1, 2 , 3]".strip('][').split(','), dtype = int)

array([1, 2, 3])
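As an alternative to `strip` and `split`, the standard library's `ast.literal_eval` can parse the string as a Python literal. The sketch below uses plain lists rather than NumPy so that it runs without extra packages; `parse_int_list` is a hypothetical helper name.

```python
import ast


def parse_int_list(text):
    """Turn a string such as "[1,2,4]" into a list of ints."""
    value = ast.literal_eval(text)  # evaluates Python literals only, not arbitrary code
    if not isinstance(value, (list, tuple)):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return [int(item) for item in value]


parsed = parse_int_list("[1, 2 , 3]")
print(parsed)
```

The resulting list can be passed to `np.array(...)` in the same way as in the cell above.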

#### Argument parsing

`import getopt`
    
`opts, args = getopt.getopt(argv, 'a:b:', ['foperand=', 'soperand='])`

The signature of the getopt() method looks like:

`getopt.getopt(args, shortopts, longopts=[])`

* `args` is the list of arguments taken from the command-line.
* `shortopts` is where you specify the option letters. If you supply a:, then it means that your script should be supplied with the option a followed by a value as its argument. Technically, you can use any number of options here. When you pass these options from the command-line, they must be prepended with '-'.
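* `longopts` is where you can specify the extended versions of the shortopts. They must be prepended with '--' on the command line. In the list itself, a long option that requires a value is written with a trailing '=' (for example `'foperand='`).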

https://www.datacamp.com/community/tutorials/argument-parsing-in-python
https://docs.python.org/2/library/getopt.html
https://www.tutorialspoint.com/python/python_command_line_arguments.htm
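A minimal sketch of a `getopt` call on a simulated command line, mixing short and long options. The `aliases` dictionary is an assumption of this example, not part of `getopt`. It maps both spellings of an option to one key so that the values can be looked up by name instead of by position.

```python
import getopt

# Simulated command line (what sys.argv[1:] would hold); the order is deliberately mixed
argv = ["-n", "3", "--string=message", "-l", "[1,2,4]"]

# "l:s:n:" means each short option takes a value; the trailing "=" does the same for long options
opts, remaining = getopt.getopt(argv, "l:s:n:", ["list=", "string=", "number="])

# Map both spellings of each option to one key so that order no longer matters
aliases = {
    "-l": "list", "--list": "list",
    "-s": "string", "--string": "string",
    "-n": "number", "--number": "number",
}
values = {aliases[flag]: value for flag, value in opts}

print(opts)
print(values)
```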

In [None]:
!cp test.py test_getopt.py

Change the file test_getopt.py


```python
    import getopt
    import numpy as np
    
    try:
        # Define the getopt parameters; a trailing "=" marks a long option that takes a value
        opts, args = getopt.getopt(sys.argv[1:], "l:s:n:", ["list=", "string=", "number="])
        print("no of arguments:", len(opts))
        if len(opts) != 3:
            print("Provide 3 arguments.")
            print("usage: test.py -l <list_operand> -s <string_operand> -n <number_operand>")
        else:
            print("options:", opts)
            # opts keeps the command-line order, so -l, -s and -n must be given in this order
            test_array = np.array(opts[0][1].strip('][').split(','), dtype = int)
            string_text = opts[1][1]
            number_text = int(opts[2][1])
            test_array = test_array * number_text 
            print(f'\nInfo "{string_text}", the updated list is: {test_array}\n')
    except getopt.GetoptError:
        print("usage: test.py -l <list_operand> -s <string_operand> -n <number_operand>")
```

In [13]:
!python test_getopt.py

no of arguments: 0
Provide 3 arguments.
usage: test.py -l <list_operand> -s <string_operand> -n <number_operand>


In [14]:
!python test_getopt.py -l [1,2,4] -s message

no of arguments: 2
Provide 3 arguments.
usage: test.py -l <list_operand> -s <string_operand> -n <number_operand>


In [15]:
!python test_getopt.py -l [1,2,4] -s message -n 3

no of arguments: 3
options: [('-l', '[1,2,4]'), ('-s', 'message'), ('-n', '3')]

Info "message", the updated list is: [ 3  6 12]



#### `argparse` - increased readability
`import argparse`

`class argparse.ArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=argparse.HelpFormatter, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True)`<br>
https://docs.python.org/3/library/argparse.html#argumentparser-objects

Argument definition<br>
`ArgumentParser.add_argument(name or flags...[, action][, nargs][, const][, default][, type][, choices][, required][, help][, metavar][, dest])`<br>
https://docs.python.org/3/library/argparse.html#the-add-argument-method

`ap.add_argument("-i", "--ioperand", required=True, help="important operand")`

* `-i` - single-letter version of the argument
* `--ioperand` - extended version of the argument
* `required` - whether the argument is mandatory or not
* `help` - meaningful description, shown in the help message

https://www.datacamp.com/community/tutorials/argument-parsing-in-python
https://docs.python.org/3/library/argparse.html
https://realpython.com/command-line-interfaces-python-argparse/
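A minimal sketch of the `add_argument` call above. `parse_args` accepts an explicit list, which avoids reading `sys.argv` and is convenient inside a notebook. The `type=int` conversion and the second option `-o/--ooperand` are assumptions added for this illustration.

```python
import argparse

ap = argparse.ArgumentParser(description="argparse sketch")
ap.add_argument("-i", "--ioperand", required=True, type=int, help="important operand")
ap.add_argument("-o", "--ooperand", default="none", help="optional operand (hypothetical)")

# Short form; the optional argument falls back to its default
args = ap.parse_args(["-i", "5"])
print(args.ioperand, args.ooperand)

# Long form; vars() turns the namespace into a dictionary
long_form = vars(ap.parse_args(["--ioperand", "7", "--ooperand", "x"]))
print(long_form)
```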

In [None]:
!cp test.py test_argparse.py

Change the file test_argparse.py



```python
    import argparse
    import numpy as np 


    ap = argparse.ArgumentParser()

    # Add the arguments to the parser
    ap.add_argument("-l", "--list_operand", required=True, help="list operand")
    ap.add_argument("-s", "--string_operand", required=True, help="string operand")
    ap.add_argument("-n", "--number_operand", required=True, help="number operand")

    args = vars(ap.parse_args())
    print("arguments:", args)
    test_array = np.array(args["list_operand"].strip("][").split(","), dtype = int)
    string_text = args["string_operand"]
    number_text = int(args["number_operand"])
    test_array = test_array * number_text 

    print(f"\nResult with argparse.\nInfo '{string_text}', for updated list {test_array}\n")
```


In [None]:
!python test_argparse.py -h

In [None]:
!python test_argparse.py -l [1,2,4] --string_operand message -n 5
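The conversions done after parsing can also be handed to `argparse` through the `type` parameter, so that invalid input is reported with the usage message. The sketch below uses plain lists instead of NumPy to stay self-contained; `int_list` is a hypothetical converter name.

```python
import argparse


def int_list(text):
    """Convert "[1,2,4]" (or "1,2,4") into a list of ints for argparse."""
    try:
        return [int(item) for item in text.strip("][").split(",")]
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a list of integers: {text!r}")


ap = argparse.ArgumentParser()
ap.add_argument("-l", "--list_operand", required=True, type=int_list, help="list operand")
ap.add_argument("-n", "--number_operand", required=True, type=int, help="number operand")

args = ap.parse_args(["-l", "[1,2,4]", "-n", "5"])
scaled = [item * args.number_operand for item in args.list_operand]
print(scaled)
```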

##### `action` parameter - count example
https://docs.python.org/3/library/argparse.html#action

'count' - This counts the number of times a keyword argument occurs. For example, this is useful for increasing verbosity levels:

`ap.add_argument("-v", "--verbose", action='count', default=0)`


In [None]:
!python test_argparse.py -l [1,2,4] --string_operand message -n 3 -vvvv
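A short sketch of what the `count` action produces for different command lines, using only the `-v/--verbose` argument:

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-v", "--verbose", action="count", default=0)

quiet = ap.parse_args([]).verbose                   # flag absent: the default is used
loud = ap.parse_args(["-vvvv"]).verbose             # four occurrences in one token
mixed = ap.parse_args(["-v", "--verbose"]).verbose  # short and long forms both count

print(quiet, loud, mixed)
```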

### Modules

https://docs.python.org/3/tutorial/modules.html
https://www.python.org/dev/peps/pep-0008/#package-and-module-names

If you want to write a somewhat longer program, you are better off <b>using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a script.</b> 
    
As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program.

A module is a file containing Python definitions and statements. <b>The file name is the module name with the suffix .py appended</b>. Within a module, the module’s name (as a string) is available as the value of the global variable `__name__`.
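The relation between file name and module name can be checked with a throwaway module. The sketch below writes a file to a temporary folder, makes that folder importable, and imports it. The module name `tmp_gene_helpers` is an assumption made for this illustration.

```python
import importlib
import os
import sys
import tempfile

module_source = '''
def describe():
    return f"defined in module {__name__}"
'''

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "tmp_gene_helpers.py"), "w") as handle:
        handle.write(module_source)

    sys.path.insert(0, folder)  # make the folder importable
    importlib.invalidate_caches()
    try:
        helpers = importlib.import_module("tmp_gene_helpers")
        module_name = helpers.__name__
        description = helpers.describe()
    finally:
        sys.path.remove(folder)  # leave the import path as we found it

print(module_name)
print(description)
```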

In [None]:
#Let's create a module for our classes
!touch gene_module.py
# add Gene class to the file

In [16]:
import gene_module as gm

In [18]:
g1 = gm.Gene_testing()

In [19]:
g1

Gene('Gene Symbol',0)

In [20]:
g1.symbol

'Gene Symbol'

In [None]:
!touch enhanced_gene_module.py
# add EnhancedGene class to the file

In [None]:
import enhanced_gene_module as egm

In [None]:
eg1 = egm.EnhancedGene()

In [None]:
#dir(eg1)

### Packages

https://docs.python.org/3/tutorial/modules.html#packages

<b>Packages are a way of structuring</b> Python’s module namespace by using “dotted module names”. <b>For example, the module name A.B designates a submodule named B in a package named A</b>. Just like the use of modules saves the authors of different modules from having to worry about each other’s global variable names, the use of dotted module names saves the authors of multi-module packages like NumPy from having to worry about each other’s module names.
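A minimal sketch of a dotted module name, built in a temporary folder so that it does not interfere with the files created in this notebook. The package name `tmp_pkg_a` and its submodule `b` are assumptions made for this illustration.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    pkg_dir = os.path.join(folder, "tmp_pkg_a")
    os.mkdir(pkg_dir)
    # An (empty) __init__.py marks the folder as a regular package
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "b.py"), "w") as handle:
        handle.write("VALUE = 42\n")

    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        sub = importlib.import_module("tmp_pkg_a.b")  # dotted name: package.submodule
        dotted_name = sub.__name__
        package_name = sub.__package__
        value = sub.VALUE
    finally:
        sys.path.remove(folder)

print(dotted_name, package_name, value)
```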

In [None]:
!mkdir demo_pkg

In [None]:
!cp test.py demo_pkg
!cp gene_module.py demo_pkg
!cp enhanced_gene_module.py demo_pkg

In [None]:
!touch demo_pkg/__init__.py

In [None]:
from demo_pkg import test as tt

In [None]:
tt.test_function(4)

In [None]:
from demo_pkg import gene_module as gmp

In [None]:
gmp.Gene()

In [None]:
dir(gmp)

In [None]:
# restart kernel

from demo_pkg import enhanced_gene_module as egmp

In [None]:
# dir()

In [None]:
# dir(egmp)

In [None]:
# restart kernel
# need to add __all__ = ["test", "gene_module", "enhanced_gene_module"] in __init__.py to see the modules


from demo_pkg import *



In [None]:
dir()

In [None]:
gene_module

In [None]:
gene_module.Gene()
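The role of `__all__` can be sketched with a temporary package that lists only one of its two submodules. The star-import is executed into a scratch dictionary so that the notebook namespace stays untouched. The names `tmp_pkg_all`, `alpha` and `beta` are assumptions made for this illustration.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    pkg_dir = os.path.join(folder, "tmp_pkg_all")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
        handle.write('__all__ = ["alpha"]\n')  # beta is deliberately left out
    for name in ("alpha", "beta"):
        with open(os.path.join(pkg_dir, f"{name}.py"), "w") as handle:
            handle.write(f'NAME = "{name}"\n')

    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        namespace = {}
        exec("from tmp_pkg_all import *", namespace)  # star-import into a scratch namespace
    finally:
        sys.path.remove(folder)

imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Only the submodules named in `__all__` are bound by the star-import; the others still need an explicit import.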

### Tool example

https://docs.python-guide.org/scenarios/cli/

In [None]:
!mkdir Project_Gene

Copy/move the folder demo_pkg into Project_Gene. <br>
Create a file \_\_main\_\_.py in demo_pkg.

```python
import sys

def main(args=None):
    """The main routine."""
    if args is None:
        args = sys.argv[1:]

    print("This is the main routine.")
    print(f"It should do something interesting with the arguments: {args}.")

    # Do argument parsing here (eg. with argparse) and anything else
    # you want your project to do.

if __name__ == "__main__":
    main()
```
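The `args=None` parameter means `main` can be called with an explicit list, which makes it easy to try from a notebook or a test without touching `sys.argv`. A minimal sketch follows; returning the summary string is an assumption added here so that the result can be checked.

```python
import sys


def main(args=None):
    """Summarize the arguments; return the line as well as printing it."""
    if args is None:
        args = sys.argv[1:]  # fall back to the real command line
    summary = f"received {len(args)} argument(s): {args}"
    print(summary)
    return summary


no_args = main([])
three_args = main(["arg1", "arg2", "arg3"])
```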

In [None]:
!pwd

The Python interpreter has a `-m` option that runs a module as a script.
When the name given is a package, it runs the package's `__main__.py` module.

In the terminal run:

```
cd Project_Gene
python3 -m demo_pkg
```


then run:

```
python3 -m demo_pkg arg1 arg2 arg3
```
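The standard library's `runpy.run_module` offers a rough in-process counterpart of `python3 -m`, which makes the behaviour observable from a notebook. The sketch below builds a throwaway package with a `__main__.py` and captures what it prints. The package name `tmp_cli_pkg` is an assumption made for this illustration.

```python
import contextlib
import importlib
import io
import os
import runpy
import sys
import tempfile

main_source = 'print("running", __name__, "from package", __package__)\n'

with tempfile.TemporaryDirectory() as folder:
    pkg_dir = os.path.join(folder, "tmp_cli_pkg")
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "__main__.py"), "w") as handle:
        handle.write(main_source)

    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            # Rough in-process counterpart of: python3 -m tmp_cli_pkg
            runpy.run_module("tmp_cli_pkg", run_name="__main__")
    finally:
        sys.path.remove(folder)

output = captured.getvalue().strip()
print(output)
```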

Example from:<br>
https://chriswarrick.com/blog/2014/09/15/python-apps-the-right-way-entry_points-and-scripts/


Create another module, `another_module.py`, in demo_pkg

```python
#!/usr/bin/python3

from demo_pkg import enhanced_gene_module as egm_p
import sys

def another_method(egene):
    egene.symbol = "updated gene symbol"
    egene.update_symbol("test BRCA1")
    egene.update_snps(["rs1","rs2","rs3"])

def main():
    print("this is another script")
    print(sys.argv)
    gene1 = egm_p.EnhancedGene()
    another_method(gene1)
    print(gene1)
    
    
if __name__ == "__main__":
    main()
```

Create setup.py in Project_Gene.

`setup.py` is the build script for setuptools. 
It provides setuptools with parameters which contain information about the package (e.g. name and version).

Entry points allow building command-line tools that run functions from the package modules.

```python
from setuptools import setup

setup(name='demo_pkg',
      version='0.1.0',
      packages=['demo_pkg'],
      entry_points={
          'console_scripts': [
              'test_run = demo_pkg.test:main', 
              'another_module_run = demo_pkg.another_module:main'              
          ]
      },
      )
```
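Each `console_scripts` entry has the form `command = package.module:function`. The sketch below is a simplified illustration of how such a string maps to a callable, not the code that setuptools or pip actually generates. `resolve_entry_point` is a hypothetical helper, and a standard-library function stands in for `demo_pkg.test:main`.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'command = package.module:function' and return (command, callable)."""
    command, target = (part.strip() for part in spec.split("=", 1))
    module_name, function_name = target.split(":", 1)
    module = importlib.import_module(module_name)
    return command, getattr(module, function_name)


# A standard-library function stands in for demo_pkg.test:main here
command, function = resolve_entry_point("base_name = os.path:basename")
result = function("/tmp/demo_pkg/test.py")
print(command, result)
```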

```
# You could install something with python setup.py -- it is not recommended but things happen. 

python setup.py install --record files.txt

# This will cause all the installed files to be printed to files.txt.
# Then when you want to uninstall it simply run the following command (be careful with the 'sudo')

cat files.txt | xargs sudo rm -rf
```

#### A good way to install your package!    

In the terminal run:
    
```
pip install .
```

Then to test your console commands run:

```
test_run
another_module_run
```

Fix the import issue for `gene_module` by qualifying it with the package name (`package_name.module_name`).   
In the `enhanced_gene_module.py` file, refer to the module as `demo_pkg.gene_module`.
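The difference between the two import styles can be reproduced with a throwaway package. In the sketch below, one submodule imports its sibling through the package name and loads fine, while the other uses a bare module name and fails once the package is imported from outside its own folder. All names (`tmp_gene_pkg`, `gene_module_tmp`, `enhanced_ok`, `enhanced_bare`) are assumptions made for this illustration.

```python
import importlib
import os
import sys
import tempfile

files = {
    "__init__.py": "",
    "gene_module_tmp.py": "class Gene:\n    pass\n",
    # Absolute import through the package name: package_name.module_name
    "enhanced_ok.py": (
        "from tmp_gene_pkg.gene_module_tmp import Gene\n\n"
        "class EnhancedGene(Gene):\n    pass\n"
    ),
    # Bare import: only works when the package folder itself is on sys.path
    "enhanced_bare.py": "from gene_module_tmp import Gene\n",
}

with tempfile.TemporaryDirectory() as folder:
    pkg_dir = os.path.join(folder, "tmp_gene_pkg")
    os.mkdir(pkg_dir)
    for name, source in files.items():
        with open(os.path.join(pkg_dir, name), "w") as handle:
            handle.write(source)

    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        ok = importlib.import_module("tmp_gene_pkg.enhanced_ok")
        base_names = [cls.__name__ for cls in ok.EnhancedGene.__mro__[:2]]
        try:
            importlib.import_module("tmp_gene_pkg.enhanced_bare")
            bare_import_error = None
        except ModuleNotFoundError as error:
            bare_import_error = error.name  # name of the module that was not found
    finally:
        sys.path.remove(folder)

print(base_names, bare_import_error)
```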

In the terminal, run the following command to uninstall the package:

```
pip uninstall demo_pkg
```

Then reinstall the package:
    
```
pip install .
```

Then run the command again; no error should occur:
```
another_module_run
```


Example of a package PyVCF:
    
https://github.com/jamescasbon/PyVCF/blob/master/setup.py
    


More resources:

https://python-packaging-tutorial.readthedocs.io/en/latest/setup_py.html <br>
https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html <br>
https://www.geeksforgeeks.org/command-line-scripts-python-packaging/ <br>
https://www.w3schools.com/python/python_modules.asp <br>
https://click.palletsprojects.com/en/7.x/ <br>
https://packaging.python.org/tutorials/packaging-projects/ <br>
https://www.git-tower.com/blog/command-line-cheat-sheet/ <br>
http://www.yolinux.com/TUTORIALS/unix_for_dos_users.html