
*Practical Data Science*

# Modularization and Code Outsourcing

Matthias Griebel<br>
Chair of Information Systems and Business Analytics

Winter Semester 20/21

__Credits__

- https://realpython.com/python-modules-packages/

**Some useful [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html)**

In [4]:
%pwd  # show the current working directory

'/content'

In [5]:
%ls  # list the contents of the current directory

[0m[01;34msample_data[0m/


In [6]:
%cd /content/drive/My\ Drive/course #change directory

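[Errno 2] No such file or directory: '/content/drive/My Drive/course #change directory'
/content

Note: the error message shows that `%cd` treats everything after the command, including the trailing `#change directory`, as part of the path. When using `%cd`, put comments on a separate line.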
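The magic commands above have plain-Python counterparts in the `os` module, which also work outside IPython. The following is a minimal sketch; the temporary directory and the `sample_data` folder created in it are only stand-ins so that the example does not depend on the Colab file system.

```python
import os
import tempfile

# Remember the starting directory so we can return to it afterwards.
start_dir = os.getcwd()                      # counterpart of %pwd

# Work in a throwaway directory (an assumption made for this illustration).
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sample_data"))
    os.chdir(tmp)                            # counterpart of %cd
    entries = sorted(os.listdir("."))        # roughly what %ls shows
    print(entries)
    os.chdir(start_dir)                      # leave before the directory is deleted
```

Unlike `%cd`, `os.chdir` takes the path as a normal string argument, so trailing comments cannot end up in the path.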


## Using Google Drive in Colab

Connect to Google Drive

In [7]:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My\ Drive/

Mounted at /content/drive
/content/drive/My Drive


Clone/update GitHub repo

In [2]:
!git clone https://github.com/pds2021/course.git
%cd /content/drive/My\ Drive/course
!git pull

Cloning into 'course'...
remote: Enumerating objects: 161, done.
remote: Counting objects: 100% (161/161), done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 161 (delta 74), reused 110 (delta 29), pack-reused 0
Receiving objects: 100% (161/161), 24.80 MiB | 9.78 MiB/s, done.
Resolving deltas: 100% (74/74), done.
/content/drive/My Drive/course
Already up to date.


What about private repos?

In [8]:
%cd /content/drive/My\ Drive
!git clone https://github.com/pds2021/

/content/drive/My Drive
Cloning into 'pds2021'...
remote: Not Found
fatal: repository 'https://github.com/pds2021/' not found


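For private repositories, some options are discussed on [Stack Overflow](https://stackoverflow.com/questions/48350226/methods-for-using-git-with-google-colab).

The code below reads the password with `getpass`, so it is not echoed in the notebook output. Note that GitHub no longer accepts account passwords for Git operations over HTTPS; enter a personal access token at the password prompt instead. Also be aware that Git stores the clone URL, including the embedded credentials, in the cloned repository's `.git/config`.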


In [7]:
import os
from getpass import getpass
import urllib.parse  # import the submodule explicitly; plain `import urllib` does not guarantee it is loaded

repo_owner = 'pds2021'
repo_name = 'a4-matjesg'
user = input('User name: ')
password = getpass('Password: ')  # the input is not echoed
password = urllib.parse.quote(password)  # percent-encode special characters for use in a URL

cmd_string = f'git clone https://{user}:{password}@github.com/{repo_owner}/{repo_name}.git'

os.system(cmd_string)
cmd_string, password = "", ""  # overwrite the variables so the password does not linger in the session

User name: matjesg
Password: ··········
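As a variation on the cell above, the URL construction can be separated from the clone call, which makes the percent-encoding step easy to check. This is only a sketch: `build_clone_url` and `clone` are hypothetical helper names, the credentials are dummy values, and `clone` is defined but not called, so no network access takes place.

```python
import subprocess
from urllib.parse import quote


def build_clone_url(user, secret, repo_owner, repo_name):
    """Return an HTTPS clone URL with percent-encoded credentials."""
    # safe="" also encodes "/", which quote() leaves untouched by default.
    user_q = quote(user, safe="")
    secret_q = quote(secret, safe="")
    return f"https://{user_q}:{secret_q}@github.com/{repo_owner}/{repo_name}.git"


def clone(url):
    """Run `git clone` without a shell and return its exit code."""
    # Passing a list of arguments avoids shell interpretation of special characters.
    return subprocess.run(["git", "clone", url], check=False).returncode


# Dummy values for illustration only.
example_url = build_clone_url("some-user", "p@ss/word", "pds2021", "a4-matjesg")
print(example_url)
```

Encoding matters because characters such as `@` or `/` in a password would otherwise be read as URL delimiters.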


In [9]:
%cd /content/drive/My\ Drive/a4-matjesg
!git pull

/content/drive/My Drive/a4-matjesg
Already up to date.


## Modular programming

___Definition___

Modular programming refers to the process of breaking a large, unwieldy programming task into separate, smaller, more manageable subtasks or modules. Individual modules can then be cobbled together like building blocks to create a larger application.

___Advantages___

There are several advantages to modularizing code in a large application:

- **Simplicity**: Rather than focusing on the entire problem at hand, a module typically focuses on one relatively small portion of the problem. If you’re working on a single module, you’ll have a smaller problem domain to wrap your head around. This makes development easier and less error-prone.

- **Maintainability**: Modules are typically designed so that they enforce logical boundaries between different problem domains. If modules are written in a way that minimizes interdependency, there is decreased likelihood that modifications to a single module will have an impact on other parts of the program. (You may even be able to make changes to a module without having any knowledge of the application outside that module.) This makes it more viable for a team of many programmers to work collaboratively on a large application.

- **Reusability**: Functionality defined in a single module can be easily reused (through an appropriately defined interface) by other parts of the application. This eliminates the need to duplicate code.

- **Scoping**: Modules typically define a separate namespace, which helps avoid collisions between identifiers in different areas of a program. (One of the tenets in the [Zen of Python](https://www.python.org/dev/peps/pep-0020/) is "Namespaces are one honking great idea -- let's do more of those!")
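The scoping advantage can be illustrated with a small sketch. To keep the example self-contained, the two modules are built in memory with `types.ModuleType`; in practice each would be its own `.py` file. The module names `circle` and `rectangle` and their `area` functions are made up for this illustration.

```python
import types

# Build two small modules in memory; normally each would live in its own .py file.
circle = types.ModuleType("circle")
exec("def area(r):\n    return 3.14159 * r * r\n", circle.__dict__)

rectangle = types.ModuleType("rectangle")
exec("def area(w, h):\n    return w * h\n", rectangle.__dict__)

# Both modules define a function called `area`, but the names do not collide
# because each one lives in its module's own namespace.
print(circle.area(1.0))
print(rectangle.area(2, 3))
```

Without separate namespaces, the second definition of `area` would simply overwrite the first.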

#### Python Modules: Overview

There are actually three different ways to define a module in Python:

1. A module can be written in Python itself.
2. A module can be written in C and loaded dynamically at run-time.
3. A built-in module is intrinsically contained in the interpreter, like the `itertools` module.

A module’s contents are accessed the same way in all three cases: with the `import` statement.

Here, the focus will mostly be on modules that are written in Python. The cool thing about modules written in Python is that they are exceedingly straightforward to build. All you need to do is create a file that contains valid Python code and give the file a name with a `.py` extension. That’s it! No special syntax or voodoo is necessary.
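One way to see the difference between these kinds of modules is to inspect them after importing. The sketch below assumes a standard CPython installation; which modules are built in depends on how the interpreter was compiled, so the second check is printed rather than taken for granted.

```python
import json
import sys

# Built-in modules are compiled into the interpreter and have no source file.
print("sys" in sys.builtin_module_names)
print(hasattr(sys, "__file__"))

# Whether a given module such as itertools is built in depends on the build.
print("itertools" in sys.builtin_module_names)

# Modules written in Python expose the path of their source file.
print(json.__file__)
```

In all cases the module is accessed the same way, with `import`.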

#### Further Reading

- [Python Docs](https://docs.python.org/3/tutorial/modules.html)
- Tutorials:
  - https://www.learnpython.org/en/Modules_and_Packages
  - https://realpython.com/python-modules-packages/

#### Autoreload

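``autoreload`` is an IPython extension that reloads modules
automatically before executing the code you enter. With ``%autoreload 2``,
all imported modules (except those excluded via ``%aimport``) are reloaded
every time before code is executed, so edits to a `.py` file usually take
effect without restarting the runtime.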

In [12]:
%load_ext autoreload
%autoreload 2
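To see what the extension saves you from doing by hand, the following sketch reloads a module manually with `importlib.reload`. The module name `demo_reload_mod` and the temporary directory are assumptions made for this illustration; `autoreload` does roughly this step for you, plus additional bookkeeping, before each execution.

```python
import importlib
import os
import sys
import tempfile

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # avoid stale bytecode caches in this small demo

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_reload_mod.py")  # hypothetical module file
    with open(path, "w") as f:
        f.write("VALUE = 1\n")

    sys.path.insert(0, tmp)          # make the directory importable
    importlib.invalidate_caches()    # let the import system notice the new file
    mod = importlib.import_module("demo_reload_mod")
    first = mod.VALUE

    # Edit the file on disk; the already imported module does not change by itself.
    with open(path, "w") as f:
        f.write("VALUE = 2  # edited\n")
    unchanged = mod.VALUE

    # Manual reload: re-executes the module's source in the existing module object.
    importlib.reload(mod)
    reloaded = mod.VALUE

    sys.path.remove(tmp)

sys.dont_write_bytecode = old_flag
print(first, unchanged, reloaded)
```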

### Export to .py files

In [13]:
def fib(n):
    '''Write Fibonacci series up to n'''
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

In [14]:
fib(1000)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 


Write file:

In [10]:
%%writefile example.py
# Fibonacci numbers module

def fib(n):
    '''Write Fibonacci series up to n'''
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

Overwriting example.py


Append to file:

In [16]:
%%writefile -a example.py

# The empty line at the beginning of this cell separates the appended code from the existing file content
def fib2(n):
    '''Return Fibonacci series up to n'''
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result

Appending to example.py


Import and use function:

In [17]:
import example
example.fib(100)


0 1 1 2 3 5 8 13 21 34 55 89 


In [18]:
x = example.fib2(1000)
x

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
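The cells above use `import example` and then access functions as `example.fib2`. Python offers a few other import forms for the same module. The sketch below is self-contained: it writes a small module to a temporary directory (the plain-Python counterpart of `%%writefile`); the file name `fib_demo.py` is made up for this illustration.

```python
import importlib
import os
import sys
import tempfile

source = '''
def fib2(n):
    """Return Fibonacci series up to n"""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
'''

with tempfile.TemporaryDirectory() as tmp:
    # Write the module file, as %%writefile would do in a notebook.
    with open(os.path.join(tmp, "fib_demo.py"), "w") as f:
        f.write(source)

    sys.path.insert(0, tmp)           # make the directory importable
    importlib.invalidate_caches()

    import fib_demo                   # access as fib_demo.fib2
    import fib_demo as fd             # same module under a shorter alias
    from fib_demo import fib2         # bring a single name into this namespace

    sys.path.remove(tmp)

series = fib_demo.fib2(50)
print(series)
print(fd.fib2 is fib2)  # all three forms refer to the same function object
```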

### Structuring using subfolders

Create folder for module

In [21]:
!mkdir mymodule

Create .py file

In [28]:
%%writefile mymodule/example2.py
# Fibonacci numbers module

def fib(n):
    '''Write Fibonacci series up to n'''
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

Writing mymodule/example2.py


Import and use module:

In [26]:
from mymodule import example2
example2.fib(100)

0 1 1 2 3 5 8 13 21 34 55 89 


#### The \_\_init\_\_ File

For modules organized in subdirectories (packages):

The `__init__.py` files make Python treat the directories that contain them as regular packages. (Since Python 3.3, a directory without `__init__.py` can still be imported as a namespace package, which is why an import from the folder may succeed before the file exists.) This prevents directories with a common name, such as `string`, from unintentionally hiding valid modules that occur later on the module search path. In the simplest case, `__init__.py` can just be an empty file, but it can also execute initialization code for the package or set the `__all__` variable.

In [None]:
%%writefile mymodule/__init__.py
# Init file
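As an illustration of a non-empty `__init__.py`, the sketch below builds a tiny package in a temporary directory whose `__init__.py` re-exports a function from a submodule and sets `__all__`. The package name `demo_pkg`, the submodule `series`, and the function `double` are all hypothetical.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, "demo_pkg")   # hypothetical package folder
    os.mkdir(pkg_dir)

    # A submodule inside the package.
    with open(os.path.join(pkg_dir, "series.py"), "w") as f:
        f.write("def double(x):\n    return 2 * x\n")

    # __init__.py runs when the package is first imported; here it re-exports
    # a name from the submodule and declares the package's public names.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("from .series import double\n__all__ = ['double']\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    demo_pkg = importlib.import_module("demo_pkg")
    result = demo_pkg.double(21)
    exported = demo_pkg.__all__
    sys.path.remove(tmp)

print(result, exported)
```

With this layout, users of the package can call `demo_pkg.double` directly instead of spelling out `demo_pkg.series.double`.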

### Exporting to GitHub

__Option 1__

Download the `.py` file and upload it to the project via the GitHub web interface.

__Option 2__

Commit and push in Colab

In [11]:
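# Identify yourself to Git (replace the placeholders with your own details)
!git config --global user.email "you@example.com"
!git config --global user.name "Your Name"
# Stage the file, record a commit, and push it to the remote repository
!git add example.py
!git commit -m "Example Commit"
!git push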

On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
	mymodule/

nothing added to commit but untracked files present
Everything up-to-date
