
*Practical Data Science*

# Modularization and Code Outsourcing

Matthias Griebel<br>
Chair of Information Systems and Business Analytics

Winter Semester 20/21

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Using-Google-Drive-in-Colab" data-toc-modified-id="Using-Google-Drive-in-Colab-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Using Google Drive in Colab</a></span></li><li><span><a href="#Modular-programming" data-toc-modified-id="Modular-programming-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Modular programming</a></span><ul class="toc-item"><li><span><a href="#Python-Modules:-Overview" data-toc-modified-id="Python-Modules:-Overview-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Python Modules: Overview</a></span></li><li><span><a href="#Export-to-.py-files" data-toc-modified-id="Export-to-.py-files-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Export to .py files</a></span></li><li><span><a href="#Structuring-using-subfolders" data-toc-modified-id="Structuring-using-subfolders-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Structuring using subfolders</a></span><ul class="toc-item"><li><span><a href="#The-__init__-File" data-toc-modified-id="The-__init__-File-2.3.1"><span class="toc-item-num">2.3.1&nbsp;&nbsp;</span>The __init__ File</a></span></li></ul></li><li><span><a href="#Exporting-to-Github" data-toc-modified-id="Exporting-to-Github-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Exporting to Github</a></span></li></ul></li></ul></div>

**Some useful [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html)**

Get current location

In [None]:
%pwd

List the contents of the current location

In [None]:
%ls

Change directory

In [None]:
%cd sample_data

## Using Google Drive in Colab

Connect to Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My\ Drive/

Clone/update GitHub repo

In [None]:
!git clone https://github.com/pds2021/course.git
%cd /content/drive/My\ Drive/course
!git pull

What about private repos?

In [None]:
%cd /content/drive/My\ Drive
!git clone https://github.com/pds2021/a4-matjesg

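For private repositories, some options are discussed on [Stack Overflow](https://stackoverflow.com/questions/48350226/methods-for-using-git-with-google-colab)
- The code below reads your credentials with `getpass`, so they are not displayed in the notebook. Note that `git clone` stores the URL, including the credentials, in `.git/config` of the cloned repository.
- GitHub no longer accepts account passwords for Git over HTTPS; enter a personal access token at the password prompt.
- Remember to change `repo_owner` and `repo_name`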


In [None]:
import os
from getpass import getpass
import urllib.parse  # a plain `import urllib` does not reliably load the `parse` submodule

repo_owner = 'pds2021'
repo_name = 'a4-matjesg'
user = input('User name: ')
password = getpass('Password: ')
# Percent-encode the password for use in a URL; safe='' also escapes '/'
password = urllib.parse.quote(password, safe='')

cmd_string = f'git clone https://{user}:{password}@github.com/{repo_owner}/{repo_name}.git'

os.system(cmd_string)
cmd_string, password = "", ""  # clear the credentials from the notebook variables
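
Why the quoting step matters can be checked without contacting GitHub. The following is a minimal sketch: `build_clone_url` is a hypothetical helper (not part of the course code) and the credentials are made up. It assumes that reserved characters such as `@`, `:` and `/` have to be percent-encoded inside the credential part of a URL.

```python
from urllib.parse import quote, urlsplit


def build_clone_url(user, secret, repo_owner, repo_name):
    """Return an HTTPS clone URL with percent-encoded credentials.

    Hypothetical helper for illustration; all arguments are plain strings.
    """
    # safe='' also escapes '/', which quote() leaves untouched by default
    user_q = quote(user, safe='')
    secret_q = quote(secret, safe='')
    return f'https://{user_q}:{secret_q}@github.com/{repo_owner}/{repo_name}.git'


# Made-up credentials containing characters that are special in URLs
url = build_clone_url('student', 'p@ss:word/1', 'pds2021', 'a4-matjesg')
parts = urlsplit(url)

# Host, user and path are still parsed correctly after encoding
print(parts.hostname)
print(parts.username)
print(parts.path)
```

Without the encoding, the `@` and `/` in the secret would be read as URL delimiters and the host name would no longer be recognized correctly.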

In [None]:
%cd /content/drive/My\ Drive/a4-matjesg
!git pull

## Modular programming

___Definition___

Modular programming refers to the process of breaking a large, unwieldy programming task into separate, smaller, more manageable subtasks or modules. Individual modules can then be cobbled together like building blocks to create a larger application.

___Advantages___

There are several advantages to modularizing code in a large application:

- **Simplicity**: 
    - Each module focuses on one relatively small portion of the problem.
    - This makes development easier and less error-prone.

- **Maintainability**: 
    - Modules enforce logical boundaries between different problem domains.
    - Minimal interdependency means that modifications to a single module are less likely to affect other parts of the program.
    - Collaboration becomes more viable for a team of many programmers working on a large application.

- **Reusability**: 
    - Functionality defined in a module can be reused in other parts of the application.
    - This eliminates the need to duplicate code.

- **Scoping**: 
    - Separate namespaces help avoid collisions between identifiers in different areas of a program.
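
The scoping advantage can be illustrated with two standard-library modules that both define a function named `sqrt`. This is only a small sketch; the variable names `real_root` and `complex_root` are chosen for this example.

```python
import cmath
import math

# Both modules define a function called sqrt, but each lives in its own namespace,
# so the two definitions do not collide.
real_root = math.sqrt(4)        # real-valued square root
complex_root = cmath.sqrt(-4)   # complex-valued square root

print(real_root)      # 2.0
print(complex_root)   # 2j
```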

### Python Modules: Overview

There are actually three different ways to define a module in Python:

1. A module can be written in Python itself.
2. A module can be written in C and loaded dynamically at run-time.
3. A built-in module is intrinsically contained in the interpreter, like the `itertools` module.

A module’s contents are accessed the same way in all three cases: with the `import` statement.
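
How a given module is implemented can be inspected through its import spec. The following sketch uses a hypothetical helper `module_kind` and a rough heuristic based on file suffixes; the result for a particular module (for example `itertools`) depends on how the interpreter was built.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify an importable module by how it is implemented (rough heuristic)."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(name)
    if spec.origin == 'built-in':
        return 'built-in'
    origin = spec.origin or ''
    # Compiled extension modules end in platform-specific suffixes such as .so or .pyd
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return 'extension (compiled)'
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return 'python source'
    return 'other'


# All three kinds are imported with the same import statement
for name in ['sys', 'json', 'itertools']:
    print(name, '->', module_kind(name))
```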

__Further Reading__

- [Python Docs](https://docs.python.org/3/tutorial/modules.html)
- Tutorials: 
  - https://www.learnpython.org/en/Modules_and_Packages
  - https://realpython.com/python-modules-packages/

__Autoreload__

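``autoreload`` is an IPython extension that reloads modules
automatically before executing the line of code typed. With `%autoreload 2`,
all imported modules are reloaded, so edits to a .py file take effect without restarting the kernel.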

In [None]:
%load_ext autoreload
%autoreload 2
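
What autoreload automates can be sketched manually with `importlib.reload`. The example below is an illustration under stated assumptions: the module name `reload_demo_module` is hypothetical, the file is written to a temporary folder, and bytecode caching is switched off for the duration of the demo so that the result does not depend on file timestamps.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, chosen to avoid clashing with real modules
MODULE_NAME = 'reload_demo_module'

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # keep the demo independent of cached bytecode

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / f'{MODULE_NAME}.py'
    module_file.write_text('VALUE = 1\n')

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo = importlib.import_module(MODULE_NAME)
        before = demo.VALUE

        # Edit the file on disk, as one would in an editor
        module_file.write_text('VALUE = 2  # edited\n')
        stale = demo.VALUE          # the imported module still holds the old value

        importlib.reload(demo)      # re-executes the module's source
        after = demo.VALUE
    finally:
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)
        sys.dont_write_bytecode = old_flag

print(before, stale, after)  # 1 1 2
```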

### Export to .py files

In [None]:
def fib(n):
    '''Write Fibonacci series up to n'''
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

In [None]:
fib(1000)

Write file:

In [None]:
%%writefile example.py
# Fibonacci numbers module

def fib(n):
    '''Write Fibonacci series up to n'''
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

Append to file:

In [None]:
%%writefile -a example.py

# The blank line above separates the appended code from the existing file content
def fib2(n):
    '''Return Fibonacci series up to n'''
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result

Import and use function:

In [None]:
import example
example.fib(100)


In [None]:
x = example.fib2(1000)
x
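
Outside of a notebook, the same write, append and import workflow can be sketched with plain Python. The module name `fib_demo_module` and the temporary folder are assumptions made for this example; in the notebook, the file is simply written to the current working directory, which is already on the module search path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = 'fib_demo_module'  # hypothetical name for this sketch

FIB_SOURCE = '''
def fib(n):
    """Write Fibonacci series up to n"""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a + b
    print()
'''

FIB2_SOURCE = '''
def fib2(n):
    """Return Fibonacci series up to n"""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
'''

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / f'{MODULE_NAME}.py'

    # Mode 'w' overwrites the file, like %%writefile
    with open(module_file, 'w') as f:
        f.write(FIB_SOURCE)

    # Mode 'a' appends to the file, like %%writefile -a
    with open(module_file, 'a') as f:
        f.write(FIB2_SOURCE)

    # Make the folder importable, then import the module by name
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        fib_module = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(tmp)

series = fib_module.fib2(100)
print(series)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```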

### Structuring using subfolders

Create folder for module

In [None]:
!mkdir -p mymodule

Create .py file

In [None]:
%%writefile mymodule/example2.py
# Fibonacci numbers module

def fib(n):
    '''Write Fibonacci series up to n'''
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

Import and use module:

In [None]:
from mymodule import example2
example2.fib(100)

#### The \_\_init\_\_ File

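For packages (modules organized in subdirectories)

An `__init__.py` file makes Python treat the directory containing it as a regular package. This prevents directories with a common name, such as `string`, from unintentionally hiding valid modules that occur later on the module search path. Since Python 3.3, a directory without `__init__.py` can still be imported as a namespace package, but an explicit `__init__.py` remains the usual way to define a package. In the simplest case, `__init__.py` can just be an empty file, but it can also execute initialization code for the package or set the `__all__` variable, which lists the names exported by `from package import *`.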

In [None]:
%%writefile mymodule/__init__.py
# Init file
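
A non-empty `__init__.py` can, for instance, re-export functions from submodules and set `__all__`. The following sketch builds a small package in a temporary folder; the names `demo_pkg` and `series` are hypothetical and only serve this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE_NAME = 'demo_pkg'  # hypothetical package name

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / PACKAGE_NAME
    pkg_dir.mkdir()

    # A submodule holding the implementation
    (pkg_dir / 'series.py').write_text(
        'def fib2(n):\n'
        '    result = []\n'
        '    a, b = 0, 1\n'
        '    while a < n:\n'
        '        result.append(a)\n'
        '        a, b = b, a + b\n'
        '    return result\n'
    )

    # __init__.py is executed when the package is first imported
    (pkg_dir / '__init__.py').write_text(
        'from .series import fib2\n'
        "__all__ = ['fib2']\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module(PACKAGE_NAME)
    finally:
        sys.path.remove(tmp)

print(demo_pkg.__all__)    # ['fib2']
print(demo_pkg.fib2(10))   # [0, 1, 1, 2, 3, 5, 8]
```

With this layout, users of the package can call `demo_pkg.fib2` directly without knowing which submodule defines it.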

### Exporting to Github

__Option 1__

Download the .py file and upload it to the project via the GitHub web interface.

__Option 2__

Commit and push in Colab

In [None]:
!git config --global user.email "you@example.com"
!git config --global user.name "Your Name"
!git add example.py
!git commit -m "Example Commit"
!git push