<div style="color:red;background-color:black">
Diamond Light Source
<br style="color:red;background-color:antiquewhite"><h1>Python Language: Modules</h1>  

©2000-20 Chris Seddon 
</div>

## 1
In this tutorial we will be investigating how to import a library into our Python program.  There are a huge number of libraries available, but here we will create our own simple library to illustrate the concepts.  

Note that although Jupyter notebook is a fantastic product, it does have some limitations.  Unfortunately, not all Python code produces the correct output in notebook.  Code will work fine on the command line or in Eclipse or PyCharm, but some output will be lost in notebook.  So, in this tutorial we will have to perform some workarounds.  I'll let you know when we do this and hopefully this won't obscure what we are trying to achieve. 

Normally libraries will reside in a different directory from our code, so we will create a new directory for the library:

In [None]:
%%bash
mkdir -p resources/libs

## 2
To begin with, we will create a library and store it in the "libs" directory.  Libraries consisting of a single file are called "modules"; multi-file libraries are called "packages".  Both modules and packages work in similar ways, but to keep things simple we'll just work with a module.  

We could write classes and functions in our module, but to keep things simple we will just define 4 one-line functions.

In [None]:
def f1():
    print("f1")
def f2():
    print("f2")
def f3():
    print("f3")
def f4():
    print("f4")

## 3 
Sometimes when people write libraries they include some test code in the library.  This can cause problems for users of the library.  To see what can happen, we will add some code to the library to simulate testing the library code.  Rather than writing real testing code we will just call each of the functions once.  That will be enough to produce the desired effect.  

In order to use this module we'll need to write it to disk. We'll use the %%writefile magic command to do the writing.

In [None]:
%%writefile resources/libs/mylib.py
def f1():
    print("f1")
def f2():
    print("f2")
def f3():
    print("f3")
def f4():
    print("f4")

# testing code
f1()
f2()
f3()
f4()

## 4
Now we will import the library code into our own code and try to call the 4 functions.  Normally, to call a library function you need to prefix the function name with the module name, as in:
<pre>
mylib.f1()
</pre>

In [None]:
import mylib

mylib.f1()
mylib.f2()
mylib.f3()
mylib.f4()

## 5
Well this hasn't worked; Python can't find the library.  

This is to be expected because we intentionally placed the library in a different directory from our code.  Python locates libraries in much the same way that "bash" locates commands in Linux.  "bash" searches the directories listed in the PATH environment variable, each separated by a colon; Python searches the directories listed in the PYTHONPATH environment variable, along with a few default locations.  

The PYTHONPATH is normally set up using <pre>module load python</pre>but can be modified on the command line or preferably inside our code.  
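
As a rough sketch of the environment-variable route, the code below starts two child Python processes, one with PYTHONPATH set and one without, and asks each whether the directory appears on its search path.  The temporary directory stands in for a library directory and is purely illustrative; it is not the "resources/libs" directory used in this tutorial.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical library directory (illustration only).
lib_dir = tempfile.mkdtemp()

# The child program reports whether the directory is on its search path.
child_code = "import sys; print(sys.argv[1] in sys.path)"

# Child 1: PYTHONPATH contains the directory.
env_with = dict(os.environ, PYTHONPATH=lib_dir)
with_var = subprocess.run(
    [sys.executable, "-c", child_code, lib_dir],
    env=env_with, capture_output=True, text=True,
).stdout.strip()

# Child 2: PYTHONPATH is not set at all.
env_without = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
without_var = subprocess.run(
    [sys.executable, "-c", child_code, lib_dir],
    env=env_without, capture_output=True, text=True,
).stdout.strip()

print(with_var)     # True
print(without_var)  # False
```

Each child process builds its own search path when it starts, so setting the variable for one program leaves the others untouched.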

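To modify the PYTHONPATH inside our code we can write <pre>import sys
sys.path.insert(0, "resources/libs")</pre>
"sys.path" is the list of directories that Python searches for modules; it is initialised from the PYTHONPATH (plus some default locations) when the program starts.  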

"insert" prepends the directory "libs" to the PYTHONPATH.  Changing the PYTHONPATH inside our code only changes the PYTHONPATH for this program.  Other programs will be unaffected.  

Let's try again:

Sometimes Jupyter notebook caches information and changes don't work.  To make sure we don't run into problems clear the cache by<br/> 
<font color=red>
restarting the kernel before executing the next cell.
</font>

In [None]:
import sys
sys.path.insert(0, "resources/libs")
import mylib
mylib.f1()
mylib.f2()
mylib.f3()
mylib.f4()

## 6
The above code produces the correct output.  

We can see from the above that import executes all the code in the imported module before continuing with our code.  Notice how the functions are all called twice: once because of the tests inside the module and once because we called the functions from our code.  

The question is: how do we stop the tests running when we import the module, but still let the tests execute when the module is run on its own?  

The answer to this question revolves around the module name.  Every Python file has a module name that is its filename minus the ".py" extension.  So with our module stored in "mylib.py", its module name is:<pre>mylib</pre>

The one exception is that the module name is always replaced by "\__main__" for the first file to be executed.  Thus if we only run "mylib.py", its module name will be "\__main__".  But if we import the module it will not be the first file to run and its normal module name ("mylib") will be used.  
Python stores the module name in the special global variable:
<pre>__name__</pre>

We can make use of this in the imported module:
<pre>if __name__ == "__main__":
    ...
</pre>
This will be true if the module is executed on its own, but false if we run it by importing the module.  Here is the new module:

In [None]:
%%writefile resources/libs/mylib.py
def f1():
    print("f1")
def f2():
    print("f2")
def f3():
    print("f3")
def f4():
    print("f4")

# testing code
if __name__ == "__main__":
    f1()
    f2()
    f3()
    f4()

## 7
Now the module will work correctly (see below).  It won't output anything when imported, but will run the tests when run in isolation.  Since the module doesn't produce any output when imported, we can remove the magic command.  The following code will now produce the same output as when run elsewhere.  

You'll often see this if statement in library modules.  You can use it in your own code if you think your code might be made into a module some time in the future.

In [None]:
import sys
sys.path.insert(0, "resources/libs")

import mylib
mylib.f1()
mylib.f2()
mylib.f3()
mylib.f4()
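
To see both behaviours side by side without leaving the notebook, here is a minimal sketch.  It writes a throwaway module (the name `demo_guard_mod` and the temporary directory are made up for illustration), imports it, and then runs the same file as if it were the first file executed, using the standard library's `runpy.run_path`.  The module also records the value of `__name__` that it saw in each case.

```python
import importlib
import io
import os
import runpy
import sys
import tempfile
from contextlib import redirect_stdout

# Write a hypothetical module with guarded test code to a temporary directory.
lib_dir = tempfile.mkdtemp()
mod_path = os.path.join(lib_dir, "demo_guard_mod.py")
with open(mod_path, "w") as f:
    f.write(
        "seen_name = __name__\n"
        "def f1():\n"
        "    print('f1')\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    f1()\n"
    )
sys.path.insert(0, lib_dir)
importlib.invalidate_caches()

# Case 1: import the module; the guarded test code should stay silent.
buf = io.StringIO()
with redirect_stdout(buf):
    imported = importlib.import_module("demo_guard_mod")
imported_output = buf.getvalue()

# Case 2: run the same file as the main program; the tests should execute.
buf = io.StringIO()
with redirect_stdout(buf):
    run_globals = runpy.run_path(mod_path, run_name="__main__")
script_output = buf.getvalue()

print(imported.seen_name, repr(imported_output))     # demo_guard_mod ''
print(run_globals["seen_name"], repr(script_output)) # __main__ 'f1\n'
```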

## 8
The following diagram shows what is happening in memory.  Each file has a symbol table that is a dictionary of all the global symbols in the file.  The "mylib" module has 4 global symbols, one per function, each referring to a function object.  Our code has only one global symbol of interest, created by the import statement: "mylib"

<img src="resources/Slide1.jpg"/>

f.o. stands for function object.
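
The symbol table is an ordinary dictionary that we can inspect with the built-in `vars()`.  The sketch below builds a small in-memory module (the name `demo_mod` is hypothetical, standing in for "mylib") and lists its global symbols, ignoring the double-underscore entries that Python adds itself.

```python
import types

# Build a throwaway module object in memory (illustration only).
demo_mod = types.ModuleType("demo_mod")
source = (
    "def f1():\n"
    "    return 'f1'\n"
    "def f2():\n"
    "    return 'f2'\n"
)
# Running the source against the module's dictionary fills its symbol table.
exec(source, demo_mod.__dict__)

# The symbol table maps each name to the object it refers to.
public = sorted(name for name in vars(demo_mod) if not name.startswith("__"))
print(public)                                # ['f1', 'f2']
print(vars(demo_mod)["f1"] is demo_mod.f1)   # True
```

Writing `demo_mod.f1` is simply a lookup of the key "f1" in that dictionary.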

## 9
It's common to abbreviate module names when they are imported.  In our case the module name is already quite short, but other modules have longer names.  To use an alias we use:
<pre>import mylib as m</pre>
As another example, when using "matplotlib" it is conventional to use:
<pre>import matplotlib.pyplot as plt</pre>

Our code changes to:

In [None]:
import sys
sys.path.insert(0, "resources/libs")

import mylib as m
m.f1()
m.f2()
m.f3()
m.f4()
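
As a small sketch of what the alias does to the symbol table, the code below runs an aliased import inside an empty dictionary and then inspects it.  The standard library's "json" module is used instead of "mylib" so that the example is self-contained, and `exec()` is used only to give the import its own empty symbol table.

```python
import sys

# Run the aliased import in a fresh, empty symbol table.
ns = {}
exec("import json as j", ns)

# Only the alias is bound; the original module name is not.
print("j" in ns, "json" in ns)          # True False

# The alias refers to the very same module object Python loaded.
print(ns["j"] is sys.modules["json"])   # True
```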

## 10
The alias modifies our diagram slightly.  Our symbol table now contains "m" rather than "mylib":

<img src="resources/Slide2.jpg"/>

## 11
An alternative to importing directly is to use the "from" construct.  "from" will import symbols from a module into our symbol table.
<pre>from mylib import *</pre>
imports all the symbols from the module into our symbol table.  When using "from" it's as if we have defined the functions locally and we do not supply a module name in the calls.

In [None]:
import sys
sys.path.insert(0, "resources/libs")

from mylib import *
f1()
f2()
f3()
f4()
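
The difference between the two forms shows up clearly if we compare the symbol tables they produce.  This is only a sketch: the module `demo_star_mod` and its temporary directory are invented for the example, and `exec()` is used so that each import runs in its own empty symbol table.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical two-function module to a temporary directory.
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, "demo_star_mod.py"), "w") as f:
    f.write("def f1():\n    return 'f1'\n\ndef f2():\n    return 'f2'\n")
sys.path.insert(0, lib_dir)
importlib.invalidate_caches()

# "from ... import *" copies the module's symbols into our table.
star_ns = {}
exec("from demo_star_mod import *", star_ns)

# A plain "import" adds just one symbol: the module itself.
plain_ns = {}
exec("import demo_star_mod", plain_ns)

# exec() adds "__builtins__" to each dictionary, so we filter it out.
star_names = sorted(n for n in star_ns if n != "__builtins__")
plain_names = sorted(n for n in plain_ns if n != "__builtins__")
print(star_names)   # ['f1', 'f2']
print(plain_names)  # ['demo_star_mod']
```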

## 12
The memory diagram now looks like:

<img src="resources/Slide3.jpg"/>

## 13
Note that using * imports the entire symbol table from "mylib" and this is regarded as bad practice because you are importing an unknown set of symbols.  It's much better to be specific:

In [None]:
import sys
sys.path.insert(0, "resources/libs")

from mylib import f1, f2, f3, f4
f1()
f2()
f3()
f4()

## 14
Sometimes there is a name clash between remote and local functions.  In that case the latest definition wins.  So, for example, if we define a local "f1" in code below the "from" import then the local definition will appear in our symbol table and hence the local "f1" will be called:

In [None]:
import sys
sys.path.insert(0, "resources/libs")

from mylib import f1, f2, f3, f4

def f1():
    print("local f1")
f1()
f2()
f3()
f4()

## 15
If the "from" statement comes after the local definition, it effectively masks the local function and the remote function will be called:

In [None]:
import sys
sys.path.insert(0, "resources/libs")

def f1():
    print("local f1")

from mylib import f1, f2, f3, f4
f1()
f2()
f3()
f4()

## 16
Note you can use aliases with "from" as we could with the "import" statement.  In the example below the remote "f1" function is imported into our local symbol table as "ff1" and now both the local and remote "f1" calls can be made:

In [None]:
import sys
sys.path.insert(0, "resources/libs")

def f1():
    print("local f1")

from mylib import f2, f3, f4
from mylib import f1 as ff1
f1()
ff1()
f2()
f3()
f4()
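
One detail worth knowing is that an alias only adds a second name for the same function object; the function itself is not renamed.  A quick sketch using the standard library's `json.dumps` (the alias `to_json` is made up for the example):

```python
import json
from json import dumps as to_json

# The alias and the original name refer to one and the same function object.
same_object = to_json is json.dumps
# The function still reports the name it was defined with.
original_name = to_json.__name__

print(same_object)    # True
print(original_name)  # dumps
```

This is one reason function aliases can confuse: error messages and tracebacks will show the original name, not the alias used in our code.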

## 17
Using function aliases is probably a bad idea; it's confusing.  But class aliases seem to work very well in practice, whether they are used with "import" or "from".  

So which is best, "import" or "from"?  I personally prefer "import" because if you encounter a problem in a library method after importing several modules, you can tell immediately which library is causing the problem.  If, on the other hand, you import everything into your local symbol table, you have no idea which library is causing the problem.  Furthermore, there is a possibility that one library will overwrite the symbols of a library you previously imported; no such problems exist when using "import".
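
The overwriting problem can be sketched with two libraries that both happen to define "f1".  The module names `demo_lib_a` and `demo_lib_b` and the temporary directory are hypothetical, created here only for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Write two hypothetical libraries that both define a function called f1.
lib_dir = tempfile.mkdtemp()
for name in ("demo_lib_a", "demo_lib_b"):
    with open(os.path.join(lib_dir, name + ".py"), "w") as f:
        f.write(f"def f1():\n    return 'f1 from {name}'\n")
sys.path.insert(0, lib_dir)
importlib.invalidate_caches()

# With "from", the second import silently replaces the first f1.
from demo_lib_a import f1
from demo_lib_b import f1
clash_result = f1()

# With "import", each function stays reachable through its module name.
import demo_lib_a
import demo_lib_b
qualified_results = [demo_lib_a.f1(), demo_lib_b.f1()]

print(clash_result)       # f1 from demo_lib_b
print(qualified_results)  # ['f1 from demo_lib_a', 'f1 from demo_lib_b']
```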

As a final note, you can use both mechanisms at once.  I don't recommend it, but the code will look like:


In [None]:
import sys
sys.path.insert(0, "resources/libs")

import mylib as m

def f1():
    print("local f1")

from mylib import f2, f3, f4
from mylib import f1 as ff1
f1()
ff1()
f2()
f3()
f4()
m.f1()
m.f2()
m.f3()
m.f4()