# 01 - What is a Module

Modules are instances of the `module` type, just as classes are instances of `type`. When we create a function (an object of type `function`), how does Python know where to look for it? It uses the **namespace**, which is a dictionary mapping each label to the object it refers to in memory. We can look at this dictionary using `globals()`.

In [9]:
def func():
    a = 10
    return a

f = globals()['func'] # 'globals()' is a dictionary. We look up the key 'func' and get back the function object it refers to.
                      # We then bind a second label, 'f', to that same object.
f is func

True

In [10]:
globals()['func']() # Here we just call the function

10

We can also create a function and call `locals()` to look at the local namespace. If we call `locals()` within the global scope, it will be equivalent to calling `globals()`.

In [11]:
def func():
    a = 10
    b = 20
    print(locals())

func()

{'a': 10, 'b': 20}


We have built-in modules (such as `math`), which are written in C and compiled into the interpreter. And we have the standard library, which may be written in C or Python. Module names are nothing more than variables, and these labels are all stored in the namespace.

In [12]:
import math

junk = math
junk.sqrt(4)

2.0

Once we import a module, it is stored in the system cache and, like any object, lives at a memory address. It now behaves like a singleton: re-importing it returns the same object, so its memory address won't change. We can view this cache by importing `sys` and looking at `sys.modules`. This is just a plain old dictionary.

In [13]:
import sys

sys.modules # This is a dictionary

sys.modules['math']

<module 'math' (built-in)>

Remember that these modules are just objects with attributes. One way to look at all the attributes is by using `.__dict__` 

In [14]:
math.__dict__ # It's a dictionary so we can get to the values by their key. This allows us to have another handle to a function 

{'__name__': 'math',
 '__doc__': 'This module provides access to the mathematical functions\ndefined by the C standard.',
 '__package__': '',
 '__loader__': _frozen_importlib.BuiltinImporter,
 '__spec__': ModuleSpec(name='math', loader=<class '_frozen_importlib.BuiltinImporter'>, origin='built-in'),
 'acos': <function math.acos(x, /)>,
 'acosh': <function math.acosh(x, /)>,
 'asin': <function math.asin(x, /)>,
 'asinh': <function math.asinh(x, /)>,
 'atan': <function math.atan(x, /)>,
 'atan2': <function math.atan2(y, x, /)>,
 'atanh': <function math.atanh(x, /)>,
 'ceil': <function math.ceil(x, /)>,
 'copysign': <function math.copysign(x, y, /)>,
 'cos': <function math.cos(x, /)>,
 'cosh': <function math.cosh(x, /)>,
 'degrees': <function math.degrees(x, /)>,
 'dist': <function math.dist(p, q, /)>,
 'erf': <function math.erf(x, /)>,
 'erfc': <function math.erfc(x, /)>,
 'exp': <function math.exp(x, /)>,
 'expm1': <function math.expm1(x, /)>,
 'fabs': <function math.fabs(x, /)>,
 'fact

Another way of doing `math.__dict__['sqrt']` is using `getattr(<object>, <attribute name as a string>)`. So, `getattr(a, 'b')` is equivalent to `a.b`.

In [15]:
my_sqrt = math.__dict__['sqrt'] # this line and the one below are equivalent
my_sqrt = getattr(math, 'sqrt')

print(my_sqrt)
my_sqrt(4)

<built-in function sqrt>


2.0

These modules are of type `ModuleType`.

In [16]:
import types
isinstance(math, types.ModuleType)

True

We can also use this to create modules, because `ModuleType` is a class we can call like any other constructor -> `ModuleType(<name of module>, <docstring of module>)`. We can then inspect the attributes of our module just like we inspected the attributes of `math` above.

In [17]:
our_mod = types.ModuleType('test', 'This is a test module.')

our_mod.__dict__

{'__name__': 'test',
 '__doc__': 'This is a test module.',
 '__package__': None,
 '__loader__': None,
 '__spec__': None}

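We can add functionality to this in the conventional way. Let's add a data attribute and a function. But we still can't use something like `from our_mod import return_hello`, because the import system does not know about our module: it is not in `sys.modules` and there is no file for a finder to locate.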

In [18]:
our_mod.pi = 3.14159
our_mod.return_hello = lambda: 'Hello!'

print(our_mod.pi)
print(our_mod.return_hello())

3.14159
Hello!
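
A minimal sketch of why `from ... import` fails for a module built this way, and what it takes to make it work: the import system consults `sys.modules` first, so registering the module object there makes it importable. The name `in_memory_mod` is hypothetical, and the sketch assumes no real module with that name exists on `sys.path`.

```python
import sys
import types

# Hypothetical module name, chosen so that it does not clash with a real module.
sys.modules.pop('in_memory_mod', None)  # start clean if this sketch is re-run

in_memory_mod = types.ModuleType('in_memory_mod', 'This is a test module.')
in_memory_mod.return_hello = lambda: 'Hello!'

first_attempt = 'succeeded'
try:
    from in_memory_mod import return_hello
except ModuleNotFoundError:
    # Not in the cache yet, and no finder can locate a file for it.
    first_attempt = 'failed'

# Register the module object in the cache under its name.
sys.modules['in_memory_mod'] = in_memory_mod

from in_memory_mod import return_hello  # now found in sys.modules

print(first_attempt)
print(return_hello())
```

This is only meant to illustrate the lookup order. Writing into `sys.modules` by hand is not something to do in real code.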


# 02 - How Python Imports Modules

When we run a statement such as 

`import fractions`

what is Python actually doing?

The first thing to note is that Python is doing the import at **run time**, i.e. while your code is actually running.\
This is different from traditional compiled languages such as C where modules are compiled and linked at compile time.\
In both cases though, the system needs to know **where** those code files exist.\
Python uses a relatively complex system of how to find and load modules. I'm not going to even attempt to describe this in detail, but we'll take a brief look at the main points.

The `sys` module has a few attributes that define where Python is going to look for modules (built-in, standard library, our own or 3rd party).

Where is Python installed?

In [19]:
sys.prefix # On Windows with Anaconda this would print something like 'C:\\Users\\<my_name>\\Anaconda3\\envs\\<my_file>'

'/usr'

Where does Python look for imports? Using its path:

In [20]:
sys.path

['/home/nasiq/repos/python-deepdive/Part 1/Section 09 - Modules, Packages and Namespaces',
 '/usr/lib/python310.zip',
 '/usr/lib/python3.10',
 '/usr/lib/python3.10/lib-dynload',
 '',
 '/home/nasiq/.local/lib/python3.10/site-packages',
 '/usr/local/lib/python3.10/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/lib/python3/dist-packages/IPython/extensions',
 '/home/nasiq/.ipython',
 '/home/nasiq/repos/python-deepdive/Part 1/Section 09 - Modules, Packages and Namespaces/08 - structuring_package_imports']

Basically when we import a module, Python will search for the module in the paths contained in `sys.path`.\
If it does not find the module in one of those paths, the import will fail.\
So if you ever run into a problem where Python is not able to import a module or package, you should check this first to make sure the path to your module/package is in that list.

At a high level, this is how Python imports a module from file:
    
* checks the `sys.modules` cache to see if the module has already been imported - if so it simply uses the reference in there, otherwise:
* creates a new module object (`types.ModuleType`)
* loads the source code from file
* adds an entry to `sys.modules` with the name as a key and the newly created module object as the value
* compiles **and executes** the source code

One thing that's really important to note is that when a module is imported, the module code is **executed**.

## Example 1

Let's create the following setup in Pycharm. `example1` is a directory with 2 files within it.

<img src=s9-images/9.1.png width=300/>

These files are within the jupyter notebook found here -> [main.py](../Section%2009%20-%20Modules,%20Packages%20and%20Namespaces/02%20-%20How%20Python%20Imports%20Modules/example1/main.py) and [module1.py](../Section%2009%20-%20Modules,%20Packages%20and%20Namespaces/02%20-%20How%20Python%20Imports%20Modules/example1/module1.py)

Looking at `module1.py`, we see a basic function that pretty prints out module1's namespace using `globals()`.

<img src=s9-images/9.2.png width=400 />

To run this file within your jupyter, open the `module1.py` link by clicking [here](02%20-%20How%20Python%20Imports%20Modules/example1/module1.py). 

Then, right-click > Create Console for Editor.

In the console type: 

`%run module1.py`

Alternatively, we can run it right in this jupyter notebook using the run file path command below. Make sure to add \ before every space.

`%run ./02\ -\ How\ Python\ Imports\ Modules/example1/module1.py`

but I'm not going to do that because the output is verbose. I'll just use this screenshot:

<img src=s9-images/9.3.png width=500 />

Now, within [main.py](./02%20-%20How%20Python%20Imports%20Modules/example1/main.py), we'll print 'running main.py' and import `module1`.

<img src=s9-images/9.4.png width=400 />

Look how it **executes** module1's code:

<img src=s9-images/9.5.png width=600 /> 
The pretty print function is all the way at the bottom, but I've cut the screenshot off because it was too far down.

Now that we've imported module1, we can access its `pprint_dict()` function using `module1.pprint_dict('main.globals', globals())`

Again, how did Python know where to find module1? It searched the system path (`sys.path`) and found it in the 'example1' directory. (It didn't find it in the standard library ('lib') or anywhere else.)

If we have multiple `import module1` statements in our code, the module will **only execute once**. When Python sees `import module1` for the 2nd time, it looks in the system cache (`sys.modules`) first and finds the module there, so it does not run it again. Only if the module is not in the cache does Python create and execute it.
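
The "only execute once" behaviour can be checked with a small sketch. It writes a throwaway module (the name `run_once_demo` is hypothetical) to a temporary directory, imports it twice, and uses a changed attribute as evidence that the second import did not re-run the code.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temporary directory (hypothetical name).
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, 'run_once_demo.py'), 'w') as code_file:
    code_file.write("print('executing run_once_demo...')\n")
    code_file.write("state = 'fresh'\n")

sys.path.append(tmp_dir)
importlib.invalidate_caches()  # make sure the finders notice the new file

import run_once_demo  # first import: the module code runs
run_once_demo.state = 'modified'

import run_once_demo  # second import: served from sys.modules

# If the module had been executed again, state would be back to 'fresh'.
print(run_once_demo.state)
```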

## Example 2 

In this example, we can see that when we `import` a module, Python first looks for it in `sys.modules`.

To make the point, we put a key/value pair in `sys.modules` ourselves, and then import it.

In fact we put a function in there instead of a module, and import that.

Please **DO NOT** do this, I'm just making the point that `import` will first look in the cache and immediately just return the object if the name is found, basically just as if we had written:

`module = sys.modules['module']`

In [21]:
sys.modules['test'] = lambda: 'Testing module caching'

# sys.modules['math'] # Remember this line from above.

import test # looking for 'test' in sys.modules

print(test) # It found it.
print(test())

<function <lambda> at 0x7f37a878b910>
Testing module caching


## Example 3a 

In this example we look at a simplified view of how Python imports a module.\
We use two built-in functions, `compile` and `exec`.\
The `compile` function compiles source (e.g. text) into a code object (bytecode).\
The `exec` function is used to execute a code object. Optionally we can specify what dictionary should be used to store global symbols i.e. the namespace.\
In our case we want to use our module's `__dict__` as that namespace, because a module's attributes are exactly the global names its code defines.


These files are within the jupyter notebook found here -> [main.py](../Section%2009%20-%20Modules,%20Packages%20and%20Namespaces/02%20-%20How%20Python%20Imports%20Modules/example3a/main.py) and [module1_source.py](../Section%2009%20-%20Modules,%20Packages%20and%20Namespaces/02%20-%20How%20Python%20Imports%20Modules/example3a/module1_source.py)



This is all module1 does:
    
<img src=s9-images/9.6.png width=300 />

Our goal: **We want to import module1 manually**

For main.py, open it up and the comments should explain what's going on.\
To summarise:
1. We need the module's path to locate it on disk.
2. We go to that location, open the file and read all the code into a variable called `source_code`.
3. We create a module object and add some metadata such as `__file__`.
4. We set up a reference to our module object in `sys.modules` so Python can find it.
5. We create a `code` object using the built-in `compile` function -> `code = compile(source_code, filename=<path of the file>, mode='exec')`. Don't worry about `mode='exec'`; it's not actually executing anything, it only says that the source is a sequence of statements.
6. We execute the compiled source code with `exec`.
7. Since this code may contain variables like `a=10` or functions such as `def pprint()`, we need to tell Python where to put these names. We tell Python to put them in `mod.__dict__`, just like we had `math.__dict__`.

Remember that a module is nothing more than a namespace of all the things it has within it. So, within the final point, `exec(code, mod.__dict__)` is simply creating all those functions and storing their references into a dictionary.

Everything is done now. Taken together, those steps are a simplified version of what `import module1` does (technically `module1_source`).
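
The steps above can be sketched as follows. This is not the course's `main.py`, just a minimal version under assumptions: the contents written to `module1_source.py` are made up for illustration, and the file lives in a temporary directory that is deliberately not on `sys.path`, so the final `import` can only succeed through the cache.

```python
import os
import sys
import tempfile
import types

# Hypothetical module source, written to a directory that is NOT on sys.path.
tmp_dir = tempfile.mkdtemp()
module_file = os.path.join(tmp_dir, 'module1_source.py')
with open(module_file, 'w') as code_file:
    code_file.write("print('running module1_source...')\n")
    code_file.write("a = 10\n")
    code_file.write("def hello():\n    return 'hello from module1_source'\n")

module_name = 'module1_source'

# Steps 1-2: locate the file and read its source code.
with open(module_file, 'r') as code_file:
    source_code = code_file.read()

# Step 3: create the module object and add some metadata.
mod = types.ModuleType(module_name)
mod.__file__ = module_file

# Step 4: register it in the cache.
sys.modules[module_name] = mod

# Step 5: compile the source into a code object (nothing runs yet).
code = compile(source_code, filename=module_file, mode='exec')

# Steps 6-7: execute the code, using the module's __dict__ as its namespace.
exec(code, mod.__dict__)

import module1_source  # found in sys.modules, so nothing is executed again

print(module1_source.a)
print(module1_source.hello())
```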

## Example 3b 

This is essentially the same as example 3a, except we make our importer into a function and use it to show how we technically should look for a cached version of the module first.

Another key takeaway of this example is that the system cache is not tied to the file we're in. Up till now we've worked from main.py, and we saw that any import statement looks in the system cache first. That cache is not exclusive to main.py: `sys.modules` is shared by the whole running program, so every file that is imported into it sees the same cache. So we could use our own manual importer in main.py to register a module, and then use Python's built-in `import` in a completely different file to look up that same reference. It will find the module because all files in the program share the cache.
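
A minimal sketch of an importer function that checks the cache first. The function name `manual_import` and the module `cached_demo` are hypothetical, not the names used in the course files.

```python
import os
import sys
import tempfile
import types


def manual_import(module_name, module_file):
    """Simplified importer: return the cached module if there is one."""
    if module_name in sys.modules:
        return sys.modules[module_name]

    with open(module_file, 'r') as code_file:
        source_code = code_file.read()

    mod = types.ModuleType(module_name)
    mod.__file__ = module_file
    sys.modules[module_name] = mod

    code = compile(source_code, filename=module_file, mode='exec')
    exec(code, mod.__dict__)
    return mod


# Hypothetical module source in a directory that is not on sys.path.
tmp_dir = tempfile.mkdtemp()
module_file = os.path.join(tmp_dir, 'cached_demo.py')
with open(module_file, 'w') as code_file:
    code_file.write("print('executing cached_demo...')\n")
    code_file.write("x = 'Python'\n")

first = manual_import('cached_demo', module_file)   # executes the source
second = manual_import('cached_demo', module_file)  # cache hit, no execution

import cached_demo  # the built-in import finds the same object in the cache

print(first is second is cached_demo)
```

The same lookup would succeed from any other file running in the same program, since they all share `sys.modules`.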

# 03 - Imports and importlib

In the last video we saw how we could, in a simplistic manner, mimic Python's import.

There is absolutely no need to do this since Python itself provides that functionality, both as a built-in (the `import` statement, backed by the built-in function `__import__`) and in the standard library module `importlib`.

In fact, if you want to see how imports are done in pure Python code you can always look at the source code for that library. You should now know where to find it on your local machine: first identify your Python environment (`sys.exec_prefix`) and then look in the `lib` folder.

`importlib` is not a plain module (although it is still an object of type `module`) - it's actually a package - more on that later. We can use its `import_module` function to load a module.

In [1]:
import importlib

module_name = 'math'

math2 = importlib.import_module(module_name)

Remember what `import` does. It searches for the module, compiles it, executes it and adds it to the `sys.modules` dictionary. So, we should find it there.

In [2]:
import sys

print(sys.modules['math'])
print('math' in sys.modules)

<module 'math' (built-in)>
True


We get True because `math` has been imported (here via `importlib`). `fractions` hasn't been imported, so it won't be there.

In [3]:
'fractions' in sys.modules

False

But we can't call `math.sqrt(4)` yet. That's because we don't have a handle to it. In other words, 'math' is not in the `globals()` namespace.

In [4]:
'math' in globals()

False

The two ways to get a handle are:

In [5]:
my_math = sys.modules['math']
my_math = importlib.import_module('math')

In Python there are a number of files that are "code" files, such as

* `.py`: basic text file containing Python code
* `.pyc`: compiled Python code (bytecode)
* `.so`, `.pyd`: compiled extension modules, think DLLs (`.so` on Linux, `.pyd` on Windows)

amongst others. Furthermore, Python can reach inside `zip` archives for code (as well as other packaged distribution files such as those used by Egg or Wheel).

Conceptually Python divides the work between **finders** and **loaders**.

**finders + loaders == importer**

The **finders** are responsible for finding the module/package and returning the module spec. In crude terms, when we type `import math`, Python goes to each finder and asks if they know of a module called `math` until it reaches the relevant finder. This finder provides information of which loader should be used.

The **loaders** are responsible for "loading" the source code that is then used in the final steps to compile, execute and cache the module object. An object that implements both is called an **importer** - but they are still two separate concepts.

In [6]:
fractions = importlib.import_module('fractions')
fractions # --> <module 'fractions' from '/usr/lib/python3.10/fractions.py'>
fractions.__spec__ # This is what's returned from the finder. As you can see, it tells us the loader to use and where this fractions module is on disk.

ModuleSpec(name='fractions', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7fef96797fd0>, origin='/usr/lib/python3.10/fractions.py')

Python provides a number of standard finders and importers, such as:

* built-in modules --> Its finder is called `_frozen_importlib.BuiltinImporter`
* frozen modules --> Its finder is called `_frozen_importlib.FrozenImporter`
* import path finder (finds source code files on the import path - for example the `sys.path` entries we have seen before) --> Its finder is called `_frozen_importlib_external.PathFinder`; this is the one used for modules that we create.

To reiterate, Python goes through all finders until it reaches the relevant one.
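
As a small sketch, the finders Python consults are listed in `sys.meta_path`, and `importlib.util.find_spec` shows which loader a finder picked for a given module. Tools such as test runners or IDEs may add their own finders to the list, so the exact contents depend on the environment.

```python
import importlib.util
import sys

# Finders are consulted in this order. Classes have a __name__,
# finder instances fall back to the name of their type.
finder_names = [getattr(finder, '__name__', type(finder).__name__)
                for finder in sys.meta_path]
print(finder_names)

# 'sys' is compiled into the interpreter, 'fractions' is a source file.
for name in ('sys', 'fractions'):
    spec = importlib.util.find_spec(name)
    print(name, '->', spec.loader, '| origin:', spec.origin)
```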

What's interesting about the import path finder and loader is that they can search (and load from) zip archives. 

In fact it can even be extended to search other resources, including url's, databases, etc. You could theoretically store code in a Mongo or Redis database and import directly from there!

We can create our own finders and loaders which can find and load our modules from a database or a REST API call etc. So modules need not be files.
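
To make that concrete, here is a minimal sketch of a custom finder and loader. A plain dictionary stands in for the database; `FAKE_DB`, `DictFinder`, `DictLoader` and the module name `dict_db_module` are all hypothetical names made up for this illustration. It uses the `importlib.abc.MetaPathFinder` and `importlib.abc.Loader` base classes from the standard library.

```python
import importlib.abc
import importlib.util
import sys

# A dictionary standing in for a database: module name -> source code.
FAKE_DB = {'dict_db_module': "x = 'loaded from a dict'\n"}


class DictLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # None means: use the default module creation

    def exec_module(self, module):
        # Fetch the source, compile it and run it in the module's namespace.
        source = FAKE_DB[module.__name__]
        code = compile(source, f'<fake_db:{module.__name__}>', 'exec')
        exec(code, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname in FAKE_DB:
            return importlib.util.spec_from_loader(fullname, DictLoader())
        return None  # not ours, let the next finder try


# Appended at the end, so the standard finders are asked first.
sys.meta_path.append(DictFinder())

import dict_db_module

print(dict_db_module.x)
print(dict_db_module.__spec__)
```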

Let's write a small source file to disk, called module1.py, and let's see if we can find it. We could create the file manually, but we're just going to do it from code.

In [13]:
import importlib.util # makes sure the importlib.util submodule is loaded

with open('module1.py', 'w') as code_file:
    code_file.write("print('running module1.py..')\n")
    code_file.write("x = 'Python'\n")

importlib.util.find_spec('module1') # It found it and also told us the loader to use as well as the file path of the module.

ModuleSpec(name='module1', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f367e8e29e0>, origin='/home/nasiq/repos/python-deepdive/Part 1/Section 09 - Modules, Packages and Namespaces/module1.py')

In [1]:
import module1 # prints: running module1.py..  BUT ONLY ON FIRST RUN. If run again, it will find it in the cache so it won't re-run it.
module1.x

running module1.py..


'Python'

What if we create a module and place it in a completely random location, such as on the Desktop or in the Windows 'videos' folder. Would Python be able to find it then? 

The answer is no. The reason is that Python needs to be told where it can look to find the module. We can view all those locations (paths) using `sys.path`.

In [3]:
import sys
sys.path

['/home/nasiq/repos/python-deepdive/Part 1/Section 09 - Modules, Packages and Namespaces',
 '/usr/lib/python310.zip',
 '/usr/lib/python3.10',
 '/usr/lib/python3.10/lib-dynload',
 '',
 '/home/nasiq/.local/lib/python3.10/site-packages',
 '/usr/local/lib/python3.10/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/lib/python3/dist-packages/IPython/extensions',
 '/home/nasiq/.ipython']

This is just a list that we can append to. So, if we have our module saved in a different location, all we have to do is append the path of the directory that contains it, and then the import will work.

In [5]:
import os

# you can use this for Mac/Linux:
# ext_module_path = os.environ['HOME']

# you can use this in Windows 10
#ext_module_path = os.environ['HOMEPATH']

# or you can just hard code some path
# ext_module_path = 'c:\\temp' 

ext_module_path = os.environ['HOME']
ext_module_path

'/home/nasiq'

In [6]:
file_abs_path = os.path.join(ext_module_path, 'module2.py')

with open(file_abs_path, 'w') as code_file:
    code_file.write("print('running module2.py...')\n")
    code_file.write("x = 'python'\n")

sys.path.append(ext_module_path)

In [8]:
import importlib.util # makes sure the importlib.util submodule is loaded

importlib.util.find_spec('module2')

import module2

running module2.py...


It works! But only because we added the path to the system's list of paths.

# 04 - Import Variants and Misconceptions

To summarise the import process in chronological order:

<img src=s9-images/9.7.png width=500 />

**What about using an import alias such as `import math as r_math`?**

<img src=s9-images/9.8.png width=500 />

**What about `from math import sqrt`?**

<img src=s9-images/9.9.png width=487 />

**Finally, `from math import sqrt as r_sqrt`**

<img src=s9-images/9.10.png width=500 />

**What about `from math import *`?**

<img src=s9-images/9.11.png width=527/>

The key takeaway is that `sys.modules` is the same every time. (If it finds that the module is already in `sys.modules`, then it doesn't reload/re-execute it.) It's only the namespace that changes.

So, this means that using `from math import sqrt` is **not** more efficient to import, because the whole module is still fully loaded into memory. It only changes which names end up in our namespace. Technically, `math.sqrt(2)` is a tiny bit slower than `sqrt(2)` because it needs one extra lookup (first `math` in our namespace, then `sqrt` in the module), but dictionary lookups are very fast. So, it's negligible for all intents and purposes.
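
A quick sketch to check this: even when we import a single name under an alias, the full module is in `sys.modules`, and our name is just another reference to the same function object.

```python
import sys
from math import sqrt as r_sqrt

print('math' in sys.modules)               # True: the whole module was loaded
print(r_sqrt is sys.modules['math'].sqrt)  # True: same function object
print(r_sqrt(4))                           # 2.0
```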

# 05 - Reloading Modules

The key takeaway from this subsection is that reloading a module is technically possible, but it's not recommended and should never be done in production code.
The safest is just to make your code changes, and restart your app.

Even if you are trying to monkey patch (change at run-time) a code module and you want everyone who uses that module to "see" the change, they very well may not, depending on how they are accessing your module.

The reason stems from the fact that if Python sees a 2nd import statement for the same module, it will not re-execute it because it's already in the system cache (`sys.modules`). Sure, you can delete it from the cache (e.g. `del sys.modules['test_module']`), but the old module object will still be referenced from the global namespace, and a fresh import then creates a second, different module object. It gets unnecessarily complicated. If you must, then use `importlib.reload(test_module)`, which re-executes the module's code inside the same module object, so its memory address stays the same. Names that were copied out earlier with `from test_module import ...` still point at the old objects, though. So even this is best avoided.
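
A minimal sketch of what `importlib.reload` does and does not update. The module name `reload_demo` is hypothetical, and the file is written to a temporary directory. The new source is deliberately a different length from the old one, so a cached bytecode file from the first import cannot be mistaken for up to date.

```python
import importlib
import os
import sys
import tempfile

tmp_dir = tempfile.mkdtemp()
module_file = os.path.join(tmp_dir, 'reload_demo.py')
with open(module_file, 'w') as code_file:
    code_file.write("value = 'old'\n")

sys.path.append(tmp_dir)
importlib.invalidate_caches()

import reload_demo
from reload_demo import value as imported_value  # a separate reference

original_id = id(reload_demo)

# Change the source on disk, then reload.
with open(module_file, 'w') as code_file:
    code_file.write("value = 'new and longer'\n")

importlib.reload(reload_demo)

print(id(reload_demo) == original_id)  # True: same module object
print(reload_demo.value)               # 'new and longer'
print(imported_value)                  # 'old': this name was not updated
```

The last line is the reason other users of the module may not "see" a change: anyone holding a name obtained through `from ... import` keeps the old object.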

# 06 - Using `__main__`

Here are the 4 files used in this section: [run.py](./06%20-%20Using%20__main__/run.py), [__main__.py](./06%20-%20Using%20__main__/__main__.py), [module1.py](./06%20-%20Using%20__main__/module1.py), [timing.py](./06%20-%20Using%20__main__/timing.py) (but these are the files by the time we reach the end of this section; we're going to build it up below).

Let's run `run.py`.

<img src=s9-images/9.12.png width=400 /> <img src=s9-images/9.13.png width=680 />

Output:

<img src=s9-images/9.14.png width=400 />

We notice that run.py's `__name__` is `__main__`, not `run`. But module1, which was imported (and therefore executed), kept its own name, `module1`.

This is because, when we run a module as a program from the command line, Python sets that module's `__name__` to `__main__`.

This is useful because we can change the code to something like this:

<img src=s9-images/9.15.png width=400 />

Now, the `print` statement will only execute if the program is executed from the command line and not through an import.
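
The screenshot is not reproduced here, but the idiom it relies on generally looks like the following sketch (the function name `main` and its return value are just illustrative):

```python
def main():
    return 'running as a program'


print(f'__name__ is {__name__!r}')

if __name__ == '__main__':
    # Only reached when this file is run directly, not when it is imported.
    print(main())
```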

**Let's take a look at an example**

<img src=s9-images/9.16.png width=500 />
<img src=s9-images/9.17.png width=500 />


What this code is doing is timing how long it takes to **execute** the code `list(range(1_000_000))`  20 times. We get something like this..

<img src=s9-images/9.18.png width=600 />

But what if we want to call `timeit` from the command line? 

I'm going to explain how the code in [module1.py](./06%20-%20Using%20__main__/module1.py) and [timing.py](./06%20-%20Using%20__main__/timing.py) does this.

Firstly, in [timing.py](./06%20-%20Using%20__main__/timing.py), we have `if __name__ == '__main__':` followed by some code (ignore the actual code for now).

<img src=s9-images/9.19.png width=700 /> 

This code will only execute if we run `timing.py` from the command line, i.e. by typing `python timing.py` into the terminal. It will not run if `timing.py` is imported into another file like `run.py`.


To be able to pass in arguments right into the command line, we're going to need a module called `argparse` (but there is apparently a really good 3rd party library called `click` if you want much more command line automation). 

I'm going to explain each line using the line number from the image above:

28. `parser` is an `argparse.ArgumentParser` object. The argument passed to its `description` keyword is what appears in the command line when we pass `-h` for help. Here it is `__doc__`, the docstring of this Python file. Since there is a multiline string (docstring) at the top of the file, that text is what gets shown.

29. This tells the parser that we have a positional argument called 'code'. We specify that its type must be a string and give it a short description through the `help` keyword.

31. This is an optional argument. '--repeats' is the long form of '-r'. If a user doesn't pass a repeat amount, it defaults to 10. (The leading `-` or `--` is what makes `argparse` treat it as an optional, named argument rather than a positional one.)

34. This gets all the arguments from the command line once the parser has been set up.

To call it from the command line we use (I think):

`python timing.py "<our positional argument>" <our option flag> <our option value>`\
`python timing.py "[x**2 for x in range(100)]" -r 15`



Outcome:
    
<img src=s9-images/9.20.png width=700 />
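
Since the code itself is only visible in the screenshots, here is a rough sketch of what such a parser could look like. It is a reconstruction from the description above, not the actual `timing.py`: the helper names `build_parser` and `time_code` and the help texts are assumptions.

```python
"""Times how long a snippet of Python code takes to run."""
import argparse
from timeit import timeit


def build_parser():
    # description=__doc__ shows the module docstring in the -h output.
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('code', type=str,
                        help='The Python code snippet to time.')
    parser.add_argument('-r', '--repeats', type=int, default=10,
                        help='Number of times to repeat the snippet.')
    return parser


def time_code(code, repeats=10):
    """Return the average time in seconds of one execution of code."""
    return timeit(code, number=repeats) / repeats


# parse_args() with no arguments reads sys.argv. An explicit list is passed
# here so that the sketch behaves the same wherever it is run.
args = build_parser().parse_args(['[x**2 for x in range(100)]', '-r', '15'])

print(args.code, args.repeats)
print(time_code(args.code, args.repeats))
```

In a real script the `parse_args()` call would sit under `if __name__ == '__main__':` and take no arguments, so that it reads whatever was typed on the command line.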

run.py has imported all this code, but since it's within `if __name__ == '__main__':`, Python will not execute any of it on import. So the module can be used as a command line program without that code interfering with other programmers who simply import it.

Modules like `zipfile` and `timeit` do this - we can use them from the command line (e.g. `python -m timeit`) to directly zip and time things, as opposed to importing these modules in a Python script and then using them.

**There is another use-case of `__main__`:**

If we try to execute a **directory**, what Python does is it goes inside and looks for a module called `__main__` to be the **starting point**.

So, if we name our important, starting file as `__main__.py`, and its within a directory called `MainUsage`...

<img src=s9-images/9.21.png width=300 />

... we can just type in the command line: `python MainUsage`, and it will execute `__main__.py`. Also, if `MainUsage` is not a directory but a zip file, we can still pass it to `python` in the same way and it will execute the `__main__.py` inside. So Python is able to look within zip files and execute the main file, just as it can import modules from zip archives.
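
Both cases can be tried out with a short sketch. It builds a throwaway `MainUsage` directory and a `MainUsage.zip` in a temporary location (the one-line `__main__.py` is made up for illustration) and runs each with the current interpreter through `subprocess`.

```python
import os
import subprocess
import sys
import tempfile
import zipfile

tmp_dir = tempfile.mkdtemp()

# A directory containing a __main__.py
app_dir = os.path.join(tmp_dir, 'MainUsage')
os.mkdir(app_dir)
main_file = os.path.join(app_dir, '__main__.py')
with open(main_file, 'w') as code_file:
    code_file.write("print('running', __name__)\n")

# The same file inside a zip archive, at the root of the archive
zip_path = os.path.join(tmp_dir, 'MainUsage.zip')
with zipfile.ZipFile(zip_path, 'w') as archive:
    archive.write(main_file, arcname='__main__.py')

# Equivalent to typing `python MainUsage` and `python MainUsage.zip`
dir_result = subprocess.run([sys.executable, app_dir],
                            capture_output=True, text=True)
zip_result = subprocess.run([sys.executable, zip_path],
                            capture_output=True, text=True)

print(dir_result.stdout.strip())
print(zip_result.stdout.strip())
```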

There is a 13 minute summary video of modules on udemy.

# 07 - What are Packages

Packages are a special type of module. They can contain modules, but they can also contain other packages (known as sub-packages). If a module is a package, it must have a value set for the `__path__` attribute; a module that isn't a package has no `__path__` at all. What can be an empty string is `__package__`, for a module sitting at the root of the application. **The empty string refers to the root of the application directory** (the project root is basically the level at which your app starts running, so if you're running a module in some folder, then that folder becomes the root for that run).

Since modules do not have to be entities in a file system (we can use loaders + finders to get them from e.g. a database or REST API), the same is true for packages. But typically they are in a file system.

Packages represent a hierarchy of modules and/or packages. For example, a package can contain another package which can contain a module. We use a dot notation to indicate the position in the hierarchy, and this dotted name is what we find in `__name__` and `__package__` (`__path__` holds the file system location instead).

`pack1.pack1_1.module1_1` : Remember that this isn't necessarily a file system, but 99% of the time it will be.

How do we import this? 

`import pack1.pack1_1.module1_1`

What this will do is:
1. Import `pack1`
2. Import `pack1.pack1_1`
3. Import `pack1.pack1_1.module1_1`

The **sys.modules** cache will contain entries for `pack1`, `pack1.pack1_1` and `pack1.pack1_1.module1_1`.

The **namespace** will only contain `pack1`.

So, how do we represent these packages using a file system?

**The directory name becomes the package name**

But since a package is just a module, where does the code of the package go? A directory can contain files, but it cannot itself be a file.

**It goes inside the directory in a special file called `__init__.py`.**

So to define a package in our file system, we must:
1. create a directory whose name will be the package name.
2. create a file called `__init__.py` inside that directory.

This file tells Python that this directory is a package and not a standard directory.

**Importing a package**

It works similarly to a module in terms of the loading, executing and caching, because it *is* a module. But it has an extra attribute: `__path__`.

<img src=s9-images/9.22.png width=550 />

So `<package.name>.__path__` gives the path of the package's directory (as a list), while `<package.name>.__file__` gives the path of its `__init__.py` file. Normal modules also have the `__file__` attribute, which points at the module's own file.

In [14]:
import module1
module1.__file__

'/home/nasiq/repos/python-deepdive/Part 1/Section 09 - Modules, Packages and Namespaces/module1.py'

<img src=s9-images/9.23.png width=750 />

**The `__package__` property**

This attribute tells us the package that the module code is located in. For a module located in the application's root (e.g. module.py in 'app/' below), it will be an empty string or it won't be set. For a package in the root, it will be the package name itself. It might be useful to imagine it as returning '.package_name'

Here are some examples for module.py, pack1 and module1a:

<img src=s9-images/9.24.png width=600 />

Here's another example. We can see clearly the dot notation used for nested packages:

<img src=s9-images/9.25.png width=600 />
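
The attributes discussed above can be inspected on a throwaway package. This sketch builds a small hierarchy in a temporary directory (the names `demo_pack`, `demo_sub` and `demo_mod` are hypothetical), imports the innermost module and prints `__file__`, `__path__` and `__package__` for each level.

```python
import importlib
import os
import sys
import tempfile

# Build demo_pack/demo_sub/demo_mod.py in a temporary directory.
root = tempfile.mkdtemp()
sub_dir = os.path.join(root, 'demo_pack', 'demo_sub')
os.makedirs(sub_dir)

files = {
    os.path.join(root, 'demo_pack', '__init__.py'): "print('executing demo_pack...')\n",
    os.path.join(sub_dir, '__init__.py'): "print('executing demo_sub...')\n",
    os.path.join(sub_dir, 'demo_mod.py'): "print('executing demo_mod...')\n",
}
for path, source in files.items():
    with open(path, 'w') as code_file:
        code_file.write(source)

sys.path.append(root)
importlib.invalidate_caches()

import demo_pack.demo_sub.demo_mod  # imports every level on the way down

for name in ('demo_pack', 'demo_pack.demo_sub', 'demo_pack.demo_sub.demo_mod'):
    mod = sys.modules[name]
    print(name)
    print('  __file__   :', mod.__file__)
    print('  __path__   :', getattr(mod, '__path__', 'not set (not a package)'))
    print('  __package__:', getattr(mod, '__package__', 'not set'))
```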


Let's say we're in module.py at the root of our application and we have the following import statement:

`import pack1.pack1_1.module1_1a`

<img src=s9-images/9.26.png width=750 />

As you can see, this may import much more if the `__init__` files within the hierarchy contain import statements.

We only import to the extent of our position in the hierarchy. So, if we import a package that's near the top of the hierarchy, with name package1 for example, it won't start importing packages within package1, or packages within that. See example below.

<img src=s9-images/9.27.png width=600 />

**Quick reiteration**; in this jupyter notebook we have a structure looking like this:

<img src=s9-images/9.28.png width=400 />

Let's try to import pack1_1. We know we can't import it directly - we must import it through the hierarchy.

Note: For this example only, because we're using a jupyter notebook and I don't want to move this Summary notebook to a different place, we'll need to add an extra path to `sys.path` so that the finder can locate pack1. If this notebook were in the '07 - What are Packages' folder, we wouldn't have any issues importing pack1.

In [1]:
import sys
import os

current_working_directory = os.getcwd()
correct_path = '07 - What are Packages'

new_path = os.path.join(current_working_directory, correct_path)

sys.path.append(new_path)

In [32]:
import pack1

executing pack1...
executing pack1_1...
executing module1_1a...
executing module1_1b...


Okay, it seemed to work. Since pack1's `__init__` imports pack1_1, the lines of code within that are executed too. This imports module1_1a and module1_1b.

But let's check sys.modules and globals().

In [42]:
print('pack1_1' in sys.modules)

print('pack1.pack1_1' in sys.modules)
print('pack1.pack1_1' in globals())

False
True
False


So, the first one isn't in the cache because the package *isn't* **pack1_1**. It *is* **pack1.pack1_1** - that's why the 2nd statement is True. 

So why is the 3rd statement false? Why do we not have it in the namespace? 

Since `pack1` is in `globals()`, anything that needs to be accessed from deeper within `pack1` will go through the `pack1` symbol.\
If we really wanted our own reference to it, we must do `from pack1 import pack1_1`, but in reality, this new symbol `pack1_1` is just a reference to `pack1.pack1_1`. They have the same memory IDs.

In [44]:
from pack1 import pack1_1

id(pack1_1) == id(sys.modules['pack1.pack1_1'])

True

If we import something deep within the hierarchy such as module1_1a...

In [2]:
import pack1.pack1_1.module1_1a

executing pack1...
executing pack1_1...
executing module1_1a...
executing module1_1b...


...it executed everything in its way, as expected. Now each of these is in `sys.modules`.

In [4]:
print('pack1' in sys.modules)
print('pack1.pack1_1' in sys.modules)
print('pack1.pack1_1.module1_1a' in sys.modules)

True
True
True


But none of these are in `globals()` EXCEPT the top of the hierarchy.

In [5]:
print('pack1' in globals())
print('pack1.pack1_1' in globals())
print('pack1.pack1_1.module1_1a' in globals())

True
False
False
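
The behaviour above can be reproduced without the course files. The following is a minimal sketch, assuming a throwaway hierarchy with the hypothetical names `hier_pack`, `hier_sub` and `hier_mod`, and using a plain dictionary as a stand-in for the importing module's `globals()`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway hierarchy: hier_pack/hier_sub/hier_mod.py (hypothetical names).
root = Path(tempfile.mkdtemp())
sub_dir = root / "hier_pack" / "hier_sub"
sub_dir.mkdir(parents=True)
(root / "hier_pack" / "__init__.py").write_text("")
(sub_dir / "__init__.py").write_text("")
(sub_dir / "hier_mod.py").write_text("VALUE = 42\n")

sys.path.append(str(root))
importlib.invalidate_caches()

# Stand-in for the globals() of the module that performs the import.
namespace = {}
exec("import hier_pack.hier_sub.hier_mod", namespace)

names = ("hier_pack", "hier_pack.hier_sub", "hier_pack.hier_sub.hier_mod")

# Every level is cached in sys.modules under its fully qualified name.
cached = [name in sys.modules for name in names]

# Only the top-level name is bound in the importing namespace.
bound = [name in namespace for name in names]

# Deeper levels are reached as attributes of the top-level package.
same_object = namespace["hier_pack"].hier_sub is sys.modules["hier_pack.hier_sub"]

print(cached, bound, same_object)
```

The deeper levels are still reachable, but only by walking through the top-level symbol with dot notation.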


**Why use packages? Developer vs user's perspective**

<img src=s9-images/9.29.png width=600 />

How do we get this functionality for the user? We need to use `__init__.py`.

<img src=s9-images/9.30.png width=600 />

# 08 - structuring_package_imports

## Part 1

Let's say we have a structure like the image below. At the root we have the **common** package, which contains a **validators** subpackage, which in turn contains numerous modules. Inside each module we have the main function that is useful to the user (e.g. `is_json(arg)`) as well as some helper functions which are only needed to make `is_json(arg)` work correctly.

The downside of this structure is that we have to import every module individually. Another issue is that code completion shows functions like `date_helper_1()`, which were never intended for the user to see. We want to hide these from the user.

<img src=s9-images/9.31.png width=500 />

<img src=s9-images/9.32.png width=500 />

All of these files (as a finished version) are available in the 'structuring_package_imports' folder in my current directory. We can look at the namespace of each of these packages using `<package>.__dict__` (but to look at the namespace of the module we're currently in, we use `globals()`).

Note: I'm going to need to append the 'structuring_package_imports' path to `sys.path` so that Python can look in that directory. I'm going to hardcode it for now.

In [2]:
import sys
import os

current_working_directory = os.getcwd()
correct_path = '08 - structuring_package_imports'

new_path = os.path.join(current_working_directory, correct_path)

sys.path.append(new_path)

If inside **main.py** we import **common** itself (or a dotted path beneath it, such as `import common.validators`), we expect to see the 'common' package in main's namespace. If we look inside **common**'s namespace, we expect to see **validators**, **models**, **helpers**, etc: 

In [5]:
import common
for k in common.__dict__.keys():
    print(k)

__name__
__doc__
__package__
__loader__
__spec__
__path__
__file__
__cached__
__builtins__
validators
models
helpers


Let's take a look at the finished correct approach and break it down.

<img src=s9-images/9.33.png width=300 />

If, in **main.py**, we import the **validators** package, it's going to execute the **validators** package's `__init__.py` file, which is the image above. Let's ignore all the code in the image except for the commented-out lines. 

If we uncomment these, then when we import the **validators** package via `import common.validators`, we're going to be able to write the following in our **main.py** file (for example):

- `common.validators.boolean.is_boolean('True')`
- `common.validators.date.is_date({})`

but we can't write

- `common.validators.is_boolean('True')`
- `common.validators.is_date({})`

because those lines in the **validators** package's `__init__.py` (e.g. `import common.validators.boolean`) only make the name `common.validators.boolean` available to our code. Just like when we `import math`, we can't type `sqrt()` - we must still use `math.sqrt()`. The way to get it in our namespace is using `from math import sqrt`.

In this case, we actually want all the functions from boolean so we need:

- `from common.validators.boolean import *`

Now, we can write:

- `common.validators.is_boolean('True')`
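
The difference between the two styles of import in an `__init__.py` can be sketched as follows. This is only an illustration: it uses two single-level throwaway packages with the hypothetical names `qualified_pkg` and `flattened_pkg` rather than the real **common** hierarchy, and a made-up body for `is_boolean`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the boolean module.
BOOLEAN_SRC = "def is_boolean(arg):\n    return str(arg) in ('True', 'False')\n"

root = Path(tempfile.mkdtemp())
init_sources = {
    # Variant 1: __init__.py imports the submodule by its full name only.
    "qualified_pkg": "import qualified_pkg.boolean\n",
    # Variant 2: __init__.py copies the submodule's public names into the package.
    "flattened_pkg": "from flattened_pkg.boolean import *\n",
}
for pkg, init_src in init_sources.items():
    (root / pkg).mkdir()
    (root / pkg / "__init__.py").write_text(init_src)
    (root / pkg / "boolean.py").write_text(BOOLEAN_SRC)

sys.path.append(str(root))
importlib.invalidate_caches()

import qualified_pkg
import flattened_pkg

# Variant 1: the function is reachable only through the submodule.
via_submodule = qualified_pkg.boolean.is_boolean("True")
direct_missing = not hasattr(qualified_pkg, "is_boolean")

# Variant 2: the function is also an attribute of the package itself.
direct_call = flattened_pkg.is_boolean("True")

print(via_submodule, direct_missing, direct_call)
```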

**Relative Imports**

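This works intuitively. Looking at the commented lines in the image above, we see that the `common.validators` in `common.validators.boolean` is sort of redundant because this `__init__.py` file is in the same directory as boolean, date, etc - they're siblings. So we can replace it with a single dot, e.g. `from .boolean import *` or `from . import boolean`. Relative imports only work in the `from <package> import <name>` form; `import .boolean` is a syntax error. If we wanted something one level up in the hierarchy, we would use a double dot, e.g. `from .. import models`. Each extra dot climbs one more level, so three dots refers to the package two levels up (we don't write `../../` or something), and this only works while we stay inside the top-level package.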
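
As a rough sketch of the dot notation, the snippet below writes a small throwaway hierarchy to a temporary directory. The package name `rel_demo` and the contents of its files are assumptions made for illustration; they only mirror the shape of **common**.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Layout (hypothetical):
#   rel_demo/__init__.py
#   rel_demo/models.py
#   rel_demo/validators/__init__.py
#   rel_demo/validators/boolean.py
root = Path(tempfile.mkdtemp())
validators_dir = root / "rel_demo" / "validators"
validators_dir.mkdir(parents=True)

(root / "rel_demo" / "__init__.py").write_text("")
(root / "rel_demo" / "models.py").write_text("KIND = 'models'\n")
(validators_dir / "boolean.py").write_text(
    "def is_boolean(arg):\n    return str(arg) in ('True', 'False')\n"
)
(validators_dir / "__init__.py").write_text(
    "from .boolean import is_boolean  # one dot: a sibling in this package\n"
    "from .. import models            # two dots: the parent package\n"
)

sys.path.append(str(root))
importlib.invalidate_caches()

import rel_demo.validators

# The sibling import made the function available on the package.
sibling_ok = rel_demo.validators.is_boolean("True")

# The parent import bound the very same module object that sys.modules caches.
parent_ok = rel_demo.validators.models is sys.modules["rel_demo.models"]

print(sibling_ok, parent_ok)
```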

**How can we prevent the helper functions from being imported when we use `import *`?**

1. One way is to add an `_` in front of the function name. So `def boolean_helper_1()` becomes `def _boolean_helper_1()`. When a module has no `__all__`, `import *` skips any name that starts with an underscore.

2. Another way is using `__all__`. This is simply a list of the names (as strings) of the variables/functions/objects that Python will export when `from <module> import *` is used. So, in our boolean file which contains `is_boolean(arg)`, `boolean_helper_1(arg)` and `boolean_helper_2(arg)`, all we do is type:\
`__all__ = ['is_boolean']`\
in the file where all these functions are defined.
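
Both approaches can be checked with a small experiment. The sketch below builds two in-memory modules (the names `demo_no_all` and `demo_with_all`, the helper `star_import`, and the function bodies are all hypothetical), then runs `from <module> import *` into a fresh dictionary to see which names come across.

```python
import sys
import types


def star_import(source, name):
    """Build a module from `source`, register it in sys.modules, then run
    `from <name> import *` in a fresh namespace and return the copied names."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    namespace = {}
    exec(f"from {name} import *", namespace)
    return sorted(key for key in namespace if key != "__builtins__")


# Approach 1: no __all__, so only the underscore-prefixed helper is hidden.
NO_ALL = """
def is_boolean(arg):
    return str(arg) in ('True', 'False')

def boolean_helper_1(arg):
    return arg

def _boolean_helper_2(arg):
    return arg
"""

# Approach 2: __all__ lists exactly what `import *` may export.
WITH_ALL = """
__all__ = ['is_boolean']

def is_boolean(arg):
    return str(arg) in ('True', 'False')

def boolean_helper_1(arg):
    return arg

def boolean_helper_2(arg):
    return arg
"""

without_all = star_import(NO_ALL, "demo_no_all")
with_all = star_import(WITH_ALL, "demo_with_all")

print(without_all)  # ['boolean_helper_1', 'is_boolean']
print(with_all)     # ['is_boolean']
```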

An issue with the 2nd approach is that we will have an `__all__` in every single nested package as well as in each package's modules. If things are renamed, this gets messy very quickly. What we want instead is to write something in our **validators** package's `__init__.py` which imports only the things we need.

Let's do this for our **validators** package's `__init__.py` (image above). Our **validators** package has 4 files, each with 1 useful function and a couple of helper functions. 

1. In each file, we create the `__all__` list and add the names of the objects we want to export as strings.
2. From the **validators** `__init__.py`, we import everything from each module (`from .boolean import *`, `from .date import *` etc.)
3. Importing from a submodule also binds the submodule itself (`boolean`, `date`, etc.) in the package's namespace, so we can access its `__all__` using `boolean.__all__` - this is a list.
4. Lists can be concatenated to form one large list, so we can write `boolean.__all__ + date.__all__` - this is still a list.
5. If this concatenation starts becoming quite long, we can enclose the whole expression in parentheses, which lets us split it across several lines.

Look at the image above (or inside the **validators** `__init__.py`) for the final code.
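
The steps above can be sketched as follows. This is not the code from the image; it is a minimal reconstruction under assumptions, using a throwaway single-level package with the hypothetical name `validators_demo`, only two modules, and made-up function bodies.

```python
import importlib
import sys
import tempfile
from pathlib import Path

BOOLEAN_SRC = '''
__all__ = ["is_boolean"]

def is_boolean(arg):
    return boolean_helper_1(arg)

def boolean_helper_1(arg):
    return str(arg) in ("True", "False")
'''

DATE_SRC = '''
import datetime

__all__ = ["is_date"]

def is_date(arg):
    return date_helper_1(arg)

def date_helper_1(arg):
    return isinstance(arg, datetime.date)
'''

# Steps 2 to 5: star-import each module, then combine their __all__ lists.
INIT_SRC = '''
from .boolean import *
from .date import *

__all__ = (boolean.__all__ +
           date.__all__)
'''

root = Path(tempfile.mkdtemp())
pkg_dir = root / "validators_demo"
pkg_dir.mkdir()
(pkg_dir / "boolean.py").write_text(BOOLEAN_SRC)
(pkg_dir / "date.py").write_text(DATE_SRC)
(pkg_dir / "__init__.py").write_text(INIT_SRC)

sys.path.append(str(root))
importlib.invalidate_caches()

import validators_demo

# What a user gets from `from validators_demo import *`.
namespace = {}
exec("from validators_demo import *", namespace)
exported = sorted(key for key in namespace if key != "__builtins__")

print(validators_demo.__all__)  # ['is_boolean', 'is_date']
print(exported)                 # ['is_boolean', 'is_date']
```

The helper functions stay in their own modules and never reach the package's namespace, because each module's `__all__` leaves them out.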

Using the `import *` approach can be problematic and there's another (slightly better) way, which we'll see in Part 2.

## Part 2

We'll now be moving away from the **validators** package and instead using the **models** package within **common** (but we'll still use `main.py` for our imports). Familiarise yourself with the structure of **models**.

Let's import `common` and see what's in the namespace:

In [3]:
import common

In [6]:
print(common.__dict__.keys())

dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__path__', '__file__', '__cached__', '__builtins__'])


It doesn't contain any of the modules or packages because we haven't got any import statements in **common**'s `__init__.py`. Assume we write in the `__init__.py`: 

`import common.models.posts.post` (This is a module: **post.py**, which contains a class called `Post`.)

We now have access to this module's class with:

`john_post = common.models.posts.post.Post()`

If instead we wanted to import the **posts** package and have access to all the functions/classes within the package's two modules, we would have to update the **posts** package's `__init__.py`. To get access to all functions/classes, we would write:

`from .post import *`\
`from .posts import *`

Now our import statement in `main.py` reduces to:

`import common.models.posts` 

All of those names have now been added to `common.models.posts`'s namespace. If we also add an `__all__` to each module so that only the classes we want are exported, then the only things we expect to see in that namespace are: `post` (the module), `Post` (from the `post.py` module), `posts` (the module) and `Posts` (from the `posts.py` module). Our helper functions aren't in the `__all__` lists, so they are not copied across.

So note that, when we type `from .post import *`, Python adds the module `post` itself to the namespace, as well as all the things it imports from it. Keeping these extra module names out of a user's `import *` is simple.

Now it's good practice to add to our [models](./08%20-%20structuring_package_imports/common/models/__init__.py) `__init__.py`:

`__all__ = (posts.__all__ + users.__all__)`

Because if a user tries `from models import *` (which is bad practice) and **models** has no `__all__`, it will import *every* public name from the **models** package's namespace. As mentioned above, a line such as `from .posts import *` also adds the `posts` module itself to that namespace. 

Look at the [models file](./08%20-%20structuring_package_imports/common/models/__init__.py) `__init__.py`. It contains `from .posts import *`, so `posts` is added to the **models** namespace. But we don't necessarily want that - it clutters things up. So what we do is limit what is exportable when **models** is star-imported. Having `__all__ = (posts.__all__ + users.__all__)` in the **models** `__init__.py` means that if someone tries `from models import *`, it will **only** import the names listed in `__all__`. 

It's a subtle point. To summarise it, `__all__` only applies when we have `import *`.
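
To make the point concrete, here is a small sketch comparing a package without `__all__` against one with it. The package names `models_open` and `models_strict`, the helper `star_names` and the contents of `posts.py` are hypothetical stand-ins, not the course's actual files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

POSTS_SRC = '''
__all__ = ["Posts"]

class Posts:
    pass

def posts_helper():
    return None
'''

root = Path(tempfile.mkdtemp())
init_sources = {
    # No __all__ in the package: every public name is exported by `import *`.
    "models_open": "from .posts import *\n",
    # With __all__: only the listed names are exported by `import *`.
    "models_strict": "from .posts import *\n__all__ = posts.__all__\n",
}
for pkg, init_src in init_sources.items():
    (root / pkg).mkdir()
    (root / pkg / "posts.py").write_text(POSTS_SRC)
    (root / pkg / "__init__.py").write_text(init_src)

sys.path.append(str(root))
importlib.invalidate_caches()


def star_names(package_name):
    """Return the names copied by `from <package_name> import *`."""
    namespace = {}
    exec(f"from {package_name} import *", namespace)
    return sorted(key for key in namespace if key != "__builtins__")


open_names = star_names("models_open")
strict_names = star_names("models_strict")

import models_strict

# __all__ does not delete anything: the submodule is still a normal attribute.
still_reachable = hasattr(models_strict, "posts")

print(open_names)       # ['Posts', 'posts']
print(strict_names)     # ['Posts']
print(still_reachable)  # True
```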

One final point to make is that our `__init__.py` files should never really have functional code - only imports. We may have some logic such as `try` and `except` blocks, but those are usually for ensuring that the package actually imports and doesn't crash. Our functional code should go in modules and our `__init__.py` should do `from <module> import *`. Take a look at the `asyncio` package to see good practice.

<img src=s9-images/9.34.png width=500 />


## Implicit Namespace Packages 

A directory on the import path can be imported as EITHER a regular package, which we've discussed above, or a namespace package. If the directory has no `__init__.py`, it is not a regular package and is treated as a namespace package. Here are the differences:

<img src=s9-images/9.35.png width=600 />

We can still import by using the dot notation whether the package is namespace or regular

<img src=s9-images/9.36.png width=500 />

but a big difference is that we cannot monkey around with the `__path__` object. In other words, we can't trick the package into thinking it's somewhere else in order to load from a database or an API, for example.
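
A few of these differences can be observed directly. The sketch below creates one directory with an `__init__.py` and one without (the names `regular_demo` and `namespace_demo` are hypothetical) and inspects the resulting module objects.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Regular package: the directory contains an __init__.py.
(root / "regular_demo").mkdir()
(root / "regular_demo" / "__init__.py").write_text("")

# Namespace package: a directory with no __init__.py at all.
(root / "namespace_demo").mkdir()
(root / "namespace_demo" / "mod.py").write_text("VALUE = 'found'\n")

sys.path.append(str(root))
importlib.invalidate_caches()

import regular_demo
import namespace_demo.mod  # dot notation works for both kinds of package

# A regular package's __path__ is a plain list of directories.
regular_is_list = isinstance(regular_demo.__path__, list)

# A namespace package's __path__ is a special iterable, not a plain list.
namespace_is_list = isinstance(namespace_demo.__path__, list)

# A namespace package has no __init__.py, so it has no file of its own.
namespace_file = getattr(namespace_demo, "__file__", None)

value = namespace_demo.mod.VALUE

print(regular_is_list, namespace_is_list, namespace_file, value)
```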

# 09 - zipped_packages

Looking inside zipped packages is quite easy. Since Python only looks for modules in the locations listed in `sys.path`, the solution is to simply add the zip file to our `sys.path`.

<img src=s9-images/9.37.png width=500 />
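
As a minimal sketch of the same idea, the snippet below builds a small archive with the standard `zipfile` module and imports from it. The archive name `bundle.zip`, the package `zipped_demo` and its `greet` module are hypothetical.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Create a zip archive that contains a tiny package.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo/__init__.py", "")
    zf.writestr(
        "zipped_demo/greet.py",
        "def hello():\n    return 'hello from zip'\n",
    )

# Add the archive itself (not its parent directory) to sys.path.
sys.path.append(str(archive))
importlib.invalidate_caches()

from zipped_demo import greet

message = greet.hello()
origin = greet.__file__  # the path points inside the archive

print(message)
print(origin)
```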