# Part 06: Modules, Packages and Imports


Following the usual instructions, we create the environment (folder reader)
- Add the dev requirements file: `nano requirements_dev.txt`

requirements_dev.txt:

    -r requirements.txt
    flake8
    black
    rope

- Leave `requirements.txt` empty for now

## Python Modules 

Modular programming refers to the process of breaking a large, unwieldy programming task into separate, smaller, more manageable subtasks or modules. Individual modules can then be cobbled together like building blocks to create a larger application.

There are several advantages to modularizing code in a large application:

<ul>
<li>
<p><strong>Simplicity:</strong> Rather than focusing on the entire problem at hand, a module typically focuses on one relatively small portion of the problem. If you’re working on a single module, you’ll have a smaller problem domain to wrap your head around. This makes development easier and less error-prone.</p>
</li>
<li>
<p><strong>Maintainability:</strong> Modules are typically designed so that they enforce logical boundaries between different problem domains. If modules are written in a way that minimizes interdependency, there is decreased likelihood that modifications to a single module will have an impact on other parts of the program. (You may even be able to make changes to a module without having any knowledge of the application outside that module.) This makes it more viable for a team of many programmers to work collaboratively on a large application.</p>
</li>
<li>
<p><strong>Reusability:</strong> Functionality defined in a single module can be easily reused (through an appropriately defined interface) by other parts of the application. This eliminates the need to duplicate code.</p>
</li>
<li>
<p><strong>Scoping:</strong> Modules typically define a separate <a href="https://realpython.com/python-namespaces-scope/"><strong>namespace</strong></a>, which helps avoid collisions between identifiers in different areas of a program. (One of the tenets in the <a href="https://www.python.org/dev/peps/pep-0020">Zen of Python</a> is <em>Namespaces are one honking great idea—let’s do more of those!</em>)</p>
</li>
</ul>

Functions, modules and packages are all constructs in Python that promote code modularization.

### Python Modules: Overview

There are actually three different ways to define a module in Python:

1. A module can be written in Python itself.
2. A module can be written in C and loaded dynamically at run-time, like the re (regular expression) module.
3. A built-in module is intrinsically contained in the interpreter, like the itertools module.

A module’s contents are accessed the same way in all three cases: with the **import statement**.
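
As a rough illustration of the three kinds of module listed above, the sketch below classifies a module by where the import system finds it. The helper name `module_kind` is hypothetical, and the result for a given module (for example `math`) depends on how your interpreter was built, so treat the output as informative rather than fixed.

```python
import importlib.machinery
import importlib.util
import sys


def module_kind(name):
    """Classify a top-level module by how the import system locates it."""
    # Built-in modules are compiled into the interpreter itself.
    if name in sys.builtin_module_names:
        return "built-in"
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    # C extensions are shared libraries (.so, .pyd, ...) loaded at run time.
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "C extension"
    # Modules written in Python are plain .py files.
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return "Python source"
    return "other"


for name in ("sys", "json", "math"):
    print(name, "->", module_kind(name))
```

Whatever the classification, all of these modules are loaded with the same `import` statement.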

Here, the focus will mostly be on modules that are written in Python. The cool thing about modules written in Python is that they are exceedingly straightforward to build. All you need to do is create a file that contains legitimate Python code and then give the file a name with a .py extension. That’s it! No special syntax or voodoo is necessary.

For example, suppose you have created a file called feed.py containing the following:

> Create the reader folder

In [None]:
# reader/feed.py
s = "If Comrade Napoleon says it, it must be right."
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass

Several objects are defined in feed.py:
- s (a string)
- a (a list)
- foo() (a function)
- Foo (a class)


> Open the interpreter in the same folder as the feed.py file.


Assuming feed.py is in an appropriate location, which you will learn more about shortly, these objects can be accessed by importing the module as follows:

In [None]:
import feed
print(feed.s)

feed.a

feed.foo(['quux', 'corge', 'grault'])

x = feed.Foo()
x

### The Module Search Path

Continuing with the above example, let’s take a look at what happens when Python executes the statement:

    import feed

When the interpreter executes the above import statement, it searches for feed.py in a list of directories assembled from the following sources:

<ul>
<li>The directory from which the input script was run or the <strong>current directory</strong> if the interpreter is being run interactively</li>
<li>The list of directories contained in the <a href="https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH"><code>PYTHONPATH</code></a> environment variable, if it is set. (The format for <code>PYTHONPATH</code> is OS-dependent but should mimic the <code>PATH</code> environment variable.)</li>
<li>An installation-dependent list of directories configured at the time Python is installed</li>
</ul>

The resulting search path is accessible in the Python variable sys.path, which is obtained from a module named sys:

In [1]:
import sys
sys.path

['/home/jovyan/work/python-napredni-iskraemeco/Del_06_Modules_Packages_and_Imports',
 '/opt/conda/lib/python38.zip',
 '/opt/conda/lib/python3.8',
 '/opt/conda/lib/python3.8/lib-dynload',
 '',
 '/opt/conda/lib/python3.8/site-packages',
 '/opt/conda/lib/python3.8/site-packages/IPython/extensions',
 '/home/jovyan/.ipython']

Thus, to ensure your module is found, you need to do one of the following:

<ul>
<li>Put <code>feed.py</code> in the directory where the input script is located or the <strong>current directory</strong>, if interactive</li>
<li>Modify the <code>PYTHONPATH</code> environment variable to contain the directory where <code>feed.py</code> is located before starting the interpreter<ul>
<li><strong>Or:</strong> Put <code>feed.py</code> in one of the directories already contained in the <code>PYTHONPATH</code> variable</li>
</ul>
</li>
<li>Put <code>feed.py</code> in one of the installation-dependent directories, which you may or may not have write-access to, depending on the OS</li>
</ul>

There is actually one additional option: you can put the module file in any directory of your choice and then modify sys.path at run-time so that it contains that directory. For example, in this case, you could put feed.py in directory C:\Users\john and then issue the following statements:

In [None]:
sys.path.append(r'C:\Users\john')
sys.path
import feed

Once a module has been imported, you can determine the location where it was found with the module’s `__file__` attribute:

In [None]:
import feed
feed.__file__

import re
re.__file__

The directory portion of `__file__` should be one of the directories in sys.path.
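
The run-time `sys.path` approach can be tried without touching your real project. The sketch below writes a throwaway module into a temporary directory, adds that directory to `sys.path`, and checks where the module was found. The module name `search_path_demo` is a made-up placeholder, and the temporary directory is not cleaned up afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, chosen so it does not clash with a real module.
MODULE_NAME = "search_path_demo"

# Create a directory that is not on sys.path yet and put a module in it.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, f"{MODULE_NAME}.py").write_text("s = 'found via sys.path'\n")

sys.path.append(tmp_dir)        # make the directory searchable at run time
importlib.invalidate_caches()   # let the import system notice the new file
demo = importlib.import_module(MODULE_NAME)

# The directory part of __file__ should be one of the sys.path entries.
found_in = Path(demo.__file__).resolve().parent
on_path = {Path(p).resolve() for p in sys.path if p}
print(demo.s)
print(found_in in on_path)
```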

We show what the system modules look like: `ls /home/leons/.pyenv/versions/3.9.0/lib/python3.9`

### The import Statement

Module contents are made available to the caller with the import statement. The import statement takes many different forms, shown below.

#### `import <module_name>`

The simplest form is the one already shown above:

    import <module_name>

Note that this does not make the module contents directly accessible to the caller. Each module has its own **private symbol table**, which serves as the global symbol table for all objects defined in the module. Thus, a module creates a separate **namespace**, as already noted.

The statement `import <module_name>` only places `<module_name>` in the caller’s symbol table. The objects that are defined in the module remain in the module’s private symbol table.

From the caller, objects in the module are only accessible when prefixed with `<module_name>` via **dot notation** as illustrated below.

After the following import statement, feed is placed into the local symbol table. Thus, feed has meaning in the caller’s local context:

In [None]:
import feed
feed

But s and foo remain in the module’s private symbol table and are not meaningful in the local context:



In [None]:
s

In [None]:
foo('quux')

To be accessed in the local context, names of objects defined in the module must be prefixed by feed:

In [None]:
feed.s

In [None]:
feed.foo('quux')

Several comma-separated modules may be specified in a single import statement:

    import <module_name>[, <module_name> ...]

#### `from <module_name> import <name(s)>`

An alternate form of the import statement allows individual objects from the module to be imported directly into the caller’s symbol table:

    from <module_name> import <name(s)>

Following execution of the above statement, `<name(s)>` can be referenced in the caller’s environment without the `<module_name>` prefix:



In [None]:
from feed import s, foo
s

foo('quux')


from feed import Foo
x = Foo()
x


Because this form of import places the object names directly into the caller’s symbol table, any objects that already exist with the same name will be overwritten:



In [None]:
a = ['foo', 'bar', 'baz']
a


from feed import a
a

It is even possible to indiscriminately import everything from a module at one fell swoop:

    from <module_name> import *

In [None]:
from feed import *
s
a
foo
Foo

This isn’t necessarily recommended in large-scale production code. It’s a bit dangerous because you are entering names into the local symbol table en masse. Unless you know them all well and can be confident there won’t be a conflict, you have a decent chance of overwriting an existing name inadvertently. However, this syntax is quite handy when you are just mucking around with the interactive interpreter, for testing or discovery purposes, because it quickly gives you access to everything a module has to offer without a lot of typing.
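
A small sketch of how a wildcard import can silently replace a name: `math` defines its own `pow()`, so `from math import *` rebinds the built-in `pow`. To keep the experiment away from your real namespace, the statements are run with `exec()` in a separate dictionary that stands in for a module's symbol table; that dictionary is only a device for the demonstration.

```python
import math

# A plain dict plays the role of a module's global symbol table.
namespace = {}

exec("result_before = pow(2, 3)", namespace)   # built-in pow returns an int
exec("from math import *", namespace)           # wildcard import rebinds 'pow'
exec("result_after = pow(2, 3)", namespace)    # math.pow returns a float

print(type(namespace["result_before"]).__name__)  # int
print(type(namespace["result_after"]).__name__)   # float
print(namespace["pow"] is math.pow)               # True
```

Nothing warned about the change of `pow`, which is exactly why wildcard imports are risky outside of quick interactive sessions.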

#### `from <module_name> import <name> as <alt_name>`

It is also possible to import individual objects but enter them into the local symbol table with alternate names:

    from <module_name> import <name> as <alt_name>[, <name> as <alt_name> …]

This makes it possible to place names directly into the local symbol table but avoid conflicts with previously existing names:



In [None]:
s = 'foo'
a = ['foo', 'bar', 'baz']

from feed import s as string, a as alist
s
string
a
alist

#### `import <module_name> as <alt_name>`

You can also import an entire module under an alternate name:

    import <module_name> as <alt_name>

In [None]:
import feed as my_module
my_module.a

my_module.foo('qux')

### Guard against unsuccessful imports

Lastly, a try statement with an except ImportError clause can be used to guard against unsuccessful import attempts:

In [2]:
try:
    # Non-existent module
    import baz
except ImportError:
    print('Module not found')

Module not found


In [None]:
try:
    # Existing module, but non-existent object
    from feed import baz
except ImportError:
    print('Object not found in module')
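
A common use of this guard is an optional dependency with a standard-library fallback. The sketch below assumes `ujson`, a third-party JSON library that may or may not be installed; either branch leaves a module bound to the hypothetical alias `json_impl`.

```python
try:
    import ujson as json_impl   # optional third-party package, may be absent
except ImportError:             # also catches ModuleNotFoundError, a subclass
    import json as json_impl    # standard-library fallback

# The rest of the code only uses the alias and does not care which one loaded.
data = json_impl.loads('{"feed": "reader"}')
print(json_impl.__name__, data["feed"])
```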

### Styling of Import Statements

PEP 8, the official style guide for Python, has a few pointers when it comes to writing import statements. Here’s a summary:
- Imports should always be written at the top of the file, after any module comments and docstrings.
- Imports should be divided according to what is being imported. There are generally three groups:
    - standard library imports (Python’s built-in modules)
    - related third party imports (modules that are installed and do not belong to the current application)
    - local application imports (modules that belong to the current application)
- Each group of imports should be separated by a blank line.

It’s also a good idea to order your imports alphabetically within each import group. This makes finding particular imports much easier, especially when there are many imports in a file.

Here’s an example of how to style import statements:

In [None]:
"""Illustration of good import statement styling.

Note that the imports come after the docstring.

"""

# Standard library imports
import datetime
import os

# Third party imports
from flask import Flask
from flask_restful import Api
from flask_sqlalchemy import SQLAlchemy

# Local application imports
from feed import a
from feed import foo

The import statements above are divided into three distinct groups, separated by a blank line. They are also ordered alphabetically within each group.

> Test in VS Code: Command Palette -> Organize Imports

In [None]:
from flask_sqlalchemy import SQLAlchemy
import os
from feed import a
from flask_restful import Api
from feed import foo
import datetime
from flask import Flask

### The dir() Function

The built-in function dir() returns a list of defined names in a namespace. Without arguments, it produces an alphabetically sorted list of names in the current local symbol table:

In [14]:
print(dir())

['Foo', 'In', 'Out', '_', '_10', '_11', '_12', '_13', '_4', '_5', '_8', '_9', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i10', '_i11', '_i12', '_i13', '_i14', '_i2', '_i3', '_i4', '_i5', '_i6', '_i7', '_i8', '_i9', '_ih', '_ii', '_iii', '_oh', 'a', 'exit', 'foo', 'get_ipython', 'mod', 'quit', 're', 's', 'sys']


In [15]:
qux = [1, 2, 3, 4, 5]

In [17]:
print(dir())

['Foo', 'In', 'Out', '_', '_10', '_11', '_12', '_13', '_16', '_4', '_5', '_8', '_9', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i10', '_i11', '_i12', '_i13', '_i14', '_i15', '_i16', '_i17', '_i2', '_i3', '_i4', '_i5', '_i6', '_i7', '_i8', '_i9', '_ih', '_ii', '_iii', '_oh', 'a', 'exit', 'foo', 'get_ipython', 'mod', 'quit', 'qux', 're', 's', 'sys']


Note how the first call to dir() above lists several names that are automatically defined and already in the namespace when the interpreter starts. As new names are defined (here, qux), they appear on subsequent invocations of dir().

This can be useful for identifying what exactly has been added to the namespace by an import statement; the `from math import *` cells at the end of this section let you try it.

When given an argument that is the name of a module, dir() lists the names defined in the module:

In [20]:
import math

In [24]:
print(dir(math))

['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']


In [None]:
from math import *

In [None]:
print(dir())
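
The before-and-after comparison can also be automated. The sketch below runs an import statement in a fresh dictionary that stands in for a namespace and reports which names the statement added. The helper `names_added_by` is hypothetical and only meant for experimenting.

```python
def names_added_by(statement):
    """Return the sorted names that a statement adds to an empty namespace."""
    namespace = {}
    exec("", namespace)               # primes the dict with __builtins__
    before = set(namespace)
    exec(statement, namespace)        # run the import in that namespace
    return sorted(set(namespace) - before)


print(names_added_by("import math"))               # ['math']
print(names_added_by("from math import pi, tau"))  # ['pi', 'tau']
print(names_added_by("import os.path as osp"))     # ['osp']
print(len(names_added_by("from math import *")))   # many names at once
```

The first three results show that `import` binds only the module (or its alias), while `from ... import` binds exactly the requested names.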

### Executing a Module as a Script

Any .py file that contains a module is essentially also a Python script, and there isn’t any reason it can’t be executed like one.

Here again is feed.py as it was defined above:

In [None]:
s = "If Comrade Napoleon says it, it must be right."
a = [100, 200, 300]


def foo(arg):
    print(f"arg = {arg}")


class Foo:
    pass


This can be run as a script:

    python feed.py

There are no errors, so it apparently worked. Granted, it’s not very interesting. As it is written, it only defines objects. It doesn’t do anything with them, and it doesn’t generate any output.

Let’s modify the above Python module so it does generate some output when run as a script:

In [None]:
s = "If Comrade Napoleon says it, it must be right."
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass

print(s)
print(a)
foo('quux')
x = Foo()
print(x)

Now it should be a little more interesting:

    python feed.py

Unfortunately, now it also generates output when imported as a module:

In [None]:
import feed

When a .py file is imported as a module, Python sets the special dunder variable `__name__` to the name of the module. However, if a file is run as a standalone script, `__name__` is (creatively) set to the string `'__main__'`. Using this fact, you can discern which is the case at run-time and alter behavior accordingly:

In [None]:
#feed.py
s = "If Comrade Napoleon says it, it must be right."
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass

if __name__ == '__main__':
    print('Executing as standalone script')
    print(s)
    print(a)
    foo('quux')
    x = Foo()
    print(x)

Now, if you run as a script, you get output:

    python feed.py

But if you import as a module, you don’t:

In [None]:
import feed
feed.foo('grault')

Modules are often designed with the capability to run as a standalone script for purposes of testing the functionality that is contained within the module. This is referred to as unit testing.
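
To watch `__name__` change between the two cases in one session, the sketch below writes a small throwaway file, imports it, and then executes the same file as a script with the standard-library `runpy` module. The file name `name_demo.py` is a made-up placeholder.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "name_demo"   # hypothetical module name for this experiment

tmp_dir = tempfile.mkdtemp()
script = Path(tmp_dir, f"{MODULE_NAME}.py")
script.write_text(
    "seen_name = __name__\n"
    "ran_main_block = False\n"
    "if __name__ == '__main__':\n"
    "    ran_main_block = True\n"
)

# Case 1: import the file as a module.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
imported = importlib.import_module(MODULE_NAME)

# Case 2: execute the same file the way 'python name_demo.py' would.
script_globals = runpy.run_path(str(script), run_name="__main__")

print(imported.seen_name, imported.ran_main_block)                    # name_demo False
print(script_globals["seen_name"], script_globals["ran_main_block"])  # __main__ True
```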

## Importlib Resources

https://docs.python.org/3.9/library/importlib.html#module-importlib.resources

One challenge when packaging a Python project is deciding what to do with project resources like data files needed by the project. A few options have commonly been used:
- Hard-code a path to the data file.
- Put the data file inside the package and locate it using `__file__`.
- Use `pkg_resources` (shipped with setuptools) to access the data file resource.

Each of these has its shortcomings. The first option is not portable. Using `__file__` is more portable, but if the Python project is installed it might end up inside a zip and not have a `__file__` attribute. The third option solves this problem, but is unfortunately very slow.

A better solution is the new importlib.resources module in the standard library. It uses Python’s existing import functionality to also import data files. Assume you have a resource inside a Python package like this:

    mkdir hello
    cd hello
    echo "Hello, {recipient}!" >> greeting.txt
    touch greet.py

    hello/
    │
    ├── greeting.txt
    └── greet.py

**Resources are files that live within Python packages**. Think test data files, certificates, templates, translation catalogs, and other static files you want to access from Python code. Sometimes you put these static files in a package directory within your source tree, and then locate them by importing the package and using its `__file__` attribute. But this doesn't work for zip files!

Using import works great on Python modules and packages, but import will not work on non-Python data files, such as text files (including JSON, HTML, csv, etc.) or binary files (such as images).

You could hack something together with the `__file__` variable, **which refers to the current Python module as a file in the filesystem**. And many developers do this.

However, `importlib.resources` has existed **since Python 3.7 and presents a more reliable and better-looking way to load data files**.

In [2]:
# greet.py - not working
def greet(recipient):
    """Greet a recipient."""
    with open("./greeting.txt") as f:
        template = f.read()

    return template.format(recipient=recipient)

> Start Python outside the hello folder (or create a file main.py), import with `from hello import greet`, and run `greet.greet('hello')`: FileNotFoundError: [Errno 2] No such file or directory: './greeting.txt'

Note that hello needs to be a Python package. That is, the directory needs to contain an `__init__.py` file (which may be empty). You can then read the greeting.txt file as follows:

In [None]:
# greet_work.py - importlib.resources
from importlib import resources

def greet(recipient):
    """Greet a recipient."""
    with resources.open_text("hello", "greeting.txt") as fid:
        template = fid.read()
    return template.format(recipient=recipient)

> Start Python outside the hello folder, import with `from hello import greet`, and run `greet.greet('hello')`

<p>A similar <a href="https://docs.python.org/3.7/library/importlib.html#importlib.resources.open_binary"><code>resources.open_binary()</code></a> function is available for opening files in binary mode, and <code>resources.contents()</code> lists the items inside a package, which is handy for discovering resources. See <a href="https://www.youtube.com/watch?v=ZsGFU2qh73E">Barry Warsaw’s PyCon 2018 talk</a> for more information.</p>

It is possible to use importlib.resources in Python 2.7 and Python 3.4+ through a backport. A guide on migrating from pkg_resources to importlib.resources is available.

https://importlib-resources.readthedocs.io/en/latest/migration.html
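
To try the resource lookup end to end without creating files by hand, the sketch below builds a temporary package with a data file and reads it through `importlib.resources`. It assumes Python 3.9 or newer, where `resources.files()` is available; the package name `hello_demo` is a made-up placeholder for the `hello` package above.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

PACKAGE = "hello_demo"   # hypothetical package name for this experiment

# Build a package directory: __init__.py plus one data file.
root = Path(tempfile.mkdtemp())
pkg_dir = root / PACKAGE
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "greeting.txt").write_text("Hello, {recipient}!")

sys.path.insert(0, str(root))
importlib.invalidate_caches()


def greet(recipient):
    """Greet a recipient using the template stored inside the package."""
    # The file is located through the package, not the working directory.
    template = resources.files(PACKAGE).joinpath("greeting.txt").read_text()
    return template.format(recipient=recipient)


print(greet("World"))   # Hello, World!
```

Because the lookup goes through the package, `greet()` gives the same result no matter which directory the interpreter was started from.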

## Python Packages

Suppose you have developed a very large application that includes many modules. As the number of modules grows, it becomes difficult to keep track of them all if they are dumped into one location. This is particularly so if they have similar names or functionality. You might wish for a means of grouping and organizing them.

Packages allow for a hierarchical structuring of the module namespace using dot notation. In the same way that modules help avoid collisions between global variable names, packages help avoid collisions between module names.

Creating a package is quite straightforward, since it makes use of the operating system’s inherent hierarchical file structure.

> In the reader/reader folder we create the following files:

#### feed.py

First consider feed.py. This file contains functions for reading from a web feed and parsing the result. Luckily there are already great libraries available to do this. feed.py depends on two modules that are already available on PyPI: feedparser and html2text.

feed.py contains several functions. We’ll discuss them one at a time.

To avoid reading from the web feed more than necessary, we first create a function that remembers the feed the first time it’s read:

In [None]:
# feed.py

import feedparser
import html2text

_CACHED_FEEDS = dict()

def _feed(url):
    """Only read a feed once, by caching its contents"""
    if url not in _CACHED_FEEDS:
        _CACHED_FEEDS[url] = feedparser.parse(url)
    return _CACHED_FEEDS[url]

feedparser.parse() reads a feed from the web and returns it in a structure that looks like a dictionary. To avoid downloading the feed more than once, it’s stored in _CACHED_FEEDS and reused for later calls to _feed(). Both _CACHED_FEEDS and _feed() are prefixed by an underscore to indicate that they are support objects not meant to be used directly.
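
The caching idea can be checked without any network access. In the sketch below, `fake_parse()` is a hypothetical stand-in for `feedparser.parse()` that records each call, so you can see that the second lookup is served from the dictionary.

```python
_CACHE = {}
calls = []   # records every simulated download


def fake_parse(url):
    """Hypothetical stand-in for feedparser.parse(); does no network I/O."""
    calls.append(url)
    return {"url": url, "entries": []}


def cached_feed(url):
    """Only 'download' a feed once, by caching its contents."""
    if url not in _CACHE:
        _CACHE[url] = fake_parse(url)
    return _CACHE[url]


first = cached_feed("https://example.com/atom.xml")
second = cached_feed("https://example.com/atom.xml")
print(first is second)   # True
print(len(calls))        # 1
```

Note that a cache like this lives as long as the module is loaded, so a long-running program would never see feed updates unless the entry is removed.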

We can get some basic information about the feed by looking in the .feed metadata. The following function picks out the title and link to the web site containing the feed:

In [None]:
def get_site(url):
    """Get name and link to web site of the feed"""
    info = _feed(url).feed
    return f"{info.title} ({info.link})"

In addition to .title and .link, attributes like .subtitle, .updated, and .id are also available.

The articles available in the feed can be found inside the .entries list. Article titles can be found with a list comprehension:

In [None]:
def get_titles(url):
    """List titles in feed"""
    articles = _feed(url).entries
    return [a.title for a in articles]

.entries lists the articles in the feed sorted chronologically, so that the newest article is .entries[0].

In order to get the contents of one article, we use its index in the .entries list as an article ID:

In [None]:
def get_article(article_id, url):
    """Get article from feed with the given ID"""
    articles = _feed(url).entries
    article = articles[int(article_id)]
    html = article.content[0].value
    text = html2text.html2text(html)
    return f"# {article.title}\n\n{text}"

After picking the correct article out of the .entries list, we find the text of the article as HTML in `article.content[0].value`. Next, html2text does a decent job of translating the HTML into much more readable text. As the HTML doesn’t contain the title of the article, the title is added before returning.

#### viewer.py

The final module is viewer.py. At the moment, it consists of two very simple functions. In practice, we could have used print() directly in `__main__.py` instead of calling viewer functions. However, having the functionality split off makes it easier to replace it later with something more advanced. Maybe we could add a GUI interface in a later version?

viewer.py contains two functions:

In [None]:
# viewer.py
def show(article):
    """Show one article"""
    print(article)

def show_list(site, titles):
    """Show list of articles"""
    print(f"The latest tutorials from {site}")
    for article_id, title in enumerate(titles):
        print(f"{article_id:>3} {title}")

show() simply prints one article to the console, while show_list() prints a list of titles. The latter also creates article IDs that are used when choosing to read one particular article.
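
The `{article_id:>3}` format specification right-aligns the ID in a field three characters wide, which keeps the titles lined up. The sketch below returns the lines instead of printing them so the formatting is easy to inspect; `format_list` is a hypothetical variant of `show_list()`, not part of the reader package.

```python
def format_list(site, titles):
    """Build the lines that show_list() would print."""
    lines = [f"The latest tutorials from {site}"]
    for article_id, title in enumerate(titles):
        # '>3' pads the ID on the left to a width of three characters.
        lines.append(f"{article_id:>3} {title}")
    return lines


for line in format_list("Demo Site", ["First article", "Second article"]):
    print(line)
```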

<hr>

Given this structure, if the reader package directory resides in a location where it can be found (in one of the directories contained in sys.path), you can refer to the two modules with dot notation (reader.feed, reader.viewer) and import them with the syntax you are already familiar with.

First, add the dependencies to requirements.txt:
    
    html2text
    feedparser
    importlib-resources
    

> `cd /home/leons/tecaj/reader` -> open the interpreter

- `pip install -r requirements_dev.txt`

In [None]:
import reader.feed, reader.viewer

In [None]:
reader.viewer.show('lalaal')

In [None]:
from reader.viewer import show
show('lalaal')

### Package Initialization

> We also create `config.cfg`

    [feed]
    url = https://realpython.com/atom.xml

If a file named `__init__.py` is present in a package directory, it is invoked when the package or a module in the package is imported. This can be used for execution of package initialization code, such as initialization of package-level data.

In [None]:
from configparser import ConfigParser as _ConfigParser

import importlib_resources as _resources

# Version of realpython-reader package
__version__ = "1.0.0"

# Read URL of feed from config file
_cfg = _ConfigParser()
with _resources.path("reader", "config.cfg") as _path:
    _cfg.read(str(_path))
URL = _cfg.get("feed", "url")

To read the URL to the feed from the configuration file, we use configparser and importlib.resources. The latter is used to import non-code (or resource) files from a package without having to worry about the full file path. It is especially helpful when publishing packages to PyPI where resource files might end up inside binary archives.
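
The configparser part can be tried on its own. The sketch below parses the same `[feed]` section from a string with `read_string()` instead of a file; the `timeout` option is a hypothetical key used only to show the `fallback` argument.

```python
from configparser import ConfigParser

# Same content as config.cfg, kept in a string for the experiment.
CONFIG_TEXT = """\
[feed]
url = https://realpython.com/atom.xml
"""

cfg = ConfigParser()
cfg.read_string(CONFIG_TEXT)

url = cfg.get("feed", "url")
# 'timeout' is not in the config, so the fallback value is returned.
timeout = cfg.get("feed", "timeout", fallback="not set")

print(url)
print(timeout)
```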

importlib.resources became a part of the standard library in Python 3.7. If you are using an older version of Python, you can use importlib_resources instead. This is a backport compatible with Python 2.7, and 3.4 and above. importlib_resources can be installed from PyPI.

The special variable `__version__` is a convention in Python for adding version numbers to your package. It was proposed in PEP 396. We’ll talk more about versioning later.

Variables defined in `__init__.py` become available as variables in the package namespace:

> `cd /home/leons/tecaj/reader` -> open the interpreter

In [None]:
import reader
reader.__version__
reader.URL

You should define the `__version__` variable in your own packages as well.

> Note: Much of the Python documentation states that an `__init__.py` file must be present in the package directory when creating a package. This was once true. It used to be that the very presence of `__init__.py` signified to Python that a package was being defined. The file could contain initialization code or even be empty, but it had to be present.

Starting with Python 3.3, Implicit Namespace Packages were introduced. These allow for the creation of a package without any `__init__.py` file. Of course, it can still be present if package initialization is needed. But it is no longer required.
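
A quick way to see an implicit namespace package is to create a directory without `__init__.py` and import from it. The sketch below does this in a temporary directory; the names `ns_demo_pkg` and `mod1` are made-up placeholders.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "ns_demo_pkg"   # hypothetical package name

# A directory with one module and, deliberately, no __init__.py.
root = Path(tempfile.mkdtemp())
(root / PACKAGE).mkdir()
(root / PACKAGE / "mod1.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

pkg = importlib.import_module(PACKAGE)
mod1 = importlib.import_module(f"{PACKAGE}.mod1")

# A namespace package has a search path but no file of its own.
print(getattr(pkg, "__file__", None))
print(list(pkg.__path__))
print(mod1.VALUE)
```

Since there is no `__init__.py`, there is also nowhere to put initialization code such as `__version__` or `URL`, which is why the reader package keeps the file.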

> In feed.py we add:

In [None]:
from reader import URL

and update the functions to use it as the default, for example `def _feed(url=URL):`

###  `__main__.py` file

The next source code file we’ll look at is `__main__.py`. The double underscores indicate that this file has a special meaning in Python. Indeed, when running a package as a script with `python -m reader`, Python executes the contents of the `__main__.py` file.

In other words, `__main__.py` acts as the entry point of our program and takes care of the main flow, calling other parts as needed:

In [None]:
# Standard library imports
import sys

# Reader imports
import reader
from reader import feed
from reader import viewer


def main():
    """Read the Real Python article feed"""
    args = [a for a in sys.argv[1:] if not a.startswith("-")]

    # Get URL from config file
    url = reader.URL

    # An article ID is given, show article
    if args:
        for article_id in args:
            article = feed.get_article(article_id, url=url)
            viewer.show(article)

    # No ID is given, show list of articles
    else:
        site = feed.get_site(url=url)
        titles = feed.get_titles(url=url)
        viewer.show_list(site, titles)


if __name__ == "__main__":
    main()

Notice that main() is called on the last line. If we do not call main(), then our program would not do anything. As you saw earlier, the program can either list all articles or print one specific article. This is handled by the if-else inside main().
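
The argument handling in main() is a single list comprehension, and it is worth trying on sample input. The sketch below isolates it in a hypothetical helper, `positional_args()`, that takes a list shaped like `sys.argv`.

```python
def positional_args(argv):
    """Return the arguments after the program name that are not options."""
    return [a for a in argv[1:] if not a.startswith("-")]


print(positional_args(["reader"]))                  # []
print(positional_args(["reader", "0", "-v", "3"]))  # ['0', '3']
print(positional_args(["reader", "-1"]))            # []
```

The last call shows a side effect of this simple filter: a negative index such as `-1` looks like an option and is dropped, so it never reaches `get_article()`.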

### Different Ways of Calling a Package

Since the package consists of four different source code files, how does the user know which file to call to run reader?

The python interpreter program has an `-m` option that allows you to specify a module name instead of a file name. For instance, if you have a script called hello.py, the following two commands are equivalent:



    python hello.py

    python -m hello

Another advantage of using -m is that it works for packages as well as modules. For example, you can call the reader package with -m:



    python -m reader

    python -m reader 11

Since reader is a package, the name only refers to a directory. How does Python decide which code inside that directory to run? **It looks for a file named** `__main__.py`. If such a file exists, it is executed. If `__main__.py` does not exist, then an error message is printed:

     python -m math

In this example, Python finds math but cannot run it: math is a compiled module with no Python code to execute, and it provides no `__main__.py` entry point.

If you are creating a package that is supposed to be executed, you should include a `__main__.py` file.
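
The lookup of `__main__.py` can be reproduced from inside Python with `runpy.run_module()`, the standard-library function that locates a module the same way `-m` does. The sketch below builds two temporary packages, one with and one without `__main__.py`; the package names are made-up placeholders.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Hypothetical packages: only the first one has an entry point.
for name, has_main in (("runnable_demo", True), ("plain_demo", False)):
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    if has_main:
        (pkg / "__main__.py").write_text(
            "RESULT = f'entry point ran as {__name__}'\n"
        )

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Comparable to 'python -m runnable_demo': __main__.py is executed.
ran = runpy.run_module("runnable_demo", run_name="__main__")
print(ran["RESULT"])

# Comparable to 'python -m plain_demo': there is nothing to execute.
failed = False
try:
    runpy.run_module("plain_demo", run_name="__main__")
except ImportError as exc:
    failed = True
    print("cannot run:", exc)
```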

## Python Application Layouts

https://realpython.com/python-application-layouts/

Python, though opinionated on syntax and style, is surprisingly flexible when it comes to structuring your applications.

On the one hand, this flexibility is great: it allows different use cases to use structures that are necessary for those use cases. On the other hand, though, it can be very confusing to the new developer.

The Internet isn’t a lot of help either—there are as many opinions as there are Python blogs. In this article, I want to give you a dependable Python application layout reference guide that you can refer to for the vast majority of your use cases.

A lot of us work primarily with Python applications that are run via command-line interfaces (CLIs). This is where you often start with a blank canvas, and the flexibility of Python application layouts can be a real headache.

Starting with an empty project folder can be intimidating and lead to no shortage of coder’s block. In this section, I want to share some proven layouts that I personally use as a starting point for all of my Python CLI applications.

We’ll start with a very basic layout for a very basic use case: a simple script that runs on its own. You’ll then see how to build up the layout as the use cases advance.

### Command-Line Application Layouts: One-Off Script

You just make a .py script, and it’s gravy, right? No need to install—just run the script in its directory!

Well, that’s fine if you’re just making a script for your own use, or one that doesn’t have any external dependencies, but what if you have to distribute it? Especially to a less tech-savvy user?

The following layout will work for all of these cases and can easily be modified to reflect whatever installation or other tools you use in your workflow. This layout will cover you whether you’re creating a pure Python script (that is, one with no dependencies) or using a tool like pip or Pipenv.

While you read this reference guide, keep in mind that the exact location of the files in the layout matters less than the reason they are placed where they are. All of these files should be in a project directory named after your project. For this example, we will use (what else?) helloworld as the project name and root directory.

Here’s the Python project structure I typically use for a CLI app:

    helloworld/
    │
    ├── .gitignore
    ├── helloworld.py
    ├── LICENSE
    ├── README.md
    ├── requirements.txt
    ├── setup.py
    └── tests.py

This is pretty straightforward: everything is in the same directory. The files shown here are not necessarily exhaustive, but I recommend keeping the number of files to a minimum if you plan on using a basic layout like this. Some of these files will be new to you, so let’s take a quick look at what each of them does.

<ul>
<li>
<p><code>.gitignore</code>: This is a file that tells Git which kinds of files to ignore, like IDE clutter or local configuration files. <a href="https://realpython.com/python-git-github-intro/#gitignore">Our Git tutorial has all the details</a>, and you can find sample <code>.gitignore</code> files for Python projects <a href="https://github.com/github/gitignore">here</a>.</p>
</li>
<li>
<p><code>helloworld.py</code>: This is the script that you’re distributing. As far as naming the main script file goes, I recommend that you go with the name of your project (which is the same as the name of the top-level directory).</p>
</li>
<li>
<p><code>LICENSE</code>: This plaintext file describes the license you’re using for a project. It’s always a good idea to have one if you’re distributing code. The filename is in all caps by convention.</p>
<blockquote>
<p><strong>Note:</strong> Need help selecting a license for your project? Check out <a href="https://choosealicense.com/">ChooseALicense</a>.</p>
</blockquote>
</li>
<li>
<p><code>README.md</code>: This is a <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a> (or <a href="https://en.wikipedia.org/wiki/ReStructuredText">reStructuredText</a>) file documenting the purpose and usage of your application. Crafting a good <code>README</code> is an art, but you can find a shortcut to mastery <a href="https://dbader.org/blog/write-a-great-readme-for-your-github-project">here</a>.</p>
</li>
<li>
<p><code>requirements.txt</code>: This file defines outside Python dependencies and their versions for your application.</p>
</li>
<li>
<p><code>setup.py</code>: This file can also be used to define dependencies, but it really shines for other work that needs to be done during installation. You can read more about both <code>setup.py</code> and <code>requirements.txt</code> in our <a href="https://realpython.com/pipenv-guide/">guide to Pipenv</a>.</p>
</li>
<li>
<p><code>tests.py</code>: This script houses your tests, if you have any. <a href="https://realpython.com/python-testing/">You should have some</a>.</p>
</li>
</ul>
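As a rough illustration of what the main script in this layout might contain, here is a minimal sketch of `helloworld.py`. The function names `build_greeting` and `main` are assumptions made for this example, not something the layout prescribes. Keeping the logic in functions rather than at module level means `tests.py` can import the script without running it.

```python
"""helloworld.py: a one-off script kept importable so tests.py can exercise it."""
import sys


def build_greeting(name="World"):
    """Return the greeting text; kept separate from printing so it is easy to test."""
    return f"Hello, {name}!"


def main(argv=None):
    """Print a greeting for the first command-line argument, if one was given."""
    # Fall back to the real command line only when no argument list is passed in.
    args = sys.argv[1:] if argv is None else argv
    name = args[0] if args else "World"
    print(build_greeting(name))
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

With this shape, `tests.py` could simply do `from helloworld import build_greeting` and assert on the returned string.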

But what if your application grows and you break it out into multiple pieces within the same package? Should you keep all the pieces in the top-level directory? Once your application is more complex, it’s time to organize things more cleanly.

### Command-Line Application Layouts: Installable Single Package

Let’s imagine that `helloworld.py` is still the main script to execute, but you’ve moved all helper functions to a new file called `helpers.py`.

We are going to package the helloworld Python files together but keep all the miscellaneous files, such as your README, .gitignore, and so on, at the top directory.

Let’s take a look at the updated structure:

    helloworld/
    │
    ├── helloworld/
    │   ├── __init__.py
    │   ├── helloworld.py
    │   └── helpers.py
    │
    ├── tests/
    │   ├── helloworld_tests.py
    │   └── helpers_tests.py
    │
    ├── .gitignore
    ├── LICENSE
    ├── README.md
    ├── requirements.txt
    └── setup.py

The main difference here is that your application code now lives in the `helloworld` subdirectory, which is named after your package, and that we’ve added a file called `__init__.py`. Let’s introduce the new files and directories:

<ul>
<li>
<p><code>helloworld/__init__.py</code>: This file can serve several purposes, but for our needs it tells the Python interpreter that this directory is a package directory. You can set up this <code>__init__.py</code> file so that classes and functions can be imported from the package as a whole, without callers having to know the internal module structure and import from <code>helloworld.helloworld</code> or <code>helloworld.helpers</code>.</p>
<blockquote>
<p><strong>Note:</strong> For a deeper discussion on internal packages and <code>__init__.py</code>, <a href="https://realpython.com/python-modules-packages/">our Python modules and packages overview</a> has you covered.</p>
</blockquote>
</li>
<li>
<p><code>helloworld/helpers.py</code>: As mentioned above, we’ve moved much of <code>helloworld.py</code>’s business logic to this file. If <code>__init__.py</code> re-exports these helpers, outside modules can access them simply by importing from the <code>helloworld</code> package.</p>
</li>
<li>
<p><code>tests/</code>: We’ve moved our tests into their own directory, a pattern you’ll continue to see as our program structures gain complexity. We have also split our tests into separate modules, mirroring our package’s structure.</p>
</li>
</ul>
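To make the role of `__init__.py` more concrete, the sketch below recreates a tiny version of this package and imports from it. It writes the files into a temporary directory only so that the example runs as a single block; in a real project they would simply live on disk as shown in the tree. The helper name `format_greeting` is a hypothetical stand-in for whatever `helpers.py` actually contains.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_root:
    package_dir = Path(project_root) / "helloworld"
    package_dir.mkdir()

    # helloworld/helpers.py: holds the business logic (hypothetical helper).
    (package_dir / "helpers.py").write_text(
        "def format_greeting(name):\n"
        "    return f'Hello, {name}!'\n"
    )

    # helloworld/__init__.py: re-exports the helper at package level, so
    # callers do not need to know that it lives in helpers.py.
    (package_dir / "__init__.py").write_text(
        "from .helpers import format_greeting\n"
    )

    # Make the temporary project root importable, then import the package.
    sys.path.insert(0, project_root)
    importlib.invalidate_caches()
    import helloworld

    sys.path.remove(project_root)

# The helper is reachable from the package itself, not only from the submodule.
print(helloworld.format_greeting("World"))
```

Without the re-export line in `__init__.py`, the same helper would still be importable, but only through the longer path `helloworld.helpers.format_greeting`.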

This layout is a stripped-down version of Kenneth Reitz’s samplemod application structure. It is another great starting point for your CLI applications, especially for more expansive projects.

https://github.com/navdeep-G/samplemod/tree/master/sample

### Command-Line Application Layouts: Application with Internal Packages

In larger applications, you may have one or more internal packages that are either tied together with a main runner script or that provide specific functionality to a larger library you are packaging. We will extend the conventions laid out above to accommodate this:



    helloworld/
    │
    ├── bin/
    │
    ├── docs/
    │   ├── hello.md
    │   └── world.md
    │
    ├── helloworld/
    │   ├── __init__.py
    │   ├── runner.py
    │   ├── hello/
    │   │   ├── __init__.py
    │   │   ├── hello.py
    │   │   └── helpers.py
    │   │
    │   └── world/
    │       ├── __init__.py
    │       ├── helpers.py
    │       └── world.py
    │
    ├── data/
    │   ├── input.csv
    │   └── output.xlsx
    │
    ├── tests/
    │   ├── hello/
    │   │   ├── helpers_tests.py
    │   │   └── hello_tests.py
    │   │
    │   └── world/
    │       ├── helpers_tests.py
    │       └── world_tests.py
    │
    ├── .gitignore
    ├── LICENSE
    └── README.md

There’s a bit more to digest here, but as long as you remember that it follows from the previous layout, you will have an easier time following along. I’ll go through the additions and modifications in order, their uses, and the reasons you might want them.

<ul>
<li>
<p><code>bin/</code>: This directory holds any executable files. I’ve adapted this from <a href="http://as.ynchrono.us/2007/12/filesystem-structure-of-python-project_21.html">Jean-Paul Calderone’s classic structure post</a>, and his prescriptions for the use of a <code>bin/</code> directory are still important. The most important point to remember is that your executable shouldn’t have a lot of code, just an import and a call to a <a href="https://realpython.com/python-main-function/">main function</a> in your runner script. If you are using pure Python or don’t have any executable files, you can leave out this directory. </p>
</li>
<li>
<p><code>docs/</code>: With a more advanced application, you’ll want to maintain good documentation of all its parts. I like to put any documentation for internal modules here, which is why you see separate documents for the <code>hello</code> and <code>world</code> packages. If you use <a href="https://realpython.com/documenting-python-code/#documenting-your-python-code-base-using-docstrings">docstrings</a> in your internal modules (and you should!), your whole-module documentation should at the very least give a holistic view of the purpose and function of the module.</p>
</li>
<li>
<p><code>helloworld/</code>: This is similar to <code>helloworld/</code> in the previous structure, but now there are subdirectories. As you add more complexity, you’ll want to use a “divide and conquer” tactic and split out parts of your application logic into more manageable chunks. Remember that the directory name refers to the overall package name, and so the subdirectory names (<code>hello/</code> and <code>world/</code>) should reflect their package names.</p>
</li>
<li>
<p><code>data/</code>: Having this directory is helpful for testing. It’s a central location for any files that your application will ingest or produce. Depending on how you deploy your application, you can keep “production-level” inputs and outputs pointed to this directory, or only use it for internal testing. </p>
</li>
<li>
<p><code>tests/</code>: Here, you can put all your tests—unit tests, execution tests, integration tests, and so on. Feel free to structure this directory in the most convenient way for your testing strategies, import strategies, and more. For a refresher on testing command-line applications with Python, check out my article <a href="https://realpython.com/python-cli-testing/">4 Techniques for Testing Python Command-Line (CLI) Apps</a>.</p>
</li>
</ul>
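The advice about keeping the executable in `bin/` thin can be sketched as follows. This is one possible shape for `helloworld/runner.py`, not a prescribed one: the names `say_hello`, `say_world`, `build_greeting`, and the `--shout` flag are all assumptions made for illustration, and the two stand-in functions are inlined here only so the block runs as a single file. In the real layout they would be imported from the `hello` and `world` packages.

```python
"""Sketch of helloworld/runner.py, the script that ties the internal packages together."""
import argparse


# Stand-ins for functions that would live in helloworld/hello/hello.py and
# helloworld/world/world.py (hypothetical names, inlined for this sketch).
def say_hello():
    return "Hello"


def say_world():
    return "World"


def build_greeting(shout=False):
    """Combine the pieces from both internal packages into one greeting."""
    greeting = f"{say_hello()}, {say_world()}!"
    return greeting.upper() if shout else greeting


def main(argv=None):
    """Parse command-line options and print the greeting; return an exit code."""
    parser = argparse.ArgumentParser(prog="helloworld")
    parser.add_argument("--shout", action="store_true", help="upper-case the greeting")
    args = parser.parse_args(argv)  # argv=None makes argparse read sys.argv[1:]
    print(build_greeting(shout=args.shout))
    return 0


# The executable in bin/ would then contain little more than:
#
#     #!/usr/bin/env python3
#     import sys
#     from helloworld.runner import main
#     sys.exit(main())
```

Because `main` accepts an explicit argument list, tests under `tests/` can call `main(["--shout"])` directly instead of spawning a subprocess.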