# Agenda: Week 5 (Modules and packages)

1. Recap and Q&A
2. Intro to modules
3. What do modules contain?
4. The different forms of `import`
5. Developing a module
6. Python's standard library
7. Modules vs. packages
8. PyPI and third-party modules
9. Installing things with `pip`
10. What's next?

# Local and global variables

Remember that outside of a function definition, all variables are *global*. 

Inside of a function:

- If we *set* a variable, then that variable is considered *local*.
- If we ask for a variable value, then Python:
    - First looks inside of the function for a local variable of that name
    - If it doesn't find a local variable of that name, then it looks for a global of that name
    
LEGB -- local, enclosing, global, and builtin -- is the way that Python searches for variables.
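
Here's a minimal sketch of that search order in action. The names `label`, `outer`, `inner`, and `show_global` are made up for this illustration.

```python
label = 'global'            # G: defined at the top level

def outer():
    label = 'enclosing'     # local to outer, but "enclosing" from inner's point of view

    def inner():
        # inner has no local variable named label, so Python moves on
        # to the enclosing function and finds it there
        return label

    return inner()

def show_global():
    # no local variable and no enclosing function, so Python finds the global
    return label

print(outer())          # enclosing
print(show_global())    # global
print(len(label))       # 6 -- len isn't local, enclosing, or global; it's a builtin
```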

Inside of a function, we might very well encounter a situation where a local variable has the same name as a global variable. Based on what we've seen so far, there is no way for us to assign to the global variable of that name, because any assignment creates (or updates) the local one. Once there's a local variable "shadowing" the global, we're out of luck.

There are two ways to get around this:

1. Use the `global` declaration in the function. This gets rid of the local variable; all references to that name are now global.
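2. Import the `__main__` module. This isn't a file in the standard library; it's the name that Python gives to the top-level program (or interactive session) that is currently running. Importing it gives you access to all of the global variables via that namespace/module.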

In [1]:
# example 1, using nothing -- default, problematic situation

x = 100

def myfunc():
    x = 200
    print(f'In myfunc, the value of x is {x}')
    
print(f'Before, x = {x}')    
myfunc()
print(f'After, x = {x}')

Before, x = 100
In myfunc, the value of x is 200
After, x = 100


In [2]:
# example 2, using global -- this means that assigning to x inside of the function does *NOT* create
# a local variable. All assignments to x then go to the global

# please *NEVER* use "global" in your programs unless you are absolutely, positively desperate

x = 100

def myfunc():
    global x    # this tells Python not to create a local variable for the function
    x = 200
    print(f'In myfunc, the value of x is {x}')
    
print(f'Before, x = {x}')    
myfunc()
print(f'After, x = {x}')

Before, x = 100
In myfunc, the value of x is 200
After, x = 200


In [4]:
# example 3, using __main__
# if you need to assign to a global variable, this is my preferred way of doing it

import __main__
x = 100

def myfunc():
    x = 300            # assigns to the local variable
    __main__.x = 200   # assigns to the global variable via the __main__ namespace
    print(f'In myfunc, the value of x is {x}')   # this retrieves the local value
    
print(f'Before, x = {x}')    
myfunc()
print(f'After, x = {x}')

Before, x = 100
In myfunc, the value of x is 300
After, x = 200


# Modules -- what are they good for?

We've talked about the "DRY" (don't repeat yourself) rule in programming.

It cleans up our code, making it easier to (a) write, (b) maintain, and (c) think about.

1. If you have the same code several lines in a row, then you should replace that code with a loop.
2. If you have the same code in several different parts of your program, you should write a function and then invoke the function in all of those places.
3. If you have the same code in several different programs, you can write the functionality once, and refer to it whenever you need it. This is known in the programming world as a "library," and in Python, libraries are implemented using "modules."

But modules do more than that in Python: They also provide us with *namespaces*, ensuring that we don't have "namespace collisions" -- when two or more parts of a program use the same variable name, and then end up assigning their own values to the other part's variables.

The way that modules handle this is by treating their variables as "attributes," names that come after `.`. If I collaborate with someone else on a Python program, then if I write my things in a module, and they write their things in a module, we cannot have a namespace collision, because the names will be in separate modules!
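
A small sketch of what that separation looks like in practice. The global variable `pi` here is made up for the illustration; `math.pi` is the real attribute on the `math` module.

```python
import math

pi = 3            # my own global variable, a (bad) approximation

print(pi)         # 3
print(math.pi)    # 3.141592653589793

# the two names live in different namespaces, so assigning to one
# has no effect on the other
```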

In [5]:
# we've already used a module -- let's see how we can use it!

# we use "import" to load a module
# import:
# (a) is not a function! Don't use parentheses!
# (b) the argument that we give it is not a string! 
# (c) the argument is not a filename! 

# the argument to import is actually the name of the module variable we want to define

# import 
# (a) creates a module object and
# (b) assigns that module object to a variable

import random

In [6]:
# what is random?
type(random)

module

# What does a module contain?

Python code, most typically:

- Function definitions
- Data definitions
- Class definitions, for new types of data we want to work with

This means that when I say `import random`, I have access to all of the functions that the author wrote for the `random` module. Over time, if they add new functionality there, I'll benefit; all I have to do is say `import random`.

In [8]:
# if I want to use functionality from random, it'll all be as attributes under "random."

# get a random integer from 0-100

# we ask Python to go into the "random" namespace
# execute the function "randint" in that namespace, passing it (0, 100) as arguments.
random.randint(0, 100)

19

In [9]:
# in Jupyter, we can see the definition of a function by putting ?? after its name
random.randint??

# Exercise: Guessing game

1. Choose a random integer from 0-100 (using `random.randint`).
2. Ask the user to enter a guess.
3. Print whether the guess is right, too high, or too low.
    - If the user guessed correctly, then exit the program
    - Otherwise, have them try again.
    
Example:

    Guess a number: 50
    Too low!
    Guess a number: 90
    Too high!
    Guess a number: 80
    Too low!
    Guess a number: 86
    You got it!
    

In [11]:
import random    # I must import the module

number = random.randint(0, 100)    # choose a random number, and put it in "number"

while True:
    guess = input('Guess: ').strip()
    
    if not guess.isdigit():
        print('Not numeric! Try again!')
        continue
        
    n = int(guess)
    
    if n == number:
        print('You got it!')
        break
        
    elif n < number:
        print('Too low!')
        
    else:
        print('Too high!')

Guess: 50
Too high!
Guess: asdfafa
Not numeric! Try again!
Guess: 25
Too high!
Guess: 12
Too high!
Guess: 6
Too high!
Guess: 3
You got it!


# What else does `random` contain?

`random` is a module, which makes it a container for other names (and thus definitions). How can we find out what else it contains?

1. We use the builtin `dir` function on the `random` module, and get a list of names available in `random`.
2. We can get help on the module by invoking `help(random)`. This works in Jupyter, and in any other interactive Python session.
3. Go to the site for Python documentation, assuming that we're using something from the standard library.
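
The list we get back from `dir` includes many names starting with `_`, which are meant for the module's internal use. One way to trim the list down, sketched here with a made-up variable called `public_names`:

```python
import random

public_names = []

for one_name in dir(random):
    if not one_name.startswith('_'):    # skip private and "dunder" names
        public_names.append(one_name)

print(public_names)
```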

In [12]:
# use dir

dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_ONE',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_floor',
 '_index',
 '_inst',
 '_isfinite',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randbytes',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

In [13]:
# use "help(random)"

help(random)

Help on module random:

NAME
    random - Random variable generators.

DESCRIPTION
        bytes
        -----
               uniform bytes (values between 0 and 255)
    
        integers
        --------
               uniform within range
    
        sequences
        ---------
               pick random element
               pick random sample
               pick weighted random sample
               generate random permutation
    
        distributions on the real line:
        ------------------------------
               uniform
               triangular
               normal (Gaussian)
               lognormal
               negative exponential
               gamma
               beta
               pareto
               Weibull
    
        distributions on the circle (angles 0 to 2pi)
        ---------------------------------------------
               circular uniform
               von Mises
    
    General notes on the underlying Mersenne Twister core generator:
    


# Different versions of `import`

1. `import MODNAME` -- this imports `MODNAME` into Python, and defines that variable.
2. `from MODNAME import NAME` -- this imports `MODNAME` into memory, but doesn't define it as a variable! The only variable to be defined is `NAME`. This way, we can access a function directly, without going through the module's namespace.
3. `import MODNAME as ALIAS` -- this imports `MODNAME` into Python, and defines a variable named `ALIAS` which refers to it. By convention, nearly everyone writes `import numpy as np` and `import pandas as pd`.
4. `from MODNAME import NAME as ALIAS` -- this imports `MODNAME` into Python, but only defines a variable named `ALIAS`, which refers back to `NAME` in `MODNAME`.
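
Here are all four forms side by side, using the `random` module. The aliases `rnd` and `pick` are arbitrary names chosen for this sketch.

```python
import random                        # 1: defines the variable "random"
from random import randint           # 2: defines only "randint"
import random as rnd                 # 3: defines "rnd", referring to the same module
from random import choice as pick    # 4: defines "pick", referring to random.choice

print(rnd is random)                 # True -- two names for one module object
print(randint is random.randint)     # True -- same function, reachable two ways
print(pick is random.choice)         # True

print(pick(['a', 'b', 'c']))         # one of the three strings, chosen at random
```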

In [15]:
# In many programming languages, we tell the language what file we want to import
# Not so in Python! We give it a variable name, and it figures out the filename to access

# If we say "import random", it looks for random.py.

# where does it look? It goes through the directories named in sys.path, a list of strings,
# one at a time, until it finds the module.

import sys
sys.path

['/Users/reuven/Courses/Current/oreilly-2023-05May-python',
 '/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python311.zip',
 '/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11',
 '/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib-dynload',
 '',
 '/usr/local/lib/python3.11/site-packages',
 '/usr/local/Cellar/pybind11/2.10.4/libexec/lib/python3.11/site-packages',
 '/usr/local/opt/python-tk@3.11/libexec']

In [16]:
# if I've used "import random", then I still need to say "random.randint" to call the randint function

random.randint(0, 100)

32

In [17]:
# what if I just want to say "randint"?
randint(0, 100)

NameError: name 'randint' is not defined

In [18]:
# I can say this:

from random import randint   # now I can call "randint" by itself, not via "random.randint"



In [19]:
randint(0, 100)

51

# Good and bad with `from .. import`

Good:

1. If you have a long module name, then it's annoying to write it out all of the time. This makes it easier to write (and read) code.
2. Sometimes, you're only interested in one name in a module. This lets you just get that name, rather than the whole thing.

Bad:

1. By removing the module name, you can introduce ambiguity into your code. Where did that name come from? If you use `from .. import`, you won't necessarily know.
2. Using `from .. import` still loads the module into memory. You are not saving any memory whatsoever when you use it instead of `import`.
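
We can check that last point ourselves. Python keeps track of every loaded module in `sys.modules`, a dict whose keys are module names. This sketch uses the `fractions` module from the standard library as an arbitrary example.

```python
import sys
from fractions import Fraction

print(Fraction(1, 2) + Fraction(1, 3))    # 5/6

# the whole module was loaded into memory...
print('fractions' in sys.modules)         # True

# ...even though we never got a variable referring to it
try:
    fractions
except NameError:
    print('There is no variable named fractions')
```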

# Another (bad!) way to use `import`

I can say

    from MODNAME import *
    
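This defines every public name in the module (the names listed in the module's `__all__`, or if there is no `__all__`, every name that doesn't start with `_`) as a variable in our current namespace. This is very bad, and very dangerous: Any variable we already had with one of those names is silently replaced, and it's no longer clear where a given name came from.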
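
To see the kind of surprise this can cause without messing up our own session, this sketch runs the star import inside a throwaway namespace (a plain dict called `scratch`, a name made up for this example) and compares the builtin `pow` with the `pow` that arrives from `math`.

```python
import math

# run "from math import *" in a scratch namespace, so that our own
# variables are left untouched
scratch = {}
exec('from math import *', scratch)

print(pow(2, 3))               # 8   -- the builtin pow returns an int here
print(scratch['pow'](2, 3))    # 8.0 -- after the star import, "pow" means math.pow

print(scratch['pow'] is math.pow)    # True
```

In the scratch namespace, the name `pow` no longer means the builtin function, and nothing warned us about the change.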

# Exercise: Validate strings

1. We're going to check that a string only contains valid characters. By "valid," I mean lowercase English letters.
2. Use the `string` module, and the `string.ascii_lowercase` string in your validation.
3. Ask the user to enter a string.
4. If the string only contains characters in our validation string (`string.ascii_lowercase`), then say "OK" to the user. Otherwise, print "NOT OK."
5. What happens if you use another string in the `string` module for your validation?

In [20]:
import string    # this seems a bit confusing... don't we already have strings?  Yes, but they are "str", and this is "string"

dir(string)

['Formatter',
 'Template',
 '_ChainMap',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_re',
 '_sentinel_dict',
 '_string',
 'ascii_letters',
 'ascii_lowercase',
 'ascii_uppercase',
 'capwords',
 'digits',
 'hexdigits',
 'octdigits',
 'printable',
 'punctuation',
 'whitespace']

In [23]:
import string
is_valid = True

s = input('Enter a string: ').strip()

for one_character in s:    # go through each character
    if one_character not in string.ascii_lowercase: 
        is_valid = False
        break
        
if is_valid:
    print(f'Yes, {s} is valid!')
else:
    print(f'No, {s} is invalid')

Enter a string: goodbye forever
No, goodbye forever is invalid


In [24]:
# now, with that code, I can swap out string.ascii_lowercase, and swap in any other variable
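
One way to make that swap easy is to wrap the loop in a function that takes the set of allowed characters as an argument. The function name `only_contains` and its parameter names are made up for this sketch.

```python
import string

def only_contains(s, allowed):
    """Return True if every character in s appears in allowed."""
    for one_character in s:
        if one_character not in allowed:
            return False
    return True

print(only_contains('hello', string.ascii_lowercase))    # True
print(only_contains('Hello', string.ascii_lowercase))    # False
print(only_contains('Hello', string.ascii_letters))      # True
print(only_contains('2023', string.digits))              # True
print(only_contains('ff0a', string.hexdigits))           # True
```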

# Next up

1. What's in a module?
2. How can we write a module?
3. How does a module really get loaded?

# What's in a module?

Python code.

Modules are simply files containing Python code in them. Usually that code will be variable definitions, function definitions, and class definitions. 

I've created a file, `mymod.py`, in the same directory as Jupyter. When I say `import mymod`, Python will look in the current directory for `mymod.py`. It'll find that module, and load it.

However, the module file is empty. What will happen?

In [25]:
import mymod

In [26]:
# what does mymod contain? What names are defined in it?

# "dunders" == "double underscore" are special names that Python defines and uses for its own
# internal housekeeping, or special names that we can define, and that Python will use in special ways.

dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

In [27]:
mymod.__file__

'/Users/reuven/Courses/Current/oreilly-2023-05May-python/mymod.py'

In [28]:
mymod.__name__  

'mymod'

In [31]:
import mymod  # let's load it again!  ... except that this isn't enough

In [30]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

# You can only `import` once

The first time that we use `import` on a module, Python imports it.

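Subsequent times, Python doesn't execute the module's file again. Instead, it finds the already-loaded module object in an in-memory cache (`sys.modules`) and assigns that object to the variable.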
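
We can peek at that cache. It lives in `sys.modules`, a dict whose keys are module names and whose values are module objects. A quick sketch, using `random` as the example:

```python
import sys
import random

print('random' in sys.modules)            # True
print(sys.modules['random'] is random)    # True -- the cache holds the very same object

import random    # found in the cache, so random.py is not executed a second time
```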

This is normally not an issue, but it *is* an issue in Jupyter, where the same Python session keeps running: If we edit the module's file, saying `import` again won't pick up our changes.

To get this to work, we'll use the `importlib` module which supplies a bunch of `import`-related utilities, including the `reload` function.

In [32]:
from importlib import reload   # this function will let us reload modules
reload(mymod)

<module 'mymod' from '/Users/reuven/Courses/Current/oreilly-2023-05May-python/mymod.py'>

In [33]:
# what names do we see now?
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'x',
 'y']

In [34]:
# I can access them!

mymod.x   # we must name the module, ., then variable

100

In [35]:
mymod.y

[10, 20, 30]

In [36]:
mymod.hello('world')

'Hello, world, from mymod!'
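
The contents of `mymod.py` aren't shown here. Judging from the output above, the file might look something like this. Treat it as a reconstruction: the parameter name `name` and the layout are guesses.

```python
# mymod.py -- a guess at the file's contents, based on the output above

x = 100
y = [10, 20, 30]

def hello(name):
    return f'Hello, {name}, from mymod!'
```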

# Exercise: `count_vowels`

1. Create a module, `count_vowels.py`, in which you have a function, `count_vowels` that takes a string.
2. The function returns an integer, the number of vowels (a, e, i, o, u) in the string.
3. `import` the module and call the function to check it.

In [37]:
import count_vowels   # this defines count_vowels as a module, via which I can get to its function

In [38]:
count_vowels.count_vowels('hello out there')

6

# Summarizing what I did

1. I created a file (using Open -> Text file from the main Jupyter page), called `count_vowels.py`.
2. In that file, I defined a function, called `count_vowels`.
3. In my notebook, I said `import count_vowels`, which looked for `count_vowels.py`, found it in the current directory, and loaded the module into memory.
4. At that point, all of the functions and variables defined in `count_vowels.py` are available as `count_vowels.NAME`. That includes the function `count_vowels`, which ends up being a weird-looking `count_vowels.count_vowels` function.
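
Here's one possible version of `count_vowels.py`, not necessarily the one used in the session. Calling `lower`, so that capital vowels are counted as well, is an assumption; the exercise doesn't say either way.

```python
# count_vowels.py -- one possible solution

def count_vowels(text):
    """Return the number of vowels (a, e, i, o, u) in text."""
    total = 0

    for one_character in text.lower():    # assumption: capital vowels count, too
        if one_character in 'aeiou':
            total += 1

    return total
```

With this file in place, `count_vowels.count_vowels('hello out there')` returns 6, matching the output above.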

# Where can module files be located?

We've seen that if a module file is (a) in the same directory as a program or (b) somewhere in `sys.path`, then we're fine with `import`.

But where can/should I put module files? Can I load them explicitly with a pathname? And how can I change `sys.path`?

The `import` statement doesn't accept an explicit pathname. Which means that you need to change `sys.path` somehow:

- `sys.path` is a regular list, so you can use `list.append` on it from inside of your program, but I would not recommend that.
- You can set the `PYTHONPATH` environment variable to tell Python where to look for modules. Whatever is in that variable is added to `sys.path`.
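
To watch `sys.path` at work, here's an experiment (not a recommendation!) that creates a module in a temporary directory, adds that directory to `sys.path`, and then imports the module. The module name `scratch_mod` and its `greeting` variable are made up for this sketch.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# create a scratch directory containing a tiny module file
scratch_dir = tempfile.mkdtemp()
Path(scratch_dir, 'scratch_mod.py').write_text('greeting = "hi from scratch_mod"\n')

# until scratch_dir is in sys.path, "import scratch_mod" would fail
sys.path.append(scratch_dir)
importlib.invalidate_caches()    # ask Python to take a fresh look at the directories

import scratch_mod

print(scratch_mod.greeting)      # hi from scratch_mod
print(scratch_mod.__file__)      # a path inside of scratch_dir
```

With `PYTHONPATH`, you would get the same effect without touching the program: set the variable to the directory's name in your shell before starting Python.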

# What's going on when I `import` a module?

Once Python has found a module file (i.e., a file ending with `.py`), how does that get turned into definitions on our module object?

If our module contains assignment (with `=`) and function definition (with `def`), that means the assignment lines need to execute, and the `def` line needs to execute.

That means ... when we `import` a module, the module's file is executed?

The answer is **YES**! When you `import`, you're running a module, from start to finish.

In [39]:
reload(mymod)

Hello from mymod!
Goodbye from mymod!


<module 'mymod' from '/Users/reuven/Courses/Current/oreilly-2023-05May-python/mymod.py'>

In [40]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'x',
 'y']

# Global variables -> attributes

Any global variable defined in our module is turned into an attribute on the module object.

- Global variable `x` becomes `mymod.x`
- Global variable `y` becomes `mymod.y`
- Function `hello` becomes `mymod.hello`.

This is a good thing, putting into action the namespaces that we discussed earlier.

Maybe... it works the other way, too?  Maybe the "dunder" names that are defined here, that we see with `dir`, are also global variables in the module object?

Can we get `__name__`, the name of the module, inside of our module object?

YES WE CAN!

In [41]:
reload(mymod)

Hello from mymod!
Goodbye from mymod!


<module 'mymod' from '/Users/reuven/Courses/Current/oreilly-2023-05May-python/mymod.py'>

# `__name__` in modules

`__name__` can have one of two values:

- Normally, it is set to a string, the name of the module. In `mymod.py`, the value will be the string `mymod`.
- If our module is executed as a program (not imported as a module), then `__name__` contains a special string value, `'__main__'`.

The first file to execute in any Python program is always going to have `__name__` equal to `'__main__'`.

Because modules are executed when they're imported, and because we can find out whether a module was imported or executed by examining `__name__`, we very very very often see the following at the bottom of a Python module:

```python
if __name__ == '__main__':
    # do things here for a standalone program
```

In other words, we can have our module do something special if, and only if, the module is run as a program.
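
Here's a sketch that shows both cases with the same file. It writes a small module (called `guard_demo`, a name made up for this example) into a temporary directory, runs it once as a program in a separate Python process, and then imports it, capturing whatever each one prints.

```python
import contextlib
import importlib
import io
import subprocess
import sys
import tempfile
from pathlib import Path

source = '''
print('Hello from guard_demo!')

if __name__ == '__main__':
    print('Running as a program')
'''

scratch_dir = tempfile.mkdtemp()
module_path = Path(scratch_dir, 'guard_demo.py')
module_path.write_text(source)

# 1. run the file as a program: __name__ is '__main__', so both lines print
result = subprocess.run([sys.executable, str(module_path)],
                        capture_output=True, text=True)
print(result.stdout)

# 2. import the same file: __name__ is 'guard_demo', so only the first line prints
sys.path.append(scratch_dir)
importlib.invalidate_caches()

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import guard_demo

print(captured.getvalue())
print(guard_demo.__name__)    # guard_demo
```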

In [42]:
reload(mymod)

Hello from mymod!


<module 'mymod' from '/Users/reuven/Courses/Current/oreilly-2023-05May-python/mymod.py'>

# Exercise: Menu

1. Create a module, `menu.py`, which will contain functionality that shows the user a list of options and lets them choose one option.
2. In that module, define a function, `get_user_choice`. This function should take a list of strings. 
    - It shows the strings to the user
    - It asks the user, repeatedly, to enter one of them.
    - If the user enters a legit string, then it returns that string.
    - If the user enters a string that's not an OK choice, it forces the user to try again.
    
I should be able to say:

```python
import menu
s = menu.get_user_choice(['a', 'b', 'c'])
```

In [43]:
import menu
s = menu.get_user_choice(['a', 'b', 'c'])


Enter one of ['a', 'b', 'c']: b


In [44]:
s

'b'

In [45]:
s = menu.get_user_choice(['a', 'b', 'c'])


Enter one of ['a', 'b', 'c']: x
Bad choice; try again
Enter one of ['a', 'b', 'c']: y
Bad choice; try again
Enter one of ['a', 'b', 'c']: z
Bad choice; try again
Enter one of ['a', 'b', 'c']: A
Bad choice; try again
Enter one of ['a', 'b', 'c']: a
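
Here's one possible version of `menu.py` that produces the interaction shown above. It's a sketch, not necessarily the file used in the session; calling `strip` on the user's input is an assumption.

```python
# menu.py -- one possible solution

def get_user_choice(options):
    """Ask the user, repeatedly, to enter one of the strings in options."""
    while True:
        choice = input(f'Enter one of {options}: ').strip()

        if choice in options:
            return choice

        print('Bad choice; try again')
```

Because the comparison is done with `in` on the list of strings, the match is case sensitive, which is why `A` was rejected in the session above.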


# Next up

1. Python standard library
2. Modules and packages
3. PyPI, etc.
