# Agenda, Week 5: Modules and packages

- Review of the challenge
- Q&A
- Modules
    - What are modules?
    - What do modules contain?
    - `import` and modules
- Writing a module
    - Creating a module file
    - Loading that module
- Modules vs. packages
- Python standard library
- PyPI (Python Package Index)
    - What is it?
    - Downloading things with `pip`
    - Deciding what modules you want to use (and don't want to use)
- `pip` and installing packages from PyPI
- Final questions
- Where do you go from here?
    - What can you do with the information from this course?
    - What can/should you do to improve your Python further?

# Review of the challenge

In [13]:
def count_ips(filename):
    output = {}

    for one_line in open(filename):
        ip_address = one_line.split()[0]

        # if we have seen this IP address before,
        # just increment its value by 1
        if ip_address in output:
            output[ip_address] += 1

        # if this is the first time we're seeing ip_address,
        # add it to the dict with a value of 1
        else:
            output[ip_address] = 1


    return output

ip_address_counts = count_ips('mini-access-log.txt')

for key, value in ip_address_counts.items():
    print(f'{key}:{value}')

67.218.116.165:2
66.249.71.65:3
65.55.106.183:2
66.249.65.12:32
65.55.106.131:2
65.55.106.186:2
74.52.245.146:2
66.249.65.43:3
65.55.207.25:2
65.55.207.94:2
65.55.207.71:1
98.242.170.241:1
66.249.65.38:100
65.55.207.126:2
82.34.9.20:2
65.55.106.155:2
65.55.207.77:2
208.80.193.28:1
89.248.172.58:22
67.195.112.35:16
65.55.207.50:3
65.55.215.75:2
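For comparison, here is a minimal sketch of the same counting logic written with `dict.get`, which collapses the if/else into a single line. It assumes the log lines are passed in as any iterable of strings. The function name `count_ips_in_lines` and the sample lines (which use documentation-only IP addresses) are hypothetical, so the sketch runs without `mini-access-log.txt`.

```python
def count_ips_in_lines(lines):
    """Count how often each IP address (the first field of each line) appears.

    Hypothetical helper: takes any iterable of strings, such as an open file.
    """
    output = {}
    for one_line in lines:
        fields = one_line.split()
        if not fields:            # skip blank lines, which would raise IndexError on [0]
            continue
        ip_address = fields[0]
        # dict.get returns 0 for an IP address we haven't seen yet
        output[ip_address] = output.get(ip_address, 0) + 1
    return output


# Made-up sample lines in the same "IP address first" format
sample_lines = [
    '192.0.2.1 - - "GET / HTTP/1.1" 200',
    '198.51.100.7 - - "GET /about HTTP/1.1" 404',
    '192.0.2.1 - - "GET /robots.txt HTTP/1.1" 200',
]

print(count_ips_in_lines(sample_lines))   # {'192.0.2.1': 2, '198.51.100.7': 1}
```

With the real log, you would call `count_ips_in_lines(f)` inside a `with open('mini-access-log.txt') as f:` block, which also makes sure the file gets closed.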


# How to sort anything: my talk from EuroPython 2021

https://www.youtube.com/watch?v=Z3c2LvEJeu0

In [4]:
s = 'abcde fg hijk lmnop qr'

# If I run str.split() on this string, I'll get back a new list of strings

s.split()  # any whitespace (space, \n, \t, \r, \v), any combination, any length

['abcde', 'fg', 'hijk', 'lmnop', 'qr']

In [5]:
# s hasn't changed at all!
s

'abcde fg hijk lmnop qr'

In [7]:
s.split(' ')   # one space character at a time is our field separator

['abcde', 'fg', 'hijk', 'lmnop', 'qr']

In [8]:
s = 'abcde   fg  hijk  lmnop    qr'

In [9]:
s.split()  

['abcde', 'fg', 'hijk', 'lmnop', 'qr']

In [10]:
s.split(' ')

['abcde', '', '', 'fg', '', 'hijk', '', 'lmnop', '', '', '', 'qr']

In [11]:
s.split()[0]   # get a new list based on s, then return the item at index 0 from that list

'abcde'
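A few quick checks summarize the difference, using a hypothetical string with extra spaces at both ends: `split()` with no argument also ignores whitespace at the start and end, while `split(' ')` turns every extra space into an empty string.

```python
s = '  abcde   fg  '

# No argument: any run of whitespace is one separator, and the ends are ignored
print(s.split())         # ['abcde', 'fg']

# Explicit ' ': every single space is a separator, so empty strings appear
print(s.split(' '))      # ['', '', 'abcde', '', '', 'fg', '', '']

# Tabs and newlines count as whitespace, too, when split() gets no argument
print('a\tb\nc'.split())   # ['a', 'b', 'c']

# Grab the first field, as count_ips does with each log line
print(s.split()[0])      # abcde
```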

In [12]:
one_line   # one_line was a local variable inside count_ips, so it doesn't exist out here

NameError: name 'one_line' is not defined

In [15]:
counts = {}

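for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0]   # grab the IP address, the item at index 0 in each line's list
    print(ip_address)
    counts[ip_address] = counts.get(ip_address, 0) + 1   # "+= 1" on a new key would raise KeyError in a plain dict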

67.218.116.165
66.249.71.65
65.55.106.183
65.55.106.183
66.249.71.65
66.249.71.65
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.106.131
65.55.106.131
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.106.186
65.55.106.186
66.249.65.12
66.249.65.12
66.249.65.12
74.52.245.146
74.52.245.146
66.249.65.43
66.249.65.43
66.249.65.43
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.207.25
65.55.207.25
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.207.94
65.55.207.94
66.249.65.12
65.55.207.71
66.249.65.12
66.249.65.12
66.249.65.12
98.242.170.241
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38


# DRY -- don't repeat yourself

1. If you have the same line repeated multiple times, then you should use a loop.
2. If you have the same code several times in the same program, then you should use a function.
3. If you have the same code several times across *multiple* programs, then you should use a *library*. In Python, our libraries are called "modules."

# Modules in Python do two things

1. They are our libraries, allowing us to reuse code and thus write more maintainable software.
2. They are our namespaces.  A "namespace" keeps two parts of a program that happen to use the same variable name from colliding.  When two such names do clash, it's called a "namespace collision."

Let's say that I write part of a program, and call my variable `x`.  Let's say that you are collaborating with me, and by a strange freaky accident, you also call your variable `x`.  If we combine our two programs together, what will happen?

In Python, there likely won't be a problem, because each module is its own namespace: each file (that is, each module) we work with has its own, separate set of global variables.

My module's `x` is distinct from your module's `x`.  And thus we don't have to worry about collisions.
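To see this concretely without creating any files, the sketch below builds two module objects in memory with `types.ModuleType` and runs a line of code inside each one's namespace. The module names `mod_a` and `mod_b` are hypothetical stand-ins for two separate `.py` files.

```python
import types

# Two in-memory modules, standing in for two separate .py files
mod_a = types.ModuleType('mod_a')
mod_b = types.ModuleType('mod_b')

# Run code "inside" each module: its global x lands in that module's namespace
exec('x = 10', mod_a.__dict__)
exec('x = 20', mod_b.__dict__)

print(mod_a.x)   # 10
print(mod_b.x)   # 20: no collision, because each module has its own namespace
```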

# To use a module, use the `import` statement

Notice a few things about `import`:

1. It's not a function. Don't use ().
2. The name that comes after `import` is the name of the module you want to load, and also the name of the variable that will refer to it.  It's not a string. It's not a filename (so no quotes, and no `.py`).
3. After running `import`, the module is loaded into memory, and you can use the named module.

In [16]:
import random

In [17]:
# what is the value of "random"?
type(random)

module

In [22]:
random.randint(0, 100)   # we'll call the randint function in the random module, passing (0, 100)

32

In [23]:
# what other functions (and data) are available to us via the module?

# option 1: use "dir" on the module object, and find what attributes it defines
dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_ONE',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_floor',
 '_index',
 '_inst',
 '_isfinite',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randbytes',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

In [24]:
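# option 2: in Jupyter, type the module name and a ., then press Tab to see completions
# (executing the cell instead of pressing Tab, as here, raises an AttributeError)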
random.g

AttributeError: module 'random' has no attribute 'g'
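Because `dir` returns a plain list of strings, you can filter out the names that start with `_`, which by convention are private, to see just the module's public functions and data. A minimal sketch:

```python
import random

# Names starting with "_" are private by convention, so skip them
public_names = [name for name in dir(random) if not name.startswith('_')]

print(public_names)
```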

# Exercise: Count lowercase letters

1. The `string` module contains an attribute named `ascii_lowercase`.
2. Ask the user, repeatedly, to enter a string.
    - If the user enters the empty string, then stop asking, and exit from the loop.
3. If the user enters a non-empty string, count how many of its characters are lowercase letters.  I want you to iterate over each character of the string, checking to see if it's in `string.ascii_lowercase`.
4. Print the number of lowercase letters in the user's string.

In [25]:
import string

while True:
    s = input('Enter a string: ').strip()
    
    if s == '':   # empty string? stop asking!
        break
        
    total = 0 
    for one_character in s:
        if one_character in string.ascii_lowercase:
            total += 1
            
    print(f'There are {total} lowercase letters in {s}.')
        

Enter a string: hello
There are 5 lowercase letters in hello.
Enter a string: Hello
There are 4 lowercase letters in Hello.
Enter a string: h e l l o !
There are 5 lowercase letters in h e l l o !.
Enter a string: HELLO
There are 0 lowercase letters in HELLO.
Enter a string: 
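The counting part of the solution can also be packaged as a function, which makes it easy to test without calling `input`. This is a sketch; the name `count_lowercase` is hypothetical.

```python
import string

def count_lowercase(s):
    """Return how many characters in s are ASCII lowercase letters."""
    total = 0
    for one_character in s:
        if one_character in string.ascii_lowercase:
            total += 1
    return total

print(count_lowercase('Hello'))         # 4
print(count_lowercase('h e l l o !'))   # 5
```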


In [27]:
# we can solve the earlier challenge in another way, too
# we can use the Counter class from the collections module to count IP addresses

import collections

def count_ips(filename):
    output = collections.Counter()   # create a new Counter object

    for one_line in open(filename):
        ip_address = one_line.split()[0]

        output[ip_address] += 1

    return output

ip_address_counts = count_ips('mini-access-log.txt')

for key, value in ip_address_counts.items():
    print(f'{key}:{value}')

67.218.116.165:2
66.249.71.65:3
65.55.106.183:2
66.249.65.12:32
65.55.106.131:2
65.55.106.186:2
74.52.245.146:2
66.249.65.43:3
65.55.207.25:2
65.55.207.94:2
65.55.207.71:1
98.242.170.241:1
66.249.65.38:100
65.55.207.126:2
82.34.9.20:2
65.55.106.155:2
65.55.207.77:2
208.80.193.28:1
89.248.172.58:22
67.195.112.35:16
65.55.207.50:3
65.55.215.75:2
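`Counter` can also count an entire iterable in a single call, so the loop itself can disappear. A minimal sketch, assuming the same "IP address is the first field" format; the name `count_ips_counter` and the sample lines are hypothetical.

```python
import collections

def count_ips_counter(lines):
    # Counter consumes the generator, counting each IP address it yields;
    # blank lines are skipped so that split()[0] can't fail
    return collections.Counter(line.split()[0] for line in lines if line.strip())

sample_lines = [
    '192.0.2.1 - - "GET / HTTP/1.1" 200',
    '198.51.100.7 - - "GET /about HTTP/1.1" 404',
    '192.0.2.1 - - "GET /robots.txt HTTP/1.1" 200',
]

counts = count_ips_counter(sample_lines)
print(counts.most_common())   # [('192.0.2.1', 2), ('198.51.100.7', 1)]
```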


In [30]:
# Counters are different in a number of ways from dicts
# one way is that they implement the most_common method:

ip_address_counts.most_common()   # sorted from most common to least common

[('66.249.65.38', 100),
 ('66.249.65.12', 32),
 ('89.248.172.58', 22),
 ('67.195.112.35', 16),
 ('66.249.71.65', 3),
 ('66.249.65.43', 3),
 ('65.55.207.50', 3),
 ('67.218.116.165', 2),
 ('65.55.106.183', 2),
 ('65.55.106.131', 2),
 ('65.55.106.186', 2),
 ('74.52.245.146', 2),
 ('65.55.207.25', 2),
 ('65.55.207.94', 2),
 ('65.55.207.126', 2),
 ('82.34.9.20', 2),
 ('65.55.106.155', 2),
 ('65.55.207.77', 2),
 ('65.55.215.75', 2),
 ('65.55.207.71', 1),
 ('98.242.170.241', 1),
 ('208.80.193.28', 1)]

In [31]:
ip_address_counts.most_common(5)   # show me the 5 most common IP addresses

[('66.249.65.38', 100),
 ('66.249.65.12', 32),
 ('89.248.172.58', 22),
 ('67.195.112.35', 16),
 ('66.249.71.65', 3)]

In [33]:
# Counter objects are special-purpose dicts
# anything a dict can do, they can do.  This is known as "inheritance."

ip_address_counts.keys()

dict_keys(['67.218.116.165', '66.249.71.65', '65.55.106.183', '66.249.65.12', '65.55.106.131', '65.55.106.186', '74.52.245.146', '66.249.65.43', '65.55.207.25', '65.55.207.94', '65.55.207.71', '98.242.170.241', '66.249.65.38', '65.55.207.126', '82.34.9.20', '65.55.106.155', '65.55.207.77', '208.80.193.28', '89.248.172.58', '67.195.112.35', '65.55.207.50', '65.55.215.75'])

In [34]:
ip_address_counts.values()

dict_values([2, 3, 2, 32, 2, 2, 2, 3, 2, 2, 1, 1, 100, 2, 2, 2, 2, 1, 22, 16, 3, 2])

In [35]:
sum(ip_address_counts.values())

206
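Two more differences from a plain dict are worth seeing side by side: a `Counter` returns 0 for a key it has never seen instead of raising `KeyError` (which is why `output[ip_address] += 1` works without an if/else), and `isinstance` confirms that it really is a `dict`. A small sketch with a made-up key:

```python
import collections

plain = {}
counter = collections.Counter()

print(counter['never-seen'])   # 0, with no KeyError (and the key isn't added)

try:
    plain['never-seen']
except KeyError:
    print('a plain dict raises KeyError')

print(isinstance(counter, dict))   # True: Counter inherits from dict
```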

# Next up

1. Alternative forms of `import`
2. Writing a module
3. Modules vs. packages



In [36]:
# If I want to use random.randint, I cannot just say "randint"

random.randint(0, 100)

9

In [38]:
randint(0, 100)   # randint doesn't exist as a variable; we can only reach it as an attribute, after "random."

NameError: name 'randint' is not defined

In [39]:
# we can make it possible to use "randint" directly as a variable, rather than
# via "random", with the "from .. import" syntax

from random import randint

# this:
# (1) loads the random module, if it hasn't been loaded already
# (2) defines randint as a global variable
# (3) does *not* define random (but if random was previously defined, it's not erased)

In [40]:
randint(0, 100)

14

In [42]:
# if I want random and randint both to be defined (with randint being random.randint), I need
# to execute two separate lines:

import random                # this defines the "random" variable
from random import randint   # this defines the "randint" variable

In [43]:
# what if I want to import a module, but I want to define a different variable name?
# I can use the "import .. as" syntax

import random as r    # this loads the random module, but defines the r variable to refer to it

In [44]:
r.randint(0, 100)

30

In [45]:
# if I want to import a name from a module with an alias, I can do that, as well
from random import randint as ri

In [46]:
ri(0, 100)

86

# Four ways to import

- `import MODNAME`
- `import MODNAME as ALIAS`
- `from MODNAME import NAME`
- `from MODNAME import NAME as ALIAS`

These all do the same two things:

- Load the module, creating a module object (if the module hasn't already been loaded in this session)
- Define a variable.  In the first two cases, the variable refers to the module object. In the second two cases, the variable refers to an attribute on that module object.
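A quick way to convince yourself that all four forms lead to the same objects, and that only the variable names differ:

```python
import random
import random as r
from random import randint
from random import randint as ri

# The module is loaded once; every form just binds a different name
print(r is random)                 # True: same module object
print(randint is random.randint)   # True: same object as the module's attribute
print(ri is randint)               # True: an alias for the same object

print(ri(0, 100))                  # some integer from 0 to 100, inclusive
```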

# There is also a fifth way... which you should not use!

`from MODNAME import *`

Please **NEVER EVER EVER** use this!

- it imports the module into memory
- for each public name defined in the module (every name that doesn't start with `_`, or, if the module defines `__all__`, only the names listed there), it defines a new global variable in your current namespace

If the module defines variables `a`, `b`, and `c`, then after running `from MODNAME import *`, you will also have variables named `a`, `b`, and `c`, all referring to the module's values of the same names.

What's wrong here is that you don't know which variables will be defined, how many there will be, or whether they'll clash with (and silently overwrite) variables and functions you've already defined.
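The sketch below shows which names `import *` actually brings in. Rather than writing files, it builds two in-memory modules and registers them in `sys.modules`, which `import` checks before searching `sys.path`. The module names `star_demo` and `star_demo_all` and the helper `make_module` are hypothetical.

```python
import sys
import types

def make_module(name, source):
    """Build an in-memory module from source text and register it for import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module   # import looks in sys.modules first, so no file is needed
    return module

make_module('star_demo', 'a = 1\nb = 2\n_hidden = 3\n')
make_module('star_demo_all', "__all__ = ['a']\na = 1\nb = 2\n")

# Run each "from ... import *" in a fresh global namespace
ns_plain = {}
exec('from star_demo import *', ns_plain)
ns_all = {}
exec('from star_demo_all import *', ns_all)

# exec adds __builtins__ to each namespace, so leave it out of the listing
print(sorted(k for k in ns_plain if k != '__builtins__'))   # ['a', 'b']: _hidden is skipped
print(sorted(k for k in ns_all if k != '__builtins__'))     # ['a']: only what __all__ lists
```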

# When I say `import`, where does Python look?

The variable `sys.path` is a list of strings, indicating in which directories Python should look for module files.

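- If you say `import string`, then Python looks in each directory of `sys.path`, in order, for a file called `string.py` (or for a package directory named `string`, which is how `collections` is loaded below). Modules built into Python itself, such as `sys`, are found before this search.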
- If it finds `string.py`, then that module is loaded, and Python stops looking.
- If it doesn't find `string.py`, then it raises a `ModuleNotFoundError`.
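If you want to know where a module *would* be loaded from without importing it, `importlib.util.find_spec` runs the same search and returns `None` when nothing is found. A minimal sketch; the helper name `where_is` and the module name `no_such_module_here` are hypothetical.

```python
import importlib.util

def where_is(module_name):
    """Return where a module would be loaded from, or None if it can't be found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin   # a file path, or 'built-in' for modules compiled into Python

print(where_is('random'))                # a path ending in random.py
print(where_is('sys'))                   # built-in
print(where_is('no_such_module_here'))   # None

try:
    import no_such_module_here
except ModuleNotFoundError as e:
    print(e)   # No module named 'no_such_module_here'
```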

In [47]:
import sys    # create the variable that refers to this module

sys.path

['/Users/reuven/Courses/Current/oreilly-2022-summer-first-steps',
 '/usr/local/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python310.zip',
 '/usr/local/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10',
 '/usr/local/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/lib-dynload',
 '',
 '/usr/local/lib/python3.10/site-packages']

In [48]:
import random

In [49]:
random

<module 'random' from '/usr/local/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py'>

In [50]:
string

<module 'string' from '/usr/local/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/string.py'>

In [51]:
collections

<module 'collections' from '/usr/local/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py'>

In [52]:
!ls -l mymod.py

-rw-r--r-- 1 reuven staff 0 Jul 14 18:19 mymod.py


In [53]:
# can I import this module?
import mymod

In [54]:
mymod

<module 'mymod' from '/Users/reuven/Courses/Current/oreilly-2022-summer-first-steps/mymod.py'>

In [55]:
type(mymod)

module

In [58]:
# what names are defined on this module?
# many, even for an empty module -- the "dunder" names (double underscore) are defined,
# because Python's module-loading system put them there.

In [59]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

In [60]:
mymod.__file__

'/Users/reuven/Courses/Current/oreilly-2022-summer-first-steps/mymod.py'

In [62]:
mymod.__name__

'mymod'

In [63]:
# after adding x, y, and hello to mymod.py, let's import the module again!
import mymod

In [65]:
# where are all of the names (x, y, and hello) that I just defined?

dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

# Python only loads modules once

In a given Python session, `import` will only actually load a module the first time you ask it to do so. Every subsequent time, `import` just defines the variable, pointing it at the module object that was already loaded (and cached in `sys.modules`); it doesn't re-run the file.

- Solution 1: Restart Python and/or Jupyter
- Solution 2: Load `importlib` and use `importlib.reload`

`importlib` is a module (of course) that gives us access to the module-handling internals in Python.
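Here is a self-contained sketch of the whole cycle: write a module, import it, edit the file, import it again (nothing changes), then reload it. It creates a hypothetical module called `reload_demo` in a temporary directory, and it turns off `.pyc` writing so that a quick rewrite of the file can't be masked by stale cached bytecode.

```python
import importlib
import sys
import tempfile
from pathlib import Path

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True   # don't cache bytecode for this demo module

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / 'reload_demo.py'   # hypothetical module name
    module_file.write_text('x = 1\n')
    sys.path.insert(0, tmp)                      # let import find the new file
    importlib.invalidate_caches()

    import reload_demo
    first_x = reload_demo.x                # 1

    module_file.write_text('x = 200\n')    # edit the file on disk
    import reload_demo                     # already in sys.modules, so the file isn't re-run
    cached_x = reload_demo.x               # still 1

    importlib.reload(reload_demo)          # re-executes the file
    reloaded_x = reload_demo.x             # 200

    sys.path.remove(tmp)

sys.dont_write_bytecode = old_flag

print(first_x, cached_x, reloaded_x)   # 1 1 200
```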

In [66]:
import importlib
importlib.reload(mymod)  # reload the "mymod" module again

<module 'mymod' from '/Users/reuven/Courses/Current/oreilly-2022-summer-first-steps/mymod.py'>

In [67]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'x',
 'y']

In [68]:
mymod.x

100

In [69]:
mymod.y

[10, 20, 30]

In [70]:
mymod.hello('world')

'Hello, world!'

In [72]:
# every imported module's __builtins__ refers to the same shared builtins namespace
# (in CPython, the dict of the builtins module), so this is True
string.__builtins__ is mymod.__builtins__

True

# Exercise: Writing a module

1. Create a new module, `vowels.py`.
2. In this module, write a function, `count_vowels`, which takes a string and returns an integer.
    - The string will be text
    - The integer will be the number of vowels in that text.
3. This means that your function can be accessed by `import vowels` and then running `vowels.count_vowels` on a string.
4. From outside of the module (i.e., in Jupyter or a separate program), `import vowels` and run `vowels.count_vowels`.

Reminder: `import` only loads a given module once within a Python session. So if you update your module, you'll need to either restart Jupyter's kernel or use `importlib.reload` to reload the module object you already created.
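Here's one possible `vowels.py`, as a sketch rather than the official solution. It assumes that only a, e, i, o, and u count as vowels (so `y` doesn't), in either uppercase or lowercase.

```python
# vowels.py

def count_vowels(text):
    """Return the number of vowels (a, e, i, o, u, in either case) in text."""
    total = 0
    for one_character in text.lower():
        if one_character in 'aeiou':
            total += 1
    return total
```

Saved as `vowels.py` in the same directory as your notebook, it can then be loaded with `import vowels` and called as `vowels.count_vowels('some text')`.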

In [73]:
import vowels

user_string = input('Enter a string: ').strip()

count = vowels.count_vowels(user_string)

print(f'Your string contains {count} vowels.')

Enter a string: this is a fantastic test, right?
Your string contains 8 vowels.


In [74]:
dir(vowels)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'count_vowels']

If all global variables *inside* of my module file are then accessible as attributes on the module object (e.g., `count_vowels` is a global function in `vowels.py`, but is found as `vowels.count_vowels` when we import `vowels`), is the opposite true?

For example, if `__name__` is an attribute on the module `vowels`, then is `__name__` available as a global variable inside of our module file?

In [75]:
importlib.reload(mymod)   # reload the mymod module, executing every line in the file

Hello from mymod!
Goodbye from mymod!


<module 'mymod' from '/Users/reuven/Courses/Current/oreilly-2022-summer-first-steps/mymod.py'>

In [77]:
importlib.reload(mymod)  # here, I replaced the string 'mymod' with __name__... and it worked!

Hello from mymod!
Goodbye from mymod!


<module 'mymod' from '/Users/reuven/Courses/Current/oreilly-2022-summer-first-steps/mymod.py'>

# The special `__name__` variable

The variable `__name__` is always defined in Python. It holds the name of the namespace in which you're currently operating.

It contains either the string `'__main__'` or the name of the current module:

1. If our module was imported, then `__name__` contains the module's name (e.g., `'mymod'`).
2. But if our module was run as a program from the command line, or is otherwise the first program to execute, then `__name__` is defined to be the string `'__main__'`.

By checking whether `__name__ == '__main__'`, a module can run some code (such as a demo or tests) only when it's executed as a program, and skip that code when it's imported.
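To see both values of `__name__` without creating any files, the sketch below runs the same hypothetical module source twice with `exec`: once in a module object named `'mymod'`, and once in one named `'__main__'`. This only approximates what `import mymod` and `python mymod.py` do, but the `if __name__ == '__main__':` check behaves the same way.

```python
import types

# Hypothetical module source, using the usual "main guard"
SOURCE = '''
def hello(name):
    return f'Hello, {name}!'

if __name__ == '__main__':
    print(hello('world'))
    ran_as_program = True
else:
    ran_as_program = False
'''

def load(source, name):
    module = types.ModuleType(name)   # sets module.__name__ to name
    exec(source, module.__dict__)     # runs the code with that __name__
    return module

imported = load(SOURCE, 'mymod')       # like "import mymod": prints nothing
as_script = load(SOURCE, '__main__')   # like "python mymod.py": prints Hello, world!

print(imported.ran_as_program, as_script.ran_as_program)   # False True
```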

In [78]:
importlib.reload(mymod)

mymod.x

Hello from mymod!
Goodbye from mymod!


100

In [79]:
mymod.y

[10, 20, 30]

In [80]:
mymod.hello('world')

'Hello, world!'