# Agenda: Modules and packages

0. Q&A
1. Challenge
2. Modules -- what are they?
3. Different forms of `import`
4. Developing our own module
5. Python standard library
6. Modules vs. packages
7. PyPI and `pip`
8. Q&A - AMA -- what's next?

In [2]:
def count_ips(filename):
    output = {}

    with open(filename) as f:            # "with" closes the file when we're done
        for one_line in f:
            ip_address = one_line.split()[0]

            if ip_address in output:
                output[ip_address] += 1  # seen before? add 1
            else:
                output[ip_address] = 1   # first time? set to 1

    return output

print(count_ips('logfile.txt'))

{'67.218.116.165': 2, '66.249.71.65': 3, '65.55.106.183': 2, '66.249.65.12': 32, '65.55.106.131': 2, '65.55.106.186': 2, '74.52.245.146': 2, '66.249.65.43': 3, '65.55.207.25': 2, '65.55.207.94': 2, '65.55.207.71': 1, '98.242.170.241': 1, '66.249.65.38': 100, '65.55.207.126': 2, '82.34.9.20': 2, '65.55.106.155': 2, '65.55.207.77': 2, '208.80.193.28': 1, '89.248.172.58': 22, '67.195.112.35': 16, '65.55.207.50': 3, '65.55.215.75': 2}


In [4]:
for one_line in open('logfile.txt'):
    pass   # do nothing

In [5]:
one_line

'66.249.65.38 - - [31/Jan/2010:21:08:00 +0200] "GET /browse/one_node/1892 HTTP/1.1" 200 1296 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"\n'

In [6]:
one_line[:12]

'66.249.65.38'

In [7]:
one_line[13:14]

'-'
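One caveat about the slice above: `one_line[:12]` works only because this particular address happens to be 12 characters long. Here is a small sketch comparing slicing with `split()`. It uses two shortened lines; the addresses come from the counts printed earlier, and everything after them is trimmed for illustration.

```python
# Two shortened log lines; the text after each address is trimmed for illustration
lines = [
    '66.249.65.38 - - [31/Jan/2010:21:08:00 +0200] "GET /browse/one_node/1892 HTTP/1.1"',
    '208.80.193.28 - - (rest of line trimmed)',
]

for one_line in lines:
    by_slice = one_line[:12]          # assumes every address is exactly 12 characters
    by_split = one_line.split()[0]    # takes everything up to the first whitespace
    print(f'{by_slice!r:18} {by_split!r}')
```

For the second line, the slice drops the final `8`, while `split()[0]` returns the whole address. That's why `count_ips` uses `split()`.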

# DRY -- the "don't repeat yourself" rule

1. If we have several lines in a row that repeat themselves, we can "DRY up" that code with a loop.
2. If we have code that repeats in several places across a program, we can "DRY up" that code with a function.
3. If we have code that repeats across several different programs, we can "DRY up" the code with a *library*.

Virtually every programming language supports libraries. They let us write functions and data once, and then access those functions/data from numerous programs:

- Dictionary with the months of the year (names + numbers)
- Function for logging into a system with a username and password
- Function that retrieves the latest stock price, given a symbol
- Function that reads from a file and returns the longest word

In Python, we call our libraries "modules." A module contains Python data + functions. But it does more than that. It's also a *namespace*, meaning that it walls off its variables from other variables you might define.

Imagine that you write a program with a function `hello`, and then you load a module that also defines a function `hello`. You don't want to have a "namespace collision," where it's unclear which `hello` is now defined. By putting any definitions in a namespace, you avoid this sort of problem. You can think of namespaces as last names, or surnames, for your variables. 



# How do we use modules?

In Python, we load modules using the `import` statement. It looks a bit weird, but it's one of the most common things to put in a Python program.

Some things to consider about `import`:

- It's not a function. Don't use parentheses with it. You write `import`, a space, and then the module you want to import.
- In other languages, you often pass the name of the library you want to load as a string, in quotes. Not so in Python! Here, the name of the module you give is actually the variable name you want to define.

In [8]:
# let's say I want a random integer
# I can use the "random" module for that, and the "randint" function in that module

import random

# after this line runs, "random" is defined as a variable

type(random)

module

In [9]:
# if I want to use a function defined in the random module, I say random.FUNCNAME()

random.randint(0, 100)   # this returns a single random int from 0 to 100, including both 0 and 100

41

There are a bunch of different forms of `import` that we can use:

1. The standard form, where we say `import MODULENAME`. This defines `MODULENAME` as a global variable, a module object whose attributes are the functions and values defined in the module.
2. To define a single name from a module, use `from MODULENAME import NAME`. Note that this does *not* define `MODULENAME` as a global variable, but it does define `NAME`.
3. To load a module but define a variable with a different name, use `import MODULENAME as ALIAS`.
4. To define a single name from a module, but alias it to another name when you load it, use `from MODULENAME import NAME as ALIAS`.
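5. To define every public name in a module as a global variable in your program, use `from MODULENAME import *`. ("Public" means the names listed in the module's `__all__`, or, if there's no `__all__`, every name that doesn't start with `_`.) **I beg you never to use this!**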
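Here's a short sketch of why `import *` is risky. The `choice` function below is a hypothetical helper written just for this demo. Because both `choice` and `random` appear in `random.__all__`, the star import silently rebinds both names, so we keep a handle on the module under a name it won't touch:

```python
import random as random_module   # a name that "import *" below won't touch

def choice(options):
    """Hypothetical helper of our own: always return the first option."""
    return options[0]

print(choice(['x', 'y', 'z']))   # x: this is still our function

from random import *             # imports every name listed in random.__all__

print(choice is random_module.choice)   # True: our choice() was silently replaced
print(random is random_module.random)   # True: "random" now names a function, not the module
```

Nothing warns you when this happens, which is why explicit imports are so much easier to reason about.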

In [11]:
# what happens if I tire of saying random.randint? What if I just want to say randint?
# right now, randint doesn't exist as a variable. It exists as an attribute on the random module we loaded

randint(0, 100)

NameError: name 'randint' is not defined

In [12]:
# there are many times that we might be using a function so often that we tire of saying both
# the module name and the function name. In such cases, we want the function to be loaded as a variable,
# rather than the module

# for that, we have this syntax:

from random import randint

# the above still loads the entire random module into memory
# the above does *not* define random as a variable
# but it *does* define "randint" as a variable (function) name that we can use

In [13]:
randint(0, 100)

82

In [14]:
# are there other options?
# if the module name is long, hard to spell, or just annoying, you can load the module
# and give it an alias, an alternate name

# this is *very* common in the world of data analysis, where everyone calls NumPy np and Pandas pd

import random as r     # this loads the module, but doesn't define "random" as a variable. It defines "r" instead

In [15]:
r.randint(0, 100)

9

In [16]:
# maybe there's already another "randint" that I don't want to clobber
# maybe I just want a shorter alias

from random import randint as ri     # now, randint won't be defined -- ri will

In [17]:
ri(0, 100)

53

In [19]:
# I can import several names from a single module in one statement, giving each one an alias

from random import randint as ri, choice as ch

# Exercise: Number guessing game

1. Use the `random.randint` function to choose a random integer between 0 and 100.
2. Repeatedly ask the user to guess the number.
    - If the user gets it right, then print "You got it!" and exit.
    - Otherwise, tell the user that they're too high or too low, and let them try again.
    - If they enter a non-numeric value, scold them and let them try again.
  
Example:

    Guess the number: 50
    Too low!
    Guess the number: 75
    Too high!
    Guess the number: 70
    Too low!
    Guess the number: 72
    You got it!
    

In [None]:
import random

number = random.randint(0, 100)  

while True:
    s = input('Guess the number: ').strip()

    if not s.isdigit():
        print(f'{s} is not numeric; try again')
        continue

    guess = int(s)    # get an integer based on the user's input

    if guess == number:
        print('You got it!')
        break
    elif guess < number:
        print('Too low!')
    else:
        print('Too high!')
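Because the loop above calls `input` directly, it's awkward to test. One possible restructuring, using the hypothetical helper names `respond` and `play`, moves the comparison into a function and feeds guesses from a list instead of the keyboard:

```python
def respond(s, number):
    """Return the message for a single guess string s (hypothetical helper)."""
    s = s.strip()
    if not s.isdigit():
        return f'{s} is not numeric; try again'

    guess = int(s)
    if guess == number:
        return 'You got it!'
    elif guess < number:
        return 'Too low!'
    else:
        return 'Too high!'


def play(guesses, number):
    """Process guesses in order, stopping at the first correct one; return all messages."""
    messages = []
    for s in guesses:
        message = respond(s, number)
        messages.append(message)
        if message == 'You got it!':
            break
    return messages


print(play(['50', 'abc', '75', '70', '72', '99'], 72))
```

The final `'99'` is never examined, because `play` stops as soon as the guess is correct. In an interactive version, `play` could be handed guesses read with `input` rather than a fixed list.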
        
        

# How can you find the names defined in a module?

1. Read the module's documentation. For modules that come with Python, it's at https://docs.python.org. For third-party modules, start at the project's page on https://pypi.org, which usually links to its documentation.
2. Call `help` on the module, as in `help(random)`, in Jupyter or any interactive Python session. That displays the module's docstring (its documentation for users), along with the functions and other names it defines.
3. In an IDE such as PyCharm or VSCode, hovering over a module name will often display its documentation, including a list of names.
4. Call the `dir` function on the module to get a list of strings: the names defined on that module object.

In [21]:
dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_ONE',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_floor',
 '_index',
 '_inst',
 '_isfinite',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randbytes',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']
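Much of that list is underscore-prefixed: private helpers and module metadata. A quick sketch that filters those out with a list comprehension. Note that the result still includes constants such as `TWOPI`, which aren't listed in the module's `__all__`:

```python
import random

# Keep only the names that don't start with an underscore
public_names = [name for name in dir(random) if not name.startswith('_')]
print(public_names)

# __all__ lists the names the module's author chose to export (it's what "import *" uses)
print(sorted(random.__all__))
```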

In [22]:
# you can always run help on any one of these

help(random.triangular)

Help on method triangular in module random:

triangular(low=0.0, high=1.0, mode=None) method of random.Random instance
    Triangular distribution.
    
    Continuous distribution bounded by given lower and upper limits,
    and having a given mode value in-between.
    
    http://en.wikipedia.org/wiki/Triangular_distribution



# Next up

1. Developing our own module
2. What happens inside of a module file?
3. Python's standard library

In [23]:
# where are modules getting loaded from?

random   # ask the module to show me its printed representation

<module 'random' from '/usr/local/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/random.py'>

In [24]:
# where can these files live?

import sys     # the sys module gives access to interpreter-level settings and state
sys.path       # list of strings, directory names + zipfiles, where Python looks for modules

['/Users/reuven/Courses/Current/OReilly-2023-12December-python',
 '/usr/local/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python311.zip',
 '/usr/local/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11',
 '/usr/local/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib-dynload',
 '',
 '/Users/reuven/Library/Python/3.11/lib/python/site-packages',
 '/usr/local/lib/python3.11/site-packages',
 '/usr/local/Cellar/pybind11/2.11.1/libexec/lib/python3.11/site-packages',
 '/usr/local/opt/python-tk@3.11/libexec']

In [25]:
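# when we say "import ABCD", Python first checks whether ABCD was already imported (it's
# cached in sys.modules) or is built into the interpreter. If not, it looks for ABCD.py (or a
# package directory named ABCD) in each of these directories + zipfiles, one at a time.
# The first one to have a match wins!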

import mymod
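To see the "first match wins" rule in action without touching your real files, here's a sketch that writes two throwaway modules with the same hypothetical name, `shadow_demo`, into two temporary directories, and puts both at the front of `sys.path`:

```python
import pathlib
import sys
import tempfile

# Two throwaway directories, each with a module of the same (hypothetical) name
first_dir = tempfile.mkdtemp()
second_dir = tempfile.mkdtemp()
pathlib.Path(first_dir, 'shadow_demo.py').write_text("where = 'first'\n")
pathlib.Path(second_dir, 'shadow_demo.py').write_text("where = 'second'\n")

# Put both directories at the front of sys.path, first_dir ahead of second_dir
sys.path[0:0] = [first_dir, second_dir]

import shadow_demo

print(shadow_demo.where)      # first
print(shadow_demo.__file__)   # a path inside first_dir
```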

In [26]:
mymod  # show me your printed representation

<module 'mymod' from '/Users/reuven/Courses/Current/OReilly-2023-12December-python/mymod.py'>

In [27]:
# what names are defined in this module?

dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

In [28]:
mymod.__file__

'/Users/reuven/Courses/Current/OReilly-2023-12December-python/mymod.py'

In [29]:
mymod.__name__

'mymod'

In [30]:
import mymod

In [31]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'x',
 'y',
 'z']

In [32]:
# In Jupyter (or any long-running Python session), a second "import" of a module that's
# already loaded does nothing, because the module is cached in sys.modules. To pick up
# changes you've made to the file, reload it explicitly:

import importlib           # yes, a module for working with modules!
importlib.reload(mymod)    # reload a module 

<module 'mymod' from '/Users/reuven/Courses/Current/OReilly-2023-12December-python/mymod.py'>
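A self-contained sketch of the difference between a second `import` and `importlib.reload`, using a throwaway module with the hypothetical name `reload_demo` in a temporary directory:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True   # skip .pyc files so a quick edit can't be masked by a stale cache

tmpdir = tempfile.mkdtemp()
module_file = pathlib.Path(tmpdir, 'reload_demo.py')   # hypothetical module name
module_file.write_text("x = 1\n")
sys.path.insert(0, tmpdir)

import reload_demo
print(reload_demo.x)           # 1

module_file.write_text("x = 200\n")   # edit the file on disk

import reload_demo             # no effect: the module cached in sys.modules is reused
print(reload_demo.x)           # still 1

importlib.reload(reload_demo)  # re-executes the file
print(reload_demo.x)           # 200
```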

In [33]:
mymod.x

100

In [34]:
mymod.y

[10, 20, 30]

In [35]:
mymod.z

{'a': 100, 'b': 200, 'c': 300}

In [36]:
mymod.hello('world')

'Hello, world!'
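The notebook doesn't show `mymod.py` itself. Judging only from the outputs above, a file along these lines would produce them; treat it as a reconstruction, not the author's actual file:

```python
# mymod.py: a guess at the module's contents, reconstructed from the outputs above

x = 100
y = [10, 20, 30]
z = {'a': 100, 'b': 200, 'c': 300}

def hello(name):
    return f'Hello, {name}!'
```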

# Exercise: `menu` function

1. Define a module called `menu`, meaning that the file will be `menu.py`, and put it in the same directory as your program / Jupyter.
2. In that module, define a function called `menu`, which will take a list of strings as an argument. That list contains all of the options that the user can choose from.
3. Ask the user to choose one of those options.
    - If the user chooses one, then that value is returned from the function
    - If the user enters an invalid choice, then have them try again
  
We should be able to write the following code:

```python
import menu
user_choice = menu.menu(['a', 'b', 'c'])   # user must choose a, b, or c
print(f'User chose {user_choice}')
```

In [39]:
import menu
user_choice = menu.menu(['a', 'b', 'c'])   # user must choose a, b, or c
print(f'User chose {user_choice}')

Enter your choice (['a', 'b', 'c']):  q
Enter your choice (['a', 'b', 'c']):  p
Enter your choice (['a', 'b', 'c']):  c


User chose c
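One possible `menu.py` that satisfies the exercise and produces the prompt shown above. It's a sketch; other designs, such as numbered options or case-insensitive matching, would work too:

```python
# menu.py: one possible solution sketch

def menu(options):
    """Keep asking until the user types one of the strings in options; return that string."""
    while True:
        choice = input(f'Enter your choice ({options}): ').strip()
        if choice in options:
            return choice
        # Not a valid option: loop around and ask again
```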


In [40]:
import mymod

Hello from mymod!
Goodbye from mymod!


# Names and attributes

Inside of `mymod.py`, I've defined four variables: `x`, `y`, `z`, and `hello`. Inside of the module file, those are global variables, and I can use them as global variables.

Outside of the module, when I import `mymod`, those are actually attributes: names that come after the module name and a dot, as in `mymod.x`, `mymod.y`, `mymod.z`, and `mymod.hello`. Whenever we `import` a module, its global names are available to us as attributes on the module object.

So: Anything that was a global variable in the module file is an attribute outside.

Is the opposite true, too? If there is an attribute on the module object, then is it available as a global variable inside of the module? 

For example, we know that `__name__` contains the name of the module. Can I use `__name__` as a global variable inside of `mymod.py`?

In [41]:
import mymod

Hello from mymod!
Goodbye from mymod!
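One way to check both directions without creating a file: build an empty module object with `types.ModuleType`, then run some code using the module's `__dict__` as its global namespace. This is roughly what `import` does once it has found a file, minus the searching and caching. The module name `demo` is hypothetical:

```python
import types

# Build a module object by hand (hypothetical name "demo") instead of importing a file
demo = types.ModuleType('demo')

source = '''
x = 100                       # a global variable inside the module's code
whoami = f'I am {__name__}'   # __name__ is readable as a global inside the module
'''

# Run the source with the module's namespace (its __dict__) as the global namespace
exec(source, demo.__dict__)

print(demo.x)         # 100: the global became an attribute
print(demo.whoami)    # I am demo
```

So the answer is yes in both directions: a module's globals and its attributes are the same dictionary, `demo.__dict__`.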


# What is `'__main__'`?

`__name__` is available in two ways:

- Inside of the module file, it's a variable, and it contains the name of the current module as a string.
- Outside of the module file, it's an attribute on the module object, also containing the module's name as a string

The thing is, the value of `__name__` isn't *always* the module name. If the module is the program's entry point, i.e., the file that Python was asked to run, then `__name__` is the string `'__main__'` instead.

If you invoke `python mymod.py`, then the value of `__name__` inside of `mymod.py` will not be `'mymod'`, but will be `'__main__'`. 

Why? The answer is: This allows us to distinguish between when a module is being run as a program, and when it's being loaded by someone else via `import`. 

This leads us to one of the most famous, and most misunderstood lines in all of Python:

```python
if __name__ == '__main__':
    something_here
```

This line, if put in a module file, basically means: Only execute the below code when the module is being run as a program. Ignore this code when we `import` the module.

Many modules, including quite a few in the standard library, end with this `if` statement followed by some code. This lets a single file work both as a module that others can `import` and as a program you can run, with slightly different behavior in each case.

This comparison is *NOT* mandatory. It does *NOT* correspond to the `main` function in C.
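A sketch that makes the two cases visible. It writes a hypothetical module, `greet.py`, to a temporary directory, runs it as a program with the current Python interpreter (`sys.executable`), and then imports it:

```python
import pathlib
import subprocess
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
pathlib.Path(tmpdir, 'greet.py').write_text(
    "def hello(name):\n"
    "    return f'Hello, {name}!'\n"
    "\n"
    "print(f'__name__ is {__name__!r}')\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print(hello('from the command line'))\n"
)

# 1. Run the file as a program: __name__ is '__main__', so the if-block runs
result = subprocess.run([sys.executable, 'greet.py'],
                        cwd=tmpdir, capture_output=True, text=True)
print(result.stdout)

# 2. Import the same file: __name__ is 'greet', so the if-block is skipped
sys.path.insert(0, tmpdir)
import greet
```

Run as a program, the file prints `__name__ is '__main__'` followed by the greeting; imported, it prints `__name__ is 'greet'` and skips the `if` block.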

In [42]:
import mymod

# Python standard library

Python comes with a very large number of modules already defined and ready for use. They're known as the "Python standard library," and you can count on a standard Python installation including them. (A few are platform-specific, such as `winreg` on Windows, and some Linux distributions package a few pieces, such as `tkinter`, separately.)

You still need to use `import` to use something from the standard library; these modules aren't loaded automatically, since loading all of them at startup would waste time and memory. But you can easily import them.

The documentation for the standard library is here: https://docs.python.org/3/library/index.html

# Next up

1. Exercise with the standard library
2. PyPI and `pip`



In [44]:
from collections import Counter

Counter([10, 20, 30, 20, 30, 40, 20, 30, 40, 20])  # pass an iterable to Counter

Counter({20: 4, 30: 3, 40: 2, 10: 1})

In [45]:
Counter('this is a bunch of words and it is a bunch of great words and I am writing too many words'.split())

Counter({'words': 3,
         'is': 2,
         'a': 2,
         'bunch': 2,
         'of': 2,
         'and': 2,
         'this': 1,
         'it': 1,
         'great': 1,
         'I': 1,
         'am': 1,
         'writing': 1,
         'too': 1,
         'many': 1})

In [None]:
# we can treat a Counter object like a dict, both retrieving values with []
# and also invoking methods like .items() to iterate
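Here's one way to fill in that cell, reusing the sentence from the previous example:

```python
from collections import Counter

c = Counter('this is a bunch of words and it is a bunch of great words and I am writing too many words'.split())

print(c['words'])      # 3: retrieve a count with [], just like a dict
print(c['python'])     # 0: a missing key returns 0 instead of raising KeyError

for word, count in c.items():    # iterate over key-value pairs, as with a dict
    if count > 1:
        print(f'{word}: {count}')

print(c.most_common(2))   # [('words', 3), ('is', 2)]
```

In `most_common`, ties are ordered by first appearance, which is why `'is'` comes before the other words that appear twice.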



# Exercise: `Counter`

One of the most useful things in the standard library is the `collections` module, and the `Counter` data type in that module. You can think of `Counter` as a variation on dictionaries. A `Counter` object has keys and values, just like a dict. But you normally don't set it up in the same way. Rather, you create a `Counter` by passing it an iterable of hashable values, typically a list of strings or integers. The resulting `Counter` object's keys are the unique values that you passed to it, and its values are integers telling you how often each appeared.

We are now going to rewrite our earlier program, counting IP addresses, using `Counter`.

1. Create a list of the IP addresses used in `mini-access-log.txt`. If there are 100 lines in the file, then the list should contain 100 IP addresses, all strings.
2. Use `Counter` to count how often each address appears.
3. Iterate over the `Counter` object, printing each IP address and how often it appears.
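A possible solution sketch. It assumes, like `count_ips` at the start of this section, that the IP address is the first whitespace-separated field on each line, and it skips blank lines as a small safety measure. The name `print_ip_counts` is hypothetical:

```python
from collections import Counter

def count_ips(filename):
    """Count how often each IP address (the first field on each line) appears."""
    with open(filename) as f:
        # Step 1: one IP address string per non-blank line
        ip_addresses = [one_line.split()[0] for one_line in f if one_line.strip()]
    # Step 2: Counter does the tallying that our if/else did by hand
    return Counter(ip_addresses)

def print_ip_counts(filename):
    # Step 3: iterate over the Counter just as we would over a dict
    for ip_address, count in count_ips(filename).items():
        print(f'{ip_address}: {count}')
```

Calling `print_ip_counts('mini-access-log.txt')` would then print each address with its count. Compared with the original version, the whole if/else bookkeeping collapses into a single call to `Counter`.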