# Agenda, week 5: Modules and packages

1. Review of the challenge
2. Q&A
3. What are modules?
4. Using `import` to retrieve data from modules
5. Different variations on `import`
6. How do we develop a module?
7. What happens when a module is imported?
8. Python's standard library
9. Packages and PyPI -- finding and downloading them onto your computer
10. Using `pip`
11. What's next?



In [1]:
# Challenge solution

# Write your code below...

def count_ips(filename):
    output = {}
    for one_line in open(filename):

        # turn the line into a list of strings, separating on whitespace
        # grab the first field, at index 0 -- the IP address
        ip_address = one_line.split()[0]  

        # have I seen this IP address already?
        # If so, then just add 1 to its value

        if ip_address in output:     # "in" on a dict checks the keys
            output[ip_address] += 1
        else:                        # first time seeing this IP address
            output[ip_address] = 1

    return output
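As an aside, the same counting pattern can be sketched with `collections.Counter` from the standard library. This is only an alternative, not the challenge's expected answer. The function name `count_ips_with_counter` and the sample log lines are made up for illustration, and the sketch assumes the IP address is the first whitespace-separated field on each line, as in the solution above.

```python
import os
import tempfile
from collections import Counter

def count_ips_with_counter(filename):
    # Hypothetical variant of count_ips, for illustration only
    with open(filename) as infile:   # "with" closes the file when the block ends
        # skip blank lines, which have no first field to grab
        return Counter(one_line.split()[0]
                       for one_line in infile
                       if one_line.strip())

# Made-up sample data, written to a temporary file so the example runs anywhere
sample_text = ('10.0.0.1 - - "GET /"\n'
               '10.0.0.2 - - "GET /"\n'
               '\n'
               '10.0.0.1 - - "GET /about"\n')

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as outfile:
    outfile.write(sample_text)
    sample_path = outfile.name

sample_counts = count_ips_with_counter(sample_path)
os.remove(sample_path)               # clean up the temporary file

for key, value in sample_counts.items():
    print(f'{key}:\t{value}')
```

A `Counter` is a kind of dict, so the printing loops in the following cells would work on it unchanged.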



In [2]:
counts = count_ips('mini-access-log.txt')

In [5]:
# let's print this dict nicely!

for key, value in counts.items():
    print(f'{key}:\t{value}')

67.218.116.165:	2
66.249.71.65:	3
65.55.106.183:	2
66.249.65.12:	32
65.55.106.131:	2
65.55.106.186:	2
74.52.245.146:	2
66.249.65.43:	3
65.55.207.25:	2
65.55.207.94:	2
65.55.207.71:	1
98.242.170.241:	1
66.249.65.38:	100
65.55.207.126:	2
82.34.9.20:	2
65.55.106.155:	2
65.55.207.77:	2
208.80.193.28:	1
89.248.172.58:	22
67.195.112.35:	16
65.55.207.50:	3
65.55.215.75:	2


In [8]:
# what if your boss doesn't want to see numbers, but rather wants to see a histogram?

for key, value in counts.items():
    print(f'{key}:\t{value * "x"}')

67.218.116.165:	xx
66.249.71.65:	xxx
65.55.106.183:	xx
66.249.65.12:	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.106.131:	xx
65.55.106.186:	xx
74.52.245.146:	xx
66.249.65.43:	xxx
65.55.207.25:	xx
65.55.207.94:	xx
65.55.207.71:	x
98.242.170.241:	x
66.249.65.38:	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.207.126:	xx
82.34.9.20:	xx
65.55.106.155:	xx
65.55.207.77:	xx
208.80.193.28:	x
89.248.172.58:	xxxxxxxxxxxxxxxxxxxxxx
67.195.112.35:	xxxxxxxxxxxxxxxx
65.55.207.50:	xxx
65.55.215.75:	xx
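If the boss also wants the busiest addresses first, one possible sketch is to sort the pairs by count and pad the keys to a common width. The dict `sample_counts` below holds just four of the values shown above, copied by hand, and the helper `by_count` is a name invented for this example.

```python
# Four of the counts from the output above, copied by hand for illustration
sample_counts = {'66.249.65.12': 32,
                 '66.249.65.38': 100,
                 '89.248.172.58': 22,
                 '65.55.207.71': 1}

# the longest key decides how wide the first column must be
width = max(len(key) for key in sample_counts)

def by_count(pair):
    # each pair is (ip_address, count); sort on the count
    return pair[1]

histogram_lines = []
for key, value in sorted(sample_counts.items(), key=by_count, reverse=True):
    # {key:<{width}} left-aligns the key in a column of "width" characters
    histogram_lines.append(f'{key:<{width}}  {value * "x"}')

print('\n'.join(histogram_lines))
```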


In [6]:
5 + 'a'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [7]:
5 * 'a'   # yes, this will work!

'aaaaa'

# Modules and packages

We've spoken several times about "Don't repeat yourself" -- the "DRY rule."  We've talked about it in two different contexts so far, and today we add a third:

1. If we have several lines in a row that are roughly the same, we should turn those into a loop.
2. If we have several places in a program that are roughly the same, we should turn those into a function and then invoke the function in several places.
3. If we have the same code in several different programs, we can use a library, and then reference the code in that library whenever we need it.

In Python, our libraries are known as "modules and packages." A module is a single file containing Python code, and a package is a directory containing one or more modules and, optionally, other packages.  (You can think of them as files and folders.)

1. If you want to do something that others have already done many times before, the odds are good that you can use someone else's library to do it.
2. If you have implemented something (including at work) that might help others (or yourself) in future programs, then you can write a module and share it with others.

The whole idea here is that you shouldn't be re-inventing the wheel. And if you use a module, then you don't have to worry about writing or maintaining that code. It's someone else's problem!

Modules in Python do all of this -- they let us reuse code, and concentrate on the new, distinct problems we have to solve.

But modules do something else, too: They are namespaces! In other words, if I'm working on a program that uses a variable `x` and you're working on a program that uses a variable `x`, then we don't want them to collide and interfere with one another. Each module keeps its names separate from those of every other module, such that this cannot happen (or at least, not easily).
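Here is a small sketch of the namespace idea, using two modules that come with Python. Both `math` and `cmath` define a function named `sqrt`, and the variable `sqrt` defined in our own program (a made-up example) doesn't interfere with either of them.

```python
import math
import cmath

sqrt = 'my own variable named sqrt'   # lives in our program's namespace

print(math.sqrt(4))      # the sqrt in "math" works with real numbers
print(cmath.sqrt(-4))    # the sqrt in "cmath" works with complex numbers
print(sqrt)              # our own variable is untouched by either module

# same name, two different functions, because each module is its own namespace
print(math.sqrt is cmath.sqrt)
```

The last line prints `False`: the name is the same, but the objects are different.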

To use a module, you use the `import` statement.  A few things about `import`:

- It is a statement, not a function. Don't use parentheses.
- The argument you give to `import` is a bare name, not a string. It is the name of the module you want to load.  It is not a filename, either!
- After you use `import`, you can use the module; anything defined in the module is available as an *attribute* of the module object, written after a `.` (for example, `random.randint`).

In [9]:
import random       # this imports the "random" module that comes with Python

# once we've done that, we can access an attribute x as random.x
# attributes can be data, functions, or even classes (data types)

random.randint(0, 100)   # here, we retrieve the "randint" function from the "random" module ... and run it!

95
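The points about `import` listed earlier can be checked directly. The sketch below uses the builtin `compile` function only as a way to ask Python whether a line of code is valid syntax without running it. The list `bad_imports` is made up for this example.

```python
import random

# after "import", the name "random" refers to a module object
print(type(random))
print(random.__name__)

# neither a quoted string nor function-style parentheses is valid syntax
bad_imports = ['import "random"', 'import(random)']
rejected = []

for source in bad_imports:
    try:
        compile(source, '<example>', 'exec')   # checks syntax; runs nothing
    except SyntaxError:
        rejected.append(source)

print(rejected)    # both attempts are rejected as syntax errors
```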

# Exercise: `glob`

The `glob` module contains a function, also c