# Agenda: Modules and packages

0. Q&A
1. Challenge
2. Modules -- what are they?
3. Different forms of `import`
4. Developing our own module
5. Python standard library
6. Modules vs. packages
7. PyPI and `pip`
8. Q&A - AMA -- what's next?

In [2]:
def count_ips(filename):
    output = {}

    for one_line in open(filename):
        ip_address = one_line.split()[0]

        if ip_address in output:
            output[ip_address] += 1  # seen before? add 1
        else:
            output[ip_address] = 1   # first time? set to 1

    return output

print(count_ips('logfile.txt'))

{'67.218.116.165': 2, '66.249.71.65': 3, '65.55.106.183': 2, '66.249.65.12': 32, '65.55.106.131': 2, '65.55.106.186': 2, '74.52.245.146': 2, '66.249.65.43': 3, '65.55.207.25': 2, '65.55.207.94': 2, '65.55.207.71': 1, '98.242.170.241': 1, '66.249.65.38': 100, '65.55.207.126': 2, '82.34.9.20': 2, '65.55.106.155': 2, '65.55.207.77': 2, '208.80.193.28': 1, '89.248.172.58': 22, '67.195.112.35': 16, '65.55.207.50': 3, '65.55.215.75': 2}


In [4]:
for one_line in open('logfile.txt'):
    pass   # do nothing

In [5]:
one_line

'66.249.65.38 - - [31/Jan/2010:21:08:00 +0200] "GET /browse/one_node/1892 HTTP/1.1" 200 1296 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"\n'

In [6]:
one_line[:12]

'66.249.65.38'

In [7]:
one_line[13:14]

'-'

# DRY -- the "don't repeat yourself" rule

1. If we have several lines in a row that repeat themselves, we can "DRY up" that code with a loop.
2. If we have code that repeats in several places across a program, we can "DRY up" that code with a function.
3. If we have code that repeats across several different programs, we can "DRY up" the code with a *library*.
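
As a minimal sketch of the first two rules (the function name `greet` and the sample names are made up for illustration):

```python
# Repetitive version: nearly the same line, three times in a row
print('Hello, Alice!')
print('Hello, Bob!')
print('Hello, Carol!')

# Rule 1: DRY up consecutive repetition with a loop
for name in ['Alice', 'Bob', 'Carol']:
    print(f'Hello, {name}!')

# Rule 2: DRY up repetition across a program with a function
def greet(name):
    return f'Hello, {name}!'

print(greet('Alice'))
print(greet('Bob'))
```

The third rule, sharing code across programs, is what the rest of this session is about.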

Nearly every programming language supports libraries. A library lets us write functions and data once, and then use them from many different programs. For example:

- Dictionary with the months of the year (names + numbers)
- Function for logging into a system with a username and password
- Function that retrieves the latest stock price, given a symbol
- Function that reads from a file and returns the longest word
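
Here is a rough sketch of what two of those items might look like. The names `MONTHS` and `longest_word` are hypothetical, chosen only for this example:

```python
# Data we might want to share: month numbers mapped to month names
MONTHS = {1: 'January', 2: 'February', 3: 'March', 4: 'April',
          5: 'May', 6: 'June', 7: 'July', 8: 'August',
          9: 'September', 10: 'October', 11: 'November', 12: 'December'}

# A function we might want to share: find the longest word in a file
def longest_word(filename):
    longest = ''

    with open(filename) as f:
        for one_line in f:
            for one_word in one_line.split():
                if len(one_word) > len(longest):   # longer than what we have? keep it
                    longest = one_word

    return longest
```

Writing these once, in one place, means that a fix or improvement reaches every program that uses them.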

In Python, we call our libraries "modules." A module contains Python data + functions. But it does more than that. It's also a *namespace*, meaning that it walls off its variables from other variables you might define.

Imagine that you write a program with a function `hello`, and then you load a module that also defines a function `hello`. You don't want to have a "namespace collision," where it's unclear which `hello` is now defined. By putting any definitions in a namespace, you avoid this sort of problem. You can think of namespaces as last names, or surnames, for your variables. 
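
A small sketch of that idea, using the standard library's `math` module as a stand-in (our own `sqrt` function here is made up for illustration, and the `import` statement is covered in the next section):

```python
import math

# our own function, which happens to share a name with one in the math module
def sqrt(x):
    return f'my own sqrt, called with {x}'

print(sqrt(16))        # our function, defined in our own namespace
print(math.sqrt(16))   # the module's function, inside the math namespace
```

The two functions coexist without any collision, because the module's version is only reachable via `math.sqrt`.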



# How do we use modules?

In Python, we load modules using the `import` statement. It looks a bit weird, but it's one of the most common things to put in a Python program.

Some things to consider about `import`:

- It's not a function. Don't use parentheses with it. You write `import`, a space, and then the module you want to import.
- In other languages, you often pass the name of the library you want to load as a string, in quotes. Not so in Python! Here, you write the module's name without quotes, and that name is also the variable that `import` defines.

In [8]:
# let's say I want a random integer
# I can use the "random" module for that, and the "randint" function in that module

import random

# after this line runs, "random" is defined as a variable

type(random)

module

In [9]:
# if I want to use a function defined in the random module, I say random.FUNCNAME()

random.randint(0, 100)   # this returns a single random int from 0 to 100, including both 0 and 100

41
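
Note that `random.randint` includes both endpoints. One quick way to convince ourselves is to draw many numbers from a small range and see which values turn up (a sketch; the range 0-3 and the count of 1,000 draws are arbitrary choices):

```python
import random

# draw many numbers from a small range and collect the distinct values
seen = set()

for _ in range(1000):
    seen.add(random.randint(0, 3))

# with this many draws, we can expect all of 0, 1, 2, and 3 to show up
print(sorted(seen))
```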

There are a bunch of different forms of "import" that we can use:

1. The standard form, where we say `import MODULENAME`

In [11]:
# what happens if I tire of saying random.randint? What if I just want to say randint?
# right now, randint doesn't exist as a variable. It exists as an attribute on the random module we loaded

randint(0, 100)

NameError: name 'randint' is not defined
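
We can check this distinction between a variable and an attribute directly. The following is a sketch that assumes we haven't defined our own `randint` variable anywhere:

```python
import random

# randint is an attribute of the module object...
print(hasattr(random, 'randint'))

# ...but calling it without the module name fails, because there
# is no variable named randint in our own namespace
message = None

try:
    randint(0, 100)
except NameError as e:
    message = str(e)

print(message)
```

Trapping the `NameError` with `try`/`except` lets the cell keep running, so we can see the message without a traceback.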

In [None]:
# there are many times that we might be using a function so often that we tire of saying both
# the module name and the function name. In such cases, we want the function to be loaded as a variable,
# rather than the module

# for that, we have this syntax:

from random import randint

# the above still loads the entire random module into memory
# the above does *not* define random as a variable
# but it *does* defin