# Agenda, Week 5: Modules and packages

- Review of the challenge
- Q&A
- Modules
    - What are modules?
    - What do modules contain?
    - `import` and modules
- Writing a module
    - Creating a module file
    - Loading that module
- Modules vs. packages
- PyPI (Python Package Index)
    - What is it?
    - Downloading things with `pip`
    - Deciding what modules you want to use (and don't want to use)
- `pip` and installing packages from PyPI
- Final questions
- Where do you go from here?
    - What can you do with the information from this course?
    - What can/should you do to improve your Python further?

# Review of the challenge

In [13]:
def count_ips(filename):
    output = {}

    for one_line in open(filename):
        ip_address = one_line.split()[0]

        # if we have seen this IP address before,
        # just increment its value by 1
        if ip_address in output:
            output[ip_address] += 1

        # if this is the first time we're seeing ip_address,
        # add it to the dict with a value of 1
        else:
            output[ip_address] = 1


    return output

ip_address_counts = count_ips('mini-access-log.txt')

for key, value in ip_address_counts.items():
    print(f'{key}:{value}')

67.218.116.165:2
66.249.71.65:3
65.55.106.183:2
66.249.65.12:32
65.55.106.131:2
65.55.106.186:2
74.52.245.146:2
66.249.65.43:3
65.55.207.25:2
65.55.207.94:2
65.55.207.71:1
98.242.170.241:1
66.249.65.38:100
65.55.207.126:2
82.34.9.20:2
65.55.106.155:2
65.55.207.77:2
208.80.193.28:1
89.248.172.58:22
67.195.112.35:16
65.55.207.50:3
65.55.215.75:2


# How to sort anything, my talk from EuroPython 2021

https://www.youtube.com/watch?v=Z3c2LvEJeu0

In [4]:
s = 'abcde fg hijk lmnop qr'

# If I run str.split() on this string, I'll get back a new list of strings

s.split()  # any whitespace (space, \n, \t, \r, \v), any combination, any length

['abcde', 'fg', 'hijk', 'lmnop', 'qr']

In [5]:
# s hasn't changed at all!
s

'abcde fg hijk lmnop qr'

In [7]:
s.split(' ')   # one space character at a time is our field separator

['abcde', 'fg', 'hijk', 'lmnop', 'qr']

In [8]:
s = 'abcde   fg  hijk  lmnop    qr'

In [9]:
s.split()  

['abcde', 'fg', 'hijk', 'lmnop', 'qr']

In [10]:
s.split(' ')

['abcde', '', '', 'fg', '', 'hijk', '', 'lmnop', '', '', '', 'qr']

In [11]:
s.split()[0]   # get a new list based on s, then return the item at index 0 from that list

'abcde'

In [12]:
one_line

NameError: name 'one_line' is not defined

In [15]:
counts = {}

for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0]   # grab the IP address, the item at index 0 in each line's list
    print(ip_address)   # counts[ip_address] += 1 would raise KeyError on a first-seen address

67.218.116.165
66.249.71.65
65.55.106.183
65.55.106.183
66.249.71.65
66.249.71.65
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.106.131
65.55.106.131
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.106.186
65.55.106.186
66.249.65.12
66.249.65.12
66.249.65.12
74.52.245.146
74.52.245.146
66.249.65.43
66.249.65.43
66.249.65.43
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.207.25
65.55.207.25
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.207.94
65.55.207.94
66.249.65.12
65.55.207.71
66.249.65.12
66.249.65.12
66.249.65.12
98.242.170.241
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38


# DRY -- don't repeat yourself

1. If you have the same line repeated multiple times, then you should use a loop.
2. If you have the same code several times in the same program, then you should use a function.
3. If you have the same code several times across *multiple* programs, then you should use a *library*. In Python, our libraries are called "modules."
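
The three rules above can be sketched roughly as follows. This is only an illustration: the `greet` function, the sample names, and the `greetings.py` filename are made up for this example and are not part of the course material.

```python
# Rule 1: rather than writing one line per name, loop over the names.
names = ['Alice', 'Bob', 'Carol']   # hypothetical sample data

# Rule 2: code needed in several places in a program goes into a function.
def greet(name):
    """Return a greeting for one name."""
    return f'Hello, {name}!'

greetings = []
for one_name in names:
    greetings.append(greet(one_name))

for one_greeting in greetings:
    print(one_greeting)

# Rule 3: if several programs needed greet, it could move into its own
# file (say, a hypothetical greetings.py), and each program would import it.
```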

# Modules in Python do two things

1. They are our libraries, allowing us to reuse code and thus write more maintainable software.
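2. They are our namespaces.  A "namespace" keeps the names defined in one part of a program separate from the same names defined in another part, so that they don't collide.  When two parts of a program do use the same name for different things and interfere with each other, it's called a "namespace collision."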

Let's say that I write part of a program, and call my variable `x`.  Let's say that you are collaborating with me, and by a strange freaky accident, you also call your variable `x`.  If we combine our two programs together, what will happen?

In Python, the answer is that there likely won't be a problem, because each module is its own namespace. That is, each file (module) we work with has its own, separate set of variables.

My module's `x` is distinct from your module's `x`.  And thus we don't have to worry about collisions.
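
Here is a minimal sketch of that idea. To keep it in one runnable piece, it builds two empty module objects with `types.ModuleType` rather than using two separate files; the module names `mine` and `yours` are hypothetical stand-ins for my file and your file.

```python
import types

# Stand-ins for two separate files; normally each would be its own .py file.
mine = types.ModuleType('mine')     # hypothetical name
yours = types.ModuleType('yours')   # hypothetical name

mine.x = 100      # "my" x
yours.x = 'abc'   # "your" x: same name, different module

# Each x lives in its own namespace, so neither assignment affects the other.
print(mine.x)     # 100
print(yours.x)    # abc
```

With real files, the same thing happens automatically: each file becomes a module, and a variable `x` defined in one is reached as `mine.x`, while the other is `yours.x`.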

# To use a module, use the `import` statement

Notice a few things about `import`:

1. It's not a function. Don't use ().
2. The name that comes after the `import` keyword is the name of the module you want to load, and it is also the name of the variable that will refer to that module.  It's not a string. It's not a filename.
3. After running `import`, the module is loaded into memory, and you can use the named module.
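
The points above can be checked with a short sketch. It uses the standard library's `math` module so that the output is predictable, and it looks in `sys.modules`, the dict in which Python records the modules it has loaded; neither appears in the notes above, so treat them as illustrative choices.

```python
import sys
import math   # a bare name: no parentheses, no quotes, no ".py"

print(type(math))              # the name math now refers to a module object
print('math' in sys.modules)   # True: the module has been loaded into memory
print(math.sqrt(16))           # 4.0: use the module's contents via math.NAME
```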

In [16]:
import random

In [17]:
# what is the value of "random"?
type(random)

module

In [22]:
random.randint(0, 100)   # we'll call the randint function in the random module, passing (0, 100)

32

In [None]:
# what other functions (and data) are available to us via the module?

# option 1: use "dir" on the module object, and find what attributes it defines
dir(random)