# Week 5 agenda

1. Review last week's challenge
2. Modules and packages
    - Importing modules
    - Using modules
    - Writing modules (a tiny bit -- needs an external editor)
    - PyPI
    - `pip` and installing packages from the Internet
3. General Q&A about Python, software, etc.    

In [1]:
# challenge program:

def count_ips(filename):
    output = {}
    for one_line in open(filename):
        fields = one_line.split()
        if not fields:              # skip blank lines, which have no fields
            continue
        ip_address = fields[0]

        if ip_address in output:    # have we seen this IP address already?
            output[ip_address] += 1 # if so, add 1 to the count
        else:
            output[ip_address] = 1  # otherwise, set it to 1

    return output

# count_ips('logfile.txt')        


# Modules and packages

Let's start with my favorite programming rule, DRY (don't repeat yourself):

1. If we have several lines repeated in our program, we can replace them ("DRY up our code") with a loop.
2. If we have the same code repeated in multiple places in our program, we can replace it with a function.
3. If we have the same code repeated in multiple programs, we can use a *library*.  Or, as it's known in Python, a *module*.

# Using a module in Python

In order to use a module in Python, we must "import" it.  This gives us access to whatever the module has defined. That'll typically be:

- Data structures
- Functions
- Entirely new types of data ("classes")

The `import` statement in Python is *not* a function! It's a statement -- so don't try to use it with parentheses. Think of `import` sort of like `def`.  `def` creates a new function object, and assigns it to a variable.  In the same way, `import` creates a new module object, and assigns it to a variable.

If I say `import abcd`, the variable `abcd` will then be defined, and it'll contain a module object.  Assuming, of course, that `abcd` exists as a module on your computer.
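
To make the comparison with `def` concrete, here's a small sketch. It uses the standard-library `math` module purely as an example (any module on your computer would behave the same way), and the variable name `m` is made up for the demonstration:

```python
import math            # creates a module object and assigns it to the variable "math"

print(type(math))      # the variable refers to a module object
print(math.sqrt(16))   # names defined by the module are reached via the dot

m = math               # a module is an ordinary object, so a second variable can refer to it
print(m is math)       # both variables refer to the very same module object
```

Nothing here needs parentheses after `import`, just as we don't put parentheses after `def`.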

In [None]:
import random  

In [3]:
type(random)   # what kind of data does the "random" variable contain?

module

In [4]:
# once I've imported the module, I have access to all of the data and functions that it defines.
# I can access those via a .
# meaning: MODULENAME.DATA or MODULENAME.FUNCTION
# then I just use the data, or use the function, as per usual.

# for example, the "random" module defines the "randint" function.  I can call it as follows:

random.randint(0, 100)

3

In [6]:
# we can ask a module to print itself out ("printed representation" of an object)
random

<module 'random' from '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py'>

# How does Python know where to find `random` and load it?

If we say `import random`, Python looks for a file called `random.py`, where `py` is the standard Python suffix for program files.

Where does it look for `random.py`?

It looks in a whole bunch of directories, known as the "search path." It looks through each of the directories in this path, one at a time.  The first directory in which it finds `random.py` wins, and the search ends.

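This means that module import is a matter of "first come, first served": whichever matching file appears earliest in the search path is the one that gets loaded.  If Python doesn't find a matching name anywhere in its search path, it raises a `ModuleNotFoundError`.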

In [7]:
import sys     # sys is a special module -- it describes your Python running environment

In [8]:
sys.version    # what version of Python am I running?

'3.10.4 (main, Apr 26 2022, 19:42:59) [Clang 13.1.6 (clang-1316.0.21.2)]'

In [9]:
sys.path       # this is a list of strings -- the search path for modules we import

['/Users/reuven/Courses/Current/oreilly-2022-q2-first-steps',
 '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python310.zip',
 '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10',
 '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/lib-dynload',
 '',
 '/usr/local/lib/python3.10/site-packages',
 '/usr/local/lib/python3.10/site-packages/argclass-0.1.2-py3.10.egg',
 '/usr/local/Cellar/pybind11/2.9.2/libexec/lib/python3.10/site-packages',
 '/usr/local/lib/python3.10/site-packages/IPython/extensions',
 '/Users/reuven/.ipython']
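
The "first match wins" search can be sketched in a few lines. This is a deliberately simplified imitation, not what Python really runs: it only looks for plain `.py` files and ignores packages, zip files, and built-in modules. The function name `first_match` is my own invention:

```python
import os
import sys

def first_match(module_name):
    """Return the first NAME.py found along sys.path, or None if there isn't one."""
    filename = module_name + '.py'
    for one_directory in sys.path:
        # an empty string in sys.path means "the current directory"
        candidate = os.path.join(one_directory or '.', filename)
        if os.path.isfile(candidate):
            return candidate      # first directory with a match wins; stop searching
    return None                   # no directory in the search path had the file

print(first_match('random'))
print(first_match('no_such_module_xyz123'))
```

If you want to know what the real import machinery would pick, `importlib.util.find_spec('random').origin` reports it.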

# Exercise: Character classification

1. Import the `string` module in Python.  This module used to have a lot of functionality, but most of it was moved into methods on the `str` (string) class.  However, it still defines a few variables that can be useful, such as `string.digits` (all digits), `string.punctuation` (punctuation), and `string.ascii_letters` (letters).
2. Define a dict with three keys -- `digits`, `punctuation`, and `letters`, and set the value to be 0 in each.
3. Ask the user to enter a string.
4. Go through the string, one character at a time:
    - If the character is a digit, add 1 to the `digits` value
    - If the character is punctuation, add 1 to the `punctuation` value
    - If the character is a letter, add 1 to the `letters` value
5. Print out the resulting dict

In [10]:
import string

In [11]:
string.digits

'0123456789'

In [12]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [13]:
string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [20]:
counts = {'digits':0, 'punctuation':0, 'letters':0}

s = input('Enter a string: ').strip()

for one_character in s:
    if one_character in string.digits:
        counts['digits'] += 1
    elif one_character in string.punctuation:
        counts['punctuation'] += 1
    elif one_character in string.ascii_letters:
        counts['letters'] += 1
    else:
        print(f'Ignoring character "{one_character}" ({ord(one_character)})')
        
print(counts)        

Enter a string: hello in Hebrew is שלום
Ignoring character " " (32)
Ignoring character " " (32)
Ignoring character " " (32)
Ignoring character " " (32)
Ignoring character "ש" (1513)
Ignoring character "ל" (1500)
Ignoring character "ו" (1493)
Ignoring character "ם" (1501)
{'digits': 0, 'punctuation': 0, 'letters': 15}
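
The Hebrew letters were ignored because `string.ascii_letters` contains only ASCII letters. Here's one possible variation, assuming we'd rather count every letter that Python knows about. It uses the `str.isdigit` and `str.isalpha` methods instead of `string.digits` and `string.ascii_letters`. The function name `classify` is made up for this sketch, and it takes the string as an argument rather than asking the user for it:

```python
import string

def classify(s):
    counts = {'digits': 0, 'punctuation': 0, 'letters': 0}

    for one_character in s:
        if one_character.isdigit():                 # also True for some non-ASCII digits
            counts['digits'] += 1
        elif one_character in string.punctuation:   # still ASCII punctuation only
            counts['punctuation'] += 1
        elif one_character.isalpha():               # True for letters in any alphabet
            counts['letters'] += 1

    return counts

print(classify('hello in Hebrew is שלום'))
```

With this version, the four Hebrew characters are counted as letters, and only the spaces are skipped.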


# Three examples (so far) of modules

- `random`, which contains functions for generating random numbers
- `sys`, which contains information about the Python language and the system it's running on
- `string`, which defines a number of variables we can use for classifying characters

In [18]:
s = 'abcdefghij'

# I can use the random.choice method to retrieve a random element of s

random.choice(s)

'i'

# How can I know what a module provides?

1. Use `dir`
2. Use `help`
3. Use the Python documentation site

In [21]:
# the "dir" function in Python, when applied to a module, shows us all of the names 
# available via that module

dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_ONE',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_floor',
 '_index',
 '_inst',
 '_isfinite',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randbytes',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

In [22]:
# the list of strings we get back from "dir" describes the attributes we can use
# on the module itself.  So if 'abcde' is an element shown by "dir", we can then say

# module.abcde

# `_` in Python names

If a name starts with `_`, that is supposed to mean it's private -- we shouldn't use it, because it might change, or it's internal to an object.  People often do use these, but then it's their fault if something goes wrong.

If a name starts and ends with `__` (double underscore), then we often call it "dunder" in Python. For example, `__str__` is pronounced "dunder str."  Usually, these are methods that we don't call directly, but which Python knows to invoke at specific times and particular circumstances.
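
Since the output from `dir` mixes private and public names, it can help to filter out anything starting with `_`. Here's a minimal sketch; the variable name `public_names` is just my choice:

```python
import random

public_names = []
for one_name in dir(random):
    if not one_name.startswith('_'):    # skips both _private and __dunder__ names
        public_names.append(one_name)

print(public_names)
```

What's left is the part of the module that we're meant to use, such as `randint`, `choice`, and `shuffle`.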

In [23]:
# we can also use the "help" function to find out about an entire module, or about one name in the module

help(random)

Help on module random:

NAME
    random - Random variable generators.

DESCRIPTION
        bytes
        -----
               uniform bytes (values between 0 and 255)
    
        integers
        --------
               uniform within range
    
        sequences
        ---------
               pick random element
               pick random sample
               pick weighted random sample
               generate random permutation
    
        distributions on the real line:
        ------------------------------
               uniform
               triangular
               normal (Gaussian)
               lognormal
               negative exponential
               gamma
               beta
               pareto
               Weibull
    
        distributions on the circle (angles 0 to 2pi)
        ---------------------------------------------
               circular uniform
               von Mises
    
    General notes on the underlying Mersenne Twister core generator:
    


In [24]:
# If I want to know about one thing, I can ask for help on that specific thing

help(random.randint)

Help on method randint in module random:

randint(a, b) method of random.Random instance
    Return random integer in range [a, b], including both end points.



In [25]:
help(random.random)

Help on built-in function random:

random() method of random.Random instance
    random() -> x in the interval [0, 1).



In [26]:
random.random()

0.7056282600553893

In [None]:
# what is random.random?

# - the module random
# - inside of the module random, we have the function random

# these are totally separate!

In [27]:
# A final place to get documentation for Python modules, especially ones that come with the language,
# is docs.python.org

# Functions vs. methods

Both functions and methods are verbs in Python -- they both do things.

They also both:

- Are invoked with `()`
- Can take arguments
- Return values

The difference is that functions are free floating.  We just call a function, and don't preface its name with a `.`.  By contrast, methods are always attached to objects.

For example:

```python
s = 'abcd'

len(s)     # we call the function len, which returns the integer 4
s.upper()  # we call the method upper on s
```

The exception to this rule is that when you load a function from a module, its name also comes after a `.`, as in `random.randint`.  That can make it harder to tell, just from looking at the code, whether you're calling a function or a method.

# Next up

1. Different forms of `import`
2. We'll develop our own simple module



In [28]:
import random   # this defines the "random" variable, and I can access the module's components via it

In [29]:
random.randint(0, 100)

86

In [30]:
# can I just call randint without the preceding "random." ?
randint(0, 100)

NameError: name 'randint' is not defined

In [31]:
# If I'm going to use random.randint a lot, it's annoying that I have to say random.randint,
# and I cannot just say randint

# to handle this, we have an alternative import syntax

from random import randint

# This 
# (1) imports the random module into memory
# (2) defines the randint variable to refer to random.randint
# (3) BUT BUT BUT it does *not* define the "random" variable!

In [32]:
randint(0, 100)

1
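
We can check all three of those claims. In this sketch I'm using the standard-library `statistics` module and its `mean` function, only because nothing else in this notebook has imported it; the variable `module_name_defined` is made up for the demonstration:

```python
import sys

from statistics import mean           # load the module, but define only "mean"

print(mean([10, 20, 30]))             # the function is available directly

print('statistics' in sys.modules)    # the module itself was loaded into memory

try:
    statistics                        # ... but no variable called "statistics" was defined
    module_name_defined = True
except NameError:
    module_name_defined = False

print(module_name_defined)
```

`sys.modules` is the dict in which Python keeps track of every module it has loaded, whether or not we have a variable referring to it.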

In [33]:
import random                        # everything in the random module is available as random.NAME

from random import randint           # only randint is available

from random import randint, choice   # only randint and choice are available

In [34]:
# what if I want to import a module, but I want to assign it to a different variable name?

import random as r    # (1) load the random module, (2) assign it to the variable r

In [35]:
r.randint(0, 100)     # r is now an alias to random

99

In [36]:
# in the world of data science, we use packages like NumPy and Pandas, which are traditionally
# loaded as follows:

import numpy as np
import pandas as pd



In [37]:
# we can also rename anything we've loaded via "from .. import"

from random import randint as ri   # now the function randint is available as "ri"

In [38]:
ri(0, 100)

75

In [39]:
# this is a popular version of from .. import

# from random import *

# I BEG YOU -- never use this!

# What's wrong with `from .. import *`?

There are at least two problems, one practical and one philosophical.

Practically, you're saying: Whatever names are defined in that module, I want them to be defined as global variables in my program.  That raises all sorts of problems:

- What if a name from that module conflicts with a name in your program? Which gets priority?
- What if the module is updated, and you don't pay attention, and then a name conflicts with one of yours?

Philosophically, programmers have tried for *decades* to avoid defining too many global variables, which inevitably cause problems.  Here, you're saying: Please! Give me lots of new global variables!  Keeping track of things will be very hard.

We want to have lots of namespaces.  We want to separate names, for easier understanding of our code, and `import *` undoes that.
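
Here's a small but real name conflict. The standard-library `math` module defines its own `pow`, which behaves differently from the built-in `pow`. I'd suggest running this sketch in a scratch session, not in your real notebook, because the `import *` line really does add dozens of names to your globals:

```python
before = pow(2, 3)       # the built-in pow, which returns an integer here

from math import *       # brings in math.pow, along with every other public name in math

after = pow(2, 3)        # the same call now reaches math.pow, which returns a float

print(type(before), type(after))
```

Nothing warned us that `pow` had been replaced. The identical line of code simply started doing something different.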

# Summarizing syntax of `import`

- `import MODNAME`
- `import MODNAME as A`
- `from MODNAME import THING`
- `from MODNAME import THING as A`
- `from MODNAME import *`

In [40]:
import hello

Hello


In [41]:
import mymod

In [42]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

In [43]:
mymod.__file__

'/Users/reuven/Courses/Current/oreilly-2022-q2-first-steps/mymod.py'

In [44]:
random.__file__

'/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py'

In [45]:
string.__file__

'/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/string.py'

In [46]:
pd.__file__

'/usr/local/lib/python3.10/site-packages/pandas/__init__.py'

In [47]:
mymod.__name__

'mymod'

In [48]:
string.__name__

'string'

In [49]:
pd.__name__

'pandas'

In [50]:
import mymod

In [51]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

# `import` only works once!

`import` normally does two things:

1. Loads the module
2. Defines the variable

However, it only loads the module *once* per Python session.  If you're running an actual program, that's not an issue.  But if you're in Jupyter or a debugger, it is.
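
If you've edited a module and want the new version without restarting, the standard library's `importlib.reload` function re-runs the module's code. Here's a sketch of the whole cycle. The module name `reload_demo` and the temporary directory are made up for this demonstration, so that it doesn't touch any real files:

```python
import importlib
import pathlib
import sys
import tempfile

# create a throwaway module file (hypothetical name) in a temporary directory
module_dir = tempfile.mkdtemp()
module_path = pathlib.Path(module_dir) / 'reload_demo.py'
module_path.write_text('x = 100\n')

sys.path.insert(0, module_dir)         # add the directory to the search path

import reload_demo                     # first import: the file is loaded
first_value = reload_demo.x

module_path.write_text('x = 2000\n')   # now "edit" the file on disk

import reload_demo                     # already loaded, so the file is NOT read again
second_value = reload_demo.x

importlib.reload(reload_demo)          # explicitly load the file again
third_value = reload_demo.x

print(first_value, second_value, third_value)
```

The second `import` changes nothing, which is exactly the surprise described above. Only the call to `importlib.reload` picks up the edit.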

In [1]:
import mymod

In [2]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'months',
 'x',
 'y']

In [3]:
mymod.x

100

In [4]:
mymod.y

[10, 20, 30]

In [5]:
mymod.months

{'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6}

In [6]:
mymod.months['Mar']

3

In [7]:
mymod.hello('world')

'Hello, world!'
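
The contents of `mymod.py` aren't shown in this notebook. Judging only from the outputs above, a file along these lines would produce them. Treat it as my reconstruction, not as the actual file; in particular, the parameter name `name` is a guess:

```python
# mymod.py (hypothetical contents, inferred from the outputs above)

x = 100
y = [10, 20, 30]
months = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6}

def hello(name):
    return f'Hello, {name}!'
```

Each name defined at the top level of the file (`x`, `y`, `months`, and `hello`) is what shows up in `dir(mymod)` after the `__dunder__` names.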

# Exercise: Menu

1. Create a new file, called `menu.py`. (You can use Jupyter for this, if you don't have a favorite editor.)
2. In `menu.py`, define a function called `menu`.  (Yes, this means that it'll be accessed as `menu.menu`.)
3. The `menu` function should take a list of strings as an argument.
    - Display the list of strings to the user, and ask them to choose one
    - If they choose an element from the list, then return their choice
    - If they choose something not on the list, scold them and have them try again

Basically, then, calling `menu.menu(['a', 'b', 'c'])` guarantees that we'll get a return value of either `a`, `b`, or `c`.

From Jupyter (or elsewhere), write a short program that invokes `menu.menu`:

```python
import menu
user_choice = menu.menu(['a', 'b', 'c'])
print(user_choice)
```

In [8]:
import menu
user_choice = menu.menu(['a', 'b', 'c'])
print(user_choice)


Enter a choice (a/b/c): q
q is not a valid choice.
Enter a choice (a/b/c): what?
what? is not a valid choice.
Enter a choice (a/b/c): oh well
oh well is not a valid choice.
Enter a choice (a/b/c): b
b
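
The `menu.py` file itself isn't shown here. One possible version, written to match the prompts and messages in the session above, might look like this. It's a sketch, and the parameter name `options` is my own:

```python
# menu.py (one possible solution, not necessarily the one used in class)

def menu(options):
    prompt = f"Enter a choice ({'/'.join(options)}): "

    while True:                                      # keep asking until we get a valid answer
        user_choice = input(prompt).strip()

        if user_choice in options:                   # valid? then we're done
            return user_choice

        print(f'{user_choice} is not a valid choice.')   # otherwise, scold and ask again
```

Because the only way out of the `while True` loop is the `return` line, the function can only ever return one of the strings it was given.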


In [1]:
import mymod

Hello from mymod!
Goodbye from mymod!


# Variables (inside) and attributes (outside)

Any variable (or function) we define inside of the module is visible to anyone who imports the module (outside) as an attribute on the module object.


### `x`
- Inside of `mymod.py`, we define the variable `x`.
- Outside of `mymod.py`, when we `import mymod`, we have access to `mymod.x`.

### `y`
- Inside of `mymod.py`, we define the variable `y`.
- Outside of `mymod.py`, when we `import mymod`, we have access to `mymod.y`.

### `hello`
- Inside of `mymod.py`, we define the function `hello`.
- Outside of `mymod.py`, when we `import mymod`, we have access to `mymod.hello`.

### `__name__`
- Outside of `mymod.py`, when we `import mymod`, we have access to `mymod.__name__`.
- So, inside of `mymod.py`, we have access to the variable `__name__`?



In [2]:
mymod.__name__

'mymod'
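
We can test that question directly. This sketch writes a tiny throwaway module that stores whatever `__name__` looks like from the inside, and then imports it. The module name `name_demo` and the variable `name_seen_inside` are made up for the demonstration:

```python
import pathlib
import sys
import tempfile

# create a throwaway module (hypothetical name) whose only job is to record __name__
module_dir = tempfile.mkdtemp()
module_path = pathlib.Path(module_dir) / 'name_demo.py'
module_path.write_text('name_seen_inside = __name__\n')

sys.path.insert(0, module_dir)          # add the directory to the search path
import name_demo

print(name_demo.name_seen_inside)       # __name__, as seen from inside the module
print(name_demo.__name__)               # __name__, as seen from outside
```

Both lines print the same string, so yes: inside an imported module, `__name__` is a variable containing the module's name.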

In [1]:
import mymod

Hello from mymod!
Goodbye from mymod!
