# Agenda: Day 5 (modules and packages)

1. What are modules? What do they contain?
2. `import` -- and its different variations
3. Re-importing modules -- when does this matter, and how do we do it?
4. A little bit about writing our own modules
5. Python's standard library
6. PyPI -- Python Package Index, and installing things with `pip` (warning: this might not work for you)
7. Where to from here?
    - How/where can you improve your Python after this course?
    - What could/should you study next?

# Modules -- what are they?

Remember the DRY rule -- "Don't Repeat Yourself!"

- If you have several lines in a row that roughly repeat themselves, you can use a loop.
- If you have the same code in several different places in your program, you can use a function.
- If you have the same code in several *different* programs, you can use a library.

Every programming language (that I know of) supports the use of external libraries. Those libraries contain data structures, functions, and classes that we might want to use in more than one program.

This means that if there's functionality which repeats in my program, then I can benefit by putting it in a library and then using it not just in the current program, but in future programs. If there's functionality that'll help me (or my team, or my company, or the world) beyond my current project, then putting things in a library makes a lot of sense.

In Python, we have libraries -- but we call them "modules."

Modules do two things:

1. They are libraries, as described here.
2. They are also *namespaces*, ensuring that variable names don't collide with one another.

Namespaces are sort of like last names for variables -- they greatly reduce the chance that two variables with the same name will collide.
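Here is a small sketch of namespaces in action. It uses two standard-library modules that each define a function named `sqrt`: `math` (real numbers) and `cmath` (complex numbers). Each name lives inside its own module, so neither clashes with the other, or with a variable of our own that happens to be called `sqrt`.

```python
import math
import cmath

# Both modules define a function named "sqrt", but each lives in its own namespace
print(math.sqrt(16))     # 4.0 (real square root)
print(cmath.sqrt(-16))   # 4j (complex square root)

# Our own variable named "sqrt" doesn't collide with either module's function
sqrt = 'just a string'
print(math.sqrt(9))      # 3.0
print(sqrt)              # just a string
```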

Imagine having to write the following kinds of software yourself, from scratch:
- Reliable login systems, including cryptography
- Printing graphics on a printer
- Translating Python into C
- The client side of an API, so that you can access all sorts of online services (many modern libraries do exactly this)

Thanks to modules, we usually don't have to.

# Using modules with `import`

If you want to use a module in Python, you must first `import` it. The syntax looks like this:

    import random

Notice a few things about this:

1. I don't put the word `random` in quotes. That's because it's not a string, but rather the name of the variable to which the new module object will be assigned.
2. I also don't use parentheses after `import` -- it's not a function! Only function calls use `()`.
3. I'm not giving a filename to `import` (note: no `.py` suffix), but rather the name of a module that I expect to be available in my current Python installation.

In [1]:
import random

In [2]:
# what did we get in the module object?
type(random)

module

In [4]:
# what can we do with a module object?
# not much! We can retrieve items from its attributes:

random.randint(0, 10)    # because "randint" is a function defined in the "random" module

5

In [5]:
# what if you don't know what attributes are available?
dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_ONE',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_floor',
 '_index',
 '_inst',
 '_isfinite',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randbytes',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']
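`dir` also lists internal names, which by convention start with an underscore. A small sketch that filters those out, leaving only the module's public names, and then reads the documentation of one of them via its `__doc__` attribute:

```python
import random

# Names starting with "_" are internal by convention, so skip them
public_names = [one_name for one_name in dir(random)
                if not one_name.startswith('_')]
print(public_names)

# Every function carries its documentation string in __doc__
print(random.randint.__doc__)
```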

# Exercise: Count punctuation

1. Write a function, `count_punctuation`, which takes a string and returns the number of punctuation characters in it.
2. Set `total` to 0.
3. Go through each character, checking whether it's a punctuation mark. You can find out which characters are considered punctuation by checking `string.punctuation`.
4. If so, add 1 to `total`.
5. Return `total` when you're done.

Example:

    count_punctuation('He is really a what?!?')
    3

If I mention `string.punctuation` in my description, that means you should `import` the `string` module and then check its `string.punctuation` attribute to know whether the current character is punctuation.

In [7]:
# given a string, count the punctuation marks
import string
s = 'He is really a what?!?'
total = 0

for one_character in s:
    if one_character in string.punctuation:
        total += 1

print(total)


3


# If I want to call random.randint, I can:



In [8]:
random.randint(0, 100)

91

In [9]:
# what if I want to just say "randint"?

# I can't right now:

randint

NameError: name 'randint' is not defined

In [12]:
# I can use randint, but only via the "random" namespace
# If I want to call randint by itself, rather than via its module, can I?

from random import randint 

# This makes it possible to invoke "randint" without "random".

randint(0, 100)

91

In [13]:
# let's make this a function now:

import string

def count_punctuation(s):
    total = 0
    
    for one_character in s:
        if one_character in string.punctuation:
            total += 1
    
    return total    # return (rather than print) the count, as the exercise asks


In [17]:
count_punctuation('hello out there?!?')

3


In [18]:
# let's use from..import syntax now

from string import punctuation

def count_punctuation(s):
    total = 0
    
    for one_character in s:
        if one_character in punctuation:
            total += 1
    
    return total


In [19]:
count_punctuation('hello! hello?')

2
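Once the loop version works, a more compact sketch of the same function uses a generator expression with `sum`, adding 1 for each punctuation character. This is an alternative formulation, not a replacement for the loop we developed; it also prints `string.punctuation`, so you can see exactly which characters count.

```python
import string

print(string.punctuation)   # the characters that count as punctuation

def count_punctuation(s):
    """Return how many characters in s appear in string.punctuation."""
    # sum() adds a 1 for every character that is a punctuation mark
    return sum(1 for one_character in s if one_character in string.punctuation)

print(count_punctuation('He is really a what?!?'))   # 3
print(count_punctuation('hello! hello?'))            # 2
```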


# Different forms of `import`

1. `import MODNAME` -- we define a new variable, MODNAME, referring to the module; all of the module's names are attributes on it
2. `import MODNAME as ALIAS` -- we define a new variable, ALIAS, referring to the module loaded from MODNAME (e.g., MODNAME.py)
3. `from MODNAME import NAME` -- we can now refer to NAME by itself, rather than as MODNAME.NAME (MODNAME itself is not defined)
4. `from MODNAME import NAME as ALIAS` -- we can now refer to NAME via ALIAS, rather than as MODNAME.NAME
5. `from MODNAME import *` -- this is a HORRIBLE, HORRIBLE, AWFUL IDEA! It dumps all of the module's public names into your namespace, silently overwriting any of your own variables that have the same names.
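Here is a sketch of all five forms side by side, using `random` and `math` from the standard library. The alias `roll` and the variable `e` are my own, chosen for illustration; the last part shows concretely why `import *` is dangerous.

```python
# 1. import MODNAME: the module is available under its own name
import random
print(random.randint(1, 6))

# 2. import MODNAME as ALIAS: the same module object, under a shorter name
import random as r
print(r is random)            # True

# 3. from MODNAME import NAME: just one name, no module prefix needed
from random import randint
print(randint(1, 6))

# 4. from MODNAME import NAME as ALIAS: one name, under a name of our choosing
from random import randint as roll
print(roll(1, 6))

# 5. Why "from MODNAME import *" is dangerous: it silently overwrites our names
e = 'my own variable'
from math import *            # math defines "e" (and "pow"), replacing ours
print(e)                      # 2.718281828459045
```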

In [21]:
# let's assume that you use randint in your module.
# you could say

from random import randint

# what if we have another function or variable with the same name? Then we'll have a name collision!

# First: This is why namespaces exist, so we can work without having to worry about variable names colliding.
# BUT I've heard of companies that basically outlaw the use of "from .. import", because it leads to great
# ambiguity: the names are no longer contextualized.

# We have another solution: We can alias our module or its imported name

import random as r   # the idea is that "r" is an alias to our random module.

In [22]:
random

<module 'random' from '/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/random.py'>

In [23]:
r

<module 'random' from '/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/random.py'>

In [24]:
random is r

True

In [26]:
# some aliases are widely used conventions

import numpy as np
import pandas as pd

# Exercise: Files' first lines

1. The `glob` module in Python gives us access to pattern-matching for filenames. You can thus run `glob.glob("*.txt")`, and you'll get back a list of strings that match that pattern.
2. Each string is a filename.
3. Iterate over the list of filenames, open each file, and print the first line of each file.

Try to keep your pattern as ordinary as possible, matching only plain text files that you know can be opened and read. In a real program, we could recover from a file that can't be opened; here, we don't yet have exception handling, so just be careful.

    glob.glob('*.txt')  # this returns a list of filenames

- Iterate over that list
- Open the file, get the first line

Note that `f.readline()` (singular!) returns one line from a file.

- Is it better to use `import` here? Or is `from .. import` better? What are the trade-offs?

Another way:
- Take the list we get from `glob.glob` (a list of strings, each a filename)
- Go through the filenames, one by one
- Open each file and iterate over its lines, but immediately `break` after the first one


In [28]:
# it's very clear in this version that we know what module was imported, and where the function came from

import glob   # this means that the module object is available, if we want any other functionality

pattern = '*.txt'

glob.glob(pattern)

['myconfig.txt',
 'mini-access-log.txt',
 'nums.txt',
 'shoe-data.txt',
 'linux-etc-passwd.txt',
 'wcfile.txt',
 'myfile.txt']

In [29]:
# this keeps things somewhat ambiguous -- where did this "glob" function come from?

from glob import glob  # only the "glob.glob" function is available. Other functionality is not.

pattern = '*.txt'

glob(pattern)

['myconfig.txt',
 'mini-access-log.txt',
 'nums.txt',
 'shoe-data.txt',
 'linux-etc-passwd.txt',
 'wcfile.txt',
 'myfile.txt']

In [32]:
for one_filename in glob(pattern):
    print(one_filename)
    with open(one_filename) as f:    # "with" closes the file for us
        print(f'\t{f.readline()}')

myconfig.txt
	a=10

mini-access-log.txt
	67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"

nums.txt
	5

shoe-data.txt
	Adidas	orange	43

linux-etc-passwd.txt
	# This is a comment

wcfile.txt
	This is a test file.

myfile.txt
	hello again



In [33]:
for one_filename in glob(pattern):
    print(one_filename)
    with open(one_filename) as f:
        for one_line in f:
            print(one_line)
            break                    # stop after the first line

myconfig.txt
a=10

mini-access-log.txt
67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"

nums.txt
5

shoe-data.txt
Adidas	orange	43

linux-etc-passwd.txt
# This is a comment

wcfile.txt
This is a test file.

myfile.txt
hello again
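Because `readline()` keeps the newline at the end of the line, and `print` adds another, the outputs above contain blank lines. Here is one possible variation, using a hypothetical helper called `first_lines` that returns a dict mapping each filename to its first line, with the newline stripped. It uses plain `import glob`, so it's clear where the function comes from. Passing `errors='replace'` to `open` is my own addition, so that a file that isn't valid text in the default encoding doesn't crash the loop.

```python
import glob

def first_lines(pattern):
    """Return a dict mapping each filename that matches pattern to its first line."""
    result = {}
    for one_filename in glob.glob(pattern):
        with open(one_filename, errors='replace') as f:
            # readline() keeps the trailing "\n"; rstrip removes it
            result[one_filename] = f.readline().rstrip('\n')
    return result

for one_filename, one_line in first_lines('*.txt').items():
    print(one_filename)
    print(f'\t{one_line}')
```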



# Next up

- How do modules work (a little)
- Python standard library

In [35]:
# import -- no parentheses
# mymod -- not a filename
# mymod -- we don't add the .py suffix
# mymod -- we don't put quotes around its name

import mymod

In [36]:
type(mymod)

module

In [37]:
mymod

<module 'mymod' from '/Users/reuven/Courses/Current/2023-08August-Python/mymod.py'>

In [38]:
# what was defined on this module? Let's check its attributes with "dir"

dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

Python looks for modules in all of the directories named in the `sys.path` list of strings. When you use `import`, it goes through each of these directories, one at a time, looking for your module.
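Because `sys.path` is just a list of strings, you can inspect it like any other list. A minimal sketch: it prints each directory (an empty string stands for the current directory) and then uses a module's `__file__` attribute to show where `import` actually found it. The exact paths will differ from machine to machine.

```python
import sys
import random

# sys.path is an ordinary list of strings; import searches these directories in order
for one_directory in sys.path:
    print(repr(one_directory))

# A module loaded from a .py file records where it was found
print(random.__file__)
```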

In [39]:
import mymod

In [40]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'x',
 'y']

# Modules normally only load once

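If you `import` a module in a Python program, it is loaded only once: any later `import` of the same module reuses the already-loaded module object (Python caches it in `sys.modules`) until the program exits. This means that when you're working in Jupyter, if you import a module and then change its file, you'll normally need to reload it.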
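The standard way to reload an already-imported module is `importlib.reload`. The sketch below is self-contained: it writes a throwaway module, `demo_reload_mod.py` (a hypothetical name), into a temporary directory, imports it, edits the file, and shows that a second `import` changes nothing, while `importlib.reload` picks up the new code. In Jupyter, you would normally just edit your own module and call `importlib.reload(mymod)`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a small, throwaway module in a temporary directory (demo only)
module_dir = tempfile.mkdtemp()
module_file = Path(module_dir) / 'demo_reload_mod.py'
module_file.write_text('x = 1\n')
sys.path.insert(0, module_dir)        # let import find the new directory

import demo_reload_mod
print(demo_reload_mod.x)              # 1

# Edit the file on disk (a different length, so cached bytecode is rebuilt)
module_file.write_text('x = 100\n')

import demo_reload_mod                # no effect: already in sys.modules
print(demo_reload_mod.x)              # still 1

importlib.reload(demo_reload_mod)     # re-runs the module's file
print(demo_reload_mod.x)              # 100
```

One caveat: `reload` re-runs the file inside the existing module object, but any names you pulled in earlier with `from ... import` still refer to the old objects.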


In [41]:
# how do I get access to the variables and function I wrote:
# I just access them as attributes of the module

mymod.x

100

In [42]:
mymod.y

[10, 20, 30]

In [43]:
mymod.hello('world')

'Hello, world!'

In [44]:
import mymod

Hello from mymod!
Goodbye from mymod!


In [45]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'x',
 'y']

In [46]:
mymod.__name__    # "dunder name" -- double underscore name

'mymod'

In [47]:
import mymod

Hello from mymod!
Goodbye from mymod!


# `__name__` is a special variable

The variable `__name__` is always defined in Python. It is a string indicating whether the current namespace (i.e., where global variables are stored) belongs to the program that Python started running, or to a module that was imported.

- If `__name__` is accessed from a file that was imported, then `__name__` is the name of the module (e.g., `'mymod'`).
- If `__name__` is accessed from the file that Python ran first (e.g., via `python myprog.py` on the command line), then it is the special string value `'__main__'`.

This allows our modules to distinguish between when they are running as standalone programs, and when they are being imported as modules. How would I distinguish?

One of the most famous lines in all of Python:

```python
if __name__ == '__main__':
    # stuff goes here, assuming that the program was *not* imported, and this is a standalone call
    pass
```
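Here is a minimal sketch of how that line is typically used. The file name `greeter.py` and its `hello` function are hypothetical. When another program runs `import greeter`, only the definitions take effect (plus the diagnostic print of `__name__`); the greeting at the bottom runs only when the file itself is executed, e.g., with `python greeter.py`.

```python
# greeter.py -- a hypothetical module

def hello(name):
    """Return a greeting; other programs can use this after "import greeter"."""
    return f'Hello, {name}!'

print(f'__name__ is {__name__!r}')   # 'greeter' when imported, '__main__' when run

if __name__ == '__main__':
    # Runs only when this file is executed directly, not when it is imported
    print(hello('world'))
```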

In [48]:
import mymod

# Exercise: Create and import a module

1. Using Jupyter's file-editor capability, create a file called `littlemod.py`. In that file, define a variable `name` to be your name (a string).
2. Import the module into Jupyter, and see that you can access that variable.

In [49]:
import littlemod

In [50]:
littlemod.name

'Reuven'

In [54]:
dir(littlemod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'name']

In [59]:
import littlemod

In [60]:
dir(littlemod)

['__builtins__',
 '__cached__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'greet',
 'name']

In [61]:
littlemod.greet('Reuven')

'Hello, my amazing boss, Reuven!'

In [62]:
littlemod.greet('someone else')

'Hello, someone else, whoever you are'
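The contents of `littlemod.py` aren't shown here, so the following is only one possible version, reconstructed from the outputs above: a module-level `name` variable plus a `greet` function. The parameter name `someone` is my own choice.

```python
# littlemod.py -- one possible version (the original file isn't shown)

name = 'Reuven'

def greet(someone):
    """Return a special greeting for the module's owner, and a generic one otherwise."""
    if someone == name:
        return f'Hello, my amazing boss, {someone}!'
    return f'Hello, {someone}, whoever you are'
```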

# Python's standard library

We can write modules, of course. But Python comes with a huge number of modules when you download and install the language. For years, people talked about Python as "batteries included," meaning that much of what you would need was likely in the standard library, and thus automatically available with a simple `import`.

Anything that comes with the standard library is available to anyone with the same version of Python on their computer (aside from a handful of platform-specific modules, such as `winreg` on Windows). This means that if I write a program with Python 3.11 on my Mac, using a bunch of modules from the standard library, and then give my program to you on Windows (but also with Python 3.11), you can run the program without any fear that those modules will be missing.
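As a small illustration of "batteries included," the sketch below uses three standard-library modules, chosen arbitrarily: `statistics`, `json`, and `datetime`. None of them requires installing anything beyond Python itself.

```python
import json
import statistics
from datetime import date

# All three modules ship with Python: no installation needed
print(statistics.mean([10, 20, 30]))              # 20
print(json.dumps({'course': 'Python', 'day': 5})) # {"course": "Python", "day": 5}
print(date(2023, 8, 1).isoformat())               # 2023-08-01
```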

In [63]:
import sys
sys.path

['/Users/reuven/Courses/Current/2023-08August-Python',
 '/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python311.zip',
 '/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11',
 '/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib-dynload',
 '',
 '/usr/local/lib/python3.11/site-packages',
 '/usr/local/Cellar/pybind11/2.11.1/libexec/lib/python3.11/site-packages',
 '/usr/local/opt/python-tk@3.11/libexec']

In [64]:
import fixer_currency

# Next up

- Finalizing some loose ends with modules/packages
- What next? For Python, for you, and for your career -- questions will be amazing to answer then!



# Some module/package thoughts

How can we know if a module is good? How can we know if a module is safe? A good place to start is https://awesome-python.com, a curated list of Python libraries and tools.

# Exercise: Character count

1. We've written programs in this course that take a string, iterate over it one character at a time, and use a dict to store the number of times each character appears in the string.
2. It turns out that Python already has a data structure that does this: `collections.Counter`.
3. If you create a new instance of `Counter` by passing it a string (e.g., `Counter('abcdeabcde')`), you get back a Counter object, which is similar to a dictionary: its keys are the characters in the string, and its values are the number of times each character appears.
4. Because it acts like a dictionary, you can iterate over it like one.
5. Use `import` and the `Counter` class to write a program that asks the user for a string and then prints a report on how many times each character appeared in it.

In [65]:
s = input('Enter a string: ').strip()

counts = {}

for one_character in s:
    if one_character in counts:
        counts[one_character] += 1
    else:
        counts[one_character] = 1

for key, value in counts.items():
    print(f'{key}: {value}')

Enter a string:  hello to everyone out there!


h: 2
e: 6
l: 2
o: 4
 : 4
t: 3
v: 1
r: 2
y: 1
n: 1
u: 1
!: 1


In [68]:
# collections.Counter means -- module collections, class Counter

import collections

# I could also say: from collections import Counter
counts = collections.Counter(s)

for key, value in counts.items():
    print(f'{key}: {value}')

h: 2
e: 6
l: 2
o: 4
 : 4
t: 3
v: 1
r: 2
y: 1
n: 1
u: 1
!: 1
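`Counter` can do more than a plain dict here. A short sketch, using the same sample string as above: `most_common(n)` returns the `n` most frequent `(character, count)` pairs, most frequent first, and looking up a character that never appeared returns 0 instead of raising a `KeyError`.

```python
from collections import Counter

counts = Counter('hello to everyone out there!')

# The three most frequent characters, with their counts
for key, value in counts.most_common(3):
    print(f'{key!r}: {value}')

# A missing key gives 0, not a KeyError
print(counts['z'])    # 0
```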


# What's next?

- Where/how can you use Python in your work?
- What are good topics for you to learn regarding Python?
- How can you improve?

# Using Python in your work

The rule of thumb for software is: If it's annoying and time-consuming for you to do, then maybe the computer could/should do it for you.

Where could you use Python?

- Repetitive tasks on your computer:
    - Erasing/moving old backup files
    - Searching through certain documents
    - Retrieving data from important/useful data sources
- Data analysis
    - Pandas -- data analysis
    - Machine learning
    - Data engineering
- Web development
    - Python powers many Web sites -- Django, Flask, FastAPI
    - Client-side (in-browser) code used to be JavaScript only... PyScript is a new contender that lets you run Python in the browser!

# So, how do we improve?

Practice, practice, and more practice: Find opportunities to practice Python