# Week 5 agenda

1. Review last week's challenge
2. Modules and packages
    - Importing modules
    - Using modules
    - Writing modules (a tiny bit -- needs an external editor)
    - PyPI
    - `pip` and installing packages from the Internet
3. General Q&A about Python, software, etc.    

In [1]:
# challenge program:

def count_ips(filename):
    output = {}
    for one_line in open(filename):
        fields = one_line.split()
        ip_address = fields[0]

        if ip_address in output:    # have we seen this IP address already?
            output[ip_address] += 1 # if so, add 1 to the count
        else:
            output[ip_address] = 1  # otherwise, set it to 1

    return output

# count_ips('logfile.txt')        


# Modules and packages

Let's start with my favorite programming rule, DRY (don't repeat yourself):

1. If we have several lines repeated in our program, we can replace them ("DRY up our code") with a loop.
2. If we have the same code repeated in multiple places in our program, we can replace them with a function.
3. If we have the same code repeated in multiple programs, we can use a *library*.  Or, as it's known in Python, a *module*.

# Using a module in Python

In order to use a module in Python, we must "import" it.  This gives us access to whatever the module has defined. That'll typically be:

- Data structures
- Functions
- Entirely new types of data ("classes")

The `import` statement in Python is *not* a function! It's a statement -- so don't try to use it with parentheses. Think of `import` sort of like `def`.  `def` creates a new function object, and assigns it to a variable.  In the same way, `import` creates a new module object, and assigns it to a variable.

If I say `import abcd`, the variable `abcd` will then be defined, and it'll contain a module object.  Assuming, of course, that `abcd` exists as a module on your computer.

In [None]:
import random  

In [3]:
type(random)   # what kind of data does the "random" variable contain?

module

In [4]:
# once I've imported the module, I have access to all of the data and functions that it defines.
# I can access those via a .
# meaning: MODULENAME.DATA or MODULENAME.FUNCTION
# then I just use the data, or use the function, as per usual.

# for example, the "random" module defines the "randint" function.  I can call it as follows:

random.randint(0, 100)

3

In [6]:
# we can ask a module to print itself out ("printed representation" of an object)
random

<module 'random' from '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py'>

# How does Python know where to find `random` and load it?

If we say `import random`, Python looks for a file called `random.py`, where `py` is the standard Python suffix for program files.

Where does it look for `random.py`?

It looks in a whole bunch of directories, known as the "search path." It looks through each of the directories in this path, one at a time.  The first directory in which it finds `random.py` wins, and the search ends.

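This means that module import is a matter of "first come, first served": if two directories in the search path both contain a `random.py`, the one in the earlier directory is loaded.  If Python doesn't find a matching name anywhere in its search path, it raises an error (`ModuleNotFoundError`).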
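Here's a small sketch of how you can ask Python where it *would* find a module, without guessing. It uses `importlib.util.find_spec` from the standard library; the name `no_such_module_abcd` is made up, and I'm assuming nothing by that name is installed on your computer.

```python
import importlib.util

# Ask the import system where it would load "random" from
spec = importlib.util.find_spec('random')
print(spec.origin)       # the path to random.py on this computer

# A module that isn't anywhere on the search path gives us None
missing = importlib.util.find_spec('no_such_module_abcd')
print(missing)           # None

# Actually trying to import it raises an error, which we can catch
try:
    import no_such_module_abcd
except ModuleNotFoundError as e:
    print(f'Import failed: {e}')
```

The path printed for `random` will differ from one computer to the next, because it depends on where Python is installed.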

In [7]:
import sys     # sys is a special module -- it describes your Python running environment

In [8]:
sys.version    # what version of Python am I running?

'3.10.4 (main, Apr 26 2022, 19:42:59) [Clang 13.1.6 (clang-1316.0.21.2)]'

In [9]:
sys.path       # this is a list of strings -- the search path for modules we import

['/Users/reuven/Courses/Current/oreilly-2022-q2-first-steps',
 '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python310.zip',
 '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10',
 '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/lib-dynload',
 '',
 '/usr/local/lib/python3.10/site-packages',
 '/usr/local/lib/python3.10/site-packages/argclass-0.1.2-py3.10.egg',
 '/usr/local/Cellar/pybind11/2.9.2/libexec/lib/python3.10/site-packages',
 '/usr/local/lib/python3.10/site-packages/IPython/extensions',
 '/Users/reuven/.ipython']

# Exercise: Character classification

1. Import the `string` module in Python.  Historically, this module used to have a lot of functionality, but most of that was moved into methods on the `str` (string) class.  However, it still defines a few different variables that can be useful.  For example, `string.digits` (all digits), `string.punctuation` (punctuation), and `string.ascii_letters` (letters).
2. Define a dict with three keys -- `digits`, `punctuation`, and `letters`, and set the value to be 0 in each.
3. Ask the user to enter a string.
4. Go through the string, one character at a time:
    - If the character is a digit, add 1 to the `digits` value
    - If the character is punctuation, add 1 to the `punctuation` value
    - If the character is a letter, add 1 to the `letters` value
5. Print out the resulting dict

In [10]:
import string

In [11]:
string.digits

'0123456789'

In [12]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [13]:
string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [20]:
counts = {'digits':0, 'punctuation':0, 'letters':0}

s = input('Enter a string: ').strip()

for one_character in s:
    if one_character in string.digits:
        counts['digits'] += 1
    elif one_character in string.punctuation:
        counts['punctuation'] += 1
    elif one_character in string.ascii_letters:
        counts['letters'] += 1
    else:
        print(f'Ignoring character "{one_character}" ({ord(one_character)})')
        
print(counts)        

Enter a string: hello in Hebrew is שלום
Ignoring character " " (32)
Ignoring character " " (32)
Ignoring character " " (32)
Ignoring character " " (32)
Ignoring character "ש" (1513)
Ignoring character "ל" (1500)
Ignoring character "ו" (1493)
Ignoring character "ם" (1501)
{'digits': 0, 'punctuation': 0, 'letters': 15}
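The Hebrew letters were ignored because `string.ascii_letters` contains only the 52 ASCII letters. Here's a variation, not part of the exercise as assigned, that uses the `str.isdigit` and `str.isalpha` methods instead. The function name `classify` is my own invention, and I'm assuming we want letters from any alphabet to count as letters.

```python
import string

def classify(s):
    counts = {'digits': 0, 'punctuation': 0, 'letters': 0}

    for one_character in s:
        if one_character.isdigit():                  # digits, per Unicode
            counts['digits'] += 1
        elif one_character in string.punctuation:    # still ASCII punctuation only
            counts['punctuation'] += 1
        elif one_character.isalpha():                # letters in any alphabet
            counts['letters'] += 1
        # anything else (such as spaces) is silently ignored

    return counts

print(classify('hello in Hebrew is שלום'))
# {'digits': 0, 'punctuation': 0, 'letters': 19}
```

The count of letters goes from 15 to 19, because the four Hebrew characters are now included.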


# Three examples (so far) of modules

- `random`, which contains functions for generating random numbers
- `sys`, which contains the Python language/system information
- `string`, which defines a number of variables we can use for classifying characters

In [18]:
s = 'abcdefghij'

# I can use the random.choice method to retrieve a random element of s

random.choice(s)

'i'

# How can I know what a module provides?

1. Use `dir`
2. Use `help`
3. Use the Python documentation site

In [21]:
# the "dir" function in Python, when applied to a module, shows us all of the names 
# available via that module

dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_ONE',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_floor',
 '_index',
 '_inst',
 '_isfinite',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randbytes',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

In [22]:
# the list of strings we get back from "dir" describes the attributes we can use
# on the module itself.  So if 'abcde' is an element shown by "dir", we can then say

# module.abcde

# `_` in Python names

If a name starts with `_`, that is supposed to mean it's private -- we shouldn't use it, because it might change, or it's internal to an object.  People often do use these, but then it's their fault if something goes wrong.

If a name starts and ends with `__` (double underscore), then we often call it "dunder" in Python. For example, `__str__` is pronounced "dunder str."  Usually, these are methods that we don't call directly, but which Python knows to invoke at specific times and particular circumstances.
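Since names starting with `_` aren't meant for us, we can filter them out of what `dir` returns. This is just a sketch; the variable name `public_names` is my own.

```python
import random

# Keep only the names that don't start with an underscore
public_names = [one_name
                for one_name in dir(random)
                if not one_name.startswith('_')]

print(public_names)
```

This leaves a much shorter list, containing names such as `choice`, `randint`, and `shuffle`, without the private and dunder names.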

In [23]:
# we can also use the "help" function to find out about an entire module, or about one name in the module

help(random)

Help on module random:

NAME
    random - Random variable generators.

DESCRIPTION
        bytes
        -----
               uniform bytes (values between 0 and 255)
    
        integers
        --------
               uniform within range
    
        sequences
        ---------
               pick random element
               pick random sample
               pick weighted random sample
               generate random permutation
    
        distributions on the real line:
        ------------------------------
               uniform
               triangular
               normal (Gaussian)
               lognormal
               negative exponential
               gamma
               beta
               pareto
               Weibull
    
        distributions on the circle (angles 0 to 2pi)
        ---------------------------------------------
               circular uniform
               von Mises
    
    General notes on the underlying Mersenne Twister core generator:
    


In [24]:
# If I want to know about one thing, I can ask for help on that specific thing

help(random.randint)

Help on method randint in module random:

randint(a, b) method of random.Random instance
    Return random integer in range [a, b], including both end points.



In [25]:
help(random.random)

Help on built-in function random:

random() method of random.Random instance
    random() -> x in the interval [0, 1).



In [26]:
random.random()

0.7056282600553893

In [None]:
# what is random.random?

# - the module random
# - inside of the module random, we have the function random

# these are totally separate!

In [27]:
# A final place to get documentation for Python modules, especially those that come with the language,
# is docs.python.org

# Functions vs. methods

Both functions and methods are verbs in Python -- they both do things.

They also both:

- Are invoked with `()`
- Can take arguments
- Return values

The difference is that functions are free floating.  We just call a function, and don't preface its name with a `.`.  By contrast, methods are always attached to objects.

For example:

```python
s = 'abcd'

len(s)     # we call the function len, which returns the integer 4
s.upper()  # we call the method upper on s
```

The exception to this rule is that when you load functions or methods from a module, the function/method comes after a `.`.  That makes it a bit more confusing or unclear.

# Next up

1. Different forms of `import`
2. We'll develop our own simple module



In [28]:
import random   # this defines the "random" variable, and I can access the module's components via it

In [29]:
random.randint(0, 100)

86

In [30]:
# can I just call randint without the preceding "random." ?
randint(0, 100)

NameError: name 'randint' is not defined

In [31]:
# If I'm going to use random.randint a lot, it's annoying that I have to say random.randint,
# and I cannot just say randint

# to handle this, we have an alternative import syntax

from random import randint

# This 
# (1) imports the random module into memory
# (2) defines the randint variable to refer to random.randint
# (3) BUT BUT BUT it does *not* define the "random" variable!

In [32]:
randint(0, 100)

1

In [33]:
import random                        # everything in the random module is available as random.NAME

from random import randint           # only randint is available

from random import randint, choice   # only randint and choice are available

In [34]:
# what if I want to import a module, but I want to assign it to a different variable name?

import random as r    # (1) load the random module, (2) assign it to the variable r

In [35]:
r.randint(0, 100)     # r is now an alias to random

99

In [36]:
# in the world of data science, we use packages like NumPy and Pandas, which are traditionally
# loaded as follows:

import numpy as np
import pandas as pd



In [37]:
# we can also rename anything we've loaded via "from .. import"

from random import randint as ri   # now the function randint is available as "ri"

In [38]:
ri(0, 100)

75

In [39]:
# this is a popular version of from .. import

# from random import *

# I BEG YOU -- never use this!

# What's wrong with `from .. import *`?

There are at least two problems, one practical and one philosophical.

Practically, you're saying: Whatever names are defined in that module, I want them to be defined as global variables in my program.  That raises all sorts of problems:

- What if a name from that module conflicts with a name in your program? Which gets priority?
- What if the module is updated, and you don't pay attention, and then a name conflicts with one of yours?

Philosophically, programmers have tried for *decades* to avoid defining too many global variables, which inevitably cause problems.  Here, you're saying: Please! Give me lots of new global variables!  Keeping track of things will be very hard.

We want to have lots of namespaces.  We want to separate names, for easier understanding of our code, and `import *` undoes that.
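To see the name-conflict problem in action, here's a small demonstration. So that the star-import doesn't pollute your own program, I run it with `exec` inside a separate dict acting as a namespace; that's a trick for this demo only, and the `choice` function defined here is hypothetical.

```python
import random

demo_code = '''
def choice(options):
    return 'my own choice function'

from random import *      # silently replaces our "choice" with random.choice
'''

namespace = {}               # a private namespace for the demo
exec(demo_code, namespace)   # run the code, storing its variables in "namespace"

# Our own function is gone; the name now refers to the one from the random module
print(namespace['choice'] is random.choice)   # True
```

Python didn't warn us or raise an error. The later definition simply won, which is exactly why this kind of bug is so hard to notice.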

# Summarizing syntax of `import`

- `import MODNAME`
- `import MODNAME as A`
- `from MODNAME import THING`
- `from MODNAME import THING as A`
- `from MODNAME import *`

In [40]:
import hello

Hello


In [41]:
import mymod

In [42]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

In [43]:
mymod.__file__

'/Users/reuven/Courses/Current/oreilly-2022-q2-first-steps/mymod.py'

In [44]:
random.__file__

'/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py'

In [45]:
string.__file__

'/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/string.py'

In [46]:
pd.__file__

'/usr/local/lib/python3.10/site-packages/pandas/__init__.py'

In [47]:
mymod.__name__

'mymod'

In [48]:
string.__name__

'string'

In [49]:
pd.__name__

'pandas'

In [50]:
import mymod

In [51]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

# `import` only works once!

`import` normally does two things:

1. Loads the module
2. Defines the variable

However, it only loads the module *once* per Python session.  If you're running an actual program, that's not an issue.  But if you're in Jupyter or a debugger, it is.
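If you've edited a module and want the changes without restarting, the standard library offers `importlib.reload`. Here's a self-contained sketch that writes a throwaway module (the name `reload_demo` is made up) into a temporary directory, imports it, changes it, and reloads it.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module file in a temporary directory
tmpdir = tempfile.mkdtemp()
module_path = os.path.join(tmpdir, 'reload_demo.py')

with open(module_path, 'w') as f:
    f.write('x = 100\n')

sys.path.insert(0, tmpdir)   # make sure Python searches our directory first

import reload_demo
print(reload_demo.x)         # 100

# Now change the file on disk
with open(module_path, 'w') as f:
    f.write('x = 200\ny = [10, 20, 30]\n')

import reload_demo           # already loaded, so the file is not read again
print(reload_demo.x)         # still 100

importlib.reload(reload_demo)   # explicitly re-run the module's code
print(reload_demo.x)         # 200
```

In Jupyter, the other common approach is to restart the kernel, which starts a fresh Python session.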

In [1]:
import mymod

In [2]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'months',
 'x',
 'y']

In [3]:
mymod.x

100

In [4]:
mymod.y

[10, 20, 30]

In [5]:
mymod.months

{'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6}

In [6]:
mymod.months['Mar']

3

In [7]:
mymod.hello('world')

'Hello, world!'
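The notebook never shows the contents of `mymod.py`. Here's a guess at a file that would be consistent with the output above; treat it as a sketch rather than the actual file.

```python
# mymod.py (a reconstruction, based only on the values seen above)

x = 100
y = [10, 20, 30]

months = {'Jan': 1, 'Feb': 2, 'Mar': 3,
          'Apr': 4, 'May': 5, 'Jun': 6}

def hello(name):
    return f'Hello, {name}!'
```

Each of these four names shows up in `dir(mymod)`, alongside the dunder names that Python adds to every module.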

# Exercise: Menu

1. Create a new file, called `menu.py`. (You can use Jupyter for this, if you don't have a favorite editor.)
2. In `menu.py`, define a function called `menu`.  (Yes, this means that it'll be accessed as `menu.menu`.)
3. The `menu` function should take a list of strings as an argument.
    - Display the list of strings to the user, and ask them to choose one
    - If they choose an element from the list, then return their choice
    - If they choose something not on the list, scold them and have them try again

Basically, then, calling `menu.menu(['a', 'b', 'c'])` guarantees that we'll get a return value of either `a`, `b`, or `c`.

From Jupyter (or elsewhere), write a short program that invokes `menu.menu`:

```python
import menu
user_choice = menu.menu(['a', 'b', 'c'])
print(user_choice)
```

In [8]:
import menu
user_choice = menu.menu(['a', 'b', 'c'])
print(user_choice)


Enter a choice (a/b/c): q
q is not a valid choice.
Enter a choice (a/b/c): what?
what? is not a valid choice.
Enter a choice (a/b/c): oh well
oh well is not a valid choice.
Enter a choice (a/b/c): b
b
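The notebook doesn't show `menu.py` itself. Here's one possible version, a sketch that would produce the interaction above; the variable names are my own.

```python
# menu.py (one possible solution, not necessarily the one used in class)

def menu(options):
    prompt = f"Enter a choice ({'/'.join(options)}): "

    while True:                           # keep asking until we get a valid answer
        choice = input(prompt).strip()

        if choice in options:
            return choice                 # leaves the function, and thus the loop

        print(f'{choice} is not a valid choice.')
```

Because the only way out of the `while True` loop is the `return`, the function can only ever return one of the strings in `options`.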


In [1]:
import mymod

Hello from mymod!
Goodbye from mymod!


# Variables (inside) and attributes (outside)

Any variable (or function) we define inside of the module is visible to anyone who imports the module (outside) as an attribute on the module object.


### `x`
- Inside of `mymod.py`, we define the variable `x`.
- Outside of `mymod.py`, when we `import mymod`, we have access to `mymod.x`.

### `y`
- Inside of `mymod.py`, we define the variable `y`.
- Outside of `mymod.py`, when we `import mymod`, we have access to `mymod.y`.

### `hello`
- Inside of `mymod.py`, we define the function `hello`.
- Outside of `mymod.py`, when we `import mymod`, we have access to `mymod.hello`.

### `__name__`
- Outside of `mymod.py`, when we `import mymod`, we have access to `mymod.__name__`.
- So, inside of `mymod.py`, we have access to the variable `__name__`?



In [2]:
mymod.__name__

'mymod'

In [1]:
import mymod

Hello from mymod!
Goodbye from mymod!


# `__name__` and `'__main__'`

The variable `__name__` is always defined in Python.  It tells us what namespace we're currently in:

- The first file loaded in a Python program, the one that we specify when we run Python from the command line, will always have the special string value `'__main__'` assigned to `__name__`.
- If we import a module, then its `__name__` will be the string version of the module name.

Meaning: If I `import mymod`, then `mymod` will believe that `__name__` is the string `'mymod'`.  But if I run `mymod.py` from the command line, then it will have `__name__` defined to be `'__main__'`.

Who cares?

This allows a module to distinguish between when it was imported, and when it is being run as an actual standalone program.  

This means that we can ask the user for interactive input, or print things out, when the program is run, but not when it's imported.  We can have a two-faced module, one which is both importable and runnable.
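In practice, this check is written as an `if` statement at the bottom of the file. Here's a minimal sketch of such a two-faced module; the `hello` function is just a stand-in for whatever the module defines.

```python
def hello(name):
    return f'Hello, {name}!'

if __name__ == '__main__':
    # This block runs only when the file is run as a program.
    # When the file is imported, __name__ is the module's name, so it's skipped.
    print(hello('world'))
```

Importing this file defines `hello` and prints nothing. Running it from the command line defines `hello` and then prints the greeting.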

In [1]:
import mymod

# Who uses this?

Nearly every module in the Python world uses this technique, in different ways:

- Some modules, when run, will test themselves
- Some modules will give you a demo of their capabilities
- Some modules will run interactive programs, using the functions they defined


Remember (and this is hard):

- `__name__` is a variable that contains a string
- `'__main__'` is a string, assigned to `__name__` in certain circumstances.

# Next up

1. Python standard library
2. Modules vs. packages
3. PyPI and `pip`
4. What next? Your questions?

In [2]:
import os  # operating system -- info about files and directories

In [3]:
os.listdir('.')

['hello.py',
 "O'Reilly - 2022-q2-apac-first-steps.ipynb",
 'First steps week 2, 2022 05May 20.ipynb',
 'mini-access-log.txt',
 'nums.txt',
 'First steps week 3, 2022 05May 27.ipynb',
 'config.txt',
 'shoe-data.txt',
 'linux-etc-passwd.txt',
 '__pycache__',
 'README.md',
 'First steps week 4, 2022 06June 03.ipynb',
 'menu.py',
 'wcfile.txt',
 'exercise-files.zip',
 'First steps week 5, 2022 06June 10.ipynb',
 'myfile.txt',
 '.ipynb_checkpoints',
 '.git',
 'mymod.py']

In [4]:
os.stat('hello.py')

os.stat_result(st_mode=33188, st_ino=76161725, st_dev=16777223, st_nlink=1, st_uid=501, st_gid=20, st_size=15, st_atime=1654831302, st_mtime=1654831301, st_ctime=1654831301)

In [7]:
# I could write a program that, given a directory, goes through each file and prints its name and size

for one_filename in os.listdir('.'):
    size = os.stat(one_filename).st_size
    
    print(f'{one_filename}: {size}')

hello.py: 15
O'Reilly - 2022-q2-apac-first-steps.ipynb: 76345
First steps week 2, 2022 05May 20.ipynb: 90584
mini-access-log.txt: 36562
nums.txt: 42
First steps week 3, 2022 05May 27.ipynb: 120970
config.txt: 54
shoe-data.txt: 1676
linux-etc-passwd.txt: 2683
__pycache__: 160
README.md: 579
First steps week 4, 2022 06June 03.ipynb: 101023
menu.py: 244
wcfile.txt: 165
exercise-files.zip: 6148
First steps week 5, 2022 06June 10.ipynb: 74554
myfile.txt: 13
.ipynb_checkpoints: 192
.git: 416
mymod.py: 450


In [12]:
def file_sizes(dirname):
    output = {}
    for one_filename in os.listdir(dirname):
        output[one_filename] = os.stat(os.path.join(dirname, one_filename)).st_size

    return output

In [13]:
file_sizes('.')

{'hello.py': 15,
 "O'Reilly - 2022-q2-apac-first-steps.ipynb": 76345,
 'First steps week 2, 2022 05May 20.ipynb': 90584,
 'mini-access-log.txt': 36562,
 'nums.txt': 42,
 'First steps week 3, 2022 05May 27.ipynb': 120970,
 'config.txt': 54,
 'shoe-data.txt': 1676,
 'linux-etc-passwd.txt': 2683,
 '__pycache__': 160,
 'README.md': 579,
 'First steps week 4, 2022 06June 03.ipynb': 101023,
 'menu.py': 244,
 'wcfile.txt': 165,
 'exercise-files.zip': 6148,
 'First steps week 5, 2022 06June 10.ipynb': 78311,
 'myfile.txt': 13,
 '.ipynb_checkpoints': 192,
 '.git': 416,
 'mymod.py': 450}

In [14]:
file_sizes('/etc')

FileNotFoundError: [Errno 2] No such file or directory: '/etc/X11'

In [19]:
file_sizes('/Users/reuven/Courses')

{'Think like a programmer': 96,
 'Machine Learning': 224,
 "Modi'in course suggestions": 192,
 'PostgreSQL': 224,
 'Testing with pytest': 160,
 'non-programmers': 1312,
 'WPE': 1056,
 '.DS_Store': 38916,
 'python-bootcamp': 128,
 'HTML5': 256,
 'Pytest': 288,
 'Python': 352,
 'XML Processing in Python': 416,
 'Old': 1472,
 'coaching-python': 128,
 'Testing': 256,
 'Humor': 160,
 'Current': 736,
 'Design Patterns': 352,
 'Mypy': 160,
 'Data Science': 544,
 'Regular Expressions': 416,
 'Files': 1216,
 'DAB': 1216,
 'Online': 1216,
 'threading': 96,
 'Django': 64,
 'YouTube-notebooks': 1632,
 'Ruby': 288,
 'Git': 704,
 'Master Python': 128}

# Exercise: Total directory size

1. Write a function, `dirsize`, that takes a directory name (string) as an argument.
2. Inside of the function, go through each file in the directory (with `os.listdir`), and run `os.stat` on it, getting the size (`.st_size`).  Add that to the total, and return an integer - the total size of files in the directory.


In [20]:
import os 

def file_sizes(dirname):
    output = {}
    for one_filename in os.listdir(dirname):
        output[one_filename] = os.stat(os.path.join(dirname, one_filename)).st_size

    return output

In [21]:
import os

def dirsize(dirname):
    total = 0
    
    for one_filename in os.listdir(dirname):
        total += os.stat(os.path.join(dirname, one_filename)).st_size
        
    return total

In [23]:
dirsize('/Users/reuven/Desktop')

154246782

In [24]:
dirsize('/Users/reuven/Downloads')

FileNotFoundError: [Errno 2] No such file or directory: '/Users/reuven/Downloads/.#ATT00002.txt'

In [25]:
import os

def dirsize(dirname):
    total = 0
    
    for one_filename in os.listdir(dirname):
        full_filename = os.path.join(dirname, one_filename)
        
        if os.path.exists(full_filename):
            total += os.stat(full_filename).st_size
        
    return total

In [26]:
dirsize('/Users/reuven/Downloads')

11011872139

In [27]:
dirsize('/etc')

2512234
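Why did `os.stat` fail earlier? Most likely, the names it complained about were symbolic links pointing to files that no longer exist: `os.listdir` returns the link's name, but `os.stat` follows the link and finds nothing there. `os.path.exists` returns `False` in that case, which is why the check works. Here's a variation, assuming we want to count only regular files and skip subdirectories as well; the name `dirsize_files_only` is my own.

```python
import os

def dirsize_files_only(dirname):
    total = 0

    for one_filename in os.listdir(dirname):
        full_filename = os.path.join(dirname, one_filename)

        # isfile is False for directories and for broken symbolic links
        if os.path.isfile(full_filename):
            total += os.stat(full_filename).st_size

    return total
```

This will return a smaller number than `dirsize` for the same directory, because the sizes reported for subdirectories are no longer included.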

# Modules vs. packages

A module is, as we've seen, an individual file with a `.py` extension.  It can contain definitions of functions, variables, and classes.

What if you have several related modules? Then it becomes a pain to distribute them to other people.  You have to coordinate the use of several files at the same time.

Python provides us with the idea of a "package," a directory in which we have multiple modules.  You can import a package, and it can have special instructions for how to deal with the directory's import.

When you import a module, you don't necessarily know if it's a module (one file) or a package (multiple files in a directory).  It works the same way in your Python program.

In [28]:
os

<module 'os' from '/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py'>

In [29]:
os.__file__

'/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py'

In [31]:
import random
random.__file__

'/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py'

In [32]:
os.listdir('/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/')

['zipfile.py',
 'shutil.py',
 'tempfile.py',
 'lib-dynload',
 'encodings',
 'queue.py',
 '_pyio.py',
 'crypt.py',
 'pkgutil.py',
 'distutils',
 'lzma.py',
 'asyncore.py',
 '__phello__.foo.py',
 '_sitebuiltins.py',
 '_bootsubprocess.py',
 'copyreg.py',
 'sndhdr.py',
 'rlcompleter.py',
 'zoneinfo',
 'gzip.py',
 'ctypes',
 'ipaddress.py',
 'trace.py',
 'webbrowser.py',
 'nntplib.py',
 '_compat_pickle.py',
 'unittest',
 'dis.py',
 'bdb.py',
 'zipapp.py',
 'cmd.py',
 'tty.py',
 'curses',
 'tabnanny.py',
 '_py_abc.py',
 'cProfile.py',
 'zipimport.py',
 'token.py',
 'textwrap.py',
 'base64.py',
 '_markupbase.py',
 'bz2.py',
 'signal.py',
 'sre_constants.py',
 'cgitb.py',
 '_aix_support.py',
 '_threading_local.py',
 'pyclbr.py',
 'test',
 'gettext.py',
 'wave.py',
 'weakref.py',
 'bisect.py',
 'opcode.py',
 'netrc.py',
 'heapq.py',
 'functools.py',
 'modulefinder.py',
 '_compression.py',
 'tracemalloc.py',
 'hashlib.py',
 'cgi.py',
 'codeop.py',
 'turtledemo',
 'fnmatch.py',
 'multiprocessing'

In [33]:
import urllib

In [34]:
urllib.__file__

'/usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/__init__.py'

In [35]:
!ls /usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/

__init__.py  error.py  request.py   robotparser.py
__pycache__  parse.py  response.py


In [37]:
!ls -lh /usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/__init__.py

-rw-r--r-- 1 reuven admin 0 Mar 23 22:25 /usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/__init__.py


In [40]:
!cat   /usr/local/Cellar/python@3.10/3.10.4/Frameworks/Python.framework/Versions/3.10/lib/python3.10/xml/__init__.py

"""Core XML support for Python.

This package contains four sub-packages:

dom -- The W3C Document Object Model.  This supports DOM Level 1 +
       Namespaces.

parsers -- Python wrappers for XML parsers (currently only supports Expat).

sax -- The Simple API for XML, developed by XML-Dev, led by David
       Megginson and ported to Python by Lars Marius Garshol.  This
       supports the SAX 2 API.

etree -- The ElementTree XML library.  This is a subset of the full
       ElementTree XML release.

"""


__all__ = ["dom", "parsers", "sax", "etree"]


In [41]:
# method 1 for importing from "mypackage": Use . as a directory/file separator

import mypackage.moda
import mypackage.modb

In [42]:
mypackage.moda.hello('world')

'Hello from moda, world!'

In [43]:
mypackage.modb.hello('world')

'Hello from modb, world!'

In [1]:
# method 2 for importing "mypackage": Use "from .. import"

from mypackage import moda, modb

In [2]:
moda.hello('world')

'Hello from moda, world!'

In [3]:
modb.hello('world')

'Hello from modb, world!'

In [1]:
import mypackage

In [2]:
type(mypackage)

module

In [3]:
dir(mypackage)

['__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__']

In [4]:
mypackage.moda

AttributeError: module 'mypackage' has no attribute 'moda'

In [1]:
# method 3: we define __init__.py in mypackage, which will execute when we import the package
import mypackage

In [2]:
dir(mypackage)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'moda',
 'modb']
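The notebook doesn't show what went into `mypackage/__init__.py`. One plausible version is the single line `from . import moda, modb`. Here's a self-contained sketch that builds a throwaway package (called `demopackage`, a made-up name) in a temporary directory, so you can see the whole thing work.

```python
import os
import sys
import tempfile

# Build a small package on disk: a directory containing three files
base_dir = tempfile.mkdtemp()
package_dir = os.path.join(base_dir, 'demopackage')
os.mkdir(package_dir)

files = {
    'moda.py': "def hello(name):\n    return f'Hello from moda, {name}!'\n",
    'modb.py': "def hello(name):\n    return f'Hello from modb, {name}!'\n",
    # __init__.py runs when the package is imported; here it loads both modules
    '__init__.py': "from . import moda, modb\n",
}

for filename, content in files.items():
    with open(os.path.join(package_dir, filename), 'w') as f:
        f.write(content)

sys.path.insert(0, base_dir)   # so that Python can find demopackage

import demopackage             # runs demopackage/__init__.py

print(demopackage.moda.hello('world'))   # Hello from moda, world!
print(demopackage.modb.hello('world'))   # Hello from modb, world!
```

The `.` in `from . import moda, modb` means "the package I'm currently in," so `__init__.py` doesn't need to know the package's own name.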

# Next up

1. PyPI -- the Python package index
2. `pip` -- the Python installer for packages
3. Your questions about Python, software engineering, AMA!

In [3]:
import rich

In [4]:
rich.print('Hello, world!')

In [6]:
rich.print('Look at this: [bold][red]Hello, world![/red][/bold].  Pretty neat, right?')

In [7]:
rich.__file__

'/usr/local/lib/python3.10/site-packages/rich/__init__.py'

# Data structures vs. algorithms

Data structures are nouns.  They describe how we organize our data.  

We've seen a bunch of data structures:

- Strings
- Lists
- Tuples
- Dicts

Algorithms are procedures, recipes, for solving problems.  

Our data structures are implemented using algorithms.  For example, how does a dict add a new key-value pair?  It uses an algorithm for that, to know whether the location in memory it wants to use has already been taken (or not).

We can write (or implement) algorithms using data structures. 

Some common algorithms that people want to learn have to do with:
- Searching
- Sorting



# Data science -- what do you need to know?

Data science is:
- Data analytics (meaning: make sense of what you already know)
- Data engineering (meaning: getting data from the outside world into your program)
- Machine learning (meaning: make predictions based on data you've collected)

To work in data science, you need to know (a) Python and (b) NumPy and/or Pandas.  Those packages provide the data structures you'll use day to day.

Data engineering requires knowledge of SQL and all sorts of communication systems that I don't know a lot about.  Maybe also clustering, cloud computing, and 3rd party services like AWS.

Machine learning requires knowing about some algorithms (to compare them), testing your models (to know which one is most accurate), cleaning your data, and tricks/tips to improve your model's predictive ability.

# Picking projects

How can you get better at programming? Program!

Option 1: Do something for work. Then you're getting paid! But work might not want to pay for your learning curve.

Option 2: Do something for yourself! Solve a problem you have at home, even a small one.

Option 3: Join an open-source project. Which one? Choose it based on two factors: (1) you care about it and (2) the people running it are nice.  If you can find a mentored sprint at a Python conference, on a topic you care about, that's the best.  Second best are non-mentored sprints.  Third best is -- e-mail the people in charge of the project, and ask where they need help.

Hint: Often, the most help is needed in documentation and debugging.

