An Introduction To Python
=========================

Welcome!
--------

This course will take you on a journey through the world of Python 3.0.  Whether you are new to programming or just new to Python, expect to come away with a solid understanding of:

-   What the heck is Python and where did it come from?


-   Interacting with Python at the command line (Python is really
    gregarious!)
    

-   Working effectively with an interactive debugging environment


-   Utilizing professional code distributions and package managers


-   Python objects including classes, methods, and collections


-   Customizing your Python objects with your own code

This might sound like a lot – and it is. But Python really is fun to work with and very forgiving to the beginner. 

You can’t break Python. 

It talks to you. In fact, it’s downright loquacious. You can ask it questions and it (usually) provides sensible answers. And when it doesn’t, there are hundreds of thousands of great resources available over the internet. We’ll show you how to access some of these, as well.  So roll up your sleeves and get ready to enjoy the ride!

The Python Environment
----------------------

You should already have Python installed on your machine. If not, no worries. You could go ahead and install a fresh Python distribution and an IDE (integrated development environment).

Yes, it’s possible to download, compile, and install Python. If you’re ever interested, downloads are available directly from the official Python web site:

https://www.python.org/downloads.

That said, there are distributions such as Enthought and Anaconda which offer integrated package management, giving easy access to extended libraries and their sometimes extensive and tedious dependencies. This is especially valuable when adding complex libraries such as NumPy, which can take hours to get right.

For this course, we’ll use the Anaconda distribution. This supports all sorts of data science and visualization packages that we’ll touch on in the advanced course, along with Python virtual environments “right out of the box.” You can install it by browsing to this site and following
the instructions to get the 64-bit version for Python 3. 

https://www.anaconda.com/download/

The “official” IDE for Python is called IDLE. It provides a rudimentary and very lightweight debugger and file manager. IDLE works and works well, but for serious work I much prefer a more fully functional environment such as PyCharm, Wing IDE, or VS Code with the Python extension. If you’re accustomed to Eclipse, you could use the PyDev plug-in[1]. A lightweight IDE, Spyder, comes packaged with Anaconda. If you want to use Wing, you can download a full-on Professional version (free for 30 days)[2]. You have many options.

Python uses environment variables to help it work and play well with the operating system (OS). The most important of these are PATH and PYTHONPATH.

PATH contains the directories your OS will search when looking for a particular application. In order for Python to “just work” when you
type:

**\$ python**

You have to include Python’s binary (executable) directory in the PATH variable. That’s where python.exe lives. On a Windows system, you can do that by clicking:  

**Start ... Control Panel... System ... Advanced System Settings ...
Environment Variables**

Then edit PATH by appending the binary directory. You have to be a bit careful. Add a semicolon, no spaces, and the name of the new directory with no trailing slash. Something like:

\<existing path\>;c:\\Path\\To\\Directory

On a Linux system, you can simply add an export to \~/.bashrc:

export PATH=\$PATH:/path/to/directory

PYTHONPATH works the same way PATH does, but it’s used internally by Python to search for modules (libraries, for instance) that you’ll want to import into your code’s namespace.

Once you make the necessary changes to your environment variables, you’ll need to start a new terminal to apply them (or run source \~/.bashrc in Linux).
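
If you want to confirm what Python actually sees after changing these variables, you can ask from inside the interpreter. The following is a minimal sketch that uses only the standard `os` and `sys` modules; the output will differ on every machine, so none is shown here.

```python
import os
import sys

# PATH is one long string; os.pathsep is ';' on Windows and ':' on Linux/macOS.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
print("First few PATH entries:", path_dirs[:3])

# PYTHONPATH is optional, so fall back to a placeholder when it is not set.
print("PYTHONPATH:", os.environ.get("PYTHONPATH", "(not set)"))

# sys.path is the list Python really searches when you import a module.
# Directories named in PYTHONPATH, if any, are added to it at startup.
print("Module search path has", len(sys.path), "entries")
```

If a directory you added does not show up, the usual cause is a terminal that was opened before the change was made.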

Some Basics 
-----------

Let’s have a quick look at some Python concepts. We’ll cover all these
in more depth later, but we’ll put them on the table early.

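Python is an amazingly simple language from a syntax perspective.
Believe it or not, there are only a few dozen keywords, only a few of which are
exotic. Here’s a list of the 33 that have been reserved since Python 3.0 (Python 3.7 added two more, async and await)[3]: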

|        |          |         |          |        |
|--------|----------|---------|----------|--------|
| False  | class    | finally | is       | return |
| None   | continue | for     | lambda   | try    |
| True   | def      | from    | nonlocal | while  |
| and    | del      | global  | not      | with   |
| as     | elif     | if      | or       | yield  |
| assert | else     | import  | pass     |        |
| break  | except   | in      | raise    |        |
|<img width=110/>|<img width=110/>|<img width=110/>|<img width=110/>|<img width=110/>|

If you are ever uncertain about whether a variable name you’re considering is a keyword (or otherwise known to the interpreter), you can just type it into a command line. If the interpreter “yells at you”, you can choose another name.
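
A less trial-and-error way to check is the standard library's `keyword` module. This short sketch asks the interpreter which names it reserves; the name `team` is just an arbitrary example.

```python
import keyword

# How many keywords does this particular interpreter reserve?
print(len(keyword.kwlist))

# True means the word is reserved and cannot be used as a variable name.
print(keyword.iskeyword("class"))

# False means the word is free to use as a name.
print(keyword.iskeyword("team"))
```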

A few other things to know:  

-   Case matters – Python is case sensitive  


-   Indentation matters – Python doesn’t rely on brackets or braces   


-   Other whitespace (spaces within a line, blank lines) typically does not matter  


-   Help is just a few keystrokes away – just type “help”  
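
The first three points can be seen in a few lines. This is only an illustrative sketch; the names `team`, `Team`, and `message` are made up for the example, and the `if` statement is covered properly later in the course.

```python
team = "Cubs"
Team = "Sox"   # a different name: 'team' and 'Team' do not collide

# The indented line belongs to the 'if' block; the unindented line does not.
if team != Team:
    message = "case matters"
print(message)

# Spaces inside an expression are a matter of style, not meaning.
print(1+2 == 1 + 2)
```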

So, what is Python, anyway?

-   A **high-level**, **object oriented** (each object has its own
    repertoire of nouns and verbs, and they can borrow from and build on
    each other) **interpreted** language (your source is compiled to
    bytecode on the fly and run by the interpreter, with no separate build step).
    

-   You can think of it as a giant wrapper around hundreds of thousands of
    lines of C and C++ libraries, each providing consistent APIs
    (Application Programming Interfaces) so everything has the same look
    and feel.
    

-   Created by **Guido van Rossum**, a great fan of **Monty Python**, in
    the late 1980s as a successor to the ABC language, with significant
    support from **DARPA**. (That’s Guido holding a beer in the photo
    below.)
    

-   Highly extensible to applications such as GIS (pyqgis), scientific
    programming (numpy), visualization (matplotlib), and many other
    domains.
    

-   Python is operating system agnostic, and runs equally well on
    Windows, Linux, Raspberry Pi, and Mac systems.
    

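Python has been around since the early 1990s (version 1.0 arrived in 1994) and has undergone several transformations. Most notably, Python 3.0 was released in 2008 as a deliberately backward-incompatible successor to the 2.x line, making Unicode the default for strings and addressing some lingering structural concerns.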

There are plenty of solid applications written in 2.7 in the world, but most newer developments have shifted to Python 3. You should be aware that there are enough substantive differences between 2.x and 3.x that code written against one version is not wholly compatible with the other.
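
To give a feel for the kind of differences involved, here is a small sketch of Python 3 behavior, with comments describing how Python 2 differed. It is far from a complete list.

```python
# Python 3 behavior; the comments describe how Python 2 differed.
print("hello")    # print is a function in 3.x; Python 2 also allowed: print "hello"
print(7 / 2)      # true division: two ints give a float here; Python 2 gave 3
print(7 // 2)     # floor division behaves the same in both versions
print(type("é"))  # str holds Unicode text in 3.x
```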

By design Python is simple, clear, and transparent. Philosophical tenets (shown in abbreviated form here) are always available from a Python prompt:

\>\>\> import this


> The Zen of Python, by Tim Peters
>
- Beautiful is better than ugly.

- Explicit is better than implicit.

- Simple is better than complex.

- Complex is better than complicated.

- Flat is better than nested.

- Sparse is better than dense.

- Readability counts.

- Special cases aren't special enough to break the rules.

- Although practicality beats purity.

- Errors should never pass silently.

- Unless explicitly silenced.

- In the face of ambiguity, refuse the temptation to guess.

- There should be one-- and preferably only one --obvious way to do it.

- Although that way may not be obvious at first unless you're Dutch.

- Now is better than never.

- Although never is often better than \*right\* now.

- If the implementation is hard to explain, it's a bad idea.

- If the implementation is easy to explain, it may be a good idea.

- Namespaces are one honking great idea -- let's do more of those!

OK. Let’s get started and have some fun!

Basic Python Syntax
===================

Hello, Python
-------------

As you know, there’s an unwritten Cosmic Law stating all computer courses have to start with a “Hello World” application. Since we don’t want the Cosmic Police kicking the door down, let’s start with that.  Besides, it’s a great introduction to Python’s command line operations.

To invoke the interpreter, type this from the command line (except the “\$”):

\$ python3

Python 3.9.0 (default, Oct 14 2020, 11:33:57) [GCC 4.8.4\] on linux

Type "help", "copyright", "credits" or "license" for more information.

\>\>\>


So what’s this? The first few lines let you know what “flavor” of Python you’re using, and how it was compiled. This can be important because you may well have several versions installed on the same machine. More on that later.

The last line is the interpreter’s prompt (\>\>\> )  to let you know you’re talking to Python. So let’s say “hello.”

In [342]:
print("hello")

hello


In this command, we use one of Python’s built-in functions, **print**, and one of Python’s built-in object types, the string, known internally as a **str**. Note that the arguments to the function are enclosed in parentheses, and that the contents of the string are enclosed in quotes.

Strings are ubiquitous, so it’s good to know a few ways to compose them.  

## Comments and Escape Characters

Here are some examples. These are annotated using the **\#** character – that’s how you can make single line comments. Anything to the right of the \# is ignored by the interpreter, so you can embed your comments in-line.

In [343]:
# You can use double quotes – useful with apostrophes.
print("Hello, Guido's cool project.")

# You can also “escape” the apostrophe with a "\".
print('Hello, Guido\'s cool project.')

Hello, Guido's cool project.
Hello, Guido's cool project.


Let’s try a few more strings. This time, we’ll assign them to names so wrangling them will be easier. 


## Namespaces and String Variants
So what are names? These are just tags you can associate with values. “Values” are simply the bits of information stored in the computer’s memory.

In [344]:
#single quotes work just fine
team = 'Cubs'

#you can also use triple double quotes
year = """2018"""

#... or triple single quotes
outcome = '''win'''

#these can be provided to print as arguments, separated by commas.
print(team, year, outcome)

Cubs 2018 win


## Automatic Concatenation
You can write a series of string literals side by side and Python will automatically concatenate them (run them together):

In [345]:
"and" "a" "one" "and" "a" "two"

'andaoneandatwo'
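
One practical use of this feature is wrapping a long string across several lines. The sketch below is illustrative only (the strings and names are made up); it also shows that the trick applies to literals, not to names.

```python
# Adjacent literals inside parentheses are joined into a single string,
# which is a tidy way to wrap long text across lines.
message = ("Take me out "
           "to the ball game")
print(message)

# This only works for literals. Names must be joined explicitly with '+'.
team = "Cubs"
cheer = "Go " + team
print(cheer)
```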

## Line Continuation
If you ever want a multi-line string, you can use either of the triple quote styles.

In [346]:
chant = """ \
            Cubs
             2018
             win"""
print(chant)

             Cubs
             2018
             win


If you study the above example carefully, you’ll notice a couple features. The interpreter will automatically allow a command to be broken up into several lines if there’s an unclosed control symbol. Since we had not yet included the terminating triple quote, it patiently waited. This feature also works for parentheses, braces, etc.

Here is another example:

In [347]:
# we’ll introduce the list object soon
a_list = [99,
          33,
          40
          ]
a_list

[99, 33, 40]

You now know how to communicate with Python’s command line. You can execute functions, create objects, provide commands, and receive output.  

Now, it’s time to roll your sleeves up.

## Exercise:
Please write your own version of a “hello world”, working at the terminal. Practice using all four of the string composition methods, and with both single and multiple line versions.  Document your code for posterity.

What’s Your Number?
-------------------

Do you remember when I said that Python is loquacious? I was not kidding. You can have an interactive conversation with the interpreter in which it will tell you about itself and all the objects it knows about. The best part is that you can ask it for help and get reliable information to guide your programming efforts.

No course can teach you everything you need to know, not even this one.  Knowing how to efficiently get the help you need is perhaps the most important skill you can acquire. So the information in this chapter is the most critical bit of the course.

The interpreter can tell you about itself. Sometimes you have to access built-in libraries by importing them into your local **namespace**. A namespace is the set of names (“tags”) your program has access to, each associated with a **value** stored in memory. By importing more names, we gain access to more values and objects.

## Which Python

Here, we’re asking the interpreter what version of Python we’re using, which release, and what operating system it finds itself operating within.

Let’s have a conversation!

In [348]:
import sys  # tools for working with the Python environment
sys.version

'3.9.0 | packaged by conda-forge | (default, Oct 14 2020, 22:54:35) [MSC v.1916 64 bit (AMD64)]'

In [349]:
sys.version_info

sys.version_info(major=3, minor=9, micro=0, releaselevel='final', serial=0)

In [350]:
import os  # tools for working with the OS
os.name

'nt'

## Chatting up an Object

Now, this gets even better. Let’s say we have an object. 

*“One is the loneliest number that there ever was ...”*, or so the song goes. 

Maybe we can address the issue by engaging with 1.  Or maybe we can’t, but let’s try.  To start with we’ll give it the name “one”, pull up a chair, then chat it up.

In [351]:
one = 1  # hey, can I call you ‘one’?

In [352]:
one  # tell me a little about yourself

1

In [353]:
id(one)  # where do you live?

2324896704816

In [354]:
type(one)  # I like your type. What is it, anyway?

int

In [355]:
isinstance(one, int)  # Um. Right, then an integer?

True

Let’s step through this preliminary conversation. 

After we assigned it a name, we then called it by name and it responded. It happened to respond with its value, but not all objects do that. The response is built into the object as its **\_\_repr\_\_** method, which is executed when you type the name. When you make your own classes later on, you’ll be able to make it say something like “I’m 1, and I’m a Leo”, or anything else you like. The **int** object simply replies with a parsimonious string representation of itself.

The next bit, invoking the built-in function **id**, provides a unique identifier for the object. Under the hood (in the standard CPython interpreter), it’s just the location in memory where the object resides. That’s how it’s guaranteed to be unique for as long as the object exists: no two objects can occupy the same space at the same time.

We’re pretty sure what type of object one is, but we can ask to be sure using the built-in **type** function. In this case, we learn that it’s an **int** object (not an “integer”). In fact, **int** is the name of a built-in class that Python has already defined. We can pass it to the built-in **isinstance** function to verify an object’s type. This can become important – for instance, we might want to check whether an object is a number before attempting to do math with it.
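
As a small sketch of that kind of check, the example below compares a real number with a string that merely looks like one. The name `label` is invented for the illustration.

```python
one = 1
label = "1"   # looks like a number, but is really a str

print(isinstance(one, int))             # True
print(isinstance(label, int))           # False
print(isinstance(one, (int, float)))    # a tuple of types means "any of these"
```

Passing a tuple of types is handy when either an **int** or a **float** would be acceptable.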

One of the really useful aspects of Python is that objects come to us already endowed with properties and already knowing how to do type-appropriate tricks. More properly these are called **attributes** and **methods** (“nouns” and “verbs”). 

## Methods Can Be Built Into Objects

We can discover some of the methods built into the **int** object empirically, just by interacting with it.  So let’s experiment:

In [356]:
one + one  # can you add?

2

In [357]:
one / one  # divide?

1.0

In [358]:
one * one  # OK then, can you multiply?

1

In [359]:
one ** one  # do powers?

1

In [360]:
one % one  # how about modulo operations?

0

This is pretty much as expected, since we’re already familiar with integers. But we’ll frequently encounter objects that we’re not familiar with, or need to know how they handle various operators.

##  Implementation of Operators Varies

Sometimes things aren’t as expected. For instance, what happens if we do a “multiplication” or “addition” operation on a string?

In [361]:
"go" + "cubs"

'gocubs'

In [362]:
"cubs" * 10

'cubscubscubscubscubscubscubscubscubscubs'

Getting Help
------------

Though it’s interesting and really informative to experiment, you’ll typically be too busy to indulge. In these cases, the most straightforward way to engage is to ask for help. This is easy enough to accomplish as you can see below. 

In [363]:
help(one)

Help on int object:

class int(object)
 |  int([x]) -> integer
 |  int(x, base=10) -> integer
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is a number, return x.__int__().  For floating point
 |  numbers, this truncates towards zero.
 |  
 |  If x is not a number or if base is given, then x must be a string,
 |  bytes, or bytearray instance representing an integer literal in the
 |  given base.  The literal can be preceded by '+' or '-' and be surrounded
 |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
 |  Base 0 means to interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Built-in subclasses:
 |      bool
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __ceil_

Another way to get information quickly is to use the built-in **dir** function on the object. The output is a little harder to interpret, especially when you’re looking at it for the first time, but you can get a fairly compact look at the contents of the object’s **namespace**.  Let’s check it out, and I’ll explain some of the bits and pieces below.

In [364]:
dir(one)

['__abs__',
 '__add__',
 '__and__',
 '__bool__',
 '__ceil__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floor__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__le__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'as_integer_ratio',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'numerator',
 'real',
 'to_bytes']

The strange-looking names surrounded by “\_\_” characters (which some call “dunder”, for “double underscore”) are mostly methods that implement the logic behind operators like “+” or “\<”. In general, any name beginning with one or more “\_” characters is intended for the object’s internal usage. You can access these, of course, but you’ve been warned.

The plain-text names are designed for “public consumption.” You can learn more about them by slogging through the verbose help, asking for help on a more narrow topic, or just by typing them in at the command line, something like:

In [365]:
one.denominator

1

In [366]:
one.real

1

In [367]:
one.from_bytes

<function int.from_bytes(bytes, byteorder, *, signed=False)>

In [368]:
help(one.from_bytes)

Help on built-in function from_bytes:

from_bytes(bytes, byteorder, *, signed=False) method of builtins.type instance
    Return the integer represented by the given array of bytes.
    
    bytes
      Holds the array of bytes to convert.  The argument must either
      support the buffer protocol or be an iterable object producing bytes.
      Bytes and bytearray are examples of built-in objects that support the
      buffer protocol.
    byteorder
      The byte order used to represent the integer.  If byteorder is 'big',
      the most significant byte is at the beginning of the byte array.  If
      byteorder is 'little', the most significant byte is at the end of the
      byte array.  To request the native byte order of the host system, use
      `sys.byteorder' as the byte order value.
    signed
      Indicates whether two's complement is used to represent the integer.



You’ll recall that when you simply type the name of an object into the command line, you’re asking it to describe itself by invoking its **\_\_repr\_\_** method. This is one of Python’s so-called “magic methods” which are baked into the object with specific names that are meaningful to the interpreter. As another example, the **\_\_str\_\_** method is invoked when you pass the object to the **print** function.
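
The difference between the two is easiest to see with a string, where they give visibly different results. This sketch uses the built-in functions `repr` and `str`, which call the corresponding magic methods for you; the name `word` is arbitrary.

```python
word = "Cubs"

# repr() calls word.__repr__(): the result includes the quotes,
# just like what you see when you type the name at the prompt.
print(repr(word))

# str() calls word.__str__(): the plain text that print() displays.
print(str(word))

# For an int the two happen to be identical.
print(repr(1) == str(1))
```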

Here’s a list of some of the other “magic methods”, along with their meanings. From the **dir** listing, we can see that most of these are baked into the **int** object. We’ll actually be writing our own versions of some of these later in the course, but for now, here’s an overview of the math operations.

| Human                   | Operator    | Method                |
|-------------------------|-------------|-----------------------|
| addition                | x + y       | x.\_\_add\_\_(y)      |
| bitwise and             | x & y       | x.\_\_and\_\_(y)      |
| bitwise or              | x \| y      | x.\_\_or\_\_(y)       |
| bitwise xor             | x ^ y       | x.\_\_xor\_\_(y)      |
| division                | x / y       | x.\_\_truediv\_\_(y)  |
| floor division          | x // y      | x.\_\_floordiv\_\_(y) |
| floor division & modulo | divmod(x,y) | x.\_\_divmod\_\_(y)   |
| left bit-shift          | x \<\< y    | x.\_\_lshift\_\_(y)   |
| modulo (remainder)      | x % y       | x.\_\_mod\_\_(y)      |
| multiplication          | x \* y      | x.\_\_mul\_\_(y)      |
| raise to power          | x \*\* y    | x.\_\_pow\_\_(y)      |
| right bit-shift         | x \>\> y    | x.\_\_rshift\_\_(y)   |
| subtraction             | x - y       | x.\_\_sub\_\_(y)      |
|<img width=210/>         |<img width=210/>    |<img width=210/>|
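
The rows of the table can be checked at the prompt. In this brief sketch each line compares an operator with the method call it stands for; the values 7 and 2 are arbitrary.

```python
x, y = 7, 2

# Each comparison prints True: the operator and the method call
# are two spellings of the same operation.
print(x + y == x.__add__(y))
print(x // y == x.__floordiv__(y))
print(divmod(x, y) == x.__divmod__(y))

# Strings define their own versions of some of these methods.
print("cubs" * 2 == "cubs".__mul__(2))
```

In everyday code you would always write the operator; the method names matter mainly when you define your own classes.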

You’ll note that **help** is not within the namespace of the objects. That’s because **help** is a built-in function. It works by ransacking the object for what are called its **docstrings** – these are provided by the programmer. They survive compilation, so they are carried with the object itself and can be whatever the programmer wants them to be.  By contrast, comments do not survive compilation and exist mostly as developer-to-developer notes.

**Docstrings** are simply the first statement in the body of a module, function, class, or method, assuming a couple of things: the statement is a bare **str** literal, and that literal is not assigned to a name. A quick example should serve to clarify.

In [369]:
def my_first_function():
    "the most simple function on Earth"

help(my_first_function)

Help on function my_first_function in module __main__:

my_first_function()
    the most simple function on Earth



Here, the built-in **help** function simply picked off the docstring and served it up as content within its delivery system (much as **man** delivers manual pages on a POSIX system).

Had this function been a method within a containing object like a class, the help system would provide all available docstrings in the containing object. That’s what happened when we used **help**(**one**) – we got all the docstrings for the methods and attributes within the **int** class.
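
You can also read a docstring directly, without going through the help system, because it is stored on the object as the `__doc__` attribute. A minimal sketch, using a made-up function named `cheer`:

```python
def cheer():
    """Return a short cheer for the home team."""
    # A comment like this one is discarded; the docstring above is kept.
    return "Go Cubs!"

# The docstring travels with the function object.
print(cheer.__doc__)

# Built-in methods carry docstrings too.
print(int.__add__.__doc__)
```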

While the built-in functionality typically serves well, sometimes it’s necessary to go outside your code or use libraries that support introspection of your objects[4]. Python has exceptionally well-written official documentation that’s available at:

<https://docs.python.org/3/>

One word of caution – at the top-left of the Python docs web page, you’ll see a small drop-down menu with a number in it. 

The number is the version of Python addressed by the main page’s contents. It’s quite easy to follow a link from a StackOverflow posting (say) that lands on the wrong version. Among versions of Python 3 it usually won’t matter, but there are significant differences between 2.x and 3.x and you could waste a lot of time trying to get the wrong material to work.


Numbers and More Numbers
------------------------

Besides **int**, there are only two other numeric types within the core Python language, the **float** and **complex** types. There are some more exotic types in standard-library modules, such as **Fraction** (in fractions, for exact rational numbers) and **Decimal** (in decimal, for user-selectable levels of precision)[5].
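
As a quick taste of these two types, the sketch below imports them from the `fractions` and `decimal` modules that ship with Python. The names `third` and `price` are invented for the example.

```python
from decimal import Decimal
from fractions import Fraction

# A Fraction stores a numerator and denominator, so thirds add up exactly.
third = Fraction(1, 3)
print(third + third + third)    # 1

# A Decimal built from a string keeps the digits exactly as written.
price = Decimal("0.10")
print(price + price + price)    # 0.30
```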

The **float** type is pretty much what you would expect – it’s anything with a decimal point. Here’s an example:

In [370]:
root_beer = 1.0
type(root_beer)

float

Unless you’re an electrical engineer or a glutton for punishment, you probably have little to do with “complex numbers.” They’re not all that complex, but require two components. The first is “real” – an ordinary number that stands on its own feet – and the second is “imaginary” – a number that’s multiplied by the square root of -1. Yes, that’s where it gets weird, but it’s beyond the scope of this class. Here’s one way to specify a **complex** object:

In [371]:
complexity = (3 + 5j)   #Note that 'j' signifies the imaginary component - not 'i'
type(complexity)

complex

Accuracy and Garbage
--------------------

Integers can be represented accurately in a computer because they are nicely discrete. Python 3 implements integers in such a way that they can be arbitrarily long, and they don’t have to be explicitly declared as such.

Floating point numbers are only so accurate: a Python **float** is a 64-bit double that carries roughly 15 to 17 significant decimal digits. (The **Decimal** type, by contrast, defaults to 28 significant digits.) This can become an issue with numbers that never end, like 1/3, or numbers which are really large. Just for fun, you can try adding .1 + .1 + .1 to see what happens.

A final point: many decimal fractions, such as 0.1, have no exact binary representation, so a floating point result can carry a little “garbage” in its last few significant digits. Values like 1.0 are stored exactly, but the results of arithmetic often are not – something to be aware of when doing tests of equality.
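
Here is what that experiment looks like, along with one common way to compare floats safely. The sketch uses `math.isclose` from the standard library, which tests whether two values agree to within a small tolerance.

```python
import math

total = 0.1 + 0.1 + 0.1
print(total)                       # 0.30000000000000004

# A direct equality test is tripped up by the tiny error.
print(total == 0.3)                # False

# Comparing within a tolerance gives the answer you probably wanted.
print(math.isclose(total, 0.3))    # True
```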

Numeric Types as Methods
------------------------

We have seen numeric types like **int** and **float** used to describe objects, but they can also be used to convert data types (“type casting”) by invoking them as constructor methods. Here’s how you can convert an **int** to a **float**:

In [372]:
one = 1
one_float = float(one)
type(one_float)

float

In [373]:
and_back = int(one_float)
type(and_back)

int

Python will even try to convert strings to numeric types. If it can’t, you’ll get a **ValueError** exception.

In [376]:
one_string = ("1")
one_string_float = float(one_string)
one_string_float

1.0

In [377]:
type(one_string_float)

float
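
When the conversion fails, you can catch the **ValueError** rather than letting the program stop. The sketch below is only a preview, since `try` and `except` are covered properly later; the names `text` and `number` are made up for the example.

```python
text = "one"   # a word, not a numeric literal

try:
    number = float(text)
except ValueError as err:
    # float() could not make sense of the string, so fall back to None.
    number = None
    print("Could not convert:", err)

print(number)
```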

You can also create values directly by invoking a constructor with appropriate arguments. Python will do its level best to whistle up the right object for you. Here are some examples:

In [378]:
print(complex(1, 2), complex(1))

print(float(1), int('333'))

print(int(1.3),  str(123))

(1+2j) (1+0j)
1.0 333
1 123


Getting User Input
------------------

Python has a built-in function, **input**, for getting information from the user using the standard input stream (think “keyboard” for now). You probably won’t be building applications using direct user input very often, but it’s handy to know how to use. The syntax is straightforward:

In [379]:
# NB Python will hang until you hit 'Enter' !  This will drive you insane if you forget.

answer = input("Hey, Pat, how’s it going?   ")
print(answer)

Hey, Pat, how’s it going?   s'okey
s'okey


The response will always be returned as a string, no matter what, as you can see from this terminal session:

\>\>\> my_input = input("Can I pleeese have a one? ")
Can I pleeese have a one? 1

\>\>\> my_input

'1'

\>\>\> type(my_input)

\<class 'str'\>

If you need an integer, you’ll have to use the **int** constructor to type-cast it.
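
The sketch below shows why the cast matters. To keep it runnable without a keyboard, the variable `raw` stands in for whatever **input** would have returned; that substitution is an assumption made for the example.

```python
raw = "3"   # stands in for: raw = input("How many buckets? ")

# Without a cast, '+' concatenates the two strings.
print(raw + raw)            # 33

# After the cast, '+' does arithmetic.
buckets = int(raw)
print(buckets + buckets)    # 6
```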



## Exercise:

You now have the wisdom of the ages at your disposal – a good bit of it, anyway. Let’s see if you can put some of this to good use.

- If it takes three buckets of water to put out a campfire (two just won’t do it), how many campfires can you extinguish with 50 buckets? Solve the problem using the modulo operator.  You might use help() or experiment if you don't know how to use it.


- Create a program that tests how “*”, “/”, “+” and “-” are implemented for string, float, and complex objects. Use a docstring to document it and make sure it works when calling help on it. Include some comments as a kindness to yourself and your teammates.


- Create a program that asks the user for their name, sign, and favorite beverage, then asks them to confirm that information.

“String Theory”
---------------

Now that you know how to create strings, let’s explore how we can manipulate and interact with them.

The first thing you should know is that strings are **sequences**. We’ll discuss more complicated sequences later, but for now you should know you can iterate over them quite easily[6]. 

Check this out:

In [380]:
some_string = "Hey!"
for s in some_string:
    print(s)

H
e
y
!


String Methods
--------------

There’s a rich inventory of methods baked into the **str** object that makes text processing really easy[8]. As with any Python object, we can get a quick look using **dir**.




\>\>\> dir(str)

[snip, ‘capitalize’, ‘casefold’, ‘center’, ‘count’, ‘encode’, ‘endswith’, ‘expandtabs’, ‘find’, ‘format’, ‘format_map’, ‘index’, ‘isalnum’, ‘isalpha’, ‘isdecimal’, ‘isdigit’, ‘isidentifier’, ‘islower’, ‘isnumeric’, ‘isprintable’, ‘isspace’, ‘istitle’, ‘isupper’, ‘join’, ‘ljust’, ‘lower’, ‘lstrip’, ‘maketrans’, ‘partition’, ‘replace’, ‘rfind’, ‘rindex’, ‘rjust’, ‘rpartition’, ‘rsplit’, ‘rstrip’, ‘split’, ‘splitlines’, ‘startswith’, ‘strip’, ‘swapcase’, ‘title’, ‘translate’, ‘upper’, ‘zfill’]




Most of these are self-explanatory, but we’ll discuss a few of the most commonly-encountered ones and how to use them here. You can test a string – or a bit of one – to see what type of characters are in it.  

Note that in Python there is no “character” object *per se* – a single character is simply a really short string. We could apply these operations to the intact some\_string object just as easily.

Also note that when we execute a method, we need the parentheses even if we’re not supplying any arguments[9]. 

The following script demonstrates some of the string methods and introduces the use of "{ }". This is one way to create placeholders for arguments supplied to the **format** method. We’ll cover formatting in more detail later.

In [408]:
some_string = "$R5a "
for s in some_string:
    print("{} is alphanumeric? {}".format(s, s.isalnum()))
    print("{} is alpha? {}".format(s, s.isalpha()))
    print("{} is numeric? {}.".format(s, s.isnumeric()))
    print("{} is upper? {}.".format(s, s.isupper()))
    print()

$ is alphanumeric? False
$ is alpha? False
$ is numeric? False.
$ is upper? False.

R is alphanumeric? True
R is alpha? True
R is numeric? False.
R is upper? True.

5 is alphanumeric? True
5 is alpha? False
5 is numeric? True.
5 is upper? False.

a is alphanumeric? True
a is alpha? True
a is numeric? False.
a is upper? False.

  is alphanumeric? False
  is alpha? False
  is numeric? False.
  is upper? False.



There are lots of methods to query for, and change the case of, a string. Note that the queries return a Boolean (**True** or **False**) object.

In [382]:
my_team = "The Chicago Cubs"

print("my team ends with an ‘s’? {}".format(my_team.endswith("s")))
print("my team starts with an ‘s’? {}".format(my_team.startswith("s")))
print("in caps {} .".format(my_team.upper()))
print("swapped-case {} .".format(my_team.swapcase()))

my team ends with an ‘s’? True
my team starts with an ‘s’? False
in caps THE CHICAGO CUBS .
swapped-case tHE cHICAGO cUBS .


Breaking up and Getting Together Again
--------------------------------------

There are also methods to split up strings (returning a **list**[10] object) and to create a string by stitching together the elements of an **iterable**[11] object. In the example below, we can supply the **split** method a single argument – the string we’ll use to determine where to break the main string up[12]. Providing a single space (or calling it with no arguments) splits up the words into individual **list** elements.

The reciprocal operation is **join**. With **join** we provide a string to insert between each element pulled from the list to reform the string. Though this example uses a **list**, this method works for any iterable object (such as a **tuple**[13] or even another string).

In [383]:
my_team = "The Chicago Cubs"
split_team = my_team.split(" ")
print("splits: {}".format(split_team))
print()

join_string = "!!!"
together_again = join_string.join(split_team)
print("together again: {}."
    .format(together_again), end=join_string + "\n")

splits: ['The', 'Chicago', 'Cubs']

together again: The!!!Chicago!!!Cubs.!!!
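
Two details from the description above are easy to miss, so here is a small sketch (the extra spaces in the sample string are deliberate, and the sample values are made up). Calling **split** with no arguments collapses runs of whitespace, while an explicit single space does not. And **join** accepts a tuple or a string just as happily as a list:

```python
my_team = "The   Chicago Cubs"   # three spaces after "The"

by_space = my_team.split(" ")    # empty strings mark the extra spaces
by_default = my_team.split()     # any run of whitespace is one separator
print(by_space)
print(by_default)

from_tuple = "-".join(("a", "b", "c"))   # joining a tuple
from_string = "-".join("abc")            # joining the characters of a string
print(from_tuple, from_string)
```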


Literals and Escape Sequences
-----------------------------

Sometimes, you need to insert a specific character into a string that you can’t type directly. There are a couple of ways to handle this.  The first is using “escape sequences.” We’ve already seen that “\\n” produces a newline character, but there are many more. The idea here is that a backslash “\\” tells the interpreter to do something special with (“escape”) the character that follows instead of taking it literally. Others include:

> \\t tab
>
> \\r carriage return
>
> \\’ prints a single quote
>
> \\" prints a double quote
>
> \\\\ if you really want a backslash
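
Here is a short sketch of the escape sequences listed above in action. The sample strings are invented for illustration. Note that each escape sequence counts as a single character, which **len** confirms:

```python
# Each escape sequence below becomes ONE character inside the string.
samples = {
    "tab": "a\tb",
    "single quote": 'it\'s',
    "double quote": "say \"hi\"",
    "backslash": "C:\\temp",
}

for label, text in samples.items():
    # len() counts the escaped character once, not backslash plus letter
    print(f"{label:13} {text}  (length {len(text)})")
```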

Another way to handle this is to figure out what the character code is.  You can do that easily enough for the first 128 (ASCII) characters by printing them out with the built-in **chr** function[14], finding your character, then using the same function in your code. You could go something like:

In [384]:
for i in range(33, 36): #yes, I’m cheating I know it’s 34
    print (i, chr(i))
    
QUOTE = chr(34)
print("I’m a double quote: {}".format(QUOTE))

33 !
34 "
35 #
I’m a double quote: "


Cleaning House and Moving Furniture
-----------------------------------

Python’s string methods offer easy-to-use, and remarkably efficient, ways to do searches and replacements. They don’t rely on regular expressions (“regexes”) and, for simple operations, they are typically at least as fast. Once we get into regexes later, you will really appreciate how much these methods can do on their own.

### Boolean Evaluations
Observe how Python handles Boolean tests applied to things like strings. You might think this is pretty weird:

In [385]:
dog = "Quinn the Husky"
dog

'Quinn the Husky'

In [386]:
not dog

False

Huh? How can Python possibly evaluate a string as a Boolean value? 

Well, it turns out that whenever we ask for a Boolean evaluation, Python will go ahead and serve one up. 

It will return **False** if the object is the number 0, the keyword **None**[15], a null string, an empty list, or any other empty object. 

The opposite is also the case. When a Boolean operation is applied against a non-zero number, a string with at least one character, etc., **True** will be returned. 
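
A quick way to see these rules in action is the built-in **bool** function, which reports the truth value Python would use for an object. This is only a sketch, and the sample values are arbitrary:

```python
# bool() shows the truth value Python assigns to an object
candidates = [0, None, "", [], 42, "Quinn", [0]]

for item in candidates:
    # !r prints the repr so the empty string is visible as ''
    print(f"{item!r:10} -> {bool(item)}")

truth_values = [bool(item) for item in candidates]
```

Note the last entry: a list holding a single zero is not empty, so it evaluates **True**.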

Here's how one might use this behavior to test whether a user provided any input:

Note the use of the **if** keyword for a simple logical branch. The syntax is straightforward:

**if** \<condition\>:

      \<an indented suite\>
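
As a minimal sketch of the input test described above, assume user\_input is a hypothetical name holding whatever the user typed (here an empty string, as if the user just pressed Enter):

```python
user_input = ""   # hypothetical stand-in for a value returned by input()
message = "Thanks for the input!"

# An empty string evaluates False, so "not user_input" is True here
if not user_input:
    message = "You didn't type anything."

print(message)
```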

### More Python Keywords and **str** Object Manipulations
Let's take a tour of some of the methods built into the string object with some real code. Here we'll learn ways to find and replace components of **str** objects. 

We'll begin with a string containing some popular sins.

In [387]:
seven_deadly = " avarice envy wrath sloth \
gluttony lust hubris "
print(seven_deadly)

 avarice envy wrath sloth gluttony lust hubris 


Here is how we can use the keyword **in**[16] to figure out if one string can be found within another. This is a coarse query: it yields only a Boolean response, not a position. 

In Python we can use **not** in a logical expression for negation.

In [388]:
sin = 'eNvY'

print("{} in there? {}\n"
      .format(sin.lower(),
              sin.lower() in seven_deadly.lower())
      )

# doesn't exist? (use not for Boolean negation)
print("Try the key word not:")
print("{} NOT in there? {}\n"
      .format(sin.lower(),
              sin.lower() not in seven_deadly)
      )

envy in there? True

Try the key word not:
envy NOT in there? False



Note that we're invoking the lower method when searching the string.  It's not necessary, but not a bad idea because it lets you get away without trying every petty permutation of the word.

A more surgical approach is to use **find**. This will attempt to locate the initial position of the string you're looking for and return its findings. If it strikes out, **find** will return a -1.

In [389]:
# Locate the position. Note the three-element tuple object.
for sin in ('envy', 'texting while driving', 'sloth'):
    sin_position = seven_deadly.find(sin.lower())

    print ("Checking {}:".format(sin))

    # find() returns -1 on failure; 0 is a legitimate match position
    if sin_position >= 0:
        print("Yup. We found '{}' starting in position {}.\n"
                  .format(sin.lower(), sin_position))
    else:
        print("Nope. Since find() returned {}, '{}' wasn't there.\n"
                  .format(sin_position, sin.lower()))
        

Checking envy:
Yup. We found 'envy' starting in position 9.

Checking texting while driving:
Nope. Since find() returned -1, 'texting while driving' wasn't there.

Checking sloth:
Yup. We found 'sloth' starting in position 20.



It's possible to replace characters with new ones with the – you guessed it – **replace** method[17]. Here's a simple example that shows a potential pitfall for a new Python programmer. 

Strings are "immutable" – not changeable. Running a replace operation doesn't change the string.

In [390]:
#Replace characters
print("strings are immutable - compare these:\n")
seven_deadly.replace('avarice', 'greed')

print("no replacement: ", seven_deadly, "\n")
seven_deadly = seven_deadly.replace('avarice', 'greed')

print("replaced at last: ", seven_deadly + '\n')

strings are immutable - compare these:

no replacement:   avarice envy wrath sloth gluttony lust hubris  

replaced at last:   greed envy wrath sloth gluttony lust hubris 



You can remove leading and trailing whitespaces using various flavors of the **strip** method[18]. 

This is really handy when handling potentially-messy data like user inputs. The following example demonstrates this, along with one of Python's built-in functions, **len**.  

Note that **len** is not a string method – it's a built-in function that works on any object that has a length.

In [391]:
#Remove leading/trailing spaces. Note chained methods.
print("The original sin list is {} characters long.".
        format(len(seven_deadly)))

print("With the white spaces removed it's {} long.".
        format(len(seven_deadly.strip())))

The original sin list is 45 characters long.
With the white spaces removed it's 43 long.


Python lets one function "swallow" the output of another, so it's possible to chain operations together. Here, the output from the **strip** method is fed into the **len** function. The combined result is fed to **format**. And the result of all that activity is rolled up into a single argument to **print**. You'll see a lot of code like this as you explore Python.
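
The "various flavors" of **strip** mentioned earlier are **lstrip** (leading whitespace only) and **rstrip** (trailing only). Here is a small sketch with an invented messy string; **repr** is used so the leftover whitespace is visible:

```python
messy = "   sloth \n"

both_ends = messy.strip()      # trims leading and trailing whitespace
left_only = messy.lstrip()     # trims leading whitespace only
right_only = messy.rstrip()    # trims trailing whitespace only

for result in (both_ends, left_only, right_only):
    print(repr(result))
```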

 

Making Things Beautiful: the Format Mini-Language[19]
-----------------------------------------------------

You have already seen some basic applications of Python's text formatting capabilities. If you place opposing curly braces in a string they serve as a "catcher's mitt" for any arguments you provide to **format**().  Python actually supports several styles of formatting.  We touch on a few of these here.

The olde tyme, C-style method (which you'll see in Python 2 code) is not recommended, but is still supported.

In [392]:
my_str = "'My String'"
my_float = 1.23
my_int = 666

In [393]:
print("Python can print strings %s, floats %f, and ints %i." %(my_str, my_float, my_int))

Python can print strings 'My String', floats 1.230000, and ints 666.


You can also provide empty braces as placeholders.   Under the hood, they are indexed as 0, 1, 2, etc. and are replaced by arguments provided to **format()**:

In [394]:
print("Python can print strings {}, floats {}, and ints {}.".  
      format(my_str, my_float, my_int))

Python can print strings 'My String', floats 1.23, and ints 666.


It's also possible to scramble the indices, and even repeat elements.  Note, also, that we don't need to use all the arguments provided.   Here's how:

In [395]:
print("Some numbers: {3} {3} {2} {2} {1} {0}".format(0, 1, 2, 3, 4))

Some numbers: 3 3 2 2 1 0


Separating the formatted string from the data is also possible.  That can make your code more reusable.   Here, we're using a multi-line version of a string.   Instead of a numeric index, we're naming the fields - that allows us to use it with dictionary-like arguments:

In [396]:
stg = """\
    Dear {donor},\n   
       I'm running for {office} and would like to shake
       you down for a ${dollars} contribution.\n
    Sincerely, \n
     - {candidate}"""

print(stg.format(donor = "SuperPAC", office = "President",
                 dollars = 100, candidate = 'Me'))

    Dear SuperPAC,
   
       I'm running for President and would like to shake
       you down for a $100 contribution.

    Sincerely, 

     - Me


If we already have the variables defined, we can simply create a format statement using those names.  This is the most common and most readable (IMHO) way to do it.   Note that you can evaluate arbitrary expressions within the format string's {placeholders}.

In [397]:
fav_musician = "Jerry Garcia"
fav_band = "Grateful Dead"

# This is called an 'f-string' - introduced in Python 3.6
print(f"{fav_musician} and the fabulous {fav_band} are the greatest!\n")

print(f"... because there are {len(fav_musician)} letters in {fav_musician}'s name...")
print(f"... and half of that is {len(fav_musician)/2}.")


Jerry Garcia and the fabulous Grateful Dead are the greatest!

... because there are 12 letters in Jerry Garcia's name...
... and half of that is 6.0.


It's also easy to provide column width and justification specifications.  The general form of the specification is something like:

{ \<field name\> : \<alignment\>[21] \<width\> }

This example creates three columns of equal width (left-justified, centered, and right-justified) in a
format string, then "recycles" the string to build a nicely-formatted
table.

In [398]:
stg = "{0:<10} {1:^10} {1:>10}"
print(stg.format("Presenting: Addition!!!", ''))
print(stg.format("output", "input"))
print(stg.format("=" * 6, "=" * 6))
print(stg.format(4, 2))
print(stg.format(8, 4))


Presenting: Addition!!!
output       input         input
======       ======       ======
4              2               2
8              4               4


You can also ask Python to temporarily cast variables into other forms for the purposes of output. Here, we're printing out decimal, binary, and hex flavors of some numbers.

In [399]:
fmt = "{0:6} = {0:#16b} = {0:#06x}"
for i in (1, 23, 456, 7890):
    print(fmt.format(i))

     1 =              0b1 = 0x0001
    23 =          0b10111 = 0x0017
   456 =      0b111001000 = 0x01c8
  7890 =  0b1111011010010 = 0x1ed2


Other options allow you to do things like add "+" and "-" signs in front of values, pad out columns, etc. Things can get much more complicated, but Python's official documentation is (fortunately) exceptional on this topic[22].
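
To make the sign and padding options a little more concrete, here is a brief sketch using a handful of format specifications. The values are arbitrary and the list is nowhere near exhaustive:

```python
examples = [
    "{:+d}".format(42),          # always show the sign
    "{:+d}".format(-42),
    "{:08.2f}".format(3.14159),  # zero-pad to width 8, two decimal places
    "{:*^10}".format("Cubs"),    # center in 10 columns, fill with '*'
    "{:,}".format(1234567),      # thousands separator
]

for text in examples:
    print(text)
```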

Here's a final example that demonstrates how to use formatting to output a type-cast version of the information displayed.

In [400]:
"""List people's names, ages and weights."""
data = [
        ("Steve", 59, 202),
        ("Dorothy", 49, 156),
        ("Simon", 39, 155),
        ("David", 61, 135) 
        ]
for name, age, weight in data:
    print("{0:12s} {1:4d} {2:4d}.".format(name, age, weight))

Steve          59  202.
Dorothy        49  156.
Simon          39  155.
David          61  135.


Strings can be evaluated with logical tests.  Python uses == for a logical 'is equal to' and != for 'is not equal to'.

In [401]:
string_a = 'CUBS'
string_b = 'cubs'

print(f"Are '{string_a}' and '{string_b}' the same?   {string_a == string_b}")
print(f"Are '{string_a}' and '{string_b}' different?  {string_a != string_b}")


Are 'CUBS' and 'cubs' the same?   False
Are 'CUBS' and 'cubs' different?  True


OK. You have learned a lot about string and numeric data types, how to work with strings, and how to make nicely formatted output. Now, let's flex some Pythonic muscles and make something beautiful!

## Exercise:

Please write an application that challenges the user to guess all of Python's keywords, keeping track of the successes. 

When the user gives up (maybe by typing "help"), report the number of correctly-guessed keywords, the number not guessed, and a nicely-formatted table for each group.

Hint: you'll find some useful tools in the keyword library, which you can get by going:

import keyword

The library holds a list of all the keywords. You can access it like this:

keyword.kwlist

Hint: you just might find the keyword called **in** useful to determine membership in that list. You'll want to use this approach to query it: 

    list_of_critters = ['fox', 'skunk', 'bear']

    print (f"Is 'fox' in our critter list? {'fox' in list_of_critters}") 
    
    Is 'fox' in our critter list? True

Also, you'll need to be able to use the for and if statements.  Recall the general forms:

    for i in iterable_object:
        pass

    if condition:
        pass

Language Components
===================

From a syntax perspective, Python is a simple language. As we've seen, there are only a few dozen keywords (33 in Python 3.6; 35 since Python 3.7 promoted **async** and **await**). While Python has data types, variables don't need to be declared to be of some specific type before use. A name simply comes into existence the first time you assign a value to it. Objects generally come to us "batteries included." Pretty slick, no?

Indentation
-----------

We've seen the **for** statement before, but let's take another look.

In [402]:
for value in range(5):
    print(value)

0
1
2
3
4


The first line is of the form:

\<key word\> \<name for index\> **in** \<iterable object\>:

That's it – no fussing around with pointers, instancing index values, or running past the end of the object. The **for** statement "just works."

What follows is an "indented suite", and it can comprise any number of lines of code. Indentation is extremely important in Python because that's how the interpreter keeps track of how to group statements.  Unlike languages such as C and Java, it can't rely on braces for that purpose – Python uses <span class="underline">only</span> indentation.  Statements at the same level of indentation are treated as being in the same code block, and code blocks can be nested to any level.

This further simplifies Python – it's unburdened by the clutter of braces, statement terminating semicolons, and loop terminating keywords.  All this creates and enforces a high level of readability. The cost, of course, is the need to pay attention to a fine level of detail and a limited ability to write spatially-dense code.

Strictly speaking, Python doesn't care how much each statement in the same suite is indented, as long as all are the same. Some developers use tabs (each is seen by the interpreter as a single character), but four white spaces is the recommended convention per Python's official style guide "PEP-8." 

PEP-8 is a bit of a dry read, but well worth a quick look, especially if you're planning to work as part of a team[24].   Jupyter, as well as many IDEs have built-in PEP-8 checkers and will (usually!) format your code nicely even if you've forgotten some nuance.

## Iteration and Lazy Evaluation

Many objects in Python have the built-in capability for looping over their elements.  Under the hood, each of these objects has an \_\_iter\_\_ method that provides exact instructions for doing so.   

In the exercise, you saw how the **for** keyword can invoke this automatic looping behavior over a **list** object.  Just above, we used a **range** object with the same syntax.   The **range** object is good to know about because it's a good way to whistle up a sequence of numbers.

It's also an example of a common built-in optimizing technique, lazy evaluation, which is also what "iterator" and "generator" objects provide.  These objects contain recipes to generate the next object in a series without having to remember all the elements in the series.   The example we just saw will create five consecutive integers - but it only had to have the wisdom to add one to the last integer it produced.  
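
If you're curious what the **for** statement is doing on your behalf, here is a rough sketch using the built-in functions **iter** and **next**. The names numbers and stepper are invented for this illustration:

```python
numbers = range(3)
stepper = iter(numbers)      # asks the range for an iterator (via __iter__)

# Each call to next() produces one value on demand
first = next(stepper)
second = next(stepper)
third = next(stepper)

print(first, second, third)
```

A **for** loop makes the same requests one at a time and stops quietly when the values run out.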

These expressions take exactly the same amount of memory to create:

In [403]:
print(range(5))
print(range(5_000_000_000_000))

range(0, 5)
range(0, 5000000000000)


Note that we're printing out string representations of the objects.   We're not causing either one to iterate or create all of its elements.  That will happen only when we apply a **for** loop.
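
One way to check the memory claim for yourself is sys.getsizeof from the standard library. Treat this as a sketch: the byte counts it reports are specific to your interpreter, and they cover only the container object itself. A **list** is included for contrast, since it really does store every element:

```python
import sys

small = range(5)
huge = range(5_000_000_000_000)

# Both range objects store only a start, a stop, and a step
print(sys.getsizeof(small), sys.getsizeof(huge))

# A list grows with the number of elements it holds
short_list = list(range(5))
long_list = list(range(5_000))
print(sys.getsizeof(short_list), sys.getsizeof(long_list))
```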

You'll also note use of underscores to specify the integer 5 trillion.  This is possible since Python 3.6.

The if Statement
----------------

We've touched on the **if** statement, the most basic of control statements. Let's expand the discussion a bit. Its general form is:

if \<condition\>:

    <indented suite>

elif \<condition\>:

    <indented suite>

else:

    <indented suite>

The first condition presented to the **if** is evaluated True or False.  If evaluated **True**, the indented suite immediately beneath is executed and it's done. Execution drops out of the entire block of code.

There can be any number of **elif** ("else if") statements and they are evaluated in top-to-bottom order. If any is **True**, the associated indented suite of statements is executed and it's done. Only when no **if** or **elif** condition is met will the **else** suite be executed[25].

The **elif** and **else** clauses are completely optional, and leaving them off is routine. 
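
Before trimming things down, here is a sketch of the full form with every clause present. The temperature value and the thresholds are made up for illustration. Only the first condition that evaluates **True** has its suite executed:

```python
temperature = 72   # hypothetical reading, in degrees Fahrenheit

if temperature > 85:
    verdict = "hot"
elif temperature > 60:
    verdict = "pleasant"     # first True condition wins; the rest are skipped
elif temperature > 32:
    verdict = "chilly"
else:
    verdict = "freezing"

print(f"It's {verdict} out there.")
```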

Here's a simple example where we apply the **if** alone to prevent a crash (technically, an "exception", in this case a ZeroDivisionError). Because the denominator is zero, nothing is printed.

In [404]:
denominator = 0
numerator = 100
if denominator:
    print(numerator/denominator)

## Using for and if Together

Here's a program to demonstrate a more complicated application, this time coupling **for** and **if** expressions.  We introduce a few nuances in the process.

The object iterated over is a **tuple**.   A tuple is an ordered collection of objects and is represented within parentheses - something like:  (0, 'bear', 1.1).

In this case we have a tuple-of-tuples.  The contained elements (1, 0) and (9, 3) are each tuples in their own right.  The outer tuple object bundles them:   ( (1, 0), (9, 3) )

We can iterate over the outer tuple to get:

In [405]:
for element in ( (1, 0), (9, 3) ):
    print (element)

(1, 0)
(9, 3)


A common operation to use within a **for** statement is called "unpacking".  You do not have to use it unless you want to, but we bring the topic up here so you'll recognize it when you see it.

Let's assume you have the tuple (9, 3).   You can create an assignment to split it up into constituent elements by providing a name for each element on the left side of an expression like this:

In [406]:
first_tuple_component, second_tuple_component = (9, 3)

print(f"first: {first_tuple_component}  second: {second_tuple_component}")

first: 9  second: 3


This will work on any Python sequence, like a string or a list.   You have to remember to provide just the right number of names on the left side.
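
Here is a quick sketch of unpacking applied to a string and to a list. All the names are invented for this example. In each case the number of names on the left matches the number of elements on the right:

```python
# A three-character string unpacks into three names
x, y, z = "abc"

# A two-element list unpacks into two names
first_critter, second_critter = ["fox", "skunk"]

print(x, y, z)
print(first_critter, second_critter)
```

If the counts don't match, Python raises a ValueError rather than guessing.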

Suppose our tuple components contained numerator, denominator pairs.   An easy way to parse these out is to do the "unpacking" operation in the header of the **for** loop.   This lets you save a couple of steps and - much more importantly - make code transparent enough for anyone to understand.

Here we take advantage of unpacking and the left-to-right operation of conditional statements to make a robust means of generating fractions.

In [407]:
for numerator, denominator in ( (1, 0), (9, 3) ):
    if denominator and numerator/denominator == 3:
        print(f"yay! We have a {numerator/denominator}!")

yay! We have a 3.0!

The while Statement
-------------------

The **while** statement is also pretty straightforward:

condition = \<something\>

while \<condition\>:

    <indented suite>

else:

    <indented suite>

The indented suite will be executed top-to-bottom forever until it's asked to stop or the condition is no longer **True**. Here's a simple example using a condition to halt execution. Note that the "sentinel condition" and counter are defined outside and before the loop[26].

In [None]:
stop_me = False
counter = 0
while not stop_me:
    print(f"The counter is now: {counter}.", end = '')
    if counter > 2:
        print(" Yo. I'm done!")
        stop_me = True     
    else:
        print(" I'm still trudging along.")      
        
    counter = counter + 1 

There are more elegant ways to proceed, however. A common idiom is to use the keyword **break** within the loop to terminate it immediately.  This can be combined with a tautologically-true condition to streamline things. 

For instance, you could go:

In [None]:
counter = 0
while True:
    print(f"The counter is now: {counter}.", end = "")
    if counter > 2:
        print(" Yo. I'm done!")
        break
    else:
        print(" I'm still trudging along.")
        
    counter += 1

Not only is this simple to read, it doesn't require much in the way of resources to evaluate. This is not an issue with a simple application, but if your project involves millions of iterations (e.g., a polling routine for a web server), efficiency becomes relevant.

Recall that the general form of **while** allows an **else** clause. Had we attached one to the loop above, it would not have executed. The reason is that its indented suite runs only if the **while** statement terminates due to the condition becoming **False**. The **break** keyword causes the code that would do this to be bypassed.
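
The interplay between **else** and **break** is easier to see side by side. In this sketch (the flag names are invented), the first loop ends because its condition becomes **False**, so its **else** suite runs. The second loop ends with **break**, so its **else** suite is skipped:

```python
finished_normally = False
counter = 0
while counter < 3:
    counter += 1
else:
    finished_normally = True   # runs: the condition became False

reached_else = False
counter = 0
while True:
    counter += 1
    if counter == 3:
        break
else:
    reached_else = True        # never runs: break bypasses the else suite

print(finished_normally, reached_else)
```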

The philotic twin of the **break** statement is **continue**. When **continue** is encountered, execution is immediately passed to the top of the loop where the condition is reevaluated.

The **else** clause is completely optional. Since we're using **break**, we'll get rid of the extra baggage and show how **continue** might be applied. Note the use of the comparison operator == (returns a Boolean evaluation of whether the two values are the same) and the shortcut way to increment the counter with the += operator[27].

In [None]:
counter = 0
while True:
    if counter == 1:
        print("One is the loneliest number that there ever was.")
        counter += 1
        continue
    print("The counter is now: {}.".format(counter))
    if counter:
        break

    counter += 1

Inner and Outer Loops
---------------------

The final point I would make here is that **break** and **continue** work on both **for** and **while** loops. And both work only on the inner-most loop (the one they reside inside of). 

Here's an example of a program with both types of loops:


In [None]:
stop_me = False
counter = 0
for i in range(10):
    if i%2: #odd numbers evaluate True:
        while True:
            if counter == 1:
                print("One is the loneliest number that there ever was.")
                counter += 1
                continue
            print(f"The counter is now: {counter } and i is now {i}.")
            if counter:
                break #breaks out of the inner(while) loop

            counter += 1
    if i == 5:
        print("I'm done - about to go on a break.")
        break #this breaks out of the outer (for) loop

Some Useful Logical and Binary Operators
----------------------------------------

Python comes with a typical set of comparison operators. Here's a quick
summary:

| Operator | Meaning          |
|:----|:----------------------|
| ==  | Equal to              |
| !=  | Not equal             |
| \<  | Less than             |
| \>  | Greater than          |
| \<= | Less than or equal    |
| \>= | Greater than or equal |

There is also a set of bitwise operators. These operate on the 1s and 0s in a number represented in binary format. You can create a binary representation of a number in Python by using the built-in function **bin**, something like:

In [None]:
bin(128)

You can see that the result's first two characters are '0b', flagging it as a binary representation. You'll also note that the result is a string, which you can't do much with mathematically – you need to do operations <span class="underline">before</span> it's converted.
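
Here is a small sketch of that point. The variable names are invented. The arithmetic happens on the integer, and **bin** is applied last. If you do end up holding the text version, the built-in **int** function can parse it back when told the base is 2:

```python
value = 128

as_text = bin(value)            # a str that starts with '0b'
doubled = bin(value * 2)        # do the math first, then convert
round_trip = int(as_text, 2)    # parse the binary text back into an int

print(as_text, doubled, round_trip)
```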

The operators for bitwise shifts are **\>\>** and **\<\<** and here's how you might apply them as you're creating the binary number. Shifting is like moving the decimal point in a base 10 number, except that each position shifted multiplies (**\<\<**) or divides (**\>\>**) by 2 rather than 10.

In [None]:
mybin_1000 = bin(1000)
fmt = "{:30} {:30}"
print("Base 2 operations:")
print(fmt.format("mybin_1000: ", mybin_1000))
print(fmt.format("mybin_1000 shifted left: ", bin(1000 << 3)))
print(fmt.format("mybin_1000 shifted right: ", bin(1000 >> 5)))

There are other operators you can use to "mask" a binary number with another. A common use is IP address masking when setting up subnets.  These are:

| Operator | Effect on each pair of bits                                       |
|--------|---------------------------------------------------------------------|
| a & b  | Both 1 -\> 1, otherwise -\> 0                                       |
| a \| b | Both 0 -\> 0, otherwise -\> 1                                       |
| \~a    | "flips" each bit: 1 -\> 0 and 0 -\> 1                               |
| a ^ b  | if the bit in b is 0, use the bit in a; otherwise flip the bit in a |

This is just an aside, but you can use the "exclusive or", a.k.a. XOR, implemented with the "^" operator, as a cheesy encryption tool. That's because sequential applications simply flip the results. The first application encrypts the original and the second application undoes the encryption.
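
The following sketch exercises each masking operator on two small numbers, then shows the XOR round trip. The values a, b, and key are arbitrary choices for illustration, and this is emphatically not real encryption:

```python
a = 0b1100
b = 0b1010

print(f"a & b = {a & b:04b}")
print(f"a | b = {a | b:04b}")
print(f"a ^ b = {a ^ b:04b}")
# Python integers are signed, so ~a comes back as -(a + 1)
print(f"~a    = {~a}")

plain = 1000
key = 0b1011010110            # hypothetical secret key

scrambled = plain ^ key       # first application "encrypts"
restored = scrambled ^ key    # second application undoes it
print(plain, scrambled, restored)
```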

## Exercise:


Create a program that asks the user to guess a number between 1 and 100 (whole numbers only). If the guess is wrong, let the user know if the guess is too high or too low. If the guess is right, offer hearty congratulations and exit the program. If, after five attempts, the user can't get the number then offer deep condolences and invite him/her to try again. Be sure to check for valid input and remind the user if necessary.


Hint: the random library has several handy pseudo-random number generator functions such as random.randrange(). You'll need to import random in order to use it. A good way to start might be:

    import random

    help(random)


Also, you might consider putting in some sort of logic that will "freeze" both the random number and the user's guess while debugging. That gives you a stationary target.

Collections
===========

Next, we'll investigate several of the built-in Python **sequences.** Sequences are composite objects (that is they contain zero or more separate objects). Sequence objects are aware of their status as such and have built-in methods to take advantage of that fact. For instance, all sequences are **iterable** (they know how to loop over themselves), they know how long they are, and can be indexed.
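
As a quick sketch of those three shared talents, here they are applied to a tuple, a string, and a list. The sample data is invented:

```python
critters = ("fox", "skunk", "bear")   # hypothetical sample tuple
word = "bear"
numbers = [10, 20, 30]

for sequence in (critters, word, numbers):
    # every sequence supports len(), indexing, and iteration
    print(len(sequence), sequence[0], list(sequence))
```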

Creating Sequences 
------------------

In the code below, we will take a first-order look at several sequences, and how one might form them. Note that we can call data types such as **range** and **list** like functions to create new instances of said data types.

In [None]:
the_string = "strings are sequences with an encoding"
the_bytes = bytearray(the_string, encoding = 'UTF-8')
the_range = range(10)
the_list = list(range(10))
the_tuple = tuple(range(10))
print("String: ", the_string)
print("Bytes: ", the_bytes)
print("List: ", the_list)
print("Tuple: ", the_tuple)

Note that the bytearray's string representation contains a string preceded by a "b". That's just a visual indicator that the object is stored as bytes. Other prefixes you might encounter include: u (Unicode, in Python 2.x) and r (asks Python to interpret \\n, \\t, and other escape sequences literally).
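
To see what the r prefix buys you, here is a small sketch comparing a regular string with a raw one. The Windows-style path is invented for the example:

```python
cooked = "C:\temp\new"     # \t and \n are turned into a tab and a newline
raw = r"C:\temp\new"       # backslashes are kept exactly as typed

print(repr(cooked), len(cooked))
print(repr(raw), len(raw))
```

The raw version is two characters longer because each backslash survives as a character of its own.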

As is the case with strings, it's possible to create several objects directly using the appropriate brackets: square brackets for a list, curly braces for a dictionary, and parentheses for a tuple. The interpreter works out the type from the brackets you use. 

These all work:

In [None]:
another_list = ["what's", 'up', 'doc?']
print('List')
print(another_list)
print(type(another_list))
print()

a_dict = {'a':'eh', 'b':'bee', 'c':'see'}
print('Dictionary')
print(a_dict)
print(type(a_dict))
print()
      
my_tuple = (1,3,4)
print("Tuple")
print(my_tuple)
print(type(my_tuple))

Naturally, any of these objects can also be created directly with a constructor (using the object name as a verb). 

This can take the guess work out of the situation. Here's an example for the **set**[29] object:

In [None]:
favorite_set = set()
type(favorite_set)
print(f"An empty set:  {favorite_set}")   

favorite_set.add(1)
print(f"A one-element set:  {favorite_set}")

Python is dynamically typed: a name isn't tied to any one type.  Unlike many languages, Python does not enforce variable declaration or homogeneous types within a collection structure. In fact, it's very cavalier about use and reuse of variable names[30].

In [None]:
reusable = "green"
reusable = 1
reusable = ["some", "list", "of", 4, "elements"]
print(reusable)
print()

reusable = None
reusable is None  # (How to check for a None type.)
reusable          # (prints as 'None'.  In REPL, does not display anything.)
print('-' * 10)


Creating Index Values with enumerate
------------------------------------

Although you don't need to create index values for your loop, you can get some easily by using the built-in **enumerate** function.  It produces tuples of (index, object in iterable).

Here's how you might apply it:

In [None]:
fruits = ('apple', 'banana', 'kiwi')
for snack_and_index in enumerate(fruits):
    print(snack_and_index)

Here, we created a tuple of fruits. Within the **for** statement we applied **enumerate**. For each element of the tuple, **enumerate** returned a **tuple** of (index, element\_value).

We can upgrade this a bit by applying some print formatting and by "unpacking" the **tuple** returned. 

Here's how that might work:

In [None]:
fruits = ("apple", "banana", "kiwi")
start_at = 1
for index, snack in enumerate(fruits, start_at):
    print("fruit #{} is a(n) {}".format(index, snack))

The names "snack" and "index" have been successively associated with values contained within the **tuples** provided by **enumerate**, and are available within the indented suite. You can see that the names are "recycled" at each iteration. We provided an optional argument "start\_at" to select the first value of the index produced. If you don't provide the argument, the first index value will be zero.

Slices and Sequence Indexing
----------------------------

So far, we have looked at really small, manageable sequences where it's easy and cheap enough to iterate through all the elements. However, in real-life situations we're likely to encounter structures that contain millions of elements. If we have some *a priori* knowledge of where the element we're looking for lives, or if we want to apply something better than a brute-force search algorithm, we would want a more clever way to approach sequences.

Fortunately, Python supports the ability to "slice" a sequence using index values.  Slicing and indexing prove to be very important when wrangling array-like objects such as those in Numpy and Pandas.

For instance, you can go:

In [None]:
the_list = list(range(0,10))
print("Everything:", the_list)
print()
print("A slice:", the_list[0:6:2])

The general syntax is:

\<iterable\>\[ \<start\> : \<stop\> : \<stride\> \]

The start, stop, and stride parameters are all optional. By default, start is the first element of the sequence, stop is the end of the sequence (so the last element is included), and stride is 1.

In the simplest form, the entire sequence will be produced:

In [None]:
the_list[:] # [::] works the same

Let's go back and look at the first slicing example. One might expect that the last element produced would be 6 – after all, isn't that what we requested with the stop parameter? Not so – the last element produced is the one *before* the stop parameter … something to keep in mind to avoid surprises[31].
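
A quick sketch of why excluding the stop value can be convenient rather than surprising. The variable names here (`evens`, `middle`, `rejoined`) are only for illustration.

```python
the_list = list(range(0, 10))

# stop is excluded, so the_list[0:6:2] stops short of index 6.
evens = the_list[0:6:2]
print(evens)                      # [0, 2, 4]

# With a stride of 1, the slice holds exactly stop - start elements...
middle = the_list[3:7]
print(len(middle))                # 4

# ...and two slices that share a boundary fit together with no overlap.
rejoined = the_list[:3] + the_list[3:]
print(rejoined == the_list)       # True
```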

Python also has a built-in **slice** object, which works just like the index specifications above. Note that it's applied with \[square brackets\], just like the hard-coded slice.

In [None]:
slicer = slice(0,6,2)

print("Slice object")
print(type(slicer))
print()
print("A slice:", the_list[slicer])

Using the **slice** object can make your code much easier to maintain because you don't have to fiddle with hard-coded index values. It can be especially useful if you're parsing data whose format is likely to change and over which you have little control (like text scraped from a web site).    

A few **slice** objects at the top of your code could make adjustments / updates a snap. This being said, if its specification is too "distant" from where it's being applied your code could become less transparent for human consumers.

The main difference between using a **slice** object and the hard-coded version is that you can't simply leave a position empty between colons. A single argument is taken as stop, and two arguments as start and stop. If you want the default for a parameter that comes before one you're supplying, just use the **None** object as a placeholder and you'll get the default value.
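
Here is a small sketch of the **None** placeholder at work. The names `every_other` and `first_six` are hypothetical; each is compared against its colon-style counterpart.

```python
the_list = list(range(0, 10))

every_other = slice(None, None, 2)   # same as [::2]
first_six = slice(None, 6, None)     # same as [:6]

print(the_list[every_other])         # [0, 2, 4, 6, 8]
print(the_list[first_six])           # [0, 1, 2, 3, 4, 5]

# Each slice object matches its colon-style counterpart.
print(the_list[every_other] == the_list[::2])   # True
print(the_list[first_six] == the_list[:6])      # True
```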

Here are a few additional examples of how you might use slices:

In [None]:
the_string = "strings are sequences with an encoding"
the_list = list(range(0,10))
the_slice = slice(3, 10) # positions 3...9 inclusive

print("Slice : ", the_slice, "\n")
print("Slice a list: ", the_list[3:10], "\n") # typical
print("Alt. syntax : ", the_list.__getitem__(the_slice), "\n") # more verbose

print("The whole string: ",the_string[:]) # all
print("Slice a string: ", the_string[3:10])
print("Skipping: ",the_string[::2], "\n") # step by 2

new_list = the_list[:] # same as the_list.copy()
print("A clean copy, as a new object:", new_list)

Dictionaries in Python
----------------------

Python supports dictionary objects, known to some languages as "hash tables" or "hash mappings." The basic object, **dict**, is built into the language, and variants of it, such as the **OrderedDict**, are available in the **collections** library[32].



Basics
------

A **dict** object is another form of a collection – it has some important differences from sequence objects like the **list** and string. Among these:

-   Since it's a hash[33] (optimized for efficiency) the elements are
    located by key, not by position. Older versions of Python made no
    guarantee about the order of the elements; current versions keep
    them in the order they were added.

-   Operations like slicing, indexing by position, etc. are not
    available because elements are addressed by key rather than by
    their place in a sequence.

The **dict** is essentially a one-way lookup table. Given a unique key, one can efficiently find its associated value. However, since values are not necessarily unique you can't go the other way (you can't use a value to look up a key).
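
To make the "one-way" point concrete, here is a minimal sketch using a made-up `flavors` dictionary in which two keys share a value. Going backwards means searching every pair, and simply swapping keys and values loses information.

```python
flavors = {"apple": "sweet", "lemon": "sour", "lime": "sour"}

# Forward lookup: one key, one value.
print(flavors["lemon"])              # sour

# Going backwards means checking every pair, and may turn up several keys.
sour_fruits = []
for key, value in flavors.items():
    if value == "sour":
        sour_fruits.append(key)
print(sorted(sour_fruits))           # ['lemon', 'lime']

# Swapping keys and values drops data when values repeat:
# "sour" can appear only once as a key, so one fruit is overwritten.
reversed_flavors = {}
for key, value in flavors.items():
    reversed_flavors[value] = key
print(len(flavors), len(reversed_flavors))   # 3 2
```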

Here's some basic usage:

In [None]:
# Alternative syntax for creation
the_dict = {"key":"value", "A":2, "B":3}
same_dict = dict(key = "value", A = 2, B = 3)

print ("The dict:", the_dict, '\n')
print("Are the dicts the same?", the_dict == same_dict, '\n')

# One way to add a key:value pair 
the_dict['C'] = 3
print("We added a new key:value pair: ", the_dict, '\n')

# One way to extract a value if you know the key is good.
my_value = the_dict['C']
print(f"The value associated with 'C' is: {my_value} \n")

# One way to extract a value if the key is sketchy.
my_value = the_dict.get('D', "some default value")
print(".. and yet again:", the_dict)
print()
print("The value I got with a bad key is: '{}'.".format(my_value))

Using get
---------

While it's completely OK to call a value out by its key, if that key does not exist a **KeyError** exception will occur. The **get** method allows you to provide some default value in this case (or if you don't, the **None** object will be returned).

Here's a quick example:


In [None]:
plant_dict = {'raspberry': 'rubus',
              'elm': 'ulmus',
              'maple': 'acer'
              }

# create a format string
fmt_str = "The {} is more properly called a {}.\n"

# get with no default argument
look_for = 'raspberry'
found = plant_dict.get(look_for)
print(fmt_str.format(look_for, found))

# get with a default value
look_for = 'alder'
default = 'SOMETHING. I dunno'
found = plant_dict.get(look_for, default)
print(fmt_str.format(look_for, found))

# get with default None
look_for = 'sumak'
default = None
found = plant_dict.get(look_for, default)
if found:
    print(fmt_str.format(look_for, found))
else:
    print(f"Dude, I'm looking for {look_for}, but I give up ;-)")

Keys are Immutable
------------------

The keys of a **dict** need to be unique and hashable. As a result, one of the requirements of a key is that it be immutable[34]. To determine whether a proposed key is hashable, "under the hood" the interpreter looks for the **\_\_hash\_\_** "magic method", and uses the hash thus produced for internal bookkeeping. 

If there's no **\_\_hash\_\_** method, the proposed key won't qualify. As long as the key meets these requirements, it can be just about anything, though strings and integers are most commonly used.
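
You can probe this yourself with the built-in **hash** function. The sketch below, using made-up sample values, shows a few immutable objects that hash happily and a **list** that is rejected as a key.

```python
# Immutable built-ins such as strings, integers and tuples are hashable.
candidates = ("elm", 42, (1, 2, 3))
all_hashable = True
for candidate in candidates:
    if not isinstance(hash(candidate), int):
        all_hashable = False
print("All candidates hashable?", all_hashable)   # True

# A list is mutable, so it can't be hashed and can't serve as a key.
try:
    bad_dict = {["a", "list"]: "value"}
except TypeError as error:
    bad_dict = None
    print("Rejected:", error)
```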

Useful Dictionary Methods
-------------------------

Here's a little more code you can use to "kick the tires" on the **dict** object. Note the use of the **update** method of the **dict** object and the top-level **del**[35] statement.

In [None]:
mydict = {}
mydict['team'] = 'Cubs'

# Another way to add elements is via update.  For input, just about any iterable
#    that's comprised of 2-element iterables will work.
mydict.update([('town', 'Chicago'), ('rival', 'Cards')])
print(f"The dict is now:  {mydict}.\n")

# we can print it out using the items method (it yields (key, value) tuples)
for key, value in mydict.items():
    print("The key is {} and value is {}.".format(key, value))
    

# 'and' evaluates left to right; this protects from crashes if no 'rival'
print("Let's get rid of a rival \n")
if "rival" in mydict and mydict['rival'] == 'Cards':
    del mydict['rival']   # del is a top-level statement, not a method

print("By the grace of a top-level statement, the Cards are gone.\n")
print(f"The dict is now:  {mydict}")

You can extract the keys and values by the cleverly-named **keys** and **values** methods. These return iterable objects containing the appropriate values.

You will notice from the representations of the keys and values that they are their own special objects.  We can easily type-cast them to more usable objects using the constructor for a collection.

In [None]:
the_dict = {"key":"value", "A":2, "B":3}

keys = the_dict.keys()
print(f"Keys: {keys}")

vals = the_dict.values()
print(f"Values: {vals}\n")

print(f"Values as a list object: {list(vals)}")

Another way to remove elements from a **dict** is to use its **pop** and **popitem** methods. These not only remove elements, but return what was removed: **pop** returns the value for the key you name, and **popitem** returns a (key, value) **tuple**.

Here's how you might employ these methods:

In [None]:
my_dict = {'team': "Cubs", "town": "Chicago", "rival": "Cards"}
print("dict is: {}\n".format(my_dict))


# Use popitem to remove and return an item (in Python 3.7+, the most
# recently added one).  It raises KeyError if the dict is empty.
if my_dict:
    key, value = my_dict.popitem()
    print(f"We've removed: {key}: {value}.\n")
    print(f"The dict is now: {my_dict}.\n")

# With pop we can pick a key and use a default, just in case.
default = None
looking_for = 'bad key'
popped_value = my_dict.pop(looking_for, default)   # pop returns just the value
if popped_value:
    print("{} is {}.\n".format(looking_for, popped_value))
else:
    print("Sorry, no '{}' here.".format(looking_for))

Sorting a Dictionary
--------------------
A word about dict sequencing. It's important to note that the standard (CPython) interpreter has stored dict entries in insertion order since Version 3.6, and the language has guaranteed this behavior since Version 3.7. That is, the key:value pairs are kept in the order they were loaded into the dict. Earlier versions allowed the key:value pairs to shift positions in the interest of efficiency. If you need your code to run on older versions of Python, you'll want to choose a collections.OrderedDict object instead of the normal dict object.


Occasionally, you'll want to sort the contents of a **dict** by keys. This is a snap because they're guaranteed to be unique and, as long as they're all of comparable types (all strings, say), have a natural "sort order."  We can accomplish sorting by keys succinctly by applying the top-level **sorted** function (this is not part of the list object).

You'll note that we have chained operations together in the example below.  This is extremely common in Python and is considered a sign of elegant parsimony.  Python begins by solving the inner-most bit of a nested expression then feeds the result to the next-inner-most bit ... and so on.

It can be horrible for the new user, so we'll do it one step at a time first:

In [None]:
# The long way:
my_dict = {'team': "Cubs", "town": "Chicago", "rival": "Cards"}
keys = my_dict.keys()                         # a dict_keys view object
list_of_keys = list(keys)                     # a list version
sorted_list_of_keys = sorted(list_of_keys)    # a sorted version of the list
print(sorted_list_of_keys)

In [None]:
# The 'Pythonic' way:
print(sorted(list(my_dict.keys())))

Sorting by values is a little tougher, but here's how you can do it. You need to provide a short routine that returns a value if you give it a key. If you provide it to the **list** object's **sort** method[36], it will use the routine to guide its efforts. We'll get into functions more later. For now, know that the general form is:

def <function\_name>(<arguments>):

    <indented suite>

It can take zero or more arguments and optionally return an object of your choice.

The **sort** command by default will sort the elements "naturally" – alphabetically if strings and numerically otherwise. You can influence that behavior by giving it something else besides the **list** elements to sort "naturally". It could be anything – a random number, word count, the last letter of a word, etc. Things will be sorted by whatever is returned by the sorter routine you write.
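
As a small illustration of sorting by "the last letter of a word", here is a sketch in which the function name `last_letter` and the word list are made up for the occasion. It is handed to **sort** through the `key` argument.

```python
def last_letter(word):
    "Return the final character of a word, to be used as the sort value."
    return word[-1]

words = ["kiwi", "banana", "apple", "fig"]

words.sort()                     # natural (alphabetical) order
print(words)                     # ['apple', 'banana', 'fig', 'kiwi']

words.sort(key=last_letter)      # sorts on the last letters: a, e, g, i
print(words)                     # ['banana', 'apple', 'fig', 'kiwi']
```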

This is what a "null sorter" would look like (it returns exactly what we passed it):

In [None]:
def sorter(key):
    "simply returns what's provided i.e., contributes nothing"
    return key

We can invoke a list object's sort() method to sort the elements of the list.  Behind the scenes, it behaves as though it were using the null sorter to get the job done.  In other words, it uses the natural sort order of the stuff in the list.

You can roll your own sorter, if you want.   When you do, Python essentially feeds each element of the list into the routine you provide.  

Here's how it works: whatever is returned by the routine gets 'paired up' with each element.  The returned values are things that get sorted - not the original list contents.   When the sort is completed, the returned values go away leaving the list elements in the proper order.

Here's an example:

In [None]:
def sorter(key):
    "Creates a sort using the value associated with a dict key"
    return my_dict[key]

my_dict = {'team': "Cubs", "town": "Chicago", "rival": "Cards"}
key_list = list(my_dict.keys())
key_list.sort(key = sorter)

for key in key_list:
    print(key, my_dict[key])


Miscellaneous Notes on dict Objects
-----------------------------------

Many Python applications, such as Pandas, build user-friendly array indices using objects built on dict-like objects.   Essentially, one can specify row and column names in human terms.  The human terms get mapped to pointers and other computer-friendly objects.   In this way dict-like objects form a cognitive bridge between the human and computer worlds.

You can use dictionaries to hold elements of sparse arrays by using **tuple**s as keys. Let's say you wanted to model movements of goldfish in a tank. You could account for each cubic centimeter of water and monitor each fish passage in and out – you might use a dense 3-D array for that. Perhaps more efficiently, you could monitor the fish themselves by providing a three-**tuple** to represent the current location of each. 

This **dict** uses **tuples** for keys:

In [None]:
array_dict = {(0, 0, 0): 12, (2, 4, 98): 23}
array_dict[(2, 4, 98)]
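
Continuing the goldfish idea, here is a minimal sketch of a sparse structure. The `fish_at` dictionary and the fish names are invented for the example; only occupied positions are stored, and **get** supplies a default for the empty ones.

```python
# Only occupied cells are stored; the key is an (x, y, z) position.
fish_at = {(0, 0, 0): "Goldie", (2, 4, 98): "Bubbles"}

# get returns a default for the (many) empty cells instead of raising KeyError.
print(fish_at.get((2, 4, 98), "nobody"))   # Bubbles
print(fish_at.get((5, 5, 5), "nobody"))    # nobody

# Moving a fish: remove the old position and store the new one.
fish_at[(2, 4, 99)] = fish_at.pop((2, 4, 98))
print(len(fish_at))                        # 2
```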

Keys can be heterogeneous, as can the values. This is a valid **dict**:

In [None]:
hetero_dict = {'a': 1, (1,3,4): "x"}
hetero_dict

If a **dict** has a key named "a", and you provide a new element whose key is 'a', the original will be replaced.


In [None]:
new_dict = dict( ( ('a', 3), ('b', 88) ) )
print("Original:", new_dict)

new_dict['a'] = "Godzilla!"
print("Updated:",new_dict)

Memory Considerations
---------------------

Adding elements one at a time can be expensive, especially as the **dict** grows. That's because the underlying hash table has to be rebuilt in a larger patch of memory whenever it outgrows its allocated space. When you already have the data in hand, it's usually tidier, and often faster, to use an *en masse* operation like **update**.

The set Object and List Comprehension
-------------------------------------

The **set** object is a mighty useful arrow in your quiver. A **set** is essentially a collection of unique objects. The **set** is another hash object, making retrieval of objects very efficient, even when the **set** is large.

The fact that a **set** contains only unique objects makes it great for an automatic de-duplicator. If you try to add an object that already exists, the attempted addition fails silently. 

Here's a common idiom to de-dup a **list**:

In [None]:
set([1,1,2,2,2,3,3,4,5])
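
One thing to watch for: a **set** makes no promise about the order of its elements. The sketch below (with a made-up `numbers` list) shows two ways to get a de-duplicated **list** back, one sorted and one that keeps first-seen order. The second relies on **dict** keys preserving insertion order, so it assumes Python 3.7 or later.

```python
numbers = [3, 1, 3, 2, 1, 5]

# A set drops duplicates but makes no promise about order.
unique_values = set(numbers)
print(len(unique_values))                 # 4

# sorted() turns the set back into an ordered list.
print(sorted(unique_values))              # [1, 2, 3, 5]

# dict keys are unique too, and keep first-seen order (Python 3.7+).
first_seen = list(dict.fromkeys(numbers))
print(first_seen)                         # [3, 1, 2, 5]
```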

Here's a slightly more complicated example, this one using a "list comprehension" to create a set of random numbers using the random library.  Here it is, quickly.

Yes, it's horrible the first time you see it (very Pythonic), but we'll explain.

In [None]:
import random
values = 1000
dedup = set([int(1000*random.random()) for i in range(values)])
print("we found {} unique values!".format(len(dedup)))

In [None]:
# The long way, using the list append() method.
random_list = []
for i in range(values):
    rand_num = random.random()
    rand_int = int(rand_num * 1000)
    random_list.append(rand_int)
dedup = set(random_list)

**Sets** have built-in methods for efficiently finding unions and intersections. A union is a **set** of every unique element from both sets combined. An intersection is a **set** of the elements found in both. This code serves to illustrate how you might apply them using the "**&**" and "**\|**" operators:

In [None]:
the_set = {"monkey", "gorilla", "dog", "cat"}
print("Set: ", the_set)
the_set.add("parrot")
print("Set: ", the_set)
other_set = {"gorilla", "elephant", "pig", "chicken"}
print("Set intersection:", the_set & other_set)
print("Set union:", the_set | other_set)

Note that although this example uses the "\|" and "&" operators with two **set** objects, there's no limit to the number of objects in play. If a, b, and c are all **sets**, you can go a&b&c or a\|b\|c.
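
A short sketch with three small, made-up sets shows the chained operators alongside the equivalent **intersection** and **union** methods, which accept several sets at once.

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}

common = a & b & c                  # elements found in all three
combined = a | b | c                # elements found in any of them
print(sorted(common))               # [3, 4]
print(sorted(combined))             # [1, 2, 3, 4, 5, 6]

# The method forms accept several arguments at once.
print(a.intersection(b, c) == common)    # True
print(a.union(b, c) == combined)         # True
```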

Named Tuples
------------

The **namedtuple** object is something of a hybrid between a **tuple** and a **dict**. Each **namedtuple** can be instantiated into your current namespace as a 'normal' **tuple**, but the wrinkle is that you can give each element a name (a field name) so you can refer to it easily. It's not part of basic Python, so you need to **import** it from the **collections** library.

Here's some code to play with which shows the basic functionality. Note that you can address the elements of these objects by name, a bit like the keys of a dictionary (though with dot notation rather than square brackets), but can also use them as **tuples**.

In [None]:
"""An introduction to the namedtuple object"""
from collections import namedtuple

#make a tuple with the tag 'Animal'
Animal = namedtuple('Animal', ('species', 'name'))

#specify this animal by providing the species and name
a1 = Animal('gorilla', 'magilla')
print(a1)

#... and specify another
a2 = Animal(species = 'gorilla', name = 'fred')
print(a2)

#we can call out the specifics using dot notation
print(a1.name, a2.name)

#... or, split the tuple containing the first animal
myspecies, myname = a1
print(myspecies, myname)
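
A few more things worth trying, sketched here with the same `Animal` definition: positional indexing, the `_fields` attribute, and the `_asdict` and `_replace` methods. The name `a3` and the value 'koko' are invented for the example.

```python
from collections import namedtuple

Animal = namedtuple('Animal', ('species', 'name'))
a1 = Animal('gorilla', 'magilla')

print(a1[1])              # positional indexing, like any tuple
print(a1._fields)         # ('species', 'name')
print(a1._asdict())       # a dict-style version of the same data

# Tuples are immutable, so _replace returns a new object instead.
a3 = a1._replace(name='koko')
print(a1.name, a3.name)   # magilla koko
```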

Getting Fancier with the namedtuple Object
------------------------------------------

Here's a slightly more complicated example – something you might use if you were developing a chess game in Python. We won't elaborate this to completion – but it may be interesting practice. If you do the upgrade, you might look into using Unicode to represent your chess pieces[38] (Python 3 is Unicode-compliant[39]).

This code develops a two-dimensional grid using a **list** of **namedtuple** objects. This is a simple way to whistle up a 2-d array-like structure using only basic Python tools.

You can see that each of the **namedtuple** objects carries with it several properties, nicely bundled, and transparently addressed. In this sense, the objects we're creating here are a bit like the built-in Python objects – they're already endowed with "out of the box" capabilities.


In [None]:
from collections import namedtuple

Piece = namedtuple('Piece', 'type color position symbol')

# a list of lists is a really useful data structure
grid = []
for r in range(8):
    grid.append([None] * 8)

# "N" is the usual symbol for a knight, which leaves "K" for the king.
grid[7] = [
            Piece("Rook", "black", [7, 0], "R"),
            Piece("Knight", "black", [7, 1], "N"),
            Piece("Bishop", "black", [7, 2], "B"),
            Piece("Queen", "black", [7, 3], "Q"),
            Piece("King", "black", [7, 4], "K"),
            Piece("Bishop", "black", [7, 5], "B"),
            Piece("Knight", "black", [7, 6], "N"),
            Piece("Rook", "black", [7, 7], "R")
            ]

grid[6] = []
for c in range(8):
    p = Piece("Pawn", "black", [6, c], "P")
    grid[6].append(p)
    
for p in grid[7]:
    print(p.symbol, end="")
print()    
for p in grid[6]:
    print(p.symbol, end="")

Copying Sequences
-----------------

In Python, it's possible to have multiple names associated with the same object. This can be really handy but the convenience comes with a cost: two names can look independent when they aren't. The way to tell for sure is to compare each object's **id** (or, equivalently, to use the **is** operator).

Here are a couple trivial examples that show the potential pitfalls:

In [None]:
one = 1
one_prime = 1
print("one:", id(one), "one_prime", id(one_prime))

if id(one) == id(one_prime):
    print( "Same object.")
else:
    print("Different objects.")


You run into this with collections, too. This code will demonstrate:

In [None]:
def test_lists(first, second):
    "Compare two lists by value, then show their ids."
    test_result = first == second
    print("same list, right? {}.".format(test_result))

    # Identical ids mean both names refer to one and the same object.
    print("first id is {} second id is {}.".format(id(first), id(second)))

print('Make a list, then create a copy')
a_list = ["Tony", "Sally", "George"]
b_list = a_list
print(a_list, b_list)
test_lists(a_list, b_list)

#update the first list, but not the second.
print("\nAdd someone to the a_list, leaving the b_list alone")
a_list.append("Sam")
test_lists(a_list, b_list)

print("\nUh, what happened here?")

"What happened here" is that we have simply assigned two names to the same object (they have the same **id**).  Then we used one of the names to access, and change, the original object.

Having multiple names for the same object is not too weird when you think about it.   The president is called "the President", "POTUS", "Commander-in-Chief" ... then you can read political columns to get even more names :-)   

The point is that there can be many synonyms ("aliases" or "names") for any object. Each name is a unique key in a namespace, which Python keeps track of with a **dict**, and any number of those keys can be tied to the same object.
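
If you'd rather not compare **id** values by eye, the **is** operator asks the same question directly. This is a minimal sketch; `c_list` is an extra list added here purely for contrast.

```python
a_list = ["Tony", "Sally", "George"]
b_list = a_list                           # second name, same object
c_list = ["Tony", "Sally", "George"]      # separate object, equal contents

print(a_list is b_list)    # True:  one object, two names
print(a_list is c_list)    # False: two distinct objects
print(a_list == c_list)    # True:  the contents match

a_list.append("Sam")
print(b_list)              # ['Tony', 'Sally', 'George', 'Sam']
print(c_list)              # ['Tony', 'Sally', 'George']
```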

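To get around this, you need to make a true copy: a new **list** object with its own **id**. (Strictly speaking, these are "shallow" copies: the new list is independent, but any mutable objects nested inside it are still shared.) Here are some ways to do it: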

In [None]:
old = ["Tony", "Sally", "George"]

new = old[:]      # a full slice builds a new list
new = list(old)   # list constructor creates a new object
new = old.copy()  # uses the list object's copy method
print(new is old) # False: same contents, different object
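
One caveat worth sketching: the techniques above create a new outer list, but any mutable objects nested inside it are still shared with the original. The standard library's **copy.deepcopy** copies the nested objects as well. The nested list used here is made up for the demonstration.

```python
import copy

old = [["Tony", "Sally"], ["George"]]     # a list that contains lists

shallow = old.copy()          # new outer list, same inner lists
deep = copy.deepcopy(old)     # new outer list and new inner lists

old[0].append("Sam")          # change one of the nested lists
old.append(["Extra"])         # change the outer list

print(shallow)   # [['Tony', 'Sally', 'Sam'], ['George']]
print(deep)      # [['Tony', 'Sally'], ['George']]
```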

One place this comes up in data science is working with slices of data structures. Unlike a list slice, which is a new object, a slice of a NumPy array is typically just an alias to part of the original object, much like a "view" into a database table provides access to part of that table. When working with NumPy and pandas, a very common way to hack into the relevant data involves creating a slice then manipulating it. NumPy won't give you a warning, but pandas often will. Unless you're really sure what you're doing, you will want to pay close attention and avoid the temptation to ignore or disable the warnings.

Exercise
--------
I just hired a new admin and it's not working out so well. I asked her to organize my Cubs player database. It turned out to be a disaster. After she quit, I found two versions of the database. Can you help? 

One version of the database (stored as a dict) uses the players' weights as the key. 

I'll spare the details, but it looks like this:

In [None]:
players = { 185:('Tyler', 'Chatwood'),
            219:('Luke', 'Farrell'),
            190:('Kyle', 'Hendricks')
          }

Can you find a way, working only with code and this dictionary, to produce a nicely formatted table that's sorted by the players' last name?

The other version was only a little better and organized by last name – sort of. There were lots of duplications caused by bad typing. 

Here's a bit of it:

In [None]:
players = { 'Hendricks': ('Kyle', 225),
            'HeNDricks': ('Kyle', 225),
            'Hendrix': ('Kyle', 225) 
          }

Can you find a way to clean this version up working with the set object?

It's beyond the scope of this class, but just for fun you might look at the code in py_fuzzy_lookup.py for ways to refine this further. It demonstrates a tool that will let you find and score "fuzzy" (inexact) matches.

Functions
=========

Functions are just bits of code that encapsulate – you guessed it – functionality. We'll explore them in some depth in this chapter.

You've already used several of them like **print**, **input**, and **bin**. You've used them with commands like **list.sort** – though, technically, when contained within classes they're called "methods". And we've just written a small function to sort **dict** objects by value.  You're already getting to be a pro. You'll become more so by rolling a few of your own.

Scope of Names In a Module
--------------------------

Here are a couple example functions that each take a single, positional parameter. The first is pretty straightforward, and the second introduces the **global** keyword.

In Python (and most languages) there's a notion of "scope" – that's the part of the code where a variable is visible. Indentation levels are a good gauge of scope within a module. The objects defined in the first column (the setStar function and the variables STAR, Favorites and X) are visible throughout the module and are called "globals"[40]. Variables tucked away within a function (e.g., the STAR assigned inside setStar while the **global** line is commented out) are visible only within that function, normally. These are called "locals"[41].

If you want to enhance the visibility of the variable and make it global to the module, you can employ the **global** keyword, as is done in setStar for the variable STAR (once the comment marker is removed). Once you do that, you facilitate "two way communication" with that STAR object and allow the function to alter the value. If you don't do that, the function can only alter the "local copy" of STAR[42].

In [None]:
# Globals namespace
STAR = "Sirius" # Polaris
Favorites = [ ]
X = 100

def setStar(name): 
    """ Contains a local namespace just for this function."""
    #global STAR  #<---remove comment to map to global namespace
    STAR = name
    Favorites.append(name)
    print("local STAR: ", STAR)
    
setStar("Polaris")
print("global STAR: ", STAR)#output

Passing Information into a Function
-----------------------------------

You have a tremendous amount of flexibility around how, and whether, to pass information into functions. You can "roll your own" from some combination of fixed positional arguments (a contract between you and your function to provide a precise number of arguments); positional "wildcard" arguments (0 to a zillion arguments); specific key / value pairs; or "wildcard" key / value pairs (0 to a zillion). Here are some examples of the options available to you:

The simplest of all functions can be constructed like this:



In [None]:
def simple():
    "a docstring, or the keyword pass"

You need the keyword **def**, some name, enclosing parentheses and a colon in the header, followed by a nominal indented suite. This function takes no arguments and returns **None**. Not so interesting, really, but syntactically correct. You can upgrade in a variety of ways. For instance, you can have only named positional parameters, as with this function header:

    def positional_only(input1, input2, input3):

Here you promise three inputs – no more and no fewer. Sometimes, though, you really don't know how many bits of information you'll get and want to generalize. Python provides flexibility when the number of arguments is unpredictable. This header requires one positional argument then zero or more additional ones:

    def positional_plus(input1, *more_inputs):

The "\*" signals the interpreter to gather any additional positional arguments into a tuple of unknown size, and to create a local name "more\_inputs" tied to that **tuple** object (you don't use the \* inside the function). The **tuple** will scoop all the additional arguments provided beyond the first one (that's assigned to input1). If only one argument is provided the **tuple** will be empty.
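
Here is a runnable sketch of that header. The body, which just hands back what it received, is an assumption made for demonstration.

```python
def positional_plus(input1, *more_inputs):
    "Return the first argument and a tuple of whatever else arrived."
    return input1, more_inputs

print(positional_plus("a"))              # ('a', ())
print(positional_plus("a", "b", "c"))    # ('a', ('b', 'c'))
```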

If you want to use key / value pairs as input, you can construct the function like this:

    def dict_only(a = None, b = None):

This is a handy idiom because objects a and b are always created when the function is called, taking the default value **None** if the caller doesn't supply them. That makes them optional – neither, either, or both can be used when calling it. Another way to accept key / value pairs is to use a placeholder for a **dict**, with a header that looks like this:

    def dict_placeholder(**input_dict): 

When you do this, you create a local name "input\_dict" that's tied to the key:value pairs provided when the function is called (you don't use the \*\* internally). Let's take a look at a couple more examples.

In [None]:
def eats(*foods): # gather positional args in a tuple
    print("foods: ", foods) # foods is a tuple now
    
    
print("Tuple of positional arguments to eats():")  
# open-ended number of positional arguments passed in...
eats("Spaghetti", "Oysters", "Chili", "Crackers", "Rice")

def example(*args, **kwargs): # keyword args dict
    """
    (* ) convert positionals -- tuple
    (**) convert keyword args -- dict
    """
    print("\nPositionals:")
    for arg in args: # loop over the tuple
        print(arg, sep = ", ", end = " ")

    print("\nKwargs")
    for key, value in kwargs.items(): # ...now the dict
        print("Arg name:", key, "Value: ", value)
    
# positional + keyword (named) arguments
example( 1,2,3,4, on_vacation = True, at_work = False )

# same thing using "exploders" * and **
example( *(1,2,3,4), **dict(on_vacation = False, at_work = True) )

As you can see, you've got plenty of options here. This being said, there are a couple of constraints. The order is important – the positional arguments need to come first, then the **tuple** of arbitrary size, then the dictionary. As soon as the interpreter hits the **tuple** argument, it's game over for positional arguments. As soon as it hits the dictionary, it's game over for both positional arguments and **tuples**.
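
The ordering rule is easier to remember with one function that uses every kind of parameter. This is only a sketch; `describe` and its parameter names are hypothetical.

```python
def describe(first, *others, mood="calm", **extras):
    "Return each kind of argument so we can see where things landed."
    return first, others, mood, extras

result = describe("a", "b", "c", mood="happy", size=3)
print(result)           # ('a', ('b', 'c'), 'happy', {'size': 3})

# Only the required positional argument is supplied here.
print(describe("a"))    # ('a', (), 'calm', {})
```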

Returning Information From a Function
-------------------------------------

A function does not have to return anything. If it does return something, you don't have to give the returned value a name or otherwise use it. 

Personally, I think it's a good idea to return something – even if it's just the number 1 to show "Yup, I executed" or a -1 to show "Uh, there was a problem." But it's up to you – without instructions in this regard a function returns the **None** object.

Finally, in Python you can only return one object. This isn't necessarily an issue because the object can be as complex as it needs to be to fully "pass the baton" to the next routine. The **dict** is my "go to" object for complex returns – mostly because it can be as complex as it needs to be and, properly done, completely self-documenting.
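
Two common ways to bundle several results into that one returned object are sketched below: a **dict**, as suggested above, and a **tuple** that the caller unpacks. The function names and sample numbers are invented for the example.

```python
def summarize(values):
    "Return several facts about a list, bundled in one dict."
    return {"count": len(values), "smallest": min(values), "largest": max(values)}

def low_and_high(values):
    "Return two values; Python packs them into a single tuple."
    return min(values), max(values)

report = summarize([4, 8, 15])
print(report["largest"])               # 15

low, high = low_and_high([4, 8, 15])   # unpack the tuple
print(low, high)                       # 4 15
```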

First Class Objects
-------------------

Functions are known as "first class objects" because they can be used just as classes and other top-level objects in Python. For instance, they can be passed as arguments to other functions without issue[44].  This is really powerful because you can separate responsibilities cleanly between separate, isolated bits of code. As a result, different teams can work on the individual functions and atomic tests can be written against the capabilities promised by each function. All this makes a "divide and conquer" approach to development a snap and – perhaps more importantly – can keep code easy to maintain as requirements change.

Here's how you might use functions' "first class" status in Python to take a foray into functional programming. For this example, imagine we have the "F team" in Florida and that it's never met the "G team" in Georgia.  Each has been working 24/7 on its task, has written tests, and otherwise has things dialed in. F has perfected the art of doubling a value and G has nailed adding 2 to a value.

As consumers of these efforts, we can "stand on the shoulders of giants" and repurpose / combine the efforts into our own project. 

Here's an example:

In [None]:
def compose(g, f):
    """Take two functions as inputs and return a
    function that's their composition"""
    
    def newfunc(x):
        return g(f(x))
    
    return newfunc
        
def G(n):
    "adds 2 to a value"
    return n + 2

def F(n):
    "doubles a value"
    return n * 2

# compare:
H = compose(G, F) # build a 3rd function from 1 & 2
print("G(F(x)):", H(100)) # G(F(x)) = (100 * 2) + 2 = 202

# ... now with the order swapped
H = compose(F, G)
print("F(G(x)):", H(100)) # F(G(x)) = (100 + 2) * 2 = 204

Inner Functions
---------------

It's possible for a function to contain other functions called "inner functions." In this case, not only is there an inner function, but that's what gets returned. The object that gets returned is "loaded for bear", retaining the information originally passed into **addLetter**.

In [None]:
def addLetter(letters):  # -- pass in a string
    """
    A function factory builds and returns function objects.
    L is a function that will add whatever letters are passed
    in to be the ending letters.
    """
    def L(s):
        return s + letters
    return L


# These are functions (versions of the inner
#    function L()) returned from addLetter()
add_s = addLetter("s")
add_ed = addLetter("ed")

# Then we can execute these functions like any others
print(add_s('Unhinged rant') + '.')
print(add_ed('In an unhinged fashion rant') + '.')

Closures
--------

A closure is an inner function with a memory. Let's say that you have it on good authority that the meaning of life is 42. You want to stash that pearl of insight away and be prepared to tweak it as life gets more interesting and the universe evolves.

You might want to create a Python function to which you can provide the original value. Then you might ask the main function to return another function that accepts your changes. 

Here's how you might do it:

In [None]:
def outer_space(outer_input):
    "outer-most function"
    last_answer=outer_input
    def inner_space(inner_input):
        "inner-most function"
        nonlocal last_answer
        last_answer += inner_input
        return last_answer
    return inner_space

MEANING_OF_LIFE=42
original = outer_space(MEANING_OF_LIFE)
print("The original meaning of life is {}".format(original(0)))

tweak=3
new_meaning = original(tweak)
print("But now we think it's a bit more: {}".format(new_meaning))

next_new_meaning = original(tweak)
print("... and now: {}".format(next_new_meaning))

As you can see, the inner function retained some "institutional memory", encapsulated it, and made itself available for further interaction.

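The **nonlocal** keyword, introduced here, lets the inner function rebind a variable that lives in the enclosing (outer) function's scope. Without it, the assignment would create a brand new local variable inside the inner function. This parallels the relationship that the **global** keyword, used inside a function, has with names in the containing module.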
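
To see what **nonlocal** buys you, here is a minimal sketch. The names make_counters, broken, and working are made up for illustration. Without the keyword, the assignment inside the inner function makes the name local, so Python complains that it is read before it is assigned.

```python
def make_counters():
    count = 0

    def broken():
        # No nonlocal: "count += 1" makes count a local name here,
        # so reading it before assignment raises UnboundLocalError.
        count += 1
        return count

    def working():
        nonlocal count  # rebind the variable owned by make_counters()
        count += 1
        return count

    return broken, working


broken, working = make_counters()

try:
    broken()
except UnboundLocalError as err:
    error_name = type(err).__name__
    print("broken() failed with", error_name)

first = working()
second = working()
print(first, second)
```

Both inner functions close over the same variable, but only the one that declares it **nonlocal** is allowed to change it.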

Python's Take on Map, Reduce and Filter
---------------------------------------

Python has some limited built-in capability to handle map/reduce operations. It can easily perform the same sort of analysis possible with big data analytical tools like Hadoop, but just not at the same scale. (Hadoop has its own way to manage workload and file systems – that's where its real magic lies).

In a nutshell, a map operation is one that iterates through a bunch of objects and does the same thing to each one. A reduce operation sifts through the mapped objects and boils them down to the "good stuff", typically a summary value. Generally speaking, the **map** step is run once and its output is then sifted through multiple times by different reducers.

To implement mapping in Python, we can use the **map** function. Its syntax is straightforward:

mapped\_data = map (\<processing\_function\>, \<iterable\>)

To filter data, we can call upon the **filter** function. It works pretty much like **map**, but the processing function passes judgment on each element of the iterable by evaluating it as **True** or **False**.  The result is another iterable containing only those elements that have been deemed **True**. The syntax is identical to that of **map**:

filtered\_data = filter(\<processing\_function\>, \<iterable\>)

Here's an example of how you might use **map** and **filter** to find even sums of integers provided as **tuples**. These functions are interesting because they each take two arguments: the name of the routine to process the data and the incoming data. Each returns a generator-like object. This means they don't process all the data at once. Instead, they proceed one step at a time through their task, and only when requested. This sort of "lazy execution" makes handling of even large data sets possible because only the bits that are being processed need be in memory.

Python also has a **reduce** function (it lives in the **functools** module) – it works pretty much like **map**, except the operation defined in the function is applied cumulatively, collapsing the iterable into a single value.  Note that in the example below the function passed to **reduce** is a **lambda** expression[45] (an anonymous "one liner" function that can be used in place of a standard function).

In [None]:
from functools import reduce


def add_numbers(things_to_add):
    "adds two numbers"
    first, second = things_to_add
    return first + second


def find_evens(thing_to_evaluate):
    "returns True if even"
    return not thing_to_evaluate % 2  # True if 0, False otherwise


integers = [(2, 2), (4, 4), (5, 6), (7, 8)]

mapped_data = map(add_numbers, integers)
filtered_results = filter(find_evens, mapped_data)
print("The even sums are {}".format(list(filtered_results)))

cumulative_mult = reduce(lambda x, y: x*y, [1, 2, 3, 4, 5])
print("reduce returned: {}".format(cumulative_mult))
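
If you'd like to watch the "lazy execution" described above in action, here's a small sketch. The helper noisy_double and the calls list are invented for this illustration; they simply record which elements have actually been processed.

```python
calls = []  # records every value the mapping function actually sees


def noisy_double(n):
    calls.append(n)
    return n * 2


lazy = map(noisy_double, [1, 2, 3])
calls_before = list(calls)      # nothing has been processed yet

first = next(lazy)              # ask for one result
calls_after_one = list(calls)   # only the first element was touched

rest = list(lazy)               # drain whatever is left
leftover = list(lazy)           # the iterator is now exhausted

print(calls_before, first, calls_after_one, rest, leftover)
```

Note the last step: once a **map** (or **filter**) object has been consumed, it's empty. If you need the results more than once, capture them in a **list** first.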

Function Dispatch
-----------------

Until Python 3.10, which introduced the **match** statement, Python didn't have a statement like **case** or **switch**.   One day, you'll be able to switch to 3.10 for all your data science needs, but until the main distros (Anaconda, etc.) and the underlying data science building blocks are all updated you will likely be using an earlier version of Python.

For better or worse, there are other ways to get the same functionality.  We've already seen that a complex **if** .. **elif** .. **else** structure can accommodate this. However, this can get unwieldy and hard to maintain.

An alternative, and potentially more robust, approach involves mapping functions to a dictionary, which is then used to dispatch the function.  The following example shows how you might implement such a beast and, for good measure, introduces the **random.choice** method.

As you review the code, imagine how much easier it would be to maintain than strictly branching logic. Let's say your client suddenly asked you to add a new capability, such as **str**.**title**. How could you add it while minimally touching the existing code[46]?


In [None]:
#py_function_7.py
from random import choice
def f_upper():
    return str.upper
def f_lower():
    return str.lower
def f_swap():
    return str.swapcase

# Use a dict object to match names (like 'up') to one of your functions
function_mapper={'up': f_upper, 'lower': f_lower, 'swap': f_swap}
choices = list(function_mapper.keys())

my_string='Lions and Tigers and Bears, Oh MY!'
for i in range(3):
    mychoice=choice(choices)
    print(function_mapper[mychoice]()(my_string))

The **random**.**choice** method is a convenient way to make a (pseudo) random choice from a collection of options.

The **print** is a bit of a mouthful, so let's break it down.

Note that most of what we're doing is passing around the name of a function, like **f\_upper** or **str.upper** without actually executing it. This allows fairly succinct code because we can chain operations together in a single line. Is the virtue of being compact outweighed by the vice of being tough to decipher?


In [None]:
mychoice = 'up'  # Just to force a choice

print(f" mychoice: {mychoice}")
print()


# String version of the command then the  output.
print("function_mapper['up']")
print(function_mapper['up'])  
print()

print("function_mapper['up']()")
print(function_mapper['up']() )  # the () is the execution operator.  f_upper returns a function.
print()

print("function_mapper['up']() (my_string)")
print(function_mapper['up']() (my_string) )  # this feeds my_string to the function
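
Here's one possible variation on the dispatch idea, offered as a sketch rather than the "official" answer. It stores the string methods directly in the dictionary (skipping the wrapper functions), so adding a capability is a one-line change. The names dispatch and transform are hypothetical.

```python
# Hypothetical variant of the dispatch table: the values are the
# string methods themselves, not functions that return them.
dispatch = {'up': str.upper, 'lower': str.lower, 'swap': str.swapcase}

# Adding a new capability touches only the table, not the calling code.
dispatch['title'] = str.title


def transform(name, text):
    """Look up a function by name and apply it to text.

    Unknown names fall back to returning the text unchanged.
    """
    func = dispatch.get(name, lambda s: s)
    return func(text)


print(transform('title', 'lions and tigers'))
print(transform('up', 'lions and tigers'))
print(transform('sideways', 'lions and tigers'))
```

The fallback via **dict.get** plays the role of the **else** branch in an **if** .. **elif** .. **else** chain.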
 


## Exercise

Awesome job so far, folks! This has been a long session, but functions (and methods in classes) are one of the most important building blocks of any serious programming effort. Let's try out your new skills on a couple of problems.

Python's built-in title casing routine leaves much to be desired. Check this out:

In [None]:
lst = [ "shot in the dark",
        "guido van rossum",
        "monty python's life of brian"
      ]
for item in lst:
    print(item.title())

In the first instance, a really short word got capitalized. The second will drive much European nobility mad because "von" and "van" are generally lower case. And the first letter after an apostrophe is almost never capitalized.

Please see if you can write a function to address these issues – I'm sure you can do better than str.title!

While you're about it, see if you can break the task down into small, atomic bits. That'll make it easier to test and maintain. You might use a series of functions something like:

    process_list() 
        process_title() 
        process_word() 
            process_apostrophe()
            process_royalty()

Feel free to grab code from solution _python_1_chapter05_starter_code.py if you'd like. Many of the mechanics have been worked out, but it would benefit from reorganization.

Modules and Libraries
=====================

Python has several built-in object types that provide a versatile and "off the shelf" collection of tools for handling information you'll encounter in real-world situations. This chapter discusses and demonstrates a few of the libraries (modules) that you can **import** into your code. We'll look at just how you can access these libraries while managing how they work and play with the applications you are developing.

Importing Libraries
-------------------

As you've seen, the way to access libraries to extend Python's core capabilities is to **import** them into your program's namespace. Here are some examples:

In [None]:
import time
from decimal import Decimal, getcontext
from fractions import Fraction as Q # rational number
from datetime import datetime, timezone
from collections import namedtuple

Basic Use of import
-------------------

If you examine these commands carefully, you'll note several variants.  The first example adds the module **time** into your code. We can use the built-in **dir** function to inspect the namespace. If used without arguments, it reports on the current (here, global) namespace; with an argument, it reports on the namespace of the object presented in the argument.

Here, we show the results from executing the **dir** command against a fresh Python shell in a command line interpreter interface.   The results will be the same in a Jupyter notebook, except there will be some extra Jupyter-specific entries.

Initially we have a very parsimonious global namespace:

    dir()
    ['__builtins__', '__doc__', '__loader__', '__name__', '__package__']

When we use the **import** keyword to bring in an extra module, the name of that module is added to our namespace:

    import time
    dir()
    ['__builtins__', '__doc__', '__loader__', '__name__', '__package__', 'time']
    
When Python first opens, it has access to the global namespace and everything in the '__builtins__' module (that's automatically imported).    If we take a quick look at the contents of __builtins__, we'll see built-in functions and types at the bottom.  These include the variable types we've seen to date. At the top we'll see mostly exceptions (errors and warnings) along with a few special constants like True, False, and None.

In [None]:
", ".join(dir (__builtins__))

The act of importing the new module doesn't directly bring all the elements of the **time** module's namespace into your module – you only have access to **time**. The **time** module, however, has its own namespace. 

So what is a "namespace", anyway?   You can think of it as a dict object.   The names are the keys and the values are the objects (functions, classes, data) bound to those names.    If you invoke a function like **dir**, Python looks up the name in a dictionary, finds the associated object and calls it.    The act of importing a module is actually very lightweight - all you're really doing is updating the master dictionary used by your program with pointers to other resources.
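
As a rough illustration of the "namespace as dict" idea, the built-in **vars** function hands back the dictionary behind a module. This sketch assumes nothing beyond the standard library; the variable names are just for show.

```python
import time

namespace = vars(time)          # the dict that backs the time module

is_dict = isinstance(namespace, dict)

# Looking a name up in the dict and using dot notation
# land on exactly the same object.
same_function = namespace["sleep"] is time.sleep

print(is_dict, same_function)
print(sorted(namespace)[:5])    # a peek at a few of the keys
```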

You can take a look at the namespace of the **time** module by using the **dir** function with **time** as the argument.

In [None]:
", ".join(dir (time))

Dot Notation 
------------

While the module's namespace is isolated from your own, you can still access its elements. You can do this by using "dot notation."    Here, we access an attribute called 'timezone':

In [None]:
time.timezone

Here we'll access the method called 'time' within the time namespace.   We can do that because the first "time" refers to an object available to our global namespace; and the second "time" refers to an object within the module's namespace. 

Because time.time is a method we see only the text representation of the object ... that is until we execute using the () operator.

In [None]:
time.time

In [None]:
time.time()

The "dot" delimits namespaces, which may be configured hierarchically, and allows your code to be specific in terms of exactly which object it addresses. In the code above, you'll see that within the **time** module there is a built in function of the same name. This creates no conflict for our own module because only the name of the newly-imported module is visible.

Because of this separation it is completely safe to import several different modules without being concerned about whether object names from one can "overwrite" those of another.

It's also possible to reach into an external module and import only the bits we want into our local namespace. Observe the effects of importing the **Decimal** class and the **getcontext** function. After the **import**, we can now access the external objects on a "first name" basis.

In [None]:
from decimal import Decimal, getcontext
Decimal

In [None]:
getcontext

Renaming Object When Importing
------------------------------

A slight variant of this is to choose a name for the imported object that suits your purpose. Python is agnostic relative to the name you choose, within its basic rules.  

Here, we import the Fraction object from the fractions library.   We import it again, giving it the name "Q".   Essentially, we've created two entries to our main namespace dict for the same object.   We can access it by either "Q" or "Fraction".

In [None]:
from fractions import Fraction
from fractions import Fraction as Q

In [None]:
same_object = id(Fraction) == id(Q)
if same_object:
    print(f"Fraction and Q are aliases to the object at {id(Q)}.")


It might make sense to import an object by a specific name to avoid a namespace collision, or to follow a convention adopted by your teammates. For instance, it's typical to use **numpy**, **pandas**, and **seaborn** by going:

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

This is done mostly for brevity. It's much easier to type "np" than "numpy" - especially if you have to do it thousands of times.

Avoid renaming imported objects unless you've got a good reason to do so because it could confuse your team. For instance, you can do one or both of these:

In [None]:
import string
string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [None]:
import string as elephant
elephant.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

… however, if you did the latter you could sow much confusion.

Also, while it's possible to import the entire namespace of a library into your code, you typically don't want to do so. In this example we import the entire **string** module's namespace using "star notation":

In [None]:
import string
from string import *

# Namespace of string
string_module_namespace = ", ".join(dir(string))
print(f"The namespace of the string library is: \n\n{string_module_namespace}\n")

# ID of string.digits versus digits
same_object = id(string.digits)==id(digits)
if same_object:
    print("The string.digits from the string module namespace is the same object as digits in the global one,")
    print(f"known under the hood as {id(digits)}")

In [None]:
", ".join(dir(string))

While this might be manageable, what if you were already using a variable called "digits"?

In [None]:
digits = ['thumb', 'pointer', 'middle', 'ring', 'pinky']
digits

In [None]:
from string import *
digits

That's trouble, right?    The message is that you want to be a bit cautious.  It's certainly easier to type in a single name like 'digits', but sometimes a more verbose name is actually better.  For instance, anyone reading the code will recognize 'string.digits' for what it is - it's got provenance, after all.   Transparency matters.

Sometimes it makes sense to use "star notation" when you're importing only one outside library and using its tools exclusively. This is often the case when you are working with GUIs because all the components you need are included in the imported module – there is almost no chance of namespace collisions, even when you're part of a collaborative effort.

File System Based Namespaces
----------------------------

Python has the ability to use the file system to create namespaces for modules and it works just like what we've seen for ordinary objects.  Building packages is beyond the scope of this course, but for now know that you can have a directory structure like:

    main_application
        __init__.py

        subdir_1
            __init__.py
            module.py

        subdir_2
            __init__.py
            module.py

If main\_application can be discovered by your app, then you can go:

    from main_application import subdir_1, subdir_2

… which gives you access to subdir\_1.module and subdir\_2.module. The two identically-named module names are each tucked in behind their directory name and will not present namespace conflicts in your app.

On a related note, what makes main\_application discoverable by Python?  When you import a module, Python looks at the contents of **sys.path** – an ordered list of all the directories on its search path. It's comprised of normal, default locations and whatever might be in your PYTHONPATH environment variable. 

The first element, and the first place the interpreter looks, is your current working directory (or, when you're running a script, the directory that contains the script). What this means – and this has caught many an unwary programmer – is that if you name your module the same as a system package, then when you try to import a Python module you will get your own instead. Then you will go insane trying to figure out why your imported module doesn't work as expected.

As a result, you'll want to be a little careful about choosing names. If you're considering 'string.py', for instance, you might first want to attempt to **import** **string**. If the operation succeeds, 'string.py' is already taken; if you get an **ImportError**, then you're safe. 

Bear in mind that installation of new packages or editing the PYTHONPATH can change things.
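
If you'd rather check a name without actually importing anything, one option is **importlib.util.find_spec** from the standard library. This is only a sketch; the helper where_is and the deliberately silly module name are made up for the example.

```python
import importlib.util
import sys


def where_is(name):
    """Return where Python would load a top-level module from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(sys.path[:3])                 # the first few places Python searches

taken = where_is("string")          # already part of the standard library
free = where_is("no_such_module_hopefully_xyz")

print("string comes from:", taken)
print("made-up name comes from:", free)
```

If the first result points at a file in your own project rather than the Python installation, you've found a name collision.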

As you'll see, the libraries available in Python's standard library and general ecosystem provide highly-leveraged ways to extend the already-powerful capabilities. 

Time-related Objects: time, datetime, and calendar
--------------------------------------------------

Since we're on the topic of using imported libraries, let's take a look at Python's three principal libraries dealing with time. Each has its own strengths, weaknesses and capabilities. We'll go through some of the capabilities of each here.

Before we jump in, there are some things to be aware of:

-   Computers "think" of time in terms of the elapsed seconds since an
    agreed-upon point in time called an "epoch." For most POSIX
    (Linux-like) systems, the epoch began on January 1, 1970. For
    Windows, it's January 1, 1601[47]. Typically, this won't matter
    because Python is OS-agnostic.


-   Since different parts of the world are in different time zones, we
    often think of what time it is in Greenwich (London), England. This
    is known by various names such as GMT, UTC and Zulu.


-   However, your computer (or AWS slice, or server) may "think" in
    local time. Local time is a little weird. Time zones can shift
    because of local decisions. Daylight savings time can vary
    county-by-county, and state-by-state. "Summer time" in Europe and
    elsewhere doesn't necessarily synch with U.S. daylight savings time.
    UTC is reliable.


-   There are two "flavors" of elapsed time available. One is told by
    the clock on your wall – it's an objective measure of the time you
    experience. The other is the elapsed time the CPU works on your
    program. You'll want to be sure of which you're using when it's
    material (such as when your application is running on a busy
    machine, where the clock time isn't a great measure of how
    efficient your program is).

Time Objects
------------

Here are a few lines of code that use some of the basic functionality of time. This code grabs both the clock time and CPU time and demonstrates use of the **sleep** method. You'll note that these are reported to different levels of precision and produce slightly different results. 

**time**.**process_time** reports how much CPU time this process has used so far (time spent sleeping doesn't count), while **time**.**time** reports the number of seconds that have elapsed since the epoch – the computer's "dawn of time."

In [None]:
import time
print(time.process_time(), time.time())
time.sleep(1) #seconds
print(time.process_time(), time.time())
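
To make the wall-clock versus CPU-time distinction concrete, here's a small sketch that times a short nap both ways. It uses **time.perf_counter**, a standard-library clock intended for measuring intervals that isn't covered in the text above.

```python
import time

wall_start = time.perf_counter()    # wall-clock stopwatch
cpu_start = time.process_time()     # CPU time used by this process

time.sleep(0.2)                     # napping uses almost no CPU

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall clock: {wall_elapsed:.3f} s")
print(f"CPU time:   {cpu_elapsed:.3f} s")
```

The wall clock should show roughly the length of the nap, while the CPU figure stays close to zero.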

Datetime Objects
----------------

For most purposes, you'll be working with **datetime** objects provided from the library of the same name. A **datetime** object is a fairly easy-to-read tuple-like structure, and it's straightforward to extract information.  

The datetime object is everywhere in computing, but you should be aware that the implementation can vary slightly, even among Python modules.

Here's an example of how you can use datetime:

In [None]:
from datetime import datetime
d = datetime.now()
print(d)
print("the hour is: {}.".format(d.hour))
print("the year is: {}.".format(d.year))

In [None]:
from datetime import datetime, timezone
now_here = datetime.now()
print("Now, this timezone: ", now_here)

In [None]:
# Datetime is timezone-aware
now_uk = datetime.now(timezone.utc)
print("Now, in England: ", now_uk)
the_date = now_uk

In [None]:
# Dates can be used in a "mathy" sense
print("Days since 01-01-0001: ", the_date.toordinal())
epoch = datetime(1970,1,1, tzinfo = timezone.utc)

In [None]:
print("01/01/1970 timestamp : ", epoch.timestamp())
print("Now in English timestamp: ", now_uk.timestamp())

In [None]:
delta = now_uk - epoch
print("Time delta in seconds: ", delta.total_seconds())
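
Subtracting one **datetime** from another gives a **timedelta** object, and you can add a **timedelta** back to a **datetime**. Here's a short sketch using fixed dates (chosen arbitrarily) so the results are repeatable.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1)
end = datetime(2025, 1, 1)

gap = end - start                    # datetime - datetime -> timedelta
later = start + timedelta(days=30)   # datetime + timedelta -> datetime

print(gap.days)        # 2024 is a leap year
print(later.date())
```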

For all its charms, **datetime** produces fairly ugly output, left to its own devices. Fortunately, there's an easy way to customize it using the "string from time" functionality, known as **strftime**[49]. The basic syntax is:

\<datetime object\>.strftime(\<format string\>)

A simple example follows.   Note that we're able to combine Python formatting with strftime formatting:

In [None]:
from datetime import datetime

date_format = "%d, %b %Y"
now = datetime.now()
print(f"Hello! Today is {now.strftime(date_format)}.")

If you want, you can include other characters in the format string. In the next example, we provide punctuation (":" and "-" characters) to make something like we'd see in a log file (easy to sort chronologically).

In [None]:
from datetime import datetime
now = datetime.now()
exact_format = "%Y-%m-%d %H:%M"
print(f"Or, more precisely, {now.strftime(exact_format)}.")

There are many formatting strings available. You can download a "cheat sheet"[50] in the likely event you don't want to memorize them.   Or you can take a quick look at strftime.org.
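
The same format codes also work in the other direction. **datetime.strptime** ("string parse time") turns text back into a **datetime** object. Here's a sketch of a round trip, using a fixed, arbitrary date so the output is predictable.

```python
from datetime import datetime

stamp_format = "%Y-%m-%d %H:%M"
moment = datetime(2025, 4, 1, 9, 5)

as_text = moment.strftime(stamp_format)                 # datetime -> string
back_again = datetime.strptime(as_text, stamp_format)   # string -> datetime

print(as_text)
print(back_again == moment)
```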

Working with Calendar
---------------------

Python also has a time-related module, **calendar**, that knows how to print nicely-formatted calendars, figure out the day of week, keep track of leap days, etc.[51].  

This is sort of a "toy module" because serious packages have more heavy duty ways to manage calendar functions.   If you want something quick and dirty with minimal overhead, you may be able to whistle up something suitable with a couple of commands.


Here's a brief flyover of some of its capabilities[52].

In [None]:
import calendar
#create a TextCalendar instance
cal = calendar.TextCalendar()
print("We just produced a {}.\n".format(type(cal)))
calendar.prmonth(2025,4)  #April, 2025

In [None]:
#what day of the week was I born?
birthday_year = 1957
birthday_month = 5
birthday_day = 10
birthday_day_of_week = calendar.weekday(birthday_year, birthday_month, birthday_day)
birthday_dict = {0:'Mon', 1:'Tue', 2:'Wed', 3:'Thur', 4:'Fri', 5:'Sat', 6:'Sun'}
print("I was born on a {}.".format(birthday_dict[birthday_day_of_week]))
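
A few more **calendar** conveniences, sketched with arbitrary dates. Rather than building your own lookup dictionary, you might lean on **calendar.day_name**, which maps the weekday number to its name.

```python
import calendar

leap_2000 = calendar.isleap(2000)   # divisible by 400, so a leap year
leap_1900 = calendar.isleap(1900)   # divisible by 100 but not by 400

# monthrange returns (weekday of the 1st, number of days in the month)
first_weekday, days_in_feb = calendar.monthrange(2024, 2)

# weekday() uses 0 for Monday through 6 for Sunday
y2k_weekday = calendar.weekday(2000, 1, 1)

print(leap_2000, leap_1900, days_in_feb)
print(calendar.day_name[y2k_weekday])
```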

Introduction to Python's "Middleware" Libraries
-----------------------------------------------

In the spirit of continuing a discussion of adding capability beyond Python's core repertoire, we might consider some of Python's built-in modules **sys** and **os**. We'll discuss their basic contributions here. I would encourage you to explore these on your own using **dir** and **help** as your investigation tools.

-   The **sys** module contains information about the particular Python
    installation you're working with and how it's installed on your
    local OS. This is where you'll find **sys.path** (the **list** of
    directories the interpreter uses to find imported modules).
    

-   The **os** module contains a large repertoire of "middleware" that
    operates between Python and your local OS. It keeps track of things like
    the correct path separator to use.

Here are a couple examples from my systems. This is what I get from a Windows box:


And this is the same information from one of my Linux virtuals:

The tools in **os** are really important if you want to maximize the portability of your code from platform-to-platform. The last thing you want to do is put in conditional statements to do things like construct file paths[53]. Or – worse yet – having your code work only on the "flavor" of development platform you're using.
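
Here's a sketch of the sort of thing you might try on your own machine. The folder and file names are placeholders, and the printed values will differ between Windows and Linux, which is rather the point.

```python
import os
import sys

print("platform:", sys.platform)
print("path separator:", repr(os.sep))

# Let os.path.join pick the separator instead of hard-coding one
report_path = os.path.join("data", "reports", "summary.txt")
print(report_path)

# os.path.split peels the file name off the end of a path
folder, filename = os.path.split(report_path)
print(folder, filename)
```

Because the path is assembled by **os.path.join**, the same code produces a correct path on either platform without any conditional logic.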

## Exercise:

Starting with a text representation of your birthday e.g., "May 5, 1970" please create a routine that produces a report describing how long you've been alive (to the nearest day) and what day of the week you were born.

Exceptions
==========

Things sometimes don't go as planned – especially when you're dealing with user input, content captured "from the wild", or integrating code written by others into your own. In these situations you need a way to gracefully handle surprises without crashing your code.

Sometimes things do go as expected and you want to set up "sentinels" to identify specific, anticipated situations and act accordingly.

You'll find Python's built-in exceptions model to be useful in both of these situations. This chapter will explore how you can put it to use.

Basic Usage
-----------

The easiest form of an exception is the **assert** statement. Its syntax is pretty easy:

assert \<some statement\>

In [None]:
assert True

In [None]:
assert False

As you can see, the interpreter generates an **AssertionError** (which is a type of exception) when the statement evaluated is **False**. This is sort of a blunt approach that doesn't yield much actionable information, but it is a "quick and dirty" way to identify a problem during execution.

As this stands, finding a **False** condition basically crashes your program – something you usually don't want to happen. So how can you handle this more smoothly? You can tell the interpreter what to do in specific circumstances using a **try** / **except** block. 

Here's how:

In [None]:
try:
    assert False
except AssertionError:
    print("Sorry, pal")
    print("I'm gonna keep running.")

Much better, right? The general form of a try/except block is:

    try:
        <an indented suite of statements>

    except <some exception>:
        <an indented suite of statements >

    except <some exception>:
        <an indented suite of statements >

    finally:
        <an indented suite of statements>

You can "stack" exception handlers much like **elif** statements in an **if** .. **elif** .. **else** block. Exception handling stops at the first matching handler. The **finally** block gets executed no matter what.

Here's an example:

In [None]:
def bad_int():
    int('a')

def bad_not_defined():
    int(a)

def bad_div():
    1/0

def good():
    print("Hi!")

for func in (bad_int, bad_not_defined, bad_div, good):
    try:
        func()
    except ValueError:
        print("you have no values.")
    except TypeError:
        print("Learn how to type.")
    except NameError:
        print("You have a horse with no name.")
    except ZeroDivisionError:
        pass
    finally:
        print("I'm done.\n")

One important point to note here is that, as in an **if .. elif .. elif** stack of code, only one exception handler's indented suite can be executed. The implication is that you want to catch the most specific exceptions at the top of the stack and the most general ones at the bottom. More on this in a moment, but it's an important design consideration.
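
Here's a small sketch of why handler order matters. The classify helper is invented for this illustration; it reports which handler caught the problem. Specific handlers come first and the broadest net goes last, so each exception lands in the narrowest handler that fits.

```python
import math


def classify(func):
    """Run func and report which handler (if any) caught its exception."""
    try:
        func()
    except ZeroDivisionError:      # most specific first
        return "ZeroDivisionError handler"
    except ArithmeticError:        # parent of ZeroDivisionError and OverflowError
        return "ArithmeticError handler"
    except Exception:              # broadest net goes last
        return "Exception handler"
    return "no exception"


results = [
    classify(lambda: 1 / 0),
    classify(lambda: math.exp(1000)),   # raises OverflowError
    classify(lambda: int("a")),         # raises ValueError
    classify(lambda: None),
]
print(results)
```

Had the **except Exception** clause been listed first, it would have swallowed everything and the other two handlers would never run.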

The Exception Class
-------------------

In Python, ordinary exceptions are all derived from the same parent (called **Exception**) and specialized to handle individual problems[54]. This presents the possibility to create your own if none of the built-in ones fit. 

Here's how you might create a simple custom exception and a more elaborate one to handle the potentially-serious Wombat condition:

In [None]:
# A minimal custom exception
class MinimalException(Exception):
    pass

class WombatException(Exception):
    def __str__(self):
        return("Wombat!")

def wombat():
    raise WombatException
    
wombat()
#result
#__main__.WombatException: Wombat!

Here, we see what's known as a "Traceback".   This shows the chain of execution present at the time the problem was encountered.   The proximate (latest) cause is at the bottom.   The exception event was triggered earlier by the call to wombat().   Careful inspection of this reveals the call stack (layering of what calls what) and local context of the trigger(s).

We'll get into classes later in the course. For now, know that we took a **class** object already part of the language, inherited from it, and redefined part of it.
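
A custom exception can also carry data along with its message. This is a sketch only; WombatCountError and its count attribute are hypothetical names, not part of the example above.

```python
class WombatCountError(Exception):
    """Hypothetical exception that remembers how many wombats escaped."""

    def __init__(self, count):
        # Hand the message to the parent class so str(err) works as usual
        super().__init__(f"{count} wombat(s) on the loose")
        self.count = count


try:
    raise WombatCountError(3)
except WombatCountError as err:
    message = str(err)
    escaped = err.count
    print(message)
```

Whoever catches the exception can then act on the data (here, err.count) instead of parsing the message text.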

Because all exceptions are inherited from the same parent, there's a complex "family tree" of available ones. Here's part of it:

    BaseException
     +-- Exception
          +-- StopIteration
          +-- StopAsyncIteration
          +-- ArithmeticError
          |    +-- FloatingPointError
          |    +-- OverflowError
          |    +-- ZeroDivisionError

You don't need an encyclopedic understanding of all this but it's important to know that an **except** clause catches the exception it names along with everything below it in the "family tree" (its descendants). To put it another way, if you have a line of code like this:

    except ArithmeticError:
    
Python will catch the **FloatingPointError**, **OverflowError**, and **ZeroDivisionError**. 

If you do this:

    except Exception:   
    
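Python will catch nearly any exception.  The few that inherit directly from **BaseException**, such as **KeyboardInterrupt** and **SystemExit**, are not caught.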
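
You can explore the family tree yourself with **issubclass** and the **__mro__** attribute (the chain of ancestors Python consults). A quick sketch:

```python
# The ancestors of ZeroDivisionError, nearest first
family = [cls.__name__ for cls in ZeroDivisionError.__mro__]
print(family)

caught_by_arithmetic = issubclass(ZeroDivisionError, ArithmeticError)
caught_by_exception = issubclass(ZeroDivisionError, Exception)

# KeyboardInterrupt hangs directly off BaseException, so a handler
# written as "except Exception:" does not catch it.
ctrl_c_caught_by_exception = issubclass(KeyboardInterrupt, Exception)

print(caught_by_arithmetic, caught_by_exception, ctrl_c_caught_by_exception)
```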


Python has many built-in exception types that handle specific problems encountered during execution. Some of the more common ones include:

    -   **KeyboardInterrupt** (the user hits Ctrl-C)

    -   **TypeError** (the wrong type of object was provided)

    -   **ValueError** (an invalid argument was supplied)  
    
The entire "family tree" of exceptions is in the official docs[55] and could make a valuable addition to your personal library.    

Miscellaneous
-------------

If you want to capture the message that normally comes with the exception in your own **except** clause, you can create a handle to it and print it out. You can also print out the call stack and specific lines of code that caused the problem (what you normally get when you make a mistake at the keyboard) using **traceback**.

Also, you can ask the interpreter to pass handling of the exception to a handler higher up in the call stack (one of the routines that called your current routine) with the keyword **raise**. If a handler for that error exists higher up, it will be handled there. 

Here, we set up a handler in the for block – a fancier one this time because we have bound the exception object to the name "ve" and printed it out. We also have chosen one of the methods of the **traceback** module to give detailed information about the source of the problem[56].

Note that the function **another\_bad\_int** does not have any of the handling logic. If a problem is detected there, it "passes the buck" to the handler we just set up[57].

Here's an example:

In [None]:
import traceback
def another_bad_int():
    try:
        int('b')
    except:
        raise
        
for func in (another_bad_int, ):
    try:
        func()
    except ValueError as ve:
        print("you have no values\n\n", ve, "\n")
        #traceback.print_stack()    # <---- uncomment for really verbose output
        raise
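
If you'd like to capture the traceback as text (say, to write it to a log) rather than print it straight away, **traceback.format_exc** returns it as a string. A minimal sketch, with a made-up function called risky:

```python
import traceback


def risky():
    return int("b")


try:
    risky()
except ValueError as ve:
    message = str(ve)                  # just the exception's message
    details = traceback.format_exc()   # the full traceback, as a string

print(message)
print(details)
```

Here the exception is handled and not re-raised, so the program carries on after printing.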

Strategies
----------

-   You want to get a good match between the exception(s) you're
    looking for and the exceptions you're handling. Make the **except**
    statements as narrow as possible (as far out on the branch of the
    "family tree" as possible).
    

-   Know what you're looking for and why. You can try to break it at the
    keyboard first to figure out what could go wrong.
    

-   Keep it "local" – check a line or two at a time.


Here's a worst case scenario:

In [None]:
try:
    "1,000 lines of code you don't understand"
except Exception:
    pass
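
By way of contrast, here's a sketch of a handler that follows the strategies above: one line inside the **try**, one specific exception, and a deliberate response. The helper name to_int is made up for this example.

```python
def to_int(text, default=None):
    """Convert text to an int, falling back to default on bad input."""
    try:
        return int(text)        # the one line we expect might fail
    except ValueError:          # the one failure we know how to handle
        return default


print(to_int("42"), to_int("forty-two"), to_int("x", default=0))
```

Anything we didn't anticipate (a **TypeError** from passing in **None**, for instance) still surfaces loudly instead of being silently swallowed.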

Input and Output
================

So far, we've been working with programs and data that exist only in memory. This is fine unless you want the information produced to persist between sessions. To accomplish this, you'll want to learn how to store information in files of some type. These could be text files, databases, or (if you want to store intact objects) **JSON**, **pickle**, or **shelve** files.

To accomplish any of these we need to establish a pipe to move information to another system resource and the necessary system object to receive it. Let's start with a simple file object and the built-in **open** function. The syntax is:

    <file handle name> = open(<file name>, <mode>)
    
There will always be some sort of encoding.  That's the mapping between the characters you see rendered on your screen and the bytes stored in the computer.   If you don't specify one, the default for whatever machine your code is running on will be chosen.

Basic File i/o
--------------

We can create a file called 'afile' in write-only mode. If another file of the same name already exists it will get overwritten silently, so be careful.

Once the file is open, we can write something to it using the file handle's **write** method. When done we can execute the **close** method.

To open a new file and establish a handle to it, we can go:

In [None]:
f = open('afile', 'w')
print(f)
f.write("Hello, afile")

When finished with a file, it's good practice to close it.  You can see that even when the file is closed, the file handler object still exists.

In [None]:
f.close()
print(f)
print(f.closed)

We can use the file handle object to tell us about the file. This code sample shows you some of the things you can learn.

In [None]:
f = open('afile', 'w')
f.write("Hello afile!")

# Some attributes and methods of interest:
print("What's up with {}?\n".format(f.name))
print("readable?", f.readable())
print("writable?", f.writable())
print("encoding:", f.encoding)
print("closed?", f.closed)
print()
print("Closing Now!")
f.close()
print("closed?", f.closed)

There are different modes for opening text files. Besides read ("r") and write ("w"), you can open a file in append mode ("a"). This allows you to open a file for writing without destroying its existing contents. Instead, any new write operations add material to the end. If you include a "+" with any of these, you get both read and write access.
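
Here's a small sketch of the three basic modes at work. It writes to a scratch file in a temporary directory (the file name `modes.txt` is just an assumption for the example) so nothing of yours gets overwritten.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical scratch file

writer = open(path, "w")      # "w": create (or truncate) the file
writer.write("first line\n")
writer.close()

appender = open(path, "a")    # "a": keep existing content, write at the end
appender.write("second line\n")
appender.close()

reader = open(path, "r")      # "r": read only
contents = reader.read()
reader.close()
print(contents)
```

Had we opened the file with "w" the second time, the first line would have been lost.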

The modes we've seen so far are for text files – these all involve some sort of encoding operation to convert human-readable characters to raw bytes. If the file is storing binary data, you have to let the interpreter know by adding a "b" to the mode. These take forms like "rb", "wb", and "ab" or, with read and write access combined, "r+b", "w+b", and "a+b".

For instance, you can go:

    f = open ('junk2', 'w+b')
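
A quick sketch of what binary mode expects, again using a made-up scratch file (`raw.bin`) in a temporary directory. The point is that binary files deal in **bytes** objects, not strings.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "raw.bin")  # hypothetical scratch file

f = open(path, "wb")
f.write(bytes([0, 255, 16]))      # binary mode wants bytes, not str
try:
    f.write("text")               # a str is rejected in binary mode
except TypeError as err:
    print("TypeError:", err)
f.close()

f = open(path, "rb")
data = f.read()                   # comes back as a bytes object
f.close()
print(data)
```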

Creating a Context
------------------

In the examples above, we closed the files when done with them. This is considered a best practice because, although Python's built-in garbage collection will probably take care of things, it works on its own schedule. Some operating systems (Windows) have a limit on the number of open files you're allowed, and with large applications – like file-based database systems – you could get into trouble.

An easier way to handle closing files IMHO is to create a context using the **with** keyword. A context is like a temporary sandbox for a block of code to run in. When the code is done, the context is automatically terminated and the file is closed for you, even if something went wrong inside the block.  

Here's how you might use it:

In [None]:
with open('junk2', 'w') as f:
    f.write('hey there junk2')
    
print(f.closed)
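
The real payoff shows up when something goes wrong inside the block. In this sketch we simulate a failure part way through (the **RuntimeError** and the scratch file name are assumptions for the demonstration) and check that the file still gets closed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "junk3")  # hypothetical scratch file

try:
    with open(path, "w") as f:
        f.write("partial")
        raise RuntimeError("something went wrong mid-write")  # simulated failure
except RuntimeError as err:
    print("Handled:", err)

# The name f still exists, but the file behind it has been closed.
print("closed?", f.closed)
```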


File Pointers
-------------

Internally the file has a pointer – sort of like a sticky note – to tell it where it is in the file. When you **open** a file in a read or write mode, the pointer starts at the top. If it's open in the append mode, the pointer starts at the bottom.

You can move the file pointer around using the file handler's **seek** method, invoked with a single argument for the position (an offset from the start of the file) you want to move it to. 

Here's an example:

In [None]:
f = open('afile', 'a+')
f.read()

In [None]:
f.seek(0)

In [None]:
f.read()

In [None]:
f.seek(6)

In [None]:
f.read()

In [None]:
f.tell()

In [None]:
f.seek(0)

In [None]:
f.tell()

You'll note that the first time we read it, nothing was reported. That's because the file pointer was already at end of the file. With seek we put the pointer to the top of the file and **read** performed as expected. The **tell** method simply reports the position of the pointer. Here we're verifying that **seek** did its job.

I would point out that using **seek** against a text file is a dicey proposition – unless you're simply going to the top. 

The reason is that the width of a byte is predictable but the width of an encoded character is not. If it's only the bottom half of ASCII, the characters are one byte long, but you never know. This all depends on the encoding.  For instance, the ubiquitous UTF-8 can hold all 1.1 million+ Unicode code points, but individual characters only consume the required "space" – anywhere from one to four 8-bit bytes.
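
You can check the variable width for yourself. This little sketch encodes four sample characters (chosen arbitrarily for the example) and counts the bytes each one needs in UTF-8.

```python
# a, e-acute, the euro sign, and an emoji, written as escapes to be safe
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

widths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    widths.append(len(encoded))
    # ascii() keeps the output printable on any console
    print(ascii(ch), "->", len(encoded), "byte(s)")
```

Each of these is one character to Python, but on disk they take one, two, three, and four bytes respectively. That's why a byte offset handed to **seek** can land in the middle of a character.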

Working with the File System
----------------------------

The easiest way to work with the file system is to use the **os** library[59]. Here is where you can find all sorts of tools that you might use at the command line of a shell program. We'll use the following example to take some of the methods for a spin around the block while exploring your file system and the repertoire of the library.

In [None]:
import os

We can find the current working directory and navigate directories using the same sort of techniques one might use at the command line.

In [None]:
# Get the name of the current directory
original_dir = os.getcwd()
print('We started in:')
print(original_dir)

In [None]:
# Go to the parent directory
os.chdir('..')
print("Now we're in:")
print(os.getcwd())

The directory contents will show up as a list.  We can use list methods to locate specific files.

In [None]:
#get the original directory contents
dir_contents = os.listdir(original_dir)
print(", ".join(dir_contents))

look_for = 'afile'
print()
print(f"Is {look_for} in {original_dir}?\n\n  {'Yep' if look_for in dir_contents else 'Nope.'}")

In [None]:
# Membership test against the list of names (not the path string)
look_for in dir_contents

The file system is a zoo inhabited by many different beasts.  Everyone is familiar with files, directories, and subdirectories, of course.   But there are also sockets, symbolic links, mounts, streams, etc.  To the file system they're basically just objects.   You might care, so you can use **os.path** methods to find out.

Here's how you can screen directory components:

In [None]:
print("Examining the contents of {}.\n".format(original_dir))

# Get a copy of the dir_contents object that we can trash.
sacrificial_dir_contents = dir_contents.copy()

while True:
    # pop() is handy for consuming a list.
    try:
        fs_object_name = sacrificial_dir_contents.pop()
    except IndexError:   # We've exhausted the file contents
        break
    
    # os.path.join() will always use the right path separator.
    fs_object = os.path.join(original_dir, fs_object_name)

    # These os.path methods can classify a file system object:
    label = 'other'    # default for anything that is none of the below
    if os.path.isdir(fs_object):
        label = 'dir'
    if os.path.isfile(fs_object):
        label = 'file'
    if os.path.islink(fs_object):
        label = 'link'
    
    print(label, fs_object_name)

We can use **os.path** methods to query for existence of specific file system objects and create them if they don't already exist.

In [None]:
#checking for a directory, creating if it's not there
print("\nIf you don't have a junk directory, let's make one.\n")
look_for = 'junk'
look_in = original_dir

if not os.path.exists(os.path.join(look_in, look_for)):
    os.mkdir(os.path.join(look_in, look_for))
    
#another way, using exceptions
try:
    os.mkdir(os.path.join(look_in, look_for))
except FileExistsError:   # the directory is already there
    pass

# Success?
if os.path.exists(os.path.join(look_in, look_for)):
    print(f"Yea!  {look_for} has been created in {look_in}!")
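
If you need several levels of directory at once, **os.makedirs** with `exist_ok=True` covers both "create it" and "don't complain if it's there" in a single call. A minimal sketch, working inside a temporary directory so your own file system stays tidy (the directory names are made up):

```python
import os
import tempfile

base = tempfile.mkdtemp()                       # throwaway parent directory
target = os.path.join(base, "junk", "nested")   # hypothetical nested path

os.makedirs(target, exist_ok=True)   # creates 'junk' and 'junk/nested'
os.makedirs(target, exist_ok=True)   # second call is a harmless no-op

print(os.path.isdir(target))
```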

Many things you can do at the command line you can accomplish within Python.    Let's take a look at what attributes and methods are available.  We'll create a little utility function to screen out stuff that we probably don't want to see.

In [None]:
def screen_module(amodule):
    """Print the public, lowercase names that a module exposes."""
    clean = []
    for name in dir(amodule):
        # Skip _private names and Capitalized/CONSTANT names
        if name[0].islower():
            clean.append(name)
    clean_names = ", ".join(clean)
    print(f"Great stuff in {amodule}:\n\n {clean_names}")

screen_module(os)
print()
screen_module(os.path)

File Metadata
-------------

You can retrieve file metadata using the **os.stat** function.

In [None]:
os.chdir(original_dir)
os.stat('afile')



The output is pretty horrible to look at, and poorly-labeled, but here's
a list of a few of the fields you'll need most:

    st_ctime - time of creation (Windows) or of last metadata change (Unix)
    st_gid - group ID of owner
    st_mode - protection bits
    st_mtime - time of last modification
    st_size - total size, in bytes
    st_uid - user ID of owner

You can access an individual element using its name - they are just attributes of the returned object. For instance, if you wanted the file size you could go:

In [None]:
os.stat('afile').st_size
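
The raw numbers become a lot friendlier with a little help from the standard library. In this sketch we create a scratch file (the name is an assumption for the example), then translate its timestamp with **datetime.fromtimestamp** and its protection bits with **stat.filemode**.

```python
import os
import stat
import tempfile
from datetime import datetime

path = os.path.join(tempfile.mkdtemp(), "afile_copy")  # hypothetical scratch file
with open(path, "w") as f:
    f.write("Hello afile!")

info = os.stat(path)
size = info.st_size                               # bytes on disk
modified = datetime.fromtimestamp(info.st_mtime)  # seconds since epoch -> datetime
permissions = stat.filemode(info.st_mode)         # something like '-rw-r--r--'

print(f"{size} bytes, modified {modified:%Y-%m-%d %H:%M}, mode {permissions}")
```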

Pickle
-------------------

Python supports a native whole-object serialization protocol called **pickle.**  It can encode intact objects (lists, dicts, class instances, even references to functions and classes) into a form that can be saved to the hard drive, and decode them again later. The **pickle** methods work with byte streams.

Why would you want to work with bytes? For one thing, you can store your data very compactly. More importantly, you can stream bytes between applications or computers, whereas you can't do that with live Python objects.
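
To see the bytes for yourself, you can skip the file entirely: **pickle.dumps** returns the serialized bytes and **pickle.loads** turns them back into an object. A minimal sketch with a made-up object:

```python
import pickle

obj = {"name": "brine", "values": [1, 2, 3]}   # any picklable object will do

payload = pickle.dumps(obj)        # object -> bytes
print(type(payload), len(payload), "bytes")

clone = pickle.loads(payload)      # bytes -> a brand new object
print(clone == obj, clone is obj)
```

The clone is equal to the original but is a separate object, which is just what you'd want after shipping it to another program.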

Another advantage is that serialized data is in a very predictable format. That makes life very easy on the interpreter. Let's say you have a huge data file that you read into a Python array for processing. If the file starts out life as a comma-delimited file, say, reading it is expensive because the interpreter has to figure out how large every piece is, convert it to a numeric data type, and put it into memory.

If, on the other hand, the interpreter knows the geometry of the data and can "mechanically" ingest the pieces of it, very little processing has to happen – it just needs to be loaded into memory.

In my experience a 2MB, csv-formatted file takes around 30 seconds to load on a decent laptop; a serialized version takes only about 0.5 seconds. Your mileage may vary, but you can bet on seeing some real performance differences[61].

So how do we work this magic with pickle? Let's find out.

In [None]:
"""quite a pickle"""
import pickle

#make an object
obj = [ [1, 2, 3],
        [4, 5, 5],
        [7, 8, 9]
      ]
print("hey, we've got an object")
print(obj)

In [None]:
#open a binary file (remember, we're writing bytes)
pickle_file = "brine"
with open(pickle_file, 'wb') as f:
    pickle.dump(obj, f)

In [None]:
# Let's kill the object to prove this works
#    We could go obj = None; that keeps the name but binds it to None
#    del is a statement that actually removes the name
del obj 
try:
    obj
except NameError:
    print("\nno object here!")

with open(pickle_file, 'rb') as f:
    recovered_obj = pickle.load(f)
    
#now, take a look
print("\nPresto, chango, here's our recovered object!")
print(recovered_obj)

You can **pickle** multiple objects, but you have to do so individually.
Here's how you might do it:

In [None]:
"""Quite a crowded pickle barrel"""
import pickle

# Make a few objects
obj0 = [[1, 2, 3],
        [4, 5, 5],
        [7, 8, 9]
       ]
obj1 = "howdy doody"
obj2 = set([33,43,53])

# Serially store these objects
pickle_file = "spicy.pkl"
with open(pickle_file, 'wb') as f:
    pickle.dump(obj0, f)
    pickle.dump(obj1, f)
    pickle.dump(obj2, f)

In [None]:
# Destroy the objects   
obj0 = None; obj1 = None; obj2 = None # not recommended

# Serially recover the objects
with open(pickle_file, 'rb') as f:
    recovered_obj0 = pickle.load(f)
    recovered_obj1 = pickle.load(f)
    recovered_obj2 = pickle.load(f)
    
# Now, take a look
print("Our objects survived recovery!\n")
print(recovered_obj0)
print(recovered_obj1)
print(recovered_obj2)
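
If you don't know ahead of time how many objects are in the file, one approach is to keep calling **pickle.load** until it raises **EOFError**. Here's a sketch of that idea, using a scratch file in a temporary directory (the file name is an assumption):

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "barrel.pkl")  # hypothetical scratch file
items = [[1, 2, 3], "howdy doody", {33, 43, 53}]

with open(path, "wb") as f:
    for item in items:
        pickle.dump(item, f)

recovered = []
with open(path, "rb") as f:
    while True:
        try:
            recovered.append(pickle.load(f))   # objects come back in the order written
        except EOFError:                       # nothing left to read
            break

print(recovered)
```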

While this works, sometimes it's more convenient to organize objects to be pickled in a **dict** object – this makes tracking them much easier.  

Since we're storing, then recovering, the objects serially we need to keep track of the order.  The pickle file works on a strict "first in, first out" basis.

Here is how you might apply this strategy. 

This example embeds a (potentially) really big mistake. Can you spot it?

In [None]:
import pickle
from datetime import datetime

timestamp = f"JUNK = {chr(39)}{datetime.now().strftime('%d, %b %Y')}{chr(39)}"

# Create one-line file then import an object from it
with open('file_for_import.py', 'w') as junk:
    junk.write(timestamp)     
from file_for_import import JUNK

pickle_file="dill.pkl"
# Make a few objects
obj0 = [[1, 2, 3],
        [4, 5, 5],
        [7, 8, 9]
       ]
obj1 = "howdy doody"
obj2 = set([33,43,53])

# Make a dict to pickle:
to_pickle={ 'obj0' : obj0,
            'obj1': obj1,
            'obj2': obj2,
            'junk': JUNK
            }

# Pickle the dict then destroy it
with open(pickle_file, 'wb') as f:
    pickle.dump(to_pickle,f)

In [None]:
del to_pickle

with open(pickle_file, 'rb') as f:
    recovered = pickle.load(f)
    
print("Here is our recovered object:")
for k,v in recovered.items():
    print(f"{k}:  {v}\n")

print("Now we can pick off an object by name:\n")
print(recovered['junk'])

In [None]:
from datetime import datetime
timestamp = f"JUNK = {chr(39)}{datetime.now().strftime('%d, %b %Y')}{chr(39)}"

# Create one-line file from which we can import an object
with open('file_for_import.py', 'w') as junk:
    junk.write(timestamp)  

    
from file_for_import import JUNK
JUNK

The "big mistake" here involves the inclusion of JUNK in our pickled object. The issue is that it's a value imported from another module.  Safe enough if working alone, I imagine, but what if file_for_import.py were being maintained by another team working long and hard on perfecting the value? Today it might be 777, but what if later it were refined to become 888? 

Our persistent object would have stale information and we might never know it. In fact, the pickle file does not even keep a path back to where 777 came from in the first place. We certainly won't get warnings or error messages. 

Anyway, you can do this, but be careful.

Pickle Caveats
--------------

You can't **pickle** everything. The objects have to be discrete and available globally to the module. Here's what can be pickled, straight from the docs[62]:

-   **None**, **True**, and **False**


-   integers, floating point numbers, complex numbers


-   strings, bytes, and bytearrays


-   **tuples**, **lists**, **sets**, and dictionaries containing only
    picklable objects
    

-   functions defined at the top level of a module


-   built-in functions defined at the top level of a module


-   **class** objects that are defined at the top level of a module

**pickle** is a Python-specific tool – pickled objects can't be deserialized by programs written in other languages. 

Finally, since pickled objects can contain malicious code they are potentially vectors for infection. You don't want to accept persisted objects from any source you don't trust.
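
To see the "top level of a module" rule from the list above in action, here's a small sketch. The built-in **len** pickles fine because it can be found by name; a lambda has no importable name, so pickle gives up. We catch two exception types because the exact one raised may depend on your Python version.

```python
import pickle

# A built-in function is stored by reference to its name.
same_len = pickle.loads(pickle.dumps(len))
print(same_len is len)

square = lambda x: x * x   # anonymous: nothing for pickle to look up later

try:
    pickle.dumps(square)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as err:
    lambda_pickled = False
    print("Can't pickle the lambda:", type(err).__name__)
```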

Other Serializers
-----------------

While pickle is great at serializing and storing intact Python objects and data sets it's not at all human-friendly, nor are objects usable by other languages.  Its prime advantages are that it's fast and efficient.  
It's also built into pandas and several data analytic packages.   For instance, if you train a model using sklearn (or most other packages), you'll save the trained model as a pickled object.

Is efficiency important?  Of course it is.   As a practical matter, though, you'll want to consider whether readability and transparency are more important.   For smaller datasets or Python programs you'll hardly notice performance differences.   If you're ever going to want to read your file, you want to consider whether a few hundred milliseconds of computer time is more valuable than a few hours of your own time.

For human-readable serialization, you have a few choices. The json and yaml modules offer roughly the same API (interface) as pickle does; xml takes a bit more work.   These include:

    import xml   # Extensible Markup Language, built-in
    import json  # JavaScript Object Notation, built-in
    yaml         # YAML (originally "Yet Another Markup Language"), not built-in, most human-readable IMHO
    
If you want yaml, you'll have to install it first with a command like one of these (the package is named pyyaml):

    $ pip install pyyaml
    $ conda install pyyaml
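
Here's a short sketch of the built-in **json** module doing the same round trip we did with pickle, using **dumps** and **loads**. The sample object is made up. Note that JSON only understands a handful of types, so a **set** needs converting (to a list, say) before it can be stored.

```python
import json

obj = {"matrix": [[1, 2, 3], [4, 5, 5], [7, 8, 9]],
       "greeting": "howdy doody"}

text = json.dumps(obj, indent=2)   # object -> human-readable string
print(text)

restored = json.loads(text)        # string -> object
print(restored == obj)

try:
    json.dumps({33, 43, 53})       # sets are not part of JSON
except TypeError as err:
    print("JSON can't store a set directly:", err)
```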


Exercises
---------

Please create three functions in the same module (file). Each will take two inputs. One function will add the numbers, another will multiply them, and the third will subtract them.

Serialize the three functions to a file, destroy them, then recreate them from the serialized file. Verify that they still work as well as the original ones.


Classes in Python
=================

Classes are where the rubber meets the road. It's here that all the concepts we've discussed to date coalesce into a useful, reusable programming product. The real power of class objects stems from the fact that Python (like most modern languages) is designed to be "object oriented." But what does that really mean?

Object-oriented languages are built on three tenets:

-   Polymorphism


-   Inheritance


-   Encapsulation

We'll now discuss each in more detail.

Polymorphism
------------

This is the notion that there can be one interface to the world which, from the user's perspective, is the same for accomplishing many things.  The question of just what gets performed and how happens "under the hood" with the user blissfully unaware. We've already encountered that with respect to the "+" operator.

In [None]:
"abc" + "bcd"

In [None]:
1 + 2

In [None]:
complex(3,4) + complex(4,4)

In [None]:
[3,4,5] + [7,8,9]
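
The built-in **len** function is another everyday case: one name, many kinds of object, and each object supplies its own answer. A quick sketch with some arbitrary sample objects:

```python
things = ["abc", [3, 4, 5], {"a": 1}, range(10)]

lengths = []
for thing in things:
    # The same call works on a str, a list, a dict, and a range.
    lengths.append(len(thing))
    print(type(thing).__name__, len(thing))
```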

Google is another example, writ large. A single input window unleashes unbelievable computing horsepower, access to petabytes of indexed information (and targeted ads, but oh well). And who knows what happens when you hit the "Search" button?

Inheritance
-----------

Inheritance is the idea that you can create related objects by separating what they have in common from how they differ. That way, all the common elements can exist in one and only one place. And the differentiating element can exist more locally – again in one and only one place.

We've already encountered this in examining the exception hierarchy, repeated here:

    BaseException
     +-- Exception
          +-- StopIteration
          +-- StopAsyncIteration
          +-- ArithmeticError
          |    +-- FloatingPointError
          |    +-- OverflowError
          |    +-- ZeroDivisionError


All **Exception** objects "inherit" the characteristics of the **BaseException** – that's where all the boilerplate and housekeeping lives. All the exceptions e.g., **ArithmeticError** inherit the characteristics of **Exception** and add their own special sauce.  Further down the line, the **OverflowError** inherits everything that the **ArithmeticError** has and further specializes.
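
You can ask Python to confirm these family relationships with **issubclass**, or look at the whole chain of ancestors through a class's `__mro__` attribute. A short sketch:

```python
print(issubclass(OverflowError, ArithmeticError))
print(issubclass(ArithmeticError, Exception))

# The full line of ancestors, most specific first
lineage = [cls.__name__ for cls in OverflowError.__mro__]
print(lineage)

# Because of inheritance, a handler for the parent catches the child.
try:
    1 / 0
except ArithmeticError as err:
    caught = type(err).__name__
    print("Caught a", caught)
```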

We've seen this already in a really simple form when we created a custom class:

In [None]:
class WombatException(Exception):
    def __str__(self):
        return("Wombat!")

In so doing, we inherited from Exception then overrode any existing
**\_\_str\_\_** method – that's the one that **print** uses – to make it
print out the message.

I recognize that all this is a bit abstract so far, so let's jump into some code and see what we can do. We can start with something simple.  Here's how we can make a class, inherit from it (create a "subclass"), and create a specific instance.

In [None]:
class SuperSimple:
    a = 1
    
class SuperSimpleSubclassed(SuperSimple):
    pass

s = SuperSimpleSubclassed()

print("class variable 'a' is {}".format(s.a))

As you can see, 's' (an instance of **SuperSimpleSubclassed**) contains the object 'a' which came along for the ride from **SuperSimple**. It's a common practice to have a base class of some sort – this will contain fundamental functionality and boilerplate code that will become available to all subordinate classes. Here's a slightly more complicated example of a base class:


In [None]:
#A base class
class BaseClass:
    def __init__(self):
        print("BaseClass __init__()")
        
    def shout_out(self):
        print('\nYo! from BaseClass\n')
        
    def print_something(self, thing):
        print("\n{} from BaseClass \n".format(thing))
        
base = BaseClass()
base1 = BaseClass()
base.shout_out()

Here, we've created two instances of **BaseClass**, **base** and **base1**. These are independent objects which have their own namespaces, and each can call **shout\_out** and **print\_something**, which live in the class they share. Now, if we want to extend this class the process is simple. 

We inherit from it and add a new method **hello\_child**. We also replace the parent class' **\_\_init\_\_** method by including a new method of the same name. In a case like this, we may well want to retain access to the parent class' method. 

We can do so easily by invoking the built-in **super** function. In **ChildClass** we call it to ensure anything introduced into the namespace by the parent class' **\_\_init\_\_** is also available to the child class.

In [None]:
class ChildClass(BaseClass):
    "simple inheritance, executing parent class __init__()"
    def __init__(self):
        print("ChildClass __init__()")
        super().__init__()

    def hello_child(self):
        print("hello from ChildClass")

# This directly calls ChildClass.__init__() and indirectly BaseClass.__init__()
kid = ChildClass()

In [None]:
# The kid object is an instance of ChildClass
kid.hello_child()

Unlike many languages Python supports multiple inheritance. This makes it easy to "cherry pick" objects from already-developed code, even from disparate sources. 

Here's an example:

In [None]:
class AnotherClass:
    def print_something(self, thing):
        print("AnotherClass is printing {}\n".format(thing))
        
class ComboClass1(BaseClass, AnotherClass):
    def __init__(self):
        #when methods have same name, leftmost is preferred
        self.print_something('something')

class ComboClass2(AnotherClass , BaseClass):
    def __init__(self):
        self.print_something('something')

Python automatically resolves any conflict between methods of the same name which may exist in multiple inherited classes, following what's called the method resolution order (MRO). The left-most parent class trumps the others, so to change the behavior all one has to do is switch the order.

In [None]:
combo = ComboClass1()

In [None]:
combo = ComboClass2()
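
Python records the lookup order it will use in each class's `__mro__` attribute, so you never have to guess. Here's a self-contained sketch with made-up classes (`Left`, `Right`, and two combinations) that shows the order flipping when the parents are swapped:

```python
class Left:
    def who(self):
        return "Left"

class Right:
    def who(self):
        return "Right"

class LeftFirst(Left, Right):
    pass

class RightFirst(Right, Left):
    pass

for cls in (LeftFirst, RightFirst):
    order = [c.__name__ for c in cls.__mro__]
    print(cls().who(), "wins; lookup order:", order)
```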

It's possible to "overload" arithmetic operators by replacing built-in methods like **\_\_add\_\_** and **\_\_mul\_\_**. Just for fun, here's how you could create a new version of the **str** object to override how it handles the "+" sign[63]:

In [None]:
class newStr(str):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return "{}+{}".format(str(self.value), str(other ))

s = newStr('hello')
s

In [None]:
news = s + 444
news * 4

Only the '+' operator is overridden – the new object retains all of the other behaviors associated with the built-in **str** object.

Let's put all this together, examining some issues around variable scoping and "ownership" among the different objects. Here's some intact code, followed by a breakdown of the important bits of it:

In [None]:
# py_animal_class.py

class Animal:
    # Class variable (in class namespace)
    tricks = ["jumping", "playing dead", "rolling over", "walking backwards"]

    def __init__(self, name, species, age, fav_food):
        "Variables with 'self.' belong individually to each instance."
        self.name = name
        self.species = species
        self.__dict__["age"] = age  # alternative to self.age = age
        self.fav_food = fav_food
        self.stomach = []

    def __str__(self):
        " This is what print() uses."
        fstr = "Hi, I'm {}, a {} who loves {}!\n"\
            "And I know how to do all these things: {}!\n"
        return fstr.format(self.name, self.species,
                           self.fav_food,
                           " and ".join(self.tricks))

    def __repr__(self):
        " This is what the REPL uses."
        return f"Animal(name={self.name}, species={self.species})"

We created Animal by using the keyword **class**.

    class Animal: 

Because "Animal" is not followed by any arguments it has not inherited from any other **class** –we're starting with a clean slate.

Next, we have a "top level", global-to-**Animal** variable defined.

In [None]:

tricks = ["jumping", "playing dead","rolling over", "walking backwards"]



This variable lives in the class namespace, so any of the class methods ("functions" when standing alone) within **Animal** can reach it as **self.tricks** or **Animal.tricks**.

Right below, we see the first instance of a method beginning with a "dunder."

In [None]:
def __init__(self, name, species, age, fav_food):
    " The instance initializer."



The special name **\_\_init\_\_** signals the interpreter that this method needs to run when a new instance of the class is created.

Note that the initializer (and every other method) has "self" as the first argument. That's a stand-in for "this particular instance of **Animal** – having nothing to do with any other instance of **Animal**." In other words "self" refers to a specific manifestation of the general class **Animal**[64]. You can see it in use in each of the next several statements.

Variables like **self.name** and **self.species** are assigned for this particular instance based on the input arguments provided when an instance is created, as it might be with a statement like:

In [None]:
# Make us a dawg and create an introduction with the __str__() method
mypet = Animal("Fang", "dog", 10, "steak")
print(mypet)

In [None]:
# Create a cat with the same hobbies as the dog.
neighbor_cat = Animal("Fluffy", "cat", 1, "mice")
print(neighbor_cat)

In [None]:
# Override an attribute of the Animal instance that we call "Fluffy"
print(f"Original cat tricks and food: {neighbor_cat.tricks} / {neighbor_cat.fav_food}")

# This creates the overrides
neighbor_cat.tricks = ['tangling yarn', 'throwing kitty litter']
neighbor_cat.fav_food = 'rats'

print(f"Alternative cat tricks and food: {neighbor_cat.tricks} / {neighbor_cat.fav_food}")

In [None]:
# Override an attribute of the Animal instance that we call "Fang"
print(f"Original dog tricks: {mypet.tricks} / {mypet.fav_food}")

# This creates the override
mypet.tricks = ['gnawing mailman', 'howling @ moon']

print(f"Alternative dog tricks: {mypet.tricks} / {mypet.fav_food}")

As you can see the two Animal instances have complete independence from one another.  Sort of.

If you look carefully at the class definition, you'll see that the fav_food is defined in the __init__() method as a member of the "self" namespace:  

    self.fav_food = fav_food

So that means that mypet.fav_food is a separate object from neighbor_cat.fav_food.

It's another story altogether with tricks.  This is defined as a member of the Animal class:

    class Animal:
        tricks = ["jumping", "playing dead", "rolling over", "walking backwards"]
        
The result is that the default value for tricks is just what's defined in the class.  For any instance, we can do a 'hard override' on this default by specifying a different value for an instance like this:

    mypet.tricks = ['gnawing mailman', 'howling @ moon']
    
Which forcefully imposes a different attribute value on the instance.   If we fail to do so, our dog is stuck with the class-level behaviors forever.   One way to think of this is:   absent imposing a different value, the instance's value is linked to the class's value.

Here's an example:


In [None]:
# Here's another cat.
another_cat = Animal("Snarly", "cat", 100, "catnip")
another_cat.tricks

In [None]:
# Change the tricks at the class level.
Animal.tricks = ['Tired of doing tricks!']

In [None]:
another_cat.tricks

What happened is that the instance's bag of tricks changed - even after the instance was created!   That's because of the linkage back to the class variable.
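
You can watch the lookup happen by peeking at each instance's `__dict__`, which holds only the attributes stored on that instance. This sketch uses a stripped-down, made-up class (`Pet`) to keep things short:

```python
class Pet:
    tricks = ["sit"]           # class attribute, shared until overridden

a = Pet()
b = Pet()
b.tricks = ["roll over"]       # creates an instance attribute on b only

print(a.__dict__)              # empty: a has nothing of its own
print(b.__dict__)              # b carries its own 'tricks'

Pet.tricks = ["nap"]           # rebind the class attribute
print(a.tricks)                # a finds the new class value
print(b.tricks)                # b still finds its own override first
```

Python checks the instance first and falls back to the class only when the instance has nothing by that name.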

The moral of the story is that you need to be aware of scope and the effect of changes.

Note that Python is very cavalier about how you can change attributes.  It's very much of a "responsible adults can do what they want" approach.   So, as we've seen, Python will let you:

    Change class-level attributes at will
    Update instance-level attributes at will
    
And, you can even add attributes not initially assigned.  Watch this:

In [None]:
another_cat.reaction = "Yikes"
another_cat.eyes = "Crossed"
print(f"This cat has {another_cat.eyes} eyes.  {another_cat.reaction}!")


In [None]:
# A module-level name 'tricks' is unrelated to Animal.tricks or mypet.tricks
tricks = ["xxx"]
print(f"Alternative dog tricks: {mypet.tricks} / {mypet.fav_food}")

Encapsulation
-------------

Encapsulation is the notion that there is one and only one way to access an object. 

The idea is that objects are atomic, but they have a specific context – and you're supposed to interact with them "by the rules." For example, some languages use getters and setters to interact with a class' attributes and enforce that interaction. That way there are specific points of contact that serve as gateways. As another example, methods ("verbs") within a class are only accessible within the context of the class.

Python naturally supports encapsulation by "hiding" variables within the namespaces of different structures like functions, inner functions, classes, and modules. A further level of control can be exercised by employing "getters" and "setters" to control access to instance variables. Note that Python has no such requirement but sometimes it's a good idea to lock things down further.

To accomplish this, Python employs a special "decorator[65]" called **property**. Decorators are callables that take the decorated object as an argument – essentially swallowing the decorated object, much as the print function can swallow the output of another function as an argument, as in this example:

    print("hello world".upper())

In the following example we define two methods, both named after a variable we want to manage, **val**. 

The idea is that if **val** is requested, the method under **@property** is executed. If there's an attempt to change **val**, the method under **@val.setter** is executed.  

Here, we're using the **setter** to screen out assignment of inappropriate object types – we're accepting only integer values.

In [None]:
class GetSet:
    def __init__(self):
        self._val = None
    
    @property
    def val(self):
        return self._val

    @val.setter
    def val(self, value):
        if isinstance(value, int):
            self._val = value
            print("OK. Thanks for giving me " + str(value))
        else:
            print("Sorry, I'm looking for an integer but you gave me " + str(value))


In [None]:
# Make an instance of GetSet and provide a value
gs = GetSet()
gs.val = 3

In [None]:
gs.val = "aardvark!"

The reason this works is that we've introduced a fake variable here '_val' (this is called "name mangling").  You'll see that it's introduced as 'self._val' in the __init__() method.   This is the variable that carries the real payload.

When we want to interact with variable exposed to the outside world 'val', our decorated class methods by the same name intercept the request.

If the interaction is merely a request for the current value, we return the internal 'self._val'.

If the interaction makes an attempt to change the value, we subject the request to thorough interrogation.  Only after it passes muster do we pass the new value to the internal 'self._val' variable.

This provides some "soft" protection against outside code changing attributes.   It's not remotely like security against malicious code or an overly-confident programmer.   In fact, by convention any attribute that begins with a single- or double-underscore is named as a warning to other coders that it's intended for internal use only.
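
The sketch below shows just how "soft" this protection is. `Guarded` is a hypothetical stand-in for the GetSet class above, and `__secret` is an attribute invented for this illustration.

```python
class Guarded:
    """Hypothetical stand-in for GetSet: val accepts integers only."""
    def __init__(self):
        self._val = None        # single underscore: "internal" by convention only
        self.__secret = 42      # double underscore: stored as _Guarded__secret

    @property
    def val(self):
        return self._val

    @val.setter
    def val(self, value):
        if isinstance(value, int):
            self._val = value

g = Guarded()
g.val = "aardvark!"             # the setter quietly refuses this
print(g.val)                    # None

g._val = "aardvark!"            # ...but going around the setter works
print(g.val)                    # aardvark!

print(hasattr(g, "__secret"))   # False: the name was mangled
print(g._Guarded__secret)       # 42: still reachable if you insist
```

Neither underscore style stops a determined caller. Both simply signal "hands off."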

Regular Expressions
===================

Regular expressions (also known as REs or regexes) offer a very low-level way to process text, very efficiently.  Regexes have been around for about as long as computing - in fact they're the "re" in the ubiquitous grep (short for global regular expression print).

You'll run into them as a matter of course in Perl and in Django URL resolvers. They can be used for search expressions in Vi/Vim and VSCode.  You can even use them in Pandas to conduct cell- or column-wise wrangling on text columns.

While powerful, they offer some of the least transparent, horrifically-hard-to-read statements ever. IMHO, anyway. So we'll build up gradually and your confidence in using them should materialize quickly.

Python support is contained in the **re** library, and in the normal **help** facility. 

A really excellent online resource can be found at https://regex101.com/#python [66]. It provides an interactive way to build your regex, test it against a string, and view the matches.   Best of all, it provides a breakdown of exactly what each step of your regex is doing.

Begin at the beginning
----------------------

For starters, let's just look for a string inside another string. For this you need a regex string to describe what you're looking for and another string to search.

In [None]:
import re

# Our regular expression 'regex' is the string we're looking for.
regex = r'x'       # 'r' marks a raw string: backslashes are kept as-is.
target = r'Texas'

# Use the search() method to do a simple search
result = re.search(regex, target)

if result:
    print("Yay! {} found..".format(regex))
    print("The found object is: {}.".format(result))

As you can see, this returned a match object (displayed as **re.Match** in recent Python versions and as **SRE\_Match** in older ones). The fact that it returned anything means that we hit pay dirt by finding at least one match.
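
As a small sketch of what "returned anything" means in practice, the snippet below compares a hit with a miss. The variable names `hit` and `miss` are chosen for this illustration only.

```python
import re

hit = re.search(r'x', 'Texas')
miss = re.search(r'z', 'Texas')

print(hit.group())   # x  (the text that matched)
print(miss)          # None, which is why "if result:" works as a test
```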

Character Sets
--------------

If you want to look for any one of a bunch of characters, you can use what's known as a "character set." It's just a string that includes surrounding square brackets. 

Any of these work: 

    r'[abc]'  r'[12345]'  r'[1-4]' 

Here are some examples:

In [None]:
regex = (r"[0123456789]",)   # all digits (a one-element tuple; note the trailing comma)

# Here are some alternatives, some idiomatic

alternatives = [ r"[0-9]",              # shorthand for all digits
                 r"[abc]",              # any of a, b, or c
                 r"[abc][abc][abc]",    # 3 consecutive letters; each any of a, b, or c
                 r"[^0-9]",             # NOT a digit (the ^ in first position negates)
               ]


In [None]:
import re

def test_regex(regex_list, target_list):
    "Quick test for regex matches, takes two lists."
    print(f"{'Regex':>20}       {'Result':<50}")
    print()
    
    for r in regex_list:
        for t in target_list:
            result = re.search(r, t)
            if result:
                print(f"{r:>20} {'.'*3} ** matched with {t:<50}")
            else:
                print(f"{r:>20} {'.'*3} failed with {t:<50}")
            del result
        print()

In [None]:
# Here are a few targets - we'll test these against all our regex expressions.
target = (r'Texas', r'123', r"a", r"abc")

# Invoke the test
test_regex(list(regex) + alternatives, target)

Special Symbols
---------------

This is pretty easy so far, right? 

Regular expressions use a lot of special characters which are shorthand for matching rules. You've got to escape them with the "\\" if you intend to use them literally. 

We'll visit them pretty soon, but for now, here they are:

     [ ] ( ) { } + * ? $ ^ . | \
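
Here is a quick sketch of the difference escaping makes. The sample strings are invented for illustration; `re.escape` is a helper in the **re** library that adds the backslashes for you.

```python
import re

target = "Total: $3.50 (approx.)"

# Unescaped, '.' stands for "any character".
print(re.findall(r"3.5", "3.5 and 3x5"))      # ['3.5', '3x5']

# Escaped, the special characters match themselves.
print(re.findall(r"3\.5", "3.5 and 3x5"))     # ['3.5']
print(re.search(r"\$3\.50", target).group())  # $3.50

# re.escape() builds the escaped version of a literal string.
literal = re.escape("(approx.)")
print(re.search(literal, target).group())     # (approx.)
```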

Shortcuts
---------

Since people are looking for the same strings all the time, there are some clichés / shortcuts available. The general form is a backslash followed by a letter. If the letter is lower case, it's an "affirmative" search; if upper case, it's a "negative" search.   Note the symmetry between the upper- and lower-case strings.

Here are a few:

| Shortcut                                 | Counterpart                                     |
|------------------------------------------|-------------------------------------------------|
| \\w any letter, digit, or underscore     | \\W anything but a letter, digit, or underscore |
| \\d any digit                            | \\D anything but a digit                        |
| \\s any whitespace                       | \\S anything but whitespace                     |
| . (a dot) any character except a newline |                                                 |
| ^ match only at the beginning            | \$ match only at the end                        |
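
The following sketch tries a few of these shortcuts with `re.findall`, which returns every match as a list. The sample strings are invented for illustration.

```python
import re

target = "Call 555-0199 before 9 AM"

digits = re.findall(r"\d", target)        # every single digit
spaces = re.findall(r"\s", target)        # every whitespace character
print(len(digits), len(spaces))           # 8 4

print(re.findall(r"\D", "a1b2"))          # ['a', 'b']
print(re.findall(r"\w", "a_1!"))          # ['a', '_', '1']

print(bool(re.search(r"^Call", target)))  # True: 'Call' is at the beginning
print(bool(re.search(r"^AM", target)))    # False: 'AM' is not at the beginning
print(bool(re.search(r"AM$", target)))    # True: 'AM' is at the end
```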

Additionally, there are shortcuts that keep you from repeating bits of your regex string when you're looking for multiple instances of the same thing. 

Here are some shorthand expressions for specific quantities:

        *      0 or more

        ?      0 or 1

        +      1 or more

        {x}    Exactly x

        {x,y}  Between x and y, inclusive of x and y

        {x,}   x or more (no space after the comma)

Here are some examples:

In [None]:
import re
regex = (   r"cu",      # 'cu' (lower case; matching is case-sensitive)
            r"\d{4}",   # 4 digits in a row
            r"x{1,3}",  # 1, 2, or 3 'x'
            r"0{2,}",   # 2 or more zeros
            r"\W+",     # at least one non-alphanumeric character
        )

target = (r'Chicago Cubs', 
          r'1000000 dollars',
          r"xxxooo", 
          r"$2500K")

test_regex(regex, target)
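
The test above only reports whether a pattern matched. As a complementary sketch, the loop below prints the text each quantifier actually grabbed from one sample string. The patterns were picked for this illustration.

```python
import re

target = "1000000 dollars"

for pattern in (r"0*", r"10?", r"0+", r"0{2}", r"0{2,4}", r"0{2,}"):
    found = re.search(pattern, target).group()
    print(f"{pattern:>8} matched {found!r}")
```

Two things are worth noticing in the output: `0*` succeeds immediately with an empty match (zero zeros at the very start of the string), and the open-ended quantifiers take as many characters as they are allowed.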

Match objects and compiled regexes
----------------------------------

It's possible to compile your regex to increase performance; this is especially useful if you plan to use it thousands of times, because the pattern is translated once, up front, instead of "on demand" at each use.

You can query the match objects to find starting and ending positions.  This is useful if you want to use indices to retrieve content around particular phrases.

The following example demonstrates some of the information you can retrieve and the use of **re.compile**:

In [None]:
import re

# Set our regex and target strings.
regex = r"Cubs"
target = r'Chicago Cubs'

# This step compiles the regex.
compiled = re.compile(regex)

# The compiled version offers the same search methods as the re library
result = compiled.search(target)

if result:
    print("{:10} matched with {:10}.".format(target, regex))
    print("span", result.span())
    print("start", result.start())
    print("end", result.end())
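
As a sketch of using those indices to retrieve the content around a phrase, the snippet below slices the target on either side of the match. The sentence and the names `before` and `after` are invented for this illustration.

```python
import re

text = "Last night the Chicago Cubs won in extra innings."
match = re.compile(r"Cubs").search(text)

before = text[:match.start()]    # everything up to the match
after = text[match.end():]       # everything following the match

print(repr(before))              # 'Last night the Chicago '
print(repr(after))               # ' won in extra innings.'
```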

Finally, I'd like to point out that re has a **VERBOSE** mode that allows you to stretch complicated expressions (yes, they can get really complicated) across several lines. When you do so, it's possible to provide comments to help your teammates decipher just what it's intended to do.

Here's an example borrowed from *Dive into Python* [67], to illustrate its use. This regex is used to validate Roman numerals.

In [None]:
pattern = """
            ^ # beginning of string
            
            M{0,4}           # thousands - 0 to 4 M's
            (CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD),
                             # 0-300 (0 to 3 C's), or 500-800
                             # (D, followed by 0 to 3 C's)
                             
            (XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30
                             # (0 to 3 X's),
                             # or 50-80 (L, followed by 0 to 3
                             # X's)
                             
            (IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3
                             # (0 to 3 I's), or 5-8 (V,
                             # followed by 0 to 3 I's)
                             
            $                # end of string
            """

re.search(pattern, 'M', re.VERBOSE)
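
To see the pattern at work on more than one input, here is a sketch that compiles a compact copy of the same rules and tries a few candidates. The candidate strings and the name `roman` are chosen for this illustration.

```python
import re

# Same Roman-numeral rules as above; re.VERBOSE ignores the whitespace
# and the comments inside the pattern.
roman = re.compile(r"""
    ^
    M{0,4}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $
    """, re.VERBOSE)

for candidate in ("MCMLXXXIV", "XLII", "IIII", "MMMMM"):
    verdict = "valid" if roman.search(candidate) else "not valid"
    print(f"{candidate:>10} is {verdict}")
```

The first two candidates pass; "IIII" and "MMMMM" fail because they exceed the repeat limits set by `{0,3}` and `{0,4}`.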

Substitution
------------

You can use regexes to do string substitutions using the **re.sub** method. Here's an example:

In [None]:
import re

# Set up a search expression, the new value, and a target string.
regex = 'cubs'
change_to = "CUBS"
target = "If the cubs actually win again, will they still be the cubs?"

# Invoke the re.sub() method
result = re.sub(regex, change_to, target)
if result:
    print(result)
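
**re.sub** accepts a few extras that are worth knowing about. The sketch below uses the `count` and `flags` keyword arguments and a replacement string that refers back to parenthesized groups. The sample strings are invented for illustration.

```python
import re

target = "If the cubs actually win again, will they still be the Cubs?"

# flags=re.IGNORECASE catches both 'cubs' and 'Cubs'.
everywhere = re.sub(r"cubs", "CUBS", target, flags=re.IGNORECASE)
print(everywhere)

# count=1 limits the change to the first match.
first_only = re.sub(r"cubs", "CUBS", target, count=1, flags=re.IGNORECASE)
print(first_only)

# Parentheses capture groups; \1, \2, ... refer back to them in the replacement.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "Game day: 2016-11-02")
print(reordered)    # Game day: 11/02/2016
```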

Exercise
--------

Please write a regex that will convert credit card numbers from their original form (four groups of four digits) to a safer form like xxxx-xxxx-xxxx-3423. Feel free to use regex101.com or any other resource.

Footnotes
=========

[1] www.pydev.org/manual\_101\_install.html

[2] http://wingware.com/downloads This is a commercial product. If you decide to buy it, this discount code will get a substantial savings: PJBD50A1.

[3] No need to memorize these – we're just showing how simple the underlying language is.

[4] You may want to familiarize yourself with the **inspect** library.  It has dozens of tools like **getfullargspec** (finds arguments accepted by a function with their default values; it replaces the older **getargspec**, which was removed in Python 3.11), **getmodule** (returns the module an object was defined in), and **getsource** (returns the source code).  

If you're using an IDE, you might find that some of these are wrapped up and made available via the IDE's graphical interface. It's easy enough to go:

    >>> import inspect
    >>> help(inspect)

[5] If you're interested in exploring these, ../experimental/py\_precision.py has examples. If you go on to use elements from Python's scientific stack, e.g., **numpy**, you'll be using the numeric types supported by your C compiler. These will typically be limited to 64 bits on modern machines. Cf.  https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html

[6] Note all the things we did NOT have to do: declare the looping variable s; find the length of the object; move the pointer along the sequence; ensure we don't run the pointer "off the edge" of the object.  This is all handled by methods defined as part of the object.

[7] An "indented suite" is just a \$20 word for all the lines of code beneath a header (the **for** statement in this case) that's indented the same number of spaces. This is considered the "code block" associated with the header.

[8] We're going into quite a bit of detail on string objects here – partly because they're mighty useful and partly because they are exemplars of all other Python objects you'll encounter.

[9] You can think of opposing parentheses as the "execution operator", much as + is (usually) the "addition operator". Note that the function won't execute without the ():

    >>> '777'.isalnum
        <built-in method isalnum of str object at 0x000002575B764998>
    >>> '777'.isalnum()
        True

[10] A **list** is a built-in Python object. It's an ordered sequence that may contain heterogeneous elements, something like: \[ 'one', 2, 3.0, (4 + 5j) \] Note the square brackets. More on the **list** object later.

[11] An "iterable object" is something you can use a for or next operation on – it knows how to loop over itself.

[12] With no arguments, it splits on anything in the same time zone as a white space (tabs, new line characters, and white spaces).

[13] A **tuple** is the same as a list except it's "immutable" (unchangeable). It uses parentheses, something like:

    ( 'one', 2, 3.0, (4 + 5j) )

[14] The opposite of **chr** is **ord**. Thus **chr**(90) is 'Z' and **ord**('Z') is 90.

[15] **None** is an honest-to-goodness Python object. It doesn't do much, but you can assign a name to it and use the name as you would that of any other variable.

[16] You'll find **in** used in several contexts (as with the **for** statement). When it's used as a membership test, it invokes the **\_\_contains\_\_** method.

[17] **replace** changes all occurrences by default. If you provide an optional parameter count, you can limit the number of replacements:

    >>> 'aaa'.replace('a', 'A', 2)
        'AAa'

[18] The **lstrip** method removes whitespaces on the left, **rstrip** on the right, and **strip** from both ends.

[19] This section can be skipped without loss of continuity, but may be worth a quick glance.

[20] Back in the day, one had to provide more specific placeholders like %s, %i, and %f for strings, integers and floating point numbers. In fact, Python will still accept these "legacy" specifications. This works:

        >>> "%s %s" %('Hey', 'Joe!')
        'Hey Joe!'

[21] The most common alignment choices are left (\<), right (\>), or center(^).

[22] <https://docs.python.org/3/library/string.html#format-string-syntax> It may be well worth your while to take a moment with the docs to get an idea for the breadth of options available. Dozens of examples appear at the bottom that will serve as recipes for complex formatting chores.

[23] "Unpacking" is a common Python idiom – you can create new names and assign members of a sequence as their values using this sort of shorthand (works on lists, tuples, strings, etc.):

    >>> first, second = "AB"
    >>> print(first, second)
        A B

[24] https://www.python.org/dev/peps/pep-0008

[25] As a consequence, be sure to put general "catch all" tests at the bottom of your block – otherwise they could block execution of the contents of more specific tests.

[26] You might want to step through this **while** loop a line at a time in a debugger, as well as the others encountered in this chapter, to observe which lines get executed – and in what order.

[27] Python supports lots of permutations of this including -= \*= /= and %=.

[28] You can think of a binary shift as increasing or decreasing a binary number by a power of 2. In the base 10 world it's like going 1.2 \* 10\*\*2 = 120 … 1.2 \* 10\*\*3 = 1200 … 1.2 \* 10\*\*4 = 12000. As you increase the power of 10, you just tack on a zero because you're shifting the "1.2" by increasing orders of magnitude.

[29] A **set** is a collection of zero or more unique objects. More on the **set** object later.

[30] This is convenient, but has performance issues at scale – the interpreter has to be prepared to constantly adjust things, and that means lots of relatively expensive memory allocation operations. If you're considering a project that could deal with large amounts of data, consider using immutable sequences (**tuples** and **strings**) where possible and/or objects that require homogeneous elements, e.g., **array.array** or **numpy.ndarray** objects.

[31] It's helpful to think of the "stop" parameter like a real-world stop sign. You don't roll through it, but stop before it. Though a bit strange at first, this specification is handy because iterable\[:2\] and iterable\[2:\] are complementary "halves" of the entire object.

[32] The collections library contains some great tools not covered here including **defaultdict** (a **dict**-like object that automatically installs a new key:default\_value pair if the key doesn't exist) and **deque** (a double-ended queue useful for managing processes and threads).

[33] Generally a hash is a computer science term for the eponymous data structure. There's a very approachable article in Wikipedia, in case you're interested: https://en.wikipedia.org/wiki/Hash\_function

[34] If you accidentally use a mutable (changeable) object you'll get a nearly-indecipherable error message, something like: builtins.TypeError: unhashable type: 'list'

[35] You may have noted that we haven't written destructor methods, as you would in many other languages. The **del** statement removes a name (or an item from a collection); it doesn't destroy the object directly. You typically don't need to worry about this because Python has built-in "garbage collection". Objects that are no longer referenced get destroyed automatically.

[36] NB **sort** is a method belonging to the **list** object. It's an "in-place" operation – that means that the elements are simply shifted around and no new object is created. As a result, it returns a **None** object. The **sorted** function is a built-in that will work on any iterable. It creates and returns a new **list** object.

[37] It's easy to build a list comprehension if you do it in stages:

        [ ]                              #empty list
        [ for i in range(3) ]            #add an iterating expression
        [ i for i in range(3) ]          #add the thing that goes into the list
        [ i for i in range(3) if i %2 ]  #add a filter (anything that produces a Boolean)

[38] Cf. https://en.wikipedia.org/wiki/Chess\_symbols\_in\_Unicode

[39] Python 2.x can use Unicode, but it's "by request only".

[40] You can view the global namespace (really a **dict** mapping names and values) using **globals()**. In some IDEs the function names are not shown in stack data.

[41] You can view the local namespace using **locals()**. The local namespace will change as the execution moves from function to function.

[42] You may want to experiment by removing and replacing the comment in **setStar**.

[43] Note the use of the keyword **pass**. This is a placeholder whose only job is to take up space – enough to establish indentation.

[44] In some languages, like Java, functions don't really have a meaning outside the context of a containing class; they're known as "methods".

[45] Simple **lambda** functions can add readability to your code because they can be defined geographically close to where they're being applied. Though they're only "one liners" they can be arbitrarily complex and really difficult for your teammates to figure out. Sometimes it's much more transparent to use a traditional function.

[46] You can check an annotated version to find out: py\_function\_7\_upgraded.py

[47] As an aside (just for fun) the epoch "odometer" will "roll over" for 32 bit Linux systems on January 19, 2038. This could be interesting if any of these machines still exist. If you have a 64-bit system, you can relax. The rollover won't happen until sometime in the year 292,277,026,596.

[48] Cf. https://docs.python.org/3/library/datetime.html

[49] There is also an "opposite" method that will convert a string into a **datetime** object using the same codes – it's called **strptime.**

[50] http://strftime.org/

[51] If you get serious about wrangling dates you'll want to consider using **pandas**. The basic **pandas** object Series is essentially a **list** on steroids. Like a **list** it has a 0-based integer index.  But you can "bolt on" an index of any other data type, including dates.  When you do, pandas knows how to "stretch" it from days to weeks, interpolate missing values any way you ask, etc. It is aware of business days, holidays, etc. Well worth the investment to learn, IMHO.

[52] You might want to use help() on the **TextCalendar** object produced in this script. It knows how to return specific months, weeks, or years as printable calendar snippets, as iterators, or as lists of **datetime** objects.

[53] The **os.path** package has several useful methods for teasing apart path names, joining them with the correct separators, determining if specific file system objects exist, etc. The **shutil** library has several methods for working with macro-scale file system manipulations like moving entire directory trees.

[54] If you're running this code in a debugger like WingIDE, the debugger may stop execution inappropriately when an exception is encountered. If this happens, ask the debugger to ignore this exception.  In Wing, there's a checkbox in the Exceptions tab for this purpose.

[55] https://docs.python.org/3/library/exceptions.html\#exception-hierarchy

[56] It's worthwhile taking a quick look at the namespace of the traceback library – it has many tools you can use when doing text-only debugging (these get wrapped up in tools like Exceptions and Call Stack when using an IDE).

[57] There is a more elaborate example in py\_handler\_practice.py, in case you're interested in playing around with it.

[58] The 'r' mode is the default. Strictly speaking you don't need to specify it.

[59] Alternatively, you can use the **subprocess** library. This gives you the ability to spawn new processes with which you can invoke bash scripts, execute system programs, and do anything else you might do at the command line. For instance, you can go:

    >>> import subprocess
    >>> subprocess.call('dir', shell=True)  # or: subprocess.call('ls -l', shell=True)

… but you can paint yourself into a corner vis-à-vis OS portability.  Complete docs are here:

https://docs.python.org/3/library/subprocess.html

[60] JavaScript Object Notation – an XML-like, human-readable format.

[61] You can use **time.time** or, with a bit more effort, use **timeit** if you want to experiment. The official docs for the latter can be found here: https://docs.python.org/3/library/timeit.html

[62] https://docs.python.org/3/library/pickle.html

[63] NB: You would likely never do this in practice. Just because you can, doesn't mean you should :-)

[64] You don't have to use 'self'. You could use 'aardvark' if you wanted, but people would look at you funny ;-)

[65] You'll find some more examples of decorators and their uses in your course file directory, all with names beginning with py\_decorators.

[66] Be sure to set it to the Python dialect of regex (there are several). The "cheat sheet" at the bottom right shows what all the tokens mean. As you build your regex, a complete explanation of it shows up in the upper right. It's not so easy to see, but on the left there's a button that will produce syntactically-correct Python code based on your explorations. This is my "go to" tool for all things regex.

[67] diveintopython.net – an awesome, well-documented, open source book
designed for advanced programmers. It's a good read as you advance along
the path of becoming one.