Language Components
===================

From a syntax perspective, Python is a simple language. As we've seen, there are only 33 keywords. While Python has data types, variables don't need to be declared to be of some specific type before use. There's no requirement to initialize variables. Objects generally come to us "batteries included." Pretty slick, no?

Indentation
-----------

We've seen the **for** statement before, but let's take another look.

In [None]:
for value in range(5):
    print(value)

The first line is of the form:

\<key word\> \<name for index\> **in** \<iterable object\>:

That's it – no fussing around with pointers, instancing index values, or running past the end of the object. The **for** statement "just works."

What follows is an "indented suite", and it can comprise any number of lines of code. Indentation is extremely important in Python because that's how the interpreter keeps track of how to group statements. Unlike languages such as C and Java, it can't rely on braces for that purpose – Python uses <span class="underline">only</span> indentation. Statements at the same level of indentation are treated as being in the same code block, and code blocks can be nested to any level.

This further simplifies Python – it's unburdened by the clutter of braces, statement-terminating semicolons, and loop-terminating keywords. All this creates and enforces a high level of readability. The cost, of course, is that you must pay close attention to whitespace and have only a limited ability to write spatially dense code.

Strictly speaking, Python doesn't care how much each statement in the same suite is indented, as long as all are the same. Some developers use tabs (each is seen by the interpreter as a single character), but four white spaces is the recommended convention per Python's official style guide "PEP-8." 

PEP-8 is a bit of a dry read, but well worth a quick look, especially if you're planning to work as part of a team[24]. Jupyter, as well as many IDEs, has built-in PEP-8 checkers and will (usually!) format your code nicely even if you've forgotten some nuance.

## Iteration and Lazy Evaluation

Many objects in Python have the built-in capability for looping over the elements. Under the hood, each of these objects has an **\_\_iter\_\_** method that provides exact instructions for doing so.
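
As a rough sketch of what happens under the hood, the loop below does by hand what a **for** statement does automatically. The names `letters`, `iterator`, and `collected` are made up for this illustration.

```python
# What a for loop does behind the scenes, spelled out by hand.
letters = ["a", "b", "c"]

iterator = iter(letters)       # asks the list for an iterator (uses __iter__)
collected = []
while True:
    try:
        item = next(iterator)  # fetch the next element
    except StopIteration:      # raised when nothing is left
        break
    collected.append(item)

print(collected)
```

In everyday code you would simply write `for item in letters:` and let Python handle these steps.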

In the exercise, you saw how the **for** keyword can invoke this automatic looping behavior over a **list** object.  Just above, we used a **range** object with the same syntax.   The **range** object is good to know about because it's a good way to whistle up a sequence of numbers.

It's also an example of a common built-in optimizing technique, the same one used by "iterator" and "generator" objects. These objects contain recipes to generate the next object in a series without having to store all the elements in the series. The example we just saw will create five consecutive integers, but it only had to have the wisdom to add one to the last integer it produced.

These expressions take exactly the same amount of memory to create:

In [None]:
print(range(5))
print(range(5_000_000_000_000))

Note that we're printing out string representations of the objects.   We're not causing either one to iterate or create all of its elements.  That will happen only when we apply a **for** loop.

You'll also note use of underscores to specify the integer 5 trillion.  This is possible since Python 3.6.
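
One way to check the memory claim is with `sys.getsizeof`, which reports the size of the object itself in bytes. This is a small sketch that assumes the standard CPython interpreter; the exact byte counts can differ between versions, so only the comparison matters.

```python
import sys

small = range(5)
huge = range(5_000_000_000_000)

# Both range objects only remember start, stop, and step.
print(sys.getsizeof(small), sys.getsizeof(huge))

# Elements are computed on demand, so indexing works without building a list.
print(huge[-1])

# A list, by contrast, has to hold a reference to every element.
as_list = list(range(1000))
print(sys.getsizeof(as_list) > sys.getsizeof(range(1000)))
```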

The if Statement
----------------

We've touched on the **if** statement, the most basic of control statements. Let's expand the discussion a bit. Its general form is:

if \<condition\>:

    <indented suite>

elif \<condition\>:

    <indented suite>

else:

    <indented suite>

The first condition presented to the **if** is evaluated as **True** or **False**. If it evaluates **True**, the indented suite immediately beneath is executed and the statement is done: execution drops out of the entire block of code.

There can be any number of **elif** ("else if") statements and they are evaluated in top-to-bottom order. If any is **True**, the associated indented suite of statements is executed and it's done. Only when no **if** or **elif** condition is met will the **else** suite be executed[25].

The **elif** and **else** clauses are completely optional, and leaving them off is routine. 
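
To see the full form in action, here is a small sketch. The temperature thresholds and labels are invented for the example; the point is that only the first matching branch runs for each value.

```python
labels = []
for temperature_c in (-5, 10, 20, 30):
    if temperature_c < 0:
        label = "freezing"
    elif temperature_c < 15:   # only checked if the test above was False
        label = "chilly"
    elif temperature_c < 25:
        label = "pleasant"
    else:                      # nothing above matched
        label = "hot"
    labels.append(label)
    print(temperature_c, label)
```

Note that 10 is also less than 25, but it is labeled "chilly" because evaluation stops at the first condition that is **True**.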

Here's a simple example where we apply the **if** alone to prevent a crash (technically, a ZeroDivisionError "exception").

In [None]:
denominator = 0
numerator = 100
if denominator:
    print(numerator/denominator)

## Using for and if Together

Here's a program to demonstrate a more complicated application, this time coupling **for** and **if** expressions.  We introduce a few nuances in the process.

The object iterated over is a **tuple**. A tuple is an ordered collection of objects and is represented within parentheses, something like: (0, 'bear', 1.1).

In this case we have a tuple-of-tuples. The contained elements (1, 0) and (9, 3) are each tuples in their own right. The outer tuple object bundles them: ( (1, 0), (9, 3) )

We can iterate over the outer tuple to get:

In [None]:
for element in ( (1, 0), (9, 3) ):
    print (element)

A common operation to use within a **for** statement is called "unpacking".  You do not have to use it unless you want to, but we bring the topic up here so you'll recognize it when you see it.

Let's assume you have the tuple (9, 3).   You can create an assignment to split it up into constituent elements by providing a name for each element on the left side of an expression like this:

In [None]:
first_tuple_component, second_tuple_component = (9, 3)

print(f"first: {first_tuple_component}  second: {second_tuple_component}")

This will work on any Python sequence, like a string or a list.   You have to remember to provide just the right number of names on the left side.
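
Here is a brief sketch of unpacking applied to other sequences, along with what happens when the number of names doesn't match. The variable names are placeholders chosen for the illustration.

```python
first, second, third = "abc"      # a string is a sequence of characters
print(first, second, third)

x, y = [10, 20]                   # lists unpack the same way
print(x, y)

try:
    a, b = (1, 2, 3)              # three values, but only two names
except ValueError as error:
    message = str(error)
    print("Unpacking failed:", message)
```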

Suppose our tuple components contained numerator, denominator pairs.   An easy way to parse these out is to do the "unpacking" operation in the header of the **for** loop.   This lets you save a couple of steps and - much more importantly - make code transparent enough for anyone to understand.

Here we take advantage of unpacking and the left-to-right evaluation of **and** to make a robust means of generating fractions. If the denominator is 0, the first test fails and the division to its right is never attempted.

In [None]:
for numerator, denominator in ( (1, 0), (9, 3) ):
    if denominator and numerator/denominator == 3:
        print(f"yay! We have a {numerator/denominator}!")

The while Statement
-------------------

The **while** statement is also pretty straightforward:

condition = <something>

while <condition>:

    <indented suite>

else:

    <indented suite>

The indented suite will be executed top-to-bottom forever until it's asked to stop or the condition is no longer **True**. Here's a simple example using a condition to halt execution. Note that the "sentinel condition" and counter are defined outside and before the loop[26].

In [None]:
stop_me = False
counter = 0
while not stop_me:
    print(f"The counter is now: {counter}.", end = '')
    if counter > 2:
        print(" Yo. I'm done!")
        stop_me = True     
    else:
        print(" I'm still trudging along.")      
        
    counter = counter + 1 

There are more elegant ways to proceed, however. A common idiom is to use the keyword **break** within the loop to terminate it immediately. This can be combined with a tautologically true condition to streamline things.

For instance, you could go:

In [None]:
counter = 0
while True:
    print(f"The counter is now: {counter}.", end = "")
    if counter > 2:
        print(" Yo. I'm done!")
        break
    else:
        print(" I'm still trudging along.")
        
    counter += 1

Not only is this simple to read, it doesn't require much in the way of resources to evaluate. This is not an issue with a simple application, but if your project involves millions of iterations (e.g., a polling routine for a web server), efficiency becomes relevant.

Note that the **else** in these examples belongs to the **if**, not to the **while**. Had we attached an **else** clause to the **while** statement itself, it would not have executed. The reason is that its indented suite runs only if the **while** statement terminates due to the condition becoming **False**. The **break** keyword causes the code that would do this to be bypassed.
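
The following sketch runs the same **while** loop twice, once ending because its condition becomes **False** and once ending with **break**. The `limit` values and the `outcomes` list are made up so the two cases can be compared side by side.

```python
outcomes = []
for limit in (3, 10):
    counter = 0
    while counter < limit:
        if counter == 5:
            outcomes.append((limit, "stopped by break"))
            break                      # skips the while's else clause
        counter += 1
    else:
        # Runs only when the while condition became False on its own.
        outcomes.append((limit, "condition became False"))

print(outcomes)
```

With a limit of 3 the counter never reaches 5, so the **else** suite runs; with a limit of 10 the **break** fires first and the **else** suite is skipped.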

The philotic twin of the **break** statement is **continue**. When **continue** is encountered, execution is immediately passed to the top of the loop where the condition is reevaluated.

The **else** clause is completely optional. Since we're using **break**, we'll get rid of the extra baggage and show how **continue** might be applied. Note the use of the comparison operator == (returns a Boolean evaluation of whether the two values are the same) and the shortcut way to increment the counter with the += operator[27].

In [None]:
counter = 0
while True:
    if counter == 1:
        print("One is the loneliest number that there ever was.")
        counter += 1
        continue
    print("The counter is now: {}.".format(counter))
    if counter:
        break

    counter += 1

Inner and Outer Loops
---------------------

The final point I would make here is that **break** and **continue** work on both **for** and **while** loops. And both work only on the inner-most loop (the one they reside inside of). 

Here's an example of a program with both types of loops:


In [None]:
counter = 0
for i in range(10):
    if i % 2:  # odd numbers evaluate True
        while True:
            if counter == 1:
                print("One is the loneliest number that there ever was.")
                counter += 1
                continue
            print(f"The counter is now: {counter} and i is now {i}.")
            if counter:
                break  # breaks out of the inner (while) loop

            counter += 1
    if i == 5:
        print("I'm done - about to go on a break.")
        break  # this breaks out of the outer (for) loop

Some Useful Logical and Binary Operators
----------------------------------------

Python comes with a typical set of comparison operators. Here's a quick
summary:

| Operator | Meaning               |
|:---------|:----------------------|
| ==       | Equal to              |
| !=       | Not equal             |
| \<       | Less than             |
| \>       | Greater than          |
| \<=      | Less than or equal    |
| \>=      | Greater than or equal |

There is also a set of bitwise operators. These operate on the 1s and 0s in a number represented in binary format. You can create a binary representation of a number in Python by using the built-in function **bin**, something like:

In [None]:
bin(128)

You can see that the result's first two characters are '0b', flagging it as a binary representation. You'll also note that the result is a string, which you can't do much with mathematically – you need to do operations <span class="underline">before</span> it's converted.
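
If you do end up holding the string form, a short sketch like this shows one way to get back to an integer, plus the built-in **format** function as an alternative way to produce the digits. The variable names are illustrative only.

```python
as_text = bin(128)               # a str that starts with '0b'
back_to_int = int(as_text, 2)    # base 2; the '0b' prefix is accepted
print(as_text, back_to_int)

digits_only = format(128, "b")   # same digits without the prefix
padded = format(5, "08b")        # zero-padded to eight digits
print(digits_only, padded)
```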

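The operators for bitwise shifts are **\>\>** and **\<\<** and here's how you might apply them as you're creating the binary number. Shifting left by n places multiplies the number by 2 to the power n, and shifting right by n places divides by it (discarding any remainder). This is like moving the decimal point of a base 10 number to multiply or divide by powers of 10.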

In [None]:
mybin_1000 = bin(1000)
fmt = "{:30} {:30}"
print("Base 2 operations:")
print(fmt.format("mybin_1000: ", mybin_1000))
print(fmt.format("mybin_1000 shifted left: ", bin(1000 << 3)))
print(fmt.format("mybin_1000 shifted right: ", bin(1000 >> 5)))
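
As a quick check on what the shifts do arithmetically, this sketch compares each shift with the equivalent multiplication or floor division by a power of two. It reuses the number 1000 from the cell above; the other names are made up.

```python
number = 1000

left = number << 3     # shift left three places
right = number >> 5    # shift right five places

print(left, left == number * 2**3)      # same as multiplying by 8
print(right, right == number // 2**5)   # same as floor-dividing by 32
```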

There are other operators you can use to "mask" a binary number with another. A common use is IP address masking when setting up subnets.  These are:

| Operator | Effect on each bit                                                  |
|----------|---------------------------------------------------------------------|
| a & b    | Both 1 -\> 1, otherwise -\>0                                        |
| a \| b   | Both 0 -\> 0, otherwise -\>1                                        |
| \~a      | "flips" each bit. 1 -\> 0 and 0 -\>1                                |
| a ^ b    | if the bit in b is 0, use the bit in a; otherwise flip the bit in a |
This is just an aside, but you can use the "exclusive or", a.k.a. XOR, implemented with the "^" operator, as a cheesy encryption tool. That's because sequential applications simply flip the results. The first application encrypts the original and the second application undoes the encryption.
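
Here is a minimal sketch of that XOR round trip. The one-byte `key` and the sample `message` are arbitrary choices for the illustration, and this is a toy that should not be relied on for real security.

```python
key = 0b10101010                 # hypothetical one-byte key
message = "hi there"

# First application: XOR every byte of the message with the key.
scrambled = bytes(b ^ key for b in message.encode("utf-8"))

# Second application: XOR with the same key flips the bits back.
restored = bytes(b ^ key for b in scrambled).decode("utf-8")

print(scrambled)
print(restored)
```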

## Exercise


Create a program that asks the user to guess a number between 1 and 100 (whole numbers only). If the guess is wrong, let the user know if the guess is too high or too low. If the guess is right, offer hearty congratulations and exit the program. If, after five attempts, the user can't get the number then offer deep condolences and invite him/her to try again. Be sure to check for valid input and remind the user if necessary.


Hint: the random library has several handy pseudo-random number generator functions such as random.randrange(). You'll need to import random in order to use it. A good way to start might be:

    import random

    help(random)


Also, you might consider putting in some sort of logic that will "freeze" both the random number and the user's guess while debugging. That gives you a stationary target.

Collections
===========

Next, we'll investigate several of the built-in Python **sequences.** Sequences are composite objects (that is, they contain zero or more separate objects). Sequence objects are aware of their status as such and have built-in methods to take advantage of that fact. For instance, all sequences are **iterable** (they know how to loop over themselves), they know how long they are, and they can be indexed.

Creating Sequences 
------------------

In the code below, we will take a first-order look at several sequences, and how one might form them. Note that we can call data types such as **range** and **list** like functions to create new instances of said data types.

In [None]:
the_string = "strings are sequences with an encoding"
the_bytes = bytearray(the_string, encoding = 'UTF-8')
the_range = range(10)
the_list = list(range(10))
the_tuple = tuple(range(10))
print("String: ", the_string)
print("Bytes: ", the_bytes)
print("List: ", the_list)
print("Tuple: ", the_tuple)

Note that the bytearray's string representation includes a "b" prefix. That's just a visual indicator that the object is stored as bytes. Other prefixes you might encounter include u (Unicode, in Python 2.x) and r (asks Python to interpret \n, \t, and other escape sequences literally).

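As is the case with strings, it's possible to create several objects directly using the appropriate delimiters: square brackets for a **list**, curly braces for a **dict**, and parentheses for a **tuple**. These forms are called "literals", and the interpreter infers the type from the delimiters. (This is sometimes confused with "duck typing", which is about judging an object by its behavior: if it walks like a duck and quacks like a duck, it must be a duck.)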

These all work:

In [None]:
another_list = ["what's", 'up', 'doc?']
print('List')
print(another_list)
print(type(another_list))
print()

a_dict = {'a':'eh', 'b':'bee', 'c':'see'}
print('Dictionary')
print(a_dict)
print(type(a_dict))
print()
      
my_tuple = (1,3,4)
print("Tuple")
print(my_tuple)
print(type(my_tuple))

Naturally, any of these objects can be directly created with a constructor (using the type name as a verb).

This can take the guess work out of the situation. Here's an example for the **set**[29] object:

In [None]:
favorite_set = set()
print(type(favorite_set))
print(f"An empty set:  {favorite_set}")   

favorite_set.add(1)
print(f"A one-element set:  {favorite_set}")

Python is dynamically typed rather than statically typed. Unlike many languages, Python does not enforce variable declaration or homogeneous types within a collection structure. In fact, it's very cavalier about use and reuse of variable names[30].

In [None]:
reusable = "green"
reusable = 1
reusable = ["some", "list", "of", 4, "elements"]
print(reusable)
print()

reusable = None
print(reusable is None)  # (How to check for a None type.)
print(reusable)          # (prints as 'None'.  A bare name in the REPL does not display anything.)
print('-' * 10)


Creating Index Values with enumerate
------------------------------------

Although you don't need to create index values for your loop, you can get some easily by using the built-in **enumerate** function. It produces tuples of (index, object in iterable).

Here's how you might apply it:

In [None]:
fruits = ('apple', 'banana', 'kiwi')
for snack_and_index in enumerate(fruits):
    print(snack_and_index)

Here, we created a tuple of fruits. Within the **for** statement we applied **enumerate**. For each element of the tuple, **enumerate** returned a **tuple** of (index, element\_value).

We can upgrade this a bit by applying some print formatting and by "unpacking" the **tuple** returned. 

Here's how that might work:

In [None]:
fruits = ("apple", "banana", "kiwi")
start_at = 1
for index, snack in enumerate(fruits, start_at):
    print("fruit #{} is a(n) {}".format(index, snack))

The names "snack" and "index" have been successively associated with values contained within the **tuples** provided by **enumerate**, and are available within the indented suite. You can see that the names are "recycled" at each iteration. We provided an optional argument "start\_at" to select the first value of the index produced. If you don't provide the argument, the first index value will be zero.

Slices and Sequence Indexing
----------------------------

So far, we have looked at really small, manageable sequences where it's easy and cheap enough to iterate through all the elements. However, in real-life situations we're likely to encounter structures that contain millions of elements. If we have some *a priori* knowledge of where the element we're looking for lives, or if we want to apply something better than a brute-force search algorithm, we would want a more clever way to approach sequences.

Fortunately, Python supports the ability to "slice" a sequence using index values.  Slicing and indexing prove to be very important when wrangling array-like objects such as those in NumPy and Pandas.

For instance, you can go:

In [None]:
the_list = list(range(0,10))
print("Everything:", the_list)
print()
print("A slice:", the_list[0:6:2])

The general syntax is:

\<iterable\>\[ \<start\> : \<stop\> : \<stride\> \]

The start, stop, and stride parameters are all optional. By default, start is the first element of the sequence, stop is the end of the sequence (so the last element is included), and stride is 1.

In the simplest form, the entire sequence will be produced:

In [None]:
the_list[:] # [::] works the same

Let's go back and look at the first slicing example. One might expect that the last element produced would be 6 – after all, isn't that what we requested with the stop parameter? Not so – the last element produced is the one <span class="underline">before</span> the stop parameter … something to keep in mind to avoid surprises[31].

Python also has a built-in **slice** object, which works just like the index specifications above. Note that it's applied with \[square brackets\], just like the hard-coded slice.

In [None]:
slicer = slice(0,6,2)

print("Slice object")
print(type(slicer))
print()
print("A slice:", the_list[slicer])

Using the **slice** object can make your code much easier to maintain because you don't have to fiddle with hard-coded index values. It can be especially useful if you're parsing data whose format is likely to change and over which you have little control (like text scraped from a web site).    

A few **slice** objects at the top of your code could make adjustments / updates a snap. This being said, if its specification is too "distant" from where it's being applied your code could become less transparent for human consumers.

The main difference between using a **slice** object and the hard-coded version is that positions can't simply be left empty. A single argument sets only stop, and two arguments set start and stop. To skip over start or stop while supplying a later parameter, just use the **None** object as a placeholder and you'll get the default value.
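
As a small sketch of the **None** placeholder, each **slice** object below is paired with the hard-coded form it corresponds to. The names `every_other`, `from_three`, and `backwards` are invented for the example.

```python
the_list = list(range(10))

every_other = slice(None, None, 2)   # like [::2]
from_three = slice(3, None)          # like [3:]
backwards = slice(None, None, -1)    # like [::-1]

print(the_list[every_other])
print(the_list[from_three])
print(the_list[backwards])
```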

Here are a few additional examples of how you might use slices:

In [None]:
the_string = "strings are sequences with an encoding"
the_list = list(range(0,10))
the_slice = slice(3, 10) # positions 3...9 inclusive

print("Slice : ", the_slice, "\n")
print("Slice a list: ", the_list[3:10], "\n") # typical
print("Alt. syntax : ", the_list.__getitem__(the_slice), "\n") # more verbose

print("The whole string: ",the_string[:]) # all
print("Slice a string: ", the_string[3:10])
print("Skipping: ",the_string[::2], "\n") # step by 2

new_list = the_list[:] # same as the_list.copy()
print("A clean copy, as a new object:", new_list)

Dictionaries in Python
----------------------

Python supports dictionary objects, known to some languages as "hash tables" or "hash mappings." The basic object, **dict**, is built into the language, and variants of it, such as the **OrderedDict**, are available in the **collections** library[32].



Basics
------

A **dict** object is another form of a collection – it has some important differences from sequence objects like the **list** and string. Among these:

-   Since it's a hash[33] (optimized for efficiency), elements are
    located by key rather than by position. Before Python 3.7 the
    elements were not in any guaranteed order; current versions keep
    them in insertion order.

-   Operations like slicing and positional indexing are not available
    because elements are addressed by key, not by position.

The **dict** is essentially a one-way lookup table. Given a unique key, one can efficiently find its associated value. However, since values are not necessarily unique you can't go the other way (you can't use a value to look up a key).

Here's some basic usage:

In [None]:
# Alternative syntax for creation
the_dict = {"key":"value", "A":2, "B":3}
same_dict = dict(key = "value", A = 2, B = 3)

print ("The dict:", the_dict, '\n')
print("Are the dicts the same?", the_dict == same_dict, '\n')

# One way to add a key:value pair 
the_dict['C'] = 3
print("We added a new key:value pair: ", the_dict, '\n')

# One way to extract a value if you know the key is good.
my_value = the_dict['C']
print(f"The value associated with 'C' is: {my_value} \n")

# One way to extract a value if the key is sketchy.
my_value = the_dict.get('D', "some default value")
print(".. and yet again:", the_dict)
print()
print("The value I got with a bad key is: '{}'.".format(my_value))

Using get
---------

While it's completely OK to look up a value by its key, if that key does not exist a **KeyError** exception will occur. The **get** method allows you to provide some default value in this case (or if you don't, **None** will be returned).

Here's a quick example:


In [None]:
plant_dict = {'raspberry': 'rubus',
              'elm': 'ulmus',
              'maple': 'acer'
              }

# create a format string
fmt_str = "The {} is more properly called a {}.\n"

# get with no argument
look_for = 'raspberry'
found = plant_dict.get(look_for)
print(fmt_str.format(look_for, found))

# get with a default value
look_for = 'alder'
default = 'SOMETHING. I dunno'
found = plant_dict.get(look_for, default)
print(fmt_str.format(look_for, found))

# get with default None
look_for = 'sumak'
default = None
found = plant_dict.get(look_for, default)
if found:
    print(fmt_str.format(look_for, found))
else:
    print(f"Dude, I'm looking for {look_for}, but I give up ;-)")

Keys are Immutable
------------------

The keys of a **dict** need to be unique and hashable. As a result, one of the requirements of a key is that it be immutable[34]. To determine whether a proposed key is hashable, "under the hood" the interpreter looks for the **\_\_hash\_\_** "magic method", and uses the hash thus produced for internal bookkeeping.

If there's no **\_\_hash\_\_** method, the proposed key won't qualify. As long as the key meets these requirements, it can be just about anything, though strings and integers are most commonly used.
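
To make the hashability requirement concrete, here is a small sketch that tries a few kinds of keys. The dict name `lookup` and its contents are invented for the illustration.

```python
lookup = {}
lookup["name"] = "strings work"
lookup[42] = "integers work"
lookup[(1, 2)] = "tuples of immutable objects work"

try:
    lookup[[1, 2]] = "lists do not"   # a list is mutable, so it can't be hashed
except TypeError as error:
    problem = str(error)
    print("Rejected key:", problem)

print(len(lookup))
```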

Useful Dictionary Methods
-------------------------

Here's a little more code you can use to "kick the tires" on the **dict** object. Note the use of the **update** method of the **dict** object and the top-level **del**[35] statement.

In [None]:
mydict = {}
mydict['team'] = 'Cubs'

# Another way to add elements is via update.  For input, just about any iterable
#    that's comprised of 2-element iterables will work.
mydict.update([('town', 'Chicago'), ('rival', 'Cards')])
print(f"The dict is now:  {mydict}.\n")

# we can print it out using the items method (it yields (key, value) tuples)
for key, value in mydict.items():
    print("The key is {} and value is {}.".format(key, value))
    

# 'and' evaluates left to right; this protects from crashes if no 'rival'
print("Let's get rid of a rival \n")
if "rival" in mydict and mydict['rival'] == 'Cards':
    del mydict['rival']   # del is a statement, so no parentheses are needed

print("By the grace of the del statement, the Cards are gone.\n")
print(f"The dict is now:  {mydict}")

You can extract the keys and values by the cleverly-named **keys** and **values** methods. These return iterable objects containing the appropriate values.

You will notice from the representations of the keys and values that they are their own special objects.  We can easily type-cast them to more usable objects using the constructor for a collection.

In [None]:
the_dict = {"key":"value", "A":2, "B":3}

keys = the_dict.keys()
print(f"Keys: {keys}")

vals = the_dict.values()
print(f"Values: {vals}\n")

print(f"Values as a list object: {list(vals)}")

Another way to remove elements from a **dict** is to use its **pop** and **popitem** methods. These not only remove elements, but return the element removed. 

Here's how you might employ these methods:

In [None]:
my_dict = {'team': "Cubs", "town": "Chicago", "rival": "Cards"}
print("dict is: {}\n".format(my_dict))


# Use popitem to remove and return a (key, value) pair; it raises KeyError if the dict is empty.
# On Python 3.7+ this is the most recently added pair.
if my_dict:
    key, value = my_dict.popitem()
    print(f"We've removed: {value}.\n")
    print(f"The dict is now: {my_dict}.\n")

# With pop we can pick a key and use a default, just in case.
# pop returns only the value, not a (key, value) tuple.
default = None
looking_for = 'bad key'
popped_value = my_dict.pop(looking_for, default)
if popped_value:
    print("{} is {}.\n".format(looking_for, popped_value))
else:
    print("Sorry, no '{}' here.".format(looking_for))

Sorting a Dictionary
--------------------
A word about dict sequencing. It's important to note that CPython began storing dict entries in insertion order in Version 3.6 as an implementation detail, and the language has guaranteed this behavior since Version 3.7. That is, the key:value pairs are stored in the order loaded into the dict. Earlier versions allowed the key:value pairs to shift positions in the interest of efficiency. If you need your code to run on older versions of Python, you'll want to choose a collections.OrderedDict object instead of the normal dict object.
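
A quick sketch of the ordering behavior, assuming Python 3.7 or later for the plain **dict**. The keys are arbitrary and deliberately not in alphabetical order.

```python
from collections import OrderedDict

plain = {}
ordered = OrderedDict()
for key, value in (("zebra", 1), ("apple", 2), ("mango", 3)):
    plain[key] = value
    ordered[key] = value

# Both report their keys in the order they were added.
print(list(plain))
print(list(ordered))
```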


Occasionally, you'll want to sort the contents of a **dict** by keys. This is a snap because they're guaranteed to be unique and have a natural "sort order." We can accomplish sorting by keys succinctly by applying the built-in **sorted** function (this is not part of the list object).

You'll note that we have chained operations together in the example below. This is extremely common in Python and is considered a sign of elegant parsimony. Python begins by solving the inner-most bit of a nested expression then feeds the result to the next-inner-most bit ... and so on.

It can be horrible for the new user, so we'll do it one step at a time first:

In [None]:
# The long way:
my_dict = {'team': "Cubs", "town": "Chicago", "rival": "Cards"}
keys = my_dict.keys()                           # the dict_key object
list_of_keys = list(keys)                       # a list version
sorted_list_of_keys = sorted(list_of_keys)    # a sorted version of the list
print(sorted_list_of_keys)

In [None]:
# The 'Pythonic' way:
print(sorted(list(my_dict.keys())))

Sorting by values is a little tougher, but here's how you can do it. You need to provide a short routine that returns a value if you give it a key. If you provide it to the **list** object's **sort** method[36], it will use the routine to guide its efforts. We'll get into functions more later. For now, know that the general form is:

def <function\_name>(<arguments>):

    <indented suite>

It can take zero or more arguments and optionally return an object of your choice.

The **sort** command by default will sort the elements "naturally" – alphabetically if strings and numerically otherwise. You can influence that behavior by giving it something else besides the **list** elements to sort "naturally". It could be anything – a random number, word count, the last letter of a word, etc. Things will be sorted by whatever is returned by the sorter routine you write.

This is what a "null sorter" would look like (it returns exactly what we passed it):

In [None]:
def sorter(key):
    "simply returns what's provided i.e., contributes nothing"
    return key

We can invoke a list object's sort() method to sort the elements of the list. In effect, it behaves as though it were using the null sorter to get the job done. In other words, it uses the natural sort order of the stuff in the list.

You can roll your own sorter, if you want.   When you do, Python essentially feeds each element of the list into the routine you provide.  

Here's how it works: whatever is returned by the routine gets 'paired up' with each element.  The returned values are things that get sorted - not the original list contents.   When the sort is completed, the returned values go away leaving the list elements in the proper order.
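
The pairing-up idea can be sketched by hand and compared with the built-in `key` argument. The word list and the `word_length` routine are made up for the illustration, and the hand-built version is only a conceptual model, not a claim about how Python implements sorting internally.

```python
words = ["kiwi", "banana", "fig", "apple"]

def word_length(word):
    "Sorter routine: returns the value to sort by."
    return len(word)

# By hand: pair each element with the value the routine returns ...
pairs = []
for word in words:
    pairs.append((word_length(word), word))
pairs.sort()                      # ... sort on the returned values ...
by_hand = []
for length, word in pairs:        # ... then throw the returned values away.
    by_hand.append(word)

# The same result using the key argument.
built_in = sorted(words, key=word_length)

print(by_hand)
print(built_in)
```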

Here's an example:

In [None]:
def sorter(key):
    "Creates a sort using the value associated with a dict key"
    return my_dict[key]

my_dict = {'team': "Cubs", "town": "Chicago", "rival": "Cards"}
key_list = list(my_dict.keys())
key_list.sort(key = sorter)

for key in key_list:
    print(key, my_dict[key])


Miscellaneous Notes on dict Objects
-----------------------------------

Many Python applications, such as Pandas, build user-friendly array indices using objects built on dict-like objects.   Essentially, one can specify row and column names in human terms.  The human terms get mapped to pointers and other computer-friendly objects.   In this way dict-like objects form a cognitive bridge between the human and computer worlds.

You can use dictionaries to hold elements of sparse arrays by using **tuple**s as keys. Let's say you wanted to model movements of goldfish in a tank. You could account for each cubic centimeter of water and monitor each fish passage in and out – you might use a dense 3-D array for that. Perhaps more efficiently, you could monitor the fish themselves by providing a three-**tuple** to represent the current location of each. 

This **dict** uses **tuples** for keys:

In [None]:
array_dict = {(0, 0, 0): 12, (2, 4, 98): 23}
array_dict[(2, 4, 98)]
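
When a **dict** stands in for a sparse array, most locations have no entry at all. One possible approach, sketched here with made-up readings, is to use **get** with a default of 0 so that empty locations behave like zeros.

```python
readings = {(0, 0, 0): 12, (2, 4, 98): 23}

occupied = readings.get((2, 4, 98), 0)   # a location that has an entry
empty = readings.get((1, 1, 1), 0)       # a location that was never stored

print(occupied, empty)
print(len(readings))                     # only the stored locations take space
```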

Keys can be heterogeneous, as can the values. This is a valid **dict**:

In [None]:
hetero_dict = {'a': 1, (1,3,4): "x"}
hetero_dict

If a **dict** has a key named "a", and you provide a new element whose key is 'a', the original will be replaced.


In [None]:
new_dict = dict( ( ('a', 3), ('b', 88) ) )
print("Original:", new_dict)

new_dict['a'] = "Godzilla!"
print("Updated:",new_dict)

The **collections** library has an object, **defaultdict**, that will automatically provide a new entry with a default value if someone tries a nonexistent key.
The **defaultdict** takes some sort of a "callable" object. That's usually something like a function or method, but it can be anything that can be called like a function.

Here's an example:

In [None]:
from collections import defaultdict

# Create a defaultdict and print it
dd = defaultdict(int)  # when int is called, it returns 0
print(f"original:  {dd}\n")

# Add something to it
futile_query = dd['aardvark'] 

print(f"The query returned: {futile_query}\n")
print(f"The dict is now:  \n{dd}\n")

As you can see, the default dict now has a single entry with the default value of 0:

    defaultdict(<class 'int'>, {'aardvark': 0})

In [None]:
# Here's a low-budget word counter.  Keys are the words, values increment count for repeats.

# Create a defaultdict and print it
dd = defaultdict(int)  # when int is called, it returns 0
print(f"original:  {dd}\n")

for animal in ('dog', 'cat', 'dog', 'zebra', 'dog', 'dog', 'dog', 'cat'):
    dd[animal] += 1

print(f"Word counts:")   
for k, v in dd.items():
    print(f"{k:<15} {v:<15}")
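
As a point of comparison, here is a hedged sketch of the same word count using `collections.Counter`, a different standard-library tool from the **defaultdict** used above. The variable names are illustrative.

```python
from collections import Counter

animals = ('dog', 'cat', 'dog', 'zebra', 'dog', 'dog', 'dog', 'cat')

# Counter tallies hashable items in one pass over the iterable.
counts = Counter(animals)

# most_common(n) returns the n highest counts as (item, count) pairs.
top_two = counts.most_common(2)
print(top_two)
```

A `Counter` is a dict subclass, so the usual `items()` loop from the cell above works on it unchanged.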


In [None]:
from datetime import datetime
import time

# You can "roll your own" callable.  This provides a time stamp.


def callable_constant_factory(value_if_missing):
    # The timestamp is computed once, when the factory is called, so every
    # default value produced by the returned lambda carries the same timestamp.
    timestamp = f"the time is {datetime.now().strftime('%H %M %S %f')}"
    # This lambda (anonymous function) returns what to use for a default value
    return lambda: f"{value_if_missing:<5}: {timestamp:<30}"


# Loads the callable up with the value we want (can be anything)
dd = defaultdict(callable_constant_factory("hey!"))

# A simple lambda would also work
#dd = defaultdict(lambda: "hey!")

for animal in ('dog', 'cat', 'dog', 'zebra'):
    print("key: {}, value: {}".format(animal, dd[animal]))
    time.sleep(.01)

print("\nkeys:\n{}".format(dd.keys()))
print()

In [None]:
y = lambda x: x+1 
y(4)



## Memory Considerations
Adding elements one at a time can be expensive, especially as the **dict** grows. That's because the entire object needs to be copied to a new patch of memory if it grows past its allocated space. It's more efficient to use an *en masse* operation like **update**.
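
A small sketch of what an *en masse* update might look like. It only demonstrates the **update** call itself and makes no measurement of the performance claim above; the dictionary contents are made up for illustration.

```python
inventory = {"dog": 1}

# Merge a whole batch of entries in one call instead of one assignment per key.
new_entries = {"cat": 2, "zebra": 3}
inventory.update(new_entries)

# update() also accepts keyword arguments; existing keys are overwritten.
inventory.update(dog=5, emu=4)

print(inventory)
```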

The set Object and List Comprehension
-------------------------------------

The **set** object is a mighty useful arrow in your quiver. A **set** is essentially a collection of unique objects. The **set** is another hash object, making retrieval of objects very efficient, even when the **set** is large.

The fact that a **set** contains only unique objects makes it a great automatic de-duplicator. If you try to add an object that already exists, the addition is silently ignored. 

Here's a common idiom to de-dup a **list**:

In [None]:
set([1,1,2,2,2,3,3,4,5])
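
One caveat worth sketching: a **set** makes no promise about the order of its elements. If the original order matters, one alternative (an assumption about what you might need, not something the idiom above requires) is `dict.fromkeys`, which keeps the first occurrence of each item in insertion order.

```python
raw = [3, 1, 3, 2, 1, 5]

# A set removes duplicates but does not promise any particular order.
unique_any_order = set(raw)

# dict keys are unique and remember insertion order (Python 3.7+),
# so this keeps the first occurrence of each value, in order.
unique_in_order = list(dict.fromkeys(raw))

print(unique_any_order)
print(unique_in_order)
```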

Here's a slightly more complicated example, this one using a "list comprehension" to create a set of random numbers using the random library.    Here it is, quickly.  

Yes, it's horrible the first time you see it (very Pythonic), but we'll explain.

In [None]:
import random
values = 1000
dedup = set([int(1000*random.random()) for i in range(values)])
print("we found {} unique values!".format(len(dedup)))

In [None]:
# The long way, using the list append() method.

In [None]:
random_list = []
for i in range(values):
    rand_num = random.random()
    rand_int = int(rand_num * 1000)
    random_list.append(rand_int)
dedup = set(random_list)
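
To connect the two versions above: a comprehension is the loop folded into a single expression of the form `[<expression> for <name> in <iterable>]`. The sketch below shows a plain list comprehension and then a "set comprehension" (curly braces), which builds the set directly without the intermediate list. The `squares` example is an added illustration.

```python
import random

# A list comprehension: the expression n * n is evaluated for each n.
squares = [n * n for n in range(5)]
print(squares)

# A set comprehension: same idea, but curly braces build a set directly.
values = 1000
dedup = {int(1000 * random.random()) for _ in range(values)}
print("we found {} unique values!".format(len(dedup)))
```

The underscore is just a conventional name for a loop variable whose value is never used.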

**Sets** have built-in methods for efficiently finding unions and intersections. A union is a **set** of every unique element from both sets combined. An intersection is a **set** of the elements found in both. This code serves to illustrate how you might apply them using the "**&**" and "**\|**" operators:

In [None]:
the_set = {"monkey", "gorilla", "dog", "cat"}
print("Set: ", the_set)
the_set.add("parrot")
print("Set: ", the_set)
other_set = {"gorilla", "elephant", "pig", "chicken"}
print("Set intersection:", the_set & other_set)
print("Set union:", the_set | other_set)

Note that although this example uses the "\|" and "&" operators with two **set** objects, there's no limit to the number of objects in play. If a, b, and c are all **sets**, you can go a&b&c or a\|b\|c.
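
A short sketch of the chained form with three sets. The method spellings `intersection` and `union` accept several arguments and give the same results as the operators; the set contents are made up for illustration.

```python
a = {"monkey", "gorilla", "dog"}
b = {"gorilla", "dog", "pig"}
c = {"dog", "chicken"}

common = a & b & c      # elements present in all three sets
everything = a | b | c  # every unique element from all three sets

# Equivalent method calls
same_common = a.intersection(b, c)
same_everything = a.union(b, c)

print(common)
print(everything)
```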

Named Tuples
------------

The **namedtuple** object is something of a hybrid between a **tuple** and a **dict**. Each **namedtuple** can be instantiated into your current namespace as a 'normal' **tuple**, but the wrinkle is that you can provide each element an alias (nickname) so you can refer to it easily. It's not part of basic Python so you need to **import** it from the collections library.

Here's some code to play with which shows the basic functionality. Note that you can address the elements of these objects by name (using dot notation), but can also use them as **tuples**.

In [None]:
"""An introduction to the namedtuple object"""
from collections import namedtuple

#make a tuple with the tag 'Animal'
Animal = namedtuple('Animal', ('species', 'name'))

#specify this animal by providing the species and name
a1 = Animal('gorilla', 'magilla')
print(a1)

#... and specify another
a2 = Animal(species = 'gorilla', name = 'fred')
print(a2)

#we can call out the specifics using dot notation
print(a1.name, a2.name)

#... or, split the tuple containing the first animal
myspecies, myname = a1
print(myspecies, myname)
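
A few more **namedtuple** conveniences, sketched with the same `Animal` definition: indexing by position, converting to a dictionary with `_asdict`, and making a modified copy with `_replace` (a **namedtuple** is immutable, so the original is left untouched).

```python
from collections import namedtuple

Animal = namedtuple('Animal', ('species', 'name'))
a1 = Animal('gorilla', 'magilla')

# It is still a tuple, so positional indexing works
print(a1[0], a1[1])

# The field names are available on the class
print(Animal._fields)

# Convert to a dict keyed by field name
as_dict = a1._asdict()
print(as_dict['name'])

# Build a new object with one field changed; a1 itself is not modified
renamed = a1._replace(name='kong')
print(renamed, a1)
```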

Getting Fancier with the namedtuple Object
------------------------------------------

Here's a slightly more complicated example – something you might use if you were developing a chess game in Python. We won't elaborate this to completion – but it may be interesting practice. If you do the upgrade, you might look into using Unicode to represent your chess pieces[38] (Python 3 is Unicode-compliant[39]).

This code develops a two-dimensional grid using a **list** of **namedtuple** objects. This is a simple way to whistle up a 2-d array-like structure using only basic Python tools.

You can see that each of the **namedtuple** objects carries with it several properties, nicely bundled, and transparently addressed. In this sense, the objects we're creating here are a bit like the built-in Python objects – they're already endowed with "out of the box" capabilities.


In [None]:
from collections import namedtuple

Piece = namedtuple('Piece', 'type color position symbol')

# a list of lists is a really useful data structure
grid = []
for r in range(8):
    grid.append([None, None, None, None, None, None, None, None])

grid[7] = [
            Piece("Rook", "black", [7, 0], "R"),
            Piece("Knight", "black", [7, 1], "N"),  # "N" so knights don't clash with the king's "K"
            Piece("Bishop", "black", [7, 2], "B"),
            Piece("Queen", "black", [7, 3], "Q"),
            Piece("King", "black", [7, 4], "K"),
            Piece("Bishop", "black", [7, 5], "B"),
            Piece("Knight", "black", [7, 6], "N"),
            Piece("Rook", "black", [7, 7], "R")
            ]

grid[6] = []
for c in range(8):
    p = Piece("Pawn", "black", [6, c], "P")
    grid[6].append(p)
    
for p in grid[7]:
    print(p.symbol, end="")
print()    
for p in grid[6]:
    print(p.symbol, end="")

Copying Sequences
-----------------

In Python, it's possible to have multiple names associated with the same object. This can be really handy but the convenience comes with a cost: the only way to tell for sure whether two names refer to the same object is to compare their **id**s (which is what the **is** operator does). 

Here are a couple trivial examples that show the potential pitfalls:

In [None]:
one = 1
one_prime = 1
print("one:", id(one), "one_prime", id(one_prime))

if id(one) == id(one_prime):
    print( "Same object.")
else:
    print("Different objects.")


You run into this with collections, too. This code will demonstrate:

In [None]:
def test_lists(first, second):
    test_result = first == second
    print("same list, right? {}.".format(test_result))

    if test_result == False:
        print("first id is {} second id is {}."\
        .format(id(first), id(second)))

print('Make a list, then create a copy')
a_list = ["Tony", "Sally", "George"]
b_list = a_list
print(a_list, b_list)
test_lists(a_list, b_list)

#update the first list, but not the second.
print("\nAdd someone to the a_list, leaving the b_list alone")
a_list.append("Sam")
test_lists(a_list, b_list)

print("\nUh, what happened here?")

"What happened here" is that we have simply assigned two names to the same object (they have the same **id**).  Then we used one of the names to access, and change, the original object.

Having multiple names for the same object is not too weird when you think about it.   The president is called "the President", "POTUS", "Commander-in-Chief" ... then you can read political columns to get even more names :-)   

The point is that there can be many synonyms - "aliases" or "names" - for any object, because each name is simply a unique key in a dict that maps to the same object (that's just how Python keeps track of its namespaces).

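To get around this, you need to make a copy, so that each name refers to its own object. Here are some ways to do it. Note that each of these makes a "shallow" copy: the new list is a separate object, but any mutable objects nested inside it are still shared. For nested structures, use **copy.deepcopy** from the standard library.

In [None]:
old = ["Tony", "Sally", "George"]
new = old[:] # slicing the whole list creates a new list object
new = list(old) # list constructor creates a new object
new = old.copy() # uses the list object's copy method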
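
The difference between copying the outer list and copying everything inside it only shows up with nested data. A minimal sketch, using a made-up list of lists:

```python
import copy

old = [["Tony", "Sally"], ["George"]]

shallow = old[:]           # new outer list, same inner lists
deep = copy.deepcopy(old)  # new outer list and new inner lists

old[0].append("Sam")

print(shallow is old)  # the outer lists are distinct objects
print(shallow[0])      # the change shows up: the inner list is shared
print(deep[0])         # the change does not show up: fully independent
```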

One place this comes up in data science is working with slices of data structures.   A slice of a plain Python list is a new list, but in Numpy a slice is usually just an alias to part of the original object, much like a "view" into a database table provides access to part of that table.   When working with Numpy and Pandas, a very common way to hack into the relevant data involves creating a slice then manipulating it.  Numpy won't give you a warning, but Pandas often will.  Unless you're really sure what you're doing, you will want to pay close attention and avoid the temptation to ignore or disable the warnings.

## Exercise
I just hired a new admin and it's not working out so well. I asked her to organize my Cubs player database. It turned out to be a disaster. After she quit, I found two versions of the database. Can you help? 

One version of the database (stored as a dict) uses the players' weights as the key. 

I'll spare the details, but it looks like this:

In [None]:
players = { 185:('Tyler', 'Chatwood'),
            219:('Luke', 'Farrell'),
            190:('Kyle', 'Hendricks')
          }

Can you find a way, working only with code and this dictionary, to produce a nicely formatted table that's sorted by the players' last names?

The other version was only a little better and organized by last name – sort of. There were lots of duplications caused by bad typing. 

Here's a bit of it:

In [None]:
players = { 'Hendricks': ('Kyle', 225),
            'HeNDricks': ('Kyle', 225),
            'Hendrix': ('Kyle', 225) 
          }

Can you find a way to clean this version up working with the set object?

It's beyond the scope of this class, but just for fun you might look at the code in py_fuzzy_lookup.py for ways to refine this further. It demonstrates a tool that will let you find and score "fuzzy" (inexact) matches.

Functions
=========

Functions are just bits of code that encapsulate – you guessed it – functionality. We'll explore them in some depth in this chapter.

You've already used several of them like **print**, **input**, and **bin**. You've used them with commands like **list.sort** – though, technically, when contained within classes they're called "methods". And we've just written a small function to sort **dict** objects by value.  You're already getting to be a pro. You'll become more so by rolling a few of your own.

Scope of Names In a Module
--------------------------

Here are a couple example functions that each take a single, positional parameter. The first is pretty straightforward, and the second introduces the **global** keyword.

In Python (as in most languages) there's a notion of "scope" – that's the part of the code where a variable is visible. Indentation levels are a good gauge of scope within a module. The objects defined in the first column (the setStar function and the variables STAR, Favorites and X) are visible throughout the module and are called "globals"[40]. Variables tucked away within a function (e.g., name in setStar) are normally visible only within that function. These are called "locals"[41].

If you want to enhance the visibility of the variable and make it global to the module, you can employ the **global** keyword, as is done in setStar for the variable STAR. Once you do that, you facilitate "two way communication" with that STAR object and allow the function to alter the value. If you don't do that, the function can only alter the "local copy" of STAR[42].

In [None]:
# Globals namespace
STAR = "Sirius" # Polaris
Favorites = [ ]
X = 100

def setStar(name): 
    """ Contains a local namespace just for this function."""
    #global STAR  #<---remove comment to map to global namespace
    STAR = name
    Favorites.append(name)
    print("local STAR: ", STAR)
    
setStar("Polaris")
print("global STAR: ", STAR)
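
To see both behaviors side by side without editing comments, here is a sketch with two hypothetical variants of the function, `set_star_local` and `set_star_global` (names invented for this illustration):

```python
STAR = "Sirius"

def set_star_local(name):
    # Plain assignment creates a new local name; the global STAR is untouched.
    STAR = name
    return STAR

def set_star_global(name):
    # The global statement maps STAR to the module-level name.
    global STAR
    STAR = name
    return STAR

set_star_local("Polaris")
after_local = STAR    # still the original value

set_star_global("Polaris")
after_global = STAR   # now changed

print(after_local, after_global)
```

Note that appending to a module-level list, as setStar does with Favorites, needs no **global** statement: the function changes the existing object rather than rebinding the name.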

Passing Information into a Function
-----------------------------------

You have a tremendous amount of flexibility around how, and whether, to pass information into functions. You can "roll your own" from some combination of fixed positional arguments (a contract between you and your function to provide a precise number of arguments); positional "wildcard" arguments (0 to a zillion arguments); specific key / value pairs; or "wildcard" key / value pairs (0 to a zillion). Here are some examples of the options available to you:

The simplest of all functions can be constructed like this:



In [None]:
def simple():
    "a docstring, or the keyword pass"

You need the keyword **def**, some name, enclosing parentheses and a colon in the header, followed by a nominal indented suite. This function takes no arguments and returns **None**. Not so interesting, really, but syntactically correct. You can upgrade in a variety of ways. You might have only named positional parameters, as with this function header:

    def positional_only(input1, input2, input3):

Here you promise three inputs – no more and no fewer. Sometimes, though, you really don't know how many bits of information you'll get and want to generalize. Python provides flexibility when the number of arguments is unpredictable. This constructor requires one positional argument then zero or more additional ones:

    def positional_plus(input1, *more_inputs):

The "\*" signals the interpreter to expect a tuple of unknown size, and to create a local name "more\_inputs" tied to a **tuple** object (you don't use the \* inside the function). The **tuple** will scoop all the additional arguments provided beyond the first one (that's assigned to input1). If only one argument is provided the **tuple** will be empty.

If you want to use named (key / value style) inputs, you can construct the function like this:

    def dict_only(a = None, b = None):

This is a handy idiom because objects a and b are always created when the function is called. That makes them optional – neither, either, or both can be used when calling it. Another way to use a **dict** is to use a placeholder for one with a constructor that looks like this:

    def dict_placeholder(**input_dict): 

When you do this, you create a local name "input\_dict" that's tied to the key:value pairs provided when the function is called (you don't use the \*\* internally). Let's take a look at a couple more examples.

In [None]:
def eats(*foods): # gather positional args in a tuple
    print("foods: ", foods) # foods is a tuple now
    
    
print("Tuple of positional arguments to eats():")  
# open-ended number of positional arguments passed in...
eats("Spaghetti", "Oysters", "Chili", "Crackers", "Rice")

def example(*args, **kwargs): # keyword args dict
    """
    (* ) convert positionals -- tuple
    (**) convert keyword args -- dict
    """
    print("\nPositionals:")
    for arg in args: # loop over the tuple
        print(arg, sep = ", ", end = " ")

    print("\nKwargs")
    for key, value in kwargs.items(): # ...now the dict
        print("Arg name:", key, "Value: ", value)
    
# positional + keyword (named) arguments
example( 1,2,3,4, on_vacation = True, at_work = False )

# same thing using "exploders" * and **
example( *(1,2,3,4), **dict(on_vacation = False, at_work = True) )

As you can see, you've got plenty of options here. This being said, there are a couple of constraints. The order is important – the positional arguments need to come first, then the **tuple** of arbitrary size, then the dictionary. As soon as the interpreter hits the **tuple** argument, it's game over for positional arguments. As soon as it hits the dictionary, it's game over for both positional arguments and **tuples**.
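
The ordering rule can be sketched with one function that uses every kind of parameter. The function `order_demo` and its parameter names are invented for this illustration.

```python
def order_demo(first, second="default", *extras, **options):
    """Positional first, then the catch-all tuple, then the catch-all dict."""
    return first, second, extras, options

# Only the required positional argument
r1 = order_demo(1)

# Two positionals, two extras scooped into the tuple, one keyword into the dict
r2 = order_demo(1, 2, 3, 4, debug=True)

print(r1)
print(r2)
```

Writing the header in any other order, for example putting `**options` before `*extras`, is rejected by the interpreter with a `SyntaxError`.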

Returning Information From a Function
-------------------------------------

A function does not have to return anything. If it does return something, you don't have to give the returned value a name or otherwise use it. 

Personally, I think it's a good idea to return something – even if it's just the number 1 to show "Yup, I executed" or a -1 to show "Uh, there was a problem." But it's up to you – without instructions in this regard a function returns the **None** object.

Finally, in Python you can only return one object. This isn't necessarily an issue because the object can be as complex as it needs to be to fully "pass the baton" to the next routine. The **dict** is my "go to" object for complex returns – mostly because it can be as complex as it needs to be and, properly done, completely self-documenting.
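
A sketch of two ways to pack several results into the single returned object: a **tuple** that the caller unpacks, and a **dict** whose keys document the contents. The function names and keys are invented for illustration.

```python
def summarize(numbers):
    # "Returning two values" really returns one tuple
    return min(numbers), max(numbers)

def summarize_as_dict(numbers):
    # A dict return is self-documenting through its keys
    return {"low": min(numbers), "high": max(numbers), "count": len(numbers)}

low, high = summarize([4, 8, 15])       # unpack the tuple
report = summarize_as_dict([4, 8, 15])

print(low, high)
print(report["count"])
```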

First Class Objects
-------------------

Functions are known as "first class objects" because they can be used just as classes and other top-level objects in Python. For instance, they can be passed as arguments to other functions without issue[44].  This is really powerful because you can separate responsibilities cleanly between separate, isolated bits of code. As a result, different teams can work on the individual functions and atomic tests can be written against the capabilities promised by each function. All this makes a "divide and conquer" approach to development a snap and – perhaps more importantly – can keep code easy to maintain as requirements change.

Here's how you might use functions' "first class" status in Python to take a foray into functional programming. For this example, imagine we have the "F team" in Florida and that it's never met the "G team" in Georgia.  Each has been working 24/7 on its task, has written tests, and otherwise has things dialed in. F has perfected the art of doubling a value and G has nailed adding 2 to a value.

As consumers of these efforts, we can "stand on the shoulders of giants" and repurpose / combine the efforts into our own project. 

Here's an example:

In [None]:
def compose(g, f):
    """Take two functions as inputs and return a
    function that's their composition"""
    
    def newfunc(x):
        return g(f(x))
    
    return newfunc
        
def G(n):
    # input function
    return n + 2
    
def F(n):
    # input function
    return n * 2

# compare:
H = compose(G, F) # build a 3rd function from 1 & 2

#print("G(F(x)):", H(100)) # G(F(x))

# ... now with
#H = compose(F, G)
#print("F(G(x)):", H(100)) # F(G(x))
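
The order of composition matters. A self-contained sketch that repeats the definitions above and runs both orderings (the names `g_after_f` and `f_after_g` are invented here):

```python
def compose(g, f):
    """Return a function that applies f first, then g."""
    def newfunc(x):
        return g(f(x))
    return newfunc

def G(n):
    return n + 2

def F(n):
    return n * 2

g_after_f = compose(G, F)  # G(F(x)): double, then add 2
f_after_g = compose(F, G)  # F(G(x)): add 2, then double

print("G(F(100)):", g_after_f(100))  # (100 * 2) + 2
print("F(G(100)):", f_after_g(100))  # (100 + 2) * 2
```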

## Inner Functions

It's possible for a function to contain other functions called "inner functions." In this case, not only is there an inner function, but that's what gets returned. The object that gets returned is "loaded for bear", retaining the information originally passed into **addLetter**.

In [None]:
def addLetter(letters):  # -- pass in a string
    """
    A function factory builds and returns function objects.
    L is a function that will add whatever letters are passed
    in to be the ending letters.
    """
    def L(s):
        return s + letters
    return L


# These are functions (versions of the inner
#    function L() returned from addLetter())
add_s = addLetter("s")
add_ed = addLetter("ed")

# Then we can execute these functions like any others
print(add_s('Unhinged rant'))
print(add_ed('In an unhinged fashion rant'))

Closures
--------

A closure is an inner function with a memory. Let's say that you have on good authority that the meaning of life is 42. You want to stash that pearl of insight away and be prepared to tweak it as life gets more interesting and the universe evolves.

You might want to create a Python function to which you can provide the original value. Then you might ask the main function to return another function that accepts your changes. 

Here's how you might do it:

In [None]:
def outer_space(outer_input):
    "outer-most function"
    last_answer=outer_input
    def inner_space(inner_input):
        "inner-most function"
        nonlocal last_answer
        last_answer += inner_input
        return last_answer
    return inner_space

MEANING_OF_LIFE=42
original = outer_space(MEANING_OF_LIFE)
print("The original meaning of life is {}".format(original(0)))

tweak=3
new_meaning = original(tweak)
print("But now we think it's a bit more: {}".format(new_meaning))

next_new_meaning = original(tweak)
print("... and now: {}".format(next_new_meaning))

As you can see, the inner function retained some "institutional memory", encapsulated it, and made itself available for further interaction.

The **nonlocal** keyword, introduced here, lets an inner function rebind a variable that belongs to the enclosing (outer) function, instead of creating a new local of its own. This mirrors the relationship that the **global** keyword gives a function with the containing module.
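
To see why the keyword is needed, here is a sketch with two hypothetical counter factories. Without **nonlocal**, the augmented assignment makes `count` a local of the inner function, and reading it before it has a value raises `UnboundLocalError`.

```python
def broken_counter():
    count = 0
    def bump():
        count += 1  # count is treated as local to bump(), so this fails
        return count
    return bump

def working_counter():
    count = 0
    def bump():
        nonlocal count  # rebind the variable owned by working_counter()
        count += 1
        return count
    return bump

try:
    broken_counter()()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"

bump = working_counter()
results = [bump(), bump(), bump()]

print(outcome)
print(results)
```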

Python's Take on Map, Reduce and Filter
---------------------------------------

Python has some limited built-in capability to handle map/reduce operations. It can easily perform the same sort of analysis possible with big data analytical tools like Hadoop, but just not at the same scale. (Hadoop has its own way to manage workload and file systems – that's where its real magic lies).

In a nutshell, a map operation is one that iterates through a bunch of objects and does the same thing to each one. A reduce operation sifts through the mapped objects and boils them down to the "good stuff," typically a single summary value. Generally speaking, the **map** function is run once then sifted through multiple times by different reducers.

To implement mapping in python, we can use the **map** function. Its syntax is straightforward:

mapped\_data = map (\<processing\_function\>, \<iterable\>)

To filter data, we can call upon the **filter** function. It works pretty much like **map**, but the processing function passes judgment on each element of the iterable object by evaluating it as **True** or **False**.  The result is another iterable of only those elements that have been deemed **True**. The syntax is identical to that of **map**:

filtered\_data = filter(\<processing\_function\>, \<iterable\>)

Here's an example of how you might use **map** and **filter** to find even sums of integers provided as **tuples**. These functions are interesting because they each take two arguments: the name of the routine to process the data and the incoming data. Each returns a generator-like object. This means they don't process all the data at once. Instead, they proceed one step at a time through their task, and only when requested. This sort of "lazy execution" makes handling of even large data sets possible because only the bits that are being processed need be in memory.

Python also has a **reduce** function (imported from **functools**) – it works pretty much like **map**, except the operations defined in the function are applied cumulatively, boiling the iterable down to a single value.  Note that in the example below it's called with a **lambda** expression[45] (an anonymous "one liner" function that can be used in place of a standard function).

In [None]:
from functools import reduce


def add_numbers(things_to_add):
    "adds two numbers"
    first, second = things_to_add
    return first + second


def find_evens(thing_to_evaluate):
    "returns True if even"
    return not thing_to_evaluate % 2  # True if 0, False otherwise


integers = [(2, 2), (4, 4), (5, 6), (7, 8)]

mapped_data = map(add_numbers, integers)
filtered_results = filter(find_evens, mapped_data)
print("The even sums are {}".format(list(filtered_results)))

cumulative_mult = reduce(lambda x, y: x*y, [1, 2, 3, 4, 5])
print("reduce returned: {}".format(cumulative_mult))
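
The "lazy execution" described above can be made visible with a function that records each call. In this sketch, `noisy_double` and the `calls` list are invented for the demonstration.

```python
calls = []

def noisy_double(n):
    calls.append(n)  # record that the function actually ran
    return n * 2

lazy = map(noisy_double, [1, 2, 3])
calls_before = list(calls)     # nothing has been processed yet

first = next(lazy)             # ask for one result
calls_after_one = list(calls)  # only one element has been processed

rest = list(lazy)              # drain the remaining results
leftover = list(lazy)          # a map object can only be consumed once

print(calls_before, first, calls_after_one, rest, leftover)
```

The last line is the practical consequence: if you need the results more than once, convert the **map** object to a **list** first.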

Function Dispatch
-----------------

Until Python 3.10 introduced the **match** statement, Python didn't have a statement like **case** or **switch**.   One day, you'll be able to switch to 3.10 for all your data science needs, but until the main distros (Anaconda, etc.) and the underlying data science building blocks are all updated you will likely be using an earlier version of Python.

Fortunately, there are other ways to get the same functionality.  We've already seen that a complex **if** .. **elif** .. **else** structure can accommodate this. However, this can get unwieldy and hard to maintain.

An alternative, and potentially more robust, approach involves mapping functions to a dictionary, which is then used to dispatch the function.  The following example shows how you might implement such a beast and, for good measure, introduces the **random.choice** method.

As you review the code, imagine how much easier it would be to maintain than strictly branching logic. Let's say your client suddenly asked you to add a new capability, **str**.**title**. How could you add it while minimally touching the existing code[46]?


In [None]:
#py_function_7.py
from random import choice
def f_upper():
    return str.upper
def f_lower():
    return str.lower
def f_swap():
    return str.swapcase

# Use a dict object to match names (like 'up') to one of your functions
function_mapper={'up': f_upper, 'lower': f_lower, 'swap': f_swap}
choices = list(function_mapper.keys())

my_string='Lions and Tigers and Bears, Oh MY!'
for i in range(3):
    mychoice=choice(choices)
    print(function_mapper[mychoice]()(my_string))

The **random**.**choice** method is a convenient way to make a (pseudo) random choice from a collection of options.

The **print** is a bit of a mouthful, so let's break it down.

Note that most of what we're doing is passing around the name of a function, like **f\_upper** or **str.upper** without actually executing it. This allows fairly succinct code because we can chain operations together in a single line. Is the virtue of being compact outweighed by the vice of being tough to decipher?


In [None]:
mychoice = 'up'  # Just to force a choice

print(f" mychoice: {mychoice}")
print()


# String version of the command then the  output.
print("function_mapper['up']")
print(function_mapper['up'])  
print()

print("function_mapper['up']()")
print(function_mapper['up']() )  # the () is the execution operator.  f_upper returns a function.
print()

print("function_mapper['up']() (my_string)")
print(function_mapper['up']() (my_string) )  # this feeds my_string to the function
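
For comparison, here is a leaner variant of the same idea: a sketch in which the dictionary maps names straight to the string methods, skipping the wrapper functions. The names `dispatch` and `transform` are invented here, and this is one possible arrangement rather than the notebook's own design.

```python
# Map names directly to the functions that do the work
dispatch = {"up": str.upper, "lower": str.lower, "swap": str.swapcase}

# Adding a capability is a one-line change to the table
dispatch["title"] = str.title

def transform(text, choice):
    """Look up the requested function and apply it to text."""
    try:
        func = dispatch[choice]
    except KeyError:
        raise ValueError(f"unknown choice: {choice!r}") from None
    return func(text)

print(transform("Lions and Tigers", "up"))
print(transform("Lions and Tigers", "title"))
```

The calling code never changes when the table grows, which is the maintenance advantage over a long **if** .. **elif** chain.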
 


## Exercise

Awesome job so far, folks! This has been a long session, but functions (and methods in classes) are one of the most important building blocks of any serious programming effort. Let's try out your new skills on a couple of problems.

Python's built-in title casing routine leaves much to be desired. Check this out:

In [None]:
lst = [ "shot in the dark",
        "guido van rossum",
        "monty python's life of brian"
      ]
for item in lst:
    print(item.title())

In the first instance, a really short word got capitalized. The second will drive much European nobility mad because "von" and "van" are generally lower case. And the first letter after an apostrophe is almost never capitalized.

Please see if you can write a function to address these issues – I'm sure you can do better than str.title!

While you're about it, see if you can break the task down into small, atomic bits. That'll make it easier to test and maintain. You might use a series of functions something like:

    process_list() 
        process_title() 
        process_word() 
            process_apostrophe()
            process_royalty()

Feel free to grab code from solution _python_1_chapter05_starter_code.py if you'd like. Many of the mechanics have been worked out, but it would benefit from reorganization.

Modules and Libraries
=====================

Python has several built-in object types that provide a versatile and "off the shelf" collection of tools for handling information you'll encounter in real-world situations. This chapter discusses and demonstrates a few of the libraries (modules) that you can **import** into your code. We'll look at just how you can access these libraries while managing how they work and play with the applications you are developing.

Importing Libraries
-------------------

As you've seen, the way to access libraries to extend Python's core capabilities is to **import** them into your program's namespace. Here are some examples:

In [None]:
import time
from decimal import Decimal, getcontext
from fractions import Fraction as Q # rational number
from datetime import datetime, timezone
from collections import namedtuple

Basic Use of import
-------------------

If you examine these commands carefully, you'll note several variants.  The first example adds the module **time** into your code. We can use the built-in **dir** function to capture the namespace. If used without arguments, it grabs the global namespace; with an argument, it reports on the namespace of the object presented in the argument.

Here, we show the results from executing the **dir** function in a fresh Python shell in a command line interpreter interface.   The results will be the same in a Jupyter notebook, except there will be some extra Jupyter-specific entries.

Initially we have a very parsimonious global namespace:

    dir()
    ['__builtins__', '__doc__', '__loader__', '__name__', '__package__']

When we use the **import** keyword to bring in an extra module, the name of that module is added to our namespace:

    import time
    dir()
    ['__builtins__', '__doc__', '__loader__', '__name__', '__package__', 'time']
    
When Python first opens, it has access to the global namespace and everything in the '__builtins__' module (that's automatically imported).    If we take a quick look at the contents of __builtins__, we'll see built-in functions and types at the bottom.  These include the variable types we've seen to date. At the top we'll see mostly exceptions (errors and warnings) along with a few constants like True, False, and None.

In [None]:
", ".join(dir (__builtins__))

The act of importing the new module doesn't directly bring all the elements of the **time** module's namespace into your module – you only have access to **time**. The **time** module, however, has its own namespace. 

So what is a "namespace", anyway?   You can think of it as a dict object.   The names are the keys and the values are the objects (including executable code) associated with the names.    If you invoke a function like **dir**, Python looks the name up in a dictionary, finds the associated code and executes it.    The act of importing a module is actually very lightweight - all you're really doing is updating the master dictionary used by your program with pointers to other resources.
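
The "namespace is a dict" idea can be checked directly. As a sketch: the built-in `vars` function returns a module's namespace dictionary, and looking a name up in it yields the very same object that dot notation reaches.

```python
import time

# vars(module) returns the dict that backs the module's namespace
module_namespace = vars(time)
print(type(module_namespace))

# A plain dictionary lookup finds the same object as time.sleep
looked_up = module_namespace["sleep"]
print(looked_up is time.sleep)
```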

You can take a look at the namespace of the **time** module by using the **dir** function with **time** as the argument.

In [None]:
", ".join(dir (time))

Dot Notation 
------------

While the module's namespace is isolated from your own, you can still access its elements. You can do this by using "dot notation."    Here, we access an attribute called 'timezone':

In [None]:
time.timezone

Here we'll access the function called 'time' within the time namespace.   We can do that because the first "time" refers to an object available to our global namespace; and the second "time" refers to an object within the module's namespace. 

Because time.time is a function, we see only the text representation of the object ... that is, until we execute it using the () operator.

In [None]:
time.time

In [None]:
time.time()

The "dot" delimits namespaces, which may be configured hierarchically, and allows your code to be specific in terms of exactly which object it addresses. In the code above, you'll see that within the **time** module there is a built in function of the same name. This creates no conflict for our own module because only the name of the newly-imported module is visible.

Because of this separation it is completely safe to import several different modules without being concerned about whether object names from one can "overwrite" those of another.
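
A concrete case of two modules sharing a name without conflict: the standard library's `math` and `cmath` modules each define `sqrt`. Dot notation keeps them apart, as this sketch shows.

```python
import math
import cmath

real_root = math.sqrt(9)       # works on real numbers
complex_root = cmath.sqrt(-9)  # works on complex numbers

print(real_root)
print(complex_root)

# Same name, two different objects, no collision
print(math.sqrt is cmath.sqrt)
```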

It's also possible to reach into an external module and import only the bits we want into our local namespace. Observe the effects of importing the **Decimal** class and the **getcontext** function. After the **import**, we can now access the external objects on a "first name" basis.

In [None]:
from decimal import Decimal, getcontext
Decimal

In [None]:
getcontext

Renaming Object When Importing
------------------------------

A slight variant of this is to choose a name for the imported object that suits your purpose. Python doesn't care what name you choose, as long as it follows the usual naming rules.

Here, we import the Fraction object from the fractions library. We import it again, giving it the name "Q". Essentially, we've created two entries in our main namespace dict for the same object. We can access it by either "Q" or "Fraction".

In [None]:
from fractions import Fraction
from fractions import Fraction as Q

In [None]:
same_object = id(Fraction) == id(Q)
if same_object:
    print(f"Fraction and Q are aliases to the object at {id(Q)}.")


It might make sense to import an object under a specific name to avoid a namespace collision, or to follow a convention adopted by your teammates. For instance, it's typical to use **numpy**, **pandas**, and **seaborn** by going:

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

This is done mostly for brevity. It's much easier to type "np" than "numpy" - especially if you have to do it thousands of times.

Avoid renaming imported objects unless you've got a good reason to do so because it could confuse your team. For instance, you can do one or both of these:

In [None]:
import string
string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [None]:
import string as elephant
elephant.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

… however, if you did the latter you could sow much confusion.

Also, while it's possible to import the entire namespace of a library into your code, you typically don't want to do so. In this example we import the entire **string** module's namespace using "star notation":

In [None]:
import string
from string import *

# Namespace of string
string_module_namespace = ", ".join(dir(string))
print(f"The namespace of the string library is: \n\n{string_module_namespace}\n")

# ID of string.digits versus digits
same_object = id(string.digits)==id(digits)
if same_object:
    print("The string.digits from the string module namespace is the same object as digits in the global one,")
    print(f"known under the hood as {id(digits)}")

In [None]:
", ".join(dir(string))

While this might be manageable, what if you were already using a variable called "digits"?

In [None]:
digits = ['thumb', 'pointer', 'middle', 'ring', 'pinky']
digits

In [None]:
from string import *
digits

That's trouble, right?    The message is that you want to be a bit cautious.  It's certainly easier to type in a single name like 'digits', but sometimes a more verbose name is actually better.  For instance, anyone reading the code will recognize 'string.digits' for what it is - it's got provenance, after all.   Transparency matters.

Sometimes it makes sense to use "star notation" when you're importing only one outside library and using its tools exclusively. This is often the case when you are working with GUIs because all the components you need are included in the imported module – there is almost no chance of namespace collisions, even when you're part of a collaborative effort.

File System Based Namespaces
----------------------------

Python has the ability to use the file system to create namespaces for modules and it works just like what we've seen for ordinary objects.  Building packages is beyond the scope of this course, but for now know that you can have a directory structure like:

    main_application
        __init__.py
        subdir_1
            __init__.py
            module.py
        subdir_2
            __init__.py
            module.py

If main\_application can be discovered by your app, then you can go:

    from main_application import subdir_1, subdir_2

… which gives you access to subdir\_1.module and subdir\_2.module (once each module has been imported, either explicitly or by its package's \_\_init\_\_.py). The two identically named modules are each tucked in behind their directory name and will not present namespace conflicts in your app.
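
If you'd like to see this in action without cluttering your disk, the sketch below builds a throwaway copy of the layout in a temporary directory and imports both modules. It assumes subdir\_1 and subdir\_2 sit inside main\_application; the WHO_AM_I variable and the aliases module_1 and module_2 are hypothetical names invented for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch) / "main_application"
    for subdir in ("subdir_1", "subdir_2"):
        package_dir = root / subdir
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").write_text("")
        # Each module.py defines the same name with a different value
        (package_dir / "module.py").write_text(f"WHO_AM_I = {subdir!r}\n")
    (root / "__init__.py").write_text("")

    sys.path.insert(0, scratch)      # make main_application discoverable
    try:
        importlib.invalidate_caches()
        from main_application.subdir_1 import module as module_1
        from main_application.subdir_2 import module as module_2
    finally:
        sys.path.remove(scratch)     # leave the search path as we found it

# Same file name, different namespaces, no conflict
print(module_1.WHO_AM_I)
print(module_2.WHO_AM_I)
print(module_1.__name__, module_2.__name__)
```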

On a related note, what makes main\_application discoverable by Python?  When you import a module, Python looks at the contents of **sys.path** – an ordered list of all the directories on its search path. It's comprised of normal, default locations and whatever might be in your PYTHONPATH environment variable. 

The first element, and the first place the interpreter looks, is your current working directory (or, when you run a script, the directory containing that script). What this means – and this has caught many an unwary programmer – is that if you give your module the same name as a standard library module or an installed package, then when you try to import that module you will get your own instead. Then you will go insane trying to figure out why your imported module doesn't work as expected.

As a result, you'll want to be a little careful about choosing names. If you're considering 'string.py', for instance, you might first want to attempt to **import** **string**. If the operation succeeds, the name 'string' is already taken; if you get an **ImportError**, then you're safe.

Bear in mind that installation of new packages or editing the PYTHONPATH can change things.
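
One way to run that check without actually importing anything is sketched below. It leans on **importlib.util.find_spec** from the standard library, which returns None when a top-level module can't be found. The helper name_is_taken and the second module name are hypothetical, made up for this example.

```python
import importlib.util


def name_is_taken(module_name):
    """Return True if Python can currently find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


print(name_is_taken("string"))                        # part of the standard library
print(name_is_taken("surely_nobody_uses_this_name"))  # hypothetical, presumably free
```

The answer reflects your search path at the moment you ask, so it's worth re-checking after installing new packages.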

As you'll see, the libraries available in Python's standard library and general ecosystem provide highly-leveraged ways to extend the already-powerful capabilities. 

Time-related Objects: time, datetime, and calendar
--------------------------------------------------

Since we're on the topic of using imported libraries, let's take a look at Python's three principal libraries dealing with time. Each has its own strengths, weaknesses and capabilities. We'll go through some of the capabilities of each here.

Before we jump in, there are some things to be aware of:

-   Computers "think" of time in terms of the elapsed seconds since an
    agreed-upon point in time called an "epoch." For most POSIX
    (Linux-like) systems, the epoch began on January 1, 1970. For
    Windows, it's January 1, 1601[47]. Typically, this won't matter
    because Python is OS-agnostic.


-   Since different parts of the world are in different time zones, we
    often think of what time it is in Greenwich (London), England. This
    is known by various names such as GMT, UTC and Zulu.


-   However, your computer (or AWS slice, or server) may "think" in
    local time. Local time is a little weird. Time zones can shift
    because of local decisions. Daylight savings time can vary
    county-by-county, and state-by-state. "Summer time" in Europe and
    elsewhere doesn't necessarily synch with U.S. daylight savings time.
    UTC is reliable.


-   There are two "flavors" of elapsed time available. One is told by
    the clock on your wall – it's an objective measure of the time you
    experience. The other is the elapsed time the CPU works on your
    program. You'll want to be sure of which you're using when it's
    material (such as when your application is running on a busy
    machine, where the clock time isn't a great measure of how
    efficient your program is).

Time Objects
------------

Here are a few lines of code that use some of the basic functionality of **time**. This code grabs both the clock time and CPU time and demonstrates use of the **sleep** function. You'll note that the two are reported to different levels of precision and advance by different amounts.

**time**.**process_time** reports the CPU time this process has consumed (time spent sleeping doesn't count), while **time**.**time** reports the number of seconds that have elapsed since the epoch, the computer's "dawn of time."

In [None]:
import time
print(time.process_time(), time.time())
time.sleep(1) #seconds
print(time.process_time(), time.time())
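
If what you're after is the elapsed time for a stretch of code, a common pattern is to take a reading before and after and subtract. The sketch below does that for both flavors of time. It assumes **time.perf_counter**, a standard library wall-clock timer not otherwise used in this section; the quarter-second sleep is an arbitrary stand-in for real work.

```python
import time

wall_start = time.perf_counter()   # high-resolution wall-clock timer
cpu_start = time.process_time()    # CPU time charged to this process

time.sleep(0.25)                   # idle: the wall clock advances, the CPU barely works

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall clock: {wall_elapsed:.3f} s, CPU: {cpu_elapsed:.3f} s")
```

The wall-clock figure should land near a quarter of a second, while the CPU figure should be close to zero because a sleeping process does almost no work.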

Datetime Objects
----------------

For most purposes, you'll be working with **datetime** objects provided from the library of the same name. A **datetime** object is a fairly easy-to-read tuple-like structure, and it's straightforward to extract information.  

The datetime object is everywhere in computing, but you should be aware that the implementation can vary slightly, even among Python modules.

Here's an example of how you can use datetime:

In [None]:
from datetime import datetime
d = datetime.now()
print(d)
print("the hour is: {}.".format(d.hour))
print("the year is: {}.".format(d.year))

In [None]:
from datetime import datetime, timezone
now_here = datetime.now()
print("Now, this timezone: ", now_here)

In [None]:
# A datetime can be timezone-aware when you give it a tzinfo
now_uk = datetime.now(timezone.utc)
print("Now, in UTC: ", now_uk)
the_date = now_uk

In [None]:
# Dates can be used in a "mathy" sense
print("Days since 01-01-0001: ", the_date.toordinal())
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

In [None]:
print("01/01/1970 timestamp : ", epoch.timestamp())
print("Now in English timestamp: ", now_uk.timestamp())

In [None]:
delta = now_uk - epoch
print("Time delta in seconds: ", delta.total_seconds())

For all its charms, **datetime** produces fairly ugly output, left to its own devices. Fortunately, there's an easy way to customize it using the "string from time" functionality, known as **strftime**[49]. The basic syntax is:

\<datetime object\>.strftime(\<format string\>)

A simple example follows.   Note that we're able to combine Python formatting with strftime formatting:

In [None]:
from datetime import datetime

date_format = "%d, %b %Y"
now = datetime.now()
print(f"Hello! Today is {now.strftime(date_format)}.")

If you want, you can include other characters in the format string. In the next example, we provide punctuation (":" and "-" characters) to make something like we'd see in a log file (easy to sort chronologically).

In [None]:
from datetime import datetime
now = datetime.now()
exact_format = "%Y-%m-%d %H:%M"
print(f"Or, more precisely, {now.strftime(exact_format)}.")

There are many formatting strings available. You can download a "cheat sheet"[50] in the likely event you don't want to memorize them.   Or you can take a quick look at strftime.org.
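
The same format strings also work in the other direction. The sketch below assumes **datetime.strptime** ("string parse time"), the standard library counterpart to strftime, and uses a fixed, arbitrary moment so the output is repeatable.

```python
from datetime import datetime

log_format = "%Y-%m-%d %H:%M"
moment = datetime(2025, 4, 1, 14, 30)    # a fixed, arbitrary moment

as_text = moment.strftime(log_format)    # datetime -> str
print(as_text)                           # 2025-04-01 14:30

round_trip = datetime.strptime(as_text, log_format)  # str -> datetime
print(round_trip == moment)              # True
```

The round trip only comes back equal because the format keeps everything the original moment contained; seconds, had there been any, would have been lost.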

Working with Calendar
---------------------

Python also has a time-related module, **calendar**, that knows how to print nicely-formatted calendars, figure out the day of week, keep track of leap days, etc.[51]

This is sort of a "toy module" because serious packages have more heavy duty ways to manage calendar functions. If you want something quick and dirty with minimal overhead, you may be able to whistle up something suitable with a couple of commands.


Here's a brief flyover of some of its capabilities[52].

In [None]:
import calendar
# Create a TextCalendar instance and use it to print a month
cal = calendar.TextCalendar()
print("We just produced a {}.\n".format(type(cal)))
cal.prmonth(2025, 4)  # April, 2025

In [None]:
#what day of the week was I born?
birthday_year = 1957
birthday_month = 5
birthday_day = 10
birthday_day_of_week = calendar.weekday(birthday_year, birthday_month, birthday_day)
birthday_dict = {0:'Mon', 1:'Tue', 2:'Wed', 3:'Thur', 4:'Fri', 5:'Sat', 6:'Sun'}
print("I was born on a {}.".format(birthday_dict[birthday_day_of_week]))
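
A few more of the module's conveniences are sketched below, including the leap-day bookkeeping mentioned earlier. The dates are arbitrary examples of ours. Note that **calendar.day_name** returns names in your current locale, so the text you see may differ.

```python
import calendar

# Leap-year bookkeeping
print(calendar.isleap(2024))          # True
print(calendar.leapdays(2000, 2025))  # 7 leap years from 2000 through 2024

# weekday() returns 0 for Monday through 6 for Sunday;
# day_name turns that number into a (locale-dependent) name
moon_landing = calendar.weekday(1969, 7, 20)
print(moon_landing, calendar.day_name[moon_landing])

# monthrange() gives the weekday of the 1st and the number of days in the month
first_weekday, days_in_month = calendar.monthrange(2025, 4)
print(first_weekday, days_in_month)
```

Using calendar.day_name saves you from maintaining your own lookup dictionary of weekday names.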

Introduction to Python's "Middleware" Libraries
-----------------------------------------------

In the spirit of continuing a discussion of adding capability beyond Python's core repertoire, we might consider two of Python's built-in modules, **sys** and **os**. We'll discuss their basic contributions here. I would encourage you to explore these on your own using **dir** and **help** as your investigation tools.

-   The **sys** module contains information about the particular Python
    installation you're working with and how it's installed on your
    local OS. This is where you'll find **sys.path** (the **list** of
    directories the interpreter uses to find imported modules).
    

-   The **os** module contains a large repertoire of "middleware" that
    operates between Python and your local OS. It keeps track of things like
    the correct path separator to use.

Here are a couple examples from my systems. This is what I get from a Windows box:


And this is the same information from one of my Linux virtuals:

The tools in **os** are really important if you want to maximize the portability of your code from platform-to-platform. The last thing you want to do is put in conditional statements to do things like construct file paths[53]. Or – worse yet – having your code work only on the "flavor" of development platform you're using.
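
As a rough sketch of what that portability looks like in practice, the snippet below asks **sys** and **os** about the current platform and then builds a file path without any conditionals. The directory and file names are hypothetical, and the values printed will depend on the machine you run it on.

```python
import os
import sys

print(sys.platform)   # e.g. 'win32' or 'linux', depending on where this runs
print(os.name)        # e.g. 'nt' or 'posix'
print(repr(os.sep))   # the path separator for this OS

# os.path.join inserts the correct separator, so no conditionals are needed
report_path = os.path.join("data", "reports", "summary.txt")  # hypothetical names
print(report_path)
```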

## Exercise

Starting with a text representation of your birthday (e.g., "May 5, 1970"), please create a routine that produces a report describing how long you've been alive (to the nearest day) and what day of the week you were born.