# Introduction

These notes are based on Chapter 1 of *Dive Into Python*.  Have it open and at the ready if you haven't already read it.  As the name suggests, we're going to dive right in with some code, unpack it, and see how it works. In the process, we're going to hit a number of key concepts.


## Concepts
*   Functions & parameters & return values
*   Documentation
*   Indentation
*   Control flow
*   Variables
*   Loops
*   Exceptions

## Mechanics and Practicals
* Using the Jupyter Notebooks in Google Colab to run existing code
* Using the Scratch cell
* _Don't worry_ - you're not editing my version of this file

Since you likely got this notebook from a Google Share link, you'll notice it says up top that it can't save any changes you make here.  That's fine for now.  You're not actually working on my copy of this, but rather a temporary copy unique to you.  But, for good practice, click on the **Copy to Drive** text just below the menu bar.  This will save it to your Google Drive so you can come back to this at any point.

Now, below is the `humansize.py` file from *Dive Into Python*.  Hit the "Play" button in the upper left corner of it (aka _Run cell_), scroll down and look for the output.


```
'''Convert file sizes to human-readable form.

Available functions:
approximate_size(size, a_kilobyte_is_1024_bytes)
    takes a file size and returns a human-readable string

Examples:
>>> approximate_size(1024)
'1.0 KiB'
>>> approximate_size(1000, False)
'1.0 KB'

'''

SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string

    '''
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    raise ValueError('number too large')

print(approximate_size(1000000000000, False))
print(approximate_size(1000000000000))

# Copyright (c) 2009, Mark Pilgrim, All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
#   this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
```

```
1.0 TB
931.3 GiB
```


What happened when you hit play (aka "run cell" aka Shift-Enter)?  It should have given an output that said:
```
1.0 TB
931.3 GiB
```
But, what happened internally?  Some host machine (e.g., Google Colab) has a version of Python running and when you ran the cell, it "defined a function" called *approximate_size()*, and then called that function twice.  It called that function once "passing in" two values: `1000000000000` and `False` and a second time passing in just the `1000000000000`.

In memory now, on that host, is this function.  So, once you've run this cell, the function *approximate_size* exists as a command you can use.  Right now, open up a "scratch cell" by doing *Insert, Scratch code cell*.  You'll get a window to the right.  In it, type something like `approximate_size(123858)` and hit the run button.  **Try this for a few values**.

**Now, on a blank line** there, type _app_ and pause for a second. See how it knows about _approximate_size_? Click on the little circled i and watch it show you some helpful bits about the function.  Now (and you may need to restart typing the function name), type _app_**TAB** and watch it auto-complete the name of the function for you.

_(Note: if you need to get it to show you the completions again, hit Ctrl-Space.)_

# Do the Ping! problem set now and submit it

# Unpacking this code

Let's trace through this code a bit:

**def line - defining a function...**
```
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
```

*   function name - can be whatever you want
*   takes parameters
    *   Mandatory
    *   Optional w/default value
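
A minimal sketch of mandatory versus optional parameters. The function `greet` and its parameter names are made up for illustration; they are not part of `humansize.py`.

```python
# Hypothetical function, used only to illustrate parameters.
def greet(name, punctuation='!'):
    '''Return a greeting for name, ending with punctuation.'''
    return 'Hello, ' + name + punctuation

print(greet('Ada'))                      # only the mandatory argument
print(greet('Ada', '?'))                 # optional argument given by position
print(greet('Ada', punctuation='...'))   # optional argument given by name
```

Leaving out `name` entirely would be an error, since it has no default value to fall back on.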

**Comment / description - just what does this do**?
```
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string

    '''
```

*   Note the triple-quote. If you want text (called a string) to span multiple lines, you use the triple-quote
*   A string you put right after the def line, as the first thing in the function, becomes the documentation for that function - called the docstring.
*   **Right now - in your Scratch window type** `help(approximate_size)`
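
Here is a small sketch of where the docstring ends up. The function `double` is a made-up example, not something from the book.

```python
def double(number):
    '''Return number multiplied by two.

    This is a hypothetical function used only to show a docstring.
    '''
    return number * 2

# Python stores the docstring on the function itself.
print(double.__doc__)

# help(double) would show the same text, formatted by Python's help system.
```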

**Indentation**

*   Notice how everything inside the "def" bit is indented and that `print(approximate_size...` bit is back on the left margin?
*   Indentation is huge in Python - it says what belongs with what and under what. So, all that is part of the function
*   Python knows that the `def approximate_size(...` through the `raise ValueError...` all are part of one block and that the `print` starts something else because of the indentation (other languages use "end" statements, squiggly brackets, etc).
*   Note the colon here though at the end of the `def` line - that's basically saying an indent is coming - it's another block of code we're going into. It'll make a bit more sense later
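
The same idea in miniature, as a sketch. The function `shout` is invented for this example.

```python
def shout(word):
    # These two indented lines belong to the function.
    loud = word.upper()
    return loud + '!'

# Back at the left margin: this line is not part of shout().
print(shout('python'))
```

If you indent the `print` line, it becomes part of the function body. It would then sit after the `return`, so it would never run.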

**if-statement**
```
    if size < 0:
        raise ValueError('number must be non-negative')
```

*   Control flow - check if something is the case. If so, do X. If not, do Y.
*   if - elif - else
*   What is _size_? Where did it come from?
    *   Concept of a variable
    *   Note size is only known inside this function. You could have called it _foo_ or _s_ or _ILikePie_. Don't do that though.
*   Note the indent
*   Note the colon
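
A short sketch of the full if - elif - else shape. The function `describe` and its return strings are assumptions made up for illustration.

```python
def describe(size):
    # size is a parameter: a variable that exists only inside describe().
    if size < 0:
        return 'negative'
    elif size == 0:
        return 'empty'
    else:
        return 'positive'

print(describe(-3))
print(describe(0))
print(describe(42))
```

Python checks the conditions from top to bottom and runs only the first block whose condition is true.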

**conditional assignment**
```
multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
```
This is a fun bit in Python.  You can do fancy things like this, but you needn't get this fancy.  We could do a simpler version like this:

```
multiple = 1000
if a_kilobyte_is_1024_bytes:
    multiple = 1024
```

Keep in mind, there are many ways to solve any problem - neither is at all wrong. But in the version in the code, _multiple_ gets set based on the outcome of that `if`.
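
To see that the two versions agree, here is a sketch that wraps each one in its own function. The names `pick_multiple_short` and `pick_multiple_long` are hypothetical.

```python
def pick_multiple_short(a_kilobyte_is_1024_bytes=True):
    # The one-line conditional assignment from the book's code.
    return 1024 if a_kilobyte_is_1024_bytes else 1000

def pick_multiple_long(a_kilobyte_is_1024_bytes=True):
    # The simpler version: start with 1000, change it if needed.
    multiple = 1000
    if a_kilobyte_is_1024_bytes:
        multiple = 1024
    return multiple

for flag in (True, False):
    print(flag, pick_multiple_short(flag), pick_multiple_long(flag))
```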

**for-loop**
```
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)
```
We're going to iterate some number of times and each time, _suffix_ will get a different value.  What values? Whatever is in _SUFFIXES_.

Well, whatever is in one part of _SUFFIXES_. _SUFFIXES_ is a dictionary, and `SUFFIXES[multiple]` looks up the list stored under the key 1000 or 1024.

So, let's say we have _multiple_=1000. The first time through the loop, _suffix_ is 'KB'.  The next time it will be 'MB', etc.
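
You can watch this happen with just the lookup and the loop, nothing else. This sketch copies `SUFFIXES` from the code above and only prints each value.

```python
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

multiple = 1000
print(SUFFIXES[multiple])        # the whole list stored under 1000

for suffix in SUFFIXES[multiple]:
    print(suffix)                # 'KB' first, then 'MB', and so on
```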

Let's say _size_ = 1,000,000 (1 million).  



*   The first time in the loop we hit that `/=` sign (aka `size = size / multiple`), so _size_ is now 1,000
*   if-statement -- is `1000 < 1000`? Nope
*   Loop back to the start
*   _suffix_ is now 'MB'
*   _size_ goes to be 1
*   is `1 < 1000`?  Yup
*   returns the result of this goofy thing, the string `'1.0 MB'`

What does return mean? It jumps out of the function we're in. So we bail on the loop and anything after that and we return a value - in this case that string.
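
If you'd like to watch that walk-through run, here is a sketch with `print` calls added to the loop. `traced_size` is a hypothetical helper written for this note, not a function from the book, and it always uses the 1000-based suffixes.

```python
def traced_size(size, multiple=1000):
    # Same loop as approximate_size, with print() calls added for tracing.
    suffixes = ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']
    for suffix in suffixes:
        size /= multiple
        print('suffix =', suffix, '| size =', size)
        if size < multiple:
            # return leaves the loop and the function immediately.
            return '{0:.1f} {1}'.format(size, suffix)
    raise ValueError('number too large')

result = traced_size(1000000)
print('returned:', result)
```

You should see two trace lines (one for 'KB', one for 'MB') before the returned value, which matches the steps listed above.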

Now, we've got a lot of odd stuff in that return line - it's a string formatter - we'll get to it later, but:



*   `0` and `1` are placeholders - they stand for the 0th and 1st things passed to `format` - size and suffix
*   The `.1f` thing - means floating-point number with one decimal place.  For example, try running this in a scratch cell: `'{0:.1f} {1}'.format(1.2341, 'duckies')`
*   But for the moment, forget all this fancy formatting ... we'll come back to it later.  Just know there's a lot you can do.
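
A few variations to try in a scratch cell, as a sketch. The strings and variable names are made up for illustration.

```python
one_decimal = '{0:.1f} {1}'.format(1.2341, 'duckies')
three_decimals = '{0:.3f} {1}'.format(1.2341, 'duckies')
swapped = '{1} comes before {0}'.format('zero', 'one')

print(one_decimal)      # 1.2 duckies
print(three_decimals)   # 1.234 duckies
print(swapped)          # one comes before zero
```

The number before the colon picks which argument to use, and the part after the colon says how to display it.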

**raise statement**
```
    raise ValueError('number too large')
```
What if things go wrong?  Well, here in the code, if we've gotten past that loop and haven't returned some nice output, things have gone badly. It's good practice to have your code let other code know that some kind of problem exists.  Here, we're going to send out a big alert (called an exception).  There are multiple kinds of exceptions - here it's a problem with a value - _ValueError_.  The idea is that the code that calls this function may be able to handle the error gracefully - "I'm sorry Dave, I'm afraid I can't do that".

We'll get to try ... except bits later.  But the idea is it's just a way to let anyone else know - any other bit of code know - that the sticky brown stuff hit the rotating blades.
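
As a small preview, here is a sketch of code that raises a _ValueError_ and code that catches it. `check_size` is a hypothetical helper that mirrors only the first check in `approximate_size`.

```python
def check_size(size):
    # Raise the same kind of exception as approximate_size does.
    if size < 0:
        raise ValueError('number must be non-negative')
    return size

try:
    check_size(-5)
except ValueError as error:
    # The caller decides what to do about the problem.
    message = str(error)
    print('Caught a problem:', message)
```

Without the `try ... except`, the exception would stop the program and print a traceback instead.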


## Change from the book
The book has an if-statement here and puts the print statements inside that if-statement:
```
if __name__ == '__main__':
    print(approximate_size(1000000000000, False))
    print(approximate_size(1000000000000))
```

**_TL;DR: This is your "when I run this file, do this" section_**

Every time I've taught this course, this trips people up, and so I changed his code.  He did this so that the `print` lines run only when the file is run directly and not when another file imports it to use `approximate_size()`.  That matters a lot less to us, and less still when using things like Jupyter notebook environments.
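
If you're curious, this sketch shows the pattern on its own. The function `main` is a made-up name for illustration.

```python
def main():
    # Hypothetical entry point, for illustration only.
    return 'running as the main program'

# Python sets __name__ for you; print it to see what it is here.
print('__name__ is', __name__)

if __name__ == '__main__':
    # Runs when this code is executed directly, not when it is imported.
    print(main())
```

In a notebook cell or a file you run directly, `__name__` is `'__main__'`, so the guarded line runs. When the same file is imported from somewhere else, `__name__` is the module's name and the guarded line is skipped.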


## One more thing

Clear your current Python machine by going to *Runtime, Restart runtime* (or close this window and open it again).  Now, over in your Scratch cell, try running `approximate_size(123456)` or something like that.  What happens?

It failed, right?  It gave you something like
```
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-4a112952de59> in <module>()
----> 1 approximate_size(1231)

NameError: name 'approximate_size' is not defined
```

Try the _app_**TAB** trick from before.  That fails too, right? _Do you know why?_


Python is all about adding your own bits to the language - we created this function `approximate_size()`.  But, at this point, Python doesn't know about it - it's a clean, virgin copy of Python we're running.  We need to tell it - we need to let it read these statements like the `def` bits to learn this - to add it in - to create that function.  We've not done anything to execute or load our code.  Python has never read over those lines and executed them, so that new function hasn't been learned by this clean copy of Python.
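
You can reproduce the same effect in one cell, as a sketch. `square_it` is a hypothetical function that gets called once before its `def` has run and once after.

```python
try:
    square_it(4)                 # the def below has not run yet
except NameError as error:
    first_attempt = str(error)
    print('Before the def:', first_attempt)

def square_it(number):
    return number * number

print('After the def:', square_it(4))
```

The first call fails for the same reason your scratch cell did: Python had not yet executed the `def` lines.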

Now, fix this by hitting the _Run cell_ icon on the code and verify that your scratch cell works again.