[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scott2b/PythonReview/blob/main/notebooks/Python.02.FunctionsModulesAndStdlib.ipynb)

# Introduction to functions, modules, and the standard library in Python

* Functions, modules
  - Builtin functions (print, len, https://docs.python.org/3.7/library/functions.html)
  - Function def, block constructs and whitespace in Python
  - Parameters (aka arguments)
  - Variable args and keyword args (kwargs)
  - `import` and the python stdlib

## Python's builtin functions

A full list of the builtin functions is available here:
https://docs.python.org/3.7/library/functions.html

### First and foremost

`print` is used to print output. Print can:

 * print multiple things at a time
 * take a `sep` separator parameter
 * take an `end` end parameter

In [None]:
print('one')
print('one', 'two')
print('one', 'two', 'three', sep=',')
print('red', 'orange', 'yellow', end=';')
print('green', 'blue', 'violet')

one
one two
one,two,three
red orange yellow;green blue violet


### Some things you can do with numbers

#### Absolute value

In [None]:
print(abs(10))
print(abs(-10))

10
10


#### Convert between types

In [None]:
print(float(23))
print(int(23.0))

23.0
23


#### Min and Max

In [None]:
print(min([10,20,30,40]))
print(max(50,60,70,80))

10
80


#### Some math related functions


In [None]:
print(round(2.4))

# round sends ties to the nearest even integer, so 2.5 becomes 2, not 3. See the docs for details
print(round(2.5))
print(sum([2,3,4,5]))

2
2
14
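
The surprising result for `round(2.5)` comes from Python 3 rounding ties to the nearest even integer. The sketch below illustrates this on a few tie values. The `round_half_up` helper is a hypothetical name introduced here for illustration, not part of the standard library, and it is only a rough approach that can misbehave on floating point edge cases.

```python
import math

# Ties (values ending in .5) go to the nearest even integer in Python 3
ties = [round(0.5), round(1.5), round(2.5), round(3.5)]
print(ties)  # [0, 2, 2, 4]


def round_half_up(x):
    """Hypothetical helper: round ties upward instead of to even."""
    return math.floor(x + 0.5)


print(round_half_up(2.5))  # 3
print(round_half_up(2.4))  # 2
```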


### Handy logic functions

#### Check the "truthiness" of something

In [None]:
print(1, bool(1))
print(0, bool(0))
print(1.0, bool(1.0))
print(0.0, bool(0.0))
print(23, bool(23))
print('aprd', bool('aprd'))
print('', bool(''))
print(None, bool(None))

1 True
0 False
1.0 True
0.0 False
23 True
aprd True
 False
None False


#### all and any

In [None]:
print(all([True, True, True]))
print(all([True, False, True]))
print(any([True, False, True]))
print(any([False, False, False]))

# values are treated as booleans
print(all([1, 'foo', 13]))
print(any([None, 0, 0.0, '']))

True
False
True
False
True
False


### Functions for data structures

#### Convert between different serial structures

In [None]:
print(set([1,2,1,1,2,2,3,2,3,1,3]))
print(list(set([1,1,2,2,3,3])))
print(tuple([1,2,3]))

{1, 2, 3}
[1, 2, 3]
(1, 2, 3)


#### enumerate a series


In [None]:
for i, val in enumerate(['red', 'orange', 'yellow']):
    print(i, val)

0 red
1 orange
2 yellow


#### Filter some values

In [None]:
def positive(n):
    return n > 0  # zero is not positive

pos = filter(positive, [1, -3, 5, 7, -12, 14, -27])
# filter returns an iterator rather than a list,
# but you can convert it:
pos = list(pos)
pos

[1, 5, 7, 14]

#### Map a function onto a list of items

In [None]:
def add_one(n):
    return n + 1

# map also returns an iterator
list(map(add_one, [1,2,3,4]))

[2, 3, 4, 5]

#### Zip together two lists of items

In [None]:
list(zip(['apples', 'oranges', 'bananas'], ['red', 'orange', 'yellow']))

[('apples', 'red'), ('oranges', 'orange'), ('bananas', 'yellow')]

## Defining your own functions

A function is a named unit of functionality which:

 * Accepts parameters (a.k.a. arguments)
 * Executes some code, using the provided parameters
 * Returns some value


Functions are defined with the `def` keyword, and are delimited by a block of consistent whitespace indentation. It is strongly advised that you use the Python community standard of 4 spaces for your indentations.

A function definition for adding two numbers might look like this:

```
def add(val1, val2):
    total = val1 + val2  # avoid naming this `sum`, which would shadow the builtin
    return total
```

Make note of the following:

 * The function definition line starts with `def` and ends with `:`
 * The body of the function is indented 4 spaces
 * The function returns a value. If there is no `return` for your function, an implicit `None` will be returned as the value.

Things to consider:

 * Name your functions clearly according to what they do
 * It is customary in Python to name functions using `snake_case`, not `camelCase` or `PascalCase`.
 * A function can have any number of named parameters. We will expand on this idea in a future lecture by introducing the additional concepts of variable arguments and keyword arguments, which allow you to design more flexible APIs for your functions.
 * Something is always returned from a function, even if it is nothing. If you do not explicitly return a value, Python will return `None` for you.
 * It is possible to design a function that appears to return multiple values by simply returning a tuple of values.
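
As a minimal sketch of the last two points, the example below shows a function with no `return` producing `None`, and a function that returns a tuple which the caller unpacks into two names. The function names `greet` and `min_and_max` are hypothetical and chosen only for illustration.

```python
def greet(name):
    # No return statement, so Python returns None implicitly
    print('Hello,', name)


def min_and_max(values):
    # Values separated by a comma are packed into a single tuple
    return min(values), max(values)


result = greet('Ada')
print(result)  # None

# Tuple unpacking makes it look as if two values were returned
lowest, highest = min_and_max([3, 1, 4, 1, 5])
print(lowest, highest)  # 1 5
```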

## Example functions


### Fibonacci sequence

The Fibonacci sequence is defined as:

```
F0 = 0, F1 = 1
Fn = Fn-1 + Fn-2, n > 1
```

Resulting in the sequence that starts:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

Design a function `fibonacci` that takes a value `n` and returns the first n numbers of the Fibonacci sequence.

In [None]:
def fibonacci(n):
    """Return the first n values of the Fibonacci sequence."""
    r = [0, 1]
    for i in range(2, n):
        r.append(r[i-2] + r[i-1])
    # slice so that n = 0 or n = 1 does not return both seed values
    return r[:n]

In [None]:
fibonacci(17)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]

### A specialized printing function

Given a dictionary of brands and ratings, where each brand name is the key to a list of ratings values, print a table of min and max ratings by brand.

In [None]:
def brand_report(data):
    print('Brand', 'min', 'max', sep='\t')
    print('-----', '---', '---', sep='\t')
    for brand, ratings in data.items():
        print(brand, min(ratings), max(ratings), sep='\t')


brand_data = {
    'Nike': [3.0, 2.5, 4.0, 1.0],
    'Adidas': [2.0, 3.5, 4.0, 1.5],
    'Reebok': [1.0, 3.0, 3.5, 2.0]
}
brand_report(brand_data)

Brand	min	max
-----	---	---
Nike	1.0	4.0
Adidas	1.5	4.0
Reebok	1.0	3.5


## Standard library modules

Beyond the builtin functions discussed above, the Python standard library has a number of modules available with additional functionality.

If you want to use code from a module, you will need to import it.

Some modules that will be useful for you include:

 * [datetime](https://docs.python.org/3.8/library/datetime.html) and [time](https://docs.python.org/3/library/time.html). Basic date and time types.

 * [statistics](https://docs.python.org/3/library/statistics.html). Mathematical statistics functions.

 * [collections](https://docs.python.org/3.8/library/collections.html). Container datatypes.

 * [pathlib](https://docs.python.org/3/library/pathlib.html). Object-oriented filesystem paths.

 * [json](https://docs.python.org/3/library/json.html). JSON encoder and decoder.

### datetime and time

The datetime module is a bit confusing because it contains a class named datetime as well. I recommend not doing this:

```
# not the best way
from datetime import datetime
```

Instead, import the module and reach the class through it, as here, so that it is always clear which `datetime` you are using (the module or the class):

In [None]:
# import the datetime module
import datetime

# call the `now` class method on the datetime class
datetime.datetime.now()

datetime.datetime(2020, 5, 26, 1, 57, 27, 115614)

In [None]:
# import the date class from the datetime module and call `today`
from datetime import date
date.today()

datetime.date(2020, 5, 26)

In [None]:
# clock some duration of time
import time
start_time = time.time()
time.sleep(3) # do nothing for 3 seconds
end_time = time.time()
print('duration:', end_time - start_time)

duration: 3.0032832622528076


### statistics

In [None]:
import random
random.seed(123)
data = random.choices(range(1000), k=100)
data # is 100 random numbers from 0 to 999

In [None]:
import statistics
print('mean', statistics.mean(data))
print('median', statistics.median(data))
print('median-low', statistics.median_low(data))
print('median-high', statistics.median_high(data))

# using a subset since, before Python 3.8, mode raises an error if there is not a unique mode
print('mode (of 1st 20)', statistics.mode(data[:20]))

print('stdev', statistics.stdev(data))
print('variance', statistics.variance(data))

mean 435.73
median 405.5
median-low 404
median-high 407
mode (of 1st 20) 87
stdev 287.37600242135176
variance 82584.96676767677


### collections

The collections module has a number of useful utilities for working with collections of things. A very handy thing is the Counter.

In [None]:
data = random.choices(['taco bell', 'wendys', 'burger king', 'mcdonalds'], k=10)
data # is a random list of fast food joints

['mcdonalds',
 'wendys',
 'wendys',
 'burger king',
 'mcdonalds',
 'mcdonalds',
 'wendys',
 'burger king',
 'burger king',
 'wendys']

In [None]:
from collections import Counter
counter = Counter(data)
counter

Counter({'burger king': 3, 'mcdonalds': 3, 'wendys': 4})

What if we want to count as we go through the data?

In [None]:
c2 = Counter()
for item in data:
    c2.update([item])
c2

Counter({'burger king': 3, 'mcdonalds': 3, 'wendys': 4})
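
A few other `Counter` behaviors are worth knowing. The sketch below uses a small hand-written list (the `visits` data is made up for illustration) to show `most_common`, the zero count returned for missing keys, and incrementing a count directly as an alternative to `update`.

```python
from collections import Counter

# Made-up sample data, fixed so the results are predictable
visits = ['wendys', 'mcdonalds', 'wendys', 'burger king', 'wendys', 'mcdonalds']

counts = Counter(visits)
print(counts.most_common(1))  # [('wendys', 3)]
print(counts['taco bell'])    # 0, missing keys count as zero

# Counting as we go, by incrementing the entry for each item
tally = Counter()
for place in visits:
    tally[place] += 1
print(tally == counts)  # True
```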

### pathlib

pathlib was added to the Python standard library in version 3.4. It updates the way we work with files and paths. **Note:** To do any file I/O in Colab, we will need to mount Google Drive first.

In [None]:
# mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# establish a path to a folder on Google Drive called APRD6342/Data
from pathlib import Path
datadir = Path('drive/My Drive/APRD6342/Data')

In [None]:
# pathlib's glob will list files using pattern matching
list(datadir.glob('*.csv'))

[PosixPath('drive/My Drive/APRD6342/Data/cleaned.alexadata.csv'),
 PosixPath('drive/My Drive/APRD6342/Data/salesforce.2018.csv')]

In [None]:
# pathlib uses the / operator to join path segments:
# (the `with` block closes the file when the loop is done)
with open(datadir / 'cleaned.alexadata.csv') as alexa_file:
    for i, line in enumerate(alexa_file):
        if not i % 1000:
            print(i, line.strip())

0 alexa resume spotify
1000 c. p. r. news
2000 alexa pause
3000 alexa play oh you como v. a.
4000 alexa
5000 alexa what's the helpful holiday travel tips story
6000 alexa
7000 alexa volume seven
8000 alexa pause
9000 play back pack
10000 alexa friends
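
Path objects can also be inspected without touching the filesystem, which is handy for experimenting when Google Drive is not mounted. This is a small sketch that reuses the folder and file names from the cells above; nothing here requires the files to actually exist.

```python
from pathlib import Path

base = Path('drive/My Drive/APRD6342/Data')
csv_path = base / 'cleaned.alexadata.csv'  # join segments with /

print(csv_path.name)            # cleaned.alexadata.csv
print(csv_path.suffix)          # .csv
print(csv_path.stem)            # cleaned.alexadata
print(csv_path.parent == base)  # True

# exists() does check the filesystem; the result depends on your environment
print(csv_path.exists())
```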


### json

JSON is a web standard data format that comes from JavaScript. If you are into state diagrams of language syntax, you can peruse the official JSON documentation [here](https://www.json.org). For our purposes, it is enough to know that JSON objects map closely onto Python dictionaries, and JSON arrays onto Python lists.

**However**, when using JSON as an interchange format, the data comes into and goes out of Python code as a string. We might refer to this string data as a JSON "object", although technically it is just a string that happens to be in JSON format. (There is not really such a thing as a JSON **object** in Python.)

What this means is that we need some kind of codec to encode and decode JSON data. Specifically, we need to **parse** or **decode** the JSON string to produce a Python dictionary, and conversely we need to **encode** Python data structures into JSON strings in order to save them as JSON. The `json` module handles this work for you.
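
The string-versus-dictionary distinction can be sketched with `json.loads` and `json.dumps`, which work on strings rather than files. The JSON text below is a made-up record used only for illustration.

```python
import json

# A JSON document is just a string as far as Python is concerned
text = '{"name": "Twitter API", "followers": 10, "verified": true, "url": null}'
print(type(text))  # <class 'str'>

# Decoding produces Python data: true becomes True, null becomes None
record = json.loads(text)
print(type(record))                       # <class 'dict'>
print(record['verified'], record['url'])  # True None

# Encoding turns the dictionary back into a string
encoded = json.dumps(record)
print(type(encoded))                  # <class 'str'>
print(json.loads(encoded) == record)  # True
```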

**Decoding JSON**

The _twitter_apiresponse_example.json_ file was created from the example response in the Twitter API docs here: https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline

In [None]:
import json

with open(datadir / 'twitter_apiresponse_example.json') as f:
    # f is now a file handle to a text file containing JSON-formatted text
    data = json.load(f) # use `load` to load straight from a file
                        # use `loads` if you happen to have the string already
    # The API response data is a list of Tweet objects. Let's look at the 1st one
    print(data[0].keys())

dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'retweeted_status', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])


In [None]:
data[0]['user']

{'contributors_enabled': False,
 'created_at': 'Wed May 23 06:01:13 +0000 2007',
 'default_profile': False,
 'default_profile_image': False,
 'description': "The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and our API. Don't get an answer? It's on my website.",
 'entities': {'description': {'urls': []},
  'url': {'urls': [{'display_url': 'dev.twitter.com',
     'expanded_url': 'https://dev.twitter.com',
     'indices': [0, 22],
     'url': 'http://t.co/78pYTvWfJd'}]}},
 'favourites_count': 26,
 'follow_request_sent': False,
 'followers_count': 6172353,
 'following': True,
 'friends_count': 46,
 'geo_enabled': True,
 'has_extended_profile': False,
 'id': 6253282,
 'id_str': '6253282',
 'is_translation_enabled': False,
 'is_translator': False,
 'lang': 'en',
 'listed_count': 13091,
 'location': 'San Francisco, CA',
 'name': 'Twitter API',
 'notifications': False,
 'profile_background_color': 'C0DEED',
 'profile_background_image_u

**Encoding JSON**

In [None]:
media_urls = {
    'Facebook': 'https://www.facebook.com/',
    'Twitter': 'https://twitter.com/home',
    'Instagram': 'https://www.instagram.com/',
    'TikTok': 'https://www.tiktok.com/'
}

media_urls_as_json = json.dumps(media_urls)
media_urls_as_json # Note: from Python's perspective this is a string!

'{"Facebook": "https://www.facebook.com/", "Twitter": "https://twitter.com/home", "Instagram": "https://www.instagram.com/", "TikTok": "https://www.tiktok.com/"}'
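
To write JSON straight to a file, `json.dump` (without the `s`) takes a file handle as its second argument. The sketch below writes to a temporary directory so that it runs anywhere; the `media_urls.json` file name is an assumption made for this example, and in Colab you would more likely write to a path under `datadir`.

```python
import json
import tempfile
from pathlib import Path

media_urls = {
    'Facebook': 'https://www.facebook.com/',
    'Twitter': 'https://twitter.com/home',
}

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / 'media_urls.json'  # hypothetical file name

    # 'w' opens the file for writing; indent makes the output readable
    with open(out_path, 'w') as f:
        json.dump(media_urls, f, indent=4)

    # Read it back to confirm the round trip
    with open(out_path) as f:
        restored = json.load(f)

print(restored == media_urls)  # True
```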

## Exercise

Look again at the _Pathlib_ section above, in the code example that iterates through the _cleaned.alexadata.csv_ file. Execute the code and note that it prints out every 1000th line, starting with line 0.

Before doing this exercise, be sure you understand how this part of the code works:

```
if not i % 1000:
```

**Hint:** A more explicit version of this same statement would be this:

```
if i % 1000 == 0:
```
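
The two forms are equivalent because `i % 1000` evaluates to `0` exactly when `i` is a multiple of 1000, and `0` is falsy. A quick sketch over a small range of numbers, standing in for line indexes:

```python
# Both conditions select the same indexes
implicit = [i for i in range(3001) if not i % 1000]
explicit = [i for i in range(3001) if i % 1000 == 0]

print(implicit)              # [0, 1000, 2000, 3000]
print(implicit == explicit)  # True
```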

Now that you understand that, you will write some code that does 4 things:

1. Iterate over the _cleaned.alexadata.csv_ file
2. Save every 100th (100, not 1000) line into a data structure
3. Do some calculations on the data (see the JSON file format)
4. Write out a JSON file in the following format:

```
{
    "commands": __,
    "min_length": __,
    "max_length": __
}
```

Where:
 * `commands` is the list of commands that you saved (every 100th command starting with line 0),
 * `min_length` is the minimum length of the command lines
 * `max_length` is the maximum length of the command lines

Note the following helpful tips:

 * To open a file for writing, you will need to use the 'w' write-mode code in the open command. E.g:

    ```
    open(myfile, 'w')
    ```

 * The length of a string is determined by the following function call:

    ```
    len(mystring)
    ```

Save the JSON file to your Google Drive, download it, and submit it to Canvas.

 * 1 point: A valid JSON file is submitted
 * 1 point: There is a min_length key with the correct value
 * 1 point: There is a max_length key with the correct value