In [1]:
from __future__ import division, print_function, unicode_literals

# Modules 

## Python, the Standard Library and Beyond

Python is a relatively small and tight language. Though it offers many useful building blocks, as we've seen (strings, lists, dictionaries, etc.), these are just that: building blocks.

In addition to these fundamental types, Python has a huge galaxy of additional modules and plugins that you can use in your code. These extend the language and make it possible to do almost anything that's possible with a computer. With Python, you can create bots that look for cheap cars on Craigslist (I've done this). You can run web servers. You can even use artificial intelligence and neural networks to recognize features in images, or transfer the artistic style of a Picasso painting onto a picture (https://github.com/lengstrom/fast-style-transfer, if you don't believe me!).

In this set of notes, we'll quickly blaze through how to import packages, how to install new packages, and a few of the standard packages that ship with every installation of Python (known as the Python standard library.) You can find a full list of packages in the standard library, and extensive documentation for them at https://docs.python.org/3.5/py-modindex.html. Needless to say, there's too much stuff even in just the Standard Library to describe every package, so we'll focus on a handful that you might find useful.

In the later lectures, we'll turn our full attention to a particular set of these packages, using what's known as the scipy stack (numpy, matplotlib, pandas, seaborn and scipy) to analyze and visualize data.

Let's start by importing the *math* package, which provides helper functions that calculate square roots, cosines, and other things that Python cannot do with its built-in functions. To do this, we just type:

In [1]:
import math

When you load a module into Python, all of the associated elements of that module are "confined" to the module. What I mean by this is that the math module supplies a cosine function called *cos*, and a constant called *pi*, but you cannot access these items just by typing

```python
cos(pi/2)
```

into the interpreter. Instead, we call functions from the math module the same way that we call methods, with a dot (.):

In [5]:
math.cos(math.pi/2)

6.123233995736766e-17
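
The result above is tiny but not exactly zero, because math.pi is only a floating-point approximation of π. Here is a minimal sketch of how one might check for "close enough to zero". It assumes Python 3.5 or later, where `math.isclose` is available, and the tolerance `1e-9` is an arbitrary value chosen for illustration.

```python
import math

result = math.cos(math.pi / 2)

# An exact comparison fails, because result is a tiny non-zero number
print(result == 0)

# math.isclose compares within a tolerance; abs_tol is needed when comparing against zero
print(math.isclose(result, 0, abs_tol=1e-9))
```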

Python programmers would say that these functions and constants are stored in the math module's *namespace*. Python programmers use *namespaces* because they keep the number of names in your program to a minimum, which minimizes the chances of accidentally overwriting something important if you mis-name a variable.
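
To make the overwriting point concrete, here is a small sketch. The variable `pi = 3` is a made-up example of a badly named variable, not something from the math module.

```python
import math

pi = 3  # our own (badly named) variable

# The constant inside the math namespace is unaffected by our variable
print(pi)
print(math.pi)

# By contrast, importing the name directly rebinds our variable
from math import pi
print(pi == math.pi)
```

Keeping the `math.` prefix means the two names never collide. The `from math import pi` form is shorter to type, but it silently replaces whatever `pi` used to refer to.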

Namespaces also help to group related functions together: if you're looking for a particular mathematical function, you can easily type math, followed by a dot and a tab at the interpreter, and it will tell you a full list of the available functions within the math module. Try it below!

In [None]:
math.
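
If tab completion isn't available (in a plain script, for instance), the built-in `dir()` function gives a similar listing. This is only a sketch: the variable name `public_names` is my own, and the exact list you get depends on your Python version.

```python
import math

# dir() returns every name defined in the module's namespace;
# names starting with an underscore are internal, so we skip them
public_names = [name for name in dir(math) if not name.startswith('_')]

print(len(public_names))
print(public_names[:5])
```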

## Saving non-text data to a file

We saw in the "Files" chapter how we can save text to a file. This is obviously useful if the data you are dealing with is a string. It's a little less useful, however, if your data is a list of dictionaries, or has some other sort of complex structure.

If you're working with a complex object in Python and you want to save it to a file, there are two major file types (each with its own Python module) that may be useful to you. Let's see them both in action.

### JSON 

JSON stands for JavaScript Object Notation. It's a file type that is stored in plain text, but that's designed to represent nested structures of lists and dictionaries, containing text and numeric data. Basically every type of data that we've talked about so far can be stored in a JSON file, with the exception of File objects.

JSON has two major advantages:
1. It is human readable. You can open a JSON file and clearly see the structure of the data
2. It is supported by a variety of other programming languages, and is the major way of sending raw data around on the internet

There are two major drawbacks to using JSON:
1. Since the files are stored as text, they can be quite large
2. The format can only store lists, dictionaries, strings, numbers and booleans. It cannot store functions or any of the custom objects we will encounter later

Let's create a complicated dictionary structure, and store it in a JSON file. First, let's import the package Python provides for dealing with JSON files, called *json*

In [8]:
import json

In [9]:
complicated_structure = { 
    'letters to numbers': {'2': 2, '3': 3},
    'random_list_of_data': [1,5,2,6,'a',7,{'a': 2}]
}

Now let's use the JSON library to save this to a file. We'll open a file for writing like normal, and use the .dump() function in the json module to write this data to the file we've opened. 

The .dump() function takes two inputs: the first is the Python object that we want to save, and the second is the file object that we want to save it to:

In [11]:
with open('assets/complicated_structure.json', 'w') as json_output:
    json.dump(complicated_structure, json_output)

We can now read this data in again, using the json.load() function, which reads a file object's data into a Python data structure. We'll also read the string data from the file as the variable raw_data

In [21]:
with open('assets/complicated_structure.json') as json_input:
    reloaded_structure = json.load(json_input)
    json_input.seek(0) # "Rewinds the file", necessary since we're reading the file twice
    raw_data = json_input.read()

Looking at the raw data, you can see that it strongly resembles the way that Python data structures are laid out

In [20]:
raw_data

'{"random_list_of_data": [1, 5, 2, 6, "a", 7, {"a": 2}], "letters to numbers": {"3": 3, "2": 2}}'
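
The raw data above is all on one line, which gets hard to read as the structure grows. As a sketch of one way to make the output friendlier, the `indent` and `sort_keys` arguments of `json.dumps()` (also accepted by `json.dump()`) ask for a multi-line layout with keys in alphabetical order. The variable name `pretty` is just for illustration.

```python
import json

complicated_structure = {
    'letters to numbers': {'2': 2, '3': 3},
    'random_list_of_data': [1, 5, 2, 6, 'a', 7, {'a': 2}]
}

# indent=2 puts each item on its own line; sort_keys=True makes the key order predictable
pretty = json.dumps(complicated_structure, indent=2, sort_keys=True)
print(pretty)
```

The indented text is larger than the compact form, but it loads back into exactly the same structure.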

And you can see that the JSON library faithfully reconstructs our original dictionary structure

In [22]:
reloaded_structure

{'letters to numbers': {'2': 2, '3': 3},
 'random_list_of_data': [1, 5, 2, 6, 'a', 7, {'a': 2}]}
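
The round trip works perfectly here because the structure only uses types that JSON knows about. As a hedged illustration of where it does not, the made-up dictionary below contains a tuple and an integer key, neither of which JSON can represent directly.

```python
import json

# A tuple value and an integer key: neither exists in the JSON format
original = {'point': (1, 2), 1: 'one'}

round_tripped = json.loads(json.dumps(original))

# The tuple comes back as a list, and the integer key comes back as the string '1'
print(round_tripped)
print(round_tripped == original)  # → False
```

No error is raised in this case; the data is quietly converted to the nearest JSON type, so it is worth checking what you get back.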

To illustrate some of the limitations of JSON, let's try to store a function. Instead of using the .dump() function, which would require us to open a file, we'll use its close variant .dumps(), which outputs a string

In [24]:
def function_to_store(number_1, number_2):
    return (number_1 - number_2)/(number_1 + number_2)

In [25]:
json.dumps(function_to_store)

TypeError: <function function_to_store at 0x1047b0b70> is not JSON serializable

As you can see, Python refuses to store a function as JSON: the file format is strictly defined, and it has no way to represent one.

### Pickle

Pickle is a Python library that behaves very similarly to the json module. It produces ".pickle" files, which are only used by Python, but can represent almost any Python type. These files are not necessarily backwards compatible, so they are best used for short-term storage.

Unlike JSON, .pickle files are stored as "binary data," which is a compact, non-human-readable type of file. This means that when we open the file to write the data, we have to open it in "binary mode." This is done by using 'wb' instead of 'w' in the mode argument of open(), as you can see below. When we read the file back in, we'll use 'rb' as the mode instead of 'r'.

Let's try out pickle by using it to store the function above, and then re-loading the function:

In [29]:
import pickle 

with open('assets/function.pickle', 'wb') as pickle_output:
    pickle.dump(function_to_store, pickle_output)

In [30]:
with open('assets/function.pickle', 'rb') as pickle_input:
    reloaded_function = pickle.load(pickle_input)

In [31]:
reloaded_function(3,4)

-0.14285714285714285
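
Pickle also has .dumps() and .loads() variants that work with bytes in memory instead of files. The sketch below uses them on a made-up dictionary holding a tuple, a set, and an integer key, which are types that JSON either converts or rejects.

```python
import pickle

# Types that JSON cannot represent faithfully: a tuple, a set, and an integer key
data = {'point': (1, 2), 'tags': {'a', 'b'}, 3: 'three'}

pickled = pickle.dumps(data)       # dumps() returns bytes rather than writing to a file
restored = pickle.loads(pickled)   # loads() rebuilds the object from those bytes

print(type(pickled))
print(restored == data)
print(type(restored['point']), type(restored['tags']))
```

One caveat from the pickle documentation is worth knowing: loading a pickle can run arbitrary code, so only unpickle data that comes from a source you trust.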

# Accessing the web with requests

This next section is just for fun! We may do more with web programming later, but it is a field in itself and too complicated to go through even in sketchy detail.

As one last short demonstration of the power of Python modules, let's try using Python to get information off the internet. Usually, people use the internet as a service to retrieve documents, which we call websites. Websites are stored in a file format called HTML, which is technically readable by humans (not binary data), but in practice is a pretty complicated format. The process of obtaining data from websites is called *web scraping*, and it's as much an art as a science, so we won't get into the gory details here.

Thankfully, there's a faster way to transmit data on the internet than having to scrape information from a website. Lots of websites provide what are known as APIs, which are services that are meant to provide data to computer programs. Let's try accessing an API at the http address http://tambal.azurewebsites.net/joke/random, which provides a random joke in JSON format when you access it.

To do this, we'll use a library called *requests*. This isn't part of the standard library, but it is bundled with many Python distributions (Anaconda, for example) and can otherwise be installed with pip. Let's import it:

In [3]:
import requests

And use it to get a random joke from this website! Don't worry about understanding the next few lines

In [4]:
response = requests.get('http://tambal.azurewebsites.net/joke/random')
response.raise_for_status()
response.text

'{"joke":"My friend\'s bakery burned down last night. Now his business is toast."}'

The attribute response.text contains a short snippet of JSON, containing a dictionary with a "joke" key and a value that's an admittedly pretty bad joke (even though the joke is randomly selected, I promise you that none of them are good). Let's parse it into a Python object using the json.loads() function. This function works just like json.load(), but takes a string as an argument rather than a file object.

In [41]:
joke = json.loads(response.text)
joke

{'joke': 'What do you call a fake noodle?  An impasta'}
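
Once parsed, the result is an ordinary dictionary, so the joke text can be pulled out by its key. The sketch below avoids the network by using a hard-coded string copied from the response output shown earlier; the variable name `response_text` is my own stand-in for response.text.

```python
import json

# Sample response text, copied from the output shown earlier in this section
response_text = '{"joke":"My friend\'s bakery burned down last night. Now his business is toast."}'

joke = json.loads(response_text)

# The parsed result is a plain dictionary, so the joke itself lives under the "joke" key
print(joke['joke'])
```

With a live response object, requests also offers a `response.json()` method, which does the same parsing in one step.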

Pretty cool! We just retrieved some data from the internet, and got it stored in memory as a Python object!