In [1]:
from __future__ import division, print_function, unicode_literals

# Modules 

## Python, the Standard Library and Beyond

Python is a relatively small and tight language. Though it offers many useful building blocks as we've seen (strings, lists, dictionaries, etc), these are just that: building blocks.  

In addition to these fundamental types, Python has a huge galaxy of additional modules and plugins that you can make use of in your code. These extend the language and make it possible to do almost anything that's possible with a computer. With Python, you can create bots that look for cheap cars on Craigslist (I've done this). You can run web servers. You can even use artificial intelligence and neural networks to recognize features in images, or transfer the artistic style of a Picasso painting onto a picture (https://github.com/lengstrom/fast-style-transfer, if you don't believe me!)

In this set of notes, we'll quickly blaze through how to import packages, how to install new packages, and a few of the standard packages that ship with every installation of Python (known as the Python standard library.) You can find a full list of packages in the standard library, and extensive documentation for them at https://docs.python.org/3.5/py-modindex.html. Needless to say, there's too much stuff even in just the Standard Library to describe every package, so we'll focus on a handful that you might find useful.

In the later lectures, we'll turn our full attention to a particular set of these packages, using what's known as the scipy stack (numpy, matplotlib, pandas, seaborn and scipy) to analyze and visualize data.

Let's start by importing the *math* package, which provides helper functions that calculate square roots, cosines, and other things that Python's built-in functions do not cover. To do this, we just type

In [5]:
import math

When you load a module into Python, all of the associated elements of that module are "confined" to the module. What I mean by this is that the math module supplies a cosine function called *cos*, and a constant called *pi*, but you cannot access these items just by typing 

```python
cos(pi/2)
```

into the interpreter. Instead, we call functions from the math module the same way that we call methods, with a dot (.)

In [6]:
math.cos(math.pi/2)

6.123233995736766e-17
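The result above is tiny but not exactly zero, because math.pi is a floating-point approximation of π. A minimal sketch of how you might compare such a result against zero, using math.isclose() (available from Python 3.5); the tolerance of 1e-9 is an arbitrary choice for this illustration:

```python
import math

result = math.cos(math.pi / 2)

# An exact comparison fails because of floating-point rounding.
print(result == 0)

# Compare against zero with an absolute tolerance instead.
# The tolerance 1e-9 is an arbitrary choice for this illustration.
print(math.isclose(result, 0, abs_tol=1e-9))
```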

Python programmers would say that these functions and constants are stored in the math module's *namespace*. Python programmers use *namespaces* because they keep the number of names in your program to a minimum, which reduces the chances of accidentally overwriting something important if you mis-name a variable. 
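As a small illustration of why this matters, here is a sketch in which we define our own (deliberately sloppy) variable called pi. Because the module's constant lives in its own namespace, the two do not collide:

```python
import math

# A variable of our own that happens to share a name with a constant in math.
pi = 3

# The two names live in different namespaces, so neither overwrites the other.
print(pi)
print(math.pi)
```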

Namespaces also help to group related functions together: if you're looking for a particular mathematical function, you can easily type math, followed by a dot and a tab at the interpreter, and it will tell you a full list of the available functions within the math module. Try it below!

In [7]:
math.

SyntaxError: invalid syntax (<ipython-input-7-186ff497df9b>, line 1)
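Tab completion only works interactively; actually running a cell that contains just `math.` gives the SyntaxError shown above. If you want the same list of names from code, one option is the built-in dir() function. A minimal sketch:

```python
import math

# dir() returns the names defined in a module's namespace as a list of strings.
# Names starting with an underscore are internal, so we filter them out.
public_names = [name for name in dir(math) if not name.startswith('_')]

print(public_names[:10])      # the first few names
print('cos' in public_names)  # True
```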

## Saving non-text data to a file

We saw in the "Files" chapter how we can save text to a file. This is obviously useful if the data you are dealing with is a string. It's a little less useful, however, if your data is a list of dictionaries, or has some other sort of complex structure.

If you're working with a complex object in Python and you want to save it to a file, there are two major file types (each with their own Python module) that may be useful to you. Let's see them both in action

### JSON 

JSON stands for JavaScript Object Notation. It's a file type that is stored in plain text, but that's designed to represent nested structures of lists and dictionaries, containing text and numeric data. Basically every type of data that we've talked about so far can be stored in a JSON file, with the exception of File objects.

JSON has two major advantages:
1. It is human readable. You can open a JSON file and clearly see the structure of the data
2. It is supported by a variety of other programming languages, and is the major way of sending raw data around on the internet

There are two major drawbacks to using JSON:
1. Since the files are stored as text, they can be quite large
2. The format can only store lists, dictionaries, strings, numbers and booleans. It cannot store functions or any of the custom objects we will encounter later

Let's create a complicated dictionary structure, and store it in a JSON file. First, let's import the package Python provides for dealing with JSON files, called *json*

In [8]:
import json

In [9]:
complicated_structure = { 
    'letters to numbers': {'2': 2, '3': 3},
    'random_list_of_data': [1,5,2,6,'a',7,{'a': 2}]
}

Now let's use the JSON library to save this to a file. We'll open a file for writing like normal, and use the .dump() function in the json module to write this data to the file we've opened. 

The .dump() function takes two inputs: the first is the Python object that we want to save, and the second is the file object that we want to save it to:

In [10]:
with open('assets/complicated_structure.json', 'w') as json_output:
    json.dump(complicated_structure, json_output)
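By default the output is written on a single line. If you want something easier to read, the json functions accept optional indent and sort_keys arguments. The sketch below uses json.dumps() (which returns a string) on a small stand-in dictionary so that it runs without touching any files; json.dump() takes the same keyword arguments.

```python
import json

# A small stand-in for the structure defined above.
example = {'letters to numbers': {'2': 2, '3': 3},
           'random_list_of_data': [1, 5, 'a']}

# indent adds line breaks and indentation;
# sort_keys orders dictionary keys alphabetically.
pretty = json.dumps(example, indent=2, sort_keys=True)
print(pretty)
```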

We can now read this data in again, using the json.load() function, which reads a file object's data into a Python data structure. We'll also read the string data from the file as the variable raw_data

In [11]:
with open('assets/complicated_structure.json') as json_input:
    reloaded_structure = json.load(json_input)
    json_input.seek(0) # "Rewinds the file", necessary since we're reading the file twice
    raw_data = json_input.read()

Looking at the raw data, you can see that it strongly resembles the way that Python data structures are laid out

In [12]:
raw_data

'{"letters to numbers": {"3": 3, "2": 2}, "random_list_of_data": [1, 5, 2, 6, "a", 7, {"a": 2}]}'

And you can see that the JSON library identically recaptures our original dictionary structure

In [13]:
reloaded_structure

{'letters to numbers': {'2': 2, '3': 3},
 'random_list_of_data': [1, 5, 2, 6, 'a', 7, {'a': 2}]}
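The round trip works cleanly here because our structure only uses types that JSON knows about. As a cautionary sketch (the dictionary below is made up for illustration), dictionary keys that are not strings come back as strings, and tuples come back as lists:

```python
import json

original = {1: 'one', 'point': (3, 4)}

# Serialize to a string and parse it straight back.
round_tripped = json.loads(json.dumps(original))

print(round_tripped)              # {'1': 'one', 'point': [3, 4]}
print(round_tripped == original)  # False
```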

To illustrate some of the limitations of JSON, let's try to store a function. Instead of using the .dump() function, which would require us to open a file, we'll use its close variant .dumps(), which outputs a string

In [14]:
def function_to_store(number_1, number_2):
    return (number_1 - number_2)/(number_1 + number_2)

In [15]:
json.dumps(function_to_store)

TypeError: <function function_to_store at 0x107436730> is not JSON serializable

As you can see, Python refuses to store a function as JSON, because the file format is strictly defined and functions are not among the types it supports.
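For simple unsupported types there is a workaround: json.dumps() accepts a default argument, a function that is called for any object the encoder does not know how to handle. The sketch below converts a set into a sorted list; the helper name encode_extra_types and the sample data are made up for this illustration.

```python
import json

data = {'tags': {'python'}, 'count': 3}  # a set is not a JSON type

try:
    json.dumps(data)
except TypeError as error:
    print('Could not serialize:', error)

# Hypothetical helper: convert sets to lists, reject everything else.
def encode_extra_types(obj):
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError('Cannot serialize object of type ' + type(obj).__name__)

encoded = json.dumps(data, default=encode_extra_types)
print(encoded)
```

Note that the set comes back as a plain list when the string is parsed again, so this is a one-way convenience rather than a true round trip.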

### Pickle

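Pickle is a Python library whose interface is very similar to that of the json module. It produces ".pickle" files, which can only be read by Python, but which can represent far more Python types than JSON can. These files are not necessarily compatible between Python versions, so they are best used for short-term storage. Functions are stored as a reference to their name rather than as a copy of their code, so a pickled function can only be reloaded where the same function is defined. You should also only load pickle files that come from a source you trust, since unpickling can run arbitrary code.

Unlike JSON, .pickle files are stored as "binary data," which is compact but not human-readable. This means that when we open the file to write the data, we have to open it in "binary mode." This is done by using 'wb' instead of 'w' in the mode argument of open(), as you can see below. When we read the file back in, we'll use 'rb' as the mode instead of 'r'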

Let's try out pickle by using it to store the function above, and then re-loading the function:

In [16]:
import pickle 

with open('assets/function.pickle', 'wb') as pickle_output:
    pickle.dump(function_to_store, pickle_output)

In [17]:
with open('assets/function.pickle', 'rb') as pickle_input:
    reloaded_function = pickle.load(pickle_input)

In [18]:
reloaded_function(3,4)

-0.14285714285714285
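To get a feel for what pickle can and cannot do, here is a small sketch that works entirely in memory using pickle.dumps() and pickle.loads(), the string-style counterparts of dump() and load(). The sample dictionary is made up for illustration. Unlike JSON, pickle preserves integer keys and tuples, but it cannot store a lambda, because a lambda has no name that pickle could look up again later.

```python
import pickle

original = {1: 'one', 'point': (3, 4)}

# dumps()/loads() work on bytes in memory, so no file is needed for this sketch.
restored = pickle.loads(pickle.dumps(original))
print(restored == original)  # True: the integer key and the tuple survive

# A lambda has no importable name, so pickle cannot store a reference to it.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as error:
    lambda_pickled = False
    print('Could not pickle:', error)
```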

# Exercises

## 1. Find today's playlist on BBC Radio 1

As we saw before, JSON is a great way to store data that can be represented as lists, strings, numbers or dictionaries. Because of this, JSON is one of the main ways that raw data is passed between computers on the internet. 

For example, a weather website might offer a service to find the weather in a particular city. You would access the website http://sampleweatherwebsite.com/weather?city=[name_of_city], and it would return plain text that contains a dictionary in JSON format that would look something like this:

```python
{'city': {'name': 'Seattle',
          'latitude': '47.6062N',
          'longitude': '122.3321W'},
 'weather': { 'temperature': 54,
              'humidity': 76,
              'conditions': 'cloudy',
              'chance of precipitation': 100
            }
}
```
We call these sorts of services APIs (application programming interfaces). They are similar to web pages, but instead of rendering a document with fonts, images, and other human-readable elements, they provide data that is meant to be processed by computers. There's a variety of APIs on the web that return data on all sorts of interesting things. Many of these require you to sign up for the website and obtain a "password" called an *API key* which you keep secret. This is used to identify you as a user, and make sure that you're not abusing the service by making too many requests.
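To make the weather example concrete, here is a sketch of how the text returned by such a service could be parsed. The weather service is made up, so the response text is simply written out by hand as a string; note that real JSON uses double quotes around strings.

```python
import json

# Hypothetical response text from the made-up weather service described above.
response_text = '''
{"city": {"name": "Seattle", "latitude": "47.6062N", "longitude": "122.3321W"},
 "weather": {"temperature": 54, "humidity": 76, "conditions": "cloudy",
             "chance of precipitation": 100}}
'''

# Parse the JSON text into nested Python dictionaries.
weather_report = json.loads(response_text)

# Drill down through the nested dictionaries with successive keys.
city_name = weather_report['city']['name']
conditions = weather_report['weather']['conditions']
print(city_name, 'is', conditions)  # Seattle is cloudy
```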

The BBC offers an API for finding the current playlist on Radio 1 that does not require a login or an API key. In this exercise, you'll use the *requests* module to access the Radio 1 API, use the *json* module to parse the data that you obtain from it, and then process the resulting data structure to get a list of artist and song title pairs for all of the playlists together.

First, import the *requests* module.

In [23]:
import requests

Now, we're going to use the get() function in the requests module to make an HTTP request to the BBC website. The website in question is http://www.bbc.co.uk/radio1/playlist.json. If you visit it in your browser, you'll notice that it returns a string in JSON format. 

Pass this web address to the get() function in requests as a string. Save the answer in a variable called response. Also call the method raise_for_status() on the response, which raises an exception if there was an error in getting the data.

In [24]:
response = requests.get('http://www.bbc.co.uk/radio1/playlist.json')
response.raise_for_status()

```python
response.text
``` 

is an attribute of the response that stores its text. This will always be a string, but because we accessed a website that produces JSON, it's a string that we can parse using the *json* module into a Python object. The function loads() in the json module is similar to the load() function, but it takes a string instead of a file as its argument. Use this function to convert the text of the response into a Python dictionary.

In [25]:
song_data = json.loads(response.text)
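If the difference between load() and loads() is unclear, the following sketch parses the same text both ways. The JSON text is a made-up stand-in, and io.StringIO is used to wrap the string in a file-like object so that no real file is needed.

```python
import io
import json

text = '{"playlist": {"a": [], "b": []}}'

# loads() parses a string directly.
from_string = json.loads(text)

# load() expects a file-like object; io.StringIO wraps the string in one.
from_file = json.load(io.StringIO(text))

print(from_string == from_file)  # True
```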

The song data consists of several nested dictionaries. Try printing out the whole song_data variable. Since there is a lot of information contained in this variable, you might find it helpful to use the .keys() dictionary method to unravel the structure. You should see five different playlists, called *a*, *b*, *c*, *introducing* and *totw*. What kind of data structure is contained in each playlist?

In [30]:
song_data.keys()

dict_keys(['playlist'])

In [31]:
song_data['playlist'].keys()

dict_keys(['a', 'b', 'introducing', 'c', 'totw'])
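Another way to get an overview of a nested structure is to loop over it and print the type and length of each value, rather than the values themselves. The sketch below does this on a tiny stand-in with the same shape as the response; the entries in it are made up.

```python
# A tiny stand-in with the same shape as the real response; the entries are made up.
sample_data = {
    'playlist': {
        'a': [{'artist': 'Artist One', 'title': 'Song One'}],
        'b': [{'artist': 'Artist Two', 'title': 'Song Two'},
              {'artist': 'Artist Three', 'title': 'Song Three'}],
    }
}

# Summarize each playlist: its name, the type of its value, and its length.
summary = {}
for name, playlist in sample_data['playlist'].items():
    summary[name] = (type(playlist).__name__, len(playlist))
    print(name, type(playlist).__name__, len(playlist))
```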

In [34]:
song_data['playlist']['c']

[{'artist': 'All Time Low',
  'artist_id': '62162215-b023-4f0e-84bd-1e9412d5b32c',
  'image': 'https://ichef.bbci.co.uk/images/ic/512x512/p052216s.jpg',
  'label': None,
  'playlist': None,
  'status': None,
  'title': 'Life Of The Party'},
 {'artist': 'Blaenavon',
  'artist_id': '6fcabcfc-e595-4de5-b3df-ab4ff5aa587b',
  'image': 'https://ichef.bbci.co.uk/images/ic/512x512/p0540xhm.jpg',
  'label': None,
  'playlist': None,
  'status': None,
  'title': 'Lonely Side'},
 {'artist': 'Camila Cabello',
  'artist_id': '01b8b5bf-06cb-45da-85fb-61ada72fcd69',
  'image': 'https://ichef.bbci.co.uk/images/ic/512x512/p054nhv7.jpg',
  'label': None,
  'playlist': None,
  'status': None,
  'title': 'Crying In The Club'},
 {'artist': 'Halsey',
  'artist_id': '3377f3bb-60fc-4403-aea9-7e800612e060',
  'image': 'https://ichef.bbci.co.uk/images/ic/512x512/p052nf9v.jpg',
  'label': None,
  'playlist': None,
  'status': None,
  'title': 'Now Or Never'},
 {'artist': 'The Hunna',
  'artist_id': '39c6625e-831

Now, write a function that iterates over these different playlists, and combines them all into a single Python list, keeping only the "artist" and "title" attributes of each entry.

In [47]:
def combine_playlists(playlist_data):
    playlist_data = playlist_data['playlist']
    output = []
    for name, playlist in playlist_data.items():
        for song in playlist:
            output.append({'title': song['title'], 
                           'artist': song['artist'], 
                          })
    return output

output = combine_playlists(song_data)
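The same flattening can also be written as a single list comprehension with two for clauses. This is only an alternative sketch, not a replacement for the loop above; the function name combine_playlists_compact and the sample data are made up for illustration.

```python
def combine_playlists_compact(playlist_data):
    """Flatten every playlist into one list of title/artist dictionaries."""
    return [
        {'title': song['title'], 'artist': song['artist']}
        for playlist in playlist_data['playlist'].values()
        for song in playlist
    ]

# Made-up sample with the same shape as the BBC response.
sample_data = {
    'playlist': {
        'a': [{'artist': 'Artist One', 'title': 'Song One', 'label': None}],
        'b': [{'artist': 'Artist Two', 'title': 'Song Two', 'label': None}],
    }
}

combined = combine_playlists_compact(sample_data)
print(combined)
```

Since the playlist names are not needed, the comprehension iterates over .values() instead of .items().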

## 2. Select a random song

Now that we have a playlist of songs collected from the BBC, let's write a function called print_random_song that picks a random song from the playlist and prints the string "Now playing {title} by {artist}" using the print function. Since it prints the information out, there's no need to return anything!

To do this, we will need some way of picking elements randomly. Python provides a *random* module with lots of useful functions for getting random behavior. Rather than giving you the name of the function, in this exercise you should visit https://docs.python.org/3/library/random.html and read the documentation for the module. Do you see a function in the library that works for picking a random element from a list?

If you didn't complete the first part of the exercise and don't have a list of songs, there's a sample one (from June 2017) stored in 'assets/playlist.json'. Load this file using the json module (maybe do this anyway for practice!) and use it to complete the exercise. 

In [49]:
import random 

def print_random_song(song_data):
    song = random.choice(song_data)
    print("Now playing", song['title'], "by", song['artist'])

In [55]:
print_random_song(output)

Now playing WALLS by Kings of Leon
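Because the choice is random, you will likely see a different song each time you run the cell. If you ever need repeatable results (for example, when testing), you can seed the random number generator with random.seed(). A minimal sketch, using a made-up list of songs and an arbitrary seed value of 0:

```python
import random

# Made-up songs, in the same format as the combined playlist.
songs = [
    {'title': 'Song One', 'artist': 'Artist One'},
    {'title': 'Song Two', 'artist': 'Artist Two'},
    {'title': 'Song Three', 'artist': 'Artist Three'},
]

# Seeding makes the "random" choice repeatable.
random.seed(0)
first_pick = random.choice(songs)

random.seed(0)
second_pick = random.choice(songs)

print(first_pick == second_pick)  # True: same seed, same choice
print('Now playing {title} by {artist}'.format(**first_pick))
```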
