<img src="https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png" style="float: left; margin: 10px;"> 

# Python Standard Library & Third-Party Packages
***
Week 1 | Lesson 1.4

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Understand and know how to use the standard library
- Know what third-party packages are and how to install them

### STUDENT PRE-WORK
*Before this lesson, you should already be able to:*
- Understand basic Python syntax


![](http://saocarlos.pyladies.com/img/berry/Slides/python2.png)

## Module 

**Definition:** smallest unit of code reusability 

Simply put, a module in Python is a .py file that defines one or more functions or classes that you intend to reuse in different parts of your program.


In [1]:
# Import a module
import math
# use a module
math.sqrt(16)

4.0

In [2]:
# Import functions from a module into the local namespace
from math import ceil, floor
print ceil(3.7)
print floor(3.7)

4.0
3.0


In [1]:
# Bind a module function to a new local symbol
from math import degrees as degs
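
As a quick illustration of the aliased import above, the sketch below calls the function through its new local name. The name `degs` comes from the cell above; the input value is just an example.

```python
import math
from math import degrees as degs

# degs is simply another name for math.degrees
angle = degs(math.pi)  # pi radians converted to degrees
print(angle)

# Both names refer to the same function object
print(degs is math.degrees)
```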

In [2]:
# Any python file (including your own) can be a module 
from my_script import send_greating

## Package 

**Definition:** logical collections of modules

A Python package is a directory of Python modules. Packages come in handy for keeping related modules together in one place.

A package is a unit of distribution that can contain a library or an executable (i.e. a script) or both. It's a way to share your code with the community.




**Packages give structure to modules**
    
    sound/
    ├── __init__.py
    ├── effects/
    │   ├── __init__.py
    │   ├── echo.py
    │   ├── reverse.py
    │   └── surround.py
    ├── filters/
    │   ├── __init__.py
    │   ├── equalizer.py
    │   ├── karaoke.py
    │   └── vocoder.py
    └── formats/
        ├── __init__.py
        ├── aiffread.py
        ├── aiffwrite.py
        ├── auread.py
        ├── auwrite.py
        ├── wavread.py
        └── wavwrite.py

This is the structure of a Python package called sound. Notice how each **module** (every .py script) has a unit of functionality, such as the echo module, which contains functions that modify echoes in sound files. The modules are then grouped into subdirectories. For instance, the echo, reverse, and surround modules are all grouped in the effects directory of the sound package. 

#### Don't run this cell

In [None]:
# Don't run cell
import sound.effects.echo

# The echo module has a function called echofilter;
# after a plain import it must be referenced by its full name
sound.effects.echo.echofilter(input, output)

# 2nd way to import a module 
from sound.effects import echo 

echo.echofilter(input, output, delay=0.7, atten=4)
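
The `sound` package above is only illustrative, so those imports cannot actually run. As a rough, runnable sketch of the same idea, the code below builds a tiny throwaway package in a temporary directory and imports it both ways. The package name `demo_sound` and the body of `echofilter` are made up for this example.

```python
import os
import sys
import tempfile

# Create a throwaway directory tree: <tmp>/demo_sound/effects/echo.py
root = tempfile.mkdtemp()
effects_dir = os.path.join(root, "demo_sound", "effects")
os.makedirs(effects_dir)

# An __init__.py file marks each directory as a package
for folder in (os.path.join(root, "demo_sound"), effects_dir):
    open(os.path.join(folder, "__init__.py"), "w").close()

# A tiny stand-in module; this echofilter is made up for the demo
with open(os.path.join(effects_dir, "echo.py"), "w") as module_file:
    module_file.write(
        "def echofilter(samples, delay=1):\n"
        "    return samples + samples[:delay]\n"
    )

# Make the new package importable, then import it both ways
sys.path.insert(0, root)

import demo_sound.effects.echo
first = demo_sound.effects.echo.echofilter([1, 2, 3], delay=2)

from demo_sound.effects import echo
second = echo.echofilter([1, 2, 3], delay=2)

print(first)
print(second)
```

Both calls reach the same function; the only difference is how much of the dotted path you have to type at the call site.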


### Package Import Rules

In [None]:
# Don't run this cell!

# The item can be a submodule (or subpackage) of package
from package import item

# All but the last must be packages
import item.subitem.subsubitem

In [5]:
# sklearn is a package, linear_model is a subpackage
from sklearn import linear_model

# sklearn is a package, covariance is a subpackage, and outlier_detection is a module
import sklearn.covariance.outlier_detection

## Python Import Best Practices 

**All imports go at the top of the file, after the header comment**

1. Putting all imports at the top of the file makes it clear to your team members (and yourself when you read through the file 6 months later) which packages the file depends on to run. 
2. Scattering import statements throughout a file makes it difficult for readers to know which packages have been imported into the namespace (more on this in a minute).

![](Screen Shot 2017-01-28 at 6.18.18 PM.png)

### Avoid import statements that use the wildcard character `*`

1. These statements import every public name from a module or package. When we import this way, we can't see which names are populating the namespace.  
2. This usually leads to confusing errors that can be difficult to debug! 

In [7]:
# avoid statements like this!
from math import *

In [8]:
# some math calculations
print cos(0)
print sin(0)

1.0
0.0


In [9]:
def sin(favorite_sin):
    '''Function shames you for engaging in your favorite sin.'''
    print "Shame on you for engaging in {}!".format(favorite_sin)

In [10]:
# having fun with strings 
fav_sin = "drinking whiskey"

sin(fav_sin)

Shame on you for engaging in drinking whiskey!


In [11]:
# now more calculations
print cos(3.14)
print sin(3.14)

-0.999998731728
Shame on you for engaging in 3.14!
None
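
In the last cell, `sin(3.14)` no longer computes a sine: our own `sin` function replaced the one that `from math import *` had brought in, so the call prints the message and returns `None`. Below is a minimal sketch of the alternative, keeping the module name as a prefix so the two functions cannot collide. Here the function returns the string instead of printing it, purely to keep the example easy to check.

```python
import math

def sin(favorite_sin):
    '''Return a message shaming you for your favorite sin.'''
    return "Shame on you for engaging in {}!".format(favorite_sin)

# The module prefix makes it obvious which sin is which
print(math.sin(0))
print(sin("drinking whiskey"))
```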


### Library

**Definition:** A set of modules that make sense together and that can be used in a program or in another library. You can't run a library itself, but you can run the code inside it.
    
The term library is simply a generic term for a bunch of code that was designed with the aim of being usable by many applications. It provides some generic functionality that can be used by specific applications.

When a module/package/something else is "published" people often refer to it as a library. Often libraries contain a package or multiple related packages, but it could be even a single module.

A library usually is not a standalone program, i.e. you cannot "run a library"; it provides functionality for other programs to call.

A package, which is a unit of distribution, can contain a library or an executable (i.e. a script) or both. It's a way to share your code with the community.

**Examples of Libraries**
- Numpy
- Pandas
- Matplotlib
- Python's Standard Library

## Executing Modules as Scripts

We can run a module as a script. To do so, include the `if __name__ == '__main__':` statement.

Every module in Python has a special attribute called `__name__`. The value of `__name__` is set to `'__main__'` when the module is run as the main program.

```python
def send_greating(student_name):
    """This function greets the student and welcomes them to the DSI course."""
    print "Hello {} and welcome to DSI 5!".format(student_name)


def celebrate_birthday(student_name):
    """This function sends a birthday wish."""
    print "Happy birthday {}, and have a great year!".format(student_name)


if __name__ == '__main__':
    # Runs only when the file is executed as a script, not when it is imported
    student_name = raw_input("Enter your name: ")

    send_greating(student_name)
```
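
To see the `__name__` attribute in action: an imported module reports its own name, while the file being run directly reports `'__main__'`. Here is a small sketch using standard-library modules; the `main` function is a hypothetical placeholder.

```python
import json
import math

# An imported module's __name__ is the name it was imported under
print(math.__name__)
print(json.__name__)

def main():
    # Placeholder for whatever the script should do when run directly
    return "running as a script"

# This branch only runs when the file is executed, not when it is imported
if __name__ == '__main__':
    print(main())
```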

### You Try

1. Open up a terminal window
2. Navigate to the directory where this jupyter notebook is located. 
3. Type **python my_script.py** into the terminal. 

## The Standard Library

The **standard library** is an extensive collection of packages, modules, data types, and functions that ship with Python. Check out the full documentation [here](https://docs.python.org/3/library/). Some examples of what the standard library has to offer are provided below.


### Regex 

Regex stands for **Regular Expressions**. Python's `re` module is very useful for dissecting and parsing written language. Regex allows you to scan a body of text for special characters, words, groups of words, multiple appearances of certain words, and more. 

Check out this [Regex syntax building tool](http://regexr.com/). 

In [50]:
import re

In [48]:
text = \
'''Edit the Expression & Text to see matches. Roll over matches or the expression for details. 
Undo mistakes with cmd-z. Save Favorites & Share expressions with friends or the Community. 
Explore your results with Tools. A full Reference & Help is available in the Library, or watch the video Tutorial.'''

In [58]:
m = re.findall(r"([A-Z]\w+)", text)

In [60]:
m

['Edit',
 'Expression',
 'Text',
 'Roll',
 'Undo',
 'Save',
 'Favorites',
 'Share',
 'Community',
 'Explore',
 'Tools',
 'Reference',
 'Help',
 'Library',
 'Tutorial']
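
Beyond `re.findall`, two other functions cover many everyday cases: `re.search` finds the first match and exposes its captured groups, and `re.sub` replaces matches. A brief sketch follows; the sample string is made up for illustration.

```python
import re

line = "Lesson 1.4 was   recorded on 2017-01-28"

# Capture year, month, and day with three parenthesized groups
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", line)
if match:
    print(match.groups())

# Collapse runs of whitespace into a single space
clean = re.sub(r"\s+", " ", line)
print(clean)
```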

### Collections

Collections is a module with some pretty handy data types that are enhancements of some standard Python data objects.

In [70]:
from collections import defaultdict, namedtuple, Counter

#### Defaultdict

Defaultdict allows you to initialize a dictionary's values with a standard data object (list, dict, tuple): whenever a missing key is accessed, a new empty object of that type is created for it. You can then use that object's methods when you manipulate the Defaultdict's values. 

In [63]:
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)

d.items()

[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
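
For comparison, here is a sketch of the same grouping written with a plain dict, followed by a `defaultdict(int)` used as a running total. The variable names `plain`, `grouped`, and `totals` are chosen for this example.

```python
from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

# Plain dict: we must create the empty list ourselves the first time a key appears
plain = {}
for k, v in s:
    if k not in plain:
        plain[k] = []
    plain[k].append(v)

# defaultdict(list): missing keys start out as an empty list automatically
grouped = defaultdict(list)
for k, v in s:
    grouped[k].append(v)

# defaultdict(int): missing keys start at 0, handy for running totals
totals = defaultdict(int)
for k, v in s:
    totals[k] += v

print(dict(grouped) == plain)
print(sorted(totals.items()))
```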

#### Counter

A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.

In [66]:
# Tally occurrences of words in a list
cnt = Counter()
for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1
        
cnt

Counter({'blue': 3, 'green': 1, 'red': 2})
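
A Counter can also be built directly from an iterable, which removes the need for the loop. The sketch below additionally shows `most_common`, the zero default for unseen keys, and a count going negative after `subtract`, as described above.

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Passing an iterable does the tallying loop for us
cnt = Counter(words)

# The two most frequent words, most common first
top_two = cnt.most_common(2)
print(top_two)

# A key that was never counted reports 0 instead of raising KeyError
print(cnt['purple'])

# Counts can go negative after a subtraction
cnt.subtract(['green', 'green'])
print(cnt['green'])
```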

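#### Namedtuple

A namedtuple is a tuple subclass whose fields can be accessed by name as well as by position.

In [68]:
Color = namedtuple("Color", ["hue", "saturation", "luminosity"])

In [None]:
# List the field names defined on Color
Color._fields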

In [67]:
pixel = Color(170, 0.1, 0.6)

if pixel.saturation >= 0.5:
    print "Whew, that is bright!"
    
if pixel.luminosity >= 0.5:
    print "Wow, that is light!"

Wow, that is light!
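
A few more things a namedtuple can do, sketched with the same `Color` type: unpacking, positional access, making a modified copy with `_replace`, and converting to a dictionary with `_asdict`. The variable name `brighter` is chosen for this example.

```python
from collections import namedtuple

Color = namedtuple("Color", ["hue", "saturation", "luminosity"])
pixel = Color(170, 0.1, 0.6)

# Fields are available by name, by position, and by unpacking
hue, saturation, luminosity = pixel
print(pixel.hue == pixel[0] == hue)

# namedtuples are immutable; _replace returns a modified copy
brighter = pixel._replace(saturation=0.9)
print(brighter)

# _asdict converts the fields to a dictionary-like mapping
print(dict(pixel._asdict()))
```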


### Itertools

Itertools is a module that provides several different ways of iterating through data. 

In [71]:
import itertools

In [72]:
for unique_comb in itertools.combinations(range(3), 2):
    print unique_comb

(0, 1)
(0, 2)
(1, 2)
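
Three other commonly used `itertools` functions, shown as a short sketch: `permutations` (order matters, unlike `combinations`), `product` (every pairing across iterables), and `chain` (several iterables treated as one).

```python
import itertools

# Ordered arrangements: (0, 1) and (1, 0) are both produced
perms = list(itertools.permutations(range(3), 2))
print(len(perms))

# Cartesian product of two iterables
pairs = list(itertools.product('ab', [1, 2]))
print(pairs)

# Chain several iterables into one stream
merged = list(itertools.chain([1, 2], (3, 4), 'xy'))
print(merged)
```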


### Json

**JSON (JavaScript Object Notation)** is a lightweight data interchange format inspired by JavaScript object literal syntax.

In [6]:
import json

In [83]:
squares = {1:1, 2:4, 3:9, 4:16}

# Serialize to/from string
output = json.dumps(squares) #=> '{"1": 1, "2": 4, "3": 9, "4": 16}'

In [86]:
output

'{"1": 1, "2": 4, "3": 9, "4": 16}'

In [87]:
type(output)

str

In [85]:
json.loads(output)

{u'1': 1, u'2': 4, u'3': 9, u'4': 16}
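
Notice that the integer keys of `squares` came back as strings: JSON object keys are always strings, so a round trip does not return the original dictionary. Here is a sketch of one way to restore them, assuming every key is known to be an integer.

```python
import json

squares = {1: 1, 2: 4, 3: 9, 4: 16}

# Round trip through a JSON string
round_trip = json.loads(json.dumps(squares))

# The keys are now strings, so the two dictionaries are not equal
print(round_trip == squares)

# Convert the keys back to int (only safe because we know they are all ints)
restored = {int(key): value for key, value in round_trip.items()}
print(restored == squares)
```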

With JSON you can serialize a data object and save it to a file, as well as load it from a file and deserialize it. 

In [90]:
# Serialize to file
with open('tmp.json', 'w') as out_file:
    json.dump(squares, out_file)

In [91]:
# Deserialize from file
with open('tmp.json', 'r') as in_file:
    
    file_loaded = json.load(in_file)  

In [92]:
file_loaded

{u'1': 1, u'2': 4, u'3': 9, u'4': 16}

In [93]:
type(file_loaded)

dict

In [82]:
# All variants support useful keyword arguments
json.dumps(squares, indent=4, sort_keys=True, separators=(',', ': '))

'{\n    "1": 1,\n    "2": 4,\n    "3": 9,\n    "4": 16\n}'

### Random

This module implements pseudo-random number generators for various distributions.

For integers, it offers uniform selection from a range. For sequences, it offers uniform selection of a random element, a function to generate a random permutation of a list in-place, and a function for random sampling without replacement.

In [12]:
import random

In [13]:
random.random() # Random float x, 0.0 <= x < 1.0

0.3974024022485141

In [14]:
random.uniform(1, 10) # Random float x, 1.0 <= x <= 10.0

1.0836221776771089

In [15]:
random.randint(0, 9) # Integer from 0 to 9

0

In [16]:
random.randrange(10) # Integer from 0 to 9

7

In [18]:
random.randrange(0,101,2) # Even integer from 0 to 100

50

In [22]:
random.sample(range(5), k=3) # k samples without replacement

[2, 4, 0]

In [23]:
random.choice('abcdefghij') # Choose a single element

'b'
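
Two things mentioned above but not yet shown: generating a random permutation in place with `random.shuffle`, and making pseudo-random results repeatable with `random.seed`. A minimal sketch follows; the seed value 42 is arbitrary.

```python
import random

# Seeding makes the pseudo-random sequence repeatable
random.seed(42)
first_run = [random.randint(0, 9) for _ in range(5)]

random.seed(42)
second_run = [random.randint(0, 9) for _ in range(5)]

print(first_run == second_run)

# shuffle reorders a list in place and returns None
deck = list(range(10))
random.shuffle(deck)
print(sorted(deck) == list(range(10)))
```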

### Useful Debugging Tools

### Pretty Print

In [27]:
from pprint import pprint

In [25]:
# Some horrendous data structure
ugly = {
'data': {
'after': 't3_3q8aog',
'before': None,
'kind': 'pagination',
'children': [{}, {}, {}, {}],
'uuid': '40b6f818'} }

ugly['recursive'] = ugly

In [26]:
print ugly

{'data': {'uuid': '40b6f818', 'kind': 'pagination', 'after': 't3_3q8aog', 'children': [{}, {}, {}, {}], 'before': None}, 'recursive': {...}}


In [28]:
pprint(ugly)

{'data': {'after': 't3_3q8aog',
          'before': None,
          'children': [{}, {}, {}, {}],
          'kind': 'pagination',
          'uuid': '40b6f818'},
 'recursive': <Recursion on dict with id=4366001608>}


### Time Short Snippets

In [29]:
import timeit

In [30]:
# Python Interface 
timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)

0.39702486991882324

In [31]:
timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)

0.3775339126586914

In [32]:
timeit.timeit('"-".join(map(str, range(100)))', number=10000)

0.15209102630615234

In [33]:
# Command Line Interface
!python -m timeit '"-".join(str(n) for n in range(100))'

10000 loops, best of 3: 33.7 usec per loop
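
`timeit` also accepts a callable instead of a string, and `timeit.repeat` runs the whole measurement several times so you can take the minimum, which tends to be the least noisy figure. Here is a sketch, with the function name `join_with_map` chosen for this example.

```python
import timeit

def join_with_map():
    return "-".join(map(str, range(100)))

# Time a callable directly; the result is total seconds for `number` calls
total_seconds = timeit.timeit(join_with_map, number=1000)

# Repeat the measurement three times and keep the fastest run
best_of_three = min(timeit.repeat(join_with_map, number=1000, repeat=3))

print(total_seconds)
print(best_of_three)
```

The numbers will differ from machine to machine and from run to run, so compare snippets on the same machine rather than against the figures printed above.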


### Future Statements

`__future__` is a module that lets you opt in to language features from newer Python versions while still running Python 2.x. 

For instance, in Python 3 dividing two ints with `/` returns a float, whereas Python 2 discards the fractional part. 

In [103]:
# Python 2 without the future import: the fractional part is discarded
5/2

2

In [96]:
from __future__ import division

In [97]:
5/2

2.5
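
When you do want whole-number division, the `//` operator and the built-in `divmod` behave the same with or without the future import. A short sketch:

```python
# Floor division always rounds down to a whole number
print(5 // 2)

# Rounding is toward negative infinity, not toward zero
print(-5 // 2)

# divmod returns the quotient and the remainder together
quotient, remainder = divmod(5, 2)
print((quotient, remainder))

# Making one operand a float gives true division in any version
print(5 / 2.0)
```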

### Math

Math has a small collection of useful mathematical operations. 

In [104]:
import math

In [105]:
math.sqrt(9)

3.0

In [106]:
math.factorial(3)

6

In [107]:
math.pow(2,5)

32.0

### Built-in Functions

Check out the [docs](https://docs.python.org/2/library/functions.html) for a complete list of Python's built-in functions. 

In [41]:
any([True, True, False]) # => True

True

In [42]:
all([True, True, False]) # => False

False

In [47]:
range(5)

[0, 1, 2, 3, 4]

In [43]:
int('45')

45

In [44]:
round(123.45, 1)

123.5

In [45]:
sum([1,2,3,4,5])

15
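
Built-ins become more useful in combination. The sketch below pairs two lists with `zip`, orders them with `sorted`, and numbers the results with `enumerate`; the sample names and scores are made up.

```python
scores = [88, 92, 79]
names = ['Ana', 'Ben', 'Cy']

# zip pairs items up; sorted returns a new sorted list
ranked = sorted(zip(scores, names), reverse=True)
print(ranked)

# enumerate yields (index, item) pairs; start=1 makes a 1-based ranking
for place, (score, name) in enumerate(ranked, start=1):
    print("{}. {} ({})".format(place, name, score))

# min, max, and len work on any sequence
print((min(scores), max(scores), len(scores)))
```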

# Third-Party Packages

Third-party packages are packages that are neither built into Python (i.e. the standard library) nor written by you. In Python, third-party packages tend to be open source: developed by the community, for the community, free of charge. 

### Package Management

In order to download and install third-party packages, you'll need to use a package installer. 

**conda** - Try this first (though it is less widely supported by the community). If that doesn't work:

**brew** - Try this second. If that doesn't work:

**pip** - This will usually work (pip is the package manager preferred by the open source community)

**python setup.py install** - Build the package from source

### Python Package Index (PyPI)

The [Python Package Index](https://pypi.python.org/pypi) is a site that lists about **97950** packages that are available for download.  

In [None]:
# Install/Uninstall a package
$ pip install package_name
$ pip uninstall package_name

In [None]:
# Upgrade an existing package to the newest version
$ pip install --upgrade package_name

In [None]:
# Install to Python user-specific install directory
$ pip install --user package_name

# Require specific versions to be installed
$ pip install "package_name==4.1.0"
$ pip install "package_name >= 1.0, != 1.4.0, < 2.0"

# Show information about a particular package
$ pip show package_name

# Search PyPI for matching packages
# (PyPI has since disabled the API behind this command, so it may return an error)
$ pip search query
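
Inside a script, a common pattern for checking whether a third-party package is installed is to attempt the import and handle `ImportError`. Here is a minimal sketch; `numpy` is only an example, and the flag name `HAVE_NUMPY` is an assumption for this illustration.

```python
try:
    import numpy as np
    HAVE_NUMPY = True
except ImportError:
    # The package is missing; record that instead of crashing
    np = None
    HAVE_NUMPY = False

if HAVE_NUMPY:
    print("numpy version: {}".format(np.__version__))
else:
    print("numpy is not installed; try: pip install numpy")
```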

### Popular Third-Party Packages

![](http://static.wixstatic.com/media/7b913d_7d27b1ff5fe54cc79c80d2cc0e319d92~mv2.jpg_256)

[NumPy's](http://www.numpy.org/) array object makes transforming and manipulating matrices and vectors easy. 


![](http://www.cbvl.co.uk/static/upload/consulting/python/scipy.png)

[Scipy](https://www.scipy.org/) has many useful tools that are commonly used for mathematics, science, and engineering.

![](http://www.scipy-lectures.org/_images/scikit-learn-logo.png)

[Sklearn](http://scikit-learn.org/stable/) has an extensive library of machine learning models and feature engineering tools. 

![](http://spark.apache.org/images/spark-logo-trademark.png)

[Apache Spark](http://spark.apache.org/) is a fast and general engine for large-scale data processing.