<table align="center">
   <td align="center"><a target="_blank" href="https://colab.research.google.com/github/ds5110/summer-2021/blob/master/01-Intro.ipynb">
<img src="https://github.com/ds5110/summer-2021/raw/master/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
</table>

# 1 -- Python basics

This notebook introduces the basics of Python.

### References

* [Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/) -- HTML
* [Whirlwind Tour of Python](https://github.com/jakevdp/WhirlwindTourOfPython) -- Jupyter notebooks

### Additional References

* [The Python Tutorial](https://docs.python.org/3/tutorial/index.html) - python.org

# Alternatives to Python: R and Julia

* R is a statistical programming environment and language with strong support from the statistics community.
* Julia is a relatively new language that has been designed from the ground up for high-performance numerical computing, which addresses some performance limitations of Python.
* Check out a recent [Northeastern ranking](https://www.northeastern.edu/graduate/blog/most-popular-programming-languages/) of the 10 most popular languages.

# Python 2 or 3?

Python underwent a big (breaking) change from version 2 to version 3. Python 2 reached end of life in January 2020, and Python 3 is going strong.
* We'll be using Python 3 for everything.
* We'll be using Colab notebooks for everything.


# Python core packages

Python has a collection of core packages that build on the basics. 

* NumPy -- efficient computation with multi-dimensional data arrays
* Pandas -- DataFrame objects to manipulate, filter, group, and transform data
* SciPy -- numerical integration and interpolation
* Matplotlib -- API for creation of publication-quality plots and figures
* Scikit-Learn -- toolkit for common machine learning algorithms
* Jupyter -- interactive notebook environment for exploratory analysis and executable documents
    * Colab is Jupyter as a service with some nice enhancements, including scalable access to GPUs.

We'll be using all of these libraries extensively in this course, and others as well. But first things first...



# Basic syntax

* End-of-line terminates a statement
* Use `\` to continue a statement to the next line
* Comment lines start with `#`
* You can use triple quotes `"""` to begin and end multi-line strings (often used as docstrings or block comments)
* White space matters!!!  (at the beginning of a line)
* Use indentation (at the beginning of the line) for blocks of code
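The continuation and multi-line string rules in the list above can be sketched as follows. This is a minimal illustration; the variable names `total`, `values`, and `text` are made up for the example.

```python
# A backslash continues a statement onto the next line
total = 1 + 2 + \
        3 + 4

# Inside parentheses, brackets, or braces no backslash is needed
values = [1, 2,
          3, 4]

# Triple quotes delimit a string that can span several lines
text = """first line
second line"""

print(total)              # 10
print(len(values))        # 4
print(text.count("\n"))   # 1 (one newline inside the string)
```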

In [None]:
# This is a comment
x = 42  # This line assigns the value of 42 to the variable "x"
for i in range(3):
  # The code block in the for loop is indented
  x += 1
  print(x)

print("What is x now?  A:", x)

43
44
45
What is x now?  A: 45


# Built-in scalar types

| Type        | Example        | Description                                                  |
|-------------|----------------|--------------------------------------------------------------|
| ``int``     | ``x = 1``      | integers (i.e., whole numbers)                               |
| ``float``   | ``x = 1.0``    | floating-point numbers (i.e., real numbers)                  |
| ``complex`` | ``x = 1 + 2j`` | Complex numbers (i.e., numbers with real and imaginary part) |
| ``bool``    | ``x = True``   | Boolean: True/False values                                   |
| ``str``     | ``x = 'abc'``  | String: characters or text                                   |
| ``NoneType``| ``x = None``   | Special object indicating nulls                              |
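As a quick check of the table, the built-in `type()` function reports the type of any value. The list `examples` below is just an illustrative name holding one value of each scalar type.

```python
# One example value for each built-in scalar type in the table
examples = [1, 1.0, 1 + 2j, True, 'abc', None]

# type(x).__name__ gives the type name as a string
type_names = [type(x).__name__ for x in examples]

for value, name in zip(examples, type_names):
    print(repr(value), "->", name)
```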

### Reference

* [05-Built-in-Scalar-Types](https://jakevdp.github.io/WhirlwindTourOfPython/05-built-in-scalar-types.html)

# An aside on numerical precision

Some of the following results may be surprising.

They arise because computers store base-10 (decimal) values as base-2 (binary) with a fixed number of bits.

As a result, some decimal values are represented with binary approximations. These types of roundoff error can accumulate and cause problems in some calculations, depending on the algorithm.



In [None]:
print(.3)
print("0.3 = {0:.17f}".format(0.3))
print("0.1 + 0.2 = {0:.17f}".format(.1 + .2))

# The "assert" statement throws an error if the test condition fails
# The next three statements run without an Error
# The test conditions are all True because of limited numerical precision
assert 0.3 == 3 / 10
assert 0.3 == 0.29999999999999999
assert 0.3 == 0.30000000000000000

# This statement throws an AssertionError if you use "==" instead of "!="
assert 0.1 + 0.2 != 0.3

0.3
0.3 = 0.29999999999999999
0.1 + 0.2 = 0.30000000000000004
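One common way to live with roundoff is to compare floats with a tolerance rather than with `==`. The sketch below uses `math.isclose` and `math.fsum` from the standard library; the variable names are illustrative.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)              # False: exact comparison fails
print(math.isclose(total, 0.3))  # True: equal within a relative tolerance

# Roundoff accumulates when many values are added
naive = sum([0.1] * 10)
exact = math.fsum([0.1] * 10)    # fsum tracks the lost low-order bits
print(naive == 1.0)              # False
print(exact == 1.0)              # True
```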


# Built-in data structures

Python has several built-in compound types, which act as containers for other types:

| Type Name | Example                   | Description                                            |
|-----------|---------------------------|--------------------------------------------------------|
| ``list``  | ``[1, 2, 3]``             | Ordered collection                                     |
| ``tuple`` | ``(1, 2, 3)``             | Immutable ordered collection                           |
| ``dict``  | ``{'a':1, 'b':2, 'c':3}`` | Key-value mapping (insertion-ordered since Python 3.7) |
| ``set``   | ``{1, 2, 3}``             | Unordered collection of unique values                  |

Round, square, and curly brackets have distinct meanings for these collections.
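A short sketch of how the bracket style determines the type, including two cases that often trip people up: empty curly braces make a `dict` (not a `set`), and a one-element tuple needs a trailing comma. The names below are illustrative.

```python
containers = [[1, 2, 3], (1, 2, 3), {'a': 1, 'b': 2}, {1, 2, 3}]
names = [type(c).__name__ for c in containers]
print(names)                  # ['list', 'tuple', 'dict', 'set']

# Edge cases
print(type({}).__name__)      # dict -- empty braces are a dictionary
print(type(set()).__name__)   # set  -- an empty set needs set()
print(type((1)).__name__)     # int  -- parentheses alone just group
print(type((1,)).__name__)    # tuple -- the comma makes the tuple
```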

### Reference 

* [06-Built-in-Data-Structures](https://jakevdp.github.io/WhirlwindTourOfPython/06-built-in-data-structures.html)

## Lists


In [None]:
# a list containing prime numbers
l = [2, 3, 5, 7]

In [None]:
# length of a list
len(l)

4

In [None]:
# append to a list
l.append(11)
l

[2, 3, 5, 7, 11]

In [None]:
# adding lists concatenates them
l = l + [13, 17]
l

[2, 3, 5, 7, 11, 13, 17]

## Beware

..of running cells out of sequence in Jupyter/Colab. It can be very confusing.

For example, try running the cell above a second time.
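What a second run would do can be sketched in plain Python. The helper `run_cell` is a hypothetical stand-in for re-executing the notebook cell `l = l + [13, 17]`; it is not a Jupyter or Colab feature.

```python
l = [2, 3, 5, 7, 11]

def run_cell(current):
    # Stand-in for executing the cell `l = l + [13, 17]` once
    return current + [13, 17]

l = run_cell(l)
print(l)   # [2, 3, 5, 7, 11, 13, 17]

# Running the same cell again appends the values a second time
l = run_cell(l)
print(l)   # [2, 3, 5, 7, 11, 13, 17, 13, 17]
```

The result depends on how many times the cell has run, not on where it sits in the notebook. Using "Restart and run all" is a reliable way to get back to a known state.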

In [None]:
# sort() method sorts in-place
l = [2, 5, 1, 6, 3, 4]
l.sort()
l

[1, 2, 3, 4, 5, 6]

In [None]:
# Python is dynamically typed, so a list can hold elements of different types
# Note: you can't sort this list, because Python can't compare e.g. an int with a str
l = [1, 'two', 3.14, [0, 3, 5]]
l

[1, 'two', 3.14, [0, 3, 5]]
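The sketch below shows what happens if you try to sort such a list, and one possible workaround: supplying a `key` function that maps every element to a common type. Sorting by `str` is only an illustration; whether that ordering is meaningful depends on your data.

```python
mixed = [1, 'two', 3.14, [0, 3, 5]]

# Sorting fails because mixed types can't be compared with "<"
try:
    sorted(mixed)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)

# A key function gives every element a comparable value (here, its text form)
by_text = sorted(mixed, key=str)
print(by_text)   # [1, 3.14, [0, 3, 5], 'two']
```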

## Indexing and slicing lists


<img src="https://github.com/jakevdp/WhirlwindTourOfPython/raw/master/fig/list-indexing.png" width=500>

In [None]:
# indexing starts at 0
l = [2, 3, 5, 7]
l[0]

2

In [None]:
# the last element in a list
l[-1]

7

In [None]:
# slicing -- start index (inclusive) : end index (exclusive)
l[0:3]

[2, 3, 5]

In [None]:
# you can use a step size
l[0:5:2]

[2, 5]

In [None]:
# if you leave out values, you get sensible defaults
print(l[:])
print(l[:3])
print(l[3:])
print(l[::2])

[2, 3, 5, 7]
[2, 3, 5]
[7]
[2, 5]


## Tuples

Tuples are like lists, but immutable: once created, they cannot be changed.

In [None]:
# defined with parentheses, not square brackets
t = (2, 3, 5, 7)

In [None]:
# or no parentheses at all
t = 1, 2, 3
t

(1, 2, 3)

In [None]:
# tuples can be unpacked into separate variables
a, b, c = t
print(a, b, c)

1 2 3


In [None]:
# indexed and sliced like lists
t[:3]

(1, 2, 3)

In [None]:
# since they're immutable, the next line will throw an error
#t.append(4)

In [None]:
# they're used when a function returns multiple values
x = 0.125
x.as_integer_ratio()

(1, 8)

In [None]:
# and they can be unpacked as you might expect
numerator, denominator = x.as_integer_ratio()
denominator

8

## Dictionaries

key-value pairs -- mutable -- insertion-ordered since Python 3.7 (older notebook front ends may still display keys sorted)

In [None]:
# defined with curly braces
numbers = {'one':1, 'two':2, 'three':3}
numbers['two']  # accessed with keys using square brackets

2

In [None]:
# mutable
numbers['guess'] = 5
numbers

{'guess': 5, 'one': 1, 'three': 3, 'two': 2}

In [None]:
numbers[3] = 42
numbers

{3: 42, 'guess': 5, 'one': 1, 'three': 3, 'two': 2}
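A few more everyday dictionary operations, sketched with the same kind of data. All of the calls are standard `dict` methods; the key `'four'` is deliberately absent.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# Test for a key with "in"
print('two' in numbers)         # True

# get() returns None (or a default you supply) instead of raising KeyError
print(numbers.get('four'))      # None
print(numbers.get('four', 0))   # 0

# Iterate over (key, value) pairs
for key, value in numbers.items():
    print(key, value)

# Delete an entry
del numbers['one']
print(list(numbers))            # ['two', 'three']
```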

## Set

unordered collection of unique values

they support mathematical operations of sets (union, intersection, etc.)

In [None]:
# defined with curly brackets
s = {1, 1, 9, 13, 3}
s

{1, 3, 9, 13}

In [None]:
# you can't access elements of a set directly
# s[0] throws an error
# but you can easily turn a set into a list
list(s)

[1, 3, 13, 9]
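The mathematical set operations mentioned above are available as operators. In this sketch the sets `primes` and `odds` are made-up examples, and `sorted()` is used only to print the results in a predictable order.

```python
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

print(sorted(primes | odds))   # union: [1, 2, 3, 5, 7, 9]
print(sorted(primes & odds))   # intersection: [3, 5, 7]
print(sorted(primes - odds))   # difference: [2]
print(sorted(primes ^ odds))   # symmetric difference: [1, 2, 9]
print(3 in primes)             # membership test: True
```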

# Documentation

* online -- https://docs.python.org/3/library/ -- standard library
  * For example: [lists](https://docs.python.org/3/library/stdtypes.html#lists)
* `help(list)` -- terminal, Jupyter, Colab
* `list?` -- Jupyter, Colab
* `list` -- Colab (on hover)
* stackoverflow -- but be careful

In [None]:
# In Colab (as you type parentheses) online documentation is automatic.
list()

[]

In [None]:
# With Colab and Jupyter, you can execute a statement with question mark
list?

In [None]:
# You can uncomment and run the next line, which also works in a terminal (command line)
# Note: the slash `/` in the signature indicates that the preceding arguments are positional-only
# https://www.python.org/dev/peps/pep-0436/#functions-with-positional-only-parameters
# You can find this out with careful use of stackoverflow and google
# Try googling: "python help slash as argument"
# help(list)

# Control flow

Indentation is critically important in Python!

* if-elif-else
* for loops (with iterators) and while loops

* [07-Control-Flow-Statements](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/07-Control-Flow-Statements.ipynb)



In [None]:
# if-elif-else -- note the indentation!
x = -15

if x == 0:
    print(x, "is zero")
elif x > 0:
    print(x, "is positive")
elif x < 0:
    print(x, "is negative")
else:
    print(x, "is unlike anything I've ever seen...")

-15 is negative


In [None]:
for N in [2, 3, 5, 7]:
    print(N, end=' ') # print all on same line

2 3 5 7 

In [None]:
# the range object generates a sequence in a for loop
for i in range(3, 10):
    print(i)

In [None]:
# note that range() itself is not a list
range(3,10)

range(3, 10)

In [None]:
# but you can turn it into a list
list(range(3,10))

[3, 4, 5, 6, 7, 8, 9]

In [None]:
# while repeats the block as long as its condition is true
i = 5
while i < 10:
  print(i, end=" ")
  i += 1

5 6 7 8 9 

# Control flow with `continue` and `break`

`continue` will skip to the next iteration (you could achieve the same thing with `if-else`)

`break` will exit the loop entirely


In [None]:
for n in range(20):
    # if the remainder of n / 2 is 0, skip the rest of the loop
    if n % 2 == 0:
        continue
    print(n, end=' ')

1 3 5 7 9 11 13 15 17 19 

In [None]:
# Fill a list with all Fibonacci numbers up to a certain value
a, b = 0, 1
amax = 100
l = []

while True:
    (a, b) = (b, a + b)
    if a > amax:
        break
    l.append(a)

print(l)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]


# Functions

`*args` and `**kwargs`





In [None]:
# print is a function
print()

In [None]:
# keyword arguments must come at the end
print(1, 2, 3, sep='--')

1--2--3


In [None]:
# Notice what happens when you leave out the keyword
print(1, 2, 3, '--')

1 2 3 --


# Defining functions

`def` and `lambda`

Use functions to create clean, readable and reusable (DRY) code.

In [None]:
def fibonacci(N):
    L = []
    a, b = 0, 1
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

# Note that a & b aren't accessible outside the function
fibonacci(5)

[1, 1, 2, 3, 5]

In [None]:
# You can return multiple values
def fibonacci(N):
    L = []
    a, b = 0, 1
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L, a, b

fibonacci(5)

([1, 1, 2, 3, 5], 5, 8)

In [None]:
# You can define user-modifiable default values with keyword arguments
def fibonacci(N, a=0, b=1):
    L = []
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

print(fibonacci(5, 6, 5))
print(fibonacci(5, 0, 1))

[5, 11, 16, 27, 43]
[1, 1, 2, 3, 5]


## flexible arguments

`*args` -- arguments (expand as sequence)

`**kwargs` -- keyword arguments (expand as a dictionary)

In [None]:
def catch_all(*args, **kwargs):
    print("args =", args)
    print("kwargs = ", kwargs)

catch_all()
catch_all(1, 2, 3, a=4, b=5)

args = ()
kwargs =  {}
args = (1, 2, 3)
kwargs =  {'a': 4, 'b': 5}


In [None]:
# variable names "args" and "kwargs" are simply conventions
inputs = [1,4,5]
keywords = {'pi': 3.14, "42": 3}

catch_all(*inputs, **keywords)

args = (1, 4, 5)
kwargs =  {'pi': 3.14, '42': 3}


## lambda functions

anonymous functions -- one-liners

In [None]:
add = lambda x, y: x + y
add(1, 2)

3

In [None]:
# sorted() returns a new sorted list from any iterable
sorted([3, 1, 2])

In [None]:
# a sample dataset (list of dictionaries)
data = [{'first':'Guido', 'last':'Van Rossum', 'YOB':1956},
        {'first':'Grace', 'last':'Hopper',     'YOB':1906},
        {'first':'Alan',  'last':'Turing',     'YOB':1912}]

# the next line throws a TypeError
# sorted(data) 

In [None]:
# sorted() has an optional key argument
# The "key" keyword allows you to specify a function that customizes the sort order
# For example, you can sort alphabetically by first name as follows
# Q: How would you sort by year of birth?
sorted(data, key=lambda item: item['first'])

[{'YOB': 1912, 'first': 'Alan', 'last': 'Turing'},
 {'YOB': 1906, 'first': 'Grace', 'last': 'Hopper'},
 {'YOB': 1956, 'first': 'Guido', 'last': 'Van Rossum'}]
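One possible answer to the question in the cell above: pass a `key` function that returns the year of birth. The data is repeated here so the sketch runs on its own; `by_year` and `youngest_first` are illustrative names.

```python
data = [{'first': 'Guido', 'last': 'Van Rossum', 'YOB': 1956},
        {'first': 'Grace', 'last': 'Hopper',     'YOB': 1906},
        {'first': 'Alan',  'last': 'Turing',     'YOB': 1912}]

# Sort by year of birth, earliest first
by_year = sorted(data, key=lambda item: item['YOB'])
print([item['first'] for item in by_year])         # ['Grace', 'Alan', 'Guido']

# reverse=True flips the order
youngest_first = sorted(data, key=lambda item: item['YOB'], reverse=True)
print([item['first'] for item in youngest_first])  # ['Guido', 'Alan', 'Grace']
```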

# Errors and exceptions

Catch exceptions with `try` and `except`

In [None]:
# This example throws a ZeroDivisionError
# Python provides helpful information, including the line where the error occurred
1/0

ZeroDivisionError: division by zero

In [None]:
# You can define a custom function that knows what to do with 1/0
# So you can catch the error before it interrupts execution of your code
def safe_divide(a, b):
    try:
        return a / b
    except:
        return 1E100

safe_divide(1,0)

1e+100

In [None]:
# But you may want to be more specific about the error you catch
print(safe_divide(1, 'hello'))

def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return 1E100
    except:
        return "that just doesn't make sense"

print(safe_divide(1, 'hello'))

1e+100
that just doesn't make sense


In [None]:
# You can raise your own errors
#raise RuntimeError("Ouch, that hurt!")

In [None]:
# You can create a custom error with class inheritance
class MySpecialError(ValueError):
    pass

#raise MySpecialError("here's the message")

In [None]:
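# Then you can fine-tune the way you treat errors of various types
# Note: except clauses are checked in order, and MySpecialError is a subclass
# of ValueError, so the first clause catches it and the second is never reached.
# List the more specific exception first if you want it handled separately.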
try:
    print("do something")
    raise MySpecialError("[informative error message here]")
except ValueError:
    print("caught this one, but not the previous")
except MySpecialError:
    print("do something else")

do something
caught this one, but not the previous


In [None]:
# And you can also use...try-except-else-finally
try:
    print("try something here")
except:
    print("this happens only if it fails")
else:
    print("this happens only if it succeeds")
finally:
    print("this happens no matter what")
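To see which branches run in each case, here is a small sketch that records them. The function `parse_int` and its `log` list are hypothetical names used only for this illustration.

```python
def parse_int(text):
    """Convert text to an int, recording which branches ran."""
    log = []
    try:
        value = int(text)
    except ValueError:
        log.append("except")    # runs only if int() failed
        value = None
    else:
        log.append("else")      # runs only if int() succeeded
    finally:
        log.append("finally")   # runs in both cases
    return value, log

print(parse_int("42"))      # (42, ['else', 'finally'])
print(parse_int("forty"))   # (None, ['except', 'finally'])
```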

# Iterators

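An *iterable* is any Python object you can loop over; an *iterator* is the object that actually produces the values, one at a time, each time you call `next()` on it. Lists and `range()` objects are iterables, and the built-in `iter()` function returns an iterator for them.

Once you create a list, the entire list resides in memory. That means it's limited in size. 

Iterators are more general: they can produce values one at a time without ever holding the whole sequence in memory.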

In [None]:
# As we've seen, you can iterate over a list
# The "for" statement creates an iterator from the list.
# In other words, you can iterate over the list because lists are iterable
for value in [2, 4, 6, 8, 10]:
    print(value + 1, end=' ')

3 5 7 9 11 

In [None]:
# In the previous statement, Python is actually using the
# iterator, which you can get using the built-in "iter()" function
a = iter([2, 4, 6, 8, 10])

In [None]:
# You can use the built-in `next()` function to access values from iterators
# Run this cell repeatedly to see how it works.
print(next(a))

2
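If you keep calling `next()`, the iterator eventually runs out and raises `StopIteration`, which is how a `for` loop knows when to stop. A minimal sketch with a two-element list (the names `values` and `it` are illustrative):

```python
values = [2, 4]
it = iter(values)

print(next(it))   # 2
print(next(it))   # 4

# A third call has nothing left to return
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
    print("iterator is exhausted")

# next() accepts a default to return instead of raising
print(next(it, 'done'))      # done

# The list itself is untouched and can be iterated again
print(list(iter(values)))    # [2, 4]
```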


In [None]:
# You cannot create a list of length `N` because it would overrun memory.
# But you can iterate over `N` values with an iterator.
N = 10 ** 12
for i in range(N):
    if i >= 10: break
    print(i, end=', ')

# Some Python iterators

* `enumerate`
* `zip`
* `map`
* `filter`

In [None]:
# For example, if you need to keep track of the index while iterating, you can do this...
L = [2, 4, 6, 8, 10]
for i in range(len(L)):
    print(i, L[i])

0 2
1 4
2 6
3 8
4 10


In [None]:
# Or you can do the same thing more succinctly with `enumerate`
for i, val in enumerate(L):
    print(i, val)

0 2
1 4
2 6
3 8
4 10


In [None]:
# `zip` lets you iterate over multiple lists
L = [2, 4, 6, 8, 10]
R = [3, 6, 9, 12, 15]
for lval, rval in zip(L, R):
    print(lval, rval)

2 3
4 6
6 9
8 12
10 15


In [None]:
# Q: And what if you want to keep track of the index using zip?
L = [2, 4, 6, 8, 10]
R = [3, 6, 9, 12, 15]
for i, (lval, rval) in enumerate(zip(L, R)):
    print(i, lval, rval)


0 2 3
1 4 6
2 6 9
3 8 12
4 10 15


In [None]:
# the `map` iterator applies a function to each item of an iterable
# find the first 10 square numbers
square = lambda x: x ** 2
for val in map(square, range(10)):
    print(val, end=' ')

0 1 4 9 16 25 36 49 64 81 

In [None]:
# the `filter` iterator is similar, 
# except it passes only the values for which the function evaluates to true
is_even = lambda x: x % 2 == 0
for val in filter(is_even, range(10)):
    print(val, end=' ')

0 2 4 6 8 

## iterators as function arguments

In [None]:
# you can also pass iterators as function arguments
# using "*args" notation to expand the iterator
print(*range(10))

0 1 2 3 4 5 6 7 8 9


In [None]:
# likewise, here's a more concise implementation of the map cell above
print(*map(lambda x: x ** 2, range(10)))

0 1 4 9 16 25 36 49 64 81


In [None]:
# You can do the same thing with the zip iterator
# Note that this prints a sequence of 4 2-element tuples
L1 = (1, 2, 3, 4)
L2 = ('a', 'b', 'c', 'd')
z = zip(L1, L2)
print(*z)

(1, 'a') (2, 'b') (3, 'c') (4, 'd')


In [None]:
# You can apply zip again to write a sequence of 2 4-element tuples 
# This effectively unzips the zip iterator
z = zip(L1, L2)
for a in zip(*z):
  print(a)

z = zip(L1, L2)
print('more succinctly:', *zip(*z))

(1, 2, 3, 4)
('a', 'b', 'c', 'd')
more succinctly: (1, 2, 3, 4) ('a', 'b', 'c', 'd')


## itertools has some other iterators

* permutations
* combinations
* product

In [None]:
from itertools import permutations
p = permutations(range(3))
print(*p)

(0, 1, 2) (0, 2, 1) (1, 0, 2) (1, 2, 0) (2, 0, 1) (2, 1, 0)


In [None]:
from itertools import combinations
c = combinations(range(4), 2)
print(*c)

(0, 1) (0, 2) (0, 3) (1, 2) (1, 3) (2, 3)


In [None]:
from itertools import product
p = product('ab', range(3))
print(*p)

('a', 0) ('a', 1) ('a', 2) ('b', 0) ('b', 1) ('b', 2)


# List comprehensions

Readable one-line iteration

`[expr for var in iterable]`

In [None]:
# The verbose way
L = []
for n in range(12):
    L.append(n ** 2)
L

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

In [None]:
# The succinct way
[n ** 2 for n in range(12)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

In [None]:
# multiple iterators on one line
[(i, j) for i in range(2) for j in range(3)]

[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

In [None]:
# conditionals on the iterator
[val for val in range(20) if val % 3 > 0]

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

In [None]:
# conditionals on the value
[val if val % 2 else -val for val in range(20) if val % 3 > 0]

[1, -2, -4, 5, 7, -8, -10, 11, 13, -14, -16, 17, 19]

## set and dictionary comprehensions

In [None]:
# curly braces for set comprehensions
{n**2 for n in range(12)}

{0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121}

In [None]:
# with a colon for dict comprehensions
{n:n**2 for n in range(6)}

In [None]:
# use parentheses to get a generator expression, i.e., an iterator that computes values lazily
(n**2 for n in range(6))

<generator object <genexpr> at 0x7f7180b7f850>

In [None]:
list((n**2 for n in range(6)))

[0, 1, 4, 9, 16, 25]

# Generators

A list is a collection of values -- this consumes memory.

A generator is a recipe for creating values -- the values are created when needed.

Lists and generators can both be iterated over in the same way.
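The memory difference can be made visible with `sys.getsizeof`. This is a rough sketch: `getsizeof` reports only the size of the container object itself (not the integers it refers to), and the exact byte counts vary between Python versions, so only the comparison matters.

```python
import sys

n = 100_000
squares_list = [i ** 2 for i in range(n)]   # all values stored at once
squares_gen = (i ** 2 for i in range(n))    # values produced on demand

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(list_bytes > gen_bytes)               # True

# Both give the same values when iterated
same_total = sum(squares_gen) == sum(squares_list)
print(same_total)                           # True
```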

In [None]:
# Generators can continue indefinitely.  For example...
from itertools import count  # count() yields 0, 1, 2, ... without end
factors = [2, 3, 5, 7]
G = (i for i in count() if all(i % n > 0 for n in factors))
for val in G:
    print(val, end=' ')
    if val > 40: break

In [None]:
# Unlike lists, generators are single-use for iteration
G = (n ** 2 for n in range(12))
print('first time:', list(G))
print('second time:', list(G))

first time: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
second time: []


In [None]:
# This might be useful if you want to stop and restart. For example, suppose
# you're processing a lot of files. You may want to pause every so often.
G = (n**2 for n in range(12))
for n in G:
    print(n, end=' ')
    if n > 30: break

print("\ndoing something in between")

for n in G:
    print(n, end=' ')

0 1 4 9 16 25 36 
doing something in between
49 64 81 100 121 

## `yield`

In [None]:
# Just as there are two ways of constructing a list
L1 = [n ** 2 for n in range(12)]

L2 = []
for n in range(12):
    L2.append(n ** 2)

print(L1)
print(L2)

In [None]:
# There are two ways of constructing a generator
G1 = (n ** 2 for n in range(12))

def gen():
    for n in range(12):
        yield n ** 2

G2 = gen()
print(*G1)
print(*G2)

0 1 4 9 16 25 36 49 64 81 100 121
0 1 4 9 16 25 36 49 64 81 100 121


In [None]:
# VanderPlas has an interesting use of generators for generating prime numbers
def gen_primes(N):
    """Generate primes up to N"""
    primes = set()
    for n in range(2, N):
        if all(n % p > 0 for p in primes):
            primes.add(n)
            yield n

print(*gen_primes(100))

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97


In [None]:
# By the way, check out the function documentation....
help(gen_primes)

Remember, generators are single use, but you can always recreate the generator to start from scratch.

# Modules and packages

* `import` statement

The import statement preserves a module's content in a namespace.

You can then access the contents of the module using "dot" notation.

In [None]:
# you can import an entire module
import math
math.cos(math.pi)

-1.0

In [None]:
# you can also import with an alias
import numpy as np
np.cos(np.pi)

-1.0

In [None]:
# and you can import only what you want
from math import cos, pi
cos(pi)

-1.0

In [None]:
# you can import the entire namespace, but this is dangerous!
# imported names can silently replace existing ones, including built-ins
from math import *
sin(pi) ** 2 + cos(pi) ** 2
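One reason the star import is risky: `math` defines a `pow` function, and so do Python's built-ins. After `from math import *`, the bare name `pow` refers to `math.pow`. The sketch below calls both versions by their full names so the difference is visible; the result variables are illustrative names.

```python
import builtins
import math

# The built-in pow keeps integers as integers and accepts a modulus
builtin_result = builtins.pow(2, 3)
modular_result = builtins.pow(2, 3, 5)   # (2 ** 3) % 5

# math.pow always returns a float and takes exactly two arguments
math_result = math.pow(2, 3)
try:
    math.pow(2, 3, 5)
    three_args_ok = True
except TypeError:
    three_args_ok = False

print(builtin_result, modular_result, math_result, three_args_ok)
# 8 3 8.0 False
```

Code that relied on the built-in behavior would change or break after the star import, with no warning at the import line.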

## useful built-in modules

* `os` and `sys`: Tools for interfacing with the operating system, including navigating file directory structures and executing shell commands
* `math` and `cmath`: Mathematical functions and operations on real and complex numbers
* `itertools`: Tools for constructing and interacting with iterators and generators
* `functools`: Tools that assist with functional programming
* `random`: Tools for generating pseudorandom numbers
* `pickle`: Tools for object persistence: saving objects to and loading objects from disk
* `json` and `csv`: Tools for reading JSON-formatted and CSV-formatted files.
* `urllib`: Tools for doing HTTP and other web requests.

Reference: [Python standard library](https://docs.python.org/3/library/)

## third-party modules

Many of these will need to be "installed" before they can be "imported". 
The standard registry for third-party packages
is the *Python Package Index* (PyPI), and packages are installed from it using the `pip` command. We'll be using this later in the course.

Reference: [PyPI](https://pypi.org/)

# Strings and regular expressions

* define string with single or double quotes
* use triple quotes for multi-line strings

In [None]:
# define strings with single or double quotes
x = 'a string'
y = "a string"
x == y

True

In [None]:
a = """this
is
a 
multi-line string"""
print(a)
a

this
is
a 
multi-line string


'this\nis\na \nmulti-line string'

## string functions

Capitalization

* `upper`
* `lower`
* `capitalize`
* `title`
* `swapcase`

Whitespace

* `strip` -- strip characters from both ends, whitespace by default
* `lstrip` -- strip just from the left
* `rstrip`
* `center` -- center string with prescribed length
* `ljust`
* `rjust`
* `zfill` -- left-padding with zeroes

Substrings

* `find(substring)` -- index of substring, -1 if it's not there
* `rfind` -- start from right
* `index` -- like find, but raises a `ValueError` if the substring is not there
* `rindex`
* `replace` -- replace all occurrences

Splitting

* `partition`
* `split`
* `splitlines`
* `join` -- the inverse of `split`: joins a sequence of strings with a separator
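A few of the methods listed above in action. The input strings are made up for illustration; every call is a standard `str` method.

```python
line = '  the quick brown fox  '

# Whitespace and capitalization
stripped = line.strip()
print(stripped)                      # the quick brown fox
print(stripped.title())              # The Quick Brown Fox
print('42'.zfill(5))                 # 00042
print('fox'.center(9, '*'))          # ***fox***

# Substrings
print(stripped.find('cat'))          # -1 (not found)
print('spam spam'.replace('spam', 'eggs'))   # eggs eggs

# Splitting
print('a=b=c'.partition('='))        # ('a', '=', 'b=c')
print('a,b,c'.split(','))            # ['a', 'b', 'c']
print('one\ntwo'.splitlines())       # ['one', 'two']
```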



In [None]:
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')

16

In [None]:
line.split()

['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'a', 'lazy', 'dog']

In [None]:
# split and rejoin
"-".join(line.split())

'the-quick-brown-fox-jumped-over-a-lazy-dog'

## format strings

In [None]:
# string conversion
pi = 3.14159
str(pi)

'3.14159'

In [None]:
# string concatenation
"the value of pi is: " + str(pi)

'the value of pi is: 3.14159'

In [None]:
# format strings allow more control
"The value of pi is {0}, and 2pi is {1}".format(pi, 2 * pi)

'The value of pi is 3.14159, and 2pi is 6.28318'

In [None]:
# with explicit formatting
"pi = {0:.2f}".format(pi)

'pi = 3.14'
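Python 3.6 and later also support f-strings, which put the expression directly inside the braces and accept the same format specifications as `str.format`. A short sketch (the variable names are illustrative):

```python
pi = 3.14159

two_places = f"pi = {pi:.2f}"     # fixed-point, 2 decimal places
padded = f"{pi:10.3f}|"           # width 10, right-aligned
zero_filled = f"{42:05d}"         # integer padded with zeros
percent = f"{0.256:.1%}"          # percentage with 1 decimal place

print(two_places)    # pi = 3.14
print(padded)        #      3.142|
print(zero_filled)   # 00042
print(percent)       # 25.6%
```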

## regular expressions

This mini-language offers remarkable power for string manipulation, but it can seem a bit obscure.

* Reference: [Section 14 of VanderPlas](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/14-Strings-and-Regular-Expressions.ipynb)
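As a small taste, here is a sketch using the standard `re` module on the sentence from the string examples. The patterns are illustrative choices, not taken from the reference.

```python
import re

line = 'the quick brown fox jumped over a lazy dog'

# Split on runs of whitespace (\s+ means "one or more whitespace characters")
words = re.split(r'\s+', line)
print(len(words))        # 9

# Find all words that are exactly four letters long
four_letter = re.findall(r'\b\w{4}\b', line)
print(four_letter)       # ['over', 'lazy']

# Replace each run of whitespace with a hyphen
hyphenated = re.sub(r'\s+', '-', line)
print(hyphenated)        # the-quick-brown-fox-jumped-over-a-lazy-dog

# Capture the word that comes right before "fox"
match = re.search(r'(\w+) fox', line)
print(match.group(1))    # brown
```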