# Python Reference Guide to Really Understanding Python and How it Works
Python is a powerful, portable, extensible, and productive language that runs on basically everything. You can use it to make a video game, perform data science, launch a rocket, or make a website. The choice is yours.

In [9]:
import this #Easter Egg on the philosophy of Python

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Basics and Using Files

When you run a Python program, each line is passed through the Python interpreter, the version of which depends on what you're using (this notebook uses Python 3.5). More specifically, the source code is compiled into byte code, which is then executed by the Python Virtual Machine (PVM), the execution engine of the interpreter. The byte code of imported modules is cached in .pyc files for later use. Unlike C or C++, there is no separate 'build' or 'make' step. As a result of this architecture, Python is typically neither as slow as a traditional interpreted language nor as fast as a traditional compiled language.
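
To make the compile-then-execute pipeline a little more concrete, here is a minimal sketch using the built-in compile function and the standard dis module. The source string and the '<demo>' filename are made up for illustration, and the exact byte code listing varies between Python versions.

```python
import dis

# Compile a source string into a code object (the byte code container).
source = "2 ** 100"
code = compile(source, "<demo>", "eval")   # "<demo>" is just a placeholder filename

# The raw byte code is stored on the code object as a bytes value.
print(type(code.co_code))

# dis prints a human-readable listing; the instructions differ by version.
dis.dis(code)

# The virtual machine executes the code object when we evaluate it.
result = eval(code)
print(result)
```

You rarely need to do this by hand; the interpreter performs the same steps automatically every time you run or import a file.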

In [10]:
print('Hello World')
print(2 ** 100)

Hello World
1267650600228229401496703205376


The script above prints 'Hello World' and the value of 2 to the power of 100. If we start the Python interactive prompt from a terminal, we don't have to call 'print' to see a result, but here in Jupyter a cell only displays the value of its last expression, so we print anything else we want to see. The script below shows the use of the sys module, which lets you get information about, and interact with, your computer system. In this case, we print sys.platform, which changes depending on the platform you are running this notebook on. Also notice the string repetition that occurs when you 'multiply' a string.

In [2]:
import sys
print(sys.platform)
x = 'spam!' * 8
print(x)

darwin
spam!spam!spam!spam!spam!spam!spam!spam!


Note the 'import' statement. You use this statement to import any Python module that you have available. A Python module is any Python file: you can use those provided or create your own. You simply "import filename" (without the .py extension) to get access to all of the attributes in the file, then write filename.attr to select the attribute you need.

In [1]:
import module1

Hello module world!


We created a simple module1.py file and imported it. All it does is print out "Hello module world!"

In [14]:
from sys import platform
print(platform)
x = 'spam!' * 8
print(x)

darwin
spam!spam!spam!spam!spam!spam!spam!spam!


Using the 'from' statement, you can pull out a specific attribute using the syntax 'from filename import attr'. You can then reference the attribute without the dot notation. Above you see the same code as before, but we can write 'platform' instead of 'sys.platform'.
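
As a small sketch of the difference between the two import forms, the snippet below uses the standard math module (chosen only for illustration). Both forms give access to the same underlying object; 'from' simply copies the name into the current namespace.

```python
import math
from math import sqrt

# Both names refer to the very same function object.
same_object = sqrt is math.sqrt
print(same_object)

# With 'from', no module prefix is needed.
root = sqrt(16)
print(root)
```

One trade-off to keep in mind: a name brought in with 'from' silently replaces any variable of the same name that already exists in your file.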

## Types and Operations

In python, we do STUFF with THINGS (operations with types). Python programs can be decomposed into a hierarchy:
1. Programs consist of modules
2. Modules contain statements
3. Statements contain expressions
4. Expressions create and process objects

Python comes with a bevy of built-in types that you should take advantage of to save time and be more efficient. They are also usually going to be much faster than anything you could write yourself.

In [20]:
#Numbers
1234, 3.1315, 3+4j, 0b111

#Strings
'spam', "Bob's", b'a\x01c', u'sp\xc4m'

#Lists
[1, [2, 'three'], 4.5], list(range(10))

#Dictionaries
{'food': 'spam', 'taste': 'yum'}, dict(hours=10)

#Tuples
(1, 'spam', 4, 'U'), tuple('spam')

#Files
open('eggs.txt')

#Sets
set('abc'), {'a', 'b', 'c'}

#Booleans, Types, None
True, False, type, None

#Program unit types
#functions, modules, classes

#Implementation-related types
#compiled code, stack tracebacks

(True, False, type, None)

There are many other types, but these are the core data types in Python, built into the language.
Python is dynamically typed, meaning that you don't declare variable types: rather, the interpreter tracks the type of each object at run time.
Python is also strongly typed, meaning that you can only perform string operations on a string, number operations on a number, etc.
Let's take a look at each one in depth.
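
A short sketch of what strong typing means in practice: mixing a string and a number is an error rather than a silent conversion. The variable names here are arbitrary.

```python
value = 'spam'

try:
    value + 1              # str + int is not defined
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# An explicit conversion makes the intent clear.
combined = value + str(1)
print(combined)
```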

### Numbers

In [26]:
print(123+122)
print(1.5*4)
print(2 ** 100)

245
6.0
1267650600228229401496703205376


Numbers in Python are just like in most languages: they support basic mathematical operations and come in a variety of flavours: integers, floating point, decimals, rationals, complex numbers, etc.
Python 3 integers are automatically "big ints", unlike in Python 2 where that's a separate type (long). As a result, you can do some pretty big calculations, but they could take a while. The code below converts a huge number to a string using the str() function, then takes the length using the len() function. That number has a lot of digits!

In [30]:
len(str(2 ** 1000000))

301030

Remember imports? The 'math' module is a great resource for advanced mathematical operations and values.

In [31]:
import math
print(math.pi)
print(math.sqrt(85))

3.141592653589793
9.219544457292887


The random module can help you generate some random numbers for your code

In [35]:
import random
print(random.random())
print(random.choice([1, 2, 3, 4]))

0.9586726154443163
1


### Strings

Strings are sequences of one-character strings. They are used for anything text-based, from single words to whole documents (raw bytes, such as the contents of an image file, are handled by the related bytes type). As sequences, strings support sequence operations. We can access parts of the string using indexes, with the first character being represented as S[0]. We can also index backwards using negatives.

In [41]:
S = 'spam'
print(len(S))
print(S[0])
print(S[1])
print(S[-1])

4
s
p
m


We can also slice and dice our strings using an indexing technique called 'slicing'. We pick out the characters starting from one index and going up to, but not including, a second index. Omitting a value defaults to 0 on the left and the length of the string on the right.

In [43]:
print(S[1:3])
print(S[1:])
print(S[:2])

pa
pam
sp


You can concatenate strings using '+' to create new strings

In [45]:
print(S + 'xyz')
print(S)

spamxyz
spam


However, this didn't change S... in python, strings are IMMUTABLE. This means they cannot be changed in place once they are created. Any change is reflected in a new string.
Numbers, strings, and tuples are immutable.
Lists, dictionaries and sets are mutable.
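
The following sketch contrasts the two behaviours: an in-place change to a string fails, while the same change to a list succeeds.

```python
S = 'spam'

try:
    S[0] = 'z'             # in-place change is not allowed on a string
except TypeError as err:
    print('TypeError:', err)

# Build a new string instead and rebind the name.
S = 'z' + S[1:]
print(S)

# A list, by contrast, can be changed in place.
L = list('spam')
L[0] = 'z'
joined = ''.join(L)
print(joined)
```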

All of the operations we've done on strings so far have been operations you can perform on any sequence, and are not necessarily specific to strings. Strings also have type-specific methods (as do most other objects) such as find, replace, split, etc.

In [46]:
S = 'Spam'
S.find('pa')                 # Find the offset of a substring in S

1

In [47]:
S.replace('pa', 'XYZ')       # Replace occurrences of a string in S with another

'SXYZm'

In [48]:
line = 'aaa,bbb,ccccc,dd'
line.split(',')              # Split on a delimiter into a list of substrings

['aaa', 'bbb', 'ccccc', 'dd']

In [51]:
S = 'spam'
print(S.upper())                   # Upper- and lowercase conversions
S.isalpha()                  # Content tests: isalpha, isdigit, etc.

SPAM


True

In [52]:
line = 'aaa,bbb,ccccc,dd\n'
line.rstrip()                # Remove whitespace characters on the right side
line.rstrip().split(',')     # Combine two operations

['aaa', 'bbb', 'ccccc', 'dd']

We can also format strings in Python in a few different ways, but they all boil down to a template string containing placeholders, which are replaced, in order, by the values you supply.

In [53]:
print('%s, eggs, and %s' % ('spam', 'SPAM!'))        # Formatting expression (all)

print('{0}, eggs, and {1}'.format('spam', 'SPAM!'))   # Formatting method (2.6+, 3.0+)

print('{}, eggs, and {}'.format('spam', 'SPAM!'))    # Numbers optional (2.7+, 3.1+)

spam, eggs, and SPAM!
spam, eggs, and SPAM!
spam, eggs, and SPAM!


If you ever forget a method, look up the documentation or use the built-in dir function, which lists all of an object's attributes; then pass one to the help function to get more details.

In [54]:
dir(S)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [55]:
help(S.replace)

Help on built-in function replace:

replace(...) method of builtins.str instance
    S.replace(old, new[, count]) -> str
    
    Return a copy of S with all occurrences of substring
    old replaced by new.  If the optional argument count is
    given, only the first count occurrences are replaced.



Several special characters exist, written with a backslash escape (a backslash can also escape quote characters and other backslashes).

In [64]:
print("hello\n\t") #\n = newline \t = tab
print("A\0B\0C") #binary zero byte

hello
	
A B C


Single quotes or double? Use either; it just means the other one doesn't have to be escaped in the string. Use """ to create multiline strings, which is useful for XML or HTML embedded in your Python code.

In [67]:
msg = """
aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc
"""
print(msg)


aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc



In python 3, all strings handle unicode just fine. We can also work with byte strings

In [68]:
print('sp\xc4m')                   # 3.X: normal str strings are Unicode text
print(b'a\x01c') # bytes strings are byte-based data

spÄm
b'a\x01c'


Phew! There is a lot to keep in mind with strings! They are complicated, but that's because they are so basic and integral to the language. Before we move on, one more thing: pattern matching. Using the re module, we can pick out substrings with regular expressions (regex).

In [69]:
import re
match = re.match('Hello[ \t]*(.*)world', 'Hello    Python world')
match.group(1)

'Python '

If you know regex, great. If not, don't worry about it in the scope of learning python. When you need it, research it. It's a whole thing of its own.

### Lists

Lists are the most general sequence in Python. They are ordered collections of objects of any type, and don't have a fixed size. They are mutable. They support the same index and slice operations that strings use.

In [95]:
L = [123, 'spam', 1.23]
print(len(L))
print(L[0])
print(L[:-1])
print(L + [4,5,6]) #Note that this doesn't change L
print(L)

3
123
[123, 'spam']
[123, 'spam', 1.23, 4, 5, 6]
[123, 'spam', 1.23]


Unlike arrays in many other languages, lists can grow and shrink on demand. The append method adds an item to the end of the list. 'pop' removes (and returns) the item at the given index, 'insert' inserts at an index, 'remove' removes by value, and 'extend' adds multiple items. Most list methods change the list in place, instead of creating a new one.

In [97]:
L.append('NI')
print(L)
L.pop(2)
print(L)
L.insert(3, 'MI')
print(L)

[123, 'spam', 1.23, 'NI']
[123, 'spam', 'NI']
[123, 'spam', 'NI', 'MI']
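
The 'remove' and 'extend' methods are not exercised in the cell above, so here is a small sketch of them, starting again from a fresh list. It also shows that in-place methods return None, which is a common source of bugs.

```python
L = [123, 'spam', 1.23]

L.extend(['NI', 'MI'])     # add several items at the end
print(L)

L.remove('spam')           # remove the first item equal to 'spam'
print(L)

result = L.append('end')   # in-place methods return None
print(result)
print(L)
```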


In [87]:
M = ['bb', 'aa', 'cc']
M.sort()
print(M)
M.reverse()
print(M)

['aa', 'bb', 'cc']
['cc', 'bb', 'aa']


Despite all this flexibility, we can't pick or assign to an index outside of the current length of the list.
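
A quick sketch of that bounds checking, using an arbitrary out-of-range index. Note that slices are more forgiving than single indexes.

```python
L = [123, 'spam', 1.23]

try:
    L[99] = 1              # assignment past the end of the list
except IndexError as err:
    print('IndexError:', err)

# Slices are more forgiving: out-of-range bounds are clipped.
tail = L[1:99]
print(tail)

# To grow a list, use append (or extend/insert) instead.
L.append(1)
print(len(L))
```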

We can also create lists of lists, or nest lists. We can use this to create matrices.

In [99]:
M = [[1, 2, 3],               # A 3 × 3 matrix, as nested lists
     [4, 5, 6],               # Code can span lines if bracketed
     [7, 8, 9]]
print(M)
print(M[0][1])

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
2


List comprehensions are a great way to write simple, elegant code. They are written within square brackets and essentially apply an expression to each item of an iterable. Referring to the matrix above:

In [100]:
[row[1] + 1 for row in M]   

[3, 6, 9]

In [101]:
[row[1] for row in M if row[1] % 2 == 0]

[2, 8]

In [103]:
[M[i][i] for i in [0, 1, 2]] 

[1, 5, 9]

In [102]:
[c * 2 for c in 'spam']  

['ss', 'pp', 'aa', 'mm']

In [105]:
[[x, x / 2, x * 2] for x in range(-6, 7, 2) if x > 0]

[[2, 1.0, 4], [4, 2.0, 8], [6, 3.0, 12]]

As you can see, these can get quite complex, and are very powerful.

Changing the enclosing brackets of a comprehension lets you create other structures as well, such as generators, sets, and dictionaries.

In [112]:
G = (sum(row) for row in M)   
print(next(G))
print(next(G))
print(next(G))

6
15
24


In [113]:
{ord(x) for x in 'spaam'}

{97, 109, 112, 115}

In [114]:
{x: ord(x) for x in 'spaam'} 

{'a': 97, 'm': 109, 'p': 112, 's': 115}

### Dictionaries

Dictionaries are not sequences: rather, they are mappings. They store data with a key rather than a position. They are mutable. We can access values in a dictionary using the key.

In [117]:
D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}
D['food']

'Spam'

Dictionaries are usually built piece by piece, not all at once. When you add a new key, it is created.

In [119]:
D = {}
D['name'] = 'Bob'      # Create keys by assignment
D['job']  = 'dev'
D['age']  = 40
D

{'age': 40, 'job': 'dev', 'name': 'Bob'}

We can also use the dict function

In [120]:
bob1 = dict(name='Bob', job='dev', age=40)
bob1

{'age': 40, 'job': 'dev', 'name': 'Bob'}

In [122]:
bob2 = dict(zip(['name', 'job', 'age'],['Bob', 'dev', 40]))
bob2

{'age': 40, 'job': 'dev', 'name': 'Bob'}

We can also nest dictionaries

In [124]:
rec = {'name': {'first': 'Bob', 'last': 'Smith'},
           'jobs': ['dev', 'mgr'],
           'age':  40.5}
rec

{'age': 40.5,
 'jobs': ['dev', 'mgr'],
 'name': {'first': 'Bob', 'last': 'Smith'}}

While new keys are created when assigned, we can't fetch non-existent keys: doing so raises a KeyError. That's why key tests are very important when accessing large data sets.

In [125]:
'F' in D

False

In [126]:
if 'f' not in D:
    print('missing')

missing
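
Besides the 'in' test, there are a couple of other common ways to cope with missing keys. The sketch below rebuilds the same dictionary D and uses 0 as an arbitrary default value.

```python
D = {'name': 'Bob', 'job': 'dev', 'age': 40}

try:
    D['f']                 # fetching a missing key raises KeyError
except KeyError as err:
    print('missing key:', err)

# get returns a default instead of raising.
value = D.get('f', 0)
print(value)

# A conditional expression works as well.
other = D['f'] if 'f' in D else 0
print(other)
```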


Since dictionaries are unordered (in Python 3.5; from Python 3.7 on they preserve insertion order), we can turn their keys into a list and sort it if we want to access things in a predictable sequence. We can also use 'sorted' as a shortcut.

In [132]:
Ks = list(D.keys()) #keys returns a view of the dict's keys; list() turns it into a list
Ks.sort()
for key in Ks:
    print(key, '=>', D[key])

age => 40
job => dev
name => Bob


In [134]:
for key in sorted(D):
    print(key, '=>', D[key])

age => 40
job => dev
name => Bob


Any object in Python that is either stored as a sequence in memory or can generate its items one at a time is iterable. Generators, lists, dictionaries: all these things can be stepped through with the (usually hidden) iter and next functions. Ultimately, for loops, comprehensions, etc. are just leveraging this protocol.
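
To show what happens behind the scenes, here is a sketch that drives the iteration protocol by hand with the built-in iter and next functions. The manual while loop is only an approximation of what a for loop does for you.

```python
L = [1, 2, 3]

it = iter(L)               # ask the list for an iterator
first = next(it)
second = next(it)
third = next(it)
print(first, second, third)

try:
    next(it)               # nothing left to produce
except StopIteration:
    print('exhausted')

# Roughly what a for loop does automatically.
collected = []
it = iter(L)
while True:
    try:
        item = next(it)
    except StopIteration:
        break
    collected.append(item)
print(collected)
```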

### Tuples

An immutable list. No, seriously, that's it. They can be as long as you like. Use them for fixed lists of items. Can't append or change or remove. That's about it. Use them as an integrity constraint.

In [137]:
T = (1,2,3,4)
T[0] = 2 #raises an error

TypeError: 'tuple' object does not support item assignment

### Files

Files represent just that: any file on your system. Use the "open" function to access a file and bring it into your program. 'w' opens the file for writing, while 'r' (the default) opens it for reading. The best way to read is with an iterator, as demonstrated in the third block. In text mode, you always get strings back.

In [142]:
f = open('data.txt', 'w')
f.write('Hello\n') 
f.close()   

In [144]:
f = open('data.txt')  
text = f.read()  
text

'Hello\n'

In [145]:
for line in open('data.txt'): print(line)

Hello
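
The cells above close files manually (or not at all). A common alternative, sketched here, is the 'with' statement, which closes the file automatically when the block ends. The temporary directory and the file name are assumptions made so that the example does not touch any real files.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'data.txt')   # hypothetical file name

    with open(path, 'w') as f:                # closed automatically on exit
        f.write('Hello\n')
        f.write('world\n')

    with open(path) as f:
        lines = [line.rstrip() for line in f] # iterate line by line

print(lines)
print(f.closed)
```

Stripping each line with rstrip also avoids the extra blank lines that appear when a line ending in a newline is passed straight to print.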



Usually you'll work with text files, but on occasion you'll have to work with binary files which in Python 3 are handled with the 'b' modifier in open.

In [146]:
import struct
packed = struct.pack('>i4sh', 7, b'spam', 8)     # Create packed binary data
print(packed)                                         # 10 bytes, not objects or text
file = open('data.bin', 'wb')                    # Open binary output file
file.write(packed)                               # Write packed binary data
file.close()

b'\x00\x00\x00\x07spam\x00\x08'


In [147]:
data = open('data.bin', 'rb').read()
struct.unpack('>i4sh', data) 

(7, b'spam', 8)

Working with non-ASCII text can be tricky, but Python 3 makes it easier.

In [150]:
S = 'sp\xc4m'                                          # Non-ASCII Unicode text
print(S)
print(S[2])                                                   # Sequence of characters
file = open('unidata.txt', 'w', encoding='utf-8')      # Write/encode UTF-8 text
file.write(S)                                          # 4 characters written
file.close()

text = open('unidata.txt', encoding='utf-8').read()    # Read/decode UTF-8 text
print(text)

spÄm
Ä
spÄm


In [153]:
raw = open('unidata.txt', 'rb').read()                 # Read raw encoded bytes
raw
len(raw)                                               # Really 5 bytes in UTF-8

5

In [156]:
print(text.encode('utf-8'))                               # Manual encode to bytes
print(raw.decode('utf-8'))                                    # Manual decode to str

b'sp\xc3\x84m'
spÄm


### Other Core Types

Sets are neither mappings nor sequences; rather, they are unordered collections of unique, immutable values. The set itself is mutable.

In [158]:
X = set('spam')            
Y = {'h', 'a', 'm'}    

In [161]:
X, Y

({'a', 'm', 'p', 's'}, {'a', 'h', 'm'})

In [162]:
X & Y #intersection

{'a', 'm'}

In [163]:
X | Y #Union

{'a', 'h', 'm', 'p', 's'}

In [164]:
X - Y #Difference

{'p', 's'}

In [165]:
X > Y #Superset

False

In [166]:
print(list(set([1, 2, 1, 3, 1])))     # Filtering out duplicates (possibly reordered)
print(set('spam') - set('ham'))       # Finding differences in collections
print(set('spam') == set('asmp'))    # Order-neutral equality ('spam'=='asmp' False)


[1, 2, 3]
{'p', 's'}
True


Other types include floating point, decimal numbers, fraction numbers, and Booleans, as well as a None object. There is even a 'type' type, brought about by the type function that gives you the type of the object you pass to it

In [167]:
type([])

list

In [168]:
type(type([]))

type

But no one likes type testing... code that avoids it works with any object that supports the right operations, which is one of the great flexibilities of Python.

Of course, Python is an object-oriented language. So you can make your own classes! These are your own types, which can represent whatever you want.

In [169]:
class Worker:
    def __init__(self, name, pay):          # Initialize when created
        self.name = name                    # self is the new object
        self.pay  = pay
    def lastName(self):
        return self.name.split()[-1]        # Split string on blanks
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)         # Update pay in place

We will cover more about classes later...
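
In the meantime, here is a brief sketch of how the Worker class above might be used. The class is repeated so the snippet runs on its own, and the names and pay figures are made-up sample data.

```python
class Worker:
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay

    def lastName(self):
        return self.name.split()[-1]

    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

bob = Worker('Bob Smith', 50000)      # calling the class runs __init__
sue = Worker('Sue Jones', 60000)      # each instance has its own name and pay

print(bob.lastName())
sue.giveRaise(0.10)                   # a 10% raise, applied in place
print(round(sue.pay, 2))
```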

### Numerics

Python has a large number of numeric types that can be used for all sorts of applications
- Integer and floating-point objects
- Complex number objects
- Decimal
- Fraction
- Sets
- Booleans
- built-in and module functions
- Expressions
- Third-party extensions.

| Literal | Interpretation |
| --- | --- |
| `1234, -24, 0, 99999999999999` | Integers (unlimited size) |
| `1.23, 1., 3.14e-10, 4E210, 4.0e+210` | Floating-point numbers |
| `0o177, 0x9ff, 0b101010` | Octal, hex, and binary literals in 3.X |
| `0177, 0o177, 0x9ff, 0b101010` | Octal, octal, hex, and binary literals in 2.X |
| `3+4j, 3.0+4.0j, 3J` | Complex number literals |
| `set('spam'), {1, 2, 3, 4}` | Sets: 2.X and 3.X construction forms |
| `Decimal('1.0'), Fraction(1, 3)` | Decimal and fraction extension types |
| `bool(X), True, False` | Boolean type and constants |

In Python 3, there is just one integer which encompasses the long category (large)

Python has many built-in expression operators; these are part of the syntax rather than functions:

| Operator | Description |
| --- | --- |
| `yield x` | Generator function send protocol |
| `lambda args: expression` | Anonymous function generation |
| `x if y else z` | Ternary selection (x is evaluated only if y is true) |
| `x or y` | Logical OR (y is evaluated only if x is false) |
| `x and y` | Logical AND (y is evaluated only if x is true) |
| `not x` | Logical negation |
| `x in y`, `x not in y` | Membership (iterables, sets) |
| `x is y`, `x is not y` | Object identity tests |
| `x < y`, `x <= y`, `x > y`, `x >= y`, `x == y`, `x != y` | Magnitude comparison, set subset and superset; value equality operators |
| `x \| y` | Bitwise OR, set union |
| `x ^ y` | Bitwise XOR, set symmetric difference |
| `x & y` | Bitwise AND, set intersection |
| `x << y`, `x >> y` | Shift x left or right by y bits |
| `x + y`, `x - y` | Addition, concatenation; subtraction, set difference |
| `x * y`, `x % y`, `x / y`, `x // y` | Multiplication, repetition; remainder, format; division: true and floor |
| `-x`, `+x` | Negation, identity |
| `~x` | Bitwise NOT (inversion) |
| `x ** y` | Power (exponentiation) |
| `x[i]` | Indexing (sequence, mapping, others) |
| `x[i:j:k]` | Slicing |
| `x(...)` | Call (function, method, class, other callable) |
| `x.attr` | Attribute reference |
| `(...)` | Tuple, expression, generator expression |
| `[...]` | List, list comprehension |
| `{...}` | Dictionary, set, set and dictionary comprehensions |

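Note: The operators above are listed from lowest precedence (binds least tightly) to highest (binds most tightly); operators grouped together, such as the comparison, membership, and identity tests, share the same precedence. When it comes to math, it's the classic order of operations. You can use parentheses to override this.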
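
A few worked cases may help. This sketch uses arbitrary small numbers to show where precedence changes the answer and how parentheses override it.

```python
a = 2 + 3 * 4        # * binds tighter than +, so this is 2 + 12
b = (2 + 3) * 4      # parentheses force the addition first
c = -2 ** 2          # ** binds tighter than unary minus: -(2 ** 2)
d = (-2) ** 2
e = not 1 == 2       # the comparison happens before 'not'
print(a, b, c, d, e)
```

When in doubt, adding parentheses costs nothing and makes the intent obvious to the next reader.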

In [182]:
3 + 3.15

6.15

In the case above, we see python's practice of converting up to the most complicated type... in this case we have an integer added to a float. The result is a float.
integer -> float -> complex

Let's write some code

In [184]:
#We create the variables here... no need to name ahead of time
a = 3
b = 4
print(a+1, a-1)
print(2.0**b)

4 2
16.0


When doing comparisons, you can chain tests

In [185]:
x = 1
y = 2
z = 3
x < y < z

True

Always remember the limitations of floating point... you can't use an unlimited number of bits, so it can sometimes result in weird answers!

In [186]:
1.1 + 2.2 == 3.3 #?????!!!!!

False

In [187]:
1.1 + 2.2

3.3000000000000003
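
Because of this, floats are usually compared with a tolerance rather than with ==. The sketch below uses math.isclose, which is available from Python 3.5 on; the 1e-9 threshold in the manual version is an arbitrary choice for illustration.

```python
import math

total = 1.1 + 2.2
print(total == 3.3)                   # exact comparison fails

close = math.isclose(total, 3.3)      # relative tolerance, 1e-09 by default
print(close)

manual = abs(total - 3.3) < 1e-9      # the same idea written by hand
print(manual)
```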

In 3.X, the / operator always performs true division, returning a float result that includes any remainder, regardless of operand types. The // operator performs floor division, which rounds the result down to the nearest whole number (toward negative infinity, so 5 // -2 is -3) and returns an integer for integer operands or a float if any operand is a float.

In [191]:
(5 / 2), (5 / 2.0), (5 / -2.0), (5 / -2)        #3.X true division

(5 // 2), (5 // 2.0), (5 // -2.0), (5 // -2)    # 3.X floor division

(9 / 3), (9.0 / 3), (9 // 3), (9 // 3.0)        # Both


(3.0, 3.0, 3, 3.0)
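
Since the cell above only displays its last line, here is a sketch that prints a few of the other results. It also contrasts flooring with truncation, which only differ for negative results.

```python
import math

true_div = (5 / 2, 5 / -2)            # always a float
floor_div = (5 // 2, 5 // -2)         # rounds down, so -2.5 becomes -3
float_floor = 5 // 2.0                # a float operand gives a float result
truncated = int(5 / -2)               # int() truncates toward zero instead

print(true_div, floor_div, float_floor, truncated)
print(math.floor(-2.5), math.trunc(-2.5))
```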

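Complex numbers are written as a real part plus an imaginary part, the latter marked with a trailing j (for example, 3+4j); both parts are stored as floats.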
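
For completeness, a tiny sketch of complex numbers in action, using arbitrary values:

```python
z = 3 + 4j
print(z.real, z.imag)      # both parts are stored as floats

magnitude = abs(z)         # distance from the origin
print(magnitude)

product = z * (1 - 2j)     # ordinary arithmetic operators work
print(product)
```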

We can also use binary, hex, or octal notation to specify integers. It's important to remember, however, that in memory the value is the same as if it had been written as a decimal number. Learn these yourself... not worth going over.

Don't forget your bitwise operators... this is where the binary representation can come in handy.

In [192]:
X = 0b0001          # Binary literals
X << 2              # Shift left
bin(X << 2)         # Binary digits string
bin(X | 0b010)      # Bitwise OR: either
bin(X & 0b1)        # Bitwise AND: both

'0b1'

The math library contains a bunch of useful functions. You can take absolute values, powers, sines, square roots, etc.

In [194]:
import math
print(math.pi, math.e)
math.sin(2*math.pi/180)

3.141592653589793 2.718281828459045


0.03489949670250097

There is also the decimal type... this solves the problem of inaccuracy in floats. A Decimal is constructed by calling Decimal, usually with a string, and it has a settable precision. For example:

In [195]:
0.3-0.1-0.1-0.1 #Should be 0, but its not

-2.7755575615628914e-17

In [198]:
import decimal
from decimal import Decimal
Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')

Decimal('0.0')

In [199]:
decimal.getcontext().prec = 4 #Set decimal precision.
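
To see what the precision setting does, here is a sketch that divides 1 by 7 under a precision of 4 significant digits. It uses localcontext, an alternative to getcontext chosen here so that the change does not leak into the rest of the program.

```python
from decimal import Decimal, localcontext

# localcontext keeps the precision change inside the with block.
with localcontext() as ctx:
    ctx.prec = 4                       # four significant digits
    rounded = Decimal(1) / Decimal(7)
print(rounded)

full = Decimal(1) / Decimal(7)         # outside the block: default precision
print(full)
```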

Also there is the fraction type.

In [201]:
from fractions import Fraction
x = Fraction(1, 3)                    # Numerator, denominator
y = Fraction(4, 6)  
print(x,y)
Fraction('.25')

1/3 2/3


Fraction(1, 4)

Use the float method .as_integer_ratio() to get the numerator and denominator of any float as a pair of integers, which you can then pass to Fraction.

In [203]:
(2.5).as_integer_ratio()               # float object method

(5, 2)
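
The pair returned by as_integer_ratio is a plain tuple, not a Fraction. A short sketch of the conversion follows; the last lines illustrate a caveat, namely that a float such as 0.1 is converted exactly as stored in binary, which is not quite 1/10.

```python
from fractions import Fraction

pair = (2.5).as_integer_ratio()        # a plain tuple of two ints
frac = Fraction(*pair)                 # unpack it into Fraction
print(pair, frac)

direct = Fraction(2.5)                 # Fraction also accepts a float directly
print(direct == frac)

surprise = Fraction(0.1)               # exact value of the binary float
print(surprise == Fraction(1, 10))
```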

Sets are another great mathematical type... there can only be one of each entry.


In [205]:
x = set('xyz')
y = set('abcz')
x - y

{'x', 'y'}

In [206]:
{1, 2, 3, 4} #Also create them this way

{1, 2, 3, 4}

However, it is important to remember that {} is still an empty dictionary. Empty sets must be declared as set(). Sets have some functions... add, union, intersection, etc.

Python sets can only hold hashable objects, which in practice means immutable ones. Numbers and strings, yes. Dictionaries and lists, no. In Python 3 we can also write set comprehensions, just like list comprehensions.
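
A small sketch of that restriction, along with frozenset (the built-in immutable set type, handy when you need a set inside a set) and a set comprehension. The values are arbitrary.

```python
s = {1, 'spam', (2, 3)}    # numbers, strings, and tuples of immutables are fine

try:
    s.add([4, 5])          # a list is mutable, hence unhashable
except TypeError as err:
    print('TypeError:', err)

s.add(frozenset({4, 5}))   # frozenset is hashable, so it can be an element
print(len(s))

squares = {x * x for x in [-2, -1, 0, 1, 2]}   # set comprehension
print(sorted(squares))
```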

Sets can be used for math, but also for filtering duplicates, finding differences, order-neutral equality tests, navigating graphs, and finding common data in datasets.



FYI Booleans are just 0 and 1 with a True False skin. Hence:

In [207]:
True + 4

5

In [208]:
False + 4

4

Of course there are also many third-party math tools that have spawned their own books, like NumPy and pandas. Check them out.

## Dynamic Typing

Something that's important to understand in python is how it performs typing. In C and Java, we have to declare variable types, while in Python we do not. This gives us some great flexibility but at the same time we have to be mindful to ensure that we don't get any unintended results.

Variables in python are created when they are assigned a value.

In [210]:
a = 3
a = 'spam'
a = [1, 2, 3]

In this example, the variable a exists as soon as it is first assigned, but 'a' as a variable does not have a type. After the first line its value is 3, which has the type of integer, and a is then reassigned to objects of other types. The notion of type lives solely within the object... variables have no type information tied to them. Hence in the above example we can reassign a to whatever object we want... it simply acts as a pointer to different locations in memory. You must assign a variable before you reference it.

Objects have two header fields: a type designator and a reference counter. The type designator tells you what type the object is: int, list, string, etc. The reference counter counts the number of references pointing to the object... as soon as this number drops to 0, the memory that the object takes up is reclaimed for other objects! This is called garbage collection.

In [211]:
a = 3
b = a
a = 'spam'
print(b)

3


The above example reinforces the language's reference model: a references a space in memory that holds 3. When we make b = a, b ALSO points to that same area in memory. When we assign a to 'spam', it simply points a to a new area in memory. b continues to point to 3. Assignment only rebinds the name; it never changes the object itself, and since 3 is immutable, you couldn't change it even if you wanted to.

In [212]:
a = 3
b = a
a = a + 2
print(a)
print(b)

5
3


As you can see, setting a = a + 2 doesn't increment 3 to 5. You can't do that! Instead, a now points to a new area in memory that has a 5. b still points to 3. This makes side effects in Python impossible with immutable objects.

With mutable objects that allow changes in place, it's not that simple. We do see side effects.

In [213]:
L1 = [2, 3, 4]        # A mutable object
L2 = L1               # Make a reference to the same object
L1[0] = 24            # An in-place change

print(L1)                    # L1 is different
print(L2)                    # But so is L2!

[24, 3, 4]
[24, 3, 4]


As you can see, the change shows up through both names, because they both reference the same area in memory and we changed one of the items by overwriting the old value. If we want to avoid this, we simply create a copy instead of a second reference.

In [214]:
L1 = [2, 3, 4]
L2 = L1[:]            # Make a copy of L1 (or list(L1), copy.copy(L1), etc.)
L1[0] = 24

print(L1)
print(L2)                    # L2 is not changed

[24, 3, 4]
[2, 3, 4]
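
One caveat worth a sketch: the slice copy shown above is a shallow (top-level) copy. With nested lists, the inner lists are still shared, and the standard copy.deepcopy function is one way around that. The nested list here is arbitrary sample data.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested[:]                # new outer list, same inner lists
deep = copy.deepcopy(nested)       # new outer list and new inner lists

nested[0][0] = 99                  # change an inner list in place
print(shallow[0][0])               # the shallow copy sees the change
print(deep[0][0])                  # the deep copy does not
print(shallow[0] is nested[0], deep[0] is nested[0])
```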


Python doesn't always reclaim memory right away... some small strings and integers are cached and reused for efficiency's sake. This reference system leads to a distinction in what it means to be equal. There are two tests: == tests for value equality, while 'is' tests for object identity (are both variables pointing to the same object?).

In [215]:
L = [1, 2, 3]
M = L                 # M and L reference the same object
print(L == M)                # Same values
print(L is M)                # Same objects

True
True


In [216]:
L = [1, 2, 3]
M = [1, 2, 3]         # M and L reference different objects
print(L == M)                # Same values
print(L is M)                # Different objects

True
False


## Strings

Let's get more in depth with strings! Python technically has three string types: str is used for Unicode text including ASCII, bytes is used for binary data, and bytearray is a mutable version of bytes.

We will just focus on str here. Functionally, a string can represent anything. In addition, strings are immutable sequences. Empty strings are represented as empty quotes.

| Operation | Interpretation |
| --- | --- |
| `S = ''` | Empty string |
| `S = "spam's"` | Double quotes, same as single |
| `S = 's\np\ta\x00m'` | Escape sequences |
| `S = """...multiline..."""` | Triple-quoted block strings |
| `S = r'\temp\spam'` | Raw strings (no escapes) |
| `B = b'sp\xc4m'` | Byte strings in 2.6, 2.7, and 3.X |
| `U = u'sp\u00c4m'` | Unicode strings in 2.X and 3.3+ |
| `S1 + S2`, `S * 3` | Concatenate, repeat |
| `S[i]`, `S[i:j]`, `len(S)` | Index, slice, length |
| `"a %s parrot" % kind` | String formatting expression |
| `"a {0} parrot".format(kind)` | String formatting method in 2.6, 2.7, and 3.X |
| `S.find('pa')` | Search |
| `S.rstrip()` | Remove whitespace |
| `S.replace('pa', 'xx')` | Replacement |
| `S.split(',')` | Split on delimiter |
| `S.isdigit()` | Content test |
| `S.lower()` | Case conversion |
| `S.endswith('spam')` | End test |
| `'spam'.join(strlist)` | Delimiter join |
| `S.encode('latin-1')` | Unicode encoding |
| `B.decode('utf8')` | Unicode decoding, etc. |
| `for x in S: print(x)`, `'spam' in S`, `[c * 2 for c in S]`, `map(ord, S)` | Iteration, membership |
| `re.match('sp(.*)am', line)` | Pattern matching: library module |



Beyond the core functions, Python also has a 're' library that allows you to do a lot more.

There are a lot of ways to write strings:

- Single quotes: 'spa"m'
- Double quotes: "spa'm"
- Triple quotes: '''... spam ...''', """... spam ..."""
- Escape sequences: "s\tp\na\0m"
- Raw strings: r"C:\new\test.spm"
- Bytes literals in 3.X and 2.6+: b'sp\x01am'
- Unicode literals in 2.X and 3.3+: u'eggs\u0020spam'

In [218]:
('shrubbery', "shrubbery") #double quotes and single quotes are the same

('shrubbery', 'shrubbery')

In [219]:
title = "Meaning " 'of' " Life"        # Implicit concatenation
title

'Meaning of Life'

In [220]:
'knight\'s', "knight\"s" #use escapes to embed quotes

("knight's", 'knight"s')

In [222]:
print('a\nb\tc') #\n is a new line. \t is a tab

a
b	c


| Escape | Meaning |
| --- | --- |
| `\newline` | Ignored (continuation line) |
| `\\` | Backslash (stores one `\`) |
| `\'` | Single quote (stores ') |
| `\"` | Double quote (stores ") |
| `\a` | Bell |
| `\b` | Backspace |
| `\f` | Formfeed |
| `\n` | Newline (linefeed) |
| `\r` | Carriage return |
| `\t` | Horizontal tab |
| `\v` | Vertical tab |
| `\xhh` | Character with hex value hh (exactly 2 digits) |
| `\ooo` | Character with octal value ooo (up to 3 digits) |
| `\0` | Null: binary 0 character (doesn't end string) |
| `\N{ id }` | Unicode database ID |
| `\uhhhh` | Unicode character with 16-bit hex value |
| `\Uhhhhhhhh` | Unicode character with 32-bit hex value |
| `\other` | Not an escape (keeps both `\` and other) |

We can use raw strings to turn off escapes, which is important when writing Windows file paths.

In [224]:
myfile = open(r'C:\new\text.dat', 'w') #without the 'r' the \n would be interpreted as a new line and \t as a tab
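
To see exactly what the r prefix changes, this sketch compares the same path written three ways without opening any file. The path is the one from the cell above; nothing is read from or written to disk.

```python
normal = 'C:\new\text.dat'     # \n and \t are turned into newline and tab
raw = r'C:\new\text.dat'       # backslashes are kept literally
escaped = 'C:\\new\\text.dat'  # doubling the backslashes also works

print(len(normal), len(raw))   # the normal string is two characters shorter
print(raw == escaped)
print(repr(normal))
```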

In [225]:
mantra = """Always look
  on the bright
side of life."""
mantra #the echo shows newlines as \n escapes; triple quotes allow multiline

'Always look\n  on the bright\nside of life.'

In [226]:
print(mantra) #print will interpret these special characters for display

Always look
  on the bright
side of life.


Let's do stuff with strings. Basic string operations include:

In [227]:
len('abc') #number of items

3

In [228]:
'abd' + 'xyz' #create a new string

'abdxyz'

In [229]:
'Ni!' * 8 #repetition

'Ni!Ni!Ni!Ni!Ni!Ni!Ni!Ni!'

In [230]:
myjob = 'hacker'
for c in myjob: print(c, end=' ')

h a c k e r 

In [232]:
'k' in myjob

True

In [233]:
'z' in myjob

False

In [234]:
'spam' in 'asasdfaspamasdfas'

True

In [236]:
S = 'spam' #of course slicing and index
print(S[0])
print(S[-1])
print(S[:3])
S[:] #top level copy

s
m
spa


'spam'