# Examples for L3-Sept16-MoreOnPython. (ENSF 519.01 Applied Data Science) 

# Resources:
## “A Whirlwind Tour of Python” (90 pages)
### By Jake VanderPlas
### http://www.oreilly.com/programming/free/files/a-whirlwind-tour-of-python.pdf
### Notebooks: https://github.com/jakevdp/WhirlwindTourOfPython

## “Python for Everybody” lectures
### https://www.py4e.com/


# Slide 7
## Set examples

In [None]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}
# union: items appearing in either
primes | odds # with an operator

In [None]:
primes.union(odds) # equivalently with a method

In [None]:
# intersection: items appearing in both
primes & odds             # with an operator
primes.intersection(odds) # equivalently with a method

In [None]:
# difference: items in primes but not in odds
primes - odds           # with an operator
primes.difference(odds) # equivalently with a method

In [None]:
# symmetric difference: items appearing in only one set
primes ^ odds                     # with an operator
primes.symmetric_difference(odds) # equivalently with a method
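
The operator and method forms above are not fully interchangeable. As a minimal sketch (the name `odd_list` is illustrative, not from the slides): the methods accept any iterable, while the operators require both operands to be sets.

```python
primes = {2, 3, 5, 7}
odd_list = [1, 3, 5, 7, 9]  # a plain list, not a set (illustrative)

# The method form accepts any iterable
union_result = primes.union(odd_list)
print(union_result)

# The operator form raises TypeError unless both sides are sets
try:
    primes | odd_list
    operator_worked = True
except TypeError:
    operator_worked = False
print(operator_worked)  # False
```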

# Slide 13
## List Comprehensions

## Multiple Iteration
The second ``for`` expression acts as the inner loop, so its index varies fastest in the resulting list.


In [None]:
[(i, j) for i in range(2) for j in range(3)]


This pattern can be ``extended`` to three, four, or more iterators, but at some point code ``readability`` will suffer!
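
As a sketch of the three-iterator case, the comprehension below is compared against the nested loops it stands for. The variable names are illustrative.

```python
# Three iterators: i varies slowest, k varies fastest
triples = [(i, j, k) for i in range(2) for j in range(2) for k in range(2)]

# Equivalent nested loops, written out for comparison
expanded = []
for i in range(2):
    for j in range(2):
        for k in range(2):
            expanded.append((i, j, k))

print(triples == expanded)  # True
```

Written out as loops, the nesting depth is visible; in the comprehension it is flattened onto one line, which is where readability starts to suffer.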


## Conditionals on the Iterator

In [None]:
[val for val in range(20) if val % 3 > 0]


This constructs a list of the values from 0 to 19, keeping only those that are not divisible by 3. The explicit loop below builds the same list:


In [None]:
L = []
for val in range(20):
    if val % 3:
        L.append(val)
L

## Conditionals on the Value

This is similar to the single-line conditional written with the ``?:`` operator in ``C``:
``` C
int absval = (val < 0) ? -val : val;
```
Python has a very similar conditional expression, most often used within list comprehensions, ``lambda`` functions, and other places where a simple expression is desired:

In [None]:
val = -10
val if val >= 0 else -val

In [None]:
[val if val % 2 else -val
 for val in range(20) if val % 3]
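
The last comprehension combines both kinds of conditional. As a rough sketch, it can be unrolled into a loop to see which part filters and which part chooses the value:

```python
result = []
for val in range(20):
    if val % 3:  # conditional on the iterator: skip multiples of 3
        # conditional on the value: keep odd numbers, negate even ones
        result.append(val if val % 2 else -val)

comprehension = [val if val % 2 else -val
                 for val in range(20) if val % 3]
print(result == comprehension)  # True
```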

# Other comprehensions

## Set comprehension


In [None]:
{n**2 for n in range(12)}


In [None]:
{a % 3 for a in range(1000)}


## Dictionary comprehension


In [None]:
{n:n**2 for n in range(6)}
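
A dictionary comprehension can also carry a condition, just like the list form. The sketch below (all names are illustrative) filters the mapping above and then swaps its keys and values:

```python
squares = {n: n**2 for n in range(6)}

# Keep only the entries whose key is even
even_squares = {n: sq for n, sq in squares.items() if n % 2 == 0}
print(even_squares)

# Swap keys and values to look up a root from its square
roots = {sq: n for n, sq in squares.items()}
print(roots[16])  # 4
```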


# Slide 19
## Regular Expressions

## Using ``re.search()`` like ``find()``

In [None]:
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.find('From:') >= 0:
        print(line)


In [21]:
import re

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('From:', line):
        print(line)


From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
From: gsilver@umich.edu
From: gsilver@umich.edu
From: zqian@umich.edu
From: gsilver@umich.edu
From: wagnermr@iupui.edu
From: zqian@umich.edu
From: antranig@caret.cam.ac.uk
From: gopal.ramasammycook@gmail.com
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: louis@media.berkeley.edu
From: ray@media.berkeley.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
From: cwen@iupui.edu


## Using ``re.search()`` like ``startswith()``

In [22]:
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)


From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
From: gsilver@umich.edu
From: gsilver@umich.edu
From: zqian@umich.edu
From: gsilver@umich.edu
From: wagnermr@iupui.edu
From: zqian@umich.edu
From: antranig@caret.cam.ac.uk
From: gopal.ramasammycook@gmail.com
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: louis@media.berkeley.edu
From: ray@media.berkeley.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
From: cwen@iupui.edu


In [24]:
import re

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line):
        print(line)


From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
From: gsilver@umich.edu
From: gsilver@umich.edu
From: zqian@umich.edu
From: gsilver@umich.edu
From: wagnermr@iupui.edu
From: zqian@umich.edu
From: antranig@caret.cam.ac.uk
From: gopal.ramasammycook@gmail.com
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: louis@media.berkeley.edu
From: ray@media.berkeley.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
From: cwen@iupui.edu


### We fine-tune what is matched by adding special characters to the string, such as ``^`` to anchor the match at the start of the line
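
The three runs above print the same lines because every `From:` in the output sits at the start of its line. A small sketch on made-up lines (they are illustrative, not taken from `mbox-short.txt`) shows where the `^` anchor makes a difference:

```python
import re

# Illustrative lines; the real data would come from mbox-short.txt
lines = [
    'From: stephen.marquard@uct.ac.za',
    'X-Note: reply sent From: the list server',
    'Subject: hello',
]

# Without an anchor, 'From:' may appear anywhere in the line
anywhere = [line for line in lines if re.search('From:', line)]

# '^' anchors the match to the start of the line
at_start = [line for line in lines if re.search('^From:', line)]

print(len(anywhere), len(at_start))  # 2 1
```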


# Slide 27
## Matching and Extracting Data!
### re.findall() vs re.search()

In [25]:
import re
x = 'My 2 favorite numbers are 19 and 42'
y = re.findall('[0-9]+', x)
print(y)


['2', '19', '42']
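
For comparison, here is a minimal sketch of the same pattern with `re.search()`, which stops at the first match and returns a match object rather than a list:

```python
import re

x = 'My 2 favorite numbers are 19 and 42'

# re.search() returns a match object for the first match, or None
first = re.search('[0-9]+', x)
print(first.group())  # the matched text: 2
print(first.start())  # index where the match begins: 3

# re.findall() returns every non-overlapping match as a list of strings
all_numbers = re.findall('[0-9]+', x)
print(all_numbers)
```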


In [26]:
y = re.findall('[AEIOU]+', x)
print(y)

[]
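
The list is empty because character classes are case-sensitive and `x` contains no uppercase vowels. As a sketch, either a lowercase class or the `re.IGNORECASE` flag finds the vowels:

```python
import re

x = 'My 2 favorite numbers are 19 and 42'

# Match lowercase vowels explicitly
lower = re.findall('[aeiou]+', x)

# Or keep the uppercase class and ignore case
ignore_case = re.findall('[AEIOU]+', x, flags=re.IGNORECASE)

print(lower)
print(lower == ignore_case)  # True
```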


# Slide 28
## Greedy Matching

In [27]:
import re
x = 'From: Using the : character'
y = re.findall('^F.+:', x)
print(y)


['From: Using the :']


In [30]:
# +? matches one or more characters, but non-greedily (as few as possible)

y = re.findall('^F.+?:', x)
print(y)


['From:']
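
The same contrast shows up whenever a closing delimiter occurs more than once. A small sketch on an illustrative string with angle-bracket tags:

```python
import re

text = '<b>bold</b> and <i>italic</i>'  # illustrative input

# Greedy: .+ runs to the last '>' in the string
greedy = re.findall('<.+>', text)
print(greedy)

# Non-greedy: .+? stops at the first '>' it can
lazy = re.findall('<.+?>', text)
print(lazy)
```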


# Slide 29
## String Extraction

In [31]:
x = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
y = re.findall(r'\S+@\S+', x)  # raw string so \S is not treated as a string escape
print(y)


['stephen.marquard@uct.ac.za']


In [32]:
y = re.findall(r'^From \S+@\S+', x)
print(y)

['From stephen.marquard@uct.ac.za']


In [33]:
y = re.findall(r'^From (\S+@\S+)', x)  # parentheses select the part to return
print(y)

['stephen.marquard@uct.ac.za']
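
Parentheses mark which part of the match `re.findall()` returns. As a sketch of one step further, a pattern with two groups returns a list of tuples, one item per group:

```python
import re

x = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Two groups: the part before '@' and the part after it
pairs = re.findall(r'^From (\S+)@(\S+)', x)
print(pairs)

user, host = pairs[0]
print(user, host)
```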


# Slide 30
## Example 1

In [35]:
# Option 1: using find() and slicing

data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
atpos = data.find('@')
print(atpos)

sppos = data.find(' ',atpos)
print(sppos)

host = data[atpos+1 : sppos]
print(host)


21
31
uct.ac.za


In [37]:
# Option 2: Double Split 

words = data.split()
email = words[1]
pieces = email.split('@')
print(pieces[1])


uct.ac.za


In [39]:
# Option 3: regex

re.findall('@([^ ]*)',data)



['uct.ac.za']

In [40]:
# Option 4: regex (more specific)

re.findall('^From .*@([^ ]*)',data)


['uct.ac.za']
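
Options 3 and 4 agree on this line, but they are not equivalent in general. A sketch on an illustrative line that does not start with `From ` (the address is made up) shows the difference:

```python
import re

other = 'Reply-To: someone@example.com'  # illustrative line, not from the data file

# Option 3 matches after any '@', wherever it appears
loose = re.findall('@([^ ]*)', other)
print(loose)

# Option 4 only matches lines that begin with 'From '
strict = re.findall('^From .*@([^ ]*)', other)
print(strict)
```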

# Slide 31
## Example 2

In [45]:
# reads data from mbox-short.txt and prints the maximum spam confidence it finds
import re
hand = open('mbox-short.txt')
numlist = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('^X-DSPAM-Confidence: ([0-9.]+)', line)
    if len(stuff) != 1:
        continue
    num = float(stuff[0])
    numlist.append(num)
print('Maximum:', max(numlist))

Maximum: 0.9907
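
For readers without `mbox-short.txt` at hand, here is a self-contained sketch of the same idea. The sample lines and their values are made up, `io.StringIO` stands in for the file, and `re.search()` with `group(1)` replaces the `findall()` length check:

```python
import io
import re

# Illustrative stand-in for mbox-short.txt; the values are made up
sample = io.StringIO(
    'X-DSPAM-Confidence: 0.8475\n'
    'Subject: not a confidence line\n'
    'X-DSPAM-Confidence: 0.6178\n'
)

numlist = []
with sample as hand:  # the with block closes the handle when done
    for line in hand:
        match = re.search(r'^X-DSPAM-Confidence: ([0-9.]+)', line)
        if match:
            numlist.append(float(match.group(1)))

# default=None avoids a ValueError if no line matched
maximum = max(numlist, default=None)
print('Maximum:', maximum)
```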


# Slide 40
## Lambda Function (sample usage)

In [47]:
data = [{'first':'Guido','last':'Van Rossum', 'YOB':1956},
        {'first':'Grace', 'last':'Hopper', 'YOB':1906},
        {'first':'Alan', 'last':'Turing', 'YOB':1912}]

# sort alphabetically by first name
sorted(data, key=lambda item: item['first'])


[{'first': 'Alan', 'last': 'Turing', 'YOB': 1912},
 {'first': 'Grace', 'last': 'Hopper', 'YOB': 1906},
 {'first': 'Guido', 'last': 'Van Rossum', 'YOB': 1956}]

In [48]:
sorted(data, key=lambda item: item['YOB'])


[{'first': 'Grace', 'last': 'Hopper', 'YOB': 1906},
 {'first': 'Alan', 'last': 'Turing', 'YOB': 1912},
 {'first': 'Guido', 'last': 'Van Rossum', 'YOB': 1956}]
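
As an alternative sketch, `operator.itemgetter` from the standard library builds the same kind of key function without a `lambda`, and `reverse=True` flips the order:

```python
from operator import itemgetter

data = [{'first': 'Guido', 'last': 'Van Rossum', 'YOB': 1956},
        {'first': 'Grace', 'last': 'Hopper', 'YOB': 1906},
        {'first': 'Alan', 'last': 'Turing', 'YOB': 1912}]

# itemgetter('YOB') behaves like lambda item: item['YOB']
by_yob_desc = sorted(data, key=itemgetter('YOB'), reverse=True)

names = [item['first'] for item in by_yob_desc]
print(names)  # ['Guido', 'Alan', 'Grace']
```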

# Slide 42
## Mapping Functions over Iterables: map

In [50]:
# print the first 10 square numbers
square = lambda x: x ** 2
for val in map(square, range(10)):
    print(val, end=' ')


0 1 4 9 16 25 36 49 64 81 

In [51]:
list(map(lambda x: x ** 2, range(10)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
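
Two related points, sketched with illustrative names: a list comprehension gives the same result as `map` here, and `map` can also walk several iterables in parallel when the function takes several arguments.

```python
# map with a lambda and the equivalent list comprehension
squares_map = list(map(lambda x: x ** 2, range(10)))
squares_comp = [x ** 2 for x in range(10)]
print(squares_map == squares_comp)  # True

# map over two iterables at once: the lambda receives one item from each
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```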

# Slide 43
## Selecting Items in Iterables: filter

In [52]:
# find values below 10 for which x % 2 is zero
is_even = lambda x: x % 2 == 0
for val in filter(is_even, range(10)):
    print(val, end=' ')


0 2 4 6 8 

In [54]:
list(filter(lambda x: not (x % 2), range(10)))


[0, 2, 4, 6, 8]
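
As with `map`, a comprehension with an `if` clause gives the same result as `filter`, and the two functions can be chained. A short sketch with illustrative names:

```python
# filter with a lambda and the equivalent list comprehension
evens_filter = list(filter(lambda x: x % 2 == 0, range(10)))
evens_comp = [x for x in range(10) if x % 2 == 0]
print(evens_filter == evens_comp)  # True

# Chain filter and map: square only the even values
even_squares = list(map(lambda x: x ** 2,
                        filter(lambda x: x % 2 == 0, range(10))))
print(even_squares)  # [0, 4, 16, 36, 64]
```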