# Python Data Algorithms Quick Reference

## Table Of Contents

1. <a href="#1.-Map">Map</a>
2. <a href="#2.-Filter">Filter</a>
3. <a href="#3.-Named-Slices">Named Slices</a>
4. <a href="#4.-Zip">Zip</a>
5. <a href="#5.-itemgetter">itemgetter</a>
6. <a href="#6.-attrgetter">attrgetter</a>
7. <a href="#7.-groupby">groupby</a>
8. <a href="#8.-Generator-Expressions">Generator Expressions</a>
9. <a href="#9.-compress">compress</a>

## 1. Map

**map** applies a function to every element of a sequence and returns an iterator of the results:

In [1]:
simpsons = ['homer', 'marge', 'bart']
list(map(len, simpsons))  # returns [5, 5, 4]

# equivalent list comprehension
[len(word) for word in simpsons]

[5, 5, 4]

In [2]:
list(map(lambda word: word[-1], simpsons))  # returns ['r', 'e', 't']

# equivalent list comprehension
[word[-1] for word in simpsons]

['r', 'e', 't']
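
Because map returns an iterator, its results are computed on demand and can be consumed only once. The following is a minimal sketch of that behavior, reusing the `simpsons` list from above; the names `lengths`, `first_pass`, and `second_pass` are introduced here purely for illustration.

```python
simpsons = ['homer', 'marge', 'bart']

# map() builds an iterator; nothing is computed until it is consumed
lengths = map(len, simpsons)

first_pass = list(lengths)   # consumes the iterator
second_pass = list(lengths)  # the iterator is already exhausted

print(first_pass)
print(second_pass)
```

If the results are needed more than once, storing them in a list (as `first_pass` does) is one straightforward option.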

## 2. Filter

**filter** returns an iterator containing the elements from a sequence for which a condition is True:

In [3]:
nums = range(5)
list(filter(lambda x: x % 2 == 0, nums))  # returns [0, 2, 4]

# equivalent list comprehension
[num for num in nums if num % 2 == 0]

[0, 2, 4]
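
The condition passed to filter does not have to be a lambda; any callable that returns a truthy or falsy value should work. A small sketch, assuming a hypothetical helper named `is_even` that is not part of the original notes:

```python
def is_even(n):
    """Return True for even integers (hypothetical helper for this sketch)."""
    return n % 2 == 0

nums = range(5)

# filter() keeps the items for which the predicate returns True
evens = list(filter(is_even, nums))

# the same predicate can be reused inside a list comprehension
odds = [n for n in nums if not is_even(n)]

print(evens)
print(odds)
```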

## 3. Named Slices

In [4]:
######    0123456789012345678901234567890123456789012345678901234567890'
record = '....................100          .......513.25   ..........'

SHARES = slice(20,32)
PRICE = slice(40,48)

cost = int(record[SHARES]) * float(record[PRICE])
cost

51325.0
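
Naming a slice documents what the hard-coded indices mean and lets the same field layout be applied to many records. The sketch below assumes a made-up fixed-width layout (shares in columns 0-5, price in columns 6-13) and made-up records; it is only meant to illustrate the reuse.

```python
# Hypothetical fixed-width layout for this sketch
SHARES = slice(0, 6)
PRICE = slice(6, 14)

records = [
    '   100  513.25',
    '    50   20.00',
]

# int() and float() ignore the surrounding whitespace in each field
costs = [int(rec[SHARES]) * float(rec[PRICE]) for rec in records]
print(costs)
```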

## 4. Zip

In [5]:
# zip() pairs up items from two or more iterables and yields them lazily as tuples.
# Putting the values first lets min() and max() compare prices while keeping the names.
prices = { 'ACME' : 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ' : 37.20, 'FB' : 10.75 }

min_price = min(zip(prices.values(), prices.keys()))  # (10.75, 'FB')

max(zip(prices.values(), prices.keys()))

(612.78, 'AAPL')

**A zip object is an iterator, so it can only be iterated over once!**

In [6]:
prices_and_names = zip(prices.values(), prices.keys())
print(min(prices_and_names))

# a second call would raise ValueError because the iterator is now exhausted
# print(min(prices_and_names))

(10.75, 'FB')
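
When the pairs are needed more than once, one option is to store them in a list first. This is a sketch under that assumption, reusing the `prices` dictionary from above; `lowest`, `highest`, and `ranked` are illustrative names.

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}

# list() stores the (price, name) pairs so they can be scanned repeatedly
prices_and_names = list(zip(prices.values(), prices.keys()))

lowest = min(prices_and_names)
highest = max(prices_and_names)

# sorted() also works on the stored pairs, ordering by price first
ranked = sorted(prices_and_names)

print(lowest, highest)
print(ranked)
```

The trade-off is memory: the list holds every pair at once, whereas the bare zip object produces them one at a time.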


In [7]:
# Slices again: slice.indices(length) returns a (start, stop, step) tuple clipped to fit
# a sequence of the given length, so indexing with those values cannot go out of bounds.

a = slice(5, 50, 2)
a.start

5

In [8]:
s = 'HelloWorld'
a.indices(len(s))

(5, 10, 2)

In [9]:
for i in range(*a.indices(len(s))):
    print(s[i])

W
r
d
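
As a quick cross-check of the loop above, the clipped indices should select the same characters as applying the slice directly. A minimal sketch; `via_indices` and `via_slice` are names introduced here for illustration.

```python
s = 'HelloWorld'
a = slice(5, 50, 2)

# indices() clips the stop value (50) down to the length of s (10)
start, stop, step = a.indices(len(s))

via_indices = ''.join(s[i] for i in range(start, stop, step))
via_slice = s[a]  # slicing already clips out-of-range bounds

print(via_indices, via_slice)
```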


## 5. itemgetter

In [10]:
from operator import itemgetter

In [11]:
rows = [
{'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]

rows_by_fname = sorted(rows, key=itemgetter('fname'))
rows_by_fname

[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004},
 {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
 {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
 {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]

In [12]:
rows_by_uid = sorted(rows, key=itemgetter('uid'))
rows_by_uid

[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
 {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
 {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
 {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]

In [13]:
# itemgetter() can also accept multiple keys; here rows are sorted by lname, then fname
rows_by_lfname = sorted(rows, key=itemgetter('lname','fname'))
rows_by_lfname

[{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
 {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
 {'fname': 'Big', 'lname': 'Jones', 'uid': 1004},
 {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}]
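
To make the sort keys above less opaque, the sketch below calls the itemgetter objects directly and compares one of them with an equivalent lambda. It uses a shortened copy of `rows`; the names `get_uid` and `get_names` are illustrative.

```python
from operator import itemgetter

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
]

get_uid = itemgetter('uid')
get_names = itemgetter('lname', 'fname')

uid = get_uid(rows[0])      # single key: returns the value itself
names = get_names(rows[0])  # multiple keys: returns a tuple of values

# The same sort key written as a lambda
by_name_lambda = sorted(rows, key=lambda r: (r['lname'], r['fname']))
by_name_getter = sorted(rows, key=get_names)

print(uid, names)
print(by_name_lambda == by_name_getter)
```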

## 6. attrgetter

In [14]:
from operator import attrgetter

In [15]:
# attrgetter() is useful for sorting objects that don't natively support comparison
class User:
    def __init__(self, user_id):
        self.user_id = user_id
    def __repr__(self):
        return 'User({})'.format(self.user_id)
        
users = [User(23), User(3), User(99)]
users

[User(23), User(3), User(99)]

In [16]:
sorted(users, key=attrgetter('user_id'))

[User(3), User(23), User(99)]

In [17]:
min(users, key=attrgetter('user_id'))

User(3)
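
Like itemgetter, attrgetter accepts several names and then returns a tuple, which allows sorting on more than one attribute. The sketch below assumes a variant of the `User` class with an extra, hypothetical `name` attribute that the class above does not have.

```python
from operator import attrgetter

class User:
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name  # hypothetical extra attribute for this sketch

    def __repr__(self):
        return 'User({}, {!r})'.format(self.user_id, self.name)

users = [User(23, 'bob'), User(3, 'alice'), User(99, 'alice')]

# sort by name first, then by user_id within the same name
by_name_then_id = sorted(users, key=attrgetter('name', 'user_id'))
ids = [u.user_id for u in by_name_then_id]

largest = max(users, key=attrgetter('user_id'))

print(by_name_then_id)
print(largest)
```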

## 7. groupby

The groupby() function works by scanning a sequence and finding sequential “runs”
of identical values (or values returned by the given key function). On each iteration, it
returns the value along with an iterator that produces all of the items in a group with
the same value.

In [19]:
from operator import itemgetter
from itertools import groupby

In [27]:
rows = [
{'address': '5412 N CLARK', 'date': '07/01/2012'},
{'address': '5148 N CLARK', 'date': '07/04/2012'},
{'address': '5800 E 58TH', 'date': '07/02/2012'},
{'address': '2122 N CLARK', 'date': '07/03/2012'},
{'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
{'address': '1060 W ADDISON', 'date': '07/02/2012'},
{'address': '4801 N BROADWAY', 'date': '07/01/2012'},
{'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

# Important: sort the data on the key field first, because groupby() only groups consecutive items
rows.sort(key=itemgetter('date'))

# iterate in groups
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in items:
        print('   %s' % i)

07/01/2012
   {'date': '07/01/2012', 'address': '5412 N CLARK'}
   {'date': '07/01/2012', 'address': '4801 N BROADWAY'}
07/02/2012
   {'date': '07/02/2012', 'address': '5800 E 58TH'}
   {'date': '07/02/2012', 'address': '5645 N RAVENSWOOD'}
   {'date': '07/02/2012', 'address': '1060 W ADDISON'}
07/03/2012
   {'date': '07/03/2012', 'address': '2122 N CLARK'}
07/04/2012
   {'date': '07/04/2012', 'address': '5148 N CLARK'}
   {'date': '07/04/2012', 'address': '1039 W GRANVILLE'}
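
The sort step matters because groupby() only collects consecutive items with the same key. The sketch below, using a made-up list of date strings, compares grouping unsorted and sorted input; `unsorted_runs` and `sorted_runs` are illustrative names.

```python
from itertools import groupby

dates = ['07/01', '07/02', '07/01', '07/01']

# Without sorting, '07/01' appears as two separate runs
unsorted_runs = [(key, len(list(group))) for key, group in groupby(dates)]

# After sorting, each key forms exactly one run
sorted_runs = [(key, len(list(group))) for key, group in groupby(sorted(dates))]

print(unsorted_runs)
print(sorted_runs)
```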


## 8. Generator Expressions

In [36]:
mylist = [1, 4, -5, 10, -7, 2, 3, -1]
positives = (n for n in mylist if n > 0)

positives

<generator object <genexpr> at 0x00000000061BF990>

In [37]:
for x in positives:
    print(x)

1
4
10
2
3


In [38]:
nums = [1, 2, 3, 4, 5]
sum(x * x for x in nums)

55

In [40]:
# Output a tuple as CSV
s = ('ACME', 50, 123.45)
','.join(str(x) for x in s)

'ACME,50,123.45'

In [43]:
# Determine if any .py files exist in a directory
import os
files = os.listdir('.')
if any(name.endswith('.py') for name in files):
    print('There be python!')
else:
    print('Sorry, no python.')

Sorry, no python.


In [48]:
# Data reduction across fields of a data structure
portfolio = [
{'name':'GOOG', 'shares': 50},
{'name':'YHOO', 'shares': 75},
{'name':'AOL', 'shares': 20},
{'name':'SCOX', 'shares': 65}
]
min(s['shares'] for s in portfolio)

20

In [49]:
s = sum((x * x for x in nums)) # Generator expression in its own parentheses
s = sum(x * x for x in nums) # Parentheses can be dropped when it is the only argument
s

55
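
Generator expressions produce items only as they are requested, so a consumer such as any() can stop early without evaluating the rest. The sketch below records which names were actually checked; `noisy_check` and `seen` are hypothetical helpers added only to make that visible.

```python
def noisy_check(name, seen):
    """Record each name that is examined, then test its extension."""
    seen.append(name)
    return name.endswith('.py')

files = ['a.txt', 'b.py', 'c.py']
seen = []

# any() stops at the first True, so 'c.py' is never examined
found = any(noisy_check(name, seen) for name in files)

print(found)
print(seen)
```

With a list comprehension in place of the generator expression, all three names would be checked before any() ran.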

## 9. compress

itertools.compress() takes an iterable and an accompanying Boolean selector sequence as input. As output, it gives you all of the items in the iterable where the corresponding element in the selector is True.

In [31]:
from itertools import compress

In [32]:
addresses = [
'5412 N CLARK',
'5148 N CLARK',
'5800 E 58TH',
'2122 N CLARK',
'5645 N RAVENSWOOD',
'1060 W ADDISON',
'4801 N BROADWAY',
'1039 W GRANVILLE',
]
counts = [0, 3, 10, 4, 1, 7, 6, 1]


In [33]:
more5 = [n > 5 for n in counts]
more5

[False, False, True, False, False, True, True, False]

In [34]:
list(compress(addresses, more5))

['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']
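
One way to think about compress() is as a zip over the items and selectors followed by a filter on the selector. The sketch below uses made-up data to compare the two forms; `items`, `selectors`, and the result names are illustrative.

```python
from itertools import compress

items = ['a', 'b', 'c', 'd']
selectors = [True, False, True, False]

picked = list(compress(items, selectors))

# roughly equivalent comprehension: keep an item when its selector is truthy
picked_with_zip = [item for item, keep in zip(items, selectors) if keep]

print(picked)
print(picked_with_zip)
```

Because the pairing is positional, the two sequences need to line up one to one; a missing element in either shifts every later selection.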