# Agenda 

1. What are comprehensions?
2. List comprehensions
3. List comprehensions and files
4. Set comprehensions
5. Dict comprehensions
6. Nested comprehensions
7. Generator expressions (aka generator comprehensions)

In [1]:
# I have a list of integers
# I want to get a list of those integers squared

numbers = range(10)

output = []

for one_number in numbers:
    output.append(one_number ** 2)

output    

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list comprehension is perfect for when you want to get a new list based on an existing one (or on any other iterable).

- I have a list of integers
- I want a second list of integers (the first list's elements, squared)
- I know how to transform each element of the first list into the matching element of the second one



In [3]:
# here's the comprehension version of the above

[one_number ** 2 for one_number in numbers]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

What's going on here?

- We start from the `for` -- that loop runs first and foremost
- The iterable value (`numbers`, here) that we iterate over can be anything -- not just a list
- The first part of our comprehension is a Python expression -- meaning, an operation, function, or method call that returns a value
- The expression is invoked once for each element in the iterable
- The `[]` around our comprehension tell Python that we want to create a list
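
To illustrate the point that the iterable can be anything, here is a small sketch that runs the same kind of comprehension over a string, a tuple, and a dict. The sample values and variable names are mine, chosen only for illustration.

```python
# The object after "in" does not have to be a list.
upper_letters = [one_char.upper() for one_char in 'abc']        # a string yields characters
doubled = [one_number * 2 for one_number in (10, 20, 30)]       # a tuple yields its elements
key_lengths = [len(one_key) for one_key in {'a': 1, 'bcd': 2}]  # a dict yields its keys

print(upper_letters)
print(doubled)
print(key_lengths)
```

In every case the result is a list, because of the `[]` around the comprehension.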



In [4]:
[one_number ** 2              # expression -- SELECT
 for one_number in numbers]   # iteration --  FROM 

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

If you have a list already, and you want a new list based on it, you should use a comprehension!

In [5]:
# if I have a list of strings

mylist = ['abcde', 'fg', 'hij']

# I want to get them back as a string, with '*' between them
'*'.join(mylist)

'abcde*fg*hij'

In [6]:
# what if I have a list of integers?

mylist = [10, 20, 30, 40, 50]

'*'.join(mylist)

TypeError: sequence item 0: expected str instance, int found

In [7]:
# we have an iterable of integers
# we want an iterable of strings
# we can convert from integers to strings with str()

mylist = [10, 20, 30, 40, 50]

'*'.join([str(one_number)
         for one_number in mylist])

'10*20*30*40*50'

In [8]:
[str(one_number)
 for one_number in mylist]

['10', '20', '30', '40', '50']

In [9]:
s = 'this is a sample sentence for my tutorial'

s.title()

'This Is A Sample Sentence For My Tutorial'

In [10]:
# can I do the same thing as str.title but using only str.capitalize?

s.capitalize()

'This is a sample sentence for my tutorial'

In [13]:
' '.join([one_word.capitalize()
          for one_word in s.split()])

'This Is A Sample Sentence For My Tutorial'
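
Why bother, if `str.title` already exists? One reason is that `str.title` capitalizes the letter after any non-letter, including an apostrophe. The sketch below uses a sample sentence of my own to show the difference.

```python
s = "don't stop believing"

titled = s.title()                      # the apostrophe counts as a word boundary
capitalized = ' '.join([one_word.capitalize()
                        for one_word in s.split()])

print(titled)
print(capitalized)
```

Note that the `split`/`join` version also collapses runs of whitespace into single spaces, which may or may not be what you want.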

# Exercises:

1. Ask the user to enter a string containing numbers, separated by spaces. Add those numbers together (as integers), and print the result. It's OK to use the built-in `sum` function. We can assume that our user will only enter digits and whitespace.
2. Ask the user to enter a string, and print the length of the string, not counting whitespace. Don't use `str.replace`.

In [16]:
text = input('Enter some numbers: ').strip()

sum([int(one_number)
    for one_number in text.split()])

Enter some numbers:  10 20 30 40 50


150

In [19]:
text = input('Enter a sentence: ').strip()

sum([len(one_word)
    for one_word in text.split()])

Enter a sentence:  hello out there


13
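
The second exercise can also be approached character by character. This is only an alternative sketch, not the solution used in the session: it hard-codes the sample sentence instead of calling `input`, and it uses an `if` clause in the comprehension, which we get to a bit later.

```python
text = 'hello out there'   # stand-in for the user's input

# Keep only the characters that are not whitespace, then count them
non_space_count = len([one_char
                       for one_char in text
                       if not one_char.isspace()])

# Same answer without a comprehension: split into words, glue them back together
joined_count = len(''.join(text.split()))

print(non_space_count, joined_count)
```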

In [20]:
# it's very common (and easy) to iterate over a file in Python

# /etc/passwd

for one_line in open('/etc/passwd'):
    print(one_line)

##

# User Database

# 

# Note that this file is consulted directly only when the system is running

# in single-user mode.  At other times this information is provided by

# Open Directory.

#

# See the opendirectoryd(8) man page for additional information about

# Open Directory.

##

nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false

root:*:0:0:System Administrator:/var/root:/bin/sh

daemon:*:1:1:System Services:/var/root:/usr/bin/false

_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico

_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false

_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false

_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false

_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false

_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false

_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/fal

In [21]:
[one_line
for one_line in open('/etc/passwd')]

['##\n',
 '# User Database\n',
 '# \n',
 '# Note that this file is consulted directly only when the system is running\n',
 '# in single-user mode.  At other times this information is provided by\n',
 '# Open Directory.\n',
 '#\n',
 '# See the opendirectoryd(8) man page for additional information about\n',
 '# Open Directory.\n',
 '##\n',
 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n',
 'root:*:0:0:System Administrator:/var/root:/bin/sh\n',
 'daemon:*:1:1:System Services:/var/root:/usr/bin/false\n',
 '_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico\n',
 '_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false\n',
 '_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false\n',
 '_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false\n',
 '_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false\n',
 '_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false\n',
 '_scsd:*:31:31:Service Configuration Servi

In [24]:
# can I get the usernames from this file?
# remember, the username is what comes before the first :

[one_line.split(':')[0]               # expression -- SELECT
 for one_line in open('/etc/passwd')  # iteration -- FROM
 if not one_line.startswith('#')]     # condition -- WHERE

['nobody',
 'root',
 'daemon',
 '_uucp',
 '_taskgated',
 '_networkd',
 '_installassistant',
 '_lp',
 '_postfix',
 '_scsd',
 '_ces',
 '_appstore',
 '_mcxalr',
 '_appleevents',
 '_geod',
 '_devdocs',
 '_sandbox',
 '_mdnsresponder',
 '_ard',
 '_www',
 '_eppc',
 '_cvs',
 '_svn',
 '_mysql',
 '_sshd',
 '_qtss',
 '_cyrus',
 '_mailman',
 '_appserver',
 '_clamav',
 '_amavisd',
 '_jabber',
 '_appowner',
 '_windowserver',
 '_spotlight',
 '_tokend',
 '_securityagent',
 '_calendar',
 '_teamsserver',
 '_update_sharing',
 '_installer',
 '_atsserver',
 '_ftp',
 '_unknown',
 '_softwareupdate',
 '_coreaudiod',
 '_screensaver',
 '_locationd',
 '_trustevaluationagent',
 '_timezone',
 '_lda',
 '_cvmsroot',
 '_usbmuxd',
 '_dovecot',
 '_dpaudio',
 '_postgres',
 '_krbtgt',
 '_kadmin_admin',
 '_kadmin_changepw',
 '_devicemgr',
 '_webauthserver',
 '_netbios',
 '_warmd',
 '_dovenull',
 '_netstatistics',
 '_avbdeviced',
 '_krb_krbtgt',
 '_krb_kadmin',
 '_krb_changepw',
 '_krb_kerberos',
 '_krb_anonymous',
 '_asse
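
For comparison, here is the same SELECT / FROM / WHERE logic written as a regular `for` loop. To keep the sketch runnable anywhere, it reads from an `io.StringIO` object holding a few lines copied from the output above, rather than from the real `/etc/passwd`. The name `fake_passwd` is made up for this example.

```python
import io

# Stand-in for open('/etc/passwd'); iterating over it yields one line at a time
fake_passwd = io.StringIO(
    '# User Database\n'
    'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n'
    'root:*:0:0:System Administrator:/var/root:/bin/sh\n'
)

usernames = []
for one_line in fake_passwd:                      # iteration -- FROM
    if not one_line.startswith('#'):              # condition -- WHERE
        usernames.append(one_line.split(':')[0])  # expression -- SELECT

print(usernames)
```

The comprehension puts the expression first, but the order of execution is the same as in this loop: iterate, test the condition, then evaluate the expression.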

In [25]:
# the file nums.txt is in the zipfile I asked you to download
# at https://files.lerner.co.il -- basic/intro Python exercise files

!cat nums.txt

5
	10     
	20
  	3
		   	20        

 25


# Exercise: Sum numbers in `nums.txt`

- Every line in `nums.txt` contains either 0 integers or 1 integer. Each integer might have some whitespace before or after it.
- Using a comprehension and `sum`, total the numbers in this file.


In [28]:
[int(one_item.strip())
for one_item in open('nums.txt')]

ValueError: invalid literal for int() with base 10: ''

In [30]:
int('5')

5

In [31]:
int('   5      ')

5

In [32]:
int('')

ValueError: invalid literal for int() with base 10: ''

In [35]:
sum([int(one_item)
for one_item in open('nums.txt')
if one_item.strip()])

83
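
Two things make this work: `int` ignores whitespace around the digits, and an empty string is false in an `if`. Here is a sketch that separates the two steps, using an in-memory copy of the file's contents (typed in by hand from the `cat` output above, so the exact whitespace is an approximation).

```python
import io

# Approximate copy of nums.txt, including the blank lines
fake_nums = io.StringIO('5\n\t10     \n\t20\n  \t3\n\t\t   \t20        \n\n 25\n\n')

# Step 1: the "if" drops lines that contain nothing but whitespace
kept = [one_item
        for one_item in fake_nums
        if one_item.strip()]

# Step 2: int() tolerates the leading/trailing whitespace that remains
total = sum([int(one_item)
             for one_item in kept])

print(len(kept), total)
```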

In [36]:
!head shoe-data.txt

Adidas	orange	43
Nike	black	41
Adidas	black	39
New Balance	pink	41
Nike	white	44
New Balance	orange	38
Nike	pink	44
Adidas	pink	44
New Balance	orange	39
New Balance	black	43


# Exercise: Shoe dicts

`shoe-data.txt` contains 100 lines. Each line contains three fields (brand, color, and size) separated by tabs (`'\t'`). 

Use a list comprehension to turn this file into a list of dictionaries, where each dict has three keys -- `brand`, `color`, and `size`. The values can remain strings.

The expression on the first line of the comprehension can be any Python function, method, or operator -- including a function that you write! Here, I recommend that you write a function, `line_to_dict`, that takes a line from the file and returns a dict. The comprehension will invoke the function once per line, producing a list of dicts.

The result will look like

```python
    [{'brand': 'Adidas',
      'color': 'black',
      'size': '45'},
     ...]
```


In [45]:
def line_to_dict(text):
    fields = text.strip().split('\t')
    return {'brand': fields[0],
            'color': fields[1],
            'size': fields[2]}

[line_to_dict(one_line)
for one_line in open('shoe-data.txt')]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '

In [46]:
def line_to_dict(text):
    brand, color, size = text.strip().split('\t')
    return {'brand': brand,
            'color': color,
            'size': size}

[line_to_dict(one_line)
for one_line in open('shoe-data.txt')]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '

In [47]:
# we can create a dict by invoking "dict" on a list of tuples 
# each tuple is a key-value pair

dict([('a',10), ('b', 20), ('c', 30)])

{'a': 10, 'b': 20, 'c': 30}
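
One convenient way to produce such a list of tuples is the built-in `zip`, which pairs up the elements of two iterables. A minimal sketch, using one line in the format of `shoe-data.txt`:

```python
keys = ['brand', 'color', 'size']
fields = 'Adidas\torange\t43\n'.strip().split('\t')

pairs = list(zip(keys, fields))   # one (key, value) tuple per position
shoe = dict(pairs)

print(pairs)
print(shoe)
```

`dict` accepts the `zip` object directly, so the call to `list` is only there to make the intermediate pairs visible.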

In [48]:
def line_to_dict(text):
    return dict(zip(['brand', 'color', 'size'],
                    text.strip().split('\t')))

[line_to_dict(one_line)
for one_line in open('shoe-data.txt')]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '

In [49]:
!head mini-access-log.txt

67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.12 - - [30/J

In [51]:
# get the IP addresses from this file

[one_line.split()[0]
for one_line in open('mini-access-log.txt')]

['67.218.116.165',
 '66.249.71.65',
 '65.55.106.183',
 '65.55.106.183',
 '66.249.71.65',
 '66.249.71.65',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '65.55.106.131',
 '65.55.106.131',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '65.55.106.186',
 '65.55.106.186',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '74.52.245.146',
 '74.52.245.146',
 '66.249.65.43',
 '66.249.65.43',
 '66.249.65.43',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '65.55.207.25',
 '65.55.207.25',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '65.55.207.94',
 '65.55.207.94',
 '66.249.65.12',
 '65.55.207.71',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '98.242.170.241',
 '66.249.65.38',
 '66.249.65.38',
 '66.249.65.38',
 '66.249.65.38',
 '66.249.65.38',
 '

In [52]:
# how many times did each IP address access my server?

# one of my favorite classes in the standard library is collections.Counter

In [53]:
# here's how not to use Counter

from collections import Counter

c = Counter()
c['a'] += 5
c['b'] += 3
c['a'] += 2
c['x'] += 7

c

Counter({'a': 7, 'x': 7, 'b': 3})

In [55]:
# the real way to use Counter is to throw an iterable at it
# it then counts how many times each element appears
# the elements in the original become the keys in Counter
# the values are the number of times that each element appears

c = Counter('abcaabcdefxab')
c

Counter({'a': 4, 'b': 3, 'c': 2, 'd': 1, 'e': 1, 'f': 1, 'x': 1})

In [56]:
c.most_common()

[('a', 4), ('b', 3), ('c', 2), ('d', 1), ('e', 1), ('f', 1), ('x', 1)]

In [57]:
c.most_common(3)

[('a', 4), ('b', 3), ('c', 2)]

In [58]:
Counter([one_line.split()[0]
        for one_line in open('mini-access-log.txt')])

Counter({'66.249.65.38': 100,
         '66.249.65.12': 32,
         '89.248.172.58': 22,
         '67.195.112.35': 16,
         '66.249.71.65': 3,
         '66.249.65.43': 3,
         '65.55.207.50': 3,
         '67.218.116.165': 2,
         '65.55.106.183': 2,
         '65.55.106.131': 2,
         '65.55.106.186': 2,
         '74.52.245.146': 2,
         '65.55.207.25': 2,
         '65.55.207.94': 2,
         '65.55.207.126': 2,
         '82.34.9.20': 2,
         '65.55.106.155': 2,
         '65.55.207.77': 2,
         '65.55.215.75': 2,
         '65.55.207.71': 1,
         '98.242.170.241': 1,
         '208.80.193.28': 1})
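
The same pattern works on any iterable of lines, not only on a file. In the sketch below, `log_lines` is a hypothetical three-line stand-in for `mini-access-log.txt`, so that the example runs without the file.

```python
from collections import Counter

# Hypothetical log lines in the same format as mini-access-log.txt
log_lines = [
    '66.249.65.12 - - [30/Jan/2010:02:07:14 +0200] "GET /a HTTP/1.1" 200 10',
    '65.55.106.183 - - [30/Jan/2010:02:08:00 +0200] "GET /b HTTP/1.1" 200 20',
    '66.249.65.12 - - [30/Jan/2010:02:09:30 +0200] "GET /c HTTP/1.1" 200 30',
]

ip_counts = Counter([one_line.split()[0]
                     for one_line in log_lines])

# most_common(1) returns a one-element list of (key, count) tuples
top_ip, top_count = ip_counts.most_common(1)[0]
print(top_ip, top_count)
```

A `Counter` returns 0 for a key it has never seen, rather than raising `KeyError`.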

In [60]:
c = Counter([one_line.split()[0]
        for one_line in open('mini-access-log.txt')])

for key, value in c.items():
    print(f'{key}\t{value}')

67.218.116.165	2
66.249.71.65	3
65.55.106.183	2
66.249.65.12	32
65.55.106.131	2
65.55.106.186	2
74.52.245.146	2
66.249.65.43	3
65.55.207.25	2
65.55.207.94	2
65.55.207.71	1
98.242.170.241	1
66.249.65.38	100
65.55.207.126	2
82.34.9.20	2
65.55.106.155	2
65.55.207.77	2
208.80.193.28	1
89.248.172.58	22
67.195.112.35	16
65.55.207.50	3
65.55.215.75	2


In [61]:
c = Counter([one_line.split()[0]
        for one_line in open('mini-access-log.txt')])

for key, value in c.items():
    print(f'{key}\t{value * "x"}')

67.218.116.165	xx
66.249.71.65	xxx
65.55.106.183	xx
66.249.65.12	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.106.131	xx
65.55.106.186	xx
74.52.245.146	xx
66.249.65.43	xxx
65.55.207.25	xx
65.55.207.94	xx
65.55.207.71	x
98.242.170.241	x
66.249.65.38	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.207.126	xx
82.34.9.20	xx
65.55.106.155	xx
65.55.207.77	xx
208.80.193.28	x
89.248.172.58	xxxxxxxxxxxxxxxxxxxxxx
67.195.112.35	xxxxxxxxxxxxxxxx
65.55.207.50	xxx
65.55.215.75	xx
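
The rows above come out in the order in which each address first appeared in the log. If you would rather see the busiest addresses first, one option is to iterate over `most_common()` instead of `items()`. A sketch, with three of the counts from above re-entered by hand:

```python
from collections import Counter

# A few of the real counts, typed in so the sketch runs without the log file
c = Counter({'66.249.71.65': 3, '67.195.112.35': 16, '208.80.193.28': 1})

rows = [f'{key}\t{value * "x"}'
        for key, value in c.most_common()]   # highest count first

for one_row in rows:
    print(one_row)
```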


# Next up

- Set comprehensions
- Dict comprehensions
- Nested comprehensions
- Generator expressions

In [62]:
# variable scoping and comprehensions

x = 100

for i in range(10):
    x = i ** 2

x    

81

In [63]:
x = 100

[x 
 for x in range(10)]

x

100
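
Here is the same contrast in a form that doesn't depend on a pre-existing variable. The two function names are made up for this sketch; the point is that a `for` loop's variable survives the loop, while a comprehension's variable does not exist outside of the comprehension.

```python
def loop_leaks():
    for i in range(3):
        pass
    return i                  # i is still defined after the loop ends

def comprehension_does_not_leak():
    squares = [j ** 2 for j in range(3)]
    try:
        j                     # j only existed inside the comprehension
    except NameError:
        return squares, 'j is not defined here'
    return squares, 'j leaked'

print(loop_leaks())
print(comprehension_does_not_leak())
```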

In [64]:
def myfunc():
    return [x 
         for x in range(10)]

myfunc()


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [65]:
import dis   # Python dis-assembler

dis.dis(myfunc)

   1           RESUME                   0

   3           LOAD_GLOBAL              1 (range + NULL)
               LOAD_CONST               1 (10)
               CALL                     1
               GET_ITER

   2           LOAD_FAST_AND_CLEAR      0 (x)
               SWAP                     2
       L1:     BUILD_LIST               0
               SWAP                     2

   3           GET_ITER
       L2:     FOR_ITER                 5 (to L3)
               STORE_FAST               0 (x)

   2           LOAD_FAST                0 (x)
               LIST_APPEND              2
               JUMP_BACKWARD            7 (to L2)

   3   L3:     END_FOR
               POP_TOP

   2   L4:     SWAP                     2
               STORE_FAST               0 (x)
               RETURN_VALUE

  --   L5:     SWAP                     2
               POP_TOP

   2           SWAP                     2
               STORE_FAST               0 (x)
               RERAISE             

In [67]:
# let's get our usernames from /etc/passwd

usernames = [one_line.split(':')[0]
 for one_line in open('/etc/passwd')
 if not one_line.startswith('#')]

In [68]:
# if I want, I can search inside of this list!

'root' in usernames

True

In [69]:
'reuven' in usernames

False

In [70]:
'nobody' in usernames

True

In [71]:
# if I could create a dict in which the keys were the usernames and the values
# were True, then I could search in the dict, and get a very fast response.

# sets are the equivalent of a dict's keys
# - we can find them very quickly
# - sets also guarantee uniqueness

In [72]:
# let's take our list of usernames and turn it into a set

set(usernames)

{'_accessoryupdater',
 '_amavisd',
 '_analyticsd',
 '_aonsensed',
 '_appinstalld',
 '_appleevents',
 '_applepay',
 '_appowner',
 '_appserver',
 '_appstore',
 '_ard',
 '_assetcache',
 '_astris',
 '_atsserver',
 '_audiomxd',
 '_avbdeviced',
 '_avphidbridge',
 '_backgroundassets',
 '_biome',
 '_calendar',
 '_captiveagent',
 '_ces',
 '_clamav',
 '_cmiodalassistants',
 '_coreaudiod',
 '_coremediaiod',
 '_coreml',
 '_corespeechd',
 '_ctkd',
 '_cvmsroot',
 '_cvs',
 '_cyrus',
 '_darwindaemon',
 '_datadetectors',
 '_demod',
 '_devdocs',
 '_devicemgr',
 '_diagnosticservicesd',
 '_diskimagesiod',
 '_displaypolicyd',
 '_distnote',
 '_dovecot',
 '_dovenull',
 '_dpaudio',
 '_driverkit',
 '_eligibilityd',
 '_eppc',
 '_findmydevice',
 '_fpsd',
 '_ftp',
 '_gamecontrollerd',
 '_geod',
 '_hidd',
 '_iconservices',
 '_installassistant',
 '_installcoordinationd',
 '_installer',
 '_jabber',
 '_kadmin_admin',
 '_kadmin_changepw',
 '_knowledgegraphd',
 '_krb_anonymous',
 '_krb_changepw',
 '_krb_kadmin',
 '_krb

In [73]:
usernames_set = set(usernames)

In [74]:
%timeit 'root' in usernames   # looking in a list

18.7 ns ± 0.0955 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [75]:
%timeit 'root' in usernames_set   # looking in a set

11.9 ns ± 0.112 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)


In [76]:
11.9 / 18.7

0.6363636363636364
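
The gap is modest here partly because `'root'` sits near the start of the list, so the list search ends almost immediately. The sketch below searches for the last element of a larger list instead. The list size, the `user…` names, and the repeat count are arbitrary choices of mine, and it uses the `timeit` module rather than the `%timeit` magic so that it also runs outside of Jupyter.

```python
import timeit

names = [f'user{i}' for i in range(10_000)]   # hypothetical usernames
names_set = set(names)
target = 'user9999'                           # last element: worst case for the list

# Time 200 membership tests against each data structure
list_time = timeit.timeit(lambda: target in names, number=200)
set_time = timeit.timeit(lambda: target in names_set, number=200)

print(f'list: {list_time:.6f}s   set: {set_time:.6f}s')
```

The exact numbers will differ from machine to machine, so treat them as a rough comparison only.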

In [77]:
# let's get our usernames from /etc/passwd

usernames = set([one_line.split(':')[0]
 for one_line in open('/etc/passwd')
 if not one_line.startswith('#')])

usernames

{'_accessoryupdater',
 '_amavisd',
 '_analyticsd',
 '_aonsensed',
 '_appinstalld',
 '_appleevents',
 '_applepay',
 '_appowner',
 '_appserver',
 '_appstore',
 '_ard',
 '_assetcache',
 '_astris',
 '_atsserver',
 '_audiomxd',
 '_avbdeviced',
 '_avphidbridge',
 '_backgroundassets',
 '_biome',
 '_calendar',
 '_captiveagent',
 '_ces',
 '_clamav',
 '_cmiodalassistants',
 '_coreaudiod',
 '_coremediaiod',
 '_coreml',
 '_corespeechd',
 '_ctkd',
 '_cvmsroot',
 '_cvs',
 '_cyrus',
 '_darwindaemon',
 '_datadetectors',
 '_demod',
 '_devdocs',
 '_devicemgr',
 '_diagnosticservicesd',
 '_diskimagesiod',
 '_displaypolicyd',
 '_distnote',
 '_dovecot',
 '_dovenull',
 '_dpaudio',
 '_driverkit',
 '_eligibilityd',
 '_eppc',
 '_findmydevice',
 '_fpsd',
 '_ftp',
 '_gamecontrollerd',
 '_geod',
 '_hidd',
 '_iconservices',
 '_installassistant',
 '_installcoordinationd',
 '_installer',
 '_jabber',
 '_kadmin_admin',
 '_kadmin_changepw',
 '_knowledgegraphd',
 '_krb_anonymous',
 '_krb_changepw',
 '_krb_kadmin',
 '_krb

In [78]:
# we can create a set via a "set comprehension" very easily
# it's the same syntax as a list comprehension, *but* we use
# {} instead of []

usernames = {one_line.split(':')[0]
 for one_line in open('/etc/passwd')
 if not one_line.startswith('#')}

type(usernames)


set

In [79]:
'nobody' in usernames

True
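
Because the result is a set, duplicates disappear automatically. A small sketch with a made-up list of IP addresses:

```python
# Hypothetical input with repeats
ip_addresses = ['66.249.65.12', '65.55.106.183', '66.249.65.12', '66.249.65.12']

unique_ips = {one_ip
              for one_ip in ip_addresses}

first_octets = {one_ip.split('.')[0]      # any expression works, as in a list comprehension
                for one_ip in ip_addresses}

print(unique_ips)
print(first_octets)
print(type({}))                           # careful: empty braces create a dict, not a set
```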

In [80]:
# a set is like a dictionary without values, so its elements must be hashable
# str.split returns a list, and lists aren't hashable

{one_line.split()
for one_line in open('/etc/passwd')}

TypeError: unhashable type: 'list'

In [81]:
hash('a')

4025843321024852658
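
Strings are hashable, which is why they can go into a set; lists are not. If you do want to store the split fields, one workaround is to turn each list into a tuple first. A sketch, using shortened, made-up records rather than the real file:

```python
lines = ['root:*:0:0', 'nobody:*:-2:-2', 'root:*:0:0']   # hypothetical records

records = {tuple(one_line.split(':'))    # a tuple of strings is hashable
           for one_line in lines}

# Confirm that hashing a list fails
try:
    hash(['root', '*'])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(records)
print(list_is_hashable)
```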

# Exercise: Sum unique numbers

1. Ask the user to enter numbers, separated by whitespace
2. Ignore anything that doesn't contain only digits.
3. Print the sum, but only count each number once.

Example:

    Enter numbers: 10 20 30 10 abc 20 30
    Total is 60

Hint: Check out the `str.isdigit` method -- it returns `True` if a string contains only digits.    

In [86]:
text = input('Enter numbers: ').strip()

sum({int(one_number)
for one_number in text.split()
if one_number.isdigit()})

Enter numbers:  10 20 30 10 abc 20 30


60
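
To try the same logic without typing at a prompt, it can be wrapped in a function. The name `sum_unique` is mine, not part of the exercise.

```python
def sum_unique(text):
    """Sum each distinct all-digit token in text, counting it only once."""
    return sum({int(one_number)
                for one_number in text.split()
                if one_number.isdigit()})

print(sum_unique('10 20 30 10 abc 20 30'))
print(sum_unique('-5 3.5 7'))
```

Note that `str.isdigit` returns `False` for `'-5'` and `'3.5'`, so this approach silently skips negative numbers and decimals.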