# Comprehending Comprehensions

[![comprehensions](https://files.catbox.moe/r36zt3.png)](https://www.youtube.com/watch?v=qMv1ZD2V1A4)

In [1]:
# Have a list of integers and create a list of those integers squared
numbers = range(10)

squared = []
for number in numbers:
    squared.append(number ** 2)

squared

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [2]:
# same thing with list comprehension
[number ** 2 for number in numbers]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [3]:
# much easier to write (and understand), if we pick it apart and write it on multiple lines.

[number ** 2  # any valid py expression
for number in numbers] # iteration

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

## Comprehensions

A list comprehension needs two things:

* any valid expression
* an iteration (a Python `for` loop)

In a comprehension, the loop runs first and the expression is evaluated second, once per element. The result of a list comprehension is a new list, which we can pass as an argument to a function or assign to a variable.

The new list is the result of evaluating our expression on every element of the input list.

As a result, when the comprehension has no `if` clause, the output list has the same number of elements as the input list.
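A minimal sketch of the two points above. The `words` list is made-up sample data, not part of the original notebook. It shows the expression being evaluated once per element, and how an `if` clause can make the output shorter than the input.

```python
# Hypothetical sample data for illustration
words = ["apple", "fig", "banana"]

# expression: len(word); iteration: for word in words
lengths = [len(word) for word in words]

print(lengths)                     # [5, 3, 6]
print(len(lengths) == len(words))  # True

# An `if` clause filters elements, so the output can be shorter than the input
long_lengths = [len(word) for word in words if len(word) > 3]
print(long_lengths)                # [5, 6]
```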

### When to use a loop, and when a comprehension?

> The big distinction is between getting a new value back and having side effects.

If you have an existing list, and you want a new list, and you can describe the mapping from the first to the second, then you should use a comprehension.  

But if you are assigning or modifying something repeatedly, use a regular `for` loop.
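As a rough illustration of that rule of thumb, here is a sketch with hypothetical data (`names` and `name_lengths` are invented for this example). The first half wants a new list back, so it uses a comprehension. The second half updates an existing object repeatedly, so it uses a regular loop.

```python
# Hypothetical sample data for illustration
names = ["ada", "grace", "linus"]

# We want a new list back: describe the mapping with a comprehension
capitalized = [name.capitalize() for name in names]
print(capitalized)  # ['Ada', 'Grace', 'Linus']

# We want side effects (repeatedly updating a dict): use a regular for loop
name_lengths = {}
for name in names:
    name_lengths[name] = len(name)
print(name_lengths)  # {'ada': 3, 'grace': 5, 'linus': 5}
```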

In [4]:
# we have a sequence of integers
# we want a string with * between the elements

integers = range(10, 40, 10)
'*'.join([str(integer) for integer in integers])

'10*20*30'

In [5]:
string = "This is a sample sentence for my tutorial"

# capitalize each word's first letter using a list comprehension
' '.join([word.capitalize() for word in string.split()])

'This Is A Sample Sentence For My Tutorial'

### Exercises

1. Ask the user to enter a string containing numbers, separated by spaces. Add those numbers together (as integers) and print the result. It's OK to use the `sum` function.

In [6]:
sum(map(int, input("Numbers to add: ").split()))

Numbers to add: 10 20 30


60

2. Ask the user to enter a string, and print the length of the string, not counting whitespace. It's not OK to use `str.replace`.

In [7]:
string = input("Enter a string: ")
# count every character that is not whitespace (spaces, tabs, newlines)
len([char for char in string if not char.isspace()])

Enter a string: hello world


10

In [8]:
[line for line in open("linux-etc-passwd.txt")]

['# This is a comment\n',
 '# You should ignore me\n',
 'root:x:0:0:root:/root:/bin/bash\n',
 'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n',
 'bin:x:2:2:bin:/bin:/usr/sbin/nologin\n',
 'sys:x:3:3:sys:/dev:/usr/sbin/nologin\n',
 'sync:x:4:65534:sync:/bin:/bin/sync\n',
 'games:x:5:60:games:/usr/games:/usr/sbin/nologin\n',
 'man:x:6:12:man:/var/cache/man:/usr/sbin/nologin\n',
 'lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin\n',
 'mail:x:8:8:mail:/var/mail:/usr/sbin/nologin\n',
 '\n',
 '\n',
 '\n',
 'news:x:9:9:news:/var/spool/news:/usr/sbin/nologin\n',
 'uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin\n',
 'proxy:x:13:13:proxy:/bin:/usr/sbin/nologin\n',
 'www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\n',
 'backup:x:34:34:backup:/var/backups:/usr/sbin/nologin\n',
 'list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin\n',
 'irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin\n',
 'gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin\n',
 '\n

In [9]:
# usernames from this password file?
[line.split(":")[0] for line in open("linux-etc-passwd.txt") if ":" in line]

['root',
 'daemon',
 'bin',
 'sys',
 'sync',
 'games',
 'man',
 'lp',
 'mail',
 'news',
 'uucp',
 'proxy',
 'www-data',
 'backup',
 'list',
 'irc',
 'gnats',
 'nobody',
 'syslog',
 'messagebus',
 'landscape',
 'jci',
 'sshd',
 'user',
 'reuven',
 'postfix',
 'colord',
 'postgres',
 'dovecot',
 'dovenull',
 'postgrey',
 'debian-spamd',
 'memcache',
 'genadi',
 'shira',
 'atara',
 'shikma',
 'amotz',
 'mysql',
 'clamav',
 'amavis',
 'opendkim',
 'gitlab-redis',
 'gitlab-psql',
 'git',
 'opendmarc',
 'dkim-milter-python',
 'deploy',
 'redis']

### Exercise: Sum of numbers

Use a comprehension to read through `nums.txt` and sum the numbers it contains. Each line of the file contains either no integer or exactly one integer. The integer might have whitespace before or after it.

In [10]:
!cat nums.txt

5                
           10
           20
           3
                        20
25
                        

In [11]:
sum(int(num) for num in open("nums.txt") if num.strip())

83

In [12]:
# how many vowels are in a string?
s = "whatever"
len([letter for letter in s if letter in "aeiou"])

3

### Exercise: Shoe dicts

`shoe-data.txt` contains 100 lines. Each line contains three fields: brand, color, and size. Use a list comprehension to turn this into a list of dictionaries. Each line should be turned into a dict whose keys are `brand`, `color`, and `size`. The values can remain strings; don't worry about converting the size.

I recommend that you write an external function that takes a string as input and returns a dict, then invoke it in your comprehension.

The result will be a list of dicts like this:

```py
[
    {
        "brand": "Adidas",
        "color": "orange",
        "size": "43"
    },
    ...
]
```

In [14]:
!cat shoe-data.txt

Adidas	orange	43
Nike	black	41
Adidas	black	39
New Balance	pink	41
Nike	white	44
New Balance	orange	38
Nike	pink	44
Adidas	pink	44
New Balance	orange	39
New Balance	black	43
New Balance	orange	44
Nike	black	41
Adidas	orange	37
Adidas	black	38
Adidas	pink	41
Adidas	white	36
Adidas	orange	36
Nike	pink	41
Adidas	pink	35
New Balance	orange	37
Nike	pink	43
Nike	black	43
Nike	black	42
Nike	black	35
Adidas	black	41
New Balance	pink	40
Adidas	white	35
New Balance	pink	41
New Balance	orange	41
Adidas	orange	40
New Balance	orange	40
New Balance	white	44
New Balance	pink	40
Nike	black	43
Nike	pink	36
New Balance	white	39
Nike	black	42
Adidas	black	41
New Balance	orange	40
New Balance	black	40
Nike	white	37
Adidas	black	39
Adidas	black	40
Adidas	orange	38
New Balance	orange	39
Nike	black	35
Adidas	white	39
Nike	white	37
Adidas	orange	37
Adidas	pink	35
New Balance	orange	41
Nike	pink	44
Nike	pink

In [15]:
def str_to_dict(string):
    brand, color, size = string.strip().split("\t")  
    return {
        "brand": brand,
        "color": color,
        "size": size,
    }

[str_to_dict(shoe) for shoe in open("shoe-data.txt")]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '

In [16]:
!ls *.txt

linux-etc-passwd.txt	  mini-access-log.txt  nums.txt       wcfile.txt
linux-etc-passwd.txt.txt  myconfig.txt	       shoe-data.txt


In [17]:
!head mini-access-log.txt

67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.12 - -

In [18]:
# retrieve all of the IP addresses from the file (showing only the first 10)
[line.split()[0] for line in open("mini-access-log.txt")][0:10]

['67.218.116.165',
 '66.249.71.65',
 '65.55.106.183',
 '65.55.106.183',
 '66.249.71.65',
 '66.249.71.65',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12',
 '66.249.65.12']

In [19]:
# how many times did each IP address access my server?

from collections import Counter

# bad way to use Counter
c = Counter()
c["a"] += 5
c["b"] += 6
c

Counter({'a': 5, 'b': 6})

In [20]:
# the good way to use Counter is to initialize it with an iterable.
# It counts how many times each element appears in that iterable: each element becomes a key,
# and the number of times it appears becomes the value.

c = Counter([line.split()[0] for line in open("mini-access-log.txt")])
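To see the same idea without the log file, here is a small sketch using an in-memory list. The `ips` values are invented placeholders standing in for the first field of each log line.

```python
from collections import Counter

# Hypothetical stand-in for the first field of each log line
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]

counts = Counter(ips)
print(counts["10.0.0.1"])     # 3
print(counts["10.0.0.9"])     # 0 (a missing key counts as zero, no KeyError)
print(counts.most_common(1))  # [('10.0.0.1', 3)]
```

`Counter` accepts any iterable, so a generator expression would work here as well as a list.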

In [21]:
# Counter inherits from dict

for key, value in c.items():
    print(f"{key:15}: {value}")

67.218.116.165 : 2
66.249.71.65   : 3
65.55.106.183  : 2
66.249.65.12   : 32
65.55.106.131  : 2
65.55.106.186  : 2
74.52.245.146  : 2
66.249.65.43   : 3
65.55.207.25   : 2
65.55.207.94   : 2
65.55.207.71   : 1
98.242.170.241 : 1
66.249.65.38   : 100
65.55.207.126  : 2
82.34.9.20     : 2
65.55.106.155  : 2
65.55.207.77   : 2
208.80.193.28  : 1
89.248.172.58  : 22
67.195.112.35  : 16
65.55.207.50   : 3
65.55.215.75   : 2


In [22]:
c.most_common(5)

[('66.249.65.38', 100),
 ('66.249.65.12', 32),
 ('89.248.172.58', 22),
 ('67.195.112.35', 16),
 ('66.249.71.65', 3)]

In [23]:
# membership tests on a set are much faster than on a list; a set guarantees that its members are unique,
# and all of its elements must be hashable, just like dict keys

usernames = {line.split(":")[0] for line in open("linux-etc-passwd.txt") if ":" in line}

In [24]:
usernames

{'amavis',
 'amotz',
 'atara',
 'backup',
 'bin',
 'clamav',
 'colord',
 'daemon',
 'debian-spamd',
 'deploy',
 'dkim-milter-python',
 'dovecot',
 'dovenull',
 'games',
 'genadi',
 'git',
 'gitlab-psql',
 'gitlab-redis',
 'gnats',
 'irc',
 'jci',
 'landscape',
 'list',
 'lp',
 'mail',
 'man',
 'memcache',
 'messagebus',
 'mysql',
 'news',
 'nobody',
 'opendkim',
 'opendmarc',
 'postfix',
 'postgres',
 'postgrey',
 'proxy',
 'redis',
 'reuven',
 'root',
 'shikma',
 'shira',
 'sshd',
 'sync',
 'sys',
 'syslog',
 'user',
 'uucp',
 'www-data'}

In [25]:
"root" in usernames

True

## Exercise: Sum of unique numbers

1. Ask the user to enter numbers, separated by whitespace
2. Print their sum, but only count each number once.

```py
> Enter numbers: 10 20 30 10 20 30
> Total is 60
```

In [26]:
sum(set(map(int, input("Enter numbers: ").strip().split())))

Enter numbers: 10 20 30 10 20 30


60

In [27]:
sum({int(number) for number in input("Enter numbers: ").strip().split()})

Enter numbers: 10 20 30 10 20 30


60

## Exercise: which shells?

Read through `linux-etc-passwd.txt` and find the different shells that are used on the system.

In [28]:
{line.strip() for line in open("linux-etc-passwd.txt")}

{'',
 '# This is a comment',
 '# You should ignore me',
 'amavis:x:116:127:AMaViS system user,,,:/var/lib/amavis:/bin/sh',
 'amotz:x:1006:1007:Amotz Lerner-Friedman,,,:/home/amotz:/bin/bash',
 'atara:x:1004:1005:Atara Lerner-Friedman,,,:/home/atara:/bin/bash',
 'backup:x:34:34:backup:/var/backups:/usr/sbin/nologin',
 'bin:x:2:2:bin:/bin:/usr/sbin/nologin',
 'clamav:x:115:126::/var/lib/clamav:/bin/false',
 'colord:x:106:116:colord colour management daemon,,,:/var/lib/colord:/bin/false',
 'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin',
 'debian-spamd:x:111:122::/var/lib/spamassassin:/bin/sh',
 'deploy:x:1008:1011:Deploy,,,:/home/deploy:/bin/bash',
 'dkim-milter-python:x:119:130::/var/run/dkim-milter-python:/bin/false',
 'dovecot:x:108:119:Dovecot mail server,,,:/usr/lib/dovecot:/bin/false',
 'dovenull:x:109:120:Dovecot login user,,,:/nonexistent:/bin/false',
 'games:x:5:60:games:/usr/games:/usr/sbin/nologin',
 'genadi:x:1002:1003:Genadi Reznichenko,,,:/home/genadi:/bin/bash',
 'git:x:

In [29]:
[line.strip().split(":")[-1] for line in open("linux-etc-passwd.txt") if ":" in line]

['/bin/bash',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/bin/sync',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/usr/sbin/nologin',
 '/bin/false',
 '/bin/false',
 '/bin/false',
 '/bin/bash',
 '/usr/sbin/nologin',
 '/bin/bash',
 '/bin/bash',
 '/bin/false',
 '/bin/false',
 '/bin/bash',
 '/bin/false',
 '/bin/false',
 '/bin/false',
 '/bin/sh',
 '/bin/false',
 '/bin/bash',
 '/bin/bash',
 '/bin/bash',
 '/bin/bash',
 '/bin/bash',
 '/bin/false',
 '/bin/false',
 '/bin/sh',
 '/bin/false',
 '/bin/nologin',
 '/bin/sh',
 '/bin/bash',
 '/bin/false',
 '/bin/false',
 '/bin/bash',
 '/bin/false']

In [30]:
{line.strip().split(":")[-1] for line in open("linux-etc-passwd.txt") if ":" in line}

{'/bin/bash',
 '/bin/false',
 '/bin/nologin',
 '/bin/sh',
 '/bin/sync',
 '/usr/sbin/nologin'}

In [31]:
# I have a string with some words
# I want to create a dict where each word is a key and its length is the value

string = "This is a bunch of words"

string_dict = {}

for key, value in [(word, len(word)) for word in string.split()]:
    string_dict[key] = value

string_dict

{'This': 4, 'is': 2, 'a': 1, 'bunch': 5, 'of': 2, 'words': 5}

In [32]:
# pass a list of (key, value) tuples to dict() and get back a dict
dict([(word, len(word)) for word in string.split()])

{'This': 4, 'is': 2, 'a': 1, 'bunch': 5, 'of': 2, 'words': 5}

In [33]:
# dict comprehension
{word : len(word) for word in string.split()}

{'This': 4, 'is': 2, 'a': 1, 'bunch': 5, 'of': 2, 'words': 5}

In [34]:
!cat myconfig.txt

a=1
b=2
c=3
d=4


In [35]:
{ line.split("=")[0] : line.split("=")[1].strip() for line in open("myconfig.txt") }

{'a': '1', 'b': '2', 'c': '3', 'd': '4'}

## Exercise: Usernames and shells

Use a dict comprehension to create a dict in which keys are usernames and the values are the shells associated with those usernames in `linux-etc-passwd.txt`

In [36]:
[(line.split(":")[0], line.split(":")[-1].strip()) for line in open("linux-etc-passwd.txt")]

[('# This is a comment\n', '# This is a comment'),
 ('# You should ignore me\n', '# You should ignore me'),
 ('root', '/bin/bash'),
 ('daemon', '/usr/sbin/nologin'),
 ('bin', '/usr/sbin/nologin'),
 ('sys', '/usr/sbin/nologin'),
 ('sync', '/bin/sync'),
 ('games', '/usr/sbin/nologin'),
 ('man', '/usr/sbin/nologin'),
 ('lp', '/usr/sbin/nologin'),
 ('mail', '/usr/sbin/nologin'),
 ('\n', ''),
 ('\n', ''),
 ('\n', ''),
 ('news', '/usr/sbin/nologin'),
 ('uucp', '/usr/sbin/nologin'),
 ('proxy', '/usr/sbin/nologin'),
 ('www-data', '/usr/sbin/nologin'),
 ('backup', '/usr/sbin/nologin'),
 ('list', '/usr/sbin/nologin'),
 ('irc', '/usr/sbin/nologin'),
 ('gnats', '/usr/sbin/nologin'),
 ('\n', ''),
 ('nobody', '/usr/sbin/nologin'),
 ('syslog', '/bin/false'),
 ('messagebus', '/bin/false'),
 ('landscape', '/bin/false'),
 ('jci', '/bin/bash'),
 ('sshd', '/usr/sbin/nologin'),
 ('user', '/bin/bash'),
 ('reuven', '/bin/bash'),
 ('postfix', '/bin/false'),
 ('colord', '/bin/false'),
 ('postgres', '/bin/bash'

In [37]:
{line.split(":")[0] : line.split(":")[-1].strip() for line in open("linux-etc-passwd.txt") if ":" in line}

{'root': '/bin/bash',
 'daemon': '/usr/sbin/nologin',
 'bin': '/usr/sbin/nologin',
 'sys': '/usr/sbin/nologin',
 'sync': '/bin/sync',
 'games': '/usr/sbin/nologin',
 'man': '/usr/sbin/nologin',
 'lp': '/usr/sbin/nologin',
 'mail': '/usr/sbin/nologin',
 'news': '/usr/sbin/nologin',
 'uucp': '/usr/sbin/nologin',
 'proxy': '/usr/sbin/nologin',
 'www-data': '/usr/sbin/nologin',
 'backup': '/usr/sbin/nologin',
 'list': '/usr/sbin/nologin',
 'irc': '/usr/sbin/nologin',
 'gnats': '/usr/sbin/nologin',
 'nobody': '/usr/sbin/nologin',
 'syslog': '/bin/false',
 'messagebus': '/bin/false',
 'landscape': '/bin/false',
 'jci': '/bin/bash',
 'sshd': '/usr/sbin/nologin',
 'user': '/bin/bash',
 'reuven': '/bin/bash',
 'postfix': '/bin/false',
 'colord': '/bin/false',
 'postgres': '/bin/bash',
 'dovecot': '/bin/false',
 'dovenull': '/bin/false',
 'postgrey': '/bin/false',
 'debian-spamd': '/bin/sh',
 'memcache': '/bin/false',
 'genadi': '/bin/bash',
 'shira': '/bin/bash',
 'atara': '/bin/bash',
 'shikma

In [38]:
# better version: split each line only once, using an assignment expression (Python 3.8+)
{fields[0] : fields[-1].strip() for line in open("linux-etc-passwd.txt") if ":" in line and (fields := line.split(":"))}

{'root': '/bin/bash',
 'daemon': '/usr/sbin/nologin',
 'bin': '/usr/sbin/nologin',
 'sys': '/usr/sbin/nologin',
 'sync': '/bin/sync',
 'games': '/usr/sbin/nologin',
 'man': '/usr/sbin/nologin',
 'lp': '/usr/sbin/nologin',
 'mail': '/usr/sbin/nologin',
 'news': '/usr/sbin/nologin',
 'uucp': '/usr/sbin/nologin',
 'proxy': '/usr/sbin/nologin',
 'www-data': '/usr/sbin/nologin',
 'backup': '/usr/sbin/nologin',
 'list': '/usr/sbin/nologin',
 'irc': '/usr/sbin/nologin',
 'gnats': '/usr/sbin/nologin',
 'nobody': '/usr/sbin/nologin',
 'syslog': '/bin/false',
 'messagebus': '/bin/false',
 'landscape': '/bin/false',
 'jci': '/bin/bash',
 'sshd': '/usr/sbin/nologin',
 'user': '/bin/bash',
 'reuven': '/bin/bash',
 'postfix': '/bin/false',
 'colord': '/bin/false',
 'postgres': '/bin/bash',
 'dovecot': '/bin/false',
 'dovenull': '/bin/false',
 'postgrey': '/bin/false',
 'debian-spamd': '/bin/sh',
 'memcache': '/bin/false',
 'genadi': '/bin/bash',
 'shira': '/bin/bash',
 'atara': '/bin/bash',
 'shikma
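The assignment-expression version can be tried without the file. This is a sketch on a few made-up lines in the same colon-separated format (`lines` and `shells` are names invented for this example), and it needs Python 3.8 or later. The `if` clause runs before the key and value expressions, so the name bound there is available to both.

```python
# Hypothetical lines in the same colon-separated format as the passwd file
lines = [
    "# a comment\n",
    "root:x:0:0:root:/root:/bin/bash\n",
    "\n",
    "sync:x:4:65534:sync:/bin:/bin/sync\n",
]

shells = {
    fields[0]: fields[-1].strip()
    for line in lines
    # `and` short-circuits: the split only happens for lines containing ":"
    if ":" in line and (fields := line.split(":"))
}
print(shells)  # {'root': '/bin/bash', 'sync': '/bin/sync'}
```

The condition relies on a non-empty list being truthy, which always holds for the result of `str.split` with an explicit separator.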

In [39]:
myList = [
    [10, 20, 25],
    [30, 35, 40, 45, 50],
    [60, 65, 70, 80, 90, 100],
    [110, 115, 120, 130, 140, 145]
]

myList

[[10, 20, 25],
 [30, 35, 40, 45, 50],
 [60, 65, 70, 80, 90, 100],
 [110, 115, 120, 130, 140, 145]]

In [40]:
# sum the integers in the nested list

# naive approach
result = 0
for sublist in myList:
    for num in sublist:
        result += num

result

1480

In [41]:
sum([num 
     for sublist in myList # outer loop
     for num in sublist # nested loop
])

1480

In [42]:
# add a condition: only use sublists with more than 3 elements
[num 
     for sublist in myList if len(sublist) > 3
     for num in sublist
]

[30, 35, 40, 45, 50, 60, 65, 70, 80, 90, 100, 110, 115, 120, 130, 140, 145]

In [43]:
[num 
     for sublist in myList if len(sublist) > 3
     for num in sublist if num % 2
]

[35, 45, 65, 115, 145]
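One way to read a nested comprehension with conditions is to unroll it into nested loops, keeping the `for` and `if` clauses in the same left-to-right order. The sketch below uses a shortened copy of the nested list (named `my_list` here) and checks that both forms agree.

```python
my_list = [
    [10, 20, 25],
    [30, 35, 40, 45, 50],
]

# Comprehension form: clauses are read left to right
odd_from_long = [
    num
    for sublist in my_list if len(sublist) > 3
    for num in sublist if num % 2
]

# Equivalent nested loops, with the clauses in the same order
expected = []
for sublist in my_list:
    if len(sublist) > 3:
        for num in sublist:
            if num % 2:
                expected.append(num)

print(odd_from_long)              # [35, 45]
print(odd_from_long == expected)  # True
```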

In [44]:
!head movies.dat

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller


## Exercise: Movie genres

Goal: Find out what the 5 most popular movie genres are in the `movies.dat` file.

Use a nested comprehension to read through the file and pick out the appropriate field from each line, then use `Counter` to find the most common genres.

If a movie has more than 1 genre, each should be counted once.

In [45]:
c = Counter([genre for line in open("movies.dat") for genre in line.strip().split("::")[2].split("|")])
c.most_common(5)

[('Drama', 1603),
 ('Comedy', 1200),
 ('Action', 503),
 ('Thriller', 492),
 ('Romance', 471)]

## Generator

A generator is an object that knows how to behave inside of a `for` loop, because it is iterable.

The point of a generator is that it doesn't return all of its elements at once. Rather, it returns them one at a time.

A generator expression works just like a list comprehension, except that instead of returning a list with all of its elements, it returns a generator object. That object can be put into a `for` loop (or any other context that expects an iterable), and it will only run the expression when it is asked to, typically once per iteration.
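A small sketch of that laziness. The helper `noisy_square` and the `log` list are invented for this example. They record each call so we can see when the expression actually runs.

```python
def noisy_square(x, log):
    # Record each call so we can see when the expression actually runs
    log.append(x)
    return x ** 2

log = []
squares = (noisy_square(x, log) for x in range(3))

print(log)            # [] (nothing has run yet)
print(next(squares))  # 0
print(log)            # [0] (only the first element was computed)
print(list(squares))  # [1, 4] (the remaining elements)
print(log)            # [0, 1, 2]
print(list(squares))  # [] (a generator can be consumed only once)
```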

In [46]:
(x**2 for x in range(10))

<generator object <genexpr> at 0x7f2b339bc970>

In [47]:
# generator expression
"*".join(
    str(item) for item in [10, 20, 30]
)

'10*20*30'

In [48]:
# list comprehension
"*".join(
    [str(item) for item in [10, 20, 30]]
)

'10*20*30'

In [49]:
def myGen():
    yield 10
    yield 20
    yield 30
    
myGen()

<generator object myGen at 0x7f2b339bd310>

In [50]:
for x in myGen():
    print(x)

10
20
30


In [51]:
# do the same thing by writing a function that returns a generator expression
def myGen():
    return (line for line in [10, 20, 30, 40])

myGen()

<generator object myGen.<locals>.<genexpr> at 0x7f2b339bd540>

In [52]:
x = 100

for num in range(5):
    x = num * 3

x # what is x's value?

12

In [53]:
x = 100

print([x*3 for x in range(5)])

x # x is unchanged: the comprehension's loop variable has its own scope

[0, 3, 6, 9, 12]


100

In [54]:
import dis

def regular_loop():
    x = 100
    
    for num in range(5):
        x = num * 3
        
dis.dis(regular_loop)

  4           0 LOAD_CONST               1 (100)
              2 STORE_FAST               0 (x)

  6           4 LOAD_GLOBAL              0 (range)
              6 LOAD_CONST               2 (5)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                 6 (to 26)
             14 STORE_FAST               1 (num)

  7          16 LOAD_FAST                1 (num)
             18 LOAD_CONST               3 (3)
             20 BINARY_MULTIPLY
             22 STORE_FAST               0 (x)
             24 JUMP_ABSOLUTE            6 (to 12)

  6     >>   26 LOAD_CONST               0 (None)
             28 RETURN_VALUE


In [55]:
def comp_loop():
    x = 100
    
    print([x*3 for x in range(5)])
        
dis.dis(comp_loop)

  2           0 LOAD_CONST               1 (100)
              2 STORE_FAST               0 (x)

  4           4 LOAD_GLOBAL              0 (print)
              6 LOAD_CONST               2 (<code object <listcomp> at 0x7f2b33976a20, file "/tmp/ipykernel_1044/2847522135.py", line 4>)
              8 LOAD_CONST               3 ('comp_loop.<locals>.<listcomp>')
             10 MAKE_FUNCTION            0
             12 LOAD_GLOBAL              1 (range)
             14 LOAD_CONST               4 (5)
             16 CALL_FUNCTION            1
             18 GET_ITER
             20 CALL_FUNCTION            1
             22 CALL_FUNCTION            1
             24 POP_TOP
             26 LOAD_CONST               0 (None)
             28 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f2b33976a20, file "/tmp/ipykernel_1044/2847522135.py", line 4>:
  4           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER              