## List Comprehensions

In [2]:
nums = [12, 8, 21, 3, 16]
new_nums = []
for num in nums:
    new_nums.append(num + 1)
print(new_nums)


[13, 9, 22, 4, 17]


The above code can be written more compactly using a list comprehension. A list comprehension starts with `[` and ends with `]`. Inside the brackets, the output expression (here `num + 1`) comes first, followed by the `for` clause that supplies its values. It all boils down to one line of code:

In [3]:
nums = [12, 8, 21, 3, 16]
new_nums = [num + 1 for num in nums]
print(new_nums)

[13, 9, 22, 4, 17]


### Nested loops

List comprehensions can be extended to nested loops. For example, to create a list of pairs (tuples) combining the numbers 0 and 1 with the numbers 6 and 7, you would write the following (note that `range()` excludes its stop value):

In [4]:
pairs_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
print(pairs_2)

[(0, 6), (0, 7), (1, 6), (1, 7)]
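
To make the iteration order explicit, the comprehension above can be unrolled into ordinary nested loops. This is only a sketch; the names `pairs_loop` and `pairs_comp` are introduced here for illustration.

```python
# The first `for` in the comprehension is the outer loop,
# the second `for` is the inner loop.
pairs_loop = []
for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_loop.append((num1, num2))

pairs_comp = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

# Both approaches build the same list in the same order
print(pairs_loop == pairs_comp)
```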


### Exercises

In [5]:
# Create list comprehension: squares
squares = [i**2 for i in range(0,10)]
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


One way lists can be used is to represent multi-dimensional objects such as matrices. In Python, a matrix can be represented as a list of lists. For example, a 5 x 5 matrix with the values `0` to `4` in each row can be written as:

```python
matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]
```

Our task is to recreate this matrix using nested list comprehensions. Recall that you can create one row of the matrix with a single list comprehension. To create the list of lists, you simply supply that list comprehension as the __output expression__ of the overall list comprehension:

`[`[output expression] `for` iterator variable `in` iterable`]`

Note that here, the output expression is itself a list comprehension.

### Instructions
* In the inner list comprehension - that is, the output expression of the nested list comprehension - create a list of values from `0` to `4` using `range()`. Use `col` as the iterator variable.
* In the iterable part of your nested list comprehension, use `range()` to count `5` rows - that is, create a list of values from `0` to `4`. Use `row` as the iterator variable; note that you won't be needing this to create values in the list of lists.

In [1]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(0,5)] for row in range(0,5)]

# Print the matrix
for row in matrix:
    print(row)


[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
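
In the exercise above, `row` only counts repetitions. As a small variation that is not part of the original exercise, the sketch below uses `row` inside the inner comprehension so that each row differs. The name `table` is hypothetical.

```python
# Each entry is row * col, so the outer variable now affects the values
table = [[row * col for col in range(0, 3)] for row in range(0, 3)]

for line in table:
    print(line)
```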


## Advanced comprehensions

### Conditionals in comprehensions



In [10]:
# Create list of squares of even numbers from 0 to 9
[num ** 2 for num in range(10) if num % 2 == 0]

[0, 4, 16, 36, 64]

We can also condition the list comprehension on the output expression. Below, for an even integer we output its square. In any other case, signified by the else clause, that is for odd integers, we output 0:

In [11]:
[num ** 2 if num % 2 == 0 else 0 for num in range(10)]

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
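
The two forms behave differently: a trailing `if` filters items out, while an `if`/`else` in the output expression keeps every item and only changes its value. A minimal comparison follows; the names `filtered` and `replaced` are chosen here for illustration.

```python
# Trailing `if`: odd numbers are dropped
filtered = [num ** 2 for num in range(10) if num % 2 == 0]

# `if`/`else` in the output expression: odd numbers become 0
replaced = [num ** 2 if num % 2 == 0 else 0 for num in range(10)]

print(len(filtered))  # 5 items remain
print(len(replaced))  # all 10 items are kept
```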

### Dict comprehensions

We can also write dictionary comprehensions to create new dictionaries from iterables. There are only two differences in the syntax:
* The curly braces `{}` are used instead of square brackets `[]`
* The `key` and `value` are separated by a colon `:` instead of a comma `,`

In the following example, we create a dictionary whose keys are the integers 0 to 8 and whose values are their negatives:

In [12]:
pos_neg = {num: -num for num in range(9)}
print(pos_neg)

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}
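
A trailing `if` works in dict comprehensions just as it does in list comprehensions. The sketch below keeps only even keys; the name `even_squares` is hypothetical.

```python
# Map each even number below 9 to its square
even_squares = {num: num ** 2 for num in range(9) if num % 2 == 0}
print(even_squares)
```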


### Exercises

__Instructions__
* Use `member` as the iterator variable in the list comprehension. For the conditional, use `len()` to evaluate the iterator variable. Note that we only want strings with `7` characters or more.

In [1]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member)>6]

# Print the new list
print(new_fellowship)

['samwise', 'aragorn', 'legolas', 'boromir']


Next, in the output expression, keep the string as-is if the number of characters is >= 7, else replace it with an empty string, that is, `''` or `""`.

In [2]:
# Create list comprehension: new_fellowship
new_fellowship = [member if len(member)>6 else "" for member in fellowship ]

# Print the new list
print(new_fellowship)

['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']


Now, we create a dict comprehension where the key is a string in `fellowship` and the value is the length of the string. Remember to use the syntax `<key> : <value>` in the output expression part of the comprehension to create the members of the dictionary. Use `member` as the iterator variable:

In [3]:
# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}


## Introduction to generator expressions

Let us look at the following code:

In [4]:
[2 * num for num in range(10)]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Now, let's replace the square brackets with parentheses to create a generator expression:

In [5]:
(2 * num for num in range(10))

<generator object <genexpr> at 0x1079dcd40>

### List comprehensions vs. generators

* A list comprehension returns a list
* A generator expression returns a generator object
* Both can be iterated over

In [6]:
result = (num for num in range(6))
for num in result:
    print(num)

0
1
2
3
4
5


In the above code, we see that looping over a generator expression produces the elements of the analogous list. <br>
We can also create a list by passing a generator expression to the list() function:

In [13]:
result = (num for num in range(6))
print(list(result))

[0, 1, 2, 3, 4, 5]


Moreover, as with any other iterator, we can pass a generator to the function `next()` to step through its elements one at a time. This is an example of __lazy evaluation__, whereby the evaluation of an expression is delayed until its value is needed. This helps a great deal when working with extremely large sequences: instead of storing the entire list in memory, which is what a list comprehension would do, you generate the elements of the sequence on the fly.

In [18]:
result = (num for num in range(6))
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

0
1
2
3
4


Let's say that we wanted to iterate over a very large sequence of numbers,
```python
[num for num in range(10 * 10000000)]
```
or at least wanted to do so until another condition was satisfied. If we try to build such a list with a comprehension, all 100 million elements must be held in memory at once, which is slow and may exceed the available memory. <br>
However, if we use a generator expression, we can generate the analogous sequence on the fly, because the entire list is never created. This is a huge improvement in terms of memory usage.

In [19]:
(num for num in range(10 * 10000000))

<generator object <genexpr> at 0x107b2c940>
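
To get a rough sense of the difference, the sketch below compares the two with `sys.getsizeof()`. It assumes a smaller range (one million) so that it runs quickly, and the names `as_list`, `as_gen`, and `first_big` are hypothetical. Note that `sys.getsizeof()` reports only the size of the container itself, not of the integers it refers to, and the exact numbers vary between Python versions.

```python
import sys

n = 1_000_000  # smaller than the example above, for a quick run

as_list = [num for num in range(n)]   # every element is created up front
as_gen = (num for num in range(n))    # nothing is computed yet

print(sys.getsizeof(as_list))  # grows with n
print(sys.getsizeof(as_gen))   # small and independent of n

# Iterate only until a condition is satisfied; the rest is never generated
for first_big in as_gen:
    if first_big * first_big > 50:
        break
print(first_big)
```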

### Conditionals in generators

In [20]:
even_nums = (num for num in range(10) if num % 2 == 0)
print(list(even_nums))

[0, 2, 4, 6, 8]


### Generator function

* Produces generator objects when called
* Defined like a regular function - `def`
* Yields a sequence of values instead of returning a single value
* Generates a value with `yield` keyword

Here we have defined a generator function that, when called with a number `n`, produces a generator object that generates the integers 0 through `n - 1`.

In [23]:
def num_sequence(n):
    """Generate values from 0 to n - 1."""
    i = 0
    while i < n:
        yield i
        i += 1
result = num_sequence(5)
print(type(result))

<class 'generator'>


In [24]:
for item in result:
    print(item)

0
1
2
3
4
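
The body of a generator function does not run when the function is called; it runs piece by piece, pausing at each `yield` until the next value is requested. The sketch below makes this visible with a `print()` call. The function `num_sequence_verbose` is a hypothetical variant of `num_sequence` written for this illustration.

```python
def num_sequence_verbose(n):
    """Like num_sequence, but reports when the body actually runs."""
    i = 0
    while i < n:
        print("about to yield", i)
        yield i
        i += 1

gen = num_sequence_verbose(2)  # nothing is printed yet
first = next(gen)              # runs the body up to the first yield
second = next(gen)             # resumes after the first yield
print(first, second)
```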


### Exercise

__Instructions__

* Create a generator object that will produce values from `0` to `10`. Assign the result to `result` and use `num` as the iterator variable in the generator expression.
* Print the first `5` values by using `next()` appropriately in `print()`.
* Print the rest of the values by using a `for` loop to iterate over the generator object.

In [25]:
# Create generator object: result
result = (num for num in range(0,11))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

0
1
2
3
4
5
6
7
8
9
10


In [26]:
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

6
5
5
6
7


In the following exercise, we will create a generator function with a similar mechanism as the generator expression we defined in the previous exercise:
```python
lengths = (len(person) for person in lannister)
```

__Instructions__

* Complete the function `get_lengths()` which has a single input parameter of `input_list`.
* In the function body, complete the `for` loop such that it yields the length of the strings in `input_list`.
* Complete the `for` loop in the `main()` function which calls `get_lengths(lannister)` and prints out the values in the generator object it returns.


In [27]:
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

6
5
5
6
7
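
As a quick check, the generator function and the generator expression from the previous exercise should produce the same values. The sketch below repeats the definitions so that it is self-contained; `from_function` and `from_expression` are names introduced here.

```python
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

def get_lengths(input_list):
    """Yield the length of each string in input_list."""
    for person in input_list:
        yield len(person)

from_function = list(get_lengths(lannister))
from_expression = list(len(person) for person in lannister)

print(from_function == from_expression)
```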


### Re-cap: List comprehensions

* Basic
    ```python
    [output expression for iterator variable in iterable]
    ```

* Advanced
    ```python
    [output expression +
    conditional on output for iterator variable in iterable +
    conditional on iterable]
    ```
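
A concrete instance of the advanced form, combining a conditional on the output with a conditional on the iterable, might look like the sketch below. The name `labels` is hypothetical.

```python
# Filter on the iterable (num > 5), then choose the output with if/else
labels = ["even" if num % 2 == 0 else "odd" for num in range(10) if num > 5]
print(labels)
```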

## Exercise

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. 

### Instructions
* Extract the column `'created_at'` from `df` and assign the result to `tweet_time`. Fun fact: the extracted column in `tweet_time` here is a Series data structure!
* Create a list comprehension that extracts the time from each row in `tweet_time`. Each row is a string that represents a timestamp, and you will access the __12th to 19th__ characters in the string to extract the time. Use `entry` as the iterator variable and assign the result to `tweet_clock_time`. Remember that Python uses 0-based indexing!
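
To see how the 12th to 19th characters map onto a 0-based slice, consider the sketch below. The timestamp string is a made-up sample that assumes Twitter's usual `created_at` layout; the actual values in `tweets.csv` may differ.

```python
# Hypothetical sample value, not taken from the dataset
entry = 'Tue Mar 29 23:40:17 +0000 2016'

# The 12th character has index 11; the slice stops before index 19,
# so it covers the 12th through 19th characters.
clock_time = entry[11:19]
print(clock_time)
```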


In [30]:
import pandas as pd

df = pd.read_csv('../Databases/tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)

['23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23

Now, we will add a conditional expression to the list comprehension so that you only select the times in which `entry[17:19]` is equal to `'19'`.

### Instructions
* Extract the column `'created_at'` from `df` and assign the result to `tweet_time`.
* Create a list comprehension that extracts the time from each row in `tweet_time`. Each row is a string that represents a timestamp, and you will access the __12th to 19th__ characters in the string to extract the time. Use `entry` as the _iterator_ variable and assign the result to `tweet_clock_time`. Additionally, add a conditional expression that checks whether `entry[17:19]` is equal to `'19'`.

In other words, we are going to print the clock times of the tweets that were posted at second 19 of any minute.
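
The filter can be tried on a small hand-made list first. The timestamps in this sketch are hypothetical and assume Twitter's usual `created_at` layout, where characters at indices 17 and 18 hold the seconds.

```python
# Hypothetical sample values, not taken from the dataset
sample_times = [
    'Tue Mar 29 23:40:17 +0000 2016',
    'Tue Mar 29 23:40:19 +0000 2016',
    'Tue Mar 29 23:41:19 +0000 2016',
]

# Keep the clock time only when the seconds field equals '19'
at_19 = [entry[11:19] for entry in sample_times if entry[17:19] == '19']
print(at_19)
```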

In [32]:
# pandas is imported as pd, and tweets.csv is loaded as df (see previous exercise)

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)

['23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19']
