In [4]:
import unittest
import util

# Range() and lazy evaluation

## The range() function

Many kinds of programming problems at some point need a list of sequential numbers. For example, we might need a list of the perfect squares of the numbers 1 through 5. In past lessons, we would have created this by explicitly typing out the list [1, 2, 3, 4, 5]:

In [28]:
[x * x for x in [1, 2, 3, 4, 5]]

[1, 4, 9, 16, 25]

Needing a list of sequential numbers is such a common problem that Python provides the range() function to generate them. Given a single argument, range() generates numbers starting at zero, up to but not including the given number:

In [29]:
[x for x in range(5)]

[0, 1, 2, 3, 4]

With two or three arguments, the syntax is similar to the syntax used to slice strings or lists: range() generates numbers starting at the first argument, up to but not including the second argument, counting by the step given in the (optional) third argument.

In [30]:
[x for x in range(1, 6)]

[1, 2, 3, 4, 5]

In [31]:
[x for x in range(1, 6, 2)]

[1, 3, 5]
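The stop value is always excluded. The step can also be negative, in which case the numbers count down. The sketch below uses list() to collect each range into a list so the numbers are visible. The particular values are arbitrary.

```python
# The stop value is never included in the output.
print(list(range(1, 6)))       # [1, 2, 3, 4, 5]

# A step of 3 moves ahead by three each time.
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]

# A negative step counts down; the stop value (0) is still excluded.
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]

# If the start is already past the stop in the step's direction, the range is empty.
print(list(range(5, 1)))       # []
```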

## Lazy evaluation 

You'll notice that in each of these examples, we've placed range() in a list comprehension. This is because range() doesn't actually return a list. If we call range() on its own, we get a *range* object:

In [32]:
range(5)

range(0, 5)

If you aren't immediately iterating over the numbers and you need an actual list, you can easily turn a range object into one with the list() function:

In [34]:
list(range(5))

[0, 1, 2, 3, 4]
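A range object is more than a placeholder for a list. It supports many of the same operations as a list, and it answers them using arithmetic on its start, stop and step instead of generating the numbers. Here is a small sketch with an arbitrary range. Note that the constant-time `in` check for integers is how CPython implements it, and other Python implementations may behave differently.

```python
r = range(0, 50000000, 5)

# len() is computed from start, stop and step; no numbers are generated.
print(len(r))     # 10000000

# Indexing, including negative indexing, works like it does on a list.
print(r[3])       # 15
print(r[-1])      # 49999995

# In CPython, `in` checks on integers are answered arithmetically, not by scanning.
print(25 in r)    # True
print(26 in r)    # False

# Slicing a range gives back another (still lazy) range.
print(r[:4])      # range(0, 20, 5)
```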

This probably seems a little confusing. Why would Python use a separate data type to represent ranges of numbers when a list is a perfectly good way to do it? The main reason is that a range can represent a huge sequence of numbers without using large amounts of memory. Try running the two cells below. The first creates a *range* object covering the numbers from zero up to fifty million, and the second does the same with a list.

In [4]:
big_range = range(50000000)

In [5]:
big_list = list(range(50000000))

Notice that creating the list takes a noticeable amount of time, while creating the range is almost instantaneous? This is because the range is "smarter" than the list. It knows that its numbers follow a regular rule, so it stores only three numbers: where to start, where to stop, and the step between values. When a for loop iterates over a range, it gets an iterator that keeps track of the current position. If you're at the number 55, none of the later numbers exist yet. The next number is computed from the current one and the step, and the stop value tells the loop when to finish.

A list, by contrast, has to explicitly contain each element. So in this case, instead of storing three numbers, it has to store 50 million of them. Building the list takes a considerable amount of processor time, and holding all 50 million numbers takes a great deal of memory.
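One rough way to check the memory difference is sys.getsizeof(), which reports an object's size in bytes. This sketch uses one million numbers instead of fifty million so that it runs quickly. The exact byte counts depend on your Python version and platform, so none are listed here. For a list, getsizeof() counts only the list's internal table of references and not the number objects themselves, so it actually understates what the list costs.

```python
import sys

n = 1000000
big_range = range(n)
big_list = list(big_range)

# Size of each container object itself, in bytes.
range_bytes = sys.getsizeof(big_range)
list_bytes = sys.getsizeof(big_list)

print("range:", range_bytes, "bytes")
print("list: ", list_bytes, "bytes")

# A range's size doesn't depend on how many numbers it covers.
print(sys.getsizeof(range(10)) == range_bytes)
```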

The first strategy is called "lazy evaluation," because range() produces each number only when a for loop asks for it. The strategy used for the list, by contrast, is called "eager evaluation," because your computer does the work of generating all of the numbers before a loop even touches them.
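You can imitate what a for loop does behind the scenes with the built-in iter() and next() functions. The sketch below is a simplified view of that protocol. Each call to next() produces exactly one new number, on demand.

```python
numbers = range(3)

# A for loop first asks the range for an iterator...
it = iter(numbers)

# ...then requests one value at a time.
print(next(it))  # 0
print(next(it))  # 1
print(next(it))  # 2

# When the iterator is exhausted, next() raises StopIteration,
# which is how a for loop knows to stop.
try:
    next(it)
except StopIteration:
    print("no more numbers")
```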

The major downside of lazy evaluation is that it can be conceptually harder to understand what the program is doing, because you can't easily see the intermediate steps. The upside is that programs that use lazy evaluation well can use far less memory, and often run faster, because they never compute values that aren't needed.

# Built-in iteration functions

In addition to range(), which produces its numbers lazily, Python has several built-in functions that take a collection (and, in some cases, any iterator) and help you loop over it in useful ways.

## Counting while iterating with *enumerate()*

Very often, we'll want to go through a list and know both the value of each item and its index. When we type

```python
for item in my_list:
    pass
```

this only gives us the value of each item in the list, not the index where we found it. One way to solve this would be to maintain and update a count variable, like below:

```python
index = 0
for item in my_list:
    # do something with item, index
    index += 1
```

but this is tedious, and it invites bugs if you forget to update the index variable. A better way is to use the *enumerate()* function, which produces an (index, item) tuple for every item in the list. Let's see an example:


In [2]:
grocery_list = ['Eggs', 'Milk', 'Bread', 'Cheese']

for index, item in enumerate(grocery_list):
    print("Item", index, "is", item)

Item 0 is Eggs
Item 1 is Milk
Item 2 is Bread
Item 3 is Cheese


Like all other indices in Python, the numbers returned from *enumerate()* begin at 0.
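To start counting somewhere else, for example when numbering a list for people who count from 1, pass enumerate() its optional `start` argument. Like range(), enumerate() is lazy, so the sketch uses list() to see all of the pairs at once.

```python
grocery_list = ['Eggs', 'Milk', 'Bread', 'Cheese']

# start=1 changes only the numbers that are reported; the list is untouched.
for number, item in enumerate(grocery_list, start=1):
    print(number, item)

# enumerate() is lazy, so list() is needed to collect every pair.
pairs = list(enumerate(grocery_list, start=1))
print(pairs[0])   # (1, 'Eggs')
print(pairs[-1])  # (4, 'Cheese')
```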

## Reversing with *reversed()*

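For sequences such as lists and tuples, the *reversed()* function returns the items in the collection in reverse order. (Dictionaries can only be reversed this way in Python 3.8 and later, and unordered collections such as sets can't be reversed at all.) Let's see it in action: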

In [3]:
for food_item in reversed(grocery_list):
    print(food_item)

Cheese
Bread
Milk
Eggs
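Like range() and enumerate(), reversed() returns a lazy iterator instead of a new list, and it leaves the original collection alone. reversed() needs to know where the collection ends, so it works on sequences such as lists and ranges but rejects a one-shot iterator. Here is a sketch of those behaviors:

```python
grocery_list = ['Eggs', 'Milk', 'Bread', 'Cheese']

# reversed() returns a lazy iterator; list() collects its items.
backwards = reversed(grocery_list)
print(list(backwards))            # ['Cheese', 'Bread', 'Milk', 'Eggs']

# The original list is not modified.
print(grocery_list)               # ['Eggs', 'Milk', 'Bread', 'Cheese']

# Ranges know their own end, so they can be reversed too.
print(list(reversed(range(1, 6))))  # [5, 4, 3, 2, 1]

# A plain iterator has no known end, so reversed() raises a TypeError.
try:
    reversed(iter(grocery_list))
except TypeError as error:
    print("TypeError:", error)
```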


## Sorting with *sorted()*

We can sort a collection with the *sorted()* function, which returns a new, sorted list:

In [4]:
sorted(grocery_list)

['Bread', 'Cheese', 'Eggs', 'Milk']

Note that sorted() uses the built-in comparison operators (specifically, <) to decide the order of the items. This means that strings sort in lexicographic order (alphabetical, as long as the capitalization is consistent), while numbers sort in numeric order:

In [5]:
sorted([4,5,2,1,5,7])

[1, 2, 4, 5, 5, 7]
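sorted() also accepts two optional keyword arguments. `reverse=True` flips the order. `key` is a function applied to each item to decide what gets compared. The sketch below also shows a consequence of comparing strings character by character: every uppercase letter sorts before every lowercase letter.

```python
# reverse=True sorts from largest to smallest.
print(sorted([4, 5, 2, 1, 5, 7], reverse=True))  # [7, 5, 5, 4, 2, 1]

# 'C' compares as less than 'a', so the capitalized word comes first.
print(sorted(['banana', 'Cherry', 'apple']))     # ['Cherry', 'apple', 'banana']

# key=str.lower compares lowercase copies but returns the original strings.
print(sorted(['banana', 'Cherry', 'apple'], key=str.lower))  # ['apple', 'banana', 'Cherry']
```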

Note that sorted() refuses to sort lists whose items can't be compared to each other. For example, there's no reasonable way to order strings and integers relative to each other, so Python raises a TypeError when it runs the code below.

In [12]:
sorted([1, '2', 4, 'banana'])

TypeError: unorderable types: str() < int()

That doesn't mean that lists containing different types can never be sorted. For example, the following list contains both floating-point numbers and integers, but because they can be compared to each other, it still sorts:

In [11]:
sorted([1, 5.3, 2, 0.001, -1000])

[-1000, 0.001, 1, 2, 5.3]

Sorting differs from the other two operations because the program must hold the entire collection in memory at once to sort it, since the smallest item could be the very last one. Thus, sorted() produces a new list instead of a lazy iterator.
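sorted() still accepts lazy inputs such as range() or reversed(). It simply pulls every value out of them before returning a list. It also works on any other iterable collection, including sets and dictionaries (for a dictionary, the keys are what get sorted). A short sketch:

```python
# sorted() consumes the whole lazy iterator, then returns a list.
result = sorted(reversed(range(5)))
print(result)        # [0, 1, 2, 3, 4]
print(type(result))  # <class 'list'>

# Sets and dictionaries can be sorted too; a dictionary contributes its keys.
print(sorted({3, 1, 2}))         # [1, 2, 3]
print(sorted({'b': 2, 'a': 1}))  # ['a', 'b']
```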

# Exercises

## 1: Sort strings by their reversed order

Write a function that takes a list of strings as an argument and returns the strings sorted by their reversed spelling. That means sorting first by their **last** letters, then by their second-to-last letters, and so on. This approach is sometimes used to create crude rhyming dictionaries in English. Check the test suite for some examples of pairs of inputs and outputs.

Hint: To reverse a string, you can use the reversed() function, but it returns a lazy iterator that you have to join back into a string. Do you remember how to join an iterator of strings together? Alternatively, there's a slicing trick that you learned in part 1 that can reverse a string.

In [10]:
def reversed_string_sort(list_of_strings):
    reversed_strings = [s[::-1] for s in list_of_strings]
    reversed_strings = sorted(reversed_strings)
    return [s[::-1] for s in reversed_strings]

In [16]:
class TestReversedStringsSort(unittest.TestCase):
    def test_strings(self):
        input_data = ['list', 'of', 'words', 'to', 'reverse']
        self.assertEqual(reversed_string_sort(input_data), 
                        ['reverse', 'of', 'to', 'words', 'list']
                        )
    def test_rhyming_words_sort_of(self):
        input_data = ['elation', 'monotreme', 'paste', 'theme', 'elevation', 'waste']
        sorted_data = ['theme', 'monotreme', 'paste', 'waste', 'elation', 'elevation']
        self.assertEqual(reversed_string_sort(input_data), sorted_data)
        
util.run_tests(TestReversedStringsSort)

test_rhyming_words_sort_of (__main__.TestReversedStringsSort) ... ok
test_strings (__main__.TestReversedStringsSort) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.004s

OK
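The solution above reverses every string twice. A shorter alternative passes a `key` function to sorted(). sorted() calls the key on each item, compares the results, and returns the original items in that order. The sketch below shows both reversal methods from the hint. The function names are hypothetical and chosen only for this illustration.

```python
def reversed_string_sort_with_key(list_of_strings):
    # Hypothetical name. Compare each word by its reversed spelling,
    # but return the original words.
    return sorted(list_of_strings, key=lambda s: s[::-1])


def reversed_string_sort_with_join(list_of_strings):
    # Hypothetical name. ''.join(reversed(s)) builds the reversed string
    # from the lazy iterator that reversed() returns.
    return sorted(list_of_strings, key=lambda s: ''.join(reversed(s)))


words = ['elation', 'monotreme', 'paste', 'theme', 'elevation', 'waste']
print(reversed_string_sort_with_key(words))
# ['theme', 'monotreme', 'paste', 'waste', 'elation', 'elevation']
```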

