## Chapter 8 - Lists

## 8.1 A list is a sequence
* Like a string, a list is a sequence of values.


* In a string, the values are characters; in a list, they can be any type.


* The values in a list are called **elements** or sometimes **items**.


* There are several ways to create a new list; the simplest is to enclose the elements in square brackets **([ and ])**:


In [2]:
[10,20,30,40] # The first example is a list of four integers.

[10, 20, 30, 40]

In [5]:
['crunchy frog', 'ram bladder', 'lark vomit']
# The second is a list of three strings. 
# The elements of a list don’t have to be the same type.

['crunchy frog', 'ram bladder', 'lark vomit']

In [6]:
['spam', 2.0, 5, [10, 20]]
# The following list contains a string, a float, an integer, and (lo!) another list:

['spam', 2.0, 5, [10, 20]]

A list that contains no elements is called an empty list; you can create one with empty brackets, [].

As you might expect, you can assign list values to variables:

In [7]:
cheeses = ['Cheddar', 'Edam', 'Gouda']

In [11]:
numbers = [17, 123]

In [9]:
empty = []

In [12]:
print (cheeses, numbers, empty)

['Cheddar', 'Edam', 'Gouda'] [17, 123] []


Below is a quick summary of list constants:

<img src = 'Py4Inf-08-Lists.jpg' >

#### Lists and definite loops (that is, **for** loops) work well together

<img src = 'list and loop.png' >

Just like strings, we can get at any single element in a list using an index specified in **square brackets**


<img src = 'list index.png' >
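The two images above show a `for` loop over a list and indexing with square brackets. Here is a minimal sketch of both. The `friends` list is just an illustrative example:

```python
friends = ['Joseph', 'Glenn', 'Sally']

# A definite (for) loop visits each element of the list in order
for friend in friends:
    print('Happy New Year:', friend)

# Indexing starts at 0, just as with strings
print(friends[1])  # Glenn
```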

## 8.2 Lists are mutable
* Unlike strings, lists are mutable: you can change the order of the items in a list or reassign an item in a list.


* When the bracket operator appears on the left side of an assignment, it identifies the element of the list that will be assigned.

In [14]:
num = [17, 123]

In [15]:
num[1] = 5

In [16]:
print (num)

[17, 5]


Above, the value at index 1 (which was 123) has been replaced with 5.

You can think of a list as a relationship between indices and elements. This relationship is called a **mapping**; each index ***“maps to”*** one of the elements.

List indices work the same way as string indices:

* Any integer expression can be used as an index.

* If you try to read or write an element that does not exist, you get an IndexError. 

* If an index has a negative value, it counts backward from the end of the list.

* The **in** operator also works on lists.

In [17]:
cheeses = ['Cheddar', 'Edam', 'Gouda']
'Edam' in cheeses

True

In [18]:
'Brie' in cheeses

False
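The rules listed above for indices can be checked directly. A short sketch using the same `cheeses` list, showing a negative index, an integer expression as an index, and the `IndexError` raised for an index that does not exist:

```python
cheeses = ['Cheddar', 'Edam', 'Gouda']

print(cheeses[-1])     # Gouda: negative indices count back from the end
print(cheeses[1 + 1])  # any integer expression works as an index: Gouda

try:
    cheeses[3]         # only indices 0, 1 and 2 exist
except IndexError as err:
    print('IndexError:', err)
```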

## 8.3 Traversing a list
The most common way to traverse the elements of a list is with a **for loop**.
The syntax is the same as for strings:

            for cheese in cheeses:
                print(cheese)

This works well if you only need to read the elements of the list. But if you want to write or update the elements, you need the indices. A common way to do that is to combine the functions **range** and **len**:

In [19]:
num = [1,2,3,4,5]
for i in range(len(num)):
    num[i] = num[i] * 2

In [20]:
num

[2, 4, 6, 8, 10]

This loop traverses the list and updates each element. len returns the number of elements in the list.
range returns a sequence of indices from 0 to n−1, where n is the length of the list (in Python 3 this is a `range` object rather than a list). Each time through the loop, i gets the index of the next element. The assignment statement in the body uses i to read the old value of the element and to assign the new value.

Although a list can contain another list, the nested list still counts as a single element. The length of this list is four:


In [29]:
t = ['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]  # named t, not list: naming it list hides the built-in list()

In [30]:
len(t)

4
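To reach an element inside the nested list, index the outer list first and then the inner one. A small sketch with the same list (named `t` here so the built-in `list` function is not hidden):

```python
t = ['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]

print(len(t))     # 4: each nested list counts as one element
print(t[2])       # ['Brie', 'Roquefort', 'Pol le Veq']
print(t[2][0])    # Brie: outer index 2, then inner index 0
print(len(t[3]))  # 3: the inner list has its own length
```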

## 8.4 List operations
The + operator concatenates lists:

In [33]:
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b   # no extra brackets: [a + b] would wrap the result in another list
print(c)

[1, 2, 3, 4, 5, 6]


Similarly, the * operator repeats a list a given number of times:

In [34]:
[0] * 4

[0, 0, 0, 0]

In [35]:
[1,2,3] * 3

[1, 2, 3, 1, 2, 3, 1, 2, 3]

The first example repeats [0] four times. The second example repeats the list [1, 2, 3] three times.

## 8.5 List slices

The slice operator also works on lists:
<img src = 'Capture.PNG' >

In [36]:
t = ['a', 'b', 'c', 'd', 'e', 'f']

In [37]:
t[1:3]

['b', 'c']

In [38]:
t[:4]

['a', 'b', 'c', 'd']

In [39]:
t[3:]

['d', 'e', 'f']

In [40]:
t[:]

['a', 'b', 'c', 'd', 'e', 'f']

Since lists are mutable, it is often useful to make a copy before performing operations that fold, spindle, or mutilate lists.
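The full slice `t[:]` shown above is a common way to make that copy. A minimal sketch showing that the copy is a separate list, so changing the original leaves the copy alone:

```python
t = ['a', 'b', 'c']
backup = t[:]   # a new list containing the same elements
t.append('d')   # modify the original list

print(t)        # ['a', 'b', 'c', 'd']
print(backup)   # ['a', 'b', 'c']  (the copy is unaffected)
```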


A slice operator on the left side of an assignment can update multiple elements:

In [41]:
t = ['a', 'b', 'c', 'd', 'e', 'f']

In [42]:
t[1:3] = ['x', 'y']

In [44]:
print (t)

['a', 'x', 'y', 'd', 'e', 'f']
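The replacement list does not have to be the same length as the slice it replaces. A short sketch starting from the same list, showing slice assignment shrinking the list and an empty slice inserting elements:

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']

t[1:3] = ['x']        # two elements replaced by one
print(t)              # ['a', 'x', 'd', 'e', 'f']

t[1:1] = ['y', 'z']   # an empty slice inserts without removing anything
print(t)              # ['a', 'y', 'z', 'x', 'd', 'e', 'f']
```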


## 8.6 List methods
Python provides methods that operate on lists.

In [52]:
x = [] # empty list

In [54]:
type(x)

list

In [56]:
print(dir(x))

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


In [57]:
help(x.append)

Help on built-in function append:

append(...) method of builtins.list instance
    L.append(object) -> None -- append object to end



**append** adds a new element to the end of a list

In [64]:
t = ['a', 'b', 'c']

In [65]:
t.append('Z')

In [66]:
print (t)

['a', 'b', 'c', 'Z']


**extend** takes a list as an argument and appends all of its elements:

In [67]:
t1 = ['a', 'b', 'c']

In [68]:
t2 = ['d', 'e']

In [69]:
t1.extend(t2)

In [71]:
print (t1)

['a', 'b', 'c', 'd', 'e']
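A common mix-up is passing a list to **append** when **extend** was intended. This sketch contrasts the two: `append` adds the whole list as a single nested element, while `extend` adds each element separately:

```python
t1 = ['a', 'b', 'c']
t1.append(['d', 'e'])   # the whole list becomes ONE new element
print(t1)               # ['a', 'b', 'c', ['d', 'e']]

t2 = ['a', 'b', 'c']
t2.extend(['d', 'e'])   # each element is added on its own
print(t2)               # ['a', 'b', 'c', 'd', 'e']
```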


**sort** arranges the elements of the list from low to high:

In [72]:
t = ['d', 'c', 'e', 'b', 'a']
t.sort()
print (t)

['a', 'b', 'c', 'd', 'e']


Most list methods are void; they modify the list and return None. If you accidentally write t = t.sort(), you will be disappointed with the result. 

In [None]:
t = t.sort()

In [74]:
t

In [75]:
print (t)

None
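If you want a sorted result without losing the original order, the built-in `sorted()` function returns a new list and leaves its argument unchanged, whereas `sort()` changes the list in place and returns `None`. A short comparison:

```python
t = ['d', 'c', 'e', 'b', 'a']
result = t.sort()   # sorts t in place...
print(result)       # None: ...and returns nothing useful
print(t)            # ['a', 'b', 'c', 'd', 'e']

u = ['d', 'c', 'e', 'b', 'a']
v = sorted(u)       # sorted() builds a NEW sorted list
print(u)            # ['d', 'c', 'e', 'b', 'a']  (unchanged)
print(v)            # ['a', 'b', 'c', 'd', 'e']
```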


Use the **in** and **not in** operators to check whether a value is in a list.

<img src = 'list operator.png'>

In [76]:
some = [1, 9, 21, 10, 16]

In [77]:
9 in some

True

In [78]:
15 in some

False

In [79]:
20 not in some

True

## 8.7 Deleting elements
* There are several ways to delete elements from a list. If you know the index of the element you want, you can use **pop**.



* **pop** modifies the list and returns the element that was removed. If you don’t provide an index, it deletes and returns the last element:

In [80]:
t = ['a', 'b', 'c']
x = t.pop(1)
print(t)

['a', 'c']


In [81]:
print (x)

b
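The example above passes an index to **pop**. Called with no argument, **pop** removes and returns the last element instead, as this small sketch shows:

```python
t = ['a', 'b', 'c']
last = t.pop()   # no index: remove and return the last element
print(last)      # c
print(t)         # ['a', 'b']
```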


If you don’t need the removed value, you can use the **del** statement:

In [82]:
t = ['a', 'b', 'c']

In [83]:
del t[1]

In [84]:
print (t)

['a', 'c']


If you know the element you want to remove (but not the index), you can use
**remove**:

In [85]:
t = ['a', 'b', 'c']

In [86]:
t.remove('b')

In [87]:
print(t)

['a', 'c']


* To remove more than one element, you can use del with a slice index


* As usual, the slice selects all the elements up to, but not including, the second index.

In [88]:
t = ['a', 'b', 'c', 'd', 'e', 'f']

In [89]:
del t[1:4]

In [90]:
print (t)

['a', 'e', 'f']


## 8.8 Lists and functions
There are a number of built-in functions that can be used on lists that allow you to quickly look through a list without writing your own loops:

In [91]:
nums = [3, 41, 12, 9, 74, 15]

In [92]:
print (len(nums))

6


In [93]:
print (max(nums))

74


In [94]:
print (min(nums))

3


In [95]:
print(sum(nums))

154


In [96]:
print (sum(nums)/len(nums))

25.666666666666668


The sum() function only works when the list elements are numbers. The other
functions work more broadly: max() and min() also work with lists of strings and other types whose elements can be compared, and len() works on any list.
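A quick check of that claim with a list of strings (the fruit names are arbitrary): `max()` and `min()` compare the strings alphabetically, `len()` counts them, and `sum()` raises a `TypeError` because it tries to add strings to its numeric start value of 0:

```python
words = ['pear', 'apple', 'fig']

print(max(words))   # pear  (alphabetically last)
print(min(words))   # apple (alphabetically first)
print(len(words))   # 3

# sum() expects numbers, so a list of strings fails
try:
    sum(words)
except TypeError as err:
    print('TypeError:', err)
```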

Here we rewrite a program from an earlier chapter that calculated the average without a list, using only a counter and a running total.

In this program, we have count and total variables to keep the number and
running total of the user’s numbers as we repeatedly prompt the user for a number.

In [99]:
total = 0
count = 0
while ( True ) :
    inp = input('Enter a number: ')
    if inp == 'done' : break
    value = float(inp)
    total = total + value
    count = count + 1

average = total / count
print ('Average:', average)

Enter a number: 10
Enter a number: 20
Enter a number: 30
Enter a number: 40
Enter a number: 50
Enter a number: done
Average: 30.0


Now we use a list for the same calculation.



We make an empty list before the loop starts, and then each time we have a number, we append it to the list. At the end of the program, we simply compute the sum of the numbers in the list and divide it by the count of the numbers in the list to come up with the average.

In [102]:
numlist = []
while ( True ) :
    inp = input('Enter a number: ')
    if inp == 'done' : break
    value = float(inp)
    numlist.append(value)

average = sum(numlist) / len(numlist)
print ('Average:', average)

Enter a number: 10
Enter a number: 20
Enter a number: 30
Enter a number: 40
Enter a number: 50
Enter a number: done
Average: 30.0


In [103]:
numlist

[10.0, 20.0, 30.0, 40.0, 50.0]
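Because the list version keeps every number, the averaging step can be tested without typing at a prompt by passing the inputs in as a list of strings. This is a sketch only: `average_of_inputs` is a hypothetical name, and the check for an empty list (so that entering `done` straight away does not divide by zero) is an addition not found in the program above:

```python
def average_of_inputs(inputs):
    """Average the numbers in `inputs`, stopping at the string 'done'."""
    numlist = []
    for inp in inputs:
        if inp == 'done':
            break
        numlist.append(float(inp))
    # Guard against dividing by zero when no numbers were entered
    if not numlist:
        return None
    return sum(numlist) / len(numlist)

print(average_of_inputs(['10', '20', '30', '40', '50', 'done']))  # 30.0
```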

## 8.9 Lists and strings

A string is a sequence of characters and a list is a sequence of values, but a list of characters is not the same as a string. To convert from a string to a list of characters, you can use list.

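The example below may not work in a Jupyter notebook if the name `list` has been reassigned earlier in the same session (for example `list = [...]`), because that hides the built-in `list` function. In a fresh Python session it works, as shown below.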

<img src = "string to list.png" >
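For reference, the conversion in the image can be written as a short script. `list()` produces one element per character:

```python
s = 'spam'
t = list(s)   # one list element per character of the string
print(t)      # ['s', 'p', 'a', 'm']
```

If `list` has been reassigned earlier in a notebook, running `del list` removes that global name so the built-in function is visible again.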

* The list function breaks a string into individual letters. If you want to break a string into words, you can use the split method.


* **split** breaks a string into parts and produces a list of strings. 

In [128]:
s = 'pining for the fjords'

In [129]:
t = s.split()

In [130]:
print (t)

['pining', 'for', 'the', 'fjords']


In [131]:
print (t[2])

the


Once you have used split to break the string into a list of words, you can use the index operator (square bracket) to look at a particular word in the list.

You can call split with an optional argument called a **delimiter** that specifies which characters to use as word boundaries. The following example uses a hyphen as a delimiter:

In [132]:
s = 'spam-spam-spam'

In [133]:
delimiter = '-'

In [134]:
s.split(delimiter)

['spam', 'spam', 'spam']

**join** is the inverse of **split**. It takes a list of strings and concatenates the elements. **join** is a string method, so you have to invoke it on the delimiter and pass the list as a parameter:

In [135]:
t = ['pining', 'for', 'the', 'fjords']

In [136]:
delimiter = ' '

In [137]:
delimiter.join(t)

'pining for the fjords'

In this case the delimiter is a space character, so join puts a space between words. To concatenate strings without spaces, you can use the empty string, '', as a delimiter.
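A short sketch of other delimiters with **join**, including the empty string mentioned above, and of **split** and **join** undoing each other when the same delimiter is used:

```python
t = ['pining', 'for', 'the', 'fjords']

print(''.join(t))    # piningforthefjords
print('-'.join(t))   # pining-for-the-fjords

# split and join are inverses when they use the same delimiter
s = 'spam-spam-spam'
print('-'.join(s.split('-')) == s)   # True
```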

## 8.10 Parsing lines
* Usually when we are reading a file we want to do something to the lines other than just printing the whole line. 


* Often we want to find the “interesting lines” and then **parse** the line to find some interesting part of the line.


* What if we wanted to print out the day of the week from those lines that start with “From ”?
    
    *From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008*


* The **split** method is very effective when faced with this kind of problem. We can write a small program that looks for lines where the line starts with “From ”,  **split** those lines, and then print out the third word in the line:

In [146]:
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From '):
        continue
    words = line.split()
    print (words[2]) #note 2 is the index where day of week is

Sat
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Fri
Thu
Thu
Thu
Thu
Thu
Thu


In the above example the following is happening:

* A file handle is created and assigned to the variable **fhand**.


* A for loop goes through each line of the file:
    * First, trailing whitespace (including the newline) is stripped from the right side of the line with **rstrip**.
    * Then, if the line does not start with 'From ', **continue** skips it and the program moves on to the next line.
    * If the line does start with 'From ', the **split** method breaks the line (a string) into a list of words, and that list is assigned to the variable **words**.
    * Finally, it prints index 2 of the list (the third word), which is the day of the week.

* Below, the same split is applied to a single line:



In [None]:
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

In [150]:
words = line.split()

In [151]:
words

['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']

In [153]:
print (words)

['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']


In this example we split twice: first the line into words, then the email address (the second word) on the '@' character.

In [155]:
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

In [156]:
words = line.split()

In [160]:
words

['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']

In [163]:
email = words[1]
print (email)

stephen.marquard@uct.ac.za


In [165]:
pieces = email.split('@')
print (pieces)

['stephen.marquard', 'uct.ac.za']


Note that split was called with **'@'** as the **delimiter** argument, so the string is broken apart at the '@' and the '@' itself does not appear in either piece.

In [167]:
print (pieces[1])

uct.ac.za


In [169]:
# Putting the last example together as one script
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
words = line.split()
email = words[1]
pieces = email.split('@')

print (pieces[1])


uct.ac.za
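The double split can be wrapped in a function and combined with the 'From ' check from the file-reading loop. This is a sketch: `sender_host` is a hypothetical name, and the length check (so a bare "From " line does not cause an IndexError) is an addition not in the script above:

```python
def sender_host(line):
    """Return the host part of the sender on a 'From ' line, else None."""
    if not line.startswith('From '):
        return None
    words = line.split()
    # Guard: a line with no address after 'From' has fewer than 2 words
    if len(words) < 2:
        return None
    email = words[1]
    pieces = email.split('@')   # '@' is the delimiter and is dropped
    return pieces[-1]

print(sender_host('From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'))  # uct.ac.za
```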


I did not review 8.11, 8.12, and 8.13, as they cover objects and values, and aliasing. Skipping them should not affect the rest of these notes on lists.
They are still worth understanding, because they relate to objects and memory allocation.

## 8.16 Exercises

**Exercise 8.4** Download a copy of the file from www.py4inf.com/code/romeo.txt

Write a program to open the file romeo.txt and read it line by line. For each line,
split the line into a list of words using the split function.

For each word, check to see if the word is already in a list. If the word is not in the list, add it to the list.

When the program completes, sort and print the resulting words in alphabetical
order.

Enter file: romeo.txt
['Arise', 'But', 'It', 'Juliet', 'Who', 'already',
'and', 'breaks', 'east', 'envious', 'fair', 'grief',
'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft',
'sun', 'the', 'through', 'what', 'window',
'with', 'yonder']

In [226]:
file_name = input("Enter file: ")

word_list = []

# 'with' closes the file automatically when the block ends
with open(file_name, 'r') as fhand:
    for line in fhand:
        words = line.split()
        for word in words:
            # Keep the original capitalization (as in the expected output)
            # and add each word only the first time it appears
            if word not in word_list:
                word_list.append(word)

word_list.sort()
print(word_list)

Enter file: romeo.txt
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']


**Exercise 8.5**  Write a program to read through the mail box data and when you
find a line that starts with “From”, you will split the line into words using the split function. We are interested in who sent the message, which is the second word on the From line.

    From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

You will parse the From line and print out the second word for each From line,
then you will also count the number of From (not From:) lines and print out a
count at the end.

This is a good sample output with a few lines removed:

python fromcount.py
Enter a file name: mbox-short.txt
stephen.marquard@uct.ac.za
louis@media.berkeley.edu
zqian@umich.edu

[...some output removed...]

ray@media.berkeley.edu
cwen@iupui.edu
cwen@iupui.edu
cwen@iupui.edu

There were 27 lines in the file with From as the first word

In [238]:
file_name = input("Enter a file name: ")
lines = [line.strip("\n") for line in open(file_name, 'r')
         if line.startswith("From") and not line.startswith("From:")]

count = 0
for line in lines:
    words = line.split()
    print (words[1])
    count += 1
print ('There were '  + str(count) + ' ' + 'lines in the file with From as the first word')

Enter a file name: mbox-short.txt
stephen.marquard@uct.ac.za
louis@media.berkeley.edu
zqian@umich.edu
rjlowe@iupui.edu
zqian@umich.edu
rjlowe@iupui.edu
cwen@iupui.edu
cwen@iupui.edu
gsilver@umich.edu
gsilver@umich.edu
zqian@umich.edu
gsilver@umich.edu
wagnermr@iupui.edu
zqian@umich.edu
antranig@caret.cam.ac.uk
gopal.ramasammycook@gmail.com
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
stephen.marquard@uct.ac.za
louis@media.berkeley.edu
louis@media.berkeley.edu
ray@media.berkeley.edu
cwen@iupui.edu
cwen@iupui.edu
cwen@iupui.edu
There were 27 lines in the file with From as the first word


**Exercise 8.6** Rewrite the program that prompts the user for a list of numbers and prints out the maximum and minimum of the numbers at the end when the user
enters “done”. Write the program to store the numbers the user enters in a list
and use the max() and min() functions to compute the maximum and minimum
numbers after the loop completes.

* Enter a number: 6
* Enter a number: 2
* Enter a number: 9
* Enter a number: 3
* Enter a number: 5
* Enter a number: done
* Maximum: 9.0
* Minimum: 2.0

In [241]:
def find_min_max():
    user_responses = []
    while True:
        user_input = input("Enter a number: ")
        if user_input == 'done':
            break
        try:
            # float, so the output matches "9.0" in the expected output
            value = float(user_input)
        except ValueError:
            print("Invalid input")
            continue
        user_responses.append(value)
    # max() and min() raise ValueError on an empty list
    if user_responses:
        print("Maximum: " + str(max(user_responses)))
        print("Minimum: " + str(min(user_responses)))

find_min_max()

Enter a number: 10
Enter a number: 20
Enter a number: 30
Enter a number: 40
Enter a number: 50
Enter a number: done
Maximum: 50.0
Minimum: 10.0


<a href="Py4Inf-08-Lists.pdf">The Lecture Slides in PDF </a>

<a href="Py4Inf-08-Lists.pptx">The Lecture Slides in PPT </a>

# The END
=================