<p><a name="sections"></a></p>


# Sections

- <a href="#string">String operations</a><br>
 - <a href="#strBasic">String Basics</a><br>
 - <a href="#OMC">Object Method Calls</a><br>
 - <a href="#buildin">Built-in Strings Operations</a><br>
 - <a href="#MinMap">Method Call in map</a><br>
 - <a href="#case">Case conversion</a><br>
- <a href="#fileIO">File Input and Output</a><br>
 - <a href="#read">Reading from Files</a><br>
 - <a href="#output">File Output</a><br>
 - <a href="#fileSearch">Searching in Files</a><br>
- <a href="#DS">Data Structure</a><br>
 - <a href="#mutate">Mutating Operations on Lists</a><br>
 - <a href="#multiList">Multiple List Operators</a><br>
 - <a href="#TSD">Tuple, sets and Dictionaries</a><br>
- <a href="#quiz">Quiz</a><br>
<p><a name="string"></a></p>

# String operations

<p><a name="strBasic"></a></p>
## String Basics
String literals are written using either single quotes `('...')` or double quotes `("...")`.

In [1]:
s1 ='Hello world!'
s1

'Hello world!'

In [2]:
s2 = "Hello world!"
s2

'Hello world!'

Once a variable is created as a string, it is treated as a string!

In [3]:
s3 = '1'
s4 = '2'
s3 + s4

'12'

Within a string, special symbols can be included by using escape character `(\)`. Examples are `\t` for tab and `\n` for newline. See 

https://docs.python.org/2.0/ref/strings.html

for a complete list!!

The `print` statement produces a more readable output by omitting enclosing quotes as well as escaped characters and special symbols:

In [6]:
print 'that's what she said'

SyntaxError: invalid syntax (<ipython-input-6-0a3bff541cb7>, line 1)

In [4]:
print 'that\'s what she said' # basicaly \ tells the print command to chill out and continue reading 

that's what she said


If you don't want the characters prefaced by `\ ` to be interpreted as special characters, you can use raw strings by adding an `r` before the first quote: 

In [5]:
print 'Hello,\nI have been expecting you!'

Hello,
I have been expecting you!


In [7]:
print 'C:\some\name'   # \n means newline so Python gets confused 

C:\some
ame


In [9]:
len('C:\some\name')  # you may think this string has a length = 12 but \n in python is a special character

11

Above, `\n` is interpreted as one special character. To avoid that, add `r` in front of the first quote, then `\n` become two chatacters.

In [9]:
print r'C:\some\name'

C:\some\name


In [10]:
len(r'C:\some\name')

12

A string is normally coded in one line, but it can also be broken into multiple lines of code by adding `\ ` in the very last position of each line.

In [9]:
longquote = 'This is a particularly long line \
that is broken into two lines'

print longquote

This is a particularly long line that is broken into two lines


Python also has multiple lineotes: enclose the string with three quotation marks. Note that with three quotation marks the newlines are included

In [15]:
longquote = '''This is a particularly long line
that is broken into two lines'''

print longquote

This is a particularly long line
that is broken into two lines


Strings are similar to lists. You can subscipt and slice, as well as use `+` to concatenate strings.

In [10]:
s = 'My dog has fleas'
s[1]

'y'

In [17]:
s[3:6]  # exclusive of 6 

'dog'

In [18]:
s + ' and so do I.'

'My dog has fleas and so do I.'

Convert a string to a list of single-character rings using the function `list()`:

In [16]:
print list(s)

['M', 'y', ' ', 'd', 'o', 'g', ' ', 'h', 'a', 's', ' ', 'f', 'l', 'e', 'a', 's']


In [17]:
print list('dog')

['d', 'o', 'g']


<p><a name="OMC"></a></p>
## Object Method Calls

We saw how we could split a string into a list. Going the other way requires a trick.

In [22]:
"-".join(['s','e','p','e','h','r'])

's-e-p-e-h-r'

In [19]:
"".join(['s','e','p','e','h','r'])

'sepehr'

We have already seen the dot notation when calling functions from a module, for exaple: `math.sqrt()`.

- Then you might be wondering: why does an empty sring have a function?

To answer this question, we need to mention an important fact that Python is an **object-oriented language**. Without delving into details, this means Python takes many useful types of data for **objects**, including strings. Each object is associated with **methods** that can be applied to them.

- Essentially a method is a function. A function associated with an object is called a method.

A method is always called by an object it is associated with, and the syntax is:

- `object.method(argument1, argument2, ...)`

Therefore, in our previouse example:

- The empty string `''` is used to call the function `join`.
- The symbol `_` is used to represent the value returned by the last cell. In this particular case: `['d', 'o', 'g']`.
- The method `join` then concatenate the strings in the list passed to it.

We will discuss more on this `join` method.

Object oriented programming is actually an important feature of Python. We can even create our own objects. We will discuss this advanced topic later in this course.

<p><a name="buildin"></a></p>
## Built-in Strings Operations

Python comes with a large set of functions to perform useful operations strings. We will discuss several, for a complete list, see

https://docs.python.org/2/library/string.html

- **`strip`** removes "whitespace" (spaces, tab, newlines,..., etc) from the beginning and the end of a string. Note the usage of object-oriented syntax:

In [23]:
s = '    my dog has fleas    '
s.strip()

'my dog has fleas'

- **`split`** is an extremely useful function that splits a string into a list of strings on the delimeter. By default, the delimeter consists of spaces, tabs `(\t)`, and newlines `(\n)`, but the delimeter can be given as an argument.

In [31]:
s = 'my dog has fleas'
s.split()

['my', 'dog', 'has', 'fleas']

In [32]:
s1 = 'my,dog,has,fleas'
s1.split(',')

['my', 'dog', 'has', 'fleas']

- **`join`** concatenate a list of words with intervening occurences of a delimeter. You can sider join as the inverse of `split`. For example:

In [26]:
l = ['my','dog','has','fleas']
' '.join(l)

'my dog has fleas'

In [28]:
', '.join(l)

'my, dog, has, fleas'

Here we see that the object calling the function is often the first argument passed to the function. In this particular example, the string calling `join` is passed as the **delimeter**. Earlier we saw an example where `join` was called by the empty string `''`:

In [35]:
''.join(['d','o','g'])

'dog'

In [31]:
s='sepehr'
print ''.join(list(s))

sepehr


- **`replace`** returns a string with all occurences of one string replaced by another. For example:

In [36]:
s = 'He is my classmate, and he is learning Python'
s.replace('is','was')

'He was my classmate, and he was learning Python'

The object parameter is the string within which replacement takes place the first parameter is the string to be replaced by the second parameter.

If you only want the first occurence of the strign to be replaced, you can pass it to the third argument:

In [37]:
s.replace('is', 'was', 1)

'He was my classmate, and he is learning Python'

- **`find`** returns the lowest index in string `s` where the substring *sub* is found. It returns -1 on failure.

In [33]:
s = 'my dog has fleas'
s.find('dog')

3

In [32]:
s.find('dogs')

-1

If we want to find the substring *sub* within a range we need to specify it:

In [46]:
s.find('a')   # if there are multiple instances, it will give you the index of the first instance 

8

In [47]:
s.find('a', 9)  # this means start from index 9 

14

- **`%` and format**

The `%` operator is used to create nicely formatted strings from other values. Its syntax is `format_specifier%(tuple of values)`:

 - `formate_specifier` is a string containing special format symbols, which are used to insert alues from the tuple:
  - `%s` means inserting a string.
  - `%d` means inserting an integer.
  - `%f` means inserting a float.
 - The number of format symbol in the format specifier must equal to the number of values in the tuple, and each format symbol must match the type of the corresponding value in the tuple.

In [34]:
print 'My name is %s and I am %d years old.' % ('Mike', 25)    # you can also use %i for integer  

My name is Mike and I am 25 years old.


In [43]:
print 'The average price of 1BR apartments in %s is $%.2f Million Dollars' % ('Tribeca',1.345)  

The average price of 1BR apartments in Tribeca is $1.34 Million Dollars


<p><a name="MinMap"></a></p>
## Method call in `map`

Object notation causes one problem. Suppose we have a method *fun* on strings, and suppose it has no arguments.  So we apply it to string s by writing: `s.fun()`.

Now suppose we want to apply *fun* to every element of a list of strings. How can we do that?

- We can try `map(fun, L)`, but that doesn’t work. 

Instead, turn the method into a function:

- `lambda s: s.fun()`

or

- `def newfun(s): return s.fun()`

E.g., here is a way to “`strip`” every string in a list:

In [47]:
lis = [' abc  \n', ' def', 'ghi   ']
map(lambda s: s.strip(), lis)

['abc', 'def', 'ghi']

Here is a way to remove the strings that contain the word “No”:

In [19]:
# Return true if s does *not* contain 'No'
def has_no_No(s):
    return s.find('No') == -1

L = ['No rain', 'Some snow', 'No sleet']

print filter(has_no_No, L)
print map(has_no_No, L)

['Some snow']
[False, True, False]


<p><a name="case"></a></p>
## Case Conversion

Python has provided some built-in functions to do case conversions:

In [53]:
'ABcd'.lower()      # convert to lowercase

'abcd'

In [54]:
'ABcd'.upper()      # convert to uppercase

'ABCD'

In [55]:
'ABcd'.swapcase()   # swap case

'abCD'

In [56]:
'aCd acD'.title()   # make first letters uppercase

'Acd Acd'

**Exercise 1** String operations

- Use `map` to find the first occurrence of the character `i` in each word in a list:
```
lis = ['today', 'is', 'a', 'nice', 'day']
map( ... , lis)   # fill in the ... to get [-1, 0, -1, 1, -1]
```

- Define function `find_char(s, t)` to find the lowest index of string `t` in each word in `s`:
```
s = 'today is a nice day'
find_char(s, 'i') ---> [-1, 0, -1, 1, -1]
```
You need to split `s`, and then so as above.

- Use `map` with the formatting operation (`%`) to turn a list of numbers into a list of strings:
```
lis = [10, 12, 4, 7]
map( ... , lis)   # fill in the ... to get ['10', '12', '4', '7']
```

- Use `filter` to find the strings in a list that do contain No.
```
L = ['No rain', 'Some snow', 'No sleet']
filter(has_no, L) ---> ['No rain', 'No sleet']
```

In [57]:
#### Your code here

#1
# first occurence of i in each word in the list
lis = ['today', 'is', 'a', 'nice', 'day']
print map(lambda s: s.find('i') , lis)   # fill in the ... to get [-1, 0, -1, 1, -1]


[-1, 0, -1, 1, -1]


In [58]:
#2
#find the lowest index of string t in each word in s:
s = 'today is a nice day'
# find_char(s, 'i') ---> [-1, 0, -1, 1, -1]

def find_char(s,k):
    temp = s.split()
    return map(lambda x: x.find(k), temp)

print find_char(s,'i')


[-1, 0, -1, 1, -1]


In [59]:
#3
#  use map with the formatting operation (%) to turn a list of numbers into a list of strings
lis = [10, 12, 4, 7]
print map(lambda x: '%i' % x , lis)   # fill in the ... to get ['10', '12', '4', '7']

#another method without using %:
print map(lambda x: str(x), lis)

['10', '12', '4', '7']
['10', '12', '4', '7']


In [60]:
#4 #use filter to find the strings in a list that do contain No.

L = ['No rain', 'Some snow', 'No sleet']
filter(lambda s: s.find('No')<>-1 , L) 



['No rain', 'No sleet']

<p><a name="fileIO"></a></p>
# File Input and Output

We first follow the step below to create a .txt file in iPython notebook:

- As above, save your notebook and go to the initial iPython screen.
- In the New menu (upper right), click Text file.
- Enter your text.
- Click on “untitled.txt” in the top left to name the file.
- Select “Save” from the file menu to save it.
- Click the word Jupyter on top left to return to the iPython screen. You should see your new file listed.


<p><a name="read"></a></p>
## Reading from Files

Before we input the file, we might want to inspect the file. Of course we can go back to the initial iPython screen to look at the file. With iPython notebook we can inspect a file without leaving our working space. We may use command line after the `!` notation 

**Note**: this is not python code, but a special feature in iPython notebook)

In [58]:
#!cat simpe text   for mac machines 
!more simple.txt

Before we input the file, we might want to inspect the file. 
Of course we can go back to the initial iPython screen to look at the file. 
With iPython notebook we can inspect a file without leaving our working space. 
We may use command line after the ! notation


Reading from files is very simple, because we can treat a file almost as a list of strings.

- To turn a file into a list of strings, simply do this:

In [59]:
# f is a file object here
# a file object has a bunch of methods associated with it ... like readlines
f = open('simple.txt', 'r')    # 'r' for read
lines = f.readlines()
f.close()
lines

['Before we input the file, we might want to inspect the file. \n',
 'Of course we can go back to the initial iPython screen to look at the file. \n',
 'With iPython notebook we can inspect a file without leaving our working space. \n',
 'We may use command line after the ! notation']

Now that we have the file’s contents in a list, we can apply all of our list- and string-processing powers to it.  E.g. turn all letters in simple.txt into uppercase:

In [60]:
text = ' '.join(map(lambda s: s.upper().strip(), lines))
text

'BEFORE WE INPUT THE FILE, WE MIGHT WANT TO INSPECT THE FILE. OF COURSE WE CAN GO BACK TO THE INITIAL IPYTHON SCREEN TO LOOK AT THE FILE. WITH IPYTHON NOTEBOOK WE CAN INSPECT A FILE WITHOUT LEAVING OUR WORKING SPACE. WE MAY USE COMMAND LINE AFTER THE ! NOTATION'

Let’s take that code apart. `simple.txt` has two lines:

- The first three lines read the file into a list, as we’ve seen. Note that each line still has its ending newline:

In [9]:
f = open('simple.txt', 'r')
lines = f.readlines()
f.close()
lines

['Before we input the file, we might want to inspect the file. \n',
 'Of course we can go back to the initial iPython screen to look at the file. \n',
 'With iPython notebook we can inspect a file without leaving our working space. \n',
 'We may use command line after the ! notation']

We can apply a function to each line using map.  Here we’re upper-casing each line:

In [10]:
map(lambda s: s.upper(), lines)

['BEFORE WE INPUT THE FILE, WE MIGHT WANT TO INSPECT THE FILE. \n',
 'OF COURSE WE CAN GO BACK TO THE INITIAL IPYTHON SCREEN TO LOOK AT THE FILE. \n',
 'WITH IPYTHON NOTEBOOK WE CAN INSPECT A FILE WITHOUT LEAVING OUR WORKING SPACE. \n',
 'WE MAY USE COMMAND LINE AFTER THE ! NOTATION']

We might want to assign the new list to a variable, or maybe back to lines:

In [12]:
lines = map(lambda s: s.upper(), lines)
lines

['BEFORE WE INPUT THE FILE, WE MIGHT WANT TO INSPECT THE FILE. \n',
 'OF COURSE WE CAN GO BACK TO THE INITIAL IPYTHON SCREEN TO LOOK AT THE FILE. \n',
 'WITH IPYTHON NOTEBOOK WE CAN INSPECT A FILE WITHOUT LEAVING OUR WORKING SPACE. \n',
 'WE MAY USE COMMAND LINE AFTER THE ! NOTATION']

because lists are more convenient if we want to do more processing.

In this case, we just want to get the new text in the form of a string, so we use join:

In [68]:
text = ''.join(lines)
text = text.replace('.','. ')
text

'BEFORE WE INPUT THE FILE, WE MIGHT WANT TO INSPECT THE FILE.  \nOF COURSE WE CAN GO BACK TO THE INITIAL IPYTHON SCREEN TO LOOK AT THE FILE.  \nWITH IPYTHON NOTEBOOK WE CAN INSPECT A FILE WITHOUT LEAVING OUR WORKING SPACE.  \nWE MAY USE COMMAND LINE AFTER THE ! NOTATION'

**Exercise 2** File input

- The `‘\n’` symbol on the previous slide is quite annoying. Try to get rid of it using the `strip()` function.
- Write a function `e_to_a` to read the contents of a file, and get a list of every line, with the letter `‘e’` changed to `‘a’` in every line.
```
e_to_a('simple.txt') ---> ["I'm lina 1,", "and I'm lina 2."]
```
Hint: Start with the usual code to read the lines of the file, then map replace over the lines and return the result.

In [69]:
# 1



# 2
def e_to_a(filename):
    f = open(filename, 'r')    # 'r' for read
    lines = f.readlines()
    f.close()
    return map(lambda x: x.replace('a','e').strip(), lines)

e_to_a('simple.txt')

['Before we input the file, we might went to inspect the file.',
 'Of course we cen go beck to the initiel iPython screen to look et the file.',
 'With iPython notebook we cen inspect e file without leeving our working spece.',
 'We mey use commend line efter the ! notetion']

In [20]:
# write a function that reads in atext file and retunrs the word count 
# note: it should not count .'s

def wc(filename):
    f = open(filename, 'r')
    lines = f.readlines()
    f.close()
    lines = map(lambda x: x.strip(), lines)
    lines = map(lambda x: x.replace('.',''), lines)
    lines = map(lambda x: x.replace(',',''), lines)
    lines = map(lambda x: x.replace('!',''), lines)
    text = ''.join(lines)
    return len(text.split())

wc('simple.txt')



46

<p><a name="output"></a></p>
## File Output

Writing output to a file is easy.

- Open file for output:  `f = open(filename, 'w')`. 
**Caution**: Once this line of code is executed, the file specified by the filename would be **ERASED!!**
- Write a string, `s`,  to the file:  `f.write(s)`
- Close the file:  `f.close()`

In [28]:
!more simple.txt

Before we input the file, we might want to inspect the file. 
Of course we can go back to the initial iPython screen to look at the file. 
With iPython notebook we can inspect a file without leaving our working space. 
We may use command line after the ! notation


In [23]:
f = open('simple2.txt', 'w')
f.write('This overwrites the file!')
f.close()

In [86]:
!more simple2.txt

This overwrites the file!


If you want to append a string to the end of the file, we may oopen the file for appending:

In [26]:
f = open('simple2.txt', 'a') # 'a' for appending
f.write('\nThis should be the second line.')
f.close()

In [27]:
!more simple2.txt

This overwrites the file!
This should be the second line.


You can open a file for both reading and writing at the same time:

In [None]:
f = open('simple.txt', 'r+')
lines = f.readlines()
lines

We may then write a new line into it:

In [None]:
f.write('\nThis should be the third line.')
f.close()

In [None]:
!more simple.txt

<p><a name="fileSearch"></a></p>
## Searching in Files

You may be familiar with the Unix command grep, which is used to search for strings within files.  For example:

In [None]:
!grep people oldmanandthesea.txt    # doesn't work on PC. grep is a Unix command and only works on mac OX 
# or other machines with unix based OS

*The txt file is from A Project Gutenberg Canada Ebook.*

We can do a similar thing in Python, using `filter` and `find`.

In [29]:
def grep(word, filename):
    f = open(filename, 'r')
    lines = f.readlines()
    f.close()
    
    output = filter(lambda line: line.find(word) != -1, lines)
    return "".join(output)

print grep('people', 'oldmanandthesea.txt')

quite sure no local people would steal from him, the old man thought
He always thought of the sea as _la mar_ which is what people call her
that were as long as the skiff and weighed a ton.  Most people are
him beyond all people.  Beyond all people in the world.  Now we are
table.  There was much betting and people went in and out of the room
many people will he feed, he thought.  But are they worthy to eat him?
did it to keep me alive and feed many people.  But then everything is a
are people who are paid to do it.  Let them think about it.  You were



** Exercise 3** Searching in Files

- Define `grep2(word1, word2, filename)`.  It returns the lines that contain both word1 and word2.

In [30]:
def grep2(word1, word2, filename):
    f = open(filename, 'r')
    lines = f.readlines()
    f.close()
    
    # has_word is true if line contains word
    has_both_words = lambda line: line.find(word1) != -1 and line.find(word2) != -1
    output = filter(has_both_words, lines)
    return "".join(output)

print grep2('old man', 'sea', 'oldmanandthesea.txt')


The sun rose thinly from the sea and the old man could see the other
in the sea and the old man loved to see the big sea turtles eating
the old man rode gently with the small sea and the hurt of the cord
there wallowing now in the seas and the old man pulled the skiff up



<p><a name="DS"></a></p>
# Data Structure

Lists are the most widely used data structure in Python. But they are not the only one. Other built-in data structures are sets and dictionaries:
- Sets - unordered collections without duplicates.
- Dictionaries - maps from one value (often strings) to another.

An important feature of Python data structures is that some are mutable and some are immutable; mutability is a key concept that we will discuss in this section.

For example, slicing is non-mutating:

In [83]:
L = ['a', 'b', 'c']
L[1:]

['b', 'c']

We can see the slicing is non-mutating because even `L[1:]` returns sub-list, the original list **`L`** itself remains unchanged:

In [84]:
L

['a', 'b', 'c']

`map` and `upper` are also non-mutating, since it returns a value:

In [85]:
map(lambda s: s.upper(), L)

['A', 'B', 'C']

But they do not change `L`.

In [86]:
L

['a', 'b', 'c']

**Exercise 4** Non-mutating operations

Try this using list and string operations you have learned:
 - Assign a list or string to a variable (say, L or s).
 - Perform operations on the variable.
 - Note that the variable changes only if you re-assign to it.
 - Now assign the variable to another variable:
```
M = L
t = s
```
 - Now perform operations on `L` and `s` and assign the result to `L` or `s`, e.g. “`L = L[2:]`” or “`s = s.upper()`”.  Do `M` or `t` change?

In [87]:
a=5
b=a   # this is assignment by current value 
a=6
print b  # b stays 5

5


In [88]:
A='five'
B=A
A='six'
print B

five


In [89]:
alist = [1,2,3]
blist = alist
alist = [4,5,6]
print blist

[1, 2, 3]


In [90]:
def add_5(lst):
    return lst.append(5)

my_list = [4,5,4]
add_5(my_list)
my_list    # I had a list and then I applied a function on that list and that function MUTATED my list !!!

[4, 5, 4, 5]

In [None]:
# what if I don't want my list to be mutated after I apply a function on it?
# then you need to created a copy of your list before applying a function on it


<p><a name="mutate"></a></p>
## Mutating Operations on Lists

- Lists are a mutable data type. The most important mutating operation is: **assignment**

In [89]:
skills = ['Python','SAS','Hadoop']
skills[1] = 'R'

We see that no value are returned, but the value of `skills` is changed!

In [90]:
skills

['Python', 'R', 'Hadoop']

In [93]:
skills = ['Python','SAS','Hadoop']
my_skills = skills
skills[1] = 'R'     # no assignment to my_skills
my_skills           # THIS IS SO WEIRD AND I DID NOT EXPECT my_skills TO CHANGE

['Python', 'R', 'Hadoop']

In [33]:
skills = ['Python','SAS','Hadoop']
my_skills = list(skills)  # this will create a new object
skills[1] = 'R'     # no assignment to my_skills
my_skills   # my_skills didn't change this time

['Python', 'SAS', 'Hadoop']

We have always added to a list by using `+`, which is non-mutating:

In [94]:
L = ['a', 'b', 'c']
L + ['d']

['a', 'b', 'c', 'd']

But `L` is not updated:

In [95]:
L

['a', 'b', 'c']

Assigning the value back to `L` updates it:

In [96]:
L = L + ['d']   
L

['a', 'b', 'c', 'd']

The `append` operation mutates a list:

In [97]:
L.append('e')
L

['a', 'b', 'c', 'd', 'e']

We have already seen `sorted(lis)`, which is a non-mutating sort operation:

In [98]:
lis = [4, 2, 6, 1]
sorted(lis)

[1, 2, 4, 6]

In [99]:
lis

[4, 2, 6, 1]

`lis.sort()` is a mutating sort operation:

In [100]:
lis.sort()    # as a rule of thump, mutating operations do not return anything (like mylist.sort()) 
lis          # but non-mutating opetaions return something (like sorted(mylist))

[1, 2, 4, 6]

It follows from what we’ve seen that a mutating operation applied to a list `L` can change the value of another variable if that variable is pointing to the same memory location as `L`:

In [101]:
lis = [4, 2, 6, 1]
lis2 = lis   # lis2 points to the same place in memory that is holding [4,2,6,1]
lis.sort()
lis2

# mutating

[1, 2, 4, 6]

In [102]:
# how can we do the above non-mutating?

lis = [4, 2, 6, 1]
lis2 = list(lis)  #lis2 is now pointing to a completely new copy of lis 
lis.sort()
lis2

# non-mutating

[4, 2, 6, 1]

Using `sorted(lis)` doesn't cause the change on `lis2`.

In [103]:
lis = [4, 2, 6, 1]
lis2 = lis
sorted(lis)

[1, 2, 4, 6]

In [104]:
lis2

[4, 2, 6, 1]

This is called a side effect of the mutating operation.  Programmers try to avoid side effects, because it is difficult to understand code when variables can change without even being mentioned.

Note that the mutating operations we have seen have no value, or rather, their value is `None`.  Try:

In [105]:
print lis.sort()
print lis.append(4)

None
None


It follows that we cannot use mutating operations in a map or filter, because those depend upon the value of the expression.  This is an attempt to extend every element of a nested list:

In [106]:
L = [[1], [2], [3]]
map(lambda l: l.append(4), L)

# append a mutating operation and does not print anything and when we print it, it returns nothing

[None, None, None]

In [107]:
L

[[1, 4], [2, 4], [3, 4]]

`sort` and `sorted` use the first element as the primary sort key, the second element as the second sort key, etc., and they sort in ascending order. You can customize the sort using two different arguments:
 - Sort on a user-defined key:

In [91]:
staff =[['Lucy','A',9], ['John','B',3], ['Peter','A',6]]
print sorted(staff)
print sorted(staff, key = lambda x: x[2])  # key is ID number

[['John', 'B', 3], ['Lucy', 'A', 9], ['Peter', 'A', 6]]

You can define functions that use mutating operations.  If the purpose of a function is to perform a mutating operation, it does not need a return value.

This function sorts a nested list, using the given element of each sublist as the sort key:

In [34]:
def sort_on_field(lis, fld):
    lis.sort(key = lambda x: x[fld])

L = [['a', 4], ['b', 1], ['c', 7], ['d', 3]]
sort_on_field(L, 1)

It has no return, and does not produce a value. But it mutates the variable.

In [35]:
L

[['b', 1], ['d', 3], ['a', 4], ['c', 7]]

**Exercise 5**

Write a function to switch the ith and jth items in a list.
```
def switch_item(L, i, j):
    ... function body goes here ...

my_list = ['first', 'second', 'third', 'fourth']
switch_item(my_list, 1, -1)
my_list ---> ['first', 'fourth', 'third', 'second']
```

In [114]:
#### Your code here

# mutating version:

def switch(L,i,j):
    temp=L[i]
    L[i]= L[j]
    L[j]=temp
    
L = [1,2,3,4,5,6]
switch(L,0,5)

L   # notice that L mutated

[6, 2, 3, 4, 5, 1]

In [117]:
# non-mutating version

def switch(L,i,j):
    newL = list(L)
    newL[i]=L[j]
    newL[j]=L[i]
    return newL

L = [1,2,3,4,5,6]
print switch(L,0,5)

print L  # notice that L DID NOT mutate
  
    

[6, 2, 3, 4, 5, 1]
[1, 2, 3, 4, 5, 6]


<p><a name="TSD"></a></p>
## Tuples, sets and dictionaries

We can now explain the other data types of Python.
- **Tuples**:  Tuples are like lists, but are immutable.
- **Sets**:  Also like lists, except that they do not have duplicate elements.  Immutable.
- **Dictionaries**:  These are tables that associate values with keys (usually strings).  Mutable.
- **Strings**:  Like lists of characters.  Immutable.

** Strings**

Strings are immutable

In [118]:
company = 'NYC DataScience Academy'
company[0] = 'A'

# this is not going to work 

TypeError: 'str' object does not support item assignment

In [119]:
# we can mutate company if we want to:
company = 'NYC DataScience Academy'
company = 'A' + company[1:]
company

'AYC DataScience Academy'

** Tuples**

- Tuples are similar to lists, but they are immutable.
- Tuples are written with parentheses instead of square brackets.

In [93]:
courses = ('Programming', 'Stats', 'Math') 
courses[2] = 'Algorithms'

# it won't work ... tuple will not accept mutating operations

TypeError: 'tuple' object does not support item assignment

- Tuples support all the non-mutating list operations:

In [94]:
courses[1:]

('Stats', 'Math')

In [97]:
map(lambda s: s.upper(), courses)  # we can pass a tuple to map or filter but the output is still going to be a list

['PROGRAMMING', 'STATS', 'MATH']

In [98]:
tuple(map(lambda s: s.upper(), courses))  # if we want the output to be a tuple

('PROGRAMMING', 'STATS', 'MATH')

Tuples and lists both allow a shorthand for assignment that allows all the elements of the tuple or list to be assigned to variables at once:

In [44]:
(a,b) = (1,2)   # works with lists also
a

1

In [45]:
b

2

In [47]:
[c,d,e]=[1,2,3]
e

3

This provides a handy way to swap variables:

In [125]:
(a,b) = (b,a)
a

2

In [126]:
b

1

In [99]:
# let's create a function that returns 2 variables as a tuple:

def sum_num(L):
    return (sum(L),len(L))

print sum_num([1,4,3,6,12])
(summa, length) = sum_num([1,4,3,6,12])
print summa
print length

(26, 5)
26
5


<p><a name="multiList"></a></p>
** Multiple List Operations **

A multiple-list operation is one that combines two lists
 - Add the elements of two lists of the same length
 - Rearrange the elements of one list by using elements of another list as subscripts
 - Select elements of one list corresponding to the True elements in a list of boolean values

These operations require that we map simultaneously over two lists. There are two ways to do this:
 - Use `map`. We’ve used `map` to map over a single list with a unary function, but it can also be used to map over multiple lists.
 - Use `zip` to put two lists together, then use unary map.
 
`map(fun, lis1, lis2)` applies fun to pairs of elements from `lis1` and `lis2`
 - `lis1` and `lis2` must have the same length
 - `fun` is a binary operation whose arguments are of the correct type.
 
If `lis1 = [x0, x1, x2, ...]` and `lis2 = [y0, y1, y2, ...]`, then `map(fun, lis1, lis2)` is equal to `[fun(x0, y0), fun(x1, y1), fun(x2, y2), ...]`.

In [133]:
map(lambda x, y: x+y, [1,2,3], [10,20,30]) # [1+10, 2+20, 3+30]

[11, 22, 33]

In [103]:
# if we wanted to use a for loop (discussed in next session):

L1 = [1,2,3]
L2= [10,20,30]

def add_two(L1, L2):
    L = list()
    for i in range(len(L1)):
        L.append(L1[i]+L2[i])
    return L

add_two(L1, L2)


[11, 22, 33]

In [134]:
map(lambda x, y: x[y], [[1, 2], [2, 3]], [0, 1]) # [[1, 2][0], [2, 3][1]]

[1, 3]

In [135]:
map(lambda x,y: (x,y),['a','b','c'],[1,2,3])    
# this is called zipping and it's such a useful operation that there's a function available in Python for it

[('a', 1), ('b', 2), ('c', 3)]

`zip` is a function that takes two lists of the same length and makes one list containing pairs of corresponding elements of the two lists.

In [136]:
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
zip(a, b)

[(1, 5), (2, 6), (3, 7), (4, 8)]

The elements in the zipped list are tuples. These are just like lists, but are written using parentheses instead of square brackets. 
zip provides an alternative to binary map:

In [137]:
map(lambda p: p[0]+p[1], zip([1,2,3], [10,20,30]))

[11, 22, 33]

Both `map` and `zip` can actually apply to more than two lists, in the “obvious” way:

In [138]:
a = [1, 2, 3]
b = [5, 6, 7]
c = [9, 10, 11]
zip(a, b, c)

[(1, 5, 9), (2, 6, 10), (3, 7, 11)]

In [139]:
map(lambda t: t[0]+t[1]+t[2], zip(a,b,c))

[15, 18, 21]

In [140]:
map(lambda x, y, z: x+y+z, a, b, c)

[15, 18, 21]

**Set**

- A set is an unordered collection with no duplicate elements.  Sets are immutable.

- To create a set, you can use either curly braces or the `set()` function.

In [49]:
vowels = {'u','a','e','i','o','u','i'}
vowels

{'a', 'e', 'i', 'o', 'u'}

In [105]:
fruit = set(['apple', 'orange', 'apple', 'pear'])
fruit

{'apple', 'orange', 'pear'}

- Sets support non-mutating list operations, as long as they don’t depend on order:

In [115]:
primes = {2, 3, 5, 7}
primes[2]

# there is no order in sets

TypeError: 'set' object does not support indexing

In [144]:
sum(primes)

17

In [145]:
len(primes)

4

In [56]:
map(lambda x: x*x, primes)      #map expects a list and since primes in not a list, it turns it to a list first

[4, 9, 25, 49]

In [116]:
# adding an elemnt to a set:

primes.add(11)
primes

{2, 3, 5, 7, 11}

In [117]:
# we can also use the | operator:

primes = primes | {13}
primes

{2, 3, 5, 7, 11, 13}

- Sets have mathematical operations like union (`|`), intersection (`&`), difference (`-`), and symmetric difference (`^`).

In [147]:
set_1 = {'a', 'b', 'c'}
set_2 = {'b', 'c', 'd'}

set_1 | set_2       # union

{'a', 'b', 'c', 'd'}

In [148]:
set_1 & set_2       # intersection

{'b', 'c'}

In [149]:
set_1 - set_2       # difference

{'a'}

In [150]:
set_1 ^ set_2       # symmetric difference (a-b | b-a)

{'a', 'd'}

In [151]:
set_1.union(set_2)

{'a', 'b', 'c', 'd'}

In [152]:
set_1.intersection(set_2)

{'b', 'c'}

In [153]:
set_1.difference(set_2)

{'a'}

In [154]:
set.union(set_1,set_2)    

{'a', 'b', 'c', 'd'}

In [155]:
set.intersection(set_1,set_2)

{'b', 'c'}

**Dictionaries**

- A dictionary is a set of keys with associated values. Each key can have just one value associated with it.  Dictionaries are mutable.
 - Any immutable object can be a key, including numbers, strings, and tuples of numbers or strings.  Strings are most common.
 - Any object can be a value.

- Dictionaries are written in set braces (like sets), with the key/value pairs separated by colons:


In [64]:
employee = {'sex': 'male', 'height': 6.1, 'age': 30}
# keys and values
# values don't need to be of the same type

The most important operation on dictionaries is key lookup:

In [160]:
employee['age']

30

We can add new key: value pairs to the dictionary:

In [65]:
employee['city'] = 'New York'
employee

{'age': 30, 'city': 'New York', 'height': 6.1, 'sex': 'male'}

It is illegal to access a key that is not present:

In [66]:
employee['weight']

KeyError: 'weight'

but you can check if a key is present using the in operator:

In [163]:
'weight' in employee

False

In [166]:
'sex' in employee

True

For convenience, you can construct a dictionary from a list (or set) of tuples:

In [67]:
dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])

{'guido': 4127, 'jack': 4098, 'sape': 4139}

You can also get a list of the keys, the values, or all key/value pairs:

In [168]:
employee = {'sex': 'male', 'height': 6.1, 'age': 30}
employee.keys()

['age', 'height', 'sex']

In [169]:
employee.values()

[30, 6.1, 'male']

In [170]:
employee.items()

[('age', 30), ('height', 6.1), ('sex', 'male')]

You could represent a table as a list of pairs, use append to add items, and use filter to look them up:

In [68]:
employee = [('sex', 'male'), ('height', 6.1), ('age', 30)]
filter(lambda x: x[0]=='sex', employee)[0][1]

'male'

**Exercise 6**

- Given the following dictionary:
```
inventory = {'pumpkin': 20, 'fruit': ['apple', 'pear'], 'vegetable': ['potato','onion','lettuce']}
```
- Modify inventory as follows:
 - Add a meat inventory item containing 'beef', 'chicken', and 'pork'.
 - Sort the vegetables (Recall the sorted function.)
 - Add five more pumpkins.
After these changes, inventory is:
```
{'vegetable': ['lettuce', 'onion', 'potato'], 'fruit': ['apple', 'pear'],
 'meat': ['beef', 'chicken', 'pork'], 'pumpkin': 25}
```

In [None]:
inventory = {'pumpkin' : 20, 'fruit' : ['apple', 'pear'],
           'vegetable' : ['potato','onion','lettuce']}

#### Your code here

inventory['meat'] = ['beef', 'chicken', 'pork']
inventory['vegetable'].sort()  # sorts and mutates
inventory['pumpkin'] = inventory['pumpkin'] + 5
# alternate way: inventory['pumpkin'] += 5

print inventory

<p><a name="quiz"></a></p>
# Quiz

**Ex1** Write a function called **ex_1** to find the lines in a file that **contain** a word1 but **do not contain** word2.


In [192]:
# Your code here

def ex_1(filename, word1, word2):
    f = open(filename, 'r')
    lines = f.readlines()
    f.close()
    output = filter(lambda line: line.find(word1) != -1 and line.find(word2) == -1, lines)
    return ''.join(output)

print ex_1('oldmanandthesea.txt','fisherman','sea')

a fisherman in May."
"They say his father was a fisherman.  Maybe he was as poor as we are
"And the best fisherman is you."
Perhaps I should not have been a fisherman, he thought.  But that was
he is young and strong.  Also his father was a fisherman.  But would
born to be a fisherman as the fish was born to be a fish.  San Pedro
was a fisherman as was the father of the great DiMaggio.
pride and because you are a fisherman.  You loved him when he was alive
"He was eighteen feet from nose to tail," the fisherman who was



**Ex2** Write a function called **ex_2** that adds sequence numbers to each item in a list
```
ex_2(['a', 'b', 7]) ---> [(1, 'a'), (2, 'b'), (3, 7)]
```
Create the numeric sequence using range, and then use zip.


In [178]:
# Your code here

def ex_2(L):
    seq = range(1,len(L)+1)
    return zip(seq,L)

#test1:
mylist = ['a', 'b', 7]
print ex_2(mylist)

#test2:
mylist = [3,4,8]
print ex_2(mylist)


[(1, 'a'), (2, 'b'), (3, 7)]
[(1, 3), (2, 4), (3, 8)]


**Ex3**. Given the following dictionary:

`
roster = {'group1' :['Jack', 'Lisa'],
          'group2' :['Mike', 'Mary'],
     	 'group3' :['John', 'Laura']} 
`
- Get the keys and values of the dictionary.
- Add a group4, item containing ‘David', ‘Susan’.
- Convert all the names belong to group 2 to upper cases.


In [191]:
roster = {'group1' :['Jack', 'Lisa'],
          'group2' :['Mike', 'Mary'],
          'group3' :['John', 'Laura']} 

# Your code here

print roster.keys()
print roster.values()

roster['group4']=['David','Susan']
print roster

map(lambda x: x.upper(), roster['group2'])

['group1', 'group3', 'group2']
[['Jack', 'Lisa'], ['John', 'Laura'], ['Mike', 'Mary']]
{'group4': ['David', 'Susan'], 'group1': ['Jack', 'Lisa'], 'group3': ['John', 'Laura'], 'group2': ['Mike', 'Mary']}


['MIKE', 'MARY']

In [7]:
# write a function that reads a text file and outputs the top 20 words in that text (by count):

def top_words(filename):
    f = open(filename, 'r')
    lines = f.readlines()
    f.close()
    lines = map(lambda x: x.strip(), lines)
    lines = map(lambda x: x.replace('.',''), lines)
    lines = map(lambda x: x.replace(',',''), lines)
    lines = map(lambda x: x.replace('!',''), lines)
    text = ''.join(lines)
    ls_words = text.split()
    dct={}
    for item in ls_words:
        if item not in dct:
            dct[item]=1
        else:
            dct[item]=dct[item]+1
    output_list = list(sorted(dct.items(), key = lambda x: x[1], reverse = True))[:20]
    return output_list


top_words('oldmanandthesea.txt')
    


[('the', 1876),
 ('and', 1110),
 ('he', 788),
 ('of', 480),
 ('to', 403),
 ('was', 371),
 ('his', 370),
 ('a', 340),
 ('I', 336),
 ('it', 331),
 ('in', 317),
 ('that', 238),
 ('old', 234),
 ('man', 205),
 ('fish', 204),
 ('with', 184),
 ('him', 183),
 ('not', 180),
 ('had', 179),
 ('on', 179)]

In [8]:
mystring = 'I have a dog. My dog is very friendly. I take my dog out for a walk every day. My dog pees everywhere.'

def count_words(word, string):
    cnt = 0
    idx = 0
    while string.find(word, idx) >= 0:
        cnt = cnt+1
        idx = string.find(word, idx) + 1
    return cnt

count_words('dog',mystring)
        

4

In [85]:
# another method:

mystring = 'I have a dog. My dog is very friendly. I take my dog out for a walk every day. My dog pees everywhere.'

def count_words(word, string):
    string = string.split()
    string = map(lambda s: s.replace(',',''), string)
    string = map(lambda s: s.replace('.',''), string)
    sub_string = filter(lambda s: s=='dog', string)
    return len(sub_string)

count_words('dog', mystring)
    

4