# Lecture 12 2018-09-27: tuples, sets; formatting strings redux; comprehension

This worksheet accompanies the lecture notes.


# Tuples and sets

## Tuples

Tuples are (iterable) sequences of values. Unlike lists (but like strings) they are immutable. 

Syntax 

>(value, value, value...)

You have seen tuples before! In

>zip( nucs+amb, char_counts )

the part in parentheses,

>( nucs+amb, char_counts )

looks just like a tuple whose first element is a list of strings and whose second element is a list of ints.

Indexing and slicing work the same way for tuples as for lists.

There are only two tuple methods (compare that to the many string methods!):

>tuple.count(value)
>tuple.index(value)
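A minimal sketch of indexing, slicing, and the two tuple methods. The `bases` tuple is made up purely for illustration:

```python
# a hypothetical tuple of bases, used only for illustration
bases = ('A', 'C', 'G', 'A', 'T', 'A')

print(bases[0])          # indexing works like lists; prints A
print(bases[1:3])        # slicing returns a new tuple; prints ('C', 'G')
print(bases.count('A'))  # how many times 'A' appears; prints 3
print(bases.index('G'))  # position of the first 'G'; prints 2
```

Note that *tuple.index()* raises a `ValueError` if the value is not in the tuple.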

Examples:

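```
sequence = "CGGATCGNNAAGCTCTGTTGTTGGTGANNNYYGGATAYAGGUUNYGTAACTGGCCT"
nucs = ['A', 'C', 'G', 'T']
amb = ['N', 'Y', 'U']
char_counts = [sequence.count(x) for x in nucs + amb]

nuc_tuple = (nucs + amb, char_counts)
nuc_tuple[:-1]          # slicing a tuple gives a new tuple

tuple(sequence)         # a tuple of the individual characters

sequence.count('A')     # count the A's with a string method...

counts = 0              # ...or count them by hand
for next_char in sequence:
    if next_char == 'A':
        counts = counts + 1

x, y = 0, 1             # tuple packing and unpacking
print(x, y)
x, y = y, x             # swap two values in one line
print(x, y)

sequence[0] = 'X'       # TypeError: strings (like tuples) are immutable
```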
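Immutability means a tuple's slots cannot be reassigned; it does not freeze the objects stored in those slots. A minimal sketch (the counts are invented for illustration):

```python
nucs = ['A', 'C', 'G', 'T']
counts = [14, 10, 17, 8]   # made-up counts, for illustration only

nuc_tuple = (nucs, counts)

# replacing an element of a tuple is not allowed
try:
    nuc_tuple[0] = ['X']
except TypeError as err:
    print('TypeError:', err)   # 'tuple' object does not support item assignment

# ...but a list *inside* a tuple is still a list, and can be changed in place
nuc_tuple[0].append('N')
print(nuc_tuple[0])   # prints ['A', 'C', 'G', 'T', 'N']
print(nucs)           # the very same list object, so it also ends in 'N'
```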

## Sets

Sets are iterable, unordered collections of *unique* elements (like mathematical sets).

Useful methods (note that they use the *object.method(other_object)* format we saw with *str.join(list)*):

>set.add(item)
>set.remove(item)

>set.difference(set)
>set.intersection(set)
>set.symmetric_difference(set)
>set.union(set)
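A small sketch of the four combining methods on two made-up sets of bases. Sets have no fixed order, so the results are wrapped in *sorted()* to get predictable printouts:

```python
# small made-up sets of bases, for illustration only
seen_in_a = {'A', 'C', 'G'}
seen_in_b = {'C', 'G', 'T'}

print(sorted(seen_in_a.union(seen_in_b)))                 # ['A', 'C', 'G', 'T']
print(sorted(seen_in_a.intersection(seen_in_b)))          # ['C', 'G']
print(sorted(seen_in_a.difference(seen_in_b)))            # ['A'], in a but not in b
print(sorted(seen_in_a.symmetric_difference(seen_in_b)))  # ['A', 'T'], in exactly one
```

Sets also support the operators `|`, `&`, `-` and `^` for these same four operations.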

```
mammals = {'horse', 'platypus', 'cat', 'human'}
quadrupeds = {'horse', 'platypus', 'dog'}
placental_mammals = {'horse', 'cat', 'human', 'fish'}  # remember to check your data for accuracy!

placental_mammals.remove('fish')
placental_mammals.add('rat')

hairy_quadrupeds = quadrupeds.intersection(placental_mammals)
hairy_quadrupeds_placentals = quadrupeds.intersection(placental_mammals.intersection(mammals))

'booze_hound' in mammals    # membership test: False
'dog' in quadrupeds         # True

all_animals = mammals.union(quadrupeds.union(placental_mammals))
len(all_animals)            # 6
```

Common use: get rid of duplicates
```
mammals = [ 'horse', 'platypus', 'cat', 'human', 'horse', 'horse']
print(mammals)
mammals = list(set(mammals))
print(mammals)
```
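Because sets are unordered, `list(set(...))` may give the animals back in a different order than they started in. If the original order matters, one common alternative is *dict.fromkeys*, which relies on dicts remembering insertion order (guaranteed since Python 3.7). A minimal sketch:

```python
mammals = ['horse', 'platypus', 'cat', 'human', 'horse', 'horse']

# dict keys are unique and (since Python 3.7) keep insertion order,
# so this keeps the first occurrence of each animal, in its original position
unique_in_order = list(dict.fromkeys(mammals))
print(unique_in_order)   # ['horse', 'platypus', 'cat', 'human']
```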

Sets are iterable, so they can be used in loops:

```
for next_animal in set(mammals):
    print('my next favorite animal', next_animal)
```


# String formatting, fancy style

So far, we have seen how to use *print* with multiple strings, and how to include escape characters (like tab, '\t', and new line, '\n') in strings to get some primitive formatting. We have also seen how to use the *str.join(list)* method to form tab-delimited strings from a list of strings.

An alternative is the *str.format()* method, a very powerful way to format strings. The string whose *format()* method you call acts as a template: it contains the text you want to print, plus placeholders marking where values should go and how they should look. The arguments in the parentheses of *format()* are the values that fill in those placeholders.

The placeholders are written with curly braces, "{}". What is inside the braces tells Python how you want the value formatted; *format()* replaces the placeholders with its arguments, in order.

For example, this produces what you think it should:

```
gc_content = 0.667
dna_string = 'GGCCTA'
print('gc content is {} for {}'.format(gc_content, dna_string))
# prints: gc content is 0.667 for GGCCTA
```


## More complicated formatting

Writing '{n}' for an integer n uses the argument at position n of *format()*, counting from 0, so '{0}' is the first argument. Placeholders can also be reused.

```
name = 'James'
greeting = 'hello'
print('dear {1}. I bid you {0}, {1}'.format(greeting, name))
# prints: dear James. I bid you hello, James
```


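More fine-grained formatting is possible. Here are some common uses (for the full details, see the very, very geeky https://docs.python.org/3/library/string.html#formatstrings).

In general, inside the braces a number before the ":" says which argument to use, a number after the ":" is a minimum width, a number after a "." is a precision (how many digits or characters to use), and a letter at the end picks the presentation type (for example `f`, `e` or `%`).

* to format the ith argument as an int, right-aligned in a column of width w, use '{i:w}'
* to add comma separators, use '{i:w,}'
* to format as a percentage, use '{i:w%}' (the value is multiplied by 100 and shown with 6 digits after the decimal point unless you give a precision)
* to format a float with a digits after the decimal point, use '{i:.af}'; without the `f`, '{i:.a}' gives a *significant* digits instead
* to format a number in exponential notation with d digits after the decimal point, use '{i:.de}'; '{i:.d}' (no `e`) uses d significant digits and switches to exponential notation on its own for very small or very large numbers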
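A short sketch of the explicit presentation types, using values similar to the ones in this section (the variable names are just for illustration):

```python
pi_ish = 3.14156
tiny = 0.0000000001234567
count = 12345
gc_fraction = 0.667

print('{0:.2f}'.format(pi_ish))       # fixed point, 2 digits after '.': 3.14
print('{0:.3e}'.format(tiny))         # exponential, 3 digits after '.': 1.235e-10
print('{0:10,}|'.format(count))       # width 10 with commas: '    12,345|'
print('{0:.1%}'.format(gc_fraction))  # percentage, 1 digit after '.': 66.7%
```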

Strings make sense, too: for a string, the precision is the maximum number of characters to keep.

```
print('as percent {0:10%}\twith commas {0:5,}\tfloat {1:.4}\texponential {2:.5}'.format(
    12345, 3.14156, 0.0000000001234567))
# as percent 1234500.000000%	with commas 12,345	float 3.142	exponential 1.2346e-10

print('--{0:.3}\t{0:.2}\t{0}--'.format('jabberwocky'))
# --jab	ja	jabberwocky--
```
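For strings, the width can also be combined with an alignment character: `<` left, `>` right, `^` center. Strings are left-aligned by default and numbers right-aligned. A small sketch, with `|` marks added only to make the padding visible:

```python
word = 'jab'
print('|{0:<8}|'.format(word))   # |jab     |
print('|{0:>8}|'.format(word))   # |     jab|
print('|{0:^8}|'.format(word))   # |  jab   |
print('|{0:8}|'.format(word))    # default for strings is left: |jab     |
```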


# Comprehension

Comprehension creates one collection from another. This is a very powerful feature of Python.

*List comprehension* creates a list, *dictionary comprehension* creates a dict, *set comprehension* creates a set, etc. Comprehension is essentially shorthand for the usual way of creating a structure: start with an empty structure, then add elements one at a time in a *for* loop. For example, a list comprehension equivalent to

```
numbers = list(range(20))

evens = []
for x in numbers:
    if x % 2 == 0:
        evens.append(x)
print(evens)
```

is

```
evens = [x for x in numbers if x%2 == 0]

print(evens)
```

Here are more examples

```
evens = [x for x in numbers if x % 2 == 0]
pairs = [(x, y) for x in range(4) for y in range(2)]
distinct_pairs = [(x, y) for x in range(4) for y in range(2) if x != y]
```

## List comprehension

Lists are a sequence of values between '[' and ']', so comprehension looks like

>[how to build the list from other structures]

The "other structures" can be lists, sets, tuples, strings, ranges... any iterable.

Running the loop and the comprehension side by side gives the same list:

```
numbers = list(range(20))
print(numbers)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

# with a for loop
evens = []
for x in numbers:
    if x % 2 == 0:
        evens.append(x)
print(evens)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# with a list comprehension
evens = [x for x in numbers if x % 2 == 0]
print(evens)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```
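Since any iterable works as the source, here is a small sketch drawing from a string, a tuple, and a set (the values are made up for illustration; the set is sorted first because sets have no fixed order):

```python
dna = 'GATTACA'

# over a string: one element per character
not_t = [base for base in dna if base != 'T']
print(not_t)     # ['G', 'A', 'A', 'C', 'A']

# over a tuple
doubled = [n * 2 for n in (1, 2, 3)]
print(doubled)   # [2, 4, 6]

# over a set, sorted first so the result has a predictable order
lengths = [len(word) for word in sorted({'cat', 'horse', 'rat'})]
print(lengths)   # [3, 5, 3]
```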

Multiple *for*s are possible, just as one can nest *for* loops. For example, to make a list of pairs of ints, instead of

```
pairs = []
for x in range(4):
    for y in range(2):
        pairs.append((x, y))

print('pairs: {}'.format(pairs))
```

use

```
pairs = [(x,y) for x in range(4) for y in range(2)]
print('pairs: {}'.format(pairs))
```

and for pairs of distinct ints (x != y), instead of
```
distinct_pairs = []
for x in range(4):
    for y in range(2):
        if x != y:
            distinct_pairs.append((x, y))

print('distinct_pairs: {}'.format(distinct_pairs))
```

use

```
distinct_pairs = [(x,y) for x in range(4) for y in range(2) if x != y]

print('distinct_pairs: {}'.format(distinct_pairs))
```

Running the loop version of `pairs` and the comprehension version of `distinct_pairs`:

```
pairs = []
for x in range(4):
    for y in range(2):
        pairs.append((x, y))
print('pairs: {}'.format(pairs))
# pairs: [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]

distinct_pairs = [(x, y) for x in range(4) for y in range(2) if x != y]
print('distinct_pairs: {}'.format(distinct_pairs))
# distinct_pairs: [(0, 1), (1, 0), (2, 0), (2, 1), (3, 0), (3, 1)]
```


A long comprehension like `distinct_pairs` is easier to read when split across lines:

```
distinct_pairs = [(x,y) 
    for x in range(4) 
    for y in range(2) 
    if x != y]
```
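The *for* clauses are read left to right, exactly like nested loops written top to bottom, so swapping them changes the order of the result (though not which pairs it contains). A small sketch:

```python
x_outer = [(x, y) for x in range(4) for y in range(2)]
y_outer = [(x, y) for y in range(2) for x in range(4)]

print(x_outer)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]
print(y_outer)  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
print(sorted(x_outer) == sorted(y_outer))  # True: same pairs, different order
```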

Another example, keeping only the sequences that contain exactly two A's:

```
sequence_list = ['AAA', 'AAG', 'ATC', 'TAG', 'AAG', 'AAG']

two_As = [x for x in sequence_list if x.count('A') == 2]
```

The expression at the front of a comprehension can perform operations or call methods on each element:

```
squared_evens = [x * x for x in evens]
print(squared_evens)
# [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

sequence_list = ['AAA', 'AAG', 'ATC', 'TAG', 'AAG', 'AAG']
two_As = [x for x in sequence_list if x.count('A') == 2]
print(two_As)
# ['AAG', 'AAG', 'AAG']
```
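A couple more sketches of calling string methods inside a comprehension, on the same `sequence_list`:

```python
sequence_list = ['AAA', 'AAG', 'ATC', 'TAG', 'AAG', 'AAG']

# call a string method on every element
lower_case = [s.lower() for s in sequence_list]
print(lower_case)  # ['aaa', 'aag', 'atc', 'tag', 'aag', 'aag']

# DNA -> RNA: replace every T with U in each sequence
rna = [s.replace('T', 'U') for s in sequence_list]
print(rna)         # ['AAA', 'AAG', 'AUC', 'UAG', 'AAG', 'AAG']
```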


### A very important example

As you know, *str.join(list)* only works if *list* is a list **of strings**. 
If it isn't, list comprehension is an easy way to make it one. 
We have seen this in several lectures and homeworks, but there I just called it "magic".

Consider
```
int_list = [1,2,3,4,5]
list_list = [[1,2], [3,4], [5], [6,'foo']]
```

These will not work; each raises a `TypeError` (for `int_list` the message is `sequence item 0: expected str instance, int found`):

```
print('\t'.join(int_list))
print('\t'.join(list_list))
```

But this will:

```
print('\t'.join([str(value) for value in int_list]))
# prints: 1	2	3	4	5
print('\t'.join([str(s) for s in list_list]))
# prints: [1, 2]	[3, 4]	[5]	[6, 'foo']
```
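The same pattern works when you want control over how each value looks: format each value with *str.format()* inside the comprehension, then join. A minimal sketch with made-up GC fractions:

```python
gc_fractions = [0.5, 0.6667, 1.0]   # invented values, for illustration only

# format each float to two decimal places, then join with tabs
line = '\t'.join(['{:.2f}'.format(v) for v in gc_fractions])
print(line)   # 0.50	0.67	1.00
```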


## Set comprehension

Similarly, set comprehension looks like this

>{how to build the set from other structures}

```
two_As_unique = {x for x in sequence_list if x.count('A') == 2}
print(two_As_unique)
# {'AAG'}
```
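A small sketch contrasting a set comprehension with the matching list comprehension; the set collapses duplicates automatically (it is sorted before printing because sets have no fixed order):

```python
sequence_list = ['AAA', 'AAG', 'ATC', 'TAG', 'AAG', 'AAG']

# the distinct first bases
first_bases = {s[0] for s in sequence_list}
print(sorted(first_bases))            # ['A', 'T']

# the list version keeps every duplicate
print([s[0] for s in sequence_list])  # ['A', 'A', 'A', 'T', 'A', 'A']
```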

## Dictionary comprehension

For dicts, the syntax is

>{ how to build a dict with keys k and values v, from other structures}

As with lists, the "other structures" can be any iterable: lists, sets, tuples, strings...

For example, to convert

```
list_list = [ (1,2), ('foo','bar'), (3, [1,2,3])]
```

to a dict with the first of each pair as a key and the second as a value, use

```
dict_version = {x: y for x, y in list_list}
print(dict_version)
# {1: 2, 'foo': 'bar', 3: [1, 2, 3]}   (Python 3.7+ keeps keys in insertion order)

print(dict_version['foo'])
# bar
```

A silly example that calls a function (*len*) on each element:

```
words = ['when', 'in', 'the', 'course', 'of']
word_lengths = {w: len(w) for w in words}

print(word_lengths)
# {'when': 4, 'in': 2, 'the': 3, 'course': 6, 'of': 2}
print('length of "course" is ', word_lengths['course'])
# length of "course" is  6
```


Or, for a roundabout but showy way to build `dict_version` that reminds you about *zip*:

```
keys = [x for x, y in list_list]
values = [y for x, y in list_list]

dict_version = dict(zip(keys, values))
print('keys: {}\nvalues: {}\ndict_version: {}'.format(keys, values, dict_version))
# keys: [1, 'foo', 3]
# values: [2, 'bar', [1, 2, 3]]
# dict_version: {1: 2, 'foo': 'bar', 3: [1, 2, 3]}
```
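*zip* also pairs naturally with a dict comprehension. As a sketch, here are the character counts from the tuples section turned into a dict two ways, once by zipping the two lists and once directly:

```python
sequence = "CGGATCGNNAAGCTCTGTTGTTGGTGANNNYYGGATAYAGGUUNYGTAACTGGCCT"
nucs = ['A', 'C', 'G', 'T']
amb = ['N', 'Y', 'U']
char_counts = [sequence.count(x) for x in nucs + amb]

# zip pairs each character with its count; the comprehension turns pairs into key: value
count_dict = {char: n for char, n in zip(nucs + amb, char_counts)}

# the same dict built directly, without the intermediate list
direct = {char: sequence.count(char) for char in nucs + amb}
print(count_dict == direct)   # True
```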
