# 012 Groups

## Introduction

### Purpose

In this section we will learn how to use groups of objects in Python.


### Prerequisites

You will need some understanding of the following:


* [001 Using Notebooks](001_Notebook_use.ipynb)
* [005 Getting help](005_Help.ipynb)
* [010 Variables, comments and print()](010_Python_Introduction.ipynb)
* [011 Data types](011_Python_data_types.ipynb). In particular, you should understand strings, integers and floating point numbers.

### Timing

The session should take around XX hours.



In [21]:
# the Hello there everyone example above has no spaces between the words. 
# copy the code and modify it to have spaces.

# generate a string called s
# and see how long it is

# let's have a spacer variable
spacer = ' '
quote = '"'
# add the spaces in
s = "Hello" + spacer + "there" + spacer + "everyone"
print ('the length of',quote+s+quote,'is',len(s))

# confirm that you get the expected increase in length.
# It is now 20 rather than 18 above

the length of "Hello there everyone" is 20


## Groups of things
Very often, we will want to group items together. There are several main mechanisms for doing this in Python, known as:

* string, e.g. `'hello'`
* tuple, e.g. `(1, 2, 3)`
* list, e.g. `[1, 2, 3]`

A slightly different form of group is a dictionary:

* dict, e.g. `{1:'one', 2:'two', 3:'three'}`

You will notice that each of the grouping structures tuple, list and dict uses a different form of bracket. A further ordered collection, the numpy array, is fundamental to much of the work that we will do later.

We have dealt with the idea of a string as an ordered collection in the material above, so will deal with the others here.

We noted the concept of length (`len()`), that elements of the ordered collection could be accessed via an index, and came across the concept of a slice. All of these same ideas apply to the first set of groups (string, tuple, list, numpy array) as they are all ordered collections.
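
As a brief sketch of these shared ideas (the variable names are illustrative and not taken from the earlier material), the same `len()`, index and slice syntax can be applied to a string, a tuple and a list:

```python
# three ordered collections
s = 'hello'
t = (1, 2, 3)
l = [1, 2, 3]

for group in (s, t, l):
    # type name, length, first item, last item, and a slice of the first two items
    print(type(group).__name__, len(group), group[0], group[-1], group[:2])
```

Note that a slice returns the same type of group as the one it was taken from.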

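A dictionary is different: its elements are not accessed by position, so indices have no role. Instead, we use 'keys'. (Since Python 3.7 a dictionary does preserve the order in which items were inserted, but we still refer to items by key.)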

### `tuple`
A tuple is a group of items separated by commas. In the case of a tuple, the brackets are optional.
You can have a group of different types in a tuple (e.g. int, int, str, bool).

In [None]:
# load into the tuple
t = (1, 2, 'three', False)

# unload from the tuple
a,b,c,d = t

print(t)
print(a,b,c,d)
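
To illustrate the point that the brackets are optional, here is a minimal sketch (the names `t_plain` and `t_brackets` are made up for this example):

```python
# brackets omitted: the commas alone make this a tuple
t_plain = 1, 2, 'three', False

# the same tuple written with brackets
t_brackets = (1, 2, 'three', False)

print(type(t_plain))
print(t_plain == t_brackets)
```

Including the brackets is usually clearer to read, even though Python does not require them.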

If there is only one element in a tuple, you must put a trailing comma (`,`) after it, otherwise it is not interpreted as a tuple:



In [None]:
t = (1)
print (t,type(t))
t = (1,)
print (t,type(t))

You can have an empty tuple though:



In [None]:
t = ()
print (t,type(t))

#### Exercise

* create a tuple called t that contains the integers 1 to 5 inclusive
* print out the value of t
* use the tuple to set variables a1,a2,a3,a4,a5

In [None]:
# do exercise here


### `list`
A `list` is similar to a `tuple`. One main difference is that you can change individual elements in a list but not in a tuple.
To convert between a list and a tuple, use the 'casting' functions `list()` and `tuple()`:

In [None]:

# a tuple
t0 = (1,2,3)

# cast to a list
l = list(t0)

# cast to a tuple
t = tuple(l)

print('type of {} is {}'.format(t,type(t)))
print('type of {} is {}'.format(l,type(l)))
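
The difference noted above (list elements can be changed, tuple elements cannot) can be sketched as follows. This is an illustrative example only; the `try ... except` construct is used here simply to catch the error so that the cell keeps running:

```python
l = [1, 2, 3]
t = (1, 2, 3)

# changing an element of a list works
l[0] = 100
print(l)

# changing an element of a tuple raises a TypeError
try:
    t[0] = 100
except TypeError as err:
    print('cannot change a tuple:', err)
```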

You can concatenate (join) lists or tuples with the `+` operator:



In [None]:
l0 = [1,2,3]
l1 = [4,5,6]

l = l0 + l1
print ('joint list:',l)

#### Exercise

* copy the code from the cell above, but instead of lists, use tuples
* loop over each element in the tuple and print out the data type and value of the element

Hint: use a `for ... in ...` construct.

In [None]:
# do exercise here

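A common method associated with lists or tuples is:
* `index()`

Some useful built-in functions that operate on lists and tuples are:
* `len()`
* `sorted()`
* `min()`, `max()`

Note that `sort()` is a method of lists only: it sorts the list in place, so it is not available for tuples.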



In [None]:
l0 = (2,8,4,32,16)

# print the index of the item integer 4 
# in the tuple / list

item_number = 4

# Note the dot . here
# as index is a method of the tuple (and list) class
ind  = l0.index(item_number)

# notice that this is different
# as len() is not a method, but a built-in function
# that operates on lists/tuples
# Note: do not use len as a variable name!
llen = len(l0)

# note the use of integers in the braces e.g. {0}
# rather than empty braces as before. This allows us to
# refer to particular items in the format argument list
print('the index of {0} in {1} is {2}'.format(item_number,l0,ind))
print('the length of the {0} {1} is {2}'.format(type(l0),l0,llen))
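
The other built-in functions can be tried in the same way. In this minimal sketch, `sorted()` is used rather than `sort()`, because a tuple has no `sort()` method; the name `t_sorted` is made up for the example:

```python
t = (2, 8, 4, 32, 16)

# sorted() returns a new, sorted list and leaves t unchanged
t_sorted = sorted(t)

print('sorted:', t_sorted)
print('min: {0}, max: {1}'.format(min(t), max(t)))
print('original:', t)
```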

#### Exercise

* copy the code to the block below, and test that this works with lists, as well as tuples
* find the index of the integer 16 in the tuple/list
* what is the index of the first item?
* what is the length of the tuple/list?
* what is the index of the last item?

In [None]:
# do exercise here

A list has a much richer set of methods than a tuple. This is because we can add or remove items in a list (but not in a tuple).

* `insert(i,j)` : insert `j` before item `i` in the list
* `append(j)` : append `j` to the end of the list
* `sort()` : sort the list

Tuples and lists are 'ordered' (i.e. they maintain the order they are loaded in), so individual elements may be accessed through an 'index'. The index values start at 0 as we saw above. The index of the last element in a list/tuple is the length of the group, minus 1. This can also be referred to as index `-1`.
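
As a small sketch of the two ways of referring to the last element (the variable names `last_from_len` and `last_from_neg` are illustrative):

```python
l0 = [2, 8, 4, 32, 16]

# the last item, using the length of the list minus 1
last_from_len = l0[len(l0) - 1]

# the last item, using the index -1
last_from_neg = l0[-1]

print(last_from_len, last_from_neg)
```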

In [None]:
l0 = [2,8,4,32,16]

# insert 64 at the beginning (before item 0)
# Note that this inserts 'in place'
# i.e. the list is changed by calling this
l0.insert(0,64)


# insert 128 *before* the last item (item -1)
l0.insert(-1,128)

# append 256 on the end
l0.append(256)

# copy the list 
# and sort the copy
# Note the use of the copy() method here
# to create a copy
l1 = l0.copy()

# Note that this sorts 'in place'
# i.e. the list is changed by calling this
l1.sort()

print('the list {0} once sorted is {1}'.format(l0,l1))

#### Exercise

* copy the above code and try out some different locations for inserting values (e.g. what does index `-2` mean?)
* what happens if you take off the `.copy()` statement in the line `l1 = l0.copy()`, i.e. just use `l1 = l0`?  [Why is this?](https://www.afternerd.com/blog/python-copy-list/)

In [None]:
# do exercise here

### `dict`



The collections we have used so far have all been ordered. This means that we can refer to a particular element in the group by an index, e.g. `array[10]`.

A dictionary is not accessed by position. Instead of indices, we use 'keys' to refer to elements: each element has a key associated with it. It can be very useful for data organisation (e.g. databases) to have a key to refer to, rather than e.g. some arbitrary column number in a gridded dataset.

A dictionary is defined as a group in braces (curly brackets). For each element, we specify the key and then the value, separated by `:`.

In [None]:
a = {'one': 1, 'two': 2, 'three': 3}

# we then refer to the keys and values in the dict as:

print ('a:\n\t',a)
print ('a.keys():\n\t',a.keys())     # the keys
print ('a.values():\n\t',a.values()) # returns the values
print ('a.items():\n\t',a.items())   # returns a view of (key, value) tuples

Since Python 3.7, a dictionary keeps its items in the order in which they were inserted, so a `for` loop returns them in that order (in earlier versions the order was not guaranteed). We will often use such a loop to iterate over the items in a dictionary.

In [None]:
for key,value in a.items():
    print(key,value)
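
If you need the items in some other order, you can sort the keys explicitly. The following is a minimal sketch, assuming Python 3.7 or later for the insertion-order behaviour:

```python
a = {'one': 1, 'two': 2, 'three': 3}

# iteration follows the order in which the items were inserted
print(list(a.keys()))

# to loop in alphabetical order of the keys, sort them first
for key in sorted(a):
    print(key, a[key])
```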

We refer to specific items using the key e.g.:

In [None]:
print(a['one'])
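
Referring to a key that is not in the dictionary raises a `KeyError`. As a hedged sketch of how you might guard against this, you can test for the key with `in`, or use the `get()` method, which returns a default value instead of raising an error:

```python
a = {'one': 1, 'two': 2, 'three': 3}

# test whether a key exists before using it
has_four = 'four' in a
print(has_four)

# get() returns the default (here None) if the key is missing
value = a.get('four', None)
print(value)

# direct access to a missing key raises a KeyError
try:
    a['four']
except KeyError as err:
    print('no such key:', err)
```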

You can add to a dictionary:

In [None]:
a.update({'four':4,'five':5})
print(a)

# or for a single value
a['six'] = 6
print(a)

Quite often, you find that you have the keys you want to use in a dictionary as a list or array, and the values in another list.

In such a case, we can use the function `zip(keys,values)` to pair up the keys and values, and pass the result to `dict()` to load them into a dictionary. For example:

In [None]:
values = [1,2,3,4]
keys = ['one','two','three','four']

a = dict(zip(keys,values))

print(a)
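
To see what `zip()` is doing here, it can help to look at the pairs it produces before they are passed to `dict()`. A minimal sketch (the name `pairs` is made up for this example):

```python
keys = ['one', 'two', 'three', 'four']
values = [1, 2, 3, 4]

# zip() pairs up the items from each list in turn;
# list() is used here only so that we can print the pairs
pairs = list(zip(keys, values))
print(pairs)

# each (key, value) pair becomes one entry in the dictionary
print(dict(pairs))
```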

We will use this idea to make a dictionary of our ENSO dataset, using the items in the header for the keys. In this way, we obtain a  more elegant representation of the dataset, and can refer to items by names (keys) instead of column numbers.

In [None]:
import requests
import numpy as np
import io
import matplotlib.pyplot as plt

# access dataset as above
url = "http://www.esrl.noaa.gov/psd/enso/mei.old/table.html"
txt = requests.get(url).text

# copy the useful data
start_head = txt.find('YEAR')
start_data = txt.find('1950\t')
stop_data  = txt.find('2018\t')

header = txt[start_head:start_data].split()
data = np.loadtxt(io.StringIO(txt[start_data:stop_data]),unpack=True)

# use zip to load into a dictionary
data_dict = dict(zip(header, data))

key = 'MAYJUN'
# plot data
plt.figure(0,figsize=(12,7))
plt.title('ENSO data from {0}'.format(url))
plt.plot(data_dict['YEAR'],data_dict[key],label=key)
plt.xlabel('year')
plt.ylabel('ENSO')
plt.legend(loc='best')

#### Exercise

* copy the code above, and modify so that datasets for months `['MAYJUN','JUNJUL','JULAUG']` are plotted on the graph

Hint: use a for loop

In [None]:
# do exercise here

We can also usefully use a dictionary with a printing format statement. In that case, we refer directly to the key in the format string. This can make printing statements much easier to read. We don't directly pass the dictionary to the `format` statement, but rather `**dict`, where `**dict` means "treat the key-value pairs in the dictionary as additional named arguments to this function call".

So, in the example:

In [None]:
import requests
import numpy as np
import io

# access dataset as above
url = "http://www.esrl.noaa.gov/psd/enso/mei/table.html"
txt = requests.get(url).text

# copy the useful data
start_head = txt.find('YEAR')
start_data = txt.find('1950\t')
stop_data  = txt.find('2018\t')

header = txt[start_head:start_data].split()
data = np.loadtxt(io.StringIO(txt[start_data:stop_data]),unpack=True)

# use zip to load into a dictionary
data_dict = dict(zip(header, data))
print(data_dict.keys())

# print the data for MAYJUN
print('data for MAYJUN: {MAYJUN}'.format(**data_dict))

The line `print('data for MAYJUN: {MAYJUN}'.format(**data_dict))` is equivalent to writing:

    print('data for MAYJUN: {MAYJUN}'.format(YEAR=data_dict['YEAR'],DECJAN=data_dict['DECJAN'], ...))
    
In this way, we use the keys in the dictionary as keywords to pass to a method.
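
The same idea can be tried without downloading any data. In this sketch, the dictionary `info` and its contents are made up purely for illustration:

```python
# a small illustrative dictionary (not the ENSO data)
info = {'name': 'MAYJUN', 'year': 1950}

# passing **info supplies name=... and year=... as keyword arguments
s_unpacked = 'data for {name} in {year}'.format(**info)

# the equivalent call written out by hand
s_explicit = 'data for {name} in {year}'.format(name=info['name'], year=info['year'])

print(s_unpacked)
print(s_unpacked == s_explicit)
```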

Another useful example of such a use of a dictionary is in saving a numpy dataset to file.

If the data are numpy arrays in a dictionary as above, we can store the dataset using:



In [None]:
import requests
import numpy as np
import io

# access dataset as above
url = "http://www.esrl.noaa.gov/psd/enso/mei/table.html"
txt = requests.get(url).text

# copy the useful data
start_head = txt.find('YEAR')
start_data = txt.find('1950\t')
stop_data  = txt.find('2018\t')

header = txt[start_head:start_data].split()
data = np.loadtxt(io.StringIO(txt[start_data:stop_data]),unpack=True)

# use zip to load into a dictionary
data_dict = dict(zip(header, data))

filename = 'enso_mei.npz'

# save the dataset
np.savez_compressed(filename,**data_dict)

What we load from the file is a dictionary-like object `<class 'numpy.lib.npyio.NpzFile'>`.

If needed, we can cast this to a dictionary with `dict()`, but it is generally more efficient to keep the original type.

In [None]:
# load the dataset

filename = 'enso_mei.npz'

loaded_data = np.load(filename)

print(type(loaded_data))

# test they are the same using np.array_equal
for k in loaded_data.keys():
    print('\t',k,np.array_equal(data_dict[k], loaded_data[k]))
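
The save, load and cast steps can also be tried without a network connection. The following is a sketch only: the arrays are made-up values standing in for the ENSO columns, and a temporary directory is used so that no existing file is overwritten.

```python
import os
import tempfile
import numpy as np

# small made-up arrays standing in for the ENSO columns
data_dict = {'YEAR': np.array([1950., 1951., 1952.]),
             'MAYJUN': np.array([-1.3, -0.2, 0.4])}

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'example.npz')

    # save the arrays, using the dictionary keys as names
    np.savez_compressed(filename, **data_dict)

    # load, and cast to a plain dictionary while the file is still open
    with np.load(filename) as loaded_data:
        loaded_dict = dict(loaded_data)

# check that every array survived the round trip unchanged
same = all(np.array_equal(data_dict[k], loaded_dict[k]) for k in data_dict)
print(type(loaded_dict), same)
```

Casting to `dict` inside the `with` block reads all of the arrays into memory, so they remain available after the file has been closed.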

#### Exercise

* Using what you have learned above, access the Met Office data file [`https://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt`](https://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt) and create a 'data package' in a numpy `.npz` file that has keys of `YEAR` and each month in the year, with associated datasets of monthly Southeast England precipitation (mm).
* confirm that the data in your `npz` file is the same as in your original dictionary
* produce a plot of October rainfall using these data for the years 1900 onwards

In [None]:
# do exercise here

## Summary

In this section, we have extended the types of data we might come across to include groups. We dealt with ordered groups of various types (`tuple`, `list`), and introduced the numpy package for numpy arrays (`np.array`). We saw dictionaries as collections in which we refer to individual items with a key.

We learned in the previous section how to pull apart a dataset presented as a string using loops and various methods, and how to construct a useful dataset 'by hand' in a list or similar structure. It is useful, when learning to program, to know how to do this.

Here, we saw that packages such as numpy provide higher level routines that make reading data easier, and we would generally use these in practice. We saw how we can use `zip()` to help load a dataset from arrays into a dictionary, and also the value of using a dictionary representation when saving numpy files.