<DIV ALIGN=CENTER>

# Python Data Structures I
## Professor Robert J. Brunner
  
</DIV>  
-----
-----

## Introduction

Previously, we covered the basic Python concepts required to begin
writing legal Python code. In this lesson, we will introduce additional
fundamental concepts that form the basis of many Python programs. These
concepts include the built-in Python data structures: string, tuple,
list, and dictionary. In this notebook we focus on the string and list
data types.

-----

## Python Data Structures

Python provides built-in support for a number of useful data structures,
including the _string_, _tuple_, _list_, and _dictionary_:

__String__: 
A sequence of zero or more characters that are enclosed within either a
pair of single quote characters, `'`, or a pair of double quote
characters, `"`. A Python string is an instance of class `str`.

`'A string containing many characters'`

__Tuple__:
An ordered sequence of zero or more values that are enclosed in
parentheses, `(` and `)`. The different values in a tuple are separated
by commas; the parentheses can often be omitted, but the commas are
required. A Python tuple is an instance of class `tuple`.

`(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)`

__List__:
An ordered sequence of zero or more values that are enclosed in square
brackets, `[` and `]`. The different values in the list are separated by
commas. A Python list is an instance of class `list`.

`[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`

__Dictionary__:
A collection of key-value pairs that are enclosed in curly
braces, `{` and `}`. A key is separated from its corresponding value by
a colon, `:`, and the different key-value entries in the collection are
separated by commas. Values are accessed by key rather than by position
(as of Python 3.7 a dictionary preserves insertion order, but it is still
not indexed by position). A Python dictionary is an instance of class `dict`.

`d = {'name': "Alexander", 'age': 30, 'location': (102.1, 32.1)}`

----

Of these four data structures, the _list_ and the _dictionary_ are
mutable, which means the values stored in these structures can be
changed. On the other hand, the _string_ and the _tuple_ are immutable,
which means the values cannot be changed once created. Instead, a new
data structure must be created, either explicitly by the programmer, or
implicitly by the Python interpreter. All of these data structures can be
displayed by using the Python built-in `print` function, which converts
its arguments to strings and writes them to STDOUT:

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(data)
```
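
The difference between mutable and immutable structures can be seen directly. The following is a minimal sketch; the variable names (`numbers`, `point`, `greeting`, `shouted`, `tuple_error`) are chosen here for illustration only.

```python
# A list is mutable: an element can be replaced in place.
numbers = [1, 2, 3]
numbers[0] = 100
print(numbers)            # [100, 2, 3]

# A tuple is immutable: item assignment raises a TypeError.
point = (1, 2, 3)
try:
    point[0] = 100
except TypeError as err:
    tuple_error = str(err)
    print("Tuple:", tuple_error)

# A string is also immutable, so "changing" it means building a new string.
greeting = "hello"
shouted = greeting.upper()
print(greeting, shouted)  # hello HELLO
```

Note that `greeting` is unchanged after the call to `upper`; the method returns a new string.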

The _dictionary_ is a special data structure that maps keys to values,
and thus the keys, which can be stored in any order, are used to access
the values. The other three structures are ordered sequences, and thus
individual elements can be accessed by specifying an index position
within square brackets, `[]`, with the caveat that Python is a
zero-indexed language. Thus, given an ordered sequence `data`, the
following accesses are legal:

- `data[0]`: access the first value
- `data[1]`: access the second value
- `data[-1]`: access the last value
- `data[-2]`: access the second to last value
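
As a short sketch of these access rules, the example below indexes a string, a tuple, and a list by position, and a dictionary by key. The sample values are illustrative and reuse those shown earlier where possible.

```python
word = "Python"                              # string
point = (102.1, 32.1)                        # tuple
data = [1, 2, 3, 4, 5]                       # list
person = {'name': "Alexander", 'age': 30}    # dictionary

print(word[0], word[-1])    # P n
print(point[1])             # 32.1
print(data[-2])             # 4
print(person['name'])       # Alexander (looked up by key, not by position)

# An index beyond the end of a sequence raises an IndexError.
try:
    data[5]
except IndexError as err:
    print("IndexError:", err)
```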

### Slicing

Beyond single-value access, Python supports a rich set of techniques,
known as slicing, for extracting multiple values from an ordered sequence.
Given an ordered sequence `data`, the basic format is
`data[start:end:stride]`, where `start` is the index of the first value
to include, `end` is the index at which to stop (the value at `end` is
not included), and `stride` is the step between successive indices. If
`start` or `end` is omitted, the slice extends to the beginning or the
end of the sequence, respectively, and the default stride is one. A
negative `start` or `end` value is counted backward from the end of the
sequence. These concepts are demonstrated in the following code block:

```python
# Edit the start/end/stride values to learn slicing

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(data[0])
print(data[2:-2])
print(data[:-3:2])
```

```
1
[3, 4, 5, 6, 7, 8]
[1, 3, 5, 7]
```
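
A few more slicing patterns may help, shown here as a small sketch. The variable names are illustrative; the same notation applies to strings and tuples.

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

first_three = data[:3]       # the end index 3 is excluded
last_three = data[-3:]       # a negative start counts from the end
every_other = data[::2]      # a stride of 2 takes every second value
reversed_data = data[::-1]   # a negative stride walks backward

print(first_three)     # [1, 2, 3]
print(last_three)      # [8, 9, 10]
print(every_other)     # [1, 3, 5, 7, 9]
print(reversed_data)   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# Slicing works the same way on strings and tuples.
word = "Python"
print(word[1:4])            # yth
print((1, 2, 3, 4)[1:3])    # (2, 3)
```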


-----

### Common Sequence Operations

In addition to slicing, the _string_, _tuple_, and _list_ support
several common sequence operations. Given a value `v`, integer `n`, and
similarly typed sequences `s` and `t`:

| Operation | Description |
| ----- | ----- |
| `v in s`| `True` if `v` is in the sequence `s`, otherwise `False`|
| `v not in s`| `False` if `v` is in the sequence `s`, otherwise `True`|
| `s + t`| concatenation of `s` and `t`|
| `s * n` or `n * s`| `n` shallow copies of `s` concatenated|
| `len(s)`| the number of elements in the sequence `s`|
| `min(s)`| the smallest element in the sequence `s`|
| `max(s)`| the largest element in the sequence `s`|
| `s.count(v)`| number of times `v` appears in `s`|

These operations are demonstrated below on the `data` list:

```python
1 in data
```

```
True
```

```python
0 not in data
```

```
True
```

```python
data * 2
```

```
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

```python
data + [11, 12, 13, 14]
```

```
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```

```python
print(len(data), min(data), max(data))
```

```
10 1 10
```
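
The same operations apply to strings and tuples. The sketch below uses two illustrative values, `s` and `t`, and also exercises `count`, which is listed in the table above.

```python
s = "banana"
t = (1, 2, 2, 3)

print("nan" in s)                 # True (for strings, `in` tests for a sub-string)
print(4 not in t)                 # True
print(s + "!")                    # banana!
print(t * 2)                      # (1, 2, 2, 3, 1, 2, 2, 3)
print(len(s), min(s), max(s))     # 6 a n
print(s.count("a"), t.count(2))   # 3 2
```

For strings, `min` and `max` compare characters, so the results are single-character strings.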


-----

### [Strings](https://docs.python.org/3.4/library/stdtypes.html#text-sequence-type-str)

In Python, a string is a sequence of zero or more characters that are
enclosed within either a pair of single quote characters, `'`, or a pair
of double quote characters, `"`. While it might seem confusing to have
two such similar representations, it can be quite handy, as this allows
strings to be written that include either single or double quote
characters by simply using the opposite type to enclose the string
itself:

```
"A string with a contraction such as don't or a possessive like Python's string."
'A string with a quote, "Four score and seven years ago ..."'
```

A string can span multiple lines by using either three single or double
quote characters to enclose the sequence of characters, just like
multi-line comments. Adjacent string literals on the same line are
automatically concatenated by the Python interpreter:

```python
# Three adjacent string literals are joined into a single string
data = 'First Name ' 'Last Name ' " Email Address"
print(data)
# Output: First Name Last Name  Email Address
```

A special string in Python is known as a _raw string_, where
backslashes are not handled in a special way when processing the string.
This is useful when constructing regular expressions or other strings
that might include formatting information, like plot axes that use LaTeX
formatting. To create a raw string, simply prefix the string with a
lower-case `r`:

```python
data = r'Text with a \ that is left unprocessed.'
```
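
To make the difference concrete, the sketch below compares a regular string with a raw string that contains the same characters between the quotes. The LaTeX-style label is a hypothetical example.

```python
normal = 'a\tb'     # \t is interpreted as a single tab character
raw = r'a\tb'       # the backslash and the t are kept as two characters

print(len(normal))  # 3
print(len(raw))     # 4
print(raw)          # a\tb

# Hypothetical LaTeX-style axis label; without the r prefix, \a would be
# interpreted as an escape sequence.
label = r'$\alpha$ (rad)'
print(label)        # $\alpha$ (rad)
```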

The `str` class defines a large number of methods that can be used to
process string data, including methods that test whether a string
contains only alphabetical, numerical, or alphanumeric characters, and
methods that convert a string to all lower-case or all upper-case
characters. A full description of the string methods is available from
the online [Python
Documentation](https://docs.python.org/3/library/stdtypes.html#str) or
by using `help(str)` at a Python prompt or in an IPython Notebook cell.
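
As a brief sketch, the testing and case-conversion methods mentioned above can be tried on a few sample strings (the strings themselves are arbitrary):

```python
print("Python".isalpha())     # True
print("2024".isdigit())       # True
print("Python3".isalnum())    # True
print("Python 3".isalnum())   # False (the space is not alphanumeric)
print("Python".lower())       # python
print("Python".upper())       # PYTHON
```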

Some of the more useful Python _string_ methods are described in the
following sections.

__`format`__:

The `format` method is used to create a new, formatted string from a
template string and substitution text. The classic example is a form
letter, where specific fields are replaced by new data in every
letter. The `format` method was introduced as a replacement for the
older `%` string formatting operator. In its basic form, the template
string includes `{}` placeholders to mark where replacement text goes,
and the arguments to the `format` method supply the replacement text, in
order. For example,

```python
'Hello {}, you are visitor #{}!'.format('Alexander', 23)
```

will return

```
'Hello Alexander, you are visitor #23!'
```
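
Beyond empty `{}` placeholders, the template can use numbered or named fields and format specifications. The following sketch shows a few common forms; the variable names and values are illustrative.

```python
# Positional fields can be numbered and reused.
reused = '{0} and {1}, then {0} again'.format('spam', 'eggs')
print(reused)     # spam and eggs, then spam again

# Fields can also be named.
named = 'Hello {name}, you are visitor #{count}!'.format(name='Alexander', count=23)
print(named)      # Hello Alexander, you are visitor #23!

# A format specification after the colon controls the presentation.
rounded = '{:.2f}'.format(3.14159)   # two digits after the decimal point
aligned = '{:>8}|'.format('right')   # right-aligned in a field of width 8
print(rounded)    # 3.14
print(aligned)    #    right|
```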

__`find`__:

The `find` method locates the first occurrence of a sub-string in the
full string, and returns the index position of this first occurrence. For
example, 

```python
"The brown dog jumped over the quick fox!".find("he")
```

returns 1. If the sub-string does not occur in the string, `find` returns -1.
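
The sketch below extends this example: `find` accepts an optional starting index, `rfind` searches from the right, and a missing sub-string yields -1.

```python
text = "The brown dog jumped over the quick fox!"

print(text.find("he"))       # 1   (inside "The")
print(text.find("he", 2))    # 27  (search starts at index 2, so "the" is found)
print(text.rfind("he"))      # 27  (last occurrence)
print(text.find("cat"))      # -1  (not found)
```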

__`split`__:

The `split` method is very powerful, and will tokenize a string into
substrings based on a separator argument, which defaults to runs of
whitespace characters. For example,

```python
"The brown dog jumped over the quick fox!".split()
```
returns

```
['The', 'brown', 'dog', 'jumped', 'over', 'the', 'quick', 'fox!']
```
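
As a sketch of the non-default behavior, `split` also accepts an explicit separator and an optional maximum number of splits. The sample record below is hypothetical.

```python
record = "Alexander,30,Macedon"    # hypothetical comma-separated record
fields = record.split(",")
print(fields)                      # ['Alexander', '30', 'Macedon']

# A second argument limits the number of splits.
print("a b c d".split(" ", 1))     # ['a', 'b c d']

# Default splitting collapses runs of whitespace ...
print("  two   words ".split())    # ['two', 'words']
# ... while an explicit separator keeps empty fields.
print("a,,b".split(","))           # ['a', '', 'b']
```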

__`strip`__:

The `strip` method removes the characters specified in its argument
from the beginning and end of a string. By default, whitespace
characters are removed. Two variants of this method, `lstrip` and
`rstrip`, remove only leading or only trailing characters,
respectively.

```python
"    Some text surrounded by white space characters    ".strip()
```
returns

```
'Some text surrounded by white space characters'
```
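
The variants and the optional argument can be sketched as follows. `repr` is used so that the remaining whitespace is visible in the output; the sample strings are arbitrary.

```python
padded = "   text   "

print(repr(padded.lstrip()))        # 'text   '
print(repr(padded.rstrip()))        # '   text'

# The argument is a set of characters to remove, not a prefix or suffix.
print("--title--".strip("-"))       # title
print("xyxhelloyx".strip("xy"))     # hello
```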

__`join`__:

While strings can be combined by using the `+` operator, this approach
is slow for many additions since each addition requires the construction
of a new string to hold the combined result. A more efficient string
combination approach is to use the `join` method, which can quickly
combine multiple strings that are contained in an iterable object such
as a `list` or `tuple`. The string you use to call the `join`
method provides the _glue text_ between each item in the iterable. For
example, the following code will create a new string from a list of
strings that are each separated by a comma and a single space character:

```python
data = ['1', '2', '3', '4', '5', '6', '7', '8', '9']

", ".join(data)
```
returns 

`'1, 2, 3, 4, 5, 6, 7, 8, 9'`
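
One detail worth noting is that `join` only accepts strings, so other values must be converted first. The sketch below shows one way to do this, along with the common pattern of collecting pieces in a list and joining them once at the end; the variable names are illustrative.

```python
numbers = [1, 2, 3]

# join requires strings, so each number is converted with str first.
csv_line = ",".join(str(n) for n in numbers)
print(csv_line)    # 1,2,3

# Collect the pieces in a list, then join once at the end.
pieces = []
for word in ("The", "brown", "dog"):
    pieces.append(word.upper())
sentence = " ".join(pieces)
print(sentence)    # THE BROWN DOG
```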

The following code block demonstrates several string operations that you
can test, change, and execute.

----


```python
text = ['The', 'brown', 'dog', 'jumped', 'over', 'the', 'quick', 'fox!']

newtext = " ".join(text)

print(newtext)
```

```
The brown dog jumped over the quick fox!
```


-----

Another useful built-in function is `input`, which can be used to
obtain information from the user. The `input` function takes an
optional string argument that is displayed on _STDOUT_ as a prompt; it
then reads characters from _STDIN_ until a newline character is
encountered and returns them as a string, without the trailing newline.
This is demonstrated in the following code cell.

-----

```python
name = input("Enter your Name: ")

print("Welcome {0}".format(name))
```

```
Enter your Name: Alexander the Great!
Welcome Alexander the Great!
```
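
Because `input` always returns a string, numeric input must be converted before it is used in arithmetic. The following is a minimal sketch; `parse_age` is a hypothetical helper written for this illustration, not a built-in function.

```python
def parse_age(text):
    """Convert user-entered text to an int, or return None if it is not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return None

print(parse_age(" 30 "))     # 30
print(parse_age("thirty"))   # None

# In an interactive session this could be combined with input:
# age = parse_age(input("Enter your age: "))
```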


-----

### [List](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)

A list is a mutable sequence that can hold homogeneous data,
`[1, 2, 3, 4, 5]`, or heterogeneous data, `[1, '2', 'Three', (4, 5)]`.
A list can be created in several different ways:

1. `[]`: An empty list
2. `[1]`: A single valued list
3. `[1, 2, 3]`: Comma-separated items
4. `list()`: Using the list class constructor
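
The constructor form can also build a list from any iterable, as sketched below with a string, a tuple, and a `range` object.

```python
print(list())             # []
print(list("abc"))        # ['a', 'b', 'c']
print(list((1, 2, 3)))    # [1, 2, 3]
print(list(range(5)))     # [0, 1, 2, 3, 4]
```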

Since a `list` is mutable, a list can be changed by adding elements,
removing elements, or simply changing existing elements in place. Lists
are very powerful data structures and are used extensively in many
Python programs. The following table presents some of the more commonly
used list operations:

| Operation | Description | Example |
| ---- | ----- | ---- |
| `append` | add an element to the end of the list | `data.append(11)` |
| `insert` | insert an element before the specified index | `data.insert(4, '4')` |
| `del` | delete the value at the specified index | `del data[4]` |
| `remove` | remove the first element equal to the value | `data.remove(11)` |
| `clear` | remove all elements in the list | `data.clear()` |
| `sort` | sorts list in place | `data.sort()` |
| `reverse` | reverses list in place | `data.reverse()` |
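
The examples in the table can be run in sequence, as in the following sketch, to see how each operation changes the list in place.

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

data.append(11)          # 11 is added at the end
data.insert(4, '4')      # '4' now sits at index 4
print(data)              # [1, 2, 3, 4, '4', 5, 6, 7, 8, 9, 10, 11]

del data[4]              # removes the '4' inserted above
data.remove(11)          # removes the first element equal to 11
print(data)              # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

data.reverse()
print(data)              # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

data.sort()
sorted_snapshot = data[:]    # keep a copy of the sorted list for inspection
print(data)              # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

data.clear()
print(data)              # []
```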

Assigning a list to a new variable does not copy the list. Instead,
both variables refer to the same underlying list, and any change made
through one variable is visible through the other. For example,
after this set of operations:

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d = data
d[0] = -1
```

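`data[0]` now contains the value `-1`. To obtain an independent copy of
the list, use the slice notation without any values. (This is a shallow
copy: the new list is a separate object, but any nested mutable elements
are still shared; `copy.deepcopy` from the standard library copies those
as well.) For example, after this set of operations: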

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d = data[:]
d[0] = -1
```

`data[0]` retains the original value of `1`.
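
In standard Python terminology, plain assignment creates an alias, a slice creates a shallow copy, and `copy.deepcopy` creates a deep copy. The distinction only becomes visible when a list contains other mutable objects, as in the following sketch; the nested list `grid` is a made-up example.

```python
import copy

grid = [[1, 2], [3, 4]]

alias = grid                  # same list object, no copy is made
shallow = grid[:]             # new outer list, but the inner lists are shared
deep = copy.deepcopy(grid)    # fully independent copy

shallow[0][0] = -1            # modifies an inner list shared with grid
deep[1][1] = -4               # affects only the deep copy
shallow.append([5, 6])        # affects only the shallow copy's outer list

print(grid)       # [[-1, 2], [3, 4]]
print(shallow)    # [[-1, 2], [3, 4], [5, 6]]
print(deep)       # [[1, 2], [3, -4]]
print(alias is grid, shallow is grid)   # True False
```

For a flat list of immutable values, such as the list of integers used in this lesson, a slice copy is sufficient.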

-----

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(data)

data.reverse()
print(data)
```

```
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```


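```python
# Now we compare copying a list with assigning it to a new name

# First a copy made with slice notation: the original is unaffected
d = data[:]
d[1] = -1

print(data)

# Now a plain assignment: both names refer to the same list
d = data
d[1] = -1

print(data)
```

```
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[10, -1, 8, 7, 6, 5, 4, 3, 2, 1]
```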


## Additional References

1. [Think Python](http://faculty.stedwards.edu/mikek/python/thinkpython.pdf) for Python3.
2. [Python3 Tutorial](https://docs.python.org/3.4/tutorial/index.html)
3. [Dive into Python3](http://www.diveintopython3.net/index.html)

For information on writing Python programs at the command line, Google for Education has developed [Python Setup directions](https://developers.google.com/edu/python/set-up).

Python 2 versus Python 3:
1. A [discussion of the differences between Python 2 and Python 3](http://python3porting.com/intro.html).
2. A [shorter List of Changes from Python 2 to Python 3](http://inventwithpython.com/appendixa.html).

Several free books, mostly written for Python 2:
1. [Open Tech School's Introduction to Programming with Python](http://opentechschool.github.io/python-beginners/en/index.html)
2. [Invent with Python](http://inventwithpython.com)
3. [Building Skills in Programming](http://www.itmaybeahack.com/homepage/books/)
4. [Learn Python The Hard Way, 3rd Edition](http://learnpythonthehardway.org/book/)
5. [A Byte of Python](http://www.ibiblio.org/g2swap/byteofpython/read/)
 
 
-----