## Data types

Today we'll take a look at the standard data types that come with Python and some examples of how we can use them to represent real-world information. This list isn't comprehensive; there are many more types available, but these are the majority of what you'll see day to day.

### Why?

Data types are the building blocks of applications. They are the basic elements we can combine to form more complex structures.

## Integers

Integers are whole numbers. They can be either positive or negative:

In [1]:
1

1

In [2]:
-5

-5

Ints can generally be of any size:

In [3]:
2**65

36893488147419103232L

But notice the 'L' at the end there? It means that Python has represented the result of the operation as a long int. I won't get into the details now, but if you'd like more information:

- http://en.wikipedia.org/wiki/32-bit
- http://stackoverflow.com/a/5435660
- http://en.wikipedia.org/wiki/Integer_(computer_science)#Common_integral_data_types


The boundary between ints and long ints depends on the system Python is running on. Python handles the switch from int to long automatically, so normally it's not something you have to worry about. Be aware, though, that if you check whether user input or the result of some calculation is an int, the test might fail:

In [4]:
import sys
sys.maxint

9223372036854775807

In [5]:
type(9223372036854775807)

int

In [6]:
type(9223372036854775807 + 1)

long

In [7]:
int == type(9223372036854775807 + 1)

False
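
If we do need such a check, one possible approach is to test against `numbers.Integral` from the standard library, which covers both int and long. This is only a sketch: the helper name `is_whole_number` is made up for illustration, and the block uses `print(...)` with a single argument so that it behaves the same on Python 2 and Python 3.

```python
import numbers


def is_whole_number(value):
    # numbers.Integral matches int and long on Python 2, and int on Python 3.
    return isinstance(value, numbers.Integral)


small = 15
big = 2 ** 100

print(is_whole_number(small))  # True
print(is_whole_number(big))    # True
print(is_whole_number(1.5))    # False
```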

In practice these differences don't mean much, as ints and longs play well together. Side note: we don't see the 'L' in these examples because the print statement shows the str() form of a number, which leaves the 'L' off.

In [8]:
print type(2**100)
print type(15)
print 2**100 + 15
print 2**100 - 15
print 2**100 * 15
print 2**100 / 15

<type 'long'>
<type 'int'>
1267650600228229401496703205391
1267650600228229401496703205361
19014759003423441022450548080640
84510040015215293433113547025


## Floating point numbers

Floating point numbers can be a bit tricky. Let's take a look at some examples:

In [9]:
1.5

1.5

In [10]:
type(1.5)

float

So you might think that floats are simply numbers that have decimal parts but...

In [11]:
0.1 + 0.2

0.30000000000000004

The Python docs discuss this behavior (https://docs.python.org/2/tutorial/floatingpoint.html#representation-error):

> Note that this is in the very nature of binary floating-point: this is not a bug in Python, and it is not a bug in your code either. You’ll see the same kind of thing in all languages that support your hardware’s floating-point arithmetic (although some languages may not display the difference by default, or in all output modes).

For more info:

- http://en.wikipedia.org/wiki/IEEE_floating_point
- http://cr.yp.to/2005-590/goldberg.pdf
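
When an approximate answer is good enough, a common workaround is to compare floats within a small tolerance instead of testing for exact equality. The sketch below assumes a tolerance of `1e-9`, which is an arbitrary choice for illustration; the right value depends on the size of the numbers involved.

```python
total = 0.1 + 0.2

# Exact equality fails because of the representation error.
print(total == 0.3)  # False

# TOLERANCE is a hypothetical value picked for this example.
TOLERANCE = 1e-9
print(abs(total - 0.3) < TOLERANCE)  # True
```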

In short: if you are doing work that requires exact decimal results, you'll want to use the decimal library. Note that we pass the Decimal class a string as an argument; a float would carry its representation error along with it.

In [12]:
from decimal import Decimal
Decimal('0.1') + Decimal('0.2')

Decimal('0.3')
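
To see why the string matters, here is a small comparison of a Decimal built from a string with one built from a float. It assumes Python 2.7 or later, where `Decimal` accepts a float directly; the variable names are just for illustration.

```python
from decimal import Decimal

from_string = Decimal('0.1')
from_float = Decimal(0.1)  # keeps the float's binary representation error

print(from_string == from_float)                       # False
print(from_string + Decimal('0.2') == Decimal('0.3'))  # True
```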

## Strings

Now we get to the fun stuff. I say that because there aren't a lot of methods for numerical types, but the rest of the types we'll discuss have plenty of useful methods available to them.

Strings are text, generally. A string is any collection of symbols surrounded by quotes:

In [13]:
('Hello '
"python "
'''learners''')

'Hello python learners'

This example demonstrates a couple of properties of strings. We can write strings with single, double, or triple quotes. We can also split a string across several lines, and the pieces will be automatically joined into a single string as long as we group them together with parentheses.

There's more than one way to combine strings though:

In [14]:
"We can " + "concatenate strings " + "together using the + operator"

'We can concatenate strings together using the + operator'

In [15]:
first = "sometimes it's better "
middle = "to assign parts of a long string "
last = "to variables then concatenate the strings by the variable names"
sentence = first + middle + last
print sentence

sometimes it's better to assign parts of a long string to variables then concatenate the strings by the variable names
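
Another option, when the pieces are already in a list, is the `join()` method. The string we call it on is used as the separator between the pieces. The lists below are made-up examples.

```python
parts = ["join", "glues", "a", "list", "of", "strings", "together"]

# The separator is the string that join() is called on.
sentence = " ".join(parts)
print(sentence)  # join glues a list of strings together

csv_line = ",".join(["alice", "bob", "eve"])
print(csv_line)  # alice,bob,eve
```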


The methods available to strings let us build modified copies and ask questions that give us useful information.

In [16]:
for i in dir('Hello'):
    if not i.startswith('_'):
        print i

capitalize
center
count
decode
encode
endswith
expandtabs
find
format
index
isalnum
isalpha
isdigit
islower
isspace
istitle
isupper
join
ljust
lower
lstrip
partition
replace
rfind
rindex
rjust
rpartition
rsplit
rstrip
split
splitlines
startswith
strip
swapcase
title
translate
upper
zfill


Here are some examples of what we can do with these methods:

In [17]:
word = "hello"
print "capitalize:", word.capitalize() # capitalize the first letter of the string
print "count:", word.count('l') # count how many times the string we pass as an argument appears in 'word'
print "endswith:", word.endswith('o') # T/F if it ends with the string we pass as an argument
print "index:", word.index('o') # Returns index of the string we pass as an argument (remember indexes start at 0)
print "isalpha:", word.isalpha() # methods that start with 'is' give us a clue that the method returns True or False
print "upper:", word.upper() # changes all letters of the string to uppercase
word_two = "HeLlO"
print "swapcase:", word_two.swapcase() # for every letter in the string, swap between upper and lower case
name = "guido van rossum"
print "title:", name.title() # changes the first letter of each word to uppercase, which is handy for names

capitalize: Hello
count: 2
endswith: True
index: 4
isalpha: True
upper: HELLO
swapcase: hElLo
title: Guido Van Rossum
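
A few more methods from the list above, sketched on a made-up string. Note that none of them change the original string; each one returns a new value.

```python
raw = "  Guido van Rossum \n"

cleaned = raw.strip()                     # drop leading/trailing whitespace
words = cleaned.split()                   # break into a list on whitespace
renamed = cleaned.replace("Guido", "G.")  # swap one substring for another
position = cleaned.find("van")            # index of the first match
missing = cleaned.find("xyz")             # find() gives -1 when nothing matches

print(cleaned)   # Guido van Rossum
print(words)     # ['Guido', 'van', 'Rossum']
print(renamed)   # G. van Rossum
print(position)  # 6
print(missing)   # -1
```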


## Lists

So far we've talked about data types that exist as singular objects. Now we can move on to data types that act as collections of items. The first we'll discuss is lists.

A list is an *ordered* series of things. A list can contain objects of any type, including other lists! We use square brackets [] around a comma-separated series of objects to define a list.

In [18]:
[1, 2, 3]

[1, 2, 3]

In [19]:
[1, 'one', 1.0]

[1, 'one', 1.0]

In [20]:
[[1, 2, 3], ['one', 'two', 'three'], [1.0, 2.0, 3.0]]

[[1, 2, 3], ['one', 'two', 'three'], [1.0, 2.0, 3.0]]

Just like strings, lists have methods available for us to work with:

In [21]:
for i in dir([]):
    if not i.startswith('_'):
        print i

append
count
extend
index
insert
pop
remove
reverse
sort


Let's take a look at how these work. We'll start off with a list of two names, alice and bob. From there we'll use each of the methods to modify the 'names' list.

In [22]:
names = ['alice', 'bob']
names

['alice', 'bob']

append() will add the argument to the end of the list

In [23]:
names.append('eve') 
names

['alice', 'bob', 'eve']

We'll append again to show off the next method

In [24]:
names.append('bob') 
names

['alice', 'bob', 'eve', 'bob']

count() tells us how many times the argument occurs in the list

In [25]:
print "The word 'bob' is seen:", names.count('bob') 

The word 'bob' is seen: 2


append() only adds a single item at a time. If we want to extend our original list by several items, we can use the extend() method and pass in a list of things to add to the end:

In [26]:
names.extend(['bill', 'sally']) 
names

['alice', 'bob', 'eve', 'bob', 'bill', 'sally']
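
The difference is easiest to see when we pass a list to both methods. In this sketch (the variable names are just for illustration) append() adds the whole list as one nested item, while extend() adds each element separately.

```python
appended = ['alice', 'bob']
appended.append(['bill', 'sally'])  # the list goes in as a single item

extended = ['alice', 'bob']
extended.extend(['bill', 'sally'])  # each element is added on its own

print(appended)  # ['alice', 'bob', ['bill', 'sally']]
print(extended)  # ['alice', 'bob', 'bill', 'sally']
```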

We can find the position of an item using index(). Remember, lists start counting at 0:

In [27]:
print "'sally' is at index:", names.index('sally') 

'sally' is at index: 5


We can use insert() to put an item at a specific position in the list

In [28]:
names.insert(2, 'mike')
names

['alice', 'bob', 'mike', 'eve', 'bob', 'bill', 'sally']

pop() can be used for a couple of things. If we simply need to remove the last item from the list, we can call it by itself:

In [29]:
names.pop()
names

['alice', 'bob', 'mike', 'eve', 'bob', 'bill']

But, we can also keep that last item in another variable:

In [30]:
last_person = names.pop()
print names
print last_person

['alice', 'bob', 'mike', 'eve', 'bob']
bill


remove() will remove the first occurrence of the argument we give it. Notice that alice and mike are now next to each other and the last bob is still in the list:

In [31]:
names.remove('bob')
names

['alice', 'mike', 'eve', 'bob']

reverse() does pretty much what you'd expect it to

In [32]:
names.reverse()
names

['bob', 'eve', 'mike', 'alice']

As does sort()

In [33]:
names.sort()
names

['alice', 'bob', 'eve', 'mike']
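
One thing worth knowing is that sort() and reverse() change the list in place and return None. If we want to keep the original order around, the built-in sorted() function returns a new list instead. A small sketch with a made-up `guests` list:

```python
guests = ['mike', 'alice', 'eve', 'bob']

alphabetical = sorted(guests)  # new list; guests is left untouched
print(alphabetical)  # ['alice', 'bob', 'eve', 'mike']
print(guests)        # ['mike', 'alice', 'eve', 'bob']

result = guests.sort()  # sorts in place and returns None
print(result)  # None
print(guests)  # ['alice', 'bob', 'eve', 'mike']
```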

## Interlude: Index notation

Before we move on to our discussion of tuples I'd like to discuss a common way to select items from objects. If we know the index of an item we can select it like this:

In [34]:
print names

['alice', 'bob', 'eve', 'mike']


In [35]:
print names[0]

alice


This works for other types as well, such as strings:

In [36]:
'alice'[3]

'c'
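
The same notation also accepts negative indexes, which count from the end, and slices of the form `[start:stop]`, which select a range of items up to but not including `stop`. A short sketch, using a made-up `letters` list:

```python
letters = ['a', 'b', 'c', 'd']

print(letters[-1])   # d  (last item)
print(letters[1:3])  # ['b', 'c']  (index 1 up to, but not including, 3)
print('alice'[:2])   # al  (slices work on strings too)
```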

## Tuples

Tuples are a bit like lists but have some very important differences. First let's take a look at how they are similar:

- ordered
- series of things separated by commas
- can be of any length
- can be a mix of any type of things

It'll be easier to show their differences through example. Normally we use parentheses () to show a tuple, but in reality any object followed by a comma is a tuple.

In [37]:
a = 1,
b = 'two',
c = 3.0,
print type(a)
print type(b)
print type(c)

<type 'tuple'>
<type 'tuple'>
<type 'tuple'>


But Python will add the parentheses for us when it displays them:

In [38]:
print a, b, c

(1,) ('two',) (3.0,)


Probably the most important difference between a list and a tuple has to do with 'immutability.' Let's take a look at an example:

In [39]:
names = ['alice', 'bob']
people = ('alice', 'bob')
print names
print people

['alice', 'bob']
('alice', 'bob')


So far, not much difference. But let's say that we wanted to get rid of bob and replace him with eve.

In [40]:
names[1] = 'eve'
names

['alice', 'eve']

In [41]:
people[1] = 'eve'

TypeError: 'tuple' object does not support item assignment

Uh-oh, Python has told us that the tuple does not allow us to mutate (change) an item the way we can with a list. In other words, lists are mutable and tuples are immutable.

Let's see what methods we have available to us for tuples:

In [42]:
for i in dir(()):
    if not i.startswith('_'):
        print i

count
index


Because tuples are immutable, they don't have many built-in methods: there is nothing like append() or remove() that would change the tuple in place.
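
Since we can't change a tuple, the usual way to "replace" an item is to build a new tuple from pieces of the old one. This is a sketch with made-up names (`team`, `new_team`); the original tuple is left as it was.

```python
team = ('alice', 'bob')

# Take everything before index 1 and add a one-item tuple to it.
new_team = team[:1] + ('eve',)

print(new_team)  # ('alice', 'eve')
print(team)      # ('alice', 'bob')

# The two tuple methods work just like their list counterparts.
print(new_team.count('eve'))  # 1
print(new_team.index('eve'))  # 1
```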

## Dictionaries

So the data types we've seen so far are great for collections of things, but there are times when we have pieces of information that are related in some way and we'd like to keep track of those relationships.

Let's start off with an example:

In [43]:
eng_to_spn = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
eng_to_spn

{'one': 'uno', 'three': 'tres', 'two': 'dos'}

Here we have a relationship between pairs of strings. The pairs are separated from each other by commas, and within each pair the two objects are separated by a ':'. The object to the left of the ':' is called the **key**, and it is the thing we use to select a relationship from the dictionary. The object to the right of the ':' is the **value**.

So we have a relationship that can be described as English numbers : Spanish numbers.

The whole collection of these pairs is the dictionary. We represent dictionaries in Python with curly brackets {}.

Let's try picking out some data from the dictionary:

In [44]:
eng_to_spn['one']

'uno'

Good, when I use a key to select from the dictionary I get the value associated with that key as a response. Let's try another way:

In [45]:
eng_to_spn[0]

KeyError: 0

An important thing to note about dictionaries is that, while they are similar to lists and tuples, they are **unordered**. Because of that we can't select the *first* item using the 0 index like we could with a list or tuple; here 0 is simply treated as a key that doesn't exist.

As a matter of fact, iterating over a dict will produce its keys in an arbitrary order:

In [46]:
for i in eng_to_spn:
    print i

three
two
one
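
If we need a predictable order, one option is to loop over the sorted keys instead. The dictionary below is a fresh copy of the example data under a made-up name, so the sketch stands on its own.

```python
numbers_es = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# sorted() returns the keys as an alphabetically ordered list.
ordered_keys = sorted(numbers_es)
print(ordered_keys)  # ['one', 'three', 'two']

for key in ordered_keys:
    print(key + ' -> ' + numbers_es[key])
```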


Let's take a look at the methods available to us for dictionaries:

In [47]:
for i in dir({}):
    if not i.startswith('_'):
        print i

clear
copy
fromkeys
get
has_key
items
iteritems
iterkeys
itervalues
keys
pop
popitem
setdefault
update
values
viewitems
viewkeys
viewvalues


copy() will return a "shallow copy" of the dictionary. I won't get into detail here but if you'd like more information see: http://stackoverflow.com/a/3975388

In [48]:
eng_to_spn2 = eng_to_spn.copy()
eng_to_spn2

{'one': 'uno', 'three': 'tres', 'two': 'dos'}
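
As a rough illustration of what "shallow" means: the copy gets its own set of keys, but the values themselves are shared with the original. The data and names below are made up for the example.

```python
original = {'one': ['uno']}
duplicate = original.copy()

# Adding a key only affects the copy.
duplicate['two'] = ['dos']
print('two' in original)  # False

# Changing a shared value in place shows up in both dictionaries.
duplicate['one'].append('un')
print(original['one'])  # ['uno', 'un']
```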

fromkeys() will take the keys from one dict and make a new dict with the same keys, each set to the value that we specify:

In [49]:
eng_to_spn3 = eng_to_spn.fromkeys(eng_to_spn, 'english')
eng_to_spn3

{'one': 'english', 'three': 'english', 'two': 'english'}

get() will pull the value from a dictionary:

In [50]:
eng_to_spn.get('one')

'uno'

What's useful about the get() method is that we can specify a default value in the case that what we are asking for doesn't exist yet in the dictionary. This can avoid errors:

In [51]:
eng_to_spn['four']

KeyError: 'four'

In [52]:
print eng_to_spn.get('four', None)

None
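
A typical use of the default is counting things, where a missing key should simply mean zero. Here is a small sketch that counts the letters in a word; the `counts` dictionary is made up for the example.

```python
counts = {}
for letter in 'hello':
    # If the letter hasn't been seen yet, start from 0 instead of raising KeyError.
    counts[letter] = counts.get(letter, 0) + 1

print(counts['l'])         # 2
print(counts.get('z', 0))  # 0
```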


We can also ask if a key exists using has_key():

In [53]:
eng_to_spn.has_key('four')

False
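
The `in` operator asks the same question and is the more common spelling. It also keeps working on Python 3, where has_key() was removed. The dictionary below is a fresh copy of the example data under a made-up name.

```python
numbers_es = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

print('one' in numbers_es)   # True
print('four' in numbers_es)  # False
print('uno' in numbers_es)   # False, because `in` checks keys, not values
```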

We can get the pairs as a list of tuples using the items() method:

In [54]:
eng_to_spn.items()

[('three', 'tres'), ('two', 'dos'), ('one', 'uno')]
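
Because each pair is a tuple, we can unpack it into two names while looping. As a sketch, this builds the reverse dictionary (Spanish to English) from a fresh copy of the example data; the names are made up for the example.

```python
numbers_es = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# Unpack each (key, value) tuple and swap the two around.
spn_to_eng = dict((spn, eng) for eng, spn in numbers_es.items())

print(spn_to_eng['dos'])  # two
```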

iteritems() gives us an iterator that we can call .next() on. This is valuable when we don't want to build the entire list of items in memory but still want to iterate through them.

In [55]:
items = eng_to_spn.iteritems()
print items.next()
print items.next()

('three', 'tres')
('two', 'dos')


We can do the same with the keys using iterkeys():

In [56]:
keys = eng_to_spn.iterkeys()
print keys.next()
print keys.next()

three
two


We can remove a key and return the value using pop()

In [57]:
three = eng_to_spn.pop('three')
print three
eng_to_spn

tres


{'one': 'uno', 'two': 'dos'}

popitem() will remove an association and return it as a tuple, but you don't get to pick which item you'd like to pop out!

In [58]:
anything = eng_to_spn.popitem()
print anything
eng_to_spn

('two', 'dos')


{'one': 'uno'}

setdefault() works a bit like get() but will set the value for us if it doesn't exist in the dictionary:

In [59]:
eng_to_spn.setdefault('four', 'quatro')
eng_to_spn

{'four': 'quatro', 'one': 'uno'}
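
Two details are worth a quick sketch: setdefault() leaves an existing value alone, and because it returns the stored value it is handy for grouping items into lists. All names and data here are made up for the example.

```python
numbers_es = {'one': 'uno'}

# The key already exists, so the stored value is returned unchanged.
returned = numbers_es.setdefault('one', 'something else')
print(returned)  # uno

# Group names by first letter, creating each list the first time it's needed.
groups = {}
for name in ['alice', 'bob', 'bill', 'eve']:
    groups.setdefault(name[0], []).append(name)

print(groups['b'])  # ['bob', 'bill']
```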

update() allows us to add values from another dictionary:

In [60]:
new_numbers = {'five': 'cinco', 'six': 'seis'}
eng_to_spn.update(new_numbers)
eng_to_spn

{'five': 'cinco', 'four': 'quatro', 'one': 'uno', 'six': 'seis'}
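
Keep in mind that update() also overwrites the value of any key that already exists. In this sketch, with made-up dictionaries, it both adds a new pair and replaces 'quatro' with 'cuatro', the standard Spanish spelling.

```python
base = {'one': 'uno', 'four': 'quatro'}
corrections = {'four': 'cuatro', 'five': 'cinco'}

base.update(corrections)

print(base['four'])  # cuatro  (existing key: value replaced)
print(base['five'])  # cinco   (new key: pair added)
print(len(base))     # 3
```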

We can see all the values from a dictionary using values()

In [61]:
eng_to_spn.values()

['quatro', 'cinco', 'seis', 'uno']

These next methods, viewitems(), viewkeys() and viewvalues(), each return a dictionary view object. The Python docs discuss their purpose: https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects

> The objects returned by dict.viewkeys(), dict.viewvalues() and dict.viewitems() are view objects. They provide a dynamic view on the dictionary’s entries, which means that when the dictionary changes, the view reflects these changes.
> Dictionary views can be iterated over to yield their respective data, and support membership tests:

In [62]:
eng_to_spn.viewitems()

dict_items([('four', 'quatro'), ('five', 'cinco'), ('six', 'seis'), ('one', 'uno')])

In [63]:
eng_to_spn.viewkeys()

dict_keys(['four', 'five', 'six', 'one'])

In [64]:
eng_to_spn.viewvalues()

dict_values(['quatro', 'cinco', 'seis', 'uno'])
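
To see what "dynamic" means, here is a sketch that takes a key view and then changes the dictionary afterwards. The `getattr` line is only a compatibility shim for this example: it picks viewkeys() on Python 2.7 and falls back to keys(), which already returns a view, on Python 3.

```python
numbers_es = {'one': 'uno'}

# Python 2.7 has viewkeys(); on Python 3, keys() returns the same kind of view.
get_key_view = getattr(numbers_es, 'viewkeys', numbers_es.keys)
key_view = get_key_view()
print(len(key_view))  # 1

# The view was created before this change, yet it reflects it.
numbers_es['two'] = 'dos'
print(len(key_view))      # 2
print('two' in key_view)  # True
```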

We skipped over clear(), but here's a good time to see what it does: clear the dictionary out!

In [65]:
eng_to_spn.clear()
eng_to_spn

{}