# Data Types

Duncan Callaway

This notebook is a review of variable types.  My objective is to review enough of the basics to put you in a position to understand how a pandas data frame works, which we'll cover soon.

## Object, method, function?

These terms come up repeatedly in Python.  

Q: What are they? 

Go [here for more](https://www.geeksforgeeks.org/difference-method-function-python/)

1. Object: virtually anything in Python.  
    1. Objects usually belong to classes and have attributes.
    1. For example, an object might belong to a bike class.  Its attributes would be material, wheel size, number of gears, etc.
2. Method: a function associated with an object. 
    1. For example, `object.method()` applies the method to the object. 
3. Function: performs an action using some set of input parameters.
    1. For example, `function(object)` applies the function to the object.
    
    
We'll talk about each of these in class today.
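
To make the three terms concrete, here is a minimal sketch built on the bike example above. The class `Bike`, the method `describe`, and the function `count_gears` are hypothetical names made up for illustration; they are not part of any library.

```python
# Hypothetical Bike class, following the bike example in the list above.
class Bike:
    def __init__(self, material, wheel_size, gears):
        # Attributes: data stored on each Bike object
        self.material = material
        self.wheel_size = wheel_size
        self.gears = gears

    def describe(self):
        # Method: a function attached to the class, called as object.method()
        return f"{self.material} bike, {self.wheel_size}-inch wheels, {self.gears} gears"


def count_gears(bike):
    # Function: stands alone and takes the object as an argument
    return bike.gears


my_bike = Bike("steel", 26, 21)   # my_bike is an object of class Bike
description = my_bike.describe()  # method call
n_gears = count_gears(my_bike)    # function call
print(description)
print(n_gears)
```

The method reaches the object's attributes through `self`, while the function only sees what is passed to it as an argument.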

## Basic data, or variable, types

Q: what are some variable types native to Python? 

1. `str` (string)
2. numbers
    1. `int`
    2. `float`
3. `bool`

**Strings**

In [1]:
name = 'Imogen'
type(name)

str

In [2]:
name[2]

'o'

An interesting note about strings: though we can index their contents, they are *immutable*, meaning you can't change individual elements.  Instead you need to do a wholesale reassignment.

In [6]:
name[4] = 'y'

TypeError: 'str' object does not support item assignment

In [7]:
name = name[0:4] + 'y' + name[5]
name

'Imogyn'
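
As a rough sketch of the same idea, the cell below rebuilds the string two ways. Using open-ended slices (`name[:4]`, `name[5:]`) and `str.replace` is my choice for illustration, not something the cells above require.

```python
name = 'Imogen'

# Slicing builds a brand-new string; the original is never modified in place.
new_name = name[:4] + 'y' + name[5:]

# str.replace also returns a new string (it swaps every 'e'; there is only one here).
replaced = name.replace('e', 'y')

print(new_name)
print(replaced)
print(name)  # the original string is unchanged
```

Either way, the name only changes when you assign the result back to the variable.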

**Floats**

In [8]:
x = 0.34
type(x)

float

**Int**

In [9]:
i = 1
type(i)

int

**Boolean**

In [10]:
ans = True
type(ans)

bool

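Note that Python is 'dynamically typed', meaning a variable takes on the type of whatever you set it equal to.

You don't need to define the type ahead of time, and you can later rebind the same name to a value of a different type.

But note that Python is also 'strongly typed', meaning it won't silently convert between unrelated types: an expression like `'1' + 1` raises a `TypeError`, and you have to convert explicitly instead.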
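
Here is a small sketch of both properties, assuming nothing beyond built-in types. The variable names are arbitrary.

```python
v = 1
first_type = type(v)    # v refers to an int here
v = 'one'               # dynamic typing: the same name can be rebound to a str
second_type = type(v)
print(first_type, second_type)

# Strong typing: Python will not silently mix a str and an int.
try:
    result = '1' + 1
except TypeError as err:
    result = None
    print('TypeError:', err)

# An explicit conversion is required instead.
total = int('1') + 1
print(total)
```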

## Data structures or "containers"

Q: what are some data structures native to Python? 

1. `dict`
2. `list`
3. `tuple`
4. `set`

Let's explore these a little further.  

### Lists

In [11]:
squares = [1, 5, 9, 16, 25] # do this in lecture
squares

[1, 5, 9, 16, 25]

We can index elements with the usual process:

In [12]:
print(squares[0]) # do this in lecture; zero is the first element
print(squares[-1]) # do this in lecture; -1 gives the last
print(squares[2:]) # do this in lecture; you can slice...

1
25
[9, 16, 25]


What does "mutable" mean?

...that you can change individual entries of the data structure.

Lists are mutable:

In [13]:
squares[1] = 4
squares

[1, 4, 9, 16, 25]

That's better! 

We can also append:

In [14]:
squares.append(6**2)
squares

[1, 4, 9, 16, 25, 36]

We can also extend a list by concatenating another list with `+`:

In [15]:
squares = squares + [49]
squares

[1, 4, 9, 16, 25, 36, 49]
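
One difference between the two approaches is worth a quick sketch: `append` changes the existing list in place, while `+` builds a new list. The second name `alias` is introduced here only to make that visible.

```python
squares = [1, 4, 9]
alias = squares            # a second name for the same list object

squares.append(16)         # modifies the existing list in place
print(alias)               # alias sees the change

squares = squares + [25]   # builds a new list and rebinds the name squares
print(squares)
print(alias)               # alias still refers to the older list
```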

Finally, we can nest lists:

In [16]:
x = [1,2,3]
y = [4,6]

In [17]:
A = [x,y]
A

[[1, 2, 3], [4, 6]]

Note that the number of elements in the two lists within the list did not need to be equal.  

Note also that the nested lists can hold different variable types:

In [18]:
a = ['abc', 'def']

In [19]:
A = [x,a]
A

[[1, 2, 3], ['abc', 'def']]

Careful though, it's tough to index these things.

Think about how you might get the entry 'def' from the list, by integer indexes.

In [20]:
A[1,1]

TypeError: list indices must be integers or slices, not tuple

That didn't work!  (We have to wait for numpy and pandas to be able to do that.)

In [21]:
A[1][1]

'def'

Note, pandas data frames are a lot like lists in this way.  You first index which "sublist" you want, then you index into elements of that.  Numpy arrays (as we'll see) are much easier to index.  Frankly, the indexing with pandas is pretty annoying, but there are some workarounds that we'll discuss.  
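
The two-step indexing can be sketched by splitting it into separate steps. The intermediate names `inner` and `entry` are just for illustration.

```python
x = [1, 2, 3]
a = ['abc', 'def']
A = [x, a]

inner = A[1]        # first pick the sublist
entry = inner[1]    # then pick the element inside it
print(entry)

# Looping over a nested list visits each sublist in turn,
# and the sublists can have different lengths.
lengths = [len(sub) for sub in A]
print(lengths)
```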

### Tuples

Tuples are like lists -- they hold multiple elements and you can index them.  The key difference is that they are *immutable*.

In [2]:
B = (1,2,3)
B

(1, 2, 3)

In [3]:
B[1]

2

In [4]:
B[1] = 3

TypeError: 'tuple' object does not support item assignment

Changing an individual element raises an error.

So why bother with tuples?
1. They are harder to work with, but
2. They prevent you from doing things you don't want to, and
3. They are more memory-efficient.  
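
A rough way to check the memory claim is `sys.getsizeof`, which reports the size of the container itself in bytes. The exact numbers depend on the interpreter and version (I'm assuming CPython here), so treat this as a sketch rather than a benchmark.

```python
import sys

as_list = [1, 2, 3]
as_tuple = tuple(as_list)

# Container sizes in bytes; the values are interpreter-specific (CPython assumed).
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)
print(list_size, tuple_size)

# To "change" a tuple you build a new one, here via a temporary list.
tmp = list(as_tuple)
tmp[1] = 3
changed = tuple(tmp)
print(changed)
print(as_tuple)  # the original tuple is untouched
```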

### Sets

Whereas lists are defined with square brackets, sets use curlies:

In [22]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}

What happened there? 

...I wrote orange and apple twice.  Let's look at the set:

In [23]:
print(basket)

{'apple', 'orange', 'pear', 'banana'}


Pretty smart -- no duplicates!

What might you do if you have a variable called basket but you don't know what's in it? 

One option is to query the set for membership:

In [24]:
'orange' in basket

True

In [25]:
'kiwi' in basket

False
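
Beyond membership tests, sets are handy for removing duplicates and comparing collections. In this sketch the list `fruits` reuses the basket contents, and `citrus` is a hypothetical second set added for illustration.

```python
fruits = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']

unique = set(fruits)          # duplicates are dropped
print(len(fruits), len(unique))

# sorted() returns a list, which gives a repeatable display order
in_order = sorted(unique)
print(in_order)

# Set operations between two sets
citrus = {'orange', 'lemon'}  # hypothetical second set
both = unique & citrus        # intersection: items in both sets
only_citrus = citrus - unique # difference: items only in citrus
print(both, only_citrus)
```

Printing a set directly can show its elements in any order, which is why the sketch sorts before displaying.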

Note, pandas data frames are like sets in that there are a number of entries that we can very quickly query.  But there are even more similarities with `dict` data structures.

### Dict

Now let's talk about dicts.  We are inching closer to the kind of structure we have in a pandas data frame.

Dicts associate one value with another.

In [26]:
cars = {'Prius':'Toyota', 'Volt':'Chevy', 'Model 3': 'Tesla'}
cars

{'Model 3': 'Tesla', 'Prius': 'Toyota', 'Volt': 'Chevy'}

In [27]:
cars['Prius']

'Toyota'

In this case 
1. 'Prius' is the **key** of the dict.
2. 'Toyota' is the associated **value**.

We can add new entries very easily:

In [28]:
cars['Leaf'] = 'Nissan'

In [29]:
cars

{'Leaf': 'Nissan', 'Model 3': 'Tesla', 'Prius': 'Toyota', 'Volt': 'Chevy'}

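Important note here:  you rely on the key, not a position, to pull data out of a dict.  Older versions of Python did not store entries in any particular order; since Python 3.7 a dict preserves insertion order, though the displays above show the keys sorted alphabetically.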
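
A few common ways to work with a dict by key can be sketched as follows. The lookup for `'Bolt'` and the fallback string `'unknown'` are made up for illustration.

```python
cars = {'Prius': 'Toyota', 'Volt': 'Chevy', 'Model 3': 'Tesla'}
cars['Leaf'] = 'Nissan'

# Check for a key before indexing, or use .get() to supply a fallback.
has_bolt = 'Bolt' in cars
maker = cars.get('Bolt', 'unknown')
print(has_bolt, maker)

# Loop over key/value pairs.
for model, make in cars.items():
    print(model, '->', make)

# Keys come back in insertion order on Python 3.7+.
models = list(cars)
print(models)
```

Indexing with a missing key, as in `cars['Bolt']`, raises a `KeyError`, which is why `in` and `.get()` are useful when you are not sure a key exists.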