# Data Types and Operations

## Fundamental Data Types

### Booleans

In [None]:
True

True

In [None]:
False

False

### Integers

Any whole number, whether positive, negative, or zero (..., -2, -1, 0, 1, 2, ...)

In [None]:
type(10)

int

### Floats

Any number with a decimal (floating) point

In [None]:
pi = 3.14
type(pi)

float

### Strings

Anything contained within single, double, or triple quotes.

In [None]:
sentence = 'Hello, how are you?'

In [None]:
type(sentence)

str

In [None]:
looks_like_an_integer = "25"
type(looks_like_an_integer)

str
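A string that looks like a number still behaves like text. As a small sketch (the names `as_text` and `as_number` are made up for illustration), the built-in `int()` function converts it into a real integer:

```python
as_text = "25"             # A string that happens to contain digits
as_number = int(as_text)   # int() converts the text into an integer

print(as_text + as_text)       # Strings are joined together: 2525
print(as_number + as_number)   # Integers are added: 50
```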

## Operators

### Arithmetic Operators

In [None]:
2 + 3 # Addition

5

In [None]:
'Hello' + ' World' # Works on strings too!

'Hello World'

In [None]:
10 - 7 # Subtraction

3

In [None]:
4 * 5 # Multiplication

20

In [None]:
'Go On ' * 5 # Multiplication works on strings too!

'Go On Go On Go On Go On Go On '

In [None]:
8 / 4 # Division - Notice how it returns a float, even for whole numbers

2.0

In [None]:
9 // 2 # Floor Division - Number of full divisions; discards the remainder

4

In [None]:
9 % 2 # The remainder after division – Useful for finding odd numbers

1

In [None]:
4 ** 2 # Exponent – Raises to the power of

16
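Floor division and the remainder fit together: multiplying the floor result back up and adding the remainder rebuilds the original number. A minimal sketch, using illustrative variable names that are not part of the lecture:

```python
a, b = 9, 2
quotient = a // b     # 4 full divisions
remainder = a % b     # 1 left over

print(quotient * b + remainder == a)  # True: the two results rebuild the original number
print(remainder == 1)                 # True: a remainder of 1 after dividing by 2 means 9 is odd
```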

### Assignment Operators

In [None]:
x = 10         # Assigning a number
colour = 'Red' # Assigning a string

In [None]:
# To make a counter
i = 1      # Start at 1
i = i + 1  # Add 1 to it
print(i)   # Print out the result

2


In [None]:
i += 1     # We can use this shorthand for combining Addition + Assignment
print(i)

3


In [None]:
i *= 2   # Works for other arithmetic operators too
print(i)

6


### Boolean Operators

Think of these like you are asking a question

In [None]:
x > 10  # Is x greater than 10?

False

In [None]:
x >= 10 # Greater than, or equal to

True

In [None]:
x == 10 # We have already used equals (=) for assignment, so we have to use double equals (==)

True

In [None]:
x != 1  # The exclamation mark means not

True

### Logical Operators

In [None]:
x = 15

In [None]:
x > 10 and x < 20 # In an "and" operation, both statements must be true

True

In [None]:
day = 'Saturday'

In [None]:
day == 'Saturday' or day == 'Sunday' # In an "or" operation, at least one statement needs to be true

True

In [None]:
not True # The "not" operation swaps True and False

False
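Python also offers shorter ways to write some of these tests. The sketch below is one possible style rather than the only one: it uses a chained comparison and the `in` operator (neither has been covered yet in this lecture), and the variable names are illustrative.

```python
x = 15
day = 'Saturday'

in_range = 10 < x < 20                        # Same as: x > 10 and x < 20
is_weekend = day in ('Saturday', 'Sunday')    # Same as: day == 'Saturday' or day == 'Sunday'
is_weekday = not is_weekend                   # "not" swaps the result

print(in_range, is_weekend, is_weekday)       # True True False
```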

## Collection Data Types

There are two kinds of collection data types: those arranged in order, and those arranged by a key. Tuples and lists have an order; these are sequence types.

### Tuple

In [None]:
weekend = ('Sat','Sun')  # Tuples are created using round brackets ()
weekend[1]               # We use the location in the sequence to retrieve values

'Sun'

In [None]:
weekend[1] = 'Mon'       # Tuples do not support item assignment

TypeError: 'tuple' object does not support item assignment

### List

In [None]:
weekday = ['Mon','Tue','Wed']  # Lists are created using square brackets []
weekday[0] = 'Thur'            # Also accessed by location, but do support re-assignment
weekday

['Thur', 'Tue', 'Wed']

### Dictionary

Since values in a dictionary do not have a sequence, we define a key for each.

In [None]:
capitals  = {'Ireland':'Dublin', 'France':'Paris'}  # Dictionaries are defined using curly brackets {}

In [None]:
capitals['Ireland']  # We retrieve values using a key instead of an index. Make sure your key is unique!

'Dublin'
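To see why keys need to be unique, here is a small sketch (the `duplicated` dictionary and the extra entries are made up for illustration). Assigning to a key that does not exist adds an entry, while repeating a key simply keeps the last value given.

```python
capitals = {'Ireland': 'Dublin', 'France': 'Paris'}
capitals['Spain'] = 'Madrid'      # Assigning to a new key adds a new entry
print(len(capitals))              # 3

duplicated = {'Ireland': 'Cork', 'Ireland': 'Dublin'}  # The same key written twice
print(duplicated)                 # {'Ireland': 'Dublin'}: only the last value survives

print(capitals.get('Italy'))      # None: get() avoids an error when a key is missing
```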

### Strings

In [None]:
sentence = 'This is actually a collection'
sentence[10]

't'

In [None]:
sentence[10] = 'h' # Just don't try to reassign values

TypeError: 'str' object does not support item assignment
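Since a string cannot be changed in place, the usual workaround is to build a new string. A minimal sketch, assuming we want the same change as the failed assignment above (the name `changed` is illustrative):

```python
sentence = 'This is actually a collection'

# Take everything before position 10, add the new letter, then everything after position 10
changed = sentence[:10] + 'h' + sentence[11:]

print(changed)    # This is achually a collection
print(sentence)   # The original string is untouched
```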

## Square Bracket Notation

We will start with a new list

In [None]:
students = [1,2,3,4,5,6,7,8]

### Accessing Item

In [None]:
type(students[0])  # Using square brackets returns the individual item

int

### Subsetting

In [None]:
type(students[2:5])  # Subsetting returns a list. Pass in the start and end parameters (the end is not included)

list

In [None]:
students[:5]      # Leaving out the first parameter means it will start at the start

[1, 2, 3, 4, 5]

In [None]:
students[2:]      # Leaving out the second parameter means it will run to the end

[3, 4, 5, 6, 7, 8]

In [None]:
students[:]       # Leaving out both will return the full list

[1, 2, 3, 4, 5, 6, 7, 8]

In [None]:
students[-3:-1]   # Negative values will count from the end of the list

[6, 7]

In [None]:
students[::2]     # A third parameter can be passed to set the step size

[1, 3, 5, 7]
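The same three parameters can be combined in other ways. The following sketch shows a few more patterns on the same list; the variable names are illustrative only.

```python
students = [1, 2, 3, 4, 5, 6, 7, 8]

last = students[-1]           # A single negative index counts back from the end
backwards = students[::-1]    # A negative step walks through the list in reverse
every_third = students[1::3]  # Start at index 1, then take every third item

print(last)         # 8
print(backwards)    # [8, 7, 6, 5, 4, 3, 2, 1]
print(every_third)  # [2, 5, 8]
```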

### Manipulating

We can use the assignment operations from before on lists too

In [None]:
students += [9,10]
students

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
students *= 2
students

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

### Copying

This is a warning about copying lists

In [None]:
class_a = [1,2,3,4,6]  # Say I create a list "class_a"
class_b = class_a      # And I want to create a copy of this list as "class_b"
class_b                # And I even have a look at "class_b", and seems to be copied perfectly

[1, 2, 3, 4, 6]

In [None]:
class_a[0] = 10        # But if I change one of the values in "class_a"
class_b                # And have a look at class_b, it is also changed here! We really are just pointing to "class_a"

[10, 2, 3, 4, 6]

In [None]:
class_b = class_a[:]   # We can use our subsetting trick in future to create a proper copy
class_b

[10, 2, 3, 4, 6]

In [None]:
class_b = list(class_a) # Or this list() function
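One way to check whether two names point to the same list is the `is` operator, which has not been covered so far in this lecture. A minimal sketch, with illustrative names `alias` and `separate`:

```python
class_a = [1, 2, 3, 4, 6]
alias = class_a          # A second name for the same list
separate = class_a[:]    # A new list holding the same values

class_a[0] = 10          # Change the original

print(alias is class_a)      # True: both names point to one list
print(separate is class_a)   # False: the copy is its own list
print(alias[0], separate[0]) # 10 1
```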

## Conditional Statements

Conditional statements run depending on a Boolean statement.

### If

In [None]:
x = 10
if x > 15:                # Here x > 15 is a Boolean statement. It will evaluate to False
    print('x is large')
elif x == 10:             # Here x == 10 is True. So the below statement executes
    print('x is 10')
elif x > 5:               # This is also True, but an earlier branch has already run, so this one is skipped
    print('x is medium')
else:                     # This would catch anything not met by previous tests. Notice there is no Boolean test.
    print('x is small')

x is 10


### While

This is our first loop. The indented code will keep running until the condition turns False.

In [None]:
x = 1                        # Start with x = 1
while x < 6:                 # This test evaluates as True. It will keep revisiting this
    print('x is ' + str(x))  # Print out whatever x is. (Notice: to add x to my sentence, I had to turn it into a string.)
    x += 1                   # Make sure you are doing something to make x < 6 turn False. Or you will be stuck in a loop!

x is 1
x is 2
x is 3
x is 4
x is 5


### For

In [None]:
for x in [1,2,3,4,5]:        # In for loops, you provide an object to iterate over. Much more stable!
    print('x is ' + str(x))

x is 1
x is 2
x is 3
x is 4
x is 5


In [None]:
for x in range(1,6):        # If you don't have a list or object, you can create one using the range() function
    print('x is ' + str(x))

x is 1
x is 2
x is 3
x is 4
x is 5
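Wrapping `range()` in `list()` is a quick way to see which values it produces. The sketch below uses illustrative variable names; note that the stop value itself is never included.

```python
one_to_five = list(range(1, 6))     # The stop value 6 is not included
from_zero = list(range(5))          # With one value, counting starts at 0
in_steps = list(range(0, 10, 2))    # A third value sets the step size

print(one_to_five)  # [1, 2, 3, 4, 5]
print(from_zero)    # [0, 1, 2, 3, 4]
print(in_steps)     # [0, 2, 4, 6, 8]
```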


## Functions

Functions are a way of packaging up code. They take some input (arguments), run some code (body), and then return a value.

In [None]:
def sums_two(a, b):  # Here we take in two numbers, represented by "a" and "b"
    total = a + b    # We assign the sum of these two numbers to "total"
    return total     # And now we return the value of "total" as the result of the function

In [None]:
sums_two(10,20)

30
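Functions can wrap up any of the code we have seen so far, including conditional statements. As a sketch, here is a hypothetical function `describe` (not part of the lecture) that packages a simplified version of the earlier if/elif/else test:

```python
def describe(x):
    """Return a short description of the number x."""
    if x > 15:
        return 'large'
    elif x > 5:
        return 'medium'
    else:
        return 'small'

print(describe(10))  # medium
print(describe(20))  # large
```

Because the logic now lives in one place, it can be reused with any number instead of being rewritten each time.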

## Object-Oriented Programming (OOP)

So far, we have covered data types and functions. In object-oriented programming (OOP), data and functions can be packaged up inside objects. When inside an object, data values are called attributes and functions are called methods. You access them using a dot (.). Every object type has different attributes and methods associated with it, depending on what would be useful.

### Attributes

None of the objects we have created so far have attributes associated with them (data packaged inside them). So we will revisit attributes later, when we have more complex data structures.

### Methods

For this example, we will take a string. In OOP it is considered an object with loads of useful functions (methods) packaged inside it.

In [None]:
'hello'.capitalize()  # Even though this is a string, it is an object with hidden methods inside it

'Hello'

In [None]:
name = 'Cian'
name.upper()   # This method turns the string to upper-case

'CIAN'

In [None]:
birthday = '10-12-89'  # Say we have a birthday given to us as a string, but we want to know the month.
birthday.split('-')    # The split() method will return a list, splitting out our string into separate elements

['10', '12', '89']

In [None]:
day, month, year = birthday.split('-') # We can even go as far as to unpack this list into three variables
print(month)

12
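Keep in mind that `split()` returns strings, so the month here is still text. A minimal sketch of converting it before doing any comparison or arithmetic (the name `month_number` is illustrative):

```python
birthday = '10-12-89'
day, month, year = birthday.split('-')

print(type(month))          # <class 'str'>: the pieces are still strings
month_number = int(month)   # Convert to an integer before comparing with numbers
print(month_number > 6)     # True: the birthday falls in the second half of the year
```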


## Packages

We can now look at importing code into our project. The same dot notation used for accessing methods inside objects can be used for accessing functions inside packages.

In [None]:
import numpy         # Here we are importing a chunk of code called Numpy (Numerical Python). It is full of useful functions
numpy.mean(students) # This function will get the average of our list

5.5

In [None]:
import numpy as np  # It is very common to give these packages an alias, here "np" is used
np.max(students)    # This function will get the max value from our list

10

## Numpy

Remember when we tried to run our operations on our lists? We could only treat a list like a sequence or a word (double the whole thing, or tack values on to the end of it), but we couldn't operate on its individual elements. Converting our list into a NumPy array will allow us to do all the operations we looked at earlier in the lecture.

In [None]:
students = [1,2,3,4,5,6,7,8]      # Let's define our list again
student_arr = np.array(students)   # Using the array() function from our NumPy package
student_arr                       # Looks identical, except it is called array

array([1, 2, 3, 4, 5, 6, 7, 8])

### Arithmetic Operations

In [None]:
student_arr * 2   # Except this time, doubling it doesn't double the length of it, it doubles each element in it

array([ 2,  4,  6,  8, 10, 12, 14, 16])

In [None]:
student_arr ** 2  # And all our arithmetic operators from before work on it. Here we square all the elements

array([ 1,  4,  9, 16, 25, 36, 49, 64])
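To make the difference concrete, the sketch below runs the same operation on a list and on an array side by side. It assumes NumPy is installed; the names `doubled_list` and `doubled_arr` are illustrative.

```python
import numpy as np

students = [1, 2, 3, 4, 5, 6, 7, 8]
student_arr = np.array(students)

doubled_list = students * 2      # Repeats the whole list: 16 items
doubled_arr = student_arr * 2    # Doubles each element: still 8 items

print(len(doubled_list), len(doubled_arr))  # 16 8
print(student_arr + student_arr)            # Arrays of the same length are added element by element
```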

### Boolean Operations

In [None]:
student_arr > 3    # The same goes for Boolean operations

array([False, False, False,  True,  True,  True,  True,  True])

### Filtering

A very handy use of the Boolean array above, is that we can pass it into our original array. This will knock out any values where the array is False

In [None]:
over_three = student_arr > 3  # Here we create our Boolean array, and set it as a variable
student_arr[over_three]       # Passing this into our original array has the effect of filtering the values

array([4, 5, 6, 7, 8])

In [None]:
student_arr[student_arr > 3]  # This can all be done in one line. This is very common notation

array([4, 5, 6, 7, 8])
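Filters can also combine several conditions. As a sketch (assuming NumPy is installed): with arrays, `&` is used in place of `and`, and each condition needs its own round brackets. The names `middle` and `evens` are illustrative.

```python
import numpy as np

student_arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Keep values that are greater than 3 AND less than 7
middle = student_arr[(student_arr > 3) & (student_arr < 7)]

# Keep values with no remainder after dividing by 2 (the even numbers)
evens = student_arr[student_arr % 2 == 0]

print(middle)   # [4 5 6]
print(evens)    # [2 4 6 8]
```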

### Methods and Attributes

Remember we said we would revisit attributes, once we got to a data structure that had them. NumPy arrays have attributes (data) associated with them. Notice there are no roundy brackets. That's because here we are retrieving a data type (integer, in this case), not calling a method.

In [None]:
student_arr.size  # This attribute has the total number of elements in the array

8

In [None]:
student_arr.mean() # This method returns the average of the array

4.5
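A few more attributes and methods on the same array, as a quick sketch (assuming NumPy is installed). Attributes are written without brackets; methods are called with them.

```python
import numpy as np

student_arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

print(student_arr.shape)  # (8,)  attribute: the length along each dimension
print(student_arr.ndim)   # 1     attribute: the number of dimensions
print(student_arr.sum())  # 36    method: adds up all the elements
print(student_arr.max())  # 8     method: returns the largest element
```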

## Two-Dimensional Objects

So far, all the objects we have been working with have been one dimensional. For instance, a list of numbers or names. What we really want to get to, is some sort of a table object. This is done by creating a two-dimensional object. It sounds complicated, but it just means layering what we have already. For instance, a list of lists.

### List of Lists

Here we are going to create another list with three elements, but each element is going to be a list itself.

In [None]:
list_of_lists = [[1,2,3],[4,5,6],[7,8,9]]  # We can create a list of lists
list_of_lists

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [None]:
list_of_lists[0]   # Using the square bracket notation we can access the first element (first list)

[1, 2, 3]

In [None]:
list_of_lists[0][2] # We can chain these together to access individual elements

3

### 2D Arrays

Using NumPy's array() function, we can turn our 2D lists into 2D arrays.

In [None]:
np.array(list_of_lists)  # This is immediately more recognisably two-dimensional

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
two_d_arr = np.array(list_of_lists)  # Converting our 2D list into a 2D NumPy array
two_d_arr[0][2]  # Accessing items just like before

3

In [None]:
two_d_arr[0,2]   # But now we have a more convenient short-hand for it (rows, columns)

3

In [None]:
two_d_arr[0:2,0:2] # And we can subset our rows and columns as before

array([[1, 2],
       [4, 5]])
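The (rows, columns) shorthand also makes it easy to pull out a whole row or a whole column by using a bare colon for the other dimension. A minimal sketch, assuming NumPy is installed, with illustrative names:

```python
import numpy as np

two_d_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_row = two_d_arr[0, :]    # Row 0, all columns
second_col = two_d_arr[:, 1]   # All rows, column 1

print(first_row)        # [1 2 3]
print(second_col)       # [2 5 8]
print(two_d_arr.shape)  # (3, 3): three rows and three columns
```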

So this is great: we have a 2D object that we can slice and filter by rows and columns. If your data is entirely numeric, this could be just what you need. But what if we are dealing with more descriptive datasets, like our `brics` table with information about country capitals, populations, GDP, etc.?

In [None]:
brics_list = [['BR','RU','IN','CH','SA'],
              ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
              ['Brasilia', 'Moscow', 'New Dehli', 'Beijing', 'Pretoria'],
              [8.516, 17.10, 3.286, 9.597, 1.221],
              [200.4, 143.5, 1252, 1357, 52.98]]

brics_arr = np.array(brics_list)
brics_arr

array([['BR', 'RU', 'IN', 'CH', 'SA'],
       ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
       ['Brasilia', 'Moscow', 'New Dehli', 'Beijing', 'Pretoria'],
       ['8.516', '17.1', '3.286', '9.597', '1.221'],
       ['200.4', '143.5', '1252', '1357', '52.98']], dtype='<U12')
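Notice the quotes around the numbers in the output above: because an array holds a single data type, mixing text and numbers turns everything into strings. The sketch below uses a cut-down, two-row version of the table and assumes NumPy is installed; `astype(float)` converts one row back into numbers before any arithmetic.

```python
import numpy as np

# A smaller version of the table: one row of text, one row of numbers
mixed_arr = np.array([['Brazil', 'Russia', 'India', 'China', 'South Africa'],
                      [200.4, 143.5, 1252, 1357, 52.98]])

print(mixed_arr.dtype.kind)              # U: every value is stored as text

population = mixed_arr[1].astype(float)  # Convert the numeric row back to floats
print(population.max())                  # 1357.0
```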

If I want to pull out the capitals, I can do it the same as before, but I have to know which row number that is.

In [None]:
brics_arr[2]

array(['Brasilia', 'Moscow', 'New Dehli', 'Beijing', 'Pretoria'],
      dtype='<U12')

And to pull out the capital of India, we would also need to know that India is in the 3rd column.

In [None]:
brics_arr[2,2]

'New Dehli'

This could be fine for you, depending on the dataset. However, most of the time it would be great to be able to label our columns and rows. This would make it easier to pull out data from it. To do this, we take advantage of the dictionary!

### Dictionary of Lists

Dictionaries give us the ability to apply a label to our columns. This means we can access the columns by a word rather than just a number (as with arrays).

In [None]:
brics_dict = {'label':      ['BR','RU','IN','CH','SA'],
              'country':    ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
              'capital':    ['Brasilia', 'Moscow', 'New Dehli', 'Beijing', 'Pretoria'],
              'area':       [8.516, 17.10, 3.286, 9.597, 1.221],
              'population': [200.4, 143.5, 1252, 1357, 52.98]}
brics_dict

{'label': ['BR', 'RU', 'IN', 'CH', 'SA'],
 'country': ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
 'capital': ['Brasilia', 'Moscow', 'New Dehli', 'Beijing', 'Pretoria'],
 'area': [8.516, 17.1, 3.286, 9.597, 1.221],
 'population': [200.4, 143.5, 1252, 1357, 52.98]}

From here we can create a sort of labelled array. This is called a DataFrame. Just like how a list of lists could be turned into an array, a dictionary of lists can be turned into a DataFrame. The package we will use to do this is called `Pandas`.

### DataFrames

In [None]:
import pandas as pd

In [None]:
brics_df = pd.DataFrame(brics_dict)
brics_df

| | label | country | capital | area | population |
|---|---|---|---|---|---|
| 0 | BR | Brazil | Brasilia | 8.516 | 200.4 |
| 1 | RU | Russia | Moscow | 17.1 | 143.5 |
| 2 | IN | India | New Dehli | 3.286 | 1252.0 |
| 3 | CH | China | Beijing | 9.597 | 1357.0 |
| 4 | SA | South Africa | Pretoria | 1.221 | 52.98 |


Finally, we have something that actually looks like a table! It is similar to the array from before, but we have labelled the columns. We can use these labels to access them (much like in a dictionary).

In [None]:
brics_df['capital']

0     Brasilia
1       Moscow
2    New Dehli
3      Beijing
4     Pretoria
Name: capital, dtype: object
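The Boolean filtering we used on NumPy arrays carries over to DataFrames. As a brief sketch (assuming pandas is installed, and using a cut-down version of the table with only two columns), we can keep the rows where the population value is over 1000:

```python
import pandas as pd

# A smaller version of the brics table, for illustration only
small_df = pd.DataFrame({'country':    ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
                         'population': [200.4, 143.5, 1252, 1357, 52.98]})

over_1000 = small_df['population'] > 1000   # A Boolean column, one True/False per row
large = small_df[over_1000]                 # Keep only the rows marked True

print(large['country'].tolist())            # ['India', 'China']
```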

Next week, we will dive deeper into DataFrames and the `Pandas` package.