# Lab 1: Python Basics

**OBJECTIVES**: This lab introduces data types and data structures, and will get you comfortable working with Jupyter Notebooks. Your lab is due in one week and should be submitted to me before the start of the next lab. Submit it by email as a Python notebook (.ipynb). 

Sections of this lab include the following:
- Data types
- Data structures
- Indexing and slicing
- Loops and simple functions

## Data Types

The data you'll be working with include numerical data, strings, boolean values, and dates and times. 

| Data Type | Description | 
|-|-|
| `None` | The Python "null" value|
| `str` | String type | 
| `bytes` | Raw bytes (e.g. ASCII text or binary data)|
| `float` | Double-precision (64-bit) floating point numbers|
| `bool` | A True or False value|
| `int` | Arbitrary precision signed integer | 

### Numeric types
`int` and `float` are the most common numeric data types you will be working with. Integers are **whole** numbers, with no decimal part. 

In [1]:
ival = 17
type(ival)

int

In [2]:
ival * 6  

102

Dividing two integers with `/` always yields a floating-point number, even when the result is a whole number (`4/2` gives `2.0`): 

In [3]:
3/2

1.5
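To see the difference between the two division operators, compare true division `/` with floor division `//` (both appear in the table of mathematical operators below):

```python
whole = 4 / 2       # true division always returns a float
print(whole, type(whole))   # 2.0 <class 'float'>

floored = 7 // 2    # floor division rounds down to an integer
print(floored)      # 3

negative = -7 // 2  # "rounds down" means toward negative infinity
print(negative)     # -4
```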

Floating point numbers are double-precision (64-bit) values. They can also be expressed with scientific notation:

In [4]:
fval = 7.243
fval2 = 6.78e-5

For `fval2`, the `e-5` means $\times 10^{-5}$, so `fval2` $= 6.78 \times 10^{-5} = 0.0000678$.

In [5]:
type(fval)

float

In [6]:
print(fval2)
print(format(fval2,'f'))

6.78e-05
0.000068
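Because floats are stored in binary using 64 bits, many decimal values (such as 0.1) can only be approximated. The sketch below, which uses the standard-library function `math.isclose`, shows why comparing floats with `==` can give surprising results:

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False: the tiny rounding error matters
print(math.isclose(total, 0.3))   # True: compares within a small tolerance
```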


### Strings

Strings are signified by either single quotes `'` or double quotes `"`. I don't really care which one you use; just be consistent.

In [7]:
a = 'one way of writing a string'
b = "another way"

Numeric data types can be converted to strings:

In [8]:
str(fval)

'7.243'

Strings can also be *concatenated*, or joined together

In [10]:
a = 'this is the first half ... '
b = 'and this is the second half'
a + b

'this is the first half ... and this is the second half'
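Concatenation with `+` only works between strings, so numbers must be converted with `str()` first. An *f-string* (a string prefixed with `f`) is a common alternative that inserts values directly; the temperature below is an illustrative value:

```python
temperature = 18.5   # an illustrative value

# 'Temp: ' + temperature fails, because temperature is a float
try:
    'Temp: ' + temperature
except TypeError as err:
    print('TypeError:', err)

# convert the number to a string first...
message = 'Temp: ' + str(temperature) + ' C'
print(message)     # Temp: 18.5 C

# ...or let an f-string do the conversion for you
message_f = f'Temp: {temperature} C'
print(message_f)   # Temp: 18.5 C
```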

You can also *query* the elements of a string using indexing (square brackets with a number indicating the position; more on this later):

In [11]:
name = 'Joseph'   # avoid 'str' as a variable name: it would hide the built-in str() function
name[0]

'J'

### Boolean

In Python (and in particular geospatial applications), the *boolean* data type (True or False) is **extremely** useful. `True` is equivalent to a value of 1, and `False` is equivalent to a value of zero. 

In [12]:
a = True
b = False
a == b  # This double equal sign is a logical test asking  "Does a = b"?

False

In [13]:
a = [0,1,2,False]
sum(a)

3

In [14]:
a = [0,1,2,True]

In [15]:
sum(a)

4

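To test multiple conditions, combine them with the `and` and `or` keywords. You will also see the operators `&` ("and") and `|` ("or"), which are required when comparing NumPy arrays element by element. Because `&` and `|` bind more tightly than comparisons such as `==` and `<`, each condition must then be enclosed in parentheses ():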

In [16]:
a = 5
b = 9

print((a == 5) & (b == 9))
print((a < 4) | (b > 10))

True
False
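The cell above uses `&` and `|` on single values. As a sketch of when each form is appropriate: plain Python code usually uses the `and`/`or` keywords, while NumPy arrays need `&`/`|` to combine conditions element by element (using `and`/`or` on a multi-element array raises a `ValueError`). The temperature values here are made up for illustration:

```python
import numpy as np

a = 5
b = 9
# for single values, the and / or keywords are the usual choice
print(a == 5 and b == 9)   # True
print(a < 4 or b > 10)     # False

# for arrays, & and | combine conditions element by element
temps = np.array([3, 12, 18, 25, 31])
mild = (temps > 10) & (temps < 30)
print(mild)                # [False  True  True  True False]
```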


### None

`None` is the null value type in Python.  

In [17]:
a = None

Check: is `a` equal to `None`? Use the `is` operator (rather than `==`) to test for `None`:

In [19]:
a is None

True

In [20]:
b = 5

In [21]:
# is the variable 'b' None?
b is None

False

*Note:* We will also use the NumPy constant `np.nan` ("not a number") to represent missing data. NumPy has a large number of built-in functions for dealing with NaNs.

In [22]:
import numpy as np
# np.nan is NumPy's "not a number" value (tip: type np.na and press Tab to autocomplete)
np.nan

# generate 10 random values
a = np.random.randn(10)

# insert a NaN at index 7
a[7] = np.nan

print(a) 


[ 0.97300399  2.52802012  0.57195135  0.18921283  0.16335686 -0.23537844
  0.59749448         nan -2.39269543 -1.7130667 ]


In [23]:
# calculate the mean of this array
print(a.mean())


nan


In [24]:
# calculate the mean using nanmean
np.nanmean(a)

0.07576656170403025

### Casting Between Data Types
As seen above, you can query the type of data you are working with. You can also convert (or *cast*) between data types using the functions `str()`, `float()`, and `int()`: 

In [25]:
a = '10'
float(a)

10.0

In [26]:
int(a)

10
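Casting is not always lossless. A short sketch of common surprises with the built-in `int()`, `float()`, and `round()`:

```python
truncated = int(3.9)        # int() truncates toward zero; it does not round
truncated_neg = int(-3.9)
rounded = round(3.9)        # use round() to round to the nearest integer
tie = round(2.5)            # exact ties round to the nearest even number
print(truncated, truncated_neg, rounded, tie)   # 3 -3 4 2

# int() cannot parse a string that contains a decimal point...
try:
    int('10.5')
except ValueError as err:
    print('ValueError:', err)

# ...so convert the string to a float first, then to an int
parsed = int(float('10.5'))
print(parsed)               # 10
```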

## Mathematical Operators
| Operator | Description | 
|-|-|
| `+` | Addition|
|`-`| Subtraction|
|`*` | Multiplication|
|`/` | Division|
|`**`| Exponent (to the power of...)|
|`//`| Floor division | 
|`%`| Modulus (returns the remainder)|

In [None]:
# example: create two variables with values of 6 and 10 and add them together

In [None]:
# example: now divide them

In [None]:
# example: now raise the first to the power of the second

In [None]:
# example: what is the remainder if you divide them (use modulus)? 
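For reference, here is how each operator in the table behaves, using 7 and 3 (different values from the example cells, so those are still yours to fill in). The last line checks the identity that links `//` and `%`:

```python
a = 7
b = 3

print(a + b)    # 10
print(a - b)    # 4
print(a * b)    # 21
print(a / b)    # 2.333... (always a float)
print(a ** b)   # 343
print(a // b)   # 2 (floor division)
print(a % b)    # 1 (remainder)

# floor division and modulus fit together: (a // b) * b + a % b == a
print((a // b) * b + a % b == a)   # True
```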

## Data Structures

Data structures are how data are organized in Python. 

### Lists

Lists are the most commonly used data structure. A list is a sequence of data enclosed in square brackets, with elements separated by commas. Each element can be accessed by its index value (its position in the list).

An empty list is created by assigning `[]` to a variable. Lists can contain multiple data types (see above). 

In [27]:
a = []

In [28]:
print(type(a))

<class 'list'>


Sequences of data or strings can be assigned to lists: 

In [29]:
x = ['apple', 'orange']

In [30]:
y = [2, 3, 7, None]

Elements can be added (appended) to the end of a list using the `append()` method: 

In [31]:
y.append(x[0])
y

[2, 3, 7, None, 'apple']

Lists can also be concatenated: 

In [32]:
x + y

['apple', 'orange', 2, 3, 7, None, 'apple']

and sorted: 

In [33]:
x = [109,9,17,1]
x.sort()
x

[1, 9, 17, 109]
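Note that `.sort()` changes the list *in place* and returns `None`. The built-in `sorted()` function returns a new sorted list instead, leaving the original untouched:

```python
x = [109, 9, 17, 1]

# sorted() returns a new list; x itself is unchanged
y = sorted(x)
print(y)        # [1, 9, 17, 109]
print(x)        # [109, 9, 17, 1]

# .sort() reorders x in place and returns None
result = x.sort()
print(result)   # None
print(x)        # [1, 9, 17, 109]

# both accept reverse=True for descending order
descending = sorted(x, reverse=True)
print(descending)   # [109, 17, 9, 1]
```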

A *range* of integers can be created with the `range(N)` function, where N is the number of elements. **By default, ranges start at 0 and stop before N**, so the last element is N-1:

In [34]:
a = range(5)
print(a[0])
print(a[4])
print(a[5])

0
4


IndexError: range object index out of range
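`range()` also accepts optional start and step arguments, `range(start, stop, step)`; the `stop` value is never included. Wrapping a range in `list()` shows its contents:

```python
default = list(range(5))             # start defaults to 0, step to 1
print(default)                       # [0, 1, 2, 3, 4]

from_two = list(range(2, 10))        # start at 2, stop before 10
print(from_two)                      # [2, 3, 4, 5, 6, 7, 8, 9]

stepped = list(range(2, 10, 3))      # every third integer
print(stepped)                       # [2, 5, 8]

countdown = list(range(10, 0, -2))   # a negative step counts down
print(countdown)                     # [10, 8, 6, 4, 2]
```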

The NumPy function `np.arange(start, stop, step)` can also be used to create a range of values. As with `range()`, the `stop` value itself is not included:

In [35]:
np.arange(0,100,10)

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

### Tuples

A tuple is a fixed-length, immutable (unchangeable) sequence of objects. Objects are separated by commas, and can be surrounded by parentheses `(...)`. 

In [38]:
tup = 4,5,6
tup

(4, 5, 6)

In [39]:
tup = (4,5,6)
tup

(4, 5, 6)
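Unlike lists, tuples are *immutable*: once created, their elements cannot be changed. A short sketch of what happens if you try, and one way to build a modified copy:

```python
tup = (4, 5, 6)
print(tup[0])        # 4: tuples can be indexed like lists

try:
    tup[0] = 10      # but their elements cannot be reassigned
except TypeError as err:
    print('TypeError:', err)

# to "change" a tuple, convert it to a list, edit it, and convert back
as_list = list(tup)
as_list[0] = 10
new_tup = tuple(as_list)
print(new_tup)       # (10, 5, 6)
```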

#### Unpacking tuples

*Unpacking* refers to getting objects out of a data structure. To unpack a tuple, assign it to a comma-separated list of variables, one for each element:

In [None]:
a,b,c = tup

In [None]:
b

### Dictionaries
Dictionaries are another useful data structure, and are often encountered when importing data into data frames (e.g. with Pandas). They are constructed using curly braces `{...}` and contain pairs of `keys` and `values`:

In [1]:
# note the whitespaces and formatting - your code can (and should) run over multiple lines to be readable
MLB_team = {'Colorado' : 'Rockies',
            'Boston'   : 'Red Sox',
            'Minnesota': 'Twins',
            'Milwaukee': 'Brewers',
            'Seattle'  : 'Mariners'
           }

In [41]:
# query the type of data structure 
type(MLB_team)

dict

In [42]:
# access an element of a dictionary
MLB_team['Colorado']

'Rockies'

In [43]:
# or use the get function
MLB_team.get('Colorado')

'Rockies'
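A few other everyday dictionary operations, sketched with a shortened version of `MLB_team`: `.get()` with a default value, adding a new key, testing membership with `in`, and looping over key/value pairs with `.items()`:

```python
MLB_team = {'Colorado': 'Rockies',
            'Boston': 'Red Sox',
            'Minnesota': 'Twins'}

# square brackets raise a KeyError for a missing key, but .get() returns None
# (or a default value that you supply)
print(MLB_team.get('Toronto'))              # None
print(MLB_team.get('Toronto', 'unknown'))   # unknown

# add a new entry (or overwrite an existing one) by assigning to a key
MLB_team['Toronto'] = 'Blue Jays'

# test whether a key exists
print('Toronto' in MLB_team)   # True

# loop over key/value pairs
for place, team in MLB_team.items():
    print(place, '->', team)
```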

### Arrays
Arrays (which require NumPy) are critical structures for storing and working with geospatial datasets. They can be 1-D, 2-D, or multi-dimensional. Mathematical operations can be performed on every element quickly. 


In [44]:
import numpy as np

# generate some random data
data = np.random.randn(2,3)
data

array([[ 0.60948068,  1.57089076,  0.79434513],
       [ 1.3060725 , -0.62624289, -0.01873178]])

In [45]:
# multiply each element in the array by 10
data*10

array([[ 6.09480684, 15.70890764,  7.94345128],
       [13.06072497, -6.26242893, -0.18731779]])

In [46]:
# add two arrays together (their shapes must match, or be compatible under NumPy's broadcasting rules)
print(data.shape)

data + data

(2, 3)


array([[ 1.21896137,  3.14178153,  1.58869026],
       [ 2.61214499, -1.25248579, -0.03746356]])
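Strictly speaking, the shapes need only be *compatible*: NumPy's broadcasting rules let, for example, a 1-D array of length 3 be added to every row of a 2x3 array (genuinely incompatible shapes, such as (2, 3) and (2, 2), raise a `ValueError`). A minimal sketch with made-up values:

```python
import numpy as np

data = np.ones((2, 3))          # a 2x3 array of ones
row = np.array([10, 20, 30])    # a 1-D array with shape (3,)

# the row is "broadcast" (repeated) across both rows of data
result = data + row
print(result.shape)   # (2, 3)
print(result)
# [[11. 21. 31.]
#  [11. 21. 31.]]

# comparisons also work element by element and return a boolean array
print(row > 15)       # [False  True  True]
```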

## Indexing and Slicing

Indexing is a way of accessing individual objects in a data structure. In Python, *indexing* starts from 0: in the list `x` below, `'apple'` is at index 0, `'orange'` at index 1, and `'banana'` at index 2. To index a structure, use square brackets `[ ]`. 

In [2]:
x = ['apple','orange','banana']
print(x[0])
print(x[1])

apple
orange


Indexing can also count backwards from the end: the last element is at index -1, the second-to-last at -2, and so on:

In [51]:
x[-1]

'banana'

You can also select sections of different data structures with *slicing*, which consists of `start:stop` passed to the index operator `[]`. The slice includes the element at `start` but stops *before* the element at `stop`:

In [52]:
seq = [7,2,3,7,5,6,0,1]
seq[1:5]

[2, 3, 7, 5]

The number of elements in the slice is equal to `stop - start`:

In [53]:
len(seq[1:5]) # len() is a useful command!

4

Either the start or the stop can be omitted; the slice then defaults to the beginning or the end of the sequence, respectively:

In [54]:
seq[:5]

[7, 2, 3, 7, 5]

In [55]:
seq[3:]

[7, 5, 6, 0, 1]
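Slices also accept negative indices and an optional third value, the *step* (`start:stop:step`). A few examples using the same `seq`:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

last_three = seq[-3:]      # the last three elements: [6, 0, 1]
all_but_two = seq[:-2]     # everything except the last two: [7, 2, 3, 7, 5, 6]
every_second = seq[::2]    # every second element: [7, 3, 5, 0]
reversed_seq = seq[::-1]   # the whole sequence reversed: [1, 0, 6, 5, 7, 3, 2, 7]

print(last_three, all_but_two, every_second, reversed_seq)
```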

Note: dictionaries are indexed by *key*, not by position, so positional indexing fails: 

In [56]:
MLB_team[1]

KeyError: 1
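Look values up by key instead. If you really do need "the second entry", one option is to convert the keys to a list first; this relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onwards:

```python
MLB_team = {'Colorado': 'Rockies',
            'Boston': 'Red Sox',
            'Minnesota': 'Twins'}

# the usual way: look a value up by its key
print(MLB_team['Boston'])   # Red Sox

# positional access via a list of the keys (relies on insertion order, Python 3.7+)
keys = list(MLB_team.keys())
print(keys[1])              # Boston
print(MLB_team[keys[1]])    # Red Sox
```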

# Loops and Functions

## Loops
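One task that is often done in geospatial coding is applying the same operation to each dataset in a list. A `for` loop does this by running the indented block of code once for every element of a sequence. Here, `range(0,5)` provides the integers 0 to 4:
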
In [8]:
for i in range(0,5):
    print(i)

0
1
2
3
4


Another example: say you are working with census data, and you want to loop through each neighbourhood in the Vancouver area to examine the results.  

In [4]:
# create list of neighbourhoods
hoods = ['East Vancouver','West Vancouver','North Vancouver','West End','False Creek']

In [5]:
# get the length of the list
N = len(hoods)

# simple loop to print the name of each neighbourhood
for item in hoods:  # what does this line do???
    print(item)

East Vancouver
West Vancouver
North Vancouver
West End
False Creek


You can also iterate using indexing:

In [11]:
# a list of fruit
a = ['banana','apple','cherry','lime']

# length of the list
N = len(a)

# create a sequence with the length of the list and print each element 
for i in range(N):
    print(a[i])

banana
apple
cherry
lime
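When you need both the position and the element, the built-in `enumerate()` function provides both at once, which avoids the separate `range(len(a))` step:

```python
a = ['banana', 'apple', 'cherry', 'lime']

# enumerate() yields (index, element) pairs
for i, fruit in enumerate(a):
    print(i, fruit)
# 0 banana
# 1 apple
# 2 cherry
# 3 lime
```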


## Functions
Functions are probably the most important method of organizing code. When you import a library into Python, you can access all the functions contained in the library. 

For example: below, we import the numpy package/library as `np`. Functions within numpy can then be accessed using the `np.function()` convention. To get help on a function in Jupyter, use `np.function?` (or `help(np.function)`, which works in any Python session).

In [57]:
import numpy as np
np.sin?

If you are writing your own code and you need to repeat the same lines of code more than once, it's probably worth writing your own function. 

Functions are declared with the `def` keyword, and exited (or returned from) using the `return` keyword; a function without a `return` statement returns `None`. Multiple return statements are allowed. *Conditional* statements (e.g. `if`, `else`) are used to tell the function what to do. In Python, the colon (`:`) at the end of the `def` line and the indentation of the function body are critical: the function won't work without them. 

In [59]:
def my_function(x, y, z = 2):
    return z + x*y

In [61]:
my_function(6,2)

14

In [63]:
# use an if/else statement for flow control
def my_function2(x, y, z = 2):
    if z > 1: 
        return z * (x + y) 
    else:
        return z / (x + y)

In the examples above, the functions are named `my_function` and `my_function2`, `x` and `y` are *positional* arguments, and `z` is a *keyword* argument with a default value. To call either function, you need to specify `x` and `y`; `z` defaults to a value of 2 unless it is specified:

In [64]:
my_function2(5, 2, z = 3)

21

In [65]:
my_function2(2, 2, -1)

-0.25

In [66]:
my_function2(10,20) # z defaults to a value of 2

60

In [67]:
my_function2(3,6,11,2) # too many inputs

TypeError: my_function2() takes from 2 to 3 positional arguments but 4 were given
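Keyword arguments can also be passed *by name*, in any order. The sketch below redefines `my_function2` with a docstring (a string on the first line of the body), which is what `help(my_function2)` or `my_function2?` displays:

```python
def my_function2(x, y, z=2):
    """Return z * (x + y) if z > 1, otherwise z / (x + y)."""
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# arguments passed by name can appear in any order
print(my_function2(y=2, x=5, z=3))   # 21
print(my_function2(5, 2))            # 14 (z defaults to 2)

# display the signature and docstring
help(my_function2)
```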

### Global and local variables

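Functions can access variables that are either *global* or *local*. Variables assigned within a function are *local*: they exist only while the function runs and cannot be accessed once it has finished. Variables defined outside any function are *global*; a function can read them (and can modify mutable objects such as lists), as the two cells below show. In this course we will mostly avoid relying on global variables. 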

In [None]:
def my_func():
    temp = [] # assign variable inside the function
    # this is a loop! 
    for i in range(5):
        temp.append(i)

my_func()
temp

In [None]:
temp = [] # assign variable outside the function

def my_func():
    # this is a loop
    for i in range(5):
        temp.append(i)

my_func()
temp
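Rather than having a function modify a global variable, it is usually clearer to build the result locally and `return` it. A minimal sketch; the function name `make_list` is hypothetical:

```python
def make_list(n):
    """Return a new list containing the integers 0 to n-1."""
    temp = []              # local variable: only exists while the function runs
    for i in range(n):
        temp.append(i)
    return temp            # hand the finished list back to the caller

result = make_list(5)
print(result)   # [0, 1, 2, 3, 4]
```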

### User Input
Occasionally (e.g. for programming practice), you will want to get input from the user. The `input()` function always returns a string, which needs to be cast/converted if you plan on using it in calculations.

In [None]:
# ask user input
answer = input('What day of the week is it? ')
print("It's " + answer + " today!")

In [None]:
# ask user input, convert to a float
answer_cm = float(input('How tall are you (in cm)? '))
answer_m = answer_cm / 100   # convert centimetres to metres

# convert back to a string to use in the print statement
print('You must be ' + str(answer_m) + ' m tall to go on this ride.')
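If the user types something that is not a number, `float()` raises a `ValueError`. A minimal sketch of guarding against that with `try`/`except`, assuming the height is typed in centimetres; the helper name `cm_to_m` is hypothetical:

```python
def cm_to_m(text):
    """Convert a height typed in centimetres (a string) to metres.

    Returns None if the text is not a valid number.
    """
    try:
        return float(text) / 100
    except ValueError:
        return None

print(cm_to_m('175'))    # 1.75
print(cm_to_m('tall'))   # None
```

In the notebook you could call it as `cm_to_m(input('How tall are you (in cm)? '))`.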

# Exercises
For each question, add a new code cell below and input your answers. Feel free to comment your code, or use an additional markdown cell as needed for your answers. Save your file with your last name appended to the filename (e.g. 1_PythonBasics_Shea) and submit before the start of the next lab. 


1) Write a block of python code that calculates the area of a circle with a radius of 5, and prints the answer to the screen. [2 marks] 

In [None]:
from math import pi    # a hint to get you started

2) Create an array of 10 random numbers, sort them, and print the sorted values. [2 marks]

In [12]:
import numpy as np
# hint: type np. and press Tab to browse NumPy's functions

3) Using indexing, print the first two numbers in the array from question 2, and then print the last two numbers. [2 marks]

4) Create an array of numbers from 1 to 10 that goes up by twos, using the NumPy `arange` function. [2 marks]

5) Use the numpy functions `isnan` and `sum` to calculate the number of NaNs in the following 1-d array [2 marks]

In [None]:
a = np.zeros(10)
a[2:6] = np.nan

6) The array below contains nans. Calculate the sum of all the elements in this array (there are two ways to do this, either is acceptable). [2 marks]

In [None]:
a = np.random.randn(10)
a[4:6] = np.nan # assign nans to positions 4 and 5



7) Create a list that contains the first names of all the students in the class. Write a simple loop that goes through each element of the list and prints `'Hello _____!'`, where `_____` is the name of each student. [2 marks] 

8) Create a NumPy array that contains only zeroes and has 10 rows and 10 columns. Confirm the dimensions of the array using its `shape` attribute. [2 marks]

9) Look up the help information for the NumPy function `arange`, and describe what this function does and what its positional arguments are. Use this function to create an array of **floats** from 10 to 20. [2 marks]

10) Write a function that returns the first half of a string, but only if the string has an even length. For example, given 'ABCDEF', the function would return 'ABC'. If the string has an odd length, return 'Cannot compute'. [2 marks] 

11) You are driving a little too fast in a 60 km/h zone, and a police officer stops you. Write a function that takes your speed as an input, uses logical operators to determine whether you were speeding, and prints whether you get (1) no ticket, (2) a small ticket, or (3) a big ticket and suspension. [4 marks]
 - 60 km/h or less: print 'no ticket'
 - more than 60, up to 80 km/h: print 'small ticket' 
 - over 80 km/h: print 'ooh, big ticket and suspension'

Test your function with speeds of 64, 78, and 84. 

12) Create a dictionary structure that contains the following information: 

| Year | Month | Day | Temperature (°C) |
| - | - | - | - | 
| 2020 | 09 | 14 | 18 |
| 2020 | 09 | 15 | 15 |
| 2020 | 09 | 16 | 10 | 

Query the dictionary for the temperature on 15 September 2020, and print the statement 'It is currently XX C outside.' [3 marks]