# Programming with Python

## Session Overview

1) What is programming with Python?

2) Basic Python Math

3) Variable types

4) Importing packages

5) NumPy

6) Reading in a NetCDF file

7) For loops, if statements, and logicals

8) Creating Functions

9) Best Practices / Reading Errors / Debugging



## What is programming with Python?



### Jupyter Notebooks 📓

Often, code is written in a text editor and then run in a command-line interface.

<b> Jupyter Notebooks </b> allow us to write and run code within a single document. They also allow us to embed text alongside code.

![Alt text](python_programming/fig1.png)

<b> Visual Studio Code (VSCode) </b> is an easy-to-use development environment that has extensions for most major programming languages. We will be using VSCode for this workshop.

## Basic Mathematical Operations 



| Operation        | Operator | Example      | Value    |
|------------------|----------|--------------|----------|
| Addition         | `+`      | `2 + 3`      | `5`      |
| Subtraction      | `-`      | `2 - 3`      | `-1`     |
| Multiplication   | `*`      | `2 * 3`      | `6`      |
| Division         | `/`      | `7 / 3`      | `2.33333`|
| Remainder        | `%`      | `7 % 3`      | `1`      |
| Exponentiation   | `**`     | `2 ** 0.5`   | `1.41421`|

- An **expression** is a combination of values, operators, and functions that evaluates to some **value**
- We will enter our expressions in **code cells**
    - Hit **shift + enter (or shift + return) on your keyboard** or, 
    - Press the "Run" button in the toolbar


In [22]:
23

23

In [23]:
-15 + 2.718

-12.282

In [24]:
4 ** 3

64

### Python uses typical order of operations - PEMDAS

In [25]:
(2 + 3 + 4) / 3

3.0

In [26]:
(5 * 2) ** 3

1000
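
As a small sketch of how parentheses change a result under these rules (the variable names are only illustrative and are not used elsewhere in the lesson):

```python
# Multiplication binds tighter than addition, so parentheses change the result.
without_parens = 2 + 3 * 4      # 3 * 4 is evaluated first
with_parens = (2 + 3) * 4       # the sum is evaluated first

# Exponentiation binds tighter than a leading minus sign.
negated_square = -2 ** 2        # read as -(2 ** 2)
squared_negative = (-2) ** 2    # the negative number itself is squared

print(without_parens, with_parens, negated_square, squared_negative)
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.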

#### Activity 

In the cell below, write an expression that's equivalent to


$(19 + 6 \cdot 3) - 15 \cdot \left( \sqrt{100} \cdot \frac{1}{30} \right) \cdot \frac{3}{5} + 4^2 + \left(6 - \frac{2}{3} \right) \cdot 12$



In [10]:
# activity cell
(19 + 6 * 3) - 15 * (100 ** 0.5 * (1/30)) * (3/5) + 4**2 + (6 - (2/3)) * 12

114.0

## Variables

- A **variable** is a place to store a value so that it can be referred to later in our code. To define a variable, we use an **assignment statement** \
![Alt text](python_programming/fig2.png)

- An assignment statement changes the meaning of the **name** to the left of the = sign

- In the example above, zebra is bound to 9 (the value) not 23-14 (expression)
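
The binding described above can be sketched in a few lines. The second assignment is an addition for illustration; it shows that reassigning a name changes what it refers to from that point on.

```python
# The right-hand side is evaluated first; the name is bound to the result.
zebra = 23 - 14
print(zebra)           # the stored value, not the expression 23 - 14

# Reassigning changes what the name refers to from this point on.
zebra = zebra + 1
print(zebra)
```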


### Example
Before we use a name in an assignment statement, it has no meaning. After the assignment statement, it refers to the value assigned to it.

In [2]:
temp_in_f

NameError: name 'temp_in_f' is not defined

In [3]:
temp_in_c = 5
temp_in_f = temp_in_c * 9/5 + 32

In [4]:
temp_in_f

41.0

Any time we use `temp_in_f` in an expression, `41.0` is substituted for it.

In [None]:
temp_in_f * -4

-164.0

Note that the above expression **does not change** the value of `temp_in_f`, because we did not reassign `temp_in_f`

In [None]:
temp_in_f

41.0

#### Naming variables

- Give your variables helpful names so that you/your collaborators know what they refer to 
- Variable names can contain uppercase letters, lowercase letters, numbers, and underscores
    - they **cannot** start with a number
    - they are case sensitive!
    - no character limit!
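
One way to check these rules without triggering an error is sketched below. It assumes the standard-library `keyword` module and the string method `isidentifier`; the candidate names are made up for illustration, and the loop syntax is covered later in the session.

```python
import keyword

candidate_names = ['temp_in_f', 'Temp_In_F', '2nd_reading', 'sea level', 'class']

# str.isidentifier() checks the character rules (letters, digits, underscores,
# no leading digit). Reserved words such as `class` pass that check but still
# cannot be used as variable names, so keyword.iskeyword() screens them out.
results = {}
for name in candidate_names:
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, '->', results[name])
```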

Examples of **valid** but **poor** variable names:

In [None]:
six = 15

In [None]:
i_love_waves_999 = 60 * 60 * 24 * 365

Examples of assignment statements that are **valid** and use **good** variable names:

In [None]:
seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year

#### Variable Types

What's the difference?

In [None]:
4 / 2

2.0

In [None]:
5 - 3

2

To us, `2.0` and `2` are the same number. But to Python, they are values of two different types.

#### Two numeric variable types: `int` and `float`
- `int`: an integer of any size
- `float`: a number with a decimal point

##### For `int`:
- If you add, subtract, or multiply `int`s, the result is another `int`. The same holds for exponentiation as long as the exponent is a non-negative `int`
- `int` has arbitrary precision in Python, meaning that calculations will always be exact

In [None]:
7 - 15

In [None]:
2 ** 300

2037035976334486086268445688409378161051468393665936250636140449354381299763336706183397376

- Use `type()` to check the data type of a value

In [None]:
type(2 ** 300)

int
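
A short sketch of which operations keep an `int` an `int`. Floor division (`//`) is not in the operator table above and is included here as an extra for comparison with `/`.

```python
# Adding, subtracting and multiplying ints gives an int.
print(type(7 - 15), type(6 * 7))

# True division with / always gives a float, even when the result is whole.
print(4 / 2, type(4 / 2))

# Floor division with // keeps ints as ints.
print(7 // 3, type(7 // 3))

# A negative exponent is an exception to the "int stays int" pattern.
print(2 ** -1, type(2 ** -1))
```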

##### For `float`:
- A `float` is specified using a **decimal** point
- Might be printed using scientific notation

In [5]:
3.2 + 2.5

5.7

In [6]:
type(3.2 + 2.5)

float

In [7]:
# The result is in scientific notation: e+90 means "times 10 to the power of 90"
2.0 ** 300

2.037035976334486e+90
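
Unlike `int`, a `float` has limited precision, so some results are only approximately right. A minimal sketch, assuming the standard-library function `math.isclose` for tolerance-based comparison:

```python
import math

total = 0.1 + 0.2

print(total)                      # slightly different from 0.3
print(total == 0.3)               # an exact comparison fails
print(math.isclose(total, 0.3))   # comparing within a small tolerance succeeds
```

This is why comparisons between floats are usually written with a tolerance rather than `==`.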

##### Strings 🧶

- A string is a snippet of text of any length
- Strings are enclosed by either single quotes (') or double quotes (")

In [8]:
'woof'

'woof'

In [9]:
type('woof')

str

##### String arithmetic
When using the `+` symbol between strings, the operation is called **concatenation**

In [10]:
s1 = 'send'
s2 = 'swell 🌊'

In [11]:
s1 + s2

'sendswell 🌊'

In [12]:
s1 + ' ' + s2

'send swell 🌊'

In [13]:
s1 * 3

'sendsendsend'

##### String methods
- String methods are special functions for strings
- String methods are accessed with a `.` after the string
- Examples: `upper`, `title`, `replace`, but there are [many more](https://docs.python.org/3/library/stdtypes.html#string-methods)


In [14]:
blue_crush_string = '7 days until pipe masters'

In [15]:
blue_crush_string.title()

'7 Days Until Pipe Masters'

In [None]:
blue_crush_string.upper()

'7 DAYS UNTIL PIPE MASTERS'

In [None]:
blue_crush_string.replace('7', '6')

'6 days until pipe masters'

In [None]:
# len is not a method since it doesn't use dot notation
len(blue_crush_string)

25
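
String methods return a new string and leave the original untouched, which also means they can be chained. A brief sketch using the same string as above (the names `shouted` and `updated` are illustrative):

```python
blue_crush_string = '7 days until pipe masters'

# Each method returns a new string.
shouted = blue_crush_string.upper()

# Methods can be chained: replace runs first, then title.
updated = blue_crush_string.replace('7', '6').title()

print(shouted)
print(updated)
print(blue_crush_string)   # the original string is unchanged
```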

#### Converting between data types

- if you mix `int`s and `float`s in an expression, the result will be a **`float`**
- a value can be converted using the `int` and `float` functions
- any value can be converted to a string using `str`
- some strings can be converted to `int` and `float`

In [None]:
int(2.0 + 3)

5

In [None]:
str(3)

'3'

In [None]:
float('3')

3.0

In [None]:
int('silly string')

ValueError: invalid literal for int() with base 10: 'silly string'
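
A few conversion details are easy to trip over. The sketch below is illustrative only; the `try`/`except` pattern and the `None` fallback are choices made for this example, not part of the lesson's code.

```python
# int() truncates toward zero rather than rounding.
print(int(3.9), int(-3.9))

# A string holding a decimal number must go through float() first.
print(int(float('3.7')))

# Catch the ValueError when a string might not be numeric.
raw_value = 'silly string'
try:
    number = float(raw_value)
except ValueError:
    number = None   # illustrative fallback chosen for this sketch
print(number)
```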

#### A note on Python Functions
- Functions in Python work the same way math functions do
- inputs to functions are called arguments
- Python comes with a number of built-in functions such as `int`, `float`, and `str`
- **Calling** a function, or using a function, means asking the function to "run its recipe" on the given input
- Type `?` after a function's name to see its documentation, or use the `help` function

In [None]:
str?

Init signature: str(self, /, *args, **kwargs)
Docstring:     
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
Type:           type
Subclasses:     StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...

In [None]:
help(float)

Help on class float in module builtins:

class float(object)
 |  float(x=0, /)
 |
 |  Convert a string or number to a floating-point number, if possible.
 |
 |  Methods defined here:
 |
 |  __abs__(self, /)
 |      abs(self)
 |
 |  __add__(self, value, /)
 |      Return self+value.
 |
 |  __bool__(self, /)
 |      True if self else False
 |
 |  __ceil__(self, /)
 |      Return the ceiling as an Integral.
 |
 |  __divmod__(self, value, /)
 |      Return divmod(self, value).
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __float__(self, /)
 |      float(self)
 |
 |  __floor__(self, /)
 |      Return the floor as an Integral.
 |
 |  __floordiv__(self, value, /)
 |      Return self//value.
 |
 |  __format__(self, format_spec, /)
 |      Formats the float according to format_spec.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getnewargs__(self, /)
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __hash__(self, /)
 |      Return hash(self

#### Booleans
- When we compare two values, the result is either `True` or `False`
- There are only two possible Boolean values: `True` or `False`

**Comparison Operators**

| Symbol | Meaning                  |
|--------|--------------------------|
| `==`   | equal to                 |
| `!=`   | not equal to             |
| `<`    | less than                |
| `<=`   | less than or equal to    |
| `>`    | greater than             |
| `>=`   | greater than or equal to |

In [66]:
5 == 6

False

In [67]:
type(5 == 6)

bool

In [68]:
9 + 10 < 21

True
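
Booleans can be stored in variables and combined. A minimal sketch, assuming a temperature of 5 degrees Celsius and thresholds picked only for illustration; `and`, `or`, and `not` are Python's logical operators:

```python
temp_in_c = 5

# Comparisons produce booleans that can be stored in variables.
is_freezing = temp_in_c <= 0
is_mild = 0 < temp_in_c < 20        # comparisons can be chained

# and, or, not combine booleans.
print(is_freezing or is_mild)
print(not is_freezing and is_mild)

# == compares values, so an int and an equal float compare as equal.
print(2 == 2.0)
```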

#### Lists
- In Python, a list is used to store multiple values within a single value. To create a new list, use `[square brackets]`
- Lists are a sequence of any type of object

In [None]:
temp_list = [68, 73, 70, 74, 76, 73, 74]

Note that the **elements** in a list don't need to be unique

In [18]:
type(temp_list)

list

To find the average temperature, we can divide the **sum of the temperatures** by the **number of temperatures recorded**:

In [19]:
sum(temp_list) / len(temp_list)

72.57142857142857

Within a list, you can store elements of different types

In [20]:
mixed_temp = [68, 'sixty', 68.9, 62]
mixed_temp

[68, 'sixty', 68.9, 62]
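
Mixing types is allowed, but it limits what you can do with the list. The sketch below reuses `mixed_temp`; the error handling is an illustrative addition to keep the example from stopping on the error.

```python
mixed_temp = [68, 'sixty', 68.9, 62]

# Each element keeps its own type.
element_types = [type(value).__name__ for value in mixed_temp]
print(element_types)

# Arithmetic over a mixed list fails as soon as it reaches the string.
try:
    average = sum(mixed_temp) / len(mixed_temp)
except TypeError as error:
    average = None
    print('TypeError:', error)
```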

##### **HOWEVER...**
- Lists are **slow 🐌**.
- This causes a problem when working with large datasets
- To gain additional functionality, we need to import a library

## Importing Packages 

- Python doesn't have everything we need built in 
- we import **packages (AKA libraries)** through **import statements**
- Packages are collections of Python functions and values
- Syntax for calling functions: `package.function()`

In [21]:
import numpy as np # numpy is usually imported as np (but doesn't have to be)
from netCDF4 import Dataset

As seen above, instead of importing a complete library, we can import just the functions or classes we need by using:\
`from package import function`

We will use `Dataset` later on, so just import it for now
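
The three import styles can be compared side by side. This sketch uses the standard-library `math` package so that it runs without installing anything; the alias `m` is arbitrary.

```python
import math                 # import the whole package
import math as m            # the same package under a shorter alias
from math import sqrt       # import a single name from the package

print(math.sqrt(100))       # package.function()
print(m.sqrt(100))          # alias.function()
print(sqrt(100))            # the imported name on its own
```

All three calls run the same function; the styles differ only in how much you have to type and how obvious the origin of the function is to a reader.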

<b> Useful Packages: </b>

| Package   | Purpose                      |
|-----------|-----------------------------|
| numpy     | Numerical operations         |
| matplotlib| Plotting and visualization   |
| netCDF4   | Using netCDF files            |
| pandas    | Data analysis and manipulation|
| xarray    | Labeled multi-dimensional arrays |

Packages have their own associated **documentation**, which describes the functions they provide and how to use them.

##### For this section, we will use NumPy. 

- NumPy provides support for arrays and operations on them and is <b>heavily</b> used in the field

- To use NumPy, we need to import it. It's usually imported as np (but doesn't have to be)

#### **Arrays** save the day

<img src="python_programming/fig3.png" alt="Alt text" width="300">


- Think of NumPy arrays as faster lists
- To create an array, we pass a list as input to the `np.array` function

In [28]:
bananas_sold = np.array([35, 17, 22, 47, 30])
bananas_sold

array([35, 17, 22, 47, 30])

In [30]:
temp_list

[68, 73, 70, 74, 76, 73, 74]

In [31]:
# no square brackets because temp_list is already a list
temp_array = np.array(temp_list)
temp_array

array([68, 73, 70, 74, 76, 73, 74])

In [32]:
type(temp_array)

numpy.ndarray
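
Arrays differ from lists in more than speed. A small sketch of two differences, assuming NumPy is installed; `mixed_array` is an illustrative name:

```python
import numpy as np

temp_list = [68, 73, 70, 74, 76, 73, 74]
temp_array = np.array(temp_list)

# Multiplying a list repeats it; multiplying an array scales each element.
print(temp_list * 2)
print(temp_array * 2)

# An array holds a single data type, so mixed numeric input is converted
# to a common type.
mixed_array = np.array([68, 68.9, 62])
print(mixed_array.dtype)
```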

- Normally, we create arrays by loading them from a data file. 

- Now that we know how to make arrays the hard way, let's begin working with an oceanographic dataset


In [60]:
# recall that we imported Dataset from netCDF4 earlier
nc = Dataset('python_programming/scripps_pier-2023.nc', mode='r')

# print the names of the variables stored in the file
print(nc.variables.keys())

dict_keys(['time', 'temperature', 'conductivity', 'pressure', 'salinity', 'chlorophyll_raw', 'chlorophyll', 'temperature_flagPrimary', 'temperature_flagSecondary', 'conductivity_flagPrimary', 'conductivity_flagSecondary', 'pressure_flagPrimary', 'pressure_flagSecondary', 'salinity_flagPrimary', 'salinity_flagSecondary', 'chlorophyll_flagPrimary', 'chlorophyll_flagSecondary', 'sigmat', 'diagnosticVoltage', 'currentDraw', 'aux1', 'aux3', 'aux4', 'instrument1', 'instrument2', 'platform1', 'station', 'lat', 'lon', 'depth', 'crs'])


In [None]:
# read all of the temperature values from the file
temp_nc = nc.variables['temperature'][:]

- Create an array for temperature

In [None]:
temp = np.array(temp_nc)

##### Positions
- Each element of an array has a position
- Python is "0-indexed"
    - This means that the position of the first element in an array is 0, not 1.
    - An element's position represents the **number of elements in front of it**

In [69]:
temp[0]

15.1105

- A negative index counts backward from the end (i.e., `variable[-1]` is the **last element** in the array)

In [70]:
temp[-1]

16.6175
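
The same indexing rules can be tried on a small array that does not depend on the data file. The values in `sample` are made up for illustration.

```python
import numpy as np

# A small stand-in array; the values are illustrative, not from the pier data.
sample = np.array([15.1, 15.3, 15.0, 16.2, 16.6])

print(sample[0])                 # first element
print(sample[2])                 # third element: two elements sit in front of it
print(sample[-1])                # last element
print(sample[len(sample) - 1])   # the same element, counted from the front
```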

##### Array-number arithmetic

Arrays make it easy to perform the same operation on every element. This is known as **broadcasting**. \
<img src="python_programming/fig4.png" alt="Alt text" width="400">

In [71]:
# Increase all temperatures by 3 degrees
temp + 3

array([18.1105, 18.1084, 18.0969, ..., 19.6199, 19.6152, 19.6175],
      dtype=float32)

In [72]:
# halve all temperatures
temp / 2

array([7.55525, 7.5542 , 7.54845, ..., 8.30995, 8.3076 , 8.30875],
      dtype=float32)

- Is `temp` changed?

In [75]:
temp # no!

array([15.1105, 15.1084, 15.0969, ..., 16.6199, 16.6152, 16.6175],
      dtype=float32)
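
To keep the result of an array operation, assign it to a name. A minimal sketch with made-up values (`sample` and `warmer` are illustrative names):

```python
import numpy as np

sample = np.array([15.1, 15.3, 15.0])   # illustrative values

# The arithmetic creates a new array; sample itself is left as it was.
warmer = sample + 3
print(sample, warmer)

# Reassigning the name is what makes the change stick.
sample = sample + 3
print(sample)
```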

#### Exercise

- Convert all temperatures to Fahrenheit and assign the result to a new variable `temp_far`
- Hint: $^\circ F = \left(\frac{9}{5} \cdot {}^\circ C\right) + 32$

In [76]:
# convert all temperatures to Fahrenheit
temp_far = (9/5) * temp + 32
temp_far

array([59.1989  , 59.19512 , 59.17442 , ..., 61.915817, 61.90736 ,
       61.9115  ], dtype=float32)

#### Array Methods
- Arrays come with a variety of methods, which are functions designed to operate specifically on arrays.
- Call these methods using dot notation, `array_name.method()`
- A full list of methods can be found in the NumPy [documentation](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)

In [78]:
temp_far.max()

74.81857

In [81]:
temp_far.mean()

62.441406

In [82]:
temp_far.min()

51.54008

##### Ranges
- A **range** is an array of evenly spaced numbers. These are created using `np.arange`
- The most general way to create a range is `np.arange(start, end, step)`, where:
    - the first number is `start`. **By default, `start` is 0**
    - all subsequent numbers are spaced out by `step`, until (but excluding) `end`. **By default, `step` is 1**

In [83]:
# start at 0, end before 10, step by 1
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [84]:
# start at 1, end before 19, step by 1
np.arange(1, 19) 

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18])

In [85]:
# start at 5, end before 50, step by 3
np.arange(5, 50, 3)

array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47])
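
Two more cases worth trying, sketched under the same `np.arange(start, end, step)` rules: a step larger than 1 that lands exactly on `end`, and a negative step. The names `evens` and `countdown` are illustrative.

```python
import numpy as np

# end is excluded even when the step would land exactly on it.
evens = np.arange(0, 10, 2)

# A negative step counts down; start must then be larger than end.
countdown = np.arange(5, 0, -1)

print(evens)
print(countdown)

# The number of elements follows from start, end and step.
print(len(np.arange(5, 50, 3)))
```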

##### Activity

Suppose a coastal town is experiencing increasing rainfall due to a slow-moving low pressure system. On Day 1, it rains **1mm**. Each day after that, the daily rainfall is 1mm more than on the previous day (so Day 2 gets 2mm, Day 3 gets 3mm, etc.).

If this continues for 30 days, how much total rain falls in that month in **centimeters**? Save this value as `rain_total`. 

Hint: Use `np.arange` and `.sum()`

In [90]:
rain_total = np.arange(1,31).sum() / 10 # in cm
rain_total

46.5

#### Slicing Arrays

- When working with NumPy arrays, slicing allows us to extract a portion of the data using:

- `array[start:stop]` -> starts at `start` index **up to but not including** the `stop` index
- `array[:stop]` -> starts at the beginning (index 0) and goes up to `stop -1`
- `array[start:]` -> starts at `start` and goes **all the way to the end**
- `array[:]` -> gives you the **entire array**
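
The four slicing forms can be checked on a small array whose values make the positions easy to read. `sample` is an illustrative array, and the negative-index slice at the end is an extra that combines slicing with the backward counting introduced earlier.

```python
import numpy as np

sample = np.arange(10, 20)   # illustrative array: 10, 11, ..., 19

print(sample[2:5])    # positions 2, 3 and 4
print(sample[:3])     # the first three elements
print(sample[7:])     # position 7 through the end
print(sample[:])      # the entire array
print(sample[-3:])    # the last three elements

# The length of array[start:stop] is stop - start.
print(len(sample[2:5]))
```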

- let's inspect our time data

In [91]:
print(nc.variables.keys())

dict_keys(['time', 'temperature', 'conductivity', 'pressure', 'salinity', 'chlorophyll_raw', 'chlorophyll', 'temperature_flagPrimary', 'temperature_flagSecondary', 'conductivity_flagPrimary', 'conductivity_flagSecondary', 'pressure_flagPrimary', 'pressure_flagSecondary', 'salinity_flagPrimary', 'salinity_flagSecondary', 'chlorophyll_flagPrimary', 'chlorophyll_flagSecondary', 'sigmat', 'diagnosticVoltage', 'currentDraw', 'aux1', 'aux3', 'aux4', 'instrument1', 'instrument2', 'platform1', 'station', 'lat', 'lon', 'depth', 'crs'])


In [100]:
nc.variables['time']

<class 'netCDF4._netCDF4.Variable'>
int64 time(time)
    units: minutes since 2023-01-01 00:01:00
    calendar: proleptic_gregorian
unlimited dimensions: time
current shape = (125305,)
filling on, default _FillValue of -9223372036854775806 used

- Above we can see that the units of time are minutes since 2023-01-01 00:01:00
- Let's inspect how many minutes there are between data points

In [105]:
# create an array
time = nc.variables['time'][:]

dt = time[1] - time[0]

print(dt, "minutes between data points")

4 minutes between data points


##### Activity

**How many data points are there per day?**

In [106]:
data_per_hr = 60/4
data_per_day = data_per_hr * 24
data_per_day

360.0
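
Note that `data_per_day` is a `float`, and slice positions must be integers. A sketch of one way around this, assuming the 4-minute spacing found above and using floor division (`//`); `readings` is stand-in data, not the pier record.

```python
minutes_between_points = 4    # the spacing found from the time variable

# // is floor division and returns an int, which can be used as a position.
points_per_hour = 60 // minutes_between_points
points_per_day = points_per_hour * 24
print(points_per_day, type(points_per_day))

# A float position is rejected, even when it is a whole number.
readings = list(range(1000))   # stand-in data for this sketch
try:
    readings[:360.0]
except TypeError as error:
    print('TypeError:', error)

first_day = readings[:points_per_day]
print(len(first_day))
```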

**Print the first day's worth of data**

In [110]:
# the first 360 data points cover one day at 4-minute spacing
temp[:360]

## For Loops, While Loops, If Statements, and Logicals

In [None]:
2.0 + 3

5.0
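
A minimal sketch of the constructs named in this section's heading, assuming the week of temperatures from the list example and a threshold of 74 chosen only for illustration:

```python
temp_list = [68, 73, 70, 74, 76, 73, 74]
threshold = 74   # hypothetical cutoff for a "warm" day

# A for loop runs the indented block once for each element of the list.
warm_days = 0
for temp in temp_list:
    # The if statement runs its block only when the comparison is True.
    if temp >= threshold:
        warm_days = warm_days + 1

print(warm_days)

# A while loop repeats for as long as its condition stays True.
countdown = 3
while countdown > 0:
    print(countdown)
    countdown = countdown - 1
```

Indentation is what tells Python which lines belong to the loop or to the `if` statement.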

## Creating Functions
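
A minimal sketch of defining and calling a function, reusing the Celsius-to-Fahrenheit conversion from the Variables section. The function name `celsius_to_fahrenheit` is a hypothetical choice for this example.

```python
def celsius_to_fahrenheit(temp_in_c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_in_c * 9 / 5 + 32


# Calling the function runs its body with the given argument.
print(celsius_to_fahrenheit(5))
print(celsius_to_fahrenheit(100))
```

Wrapping the formula in a function means it is written once and can be reused with any input.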

## Errors and Exceptions

Tracebacks (error output) can look intimidating, but they give us a lot of useful information about what went wrong in our program, including where the error occurred and what type of error it was.

An error having to do with the ‘grammar’ or syntax of the program is called a SyntaxError. If the issue has to do with how the code is indented, then it will be called an IndentationError.

A NameError will occur if you use a variable that has not been defined, either because you meant to use quotes around a string, you forgot to define the variable, or you just made a typo.

Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an IndexError.

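Trying to read a file that does not exist will give you a FileNotFoundError. Trying to read a file that is open for writing, or writing to a file that is open for reading, will give you an IOError (in Python 3 this is reported as io.UnsupportedOperation, a kind of OSError, of which IOError is an alias).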
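
Three of these errors can be triggered on purpose to see what they look like. This is an illustrative sketch: the file name is hypothetical and assumed not to exist, and `try`/`except` is used only so that the example keeps running after each error.

```python
readings = [15.1, 15.3, 15.0]   # illustrative values
caught = []                      # names of the errors we run into

try:
    readings[10]                 # only positions 0, 1 and 2 exist
except IndexError as error:
    caught.append(type(error).__name__)
    print('IndexError:', error)

try:
    open('no_such_file_for_this_sketch.nc')   # hypothetical missing file
except FileNotFoundError as error:
    caught.append(type(error).__name__)
    print('FileNotFoundError:', error)

try:
    undefined_reading            # a name that was never assigned
except NameError as error:
    caught.append(type(error).__name__)
    print('NameError:', error)
```

In a real traceback, the last line carries the same two pieces of information printed here: the error type and a short message describing what went wrong.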

## Best Practices

Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.

Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.

Use preconditions to check that the inputs to a function are safe to use.

Use postconditions to check that the output from a function is safe to use.

Test your program or function to make sure it performs the way you want it to. Give it an input with a known output and check that it produces that output.
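
The practices above can be combined in one small function. This is a sketch under stated assumptions: `mean_temperature` is a hypothetical function, and the tolerance of `1e-9` in the postcondition is an arbitrary allowance for floating-point rounding.

```python
def mean_temperature(temps):
    """Return the mean of a non-empty list of temperatures."""
    # Precondition: the input must be safe to use.
    assert len(temps) > 0, 'temps must contain at least one value'

    result = sum(temps) / len(temps)

    # Postcondition: a mean cannot fall outside the range of the data
    # (a small tolerance allows for floating-point rounding).
    assert min(temps) - 1e-9 <= result <= max(temps) + 1e-9, 'mean is out of range'
    return result


# Test with an input whose output is known in advance.
assert mean_temperature([10, 20, 30]) == 20
print(mean_temperature([68, 73, 70, 74, 76, 73, 74]))
```

Calling `mean_temperature([])` stops with an `AssertionError` and the message from the precondition, which is easier to interpret than the `ZeroDivisionError` that would otherwise follow.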

Write documentation before writing code in order to help determine exactly what that code is supposed to do and to assist future readers.

## Debugging 

- Know what code is supposed to do before trying to debug it.

- Make it fail fast.

- Change one thing at a time, and for a reason.

- Keep track of what you’ve done.

### Acknowledgements 

Some of the material in this lesson is derived from the Software Carpentry Lessons for Python Programming and Plotting https://swcarpentry.github.io/python-novice-inflammation/reference/ and HDSI at UC San Diego https://datascience.ucsd.edu/