# Python Scripting for Water Modellers 
## Australian Water School
---
Kevin Nebiolo, PhD. <br>
Kleinschmidt Group <br>
November 4, 2020 

# Contents
<ol>
    <li>What is Python?</li>
    <li>Where to go for help</li>
    <li>Easiest Way to Start</li>
    <li>Data Types and Structures</li>
    <li>Program Flow</li>
    <li>Modules Useful for Water Resources</li>
    <li>Example</li>
</ol>

---

# Part 1: Introduction to Python

# What is Python? 
- Powerful, object-oriented, open-source, easily extensible programming language
- Elegant, almost natural-language syntax that makes code readable
- Interpreted language: code is run by an interpreter as the program executes rather than compiled to machine code ahead of time, which makes it easy to prototype
- Runs on all major desktop operating systems

# Where to go for Help?
[Python](https://www.python.org/)<br>
Online courses like [codecademy](https://www.codecademy.com/catalog/language/python) and [Udemy](https://www.udemy.com/topic/python/)<br>
Online forums like [stackoverflow](https://stackoverflow.com/questions/tagged/python)<br>

---

# Easiest Way to Start
For data science applications: [Anaconda Individual Edition](https://www.anaconda.com/products/individual)
- open source
- easy to manage environments (ArcGIS Pro, R)
- easy to manage packages with conda
- access to the Spyder IDE, with advanced editing, tab completion, interactive testing, debugging, introspection, and variable inspection
- access to Jupyter Notebook environment

---

# Part 2: Python Syntax

# Language Structures
- Python syntax has structures that are equivalent to English

|English|Python|
|:--|:--|
|Sentence|**Statement** - complete computer instruction|
|Noun|**Object** - any piece of data|
|Adjective|**Property** - describes an object|
|Verb|**Method** - action an object can take|
    
    
    


## Variables
- names that are given to objects
- when the variable statement is run:
    - the named object exists in RAM until we remove it with the **del** command, so **be careful of how much stuff you put there!**
    - **variable substitution**: insert the variable (and the object it represents) into mathematical equations, etc.
    - we can reuse the variable as many times as we want and we can update it
- to create a variable:
    - specify the variable name
    - follow with equal sign 
    - followed by definition


In [1]:
country = "Australia"
print(country)

Australia
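
Updating and removing a variable can be sketched as follows. The name `flow_cms` and its value are hypothetical, chosen only for illustration.

```python
# hypothetical variable holding a flow value
flow_cms = 12.5
flow_cms = flow_cms * 2      # reuse the variable and update it
print(flow_cms)              # 25.0

del flow_cms                 # remove the name so the object can be cleared from RAM

try:
    print(flow_cms)          # the name no longer exists, so this raises NameError
    still_defined = True
except NameError:
    still_defined = False
    print("flow_cms no longer exists")
```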


### Rules for Naming Objects
- cAse MaTTeRs
- must start with a letter or underscore
- can contain letters, digits, or underscores
- no spaces
- no reserved words
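    - e.g.: and, or, del, for, if, try, except
    - avoid built-in names such as print too: they are not reserved in Python 3, but reusing them hides the built-in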
- no quotes - text in quotes is a string
- no special characters
    - e.g.: !,@,#,$,%,^,&,*,(,),~
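
One way to check a name against these rules is with the standard library, as in the sketch below. The candidate names are made up for illustration.

```python
import keyword

# hypothetical candidate names to test against the naming rules
candidates = ["flow_1", "_temp", "1st_gauge", "river name", "for", "Flow"]

results = {}
for name in candidates:
    # isidentifier() checks the character rules, iskeyword() checks reserved words
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", "valid" if results[name] else "invalid")
```

Note that `Flow` and `flow_1` are both valid and would be two different variables, because case matters.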

### Working with Variables 
- use **variables** within a **statement** instead of the **object** itself
- reconstruct complex mathematical expressions with variables

In [2]:
x = 5
y = 12
x + y

17

# Basic Data Types 
## Numbers
- integers: whole numbers
- float: decimal numbers
- expressions can include floats and integers; the result takes the more general type (float)
- Python also has built-in boolean and complex number types; packages like [numpy](https://numpy.org/), [scipy](https://www.scipy.org/), and [sympy](https://docs.sympy.org/latest/index.html) add further types, including arrays and matrices

Float and Integer results in a Float

In [3]:
5 + 1.5 

6.5

Integer and Integer results in an Integer

In [4]:
5 + 5

10

### Number Operations

In [5]:
3 + 2

5

In [6]:
3 - 2

1

In [7]:
11.0 / 3

3.6666666666666665

In [8]:
11.0 // 3

3.0

In [9]:
3 * 3

9

In [10]:
3 ** 3

27
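
Two more details round out the operations above: the remainder operator `%`, and how the result type depends on the operands. A short sketch:

```python
remainder = 11 % 3            # remainder after floor division
print(remainder)              # 2

print(11 // 3)                # two integers give an integer: 3
print(type(5 + 5))            # <class 'int'>
print(type(5 + 1.5))          # <class 'float'>

quotient = 10 / 2             # true division always returns a float
print(quotient)               # 5.0
```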

## Strings
- strings are anything enclosed by quotes, treated as text
- can contain letters, symbols, and numbers
- single or double quotes are fine, but be consistent


In [11]:
str1 = "believe it or not, this is a string"
print (str1)

believe it or not, this is a string


### String Formatting
- backslashes \ have a special significance as they start escape sequences (formatting commands)
- the character that follows the backslash specifies the command to be performed
- ex: \n and \t

\n starts a new line

In [12]:
print ("line 1 \nline 2")

line 1 
line 2


\t inserts a tab

In [13]:
print ("\tline 1 \nline2")

	line 1 
line2


### Disabling String Formatting 
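- Windows file pathnames use backslashes between directories, which Python would otherwise read as formatting commands
- option 1: place an **r** in front of the string (a raw string)
- option 2: type two backslashes in place of each single backslash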
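
The difference is easiest to see side by side. The path below is hypothetical and is used only to show the problem.

```python
# hypothetical Windows path; \t and \n inside it are read as tab and newline
bad = "C:\temp\new_data.csv"

# raw string: backslashes are kept exactly as typed
raw = r"C:\temp\new_data.csv"

# doubled backslashes: each pair becomes one backslash
doubled = "C:\\temp\\new_data.csv"

print(raw)
print(raw == doubled)    # True, both spell the same path
print(bad == raw)        # False, the first string contains a tab and a newline
```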

### Sequences 
- sequences are ordered collections of items (like characters)
- strings are sequences
- characters can be retrieved from a sequence by specifying the position
- python starts counting at zero

### Retrieving Characters from a String

In [14]:
name = "Tim Smith"

# retrieve single character
name[0]

'T'

In [15]:
# or retrieve multiple characters
name[0:3]

'Tim'

In [16]:
name[-5:]

'Smith'

### String Methods

In [17]:

name.lower()

'tim smith'

In [18]:
name.upper()

'TIM SMITH'

In [19]:
name.replace('Smith','Cooper')

'Tim Cooper'

In [20]:
name.split(' ')

['Tim', 'Smith']
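
The same methods work on data strings. As a small sketch, `split` can pull apart a timestamp written in the format used by the stream gage file later in this notebook:

```python
stamp = "2009-06-25T02:00:00.000+10:00"

# split on the letter T to separate the date from the time
date_part, time_part = stamp.split("T")
print(date_part)

# split the date again on the hyphen
year, month, day = date_part.split("-")
print(year, month, day)
```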

# Data Collections
- create data object when script is run, exists in RAM 
- basic python collections: 
    - lists, dictionaries, tuples, sets
- advanced data structures (Numpy, Scipy, Pandas):
    - arrays, matrices, dataframes

## Lists
- lists are containers for objects separated by commas and enclosed with square brackets
- objects are ordered, but changeable
- data types can be mixed and don't have to be unique


In [21]:
l1 = [1, "a", 3, 4, 4]
print (l1)

[1, 'a', 3, 4, 4]


- lists are sequences - we can retrieve items by specifying position

In [22]:
print (l1[2])

3


## Building Lists
- lists can be built all at once

In [23]:
l2 = [1,2,3,4]
print (l2)

[1, 2, 3, 4]


- we can add stuff to the list with an append

In [24]:
l2.append("puppy")
print (l2)

[1, 2, 3, 4, 'puppy']


- we can insert items into a list at any position

In [25]:
l2.insert(3,"red")
print (l2)

[1, 2, 3, 'red', 4, 'puppy']


### Deleting Items from a list

In [26]:
del l2[1]
print (l2)


[1, 3, 'red', 4, 'puppy']


[follow this link for all list methods](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)
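
A few of the other list methods from that page, sketched on a hypothetical list of flow values:

```python
# hypothetical flow values
flows = [3.2, 0.0, 7.5, 1.1]

flows.sort()              # sorts the list in place, ascending
print(flows)              # [0.0, 1.1, 3.2, 7.5]

largest = flows.pop()     # removes and returns the last item
print(largest)            # 7.5

flows.remove(0.0)         # removes the first item equal to 0.0
print(len(flows), flows)  # 2 [1.1, 3.2]
```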

## Dictionaries
- collection of key and value pairs that is changeable and indexed by key (insertion order is kept in Python 3.7 and later)
- enclosed with curly brackets and with colon separating key from its value
- no duplicate keys

In [27]:
car_1 = {'make':'Ford',
        'model':'Mustang',
        'year':1964}
print(car_1)

{'make': 'Ford', 'model': 'Mustang', 'year': 1964}


Accessing dictionary value with key

In [28]:
print (car_1['make'])

Ford


Change value

In [29]:
car_1['year'] = 2018
print (car_1)

{'make': 'Ford', 'model': 'Mustang', 'year': 2018}
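
Adding a key and testing for one follow the same pattern. In this sketch the keys 'colour' and 'owner' are made up for illustration.

```python
car_1 = {'make': 'Ford', 'model': 'Mustang', 'year': 2018}

car_1['colour'] = 'red'          # assigning to a new key adds it
print('colour' in car_1)         # True

print(list(car_1.keys()))        # all keys, in insertion order

# get() returns a default instead of raising an error when a key is missing
owner = car_1.get('owner', 'unknown')
print(owner)                     # unknown
```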


## Tuples
- container of objects separated with commas and enclosed in parentheses
- immutable - items cannot be added, removed, or changed after the tuple is created, whereas a list can be modified; both can be iterated over

In [30]:
pos = (100,200)
print (pos)

(100, 200)
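
A tuple can be read, unpacked, and looped over, but not modified. A minimal sketch:

```python
pos = (100, 200)

# unpack the tuple into two variables
x, y = pos
print(x, y)                      # 100 200

# looping over a tuple works the same way as looping over a list
for coordinate in pos:
    print(coordinate)

# trying to change an item raises a TypeError
try:
    pos[0] = 150
    changed = True
except TypeError:
    changed = False
    print("tuples cannot be changed")
```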


## Sets
- unordered collection of unique objects separated by commas and enclosed by curly brackets
- primarily used to eliminate duplicate entries

In [31]:
l3 = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
l3_set = set(l3)
print (l3_set)

{'banana', 'pear', 'orange', 'apple'}
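
Sets can also be written directly with curly brackets and compared with each other. The species lists in this sketch are hypothetical.

```python
# hypothetical species observed at two sites
site_a = {'carp', 'perch', 'trout'}
site_b = {'perch', 'bass'}

both = site_a & site_b            # intersection: items found in both sets
print(both)                       # {'perch'}

either = sorted(site_a | site_b)  # union, sorted into a list for a stable order
print(either)                     # ['bass', 'carp', 'perch', 'trout']

# empty curly brackets make a dictionary, so use set() for an empty set
print(type({}), type(set()))
```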


# Program Flow
## Loops
- a loop is a set of statements that repeats until a condition is met 
- loops repeat until...
    - each item in a list or key in dictionary has been iterated over (for loop)
    - a test condition is no longer true (while loop)
- loops are **compound statements**


## Compound Statements
- statements that consist of multiple lines of code (loops, functions, classes, etc)
- 1st line is header line, ends in colon
- lines following header are **indented**
    - indicates that line is associated with the header
- consistent indentation is required

## for loops
- for loops iterate over items in a list or keys in a dictionary to perform a task
- the task runs once for each item 
- in each iteration...
    - an item from the list is assigned to a variable (ex. i)
    - indented statements run and carry out some action on i

In [32]:
l3 = [0,1,2,3,4,5]
for i in l3:
    print (10**i)

1
10
100
1000
10000
100000
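
The same pattern works for the keys of a dictionary, and `range` saves typing out a list of counters. A short sketch reusing the car dictionary from earlier:

```python
car_1 = {'make': 'Ford', 'model': 'Mustang', 'year': 1964}

# looping over a dictionary assigns each key to the loop variable
keys_seen = []
for key in car_1:
    keys_seen.append(key)
    print(key, '=', car_1[key])

# range(3) produces 0, 1, 2 without building the list by hand
for i in range(3):
    print(10**i)
```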


## while loops
- while loops use a **test condition** to determine when to stop
- at the start of each iteration, the condition is tested
    - if the condition is true, the loop continues
    - if it is false, the loop ends
- **NOTE** the variable used in the test condition must change with each iteration or the loop will never end

In [33]:
x = 0
while x < 5:
    x = x + 1
    print (x)

1
2
3
4
5


## Decision Making
- we can program python to make decisions based on a set of conditions
- **if** statements allow a script to test conditions
    - if a condition is true, then do this...
- **if** statements are **compound statements**

In [34]:
x = 2
if x < 5:
    print ("True")

True


### What if there is more than one criterion? - elif
- used to perform additional tests 

In [35]:
x = 2
if x > 5: 
    print ('value is greater than 5')
elif x < 5:
    print ('value is less than 5')

value is less than 5


### What about everything else we can't fit into either category?
- **else** does not test a condition of its own; it catches every case that did not satisfy a preceding test
- runs only if all preceding **if** and **elif** tests fail
- ensures that all possible conditions are addressed

In [36]:
x = 5
if x > 5: 
    print ('value is greater than 5')
elif x < 5:
    print ('value is less than 5')
else:
    print ('the value has to be 5, what else can it be?')

the value has to be 5, what else can it be?


### Conditional Operators

|Symbol|Meaning|
|:--|:--|
|**==**|Equal To|
|**!=**|Not Equal To|
|**<**|Less Than|
|**<=**|Less Than or Equal To|
|**>**|Greater Than|
|**>=**|Greater Than or Equal To|
|**and**|True if Both Conditions are True|
|**or**|True if at Least One Condition is True|
|**not**|Opposite of Result|
|**in**|Check if Object is in String or Data Collection|
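
The operators in the table can be combined in a single test. In this sketch the variable names and the list of acceptable quality codes are assumptions made for illustration.

```python
flow = 3.5
quality_code = 10
good_codes = [10, 20]     # hypothetical list of acceptable quality codes

# both conditions must be true for the record to be usable
usable = flow > 0 and quality_code in good_codes
print(usable)                    # True

print(flow < 0 or flow > 100)    # False, neither condition is true
print(not usable)                # False, the opposite of True
print("Smith" in "Tim Smith")    # True, in also works on strings

if usable:
    print("use this record")
```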

### Custom Functions
- functions package code under a name so it can be called, and reused, at a later point in the script
- compound statement that starts with the keyword **def**, followed by the function name and its arguments in parentheses
- the remaining statements carry out the actions of the function

In [37]:
def fibonacci (limit):
    seq0 = 0
    seq1 = 1
    print (seq0)
    while seq1 < limit:
        print (seq1)
        seq_n = seq1 + seq0
        seq0 = seq1
        seq1 = seq_n

        

In [38]:
fibonacci(1000)

0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
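
The function above prints its results. A function can also hand a value back to the caller with **return**, which lets the script keep working with the result. The variant below is a sketch of that idea; the name `fibonacci_list` is made up for this example.

```python
def fibonacci_list(limit):
    """Return the Fibonacci numbers below limit as a list."""
    sequence = [0]
    seq0 = 0
    seq1 = 1
    while seq1 < limit:
        sequence.append(seq1)
        # update both values at once using tuple unpacking
        seq0, seq1 = seq1, seq0 + seq1
    return sequence

result = fibonacci_list(20)
print(result)         # [0, 1, 1, 2, 3, 5, 8, 13]
print(sum(result))    # 33
```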


# Part 3: Hydrology and Hydraulics (H&H) Applications

# Modules useful for H&H

|Module|Use|
|:--|:--|
|[**Pandas**](https://pandas.pydata.org/)|fast, powerful, data analysis and manipulation tool with time series support|
|[**Numpy**](https://numpy.org/)|fundamental package for scientific computing, implements n-dimensional arrays|
|[**Scipy**](https://www.scipy.org/scipylib/index.html)|builds on Numpy with routines for optimization, integration, interpolation, statistics, and signal processing|
|[**Matplotlib**](https://matplotlib.org/)|standard python plotting library|
|[**Plotly**](https://plotly.com/python/)|interactive plotting library for high-quality figures|


# Tips for Scripts
- writing a script is like writing down a recipe
    - tools = modules (e.g. Pandas, numpy, etc)
    - ingredients = data (stream gage)
    - instructions = statements 
- **order of operations matters!**
- write down your steps and plan it out on paper

# Application: Flow Duration Curves
## We need to:
1. import the necessary tools to work with and plot data
2. connect to some data and pull it into Python
3. perform some data management
4. rank the stream flows
5. calculate exceedance probability
6. plot

### Step 1: Import Modules

In [39]:
import pandas as pd
import plotly.graph_objects as go
import numpy as np

### Step 2: Import Data

In [40]:
url = 'https://raw.githubusercontent.com/knebiolo/AustralianWaterSchool/main/WeirRiver.csv'
data = pd.read_csv(url,skiprows = 9)
print (data.head(5))

                       Timestamp  Value  Quality Code  Interpolation Type
0  2009-06-25T02:00:00.000+10:00    0.0            10                 102
1  2009-06-25T03:00:00.000+10:00    0.0            10                 102
2  2009-06-25T04:00:00.000+10:00    0.0            10                 102
3  2009-06-25T09:00:00.000+10:00    0.0            10                 102
4  2009-06-25T10:00:00.000+10:00    0.0            10                 102


### Step 3: Fix Timestamps and Delete Unnecessary Columns

In [41]:
data.Timestamp = pd.to_datetime(data.Timestamp.values)
data.drop(columns = ['Quality Code','Interpolation Type'],inplace = True)
print(data.head(5))

                  Timestamp  Value
0 2009-06-25 02:00:00+10:00    0.0
1 2009-06-25 03:00:00+10:00    0.0
2 2009-06-25 04:00:00+10:00    0.0
3 2009-06-25 09:00:00+10:00    0.0
4 2009-06-25 10:00:00+10:00    0.0


### Step 4: Rank Stream Flows

In [42]:
data['Rank'] = data.Value.rank(ascending = False)
print(data.head(5))

                  Timestamp  Value     Rank
0 2009-06-25 02:00:00+10:00    0.0  96125.0
1 2009-06-25 03:00:00+10:00    0.0  96125.0
2 2009-06-25 04:00:00+10:00    0.0  96125.0
3 2009-06-25 09:00:00+10:00    0.0  96125.0
4 2009-06-25 10:00:00+10:00    0.0  96125.0


### Step 5: Calculate Exceedance Probability

In [43]:
n = len(data)
data['ExcProb'] = 100 * (data.Rank/n)
print(data.head(5))

                  Timestamp  Value     Rank    ExcProb
0 2009-06-25 02:00:00+10:00    0.0  96125.0  76.944936
1 2009-06-25 03:00:00+10:00    0.0  96125.0  76.944936
2 2009-06-25 04:00:00+10:00    0.0  96125.0  76.944936
3 2009-06-25 09:00:00+10:00    0.0  96125.0  76.944936
4 2009-06-25 10:00:00+10:00    0.0  96125.0  76.944936
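
To see what Steps 4 and 5 compute, the sketch below repeats the arithmetic in plain Python on five hypothetical flow values. By default the pandas `rank` method gives tied values the average of the positions they occupy, which is why every zero-flow row above shares one rank. The helper `average_rank` is written only for this illustration.

```python
# hypothetical flow record with ties
flows = [0.0, 4.0, 0.0, 9.0, 4.0]
n = len(flows)

# largest flow first, so the largest flow gets rank 1
ordered = sorted(flows, reverse=True)

def average_rank(value):
    """Average the 1-based positions that value occupies in the ordered list."""
    positions = [i + 1 for i, v in enumerate(ordered) if v == value]
    return sum(positions) / len(positions)

for q in flows:
    rank = average_rank(q)
    exc_prob = 100 * rank / n     # same formula as Step 5
    print(q, rank, exc_prob)
```

Here the largest flow (9.0) has rank 1 and an exceedance probability of 20, while the two zero flows share rank 4.5 and an exceedance probability of 90.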


### Step 6: Plot

In [45]:
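# sort from the highest flow (rank 1) to the lowest
data.sort_values(by = 'Rank', ascending = True, inplace = True)
# drop zero flows because log10 of 0 is undefined; copy to avoid a SettingWithCopyWarning
data = data[data.Value > 0].copy()
data['log10flow'] = np.log10(data.Value)
# exceedance probability on the x axis, log10 of flow on the y axis
fig = go.Figure(data = go.Scatter(x = data.ExcProb, y = data.log10flow))
fig.show()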

# Thank You
## Questions?
Kevin Nebiolo, PhD.

kevin.nebiolo@kleinschmidtgroup.com

http://github.com/knebiolo