
# Python for ecohydrology

Prof. Dr. Eva Paton & Dr. Pedro Alencar


## Python - Data Analysis

This notebook presents:

* NumPy
* Pandas
* Seaborn (basics)

## NumPy

NumPy is the core library for numerical arrays and linear algebra in Python, and many other libraries (including Pandas) are built on top of it.

If you use the Anaconda distribution of Python, you can install NumPy by typing the following command in your terminal:

`conda install numpy`

You can get more information about NumPy [here](https://numpy.org/doc/stable/).

### Load NumPy

In [None]:
import numpy as np

### NumPy Arrays

In [None]:
x = [1,2,3,4]
np.array(x)

In [None]:
y = [[1,2],[3,4]]
np.array(y)

NumPy arrays are objects themselves and have methods.

In [None]:
z = np.array(x)

z.argmax()
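
Besides methods such as `argmax()`, arrays also expose attributes that describe them. The sketch below is only an illustration; the variable names are chosen here and are not part of the original notebook.

```python
import numpy as np

z = np.array([1, 2, 3, 4])
y = np.array([[1, 2], [3, 4]])

# Attributes describe the array and are accessed without parentheses
z_shape = z.shape      # one dimension with four elements
y_shape = y.shape      # two rows and two columns
y_ndim = y.ndim        # number of dimensions

# Methods compute something from the array and are called with parentheses
z_argmax = z.argmax()  # position (not value) of the largest element
z_sum = z.sum()        # sum of all elements
y_flat = y.reshape(4)  # the same values arranged as a 1-D array

print(z_shape, y_shape, y_ndim, z_argmax, z_sum, y_flat)
```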

### NumPy Methods

In [None]:
np.arange(0,10,2) # similar to list(range(0,10,2)), but the result is an array

In [None]:
np.zeros((3,3))

In [None]:
np.ones((3,3))

In [None]:
np.eye(3)

In [None]:
np.linspace(0,5,5) # similar to np.arange(0,5,1.25), but the upper value is included
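
To make the difference between the two functions concrete, the following sketch places them side by side: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes it. The names `by_step` and `by_count` are illustrative.

```python
import numpy as np

by_step = np.arange(0, 5, 1.25)   # fixed step, stop value excluded
by_count = np.linspace(0, 5, 5)   # fixed number of points, stop value included

print(by_step)
print(by_count)
print(len(by_step), len(by_count))
```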

In [None]:
np.random.rand(5) # uniform distribution in [0,1)

In [None]:
np.random.randn(5) # normal distribution (mean = 0; st.dev. = 1)

In [None]:
a = 1
b = 5
np.random.randint(a,b) # random integer in the interval [a,b)
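
The functions above return different values on every run. If reproducible results are needed, one option (not used in the original notebook) is NumPy's `Generator` interface with a fixed seed. A minimal sketch, where the seed value 42 is arbitrary:

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

u_a = rng_a.random(5)                # uniform distribution in [0, 1)
u_b = rng_b.random(5)                # identical to u_a
n = rng_a.standard_normal(5)         # normal distribution (mean = 0; st.dev. = 1)
k = rng_a.integers(1, 5, size=10)    # random integers in the interval [1, 5)

print(np.array_equal(u_a, u_b))
print(k)
```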

### Slicing and selecting

In [None]:
x = np.arange(0,10)

In [None]:
x[2]

In [None]:
x[:2]

In [None]:
x[:-2]

In [None]:
x[-2:]

In [None]:
y = np.array([[1,2],[3,4]])

In [None]:
y[1] # single row

In [None]:
y[1][1] # single element: [row][col]

In [None]:
y[1,1] # single element: [row,col]

In [None]:
y[:,1] # single col

In [None]:
x[x>5] # selects all values in x larger than 5

In [None]:
x[x%2 == 0] # selects all even values in x 
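
Selections such as `x[x>5]` work in two steps: the comparison first produces an array of booleans (a mask), and the mask then picks the elements where it is `True`. The sketch below makes the intermediate step visible and combines two conditions with `&`; the names `mask`, `selected` and `combined` are illustrative.

```python
import numpy as np

x = np.arange(0, 10)

mask = x > 5          # boolean array, one entry per element of x
selected = x[mask]    # keeps only the elements where mask is True

# Conditions are combined with & (and) or | (or); each needs its own parentheses
combined = x[(x > 2) & (x % 2 == 0)]

print(mask)
print(selected)
print(combined)
```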

### Operations

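Lists do not support element-wise arithmetic: `+` concatenates two lists, `*` with an integer repeats the list, and other arithmetic operations raise a `TypeError`. For element-wise operations, we need NumPy arrays.

In [None]:
a = [1,2,3]

# Uncomment the lines below one at a time to see how arithmetic operators behave on lists:
# a+a # concatenates the lists: [1, 2, 3, 1, 2, 3]
# a*a # TypeError
# a-2 # TypeError
# a*2 # repeats the list: [1, 2, 3, 1, 2, 3]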
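
A short sketch contrasting how `+`, `*` and `-` behave on a list and on the equivalent NumPy array. The `try`/`except` block is only there so that the example runs to the end; the variable names are illustrative.

```python
import numpy as np

a = [1, 2, 3]

list_sum = a + a      # concatenation, not element-wise addition
list_twice = a * 2    # repetition, not element-wise multiplication

# Multiplying two lists is not defined and raises a TypeError
try:
    a * a
    list_product_error = None
except TypeError as err:
    list_product_error = type(err).__name__

arr = np.array(a)
arr_sum = arr + arr       # element-wise addition
arr_product = arr * arr   # element-wise multiplication
arr_shifted = arr - 2     # the scalar is applied to every element

print(list_sum, list_twice, list_product_error)
print(arr_sum, arr_product, arr_shifted)
```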

In [None]:
x+x

In [None]:
x*x

In [None]:
x*2

In [None]:
x**2

### NumPy functions

In [None]:
np.sqrt(x)

In [None]:
np.log(x) # natural logarithm (base e); log(0) returns -inf and raises a RuntimeWarning

In [None]:
np.log10(x) # base 10

In [None]:
np.max(x) # similar to x[x.argmax()]

## Pandas

Pandas lets you work with labeled, tabular data (DataFrames) and is one of the most popular Python libraries.

If you use the Anaconda distribution of Python, you can install Pandas by typing the following command in your terminal:

`conda install pandas`

You can get more information about Pandas [here](https://pandas.pydata.org/).

In [None]:
import pandas as pd

### Series

A Series is one of the basic pandas structures. It is very similar to a NumPy array, but its values carry labels (the index). A Series can be created from a NumPy array, a list or a dictionary.

In [None]:
x = [1,2,3,4]
y = np.array([1,2,3,4])
z = {'a':1,'b':2,'c':3,'d':[4,5]}

In [None]:
pd.Series(data=x, index = ['a', 'b', 'c', 'd'])

In [None]:
pd.Series(data=y, index = ['a', 'b', 'c', 'd'])

In [None]:
pd.Series(data=z) # note that the dtype is different (object), because one of the values is a list
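
When a Series is built from a dictionary, the keys become the index, and the dtype depends on the values. The sketch below compares a dictionary with only integers to one that also holds a list; the names `numeric` and `mixed` are illustrative.

```python
import pandas as pd

numeric = pd.Series({'a': 1, 'b': 2, 'c': 3})
mixed = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': [4, 5]})

print(numeric.dtype)   # an integer dtype, so arithmetic is fast and well defined
print(mixed.dtype)     # object: pandas falls back to generic Python objects
print(list(numeric.index))
```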

In [None]:
x1 = pd.Series(data=x, index = ['a', 'b', 'c', 'd'])
x2 = pd.Series(data=x, index = ['a', 'b', 'c', 'e'])

In [None]:
x1['a']

In [None]:
x1+x1

In [None]:
x1+x2
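
Adding two Series aligns them by label, not by position. Labels that appear in only one of the Series produce `NaN` in the result. The sketch below reproduces this and shows one possible way to treat missing labels as zero, using `Series.add` with `fill_value` (an addition to the original notebook).

```python
import pandas as pd

x1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
x2 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'e'])

total = x1 + x2                      # 'd' and 'e' exist in only one Series, so they become NaN
filled = x1.add(x2, fill_value=0)    # a missing label is treated as 0 before adding

print(total)
print(filled)
```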

### DataFrames

In [None]:
df = pd.DataFrame(np.random.rand(8,4),columns=['a','b','c','d'], index = ['e','f','g','h','i','j','k','l'])
df

In [None]:
df['a'] # selects a column

In [None]:
df[['a', 'c']] # selects multiple columns

In [None]:
df.loc['e'] # selects a row by row name

In [None]:
df.iloc[0] # selects a row by position

In [None]:
df.iloc[0,0] # selects an element by position
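
As a compact summary of label-based and position-based selection, the sketch below uses a small DataFrame with known values (hypothetical, not the random `df` from above) so that the results are easy to check by eye.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=['a', 'b', 'c', 'd'],
    index=['e', 'f', 'g'],
)

row_by_label = df.loc['f']         # row selected by its name
row_by_pos = df.iloc[1]            # the same row selected by its position

elem_by_label = df.loc['f', 'c']   # single element: [row name, column name]
elem_by_pos = df.iloc[1, 2]        # single element: [row position, column position]

block_by_label = df.loc['e':'f', ['a', 'b']]   # label slices include the end label
block_by_pos = df.iloc[0:2, 0:2]               # position slices exclude the end position

print(elem_by_label, elem_by_pos)
print(block_by_label)
```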

In [None]:
df['new'] = df['a']*df['b']

In [None]:
df = df.assign(new1 = lambda dataframe: dataframe['a']*dataframe['b']) #similar to dplyr::mutate (R)

In [None]:
df.transform(lambda x: x + 1) #similar to dplyr::mutate_all (R)

In [None]:
df.drop(['new', 'new1'],axis=1)

In [None]:
df.head()

In [None]:
df.drop(['new', 'new1'],axis=1, inplace = True)
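
The two `drop` calls differ in whether `df` itself changes: without `inplace = True`, `drop` returns a new DataFrame and leaves the original untouched. A minimal sketch with a small hypothetical DataFrame; reassigning the result, as in the last step, is an alternative to `inplace = True`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
df['new'] = df['a'] * df['b']

trimmed = df.drop(['new'], axis=1)        # returns a copy without the column
still_there = 'new' in df.columns         # the original still has the column

df = df.drop(['new'], axis=1)             # reassigning makes the change permanent
gone_now = 'new' not in df.columns

print(list(trimmed.columns), still_there, gone_now)
```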

In [None]:
df.tail()

In [None]:
df.drop('l',axis=0, inplace = True)
df.tail()

In [None]:
df.reset_index()

In [None]:
df.reset_index(drop = True)
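
The difference between the two `reset_index` calls is what happens to the old row labels: by default they are kept as a new column named `index`, while `drop = True` discards them. A sketch with a small hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30]}, index=['e', 'f', 'g'])

kept = df.reset_index()              # old labels move into a column called 'index'
dropped = df.reset_index(drop=True)  # old labels are discarded

print(kept)
print(dropped)
```

Neither call modifies `df`; both return a new DataFrame.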

In [None]:
df2 = pd.DataFrame(np.random.rand(100,4),columns=['a','b','c','d'])
df2.head()

In [None]:
df2['group'] = np.random.choice(np.array(['x','y']), size = 100, replace = True)

In [None]:
df2.groupby("group").mean()

In [None]:
df2.groupby("group").std()
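
Several statistics can also be computed in a single call with `agg`. The sketch below rebuilds a DataFrame like `df2` with a seeded generator so that it runs on its own; the seed and the names `summary` and `a_summary` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.random((100, 4)), columns=['a', 'b', 'c', 'd'])
df2['group'] = rng.choice(['x', 'y'], size=100)

# Mean and standard deviation of every numeric column, per group
summary = df2.groupby('group').agg(['mean', 'std'])

# The same idea restricted to one column, plus the group sizes
a_summary = df2.groupby('group')['a'].agg(['mean', 'std', 'count'])

print(summary)
print(a_summary)
```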

In [None]:
df2['a'].apply(lambda x: x+1)
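
For simple arithmetic like this, `apply` with a lambda and a direct (vectorized) expression give the same result; the vectorized form is usually shorter and faster, while `apply` is useful for functions that cannot be written as array operations. A small sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1, 0.5, 0.9])

via_apply = s.apply(lambda v: v + 1)   # calls the function once per element
vectorized = s + 1                     # operates on the whole Series at once

print(np.allclose(via_apply, vectorized))
```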

## Seaborn

Seaborn is a visualization library built on top of the more general [matplotlib](https://matplotlib.org/). It provides a friendlier, more intuitive interface for producing high-quality graphics.

If you use the Anaconda distribution of Python, you can install Seaborn by typing the following command in your terminal:

`conda install seaborn`

You can get more information about Seaborn [here](https://seaborn.pydata.org/).

In [None]:
import seaborn as sns

In [None]:
# Apply the default theme
sns.set_theme()

# Load an example dataset
tips = sns.load_dataset("tips")

tips.head()

In [None]:
# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)

In [None]:
sns.lmplot(data=tips, x="total_bill", y="tip", col="time", hue="smoker")

In [None]:
sns.displot(data=tips, x="total_bill", col="time", kde=True)

In [None]:
sns.catplot(data=tips, kind="violin", x="day", y="total_bill", hue="smoker", split=True)