# Introduction to Python for Movement Scientists

<p>This Jupyter notebook summarizes the basics you need to start programming in Python. It is by no means an exhaustive list, but it should be enough to get started.</p>
<p>The commands here work the same across platforms (macOS/Windows/Linux) and across programming IDEs and programming notebooks.</p>

<p>This notebook will show you the basics of Python scripting. It starts with the general flow of Python code, followed by the primary data types. Next come the collections that can be used to store multiple values, and loops and conditional statements. It then covers loading data from files and saving data to another file format. Lastly, Python functions are discussed.</p>

## General flow of a Python script

1. Import statements at the top of the page
>~~~
import [library]
>~~~
<div class="alert alert-block alert-warning">
<b>Example:</b> <b><font color="green">import</font></b> pandas <b><font color="green">as</font></b> pd
</div>
        
2. Any and all functions 
>~~~
def my_function():
>~~~
3. The code body

In [47]:
# 1. Import statements
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
from random import choice

# 2. Functions
def welcome_message(conference, location, occation):
    
    print("Welcome to {} in {}!\nWe hope you enjoy the {}".format(conference, location, occation))

    
# 3. The code
welcome_message("esmac-2024", "Oslo", "python seminar")

Welcome to esmac-2024 in Oslo!
We hope you enjoy the python seminar


## Data types
<p>The basic data types in Python are **textual data** in the form of strings; **numerical data** like integers, floating point numbers, and complex numbers; and **Booleans**, which are either True or False. These are scalar types, meaning single elements.</p>


<div class="alert alert-block alert-info">
<b>Note:</b> The operations that can be done on the data depend on the data type (e.g. the plus sign (+) works differently on strings than on numerical types).
</div>
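
As a minimal sketch of the note above, the same operator can be applied to numbers and to strings to see the difference. The variable names are chosen only for illustration.

```python
# The + operator adds numbers but concatenates strings
number_sum = 2 + 3            # integer addition
text_sum = "2" + "3"          # string concatenation

# The * operator multiplies numbers but repeats strings
number_product = 2 * 3        # integer multiplication
text_repeat = "ab" * 3        # string repetition

print(number_sum, text_sum)          # 5 23
print(number_product, text_repeat)   # 6 ababab
```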

In [14]:
#String
"Hello"
'Hello'

#Subscripting string
print("Hello"[0]) # first 
print("Hello"[-1]) # last
print("Hello" + " " + "World")

#Integer -- whole numbers
print(123 + 345)

#Float (floating point number)
pi = 3.14159
print(pi)

#Complex 
2 + 3j
complex(2, 3)

#Boolean
True
False

#To go from one Data Type to the other --> Type casting
print(type(pi))
print(type(str(pi)))
# int()
# float()
# str()

H
o
Hello World
468
3.14159
<class 'float'>
<class 'str'>




#### Common errors
Common errors with these data types are Syntax errors and Type errors. 

When considering strings, the wrong use of quotation marks causes a <font color='red'>SyntaxError: </font>

~~~
"This is a 'valid' string"
~~~

~~~
"This is "not a valid" string"
~~~

<div class="alert alert-block alert-info">
    <b>Note:</b> Reusing the same type of quotation mark inside the second string causes the SyntaxError: Python reads the second double quote as the end of the string.
</div>

Using the wrong data type for built-in functions will cause a <font color='red'>TypeError:</font> 
~~~
In[ ]: len("Python")
Out [ ]: 6 

In[ ]: len(5)
Out[ ]: TypeError                                 Traceback (most recent call last)
Cell In[34], line 1
----> 1 len(5)

TypeError: object of type 'int' has no len()

~~~

<div class="alert alert-block alert-info">
    <b>Note:</b> The first example gives the output 6, meaning the string "Python" contains 6 characters. Using the len function on an integer causes a TypeError, because an integer has no length.
</div>
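
Type casting is often the way out of a TypeError. The sketch below is only an illustration: the variable names and the height value are made up.

```python
trial_number = 3

# "Trial " + trial_number would raise a TypeError (str + int),
# so the integer is cast to a string first
label = "Trial " + str(trial_number)
print(label)

# Casting in the other direction: text read from a file is a string
# and has to be converted before doing arithmetic with it
height_text = "1.82"                     # hypothetical value, e.g. read from a file
height_cm = float(height_text) * 100     # convert to a float, then metres to centimetres
print(round(height_cm))
```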

## Data structures
Data structures are a way of organizing and storing grouped pieces of data in Python. Usually these pieces of data have a relationship with each other. Data structures are also the way to store data in a certain order (e.g. time series data). Here lists and dictionaries are shown. Both are built into the Python language itself, so they can be used without importing anything.


1. Lists
    - Can store any data type you'd want and can even combine them in a list
    - List output looks like an array [1, "hello", 2.2]    
    - They can be modified and added onto when needed. 
    
    >~~~
    my_list = [item1, item2, ...]
    >~~~

2. Dictionaries
    - A dictionary stores keys, often strings, each with an associated item. This item can be anything: a scalar, a list or another dictionary.
    
    >~~~
    my_dict = {"key1": item1, 
             "key2": item2,
             ... }
    >~~~                  

<div class="alert alert-block alert-info">
    <b>Tip:</b> Indexing in these data structures starts from 0. This can be visualised as an offset from the start: the first item has an offset of zero, the second item has an offset of 1, and so forth. The last item can be accessed with -1 (i.e. counting back one position from the end).
</div>
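
The offset idea from the tip can be tried out on a short list. This is a small sketch with made-up gait-phase labels. The last line also uses slicing, which takes a range of offsets.

```python
steps = ["heel strike", "mid stance", "toe off", "swing"]  # hypothetical labels

first = steps[0]        # offset 0 from the start
second = steps[1]       # offset 1 from the start
last = steps[-1]        # negative indices count back from the end
first_two = steps[0:2]  # slicing: start at offset 0, stop before offset 2

print(first, "|", second, "|", last)
print(first_two)
```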

In [42]:
#Lists
my_list_of_numbers = [4, 3, 2, 1, 0]
my_list_of_strings = ["apple", "banana", "orange"]

#Indexing in list
my_list_of_strings[1] 
my_list_of_strings[-1]

#Change existing item in list 
my_list_of_numbers[2] = 5

#Add item to list 
my_list_of_strings.append("peach")

#Remove item from list
my_list_of_strings.remove("banana")

#Dictionaries
my_dict = {"ppID": 1, 
            "sensor": "Sensor",
            "data": [14, 47, 41, 6, 35, 12, 27, 48, 0, 25, 11, 36, 26, 28, 32]}

#Indexing in dictionary
my_dict.keys() # get the keys
my_dict["data"] # get the data values 
my_dict["data"][3] # get the fourth item of the data

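#numpy array
my_array = np.linspace(1,10) # 50 evenly spaced values from 1 to 10 (50 is the default number)
my_matrix = my_array.reshape(10,5) # the same 50 values arranged as 10 rows by 5 columns

#pandas DataFrame
my_df = pd.DataFrame({'Data': my_array})
my_df.head() # display the first 5 rows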

       Data
0  1.000000
1  1.183673
2  1.367347
3  1.551020
4  1.734694


## Loops and conditional statements 
There are a few things that are important when considering loops and conditional statements (and also functions, but more on that later). 
  
1. The first line always ends with a colon (:). If the colon is missing you get a <font color="red">SyntaxError:</font> 
2. Indentation of the following lines is important: either a tab or 4 spaces. Otherwise you get an  
    <font color="red">IndentationError:</font>
    - If the colon is placed correctly, most editors indent the next line automatically.

#### For loops
In Python it is very easy to loop over the items in a list. You can directly access the items in a list or dictionary using the keyword *in*. This is unlike programming languages like MATLAB, where you typically index into the list to access the desired item. It shows how powerful an easy-to-understand syntax can be.

Example MATLAB code:
>~~~
for i=1:5
    subj = subject_list{i};  % subject_list is assumed to be a cell array of character vectors
    disp(['Analyzing ', subj])
end
>~~~    

Example Python code: 
>~~~
for subj in subject_list:
    print("Analyzing {}".format(subj))
>~~~
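
The same keyword *in* also works on dictionaries. The sketch below uses a made-up participant record. Looping over the dictionary itself gives the keys, and the `.items()` method gives each key together with its value.

```python
participant = {"ppID": 1, "sensor": "IMU", "speed": 1.3}  # hypothetical record

# Looping over a dictionary yields its keys
keys_seen = []
for key in participant:
    keys_seen.append(key)
    print(key)

# .items() yields the key and the value together
for key, value in participant.items():
    print("{}: {}".format(key, value))
```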

In [22]:
#Just print item in list
print("Looping over the numbers in a list:")
for number in my_list_of_numbers:
    print(number)
    
print('\nLooping over the items in a list and printing the index along side it:')
#Print item in list and extract the index of that item with the enumerate function
for index, string in enumerate(my_list_of_strings):
    print(index, string)

Looping over the numbers in a list:
4
3
5
1
0

Looping over the items in a list and printing the index along side it:
0 apple
1 orange
2 peach


#### Conditional statements

In [17]:
x=2

if x==3:
    print("x equals 3")
elif x==2:
    print("x equals 2")
else:
    print("x equals something else")    

x equals 2


#### While loop

In [18]:
still_on = True
x = 0

while still_on:
    print("I'm still confused about while statements... ")
    x += 1
    
    if x > 5:
        still_on = False
        print("Wait, I think I got it ;)")

I'm still confused about while statements... 
I'm still confused about while statements... 
I'm still confused about while statements... 
I'm still confused about while statements... 
I'm still confused about while statements... 
I'm still confused about while statements... 
Wait, I think I got it ;)


## Functions
Functions in Python are usually written when a certain block of code needs to be reused or repeated many times. They can be embedded in the document itself (like at the top of this Jupyter notebook), or they can be in a separate document. 
<div class="alert alert-block alert-info">    
<b>Note:</b> If the function is in a separate document, you'd need to import it in your script: 
<b><font color="green">from</font></b> my_function_file <b><font color="green">import</font></b> my_function
</div>

The key elements of a function are: 
1. The **<font color="green">def</font>** keyword
    - This tells python that the following code is a function
2. The **function name** followed by parentheses and a colon. 
> ~~~
def my_function_name():
> ~~~

3. **Indentation matters!**
    - Similar to the loops and conditional statements, the code body should be indented by 1 tab or 4 spaces relative to the line with the def keyword. 
    - Again, if the first line is written correctly (including the colon), the indentation should be automatic
    
***
4. The **input parameters** within the parentheses (optional). 
<div class="alert alert-block alert-warning">
    <b>Example:</b> In the welcome_message function defined in the first code cell there are three input parameters (conference, location, and occation) needed to print the welcome message. There we put the relevant strings in order of appearance. You could also use the keywords (e.g. *location*) to specify the input, then the order does not matter. 
</div>

> ~~~
welcome_message("esmac-2024", "Oslo", "python seminar")
welcome_message(location="Oslo", occation="Python seminar", conference="esmac-2024")
> ~~~

5. The **return** statement (optional---to return or not to return). 
    - The function for the welcome message does not return anything. So once the message is printed nothing can be changed about it; there are no returned variables to work with afterwards. 
    - Placing the **<font color="green">return</font>** keyword at the end of the function allows you to specify the variables to keep.
***    

> ~~~
def welcome_message(conference, location, occation):
    msg = "Welcome to {} in {}!\nWe hope you enjoy the {}".format(conference, 
                                                                  location, occation)
    return msg
> ~~~
    
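<div class="alert alert-block alert-info">
<b>Note:</b> The function returns only the values that are specified in the return statement. Once a return statement has been executed, any code after it will not run, so the return normally comes last. The exception is a return inside a conditional statement: the code after it still runs when the condition is not met.
</div>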
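
A small sketch of how return behaves. The two functions below, `summarize` and `sign_label`, are hypothetical examples and not part of the seminar code.

```python
def summarize(values):
    """Return the minimum and maximum of a list of numbers."""
    lowest = min(values)
    highest = max(values)
    return lowest, highest   # two values are returned together as a tuple
    print("never shown")     # unreachable: the function has already returned


def sign_label(x):
    """A return inside a conditional: only one of the two returns runs."""
    if x < 0:
        return "negative"
    return "zero or positive"


low, high = summarize([14, 47, 41, 6, 35])
print(low, high)
print(sign_label(-2), "/", sign_label(5))
```

Because `summarize` returns its results, they can be stored in `low` and `high` and used later, which a function that only prints does not allow.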
    



In [51]:
# example function
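import json  # JSON is a text file format whose structure is very similar to Python dictionaries.

def grab(fl, ext):
    """
    Grabs the data that is in the folders
    :param fl: str full path to file
    :param ext: str extension of the file ("json" or "csv")
    :return: dict or list for json files, pandas DataFrame for csv files
    """
    
    if ext == "json":
        with open(fl, "r") as f:
            r = json.load(f)
    elif ext == "csv":
        r = pd.read_csv(fl)
    else:
        # without this branch r would be undefined for any other extension
        raise ValueError("Unsupported extension: {}".format(ext))

    return r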
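
The function above only loads data. As a hedged sketch of the opposite direction, the example below saves a dictionary to a JSON file and reads it back. The helpers `save_json` and `load_json` are hypothetical names, and a temporary folder is used so that no real files are touched.

```python
import json
import os
import tempfile


def save_json(data, path):
    """Hypothetical helper: write a dictionary to a JSON file."""
    with open(path, "w") as f:
        json.dump(data, f)


def load_json(path):
    """Hypothetical helper: read a JSON file back into a dictionary."""
    with open(path, "r") as f:
        return json.load(f)


record = {"ppID": 1, "sensor": "Sensor", "data": [14, 47, 41]}

# A temporary folder is created and removed again automatically
with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "record.json")
    save_json(record, file_path)
    loaded = load_json(file_path)

print(loaded == record)   # the round trip keeps the content intact
```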




## Debugging
This can be a hassle in Python. <br>
Here in the notebook there is no debugging button you can click. The advantage of Jupyter Notebook or Google Colab is that you can run the code cell by cell to see where you are going wrong. But it is harder to debug functions that way, as you cannot step into the function to see what happens. In those cases small and simple IDEs like __[Thonny](https://thonny.org)__ can come in handy: copy and paste your code, and visually follow what happens. <br>
When using IDEs like PyCharm, DataSpell, etc., there is a debugging button. Just set a breakpoint and start your script by pressing the bug icon. Then you can step into the code and see what is happening or find the error.

# Working with data
The libraries that you will definitely encounter as a movement scientist are __[NumPy](https://numpy.org)__, __[pandas](https://pandas.pydata.org)__,  and __[Matplotlib](https://matplotlib.org)__. Before you are able to use these libraries, you need to make sure they are installed in your virtual environment. These three libraries work seamlessly together, which makes handling large data sets much easier.<br>


#### NumPy
NumPy (**N**umerical **P**ython) is the most used library for scientific computing and engineering. Even if you don't use it directly, there is a big chance that other libraries you use rely on it in the background. NumPy provides powerful multi-dimensional array structures (*ndarray*) and a comprehensive assortment of routines that operate efficiently on those arrays, including mathematical functions, discrete Fourier transforms, basic linear algebra, basic statistics and much more. <br>

Some NumPy essentials: 
1. an ndarray has a homogeneous data type  
2. element-by-element operations are the default (vectorization and broadcasting)
3. the size of an ndarray is fixed when it is created; its shape can be e.g. [] (empty), 500 (one-dimensional), 2x500 (two-dimensional), or 2x2x500 (three-dimensional)
4. it is easy to pre-allocate variables with np.zeros((nrow, ncol))
5. it fully supports object oriented programming, where ndarray is the class
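
A short sketch of points 1, 2 and 4 from the list above. The values and variable names are made up for illustration.

```python
import numpy as np

# 1. Homogeneous data type: mixing integers and a float gives an all-float array
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# 2. Element-by-element operations and broadcasting
signal = np.array([1.0, 2.0, 3.0])
doubled = signal * 2        # the scalar 2 is broadcast to every element
summed = signal + signal    # element-wise addition of two arrays

# 4. Pre-allocate an array of zeros, then fill in the rows
nrow, ncol = 2, 3
result = np.zeros((nrow, ncol))
result[0, :] = doubled
result[1, :] = summed
print(result.shape)
print(result)
```

No loop is needed for the multiplication or the addition. NumPy applies the operation to every element at once.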


It is impossible to cover all possibilities in this seminar; the guide __[NumPy: the absolute basics for beginners](https://numpy.org/doc/stable/user/absolute_beginners.html#numpy-the-absolute-basics-for-beginners)__ is a very useful reference.



<div class="alert alert-box alert-info">
<b>Note:</b> What all these libraries have in common is that they follow the Object Oriented Programming (OOP) structure. Concretely, this means: from the class blueprint of <i>ndarray</i> you create an array, and then you are able to directly modify and perform operations on that array. 
    
~~~
>>> b = np.arange(12).reshape(3, 4)
>>> b.cumsum(axis=1)  # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])
~~~
    
The same is true for the pandas DataFrame and matplotlib's pyplot.
</div>

***
3. NumPy's Array
    - The __[NumPy](https://numpy.org)__ library is a fundamental package for scientific computing.
    - To create an array of equally spaced floats (e.g. for a timeline) you can use the following code:
    
    >~~~
    import numpy as np
    my_array = np.linspace(1,10)
    my_matrix = my_array.reshape(10,5)
    >~~~   
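
As a sketch of the timeline use case: `np.linspace` returns 50 samples when no number is given, which is why the reshape to 10 by 5 works. For a recording you would normally set the number of samples yourself. The duration and sampling frequency below are assumed values.

```python
import numpy as np

default_array = np.linspace(1, 10)   # the number of samples defaults to 50
print(default_array.size)

# Hypothetical recording: 2 seconds sampled at 100 Hz
duration = 2.0    # seconds (assumed)
fs = 100          # sampling frequency in Hz (assumed)
n_samples = int(duration * fs)

# endpoint=False leaves out t = duration, so the time step is exactly 1 / fs
time = np.linspace(0, duration, n_samples, endpoint=False)

print(time.size)
print(time[:3])   # first three time stamps
```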
    
#### standard ndarray
>~~~
import numpy as np
my_array = np.array([1,2,3]) # 1-dimensional array
my_matrix = np.array([[1,2,3], [4,5,6]]) # 2-dimensional array

>~~~

#### shape, size and type
>~~~
my_matrix.shape #tuple containing (row,column)
my_matrix.size # number of elements
my_matrix.dtype # get the type of variables
>~~~

#### standard math
>~~~
my_matrix.max(axis=0) # maximum of each column (taken across the rows)
my_matrix.max(axis=1) # maximum of each row (taken across the columns)
my_matrix.max() # overall maximum
my_array.mean()
>~~~        
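
The axis argument is easy to mix up, so here is a small runnable check on the 2x3 matrix used above. `axis=0` collapses the rows and `axis=1` collapses the columns.

```python
import numpy as np

my_matrix = np.array([[1, 2, 3],
                      [4, 5, 6]])   # 2 rows, 3 columns

col_max = my_matrix.max(axis=0)   # collapse the rows: one value per column
row_max = my_matrix.max(axis=1)   # collapse the columns: one value per row

print(col_max)           # [4 5 6]
print(row_max)           # [3 6]
print(my_matrix.max())   # 6
```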
    
 
4. pandas DataFrame
    - The __[pandas](https://pandas.pydata.org)__ library is the second fundamental package used for scientific computing. 
    - It provides easy-to-use data structures and analysis tools on those data structures. 
    - To create a pandas DataFrame, the following code can be used: 
    
    >~~~
    import pandas as pd
    my_df = pd.DataFrame({'Data': my_array})
    >~~~
    - Note that the input structure is very similar to that of a dictionary, with a key given as a string and the items in the form of a list (or array).
    - The DataFrames are displayed in tables with column headers, making it easy to visualize the data
    
    >~~~
    my_df.head()
    >~~~
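
A slightly larger sketch with two columns shows the dictionary-like input more clearly. The column names and the values are invented for this example.

```python
import numpy as np
import pandas as pd

time = np.linspace(0, 1, 5)                        # hypothetical time stamps
angle = np.array([10.0, 12.5, 15.0, 12.5, 10.0])   # hypothetical knee angles

# Each key becomes a column header, each array becomes a column
my_df = pd.DataFrame({"time": time, "knee_angle": angle})

print(my_df.head(3))               # first three rows
print(my_df["knee_angle"].max())   # a column is selected with its key
print(my_df.shape)                 # (number of rows, number of columns)
```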
    

# Object Oriented Programming (OOP)
Object oriented programming is a programming paradigm organized around objects (data) rather than functions. This approach is well suited for projects that are large and complex. A class is basically a blueprint for creating objects. <br> 
The structure of OOP makes it exceptionally suited for collaborative development and code reusability. Movement science projects often have different aspects to them: extracting the data, some preprocessing steps, possible filtering or normalisation, the actual analysis (e.g. step detection), and calculating the outcome variables. All these different components of a movement science project can be **modules**. When used correctly, OOP simplifies the relationships within the project. <br>


## How to use OOP
The first step in OOP is to identify all the objects you'd want to manipulate and how they relate to each other.
The structure of OOP is as follows:
1. classes
    - Blueprints that model real-life objects
2. objects
    - Individual objects that are generated from the blueprint
3. attributes
    - The characteristics that define the objects
4. methods
    - The functions of the objects.  

Python is currently one of the most popular programming languages that is built around object oriented programming. You can recognize this in code such as my_array.max() (find the maximum value in my array) or my_dataframe.head() (display the first 5 rows of my DataFrame). Here, the objects my_array and my_dataframe are constructed from the classes ndarray and DataFrame. Using the functionality of the methods, the data can be manipulated.
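
The four terms can be pointed out on a NumPy array. This is a minimal sketch; the values in the array are arbitrary.

```python
import numpy as np

my_array = np.array([3, 7, 5])    # an object created from the ndarray class

print(type(my_array).__name__)    # the class the object was built from
print(my_array.shape)             # an attribute: something the object has
print(my_array.max())             # a method: something the object does
```

Note that attributes are written without parentheses, while methods are called with parentheses.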

<div class="alert alert-box alert-info">
    <b>Example:</b> If you want to run a virtual restaurant you'd need (at least) a waiter, server, chef, and manager. That means that you have the <i>waiter class</i>, which is the blueprint for the waiter. When creating a single waiter to handle the restaurant, that waiter becomes the <i>waiter object</i>. To build the waiter object, the two most important things that make it up are: <br> 
> what it <b>has</b>: holds_plate=True; tables_responsible=[1,2,3] <br>
> what it <b>does</b>: def taking_order(table, order): def takes_payment(amount): <br>
What it has are the <i>attributes</i> and what it does are the <i>methods</i>. The attributes are basically variables that are associated with the modelled object. The methods are the functions that a modelled object can perform.
</div>
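
The waiter example could be sketched as a class like the one below. This is one possible implementation under assumptions, not the only way to write it. The attribute and method names are adapted from the example, while the `orders` and `total_received` attributes are added purely for illustration.

```python
class Waiter:
    """Blueprint (class) for a waiter in the virtual restaurant."""

    def __init__(self, tables_responsible):
        # Attributes: what the waiter has
        self.holds_plate = True
        self.tables_responsible = tables_responsible
        self.orders = {}             # illustrative: table number -> order
        self.total_received = 0.0    # illustrative: sum of all payments

    # Methods: what the waiter does
    def taking_order(self, table, order):
        self.orders[table] = order

    def takes_payment(self, amount):
        self.total_received += amount
        return self.total_received


# Create one waiter object from the class blueprint
waiter = Waiter(tables_responsible=[1, 2, 3])
waiter.taking_order(2, "soup")
print(waiter.orders)
print(waiter.takes_payment(8.5))
```

Several waiters can be created from the same class, each with its own tables and orders.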

### Constructing an object
