# Introduction to Python for Movement Scientists

This Jupyter notebook is part of the *Python programming for the movement sciences* seminar. It gives you a brief introduction to the language and to some useful Python packages for biomechanics. The seminar is not enough to learn Python, but it provides the basic knowledge you need to get started. <br>
This introduction covers the basic Python concepts that you will use in the tutorials.
The commands shown here work the same on all platforms (macOS/Windows/Linux) and in any IDE or notebook.

This notebook shows you the basics of Python scripting. It starts with the general flow of a Python script and the primary data types, followed by the collections used to store multiple values, loops and conditional statements, and functions. Finally, it introduces object-oriented programming and the libraries used to load data from files and save it back into another file format.

## General flow of a Python script

1. Import statements at the top of the script: `import [library]`
2. Any and all function definitions: `def my_function():`
3. The main body of the code

~~~
# 1. Import statements
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
from random import choice

# 2. Functions
def welcome_message(conference, location, occation):
    print("Welcome to {} in {}!\nWe hope you enjoy the {}".format(conference, location, occation))


# 3. The code
welcome_message("esmac-2024", "Oslo", "python seminar")
~~~

Output:
~~~
Welcome to esmac-2024 in Oslo!
We hope you enjoy the python seminar
~~~


## Data types
The basic data types in Python are **textual data** in the form of strings (`str`); **numerical data** such as integers (`int`), floating-point numbers (`float`), and complex numbers (`complex`); and **Booleans** (`bool`), which are either `True` or `False`. These are scalar types, meaning that each value is a single element.

* String
~~~
"Hello"
'Hello'
~~~

* Indexing (subscripting) and concatenating strings
~~~
print("Hello"[0])                 # first character: H
print("Hello"[-1])                # last character: o
print("Hello" + " " + "World")    # concatenation: Hello World
~~~

* Integer -- whole numbers
~~~
print(123 + 345)
~~~

* Float (floating point number)
~~~
pi = 3.14159
print(pi)
~~~

* Complex
~~~
2 + 3j
complex(2, 3)
~~~

* Boolean
~~~
True
False
~~~

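* Type casting: converting a value from one data type to another
~~~
pi = 3.14159
print(type(pi))          # <class 'float'>
print(type(str(pi)))     # <class 'str'>
print(int("42") + 1)     # 43
print(float("2.5"))      # 2.5
print(int(pi))           # 3 (int() drops the decimal part)
~~~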

<div class="alert alert-block alert-info">
<b>Note:</b> The operations you can perform depend on the data type. For example, <code>+</code> adds numbers but concatenates strings, and <code>*</code> multiplies numbers but repeats a string.
</div>

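To see the note above in action, the sketch below applies the same operators to numbers and to strings. The values are arbitrary; the last lines show that mixing a string and an integer fails until you cast one of them.

```python
# The same operators behave differently depending on the data type
number_sum = 2 + 3          # addition of integers
text_sum = "2" + "3"        # concatenation of strings
number_product = 2 * 3      # multiplication of integers
text_product = "ab" * 3     # repetition of a string
print(number_sum, text_sum, number_product, text_product)  # 5 23 6 ababab

# Mixing types without casting raises a TypeError
try:
    "2" + 3
except TypeError as err:
    print("TypeError:", err)

# Casting the string to an int first makes the addition work
fixed_sum = int("2") + 3
print(fixed_sum)  # 5
```
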
#### Common errors
Common errors with these data types are `SyntaxError` and `TypeError`.

When working with strings, mismatched quotation marks cause a <font color='red'>SyntaxError: </font>

~~~
"This is a 'valid' string"
~~~

~~~
"This is "not a valid" string"
~~~

<div class="alert alert-block alert-info">
    <b>Note:</b> In the second example, the second double quote already ends the string, so Python cannot make sense of the rest of the line. This causes the SyntaxError.
</div>

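A small sketch of how to avoid this SyntaxError: use the other type of quotation mark inside the string, or escape the inner quotes with a backslash. The variable names are only illustrative.

```python
# Three valid ways to put quotation marks inside a string
single_inside_double = "This is a 'valid' string"
double_inside_single = 'This is a "valid" string'
escaped = "This is an \"escaped\" string"  # the backslash escapes the inner quotes

print(single_inside_double)
print(double_inside_single)
print(escaped)  # This is an "escaped" string
```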
Using the wrong data type in a built-in function causes a <font color='red'>TypeError:</font>
~~~
In [ ]: len("Python")
Out[ ]: 6

In [ ]: len(5)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 1
----> 1 len(5)

TypeError: object of type 'int' has no len()
~~~

<div class="alert alert-block alert-info">
    <b>Note:</b> The first example returns 6, meaning that the string "Python" contains 6 characters. Calling <code>len</code> on an integer causes a TypeError, because an integer has no length.
</div>

## Data structures
Data structures are ways of organizing and storing grouped pieces of data in Python. Usually, these pieces of data are related to each other. Data structures also let you store data in a specific order (e.g. time-series data). This section shows lists and dictionaries, which are built into Python (no import needed) and are used throughout the language.


1. Lists: `my_list = [item1, item2, item3]`
    - Can store any data type, and can even mix data types in one list
    - A printed list looks like this: `[1, "hello", 2.2]`
    - Lists can be modified, and items can be added when needed: `my_list.append(item4)`
    - Items can also easily be removed: `my_list.remove(item2)` (removes the first item equal to `item2`)

2. Dictionaries: 
    ```
    my_dict = {"key1": item1, 
             "key2": item2,
             ... }
    ```
    - A dictionary stores *key-value* pairs. The key is usually a string, and its associated value can be anything: a scalar, a list, or even another dictionary:
    ~~~
    my_dict = {"ppID": 1,
               "sensor": "Sensor",
               "data": [14, 47, 41, 6, 35, 12, 27, 48, 0, 25, 11, 36, 26, 28, 32]}
    ~~~
    - It is easy to index into a dictionary and explore its contents:
        - To get the keys: `my_dict.keys()`
        - To get the value of a key: `my_dict["data"]`
        - To get a specific element of that value: `my_dict["data"][4]`

<div class="alert alert-block alert-info">
    <b>Tip:</b> Indexing in Python data structures starts at 0. You can think of the index as an offset from the start: the first item has an offset of 0, the second item an offset of 1, and so forth. Negative indices count from the end, so the last item can be accessed with -1.
</div>

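The list and dictionary operations above, together with the indexing tip, can be tried in a single cell. This is a minimal sketch: the `my_dict` values are the ones from the example above, and the extra `"sampling_rate"` key is a hypothetical addition to show that dictionaries can grow.

```python
# A list can mix data types and grows or shrinks as needed
my_list = [1, "hello", 2.2]
my_list.append("new item")    # add to the end
my_list.remove("hello")       # remove the first item equal to "hello"
print(my_list)                # [1, 2.2, 'new item']
print(my_list[0], my_list[-1])  # first and last item: 1 new item

# A dictionary maps keys to values
my_dict = {"ppID": 1,
           "sensor": "Sensor",
           "data": [14, 47, 41, 6, 35, 12, 27, 48, 0, 25, 11, 36, 26, 28, 32]}
print(list(my_dict.keys()))   # ['ppID', 'sensor', 'data']
print(my_dict["data"][4])     # element at offset 4: 35
my_dict["sampling_rate"] = 100  # hypothetical new key, added after creation
```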
## Loops and conditional statements 
There are a few things to keep in mind when writing loops and conditional statements (and functions, but more on that later).

1. The first line always ends with a colon (:). If the colon is missing, you get a <font color="red">SyntaxError:</font>
2. The lines that follow must be indented, either with a tab or with 4 spaces. Otherwise you get an
    <font color="red">IndentationError:</font>
    - If the colon is in place, most editors (including Jupyter) indent the next line automatically.

#### For loops
In Python, it is very easy to loop over the items in a list. Using the keyword *in*, you directly access each item of a list (or each key of a dictionary). In MATLAB, by contrast, you typically loop over an index and use it to retrieve the desired item. The Python syntax is short and easy to read.

Example MATLAB code:
~~~
% subject_list is assumed to be a cell array of subject IDs
for i = 1:numel(subject_list)
    subj = subject_list{i};
    disp("Analyzing " + subj)
    disp("Data saved on index nr: " + num2str(i))
end
~~~

Example Python code: 
~~~
subject_list = ["subj01", "subj02", "subj03"]  # hypothetical subject IDs

# very simple loop
for subj in subject_list:
    print("Analyzing {}".format(subj))

# To also loop over the index, e.g. for saving purposes
for i, subj in enumerate(subject_list):
    print("Analyzing {}".format(subj))
    print("Data saved on index nr: {}".format(i))
~~~

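The same *in* keyword works for dictionaries: looping over a dictionary gives you its keys, and the `.items()` method gives you key-value pairs. The subject IDs and angles below are made up for illustration.

```python
# Hypothetical peak knee angles (degrees) per subject
peak_knee = {"subj01": 62.5, "subj02": 58.0, "subj03": 65.1}

# Looping over a dictionary directly gives its keys
for subj in peak_knee:
    print(subj)

# .items() gives (key, value) pairs, which is usually more convenient
lines = []
for subj, angle in peak_knee.items():
    lines.append("{}: {:.1f} deg".format(subj, angle))
print("\n".join(lines))
```
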
#### Conditional statements
~~~
x = 2

if x == 3:
    print("x equals 3")
elif x == 2:
    print("x equals 2")
else:
    print("x equals something else")
~~~

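Conditional statements are often wrapped in a small function. The sketch below labels a knee flexion angle; the function name and the thresholds (0 and 60 degrees) are hypothetical and only illustrate the `if`/`elif`/`else` structure, not clinical cut-offs.

```python
def classify_knee_angle(angle_deg):
    """Label a knee flexion angle; the thresholds are illustrative only."""
    if angle_deg < 0:
        return "hyperextension"
    elif angle_deg <= 60:       # only checked if the first condition was False
        return "moderate flexion"
    else:                       # everything above 60 degrees
        return "deep flexion"


for angle in [-5, 30, 90]:
    print(angle, classify_knee_angle(angle))
```
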
#### While loop
~~~
still_on = True
x = 0

while still_on:
    print("I'm still confused about while statements... ")
    x += 1

    if x > 5:
        still_on = False
        print("Wait, I think I got it ;)")
~~~

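A `while` loop is useful when you do not know in advance how many iterations you need, for example when searching a signal for the first sample above a threshold (a strongly simplified version of event detection). This is a sketch with a hypothetical function name and made-up force values; the loop stops either at the end of the list or at the first sample above the threshold.

```python
def first_index_above(signal, threshold):
    """Return the index of the first sample above threshold, or None if there is none."""
    i = 0
    # Keep stepping forward while we are inside the list AND at or below the threshold
    while i < len(signal) and signal[i] <= threshold:
        i += 1
    if i == len(signal):
        return None  # reached the end without crossing the threshold
    return i


force = [0.1, 0.3, 0.8, 1.6, 2.4, 1.9]  # made-up force values
print(first_index_above(force, 1.5))    # 3
```
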
## Functions
You usually write a function when a block of code needs to be reused or run many times. Functions can be defined in the notebook or script itself (like `welcome_message` at the top of this notebook), or in a separate file.

<div class="alert alert-block alert-info">    
<b>Note:</b> If the function is in a separate file (e.g. <code>my_function_file.py</code>), you need to import it in your script:
<b><font color="green">from</font></b> my_function_file <b><font color="green">import</font></b> my_function
</div>

The key elements of a function are:
1. The `def` keyword
    - This tells Python that the following code defines a function
2. The function name followed by parentheses and a colon: `def my_function_name():`

3. Indentation matters!
    - As with loops and conditional statements, the function body must be indented by one tab or 4 spaces. (There is a lot of discussion online about which is better; the official style guide, PEP 8, recommends 4 spaces, but both work as long as you do not mix them within one block.)
    - Again, if the colon is in place, the indentation should be automatic.

4. The input parameters within the parentheses (optional).
    - If your function needs input parameters, like the welcome message, this is where you specify them.
    - Inside the function, each parameter name works as a variable that holds the value passed in.
    - When calling the function, you can either pass the values in the order of the parameters (positional arguments), or use the parameter names (keyword arguments):
    ~~~
    welcome_message("esmac-2024", "Oslo", "python seminar")
    welcome_message(location="Oslo", occation="Python seminar", conference="esmac-2024")
    ~~~

5. The return statement (optional: to return or not to return).
    - The `welcome_message` function does not return anything (strictly speaking, it returns `None`). Once the message is printed, there is no variable left to work with afterwards.
    - A `return` statement at the end of the function hands values back to the caller, so you can store them in a variable and keep using them:
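~~~
def welcome_message(conference, location, occation):
    msg = "Welcome to {} in {}!\nWe hope you enjoy the {}".format(conference,
                                                                  location, occation)
    return msg

message = welcome_message("esmac-2024", "Oslo", "python seminar")  # store the returned string
print(message)
~~~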

<div class="alert alert-block alert-warning">
    <b>Example:</b> The welcome_message function defined in the first code cell needs three input parameters (conference, location, and occation) to print the welcome message. In that call, the strings are passed in the same order as the parameters. You could also use the keywords (e.g. *location*) to specify the inputs, in which case the order does not matter.
</div>

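Because of the `return` statement, a function can also hand back more than one value (as a tuple) and can stop early. A minimal sketch, assuming a hypothetical `peak_and_index` helper and made-up knee angles:

```python
def peak_and_index(values):
    """Return the peak value and its index, or (None, None) for an empty list."""
    if not values:
        return None, None  # early return: the lines below are skipped
    peak = max(values)
    return peak, values.index(peak)  # two values are returned as one tuple


knee_angles = [5.2, 12.8, 34.1, 58.7, 61.3, 40.2]  # made-up values in degrees
peak, idx = peak_and_index(knee_angles)  # unpack the tuple into two variables
print("Peak of {} deg at sample {}".format(peak, idx))  # Peak of 61.3 deg at sample 4
```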
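<div class="alert alert-block alert-info">
<b>Note:</b> A function only returns the values listed after <code>return</code>. As soon as a <code>return</code> statement runs, the function exits, so any code after it in the same block never runs. (A <code>return</code> inside a conditional statement only ends the function when that branch is executed.)
</div>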


### A (slightly more) relevant function example :)
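~~~
import json  # JSON is a text file format whose objects map onto Python dictionaries
import pandas as pd

def grab(fl, ext):
    """
    Loads the data from a single file, based on its extension
    :param fl: str full path to the file
    :param ext: str extension of the file without the dot (e.g. "json", "csv")
    :return r: extension specific datastructure (dict/list for JSON, DataFrame for CSV)
    """

    if ext == "json":
        with open(fl, "r") as f:
            r = json.load(f)
    elif ext == "csv":
        r = pd.read_csv(fl)
    elif ext == "c3d":
        # Reading C3D motion-capture files requires a third-party package;
        # that step is not part of this example.
        raise NotImplementedError("C3D loading is not implemented in this example")
    else:
        raise ValueError("Unsupported file extension: {}".format(ext))

    return r
~~~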

Well-documented functions contain a "docstring": the block of text directly under the function definition, enclosed by triple quotation marks. In the docstring you describe what the function does, which parameters it needs, and what it returns. When you later call the function, you can display this information: in Jupyter, place the cursor inside the parentheses and press Shift+Tab; in most IDEs, hovering over the function name with your mouse shows it.

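Besides hovering, you can also read a docstring from code with the built-in `help()` function or the `__doc__` attribute. The `deg_to_rad` function below is a hypothetical example written in the same docstring style as `grab`.

```python
import math


def deg_to_rad(angle_deg):
    """
    Converts an angle from degrees to radians
    :param angle_deg: float angle in degrees
    :return: float angle in radians
    """
    return angle_deg * math.pi / 180


help(deg_to_rad)           # prints the signature and the docstring
print(deg_to_rad.__doc__)  # the raw docstring text
```

In Jupyter, `deg_to_rad?` shows the same information.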
## Debugging
Debugging can be a hassle in Python. The classic Jupyter Notebook interface has no debug button you can click. The advantage of Jupyter Notebook or Google Colab is that you can run your code cell by cell to see where it goes wrong. However, this makes it harder to debug functions, because you cannot step into a function to see what happens. In those cases, small and simple IDEs like __[Thonny](https://thonny.org)__ can come in handy: copy-paste your code and visually follow what happens. IDEs like PyCharm, DataSpell, etc. have a debug button. Set a breakpoint where you need it and run your script by pressing the *bug* icon. Then you can step into the code, see what is happening, and find the error.

# Object-Oriented Programming (OOP)
Object-oriented programming is a programming paradigm that organizes code around objects, which bundle data together with the functions that act on that data, rather than around standalone functions. This approach is well suited for large and complex projects. <br>
The structure of OOP makes it especially suited for collaborative development and code reuse. Movement science projects often consist of several parts: extracting the data, preprocessing (e.g. filtering or normalization), the actual analysis (e.g. step detection), and calculating the outcome variables. Each of these components could be considered a **module**. When used correctly, OOP simplifies the relationships within the project. However, it can be hard to get started with, as most programming is taught using a function-based approach. <br>

The first step in OOP is to identify all the objects you want to manipulate and how they relate to each other. The structure of OOP is as follows:
1. Classes
    - The blueprints that model real-life objects
2. Objects
    - The individual objects (instances) created from a blueprint
3. Attributes
    - The characteristics/variables that define an object
4. Methods
    - The functions that an object can perform

Python is one of the most popular programming languages, and it is built around object-oriented programming: everything in Python, from a number to a DataFrame, is an object. The telltale sign is the **object.method()** syntax. For example, `my_array.max()` finds the maximum value in the array, and `my_dataframe.head()` displays the first 5 rows of the DataFrame. Here, `my_array` and `my_dataframe` are objects created from the classes `ndarray` (NumPy) and `DataFrame` (pandas), and methods such as `max()` and `head()` let you work with their data.

Now that you know a bit about OOP, you can apply this knowledge when working with data. Every object, such as a NumPy, pandas, or pyplot object, has *methods* associated with it. **Tip:** In a Jupyter notebook, you can press the Tab key after a dot to get an overview of the available functions, methods, and attributes. For example: `pd.` + Tab

<div class="alert alert-box alert-warning">
    <b>Example:</b> If you want to run a virtual restaurant, you need (at least) a waiter, server, chef, and manager. This means you have a <i>waiter class</i>, which is the blueprint for a waiter. When you create a single waiter to handle the restaurant, that waiter becomes a <i>waiter object</i>. The two most important things that make up this object are: <br>
> what it <b>has</b>: holds_plate=True; tables_responsible=[1,2,3] <br>
> what it <b>does</b>: def taking_order(table, order): and def takes_payment(amount): <br>
What it has are the <i>attributes</i> and what it does are the <i>methods</i>. The attributes are variables that belong to the modeled object, and the methods are the functions that the modeled object can perform.
</div>

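The waiter example can be written as a Python class. This is a minimal sketch: the attribute and method names follow the example above, while the `orders` attribute and the check on table numbers are hypothetical additions to give the methods something to do.

```python
class Waiter:
    """Blueprint (class) for a waiter in the virtual restaurant."""

    def __init__(self, tables_responsible):
        # Attributes: what the waiter *has*
        self.holds_plate = True
        self.tables_responsible = tables_responsible
        self.orders = {}  # hypothetical: orders collected per table

    # Methods: what the waiter *does*
    def taking_order(self, table, order):
        if table not in self.tables_responsible:
            raise ValueError("Table {} is not served by this waiter".format(table))
        self.orders.setdefault(table, []).append(order)

    def takes_payment(self, amount):
        print("Received {:.2f}".format(amount))
        return amount


# Create an object (instance) from the class blueprint
waiter = Waiter(tables_responsible=[1, 2, 3])
waiter.taking_order(2, "pasta")   # object.method() syntax
print(waiter.orders)              # {2: ['pasta']}
print(waiter.tables_responsible)  # attribute access: [1, 2, 3]
```

`__init__` runs when the object is created and sets its attributes; `self` refers to the object itself, which is why every method takes it as its first parameter.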
# Working with data
For movement science, the libraries you will almost certainly encounter are __[NumPy](https://numpy.org)__, __[pandas](https://pandas.pydata.org)__, __[SciPy](https://scipy.org)__, and __[Matplotlib](https://matplotlib.org)__. Before you can use these libraries, make sure they are installed in your virtual environment. These four libraries work seamlessly together, which makes handling large data sets much easier.<br>
All of these libraries follow the object-oriented programming (OOP) structure. Concretely, this means that from the class blueprint <i>ndarray</i> you create an array object `b = np.arange(12).reshape(3, 4)`, and then you can directly apply the class methods to that object: `b.cumsum(axis=1)`. The same is true for pandas DataFrames and for the Figure and Axes objects in matplotlib's pyplot.

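The ndarray example above can be run directly. The sketch shows that `b` is an object of the class `ndarray`, that `cumsum(axis=1)` is a method that computes the cumulative sum along each row, and that `shape` is an attribute (no parentheses).

```python
import numpy as np

b = np.arange(12).reshape(3, 4)  # create an ndarray object: 3 rows x 4 columns
print(type(b))                   # <class 'numpy.ndarray'>
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

row_cumsum = b.cumsum(axis=1)    # method call: cumulative sum along each row
print(row_cumsum)
# [[ 0  1  3  6]
#  [ 4  9 15 22]
#  [ 8 17 27 38]]

print(b.shape, b.max())          # attribute and method: (3, 4) 11
```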
#### NumPy
NumPy (**N**umerical **P**ython) is the fundamental library for scientific computing and engineering in Python. Even if you don't use it directly, there is a good chance that other libraries you use rely on it in the background. NumPy provides powerful multidimensional arrays (*ndarray*) and a comprehensive set of functions that operate efficiently on those arrays, including mathematical functions, discrete Fourier transforms, basic linear algebra, basic statistics, and much more.

Some essentials:
1. An ndarray has a homogeneous data type: all elements have the same type
2. Element-by-element operations are the default (vectorization and broadcasting)
3. The size of an ndarray is fixed at creation (unlike a Python list, it cannot grow); common shapes are e.g. `(0,)` (empty), `(500,)` (one-dimensional), `(2, 500)` (two-dimensional), or `(2, 2, 500)` (three-dimensional)
4. Pre-allocating variables is easy: `np.zeros((nrow, ncol))`
5. NumPy fully supports object-oriented programming, with ndarray as the class

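The NumPy essentials listed above, sketched in one cell. All values are made up; `xy` stands in for, e.g., two marker coordinates over three samples.

```python
import numpy as np

# 1. Homogeneous data type: mixing ints and a float gives an all-float array
a = np.array([1, 2, 3.5])
print(a.dtype)  # float64

# 2. Element-by-element (vectorized) operations: no explicit loop needed
hip_angle_deg = np.array([0.0, 90.0, 180.0])   # made-up angles
hip_angle_rad = np.deg2rad(hip_angle_deg)      # converts every element

# Broadcasting: a (3, 2) array minus a (2,) array of column means gives (3, 2)
xy = np.array([[1.0, 2.0],
               [3.0, 4.0],
               [5.0, 6.0]])
centred = xy - xy.mean(axis=0)
print(centred)

# 3. and 4. Pre-allocate a fixed-size array, then fill it in place
n_trials, n_samples = 2, 500
trials = np.zeros((n_trials, n_samples))
trials[0, :] = 1.0   # fill the first trial (row) with ones
print(trials.shape)  # (2, 500)
```
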

#### pandas
pandas is the second essential library for data science. It is built on top of NumPy, and it offers one of the easiest ways to import data saved in text, CSV, or Excel files. It was created to work with *relational* or *labeled* data.  <br>
Some essentials:
1. The two primary data structures are `pandas.Series` (1D) and `pandas.DataFrame` (2D)
2. A DataFrame can hold a different data type in each column (e.g. col1=str (subj_id), col2=bool (female True/False), col3=int (age), col4=float (peak_knee_angle)...)
3. Shape and size are mutable: columns can be added and deleted
4. Powerful **group by** functions: `raw_data.groupby("Sex")`
5. Label-based slicing and indexing: e.g. `data.peak_knee_angle.max()` or `data["peak_knee_angle"].max()`
6. Time-series specific functionality: date range generation and frequency conversion, moving window statistics, date shifting, and lagging

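A minimal sketch of the pandas essentials above, using a small made-up DataFrame instead of a file. The column names follow the examples in the list; `sex` and `age_group` are hypothetical columns added for the group-by and mutability examples.

```python
import pandas as pd

# Made-up example data: one row per participant, one data type per column
data = pd.DataFrame({
    "subj_id": ["S01", "S02", "S03", "S04"],       # str
    "sex": ["female", "male", "female", "male"],  # str
    "age": [24, 31, 27, 45],                      # int
    "peak_knee_angle": [62.5, 58.0, 65.1, 55.4],  # float
})
print(data.dtypes)

# Label-based indexing: two equivalent ways to select a column
print(data["peak_knee_angle"].max())  # 65.1
print(data.peak_knee_angle.max())     # 65.1

# Group by: mean peak knee angle per sex
mean_by_sex = data.groupby("sex")["peak_knee_angle"].mean()
print(mean_by_sex)

# Mutable shape: add a new column derived from an existing one
data["age_group"] = ["<30" if age < 30 else "30+" for age in data["age"]]
print(data.shape)                     # (4, 5)
```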
How to use pandas to read and write data (the file paths and column names below are placeholders):
~~~
import pandas as pd
data = pd.read_csv("/file_path/file_name.csv")
data.head()      # returns the first 5 rows of the DataFrame
data.info()      # prints a concise summary (columns, data types, missing values)
data.describe()  # returns descriptive statistics of the numerical columns
data["column_name"].plot()  # quick line plot (uses matplotlib)
data.plot.scatter(x="peak_knee_angle", y="running_velocity")
data.to_excel("/file_path/file_name.xlsx")  # writing .xlsx files requires the openpyxl package
~~~

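To check that saving and loading work, you can write a DataFrame to a file and read it back. This sketch uses a temporary folder (so nothing is left on disk) and a hypothetical file name; `index=False` stops pandas from writing the row numbers as an extra column, which would otherwise show up as an `Unnamed: 0` column when the file is read back.

```python
import os
import tempfile

import pandas as pd

# Made-up data to save and reload
data = pd.DataFrame({"subj_id": ["S01", "S02"],
                     "peak_knee_angle": [62.5, 58.0]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "peak_angles.csv")  # hypothetical file name
    data.to_csv(path, index=False)  # do not write the row numbers as a column
    reloaded = pd.read_csv(path)    # read the file back into a new DataFrame

print(reloaded)
```
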

#### SciPy
SciPy (**S**cientific **P**ython) builds on NumPy and adds algorithms for scientific computing, such as optimization, integration, interpolation, signal processing (e.g. filtering), linear algebra, and statistics. A full overview of its subpackages is available in the user guide of the SciPy documentation.

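Filtering is one of the most common SciPy tasks in movement science. The sketch below applies a low-pass Butterworth filter with `scipy.signal.butter` and `scipy.signal.filtfilt`, which runs the filter forward and backward so the result has no phase lag. The sampling frequency, cut-off frequency, and signals are assumptions for illustration; choose the cut-off for your own data.

```python
import numpy as np
from scipy import signal

fs = 100.0    # sampling frequency in Hz (assumed)
cutoff = 6.0  # low-pass cut-off frequency in Hz (assumed)

# Made-up signal: a slow 1 Hz "movement" plus fast 30 Hz noise
t = np.arange(0, 2, 1 / fs)
clean = np.sin(2 * np.pi * 1.0 * t)
noisy = clean + 0.2 * np.sin(2 * np.pi * 30.0 * t)

# 4th-order low-pass Butterworth filter, applied forward and backward (zero phase lag)
b, a = signal.butter(4, cutoff, btype="low", fs=fs)
filtered = signal.filtfilt(b, a, noisy)

# Away from the edges, the filtered signal should be very close to the clean one
middle = slice(50, -50)
print(np.max(np.abs(filtered[middle] - clean[middle])))
```

Because `filtfilt` applies the filter twice, the effective filter order is doubled compared with the order passed to `butter`.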

#### matplotlib
Matplotlib is a standard visualization library for Python. There are others (e.g. seaborn, plotly), but matplotlib is a good place to start learning the basics.

Some essentials:
1. works excellently with pandas: e.g. `data["column_name"].plot()` or `data.plot.scatter(x="peak_knee_angle", y="running_velocity")` for quick and dirty visualization
2.

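Matplotlib also follows the OOP structure: `plt.subplots()` returns a Figure object and an Axes object, and you build the plot by calling their methods. A minimal sketch with a made-up curve (a sine wave, not a realistic knee angle pattern):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up curve over one gait cycle
cycle = np.linspace(0, 100, 101)                        # % of gait cycle
knee_angle = 30 + 25 * np.sin(2 * np.pi * cycle / 100)  # not real data

# Object-oriented interface: create Figure and Axes objects, then call their methods
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(cycle, knee_angle, label="knee")
ax.set_xlabel("Gait cycle (%)")
ax.set_ylabel("Knee angle (deg)")
ax.legend()
fig.tight_layout()
# In a notebook the figure is shown automatically; in a script, call plt.show()
```
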


<div class="alert alert-block alert-info">
<b>Tip:</b> It is impossible to cover everything in this seminar, so the following guides are very useful references:
<a href="https://numpy.org/doc/stable/user/absolute_beginners.html#numpy-the-absolute-basics-for-beginners"><b>NumPy: the absolute basics for beginners</b></a>,
<a href="https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf"><b>pandas cheat sheet</b></a>, and
<a href="https://matplotlib.org/stable/users/explain/quick_start.html"><b>matplotlib quick start guide</b></a>.
</div>#%% md
# Introdcution to Python for Movement Scientists

This Jupyter notebook is part of the *Python programming for the movement sciences* to give you a crude introduction to the language and some usefull biomechanics python packages. This seminar is hardly enough for you to learn Python, but it provides the basic knowledge needed to get started. <br>
The information in the introduction will be enough to understand basic python concepts that you will inspect in the tutorials!
The commands shown here are universal across all platforms (mac os/windows/linux), across programming IDEs or programming notebooks.

This notebook will show you the basiscs of Python scripting. Starting with a general flow of Python code, then the primary data types. Followed by the collections that can be used to store multiple variables, loops and condition statements. Input statement to load data from documents, and save data back into another file format. Lastly, Python functions are being discussed.

## General flow of Python script

1. Import statements at the top of the page: `import [library]`
2. Any and all functions: `def my_function():`
3. The code body

In [6]:
# 1. Import statements
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
from random import choice

# 2. Functions
def welcome_message(conference, location, occation):

    print("Welcome to {} in {}!\nWe hope you enjoy the {}".format(conference, location, occation))


# 3. The code
welcome_message("esmac-2024", "Oslo", "python seminar")

Welcome to esmac-2024 in Oslo!
We hope you enjoy the python seminar


## Data types
The 4 basic data types in Python are **textual data** in the form of strings (`str`); **numerical data** like integers (`int`), floating point numbers (`float`), and complex numbers; and **Booleans** like `True` and `False`. These are scaler types, meaning, single elements.

* String
~~~
"Hello"
'Hello'
~~~

* Subscripting string
~~~
print("Hello"[0]) # first
print("Hello"[-1]) # last
print("Hello" + " " + "World")
~~~

* Integer -- whole numbers
~~~
print(123 + 345)
~~~

* Float (floating point number)
~~~
pi = 3.14159
print(pi)
~~~

* Complex
~~~
2 + 3j
complex(2, 3)
~~~

* Boolean
~~~
True
False
~~~

* To go from one Data Type to the other --> Type casting
~~~
print(type(pi))
print(type(str(pi)))
# int()
# float()
# str()
~~~

<div class="alert alert-block alert-info">
<b>Note:</b> The opperations that can be done on the data depents on the data type (e.g. addition (*) works different on strings, then they do on numerical types)
</div>

#### Common errors
Common errors with these data types are Syntax errors and Type errors.

When considering strings, the wrong use of quotation marks causes a <font color='red'>SyntaxError: </font>

~~~
"This is a 'valid' string"
~~~

~~~
"This is "not a valid" string"
~~~

<div class="alert alert-block alert-info">
    <b>Note:</b> The double use of the same quotation marks in the second  causes the SyntaxError
</div>

Using the wrong data type for builed in functions will cause <font color='red'>TypeError:</font>
~~~
In[ ]: len("Python")
Out [ ]: 6

In[ ]: len(5)
Out[ ]: TypeError                                 Traceback (most recent call last)
Cell In[34], line 1
----> 1 len(5)

TypeError: object of type 'int' has no len()

~~~

<div class="alert alert-block alert-info">
    <b>Note:</b> The first example gives the output 6, meaning the string "Python" contains 6 characters; Using the len function on integers cuases a TypeError, because the integer has no length.
</div>

## Data structures
A way of organizing and storing grouped pieces of data in python. Usually these pieces of data have a relationship with each other. This is also the way to store data in a certain order (e.g. time series data). Here lists and dictionaries are shown. Lists and dictionaries are part of the standard python library and form the basis of the python language.


1. Lists:  `my_list = [item1, item2, item3]`
    - Can store any data type you'd want and can even combine them in a list
    - List output looks like an array [1, "hello", 2.2]
    - They can be modified and added onto when needed: `my_list.appen(item4)`
    - Items can also easily be removed: `my_list.remove(item2)`

2. Dictionaries:
    ```
    my_dict = {"key1": item1,
             "key2": item2,
             ... }
    ```
    - A dictionary contains a key, which is a string, with an associated item. This item can be whatever, a scalar, a list or other dictionary:
    ~~~
      my_dict = {"ppID": 1,
                "sensor": "Sensor",
                "data": [14, 47, 41, 6, 35, 12, 27, 48, 0, 25, 11, 36, 26, 28, 32]}
    ~~~
    - It is very easy to index into dictionaries and explore it's content:
        - To get the keys: `my_dict.keys()`
        - To get the content of a key: `my_dict["data"]`
        - To get a specific value of my data: `my_dict["data"][4]`





<div class="alert alert-block alert-info">
    <b>Tip:</b> Indexing in the datastructures start from 0. This can be visualised as an offset from the start. Meaning, the first item has an offset of zero, whereas the second item has an ofset of 1 and so forth. The last item can be accessed with -1 (i.e. the offset of -1 from the start)
</div>

## Loops and conditional statements
There are a few things that are important when consideing loops, statement (and also functions, but more on that later).

1. The first line always end with a colon (:). If the colon is missing you get a <font color="red">SyntaxError:</font>
2. Indentation of the following lines are important. Either a tab of 4 spaces. Otherwise you get an
    <font color="red">IndentationError:</font>
    - If the colon is placed correctly, indentation should be automatic.

#### For loops
In python, it is very easy to loop over items in a list. In Python programming, you can directly access the items in a list of dictionary using the keyword *in*. This is unlike programming languages like MATLAB where you need to index into the list to access the desired item. The python syntax shows how powerful it is to have an easy-to-understand syntax.

Example MATLAB code:
~~~
for i=1:5
    subj = subject_list[i]
    disp("Analyzing ", subj)
    disp("Data saved on index nr: num2str(i))
end
~~~

Example Python code:
~~~
# very simple loop
for subj in subject_list:
    print("Analyzing {}".format(subj))

# To also loop over the index for posible saving purposes
for i, subj in enumarate(subject_list):
    print("Analyzing {}".format(subj))
    print("Data saved on index nr: {}".format(i))
~~~

#### Conditional statements
~~~
x=2

if x==3:
    print("x equals 3")
elif x==2:
    print("x equals 2")
else:
    print("x equals something else")
~~~

#### While loop
~~~
still_on = True
x = 0

while still_on:
    print("I'm still confused about while statements... ")
    x += 1

    if x > 5:
        still_on = False
        print("Wait, I think I got it ;)")
~~~

## Functions
Functions in Python are usually written when a certain block of code should be repeatable or repeated many times. They can be embedded in the document itself (like at the top of this jupyter notebook). Or they can be in a separate document.

<div class="alert alert-block alert-info">
<b>Note:</b> If the function is in a seperate document, you'd need to import it in your script:
<b><font color="green">from</font></b> my_function_file <b><font color="green">import</font></b> my_function
</div>

The key elements of a function are:
1. The `def` keyword
    - This tells python that the following code is a function
2. The function name followed by parentheses and a colon: `def my_function_name():`

3. Indentation matters!
    - Similar to the loops and conditional statements, the indentation of the code body should be 1 tab or 4 spaces from the border. (There are a lot of discussions on the internet on which is best, but both have the same functionality)
    - Again, if the key factors are correct, the indentation should be automatic

4. The input parameters within the parenthesis (optional).
    - If your function needs some input parameters like with the welcome message, this is where you specify them.
    - These inputs (keys) can be used to assign certain variabels to the function.
    - When calling the function, you can either use the order in which to place the variables, or you can use the keys:
    ~~~
    welcome_message("esmac-2024", "Oslo", "python seminar")
    welcome_message(location="Oslo", occation="Python seminar", conference="esmac-2024")
    ~~~

5. The return statement (optional---to return or not to return).
    - the function for the welcome message does not return anything. So once the message is printed nothing can be changed about it. There are no returned variables to play with afterward.
    - Placing the `return` keyword at the end of the function allows you to specify the variables to keep.
~~~
def welcome_message(conference, location, occation):
    msg = "Welcome to {} in {}!\nWe hope you enjoy the {}".format(conference,
                                                                  location, occation)
    return msg
~~~

<div class="alert alert-block alert-warning">
    <b>Example:</b> In the welcome_message function defined in the first code cell there are three input parameters (conference, location, and occation) needed to print the welcome message. There we put the relevant strings in order of appearance. You could also use the keywords (e.g. *location*) to specify the input, then the order does not matter.
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> The function will only return the parameters that are specified and no code should come after the return statement. That code will not run (except ofcourse in conditional statements).


### A (slightly more) relevant function example :)
~~~
import json  # json is a datastructure very similar to python dictionaries.

def grab(fl, ext):
    """
    Grabs the data that is in the folders
    :param fl: str full path to file
    :param ext: str extension of the file
    :return r: extension specific datastructure
    """

    if ext =="json":
        with open(fl, "r") as f:
            r = json.load(f)
    elif ext == "csv":
        r = pd.read_csv(fl)
    elif ext == "c3d:
        r =

    return r
~~~

Most functions contain "docstrings". That is the block of text under the function definition enclosed by the triple quotation marks. Within these docstrings you can specify what the function does, which parameters are needed, and what it returns. Once the function is specified, and you call the function, you can hover over the parenthesis with your mouse, and it will give the information presented in the docstring.

## Debugging
Debugging can be a hassle in Python. Here in Jupyter Notebook, there is no debugging button you can click. The advantage of Jupyter notebook or Google Colab is that you can run cell by cell to see where you are going wrong. But it is harder to debug functions that way, as you cannot step into the function to see what happens. In those cases small and simple IDEs like __[Thonny](https://thonny.org)__ can come in handy. Copy past your code, and visually see what happens. When using IDEs like PyCharm, DataSpell, ect, there is a debugging button. Just set a break point where you need it and run your script by pressing the *bug* icon. Then you can step into the code and see what is happening and find the error.

# Object-Oriented Programming (OOP)
Object-oriented programming is a programming paradime based on objects or data rather than functions. This is approach is well suited for projects that are large and complex. <br>
The structure of OOP makes it exceptionally suited for collaborative development and code usability. Movement science project often have different aspects to them. Starting with extracting the data, some preprocessing steps, possible filtering or normalisation, the actual analysis e.g. step detection; and calculating the outcome variables. All these different components of a movement science project could be considered **modules**. When used correctly, OOP simplifies the relationships within the project, however, it might be hard to start with, as most programming is taught using a functions approach. <br>

The first step in OOP is to identify all the objects you'd want to manipulate and how they relate to each other. The structure of OOP is as follows:
1. Classes
    - The blueprints that models real-life object
2. Objects
    - The individual objects that are generated from the blueprint
3. Attributes
    - The characteristics/variables that define the objects
4. methods
    - The functions of the objects.

Python is currently one of the most popular programming languages that is build around Object-Oriented Programming. The telltale use of the code has the structure of **object.method()**. For example: `my_array.max()`---find the maximum value in the array; or `my_dataframe.head()`---displays the first 5 rows of the DataFrame. Here, you have the object `my_dataframe` or `my_array` that is constructed from the class Array or DataFrame and using the functionality of the methods like `max()` or `head()`, the data can be manipulated.

Now that you know a bit about OOP, we can apply this knowledge to working with data. Every object, such as a NumPy, pandas, or pyplot object, has *methods* associated with it. **Tip:** Within Jupyter Notebook you can press the TAB key to get an overview of the available methods. For example: `pd.` + TAB
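Outside a notebook (for example in a plain script or a terminal session), TAB completion may not be available. The built-in `dir()` function gives a similar overview, and `help()` shows the documentation of a single method. A minimal sketch:

```python
import numpy as np

my_array = np.array([1.0, 2.0, 3.0])

# dir() lists all attributes and methods of an object; names starting
# with "_" are internal, so they are filtered out here.
public_methods = [name for name in dir(my_array) if not name.startswith("_")]
print(public_methods[:10])

# help() prints the documentation of one method (commented out to keep the output short)
# help(my_array.cumsum)
```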

<div class="alert alert-box alert-warning">
    <b>Example:</b> If you want to run a virtual restaurant, you'd need (at least) a waiter, server, chef, and manager. That means that you have the <i>waiter class</i>, which is the blueprint for the waiter. When you create a single waiter to handle the restaurant, that becomes a <i>waiter object</i>. The two most important things that make up the waiter object are: <br>
> what it <b>has</b>: holds_plate=True; tables_responsible=[1,2,3] <br>
> what it <b>does</b>: def taking_order(table, order): def takes_payment(amount):<br>
What it has are the <i>attributes</i> and what it does are the <i>methods</i>. The attributes are the variables associated with the modelled object; the methods are the functions that the modelled object can perform.
</div>
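The restaurant example above can be sketched as a Python class. This is a minimal illustration, not a complete design: the `orders` and `cash` attributes and the check on `tables_responsible` are assumptions added to give the methods something to do.

```python
class Waiter:
    """Blueprint (class) for a waiter in a virtual restaurant."""

    def __init__(self, tables_responsible):
        # What it HAS: attributes
        self.holds_plate = True
        self.tables_responsible = list(tables_responsible)
        self.orders = {}   # assumed attribute: table number -> list of ordered items
        self.cash = 0.0    # assumed attribute: payments received so far

    # What it DOES: methods
    def taking_order(self, table, order):
        if table not in self.tables_responsible:
            raise ValueError(f"Table {table} is not served by this waiter")
        self.orders.setdefault(table, []).append(order)

    def takes_payment(self, amount):
        self.cash += amount


waiter = Waiter(tables_responsible=[1, 2, 3])  # one waiter object
waiter.taking_order(table=2, order="pasta")
waiter.takes_payment(amount=18.5)
print(waiter.orders)  # {2: ['pasta']}
print(waiter.cash)    # 18.5
```

A second waiter would be another object from the same class, e.g. `Waiter(tables_responsible=[4, 5])`, with its own orders and cash.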

# Working with data
The libraries you will definitely encounter as a movement scientist are __[NumPy](https://numpy.org)__, __[pandas](https://pandas.pydata.org)__, __[SciPy](https://scipy.org)__, and __[Matplotlib](https://matplotlib.org)__. Before you can use these libraries, you need to make sure they are installed in your virtual environment. These libraries work seamlessly together, which makes handling large datasets much easier.<br>
Importantly, all of these libraries follow the Object-Oriented Programming (OOP) structure. Concretely, this means that from the class blueprint <i>ndarray</i> you create an array object `b = np.arange(12).reshape(3, 4)`, and then you can directly apply the class methods to that object, e.g. `b.cumsum(axis=1)`. The same is true for the pandas DataFrame and for the Figure and Axes objects that matplotlib's pyplot creates.
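Running the two statements from the paragraph above shows what the array object and its `cumsum` method produce. The last line is an addition that shows the method gives the same result as the equivalent NumPy function.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)   # ndarray object with 3 rows and 4 columns
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(b.cumsum(axis=1))           # cumulative sum along each row
# [[ 0  1  3  6]
#  [ 4  9 15 22]
#  [ 8 17 27 38]]

# The method and the equivalent function give the same result
print(np.array_equal(b.cumsum(axis=1), np.cumsum(b, axis=1)))  # True
```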

<div class="alert alert-block alert-info">
<b>Tip:</b> It is impossible to cover everything in this seminar. The
<a href="https://numpy.org/doc/stable/user/absolute_beginners.html#numpy-the-absolute-basics-for-beginners"><b>NumPy: the absolute basics for beginners</b></a>,
<a href="https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf"><b>pandas cheat sheet</b></a>,
<a href="https://docs.scipy.org/doc/scipy/tutorial/index.html#user-guide"><b>SciPy User Guide</b></a> and
<a href="https://matplotlib.org/stable/users/explain/quick_start.html"><b>matplotlib quick start guide</b></a>
are very useful reference guides.
</div>


#### NumPy
NumPy (**N**umerical **P**ython) is the most widely used library for scientific computing and engineering in Python. Even if you don't use it directly, there is a good chance that other libraries you use rely on it in the background. NumPy provides a powerful multidimensional array structure (*ndarray*) and a comprehensive assortment of functions that operate efficiently on those arrays, including mathematical functions, discrete Fourier transforms, basic linear algebra, basic statistics, and much more.
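A small sketch touching three of the function families mentioned above: statistics, the discrete Fourier transform, and linear algebra. The 100 Hz sampling frequency and the 5 Hz sine wave are made-up example values, not real data.

```python
import numpy as np

fs = 100.0                          # assumed sampling frequency in Hz
t = np.arange(200) / fs             # 2 s of samples
sine = np.sin(2 * np.pi * 5 * t)    # synthetic 5 Hz sine wave

# Basic statistics
print(sine.mean(), sine.std())

# Discrete Fourier transform: find the dominant frequency
spectrum = np.abs(np.fft.rfft(sine))
freqs = np.fft.rfftfreq(sine.size, d=1 / fs)
print(freqs[np.argmax(spectrum)])   # 5.0

# Basic linear algebra: solve the system A @ x = y
A = np.array([[2.0, 1.0], [1.0, 3.0]])
y = np.array([3.0, 5.0])
x = np.linalg.solve(A, y)
print(x)                            # [0.8 1.4]
```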

Some essentials:
1. an ndarray has a homogeneous datatype (all elements share one dtype)
2. element-by-element operations are the default (vectorization and broadcasting)
3. an ndarray does not grow like a list (e.g. `np.append` returns a new array); its shape describes its dimensions, e.g. `(0,)` (empty), `(500,)` (one-dimensional), `(2, 500)` (two-dimensional), or `(2, 2, 500)` (three-dimensional)
4. it is easy to pre-allocate variables: `np.zeros((nrow, ncol))`
5. it fully supports Object-Oriented Programming, with ndarray as the class
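The five essentials above can be checked directly in code. A minimal sketch; the angle values and offsets are arbitrary example numbers.

```python
import numpy as np

# 1. Homogeneous datatype: mixing ints and a float upcasts everything to float
a = np.array([1, 2, 3.5])
print(a.dtype)                           # float64

# 2. Element-by-element operations (vectorization) and broadcasting
angles_deg = np.array([10.0, 45.0, 90.0])
angles_rad = np.deg2rad(angles_deg)      # applied to every element, no loop needed
offsets = np.array([[0.0], [100.0]])     # shape (2, 1)
print((angles_deg + offsets).shape)      # (2, 3): both broadcast to a common shape

# 3. The shape describes the dimensions
print(np.array([]).shape, np.zeros(500).shape)           # (0,) (500,)
print(np.zeros((2, 500)).shape, np.zeros((2, 2, 500)).shape)  # (2, 500) (2, 2, 500)

# 4. Pre-allocate, then fill
nrow, ncol = 3, 4
result = np.zeros((nrow, ncol))
for i in range(nrow):
    result[i, :] = i                     # fill row i with the value i
print(result.sum())                      # 12.0

# 5. ndarray is the class behind every array object
print(isinstance(a, np.ndarray))         # True
```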




#### pandas
pandas is the second essential library for data science. It is built on top of NumPy and offers one of the easiest ways to import data saved in text, CSV, or Excel files. It was created to work with *relational* or *labeled* data. <br>
Some essentials:
1. The two primary data structures are `pandas.Series` (1D) and `pandas.DataFrame` (2D)
2. DataFrames can hold a different datatype in each column (e.g. col1=str (subj_id), col2=bool (female True/False), col3=int (age), col4=float (peak_knee_angle)...)
3. Shape and size are mutable: columns can be added and deleted
4. Powerful **group by** functions: `raw_data.groupby("Sex")`
5. label-based slicing and indexing: e.g. `data.peak_knee_angle.max()` or `data["peak_knee_angle"].max()`
6. time-series specific functionality: date range generation and frequency conversion, moving window statistics, date shifting, and lagging
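A minimal sketch of the essentials listed above, using a small made-up dataset (the subjects and values are invented for illustration). It builds a DataFrame with mixed column types, adds and deletes a column, groups by a column, and accesses a column by label.

```python
import pandas as pd

# Made-up data; each column has its own datatype
data = pd.DataFrame({
    "subj_id": ["S01", "S02", "S03", "S04"],
    "female": [True, False, True, False],
    "age": [24, 31, 27, 45],
    "peak_knee_angle": [62.1, 58.4, 65.0, 55.3],
})
print(data.dtypes)

# Size is mutable: add a derived column, then delete it again
data["age_months"] = data["age"] * 12
del data["age_months"]

# Group by: mean peak knee angle for females (True) and males (False)
means = data.groupby("female")["peak_knee_angle"].mean()
print(means)

# Label-based access: both forms select the same column
print(data.peak_knee_angle.max(), data["peak_knee_angle"].max())  # 65.0 65.0
```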

How to use pandas to read and write data:
~~~
import pandas as pd

data = pd.read_csv("/file_path/file_name.csv")  # load a CSV file into a DataFrame
data.head()      # first 5 rows (shown automatically as cell output in a notebook)
data.info()      # prints column names, datatypes, and non-null counts
data.describe()  # descriptive statistics of the numerical columns
data["column_name"].plot()  # quick line plot of one column (uses matplotlib)
data.plot.scatter(x="peak_knee_angle", y="running_velocity")
data.to_excel("file_path/file_name.xlsx", index=False)  # .xlsx needs an Excel writer such as openpyxl
~~~
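The block above needs a real file on disk. The sketch below runs as-is: it reads made-up CSV text from memory with `io.StringIO` (which `pd.read_csv` accepts like a file), then writes the DataFrame to a temporary folder and reads it back. It writes CSV rather than Excel so that no extra package is needed; the file name `summary.csv` is hypothetical.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Made-up CSV content standing in for a real file
csv_text = """subj_id,peak_knee_angle,running_velocity
S01,62.1,3.2
S02,58.4,2.9
S03,65.0,3.6
"""
data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
data.info()
print(data.describe())

# Write to a temporary folder and read the file back in
with tempfile.TemporaryDirectory() as folder:
    out_file = Path(folder) / "summary.csv"
    data.to_csv(out_file, index=False)  # index=False: don't write the row numbers
    reloaded = pd.read_csv(out_file)

print(reloaded.equals(data))  # True
```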

#### SciPy
SciPy (**S**cientific **P**ython) builds on the NumPy library and adds more scientific functionality, such as additional array-computing algorithms and specialized data structures. The SciPy User Guide gives a full overview of its subpackages, but some useful ones to keep an eye on are `scipy.interpolate` for interpolation, `scipy.fft` for fast Fourier transforms, and `scipy.signal` for signal processing.
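A sketch that uses the three subpackages named above on a synthetic signal. The choices here are assumptions for illustration, not recommended settings: a 100 Hz sampling rate, a 4th-order low-pass Butterworth filter with a 6 Hz cutoff applied with `filtfilt` (forward and backward, so no time lag), and cubic-spline time normalisation to 101 points (0-100 %).

```python
import numpy as np
from scipy import fft, interpolate, signal

fs = 100.0                                    # assumed sampling frequency (Hz)
t = np.arange(200) / fs                       # 2 s of data
rng = np.random.default_rng(seed=0)
truth = np.sin(2 * np.pi * 1.0 * t)           # synthetic 1 Hz "movement" signal
raw = truth + 0.2 * rng.standard_normal(t.size)  # add made-up measurement noise

# scipy.signal: 4th-order low-pass Butterworth filter, run forward and backward
b, a = signal.butter(4, 6.0, btype="low", fs=fs)
filtered = signal.filtfilt(b, a, raw)

# scipy.interpolate: time-normalise the trial to 101 points (0-100 %)
spline = interpolate.CubicSpline(np.linspace(0, 100, t.size), filtered)
normalised = spline(np.linspace(0, 100, 101))

# scipy.fft: dominant frequency of the filtered signal
spectrum = np.abs(fft.rfft(filtered))
freqs = fft.rfftfreq(filtered.size, d=1 / fs)
print(normalised.shape, freqs[np.argmax(spectrum)])  # (101,) 1.0
```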

#### matplotlib
Matplotlib is a standard visualization library for Python. There are others (e.g. seaborn, plotly), but matplotlib is a good place to start learning the basics.
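A minimal matplotlib sketch using the object-oriented interface, where `plt.subplots()` returns a Figure object and an Axes object whose methods build the plot. The knee-angle curve is a synthetic sine shape, not real gait data, and the output file name is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

gait_cycle = np.linspace(0, 100, 101)                        # % of gait cycle
knee_angle = 30 + 30 * np.sin(2 * np.pi * gait_cycle / 100)  # synthetic curve

fig, ax = plt.subplots()        # Figure and Axes objects
ax.plot(gait_cycle, knee_angle, label="knee")
ax.set_xlabel("Gait cycle (%)")
ax.set_ylabel("Knee angle (deg)")
ax.legend()
fig.savefig("knee_angle.png")   # save to a file; in a notebook the figure is also shown inline
```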

Some essentials:
1. works excellently with pandas: e.g. `data["column_name"].plot()` or `data.plot.scatter(x="peak_knee_angle", y="running_velocity")` for quick and dirty visualization
2. 