# Python for (open) Neuroscience

_Lecture 2.1_ - Real world Python on real world data 

Luigi Petrucco

## Before starting

- Last main course lecture!
- Final module will consist of optional advanced lectures - refer to the schedule in the group

## Outline

- Working with notebooks
- Import code in notebooks and scripts
- Exploring and organizing files with Python

## Notebooks & interactive prompts

What is the difference between running a script and working in an interactive environment (Jupyter or Spyder)?

### Running scripts

```bash
> python script_to_run.py 
```

An interpreter is opened, some Python code (previously existing in a file) is processed, and at the end the interpreter is closed. Variables are created and destroyed in the course of the process.

### Running interactive environments

An interpreter is opened, and remains open waiting for code lines to be passed. Code can be passed by writing and running cells (as in a notebook), writing commands in an interpreter (Terminal interpreter, Spyder), or executing scripts or parts of scripts (Spyder). Variables remain defined as long as the session is open.

### `Jupyter` notebooks

An interactive Python development platform based on a browser interface. (The whole thing is powered by Python and JavaScript code provided by the `Jupyter` project.)

There are two components in the `Jupyter` system: the interface (left) and the kernel (right).

![folderschema](./files/folderschema-08.png)

**Practical:** 

Open Jupyter from the terminal by running `jupyter notebook` (from your `base` environment).

The **user interface** we see is a browser page divided into markdown cells (for comments) and Python code cells.

![folderschema](./files/folderschema-09.png)

**Practical**: create a new code cell (`+` button above) and write some code. Make a new cell and convert it to a markdown cell for comments (select the cell and press `M`, or use `Cell > Cell type > Markdown`).

### The python kernel

A notebook always executes code using its ***kernel***, which is just a fancy word for **an open interpreter** from one of your environments (which one depends on the configuration of the notebook).
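If you are ever unsure which environment a kernel is using, you can ask Python directly from a cell. Here is a minimal sketch with the standard `sys` module. `sys.executable` is the path of the interpreter binary that runs the kernel, and `sys.prefix` is the folder of the environment it belongs to.

```python
import sys

# Path of the Python interpreter running this kernel:
# it should live inside the environment you selected (e.g. course-env)
print(sys.executable)

# Root folder of the active environment
print(sys.prefix)
```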

![folderschema](./files/folderschema-10.png)

Notebooks run in a browser but are **NOT** running online if you launch `jupyter notebook` from the terminal! 

Relying on the browser is just a convenient way to get a nice graphical interface where you can split code into separate cells and run them, add markdown comments, show plots, etc.

Also, the same notebook can run online on remote computing platforms (e.g., Google Colab). This is one of the nice things about notebooks!

Each notebook runs in its own Python session! You can open several notebooks after launching Jupyter, but each of them starts its own Python kernel, so it won't see the variables and code defined in another notebook!

**Practical**:
- create a new notebook, selecting the `course-env` environment. In that notebook, write a simple function called `test_function` that prints `hello`.
- open a second notebook using the same environment, and try to use `test_function` without defining it. Can you do that?

### Share code across notebooks

There are multiple ways of making code available across notebooks:

- Define functions in scripts in the notebook folder  (a good starting point)
- Keep functions in scripts somewhere on your computer, and append their location to `sys.path` in the notebook  👎
- Define pip-installable custom modules from which code can be imported  🤩 💯



Here we will look at the first option: defining functions in scripts in the notebook folder (a good starting point).

The easiest approach is to put all the functions you want to share across notebooks in a Python `.py` file in the same folder as the notebooks, and to import them from there.

```
code-folder
        ├── Notebook1.ipynb
        ├── Notebook2.ipynb
        └── custom_utils.py 
```

Functions defined in `custom_utils.py` can be imported in the notebooks with:
```python
from custom_utils import function
```
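To see the whole mechanism at work outside a notebook, here is a self-contained sketch. It uses a throwaway temporary folder as a hypothetical stand-in for `code-folder`. The `test_function` is like the one from the previous practical, except that here it also returns its string so the result can be checked. Jupyter already searches the folder the notebook runs in when you import. In this sketch we add the temporary folder to `sys.path` by hand to get the same effect.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for code-folder: a fresh temporary directory
code_folder = Path(tempfile.mkdtemp())

# Write a small custom_utils.py module, as you would with a text editor
(code_folder / "custom_utils.py").write_text(
    "def test_function():\n"
    "    print('hello')\n"
    "    return 'hello'\n"
)

# In a notebook, its own folder is already searched on import;
# here we put the temporary folder on the search path by hand
sys.path.insert(0, str(code_folder))
importlib.invalidate_caches()  # make sure the freshly written file is noticed

from custom_utils import test_function

result = test_function()  # prints: hello
```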

**Practical**: 

 - Open a new text file in the folder where you created the two notebooks. You can do this from Jupyter with "New" > "Text file"
 - Rename the file to something like `custom_utils.py` (the `.py` extension here is important!), write your `test_function` in it, and save it!
 - Restart the notebook kernel ("Kernel" > "Restart")
 - Import the custom function and run it

## Working with the filesystem

The first important skill when you start working with your local data!

### `Path` objects

The fundamental tool for working with the filesystem is the `Path` class from the `pathlib` module.

(You will also find code doing this with the `os` module (`os.path`). It still works, but it is older and clunkier, and `pathlib` is the preferred choice as of 2024.)
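For comparison, here is the same small task written both ways. This is a minimal sketch: the folder and file names are hypothetical, and nothing is read from disk.

```python
import os.path
from pathlib import Path

folder = "/Users/vigji/code"  # hypothetical folder, only used as a string here

# os.path style: plain strings passed around to helper functions
old_path = os.path.join(folder, "lectures", "notes.txt")
old_name = os.path.basename(old_path)
old_suffix = os.path.splitext(old_path)[1]

# pathlib style: one Path object, with operators and attributes
new_path = Path(folder) / "lectures" / "notes.txt"
new_name = new_path.name
new_suffix = new_path.suffix

print(old_name, old_suffix)  # notes.txt .txt
print(new_name, new_suffix)  # notes.txt .txt
```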

```python
from pathlib import Path
```

A `Path` object is initialized with a string indicating a location:

```python
a_path = Path("/Users/vigji/code")
# Note 1: to run these examples on your machine, you have to change the path strings!
# Note 2: on Windows, backslashes are problematic! To fix this, start the string with r (for "raw string"):
# a_windows_path = Path(r"C:\Users\vigji\code")
```
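Here is a small sketch of why the `r` prefix matters, using a hypothetical `C:\new_data` folder. A string like `"C:\Users"` is even worse: `\U` starts a Unicode escape, so Python refuses it with a `SyntaxError`. `PureWindowsPath` is a `pathlib` class that applies Windows rules on any OS without touching the disk.

```python
from pathlib import PureWindowsPath

# In a normal string a backslash starts an escape sequence: "\n" is a newline!
plain = "C:\new_data"
raw = r"C:\new_data"  # raw string: backslashes are kept as they are

print(len(plain), len(raw))  # 10 11: "\n" became a single newline character

# Inspect the raw version as a Windows path (works on any OS)
print(PureWindowsPath(raw).parts)  # ('C:\\', 'new_data')

# Forward slashes are also accepted for Windows paths
print(PureWindowsPath("C:/new_data") == PureWindowsPath(raw))  # True
```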

A `Path` object can point to a location that does not exist; we can check whether a path exists using the `.exists()` method:

```python
a_path = Path("/Users/vigji/code/python-cimec-2024/lectures/Lecture1.0_Numpy-intro.ipynb")
print(a_path.exists())
```
```
True
```


```python
a_wrong_path = Path("/Users/pippo")
print(a_wrong_path.exists())
```
```
False
```


`Path` objects have some useful attributes:

```python
a_path.name  # name of the file (string)
```
```
'Lecture1.0_Numpy-intro.ipynb'
```

```python
a_path.stem  # name of the file without extension (string)
```
```
'Lecture1.0_Numpy-intro'
```

```python
a_path.suffix  # extension of the file (string)
```
```
'.ipynb'
```

```python
a_folder_path = Path("/Users/vigji/code/python-cimec-2024/lectures")
a_folder_path.suffix  # a folder name usually has no extension
```
```
''
```

```python
a_path.parent  # folder containing the file (Path object)
```
```
PosixPath('/Users/vigji/code/python-cimec-2024/lectures')
```
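One detail worth knowing for data files is that `.stem` and `.suffix` only look at the *last* extension. Here is a sketch with a hypothetical compressed imaging file. It uses `PurePosixPath` so that it behaves the same on any OS and does not touch the disk. `.suffixes` and `.parts` are two further attributes of `Path` objects not shown above.

```python
from pathlib import PurePosixPath

# Hypothetical imaging file with a double extension (common for compressed data)
scan = PurePosixPath("/data/mouse01/anat/scan.nii.gz")

print(scan.name)           # scan.nii.gz
print(scan.stem)           # scan.nii  (only the last extension is removed)
print(scan.suffix)         # .gz
print(scan.suffixes)       # ['.nii', '.gz']
print(scan.parent.parent)  # /data/mouse01
print(scan.parts)          # ('/', 'data', 'mouse01', 'anat', 'scan.nii.gz')
```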


### `glob()` and wildcard patterns

We can browse the filesystem using the `.glob()` method. `.glob()` finds all the paths (files and folders) inside the folder it is called on that match a specific pattern.

The pattern we pass to `.glob()` is a so-called <span style="color:indianred">glob pattern</span>: a string with wildcard characters that we use to specify the features of the names we are looking for. (Glob patterns look a bit like <span style="color:indianred">regular expressions</span>, or _regex_, but they are a different and simpler language: don't mix the two up!)

The most common wildcard is the symbol for "any string": `*`. If we just put `*` in our pattern, it matches any possible name, that is to say, the whole content of the folder:

```python
a_path = Path("/Users/vigji/code/python-cimec-2024/lectures")

for path in a_path.glob("*"):  # match everything in the folder
    print(path)
```
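To make the difference from regular expressions concrete, here is a sketch using two standard-library modules. `fnmatch` matches single names with the same shell-style wildcards (`*`, `?`, `[...]`) that `.glob()` uses, although it is not identical in every detail. `re` handles real regular expressions. The file name is just one taken from this lecture's folder.

```python
import fnmatch
import re

name = "Lecture1.2_Intro-pandas.ipynb"

# Wildcard (glob-style) patterns: "*" means "any string", "." is a plain dot
glob_pandas = fnmatch.fnmatch(name, "*pandas*")
glob_module = fnmatch.fnmatch(name, "Lecture1.*")

# Equivalent regular expressions: "." means "any character" and "*" means
# "repeat the previous element", so a literal dot has to be escaped
regex_pandas = re.fullmatch(r".*pandas.*", name) is not None
regex_module = re.fullmatch(r"Lecture1\..*", name) is not None

print(glob_pandas, glob_module, regex_pandas, regex_module)  # True True True True

# A glob pattern is often not even a valid regex
try:
    re.compile("*pandas*")
except re.error as err:
    print("Not a valid regular expression:", err)
```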

We can create more complicated patterns to look for specific files, or files with a word in the name. 

For example, to look for all Jupyter notebooks here (extension `".ipynb"`), we can write:

```python
# match all paths in the folder whose name ends with .ipynb (and starts with anything):
for path in sorted(a_path.glob("*.ipynb")):
    print(path)
```

Or to look at all files of the second module - whose name starts with `"Lecture1."`:

```python
# match all paths in the folder whose name starts with "Lecture1." (and ends with anything):
for path in a_path.glob("Lecture1.*"):
    print(path)
```

Or look at all files whose name contains `"pandas"`:

```python
# match all paths in the folder whose name contains pandas (could find anything before and after):
for path in a_path.glob("*pandas*"):
    print(path)
```
```
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.2_Intro-pandas.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.4_More-pandas.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.3_More-pandas.ipynb
```
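Here is a self-contained sketch you can run anywhere. It creates a hypothetical lecture folder with a few empty files in a temporary directory, then applies the three patterns above. The results are wrapped in `sorted()` so that their order is predictable.

```python
import tempfile
from pathlib import Path

# Hypothetical lecture folder, filled with empty files in a temporary directory
lectures = Path(tempfile.mkdtemp())
for name in ["Lecture1.0_Numpy-intro.ipynb", "Lecture1.2_Intro-pandas.ipynb",
             "Lecture2.1_Real-world-data.ipynb", "rise.css"]:
    (lectures / name).touch()  # create an empty file

notebooks = sorted(p.name for p in lectures.glob("*.ipynb"))
module_1 = sorted(p.name for p in lectures.glob("Lecture1.*"))
pandas_files = sorted(p.name for p in lectures.glob("*pandas*"))

print(notebooks)     # the three .ipynb files, but not rise.css
print(module_1)      # ['Lecture1.0_Numpy-intro.ipynb', 'Lecture1.2_Intro-pandas.ipynb']
print(pandas_files)  # ['Lecture1.2_Intro-pandas.ipynb']
```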


If we want to check whether a path is a folder (as opposed to a file), we can use the `.is_dir()` method:

```python
# match all paths in the folder that are folders:
for path in a_path.glob("*"):
    if path.is_dir():
        print(path)
```
```
/Users/vigji/code/python-cimec-2024/lectures/files
/Users/vigji/code/python-cimec-2024/lectures/.ipynb_checkpoints
```


We can include subfolders in our search using `.rglob()` (short for recursive `.glob()`)

```python
# match all paths in the folder and its subfolders:
for path in a_path.rglob("*"):
    print(path)
```
```
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.0_Numpy-intro.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.5_More-plotting.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture2.1_Real-world-data.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture0.2_Flow-controls-style.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture0.0.1_Python-syntax.ipynb
/Users/vigji/code/python-cimec-2024/lectures/.DS_Store
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.2_Intro-pandas.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.4_More-pandas.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture0.1_Containers.ipynb
/Users/vigji/code/python-cimec-2024/lectures/rise.css
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.1_Numpy.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture1.3_More-pandas.ipynb
/Users/vigji/code/python-cimec-2024/lectures/Lecture0.3_Functions.ipynb
/Users/vigji/code/python-cimec-2024/lectures/files
/Users/vigji/cod
```
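The difference between `.glob()` and `.rglob()` is easiest to see on a tiny folder tree. Here is a sketch on a hypothetical structure created in a temporary directory. `.relative_to()` and `.as_posix()` are only used to print short names that look the same on every OS.

```python
import tempfile
from pathlib import Path

# Hypothetical nested folder structure in a temporary directory
root = Path(tempfile.mkdtemp())
(root / "files").mkdir()
(root / "notes.txt").touch()
(root / "files" / "figure.png").touch()

# glob("*") only looks at the top level
top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*"))
# rglob("*") also walks into subfolders
everything = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))

print(top_level)   # ['files', 'notes.txt']
print(everything)  # ['files', 'files/figure.png', 'notes.txt']
```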

Note that `.glob()` does not return a list, but a `generator` object: we can loop over a generator, but we cannot index it!

```python
an_assumed_list = a_path.glob("*")
an_assumed_list[1]  # this raises a TypeError: the result of .glob() is not subscriptable
```

If you want to index the results, you have to convert them to a list first!

```python
a_file_list = list(a_path.glob("*"))
a_file_list
```
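There is a related pitfall. Because `.glob()` produces its results lazily, you can only iterate over them once. Here is a sketch on a hypothetical temporary folder. Storing `list(...)` once and reusing it avoids both problems.

```python
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())  # hypothetical folder with two empty files
(folder / "a.txt").touch()
(folder / "b.txt").touch()

results = folder.glob("*.txt")

# Indexing the lazy result fails...
try:
    results[0]
    indexing_worked = True
except TypeError:
    indexing_worked = False

# ...and it can be consumed only once
first_pass = list(results)
second_pass = list(results)

print(indexing_worked)  # False
print(len(first_pass))  # 2
print(second_pass)      # []: nothing left
```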

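Also, note that you can't count on the files being sorted: the order in which `.glob()` returns paths is not guaranteed, and it can differ across operating systems. This can be [very important to remember](https://discuss.python.org/t/a-code-glitch-may-have-caused-errors-in-more-than-100-published-studies/2583/3)!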

If you want some sorting, you should do it yourself (e.g. using the `sorted()` function):

```python
for path in sorted(a_path.glob("*")):  # files sorted alphabetically with sorted()
    print(path)
```
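Alphabetical order is also not always what you want, because numbers inside file names are compared character by character. Here is a sketch with hypothetical session files. The `session_number` helper is made up for this naming scheme and would need adapting to yours.

```python
from pathlib import PurePosixPath

# Hypothetical session files, listed in an arbitrary order
paths = [PurePosixPath(f"/data/mouse01_session{i}.csv") for i in [10, 2, 1]]

# Alphabetical sorting compares characters: "session10" comes before "session2"
alphabetical = [p.name for p in sorted(paths)]


def session_number(path):
    """Extract the integer after 'session' in names like mouse01_session2.csv."""
    return int(path.stem.split("session")[-1])


# Sorting with a key function compares the extracted numbers instead
numerical = [p.name for p in sorted(paths, key=session_number)]

print(alphabetical)  # ['mouse01_session1.csv', 'mouse01_session10.csv', 'mouse01_session2.csv']
print(numerical)     # ['mouse01_session1.csv', 'mouse01_session2.csv', 'mouse01_session10.csv']
```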

### Concatenating paths

We can use the `/` operator to concatenate parts of a path **independently of the OS we are on**. This works because here `/` is a Python operator applied to a `Path` object, **not a character inside a string**! `pathlib` knows which separator the OS requires.

```python
course_path = Path("/Users/vigji/code/python-cimec")
lectures_path = course_path / "lectures"  # this will work regardless of the OS
```
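To see that `/` really adapts to the OS, we can use the "pure" path classes of `pathlib`. These apply the rules of one specific OS without touching the disk. The `Path` you create on your own machine follows the rules of your OS. Here is a sketch with hypothetical folder and file names:

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same "/" expression, interpreted with the rules of two different OSs
posix_path = PurePosixPath("/Users/vigji/code") / "lectures" / "notes.txt"
windows_path = PureWindowsPath(r"C:\Users\vigji\code") / "lectures" / "notes.txt"

print(posix_path)    # /Users/vigji/code/lectures/notes.txt
print(windows_path)  # C:\Users\vigji\code\lectures\notes.txt
```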

### Create folders

`Path` objects can be used to create new directories in the filesystem with `.mkdir()`. It raises an error (`FileExistsError`) if the folder already exists, unless we pass the `exist_ok=True` argument:

```python
a_path = Path("/Users/vigji/new_folder")
a_path.mkdir(exist_ok=True)
```

If we want to create a folder in a location that does not exist yet, we can create all the missing parent folders in one go with the `parents=True` argument. Without it, the method raises an error (`FileNotFoundError`), because we would be trying to create a directory inside a directory that does not exist.

```python
a_subfolder_path = a_path / "subfolder" / "subsubfolder" / "subsubsubfolder"
a_subfolder_path.mkdir(parents=True)
```
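Here is a sketch of both error cases and how the two arguments avoid them. It works on a hypothetical scratch folder in a temporary directory, so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())  # hypothetical scratch folder
nested = base / "subfolder" / "subsubfolder"

# Without parents=True, missing intermediate folders raise an error
try:
    nested.mkdir()
except FileNotFoundError:
    print("Parent folder missing")

nested.mkdir(parents=True)  # creates subfolder and subsubfolder

# Creating it again fails, unless we also pass exist_ok=True
try:
    nested.mkdir(parents=True)
except FileExistsError:
    print("Folder already exists")

nested.mkdir(parents=True, exist_ok=True)  # no error this time
print(nested.is_dir())  # True
```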

### Move files

We can use the `.replace()` method of a path to move a file (or a folder, as in the example here) to a new location, which we pass as input to the method.

```python
a_path = Path("/Users/vigji/new_container_folder")
a_path.mkdir(exist_ok=True)
```

```python
path_to_move = Path("/Users/vigji/new_folder")
path_to_move.mkdir(exist_ok=True)
```

```python
path_to_move.replace(a_path / path_to_move.name)
```
```
PosixPath('/Users/vigji/new_container_folder/new_folder')
```
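Here is a sketch of moving a single file, using a hypothetical `recording.txt` in a temporary folder. `.replace()` returns the new `Path`, and the old path stops existing. Be careful: if a file already exists at the destination, `.replace()` silently overwrites it.

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # hypothetical scratch folder
source = root / "recording.txt"
source.write_text("some data")
destination_folder = root / "raw_data"
destination_folder.mkdir()

# Move the file; .replace() returns the Path of the new location
new_path = source.replace(destination_folder / source.name)

print(source.exists())       # False
print(new_path)              # .../raw_data/recording.txt
print(new_path.read_text())  # some data
```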

## Automatically organize data folders

Every time you end up manually moving and renaming files, consider doing it programmatically!

```python
data_folder = Path("/Users/vigji/sample_data_folder")  # original data folder

new_data_folder = Path("/Users/vigji/reorganized_data_folder")  # new reorganized folder
new_data_folder.mkdir(exist_ok=True)
```

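Files whose name does not follow the `subject_session` scheme (for example, a name with no `_`) would make the unpacking fail with `ValueError: not enough values to unpack (expected 2, got 1)`, so we skip them:

```python
for file in sorted(data_folder.glob("*")):  # sorted() gives a predictable order
    parts = file.stem.split("_")  # remove extension and separate using _
    if file.is_dir() or len(parts) != 2:
        # skip folders and files not named like subject_session.ext
        print(f"Skipping {file.name}")
        continue
    subject, session = parts
    new_location = new_data_folder / subject / session  # create new path using subject and session
    new_location.mkdir(exist_ok=True, parents=True)  # if necessary, create also parent folder

    file.replace(new_location / file.name)  # move the file
```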
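Here is a self-contained version of the same idea that you can run anywhere. It builds a hypothetical raw data folder in a temporary directory, with files named `subject_session.csv` plus one stray `notes.txt`. It then reorganizes the folder and collects one record per moved file along the way. The `records` list is an addition for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical raw data: files named subject_session.csv, plus one stray file
data_folder = Path(tempfile.mkdtemp())
for name in ["mouse01_s1.csv", "mouse01_s2.csv", "mouse02_s1.csv", "notes.txt"]:
    (data_folder / name).touch()

new_data_folder = Path(tempfile.mkdtemp())
records = []  # one dictionary per moved file

for file in sorted(data_folder.glob("*")):
    parts = file.stem.split("_")
    if file.is_dir() or len(parts) != 2:
        continue  # skip anything that does not follow the naming scheme
    subject, session = parts
    new_location = new_data_folder / subject / session
    new_location.mkdir(exist_ok=True, parents=True)
    new_path = file.replace(new_location / file.name)
    records.append({"subject": subject, "session": session, "path": new_path})

moved = sorted(p.relative_to(new_data_folder).as_posix()
               for p in new_data_folder.rglob("*.csv"))
print(moved)
# ['mouse01/s1/mouse01_s1.csv', 'mouse01/s2/mouse01_s2.csv', 'mouse02/s1/mouse02_s1.csv']
print(len(records))  # 3
```

If pandas is installed, `pd.DataFrame(records)` turns this list of dictionaries into a table with `subject`, `session` and `path` columns. That is one possible way to build dataframes from the folder structure.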

You can also automatically create dataframes from the filesystem structure as we go! (in the practical)

(Practical 2.1.2)