# Adding data to the Notebook


## Preparations

Get the working data for this section. First, create a local working directory.

In [16]:
from pathlib import Path

base_path = Path.cwd().parents[0]
INPUT = base_path / "00_data"
INPUT.mkdir(exist_ok=True)

Download the data from the remote server. We import the function `tools.get_zip_extract()`, which has been prepared for this book, to fetch remote data.

In [17]:
import sys

module_path = str(base_path / "py")
if module_path not in sys.path:
    sys.path.append(module_path)
from modules import tools

In [18]:
sample_data_url = 'https://datashare.tu-dresden.de/s/KEL6bZMn6GegEW4/download'

In [21]:
tools.get_zip_extract(
    uri_filename=sample_data_url,
    output_path=INPUT,
    write_intermediate=True)

Loaded 81.17 MB of 81.18 (100%)..
Extracting zip..
Retrieved download, extracted size: 109.24 MB
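
If the book's `tools` module is not at hand, a similar download-and-extract step can be sketched with the Python standard library alone. This is not the implementation of `tools.get_zip_extract()`; the function names `extract_zip_bytes` and `download_and_extract` are hypothetical, and the sketch assumes the archive is small enough to be held in memory.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen


def extract_zip_bytes(data: bytes, output_path: Path) -> list[str]:
    """Extract an in-memory zip archive into output_path and return the member names."""
    output_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(output_path)
        return archive.namelist()


def download_and_extract(url: str, output_path: Path) -> list[str]:
    """Download a zip archive from url and extract it into output_path."""
    with urlopen(url, timeout=60) as response:
        data = response.read()  # the whole archive is read into memory
    return extract_zip_bytes(data, output_path)


# Hypothetical usage with the variables defined above:
# download_and_extract(sample_data_url, INPUT)
```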


## Working with folders and paths

Define parameters at the top of the notebook, written in CAPITAL LETTERS.

This allows you to:
- reuse frequently used variables across the notebook
- help readers identify the important parameters that change or affect outputs

In [22]:
from pathlib import Path

Example for referencing repeatedly used directories.

In [23]:
OUTPUT = Path.cwd().parents[0] / "out"       # output directory for figures (etc.)
WORK_DIR = Path.cwd().parents[0] / "tmp"     # Working directory
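
Defining a path does not create the directory on disk. A minimal sketch for creating such directories before writing to them follows; the helper name `ensure_dirs` is hypothetical and not part of the book's modules.

```python
from pathlib import Path


def ensure_dirs(*directories: Path) -> None:
    """Create each directory (and any missing parents) if it does not exist yet."""
    for directory in directories:
        directory.mkdir(parents=True, exist_ok=True)


# Usage with the parameters defined above:
# ensure_dirs(OUTPUT, WORK_DIR)
```

With `exist_ok=True`, re-running the cell does not raise an error when the directories already exist.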

## Access local data

Depending on the data format, different libraries and procedures can be used to access data.

`````{admonition} Local?
:class: note
_Local_ here means the location from which you started JupyterLab: it can be your server, your local drive, or any cloud service.
`````

`````{admonition} Upload data
:class: tip
If you want to work with data stored on your local computer, but your Jupyter service runs somewhere else, use drag & drop to add data to Jupyter. The explorer view on the left is comparable to Windows Explorer.
`````

## Path conventions

Paths can be formatted differently in Python, depending on the operating system used. Two common formats for Windows paths are:

1. Use double backslashes (`\\`), **ex:** `"C:\\Users\\file.json"`
2. Prefix the string with `r`, **ex:** `r"C:\Users\file.json"`

`````{admonition} backslashes (`\`)
:class: note
The backslashes (`\`) in a file path are interpreted as escape characters in regular Python strings.
`````
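
The effect of escape characters can be seen in a short example. The path `C:\temp\new.json` is a made-up illustration: in a regular string, `\t` becomes a tab and `\n` a newline, so the string no longer contains the intended path.

```python
# Regular string: "\t" and "\n" are read as a tab and a newline character
plain = "C:\temp\new.json"

# Two ways to keep the backslashes literal
escaped = "C:\\temp\\new.json"
raw = r"C:\temp\new.json"

print("\t" in plain)         # the tab character is part of the string
print(escaped == raw)        # both spellings give the same string
print(len(plain), len(raw))  # the regular string is two characters shorter
```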

`````{admonition} Use pathlib
:class: tip
Use Python's [pathlib](https://docs.python.org/3/library/pathlib.html) to write code that is independent of the operating system used (Windows, macOS, or Linux). This allows colleagues **working** on other systems to directly use your code and reproduce your results.
`````

```python
from pathlib import Path
```

Relative to the current working directory:

In [25]:
current_path = Path.cwd()
mydata = current_path.parents[0] / "00_data" / "LBM2018IS_DD.json"

Since we defined `INPUT` already above, this can be shortened.

In [26]:
mydata = INPUT / "LBM2018IS_DD.json"

For comparison, an example of an absolute path (Windows):
```python
file_path = Path("C:\\Users\\Fatem\\files\\LBM2018IS_DD.json")
```

Afterwards, continue working with the data in Python.

`````{admonition} pathlib not always supported
:class: note
Some older Python packages do not support paths from pathlib. For these cases, convert the pathlib object to a string first (e.g. `str(mydata)`).
`````
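
The conversion can be sketched as follows. Both `str()` and `os.fspath()` return the path as a plain string, and `Path()` turns it back into a pathlib object. The relative path used here is only a stand-in for `mydata`.

```python
import os
from pathlib import Path

# Stand-in for the mydata path defined above
mydata = Path("00_data") / "LBM2018IS_DD.json"

as_string = str(mydata)        # plain string for packages without pathlib support
as_fspath = os.fspath(mydata)  # equivalent, via the os.PathLike protocol

# Converting back yields an equal pathlib object
round_trip = Path(as_string)
print(type(as_string).__name__, round_trip == mydata)
```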

Path objects offer some convenient methods. For instance, to get the file size in bytes:

In [37]:
size = mydata.stat().st_size
size

84024118

Convert it to megabytes and format the result to two decimals using [f-strings](https://docs.python.org/3/reference/lexical_analysis.html#f-strings).

In [45]:
size_mb = size / 1024 / 1024  # bytes -> kilobytes -> megabytes
print(f'{size_mb:.2f} MB')

80.13 MB
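
The two steps can be wrapped into a small reusable function. This is a sketch only: the name `file_size_mb` is hypothetical, and it follows the convention used above of 1 MB = 1024 * 1024 bytes.

```python
from pathlib import Path


def file_size_mb(path: Path) -> float:
    """Return the size of a file in megabytes (1 MB = 1024 * 1024 bytes here)."""
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path.stat().st_size / 1024 / 1024


# Usage with the path defined above:
# print(f'{file_size_mb(mydata):.2f} MB')
```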


If you do not know whether a variable is a string or a pathlib object, prefix it with `?` in Jupyter to inspect it.

In [47]:
?mydata

Type:        PosixPath
String form: /home/jovyan/work/nfdi4biodiversity/00_data/LBM2018IS_DD.json
File:        /opt/conda/envs/worker_env/lib/python3.12/pathlib.py
Docstring:
Path subclass for non-Windows systems.

On a POSIX system, instantiating a Path should return this object.
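
Outside of Jupyter, or inside a function, the same check can be done with `isinstance()`. A minimal sketch; the helper name `describe` is hypothetical.

```python
from pathlib import Path


def describe(value) -> str:
    """Report whether value is a pathlib object, a string, or something else."""
    if isinstance(value, Path):  # also true for PosixPath and WindowsPath
        return "pathlib"
    if isinstance(value, str):
        return "str"
    return type(value).__name__


print(describe(Path("00_data") / "LBM2018IS_DD.json"))
print(describe("00_data/LBM2018IS_DD.json"))
```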

### Access data via API 


To retrieve data from an API, the package `requests` is needed.

In [30]:
import requests

Next, the URL of the API endpoint is stored in a variable.

In [31]:
path="http://dataverse-test.ioer.de:8080/api/access/datafile/344"

Then a request is sent to this URL using the GET method to access the data on the server.

In [32]:
response = requests.get(path)

The response must then be converted according to the format of the data returned by the API. In the following example, the JSON response is converted to a Python dictionary and stored in the variable `api_data`.

In [113]:
api_data = response.json()
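
In practice, a request can fail or hang, so it may help to combine the steps above with a timeout and a status check. The following is a sketch under these assumptions: the function name `fetch_json` and the 30-second default timeout are illustrative choices, not part of the book's code.

```python
import requests


def fetch_json(url: str, timeout: float = 30.0):
    """Send a GET request to url and return the decoded JSON body."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.json()


# Usage with the URL defined above:
# api_data = fetch_json(path)
```

Without `raise_for_status()`, a failed request would only surface later, typically as a confusing JSON decoding error.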

### Access WFS data
