# Accessing data

```{admonition} Summary
:class: hint

Python has many open-source libraries for accessing and importing data into your environment. 
This chapter explains three basic ways to load data in Python, and thus in your Jupyter Notebook:
- Local files
- API
- WFS

Note: If you want to learn more about other ways to access data in Python, please have a look here (add LINK)
```

## Local files 📁
Data is often stored in a file on your computer. However, depending on where you are running Python - whether on your own computer or in an online platform like JupyterLab - 'local' storage could refer to your computer's hard drive, a remote server, or a cloud service.

You can access the data with a relative or absolute path. 


 ⚠️ Please check - are the following explanations only valid for local file paths on the computer?

### Local file path
When specifying a file path in Python, there are three common formats:

- Double backslashes (`\\`) → `"C:\\Users\\file.json"`
- Raw string (prefix with `r`) → `r"C:\Users\file.json"`
- Forward slashes (`/`) → `"C:/Users/file.json"` (no `r` prefix needed, since there are no backslashes to escape)

`````{admonition} Note
:class: note
In Python, single backslashes (`\`) are treated as [escape characters](https://en.wikipedia.org/wiki/Escape_character), which is why using double backslashes (`\\`) or a raw string (`r""`) is necessary to avoid errors.
`````
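The effect of escape characters is easiest to see with a path that happens to contain `\t` and `\n`. The following is a minimal sketch; the path `C:\temp\new_data.json` is a made-up example and does not need to exist.

```python
# In a normal string, "\t" becomes a tab and "\n" a newline,
# so the path no longer contains the characters you typed.
plain = "C:\temp\new_data.json"

# These three spellings keep the path intact.
raw = r"C:\temp\new_data.json"       # raw string
escaped = "C:\\temp\\new_data.json"  # double backslashes
forward = "C:/temp/new_data.json"    # forward slashes

print("\t" in plain)   # → True: the path was silently altered
print(raw == escaped)  # → True: both spell the same path
print(forward)
```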

`````{admonition} Use Python's pathlib
:class: tip
Use Python's [pathlib](https://docs.python.org/3/library/pathlib.html) to write code that works on any operating system, whether it's Windows, Mac, or Linux. This module simplifies file and folder management by providing a consistent, structured approach. By using pathlib, your code becomes more portable, allowing colleagues on different systems to run it without modification and reproduce your results.
`````

In [1]:
from pathlib import Path

base_path = Path.cwd().parents[0]
# Path.cwd() gets the current working directory (cwd), 
# meaning the folder where the Jupyter Notebook or Python script is running.
# .parents[0] moves one level up from the current working directory.

INPUT = base_path / "00_data" # refers to the folder path named "00_data"
INPUT.mkdir(exist_ok=True) # creates the "00_data" folder if it doesn't already exist.
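To see what `.parents` and related attributes return, it can help to take a path apart. The sketch below uses `PurePosixPath` with a hypothetical path (`/home/user/project/...`), so it gives the same result on every operating system and does not touch the file system.

```python
from pathlib import PurePosixPath

# Hypothetical example path, used only to show how a path is split up.
example = PurePosixPath("/home/user/project/00_data/LBM2018IS_DD.json")

print(example.name)        # → LBM2018IS_DD.json
print(example.stem)        # → LBM2018IS_DD
print(example.suffix)      # → .json
print(example.parents[0])  # → /home/user/project/00_data
print(example.parents[1])  # → /home/user/project

# The "/" operator joins path components, as in the cell above.
project_root = example.parents[1]
print(project_root / "00_data")  # → /home/user/project/00_data
```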


### Defining parameters at the beginning of a Jupyter Notebook

It is a good idea to define parameters at the beginning of a Jupyter Notebook (typically in the first few code cells), using CAPITAL LETTERS to make them easily identifiable.


`````{admonition} Constants in Python
:class: tip
In Python, variables written in all CAPITAL LETTERS are commonly referred to as constants. Learn more here: [Python Constants: Improve Your Code's Maintainability](https://realpython.com/python-constants/).

**Why should I use constants?**

Using constants in a Python script or Jupyter Notebook helps to:

- Reuse frequently used variables that should not change.
- Make key parameters easy to find and modify.
- Follow a common Python naming convention.
`````

### Example of using constants and pathlib

In [None]:
from pathlib import Path

# Defining constants at the top of the notebook

base_path = Path.cwd().parents[0]  # Moves one level up from the notebook / folder
OUTPUT = base_path / "out"         # Directory for saving figures, reports, etc.
WORK_DIR = base_path / "tmp"       # Working directory

 ⚠️ I do not understand why base_path is not in CAPITAL LETTERS?

In [None]:
current_path = Path.cwd()
mydata = current_path.parents[0] / "00_data" / "LBM2018IS_DD.json"

Since we defined `INPUT` already above, this can be shortened.

In [2]:
mydata = INPUT / "LBM2018IS_DD.json"

For comparison, here is an absolute path on Windows:
```python
file_path = Path("C:\\Users\\Fatem\\files\\LBM2018IS_DD.json")
```

 ⚠️ In this example, the file "00_data" is missing. 

`````{admonition} pathlib not always supported
:class: note
Some older Python packages do not support paths from pathlib. For these cases, convert the pathlib object to a string first (e.g. `str(mydata)`).
`````
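A minimal sketch of this conversion, assuming a relative example path (the variable name `example_path` is only for illustration). `os.fspath()` is the standard-library function that returns the file-system representation of a path object, and `Path()` turns a string back into a pathlib object.

```python
import os
from pathlib import Path

# Illustrative relative path; the file does not need to exist for this demo.
example_path = Path("00_data") / "LBM2018IS_DD.json"

as_text = str(example_path)      # plain string for packages that need one
as_fs = os.fspath(example_path)  # equivalent standard-library conversion
round_trip = Path(as_text)       # back to a pathlib object

print(type(as_text).__name__)      # → str
print(round_trip == example_path)  # → True
```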

pathlib objects come with convenient methods. For instance, to get the file size in bytes:

In [37]:
size = mydata.stat().st_size
size

84024118

Convert it to megabytes and use an f-string to format the result with two decimal places.


In [45]:
size_mb = size / 1024 / 1024
print(f'{size_mb:.2f} MB')

80.13 MB
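If you need this conversion in several places, you could wrap it in a small helper. The function `format_size` below is a hypothetical sketch and not part of pathlib. It divides by 1024 until the value fits the unit, the same way as the calculation above.

```python
def format_size(num_bytes: int) -> str:
    """Return a byte count as a readable string, using steps of 1024."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest known unit.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(format_size(84024118))  # → 80.13 MB
print(format_size(512))       # → 512.00 B
```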


If you do not know whether a variable is a string or a pathlib object, put Jupyter's `?` in front of the variable name.


In [47]:
?mydata

Type:        PosixPath
String form: /home/jovyan/work/nfdi4biodiversity/00_data/LBM2018IS_DD.json
File:        /opt/conda/envs/worker_env/lib/python3.12/pathlib.py
Docstring:  
Path subclass for non-Windows systems.

On a POSIX system, instantiating a Path should return this object.
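
The `?` syntax only works in Jupyter and IPython. In a plain Python script, the same check can be sketched with `isinstance()`. The two variables below are illustrative stand-ins for a pathlib object and a string.

```python
from pathlib import Path

# Two illustrative values: one pathlib object and one plain string.
as_path = Path("00_data") / "LBM2018IS_DD.json"
as_text = "00_data/LBM2018IS_DD.json"

for candidate in (as_path, as_text):
    kind = "pathlib object" if isinstance(candidate, Path) else "string"
    print(f"{type(candidate).__name__}: {kind}")
```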

## Old paths

Before we can start, we need to get the working data for this section. First, create a local working directory.

In [16]:
from pathlib import Path

base_path = Path.cwd().parents[0]
INPUT = base_path / "00_data"
INPUT.mkdir(exist_ok=True)

Download the data from the remote server. We import a helper function, `tools.get_zip_extract()`, that has been prepared for this book to fetch remote data.

In [21]:
import sys

module_path = str(base_path / "py")
if module_path not in sys.path:
    sys.path.append(module_path)
from modules import tools

sample_data_url = 'https://datashare.tu-dresden.de/s/KEL6bZMn6GegEW4/download'

tools.get_zip_extract(
    uri_filename=sample_data_url,
    output_path=INPUT,
    write_intermediate=True)

Loaded 81.17 MB of 81.18 (100%)..
Extracting zip..
Retrieved download, extracted size: 109.24 MB
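
If the book's `tools` module is not available, a similar result can be sketched with the standard library alone. The function `download_and_extract` below is a hypothetical stand-in and not the implementation of `tools.get_zip_extract()`. It assumes that the URL points directly to a zip archive, and it holds the whole download in memory, which is fine for files of this size but not for very large archives.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen


def download_and_extract(url: str, output_path: Path) -> list[str]:
    """Download a zip archive from url and extract it into output_path."""
    output_path.mkdir(parents=True, exist_ok=True)
    # Read the complete archive into memory.
    with urlopen(url) as response:
        payload = response.read()
    # Open the bytes as a zip archive and unpack every member.
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(output_path)
        return archive.namelist()


# Usage with the names defined in the cells above (requires network access):
# extracted = download_and_extract(sample_data_url, INPUT)
```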


⚠️ Is this too complicated if readers do not have the module `tools.get_zip_extract()`

`````{admonition} Upload data
:class: tip
If you want to work with data stored on your local computer, but your Jupyter service runs somewhere else, use drag & drop to add the data to Jupyter. The explorer view on the left is comparable to Windows Explorer.
`````

 ⚠️ I do not understand the upload data tip

## API


To retrieve data from an API, the `requests` package is needed.

In [2]:
import requests

Next, store the URL of the API endpoint in a variable.

In [3]:
path = "http://dataverse-test.ioer.de:8080/api/access/datafile/344"

Then send a GET request to this URL to retrieve the data from the server.

In [None]:
response = requests.get(path)

How the response should be converted depends on the data format the API returns. To find out the response format, check the `Content-Type` header.

In [None]:
print(response.headers["Content-Type"])

In the following example, the JSON response is converted to a Python data structure and stored in the variable `api_data`.

In [113]:
api_data = response.json()
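
Not every API returns JSON, so it can be useful to choose the conversion based on the `Content-Type` header. The helpers `media_type` and `decode_response` below are hypothetical names for a minimal sketch. They only rely on the `headers`, `json()`, `text` and `content` members of a `requests` response object.

```python
def media_type(content_type_header: str) -> str:
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type_header.split(";")[0].strip().lower()


def decode_response(response):
    """Convert a requests response according to its Content-Type header."""
    kind = media_type(response.headers.get("Content-Type", ""))
    if kind == "application/json" or kind.endswith("+json"):
        return response.json()  # parsed into dicts and lists
    if kind.startswith("text/"):
        return response.text    # decoded string, e.g. CSV or HTML
    return response.content     # raw bytes, e.g. zip or image data


print(media_type("application/json; charset=utf-8"))  # → application/json

# Usage with the request from above (requires network access):
# response = requests.get(path, timeout=30)
# response.raise_for_status()  # stop early on HTTP errors such as 404
# api_data = decode_response(response)
```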

### Access WFS data
