# First steps with DigitalHub

This notebook walks through the first steps with the platform:
* create a project
* create and run a function
* write artifacts

## Project initialization

Create a project: a dedicated space where we can manage functions, artifacts and executions.

Include your ``username`` in the project name to create a personal project in the shared space.
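A minimal sketch of how a per-user project name could be built. The helper `personal_project_name`, the fallback to `getpass.getuser()` and the character clean-up are assumptions made for illustration; none of them is part of the DigitalHub SDK.

```python
import getpass
import os
import re


def personal_project_name(prefix: str = "my-test-project") -> str:
    """Build a per-user project name (hypothetical helper, not an SDK call)."""
    # Prefer the USER environment variable, fall back to the login name.
    user = os.environ.get("USER") or getpass.getuser()
    # Assumption: keep only lowercase letters, digits and dashes in the name.
    safe_user = re.sub(r"[^a-z0-9-]+", "-", user.lower()).strip("-")
    return f"{prefix}-{safe_user}"
```

The result could then be passed to `dh.get_or_create_project(...)` in place of the f-string used in the cell below.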

In [None]:
import digitalhub as dh
import os

project = dh.get_or_create_project(f"my-test-project-{os.environ['USER']}")
project

## 1. Create a function and run it locally

A simple hello world script can be registered as a *function* and executed via the SDK. We need to define:

* the source code
* the name of the function
* the `handler`: the function to be called
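To make the role of the `handler` concrete, here is a rough sketch of how a handler name can be resolved to a callable inside a source file. The helper `load_handler` is hypothetical and only illustrates the idea with the standard library; it is not how the platform necessarily does it.

```python
import importlib.util
from pathlib import Path
from typing import Callable


def load_handler(code_src: str, handler: str) -> Callable:
    """Load the function named `handler` from the Python file `code_src`."""
    path = Path(code_src)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {code_src}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, handler)  # AttributeError if the name is missing
```

Once `hellojob.py` exists, `load_handler("hellojob.py", "hello")()` would call the same function that the job runs.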




In [None]:
%%writefile "hellojob.py"

def hello():
    print("Hello Job!")

In [None]:
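# %%writefile only saves the file, so import the function before calling it
from hellojob import hello

hello()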

In [None]:
func = project.new_function(name="hello-job",
                            kind="python",
                            python_version="PYTHON3_10",
                            code_src="hellojob.py",
                            handler="hello")

In [None]:
run = func.run("job", wait=True, local_execution=True)

In [None]:
# Remote execution: follow the progress on the console
# run = func.run("job", wait=True, local_execution=False)



## 2. A more complex example: geodata

Explore some geodata from a public source.

In [None]:
%pip install geopandas contextily

In [None]:
import geopandas 


url = "https://dati.meteotrentino.it/service.asmx/getHumidexGeoJson"
df = geopandas.read_file(url)
df.head()

In [None]:
import contextily as cx

ax = df.plot()
cx.add_basemap(ax, crs=df.crs)
ax.figure.savefig('foo.pdf')

### 2.1. Create and run function locally

Pick the code from the notebook cells above and define a function.
Then execute the function in the local environment (`local_execution=True`).

In [None]:
%%writefile "hellogeo.py"

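import geopandas
import contextily as cx

def geoprocessing():
    # Same steps as the exploration cells above
    url = "https://dati.meteotrentino.it/service.asmx/getHumidexGeoJson"
    df = geopandas.read_file(url)
    ax = df.plot()
    cx.add_basemap(ax, crs=df.crs)
    ax.figure.savefig('foo.pdf')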

In [None]:
func = project.new_function(name="geo-job",
                            kind="python",
                            python_version="PYTHON3_10",
                            code_src="hellogeo.py",
                            handler="geoprocessing"
                           )

In [None]:
run = func.run("job", wait=True, local_execution=True)

### 2.2. Execute as remote

Execute the function as a batch job (`local_execution=False`).
Does it work?

If not, figure out the error and then fix the function definition.
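One possible cause (an assumption, the actual error may differ) is that the remote environment lacks packages that were installed only in the notebook. A small sketch, using just the standard library, to check which top-level modules are importable where the code runs; the helper name `missing_modules` is made up for this example.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two packages installed with %pip earlier in this notebook.
print(missing_modules(["geopandas", "contextily"]))
```

Running the same check inside the handler would show whether the job's environment matches the notebook's.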

### 2.3. Persist the output

We may want to *persist* the result of the work:

* log as artifact
* return the artifact as function *output*

Reference: https://scc-digitalhub.github.io/sdk-docs/0.14/reference/objects/artifact/crud/#digitalhub.entities.artifact.crud.log_artifact

In [None]:
%%writefile "hellogeo.py"

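import geopandas
import contextily as cx

def geoprocessing(project):
    # Same geodata steps as before, writing the plot to foo.pdf
    url = "https://dati.meteotrentino.it/service.asmx/getHumidexGeoJson"
    df = geopandas.read_file(url)
    ax = df.plot()
    cx.add_basemap(ax, crs=df.crs)
    ax.figure.savefig("foo.pdf")
    # Log the file as an artifact and return it as the function output
    return project.log_artifact(name="foo.pdf", kind="artifact", source="foo.pdf")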


In [None]:
func = project.new_function(name="geo-job",
                            kind="python",
                            python_version="PYTHON3_10",
                            code_src="hellogeo.py",
                            handler="geoprocessing"
                           )

In [None]:

run = func.run("job", wait=True, local_execution=False)

## 3. Resource management

Compute resources are limited: in a shared environment we need to *request* the resources needed to execute the job.

Reference: https://scc-digitalhub.github.io/sdk-docs/0.14/reference/configuration/kubernetes/overview/

In [None]:
%%writefile "hellores.py"

import pandas as pd

def resources(project):
    # Define the size of the dataset
    num_rows = 40000000  # 40 million rows
    
    # Example DataFrame with inefficient datatypes
    data = {'A': [1, 2, 3, 4],
            'B': [5.0, 6.0, 7.0, 8.0]}
    df = pd.DataFrame(data)
    
    # Replicate the DataFrame to create a larger dataset
    df_large = pd.concat([df] * (num_rows // len(df)), ignore_index=True)
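Before requesting memory, a back-of-the-envelope estimate helps. The sketch below assumes that columns `A` and `B` are stored as `int64` and `float64` (8 bytes per value each); the helper `estimate_bytes` is hypothetical, and it ignores the index and any temporary copies made while concatenating.

```python
def estimate_bytes(num_rows: int, bytes_per_column: list[int]) -> int:
    """Rough table size: rows times the summed per-value column widths."""
    return num_rows * sum(bytes_per_column)


# Assumption: two columns of 8 bytes per value, 40 million rows.
size = estimate_bytes(40_000_000, [8, 8])
print(f"{size / 2**30:.2f} GiB")  # prints "0.60 GiB"
```

Peak usage during `pd.concat` is likely higher than this figure, so treat it as a lower bound. On a real frame, `df.memory_usage(deep=True).sum()` gives a measured value.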

In [None]:
func = project.new_function(name="resource-job",
                            kind="python",
                            python_version="PYTHON3_10",
                            code_src="hellores.py",
                            handler="resources"
                           )

In [None]:
run = func.run("job", wait=True, local_execution=False)

In [None]:
# Try changing the resources, e.g. resources={"mem": "8Gi"}

In [None]:
# Append a file storage operation

# url ="https://huggingface.co/datasets/kitofrank/RFUAV/resolve/main/DEVENTION%20DEVO.rar"

# import urllib.request
# urllib.request.urlretrieve(url, "download.rar")

In [None]:
# set a volume and download to a specific path
# volumes = [{
#     "volume_type": "persistent_volume_claim",
#     "name": "my-pvc",
#     "mount_path": "/data",
#     "spec": {"size": "2Gi"}
# }]
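As a sketch of "download to a specific path", the helper below streams a URL to a file under a chosen directory, for example the `/data` mount path of the volume above. The name `download` and the chunked copy are illustrative choices, not part of the SDK.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: str) -> Path:
    """Stream `url` to the file `dest` and return its path."""
    target = Path(dest)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all in memory
    return target
```

Inside a handler this could be called as `download(url, "/data/download.rar")`, assuming the volume is mounted at `/data`.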