# H2O Drive
This tutorial shows how to import data that a user has uploaded through the H2O Drive app into your own apps or notebooks. It also shows how to save data so that it remains available in new versions of the app or notebook.

**Requirement:** This code runs only inside an H2O AI Cloud environment; it will fail in local development.

In [1]:
CLIENT_ID = "q8s-internal-platform"
TOKEN_ENDPOINT = "https://auth.demo.h2o.ai/auth/realms/q8s-internal/protocol/openid-connect/token"
REFRESH_TOKEN = "https://cloud-internal.h2o.ai/auth/get-platform-token"

DRIVE_URL = "http://drive-internal-drive-service.drive-internal:8081"

In [2]:
# !pip install https://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/drive/v0.5.1/python/h2o-drive-0.5.1.tar.gz
# !pip install h2o_authn

In [3]:
from getpass import getpass

import h2o_drive
import h2o_authn

import pandas as pd

## Securely connect to the platform
We first connect to the H2O AI Cloud using our personal access token to create a token provider object. We can then use this object to log into Drive and other APIs.

In [4]:
print(f"Visit {REFRESH_TOKEN} to get your personal access token")
tp = h2o_authn.TokenProvider(
    refresh_token=getpass("Enter your access token: "),
    client_id=CLIENT_ID,
    token_endpoint_url=TOKEN_ENDPOINT
)

Visit https://cloud-internal.h2o.ai/auth/get-platform-token to get your personal access token


Enter your access token:  ········


## Access all Drive data for this user
List everything in the bucket owned by the logged-in user. In a notebook that user is presumably you; in a Wave app it is whoever is using the app.

In [5]:
user_bucket = await h2o_drive.MyBucket(token=tp.as_async(), endpoint_url=DRIVE_URL)

In [6]:
user_files = await user_bucket.list_objects()

In [7]:
for f in user_files:
    print(f.key)

ai.h2o.drive/workspace/connector_credentials.json
apps/jupter-poc/workspace/h2o_drive-2022-03-30.ipynb
home/churnTest.csv
home/demo_datasets/Amazon_Reviews/amazon_reviews.csv
home/demo_datasets/Employee_Attrition/employee_attrition.csv
home/demo_datasets/Home_Prices/house_prices.csv
home/demo_datasets/S_P_Forecasting/sandp_test.csv
home/demo_datasets/S_P_Forecasting/sandp_train.csv
home/demo_datasets/Telco_Churn/telco_customer_churn.csv
home/demo_datasets/Walmart_Forecasting/walmart_test.csv
home/demo_datasets/Walmart_Forecasting/walmart_train.csv
jupter-poc/workspace/my_chanages_churnTest.csv
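
The listing above mixes objects from several areas of the bucket. As a rough sketch, the keys can be grouped by their first path segment to see how the bucket is organized. The keys below are copied from the output above; the helper name `group_by_top_level` is hypothetical, and what each prefix means is inferred only from this listing, not from documented Drive behavior.

```python
from collections import defaultdict

# A few object keys copied from the listing above.
keys = [
    "ai.h2o.drive/workspace/connector_credentials.json",
    "apps/jupter-poc/workspace/h2o_drive-2022-03-30.ipynb",
    "home/churnTest.csv",
    "home/demo_datasets/Telco_Churn/telco_customer_churn.csv",
    "jupter-poc/workspace/my_chanages_churnTest.csv",
]


def group_by_top_level(object_keys):
    """Group object keys by the first path segment of each key."""
    groups = defaultdict(list)
    for key in object_keys:
        # Split only on the first "/" so the rest of the key stays intact.
        prefix, _, rest = key.partition("/")
        groups[prefix].append(rest)
    return dict(groups)


groups = group_by_top_level(keys)
for prefix in sorted(groups):
    print(f"{prefix}: {len(groups[prefix])} object(s)")
```

In the notebook, the same helper could be fed `[f.key for f in user_files]` instead of the hard-coded list.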


## User's uploaded data
These are the files the user uploaded in the H2O Drive app. We select them by listing only the `Home` workspace of this user.

This is what you might show in an `Import Data` drop-down list in a Wave app:
```python
file_choices = [ui.choice(name=f.key, label=f.key) for f in user_drive_app_files]
```


In [8]:
user_drive_app_files = await user_bucket.workspace(h2o_drive.Workspace.HOME).list_objects()

for f in user_drive_app_files:
    print(f.key)

churnTest.csv
demo_datasets/Amazon_Reviews/amazon_reviews.csv
demo_datasets/Employee_Attrition/employee_attrition.csv
demo_datasets/Home_Prices/house_prices.csv
demo_datasets/S_P_Forecasting/sandp_test.csv
demo_datasets/S_P_Forecasting/sandp_train.csv
demo_datasets/Telco_Churn/telco_customer_churn.csv
demo_datasets/Walmart_Forecasting/walmart_test.csv
demo_datasets/Walmart_Forecasting/walmart_train.csv
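
Some of these keys are long, so a drop-down list may read better with only the file name as the label. The following is a minimal sketch, assuming you want to offer CSV files only; `csv_choices` is a hypothetical helper, and `notes.txt` is a made-up key included just to show the filter.

```python
import posixpath

# Two keys from the Home workspace listing, plus one made-up non-CSV key.
keys = [
    "churnTest.csv",
    "demo_datasets/Amazon_Reviews/amazon_reviews.csv",
    "notes.txt",  # hypothetical, not part of the real listing
]


def csv_choices(object_keys):
    """Return (name, label) pairs for CSV objects, sorted by label."""
    pairs = [
        # Object keys always use "/" separators, so posixpath is the right tool.
        (key, posixpath.basename(key))
        for key in object_keys
        if key.lower().endswith(".csv")
    ]
    return sorted(pairs, key=lambda pair: pair[1])


for name, label in csv_choices(keys):
    print(f"{label}  ->  {name}")
```

In a Wave app, the pairs could be turned into choices with `[ui.choice(name=n, label=l) for n, l in csv_choices(...)]`. The full key stays in `name` so the selected file can still be downloaded.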


### Download a dataset

In [9]:
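# Keys in the Home workspace are relative to it (for example "churnTest.csv"),
# so the replace below only matters if a key carries a "home/" prefix.
file_name = user_drive_app_files[0].key.replace("home/", "")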
await user_bucket.workspace(h2o_drive.Workspace.HOME).download_file(
    object_name=user_drive_app_files[0].key, 
    file_name=file_name
)

df = pd.read_csv(file_name)
df.head()

Unnamed: 0,State,Account_Length,Area_Code,Phone_No,International_Plan,Voice_Mail_Plan,No_Vmail_Messages,Total_Day_minutes,Total_Day_Calls,Total_Day_charge,Total_Eve_Minutes,Total_Eve_Calls,Total_Eve_Charge,Total_Night_Minutes,Total_Night_Calls,Total_Night_Charge,Total_Intl_Minutes,Total_Intl_Calls,Total_Intl_Charge,No_CS_Calls
0,HI,101.0,510.0,3548815,no,no,0,70.9,123.0,12.05,211.9,73.0,18.01,236.0,73.0,10.62,10.6,3.0,2.86,3
1,MT,137.0,510.0,3817211,no,no,0,223.6,86.0,38.01,244.8,139.0,20.81,94.2,81.0,4.24,9.5,7.0,2.57,0
2,OH,103.0,408.0,4119481,no,yes,29,294.7,95.0,50.1,237.3,105.0,20.17,300.3,127.0,13.51,13.7,6.0,3.7,1
3,NM,99.0,415.0,4189100,no,no,0,216.8,123.0,36.86,126.4,88.0,10.74,220.6,82.0,9.93,15.7,2.0,4.24,1
4,SC,108.0,415.0,4133643,no,no,0,197.4,78.0,33.56,124.0,101.0,10.54,204.5,107.0,9.2,7.7,4.0,2.08,2
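
The first file in the listing sits at the top level of the workspace, but most keys contain folders such as `demo_datasets/Telco_Churn/`. If `download_file` does not create missing local folders (an assumption, not verified here), the target folder has to exist before the download. A small sketch of that preparation step, using a hypothetical helper `local_path_for`:

```python
import os
import tempfile


def local_path_for(key, root):
    """Map an object key to a path under root, creating parent folders."""
    # Keys use "/" separators regardless of the local operating system.
    path = os.path.join(root, *key.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path


# Use a temporary folder as the download root for this demonstration.
root = tempfile.mkdtemp()
path = local_path_for("demo_datasets/Telco_Churn/telco_customer_churn.csv", root)
print(path)
```

The returned path could then be passed as `file_name` to `download_file`, with the original key as `object_name`.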


## Data for this app
* **App Data:** Data this user can access from any version of this application
* **Version Data:** Data this user can access from any instance of this version of the app; it is not available in other versions
* **Instance Data:** Data this user can access only from this specific instance of the app, so that it survives restarts
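
One way to read the three scopes above is as a choice of how widely saved data should be shared. The sketch below picks the narrowest scope that still meets the need. The function `workspace_name` and its parameters are hypothetical; the returned strings match the `h2o_drive.Workspace` member names used in this tutorial.

```python
def workspace_name(share_across_versions, share_across_instances):
    """Pick the narrowest workspace that still covers the sharing needs."""
    if share_across_versions:
        # Visible from every version of the app.
        return "APP"
    if share_across_instances:
        # Visible from every instance of this one version.
        return "APP_VERSION"
    # Visible only from this instance, for example after a restart.
    return "APP_INSTANCE"


print(workspace_name(share_across_versions=False, share_across_instances=True))
```

Inside H2O AI Cloud, the name could be resolved with `getattr(h2o_drive.Workspace, workspace_name(...))` and passed to `user_bucket.workspace(...)`.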

In [10]:
app_workspace = await user_bucket.workspace(h2o_drive.Workspace.APP).list_objects()
version_workspace = await user_bucket.workspace(h2o_drive.Workspace.APP_VERSION).list_objects()
instance_workspace = await user_bucket.workspace(h2o_drive.Workspace.APP_INSTANCE).list_objects()

print("Files for this App")
for f in app_workspace:
    print("\t", f.key)
    
print("Files for this Version")
for f in version_workspace:
    print("\t", f.key)
    
print("Files for this Instance")
for f in instance_workspace:
    print("\t", f.key)

Files for this App
	 my_chanages_churnTest.csv
Files for this Version
Files for this Instance


### Save a file 
Let's say the user edits our data somehow and wants to save it for later. We _might_ save the data back to their Home directory if we are building an "edit data" app. But in this case, let's say that they want to use the edited data in future versions of this specific app.

In [9]:
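# Load the dataset downloaded earlier and keep only the first five rows
df = pd.read_csv("churnTest.csv")
top_customers = df.head()

# Write without the index so reloading does not add another "Unnamed: 0" column
new_file_name = "my_changes_churnTest.csv"
top_customers.to_csv(new_file_name, index=False)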

In [11]:
await user_bucket.workspace(h2o_drive.Workspace.APP).upload_file(
    file_name=new_file_name, 
    object_name=new_file_name
)
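
After an upload it can be reassuring to confirm that the object is really there. The sketch below wraps the two calls already used in this tutorial, `upload_file` and `list_objects`, in a hypothetical helper `upload_and_confirm`. It assumes the listing reflects the upload immediately, which is not verified here.

```python
async def upload_and_confirm(workspace, local_path, object_name):
    """Upload a file, then check that its key appears in the workspace listing."""
    await workspace.upload_file(file_name=local_path, object_name=object_name)
    objects = await workspace.list_objects()
    # Each listed object exposes its key, as in the loops earlier in this tutorial.
    return any(obj.key == object_name for obj in objects)


# Example call from a notebook cell inside H2O AI Cloud:
# saved = await upload_and_confirm(
#     user_bucket.workspace(h2o_drive.Workspace.APP),
#     new_file_name,
#     new_file_name,
# )
```

The helper takes the workspace as an argument, so the same function works for the App, Version, and Instance workspaces.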