## Object storage using the Horizon GUI

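- Go to “Experiment” > “CHI@TACC”, and log in if prompted
- In the left sidebar, open “Object Store” > “Containers”, and click “Create Container”
- Specify the name as object-persist-project33
- Leave other settings at their defaults, and click “Submit”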

# Use rclone and authenticate to object store from a compute instance

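For *write* access to the object store, rclone must authenticate with our user ID and an application credential. Create the application credential in the Horizon GUI under “Identity” > “Application Credentials”, and copy its ID and secret right away, because the secret is shown only once.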

Steps below:

On the compute instance, install rclone:

````
# run on node-persist
curl https://rclone.org/install.sh | sudo bash

````

````
# run on node-persist
# this line makes sure user_allow_other is un-commented in /etc/fuse.conf
sudo sed -i '/^#user_allow_other/s/^#//' /etc/fuse.conf
````
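
To see what this sed expression does without touching the real file, the following sketch applies it to a throwaway sample file. The sample contents are an assumption that mimics a typical default /etc/fuse.conf; the expression itself is the one used above.

```shell
# Sketch: apply the same sed expression to a temporary sample file.
# The sample contents are assumed, not read from the real /etc/fuse.conf.
sample=$(mktemp)
printf '%s\n' \
  '# Allow non-root users to specify the allow_other option' \
  '#user_allow_other' > "$sample"

# Strip the leading '#' only from lines that start with '#user_allow_other'
sed -i '/^#user_allow_other/s/^#//' "$sample"

# Show the result; the option line should no longer be commented out
cat "$sample"
```

The ordinary comment line is left alone because it does not match the `/^#user_allow_other/` address, so only the option itself is enabled.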

````
# run on node-persist
mkdir -p ~/.config/rclone
nano  ~/.config/rclone/rclone.conf
````

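Paste the following into the file, substituting your own application credential ID and secret for APP_CRED_ID and APP_CRED_SECRET. You will also need to substitute your own user ID for YOUR_USER_ID. You can find it using “Identity” > “Users” in the Horizon GUI; it is an alphanumeric string (not the human-readable user name).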

````
[chi_tacc]
type = swift
user_id = YOUR_USER_ID
application_credential_id = APP_CRED_ID
application_credential_secret = APP_CRED_SECRET
auth = https://chi.tacc.chameleoncloud.org:5000/v3
region = CHI@TACC
````

Use Ctrl+O and Enter to save the file, and Ctrl+X to exit nano.
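
As an alternative to editing the file by hand in nano, the same configuration can be written from the shell. This is a minimal sketch: the function name `write_rclone_conf` and the variables `CHI_USER_ID`, `CHI_APP_CRED_ID`, and `CHI_APP_CRED_SECRET` are hypothetical names chosen for illustration. The file is restricted to its owner because it contains a secret.

```shell
# Sketch: write the chi_tacc remote to an rclone config file.
# The function and variable names are illustrative, not part of rclone.
# Note: this overwrites any existing file at the target path.
write_rclone_conf() {
  conf="${1:-$HOME/.config/rclone/rclone.conf}"

  # Stop early if any credential variable is unset or empty
  : "${CHI_USER_ID:?set CHI_USER_ID first}"
  : "${CHI_APP_CRED_ID:?set CHI_APP_CRED_ID first}"
  : "${CHI_APP_CRED_SECRET:?set CHI_APP_CRED_SECRET first}"

  mkdir -p "$(dirname "$conf")"
  cat > "$conf" <<EOF
[chi_tacc]
type = swift
user_id = ${CHI_USER_ID}
application_credential_id = ${CHI_APP_CRED_ID}
application_credential_secret = ${CHI_APP_CRED_SECRET}
auth = https://chi.tacc.chameleoncloud.org:5000/v3
region = CHI@TACC
EOF

  # Only the owner may read or write the file
  chmod 600 "$conf"
}
```

After setting the three variables in the current shell, calling `write_rclone_conf` with no argument writes to ~/.config/rclone/rclone.conf; passing a path writes there instead.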

To test the configuration, list the containers in the object store:

````
# run on node-persist
rclone lsd chi_tacc:
````

# Create a pipeline to load training data into the object store

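On the compute instance, change into the data pipeline directory of the LLM_LegalDocSummarization project. Then run the process-data service, followed by the load-data service. The RCLONE_CONTAINER variable names the container we created earlier.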

````
cd ~/data_pipeline
````

````
docker compose -f docker-compose-data.yaml run process-data
````

````
export RCLONE_CONTAINER=object-persist-project33
docker compose -f docker-compose-data.yaml run load-data
````
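
The load step above exports RCLONE_CONTAINER before running the service, so a typo or an empty value is easy to miss. The sketch below validates the name first. The function name `check_container_name` is hypothetical, and the rule it enforces (non-empty, no `/`) reflects the Swift restriction that container names cannot contain a slash.

```shell
# Sketch: validate a Swift container name before using it.
# check_container_name is an illustrative helper, not an rclone or docker command.
check_container_name() {
  name="$1"
  if [ -z "$name" ]; then
    echo "container name is empty" >&2
    return 1
  fi
  case "$name" in
    */*)
      echo "container name must not contain '/'" >&2
      return 1
      ;;
  esac
  return 0
}
```

One way to use it is `check_container_name "$RCLONE_CONTAINER" && docker compose -f docker-compose-data.yaml run load-data`, so the load only starts when the name looks valid. After the load completes, `rclone ls chi_tacc:$RCLONE_CONTAINER` lists the objects in the container.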

## Mount an object store to local file system

Now that our data is safely inside the object store, we can use it anywhere - on a VM, on a bare metal site, on multiple compute instances at once, even outside of Chameleon - to train or evaluate a model. We would not have to repeat the ETL pipeline each time we want to use the data.

If we were working on a brand-new compute instance, we would first need to connect to it, install rclone, and create the rclone configuration file at ~/.config/rclone/rclone.conf, as we did above. Since we have already done these steps in order to load data into the object store, we don’t need to repeat them here.

The next step is to create a mount point for the data in the local filesystem:

````
# run on node-persist
sudo mkdir -p /mnt/object
sudo chown -R cc /mnt/object
sudo chgrp -R cc /mnt/object
````

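Finally, we can use rclone mount to mount the object store at the mount point. The --read-only flag prevents writes through the mount, --allow-other lets other users (including Docker) access it, which is why we enabled user_allow_other in /etc/fuse.conf earlier, and --daemon runs rclone in the background: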

````
rclone mount chi_tacc:object-persist-project33 /mnt/object --read-only --allow-other --daemon
````

To confirm that the mount worked, list its contents:

````
# run on node-persist
ls /mnt/object
````
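
An empty listing from ls can mean either an empty container or a mount that never came up, so it can help to check the mount itself. This sketch uses the `mountpoint` utility from util-linux, which is assumed to be installed (as it is on typical Ubuntu images). The helper name `is_mounted` is hypothetical.

```shell
# Sketch: report whether a directory is currently a mount point.
# is_mounted is an illustrative helper built on util-linux's mountpoint.
is_mounted() {
  mountpoint -q "$1"
}

# Example with a fresh temporary directory, which is not a mount point
scratch=$(mktemp -d)
if is_mounted "$scratch"; then
  echo "$scratch is mounted"
else
  echo "$scratch is not mounted"
fi
```

On the compute instance, `is_mounted /mnt/object && ls /mnt/object` only lists the directory when something is actually mounted there.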

Now, we can start a Docker container with access to that virtual “filesystem”, by passing that directory as a bind mount. Note that to mount a directory that is actually a FUSE filesystem inside a Docker container, we have to pass it using a slightly different --mount syntax, instead of the -v that we had used in previous examples.

````
# run on node-persist
docker run -d --rm \
  -p 8888:8888 \
  --shm-size 8G \
  -e LEGAL_DATA_DIR=/mnt/merged_dataset \
  -v ~/LLM_LegalDocSummarization/workspace:/home/project33/work/. \
  --mount type=bind,source=/mnt/object,target=/mnt/merged_dataset,readonly \
  --name jupyter \
  quay.io/jupyter/pytorch-notebook:latest

````
Then view the container logs with

````
# run on node-persist
docker logs jupyter

````
and look for a line like

http://127.0.0.1:8888/lab?token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Paste this into a browser tab, but in place of 127.0.0.1, substitute the floating IP assigned to your instance, to open the Jupyter notebook interface that is running on your compute instance.
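
The substitution can also be done in the shell. The sketch below rewrites a log line read from standard input. The function name `jupyter_url` and the example address 192.0.2.10 (a documentation-only IP) are placeholders, so substitute your instance's floating IP.

```shell
# Sketch: replace 127.0.0.1 in a Jupyter log line with a floating IP.
# jupyter_url is an illustrative helper; pass your floating IP as the first argument.
jupyter_url() {
  # Keep only the first matching URL, then swap in the address
  grep -o 'http://127\.0\.0\.1:8888/lab?token=[0-9a-zA-Z]*' \
    | head -n 1 \
    | sed "s/127\.0\.0\.1/$1/"
}

# Example with a made-up log line and a documentation-only address
echo '    http://127.0.0.1:8888/lab?token=abc123' | jupyter_url 192.0.2.10
```

With the container running, `docker logs jupyter 2>&1 | jupyter_url FLOATING_IP` prints the URL to paste into the browser, where FLOATING_IP stands for your own address. The `2>&1` matters because Jupyter writes its startup messages to standard error.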

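Now you can work in the Jupyter environment. Inside the container, the contents of the object store are available read-only at /mnt/merged_dataset, which is also the value of the LEGAL_DATA_DIR environment variable.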

Close the Jupyter container tab in your browser, and then stop the container with

````
# run on node-persist
docker stop jupyter
````

# Un-mount an object store

To stop rclone and un-mount the object store, run:

````
fusermount -u /mnt/object
````