## How to use the Data Lab *Storage Manager* Service

*Revised:  May 5, 2017*

This notebook documents how to use the Data Lab virtual storage system via the storage manager service. This can be done either from a Python script or from the command line using the <i>datalab</i> command.

### The storage manager service interface

The storage manager service simplifies access to the Data Lab virtual storage system. This section describes the service interface in case we want to write our own code against it rather than use one of the provided tools. The service accepts an HTTP GET call to the endpoint for each operation:

| Endpoint | Description | Parameters |
|----------|-------------|------------|
| /get | Retrieve a file | name |
| /put | Upload a file | name |
| /cp | Copy a file/directory | from, to |
| /ln | Link a file/directory | from, to |
| /ls | Get a file/directory listing | name |
| /mkdir | Create a directory | name |
| /mv | Move/rename a file/directory | from, to |
| /rm | Delete a file | name |
| /rmdir | Delete a directory | name |
| /tag | Annotate a file/directory | name, tag |

For example, a call to /get?name=vos://mag.csv retrieves the file mag.csv from our virtual storage.
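
The endpoint table above can be turned into a small request-path builder. The sketch below is only an illustration: the names `ENDPOINT_PARAMS` and `build_request_path` are hypothetical and not part of any Data Lab package, and because this notebook does not give the service's base URL, only the path and query string are constructed.

```python
from urllib.parse import urlencode

# Parameters accepted by each endpoint, copied from the table above.
ENDPOINT_PARAMS = {
    "get": ("name",),
    "put": ("name",),
    "cp": ("from", "to"),
    "ln": ("from", "to"),
    "ls": ("name",),
    "mkdir": ("name",),
    "mv": ("from", "to"),
    "rm": ("name",),
    "rmdir": ("name",),
    "tag": ("name", "tag"),
}


def build_request_path(endpoint, params):
    """Return '/<endpoint>?<query>' after checking the parameter names.

    Hypothetical helper: it builds the path and query string only and
    never contacts the service.
    """
    expected = ENDPOINT_PARAMS.get(endpoint)
    if expected is None:
        raise ValueError("unknown endpoint: %r" % endpoint)
    if set(params) != set(expected):
        raise ValueError("%s expects parameters: %s" % (endpoint, ", ".join(expected)))
    # Keep ':' and '/' unescaped so 'vos://' identifiers stay readable.
    query = urlencode([(key, params[key]) for key in expected], safe=":/")
    return "/%s?%s" % (endpoint, query)


print(build_request_path("get", {"name": "vos://mag.csv"}))
```

A dictionary is used for the parameters because `from` is a reserved word in Python and cannot be written as a keyword argument.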

#### Virtual storage identifiers

Files in the virtual storage are identified by a "vos://" prefix, which resolves to the home area of our space. Navigation above our home area is not supported.
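
A client-side check along the following lines can catch malformed identifiers before a call is made. This is a minimal sketch under assumptions: the helper name `check_vos_name` is hypothetical, and rejecting ".." segments is just one way to respect the restriction locally. How the service itself responds to such names is not documented here.

```python
VOS_PREFIX = "vos://"


def check_vos_name(name):
    """Return name unchanged if it looks like a usable vos:// identifier.

    Hypothetical local check; it does not query the storage manager.
    """
    if not name.startswith(VOS_PREFIX):
        raise ValueError("identifier must start with %r: %r" % (VOS_PREFIX, name))
    # Split the part after the prefix into path segments.
    segments = name[len(VOS_PREFIX):].split("/")
    if ".." in segments:
        raise ValueError("navigation above the home area is not supported: %r" % name)
    return name
```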

#### Authentication
The storage manager service requires a Data Lab security token. The token must be passed as the value of the "X-DL-AuthToken" header in every HTTP GET call to the service.
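
As a rough illustration of attaching the token, the sketch below builds a GET request with the standard library but does not send it. The helper name `make_get_request`, the placeholder host and the dummy token are all assumptions made for this example; the real service address is not given in this notebook.

```python
from urllib.request import Request

AUTH_HEADER = "X-DL-AuthToken"  # header keyword named in the text above


def make_get_request(url, token):
    """Build (but do not send) a GET request carrying the security token."""
    return Request(url, headers={AUTH_HEADER: token}, method="GET")


# The host and token below are placeholders, not real service values.
request = make_get_request("https://storage.example.invalid/ls?name=vos://", "demo-token")
print(request.get_method(), request.full_url)
```

With a real address and token, passing the request to `urllib.request.urlopen` would perform the call.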

### From Python code

The storage manager service can be called from Python code using the <i>datalab</i> module. This provides methods to access the various storage manager functions in the <i>storeMgr</i> subpackage.

#### Initialization
To use the storage manager, we first import the relevant Python modules and retrieve our Data Lab security token, which must be included in every call to the storage manager service.

In [1]:
# Standard notebook imports
from __future__ import print_function
import getpass
from dl import authClient, storeClient

In [2]:
# Get the security token for the demo00 user
token = authClient.login ('demo00',getpass.getpass('Account password: '))
if not authClient.isValidToken (token):
    print ('Error: invalid user login (%s)' % token)
else:
    print ("Login token:   %s" % token)

Account password: ········
Login token:   demo00.1018.1018.$1$Orf0pq3.$Z8V7xvbuP/J5nZLzWr1rd0
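
In a script, it can be more convenient to stop on a failed login than to print a message and carry on. The sketch below wraps the two calls used above, `login()` and `isValidToken()`, in a hypothetical helper named `login_or_raise`. The client is passed in as an argument so that the helper can be tried with a stand-in object.

```python
def login_or_raise(auth_client, user, password):
    """Log in and return the token, raising if the login is rejected.

    auth_client is expected to provide login() and isValidToken(), as
    authClient does in the cell above. The helper name is hypothetical.
    """
    token = auth_client.login(user, password)
    if not auth_client.isValidToken(token):
        # Mirror the cell above, which reports the returned value as the error detail.
        raise RuntimeError("invalid user login (%s)" % token)
    return token
```

With the imports from the first cell, it would be called as `token = login_or_raise(authClient, 'demo00', getpass.getpass('Account password: '))`.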


#### The storeMgr class

All storage operations are executed through the relevant methods of the <i>storeMgr</i> class, which the examples below call via <i>storeClient</i>:

| Method | Description | Arguments |
|----------|-------------|----------------|
| get | Retrieve a file | fr, to |
| put | Upload a file | fr, to |
| cp | Copy a file/directory | fr, to |
| ln | Create a link to a file/directory | fr, target |
| ls | Get a file/directory listing | name |
| mkdir | Create a directory | name |
| mv | Move/rename a file/directory | fr, to |
| rm | Delete a file | name |
| rmdir | Delete a directory | name |
| tag | Tag a file/directory | name, tag |

#### Listing a file/directory

We can see all the files that are in a specific directory or get a full listing for a specific file.  In this case, we'll list the default virtual storage directory to use as a basis for changes we'll make below.

In [8]:
listing = storeClient.ls (token, name = 'vos://')
print (listing)

public,tmp


The *public* directory shown here is visible to all Data Lab users and provides a means of sharing data without having to set up special access.  Similarly, the *tmp* directory is read-protected and provides a convenient temporary directory for use in a workflow.
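
Since the listing comes back as a single comma-separated string, it is often handy to split it into a list before working with it. A minimal sketch, assuming that names never contain commas; `parse_listing` is a hypothetical helper and not part of the client:

```python
def parse_listing(listing):
    """Split the comma-separated string returned by ls() into a list of names."""
    # An empty listing string should map to an empty list, not [''].
    return [name for name in listing.split(",") if name]


names = parse_listing("public,tmp")
print(names)  # ['public', 'tmp']
```

Membership tests such as `'tmp' in names` then check for a whole name rather than a substring of the listing.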

#### Uploading a file

Now we want to upload a new data file from our local disk to the virtual storage:

In [25]:
storeClient.put (token, to = 'vos://newmags.csv', fr = './newmags.csv')
storeClient.ls (token, name='vos://')

(1 / 1) ./newmags.csv -> vos://newmags2.csv


'gavo1.csv,gavo26.csv,gavo27.csv,gavo28.csv,newmags2.csv,public,tmp,zgavo28.csv'

#### Downloading a file

Let's say we want to download a file from our virtual storage space, in this case a query result that we saved to it in the "How to use the Data Lab query manager service" notebook:

In [11]:
storeClient.get (token, fr = 'vos://newmags.csv', to = './mymags.csv')



'[<Response [200]>]'

It is also possible to read the contents of a remote file directly into the notebook by passing an empty string as the *'to'* argument:

In [12]:
data = storeClient.get (token, fr = 'vos://newmags.csv', to = '')
print (data)

id,g,r,i
001,22.3,12.4,21.5
002,22.3,12.4,21.5
003,22.3,12.4,21.5
004,22.3,12.4,21.5
005,22.3,12.4,21.5
006,22.3,12.4,21.5
007,22.3,12.4,21.5
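
Text retrieved this way can be parsed without writing it to disk. The sketch below assumes Python 3 and uses a short literal string in the same format as the output above in place of a live `storeClient.get()` call; the variable names are illustrative only.

```python
import csv
import io

# Text in the same format as the file contents printed above.
data = "id,g,r,i\n001,22.3,12.4,21.5\n002,22.3,12.4,21.5\n"

# DictReader takes the column names from the first line.
rows = list(csv.DictReader(io.StringIO(data)))

# Values are read as strings, so convert magnitudes explicitly.
g_mags = [float(row["g"]) for row in rows]
print(len(rows), g_mags)
```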



#### Creating a directory

We can create a directory on the remote storage to be used for saving data later:

In [13]:
storeClient.mkdir (token, name = 'vos://results')

'OK'

#### Copying a file/directory

We want to put a copy of the file in a remote work directory:

In [15]:
storeClient.mkdir (token, name = 'vos://temp')
print ("Before: " + storeClient.ls (token, name='vos://temp/'))
storeClient.cp (token, fr = 'vos://newmags.csv', to = 'vos://temp/newmags.csv')
print ("After: " + storeClient.ls (token, name='vos://temp/'))

Before: 
After: newmags.csv


Notice that in the *ls()* call we append a trailing '/' to the directory name to list the contents of the directory rather than the directory itself.
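
When directory names come from variables, a small helper can add the trailing '/' only when it is missing. This is a hypothetical convenience function, not part of the client:

```python
def as_directory(name):
    """Return name with a trailing '/' added if it does not already end in one."""
    return name if name.endswith("/") else name + "/"


print(as_directory("vos://temp"))  # vos://temp/
```

The result can then be passed as the *name* argument of *ls()*, as in the cell above.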

#### Linking to a file/directory

Sometimes we want to create a link to a file or directory.  In this case, the link named by the *'fr'* parameter is created and points to the file/container named by the *'target'* parameter.

In [16]:
storeClient.ln (token, fr = 'vos://mags.csv', target = 'vos://temp/newmags.csv')
print ("Root dir: " + storeClient.ls (token, name='vos://'))
print ("Temp dir: " + storeClient.ls (token, name='vos://temp/'))

Root dir: mags.csv,newmags.csv,public,results,temp,tmp
Temp dir: newmags.csv


#### Moving/renaming a file/directory

We can move a file or directory:

In [17]:
storeClient.mv(token, fr = 'vos://temp/newmags.csv', to = 'vos://results')
print ("Results dir: " + storeClient.ls (token, name='vos://results/'))

Results dir: 


#### Deleting a file

We can delete a file:

In [19]:
print ("Before: " + storeClient.ls (token, name='vos://'))
storeClient.rm (token, name = 'vos://mags.csv')
print ("After: " + storeClient.ls (token, name='vos://'))

Before: mags.csv,newmags.csv,public,results,temp,tmp
After: newmags.csv,public,results,temp,tmp


#### Deleting a directory

We can also delete a directory; doing so also deletes the contents of that directory:

In [20]:
storeClient.rmdir (token, name = 'vos://temp')

'OK'

#### Tagging a file/directory

We can tag any file or directory with arbitrary metadata:

In [21]:
storeClient.tag(token, name = 'vos://results', tag = 'The results from my analysis')

'OK'

NOTE: We need a method to retrieve tags or include them in the listing.

#### Clean up the demo directory of remaining files

In [23]:
storeClient.rm (token, name = 'vos://newmags.csv')
storeClient.rm (token, name = 'vos://results')
storeClient.ls (token, name = 'vos://')

'public,tmp'

### Using the datalab command

The <i>datalab</i> command provides an alternative, command-line way to work with the storage manager through subcommands such as <i>get</i>, <i>put</i> and <i>ls</i>.

#### Initialization
We need to be logged into the Data Lab to use the storage manager.

In [None]:
!datalab login user=demo00 password=...

#### Downloading a file

Let's say we want to download a file from our virtual storage space:

In [None]:
!datalab get fr="vos://mags.csv" to="./mags.csv"

#### Uploading a file

Now we want to upload a new data file from our local disk:

In [None]:
!datalab put fr="./newmags.csv" to="vos://newmags.csv"

#### Copying a file/directory

We want to put a copy of the file in a remote work directory:

In [None]:
!datalab cp fr="vos://newmags.csv" to="vos://temp/newmags.csv"

#### Linking to a file/directory

Sometimes we want to create a link to a file or directory:

In [None]:
!datalab ln fr="vos://temp/mags.csv" to="vos://mags.csv"

#### Listing a file/directory

We can see all the files that are in a specific directory or get a full listing for a specific file:

In [None]:
!datalab ls name="vos://temp"

#### Creating a directory

We can create a directory:

In [None]:
!datalab mkdir name="vos://results"

#### Moving/renaming a file/directory

We can move a file or directory:

In [None]:
!datalab mv fr="vos://temp/newmags.csv" to="vos://results"

#### Deleting a file

We can delete a file:

In [None]:
!datalab rm name="vos://temp/mags.csv"

#### Deleting a directory

We can also delete a directory:

In [None]:
!datalab rmdir name="vos://temp"

#### Tagging a file/directory

We can tag any file or directory with arbitrary metadata:

In [None]:
!datalab tag name="vos://results" tag="The results from my analysis"
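
Outside a notebook, the same commands can be assembled from Python. The sketch below only builds the argument list in the key=value form used in the cells above and does not run anything; `datalab_argv` is a hypothetical helper, and it assumes the command accepts its options in any order.

```python
def datalab_argv(subcommand, **options):
    """Build an argument list for the datalab command without running it."""
    argv = ["datalab", subcommand]
    # Options use the key=value form shown in the cells above.
    argv.extend("%s=%s" % (key, value) for key, value in options.items())
    return argv


argv = datalab_argv("put", fr="./newmags.csv", to="vos://newmags.csv")
print(argv)
```

Assuming <i>datalab</i> is on the path and we are already logged in, the list could be passed to `subprocess.run(argv, check=True)`. Passing a list rather than a single string avoids shell quoting problems with values that contain spaces, such as the tag text.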