In [1]:
__author__ = 'Alice Jacques <alice.jacques@noao.edu>, NOIRLab Astro Data Lab Team <datalab@noao.edu>'
__version__ = '20200915'
__keywords__ = ['vospace','mydb','store files','query']

# How to use the Data Lab *Command Line Client* Service

### Table of Contents

* [Summary](#summary)
* [Disclaimer & attribution](#attribution)
* [Imports & setup](#imports)
* [Handling VOSpace directories/files via the datalab command line client](#datalabcommand)
    - [Uploading a file](#cmdupload)
    - [Downloading a file](#cmddownload)
    - [Copying a file/directory](#cmdcopy)
    - [Linking to a file/directory](#cmdlink)
    - [Listing a file/directory](#cmdlist)
    - [Creating a directory](#cmdcreate)
    - [Moving a file/directory](#cmdmove)
    - [Deleting a file](#cmddeletefile)
    - [Deleting a directory](#cmddeletedirectory)
    - ~[Tagging a file/directory](#cmdtag)~
* [Handling MyDB tables via the datalab command line client](#datalabcmdmydb)
    - [Listing MyDB tables and a table's schema](#listcmdmydb)
    - [Creating a MyDB table](#createcmdmydb)
    - [Inserting data into a MyDB table](#insertcmdmydb)
    - [Importing data into a MyDB table](#importcmdmydb)
    - [Truncating a MyDB table](#truncmdmydb)
    - [Copying a MyDB table](#copycmdmydb)
    - [Renaming a MyDB table](#renamecmdmydb)
    - [Dropping a MyDB table](#dropcmdmydb)
    
*Note: crossed-out entries above indicate features that are currently not working.*

<a class="anchor" id="summary"></a>
# Summary

This notebook documents how to use the `datalab` command line client to access the Data Lab virtual storage system (VOSpace) and a user's personal MyDB database. The `datalab` command is a command-line alternative to the auth client, query client, and store client. Its documentation can be found [here](https://datalab.noao.edu/docs/manual/UsingTheNOAODataLab/CommandLineTools/TheDatalabCommand/TheDatalabCommand.html).

<a class="anchor" id="attribution"></a>
# Disclaimer & attribution
If you use this notebook for your published science, please acknowledge the following:

* Data Lab concept paper: Fitzpatrick et al., "The NOAO Data Laboratory: a conceptual overview", SPIE, 9149, 2014, http://dx.doi.org/10.1117/12.2057445

* Data Lab disclaimer: http://datalab.noao.edu/disclaimers.php

<a class="anchor" id="imports"></a>
# Imports & setup

In [2]:
# Use Data Lab's queryClient to create an example file in this notebook
from dl import queryClient as qc
# Use to convert table to Pandas dataframe object
from dl.helpers.utils import convert

In [3]:
# Temporary: point the query client at the Data Lab test service
qc.set_svc_url('http://dltest.datalab.noao.edu/query')

The Data Lab Command Line Client is a Python-based package that provides an alternate way to interact with the various Data Lab services. It can be installed with:

    pip install --ignore-installed --no-cache-dir noaodatalab
    
It is invoked via the `datalab` command. 
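
Before running the cells below, it can help to confirm that the `datalab` executable is actually on the search path. The following is a minimal sketch using only the Python standard library; the helper name `find_cli` is hypothetical and not part of the Data Lab package.

```python
import shutil

def find_cli(name="datalab"):
    """Return the full path of an executable on PATH, or None if it is absent."""
    return shutil.which(name)

path = find_cli()
if path is None:
    print("datalab not found on PATH; install it with pip as described above")
else:
    print(f"datalab found at {path}")
```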

We need to be logged in to Data Lab to use the query client and store client. If you are not already logged in, enter your Data Lab username after '*user=*' and your Data Lab password after '*password=*' below:

In [4]:
!datalab login user=ajacques_dltest password=

User 'ajacques_dltest' is already logged in to the Data Lab


<a class="anchor" id="datalabcommand"></a>
# 1. Handling directories/files via the datalab command line client

The `datalab` command provides a way to use a user's VOSpace. VOSpace is a convenient virtual storage space for users to save their work. It can store any data or file type.

Before we start this section, let's first query some example data from a Data Lab database and save it locally as a CSV file named `smags.csv`:

In [5]:
query = 'SELECT gmag, imag, rmag, zmag FROM smash_dr1.object LIMIT 10'
qc.query(adql=query,fmt='csv',out='./smags.csv')

'OK'

<a class="anchor" id="cmdupload"></a>
### 1.1 Uploading a file

Let's say we want to upload a file from our local disk to the virtual storage:

In [6]:
!datalab put fr="./smags.csv" to="vos://smags.csv"

We can check that it has been uploaded to VOSpace:

In [7]:
!datalab ls name="vos://smags.csv"

smags.csv
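
Outside a notebook, where the `!` shell escape is not available, the same `datalab <task> key=value` invocation can be assembled in Python. This is a sketch only: the helper `datalab_argv` is a hypothetical name introduced here, and it merely builds the argument list in the form used by the cells in this notebook.

```python
import shlex

def datalab_argv(task, **params):
    """Build the argument list for `datalab <task> key=value ...`."""
    argv = ["datalab", task]
    for key, value in params.items():
        argv.append(f"{key}={value}")
    return argv

argv = datalab_argv("put", fr="./smags.csv", to="vos://smags.csv")
print(shlex.join(argv))  # prints: datalab put fr=./smags.csv to=vos://smags.csv

# To execute the command from a script (requires the datalab client and a login):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when a file name contains spaces.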


<a class="anchor" id="cmddownload"></a>
### 1.2 Downloading a file

Let's say we want to download a file from our virtual storage space, in this case the CSV file that we uploaded in section 1.1:

In [8]:
!datalab get fr="vos://smags.csv" to="./mysmags.csv"
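
One way to confirm the round trip is to compare the downloaded copy with the original local file. The sketch below assumes both `./smags.csv` and `./mysmags.csv` are in the current directory and that the upload and download leave the bytes unchanged; the helper `same_contents` is a hypothetical name, not a Data Lab function.

```python
import filecmp
from pathlib import Path

def same_contents(original, downloaded):
    """Return True when both files exist and are byte-for-byte identical."""
    a, b = Path(original), Path(downloaded)
    if not (a.is_file() and b.is_file()):
        return False
    # shallow=False forces a comparison of file contents, not just metadata
    return filecmp.cmp(a, b, shallow=False)

print(same_contents("./smags.csv", "./mysmags.csv"))
```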

<a class="anchor" id="cmdcopy"></a>
### 1.3 Copying a file/directory

We want to put a copy of the file in a remote work directory:

In [9]:
!datalab cp fr="vos://smags.csv" to="vos://tmp/smags.csv"

<a class="anchor" id="cmdlink"></a>
### 1.4 Linking to a file/directory

Sometimes we want to create a link to a file or directory. The following creates a (soft) link to the specified file at the given location:

In [10]:
!datalab ln fr="vos://tmp/mags.csv" to="vos://mags.csv"

<a class="anchor" id="cmdlist"></a>
### 1.5 Listing a file/directory

We can see all the files that are in a specific directory or get a full listing for a specific file:

In [11]:
!datalab ls name="vos://tmp"

smags.csv


<a class="anchor" id="cmdcreate"></a>
### 1.6 Creating a directory

We can create a directory:

In [12]:
!datalab mkdir name="vos://results"

<a class="anchor" id="cmdmove"></a>
### 1.7 Moving a file/directory

We can move a file or directory:

In [13]:
!datalab mv fr="vos://tmp/smags.csv" to="vos://results"

<a class="anchor" id="cmddeletefile"></a>
### 1.8 Deleting a file

We can delete a file:

In [14]:
!datalab rm name="vos://results/smags.csv"

<a class="anchor" id="cmddeletedirectory"></a>
### 1.9 Deleting a directory

We can also delete a directory:

In [15]:
!datalab rmdir name="vos://results"

<a class="anchor" id="cmdtag"></a>
### ~1.10 Tagging a file/directory~
**Warning**: Tagging is currently **not** working in the Data Lab storage manager. This notebook will be updated when the problem has been resolved.

We can tag any file or directory with arbitrary metadata:

In [16]:
#!datalab tag name="vos://results" tag="The results from my analysis"

<a class="anchor" id="datalabcmdmydb"></a>
# 2. Handling MyDB tables via the datalab command line client
The `datalab` command also provides access to a user's MyDB database. MyDB is a personal database where users can save their work as tables; it stores only data tables. **_NOTE: The data must be in the form of either a CSV file or pandas DataFrame object in order to load it into MyDB._**

<a class="anchor" id="listcmdmydb"></a>
### 2.1 Listing MyDB tables and a table's schema

We can list all of the MyDB tables currently in a user's database with the `mydb_list` function:

In [17]:
!datalab mydb_list

spec0436SPZ,created:2020-09-11 22:10:33 UTC
spec0436a,created:2020-09-11 22:21:17 UTC
spec,created:2020-09-15 00:37:51 UTC
scan_results,created:2020-09-04 19:08:58 UTC
magsvos,created:2020-09-15 16:30:55 UTC
table1,created:2020-09-15 16:45:54 UTC
photobj,created:2020-09-15 22:08:19 UTC
testresult2,created:2020-08-28 19:18:13 UTC
spec0436,created:2020-09-11 21:49:15 UTC
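
When the listing is captured in a script, it can be turned into a dictionary for easier lookups. This is a sketch that assumes the `name,created:timestamp` line format shown in the output above; `parse_mydb_list` is a hypothetical helper, and a name followed by a bare comma is mapped to `None`.

```python
def parse_mydb_list(text):
    """Map each table name to its creation timestamp (None when not reported)."""
    tables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition(",")
        created = rest[len("created:"):].strip() if rest.startswith("created:") else None
        tables[name] = created or None
    return tables

# Two lines copied from the listing above
sample = """spec,created:2020-09-15 00:37:51 UTC
table1,created:2020-09-15 16:45:54 UTC
"""
tables = parse_mydb_list(sample)
print("table1" in tables)
```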



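We can also list the column names and data types (the schema) of a specified MyDB table. If the table does not exist, as in this example, the command reports that the relation is not known: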

In [18]:
!datalab mydb_list table="usno_objects"

relation "usno_objects" not known


<a class="anchor" id="createcmdmydb"></a>
### 2.2 Creating a MyDB table 
We can create a new empty MyDB table with a user-provided schema file using the `mydb_create` function with the following parameters:

*table* - name of the new MyDB table to create  
*schema* - path to the file containing the schema definition for the table

The schema definition is stored in a text file, in this case in the user's notebook directory. The file is CSV-formatted and lists one table column per row, as the column name followed by its (Postgres) data type. The general format is:

`Columnname1,datatype1\nColumnname2,datatype2\nColumnname3,datatype3`

Let's first create a simple (id,ra,dec) schema of an integer value and two double values and save it locally as a text file named `schema.txt`:

In [19]:
schema_str = 'id,integer\nra,double precision\ndec,double precision\n'
with open('schema.txt', 'w') as fd:
    fd.write(schema_str)
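
For tables with more columns, writing the schema string by hand becomes error-prone. As a sketch under the format described above (one `name,datatype` line per column), the string can be generated from a list of pairs; `schema_text` is a hypothetical helper, not part of the Data Lab client.

```python
def schema_text(columns):
    """Render (name, datatype) pairs as one 'name,datatype' line per column."""
    lines = []
    for name, dtype in columns:
        # A comma or newline in a column name would corrupt the CSV layout
        if "," in name or "\n" in name:
            raise ValueError(f"invalid column name: {name!r}")
        lines.append(f"{name},{dtype}")
    return "\n".join(lines) + "\n"

columns = [("id", "integer"), ("ra", "double precision"), ("dec", "double precision")]
print(schema_text(columns), end="")
```

The result for these three columns is the same string as the one written to `schema.txt` in this section.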

Now let's use the `mydb_create` function to make a new table in MyDB with the schema definition we created above:

In [20]:
!datalab mydb_create table="table1" schema="./schema.txt"

OK


Let's make sure the table was created in MyDB by calling the `mydb_list` function:

In [21]:
!datalab mydb_list

spec0436SPZ,created:2020-09-11 22:10:33 UTC
spec0436a,created:2020-09-11 22:21:17 UTC
spec,created:2020-09-15 00:37:51 UTC
scan_results,created:2020-09-04 19:08:58 UTC
magsvos,created:2020-09-15 16:30:55 UTC
photobj,created:2020-09-15 22:08:19 UTC
table1,created:2020-09-22 22:22:47 UTC
testresult2,created:2020-08-28 19:18:13 UTC
spec0436,created:2020-09-11 21:49:15 UTC



Let's also make sure the schema was loaded into the table by calling the `mydb_list` function on the table:

In [22]:
!datalab mydb_list table="table1"

id,integer,
ra,double precision,
dec,double precision,



<a class="anchor" id="insertcmdmydb"></a>
### 2.3 Inserting data into a MyDB table

We can insert data from a local file or from VOSpace into an existing MyDB table. The data must be in the form of either a CSV file or pandas DataFrame object in order to load it into MyDB. Use the `mydb_insert` function with the following parameters:  

*table* - name of the pre-existing MyDB table in which to insert the data  
*data* - location and name of the data to insert into the table

We will use the `exampledata.csv` file provided in this notebook directory as our data to insert into the `table1` table we created a few cells above:

In [23]:
!datalab mydb_insert table="table1" data="./exampledata.csv"

OK
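
A quick local check before inserting is to compare the CSV column names with the table's columns. The sketch below assumes the CSV file begins with a header row, which this notebook does not confirm for `exampledata.csv`; `header_matches` is a hypothetical helper and the sample row is made-up illustrative data.

```python
import csv
import io

def header_matches(csv_file, expected_columns):
    """Return True when the first CSV row equals the expected column names."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty list when the file has no rows
    return [h.strip() for h in header] == list(expected_columns)

# Hypothetical in-memory CSV standing in for a real data file
sample = io.StringIO("id,ra,dec\n1,116.29,-23.16\n")
print(header_matches(sample, ["id", "ra", "dec"]))
```

For a file on disk, pass an open file object instead, for example `open("./exampledata.csv", newline="")`.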


**IN THIS NOTEBOOK (not on command line):** Let's make sure the data was inserted into the table by converting the table into a pandas DataFrame and printing it on-screen:

In [24]:
df1=convert(qc.query(sql="SELECT * FROM mydb://table1"))
df1

Unnamed: 0,id,ra,dec


<a class="anchor" id="importcmdmydb"></a>
### 2.4 Importing data into a MyDB table
We can import data from a local file or from VOSpace into a new MyDB table. The data must be in the form of either a CSV file or pandas DataFrame object in order to load it into MyDB. Use the `mydb_import` function with the following parameters:

*table* - name of the new MyDB table to create with the imported data  
*data* - location and name of the data to import  

Let's first query some example data from a Data Lab database and save it locally as a CSV file named `gaia_result.csv`:

In [25]:
query = "select * from gaia_dr1.gaia_source limit 10"
qc.query(adql=query, fmt='csv', out='./gaia_result.csv')

'OK'

Now we can import the queried data into a new MyDB table:

In [26]:
!datalab mydb_import table="gaia_result_table" data="./gaia_result.csv"

OK


Let's make sure the table was created in MyDB by calling the `mydb_list` function:

In [27]:
!datalab mydb_list

spec0436SPZ,created:2020-09-11 22:10:33 UTC
spec0436a,created:2020-09-11 22:21:17 UTC
spec,created:2020-09-15 00:37:51 UTC
scan_results,created:2020-09-04 19:08:58 UTC
magsvos,created:2020-09-15 16:30:55 UTC
photobj,created:2020-09-15 22:08:19 UTC
table1,created:2020-09-22 22:22:47 UTC
gaia_result_table,created:2020-09-22 22:23:04 UTC
testresult2,created:2020-08-28 19:18:13 UTC
spec0436,created:2020-09-11 21:49:15 UTC



**IN THIS NOTEBOOK (not on command line):** Let's also make sure the data was imported into the table by converting the table into a pandas DataFrame and printing it on-screen:

In [28]:
df2=convert(qc.query(sql="SELECT * FROM mydb://gaia_result_table"))
df2

Unnamed: 0,ring256,solution_id,source_id,random_index,htm9,nest256,ref_epoch,ra,ra_error,dec,...,scan_direction_mean_k4,phot_g_n_obs,phot_g_mean_flux,phot_g_mean_flux_error,phot_g_mean_mag,phot_variable_flag,l,b,ecl_lon,ecl_lat
0,,380766208,23179905444265169,843169876263567360,2408475,,2015,116.290939,3.405656,-23.164752,...,18.818005,77,236.328243,2.200514,19.590981,NOT_AVAILABLE,239.515537,0.61239,124.174458,-43.534636
1,,380766208,4351318574415056,2598032507693694976,2408475,,2015,116.290231,0.246735,-23.164576,...,10.967476,113,1935.924665,4.577411,17.307549,NOT_AVAILABLE,239.515059,0.611914,124.173528,-43.534623
2,,380766208,4385953190690000,887301575763034112,2408475,,2015,116.291572,1.255318,-23.161752,...,9.119689,84,123.182389,1.431457,20.298399,NOT_AVAILABLE,239.513232,0.614394,124.17423,-43.531585
3,,380766208,4378806365109368,3076843072533823488,2408475,,2015,116.285239,0.153873,-23.162769,...,3.36632,131,2734.271919,3.688565,16.932666,NOT_AVAILABLE,239.511197,0.608845,124.166781,-43.533986
4,,380766208,4399697086037200,1609310944297484288,2408475,,2015,116.289244,1.458838,-23.160597,...,8.174327,95,112.689561,1.341041,20.395061,NOT_AVAILABLE,239.51116,0.61312,124.170979,-43.530984
5,,380766208,4373858562784544,3862989939318718464,2408475,,2015,116.254219,6.171484,-23.162376,...,21.202408,69,198.763997,2.069524,19.778926,NOT_AVAILABLE,239.496583,0.584351,124.128486,-43.54053
6,,380766208,4407393667431712,632482909848076288,2408475,,2015,116.255381,1.086849,-23.160792,...,13.260841,101,656.460955,3.903377,18.481748,NOT_AVAILABLE,239.495746,0.586068,124.129384,-43.538735
7,,380766208,4434881458126128,3212738582263365632,2408475,,2015,116.253264,0.832638,-23.159074,...,25.699459,86,4388.125085,10.672744,16.419073,NOT_AVAILABLE,239.493286,0.585243,124.126205,-43.537541
8,,380766208,4430483411615008,3483203663713796096,2408475,,2015,116.257002,1.024042,-23.159237,...,14.362892,105,986.275922,2.528124,18.039774,NOT_AVAILABLE,239.495146,0.588136,124.130858,-43.536865
9,,380766208,4456321934867744,3753117769579626496,2408473,,2015,116.257122,6.084146,-23.157678,...,21.201269,61,363.494481,2.59896,19.123526,NOT_AVAILABLE,239.493851,0.589012,124.130484,-43.535325


Similarly, we can use the `mydb_import` function to import data from VOSpace into a MyDB table:

In [29]:
!datalab mydb_import table="magsvos" data="vos://smags.csv"

OK


Let's make sure the table was created in MyDB by calling the `mydb_list` function:

In [30]:
!datalab mydb_list

spec0436SPZ,created:2020-09-11 22:10:33 UTC
spec0436a,created:2020-09-11 22:21:17 UTC
spec,created:2020-09-15 00:37:51 UTC
scan_results,created:2020-09-04 19:08:58 UTC
photobj,created:2020-09-15 22:08:19 UTC
table1,created:2020-09-22 22:22:47 UTC
gaia_result_table,created:2020-09-22 22:23:04 UTC
magsvos,created:2020-09-22 22:23:14 UTC
testresult2,created:2020-08-28 19:18:13 UTC
spec0436,created:2020-09-11 21:49:15 UTC



We can view the schema definition of the table by calling the `mydb_list` function on the table:

In [31]:
!datalab mydb_list table="magsvos"

gmag,double precision,
imag,double precision,
rmag,double precision,
zmag,double precision,



**IN THIS NOTEBOOK (not on command line):** Let's also make sure the data was imported into the table by converting the table into a pandas DataFrame and printing it on-screen:

In [32]:
df3=convert(qc.query(sql="SELECT * FROM mydb://magsvos"))
df3

Unnamed: 0,gmag,imag,rmag,zmag
0,24.859207,23.768522,24.14867,23.900684
1,25.097267,24.406269,24.357933,99.989998
2,25.083416,24.010031,24.611797,99.989998
3,25.379248,24.756306,99.989998,23.832212
4,24.923378,23.779806,24.037075,24.031853
5,24.816929,24.573749,24.496265,23.736013
6,25.039248,99.989998,99.989998,99.989998
7,24.665981,24.532278,24.336454,23.703526
8,25.134247,99.989998,99.989998,99.989998
9,24.831894,23.679804,24.246521,23.322733


<a class="anchor" id="truncmdmydb"></a>
### 2.5 Truncating a MyDB table 
            
We can truncate a MyDB table, i.e. drop all rows but keep the table definition (schema), with the `mydb_truncate` function and the following parameter:

*table* - name of the MyDB table to truncate

In [33]:
!datalab mydb_truncate table="table1" 

OK


Let's make sure the table was truncated by calling the `mydb_list` function on the table: 

In [34]:
!datalab mydb_list table="table1"

id,integer,
ra,double precision,
dec,double precision,



**IN THIS NOTEBOOK (not on command line):** We can also make sure the table was truncated by converting the table into a pandas DataFrame and printing it on-screen:

In [35]:
df4=convert(qc.query(sql="SELECT * FROM mydb://table1"))
df4

Unnamed: 0,id,ra,dec


<a class="anchor" id="copycmdmydb"></a>
### 2.6 Copying a MyDB table
We can copy a MyDB table that currently exists in a user's MyDB database with the `mydb_copy` function and the following parameters:

*source* - name of table to copy  
*target* - name of new table with copied data from source table

In [36]:
!datalab mydb_copy source="magsvos" target="magsvos_copy"

OK


Let's make sure the newly copied table exists in the MyDB database by calling the `mydb_list` function:

In [37]:
!datalab mydb_list

spec0436SPZ,created:2020-09-11 22:10:33 UTC
spec0436a,created:2020-09-11 22:21:17 UTC
spec,created:2020-09-15 00:37:51 UTC
scan_results,created:2020-09-04 19:08:58 UTC
photobj,created:2020-09-15 22:08:19 UTC
table1,created:2020-09-22 22:22:47 UTC
gaia_result_table,created:2020-09-22 22:23:04 UTC
magsvos,created:2020-09-22 22:23:14 UTC
magsvos_copy,
testresult2,created:2020-08-28 19:18:13 UTC
spec0436,created:2020-09-11 21:49:15 UTC



<a class="anchor" id="renamecmdmydb"></a>
### 2.7 Renaming a MyDB table
We can choose to rename a MyDB table with the `mydb_rename` function and the following parameters:

*old* - name of table to rename  
*new* - new name of table

In [38]:
!datalab mydb_rename old="magsvos_copy" new="newermagsvos"

OK


Let's make sure the name was changed by calling the `mydb_list` function:

In [39]:
!datalab mydb_list

spec0436SPZ,created:2020-09-11 22:10:33 UTC
spec0436a,created:2020-09-11 22:21:17 UTC
spec,created:2020-09-15 00:37:51 UTC
scan_results,created:2020-09-04 19:08:58 UTC
photobj,created:2020-09-15 22:08:19 UTC
table1,created:2020-09-22 22:22:47 UTC
gaia_result_table,created:2020-09-22 22:23:04 UTC
magsvos,created:2020-09-22 22:23:14 UTC
newermagsvos,
testresult2,created:2020-08-28 19:18:13 UTC
spec0436,created:2020-09-11 21:49:15 UTC



<a class="anchor" id="dropcmdmydb"></a>
### 2.8 Dropping a MyDB table
We can remove a MyDB table from a user's MyDB database with the `mydb_drop` function and the following parameter:

*table* - name of the table we wish to remove from the MyDB database

In [40]:
!datalab mydb_drop table="newermagsvos"

OK


Let's make sure the MyDB table was dropped by calling the `mydb_list` function:

In [41]:
!datalab mydb_list

spec0436SPZ,created:2020-09-11 22:10:33 UTC
spec0436a,created:2020-09-11 22:21:17 UTC
spec,created:2020-09-15 00:37:51 UTC
scan_results,created:2020-09-04 19:08:58 UTC
photobj,created:2020-09-15 22:08:19 UTC
table1,created:2020-09-22 22:22:47 UTC
gaia_result_table,created:2020-09-22 22:23:04 UTC
magsvos,created:2020-09-22 22:23:14 UTC
testresult2,created:2020-08-28 19:18:13 UTC
spec0436,created:2020-09-11 21:49:15 UTC



# Clean up MyDB and VOSpace
For clean-up purposes, let's remove the tables we created in MyDB and files/directories we created in VOSpace.

In [42]:
!datalab mydb_drop table="gaia_result_table"
!datalab mydb_drop table="table1"
!datalab rm name="vos://smags.csv"
# mysmags.csv was downloaded to the local directory (section 1.2), not to VOSpace
!rm ./mysmags.csv
!datalab rmdir name="vos://tmp"

OK
OK
