# Storage Commands

Cloud Datalab provides a set of commands for working with data stored in Google Cloud Storage. These commands are especially useful for working with data files that are not in BigQuery, and for managing data imported into or exported from BigQuery.

This notebook introduces the Storage commands that Cloud Datalab adds to the notebook environment.

## The Commands

The commands let you list storage buckets and the objects they contain, manage both, and read from and write to those objects.

In [1]:
%%storage --help

usage: storage [-h] {copy,create,delete,list,read,view,write} ...

Execute various storage-related operations. Use "%storage <command> -h" for
help on a specific command.

positional arguments:
  {copy,create,delete,list,read,view,write}
                        commands
    copy                Copy one or more GCS objects to a different location.
    create              Create one or more GCS buckets.
    delete              Delete one or more GCS buckets or objects.
    list                List buckets in a project, or contents of a bucket.
    read                Read the contents of a storage object into a Python
                        variable.
    view                View the contents of a storage object.
    write               Write the value of a Python variable to a storage
                        object.

optional arguments:
  -h, --help            show this help message and exit


# Buckets and Objects

Items or files held in Cloud Storage are called objects. Objects are immutable once written: to change one, you replace it with a new version under the same name. Objects are organized into buckets.
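Every object is addressed by a `gs://` path that combines the bucket name with the object name. The following is a minimal sketch of how such a path breaks down. The helper `split_gcs_path` is hypothetical and written only for illustration; it is not part of Cloud Datalab. Note that an object name may itself contain `/` characters, which gives the appearance of folders even though a bucket's namespace is flat.

```python
def split_gcs_path(path):
    """Split 'gs://bucket/object/name' into (bucket, object_name).

    Hypothetical helper for illustration; not part of Cloud Datalab.
    The object name is None when the path refers to a bucket only.
    """
    prefix = 'gs://'
    if not path.startswith(prefix):
        raise ValueError('Expected a gs:// path, got: ' + path)
    # Everything up to the first '/' is the bucket; the rest is the object name.
    bucket, _, object_name = path[len(prefix):].partition('/')
    if not bucket:
        raise ValueError('Missing bucket name in: ' + path)
    return bucket, (object_name or None)


print(split_gcs_path('gs://cloud-datalab-samples/carprices/testing.csv'))
print(split_gcs_path('gs://cloud-datalab-samples'))
```

The first call returns the bucket together with the full object name, including the `carprices/` part; the second returns the bucket with no object name.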

## Listing

First, a couple of commands that list the Cloud Datalab sample data.

In [2]:
%%storage list --project cloud-datalab-samples

Bucket,Created
gs://cloud-datalab-samples,2015-10-04 16:47:48.785000+00:00


In [3]:
%%storage list --bucket gs://cloud-datalab-samples

Name,Type,Size,Updated
applogs,application/octet-stream,506050,2015-11-24 00:06:07.588000+00:00
carprices/testing.csv,text/csv,3635,2015-10-06 09:02:03.638000+00:00
carprices/training.csv,text/csv,15018,2015-10-06 09:01:46.040000+00:00
cars.csv,text/csv,248,2015-10-05 04:58:10.481000+00:00
cars2.csv,text/csv,92,2015-10-05 05:41:30.935000+00:00
hello.txt,text/plain,14,2015-10-05 04:48:39.433000+00:00
httplogs/logs20140615.csv,text/csv,23799981,2015-10-06 08:39:42.605000+00:00
httplogs/logs20140616.csv,text/csv,86323745,2015-10-06 08:39:43.067000+00:00
httplogs/logs20140617.csv,text/csv,51282558,2015-10-06 08:39:43.622000+00:00
httplogs/logs20140618.csv,text/csv,53380318,2015-10-06 08:39:44.191000+00:00


Try this command to list any buckets within the current project:

In [None]:
%%storage list

## Creating

In [5]:
# Some code to determine a unique bucket name for the purposes of the sample
import gcp

project = gcp.Context.default().project_id
sample_bucket_name = project + '-datalab-samples'
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/Hello.txt'

print('Bucket: ' + sample_bucket_path)
print('Object: ' + sample_bucket_object)

Bucket: gs://cloud-ml-users-datalab-samples
Object: gs://cloud-ml-users-datalab-samples/Hello.txt


NOTE: In the examples below, the variables are referenced in the commands using `$` syntax, because the names depend on the current project. In your own notebooks, you can use literal values instead of variables if the names are constant.
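To make the idea of `$` references concrete, here is a minimal sketch of variable substitution in a command string, using Python's standard `string.Template`. It only illustrates the general technique; it is not how Cloud Datalab implements the substitution, and the project name `my-project` is a placeholder.

```python
from string import Template

# Placeholder project ID; in the notebook this value comes from gcp.Context.
project = 'my-project'
sample_bucket_path = 'gs://' + project + '-datalab-samples'

# Command text containing a $ reference to a variable name.
command = Template('storage create --bucket $sample_bucket_path')

# Replace the reference with the variable's current value.
expanded = command.substitute(sample_bucket_path=sample_bucket_path)
print(expanded)
```

The printed text is the command with the literal bucket path in place of the reference, which is what you would type by hand if the name were constant.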

In [6]:
%%storage create --bucket $sample_bucket_path

In [7]:
%%storage list --bucket $sample_bucket_path

In [8]:
%%storage copy --source gs://cloud-datalab-samples/hello.txt --destination $sample_bucket_object

In [9]:
%%storage list --bucket $sample_bucket_path

Name,Type,Size,Updated
Hello.txt,text/plain,14,2016-01-30 00:04:12.218000+00:00


## Reading and Writing

In [10]:
%%storage view --object $sample_bucket_object

'Hello World!\n\n'

In [11]:
%%storage read --object $sample_bucket_object --variable text

In [12]:
print(text)

Hello World!




In [13]:
text = 'Hello World!\n====\n'

In [14]:
%%storage write --variable text --object $sample_bucket_object

In [15]:
%%storage list --bucket $sample_bucket_path

Name,Type,Size,Updated
Hello.txt,text/plain,18,2016-01-30 00:04:14.100000+00:00
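The sizes reported in the two listings of the sample bucket (14, then 18) can be checked against the text itself. This is a small sketch that uses only the strings shown in this notebook and counts their bytes. It assumes the text is stored as UTF-8; for these ASCII-only strings any common encoding gives the same counts.

```python
# Contents shown by the view command, and the value written afterwards.
original_text = 'Hello World!\n\n'
updated_text = 'Hello World!\n====\n'

# Object size is reported in bytes, so encode before counting.
original_size = len(original_text.encode('utf-8'))
updated_size = len(updated_text.encode('utf-8'))

print(original_size)  # 14
print(updated_size)   # 18
```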


## Deleting

In [16]:
%%storage delete --object $sample_bucket_object

In [17]:
%%storage delete --bucket $sample_bucket_path

# Looking Ahead

The Storage commands shown above build on the Storage APIs included in Cloud Datalab. The next notebook demonstrates these APIs.

Additionally, the BigQuery functionality supports exporting data to and importing data from Cloud Storage, as shown in the BigQuery tutorials.