# Importing and Exporting Data

Data can be imported into Google BigQuery from a CSV file stored within Google Cloud Storage, or it can be streamed directly into BigQuery from Python code.

Similarly, a table (for example, one holding the results of a query) can be exported to Google Cloud Storage, split into multiple files (shards) when it is large, or written directly to a local file within Cloud Datalab. For larger data sizes, the sharded Cloud Storage export is recommended.
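
When a table is too large for a single export file, BigQuery writes it as multiple shards if the destination URI contains a `*` wildcard (for example `gs://my-bucket/tmp/cars-*.csv`), replacing the wildcard with a zero-padded shard number. If those shards are later copied to local disk, they can be stitched back together. The following is a minimal Python 3 sketch. It assumes every shard starts with the same header row, which the code checks. `merge_csv_shards` and the paths are hypothetical and are not part of Datalab.

```python
import csv
import glob
import io


def merge_csv_shards(pattern):
    """Concatenate local CSV shards matching `pattern` into one CSV string.

    Hypothetical helper: assumes every shard starts with the same header row,
    and keeps only the first copy of that header.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    header = None
    # Shard numbers are zero-padded, so lexical order matches shard order.
    for path in sorted(glob.glob(pattern)):
        with open(path, newline='') as shard:
            reader = csv.reader(shard)
            shard_header = next(reader, None)
            if shard_header is None:
                continue  # skip an empty shard
            if header is None:
                header = shard_header
                writer.writerow(header)
            elif shard_header != header:
                raise ValueError('header mismatch in %s' % path)
            writer.writerows(reader)
    return out.getvalue()
```

For example, `merge_csv_shards('/tmp/export/cars-*.csv')` would return a single CSV string with one header line followed by the rows of every shard in order.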

In [None]:
from google.datalab import Context
import google.datalab.bigquery as bq
import google.datalab.storage as storage
import pandas as pd
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

# Importing Data

The first step to analyzing and querying your data is importing it. For this demo, we'll create a temporary table in a temporary dataset within BigQuery, using a small data file within Cloud Storage.

## Importing Data from Cloud Storage

To interact with Google Cloud Storage, Datalab provides the `%gcs` line magic and the `%%gcs` cell magic. First, list the available options:

In [None]:
%gcs -h

Let's use the `read` option to read a storage object into a local Python variable:

In [None]:
%%gcs read --object gs://cloud-datalab-samples/cars.csv --variable cars

In [None]:
print(cars)

In [None]:
# Infer a BigQuery schema from the CSV data, using a pandas DataFrame as the sample.
df = pd.read_csv(StringIO(cars))
schema = bq.Schema.from_data(df)

# Create the dataset.
bq.Dataset('importingsample').create()

# Create the (empty) table, replacing any existing table with the same name.
sample_table = bq.Table('importingsample.cars').create(schema=schema, overwrite=True)
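
`bq.Schema.from_data(df)` derives the schema from a DataFrame whose columns pandas has already typed while parsing the CSV. To illustrate per-column type inference, here is a simplified, hypothetical sketch that guesses INTEGER, FLOAT or STRING directly from CSV string values. It returns field definitions shaped like BigQuery's JSON schema entries. `infer_schema` and `infer_field_type` are not Datalab functions, and the real method handles more types and may decide differently.

```python
def infer_field_type(values):
    """Guess a BigQuery type for one column of CSV string values.

    Simplified sketch: empty strings are treated as missing, and only
    INTEGER, FLOAT and STRING are considered.
    """
    present = [v for v in values if v != '']

    def all_parse(cast):
        # True if every non-empty value can be converted with `cast`.
        try:
            for v in present:
                cast(v)
        except ValueError:
            return False
        return True

    if present and all_parse(int):
        return 'INTEGER'
    if present and all_parse(float):
        return 'FLOAT'
    return 'STRING'


def infer_schema(header, rows):
    """Return [{'name': ..., 'type': ...}, ...], one entry per header column."""
    columns = list(zip(*rows)) if rows else [()] * len(header)
    return [{'name': name, 'type': infer_field_type(column)}
            for name, column in zip(header, columns)]
```

A column with no non-empty values falls back to STRING. This is the safest choice, because any value can be stored as a string.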

In [None]:
# Append the CSV rows to the table, skipping the header row.
sample_table.load('gs://cloud-datalab-samples/cars.csv', mode='append',
                  source_format='csv', csv_options=bq.CSVOptions(skip_leading_rows=1))
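
The `skip_leading_rows=1` option tells the load job to skip the first row of the file, here the CSV header. This keeps the column names out of the data, where a name such as a column title would generally not parse into a numeric column. The plain-Python sketch below shows the same effect with `csv.reader`. It wraps the text in `io.StringIO`, just as the earlier cell wraps `cars` in `StringIO` for `pd.read_csv`. `read_data_rows` is a hypothetical helper.

```python
import csv
import io
from itertools import islice


def read_data_rows(csv_text, skip_leading_rows=1):
    """Parse CSV text and drop the first `skip_leading_rows` records.

    Hypothetical helper mirroring the effect of CSVOptions(skip_leading_rows=1).
    """
    # io.StringIO turns the string into a file-like object csv.reader can consume.
    reader = csv.reader(io.StringIO(csv_text))
    return list(islice(reader, skip_leading_rows, None))
```

With `skip_leading_rows=0`, the header would come back as the first data row.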

In [None]:
%%bq query -n importingSample
SELECT * FROM importingsample.cars

In [None]:
%bq execute -q importingSample

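## Importing Data from a DataFrame

Rows can also be streamed into an existing table directly from Python. The following cells read a second sample file, `cars2.csv`, into a pandas DataFrame, replace its missing values with empty strings, and insert the rows into the table created above.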

In [None]:
cars2 = storage.Object('cloud-datalab-samples', 'cars2.csv').read_stream()
df2 = pd.read_csv(StringIO(cars2))
df2

In [None]:
df2.fillna(value='', inplace=True)
df2
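
pandas reads empty CSV cells as NaN. The notebook replaces them before calling `insert`. One plausible reason is that streamed rows travel as JSON, and NaN is not valid JSON: Python's `json` module emits the non-standard token `NaN` unless `allow_nan=False` is passed. The sketch below applies the same substitution to a single row. `to_json_row` is hypothetical and does not describe how Datalab serializes rows.

```python
import json
import math


def to_json_row(record):
    """Serialize one row as JSON, replacing NaN floats with ''.

    Mirrors df2.fillna(value='') for a single row; '' only suits STRING columns.
    """
    cleaned = {key: '' if isinstance(value, float) and math.isnan(value) else value
               for key, value in record.items()}
    # allow_nan=False makes json.dumps raise ValueError rather than emit the
    # non-standard token NaN.
    return json.dumps(cleaned, allow_nan=False, sort_keys=True)
```

Replacing NaN with `''` only makes sense for STRING columns. For numeric columns, `None` (serialized as JSON `null`) would be the more natural substitute.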

In [None]:
sample_table.insert(df2)
sample_table.to_dataframe()
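
`insert` streams rows into the table instead of running a load job. Streaming APIs usually accept rows in bounded requests, so a large DataFrame is typically sent in chunks. The generic batching helper below sketches that idea. `batched` and any chunk size are hypothetical, and this is not a description of how `Table.insert` splits its requests internally.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield successive lists of at most `batch_size` rows from any iterable."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    iterator = iter(rows)
    while True:
        # Take the next batch_size rows; an empty list means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

For example, `for chunk in batched(records, 100): ...` would hand each list of at most 100 rows to whatever sends the request. On Python 3.12 and later, `itertools.batched` provides the same behavior but yields tuples.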

# Exporting Data

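## Exporting Data to Cloud Storage

The following cells create a Cloud Storage bucket named after the current project, extract the table into a CSV object in that bucket, and read the object back.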

In [None]:
project = Context.default().project_id
sample_bucket_name = project + '-datalab-samples'
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/tmp/cars.csv'
print('Bucket: ' + sample_bucket_name)
print('Object: ' + sample_bucket_object)
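
The cell above builds a full `gs://` URI for `extract`, while the cleanup cell later addresses the same object by bucket and key (`tmp/cars.csv`). A small hypothetical helper that converts the URI form into the bucket/key form:

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = 'gs://'
    if not uri.startswith(prefix):
        raise ValueError('not a gs:// URI: %r' % uri)
    # Everything up to the first '/' is the bucket; the rest is the object key.
    bucket, _, key = uri[len(prefix):].partition('/')
    if not bucket:
        raise ValueError('missing bucket name: %r' % uri)
    return bucket, key
```

A URI that names only a bucket returns an empty key.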

In [None]:
sample_bucket = storage.Bucket(sample_bucket_name)
sample_bucket.create()
sample_bucket.exists()
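
The bucket name is derived from the project ID, so this approach only works when the result is a legal bucket name. For example, domain-scoped project IDs such as `example.com:my-project` contain a colon, which Cloud Storage does not allow in bucket names. The check below encodes a simplified subset of the naming rules:

- 3 to 63 characters.
- Only lowercase letters, digits, `-`, `_` and `.`.
- Must start and end with a letter or digit.

It ignores other rules, such as the longer limit for dotted names. `looks_like_valid_bucket_name` is a hypothetical helper.

```python
import re

# Simplified subset of Cloud Storage bucket-naming rules (not exhaustive):
# 3-63 characters; lowercase letters, digits, '-', '_' and '.';
# must start and end with a letter or digit.
_BUCKET_RE = re.compile(r'[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]')


def looks_like_valid_bucket_name(name):
    """Return True if `name` passes the simplified bucket-name check."""
    return _BUCKET_RE.fullmatch(name) is not None
```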

In [None]:
table = bq.Table('importingsample.cars')
table.extract(destination=sample_bucket_object)

In [None]:
bucket = storage.Bucket(sample_bucket_name)

In [None]:
list(bucket.objects())

In [None]:
obj = list(bucket.objects())[0]

In [None]:
data = obj.read_stream()

In [None]:
print(data)
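
BigQuery CSV exports include a header row by default. The exported text can therefore be parsed into one dictionary per row with the standard `csv` module. The following is a minimal sketch; `csv_text_to_records` is a hypothetical helper. If the object content arrives as bytes rather than text, decode it first (for example with `.decode('utf-8')`). Every value comes back as a string, so numeric columns would need converting.

```python
import csv
import io


def csv_text_to_records(text):
    """Parse CSV text with a header row into a list of plain dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]
```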

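## Exporting Data to a Local File

A table can also be written directly to a file on the local file system where the notebook runs, using `to_file`.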

In [None]:
table.to_file('/tmp/cars.csv')

In [None]:
%%bash
ls -l /tmp/cars.csv

In [None]:
# Print the exported file's contents.
with open('/tmp/cars.csv') as datafile:
    print(datafile.read())
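
Reading the whole file into memory at once is fine for this small sample. For larger exports, iterating over a `csv.reader` handles one record at a time. It also counts records correctly when a quoted field spans several lines, which counting raw lines would not. `count_data_rows` is a hypothetical helper for such a sanity check.

```python
import csv


def count_data_rows(path):
    """Count CSV records after the header without loading the whole file."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if any
        return sum(1 for _ in reader)
```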

# Cleanup

In [None]:
sample_bucket.object('tmp/cars.csv').delete()
sample_bucket.delete()
bq.Dataset('importingsample').delete(delete_contents=True)