This notebook provides recipes for loading data from, and saving data to, external sources.

# Local file system

## Uploading files from your local file system

`files.upload` returns a dictionary of the files that were uploaded.
The dictionary is keyed by file name; each value is the uploaded file's contents as bytes.

In [0]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
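Because each value is raw bytes, it usually needs decoding before use. A minimal sketch, assuming the upload is a UTF-8 CSV file; `rows_from_upload` is a hypothetical helper, and `uploaded_example` is a stand-in for the dictionary that `files.upload()` returns:

```python
import csv
import io


def rows_from_upload(data, encoding='utf-8'):
    """Parse uploaded CSV bytes into a list of rows (lists of strings)."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text)))


# Stand-in for the dictionary returned by files.upload() (illustrative data).
uploaded_example = {'scores.csv': b'name,score\nada,3\nlin,5\n'}

for fn, data in uploaded_example.items():
    rows = rows_from_upload(data)
    print(fn, 'has', len(rows) - 1, 'data rows; header:', rows[0])
```

In a real session the loop would iterate over `uploaded.items()` instead of the stand-in dictionary.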

# Google Drive

You can access files in Drive in a number of ways, including:
1. Using the [native REST API](https://developers.google.com/drive/v3/web/about-sdk);
1. Using a wrapper around the API such as [PyDrive](https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html); or
1. Mounting your Google Drive in the runtime's virtual machine.

## Mounting Google Drive locally

The example below mounts your Google Drive in the runtime's virtual machine using an authorization code. Once mounted, Drive contents can be read and written like ordinary files under the mount point, and any file you create there (for example, `foo.txt`) becomes visible in https://drive.google.com/

Note that mounting only supports reading and writing files; to change sharing settings or other metadata programmatically, use one of the other options listed above.

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')
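Once the mount succeeds, Drive behaves like an ordinary directory. A minimal sketch of writing and then reading `foo.txt`, assuming the Drive root appears at `/content/gdrive/My Drive` (the path used later in this notebook). The helper name `write_then_read` is hypothetical, and the sketch falls back to a temporary directory when no mount is present so that it still runs outside Colab:

```python
import os
import tempfile


def write_then_read(root, filename='foo.txt', text='Hello Google Drive!'):
    """Write text to root/filename, then read it back and return it."""
    path = os.path.join(root, filename)
    with open(path, 'w') as f:
        f.write(text)
    with open(path) as f:
        return f.read()


# Assumed location of the Drive root after drive.mount('/content/gdrive').
drive_root = '/content/gdrive/My Drive'
if not os.path.isdir(drive_root):
    # Not running with a mounted Drive: use a scratch directory instead.
    drive_root = tempfile.mkdtemp()

print(write_then_read(drive_root))
```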

# Google Cloud Storage (GCS)

We'll start by authenticating to GCS.

In [0]:
from google.colab import auth
auth.authenticate_user()

## Upload a file from Python to a GCS bucket

We'll start with the sample file to be uploaded, `trained_model.pkl`, which the cells below expect to find in the current working directory.

Next, we'll upload the file using the `gsutil` command, which is included by default on Colab backends.
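If `trained_model.pkl` does not exist yet, a placeholder can be created before running the upload cells. A minimal sketch, in which the dictionary standing in for a model is purely illustrative:

```python
import pickle

# Placeholder "model": any picklable object is enough to exercise the upload.
sample_model = {'weights': [0.1, 0.2, 0.3], 'bias': 0.5}

# Pickles are binary, so the file is opened in 'wb' mode.
with open('trained_model.pkl', 'wb') as f:
    pickle.dump(sample_model, f)

# Read the file back to confirm it round-trips.
with open('trained_model.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored == sample_model)
```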

In [0]:
# First, we need to set our project. Replace the assignment below
# with your project ID.
project_id = 'chatbotdemo-ai'

In [0]:
!gcloud config set project {project_id}

In [0]:
import uuid

# Make a unique bucket to which we'll upload the file.
# (GCS buckets are part of a single global namespace.)
bucket_name = 'sample-bucket-' + str(uuid.uuid1())

# Full reference: https://cloud.google.com/storage/docs/gsutil/commands/mb
!gsutil mb gs://{bucket_name}
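Because every bucket lives in one global namespace, the name has to be well-formed as well as unique. A rough pre-flight check, assuming the commonly documented rules for names without dots (3 to 63 characters; lowercase letters, digits, dashes, and underscores; starting and ending with a letter or digit). `looks_like_valid_bucket_name` is a hypothetical helper and does not replace the service's own validation:

```python
import re
import uuid

# Simplified pattern: dotted names and reserved prefixes are not handled.
_BUCKET_PATTERN = re.compile(r'[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]')


def looks_like_valid_bucket_name(name):
    """Return True if name passes the simplified naming check."""
    return _BUCKET_PATTERN.fullmatch(name) is not None


candidate = 'sample-bucket-' + str(uuid.uuid1())
print(candidate, looks_like_valid_bucket_name(candidate))
```

The UUID suffix adds 36 characters, so the prefix has to stay under 28 characters for the name to fit the length limit.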

In [0]:
# Copy the file to our new bucket.
# Full reference: https://cloud.google.com/storage/docs/gsutil/commands/cp
!gsutil cp trained_model.pkl gs://{bucket_name}/

### Using Python

This section demonstrates how to upload files using the native Python API rather than `gsutil`.

This snippet is based on [a larger example](https://github.com/GoogleCloudPlatform/storage-file-transfer-json-python/blob/master/chunked_transfer.py) with additional uses of the API.

In [0]:
# The first step is to create a bucket in your cloud project.
#
# Replace the assignment below with your cloud project ID.
#
# For details on cloud projects, see:
# https://cloud.google.com/resource-manager/docs/creating-managing-projects
project_id = 'chatbotdemo-ai'

In [0]:
# Authenticate to GCS.
from google.colab import auth
auth.authenticate_user()

# Create the service client.
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')

# Generate a random bucket name to which we'll upload the file.
import uuid
bucket_name = 'sample-bucket-' + str(uuid.uuid1())

body = {
  'name': bucket_name,
  # For a full list of locations, see:
  # https://cloud.google.com/storage/docs/bucket-locations
  'location': 'us',
}
gcs_service.buckets().insert(project=project_id, body=body).execute()
print('Done')

**a. The cell below uploads the file from Google Drive to our newly created bucket.**

In [0]:
from googleapiclient.http import MediaFileUpload

# A pickle is binary data, so use a generic binary MIME type.
# (To upload from the runtime's local disk instead, see cell b.)
media = MediaFileUpload('/content/gdrive/My Drive/trained_model.pkl',
                        mimetype='application/octet-stream',
                        resumable=True)

request = gcs_service.objects().insert(bucket=bucket_name, 
                                       name='trained_model.pkl',
                                       media_body=media)

response = None
while response is None:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, response = request.next_chunk()

print('Upload complete')

**b. The cell below uploads the file from the local file system to our newly created bucket.**


In [0]:
from googleapiclient.http import MediaFileUpload

# A pickle is binary data, so use a generic binary MIME type.
media = MediaFileUpload('trained_model.pkl',
                        mimetype='application/octet-stream',
                        resumable=True)

request = gcs_service.objects().insert(bucket=bucket_name, 
                                       name='trained_model.pkl',
                                       media_body=media)

response = None
while response is None:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, response = request.next_chunk()

print('Upload complete')
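For larger files, the progress object that the loops above discard can be reported instead. A minimal sketch, assuming the status object exposes a `progress()` method returning a fraction between 0 and 1, as `MediaUploadProgress` in `googleapiclient.http` does. `run_upload`, `FakeStatus`, and `FakeRequest` are hypothetical names; the two fake classes exist only so the sketch runs without network access:

```python
def run_upload(request, report=print):
    """Drive a resumable upload to completion, reporting progress.

    `request` is any object whose next_chunk() returns a
    (status, response) pair, as in the upload cells above.
    """
    response = None
    while response is None:
        status, response = request.next_chunk()
        # status is None when the call that finishes the upload returns.
        if status is not None:
            report('Uploaded {:.0%}'.format(status.progress()))
    return response


class FakeStatus:
    """Stand-in for the progress object (hypothetical)."""

    def __init__(self, fraction):
        self._fraction = fraction

    def progress(self):
        return self._fraction


class FakeRequest:
    """Stand-in for the insert request: one partial chunk, then done."""

    def __init__(self):
        self._steps = [(FakeStatus(0.5), None),
                       (None, {'name': 'trained_model.pkl'})]

    def next_chunk(self):
        return self._steps.pop(0)


messages = []
result = run_upload(FakeRequest(), report=messages.append)
print(messages, result)
```

In the notebook, the real `request` object from the cells above would be passed in place of `FakeRequest()`. Progress is only reported when the file spans more than one chunk.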

Once the upload has finished, the data will appear in the cloud console storage browser for your project:

https://console.cloud.google.com/storage/browser?project=YOUR_PROJECT_ID_HERE

## Downloading a file from GCS to Python

Next, we'll download the file we just uploaded in the example above. It's as simple as reversing the source and destination in the `gsutil cp` command.

In [0]:
# Download the file.
!gsutil cp gs://{bucket_name}/trained_model.pkl /tmp/trained_model.pkl
  
# List the downloaded file to make sure the transfer worked.
# (The pickle is binary, so printing it with `cat` would not be readable.)
!ls -l /tmp/trained_model.pkl
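A byte-for-byte comparison is a more reliable check than inspecting the downloaded file by eye. A minimal sketch that compares SHA-256 digests of two local files; `file_sha256` and `same_contents` are hypothetical helpers, and the temporary files are stand-ins for the original and downloaded copies:

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """Return True if both files have identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)


# Stand-in files so the sketch runs anywhere (illustrative bytes).
tmp_dir = tempfile.mkdtemp()
original = os.path.join(tmp_dir, 'original.bin')
copy = os.path.join(tmp_dir, 'copy.bin')
for path in (original, copy):
    with open(path, 'wb') as f:
        f.write(b'\x80\x04example bytes')

print(same_contents(original, copy))
```

In the notebook, the two paths would be `trained_model.pkl` and `/tmp/trained_model.pkl`.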

### Using Python

We repeat the download example above using the native Python API.

In [0]:
# Authenticate to GCS.
from google.colab import auth
auth.authenticate_user()

# Create the service client.
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')

# Use the same googleapiclient package as the upload cells
# (apiclient is a legacy alias).
from googleapiclient.http import MediaIoBaseDownload

with open('/content/gdrive/My Drive/trained_model.pkl', 'wb') as f:
  request = gcs_service.objects().get_media(bucket=bucket_name,
                                            object='trained_model.pkl')
  media = MediaIoBaseDownload(f, request)

  done = False
  while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = media.next_chunk()

print('Download complete')