# Downloading Objectron Data
This tutorial covers how to download and use the Objectron dataset.
There are three ways to download the Objectron data to your disk:
- Using `gsutil`
- Via the public HTTP API
- Using the Google Cloud Storage Python client

Keep in mind that you can also consume the dataset directly from the Google Cloud bucket, without copying it to your local machine, using [tf.data.Dataset](https://github.com/google-research-datasets/Objectron/blob/master/notebooks/Hello%20World.ipynb) or [torch_xla.utils.tf_record_reader](https://github.com/google-research-datasets/Objectron/blob/master/notebooks/Objectron_Pytorch_tutorial.ipynb). See the tutorial notebooks for more details.



## Data Locations
The data is stored in the `objectron` bucket on Google Cloud Storage and includes the following assets:

- The video sequences (located in `gs://objectron/videos/class/batch-i/j/video.MOV` files)
- The annotation labels containing the 3D bounding boxes of the objects, stored under `gs://objectron/annotations`. The annotation protobufs are located in `gs://objectron/annotations/class/batch-i/j.pbdata` files and are formatted using `object.proto`.
- AR metadata (such as camera poses, point clouds, and planar surfaces), stored next to each video in `gs://objectron/videos/class/batch-i/j/geometry.pbdata` files.
- Processed dataset: sharded and shuffled TFRecord files of the annotated frames, in `tf.train.Example` format. These are used to build the input data pipeline for your models and are located in `gs://objectron/v1/records_shuffled/class/`.
- The index of all available samples, as well as train/test splits for easy access and download. For each category, first get the index of the available files. Copies of the indices are stored in the objectron bucket (`gs://objectron/v1/index`) and in the `index` folder of the [GitHub repo](https://github.com/google-research-datasets/Objectron/tree/master/index). There are three files per class: `class_annotations`, which lists all samples, and the 20/80 test/train split, `class_annotations_test` and `class_annotations_train`. Each file contains one sample per line in the format `class/batch-i/j`. Combine each line with the appropriate root directory (`videos/` or `annotations/`) to get the key for the videos and annotations.

For example, the corresponding public URLs are:
* annotation file: https://storage.googleapis.com/objectron/annotations/class/batch-i/j.pbdata
* video file: https://storage.googleapis.com/objectron/videos/class/batch-i/j/video.MOV
* AR metadata: https://storage.googleapis.com/objectron/videos/class/batch-i/j/geometry.pbdata
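Because every asset follows the same layout, the locations for a sample can be derived from its index line. A minimal sketch, assuming the layout listed above; the helper names `sample_keys` and `sample_urls` are hypothetical and not part of the Objectron repo:

```python
PUBLIC_ROOT = "https://storage.googleapis.com/objectron"
GCS_ROOT = "gs://objectron"


def sample_keys(sample_id):
    """Map an index line (class/batch-i/j) to its object keys inside the bucket.

    Hypothetical helper; the key layout mirrors the URLs listed above.
    """
    sample_id = sample_id.strip()
    return {
        "video": f"videos/{sample_id}/video.MOV",
        "metadata": f"videos/{sample_id}/geometry.pbdata",
        "annotation": f"annotations/{sample_id}.pbdata",
    }


def sample_urls(sample_id, root=PUBLIC_ROOT):
    """Prefix each key with a root: the public HTTPS URL or the gs:// bucket URI."""
    return {name: f"{root}/{key}" for name, key in sample_keys(sample_id).items()}
```

Passing `root=GCS_ROOT` yields `gs://` URIs suitable for `gsutil`, while the default yields public URLs for plain HTTP downloads.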

## Downloading Data using gsutil

`gsutil` is a command-line utility for running commands such as `ls` and `cp` against Google Cloud Storage buckets.
For example, you can run

```
gsutil ls gs://objectron/v1/records_shuffled
```
to see the available classes in the dataset. Similarly, the easiest way to copy the files to your local machine is:
```
gsutil cp -r gs://objectron/v1/records_shuffled local_dataset_dir
```
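To script over the classes, the output of `gsutil ls` can be parsed in Python. The sketch below is hypothetical: it assumes `gsutil` is installed and on your `PATH`, and that `gsutil ls` prints one `gs://.../name/` entry per line for each class prefix.

```python
import subprocess


def parse_ls_output(output):
    """Return the last path component (e.g. a class name) of each `gsutil ls` line."""
    names = []
    for line in output.splitlines():
        # Prefixes are assumed to be printed with a trailing slash, e.g. ".../bike/".
        line = line.strip().rstrip("/")
        if line:
            names.append(line.rsplit("/", 1)[-1])
    return names


def list_classes(prefix="gs://objectron/v1/records_shuffled"):
    """Run `gsutil ls` on the prefix and return the entry names (needs gsutil on PATH)."""
    result = subprocess.run(
        ["gsutil", "ls", prefix], capture_output=True, text=True, check=True
    )
    return parse_ls_output(result.stdout)
```

The names returned by `list_classes()` can then drive one `gsutil cp -r` per class, so a single category can be fetched without copying the whole dataset.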

## Downloading Data using Public HTTP API
Users can download the data without authentication directly over HTTP. The dataset's public URL is `https://storage.googleapis.com/objectron`. You can use `curl`, `requests`, or any other downloader for this purpose.


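```python
import os
import requests

public_url = "https://storage.googleapis.com/objectron"
blob_path = public_url + "/v1/index/cup_annotations_test"
response = requests.get(blob_path)
response.raise_for_status()
# Each non-empty line of the index is a sample id of the form class/batch-i/j.
video_ids = [line for line in response.text.split('\n') if line]

# Download the first ten videos in the cup test dataset.
for video_id in video_ids[:10]:
    video_filename = public_url + "/videos/" + video_id + "/video.MOV"
    metadata_filename = public_url + "/videos/" + video_id + "/geometry.pbdata"
    annotation_filename = public_url + "/annotations/" + video_id + ".pbdata"
    # Mirror the sample id locally so files from different samples don't overwrite each other.
    local_dir = os.path.join("objectron", video_id)
    os.makedirs(local_dir, exist_ok=True)
    for url, name in [(video_filename, "video.MOV"),
                      (metadata_filename, "geometry.pbdata"),
                      (annotation_filename, "annotation.pbdata")]:
        r = requests.get(url)
        r.raise_for_status()
        # r.content holds the whole downloaded file in memory.
        with open(os.path.join(local_dir, name), "wb") as f:
            f.write(r.content)
```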
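Video files can be large, and `.content` holds each response fully in memory. As an alternative, the hypothetical helper below streams a URL to disk in fixed-size chunks. It swaps in the standard library's `urllib.request` for `requests`, so it needs no extra packages; the chunk size is an arbitrary choice.

```python
import os
import shutil
import urllib.request


def download_file(url, destination, chunk_size=1 << 20):
    """Stream `url` to `destination` in chunks instead of loading it all into memory."""
    parent = os.path.dirname(destination)
    if parent:
        # Create the local directory tree (e.g. objectron/cup/batch-1/0) if needed.
        os.makedirs(parent, exist_ok=True)
    # urlopen raises HTTPError for 4xx/5xx responses, so failed downloads are not saved.
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return destination
```

For example, `download_file(public_url + "/videos/" + video_id + "/video.MOV", "objectron/" + video_id + "/video.MOV")` saves one video without buffering it in memory.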


## Downloading Data using the Cloud Storage API
The third option is to download the data using the Google Cloud Storage [Python client library](https://cloud.google.com/storage/docs/downloading-objects#storage-download-object-python). For more information, see the [Cloud Storage Python API reference documentation](https://cloud.google.com/storage/docs/reference/libraries).

Note that the Cloud Storage API requires you to authenticate before downloading the dataset. Refer to [authentication](https://googleapis.dev/python/google-api-core/latest/auth.html) for how to set up your credentials.

```
pip install google-cloud-storage
```


```python
from google.cloud import storage

def download_blob(bucket_name,
                  source_blob_name,
                  destination_file_name):
    """Downloads a blob from the bucket."""
    # storage.Client() uses your default credentials; see the authentication note above.
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )

download_blob('objectron', 'v1/index/cup_annotations_train', './cup_annotations_train')
```

## Getting the raw data



You can use the scripts in this repo to process the annotation and metadata protos and/or convert them into TensorFlow examples.


For example, the index of all `cup` samples is available at:
- Public URL: https://storage.googleapis.com/objectron/v1/index/cup_annotations
- Authenticated URI: `gs://objectron/v1/index/cup_annotations`
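Once the index files for a class have been downloaded, a quick sanity check can confirm the 20/80 test/train split described above. A minimal sketch, assuming the three index files (all samples, train, test) have been saved locally; the helpers `load_index` and `split_summary` are hypothetical:

```python
from pathlib import Path


def load_index(path):
    """Read an index file into a list of sample ids (class/batch-i/j), skipping blank lines."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def split_summary(all_path, train_path, test_path):
    """Report split sizes, overlap between splits, and samples missing from both."""
    all_ids = set(load_index(all_path))
    train = set(load_index(train_path))
    test = set(load_index(test_path))
    return {
        "total": len(all_ids),
        "train": len(train),
        "test": len(test),
        # Samples listed in both splits (expected to be 0).
        "overlap": len(train & test),
        # Samples in the full index that appear in neither split.
        "missing": len(all_ids - train - test),
        "test_fraction": len(test) / len(all_ids) if all_ids else 0.0,
    }
```

For example, `split_summary("cup_annotations", "cup_annotations_train", "cup_annotations_test")` should report a `test_fraction` near 0.2 if the files match the description above.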


To copy a single object with `gsutil`, run:
```
gsutil cp gs://BUCKET_NAME/OBJECT_NAME SAVE_TO_LOCATION
```
