In [1]:
import requests
import time
from thredds_crawler.crawl import Crawl

In this example we download all data from RS03CCAL-MJ03F-05-BOTPTA301. The components used to build the data_request_url are listed at http://ooi.visualocean.net/instruments/view/RS03CCAL-MJ03F-05-BOTPTA301

reference designator = RS03CCAL-MJ03F-05-BOTPTA301  
delivery method = streamed  
stream = botpt_nano_sample  
beginDT = 2012-01-01T01:01:01.000Z (setting the begin time to 2012 and no end time guarantees getting the full dataset)
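As a minimal sketch, the ingredients listed above can be assembled into a request URL with a small helper. The function name `build_data_request_url` and the labels `subsite`, `node`, and `sensor` for the three parts of the reference designator are assumptions made for illustration; the base URL is the one used later in this notebook.

```python
# Hypothetical helper: the function name and the subsite/node/sensor labels
# are assumptions, not part of the original notebook.
BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv/'


def build_data_request_url(reference_designator, delivery_method, stream, begin_dt):
    """Assemble the request URL from the ingredients listed above."""
    # 'RS03CCAL-MJ03F-05-BOTPTA301' -> ['RS03CCAL', 'MJ03F', '05-BOTPTA301']
    subsite, node, sensor = reference_designator.split('-', 2)
    path = '/'.join([subsite, node, sensor, delivery_method, stream])
    return BASE_URL + path + '?beginDT=' + begin_dt


url = build_data_request_url(
    'RS03CCAL-MJ03F-05-BOTPTA301',
    'streamed',
    'botpt_nano_sample',
    '2012-01-01T01:01:01.000Z',
)
print(url)
```

Splitting on the first two hyphens only keeps the last part of the reference designator (`05-BOTPTA301`) intact.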

BOTPT instruments sample at 20 Hz, making this a very large data set when requesting multiple years of data. Since the system processes the raw data through dataset drivers and algorithms on demand, much like a cloud computing system providing software as a service, the request can take a while to fulfill, especially if there is a high load on the system due to other users requesting data. For this reason we have staged the pre-processed dataset on a THREDDS server. See 01_botpt_data_wrangling.ipynb.

The mechanics of requesting data and checking for completion are outlined below for reference; it is not necessary to proceed with this notebook beyond this point. If you like, you can request a smaller time range (beginDT = yesterday, for example) to test the request/response flow.
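For a small test request, a beginDT of "yesterday" could be generated as in the sketch below. The helper name `begin_dt_days_ago` is hypothetical, and the milliseconds are simply fixed to `.000` to match the timestamp format shown above.

```python
from datetime import datetime, timedelta, timezone


def begin_dt_days_ago(days=1, now=None):
    """Return a UTC timestamp `days` before `now`, formatted like beginDT."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    # milliseconds are hard-coded to .000 (an assumption made for simplicity)
    return start.strftime('%Y-%m-%dT%H:%M:%S.000Z')


print(begin_dt_days_ago(1))
```

The resulting string can be used in place of `2012-01-01T01:01:01.000Z` when building the request URL.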

In [2]:
# enter your username and token obtained at https://ooinet.oceanobservatories.org/ 
# under your user profile (top right corner), which becomes available after logging in.
USERNAME = ''
TOKEN = ''

In [8]:
DATA_API_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv/'

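# path components follow the list above: reference designator (three parts), delivery method, stream
data_request_url = (
    DATA_API_BASE_URL
    + 'RS03CCAL/'
    + 'MJ03F/'
    + '05-BOTPTA301/'
    + 'streamed/'
    + 'botpt_nano_sample'
    + '?beginDT=2012-01-01T01:01:01.000Z'
)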

In [9]:
r = requests.get(data_request_url, auth=(USERNAME, TOKEN))
r.raise_for_status()  # fail early on an HTTP error (e.g. missing or invalid credentials)
data = r.json()

In [14]:
# the first url in the response is the THREDDS location to which the data is being delivered.
print(data['allURLs'][0])

https://opendap.oceanobservatories.org/thredds/catalog/ooi/friedrich.knuth@rutgers.edu/20180116T163125-RS03CCAL-MJ03F-05-BOTPTA301-streamed-botpt_nano_sample/catalog.html


In [15]:
# the second url in the response is the Apache location to which the data is being delivered,
# which is useful if you want to use wget to download the data to your local directory
# instead of reading it directly off THREDDS.
print(data['allURLs'][1])

https://opendap.oceanobservatories.org/async_results/friedrich.knuth@rutgers.edu/20180116T163125-RS03CCAL-MJ03F-05-BOTPTA301-streamed-botpt_nano_sample
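Rather than relying on list position, the two URLs could also be picked out by their path, as in the following sketch. It assumes that the `/thredds/` and `/async_results/` path segments seen in the outputs above are stable; the helper name and the sample URLs (with a placeholder user and request ID) are hypothetical.

```python
def split_delivery_urls(all_urls):
    """Return (thredds_url, apache_url) from a list like data['allURLs'].

    Assumes the THREDDS url contains '/thredds/' and the Apache url contains
    '/async_results/', as in the outputs above. Missing entries come back as None.
    """
    thredds_url = next((u for u in all_urls if '/thredds/' in u), None)
    apache_url = next((u for u in all_urls if '/async_results/' in u), None)
    return thredds_url, apache_url


# hypothetical sample input with a placeholder user and request ID
sample_urls = [
    'https://opendap.oceanobservatories.org/thredds/catalog/ooi/user@example.com/REQUEST_ID/catalog.html',
    'https://opendap.oceanobservatories.org/async_results/user@example.com/REQUEST_ID',
]
thredds_url, apache_url = split_delivery_urls(sample_urls)
print(thredds_url)
print(apache_url)
```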


In [None]:
# once the request completes, a confirmation email will be sent to the email associated with your account.
# alternatively, you can programmatically poll the delivery directory (here via the Apache url) for the
# status.txt file. once status.txt has been written, the request has completed.

check_complete = data['allURLs'][1] + '/status.txt'
for i in range(100000): 
    r = requests.get(check_complete)
    if r.status_code == requests.codes.ok:
        print('request completed')
        break
    else:
        time.sleep(.5) 
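The polling loop above could also be written with an explicit time limit, so that it gives up after a known number of seconds instead of a fixed number of iterations. This is only a sketch: `wait_until` and its parameters are hypothetical names, and the completion check is passed in as a function so the loop can be tried without a network connection.

```python
import time


def wait_until(check, interval=0.5, timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns True.

    Returns True if check() succeeded, False if `timeout` seconds elapsed first.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# stand-in check that succeeds on the third call
responses = iter([False, False, True])
print(wait_until(lambda: next(responses), interval=0, timeout=5))
```

With the variables from the cell above, the check could be `lambda: requests.get(check_complete).status_code == requests.codes.ok`.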