# Fetching SMAP data programmatically

See https://nsidc.org/support/how/how-do-i-programmatically-request-data-services

### Step 1. Query capabilities


`https://n5eil01u.ecs.nsidc.org/egi/capabilities/(SHORT_NAME).(VERSION).xml`

This returns an XML file detailing the format, projection, and subsetting options, as well as the variable names contained in the data files.
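
The URL pattern above can be filled in with a small helper. This is a minimal sketch for Python 3; `capabilities_url` and `CAPABILITIES_BASE` are hypothetical names chosen for illustration, not part of any NSIDC library.

```python
CAPABILITIES_BASE = 'https://n5eil01u.ecs.nsidc.org/egi/capabilities'


def capabilities_url(short_name, version):
    """Return the capabilities URL for a data set short name and version."""
    # Pattern from the text: (SHORT_NAME).(VERSION).xml
    return '{0}/{1}.{2}.xml'.format(CAPABILITIES_BASE, short_name, version)


print(capabilities_url('SPL2SMP_E', '001'))
```

The version is passed as a string so that leading zeros such as `001` are preserved.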


In [1]:
import urllib2
import xml.etree.ElementTree as ET
import xmltodict

In [33]:
url = 'https://n5eil01u.ecs.nsidc.org/egi/capabilities/SPL2SMP_E.001.xml'
xmlfile = urllib2.urlopen(url)
xmldata = xmlfile.read()
xmlfile.close()

In [34]:
data = xmltodict.parse(xmldata)

In [35]:
root = ET.fromstring(xmldata)

In [95]:
root.tag

'{http://eosdis.nasa.gov/esi/dataset}DataSets'

In [37]:
for child in root:
    print child.tag, child.attrib

DataSet {'dataCenter': 'NSIDC', 'orderPH': 'false', 'orderQA': 'true', 'orderHDF_MAP': 'false', 'versionId': '1', 'longName': 'SMAP Enhanced L2 Radiometer Half-Orbit 9 km EASE-Grid Soil Moisture', 'id': 'SMAP Enhanced L2 Radiometer Half-Orbit 9 km EASE-Grid Soil Moisture V001', 'shortName': 'SPL2SMP_E', 'orderBrowse': 'false'}


The available reformatting options are listed below. Where a format's value differs from its label, use the value in the request.

In [38]:
[f.attrib for f in root.findall('.//Format')]

[{'label': 'No Reformatting', 'value': ''},
 {'label': 'HDF-EOS5', 'value': 'HDF-EOS5'},
 {'label': 'NetCDF4-CF', 'value': 'NetCDF4-CF'},
 {'label': 'ASCII', 'value': 'ASCII'},
 {'label': 'KML', 'value': 'KML'},
 {'label': 'GeoTIFF', 'value': 'GeoTIFF'},
 {'label': 'NetCDF-3', 'value': 'NetCDF-3'}]
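
Since the request needs the value rather than the label, a label-to-value lookup can help. The sketch below (Python 3) runs on a hand-written stand-in for the capabilities XML, so the element nesting is an assumption; only the `label` and `value` attributes are taken from the output above. `format_value` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a capabilities response (not real service output)
sample_xml = '''
<DataSets>
  <DataSet shortName="SPL2SMP_E">
    <Format label="No Reformatting" value=""/>
    <Format label="GeoTIFF" value="GeoTIFF"/>
    <Format label="NetCDF-3" value="NetCDF-3"/>
  </DataSet>
</DataSets>
'''

sample_root = ET.fromstring(sample_xml)

# Map each human-readable label to the value expected in the request
format_values = {f.get('label'): f.get('value')
                 for f in sample_root.findall('.//Format')}


def format_value(label):
    """Look up the request value for a format label (KeyError if unknown)."""
    return format_values[label]


print(format_value('GeoTIFF'))
```

Note that "No Reformatting" maps to an empty string, which is one case where label and value differ.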

In [39]:
[f.attrib for f in root.findall('.//SubsetVariable')]

[{'compoundRule': 'false',
  'label': 'Soil_Moisture_Retrieval_Data',
  'value': 'Soil_Moisture_Retrieval_Data'},
 {'label': 'albedo', 'value': 'Soil_Moisture_Retrieval_Data:albedo'},
 {'label': 'boresight_incidence',
  'value': 'Soil_Moisture_Retrieval_Data:boresight_incidence'},
 {'compoundRule': 'false',
  'excludeFormat': 'HDF-EOS5,NetCDF4-CF',
  'label': 'EASE_column_index',
  'value': 'Soil_Moisture_Retrieval_Data:EASE_column_index'},
 {'compoundRule': 'false',
  'excludeFormat': 'HDF-EOS5,NetCDF4-CF',
  'label': 'EASE_row_index',
  'value': 'Soil_Moisture_Retrieval_Data:EASE_row_index'},
 {'label': 'freeze_thaw_fraction',
  'value': 'Soil_Moisture_Retrieval_Data:freeze_thaw_fraction'},
 {'label': 'latitude', 'value': 'Soil_Moisture_Retrieval_Data:latitude'},
 {'label': 'latitude_centroid',
  'value': 'Soil_Moisture_Retrieval_Data:latitude_centroid'},
 {'label': 'longitude', 'value': 'Soil_Moisture_Retrieval_Data:longitude'},
 {'label': 'longitude_centroid',
  'value': 'Soil_Mois
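
Some entries in the listing above carry an `excludeFormat` attribute. Assuming this means the variable is unavailable for the listed output formats (an interpretation, not something the listing confirms), the variables usable with a given format could be filtered as in this Python 3 sketch. `variables_for_format` is a hypothetical helper, and the dictionaries are copied from the output above.

```python
# Attribute dictionaries copied from the SubsetVariable listing
subset_variables = [
    {'label': 'albedo', 'value': 'Soil_Moisture_Retrieval_Data:albedo'},
    {'compoundRule': 'false',
     'excludeFormat': 'HDF-EOS5,NetCDF4-CF',
     'label': 'EASE_column_index',
     'value': 'Soil_Moisture_Retrieval_Data:EASE_column_index'},
    {'label': 'latitude', 'value': 'Soil_Moisture_Retrieval_Data:latitude'},
]


def variables_for_format(variables, output_format):
    """Keep the variables whose excludeFormat list does not name output_format."""
    kept = []
    for var in variables:
        # excludeFormat is a comma-separated string; a missing attribute excludes nothing
        excluded = [f for f in var.get('excludeFormat', '').split(',') if f]
        if output_format not in excluded:
            kept.append(var['value'])
    return kept


print(variables_for_format(subset_variables, 'HDF-EOS5'))
```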

### Step 2. Create token

A token is needed in order to access data using your Earthdata Login credentials. The token is valid for **30 days**.

    curl -X POST --header "Content-Type: application/xml" -d "<token><username>sample_username</username><password>sample-password</password><client_id>client_name_of_your_choosing</client_id><user_ip_address>your_origin_ip_address</user_ip_address> </token>" https://cmr.earthdata.nasa.gov/legacy-services/rest/tokens

You should receive the token ID in the response. The value in the ID tag is the token you will use in Step 3. For example:

    <token>
    <id>75E5CEBE-6BBB-2FB5-A613-0368A361D0B6</id>
    <username>earthdata_login_user_name</username>
    <client_id>NSIDC_client_id</client_id>
    <user_ip_address>your_origin_ip_address</user_ip_address>
    <provider>NSIDC_ECS</provider>
    </token>
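
The token can be pulled out of such a response with the standard library alone. This is a minimal Python 3 sketch that parses the example response shown above; `extract_token` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The example response shown above
token_response = '''<token>
<id>75E5CEBE-6BBB-2FB5-A613-0368A361D0B6</id>
<username>earthdata_login_user_name</username>
<client_id>NSIDC_client_id</client_id>
<user_ip_address>your_origin_ip_address</user_ip_address>
<provider>NSIDC_ECS</provider>
</token>'''


def extract_token(xml_text):
    """Return the text of the <id> element, or raise if it is missing."""
    token_id = ET.fromstring(xml_text).findtext('id')
    if not token_id:
        raise ValueError('no <id> element in token response')
    return token_id.strip()


print(extract_token(token_response))
```

Raising on a missing `<id>` makes a failed login visible immediately instead of passing an empty token into the data request.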

In [2]:
import sys
sys.path.append('/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages')
import requests

### Earthdata login

In [17]:
username = 'EARTHDATA-USERNAME'
password = 'EARTHDATA-PASSWORD'
client_id = 'NSIDC_client_id'

Current IP address

In [4]:
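import socket
# Take the first non-loopback address from the hostname lookup; if there is none,
# use the local address of a UDP socket connected to 8.8.8.8
user_ip = [l for l in ([ip for ip in socket.gethostbyname_ex(socket.gethostname())[2] if not ip.startswith("127.")][:1], [[(s.connect(('8.8.8.8', 53)), s.getsockname()[0], s.close()) for s in [socket.socket(socket.AF_INET, socket.SOCK_DGRAM)]][0][1]]) if l][0][0]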
user_ip

'136.142.68.231'

In [14]:
r = requests.post('https://api.echo.nasa.gov/echo-rest/tokens',
                 headers={'content-type': 'application/xml'},
                  data='<token><username>{0}</username><password>{1}</password><client_id>{2}</client_id>'
                  '<user_ip_address>{3}</user_ip_address> </token>'.format(
                      username, password, client_id, user_ip))
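
Formatting credentials straight into the XML string breaks if a password contains `&`, `<`, or `>`. One way around this is to escape each field first, sketched here for Python 3. `token_request_body` is a hypothetical helper, and the element names are the ones used in the request above.

```python
from xml.sax.saxutils import escape


def token_request_body(username, password, client_id, user_ip):
    """Build the <token> XML body, escaping &, < and > in every field."""
    fields = [('username', username), ('password', password),
              ('client_id', client_id), ('user_ip_address', user_ip)]
    inner = ''.join('<{0}>{1}</{0}>'.format(tag, escape(value))
                    for tag, value in fields)
    return '<token>{0}</token>'.format(inner)


print(token_request_body('EARTHDATA-USERNAME', 'p&ss<word', 'NSIDC_client_id', '192.0.2.1'))
```

The result can be passed as the `data` argument of the `requests.post` call in place of the formatted string.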

In [None]:
re_dict = xmltodict.parse(r.text)
token = re_dict['token']['id']
token

### Step 3. Create API endpoint

The programmatic access structure is as follows:

    https://n5eil01u.ecs.nsidc.org/egi/request?search_kvp=search_kvp&service_kvp=service_kvp&token=token&email=email
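
The key-value pairs can also be assembled from a dictionary instead of by hand. This is a minimal Python 3 sketch; `build_request_url` is a hypothetical helper, and leaving commas, slashes, and colons unencoded is a readability choice rather than a documented requirement of the service.

```python
from urllib.parse import urlencode

REQUEST_BASE = 'https://n5eil01u.ecs.nsidc.org/egi/request'


def build_request_url(params):
    """Join key-value pairs into a request URL, skipping None values."""
    present = {k: v for k, v in params.items() if v is not None}
    # Keep commas, slashes and colons readable instead of percent-encoding them
    return REQUEST_BASE + '?' + urlencode(present, safe=',/:')


print(build_request_url({
    'short_name': 'SPL2SMP_E',
    'version': '001',
    'time': '2016-08-01,2016-09-01',
    'email': None,  # optional pairs can be left as None and are dropped
}))
```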

**`bounding_box`**

Specifies a search filter that finds files whose spatial extent overlaps this bounding box. The box is given in decimal degrees of latitude and longitude in WSEN (west, south, east, north) order:

    bounding_box=lower_left_long,lower_left_lat,upper_right_long,upper_right_lat

**`bbox`**

Unlike `bounding_box` above, `bbox` specifies a bounding box to be used for spatial subsetting. Output files will be cropped to the specified bounding box extent.

    bbox=lower_left_long,lower_left_lat,upper_right_long,upper_right_lat
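
Both parameters take the same four-number string, so one formatter can serve both. The Python 3 sketch below also includes an overlap test to illustrate what the `bounding_box` search filter checks conceptually. `wsen` and `overlaps` are hypothetical helpers, and the overlap test assumes boxes that do not cross the antimeridian.

```python
def wsen(west, south, east, north):
    """Format a box as 'west,south,east,north' after basic sanity checks."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError('longitudes must be within [-180, 180]')
    if not (-90 <= south <= north <= 90):
        raise ValueError('latitudes must satisfy -90 <= south <= north <= 90')
    return ','.join(str(v) for v in (west, south, east, north))


def overlaps(a, b):
    """True if two (west, south, east, north) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


print(wsen(-79.6315, 39.9767, -79.5231, 40.0559))
print(overlaps((-79.6315, 39.9767, -79.5231, 40.0559), (-80.0, 39.0, -79.0, 41.0)))
```

A file that passes the overlap test is returned whole by `bounding_box`, whereas `bbox` would crop it to the requested extent.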

In [57]:
import datetime
from zipfile import ZipFile

In [96]:
L = 0
short_name = ['SPL2SMP_E', 'SPL3SMP_E'][L]
version = ['001', '001'][L]
outputformat = ['GeoTIFF', 'HDF-EOS5'][1]
from_time = ['2016-08-01', '2018-02-01'][0]
to_time = ['2016-09-01', datetime.datetime.now().strftime('%Y-%m-%d')][0]
variables = ','.join(['/Soil_Moisture_Retrieval_Data/soil_moisture',
                      '/Soil_Moisture_Retrieval_Data/latitude',
                      '/Soil_Moisture_Retrieval_Data/longitude',
                      '/Soil_Moisture_Retrieval_Data/tb_time_utc',
                      '/Soil_Moisture_Retrieval_Data/spacecraft_overpass_time_utc'])
projection = 'Geographic'
page_size = 2000 # default is 10, max is 2000

In [97]:
# just Connellsville
box_lat = [39.9767, 40.0559]
box_lon = [-79.6315, -79.5231]
bbox = ','.join(map(str, [box_lon[0], box_lat[0], box_lon[1], box_lat[1]]))

In [98]:
bbox

'-79.6315,39.9767,-79.5231,40.0559'

Had some issues with `bbox`, so the request below uses `bounding_box` instead (the files are larger because they are not cropped).

In [99]:
request = ('https://n5eil01u.ecs.nsidc.org/egi/request?short_name={0}&version={1}&format={2}&'
           'time={3},{4}&Coverage={5}&token={6}&bounding_box={7}&page_size={8}'.format(
               short_name, version, outputformat, from_time, to_time, variables,
               token, bbox, page_size))

In [None]:
request

In [101]:
response = requests.get(request, stream=True)
print(response.headers)

{'Content-Disposition': 'attachment; filename="5000000053234.zip"', 'Transfer-Encoding': 'chunked', 'Keep-Alive': 'timeout=15, max=100', 'Server': 'Apache', 'Connection': 'Keep-Alive', 'Date': 'Mon, 26 Feb 2018 17:21:00 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Type': 'application/xml'}
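
The `Content-Disposition` header above carries the server-side archive name, which could be used instead of a fixed local file name. This is a minimal Python 3 sketch; `filename_from_headers` is a hypothetical helper, and the regular expression only handles the simple `filename="..."` form seen in this response.

```python
import re


def filename_from_headers(headers, default='smap.zip'):
    """Return the filename in Content-Disposition, or default if absent."""
    disposition = headers.get('Content-Disposition', '')
    match = re.search(r'filename="?([^";]+)"?', disposition)
    return match.group(1) if match else default


# Header value copied from the response shown above
sample_headers = {'Content-Disposition': 'attachment; filename="5000000053234.zip"'}
print(filename_from_headers(sample_headers))
```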


In [102]:
filename = 'smap.zip'
with open(filename, 'wb') as handle:
    for block in response.iter_content(1024):
        handle.write(block)

In [103]:
with ZipFile(filename, "r") as zip_ref:
    zip_ref.extractall('4')

In [70]:
import glob

In [104]:
fs = glob.glob('4/*/*')

In [105]:
fs[0]


'4/5000000053234:5000015052424/processed_SMAP_L2_SM_P_E_08013_D_20160801T112156_R14010_001.he5'

In [106]:
fs

['4/5000000053234:5000015052424/processed_SMAP_L2_SM_P_E_08013_D_20160801T112156_R14010_001.he5',
 '4/5000000053234:5000015052425/processed_SMAP_L2_SM_P_E_08020_A_20160801T220157_R14010_001.he5',
 '4/5000000053234:5000015052426/processed_SMAP_L2_SM_P_E_08035_A_20160802T223853_R14010_001.he5',
 '4/5000000053234:5000015052427/processed_SMAP_L2_SM_P_E_08057_D_20160804T113416_R14010_001.he5',
 '4/5000000053234:5000015052428/processed_SMAP_L2_SM_P_E_08064_A_20160804T221417_R14010_001.he5',
 '4/5000000053234:5000015052429/processed_SMAP_L2_SM_P_E_08086_D_20160806T110944_R14010_001.he5',
 '4/5000000053234:5000015052430/processed_SMAP_L2_SM_P_E_08093_A_20160806T214941_R14010_001.he5',
 '4/5000000053234:5000015052431/processed_SMAP_L2_SM_P_E_08101_D_20160807T114637_R14010_001.he5',
 '4/5000000053234:5000015052432/processed_SMAP_L2_SM_P_E_08108_A_20160807T222637_R14010_001.he5',
 '4/5000000053234:5000015052433/processed_SMAP_L2_SM_P_E_08130_D_20160809T112200_R14010_001.he5',
 '4/5000000053234:50

In [107]:
len(fs)

36
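
The extracted file names encode metadata that can be parsed without opening the files. The Python 3 sketch below assumes the layout `SMAP_L2_SM_P_E_<orbit>_<A|D>_<yyyymmddThhmmss>_<CRID>_<counter>.he5`, with `A`/`D` taken to mean ascending/descending pass; this reading is inferred from the names listed above and should be checked against the product documentation. `parse_granule` is a hypothetical helper.

```python
import datetime
import os
import re

# Assumed layout: ..._<orbit>_<A|D>_<yyyymmddThhmmss>_<CRID>_<counter>.he5
GRANULE_RE = re.compile(
    r'SMAP_L2_SM_P_E_(\d{5})_([AD])_(\d{8}T\d{6})_(R\d+)_(\d{3})\.he5$')


def parse_granule(path):
    """Split a granule file name into orbit, pass direction, and time stamp."""
    match = GRANULE_RE.search(os.path.basename(path))
    if match is None:
        raise ValueError('unrecognised file name: ' + path)
    orbit, direction, stamp, crid, counter = match.groups()
    return {
        'orbit': int(orbit),
        'direction': direction,
        'time': datetime.datetime.strptime(stamp, '%Y%m%dT%H%M%S'),
        'crid': crid,
        'counter': counter,
    }


# First path from the listing above
example = ('4/5000000053234:5000015052424/'
           'processed_SMAP_L2_SM_P_E_08013_D_20160801T112156_R14010_001.he5')
info = parse_granule(example)
print(info['orbit'], info['direction'], info['time'])
```

Applied to every entry in `fs`, this allows sorting by time or splitting the files by pass direction.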