## Using a hash to index thousands of NASA Land Surface Forcings data files, and using GET requests in a for loop with error handling

As part of a larger research project, I downloaded 16 years' worth of data from NASA's MERRA-2 dataset. The subset I chose was "inst1_2d_lfo_Nx (M2I1NXLFO): Land Surface Forcings." These data contain land surface forcings recorded at 1-hour increments from 1 January 2000 to 31 December 2015 for a rectangle of geographic space covering California. The data were mapped to the nearest 0.25 grid point. After I registered and submitted my request, NASA provided a list of links, each of which could be fetched with a GET request to download one file. I saved this list as a .txt file. This notebook shows how I used a hash to index the files as I downloaded each one.

In [1]:
import time
import requests
import numpy as np
import pandas as pd

Set up files for the data to go into

In [2]:
n_files = 5845  # one file per URL in NASA's list
filepath = 'D:/NASA data/'
# Name the files 0.nc4, 1.nc4, ... in the same order as the URL list
filelist = [filepath + str(i) + '.nc4' for i in range(n_files)]

In [3]:
len(filelist)

5845
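As a quick sanity check on that count, the sketch below counts the calendar days from 1 January 2000 through 31 December 2015 and adds one for the documentation file that comes with the list. It assumes one data file per day, which is an assumption and is not stated in the request details above; the variable names are illustrative.

```python
from datetime import date

# Assumption: the request yields one data file per calendar day
first_day = date(2000, 1, 1)
last_day = date(2015, 12, 31)

n_days = (last_day - first_day).days + 1  # inclusive of both endpoints
n_expected = n_days + 1                   # plus the one non-data file in the list

print(n_days, n_expected)  # 5844 5845
```

Under that assumption, the expected total agrees with the length of `filelist`.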

Create a dataframe of the URLs and filenames

In [4]:
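colname = ['URL']
# Read one URL per line; recent pandas versions reject sep='\n' in read_csv
with open('D:/NASA data/url_list.txt') as f:
    url_lines = [line.strip() for line in f if line.strip()]
urls = pd.DataFrame(url_lines, columns=colname)
urls['filename'] = filelist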

In [5]:
urls.head()

| | URL | filename |
|---|---|---|
| 0 | https://goldsmr4.gesdisc.eosdis.nasa.gov/data/... | D:/NASA data/0.nc4 |
| 1 | https://goldsmr4.gesdisc.eosdis.nasa.gov/daac-... | D:/NASA data/1.nc4 |
| 2 | https://goldsmr4.gesdisc.eosdis.nasa.gov/daac-... | D:/NASA data/2.nc4 |
| 3 | https://goldsmr4.gesdisc.eosdis.nasa.gov/daac-... | D:/NASA data/3.nc4 |
| 4 | https://goldsmr4.gesdisc.eosdis.nasa.gov/daac-... | D:/NASA data/4.nc4 |
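The same pairing of URLs and filenames can also be held in a plain Python dict, which is a hash table keyed here by each URL's position in the list. This is a minimal sketch and not the structure the notebook itself uses; the three URLs are hypothetical placeholders standing in for NASA's list, and `file_index` is an illustrative name.

```python
# Hypothetical placeholder URLs; the real ones come from NASA's list
url_lines = [
    'https://example.com/readme',
    'https://example.com/day1',
    'https://example.com/day2',
]
filepath = 'D:/NASA data/'

# Key: position in the URL list. Value: (URL, local filename) pair.
file_index = {
    i: (url, filepath + str(i) + '.nc4')
    for i, url in enumerate(url_lines)
}

print(file_index[1])  # ('https://example.com/day1', 'D:/NASA data/1.nc4')
```

Looking up an entry by its number takes constant time on average, so it is cheap to go from a file number back to the URL it came from.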


Iterate through the dataframe, request each URL, and save its contents to the matching filename

In [None]:
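for index, row in urls.iterrows():
    # Look columns up by name; positional row[0] / row[1] is deprecated in pandas
    filename = row['filename']
    url = row['URL']
    try:
        result = requests.get(url)
        result.raise_for_status()  # raises HTTPError on 4xx/5xx responses
        # The context manager closes the file even if the write fails
        with open(filename, 'wb') as f:
            f.write(result.content)
        print('Contents of URL written to ' + filename)
        time.sleep(2)  # pause between requests to go easy on the server
    except requests.exceptions.HTTPError as err:
        # Report the failed request and move on to the next URL
        print(err)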
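A loop over thousands of downloads can stop partway through, for example on a dropped connection, which the `HTTPError` handler does not catch. One way to restart without fetching everything again is to skip files that are already on disk. The helper below is a sketch under that assumption; `pending_downloads` is a hypothetical name and not part of the original notebook.

```python
import os

def pending_downloads(pairs):
    """Return the (url, filename) pairs whose file is missing or empty."""
    todo = []
    for url, filename in pairs:
        # A zero-byte file may be left over from an interrupted write
        if not os.path.exists(filename) or os.path.getsize(filename) == 0:
            todo.append((url, filename))
    return todo
```

With the dataframe built earlier, `pending_downloads(zip(urls['URL'], urls['filename']))` would give the pairs still left to fetch. Note that this only checks for presence and size, so a file truncated partway through would still count as done.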

The first file turned out to be a .pdf about the data.  The other 5844 files were data files in NetCDF format.
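Because every file was saved with a .nc4 extension regardless of its contents, a mismatch like this can be spotted by looking at the first few bytes of each file. The sketch below relies on three well-known signatures: PDF files begin with `%PDF`, NetCDF-4 files are stored as HDF5 and begin with the 8-byte HDF5 signature, and classic NetCDF files begin with `CDF`. The function name `sniff_format` is hypothetical, and the check only looks at the start of the file.

```python
def sniff_format(path):
    """Guess a file's format from its first bytes (its "magic number")."""
    with open(path, 'rb') as f:
        head = f.read(8)
    if head.startswith(b'%PDF'):
        return 'pdf'
    if head == b'\x89HDF\r\n\x1a\n':
        return 'netcdf4/hdf5'    # NetCDF-4 files use the HDF5 container
    if head.startswith(b'CDF'):
        return 'netcdf classic'
    return 'unknown'
```

Calling `sniff_format('D:/NASA data/0.nc4')` on the first download would be expected to return `'pdf'`, given that this file turned out to be a PDF.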