<center>
<img src='./img/nsidc_logo.png'/>

# How to download NOAA@NSIDC data using Python

</center>

## 1. Tutorial Overview 
This notebook demonstrates how to download NOAA@NSIDC data using Python. It includes examples for downloading a single file and for downloading all the files in a directory.

### Credits 
This notebook was developed by Jennifer Roebuck of NSIDC.

For questions regarding the notebook or to report problems, please create a new issue in the [NSIDC-Data-Tutorials repo](https://github.com/nsidc/NSIDC-Data-Tutorials/issues).

### Learning Objectives

By the end of this demonstration you will be able to:

1. Download a single file from a NOAA@NSIDC data set
2. Download all the files in a directory on the NOAA@NSIDC HTTPS server 

### Prerequisites 

1. The `requests` and `bs4` (Beautiful Soup) libraries must be installed in your Python environment.

### Time requirement 

Allow approximately 5 to 10 minutes to complete this tutorial.

## 2. Tutorial Steps 

### Import necessary libraries

In [None]:
#Import the requests library, used to make HTTPS requests
import requests
#Import BeautifulSoup, used to parse the HTML of a directory listing
from bs4 import BeautifulSoup

### Downloading a single file
This demonstrates how to download a single file.

First we need to set the URL of the file we wish to download. The URL will follow the format of: `https://noaadata.apps.nsidc.org/NOAA/<path to data set and file>`

where \<path to data set and file\> is specific to the data set and can be determined by exploring https://noaadata.apps.nsidc.org in a web browser. 

We will use the [Sea Ice Index (G02135)](https://nsidc.org/data/G02135) data set as an example, and download the CSV file containing daily sea ice extent values for the Arctic (`N_seaice_extent_daily_v3.0.csv`).


In [None]:
#URL of the file 
file_url = "https://noaadata.apps.nsidc.org/NOAA/G02135/north/daily/data/N_seaice_extent_daily_v3.0.csv"
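
The URL format described above can also be assembled in code. The following is a minimal sketch using the standard library's `urljoin`; the names `BASE_URL` and `build_file_url` are hypothetical and introduced only for illustration.

```python
from urllib.parse import urljoin

# Base of the NOAA@NSIDC HTTPS server, taken from the URL format above
BASE_URL = "https://noaadata.apps.nsidc.org/NOAA/"


def build_file_url(relative_path):
    """Join the base URL with a path to a data set file (hypothetical helper)."""
    # Strip a leading slash so the path is appended to the base
    # instead of replacing the /NOAA/ part of it
    return urljoin(BASE_URL, relative_path.lstrip("/"))


file_url = build_file_url("G02135/north/daily/data/N_seaice_extent_daily_v3.0.csv")
print(file_url)
```

The result is the same URL as the one set by hand above.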

Next we need to create an HTTPS response object for that URL using the `get` method from the `requests` library. If the request fails or the server returns an error status, we exit with the error message.

In [None]:
#Create an HTTPS response object and exit if the request fails
try:
    r = requests.get(file_url)
    r.raise_for_status()
except requests.exceptions.RequestException as err:
    raise SystemExit(err)
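
If you would rather handle a failed request than exit, the same pattern can be wrapped in a small function. This is only a sketch: the function name `fetch` and the 30-second `timeout` value are assumptions, not part of the original notebook.

```python
import requests


def fetch(url, timeout=30):
    """Return the response for url, or None if the request fails (hypothetical helper)."""
    try:
        # timeout (in seconds) is an assumed value; it stops the call from waiting forever
        response = requests.get(url, timeout=timeout)
        # Raise an HTTPError for 4xx and 5xx status codes
        response.raise_for_status()
    except requests.exceptions.RequestException as err:
        print(f"Request failed: {err}")
        return None
    return response
```

A caller can then check the result, for example `r = fetch(file_url)` followed by `if r is not None:`, before writing anything to disk.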

Now we need to set the filename under which to save the file, and write the downloaded content to disk.

In [None]:
#Save the downloaded content to a local file
with open("N_seaice_extent_daily_v3.0.csv", "wb") as f:
    f.write(r.content)
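
The filename above is typed by hand. As a sketch of an alternative, the local filename could be derived from the URL with the standard library; `filename_from_url` is a hypothetical helper name used only for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path component of a URL (hypothetical helper)."""
    # urlsplit separates the path from any query string or fragment
    path = urlsplit(url).path
    # unquote turns percent-encoded characters (such as %20) back into text
    return unquote(PurePosixPath(path).name)


file_url = "https://noaadata.apps.nsidc.org/NOAA/G02135/north/daily/data/N_seaice_extent_daily_v3.0.csv"
local_name = filename_from_url(file_url)
print(local_name)
```

The value of `local_name` could then be passed to `open` in place of the hard-coded string.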

### Downloading all the files in a directory 
This demonstrates downloading all of the files in a single directory.

First we need to set the URL of the directory whose files we wish to download. It follows a similar format to the one described above for downloading a single file.

Again we will use the [Sea Ice Index (G02135)](https://nsidc.org/data/G02135) data set as an example and download all the daily GeoTIFFs for October 1978. 

In [None]:
#Set the URL of the directory we wish to download all the files from
archive_url = "https://noaadata.apps.nsidc.org/NOAA/G02135/north/daily/geotiff/1978/10_Oct/"

Next we need to create an HTTPS response object for the URL, again using the `get` method from the `requests` library. 

Then we will use `BeautifulSoup` to parse all the filenames that are in the directory. 

In [None]:
#Create an HTTPS response object and raise an exception on an HTTP error
r = requests.get(archive_url)
r.raise_for_status()

#Use BeautifulSoup to get a list of the files in the directory
data = BeautifulSoup(r.text, "html.parser")
data
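
To make it clearer what is being extracted from the directory page, here is a dependency-free sketch that collects link targets using the standard library's `html.parser` in place of `BeautifulSoup`. The sample HTML and its filenames are made up for illustration, and the markup of the real directory page may differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical directory listing; these filenames are not real data files
SAMPLE_LISTING = """
<html><body>
<a href="../">../</a>
<a href="example_file_1.tif">example_file_1.tif</a>
<a href="example_file_2.tif">example_file_2.tif</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_LISTING)
print(collector.hrefs)
```

With `BeautifulSoup`, the equivalent information comes from the `href` of each element returned by `data.find_all("a")`.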

Now we will create a URL for each of the files, set filenames for each of our downloaded files, and download the files. 

In [None]:
#Loop through the HTML links (excluding the first one, which is just a link to the parent directory)
for link in data.find_all("a")[1:]:
    filename = link["href"]
    #Generate the URL for the file and request it
    r = requests.get(archive_url + filename)
    print(r.status_code) #print the HTTP status code
    print(filename) #print the name of the file
    #Save the file
    with open(filename, "wb") as f:
        f.write(r.content)
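
The loop above skips the parent-directory link by position, which assumes it is always the first link on the page. A sketch of a more defensive approach is to filter the links by their content. The helper name `is_file_link` and the sample links below are hypothetical.

```python
def is_file_link(href):
    """Return True if href looks like a file in the current directory (hypothetical helper)."""
    # Skip empty links, sort or anchor links, absolute paths, and parent-directory links
    if not href or href in ("..", ".") or href.startswith(("?", "#", "/", "../")):
        return False
    # Skip links that point to another site
    if "://" in href:
        return False
    # Links ending in a slash are subdirectories, not files
    return not href.endswith("/")


# Made-up link targets, similar to what a directory listing might contain
hrefs = ["../", "?C=N;O=D", "subdir/", "example_file_1.tif"]
file_links = [h for h in hrefs if is_file_link(h)]
print(file_links)
```

Only the links that pass the filter would then be requested and saved.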

## 3. Learning outcomes recap

We have learned how to:
1. Download a single file from a NOAA@NSIDC data set
2. Download all the files in a directory related to a NOAA@NSIDC data set. 