The landing page for this project seems to be: 
    
https://pgg.ldeo.columbia.edu/data/agap-gambit

The READMEs linked there appear to refer to a much older data release -- they direct me to ftp://gravity.ldeo.columbia.edu, where the agap/agap login doesn't seem to work anymore.

Instead, the data is at: http://wonder.ldeo.columbia.edu/data/AGAP/DataLevel_1/RADAR/index.html
I assume we'll want the SAR data: http://wonder.ldeo.columbia.edu/data/AGAP/DataLevel_1/RADAR/DecimatedSAR_netcdf/index.html

The directory structure is {F,L,T,V}####/F##_L##-###_1D_SAR.nc, and each netCDF file is ~16 MB.
The two L lines that I checked are about 2 GB each.

Link for first chunk of L270: http://wonder.ldeo.columbia.edu/data/AGAP/DataLevel_1/RADAR/DecimatedSAR_netcdf/L270/F31b_L270-181_1D_SAR.nc
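A minimal sketch of how that file-name pattern could be split into its parts. The regular expression is inferred from the single example URL above, so the optional letter after the flight number (the "b" in F31b) is an assumption about the other files, and `parse_sar_filename` is a hypothetical helper name.

```python
import re

# Pattern inferred from one example file name; other files may deviate.
SAR_NAME = re.compile(
    r"^F(?P<flight>\d+[a-z]?)_"          # flight, e.g. F31b (letter suffix assumed optional)
    r"(?P<line>[FLTV]\d+)-"              # line, e.g. L270
    r"(?P<file_number>\d+)_1D_SAR\.nc$"  # file number within the line
)


def parse_sar_filename(name):
    """Split a DecimatedSAR file name into flight, line and file number.

    Returns None when the name does not match the assumed pattern.
    """
    match = SAR_NAME.match(name)
    if match is None:
        return None
    return {
        "flight": "F" + match.group("flight"),
        "line": match.group("line"),
        "file_number": int(match.group("file_number")),
    }


print(parse_sar_filename("F31b_L270-181_1D_SAR.nc"))
```

Returning None for non-matching names makes it easy to spot files that break the assumed convention instead of silently mis-parsing them.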

The READMEs attached here only mention matlab files, and talk about known issues with the preliminary data release.

Line-based naming convention:

    * L -- long
    * T -- tie
    * F -- connecting to Dome Fuji survey
    * V -- connecting to Lake Vostok survey
    
The old MATLAB files are split into 3.5 km along-track chunks. Ouch!! The netCDF files are split the same way. Uggggh! This is testing my resolve to work directly with whatever formats the providers supply.

In [4]:
import requests  # For downloading index page
from bs4 import BeautifulSoup   # For parsing html and extracting the links


In [8]:
# Start by iterating over all links in the index
ldeo_agap_sar = "http://wonder.ldeo.columbia.edu/data/AGAP/DataLevel_1/RADAR/DecimatedSAR_netcdf"

In [13]:
reqs = requests.get(ldeo_agap_sar + "/index.html")
soup = BeautifulSoup(reqs.text, 'html.parser')
line_urls = [link.get('href') for link in soup.find_all('a')]
print(line_urls)

['F10130/index.html', 'F10150/index.html', 'F10170/index.html', 'F10190/index.html', 'F10210/index.html', 'F10230/index.html', 'L270/index.html', 'L280/index.html', 'L290/index.html', 'L300/index.html', 'L310/index.html', 'L320/index.html', 'L330/index.html', 'L340/index.html', 'L350/index.html', 'L360/index.html', 'L370/index.html', 'L380/index.html', 'L390/index.html', 'L400/index.html', 'L410/index.html', 'L420/index.html', 'L430/index.html', 'L440/index.html', 'L450/index.html', 'L460/index.html', 'L470/index.html', 'L480/index.html', 'L490/index.html', 'L500/index.html', 'L510/index.html', 'L520/index.html', 'L530/index.html', 'L540/index.html', 'L550/index.html', 'L560/index.html', 'L570/index.html', 'L580/index.html', 'L590/index.html', 'L600/index.html', 'L610/index.html', 'L620/index.html', 'L630/index.html', 'L640/index.html', 'L650/index.html', 'L660/index.html', 'L670/index.html', 'L680/index.html', 'L690/index.html', 'L700/index.html', 'L710/index.html', 'L720/index.html',

In [None]:
# Their documentation uses "flight line" to refer to what UTIG calls transects, 
# and then "file number" for the segment IDs that they're split up into.
file_count = 0
for line_url in line_urls:
    line = line_url.split('/')[0]
    #print("Handling: {}".format(line))
    
    reqs = requests.get('/'.join([ldeo_agap_sar, line_url]))
    soup = BeautifulSoup(reqs.text, 'html.parser')
    file_urls = [link.get('href') for link in soup.find_all('a')]
    #print('\n'.join(file_urls))
    #print("...{} files".format(len(file_urls)))
    file_count += len(file_urls)
    
print(file_count)

In [18]:
# How big do we expect the download to be? Spot-checking gave 15.8 MB for a single file
file_count * 15.8 / 1024  # Convert MB to GB. OK, 150 GB isn't that bad to download in full.

149.45195312500002
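The count feeding this estimate includes every `<a>` link on each per-line index page. A minimal sketch of restricting it to data files, assuming the pages may also contain links that are not `.nc` files (not verified here). It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages; `LinkCollector` and `nc_links` are hypothetical names, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with no href
                self.hrefs.append(href)


def nc_links(html):
    """Return only the links that point at netCDF files."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if href.endswith(".nc")]


# Made-up index page, only to show the filtering behaviour.
sample = (
    '<a href="../index.html">up</a>'
    '<a href="F31b_L270-181_1D_SAR.nc">chunk</a>'
    '<a name="top">no href</a>'
)
print(nc_links(sample))
```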

### Given that I just want every file in that directory structure, is there something better than iterating manually?
~~~
QICERADAR_DATA=/Volumes/RadarData
LDEO_DIR=${QICERADAR_DATA}/LDEO/AGAP_GAMBIT
mkdir -p $LDEO_DIR
cd $LDEO_DIR

wget -c -r -d -nH --cut-dirs=5 http://wonder.ldeo.columbia.edu/data/AGAP/DataLevel_1/RADAR/DecimatedSAR_netcdf/index.html
~~~
* -r for recursive
* -d for debug
* -nH --cut-dirs=5 should flatten the deep directory structure
  * -nH drops the hostname directory (wonder.ldeo.columbia.edu) from the start of the local path
  * --cut-dirs=5 drops the five leading path components: data/AGAP/DataLevel_1/RADAR/DecimatedSAR_netcdf
* -c for continue (checks number of bytes, tries to append to end of file rather than overwriting) (I haven't tested this yet)

This can just be run repeatedly. (And will need to be, as I'm getting ~500 kB/sec - 2.5 MB/sec)
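If the download ends up being driven from the notebook instead, a rough Python equivalent of `wget -c -nH --cut-dirs=5` might look like the sketch below. It assumes the server honours HTTP `Range` requests (not verified); `local_path` and `resume_download` are hypothetical helper names, and only the standard library is used.

```python
import os
import urllib.error
import urllib.request
from urllib.parse import urlparse


def local_path(url, dest_root, cut_dirs=5):
    """Mimic wget's -nH --cut-dirs=N mapping from a URL to a local path."""
    parts = urlparse(url).path.strip("/").split("/")
    return os.path.join(dest_root, *parts[cut_dirs:])


def resume_download(url, dest_root, chunk_size=1 << 20):
    """Download url, appending to any partial file already on disk."""
    path = local_path(url, dest_root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    request = urllib.request.Request(
        url, headers={"Range": "bytes={}-".format(offset)}
    )
    try:
        with urllib.request.urlopen(request) as response:
            # 206 means the server honoured the Range header; a plain 200
            # means it sent the whole file, so overwrite instead of appending.
            mode = "ab" if response.status == 206 else "wb"
            with open(path, mode) as out:
                while True:
                    chunk = response.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
    except urllib.error.HTTPError as err:
        # 416 (range not satisfiable) is what a server typically returns
        # when the local file is already complete.
        if err.code != 416:
            raise
    return path
```

Calling `resume_download(url, "/Volumes/RadarData/LDEO/AGAP_GAMBIT")` for each file URL would place the files under `L270/...` and so on, matching the layout the wget command produces.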

#### What about the source listed in the Bedmap compilation? DOI: http://get.iedadata.org/doi/317765

It claims to use the "Marine Geoscience Data System", but the link is to USAP-DC

And, as usual, USAP-DC says:
"Due to its size only a list of the file names is directly available, the actual data are available on request from info@usap-dc.org"

So, I think I should tell people to cite the DOI, but automate the download from Columbia's website.


# Rosetta

Appears to be available as a single 200G download (!!)


* example code for Matlab: http://wonder.ldeo.columbia.edu/data/ROSETTA-Ice/Radar/RS_Process_Example/DICE-master.zip
* 200G download for DICE (deep ice radar): http://wonder.ldeo.columbia.edu/data/ROSETTA-Ice/Radar/RS_Process_Example/Rosetta_Data.zip

When I e-mailed Kirsty a year ago, she said that they were trying to move it to "formal access", and gave me chunked example data for a single line:
    https://drive.google.com/drive/folders/10MvRe21Jj7xZepDdCauBt7gIC-hybul2
    

# Greenland!

Looks like they have some 2014 Greenland data.

https://pgg.ldeo.columbia.edu/data/icepod

I haven't yet figured out how I plan to segment Arctic/Antarctic data; probably just separate top-level directories in my data drive for now.