I haven't found many radar profiles on PANGAEA, but here's what I have:

* Antarctic
  * Radar profiles across ice-shelf channels at the Roi Baudouin Ice Shelf https://doi.pangaea.de/10.1594/PANGAEA.907146?format=html#download Reinhart Drews 2019
  * Another Roi Baudouin (UWB) 2018: https://doi.pangaea.de/10.1594/PANGAEA.942989?format=html#download
* Arctic
  * Shear Margins for NEGIS https://doi.pangaea.de/10.1594/PANGAEA.928569?format=html#download 2018
  * 2018 EGRIP-NOR https://doi.pangaea.de/10.1594/PANGAEA.914258?format=html#download
  * Nioghalvfjerdsbrae (Greenland) 2018 https://doi.pangaea.de/10.1594/PANGAEA.949391?format=html#download
  * Nioghalvfjerdsbrae (Greenland) 1998 https://doi.pangaea.de/10.1594/PANGAEA.949619

In [48]:
from bs4 import BeautifulSoup   # For parsing html and extracting the links
import pathlib
import requests  # For downloading index page

In [None]:
# Pick one dataset landing page to inspect; the alternative is kept as a comment.
# dataset_link = 'https://doi.pangaea.de/10.1594/PANGAEA.942989?format=html#download'
dataset_link = 'https://doi.pangaea.de/10.1594/PANGAEA.928569?format=html#download'

reqs = requests.get(dataset_link)
soup = BeautifulSoup(reqs.text, 'html.parser')

all_urls = [link.get('href') for link in soup.find_all('a')]
#print(all_urls)
# download_urls = [base_url + url for url in all_urls if url.startswith(prefix)]
#filenames = [url.strip(base_url+prefix).split('?')[0] for url in download_urls]

for link in soup.find_all('a'):
    href = link.get('href')
    print(href)
    # Anchors without an href attribute yield None, so guard before testing the suffix.
    if href and href.endswith('.mat'):
        print("...found mat file!")


In [34]:

datasets = {
    'ANTARCTIC': 
        ('https://doi.pangaea.de/10.1594/PANGAEA.907146?format=html#download',  # Zipped format; can't download README or example Python script; haven't tried the data itself yet.
         'https://doi.pangaea.de/10.1594/PANGAEA.942989?format=html#download',  # 2018 UWB Roi Baudouin -> *.mat format
        ),
    'ARCTIC':
        ('https://doi.pangaea.de/10.1594/PANGAEA.928569?format=html#download',  # 2018 NEGIS Shear Margins -> has qlook.zip, sar1.zip, sar2.zip, as well as some QGIS files
         'https://doi.pangaea.de/10.1594/PANGAEA.914258?format=html#download',  # 2018 EGRIP-NOR
         'https://doi.pangaea.de/10.1594/PANGAEA.949391?format=html#download',  # 2018 Nioghalvfjerdsbrae
         'https://doi.pangaea.de/10.1594/PANGAEA.949619?format=html#download',  # 1998 Nioghalvfjerdsbrae -> EWR, *.mat format     
        )
}

# Maybe it would be cleaner to do it by dataset number:
datasets = {
    'ANTARCTIC': 
        (907146,  # Zipped format; can't download README or example Python script; haven't tried the data itself yet.
         942989,  # 2018 UWB Roi Baudouin -> *.mat format
        ),
    'ARCTIC':
        (928569,  # 2018 NEGIS Shear Margins -> has qlook.zip, sar1.zip, sar2.zip, as well as some QGIS files
         914258,  # 2018 EGRIP-NOR
         949391,  # 2018 Nioghalvfjerdsbrae
         949619,  # 1998 Nioghalvfjerdsbrae -> EWR, *.mat format     
        )
}

# There are two places to grab data from, and all 6 datasets seem to follow a different convention.
# So ... given this, I'm inclined to NOT start on AWI data yet. 
    
## TYPE 1
# 907146 has datafiles:
# * https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/README.txt
# * https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/RadarData_RoiBaudouinIceShelf_Drews.tar.bz2
# * https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/RadarprofileLocationQuickshot.png
# * https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/RadarprofileQuickshot.png
# * https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/SampleScriptPlottingData.py
    
# as does 914258:
# https://hs.pangaea.de/polar6/EGRIP-NOR_2018/EGRIP_upstream_profiles_map.png
# https://hs.pangaea.de/polar6/EGRIP-NOR_2018/UpstreamProfile_Center_IceStream.zip
# https://hs.pangaea.de/polar6/EGRIP-NOR_2018/UpstreamProfile_Flowline_1.zip
# https://hs.pangaea.de/polar6/EGRIP-NOR_2018/UpstreamProfile_Flowline_2.zip
# https://hs.pangaea.de/polar6/EGRIP-NOR_2018/shapes.zip

## TYPE 2
# while 942989 has these 2 files:  
# * https://download.pangaea.de/dataset/942989/files/Data_20190106_02_002.mat
# * https://download.pangaea.de/dataset/942989/files/Data_20190106_02_003.mat
    
# 928569 follows the same prefix, but they're zip files, e.g.:
# * https://download.pangaea.de/dataset/928569/files/20180508_02_qlook.zip
# * https://download.pangaea.de/dataset/928569/files/20180508_02_sar1.zip
# * https://download.pangaea.de/dataset/928569/files/20180508_02_sar2.zip
    
# 949391 also follows that, but with yet another naming convention. e.g.:
# * https://download.pangaea.de/dataset/949391/files/CSARP_qlook_20180414_08.pdf
# * https://download.pangaea.de/dataset/949391/files/CSARP_qlook_20180414_08.zip
# * https://download.pangaea.de/dataset/949391/files/CSARP_standard_20180414_08.pdf
# * https://download.pangaea.de/dataset/949391/files/CSARP_standard_20180414_08.zip
    
# 949619 has:
# * https://download.pangaea.de/dataset/949619/files/983518.png
# * https://download.pangaea.de/dataset/949619/files/983518_awi_emr.mat
# * https://download.pangaea.de/dataset/949619/files/983525.png
# * https://download.pangaea.de/dataset/949619/files/983525_awi_emr.mat
# * https://download.pangaea.de/dataset/949619/files/983526.png
# * https://download.pangaea.de/dataset/949619/files/983526_awi_emr.mat
# * https://download.pangaea.de/dataset/949619/files/983529.png
# * https://download.pangaea.de/dataset/949619/files/983529_awi_emr.mat
# * https://download.pangaea.de/dataset/949619/files/983531.png
# * https://download.pangaea.de/dataset/949619/files/983531_awi_emr.mat





In [41]:
def list_awi_data(dataset_id):
    dataset_link = "https://doi.pangaea.de/10.1594/PANGAEA.{}?format=html#download".format(dataset_id)
    print("Scraping: {}".format(dataset_link))
    reqs = requests.get(dataset_link)
    soup = BeautifulSoup(reqs.text, 'html.parser')
    

    # Two places the data could be hosted:
    datafile_prefix = "https://download.pangaea.de/dataset/{}/files/".format(dataset_id)
    pangaea_prefix = "https://hs.pangaea.de/"

    # Anchors without an href yield None, so filter those out first;
    # str.startswith accepts a tuple of prefixes.
    hrefs = [link.get('href') for link in soup.find_all('a')]
    all_urls = [href for href in hrefs
                if href and href.startswith((datafile_prefix, pangaea_prefix))]
    return all_urls
    
    
for pole, dataset_ids in datasets.items():
    print(pole)
    for dataset_id in dataset_ids:
        dest_dir = "/Volumes/RadarData/{}/AWI/{}".format(pole, dataset_id)
        print("saving to: {}".format(dest_dir))

        #if dataset_id in [907146, 942989, 928569, 949391, 914258]:
        #    continue
        data_urls = list_awi_data(dataset_id)
        print('\n'.join(data_urls))
        
        # `wget --content-on-error` is required; for some reason, plain wget gives error code 500.
        for data_url in data_urls:
            wget_cmd = 'wget -c --content-on-error --directory-prefix="{}" "{}"'.format(dest_dir, data_url) 
            print(wget_cmd)

            

ANTARCTIC
Scraping: https://doi.pangaea.de/10.1594/PANGAEA.907146?format=html#download
https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/README.txt
https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/RadarData_RoiBaudouinIceShelf_Drews.tar.bz2
https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/RadarprofileLocationQuickshot.png
https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/RadarprofileQuickshot.png
https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/SampleScriptPlottingData.py
saving to: /Volumes/RadarData/ANTARCTIC/AWI/907146
wget -c --content-on-error --directory-prefix="/Volumes/RadarData/ANTARCTIC/AWI/907146" "https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/README.txt"
wget -c --content-on-error --directory-prefix="/Volumes/RadarData/ANTARCTIC/AWI/907146" "https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/RadarData_RoiBaudouinIceShelf_Drews.tar.bz2"
wget -c --content-on-error --directory-prefix="/Volumes/RadarDat

In [42]:
import subprocess

In [43]:
help(subprocess.check_call)

Help on function check_call in module subprocess:

check_call(*popenargs, **kwargs)
    Run command with arguments.  Wait for command to complete.  If
    the exit code was zero then return, otherwise raise
    CalledProcessError.  The CalledProcessError object will have the
    return code in the returncode attribute.
    
    The arguments are the same as for the call function.  Example:
    
    check_call(["ls", "-l"])



In [44]:
dir(subprocess)

['CalledProcessError',
 'CompletedProcess',
 'DEVNULL',
 'PIPE',
 'Popen',
 'STDOUT',
 'SubprocessError',
 'TimeoutExpired',
 '_PIPE_BUF',
 '_PopenSelector',
 '_USE_POSIX_SPAWN',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_active',
 '_args_from_interpreter_flags',
 '_cleanup',
 '_mswindows',
 '_optim_args_from_interpreter_flags',
 '_posixsubprocess',
 '_time',
 '_use_posix_spawn',
 'builtins',
 'call',
 'check_call',
 'check_output',
 'contextlib',
 'errno',
 'getoutput',
 'getstatusoutput',
 'grp',
 'io',
 'list2cmdline',
 'os',
 'pwd',
 'run',
 'select',
 'selectors',
 'signal',
 'sys',
 'threading',
 'time',
 'types',

In [47]:
help(subprocess.getoutput)

Help on function getoutput in module subprocess:

getoutput(cmd)
    Return output (stdout or stderr) of executing cmd in a shell.
    
    Like getstatusoutput(), except the exit status is ignored and the return
    value is a string containing the command's output.  Example:
    
    >>> import subprocess
    >>> subprocess.getoutput('ls /bin/ls')
    '/bin/ls'
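
One way to actually run the wget commands printed earlier is `subprocess.run` with an argument list, which sidesteps shell quoting. A minimal sketch, assuming `wget` is on the PATH; `build_wget_args`, `download_all`, and the `/tmp/awi_demo` directory are hypothetical names for illustration, and `dry_run` defaults to `True` so nothing is fetched unless requested:

```python
import pathlib
import subprocess


def build_wget_args(dest_dir, data_url):
    """Build the same wget invocation as the string version, as an argument list."""
    return [
        "wget",
        "-c",                  # resume partially downloaded files
        "--content-on-error",  # noted earlier as required for these downloads
        "--directory-prefix={}".format(dest_dir),
        data_url,
    ]


def download_all(dest_dir, data_urls, dry_run=True):
    """Run wget once per URL; return the argument lists that were (or would be) run."""
    commands = [build_wget_args(dest_dir, url) for url in data_urls]
    if dry_run:
        return commands
    pathlib.Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for args in commands:
        # The earlier note mentions error code 500, so wget may exit non-zero
        # even when content was saved; report the code instead of raising.
        result = subprocess.run(args)
        if result.returncode != 0:
            print("wget exited with {} for {}".format(result.returncode, args[-1]))
    return commands


demo_urls = ["https://hs.pangaea.de/reflec/RoiBaudouinIceShelf/Drews_2019/README.txt"]
for args in download_all("/tmp/awi_demo", demo_urls):
    print(" ".join(args))
```

`subprocess.check_call` would work too, but it raises `CalledProcessError` on any non-zero exit code, which may be too strict if the server keeps reporting an error status alongside the file.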

