# Get bathymetry/backscatter data

In [2]:
import shutil, os, html, requests

In [2]:
def download_file(url):
    local_filename = os.path.join('data',url.split('/')[-1])
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

For bathymetry data (25m): (damaged zip file?)

In [11]:
download_file('https://gsi.geodata.gov.ie/downloads/Marine/Data/Downloads/LatestEntireAreaMerge/IE_GSI_MI_Bathymetry_25m_IE_Waters_WGS84_LAT_TIFF.zip')

'data/IE_GSI_MI_Bathymetry_25m_IE_Waters_WGS84_LAT_TIFF.zip'

For backscatter data (40m):

In [7]:
download_file('https://gsi.geodata.gov.ie/downloads/Marine/Data/Downloads/LatestEntireAreaMerge/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.zip')

'data/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.zip'

In [3]:
download_file('https://gsi.geodata.gov.ie/downloads/Marine/Data/Downloads/LatestEntireAreaMerge/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.zip')

'data/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.zip'

## Extract zip files

In [2]:
import zipfile

In [5]:
# make directories to store geotiff images
if not os.path.exists('data/bathymetry/'):
    os.makedirs('data/bathymetry/')
    
if not os.path.exists('data/backscatter/'):
    os.makedirs('data/backscatter/')

In [13]:
! unzip data/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.zip -d data/backscatter/

Archive:  data/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.zip
  inflating: data/backscatter/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.tfw  
  inflating: data/backscatter/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.tif  
  inflating: data/backscatter/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.tif.aux.xml  
  inflating: data/backscatter/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.tif.ovr  
  inflating: data/backscatter/IE_GSI_MI_Backscatter_40m_Offshore_IE_WGS84_LAT_TIFF.tif.xml  


In [4]:
! unzip data/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.zip -d data/bathymetry/

Archive:  data/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.zip
  inflating: data/bathymetry/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.tfw  
  inflating: data/bathymetry/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.tif  
  inflating: data/bathymetry/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.tif.aux.xml  
  inflating: data/bathymetry/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.tif.ovr  
  inflating: data/bathymetry/IE_GSI_MI_Bathymetry_100m_Offshore_IE_WGS84_LAT_TIFF.tif.xml  


## Stream data from Google Drive

Some of our data cannot be extracted directly on JupyterLab, so we want to resolve them first locally, then upload to Drive (instead of JupyterLab itself since it's painstakingly slow). We also want to convert the GEOTIFF files to netcdf files to work with the machine learning models later, so that's another plus.

In [5]:
! pip install gdown

Defaulting to user installation because normal site-packages is not writeable
Collecting gdown
  Downloading gdown-4.7.1-py3-none-any.whl (15 kB)
Installing collected packages: gdown
Successfully installed gdown-4.7.1


In [6]:
import gdown

What we are about to do:
1. Open sharing for the file we want to download and set it to public access (anyone with the link can view), then copy the link

The link may look like this: "https://drive.google.com/file/d/1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce/view?usp=sharing." We are going to copy the ID of the file, which is the part between 'd/' and '/view...'.

2. Create URL for gdown to download file with template below, then use gdown to download file

In [7]:
file_id = '1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce'
url = f'https://drive.google.com/uc?id={file_id}'

In [8]:
url

'https://drive.google.com/uc?id=1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce'

In [9]:
output = 'data/sth.nc'

In [10]:
gdown.download(url, output, quiet=False)

Downloading...
From (uriginal): https://drive.google.com/uc?id=1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce
From (redirected): https://drive.google.com/uc?id=1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce&confirm=t&uuid=62767d37-8724-4b2d-966c-2b1471e1f579
To: /home/jovyan/ohw23-proj-habitatmapping/data/sth.nc
100%|██████████| 681M/681M [00:05<00:00, 122MB/s] 


'data/sth.nc'

The file `sth.nc` is one of the data files we are working on...

Process the rest of the netcdf4 files the same way

In [12]:
URLs = {
'bathymetry_10': 'https://drive.google.com/file/d/16XyhnPwIfabtffUKwnM_ERJfxRqiPTK8/view?usp=sharing',
'backscatter_10': 'https://drive.google.com/file/d/1uR9ZMjbzb4msAPUw-W52tjuRVtllGpGY/view?usp=sharing'
}

In [13]:
for file_name, URL in URLs.items():
    # get file ID
    file_id = URL.split('/')[-2]
    print(file_id)
    download_url = f'https://drive.google.com/uc?id={file_id}'
    
    output = os.path.join('data/', f'{file_name}.nc')
    gdown.download(url, output, quiet=False)

Downloading...
From (uriginal): https://drive.google.com/uc?id=1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce
From (redirected): https://drive.google.com/uc?id=1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce&confirm=t&uuid=c1c9b67a-412d-43b7-8a89-26b364d0e322
To: /home/jovyan/ohw23-proj-habitatmapping/data/bathymetry_10.nc
100%|██████████| 681M/681M [00:04<00:00, 141MB/s] 
Downloading...
From (uriginal): https://drive.google.com/uc?id=1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce
From (redirected): https://drive.google.com/uc?id=1ddy4s33lzBumYcEjg45M8uf4TQZn1Xce&confirm=t&uuid=4350af80-53a6-46ef-aa69-803c8abf2bd3
To: /home/jovyan/ohw23-proj-habitatmapping/data/backscatter_10.nc
100%|██████████| 681M/681M [00:03<00:00, 226MB/s] 
