<a id="top"></a>
# Download Aperio *(.svs)* data for experiments with the OpenSlide library
****
### Openslide links
[OpenSlide Python API](https://openslide.org/api/python/) <br>
[openslide-python on GitHub](https://github.com/openslide/openslide-python) <br>
[openslide.org](https://openslide.org/) <br>
### Install on macOS:
```bash
# may require creation of this directory
# sudo mkdir /usr/local/Frameworks
brew install openslide
pip3 install openslide-python
# or, if pip already points at Python 3:
# pip install openslide-python
```
***
    Internal links:
[View the list of files available](#view_example_files) <br>
[Notebook Cell to Download SVS example data](#dwnld_svs) <br>

****
    External Links
## Exercise low-level functions in the Python standard library urllib & view the OpenSlide example data
[urllib docs](https://docs.python.org/3/library/urllib.html) <br>
[StackOverflow: download and unzip](https://stackoverflow.com/questions/5710867/downloading-and-unzipping-a-zip-file-without-writing-to-disk) <br>
[openslide data](http://openslide.cs.cmu.edu/download/openslide-testdata/) <br>

**** 
## Python Requests vs. urllib / urllib2 (standard library preferred)
[requests on StackOverflow](https://stackoverflow.com/questions/2018026/what-are-the-differences-between-the-urllib-urllib2-and-requests-module) <br>
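
Because the standard library is the preferred route here, a JSON document can also be read straight from the HTTP response, with no temporary file involved. The following is a minimal sketch under stated assumptions: `fetch_json` is a hypothetical helper name (it is not part of urllib or openslide), and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def fetch_json(url, timeout=30):
    """Return the JSON document found at `url` as a Python object.

    Illustrative helper only; the name and the default timeout are assumptions.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # json.loads accepts bytes directly (Python 3.6+)
        return json.loads(response.read())
```

Calling `fetch_json` with the `index.json` URL used in the next section should yield the same dictionary that the notebook cell builds through a temporary file.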

<a id="view_example_files"></a>
## View the list of example files available
[Top](#top) <br>

In [1]:
import os
import tempfile
import time
import json
import urllib.request
import openslide

remote_index_file_name = 'index.json'
data_download = 'http://openslide.cs.cmu.edu/download/openslide-testdata/'
# URLs always use '/', so concatenate instead of os.path.join (which uses '\\' on Windows)
dir_file = data_download + remote_index_file_name
print("View json data in the Raw (If Internet Allows):\n{}\n".format(dir_file))

dir_dict = {}
with tempfile.TemporaryDirectory() as tmpdirname:
    print('with temporary directory\n\t {} \n'.format(tmpdirname))
    temp_json = os.path.join(tmpdirname, remote_index_file_name)
    tuple_of_stuff = urllib.request.urlretrieve(dir_file, temp_json)
    
    with open(temp_json, 'r') as fh:
        dir_dict = json.load(fh)
    
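if len(dir_dict) > 0:
    MB = 1000000  # sizes are treated as bytes and shown in megabytes
    n_files = len(dir_dict)
    # show the metadata fields of the first entry as a sample
    k1 = dir_dict[list(dir_dict.keys())[0]]
    print('%i filenames:'%(n_files))
    for k in k1.keys():
        print('\t',k)
    print()
    # one line per file: "directory/filename" and its size
    for k, v in dir_dict.items():
        print('%50s: %6i MB'%(k, v['size'] / MB))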

View json data in the Raw (If Internet Allows):
http://openslide.cs.cmu.edu/download/openslide-testdata/index.json

with temporary directory
	 /var/folders/gf/ybz3wzn55m139k_4dw2v_tvm0000gn/T/tmp3r6a28hy 

48 filenames:
	 description
	 format
	 license
	 sha256
	 size

                       Aperio/CMU-1-JP2K-33005.svs:    132 MB
                     Aperio/CMU-1-Small-Region.svs:      1 MB
                                  Aperio/CMU-1.svs:    177 MB
                                  Aperio/CMU-2.svs:    390 MB
                                  Aperio/CMU-3.svs:    253 MB
                           Aperio/JP2K-33003-1.svs:     63 MB
                           Aperio/JP2K-33003-2.svs:    289 MB
                           Generic-TIFF/CMU-1.tiff:    204 MB
                           Hamamatsu-vms/CMU-1.zip:    646 MB
                           Hamamatsu-vms/CMU-2.zip:   1216 MB
                           Hamamatsu-vms/CMU-3.zip:    951 MB
                              Hamamatsu/CMU-1.nd
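
The keys in the listing have the form `directory/filename`, so the total download size per directory can be estimated before anything is fetched. A minimal sketch, assuming the `size` field is in bytes (as the MB conversion above implies); `size_by_directory` is a hypothetical helper name.

```python
from collections import defaultdict


def size_by_directory(index):
    """Sum the 'size' field of the index entries per top-level directory.

    `index` is a dict shaped like dir_dict: {"Dir/file": {"size": int, ...}}.
    """
    totals = defaultdict(int)
    for name, meta in index.items():
        directory = name.split('/')[0]  # e.g. "Aperio" from "Aperio/CMU-1.svs"
        totals[directory] += meta['size']
    return dict(totals)
```

With the notebook's variables this could be used as `size_by_directory(dir_dict)`, dividing each value by `MB` for display.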

<a id="dwnld_svs"></a>
****
## How to download one file:
```python
data_download = 'http://openslide.cs.cmu.edu/download/openslide-testdata/'
file_to_download = data_download + 'Aperio/CMU-1-Small-Region.svs'
# target_dir must already be defined (see the "Download SVS example files" cell)
destination_file_name = os.path.join(target_dir, 'CMU-1-Small-Region.svs')
# destination_file_name = '../data/Aperio/CMU-1-Small-Region.svs'
print(file_to_download)

if os.path.isdir(target_dir):
    print('dir found for', destination_file_name)

    tuple_of_stuff = urllib.request.urlretrieve(file_to_download, destination_file_name)
    print(tuple_of_stuff)
```
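
Each index entry lists a `sha256` field, so a finished download can be checked against it. This is a sketch only: it assumes the field holds a hexadecimal digest string, and `sha256_of_file` and `matches_index` are hypothetical helper names.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        # read 1 MiB at a time so large slides are never held in memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_index(path, index_entry):
    """Compare a local file with the 'sha256' value of its index entry.

    Assumes index_entry['sha256'] is a hex digest string.
    """
    return sha256_of_file(path) == index_entry['sha256'].lower()
```

For the file above, the check might read `matches_index(destination_file_name, dir_dict['Aperio/CMU-1-Small-Region.svs'])`.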
****

In [4]:
# Generic-TIFF/CMU-1.tiff
tiff_dir = '../../DigiPath_MLTK_data/Tiff'
# create the directory (and any parents) if it does not exist yet
os.makedirs(tiff_dir, exist_ok=True)

In [8]:
# Get the lone TIFF
data_download = 'http://openslide.cs.cmu.edu/download/openslide-testdata/'
file_to_download = data_download + 'Generic-TIFF/CMU-1.tiff'
destination_file_name = os.path.join(tiff_dir, 'CMU-1.tiff')

print(file_to_download)

if os.path.isdir(tiff_dir):
    print('dir found for', destination_file_name)

    tuple_of_stuff = urllib.request.urlretrieve(file_to_download, destination_file_name)
    print(tuple_of_stuff)

http://openslide.cs.cmu.edu/download/openslide-testdata/Generic-TIFF/CMU-1.tiff
dir found for ../../DigiPath_MLTK_data/Tiff/CMU-1.tiff
('../../DigiPath_MLTK_data/Tiff/CMU-1.tiff', <http.client.HTTPMessage object at 0x103b12c18>)


## Download SVS example files
    (Only need to run once)
[Top](#top) <br>

In [3]:
target_dir = '../../DigiPath_MLTK_data/Aperio'

# create the directory (and any parents) if it does not exist yet
os.makedirs(target_dir, exist_ok=True)
    
Aperio_data_files_list = []

for k, v in dir_dict.items():
    d_in_q = k.split('/')[0]
    if d_in_q == 'Aperio':
        print('%50s: %6i MB'%(k, v['size'] / MB))
        # build the URL by concatenation; os.path.join is meant for file paths
        Aperio_data_files_list.append(data_download + k)
        
for file_to_download in Aperio_data_files_list:
    target_file_name = file_to_download.split('/')[-1]
    target_file_name = os.path.join(target_dir, target_file_name)
    t0 = time.time()
    print(target_file_name)
    tuple_of_stuff = urllib.request.urlretrieve(file_to_download, target_file_name)
    print(tuple_of_stuff)
    print('%0.3f seconds'%(time.time() - t0))

                       Aperio/CMU-1-JP2K-33005.svs:    132 MB
                     Aperio/CMU-1-Small-Region.svs:      1 MB
                                  Aperio/CMU-1.svs:    177 MB
                                  Aperio/CMU-2.svs:    390 MB
                                  Aperio/CMU-3.svs:    253 MB
                           Aperio/JP2K-33003-1.svs:     63 MB
                           Aperio/JP2K-33003-2.svs:    289 MB
../../DigiPath_MLTK_data/Aperio/CMU-1-JP2K-33005.svs
('../../DigiPath_MLTK_data/Aperio/CMU-1-JP2K-33005.svs', <http.client.HTTPMessage object at 0x10ec20a58>)
44.536 seconds
../../DigiPath_MLTK_data/Aperio/CMU-1-Small-Region.svs
('../../DigiPath_MLTK_data/Aperio/CMU-1-Small-Region.svs', <http.client.HTTPMessage object at 0x10ec20908>)
0.797 seconds
../../DigiPath_MLTK_data/Aperio/CMU-1.svs
('../../DigiPath_MLTK_data/Aperio/CMU-1.svs', <http.client.HTTPMessage object at 0x10ec20630>)
60.383 seconds
../../DigiPath_MLTK_data/Aperio/CMU-2.svs
('../../DigiPath_MLTK
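
Since these downloads take up to a minute each and only need to run once, re-running the cell can be made cheap by skipping files that are already present. A minimal sketch, assuming the index `size` field is the file size in bytes; `download_if_missing` and its `fetch` parameter are hypothetical, with `fetch` defaulting to `urllib.request.urlretrieve`.

```python
import os
import urllib.request


def download_if_missing(url, dest, expected_size=None,
                        fetch=urllib.request.urlretrieve):
    """Download `url` to `dest` unless a complete copy already exists.

    Returns True if a download happened, False if it was skipped.
    A file counts as complete when it exists and, if `expected_size`
    is given, has exactly that many bytes.
    """
    if os.path.isfile(dest):
        if expected_size is None or os.path.getsize(dest) == expected_size:
            return False
    fetch(url, dest)
    return True
```

Inside the loop above, the `urlretrieve` call could be swapped for `download_if_missing(file_to_download, target_file_name)`; passing the index size as `expected_size` also catches partial files left by an interrupted run.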

## Download all Sample Data

In [2]:
%whos

Variable                 Type             Data/Info
---------------------------------------------------
MB                       int              1000000
autopep8                 module           <module 'autopep8' from '<...>te-packages/autopep8.py'>
data_download            str              http://openslide.cs.cmu.e<...>nload/openslide-testdata/
dir_dict                 dict             n=48
dir_file                 str              http://openslide.cs.cmu.e<...>slide-testdata/index.json
fh                       TextIOWrapper    <_io.TextIOWrapper name='<...>ode='r' encoding='UTF-8'>
json                     module           <module 'json' from '/Lib<...>hon3.7/json/__init__.py'>
k                        str              Zeiss/Zeiss-4-Mosaic.zvi
k1                       dict             n=5
n_files                  int              48
openslide                module           <module 'openslide' from <...>s/openslide/__init__.py'>
os                       module           <module 'os

In [None]:
Mirax_target_dir = '../../DigiPath_MLTK_data/Mirax'

# create the directory (and any parents) if it does not exist yet
os.makedirs(Mirax_target_dir, exist_ok=True)
    
Mirax_data_files_list = []

for k, v in dir_dict.items():
    d_in_q = k.split('/')[0]
    if d_in_q == 'Mirax':
        print('%50s: %6i MB'%(k, v['size'] / MB))
        # build the URL by concatenation; os.path.join is meant for file paths
        Mirax_data_files_list.append(data_download + k)
        
for file_to_download in Mirax_data_files_list:
    target_file_name = file_to_download.split('/')[-1]
    target_file_name = os.path.join(Mirax_target_dir, target_file_name)
    t0 = time.time()
    print(target_file_name)
    tuple_of_stuff = urllib.request.urlretrieve(file_to_download, target_file_name)
    print(tuple_of_stuff)
    print('%0.3f seconds'%(time.time() - t0))
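
The Aperio and Mirax cells differ only in the directory name, so the selection step can be factored into one function. The sketch below is one possible shape, not the notebook's own code; `plan_downloads` is a hypothetical helper that only builds the list of (URL, local path) pairs and downloads nothing.

```python
import os


def plan_downloads(index, directory, base_url, target_dir):
    """Return (url, local_path) pairs for index entries under `directory`.

    `index` is shaped like dir_dict, `base_url` must end with '/',
    and `target_dir` is the local folder that will receive the files.
    """
    plan = []
    for name in index:
        parts = name.split('/')
        if parts[0] == directory:
            # remote URL keeps the full key; local file keeps only the last part
            plan.append((base_url + name, os.path.join(target_dir, parts[-1])))
    return plan
```

For example, `plan_downloads(dir_dict, 'Mirax', data_download, Mirax_target_dir)` would produce the pairs that the loop above passes to `urlretrieve`.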

In [5]:
os.listdir('../../DigiPath_MLTK_data/')

['.DS_Store',
 'module_test',
 'out_to_test',
 'Aperio',
 'bad_test_images',
 'Tiff']