# NHS Data Access - Patterns, Utilities and Useful Examples

This notebook includes various utilities and examples for accessing and working with NHS open datasets.

## Downloading and Unzipping Files

The following code examples show how to download and work with files.

### Downloading Files in a Linux Environment

One of the easiest ways to download and unpack files in a Linux environment, such as the environment that MyBinder applications run in, is to use the command line tools `wget` and `unzip`.

In [None]:
#Specify the URL from which the file will be downloaded
url='http://www.cqc.org.uk/sites/default/files/21_September_2016_CQC_directory.zip'

#Extract the full filename - split the URL on each / into a list and take the last (-1'th index) list item
fn=url.split('/')[-1] #21_September_2016_CQC_directory.zip
#Extract the first part of the filename - split the filename into a list on the . and take the first (0'th index) item
stub=fn.split('.')[0] #21_September_2016_CQC_directory

#Download the data from the CQC website
!wget -P downloads/ {url}

#Create a temporary download directory if it doesn't already exist
!mkdir -p tmp

#Remove any previous copy of the unzipped file; the -f flag suppresses the error if there is nothing to remove
!rm -f tmp/{stub}.csv

#Unzip the downloaded archive from the downloads directory into the tmp directory
#The -o flag is overkill - if we hadn't deleted the previous copy, it would overwrite any files with the same name
!unzip -o downloads/{fn} -d tmp/

#Create a data directory if it doesn't already exist
!mkdir -p data
#Move the unzipped csv file from the tmp directory to the data directory, renaming it to locations.csv as we do so
#Remove any previous copy first; the -f flag suppresses the error if the file doesn't exist
!rm -f data/locations.csv
!mv tmp/{stub}.csv data/locations.csv
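
The filename handling and extraction steps can also be sketched in pure Python, which may help where `wget` and `unzip` are not available. This is a minimal sketch rather than the notebook's own method: the helper names `filename_and_stub` and `extract_member` are hypothetical, and the download step itself is left out. Note that `os.path.splitext` drops only the final extension, whereas `fn.split('.')[0]` keeps everything before the first dot; the two agree for the CQC filename.

```python
import os
import shutil
import zipfile
from urllib.parse import urlparse


def filename_and_stub(url):
    """Return (filename, stub) for a download URL (hypothetical helper)."""
    # The filename is the last segment of the URL path
    fn = os.path.basename(urlparse(url).path)
    # The stub is the filename without its final extension
    stub = os.path.splitext(fn)[0]
    return fn, stub


def extract_member(zip_path, member, dest):
    """Copy one file out of a zip archive to dest (hypothetical helper)."""
    # Create the destination directory if it doesn't already exist
    os.makedirs(os.path.dirname(dest) or '.', exist_ok=True)
    # Opening dest in 'wb' mode overwrites any previous copy
    with zipfile.ZipFile(zip_path) as z:
        with z.open(member) as src, open(dest, 'wb') as out:
            shutil.copyfileobj(src, out)
    return dest


url = 'http://www.cqc.org.uk/sites/default/files/21_September_2016_CQC_directory.zip'
fn, stub = filename_and_stub(url)
print(fn, stub)
```

Once the archive has been saved as `'downloads/' + fn`, a call such as `extract_member('downloads/' + fn, stub + '.csv', 'data/locations.csv')` would play the role of the `unzip` and `mv` steps, assuming the archive contains a file named after the stub.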

### Downloading and Accessing a File From a Zip Archive in Python

We can download a zip file and then extract files from it using Python commands.

In [None]:
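#Create a function to grab a zip file from an online location and then grab a specified file from inside it
#io.BytesIO wraps the downloaded bytes in a file-like object that zipfile can read
import io, requests, zipfile
def zipgrabber(url, f):
    r = requests.get(url)
    #Stop with an error if the download failed
    r.raise_for_status()
    z = zipfile.ZipFile(io.BytesIO(r.content))
    return z.open(f)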
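
If the name of the file inside the archive is not known in advance, the archive can be inspected before anything is extracted. The following is a minimal sketch, assuming Python 3; `zip_member_names` is a hypothetical helper, and the small archive is built in memory purely for illustration so that the example runs without a network connection.

```python
import io
import zipfile


def zip_member_names(content):
    """List the files in a zip archive held in memory as bytes (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(content)) as z:
        return z.namelist()


# Build a tiny archive in memory to stand in for the bytes of a downloaded zip file
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as z:
    z.writestr('example.csv', 'code,name\nA1,Example Practice\n')
content = buffer.getvalue()

# List the files held in the archive
names = zip_member_names(content)
print(names)

# Open one member by name and read its first line
with zipfile.ZipFile(io.BytesIO(content)) as z:
    first_line = z.open('example.csv').readline()
print(first_line)
```

With a real download, `zip_member_names(requests.get(url).content)` would return the names that can be passed as `f` to `zipgrabber`.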

In [None]:
import pandas as pd

#Download URL
url='http://systems.hscic.gov.uk/data/ods/datadownloads/data-files/xls/epraccur.zip'

#zipgrabber(url,'epraccur.xls') extracts the epraccur.xls file from the zip archive
#The pandas ExcelFile class then wraps the file so that its sheets can be listed and parsed
xl=pd.ExcelFile(zipgrabber(url,'epraccur.xls'))
xl.sheet_names