Digital TV Coverage By Postcode in the U.K.

UK Digital TV Coverage

UK Digital TV coverage data and the scripts used to collect it.

Getting the Data

  1. Download the Data:

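    # sample code: fetch the detailed coverage page for each postcode
    # assumes postcodes_ons is a data frame of UK postcodes with a `pcd` column
    library(httr)
    library(rvest)   # rvest also re-exports the %>% pipe
    
    for (i in seq_len(nrow(postcodes_ons))) {
      # turn "AB1 0AA" into "AB1+0AA" for the URL path
      pc <- paste(strsplit(postcodes_ons$pcd[i], " ")[[1]], collapse = "+")
      request <- httr::GET(paste0("http://www.digitaluk.co.uk/coveragechecker/main/display/detailed/", pc, "/NA/0"))
    
      # html() is deprecated in rvest; read_html() parses the response body
      webpage <- read_html(httr::content(request, as = "text", encoding = "UTF-8"))
      # an #error-frame element on the page is treated as "no data for this postcode"
      if (length(webpage %>% html_nodes("#error-frame")) != 0)
      ...
    
    }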
    
    
    • Given there are 2.5 million postcodes, run multiple instances, each on a different slice of the postcode list. For instance, if a page takes 1 second to return, a single instance needs roughly 2,500,000 seconds, i.e. about 694 hours or nearly 29 days, to download the data. (See here for the basic installs needed to set up an R-based scraper on Ubuntu.)
    
    nohup Rscript downloader.R &
    
    
  2. Concatenate all the error files and put all the html files in a single folder.

    cat *error > errors
    
    
  3. Parse the Data:

    • Run converter.py with the folder containing the HTML files as the source folder (see the sample HTML files folder). The Python script produces output.csv (you can change the name of the output file).
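
The planning behind the download step can be sketched in Python, the language of converter.py. This is a minimal sketch, not the repository's downloader: it makes no HTTP requests, the helper names `coverage_url`, `split_for_instances` and `estimated_hours` are hypothetical, and it assumes the postcode list is a plain Python list standing in for the `pcd` column of `postcodes_ons`. It rebuilds the coverage-checker URL the same way the R loop does, splits the postcodes into one slice per instance, and reproduces the time estimate at 1 second per page.

```python
import math

# URL pattern taken from the R sample; the service may no longer respond at this address.
BASE_URL = "http://www.digitaluk.co.uk/coveragechecker/main/display/detailed/"


def coverage_url(postcode: str) -> str:
    """Return the detailed-coverage URL for one postcode ("AB1 0AA" -> ".../AB1+0AA/NA/0")."""
    # Matches the R strsplit/paste(collapse = "+") step for typical postcodes: spaces become "+".
    return BASE_URL + postcode.replace(" ", "+") + "/NA/0"


def split_for_instances(postcodes: list, n_instances: int) -> list:
    """Split the postcode list into at most n_instances contiguous, near-equal slices."""
    # max(1, ...) keeps the range step positive when the list is empty.
    size = max(1, math.ceil(len(postcodes) / n_instances))
    return [postcodes[i:i + size] for i in range(0, len(postcodes), size)]


def estimated_hours(n_pages: int, seconds_per_page: float = 1.0, n_instances: int = 1) -> float:
    """Wall-clock hours if each instance fetches its pages one after another."""
    return n_pages * seconds_per_page / n_instances / 3600


if __name__ == "__main__":
    # Illustrative postcodes only.
    sample = ["AB1 0AA", "AB1 0AB", "AB1 0AD"]
    print(coverage_url(sample[0]))
    print(split_for_instances(sample, 2))  # [['AB1 0AA', 'AB1 0AB'], ['AB1 0AD']]
    print(round(estimated_hours(2_500_000)))  # 694 hours, about 29 days
```

With 10 instances the same estimate drops to about 69 hours (`estimated_hours(2_500_000, n_instances=10)`), assuming the server keeps responding in about 1 second per page under the extra load.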

Data

  1. errors: All the postcodes for which no data are returned.
  2. output.csv (Harvard DVN Link): Data on the postcodes for which data are returned:
    • Postal code of the address: postal.code
    • Quality of TV Signal: quality.terrestrial.tv.signal
    • Transmitter name: transmitter.name
    • Transmitter region: transmitter.region
    • Digital services available through the aerial: service.* (e.g. service.bt_vision, service.freeview). Data take values 0 and 1, indicating whether or not a service is available.
    • Channels available: channel.* (entertainment, hd, childrens, news, adult, text stream, radio, etc.).
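
A quick sanity check on output.csv is to count how many postcodes have each service. The sketch below uses only Python's standard library; the function name `service_availability` is hypothetical, and it assumes the file has a header row with the column names listed above and that every `service.*` column is written as the integers 0/1. The inline rows in the demo are made up for illustration and are not taken from the real data.

```python
import csv
import io


def service_availability(csv_file):
    """Return (row count, {service column: number of rows where it equals 1}).

    Assumes service.* columns are coded as the strings "0"/"1"; other values count as unavailable.
    """
    reader = csv.DictReader(csv_file)
    service_cols = [c for c in (reader.fieldnames or []) if c.startswith("service.")]
    counts = {c: 0 for c in service_cols}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for col in service_cols:
            if (row[col] or "").strip() == "1":
                counts[col] += 1
    return n_rows, counts


if __name__ == "__main__":
    # Made-up rows for illustration only.
    demo = io.StringIO(
        "postal.code,transmitter.name,service.freeview,service.bt_vision\n"
        "AA1 1AA,Example,1,0\n"
        "AA1 1AB,Example,1,1\n"
    )
    print(service_availability(demo))  # (2, {'service.freeview': 2, 'service.bt_vision': 1})
```

On the real file, open it with `open("output.csv", newline="")` and pass the file object in place of the `io.StringIO` demo.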

License

Scripts are released under the MIT License.