# How to obtain a water quality dataset for the location and period of interest 

This example notebook explains how to download the water quality data published by DEFRA and combine it into a single dataset, as a starting point for studying water quality in any region of interest in the UK.

In [1]:
import os
import pandas as pd

## 1) Download necessary files 
#### Due to the nature of the dataset, this step cannot be automated

At https://environment.data.gov.uk/water-quality/view/download/new you can find all the water quality data that DEFRA has collected in the UK since 2000.
You will need to download the dataset one year at a time, for each year in the period of interest.
Since a vast amount of data has been collected, it is recommended to investigate each region separately.

Save all the yearly files in a single directory, named `<region code>-<year>.csv` (for example `DC-2000.csv`), which is the pattern the loading code in step 2 expects; rename the files if needed.
In my case, I downloaded the Solent and South Downs (region code SSD) and the Cornwall and Devon (region code DC) datasets from 2000 to 2025.
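Before running step 2, it can help to check that every yearly file is in place, since a single missing year makes the loading loop fail. The sketch below uses a hypothetical helper, `missing_year_files`, and assumes the `<region code>-<year>.csv` naming and the exclusive `end_year` used by `range(start_year, end_year)` in the loading code.

```python
import os
import tempfile


def missing_year_files(dir_files, region_code, start_year, end_year):
    """Return the expected yearly CSV paths that are not present in dir_files.

    Assumes one file per year named '<region_code>-<year>.csv', with end_year
    exclusive (the same convention as range(start_year, end_year)).
    """
    expected = [os.path.join(dir_files, f"{region_code}-{year}.csv")
                for year in range(start_year, end_year)]
    return [path for path in expected if not os.path.isfile(path)]


# Usage example on a throwaway directory holding only two of the three expected years
with tempfile.TemporaryDirectory() as tmp:
    for year in (2000, 2002):
        open(os.path.join(tmp, f"DC-{year}.csv"), "w").close()
    missing = missing_year_files(tmp, "DC", 2000, 2003)
    print([os.path.basename(p) for p in missing])  # ['DC-2001.csv']
```

With the settings of step 2 this would be called as `missing_year_files(dir_files, region_code, start_year, end_year)`; an empty list means every year is available.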

## 2) Create the dataset

In [9]:
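dir_files = "./data/"  # directory containing the yearly CSV files (keep the trailing slash)
start_year = 2000      # first year to load
end_year = 2026        # exclusive: range(start_year, end_year) stops at 2025
region_code = "DC"     # change to the code of your region of interest (e.g. "SSD")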


In [10]:
dfs = {}         # dictionary of yearly DataFrames, keyed by year
df_to_conc = []  # list of yearly DataFrames to concatenate

# Load each yearly CSV once, keep it in the dictionary and add it to the list for concatenation

for year in range(start_year, end_year):
    file_path = f'{dir_files}{region_code}-{year}.csv'
    df_year = pd.read_csv(file_path, engine='python')
    dfs[year] = df_year
    df_to_conc.append(df_year)
    # Now that the dataset is loaded, you can remove the file.
    # This step is necessary if you want to work with GitHub (the files are too big to be pushed to the repository).
    # Uncomment the next line to delete the files from your machine
    #os.remove(file_path)
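With more than 7 million rows for a single region (judging from the row index in the `df_water` output), memory can become a problem. One possible approach, sketched below, is to read only the columns you need and to store repetitive text columns as categoricals. The column names are taken from the `df_water` output; `COLUMNS`, `CATEGORICAL` and `load_lean` are hypothetical names. The sketch uses pandas' default C parser rather than `engine='python'`; the C parser is usually faster, but it is an assumption that your files parse correctly with it.

```python
import io
import pandas as pd

# Hypothetical selection of columns (names as in the CSV header); adapt to your analysis
COLUMNS = ["sample.samplingPoint.notation", "sample.sampleDateTime",
           "determinand.label", "determinand.unit.label", "result"]
# Text columns with few distinct values, stored as categoricals to save memory
CATEGORICAL = ["sample.samplingPoint.notation", "determinand.label", "determinand.unit.label"]


def load_lean(sources, columns=COLUMNS):
    """Read several yearly CSVs keeping only `columns`, then concatenate them."""
    frames = [pd.read_csv(src, usecols=columns) for src in sources]
    df = pd.concat(frames, ignore_index=True)
    # Convert after concatenating so that all years share one set of categories
    # (concatenating categoricals with different categories falls back to object dtype)
    for col in CATEGORICAL:
        if col in df.columns:
            df[col] = df[col].astype("category")
    return df


# Usage example with two tiny in-memory "files" standing in for yearly CSVs
header = ("@id,sample.samplingPoint.notation,sample.sampleDateTime,determinand.label,"
          "result,determinand.unit.label\n")
year_a = io.StringIO(header + "x,SW-70110104,2000-05-02T10:05:00,Temp Water,10.0,cel\n")
year_b = io.StringIO(header + "y,SW-SSN1632,2025-03-07T12:39:00,Oxygen Diss,11.1,mg/l\n")
df_lean = load_lean([year_a, year_b])
print(df_lean.shape)  # (2, 5)
```

On the real files, the input would be the list of paths, e.g. `load_lean([f"{dir_files}{region_code}-{year}.csv" for year in range(start_year, end_year)])`.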


In [11]:
df_water = pd.concat(df_to_conc, ignore_index=True) 



In [12]:
df_water

Unnamed: 0,@id,sample.samplingPoint,sample.samplingPoint.notation,sample.samplingPoint.label,sample.sampleDateTime,determinand.label,determinand.definition,determinand.notation,resultQualifier.notation,result,codedResultInterpretation.interpretation,determinand.unit.label,sample.sampledMaterialType.label,sample.isComplianceSample,sample.purpose.label,sample.samplingPoint.easting,sample.samplingPoint.northing
0,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-70110104,RIVER LIM AT BEACH,2000-05-02T10:05:00,Temp Water,Temperature of Water,76,,10.000,,cel,RIVER / RUNNING SURFACE WATER,False,ENVIRONMENTAL MONITORING STATUTORY (EU DIRECTI...,334223,92129
1,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-70110104,RIVER LIM AT BEACH,2000-05-02T10:05:00,StrepF PMF,Streptococci : Faecal : Presumptive : MF,6423,,290.000,,no/100ml,RIVER / RUNNING SURFACE WATER,False,ENVIRONMENTAL MONITORING STATUTORY (EU DIRECTI...,334223,92129
2,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-70110104,RIVER LIM AT BEACH,2000-05-02T10:05:00,Cond @ 20C,Conductivity at 20 C,62,,421.000,,us/cm,RIVER / RUNNING SURFACE WATER,False,ENVIRONMENTAL MONITORING STATUTORY (EU DIRECTI...,334223,92129
3,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-70110104,RIVER LIM AT BEACH,2000-05-02T10:05:00,F Coli Pre,"Coliforms, Faecal : Presumptive : MF",3461,,2600.000,,no/100ml,RIVER / RUNNING SURFACE WATER,False,ENVIRONMENTAL MONITORING STATUTORY (EU DIRECTI...,334223,92129
4,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-70110104,RIVER LIM AT BEACH,2000-05-02T10:05:00,WethYdy Temp,Weather : Temperature : Previous 24 hours,3026,,4.000,,coded,RIVER / RUNNING SURFACE WATER,False,ENVIRONMENTAL MONITORING STATUTORY (EU DIRECTI...,334223,92129
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7308646,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-SSN1632,SSN1632 DICKFORD WATER AT FURSDON BARN,2025-03-07T12:39:00,Oxygen Diss,"Oxygen, Dissolved as O2",9924,,11.100,,mg/l,RIVER / RUNNING SURFACE WATER,False,MONITORING (NATIONAL AGENCY POLICY),275241,83915
7308647,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-SSN1632,SSN1632 DICKFORD WATER AT FURSDON BARN,2025-03-07T12:39:00,Flow Type,Flow Type,2896,,5.000,,coded,RIVER / RUNNING SURFACE WATER,False,MONITORING (NATIONAL AGENCY POLICY),275241,83915
7308648,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-SSN1632,SSN1632 DICKFORD WATER AT FURSDON BARN,2025-03-07T12:39:00,Alky pH 4.5,Alkalinity to pH 4.5 as CaCO3,162,,22.000,,mg/l,RIVER / RUNNING SURFACE WATER,False,MONITORING (NATIONAL AGENCY POLICY),275241,83915
7308649,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SW-SSN1632,SSN1632 DICKFORD WATER AT FURSDON BARN,2025-03-07T12:39:00,Orthophospht,"Orthophosphate, reactive as P",180,,0.014,,mg/l,RIVER / RUNNING SURFACE WATER,False,MONITORING (NATIONAL AGENCY POLICY),275241,83915
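Each row of `df_water` holds a single measurement, identified by its sampling point (`sample.samplingPoint.notation`), its determinand (`determinand.label`) and its timestamp (`sample.sampleDateTime`). A natural first step in looking at the data is therefore to extract one determinand at one sampling point as a time series. The helper `determinand_series` below is a hypothetical sketch, and the rows in the example are made up in the same layout as the table above.

```python
import pandas as pd


def determinand_series(df, site_notation, determinand):
    """Return a time-indexed Series of `result` for one sampling point and determinand."""
    mask = ((df["sample.samplingPoint.notation"] == site_notation)
            & (df["determinand.label"] == determinand))
    subset = df.loc[mask, ["sample.sampleDateTime", "result"]].copy()
    # Timestamps are ISO 8601 strings such as '2000-05-02T10:05:00'
    subset["sample.sampleDateTime"] = pd.to_datetime(subset["sample.sampleDateTime"])
    return subset.set_index("sample.sampleDateTime")["result"].sort_index()


# Usage example on a few made-up rows mimicking the table above
df_example = pd.DataFrame({
    "sample.samplingPoint.notation": ["SW-70110104", "SW-70110104", "SW-SSN1632"],
    "sample.sampleDateTime": ["2000-06-01T09:00:00", "2000-05-02T10:05:00", "2025-03-07T12:39:00"],
    "determinand.label": ["Temp Water", "Temp Water", "Oxygen Diss"],
    "result": [12.5, 10.0, 11.1],
})
temp_lim = determinand_series(df_example, "SW-70110104", "Temp Water")
print(temp_lim.tolist())  # [10.0, 12.5]
```

On the full dataset, `determinand_series(df_water, "SW-70110104", "Temp Water")` would give the water temperature record at RIVER LIM AT BEACH. Filtering on a single determinand should also keep a single unit (`determinand.unit.label`) in each series, rather than mixing, say, `cel` and `mg/l` values.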


In [None]:
df_water.to_csv(f"{dir_files}{region_code}WaterQuality.csv", index=False)  # index=False: do not write the row index as an extra column
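The `Unnamed: 0` column in the `df_water` output is what pandas produces when it reads a CSV whose first column has no header, which is typically a row index saved by `to_csv`. Writing with `index=False` avoids adding such a column when the combined file is reloaded. The minimal round trip below, on a small made-up DataFrame, shows the difference.

```python
import io
import pandas as pd

df = pd.DataFrame({"determinand.label": ["Temp Water", "Oxygen Diss"], "result": [10.0, 11.1]})

# Default: the row index is written as an extra first column with an empty header ...
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))     # ['Unnamed: 0', 'determinand.label', 'result']

# ... whereas index=False writes only the data columns, so the file reads back unchanged
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(without_index.columns))  # ['determinand.label', 'result']
```

If a file was already saved with its index, `pd.read_csv(path, index_col=0)` reads that first column back as the index instead of as `Unnamed: 0`.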
