# How to obtain a water quality dataset for the location and period of interest 

This example notebook shows how to obtain water quality data from DEFRA and combine it into a single dataset, as a starting point for looking at water quality in any region of interest in the UK.

In [1]:
import numpy as np
import sys
import pandas as pd
import os

## 1) Download the necessary files
#### Due to the nature of the dataset, this step cannot be automated

At https://environment.data.gov.uk/water-quality/view/download/new you can find all the water quality data that DEFRA has collected in the UK since 2000.
You will need to download the dataset year by year for the period of interest.
Because a vast amount of data has been collected, it is recommended to investigate each region separately.

Save the datasets in a directory.
In my case, I downloaded the Solent and South Downs datasets and the Cornwall and Devon datasets from 2000 to 2025, i.e. region codes SSD and DC.
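
Before loading anything, it can help to confirm that every yearly file is actually in the directory. The sketch below is only an illustration: `find_missing_files` is a hypothetical helper, and it assumes the files are saved as `<region_code>-<year>.csv`, the naming pattern that the loading cell in this notebook expects.

```python
from pathlib import Path


def find_missing_files(dir_files, region_code, start_year, end_year):
    """Return the expected yearly CSV paths that are not present on disk.

    end_year is exclusive, mirroring Python's range().
    """
    expected = [
        Path(dir_files) / f"{region_code}-{year}.csv"
        for year in range(start_year, end_year)
    ]
    return [path for path in expected if not path.is_file()]


# Assumed directory and region code; adjust to your own download location
missing = find_missing_files("./data/", "SSD", 2000, 2026)
if missing:
    print("Missing files:", ", ".join(path.name for path in missing))
else:
    print("All yearly files are present.")
```

Running this first avoids a `FileNotFoundError` partway through the loading loop if one year was not downloaded.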

## 2) Create the dataset

In [2]:
dir_files = "./data/"  # directory containing the downloaded datasets
start_year = 2000  # first year to load
end_year = 2026  # exclusive upper bound: the last year loaded is 2025
region_code = "SSD"  # change it to the code of your region of interest


In [3]:
dfs = {}  # yearly DataFrames, keyed by year
df_to_conc = []  # list of DataFrames to concatenate

# Load each yearly CSV once, store it in the dictionary and queue it for concatenation

for year in range(start_year, end_year):
    file_path = f'{dir_files}{region_code}-{year}.csv'
    dfs[year] = pd.read_csv(file_path, engine='python')
    df_to_conc.append(dfs[year])  # reuse the loaded DataFrame instead of reading the file twice
    # Once a file is loaded, it can be removed from disk.
    # This is necessary if you want to work with GitHub (the files are too big to be pushed to the repository).
    # Uncomment the next line to delete the files; leave it commented to keep them on your machine
    #os.remove(file_path)


In [4]:
df_water = pd.concat(df_to_conc, ignore_index=True) 
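
If you want each row to remember which yearly file it came from, `pd.concat` also accepts a dictionary such as `dfs` and uses its keys as an extra index level. The following is a minimal sketch on a toy dictionary; the `file_year` and `row` level names and the toy values are illustrative assumptions, not part of the DEFRA data.

```python
import pandas as pd

# Toy stand-in for the `dfs` dictionary: one small DataFrame per year
yearly = {
    2000: pd.DataFrame({"result": [3.0, 0.5]}),
    2001: pd.DataFrame({"result": [7.15]}),
}

# The dictionary keys become the outer index level, named "file_year" here
combined = pd.concat(yearly, names=["file_year", "row"])

# Turn the year into an ordinary column and rebuild a simple 0..n-1 index
combined = combined.reset_index(level="file_year").reset_index(drop=True)

# Sanity check: no rows were lost or duplicated by the concatenation
assert len(combined) == sum(len(df) for df in yearly.values())
print(combined)
```

The same row-count check can be applied to `df_water` and `dfs` after the concatenation above.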



In [5]:
df_water

Unnamed: 0,@id,sample.samplingPoint,sample.samplingPoint.notation,sample.samplingPoint.label,sample.sampleDateTime,determinand.label,determinand.definition,determinand.notation,resultQualifier.notation,result,codedResultInterpretation.interpretation,determinand.unit.label,sample.sampledMaterialType.label,sample.isComplianceSample,sample.purpose.label,sample.samplingPoint.easting,sample.samplingPoint.northing
0,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-19801A08,APSLEY FARM BOREHOLE A,2000-03-22T10:30:00,BOD ATU,BOD : 5 Day ATU,85,<,3.00,,mg/l,UNCODED,False,WASTE MONITORING (AGENCY AUDIT - PERMIT),442350,146570
1,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-19801A08,APSLEY FARM BOREHOLE A,2000-03-22T10:30:00,Ammonia(N),Ammoniacal Nitrogen as N,111,<,0.50,,mg/l,UNCODED,False,WASTE MONITORING (AGENCY AUDIT - PERMIT),442350,146570
2,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-19801A08,APSLEY FARM BOREHOLE A,2000-03-22T10:30:00,pH,pH,61,,7.15,,phunits,UNCODED,False,WASTE MONITORING (AGENCY AUDIT - PERMIT),442350,146570
3,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-19801A08,APSLEY FARM BOREHOLE A,2000-03-22T10:30:00,Chloride Ion,Chloride,172,,14.10,,mg/l,UNCODED,False,WASTE MONITORING (AGENCY AUDIT - PERMIT),442350,146570
4,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-19801A08,APSLEY FARM BOREHOLE A,2000-03-22T10:30:00,COD as O2,Chemical Oxygen Demand :- {COD},92,,27.00,,mg/l,UNCODED,False,WASTE MONITORING (AGENCY AUDIT - PERMIT),442350,146570
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4280997,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-Y0017589,R E YAR AT FORD FARM,2025-03-25T11:13:00,Alky pH 4.5,Alkalinity to pH 4.5 as CaCO3,162,,160.00,,mg/l,RIVER / RUNNING SURFACE WATER,False,MONITORING (NATIONAL AGENCY POLICY),451487,79385
4280998,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-Y0017589,R E YAR AT FORD FARM,2025-03-25T11:13:00,pH,pH,61,,8.00,,phunits,RIVER / RUNNING SURFACE WATER,False,MONITORING (NATIONAL AGENCY POLICY),451487,79385
4280999,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-Y0017593,STICKWORTH HALL STW,2025-01-20T10:10:00,Ammonia(N),Ammoniacal Nitrogen as N,111,,11.00,,mg/l,FINAL SEWAGE EFFLUENT,True,COMPLIANCE AUDIT (PERMIT),453860,85200
4281000,http://environment.data.gov.uk/water-quality/d...,http://environment.data.gov.uk/water-quality/i...,SO-Y0017593,STICKWORTH HALL STW,2025-04-02T12:40:00,BOD ATU,BOD : 5 Day ATU,85,,13.00,,mg/l,FINAL SEWAGE EFFLUENT,True,COMPLIANCE AUDIT (PERMIT),453860,85200
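
To narrow the combined table down to a period of interest, the `sample.sampleDateTime` column can be parsed into timestamps and used as a filter. The sketch below is one possible approach: `select_period` is a hypothetical helper, and the small `demo` table only copies a few values from the output above so that the example runs on its own.

```python
import pandas as pd


def select_period(df, start, end, determinand=None):
    """Return rows sampled in [start, end), optionally for one determinand label."""
    sample_time = pd.to_datetime(df["sample.sampleDateTime"])
    mask = (sample_time >= pd.Timestamp(start)) & (sample_time < pd.Timestamp(end))
    if determinand is not None:
        mask &= df["determinand.label"] == determinand
    return df.loc[mask]


# Tiny stand-in table using the column names shown in the output above
demo = pd.DataFrame({
    "sample.sampleDateTime": [
        "2000-03-22T10:30:00",
        "2000-03-22T10:30:00",
        "2025-03-25T11:13:00",
    ],
    "determinand.label": ["pH", "Chloride Ion", "pH"],
    "result": [7.15, 14.10, 8.00],
})

# pH measurements taken during 2025
print(select_period(demo, "2025-01-01", "2026-01-01", determinand="pH"))
```

On the real data, the call would be `select_period(df_water, ...)` with the dates and determinand label you are interested in.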


In [6]:
# index=False avoids writing the row index as an extra unnamed column
df_water.to_csv(f"{dir_files}{region_code}WaterQualityData.csv", index=False)
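
In a later session, the saved file can be read back without repeating the steps above. This is a minimal sketch: `load_saved_dataset` is a hypothetical helper, and it drops an `Unnamed: 0` column in case the row index was written out when the file was saved.

```python
import pandas as pd


def load_saved_dataset(csv_path):
    """Read the combined CSV back, parsing the sampling time as a datetime."""
    df = pd.read_csv(csv_path, parse_dates=["sample.sampleDateTime"])
    # If the row index was saved with the data, pandas reads it back as
    # a column called "Unnamed: 0"; drop it when present.
    return df.drop(columns=["Unnamed: 0"], errors="ignore")


# Example call, assuming the path used in this notebook:
# df_water = load_saved_dataset("./data/SSDWaterQualityData.csv")
```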
