# Lake chemistry data: Preprocessing

**Author: Jakob Nyström, 5563**

This notebook contains code for preprocessing the lake chemistry data from the Swedish lake monitoring program, hosted by SLU. We perform basic data cleaning and adjust or recalculate some of the variables for later analysis.

In [103]:
# Import required packages
import pandas as pd
import numpy as np
import pyproj

In [104]:
# Load Black for code formatting
import jupyter_black

jupyter_black.load(lab=False)

## 1. Load and inspect data

**Summary:** The data set contains 8974 observations and 52 columns. It covers 110 lakes, with between 34 and 87 measurements per lake. Some columns contain many nulls, but the variables identified as most interesting in initial discussions have good coverage.

In [105]:
# Load raw data
df_lake = pd.read_csv("../data/LakeChem 2001-2022 Surface Season cleaned.csv")
df_lake.head()

Unnamed: 0,MD-MVM Id,Nationellt övervakningsstations-ID,Övervakningsstation,Stationskoordinat N/X,Stationskoordinat E/Y,Län,Kommun,MS_CD C3,ProvId,Provdatum,...,Tot-P (µg/l P),Si (mg/l),Fe (µg/l),Al (µg/l),Al_s (µg/l),Syrgashalt (mg/l O2),Siktdjup (m),Siktdjup med kikare (m),Siktdjup utan kikare (m),Vattentemperatur (°C)
0,54,262403.0,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,22480,2001-03-28,...,8.0,2.27,40.0,,85.0,,5.5,,,0.6
1,54,262403.0,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,22481,2001-05-21,...,5.0,1.62,43.0,,80.0,,7.6,,,10.2
2,54,262403.0,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,22482,2001-08-22,...,5.0,1.73,19.0,,45.0,,6.0,,,18.6
3,54,262403.0,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,22483,2001-10-15,...,8.0,1.72,41.0,,50.0,,7.0,,,10.3
4,54,262403.0,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,22484,2002-02-26,...,8.0,1.89,36.0,,55.0,,,,,1.5


In [106]:
# Inspect number of rows and columns
df_lake.shape

(8974, 52)

In [107]:
len(df_lake["MD-MVM Id"].unique())

110
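The per-lake sample counts and null coverage quoted in the summary could be derived along the following lines. This is a minimal sketch on a small made-up frame (`toy` and its values are illustrative assumptions, not rows from the real data); on the real data the same calls would be applied to `df_lake`.

```python
import pandas as pd

# Toy stand-in for df_lake; values are invented for illustration only
toy = pd.DataFrame(
    {
        "MD-MVM Id": [54, 54, 54, 71, 71],
        "TOC (mg/l C)": [5.3, None, 4.9, 7.1, 6.8],
        "Al (µg/l)": [None, None, None, 12.0, None],
    }
)

# Number of samples per lake identifier, then the range across lakes
samples_per_lake = toy.groupby("MD-MVM Id").size()
print(samples_per_lake.min(), samples_per_lake.max())

# Share of missing values per column, largest first
null_share = toy.isna().mean().sort_values(ascending=False)
print(null_share)
```

Grouping on the identifier rather than the station name avoids merging lakes that share a name.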

In [110]:
# Create a list with all lake names
# Note that a few lakes have the same name, but different identifiers
list_of_lakes = df_lake["Övervakningsstation"].unique().tolist()
print(list_of_lakes)

['Spjutsjön', 'Edasjön', 'Mäsen', 'Hällsjön', 'Gipsjön', 'Översjön', 'Remmarsjön', 'Hällvattnet', 'Siggeforasjön', 'Bäen', 'Valkeajärvi', 'Valasjön', 'Skärsjön', 'Latnjajaure', 'Sännen', 'Svinarydsjön', 'Örsjön', 'Harasjön', 'Svartesjön', 'Stora Skärsjön', 'Fyrsjön', 'Älgarydssjön', 'Björken', 'Älgsjön', 'Rammsjön', 'Lilla Öresjön', 'Överudssjön', 'N. Yngern', 'Stora Envättern', 'Hjärtsjön', 'Fiolen', 'Storasjö', 'Ulvsjön', 'Bysjön', 'Dagarn', 'Övre Skärsjön', 'Hinnasjön', 'Limmingsjön', 'Granvattnet', 'Rotehogstjärnen', 'Hökesjön', 'Allgjuttern', 'Bergträsket', 'Brunnsjön', 'Tomeshultagölen', 'Stora Tresticklan', 'Brännträsket', 'St. Lummersjön', 'Vuolgamjaure', 'Stor-Arasjön', 'Öjsjön', 'Njalakjaure', 'Täftesträsket', 'Stora Gryten', 'Skärgölen', 'Grissjön', 'Alsjön', 'Degervattnet', 'Humsjön', 'Stor-Björsjön', 'Sangen', 'Stor-Backsjön', 'Västra Solsjön', 'Fräcksjön', 'Tärnan', 'Tängersjö', 'Fjärasjö', 'Louvvajaure', 'Jutsajaure', 'Övre Fjätsjön', 'Pahajärvi', 'Tväringen', 'Sidensjön

## 2. Data cleaning 

We rename columns to English names and drop columns that are not needed for any analysis or joins. We convert the sample date column to datetime format. Coordinates are converted to standard decimal degrees (DD) format.

In [111]:
print(list(df_lake.columns))

['MD-MVM Id', 'Nationellt övervakningsstations-ID', 'Övervakningsstation', 'Stationskoordinat N/X', 'Stationskoordinat E/Y', 'Län', 'Kommun', 'MS_CD C3', 'ProvId', 'Provdatum', 'Provtagningsår', 'Provtagningsmånad', 'Provtagningsdag', 'SeasonType', 'Season', 'Season priority', 'Season cleaning', 'Max provdjup (m)', 'TOC (mg/l C)', 'DOC (mg/l C)', 'Tot-N_ps (µg/l N)', 'Tot-N_TNb (µg/l N)', 'Abs_F 254 (/5cm)', 'Abs_F 365 (/5cm)', 'Abs_F 420 (/5cm)', 'Abs_F 436 (/m)', 'Abs_OF 420 (/5cm)', 'Turb_FNU (FNU)', 'Kfyll (µg/l)', 'pH', 'Kond_25 (mS/m)', 'Alk/Acid (mekv/l)', 'Ca (mekv/l)', 'Mg (mekv/l)', 'Na (mekv/l)', 'K (mekv/l)', 'SO4 (mekv/l)', 'Cl (mekv/l)', 'F (mekv/l)', 'NH4-N (µg/l N)', 'NO2+NO3-N (µg/l N)', 'PO4-P (µg/l P)', 'Tot-P (µg/l P)', 'Si (mg/l)', 'Fe (µg/l)', 'Al (µg/l)', 'Al_s (µg/l)', 'Syrgashalt (mg/l O2)', 'Siktdjup (m)', 'Siktdjup med kikare (m)', 'Siktdjup utan kikare (m)', 'Vattentemperatur (°C)']


### 2.1. Drop unwanted columns

In [112]:
# Columns that will not be used for any analysis
cols_to_drop = [
    "Nationellt övervakningsstations-ID",
    "ProvId",
    "Season priority",
    "Season cleaning",
    "Abs_F 365 (/5cm)",
    "Abs_F 436 (/m)",
    "Abs_OF 420 (/5cm)",
]

# Drop these columns from the dataframe
df_lake = df_lake.drop(cols_to_drop, axis="columns")
df_lake.head()

Unnamed: 0,MD-MVM Id,Övervakningsstation,Stationskoordinat N/X,Stationskoordinat E/Y,Län,Kommun,MS_CD C3,Provdatum,Provtagningsår,Provtagningsmånad,...,Tot-P (µg/l P),Si (mg/l),Fe (µg/l),Al (µg/l),Al_s (µg/l),Syrgashalt (mg/l O2),Siktdjup (m),Siktdjup med kikare (m),Siktdjup utan kikare (m),Vattentemperatur (°C)
0,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2001-03-28,2001,3,...,8.0,2.27,40.0,,85.0,,5.5,,,0.6
1,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2001-05-21,2001,5,...,5.0,1.62,43.0,,80.0,,7.6,,,10.2
2,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2001-08-22,2001,8,...,5.0,1.73,19.0,,45.0,,6.0,,,18.6
3,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2001-10-15,2001,10,...,8.0,1.72,41.0,,50.0,,7.0,,,10.3
4,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2002-02-26,2002,2,...,8.0,1.89,36.0,,55.0,,,,,1.5


### 2.2. Rename columns to English

In [113]:
# Columns to be renamed
column_mapper = {
    "Övervakningsstation": "Survey station",
    "Stationskoordinat N/X": "Latitude",
    "Stationskoordinat E/Y": "Longitude",
    "Län": "County",
    "Kommun": "Municipality",
    "Provdatum": "Sample date",
    "Provtagningsår": "Sample year",
    "Provtagningsmånad": "Sample month",
    "Provtagningsdag": "Sample day",
    "Max provdjup (m)": "Max sample depth (m)",
    "Kfyll (µg/l)": "C_phyll (µg/l)",
    "Kond_25 (mS/m)": "Cond_25 (mS/m)",
    "Syrgashalt (mg/l O2)": "Oxygen (mg/l O2)",
    "Siktdjup (m)": "Secchi depth (m)",
    "Siktdjup med kikare (m)": "Secchi depth binoculars (m)",
    "Siktdjup utan kikare (m)": "Secchi depth no binoculars (m)",
    "Vattentemperatur (°C)": "Water temp (°C)",
}

# Rename these columns and check result
df_lake = df_lake.rename(columns=column_mapper)
df_lake.head()

Unnamed: 0,MD-MVM Id,Survey station,Latitude,Longitude,County,Municipality,MS_CD C3,Sample date,Sample year,Sample month,...,Tot-P (µg/l P),Si (mg/l),Fe (µg/l),Al (µg/l),Al_s (µg/l),Oxygen (mg/l O2),Secchi depth (m),Secchi depth binoculars (m),Secchi depth no binoculars (m),Water temp (°C)
0,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2001-03-28,2001,3,...,8.0,2.27,40.0,,85.0,,5.5,,,0.6
1,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2001-05-21,2001,5,...,5.0,1.62,43.0,,80.0,,7.6,,,10.2
2,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2001-08-22,2001,8,...,5.0,1.73,19.0,,45.0,,6.0,,,18.6
3,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2001-10-15,2001,10,...,8.0,1.72,41.0,,50.0,,7.0,,,10.3
4,54,Spjutsjön,6722638,524356,Dalarnas län,Falun,WA42559716,2002-02-26,2002,2,...,8.0,1.89,36.0,,55.0,,,,,1.5


In [114]:
print(list(df_lake.columns))

['MD-MVM Id', 'Survey station', 'Latitude', 'Longitude', 'County', 'Municipality', 'MS_CD C3', 'Sample date', 'Sample year', 'Sample month', 'Sample day', 'SeasonType', 'Season', 'Max sample depth (m)', 'TOC (mg/l C)', 'DOC (mg/l C)', 'Tot-N_ps (µg/l N)', 'Tot-N_TNb (µg/l N)', 'Abs_F 254 (/5cm)', 'Abs_F 420 (/5cm)', 'Turb_FNU (FNU)', 'C_phyll (µg/l)', 'pH', 'Cond_25 (mS/m)', 'Alk/Acid (mekv/l)', 'Ca (mekv/l)', 'Mg (mekv/l)', 'Na (mekv/l)', 'K (mekv/l)', 'SO4 (mekv/l)', 'Cl (mekv/l)', 'F (mekv/l)', 'NH4-N (µg/l N)', 'NO2+NO3-N (µg/l N)', 'PO4-P (µg/l P)', 'Tot-P (µg/l P)', 'Si (mg/l)', 'Fe (µg/l)', 'Al (µg/l)', 'Al_s (µg/l)', 'Oxygen (mg/l O2)', 'Secchi depth (m)', 'Secchi depth binoculars (m)', 'Secchi depth no binoculars (m)', 'Water temp (°C)']


### 2.3. Inspect data types and convert to datetime

In [115]:
# Get data types for all columns
df_lake.dtypes

MD-MVM Id                           int64
Survey station                     object
Latitude                            int64
Longitude                           int64
County                             object
Municipality                       object
MS_CD C3                           object
Sample date                        object
Sample year                         int64
Sample month                        int64
Sample day                          int64
SeasonType                          int64
Season                              int64
Max sample depth (m)              float64
TOC (mg/l C)                      float64
DOC (mg/l C)                      float64
Tot-N_ps (µg/l N)                 float64
Tot-N_TNb (µg/l N)                float64
Abs_F 254 (/5cm)                  float64
Abs_F 420 (/5cm)                  float64
Turb_FNU (FNU)                    float64
C_phyll (µg/l)                    float64
pH                                float64
Cond_25 (mS/m)                    

In [116]:
# Cast sample time to datetime object
df_lake["Sample date"] = pd.to_datetime(df_lake["Sample date"])
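If stricter parsing is wanted, the date format can be stated explicitly and unparseable values turned into `NaT` so they can be counted. The sketch below uses a made-up series (`dates` is an illustrative assumption); the `%Y-%m-%d` format matches the dates shown in the output above.

```python
import pandas as pd

# Toy input: two valid ISO dates and one invalid entry
dates = pd.Series(["2001-03-28", "2001-05-21", "not a date"])

# An explicit format avoids ambiguous parsing; errors="coerce" yields NaT
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

# Count values that could not be parsed
n_bad = int(parsed.isna().sum())
print(n_bad)
```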

### 2.4. Convert coordinates

The coordinates in the original data set are given in SWEREF 99 TM (EPSG:3006), a projected coordinate system specific to Sweden. In order to use Python packages for coordinate data, we need to convert these to standard decimal degrees (WGS 84, EPSG:4326).

In [117]:
def convert_coordinates_to_dd(df):
    """
    Convert coordinates from SWEREF99 to standard decimal degrees format.

    Args:
        df: Dataframe containing 'Latitude' and 'Longitude' columns
            with coordinates in SWEREF99 format.

    Returns:
        df: Dataframe with coordinates in standard format.
    """
    df_copy = df.copy()

    # Create transformer object, specifying the from-to standards for conversion
    transformer = pyproj.Transformer.from_crs(crs_from="EPSG:3006", crs_to="EPSG:4326")

    # Generate tuples of transformed coordinates and update dataframe
    df_copy["Latitude"], df_copy["Longitude"] = transformer.transform(
        df_copy["Latitude"], df_copy["Longitude"]
    )
    return df_copy

In [118]:
# Run coordinate transformation
df_lake = convert_coordinates_to_dd(df_lake)
df_lake.head()

Unnamed: 0,MD-MVM Id,Survey station,Latitude,Longitude,County,Municipality,MS_CD C3,Sample date,Sample year,Sample month,...,Tot-P (µg/l P),Si (mg/l),Fe (µg/l),Al (µg/l),Al_s (µg/l),Oxygen (mg/l O2),Secchi depth (m),Secchi depth binoculars (m),Secchi depth no binoculars (m),Water temp (°C)
0,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-03-28,2001,3,...,8.0,2.27,40.0,,85.0,,5.5,,,0.6
1,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-05-21,2001,5,...,5.0,1.62,43.0,,80.0,,7.6,,,10.2
2,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-08-22,2001,8,...,5.0,1.73,19.0,,45.0,,6.0,,,18.6
3,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-10-15,2001,10,...,8.0,1.72,41.0,,50.0,,7.0,,,10.3
4,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2002-02-26,2002,2,...,8.0,1.89,36.0,,55.0,,,,,1.5
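A quick sanity check after the transformation is to confirm that every point falls inside a rough bounding box for Sweden, which would catch swapped axes. This is a sketch only: the bounds `LAT_RANGE` and `LON_RANGE` and the helper `coordinates_in_range` are assumptions introduced for illustration, and the second toy row deliberately has latitude and longitude swapped.

```python
import pandas as pd

# Rough bounding box for Sweden in decimal degrees (illustrative assumption)
LAT_RANGE = (55.0, 70.0)
LON_RANGE = (10.0, 25.0)


def coordinates_in_range(df):
    """Return a boolean mask that is True where both coordinates are plausible."""
    lat_ok = df["Latitude"].between(*LAT_RANGE)
    lon_ok = df["Longitude"].between(*LON_RANGE)
    return lat_ok & lon_ok


# First row matches the converted Spjutsjön coordinates; second row is swapped
toy = pd.DataFrame(
    {
        "Latitude": [60.638793, 15.445276],
        "Longitude": [15.445276, 60.638793],
    }
)
mask = coordinates_in_range(toy)
print(mask.tolist())
```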


### 2.5. Combine Total N columns

There are two Tot-N (total concentration of nitrogen) columns based on two different calculation methods, depending on when the sample was taken / analyzed (they give approximately the same result). We combine these into one column.

In [119]:
# Combine old and new way of calculating Tot-N, by first checking if the
# new method value exists, otherwise using old method as fallback
df_lake["Tot-N (µg/l N)"] = np.where(
    df_lake["Tot-N_TNb (µg/l N)"].notnull(),
    df_lake["Tot-N_TNb (µg/l N)"],
    df_lake["Tot-N_ps (µg/l N)"],
)

# Drop the old columns
df_lake = df_lake.drop(["Tot-N_TNb (µg/l N)", "Tot-N_ps (µg/l N)"], axis="columns")
df_lake.head()

Unnamed: 0,MD-MVM Id,Survey station,Latitude,Longitude,County,Municipality,MS_CD C3,Sample date,Sample year,Sample month,...,Si (mg/l),Fe (µg/l),Al (µg/l),Al_s (µg/l),Oxygen (mg/l O2),Secchi depth (m),Secchi depth binoculars (m),Secchi depth no binoculars (m),Water temp (°C),Tot-N (µg/l N)
0,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-03-28,2001,3,...,2.27,40.0,,85.0,,5.5,,,0.6,409.0
1,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-05-21,2001,5,...,1.62,43.0,,80.0,,7.6,,,10.2,360.0
2,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-08-22,2001,8,...,1.73,19.0,,45.0,,6.0,,,18.6,195.0
3,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-10-15,2001,10,...,1.72,41.0,,50.0,,7.0,,,10.3,383.0
4,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2002-02-26,2002,2,...,1.89,36.0,,55.0,,,,,1.5,385.0
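The same fallback logic can also be written with `Series.fillna`, which some readers may find easier to scan than `np.where`. The sketch below compares both forms on a made-up frame (`toy` and its values are illustrative assumptions).

```python
import numpy as np
import pandas as pd

# Toy frame: new-method value missing in rows 0 and 2, both missing in row 2
toy = pd.DataFrame(
    {
        "Tot-N_TNb (µg/l N)": [np.nan, 360.0, np.nan],
        "Tot-N_ps (µg/l N)": [409.0, 355.0, np.nan],
    }
)

# Prefer the new method, fall back to the old one where it is missing
combined = toy["Tot-N_TNb (µg/l N)"].fillna(toy["Tot-N_ps (µg/l N)"])

# Equivalent np.where formulation, as used in the cell above
via_where = np.where(
    toy["Tot-N_TNb (µg/l N)"].notnull(),
    toy["Tot-N_TNb (µg/l N)"],
    toy["Tot-N_ps (µg/l N)"],
)
print(combined.tolist())
```

Rows where both methods are missing remain missing in either formulation.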


### 2.6. Adjust absorbance measures

We are interested in absorbance at different wavelengths as dependent variables. We adjust absorbance at 420 nm to be per m instead of per 5 cm (multiplying by 20). We then calculate a specific absorbance measure called `SUVA 254` based on the absorbance at 254 nm.

In [120]:
# Adjust Abs 420 to be per m
df_lake["Abs_F 420 (/m)"] = df_lake["Abs_F 420 (/5cm)"] * 20
df_lake = df_lake.drop("Abs_F 420 (/5cm)", axis="columns")

In [121]:
# Function to do the SUVA 254 calculation
def convert_to_suva254(df):
    """
    Calculate SUVA 254 based on absorbance at 254 nm.

    Args:
        df: Dataframe containing columns 'Abs_F 254 (/5cm)', 'Fe (µg/l)',
            and 'TOC (mg/l C)'.

    Returns:
        df: Dataframe with additional column 'SUVA_254 (m*l/mg)' containing
            calculated SUVA 254 values.
    """
    df = df.copy()

    # Extract base absorbance
    abs_254 = df["Abs_F 254 (/5cm)"]

    # Convert absorbance to per 1 cm
    abs_254 = abs_254 / 5

    # Correct for Fe absorbance (incl. convert microgram to milligram)
    abs_254 = abs_254 - 0.0653 * (df["Fe (µg/l)"] / 1000) + 0.002

    # Divide by TOC concentration and convert from per cm to per m
    abs_254 = (abs_254 * 100) / df["TOC (mg/l C)"]

    df["SUVA_254 (m*l/mg)"] = abs_254

    return df

In [122]:
df_lake = convert_to_suva254(df_lake)
df_lake.head()

Unnamed: 0,MD-MVM Id,Survey station,Latitude,Longitude,County,Municipality,MS_CD C3,Sample date,Sample year,Sample month,...,Al (µg/l),Al_s (µg/l),Oxygen (mg/l O2),Secchi depth (m),Secchi depth binoculars (m),Secchi depth no binoculars (m),Water temp (°C),Tot-N (µg/l N),Abs_F 420 (/m),SUVA_254 (m*l/mg)
0,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-03-28,2001,3,...,,85.0,,5.5,,,0.6,409.0,1.06,
1,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-05-21,2001,5,...,,80.0,,7.6,,,10.2,360.0,1.02,
2,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-08-22,2001,8,...,,45.0,,6.0,,,18.6,195.0,0.58,
3,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-10-15,2001,10,...,,50.0,,7.0,,,10.3,383.0,0.7,
4,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2002-02-26,2002,2,...,,55.0,,,,,1.5,385.0,0.7,


In [123]:
# Check that the calculated metric seems reasonable
df_lake["SUVA_254 (m*l/mg)"].mean()

2.9066259203061526
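To make the arithmetic in `convert_to_suva254` easier to follow, the same steps can be traced for a single sample. This is a sketch with made-up input values; `suva254_single` is a hypothetical helper that mirrors the constants used in the function above and is not part of the notebook.

```python
def suva254_single(abs_254_per_5cm, fe_ug_per_l, toc_mg_per_l):
    """Trace the SUVA 254 steps from the notebook function for one sample."""
    # Absorbance per 5 cm -> per 1 cm
    abs_per_cm = abs_254_per_5cm / 5

    # Fe correction with the same constants as above (Fe converted to mg/l)
    abs_corrected = abs_per_cm - 0.0653 * (fe_ug_per_l / 1000) + 0.002

    # Per cm -> per m, then normalize by TOC
    return abs_corrected * 100 / toc_mg_per_l


# Illustrative inputs: 0.5 per 5 cm, 40 µg/l Fe, 5.3 mg/l TOC
value = suva254_single(0.5, 40.0, 5.3)
print(round(value, 3))
```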

### 2.7. Calculate ratios for dependent variables

Additionally, we calculate two different ratios for the analysis:

- TOC (total organic carbon) : TON (total organic nitrogen)
- TOC : TOP (total organic phosphorus)

We also create separate columns for organic and inorganic N / P.

In [124]:
# Function for calculating the TOC:TON ratio
def calculate_toc_ton_ratio(df):
    """
    Calculate the molar ratio of total organic carbon (TOC) to total
    organic nitrogen (TON). Additionally, create two separate columns for
    organic and inorganic N in their original units.

    Args:
        df: Dataframe containing columns 'TOC (mg/l C)', 'Tot-N (µg/l N)',
            'NH4-N (µg/l N)', and 'NO2+NO3-N (µg/l N)'.

    Returns:
        df: Dataframe with additional column 'TOC:TON (mol/l)' containing
            calculated TOC:TON ratios, and columns for organic and
            inorganic N.
    """
    df = df.copy()
    toc = df["TOC (mg/l C)"]

    # Convert TOC to mol/l
    toc = (toc / 1000) / 12.011

    # Calculate TON
    ton = df["Tot-N (µg/l N)"] - df["NH4-N (µg/l N)"] - df["NO2+NO3-N (µg/l N)"]
    df["Organic N (µg/l N)"] = ton
    df["Inorganic N (µg/l N)"] = df["Tot-N (µg/l N)"] - ton

    # Convert TON to mol/l (28.02 g/mol is the molar mass of N2, so this
    # gives moles of N2, i.e. half the moles of N atoms)
    ton = (ton / 1000000) / 28.02

    # Calculate the ratio
    df["TOC:TON (mol/l)"] = toc / ton

    return df

In [125]:
# Function for calculating the TOC:TOP ratio
def calculate_toc_top_ratio(df):
    """
    Calculate the molar ratio of total organic carbon (TOC) to total
    organic phosphorus (TOP). Additionally, create two separate columns for
    organic and inorganic P in their original units.

    Args:
        df: Dataframe containing columns 'TOC (mg/l C)', 'Tot-P (µg/l P)',
            and 'PO4-P (µg/l P)'.

    Returns:
        df: Dataframe with additional column 'TOC:TOP (mol/l)' containing
            calculated TOC:TOP ratios, and columns for organic and
            inorganic P.
    """

    df = df.copy()
    toc = df["TOC (mg/l C)"]

    # Convert TOC to mol/l
    toc = (toc / 1000) / 12.011

    # Calculate TOP
    top = df["Tot-P (µg/l P)"] - df["PO4-P (µg/l P)"]
    df["Organic P (µg/l P)"] = top
    df["Inorganic P (µg/l P)"] = df["PO4-P (µg/l P)"]

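    # Convert TOP to mol/l (123.88 g/mol corresponds to P4, so this gives
    # moles of P4, i.e. a quarter of the moles of P atoms)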
    top = (top / 1000000) / 123.88

    # Calculate the ratio
    df["TOC:TOP (mol/l)"] = toc / top

    return df

In [126]:
# Calculate the ratios and return updated dataframes
df_lake = calculate_toc_ton_ratio(df_lake)
df_lake = calculate_toc_top_ratio(df_lake)
df_lake.head()

Unnamed: 0,MD-MVM Id,Survey station,Latitude,Longitude,County,Municipality,MS_CD C3,Sample date,Sample year,Sample month,...,Water temp (°C),Tot-N (µg/l N),Abs_F 420 (/m),SUVA_254 (m*l/mg),Organic N (µg/l N),Inorganic N (µg/l N),TOC:TON (mol/l),Organic P (µg/l P),Inorganic P (µg/l P),TOC:TOP (mol/l)
0,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-03-28,2001,3,...,0.6,409.0,1.06,,201.0,208.0,61.513265,7.0,1.0,7809.079772
1,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-05-21,2001,5,...,10.2,360.0,1.02,,256.0,104.0,37.362236,4.0,1.0,10571.725918
2,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-08-22,2001,8,...,18.6,195.0,0.58,,185.0,10.0,107.18553,4.0,1.0,21916.992757
3,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2001-10-15,2001,10,...,10.3,383.0,0.7,,353.0,30.0,28.417294,6.0,2.0,7391.613243
4,54,Spjutsjön,60.638793,15.445276,Dalarnas län,Falun,WA42559716,2002-02-26,2002,2,...,1.5,385.0,0.7,,203.0,182.0,55.161258,7.0,1.0,7072.374133
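The two functions above divide by 28.02 and 123.88 g/mol, which are the molar masses of N2 and P4. If per-atom molar ratios (C:N and C:P) are intended, the atomic masses would be used instead; that intent is an assumption here, not something the notebook states. The sketch below shows this variant on made-up values and also sets non-positive organic concentrations to NaN so the division cannot produce infinite or negative ratios. The helper `molar_ratio` and the constants are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Atomic masses in g/mol
C_MOLAR_MASS = 12.011
N_MOLAR_MASS = 14.007
P_MOLAR_MASS = 30.974


def molar_ratio(toc_mg_per_l, organic_ug_per_l, element_molar_mass):
    """Per-atom molar ratio of TOC to an organic N or P concentration."""
    # mg/l -> g/l -> mol/l
    toc_mol = (toc_mg_per_l / 1000) / C_MOLAR_MASS

    # Treat zero or negative organic concentrations as missing
    organic = organic_ug_per_l.where(organic_ug_per_l > 0)

    # µg/l -> g/l -> mol/l
    organic_mol = (organic / 1_000_000) / element_molar_mass
    return toc_mol / organic_mol


# Toy values: second row has no organic N, so the ratio is undefined
toy = pd.DataFrame(
    {
        "TOC (mg/l C)": [5.3, 5.3],
        "Organic N (µg/l N)": [201.0, 0.0],
    }
)
ratio = molar_ratio(toy["TOC (mg/l C)"], toy["Organic N (µg/l N)"], N_MOLAR_MASS)
print(ratio.round(2).tolist())
```

With the atomic mass of N, the ratio is half of what the 28.02 divisor gives for the same inputs.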


## 3. Save processed data

Save the processed data as a new CSV file in the repository's data folder, overwriting any existing file with the same name.

In [127]:
# Save the file in the data folder
df_lake.to_csv("../data/lake_chem_data_clean.csv", index=False)
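Because CSV stores dates as plain text, the datetime type is lost on save and has to be restored when the file is read back. A minimal round-trip sketch, using a temporary file and a made-up frame (`toy` and the file name `toy_clean.csv` are illustrative assumptions):

```python
import os
import tempfile

import pandas as pd

# Toy frame with a datetime column, standing in for the processed data
toy = pd.DataFrame(
    {
        "Sample date": pd.to_datetime(["2001-03-28", "2001-05-21"]),
        "TOC (mg/l C)": [5.3, 4.9],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_clean.csv")

    # Write without the index, as in the cell above
    toy.to_csv(path, index=False)

    # parse_dates restores the datetime type on load
    reloaded = pd.read_csv(path, parse_dates=["Sample date"])

print(reloaded.dtypes)
```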