# Wind Energy Analysis in Ireland

Author: Philip Cullen

This notebook analyses historical wind speed data from Met Éireann
to explore wind energy potential and long-term trends across Ireland.

In [1]:
import os
os.listdir("data")

['mullingar.csv', 'valentia_obsrv.csv', 'malin_head.csv', 'dublin_airport.csv']

In [2]:
import pandas as pd

df = pd.read_csv(
    "data/malin_head.csv",
    sep=";",
    skiprows=22
)
df.head()

Unnamed: 0,"date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,ind,wdsp,ind,wddir,ww,w,sun,vis,clht,clamt"
0,"01-may-1955 01:00,0,0.0,0,8.5,0,7.2,5.5,9.1,83..."
1,"01-may-1955 02:00,0,0.0,0,8.2,0,7.2,6.1,9.3,85..."
2,"01-may-1955 03:00,3,0.0,0,7.3,0,6.6,5.5,9.3,91..."
3,"01-may-1955 04:00,3,0.0,0,7.8,0,7.2,6.1,9.6,91..."
4,"01-may-1955 05:00,0,0.0,0,8.1,0,7.3,6.6,9.7,89..."


In [3]:
df.columns

Index(['date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,ind,wdsp,ind,wddir,ww,w,sun,vis,clht,clamt'], dtype='object')

## 2. Data Loading and Initial Inspection

This section loads the raw Met Éireann weather data and carries out an initial inspection.

This includes:

- Identifying the header structure used in the raw CSV files

- Verifying that the data loads correctly into pandas DataFrames

- Confirming column names and data types

- Checking dataset size

- Inspecting for missing or malformed values

- Verifying station identifiers after merging multiple datasets

This step ensures the data is correctly structured and suitable for further cleaning, analysis, and visualisation in later stages of the project.
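The header row can also be located programmatically rather than by counting metadata lines by hand. The following is a minimal sketch, not the approach used in this notebook: `find_header_row` is a hypothetical helper, and the sample text is a made-up miniature of a station file, assuming the header is the first line that begins with `date,`.

```python
import io
import pandas as pd


def find_header_row(lines, first_column="date"):
    """Return the 0-based index of the first line starting with `first_column,`."""
    for i, line in enumerate(lines):
        if line.strip().lower().startswith(first_column + ","):
            return i
    raise ValueError("no header line found")


# Hypothetical miniature file: metadata lines, a blank line, then the header.
sample = (
    "Station Name: EXAMPLE\n"
    "Station Height: 20 M\n"
    "\n"
    "date,ind,rain,wdsp\n"
    "01-may-1955 01:00,0,0.0,12\n"
    "01-may-1955 02:00,0,0.0,14\n"
)

header_idx = find_header_row(sample.splitlines())

# An integer skiprows counts physical lines, blank ones included,
# so the header becomes the first line pandas sees.
demo = pd.read_csv(io.StringIO(sample), skiprows=header_idx)
print(header_idx, list(demo.columns))
```

For a real file, the lines could be read with `open(path)` and passed to the same helper before calling `pd.read_csv`.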

In [4]:
import pandas as pd
import numpy as np

In [5]:
def load_station(path, station_name, skiprows=0):
    df = pd.read_csv(path, sep=";", skiprows=skiprows)
    df["station"] = station_name
    return df

In [6]:
malin = load_station(
    "data/malin_head.csv",
    station_name="Malin Head",
    skiprows=23
)

malin.head()

Unnamed: 0,"date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,ind,wdsp,ind,wddir,ww,w,sun,vis,clht,clamt",station
0,"01-may-1955 01:00,0,0.0,0,8.5,0,7.2,5.5,9.1,83...",Malin Head
1,"01-may-1955 02:00,0,0.0,0,8.2,0,7.2,6.1,9.3,85...",Malin Head
2,"01-may-1955 03:00,3,0.0,0,7.3,0,6.6,5.5,9.3,91...",Malin Head
3,"01-may-1955 04:00,3,0.0,0,7.8,0,7.2,6.1,9.6,91...",Malin Head
4,"01-may-1955 05:00,0,0.0,0,8.1,0,7.3,6.6,9.7,89...",Malin Head


In [7]:
valentia = load_station("data/valentia_obsrv.csv", "Valentia", skiprows=23)
dublin = load_station("data/dublin_airport.csv", "Dublin Airport", skiprows=23)
mullingar = load_station("data/mullingar.csv", "Mullingar", skiprows=17)

In [8]:
malin.columns


Index(['date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,ind,wdsp,ind,wddir,ww,w,sun,vis,clht,clamt', 'station'], dtype='object')

In [9]:
valentia.columns

Index(['date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,ind,wdsp,ind,wddir,ww,w,sun,vis,clht,clamt', 'station'], dtype='object')

In [10]:
dublin.columns

Index(['date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,ind,wdsp,ind,wddir,ww,w,sun,vis,clht,clamt', 'station'], dtype='object')

In [11]:
mullingar.columns

Index(['date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,ind,wdsp,ind,wddir', 'station'], dtype='object')
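The Mullingar header is shorter than the other three. As a small sketch, the two header strings printed above can be split on commas and compared to list the variables Mullingar lacks; the variable names here are illustrative only.

```python
# Header strings copied from the column outputs above.
malin_header = (
    "date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,"
    "ind,wdsp,ind,wddir,ww,w,sun,vis,clht,clamt"
)
mullingar_header = (
    "date,ind,rain,ind,temp,ind,wetb,dewpt,vappr,rhum,msl,ind,wdsp,ind,wddir"
)

malin_fields = malin_header.split(",")
mullingar_fields = set(mullingar_header.split(","))

# Fields present at Malin Head but absent at Mullingar, in file order.
missing = [name for name in malin_fields if name not in mullingar_fields]
print(missing)

# The wind speed column used later in the analysis is present in both.
print("wdsp" in mullingar_fields)
```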

### Data Overview

- Hourly historical weather data from Met Éireann
- Four stations: two coastal, two inland
- Data includes wind speed and additional meteorological variables
- Files begin with metadata rows and are comma-delimited; reading them with `sep=";"` collapses every row into a single column, so preprocessing is required

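Initial inspection revealed that each Met Éireann CSV file starts with a block
of metadata rows (a different number for Mullingar than for the other three
stations) and that the header repeats the column name `ind` several times.
The Mullingar file also carries fewer variables than the others. These issues
need to be handled before analysis.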
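The single-column frames shown earlier are what pandas produces when a comma-separated file is read with a semicolon separator. The sketch below reproduces the effect on a made-up three-line sample (the values are illustrative, not taken from a station file).

```python
import io
import pandas as pd

# Hypothetical comma-separated sample in the same style as the raw files.
raw = (
    "date,ind,rain,wdsp\n"
    "01-may-1955 01:00,0,0.0,12\n"
    "01-may-1955 02:00,0,0.0,14\n"
)

# Wrong separator: no ';' is found, so each line becomes one field.
wrong = pd.read_csv(io.StringIO(raw), sep=";")

# Default separator (comma): the fields are split as intended.
right = pd.read_csv(io.StringIO(raw))

print(wrong.shape, right.shape)
```

Checking `df.shape` or `df.columns` straight after loading is a quick way to catch a mismatched separator.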

## 3. Data Cleaning and Standardisation

This section resolves the formatting issues identified during initial
inspection, including correcting headers, standardising column names,
and preparing the data for analysis.

Some columns contain mixed data types due to missing values and coded entries in the raw Met Éireann files. These are handled explicitly during data cleaning and type conversion.
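As an illustration of the type conversion, the sketch below applies `pd.to_numeric` with `errors="coerce"` to a made-up column in which missing readings appear as blank strings. The sample values are assumptions for demonstration only.

```python
import pandas as pd

# Hypothetical raw wind speed column: numbers stored as text, blanks for gaps.
raw_wdsp = pd.Series(["12", " ", "14", "", "9"])

# Entries that cannot be parsed as numbers become NaN instead of raising.
clean = pd.to_numeric(raw_wdsp, errors="coerce")

print(clean.dtype)
print(int(clean.isna().sum()), "values could not be parsed")
```

Counting the NaN values before and after conversion shows how many rows a later `dropna` will remove.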

In [12]:
import warnings

In [13]:
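# Suppress all warnings for the rest of the session, including any
# pandas raises about mixed column types while reading the raw files
warnings.filterwarnings("ignore")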

In [14]:
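def load_and_clean_station(path, station_name, header_row):
    """Load one station file and return tidy date / wind speed / station rows.

    header_row is the 0-based index of the header line as counted by
    pd.read_csv; the metadata lines above it are discarded.
    """
    df = pd.read_csv(path, header=header_row)

    # Add station label
    df["station"] = station_name

    # Keep only required columns; copy so the assignments below modify
    # an independent frame rather than a view of the original
    df = df[["date", "wdsp", "station"]].copy()

    # Convert datatypes; unparseable entries become NaT / NaN
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["wdsp"] = pd.to_numeric(df["wdsp"], errors="coerce")

    # Drop rows with missing core values
    df = df.dropna(subset=["date", "wdsp"])

    return df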

In [15]:
malin = load_and_clean_station("data/malin_head.csv", "Malin Head", 20)
valentia = load_and_clean_station("data/valentia_obsrv.csv", "Valentia", 20)
dublin = load_and_clean_station("data/dublin_airport.csv", "Dublin Airport", 20)
mullingar = load_and_clean_station("data/mullingar.csv", "Mullingar", 14)

df = pd.concat([malin, valentia, dublin, mullingar], ignore_index=True)

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2493614 entries, 0 to 2493613
Data columns (total 3 columns):
 #   Column   Dtype         
---  ------   -----         
 0   date     datetime64[ns]
 1   wdsp     float64       
 2   station  object        
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 57.1+ MB


In [17]:
df["station"].value_counts()

station
Dublin Airport    709297
Valentia          709251
Malin Head        618716
Mullingar         456350
Name: count, dtype: int64
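The row counts differ between stations, which suggests the records cover different periods. A possible follow-up check, sketched here on a tiny made-up frame with the same three columns, is to summarise the first and last timestamp per station alongside the row count. The column names `first_obs`, `last_obs`, and `n_rows` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the combined frame (date, wdsp, station).
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["1955-05-01 01:00", "1955-05-01 02:00", "1955-05-01 01:00"]
        ),
        "wdsp": [12.0, 14.0, 9.0],
        "station": ["Malin Head", "Malin Head", "Mullingar"],
    }
)

# One row per station: earliest and latest observation, and number of rows.
coverage = toy.groupby("station").agg(
    first_obs=("date", "min"),
    last_obs=("date", "max"),
    n_rows=("wdsp", "count"),
)
print(coverage)
```

Running the same aggregation on `df` would show whether the smaller Mullingar and Malin Head counts come from a shorter record or from more dropped rows.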