Analyze the wind speed around the country with a view to wind farms

• You may look for your own source of historic weather information and/or use the Met Éireann one (Historical Data - Met Éireann - The Irish Meteorological Service). Click the download button to get a zip file that contains a CSV file.
• You may need to clean and normalize the data before doing the analysis.
• Questions you can ask:
o How much wind power is there at a particular location?
▪ This is quite open-ended: is it just the mean wind speed for an hour/day/month/year, or should you take into account that wind farms can only operate within a range of wind speeds (minimum and maximum)?
▪ Some analysis of how much power is available when (time of day/year) would be useful.
o Are the wind speeds likely to be the same 10 years from now? I.e. is there a trend in recorded wind speeds over the last few decades?
o Is there any other weather metric worth analyzing (e.g. rain, temperature)?
o What will the power output of the wind farms in Ireland be like next week, according to the weather forecasts? (OK, that is a tricky one, because you would need to get, or make up, information about the size and locations of the wind farms in Ireland, and find or make up the equation relating wind speed to power output.)
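
The question about operating ranges can be made concrete with a simplified power curve. The sketch below is only an illustration: the cut-in, rated and cut-out speeds (3, 12 and 25 m/s), the 2000 kW rating and the cubic ramp are assumed round numbers rather than figures for any real turbine, and `turbine_power_kw` is a hypothetical helper. It expects speeds in m/s; if the source file reports knots (check the legend rows at the top of the CSV), convert first using 1 knot = 0.514444 m/s.

```python
import numpy as np

KNOTS_TO_MS = 0.514444  # 1 knot expressed in metres per second


def turbine_power_kw(speed_ms, cut_in=3.0, rated_speed=12.0,
                     cut_out=25.0, rated_kw=2000.0):
    """Illustrative power curve; every default value is an assumption."""
    v = np.asarray(speed_ms, dtype=float)
    # Cubic ramp from 0 at cut-in up to rated power at the rated speed
    ramp = rated_kw * (v**3 - cut_in**3) / (rated_speed**3 - cut_in**3)
    power = np.where((v >= cut_in) & (v < rated_speed), ramp, 0.0)
    # Constant rated power between the rated speed and cut-out
    power = np.where((v >= rated_speed) & (v <= cut_out), rated_kw, power)
    # Keep missing wind speeds missing instead of reporting zero power
    return np.where(np.isnan(v), np.nan, power)


speeds_knots = np.array([2.0, 10.0, 20.0, 30.0, 60.0])
print(turbine_power_kw(speeds_knots * KNOTS_TO_MS))
```

Applying such a function to every hourly reading and then averaging gives a different picture from the mean wind speed alone, because hours below cut-in or above cut-out contribute nothing.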

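One simple way to look for a long-term trend is to fit a straight line to the annual mean wind speeds. The sketch below assumes a Series of wind speeds indexed by timestamp; `annual_trend` is a hypothetical helper, and the series used to exercise it is synthetic, not station data. A straight-line fit is only a first look and says nothing about statistical significance.

```python
import numpy as np
import pandas as pd


def annual_trend(wdsp: pd.Series) -> float:
    """Least-squares slope of the annual mean (units of wdsp per year)."""
    annual = wdsp.groupby(wdsp.index.year).mean().dropna()
    years = annual.index.to_numpy(dtype=float)
    # Centre the years so the fit is numerically well behaved
    slope, _intercept = np.polyfit(years - years.mean(), annual.to_numpy(), 1)
    return float(slope)


# Synthetic series (not real measurements): the mean rises by 0.1 per year
idx = pd.date_range("2000-01-01", "2009-12-31", freq="D")
synthetic = pd.Series(10.0 + 0.1 * (idx.year - 2000), index=idx)
print(annual_trend(synthetic))
```
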


In [2]:
import pandas as pd

In [3]:
from pathlib import Path

In [7]:
def file_with_station(file, station):
    """Clean one raw hourly CSV and save it as stationdata/<station>.csv."""
    # Read the file as text to locate the header row
    with open(file, "r", encoding="utf-8") as f:
        lines = f.readlines()

    # The data starts at the row whose first column is 'date'
    header_row = None
    for i, line in enumerate(lines):
        # Split on commas and strip spaces
        first_cell = line.split(",")[0].strip().lower()
        if first_cell == "date":
            header_row = i
            break
    if header_row is None:
        raise ValueError(f"No header row starting with 'date' found in {file}")

    # Read the file as CSV, skipping the unnecessary rows above the header
    df = pd.read_csv(file, skiprows=header_row, low_memory=False)
    # Parse 'date' as datetime
    df['date'] = pd.to_datetime(df['date'], format='%d-%b-%Y %H:%M')
    # Add a column holding the station name
    df["station"] = station
    # Add a column that contains only the calendar date
    df['dateonly'] = df['date'].dt.date
    # Add a column that contains only year-month
    df['yearmonth'] = df['date'].dt.strftime('%Y-%m')
    # Add a column that contains only the month
    df['month'] = df['date'].dt.strftime('%m')
    # Make sure the output folder exists, then save without the row index
    Path("stationdata").mkdir(exist_ok=True)
    df.to_csv(f"stationdata/{station}.csv", index=False)
    print(f"The file {station}.csv is now created.")

In [11]:
file_with_station("data/hly275.csv","MACE HEAD")

The file MACE HEAD.csv is now created.


In [8]:
file_with_station("data/hly275.csv","MACE HEAD")
file_with_station("data/hly375.csv","OAK PARK")
file_with_station("data/hly518.csv","SHANNON AIRPORT")
file_with_station("data/hly532.csv","DUBLIN AIRPORT")
file_with_station("data/hly575.csv","MOORE PARK")
file_with_station("data/hly675.csv","BALLYHAISE")
file_with_station("data/hly775.csv","SHERKIN ISLAND")
file_with_station("data/hly875.csv","MULLINGAR")
file_with_station("data/hly1075.csv","ROCHES POINT")
file_with_station("data/hly1175.csv","NEWPORT")
file_with_station("data/hly1375.csv","DUNSANY")
file_with_station("data/hly1475.csv","GURTEEN")
file_with_station("data/hly1575.csv","MALIN HEAD")
file_with_station("data/hly1775.csv","JOHNSTOWN CASTLE 2")
file_with_station("data/hly1875.csv","ATHENRY")
file_with_station("data/hly1975.csv","MT DILLON")
file_with_station("data/hly2075.csv","FINNER")
file_with_station("data/hly2175.csv","CLAREMORRIS")
file_with_station("data/hly2275.csv","VALENTIA OBSERVATORY")
file_with_station("data/hly2375.csv","BELMULLET")
file_with_station("data/hly3723.csv","CASEMENT")
file_with_station("data/hly3904.csv","CORK AIRPORT")
file_with_station("data/hly4935.csv","KNOCK AIRPORT")

The file MACE HEAD.csv is now created.
The file OAK PARK.csv is now created.
The file SHANNON AIRPORT.csv is now created.
The file DUBLIN AIRPORT.csv is now created.
The file MOORE PARK.csv is now created.
The file BALLYHAISE.csv is now created.
The file SHERKIN ISLAND.csv is now created.
The file MULLINGAR.csv is now created.
The file ROCHES POINT.csv is now created.
The file NEWPORT.csv is now created.
The file DUNSANY.csv is now created.
The file GURTEEN.csv is now created.
The file MALIN HEAD.csv is now created.
The file JOHNSTOWN CASTLE 2.csv is now created.
The file ATHENRY.csv is now created.
The file MT DILLON.csv is now created.
The file FINNER.csv is now created.
The file CLAREMORRIS.csv is now created.
The file VALENTIA OBSERVATORY.csv is now created.
The file BELMULLET.csv is now created.
The file CASEMENT.csv is now created.
The file CORK AIRPORT.csv is now created.
The file KNOCK AIRPORT.csv is now created.


In [11]:
df = pd.read_csv('stationdata/ATHENRY.csv')
pd.api.types.is_datetime64_any_dtype(df["date"])
df["date"] = pd.to_datetime(df["date"], errors="coerce")
pd.api.types.is_datetime64_any_dtype(df["date"])
df["date"].isna().any()


np.False_

In [None]:
# Step 1: Generate stationinfo.csv with the first and last timestamp of each station
# Folder containing one CSV file per station, each with a 'station' column
DATADIR = Path("stationdata")

# One record per station
stationlist = []

# For each file in the 'stationdata' folder
for file in DATADIR.glob('*.csv'):
    # Read only the 2 columns 'station' and 'date'
    df = pd.read_csv(file, usecols=["station", "date"])
    # Convert the column 'date' to datetime; unparseable values become NaT
    df["date"] = pd.to_datetime(df["date"], errors="coerce")

    # The station name is the same in every row, so take it from the first row
    station_name = df["station"].iloc[0]
    # Get the start date of the station
    start_date = df["date"].min().strftime("%Y-%m-%d %H:%M:%S")
    # Get the end date of the station
    end_date = df["date"].max().strftime("%Y-%m-%d %H:%M:%S")

    # Append the station's details to 'stationlist'
    stationlist.append({
            "station": station_name,
            "startdate": start_date,
            "enddate": end_date
        })

# Save stationinfo.csv
stationinfo_df = pd.DataFrame(stationlist)
stationinfo_df.to_csv('stationinfo.csv', index=False)
print("stationinfo.csv is ready")


# Step 2: Generate datestudy.csv. The idea is to use the same date range for all stations,
# which makes later comparisons simpler.

# Latest start date among all stations (ISO-formatted strings sort chronologically)
max_start = stationinfo_df["startdate"].max()
# Latest end date among all stations
max_end = stationinfo_df["enddate"].max()

# All hourly timestamps from max_start to max_end
all_dates = pd.date_range(start=max_start, end=max_end, freq="h")

# Save to datestudy.csv
datestudy_df = pd.DataFrame({"date": all_dates})
datestudy_df.to_csv('datestudy.csv', index=False)
print("datestudy.csv is ready")

stationinfo.csv is ready
datestudy.csv is ready
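
If the aim is a window in which every station has records, one option is to take the overlap of the station ranges: the latest start date together with the earliest end date. The sketch below shows that variant on a small made-up table; the station names and dates are hypothetical, and it is an alternative to the latest-end-date choice, not a description of what the cell above does.

```python
import pandas as pd

# Hypothetical station ranges, in the same layout as stationinfo.csv
info = pd.DataFrame({
    "station": ["A", "B", "C"],
    "startdate": pd.to_datetime(["1990-01-01", "2005-06-01", "2010-03-01"]),
    "enddate": pd.to_datetime(["2025-01-01", "2024-12-31", "2025-01-01"]),
})

# The overlap starts when the last station starts and ends when the first one stops
common_start = info["startdate"].max()
common_end = info["enddate"].min()

# Hourly timestamps covering the overlap
window = pd.date_range(common_start, common_end, freq="h")
print(common_start, common_end, len(window))
```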


In [16]:
stationinfo = pd.read_csv("stationinfo.csv")
# Latest start date among all stations
max_start = pd.to_datetime(stationinfo["startdate"]).max()

DATADIR = Path("stationdata")

# One record per station: the number of missing wind-speed values
sta_miss = []

for file in DATADIR.glob('*.csv'):
    # Read the station file in one pass so column types are inferred consistently
    station_df = pd.read_csv(file, low_memory=False)

    # Keep only rows from the common start date onwards
    station_df["date"] = pd.to_datetime(station_df["date"], errors="coerce")
    station_df = station_df[station_df["date"] >= max_start].copy()

    # Skip stations with no remaining data
    if station_df.empty:
        continue

    # Convert 'wdsp' to numeric; errors="coerce" turns non-numeric entries into NaN
    station_df["wdsp"] = pd.to_numeric(station_df["wdsp"], errors="coerce")

    # Record the station name and its number of missing 'wdsp' values
    sta_miss.append({
            "station": station_df["station"].iloc[0],
            "missing_wdsp": station_df["wdsp"].isna().sum()
        })

# Collect the results in a data frame
sta_miss_df = pd.DataFrame(sta_miss)
print(sta_miss_df)

                 station  missing_wdsp
0                DUNSANY            13
1              BELMULLET             1
2     JOHNSTOWN CASTLE 2            34
3            CLAREMORRIS           207
4               CASEMENT             4
5        SHANNON AIRPORT            12
6         DUBLIN AIRPORT             0
7           CORK AIRPORT             0
8           ROCHES POINT            18
9   VALENTIA OBSERVATORY            45
10            BALLYHAISE            28
11              OAK PARK             1
12             MULLINGAR            13
13            MOORE PARK             0
14         KNOCK AIRPORT             0
15             MACE HEAD            92
16             MT DILLON            62
17        SHERKIN ISLAND             5
18               NEWPORT            76
19            MALIN HEAD            52
20                FINNER          7421
21               ATHENRY            50
22               GURTEEN            35




In [3]:
df = pd.read_csv('datestudy.csv')
pd.api.types.is_datetime64_any_dtype(df["date"])

False
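
The check returns False because `pd.read_csv` loads the 'date' column as text unless told otherwise. A minimal sketch of the difference, using a small in-memory CSV (the two timestamps are made up) instead of datestudy.csv:

```python
import io

import pandas as pd

# Tiny stand-in for datestudy.csv
csv_text = "date\n2010-03-01 00:00:00\n2010-03-01 01:00:00\n"

# Default read: 'date' stays as text
as_text = pd.read_csv(io.StringIO(csv_text))
# parse_dates asks read_csv to convert the column while reading
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(pd.api.types.is_datetime64_any_dtype(as_text["date"]))   # False
print(pd.api.types.is_datetime64_any_dtype(as_dates["date"]))  # True
```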

In [32]:
# Convert 'wdsp' to numeric; errors="coerce" turns values that cannot be parsed into NaN
df["wdsp"] = pd.to_numeric(df["wdsp"], errors="coerce")

# Count the missing values
df["wdsp"].isna().sum()

np.int64(24)
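
As a small illustration of what the coercion does, the sketch below applies the same call to a made-up column in which one reading is a blank string. The values are hypothetical; the assumption is that missing readings in the raw files show up as non-numeric text of this kind.

```python
import pandas as pd

# Made-up readings: the second entry is a blank string, not a number
raw = pd.Series(["5", " ", "12"])

# Entries that cannot be parsed as numbers become NaN
clean = pd.to_numeric(raw, errors="coerce")

print(clean.tolist())
print(clean.isna().sum())
```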

In [33]:
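# Use 'date' as a DatetimeIndex so that interpolation is weighted by the actual time between readings.
# set_index raises a KeyError when 'date' is no longer a column (e.g. if this cell is re-run after
# 'date' has already become the index), so only set the index when the column is still present.
if 'date' in df.columns:
    df['date'] = pd.to_datetime(df['date'])
    df = df.set_index('date')

# Fill missing wind speeds by time-weighted linear interpolation, but only for gaps that lie
# between valid readings (limit_area='inside'). I used AI to help me with this.
df['wdsp'] = df['wdsp'].interpolate(method='time', limit_direction='both', limit_area='inside')

# Count the remaining missing values
#df["wdsp"].isna().sum()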
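
To see what time-based interpolation with `limit_area='inside'` does, here is a small sketch on made-up values. The timestamps are deliberately uneven so that the time weighting is visible, and the leading and trailing gaps are left unfilled because they are not enclosed by valid readings.

```python
import numpy as np
import pandas as pd

# Made-up readings with an uneven time step (03:00 is absent from the index)
idx = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00",
    "2024-01-01 04:00", "2024-01-01 05:00",
])
wdsp = pd.Series([np.nan, 10.0, np.nan, 16.0, np.nan], index=idx)

# The 02:00 value sits one third of the way from 01:00 to 04:00
filled = wdsp.interpolate(method="time", limit_direction="both", limit_area="inside")
print(filled)
```

The 02:00 reading is filled with 12.0 (one third of the way from 10 to 16), while the first and last entries stay NaN.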