# Programming for Data Analysis Project

### By Joanne Feeney

***

Data source: https://data.gov.ie/dataset/dublin-bonham-st-rainfall-data

The data is licensed by Met Éireann.

***

For this project, I will synthesise the data contained in the above dataset. 

It consists of monthly rainfall records for a Met Éireann station at Bonham St. in Co. Dublin.

The file contains 170 rows, the first 13 of which are descriptors of the data.

There are 7 columns of data included in the data set:

* year - Year
* month - Month
* rain - Precipitation Amount (mm)
* gdf - Greatest daily fall (mm)
* rd - Number of rain days (0.2 mm or more)
* wd - Number of wet days (1.0 mm or more)
* ind - Indicator

https://www.met.ie/climate/available-data/long-term-data-sets

I have not been able to locate any prior research into this particular dataset, so the interpretations in this notebook
are my own assumptions and are not drawn from any other sources.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from datetime import date, time, datetime

In [None]:
# Read the CSV into a DataFrame named df, skipping the 13 descriptor rows at the top of the file
df = pd.read_csv("Bonham_St._Rainfall.csv", skiprows=13)

In [None]:
df = df.dropna()

df.describe(include='all')

As the summary above shows, several parts of this dataset are missing information. I have chosen to remove the incomplete rows.
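Before dropping rows, it can help to see how much is missing in each column. The following is a minimal sketch on a small made-up table; `sample` and its values are hypothetical stand-ins, not the Bonham St. records.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the rainfall table; these values are made up.
sample = pd.DataFrame({
    "year": [1942, 1942, 1942, 1942],
    "month": [1, 2, 3, 4],
    "rain": [55.1, np.nan, 40.2, 61.0],
    "gdf": [12.3, 8.8, np.nan, 15.4],
})

# Count the missing values in each column before removing anything.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing value, as df.dropna() does above.
cleaned = sample.dropna()
rows_removed = len(sample) - len(cleaned)
print(f"Rows removed: {rows_removed}")
```

Note that `isna()` only counts true missing values (NaN). Blank strings in a column would not be counted or dropped.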

https://www.met.ie/climate/available-data/long-term-data-sets

*"Although there are recognized uncertainties in the early record, it is concluded that the derived series offers valuable insights for understanding multi-decadal rainfall variability in Ireland, a sentinel location in northwest Europe and provides a firm basis for benchmarking other long-term records and future reconstructions."* Met Éireann, last accessed 04/12/23

In [None]:
# Pie chart of the number of records in each year the station was active
Rain = df.year.value_counts()
Rain.plot(kind='pie', autopct="%.2f%%");

From the above pie chart, we can see that rainfall was recorded at Bonham St. station from 1942 to 1954.



In [None]:
df_1942 = df[0:12]
df_1943 = df[12:24]
df_1944 = df[24:36]
df_1945 = df[36:48]
df_1946 = df[48:60]
df_1947 = df[60:72]
df_1948 = df[72:84]
df_1949 = df[84:96]
df_1950 = df[96:108]
df_1951 = df[108:120]
df_1952 = df[120:132]
df_1953 = df[132:144]
df_1954 = df[144:156]

I know there is a better way of assigning the year and month to each record using a time series, but I kept getting it wrong, and extensive searching did not show me how to build a time series from data that does not contain a day.
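One possible approach, sketched here on made-up values, is to place every monthly record on the first day of its month, which lets `pd.to_datetime` assemble a date from the `year` and `month` columns. The choice of day 1 is an assumption made only for illustration, and the `sample` table is hypothetical. Filtering on the `year` column also avoids relying on row positions.

```python
import pandas as pd

# Hypothetical stand-in for the rainfall table; these values are made up.
sample = pd.DataFrame({
    "year": [1942, 1942, 1943, 1943],
    "month": [1, 2, 1, 2],
    "rain": [55.1, 30.4, 72.9, 18.6],
})

# Assumption: treat each monthly record as falling on the first of the month.
# pd.to_datetime can assemble dates from columns named year, month and day.
sample["date"] = pd.to_datetime(sample[["year", "month"]].assign(day=1))

# Select a single year by value rather than by row position.
rain_1943 = sample[sample["year"] == 1943]

# Or build one sub-DataFrame per year in a single step.
by_year = {year: group for year, group in sample.groupby("year")}

print(sample)
print(rain_1943)
```

Selecting by value keeps each yearly subset correct even if some rows have been dropped, whereas positional slices such as `df[0:12]` assume every year still has exactly 12 rows.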

### Numbers in the below plots represent months of the year, 1 being January and 12 being December

In [None]:
Jan = 1
Feb = 2
Mar = 3
Apr = 4
May = 5
Jun = 6
Jul = 7
Aug = 8
Sep = 9
Oct = 10
Nov = 11
Dec = 12

In [None]:
# Scatterplot of rain in 1942
sns.scatterplot(df_1942, x="month", y="rain", color="red")
plt.title("Rain(mm) in 1942", size=20, color="black");

Taking the first year that rainfall was recorded as an example, we can see the rainfall for each month,
from month 1 (January) to month 12 (December).

In [None]:
# Scatterplot of rain in 1954
sns.scatterplot(df_1954, x="month", y="rain", color="red")
plt.title("Rain(mm) in 1954", size=20, color="black");

Taking the final year that rainfall was recorded as an example, we can see that the rainfall from
month 1 (January) to month 12 (December) was quite different to 1942.

In [None]:
df_1945_1947 = df[36:72]

https://www.statology.org/seaborn-figure-size/

In [None]:
# Histogram of greatest daily fall from 1945-1947
# Set the figure size before plotting so that it applies to this plot
sns.set(rc={"figure.figsize":(19, 10)})
sns.histplot(df_1945_1947, x="gdf", color="black")
plt.title("Greatest daily fall(mm) 1945-1947", size=30, color="black");

Taking January 1945 to December 1947 as an example, we can see via the above histogram that the greatest daily fall (gdf)
was almost always a unique floating-point number, and only 6 times across these three years was the exact same amount seen in more than one month.
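A count of repeated values can be checked directly rather than read off a plot. This is a minimal sketch using `value_counts` on a made-up series; the numbers are hypothetical and are not the station's readings.

```python
import pandas as pd

# Hypothetical greatest-daily-fall values; these are made up.
gdf = pd.Series([12.3, 8.8, 12.3, 15.4, 8.8, 20.1], name="gdf")

# Count how often each value occurs, then keep only the repeats.
counts = gdf.value_counts()
repeated = counts[counts > 1]

print(repeated)
print(f"{len(repeated)} values occur more than once")
```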

In [None]:
# Bar plot of rain in 1954
sns.barplot(df_1954, x ="rain", y="month")
plt.title("Rain(mm) in 1954", size=30, color="black");

Taking another example of data from one particular year, namely 1954 (the final year the rainfall was recorded at this station), we can see that the rainfall from the first month of the year gradually increases right up until month 12 at the end of the year, where it is at its highest level.

In [None]:
# Bar plot of rain in 1942
sns.barplot(df_1942, x ="rain", y="month")
plt.title("Rain(mm) in 1942", size=30, color="black");

https://datatofish.com/string-to-integer-dataframe/

https://stackoverflow.com/questions/16729483/converting-strings-to-floats-in-a-dataframe

In [None]:
# For some reason I cannot convert the values to int or float points using some version of the below
# I have tried multiple times with different variations

# df['rain'] = df['rain'].astype('float64') 

In [None]:
sns.scatterplot(df_1942, x="month", y="gdf")
plt.title("Greatest daily fall(mm) in 1942", size=30, color="black");

Unfortunately, as I have been unsuccessful in converting the rainfall columns from strings to numerical values, I cannot represent this data very well and cannot draw firm conclusions from it, because the data being read into the scatter plots and other charts is not plotted correctly.
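A possible cause, which is an assumption and has not been confirmed against this file, is that some cells hold blank or non-numeric strings, which make `astype('float64')` raise an error. A minimal sketch of one workaround uses `pd.to_numeric` with `errors="coerce"`, shown here on made-up values in a hypothetical `sample` table.

```python
import pandas as pd

# Hypothetical stand-in: the "rain" column holds strings, one of them blank.
sample = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "rain": ["55.1", " ", "40.2", "61.0"],
})

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising, so the column becomes float64.
sample["rain"] = pd.to_numeric(sample["rain"], errors="coerce")

# The coerced blanks are now real missing values and can be dropped.
sample = sample.dropna(subset=["rain"])

print(sample.dtypes)
print(sample)
```

With numeric columns, the plotting functions treat the values as numbers on a continuous axis rather than as text categories.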

***
# End