We aim to examine temperature data for Blackford Hill, Edinburgh, to establish whether it shows evidence of global warming, that is, a general upward trend in temperature over time.

In [1]:
#import necessary libraries
import pandas as pd
import glob
from matplotlib import pyplot as plt

In [2]:
#Collect the paths of all the CSV files in the data folder using the glob library
path = "edinburgh_temperatures"
all_csv_files = glob.glob(path + "/*.csv") # list of every file path in that folder whose name matches the pattern (here, names ending in .csv)
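`glob.glob` expands a shell-style wildcard pattern into a list of matching paths. The sketch below builds a throwaway folder with hypothetical file names to show that only names ending in `.csv` match `"*.csv"`. It also sorts the result, since `glob` returns paths in arbitrary filesystem order; sorting is an optional extra, not something the cell above does.

```python
import glob
import os
import tempfile

# Build a temporary folder holding a few hypothetical files to glob over
with tempfile.TemporaryDirectory() as folder:
    for name in ["file_a.csv", "file_b.csv", "notes.txt"]:
        with open(os.path.join(folder, name), "w") as fh:
            fh.write("placeholder\n")

    # "*.csv" matches any name ending in .csv, so notes.txt is excluded
    matches = glob.glob(os.path.join(folder, "*.csv"))

    # glob gives no ordering guarantee; sort the bare file names for a reproducible order
    csv_names = sorted(os.path.basename(p) for p in matches)

print(csv_names)
```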

In [5]:
#We know that in each file the first ninety lines and the very last line are not part of the data table, so we want to get rid of them

li = [] # empty list to collect the data: we read each file into its own DataFrame, append it here, and then concatenate them all into a single DataFrame using a pandas method
n = 1 # number of trailing lines to remove from each file
for filename in all_csv_files:
    df = pd.read_csv(filename, index_col=None, header=0, skiprows=90) # skip the first 90 lines; header=0 then takes the next line as the column names
    df.drop(df.tail(n).index, inplace=True) # drop the last line. With 108 CSV files and one line removed from each, we expect the total number of rows to fall from 38950 to 38950 - 108 = 38842. When run, it reduces to exactly 38842, as expected
    li.append(df) # store each file's DataFrame in the list we created above

df = pd.concat(li, axis=0, ignore_index=True) # concatenate the per-file DataFrames row-wise (axis=0) into one pandas DataFrame; ignore_index=True gives the result a fresh 0..n-1 index instead of repeating each file's own index
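To see what `skiprows`, dropping the last row, and `pd.concat` do together, here is a small sketch on in-memory files. It assumes a made-up layout: 3 header lines instead of 90, a hypothetical footer line `end data`, and invented temperature values. `make_fake_file` is a helper introduced only for this illustration.

```python
import io

import pandas as pd

HEADER_LINES = 3  # stand-in for the 90 header lines in the real files


def make_fake_file(rows):
    """Hypothetical helper: in-memory CSV with a header block, column names, data rows, one footer line."""
    header = "\n".join(f"meta line {i}" for i in range(HEADER_LINES))
    body = "\n".join(f"{t},{tmax},{tmin}" for t, tmax, tmin in rows)
    text = f"{header}\nob_end_time,max_air_temp,min_air_temp\n{body}\nend data\n"
    return io.StringIO(text)


# Two fake "files" with invented values (2 rows and 1 row of data)
files = [
    make_fake_file([("2000-01-01 09:00:00", 5.0, 1.0), ("2000-01-02 09:00:00", 6.0, 2.0)]),
    make_fake_file([("2000-01-03 09:00:00", 7.0, 3.0)]),
]

frames = []
for f in files:
    # Skip the header block; the next line becomes the column names
    df = pd.read_csv(f, header=0, skiprows=HEADER_LINES)
    # Remove the footer row, which was parsed as a row of mostly NaN values
    df = df.iloc[:-1]
    frames.append(df)

# Stack the frames vertically and renumber the index 0..n-1
combined = pd.concat(frames, axis=0, ignore_index=True)
print(combined)
```

pandas can also drop trailing lines at read time with `skipfooter=1`, but that option requires `engine="python"`, which is slower than the default C parser on large files.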


In [6]:
#Next, keep only the columns we want to focus on and drop every row that contains NaN data

df_2 = df.filter(["ob_end_time", "max_air_temp", "min_air_temp"], axis=1) # copy only the listed columns into a new DataFrame; axis=1 tells filter to match these names against column labels rather than row labels
df_2 = df_2.dropna(inplace=False) # delete every row that contains a NaN value. inplace=False (the default) makes dropna return a new DataFrame; with inplace=True it would modify df_2 directly and return None, so this assignment would set df_2 to None and the next step would fail
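A minimal sketch of the two calls above on a made-up frame (the values and `other_column` are invented). It shows that `axis=1` selects columns, that `dropna` keeps the original index labels of the surviving rows (which is why the output below reports labels such as 31392), and that `dropna(inplace=True)` returns `None`.

```python
import numpy as np
import pandas as pd

# Small made-up frame: one unwanted column and one missing reading
raw = pd.DataFrame({
    "ob_end_time": ["2000-01-01 09:00:00", "2000-01-02 09:00:00", "2000-01-03 09:00:00"],
    "max_air_temp": [5.0, np.nan, 7.0],
    "min_air_temp": [1.0, 2.0, 3.0],
    "other_column": ["a", "b", "c"],  # hypothetical column we do not need
})

# axis=1: match the listed names against column labels
subset = raw.filter(["ob_end_time", "max_air_temp", "min_air_temp"], axis=1)

# dropna returns a new frame without the NaN row; surviving rows keep labels 0 and 2
clean = subset.dropna()

# With inplace=True the frame is modified in place and the call itself returns None
result_of_inplace = subset.copy().dropna(inplace=True)

print(list(clean.columns))
print(clean.index.tolist())
print(result_of_inplace)
```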

In [7]:
#Now to identify the maximum and minimum air temperatures and the dates on which they were recorded
maximum_temperature = df_2["max_air_temp"].max()
minimum_temperature = df_2["min_air_temp"].min()

print("The lowest temperature is:", minimum_temperature, "degrees Celsius which was recorded on:", df_2.loc[df_2["min_air_temp"] == minimum_temperature, "ob_end_time"])

print("The highest temperature is:", maximum_temperature, "degrees Celsius which was recorded on:", df_2.loc[df_2["max_air_temp"] == maximum_temperature, "ob_end_time"])

The lowest temperature is: -11.4 degrees Celsius which was recorded on: 31392    1982-01-11 09:00:00
Name: ob_end_time, dtype: object
The highest temperature is: 29.8 degrees Celsius which was recorded on: 29041    1975-08-05 09:00:00
Name: ob_end_time, dtype: object
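An alternative sketch for looking up the dates of the extremes uses `idxmax`/`idxmin`, which return the index label of the extreme value, so the matching row can be fetched with `.loc`. The frame below is a made-up stand-in for `df_2` with invented values.

```python
import pandas as pd

# Made-up stand-in for df_2
df_2 = pd.DataFrame({
    "ob_end_time": ["2000-01-01 09:00:00", "2000-01-02 09:00:00", "2000-01-03 09:00:00"],
    "max_air_temp": [5.0, 9.5, 7.0],
    "min_air_temp": [1.0, 2.0, -3.2],
})

# idxmax/idxmin give the index label of the first row holding the extreme value
hottest = df_2.loc[df_2["max_air_temp"].idxmax()]
coldest = df_2.loc[df_2["min_air_temp"].idxmin()]

print(f"The highest temperature is {hottest['max_air_temp']} degrees Celsius, recorded on {hottest['ob_end_time']}")
print(f"The lowest temperature is {coldest['min_air_temp']} degrees Celsius, recorded on {coldest['ob_end_time']}")
```

One design difference: the boolean-mask lookup in the cell above returns every date that ties the extreme value, whereas `idxmax`/`idxmin` return only the first such row.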
