# MEI Introduction to Data Science
# Lesson 6 - Activity 1a

It is widely reported that temperatures have increased both worldwide and in the UK over the last 100 years (e.g. https://www.bbc.co.uk/news/science-environment-50976909). This activity uses data from the Edexcel large data set and related data from the Met Office website to explore how much the temperature has changed and whether the change is similar in different parts of the UK. Activities 1a and 1b demonstrate at least two iterations of a data science cycle.

* Run the code below to import pandas and matplotlib

In [None]:
# import pandas
import pandas as pd 

# import matplotlib
import matplotlib.pyplot as plt

## Problem (1)
Using the data from the large data set you could explore the following problem:
> *Was 2015 a hotter year than 1987?*

To explore this you could find the mean temperature for these two years at the given locations and display boxplots or time series for the temperatures.

## Getting the data (1)
* Run the code in the boxes below to import the data for Hurn for 1987 and 2015

In [None]:
hurn_2015_data = pd.read_csv("../input/ldsedexcel/hurn-2015.csv")
hurn_2015_data.head()

In [None]:
hurn_1987_data = pd.read_csv('../input/ldsedexcel/hurn-1987.csv')
hurn_1987_data.head()

## Exploring the data (1)
The data gives the weather for the same range of days for each year (1st May to 31st October). To compare the temperatures you can create a new dataset with the temperature columns listed along with the date expressed as day/month (but not year).
* Run the code below to create the new dataset

In [None]:
# create a temporary dictionary holding the new columns
# hurn_1987_data['Date'].str[:5] extracts the first 5 characters of the string for the date
new_data = {'shortdate': hurn_1987_data['Date'].str[:5], 'hurntemp1987' : hurn_1987_data['Daily Mean Temperature']}

# create the dataframe with these columns and the data
all_stations_data = pd.DataFrame(new_data, columns=['shortdate', 'hurntemp1987'])

# display the top rows to check it has worked
all_stations_data.head()
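
The slicing step can be checked in isolation. The sketch below uses a small made-up Series (the values are illustrative and not taken from the large data set) and assumes the `Date` column holds strings in `dd/mm/yyyy` form, which is what the `[:5]` slice implies.

```python
import pandas as pd

# Made-up dates for illustration only; assumes dd/mm/yyyy strings
sample_dates = pd.Series(["01/05/1987", "02/05/1987", "31/10/1987"])

# .str[:5] keeps the first five characters of each string: the day and the month
short_dates = sample_dates.str[:5]
print(short_dates.tolist())
```

If the dates in a file were stored in a different format, the slice would need adjusting before the two years could be lined up.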

* Run the code below to add the temperature for Hurn 2015

In [None]:
# add a new column for the 2015 Hurn data 
all_stations_data['hurntemp2015'] = hurn_2015_data['Daily Mean Temperature']

# display the head to check it has imported correctly
all_stations_data.head()
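
Assigning a column in this way matches rows by their index, so it relies on both files listing the same days in the same order. As a hedged alternative, the sketch below matches rows on `shortdate` with `merge` instead. The two small dataframes and the helper `short_temps` are hypothetical and use made-up values; only the column names `Date` and `Daily Mean Temperature` come from the code above.

```python
import pandas as pd

# Made-up values for illustration only (not from the large data set).
# The 2015 rows are deliberately out of order.
data_1987 = pd.DataFrame({"Date": ["01/05/1987", "02/05/1987", "03/05/1987"],
                          "Daily Mean Temperature": [9.1, 10.4, 11.0]})
data_2015 = pd.DataFrame({"Date": ["01/05/2015", "03/05/2015", "02/05/2015"],
                          "Daily Mean Temperature": [8.7, 12.3, 11.9]})

def short_temps(df, column_name):
    """Hypothetical helper: return shortdate plus the temperature under a new name."""
    return pd.DataFrame({"shortdate": df["Date"].str[:5],
                         column_name: df["Daily Mean Temperature"]})

# Merge on shortdate so rows are matched by day/month rather than by row position
combined = short_temps(data_1987, "hurntemp1987").merge(
    short_temps(data_2015, "hurntemp2015"), on="shortdate", how="inner")
print(combined)
```

With `how="inner"`, only days present in both years are kept, so a missing day in one file would drop that row rather than shift the later values.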

To explore this data you can use the `describe` command. You can also draw a time series plot.
* Run the code in the boxes below to get a summary of the columns and display the plot

In [None]:
# print a summary of the temperature fields for Hurn 1987 and 2015
print(all_stations_data['hurntemp1987'].describe())
print(all_stations_data['hurntemp2015'].describe())
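
`describe` can also be called on several columns at once, which places the statistics side by side. The sketch below is a minimal illustration with made-up temperatures; the dataframe `sample` is hypothetical and only reuses the column names from above.

```python
import pandas as pd

# Made-up temperatures for illustration only
sample = pd.DataFrame({"hurntemp1987": [10.0, 12.0, 14.0],
                       "hurntemp2015": [11.0, 13.0, 15.0]})

# One table with count, mean, std, min, quartiles and max for each column
summary = sample.describe()
print(summary)

# Individual statistics can be read by row label and column name
print(summary.loc["mean", "hurntemp1987"], summary.loc["mean", "hurntemp2015"])
```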

In [None]:
# display a time series plot for the temperatures using the shortdate as the x variable
all_stations_data.plot(x='shortdate', figsize=(12,5))
plt.show()

* Add code to the code boxes below to add columns to the all_stations_data dataset for 1987 and 2015 for at least two other UK weather stations

In [None]:
# import the 1987 data


# import the 2015 data


# add a new column for the 1987 data 


# add a new column for the 2015 data 


# display the head to check it has imported correctly
all_stations_data.head()

In [None]:
# print a summary of the temperature fields for 1987 and 2015


In [None]:
# display a time series plot for the temperatures using the shortdate as the x variable


**Checkpoint**
> How can the statistics and charts produced be used to compare the temperatures for the two years?

## Analysing the data (1)
To answer the problem you could compare the means and standard deviations for the two years and draw a boxplot.
* Run the code below to compare the means and standard deviations and display a boxplot

In [None]:
# print the mean and standard deviation of the temperature for Hurn 1987
print(f"Hurn 1987 temperature mean: {all_stations_data['hurntemp1987'].mean()}")
print(f"Hurn 1987 temperature standard deviation: {all_stations_data['hurntemp1987'].std()}")

# print the mean and standard deviation of the temperature for Hurn 2015
print(f"Hurn 2015 temperature mean: {all_stations_data['hurntemp2015'].mean()}")
print(f"Hurn 2015 temperature standard deviation: {all_stations_data['hurntemp2015'].std()}")

In [None]:
# display a boxplot for the temperatures for 1987 and 2015
all_stations_data.boxplot(column = ['hurntemp1987','hurntemp2015'], vert=False,figsize=(12, 4))
plt.show()
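
One way to make the comparison explicit is to put the means and standard deviations in a single table and compute the difference in means directly. The sketch below does this with made-up values; the dataframe `temps` is hypothetical. Note that pandas `std` gives the sample standard deviation (dividing by n - 1) by default.

```python
import pandas as pd

# Made-up temperatures for illustration only
temps = pd.DataFrame({"hurntemp1987": [10.0, 12.0, 14.0],
                      "hurntemp2015": [11.0, 14.0, 17.0]})

# Mean and sample standard deviation for each column in one table
stats = temps.agg(["mean", "std"])
print(stats)

# A positive difference means 2015 was warmer on average than 1987
difference = stats.loc["mean", "hurntemp2015"] - stats.loc["mean", "hurntemp1987"]
print(f"Difference in means (2015 - 1987): {difference:.2f}")
```

A difference in means should be read alongside the standard deviations and the boxplot: a small difference relative to the spread is weaker evidence that one year was hotter.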

* Add code to the code boxes below so that they display the mean, standard deviation and boxplots for the temperature at the other stations

In [None]:
# print the mean and standard deviation of the temperature for 1987


# print the mean and standard deviation of the temperature for 2015



In [None]:
# display a boxplot for the temperatures for 1987 and 2015


**Checkpoint**
> * Was the difference in temperature for the two years similar for the different weather stations? 
> * How do the statistics and charts help you answer this question?

## Communicating the results (1)
**Checkpoint**
> Use the results above to answer the initial problem: *Was 2015 a hotter year than 1987?*

## Problem (2)
The analysis above only explores the data for two different years from at most five weather stations. Data for more stations and more years is available at https://www.metoffice.gov.uk/research/climate/maps-and-data/historic-station-data. 

Using this you could rephrase the initial problem as:
> Has the temperature been higher at UK locations since 1990?

This is explored further in activity 1b for this lesson: https://www.kaggle.com/tombutton/mei-ds-lesson-6-activity-1b/