# Python Code

# Exercises For Session 2 - Fixing the Code

The code below has been broken! Your job is to:

1. Read through all of the code and steps. You do not have to understand all of the code, but you should read the comments and understand roughly what each box does.
1. Use the short bits of code in the "Code Snippets" box below to fix the rest of the code in this notebook. Look for the rows of hyphens (------) to see where to put your code. *N.B.* The `#` at the start of each snippet stops that line from running and should *not* be copied!

Good luck!

**Code Snippets:**

In [None]:
# 
# 
# 
# 
# 
# 
# 
# 
# 
# 

# Step 1 - Loading a file and preparing the data

Here, we repeat the setup steps from the last lesson. We need to import all of the *libraries* used to do useful tasks, such as plotting graphs.

In [None]:
import matplotlib.pyplot as plt
import numpy as np #Numpy allows us to perform complex mathematical processes quickly
import pandas as pd #Pandas is another useful set of tools for statistics
import datetime
        
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
#load the bike hire data from the CSV file
data = pd.read_csv("/kaggle/input/london-bike-hire/bike_hire.csv")

# Prepare the data - convert the timestamp string into a date (which will allow us to sort and group later)
data['timestamp'] = pd.to_datetime(data['timestamp'], format= '%Y-%m-%d %H:%M:%S')
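
The format string passed to `pd.to_datetime` uses the same codes as Python's built-in `datetime` module. As a minimal sketch, assuming a made-up timestamp value in the same layout as the `timestamp` column, the codes can be tried on a single string:

```python
from datetime import datetime

# Hypothetical sample value, laid out like the 'timestamp' column
raw = "2016-06-01 13:00:00"

# %Y = 4-digit year, %m = month, %d = day,
# %H = hour (24-hour clock), %M = minute, %S = second
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

print(parsed.date())  # the calendar date part
print(parsed.hour)    # the hour part
```

Once a value is a real date rather than a string, its parts (date, hour and so on) can be pulled out, which is what makes the grouping in the next step possible.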



# Step 2 - Transforming the data into daily summaries

Here, as in the last lesson, we pull out daily totals of bike hires and maximum temperatures. 

This time, we will combine these summary statistics into a single data frame, so we can plot them together. 

In [None]:
#set a date range for the graph (June 2016); >= keeps the midnight reading on 1 June
filtered_data = data[(data['timestamp'] >= "2016-06-01") & (data['timestamp'] < "2016-07-01")]

#group by and count the totals for each date
graph_data = filtered_data.groupby(filtered_data.timestamp.dt.date)['count'].sum()
# turn the timestamp index into a column
graph_data = graph_data.reset_index()

#Create a dataframe containing the highest temperature data for each day
temp_data = filtered_data.groupby(filtered_data.timestamp.dt.date)['t1'].max()

# turn the timestamp index into a column
temp_data = temp_data.reset_index()
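
To see what the grouping does, it may help to run the same pattern on a tiny table. The sketch below uses made-up numbers (not the real bike hire data) with the same column names as the notebook:

```python
import pandas as pd

# Made-up hourly readings; column names mirror the notebook's data
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-06-01 08:00:00", "2016-06-01 17:00:00",
        "2016-06-02 08:00:00", "2016-06-02 17:00:00",
    ]),
    "count": [100, 150, 80, 120],
    "t1": [14.0, 19.5, 13.0, 17.0],
})

# Group the rows by calendar date, then total the hires for each day
daily_hires = toy.groupby(toy.timestamp.dt.date)["count"].sum().reset_index()

# Same grouping, but keep the highest temperature reading for each day
daily_max_temp = toy.groupby(toy.timestamp.dt.date)["t1"].max().reset_index()

print(daily_hires)
print(daily_max_temp)
```

Each result has one row per day: the four hourly rows collapse into two daily rows, with `reset_index()` turning the date back into an ordinary column.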

# Step 3 - Plotting the data together

Here, we will plot the two data types on top of each other, to start to look at the relationship between them. 

In order to plot multiple graphs or data types together, you may need to specify different subplots. This can be done using matplotlib and the subplots() function. 

To get two plots to sit on top of each other, you can use the twinx() function. In this case, it will create a second y-axis, shown on the right of the plot, for the temperature scale. 
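
As a minimal sketch of the `subplots()` and `twinx()` pattern, here it is with a handful of made-up numbers rather than the bike hire data. The `matplotlib.use("Agg")` line is an assumption so that the example also runs outside a notebook; it is not needed on Kaggle.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

# Made-up values for four days
days = [1, 2, 3, 4]
hires = [250, 200, 310, 280]
temps = [19.5, 17.0, 22.0, 21.0]

# One figure with one set of axes (left-hand y-axis)
fig, ax1 = plt.subplots()
# A second set of axes sharing the same x-axis (right-hand y-axis)
ax2 = ax1.twinx()

# Bars are measured against the left-hand scale
ax1.bar(days, hires, color="blue")
ax1.set_xlabel("Day")
ax1.set_ylabel("Bike hires")

# The line is measured against the right-hand scale
ax2.plot(days, temps, color="red")
ax2.set_ylabel("Temperature")
```

Because each series has its own y-axis, values in the hundreds (hires) and values around 20 (temperature) can both fill the height of the plot.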

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as md
# specify how the date labels should look - we shorten them to day and month so they are readable
xfmt = md.DateFormatter('%d %b')

# create your figure and its first subplot
fig, ax1 = plt.subplots()
# rotate the date labels so they are readable
ax1.tick_params(axis='x', labelrotation=25)
# set up the 2nd axis
ax2 = ax1.twinx()

# plot the first series onto the graph as a bar chart (number of hires)
ax1.bar(height=graph_data['count'], x=graph_data['timestamp'], color='blue')
ax1.set_xlabel('Timestamp')
ax1.set_ylabel('Bike hires')

# plot the second series onto the graph (max daily temperature)
ax2.plot(temp_data['timestamp'], temp_data['t1'], color='red')
ax2.set_ylabel('Temperature')

# apply the shortened date format to the x-axis labels
ax1.xaxis.set_major_formatter(xfmt)

It's not clear whether there's a trend over a single month, so let's look at a longer period: the whole of 2016.

In [None]:
# Annual trends

#set a date range for the graph (all of 2016); >= keeps the midnight reading on 1 January
filtered_data = data[(data['timestamp'] >= "2016-01-01") & (data['timestamp'] < "2017-01-01")]
graph_data = filtered_data.groupby(filtered_data.timestamp.dt.date)['count'].sum()
temp_data = filtered_data.groupby(filtered_data.timestamp.dt.date)['t1'].max()


In [None]:
# join the bike hire and temperature data into a single data frame for plotting

combined_data = pd.concat([graph_data, temp_data], axis=1)
# turn the timestamp index into a column
combined_data = combined_data.reset_index()
combined_data.index = combined_data['timestamp']
combined_data.head()

# shorten the timestamp labels so they are readable
import matplotlib.dates as md
xfmt = md.DateFormatter('%Y-%m')

# create your figure and its first subplot
fig, ax1 = plt.subplots()
# rotate the date labels so they don't overlap
ax1.tick_params(axis='x', labelrotation=25)
# set up the 2nd axis
ax2 = ax1.twinx()  

ax1.bar(height=combined_data['count'], x=combined_data['timestamp'], color='blue')
ax1.set_xlabel('Timestamp')
ax1.set_ylabel('Bike hires')

ax2.plot(combined_data['timestamp'], combined_data['t1'], color='red')
ax2.set_ylabel('Temperature')
ax2.set_xlim(combined_data['timestamp'].min(), combined_data['timestamp'].max())
print(combined_data['timestamp'].max())
ax1.xaxis.set_major_formatter(xfmt)
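
The joining step in the cell above relies on `pd.concat` with `axis=1`, which places series side by side and lines them up by their index. A minimal sketch with made-up daily values (the names `hires`, `max_temp` and `combined` are just for illustration):

```python
import datetime
import pandas as pd

# Two made-up daily summaries that share the same date index
dates = pd.Index(
    [datetime.date(2016, 6, 1), datetime.date(2016, 6, 2)], name="timestamp"
)
hires = pd.Series([250, 200], index=dates, name="count")
max_temp = pd.Series([19.5, 17.0], index=dates, name="t1")

# axis=1 puts the series next to each other as columns, matched on the date
combined = pd.concat([hires, max_temp], axis=1)

# Turn the date index into an ordinary column so it can be used for plotting
combined = combined.reset_index()

print(combined)
```

The result has one row per date and one column per series, which is why a single data frame can then feed both the bar chart and the line.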


# Extension

Visit this Mentimeter page and answer the question: https://www.menti.com/nd2mxa98qx
