The aim here is to build a plot showing the number of journeys by weekday and hour.
Let's first import the libraries we need, load the data, and look at how big it is.

In [None]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('../input/uber-raw-data-janjune-15.csv')

print(data.shape[0])

Over 14 million rows, my poor computer... Let's try to reduce it by grouping the rows by pickup date.

In [None]:
# Count the pickups recorded at each distinct timestamp
# (the old .ix indexer used here originally was removed in pandas 1.0)
per_date = data.groupby('Pickup_date').size().reset_index(name='count')
per_date.columns = ['date', 'count']
print(per_date.shape[0])
print(per_date.head())
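
To see what this grouping does, here is a minimal sketch on a hand-made frame. The toy timestamps and the names `toy` and `toy_counts` are made up for illustration; only the `Pickup_date` column name comes from the data above.

```python
import pandas as pd

# Hypothetical toy data: three pickups, two of them sharing the same timestamp.
toy = pd.DataFrame({
    'Pickup_date': [
        '2015-01-01 00:05:00',
        '2015-01-01 00:05:00',
        '2015-01-01 01:10:00',
    ],
})

# One row per distinct timestamp, with the number of pickups at that timestamp.
toy_counts = toy.groupby('Pickup_date').size().reset_index(name='count')
print(toy_counts)
```

Rows that share a timestamp collapse into a single row, which is why the full dataset shrinks so much.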

Now that we have a much more manageable 2.7M-row dataframe, we can start the work.
Let's parse the dates and compute the number of journeys for each hour in the dataframe.

In [None]:
# Parse the date strings in one vectorized call
per_date['date'] = pd.to_datetime(per_date['date'])
# Key each row by its day and hour, e.g. '15.01.01 00'
per_date['dayhour'] = per_date['date'].dt.strftime('%y.%m.%d %H')
# Sum the per-timestamp counts within each hour
per_date = per_date.groupby('dayhour')['count'].sum().reset_index(name='total')

print(per_date.head())
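
As an alternative to going through a formatted string, the timestamps could probably be truncated to the hour directly with `dt.floor`. This is only a sketch on made-up data: the `toy` frame and its values are hypothetical and stand in for `per_date`.

```python
import pandas as pd

# Hypothetical per-timestamp counts, standing in for per_date.
toy = pd.DataFrame({
    'date': pd.to_datetime([
        '2015-01-01 00:05:00',
        '2015-01-01 00:40:00',
        '2015-01-01 01:10:00',
    ]),
    'count': [2, 3, 1],
})

# Truncate each timestamp to the start of its hour, then sum the counts per hour.
toy['dayhour'] = toy['date'].dt.floor('h')
hourly = toy.groupby('dayhour')['count'].sum().reset_index(name='total')
print(hourly)
```

With this variant `dayhour` stays a datetime column, so it would not need to be parsed again later.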

Then let's extract the hour of day and the weekday from the 'dayhour' column and average the number of journeys over each weekday/hour pair.

In [None]:
# Parse 'dayhour' back into timestamps, using the format it was written with
per_date['dayhour'] = pd.to_datetime(per_date['dayhour'], format='%y.%m.%d %H')
per_date['hour'] = per_date['dayhour'].dt.strftime('%H')
per_date['weekday'] = per_date['dayhour'].dt.strftime('%a')
# Mean number of journeys for each weekday/hour pair
per_date = per_date.groupby(['weekday', 'hour'])['total'].mean().reset_index(name='mean')

print(per_date.head())

Now we order the data by weekday and hour. The weekdays are numbered in reverse order (Monday=6, Sunday=0) so that Monday ends up at the top of the plot and Sunday at the bottom.

In [None]:
sequence = {'Mon': 6, 'Tue': 5, 'Wed': 4, 'Thu': 3, 'Fri': 2, 'Sat': 1, 'Sun': 0}
per_date['daynum'] = per_date['weekday'].map(sequence)
per_date = per_date.sort_values(by=['daynum', 'hour'])
per_date = per_date[['weekday', 'hour', 'mean']]

print(per_date.head())

Now we are ready to plot.

In [None]:
# 7 weekdays x 24 hours; this assumes every weekday/hour pair is present
mat = per_date['mean'].values.reshape(7, 24)
plt.title('Mean Number of Journeys by Weekday and Hour')
plt.xlim([0, 24])
plt.ylim([0, 7])
plt.xlabel('Hour')
plt.ylabel('Week Day')
plt.yticks(np.arange(0.5, 7.5, 1), per_date['weekday'].unique())
plt.pcolor(mat, cmap=plt.cm.Reds)
plt.colorbar()
plt.show()

There are very few journeys between 3 and 6am, a lot between 6pm and 1am, and rather fewer on Thursday than on the other days.
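
One caveat about the plotting step: `reshape(7, 24)` only works if all 168 weekday/hour pairs are present. A more defensive way to build the matrix might be a pivot followed by a reindex, sketched below on made-up data. The `toy` frame, its values, and the names `days`, `hours` and `grid` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical aggregated data, with most weekday/hour pairs missing on purpose.
toy = pd.DataFrame({
    'weekday': ['Mon', 'Mon', 'Sun'],
    'hour': ['00', '01', '00'],
    'mean': [10.0, 20.0, 5.0],
})

days = ['Sun', 'Sat', 'Fri', 'Thu', 'Wed', 'Tue', 'Mon']  # bottom to top of the plot
hours = ['%02d' % h for h in range(24)]                   # '00' .. '23'

# Pivot to a weekday x hour grid; absent pairs become NaN instead of shifting cells.
grid = toy.pivot(index='weekday', columns='hour', values='mean')
grid = grid.reindex(index=days, columns=hours)
mat = grid.values
print(mat.shape)
```

The resulting `mat` has the same row order as the one plotted above, so it could be passed to `plt.pcolor` in the same way.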