In [2]:
import pandas as pd

The data frames `Customers`, `Employees`, `Offices`, `OrderDetails`, `Orders`, `Payments`, `ProductLines`, and `Products` hold the data from the corresponding tables of the [ClassicModels database](https://www.richardtwatson.com/dm6e/Reader/ClassicModels.html).

The entity-relationship diagram (ERD) is shown here: ![ERD](figures/ClassicModels.png)

- Report the total payments by date
- Report the total payments by year
- Report the total payments by month
- Report the total payments by year-month
- Report the total payments by year-quarter
- Report the total payments on a half-yearly basis (each six-month period within each year)

*HINT*: Convert `paymentDate` to a timestamp; you can then extract the year, month, and quarter from its properties, e.g.:

In [3]:
text = "6 October 2019"
ts = pd.to_datetime(text)
print(ts.year, ts.month, ts.day)

2019 10 6
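
The same properties are available for a whole column at once through the `.dt` accessor of a datetime Series, which avoids calling a function per row. A minimal sketch, using three dates chosen only for illustration (the names `dates` and `parts` are hypothetical and not part of the exercise):

```python
import pandas as pd

# Illustrative dates only; in the notebook this would be Payments['paymentDate'].
dates = pd.Series(["2003-01-16 00:00:00", "2003-06-05 00:00:00", "2004-11-30 00:00:00"])
ts = pd.to_datetime(dates)

# Vectorised access to the timestamp components via the .dt accessor.
parts = pd.DataFrame({
    "year": ts.dt.year,
    "month": ts.dt.month,
    "quarter": ts.dt.quarter,
})
print(parts)
```

Each column of `parts` can be passed directly to `groupby`, just like the hand-built keys used in the cells that follow.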


In [4]:
Customers = pd.read_csv('data/ClassicModels_Customers.csv', sep=';')
Employees = pd.read_csv('data/ClassicModels_Employees.csv', sep=';')
Offices = pd.read_csv('data/ClassicModels_Offices.csv', sep=';')
OrderDetails = pd.read_csv('data/ClassicModels_OrderDetails.csv', sep=';')
Orders = pd.read_csv('data/ClassicModels_Orders.csv', sep=';')
Payments = pd.read_csv('data/ClassicModels_Payments.csv', sep=';')
ProductLines = pd.read_csv('data/ClassicModels_ProductLines.csv', sep=';')
Products = pd.read_csv('data/ClassicModels_Products.csv', sep=';')

In [6]:
Payments.sort_values('paymentDate')

Unnamed: 0,checkNumber,paymentDate,amount,customerNumber
167,IS232033,2003-01-16 00:00:00,10223.83,363
65,DI925118,2003-01-28 00:00:00,10549.01,128
122,GQ132144,2003-01-30 00:00:00,5494.78,181
58,DB889831,2003-02-16 00:00:00,50218.95,121
173,JJ246391,2003-02-20 00:00:00,53959.21,145
177,JN722010,2003-02-25 00:00:00,40206.20,141
121,GP636783,2003-03-02 00:00:00,52151.81,278
82,EK785462,2003-03-09 00:00:00,51001.22,385
47,CL442705,2003-03-12 00:00:00,22292.62,131
169,JB117768,2003-03-20 00:00:00,25833.14,486
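
The `Unnamed: 0` column in this output suggests that the CSV files store a row index in an unnamed first column, and the `00:00:00` suffix suggests that `paymentDate` was loaded as text. A possible way to handle both at load time is sketched below. It assumes the file layout implied by the output (semicolon-separated, empty first header field) and rebuilds three of the rows above in memory rather than reading the real file; `csv_text` and `sample` are hypothetical names.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file; the layout is an assumption.
csv_text = """;checkNumber;paymentDate;amount;customerNumber
167;IS232033;2003-01-16 00:00:00;10223.83;363
65;DI925118;2003-01-28 00:00:00;10549.01;128
122;GQ132144;2003-01-30 00:00:00;5494.78;181
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    index_col=0,                  # use the unnamed first column as the index
    parse_dates=["paymentDate"],  # parse the dates while reading
)
print(sample.dtypes)
```

The cells below keep the original string column, so that both the string-slicing and the timestamp approaches can be compared.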


- Report the total payments by date

In [12]:
Payments.groupby('paymentDate')['amount'].sum()

paymentDate
2003-01-16 00:00:00     10223.83
2003-01-28 00:00:00     10549.01
2003-01-30 00:00:00      5494.78
2003-02-16 00:00:00     50218.95
2003-02-20 00:00:00     53959.21
2003-02-25 00:00:00     40206.20
2003-03-02 00:00:00     52151.81
2003-03-09 00:00:00     51001.22
2003-03-12 00:00:00     22292.62
2003-03-20 00:00:00     25833.14
2003-03-27 00:00:00     48425.69
2003-04-09 00:00:00     24212.79
2003-04-11 00:00:00     11044.30
2003-04-16 00:00:00     21665.98
2003-04-19 00:00:00      1627.56
2003-04-20 00:00:00     33383.14
2003-04-22 00:00:00     44380.15
2003-05-09 00:00:00      3101.40
2003-05-12 00:00:00     35826.33
2003-05-20 00:00:00     45864.03
2003-05-21 00:00:00     16700.47
2003-05-25 00:00:00     50824.66
2003-05-31 00:00:00      7565.08
2003-06-05 00:00:00     14571.44
2003-06-06 00:00:00     32641.98
2003-06-13 00:00:00     57131.92
2003-06-18 00:00:00     58841.35
2003-06-25 00:00:00     17032.29
2003-07-05 00:00:00      2880.00
2003-07-06 00:00:00      6036.9
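
Grouping on the raw `paymentDate` value works here because every value falls at midnight. If the column carried times of day, each distinct timestamp would form its own group. A sketch of one way to guard against that, on made-up data (the `payments` frame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical payments, two of them on the same day at different times.
payments = pd.DataFrame({
    "paymentDate": ["2003-01-16 09:30:00", "2003-01-16 17:45:00", "2003-01-28 00:00:00"],
    "amount": [100.0, 50.0, 25.0],
})

# normalize() resets the time part to midnight, leaving one key per calendar day.
day = pd.to_datetime(payments["paymentDate"]).dt.normalize()
daily = payments.groupby(day)["amount"].sum()
print(daily)
```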


- Report the total payments by year

In [15]:
years = Payments.paymentDate.str[:4]
Payments.groupby(years)['amount'].sum()

paymentDate
2003    3250217.70
2004    4313328.25
2005    1290293.28
Name: amount, dtype: float64

In [22]:
Payments['date'] = pd.to_datetime(Payments['paymentDate'])
years2 = Payments['date'].dt.year
Payments.groupby(years2)['amount'].sum()

date
2003    3250217.70
2004    4313328.25
2005    1290293.28
Name: amount, dtype: float64

In [49]:
Payments.set_index('date').resample('Y')['amount'].sum()

date
2003-12-31    3250217.70
2004-12-31    4313328.25
2005-12-31    1290293.28
Freq: A-DEC, Name: amount, dtype: float64
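
The string-slicing variant depends on `paymentDate` being stored in ISO order (year first). The sketch below uses hypothetical day-first dates, which are not the format of this dataset, to show why parsing to a timestamp is the more robust choice when the text format is not guaranteed:

```python
import pandas as pd

# Hypothetical day-first dates, unlike the ISO-formatted paymentDate column.
raw = pd.Series(["16/01/2003", "28/01/2004", "30/01/2005"])

sliced = raw.str[:4]                                     # first four characters, not a year
parsed = pd.to_datetime(raw, format="%d/%m/%Y").dt.year  # the actual year

print(sliced.tolist())
print(parsed.tolist())
```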


- Report the total payments by month

In [32]:
months = Payments.paymentDate.str[5:7]
Payments.groupby(months)['amount'].sum()

paymentDate
01     397887.81
02     503357.59
03     989575.78
04     493457.60
05     640655.32
06     425151.10
07     442438.48
08     624299.16
09     637651.76
10     501961.39
11    1551479.98
12    1645923.26
Name: amount, dtype: float64

In [33]:
months2 = Payments['date'].dt.month
Payments.groupby(months2)['amount'].sum()

date
1      397887.81
2      503357.59
3      989575.78
4      493457.60
5      640655.32
6      425151.10
7      442438.48
8      624299.16
9      637651.76
10     501961.39
11    1551479.98
12    1645923.26
Name: amount, dtype: float64
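
Grouping by month alone pools the same calendar month across all years, which is why November and December dominate the totals above. A small sketch on made-up data (the `payments` frame is hypothetical) contrasts this with a year-month key:

```python
import pandas as pd

# Hypothetical payments: two Januaries in different years and one February.
payments = pd.DataFrame({
    "date": pd.to_datetime(["2003-01-16", "2004-01-20", "2004-02-10"]),
    "amount": [100.0, 200.0, 50.0],
})

# Month only: both Januaries land in the same group.
by_month = payments.groupby(payments["date"].dt.month)["amount"].sum()

# Year-month: each January keeps its own group.
by_year_month = payments.groupby(payments["date"].dt.to_period("M"))["amount"].sum()

print(by_month)
print(by_year_month)
```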


- Report the total payments by year-month

In [34]:
months = Payments.paymentDate.str[:7]
Payments.groupby(months)['amount'].sum()

paymentDate
2003-01     26267.62
2003-02    144384.36
2003-03    199704.48
2003-04    136313.92
2003-05    159881.97
2003-06    180218.98
2003-07    158247.00
2003-08    246204.86
2003-09    161206.23
2003-10    316857.96
2003-11    694292.68
2003-12    826637.64
2004-01    234152.13
2004-02    106652.01
2004-03    404603.21
2004-04    173245.96
2004-05    208524.42
2004-06    185842.86
2004-07    284191.48
2004-08    378094.30
2004-09    476445.53
2004-10    185103.43
2004-11    857187.30
2004-12    819285.62
2005-01    137468.06
2005-02    252321.22
2005-03    385268.09
2005-04    183897.72
2005-05    272248.93
2005-06     59089.26
Name: amount, dtype: float64

In [41]:
months2 = Payments['date'].apply(lambda x: f"{x.year}-{x.month:02d}")  # zero-pad the month: 2003-01, not "2003- 1"
Payments.groupby(months2)['amount'].sum()

date
2003-01     26267.62
2003-02    144384.36
2003-03    199704.48
2003-04    136313.92
2003-05    159881.97
2003-06    180218.98
2003-07    158247.00
2003-08    246204.86
2003-09    161206.23
2003-10    316857.96
2003-11    694292.68
2003-12    826637.64
2004-01    234152.13
2004-02    106652.01
2004-03    404603.21
2004-04    173245.96
2004-05    208524.42
2004-06    185842.86
2004-07    284191.48
2004-08    378094.30
2004-09    476445.53
2004-10    185103.43
2004-11    857187.30
2004-12    819285.62
2005-01    137468.06
2005-02    252321.22
2005-03    385268.09
2005-04    183897.72
2005-05    272248.93
2005-06     59089.26
Name: amount, dtype: float64

In [47]:
months3 = pd.DatetimeIndex(Payments['date']).to_period('M')
Payments.groupby(months3)['amount'].sum()

date
2003-01     26267.62
2003-02    144384.36
2003-03    199704.48
2003-04    136313.92
2003-05    159881.97
2003-06    180218.98
2003-07    158247.00
2003-08    246204.86
2003-09    161206.23
2003-10    316857.96
2003-11    694292.68
2003-12    826637.64
2004-01    234152.13
2004-02    106652.01
2004-03    404603.21
2004-04    173245.96
2004-05    208524.42
2004-06    185842.86
2004-07    284191.48
2004-08    378094.30
2004-09    476445.53
2004-10    185103.43
2004-11    857187.30
2004-12    819285.62
2005-01    137468.06
2005-02    252321.22
2005-03    385268.09
2005-04    183897.72
2005-05    272248.93
2005-06     59089.26
Freq: M, Name: amount, dtype: float64

In [50]:
Payments.set_index('date').resample('M')['amount'].sum()

date
2003-01-31     26267.62
2003-02-28    144384.36
2003-03-31    199704.48
2003-04-30    136313.92
2003-05-31    159881.97
2003-06-30    180218.98
2003-07-31    158247.00
2003-08-31    246204.86
2003-09-30    161206.23
2003-10-31    316857.96
2003-11-30    694292.68
2003-12-31    826637.64
2004-01-31    234152.13
2004-02-29    106652.01
2004-03-31    404603.21
2004-04-30    173245.96
2004-05-31    208524.42
2004-06-30    185842.86
2004-07-31    284191.48
2004-08-31    378094.30
2004-09-30    476445.53
2004-10-31    185103.43
2004-11-30    857187.30
2004-12-31    819285.62
2005-01-31    137468.06
2005-02-28    252321.22
2005-03-31    385268.09
2005-04-30    183897.72
2005-05-31    272248.93
2005-06-30     59089.26
Freq: M, Name: amount, dtype: float64
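
The `groupby` and `resample` variants agree here because every month in the range has at least one payment. They differ when a month is empty: `groupby` simply has no group for it, while `resample` emits a row with a sum of 0. The sketch below takes two payments from the data above and leaves February empty on purpose. It uses the month-start alias `'MS'` rather than `'M'`, since recent pandas releases prefer `'ME'` for month-end resampling and `'MS'` behaves the same across versions.

```python
import pandas as pd

# Two payments from the data above; February is deliberately left empty.
payments = pd.DataFrame({
    "date": pd.to_datetime(["2003-01-16", "2003-03-02"]),
    "amount": [10223.83, 52151.81],
})

# groupby on a monthly period: only months that occur in the data.
by_period = payments.groupby(payments["date"].dt.to_period("M"))["amount"].sum()

# resample on a regular monthly grid: the empty month appears with sum 0.
by_resample = payments.set_index("date").resample("MS")["amount"].sum()

print(by_period)
print(by_resample)
```

Which behaviour is preferable depends on the report: a regular grid is convenient for plotting, while omitting empty periods keeps a table compact.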


- Report the total payments by year-quarter

In [51]:
quarters = Payments['date'].apply(lambda x: f"{x.year}-{x.quarter}")
Payments.groupby(quarters)['amount'].sum()

date
2003-1     370356.46
2003-2     476414.87
2003-3     565658.09
2003-4    1837788.28
2004-1     745407.35
2004-2     567613.24
2004-3    1138731.31
2004-4    1861576.35
2005-1     775057.37
2005-2     515235.91
Name: amount, dtype: float64

In [52]:
quarters2 = pd.DatetimeIndex(Payments['date']).to_period('Q')
Payments.groupby(quarters2)['amount'].sum()

date
2003Q1     370356.46
2003Q2     476414.87
2003Q3     565658.09
2003Q4    1837788.28
2004Q1     745407.35
2004Q2     567613.24
2004Q3    1138731.31
2004Q4    1861576.35
2005Q1     775057.37
2005Q2     515235.91
Freq: Q-DEC, Name: amount, dtype: float64

In [53]:
Payments.set_index('date').resample('Q')['amount'].sum()

date
2003-03-31     370356.46
2003-06-30     476414.87
2003-09-30     565658.09
2003-12-31    1837788.28
2004-03-31     745407.35
2004-06-30     567613.24
2004-09-30    1138731.31
2004-12-31    1861576.35
2005-03-31     775057.37
2005-06-30     515235.91
Freq: Q-DEC, Name: amount, dtype: float64
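
As a consistency check, the quarterly totals can be recovered from the year-month totals reported earlier. The sketch below copies the six monthly figures for January to June 2003 from that output and rolls them up by quarter; the names `monthly` and `quarterly` are illustrative.

```python
import pandas as pd

# Monthly totals for 2003-01 .. 2003-06, copied from the year-month report above.
monthly = pd.Series(
    [26267.62, 144384.36, 199704.48, 136313.92, 159881.97, 180218.98],
    index=pd.period_range("2003-01", periods=6, freq="M"),
)

# Map each monthly period to its quarter, then sum within each quarter.
quarterly = monthly.groupby(monthly.index.asfreq("Q")).sum()
print(quarterly.round(2))
```

Rounded to two decimals, the two results equal the 2003Q1 and 2003Q2 figures above (370356.46 and 476414.87).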


- Report the total payments on a half-yearly basis (each six-month period within each year)

In [55]:
Payments.set_index('date').resample('6MS')['amount'].sum()

date
2003-01-01     846771.33
2003-07-01    2403446.37
2004-01-01    1313020.59
2004-07-01    3000307.66
2005-01-01    1290293.28
Freq: 6MS, Name: amount, dtype: float64

In [61]:
halfyear = Payments['date'].apply(lambda x: str(x.year) + "-" + ("H1" if x.month < 7 else "H2"))

In [62]:
Payments.groupby(halfyear)['amount'].sum()

date
2003-H1     846771.33
2003-H2    2403446.37
2004-H1    1313020.59
2004-H2    3000307.66
2005-H1    1290293.28
Name: amount, dtype: float64
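
The two approaches agree here because the first payment falls in January, so the six-month bins of `resample('6MS')` happen to line up with calendar half-years. As far as I can tell, those bins start at the first month present in the data, so data starting in another month would shift them. The sketch below uses made-up payments starting in March to illustrate the difference; the `payments` frame and the arithmetic `label` key are hypothetical.

```python
import pandas as pd

# Hypothetical payments whose first date falls in March rather than January.
payments = pd.DataFrame({
    "date": pd.to_datetime(["2003-03-15", "2003-08-10", "2003-09-05"]),
    "amount": [100.0, 50.0, 25.0],
})

# Six-month bins counted from the first month in the data (March here).
six_month = payments.set_index("date").resample("6MS")["amount"].sum()

# Calendar half-years: months 1-6 map to H1, months 7-12 map to H2.
half = (payments["date"].dt.month - 1) // 6 + 1
label = payments["date"].dt.year.astype(str) + "-H" + half.astype(str)
calendar_half = payments.groupby(label)["amount"].sum()

print(six_month)
print(calendar_half)
```

An explicit label such as `halfyear` therefore stays tied to the calendar regardless of where the data begins.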