## Pooled Classification

A common workflow with longitudinal spatial data is to apply the same classification scheme to an attribute across several time periods. Specifically, the class breaks are held fixed over all periods so that one can examine how the mass of the distribution shifts across those classes from one period to the next.

The `Pooled` classifier supports this workflow.


In [1]:
import numpy as np
import mapclassify as mc

## Sample Data
The sample data cover 20 cross-sectional units observed at three time points. Rows are units and columns are time points.

In [2]:
n = 20
data = np.array([np.arange(n)+i*n for i in range(1,4)]).T

In [3]:
data.shape

(20, 3)

In [4]:
data

array([[20, 40, 60],
       [21, 41, 61],
       [22, 42, 62],
       [23, 43, 63],
       [24, 44, 64],
       [25, 45, 65],
       [26, 46, 66],
       [27, 47, 67],
       [28, 48, 68],
       [29, 49, 69],
       [30, 50, 70],
       [31, 51, 71],
       [32, 52, 72],
       [33, 53, 73],
       [34, 54, 74],
       [35, 55, 75],
       [36, 56, 76],
       [37, 57, 77],
       [38, 58, 78],
       [39, 59, 79]])

## Default: Quintiles
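The default is to apply a [vec](https://en.wikipedia.org/wiki/Vectorization_(mathematics)) operator to the data matrix and treat all observations as a single collection. Quantile breaks (quintiles, since five classes are the default) are computed from the pooled data, and each column is then classified against those common breaks.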

In [5]:
res = mc.Pooled(data)

In [6]:
res

Pooled Classifier

Pooled Quantiles      

   Interval      Count
----------------------
[20.00, 31.80] |    12
(31.80, 43.60] |     8
(43.60, 55.40] |     0
(55.40, 67.20] |     0
(67.20, 79.00] |     0

Pooled Quantiles      

   Interval      Count
----------------------
( -inf, 31.80] |     0
(31.80, 43.60] |     4
(43.60, 55.40] |    12
(55.40, 67.20] |     4
(67.20, 79.00] |     0

Pooled Quantiles      

   Interval      Count
----------------------
( -inf, 31.80] |     0
(31.80, 43.60] |     0
(43.60, 55.40] |     0
(55.40, 67.20] |     8
(67.20, 79.00] |    12
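
The pooled breaks and per-period counts above can be approximated with plain NumPy. This is a minimal sketch, not mapclassify's implementation: it assumes the breaks are percentiles (linear interpolation) of the flattened array and that classes are closed on the right. The helper name `pooled_counts` is hypothetical.

```python
import numpy as np

# Same sample data as above: 20 units (rows) by 3 time points (columns).
n = 20
data = np.array([np.arange(n) + i * n for i in range(1, 4)]).T


def pooled_counts(y, k=5):
    """Hypothetical helper: quantile breaks from pooled values, counts per column."""
    # Pool every observation, then take the upper bound of each of the k classes.
    breaks = np.percentile(y.ravel(), np.linspace(0, 100, k + 1)[1:])
    # side="left" puts x in class i when breaks[i-1] < x <= breaks[i].
    labels = np.searchsorted(breaks, y, side="left")
    counts = np.array([np.bincount(col, minlength=k) for col in labels.T])
    return breaks, counts


breaks, counts = pooled_counts(data, k=5)
print(np.round(breaks, 2))  # upper bounds of the five pooled classes
print(counts)               # one row per time point, one column per class
```

Each row of `counts` corresponds to one of the three tables above, so reading down the rows shows the mass of the distribution moving from the lowest classes to the highest.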

In [7]:
res = mc.Pooled(data, k=4)

In [8]:
res

Pooled Classifier

Pooled Quantiles      

   Interval      Count
----------------------
[20.00, 34.75] |    15
(34.75, 49.50] |     5
(49.50, 64.25] |     0
(64.25, 79.00] |     0

Pooled Quantiles      

   Interval      Count
----------------------
( -inf, 34.75] |     0
(34.75, 49.50] |    10
(49.50, 64.25] |    10
(64.25, 79.00] |     0

Pooled Quantiles      

   Interval      Count
----------------------
( -inf, 34.75] |     0
(34.75, 49.50] |     0
(49.50, 64.25] |     5
(64.25, 79.00] |    15

The per-column classifiers, which all share the pooled breaks, are available through `col_classifiers`:

In [9]:
c0, c1, c2 = res.col_classifiers

In [10]:
c0

Pooled Quantiles      

   Interval      Count
----------------------
[20.00, 34.75] |    15
(34.75, 49.50] |     5
(49.50, 64.25] |     0
(64.25, 79.00] |     0

Compare this with the unrestricted classifier for the first column, whose breaks are computed from that column alone:

In [11]:
mc.Quantiles(c0.y, k=4)

Quantiles             

   Interval      Count
----------------------
[20.00, 24.75] |     5
(24.75, 29.50] |     5
(29.50, 34.25] |     5
(34.25, 39.00] |     5

The same comparison for the last column:

In [12]:
c2

Pooled Quantiles      

   Interval      Count
----------------------
( -inf, 34.75] |     0
(34.75, 49.50] |     0
(49.50, 64.25] |     5
(64.25, 79.00] |    15

In [13]:
mc.Quantiles(c2.y, k=4)

Quantiles             

   Interval      Count
----------------------
[60.00, 64.75] |     5
(64.75, 69.50] |     5
(69.50, 74.25] |     5
(74.25, 79.00] |     5
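
The contrast between pooled and column-specific breaks can be sketched with NumPy alone. The code below is an illustration, not the library's code: it assumes percentile-based quartiles and right-closed classes, and `class_counts` is a hypothetical helper.

```python
import numpy as np

n = 20
data = np.array([np.arange(n) + i * n for i in range(1, 4)]).T
pcts = [25, 50, 75, 100]


def class_counts(values, breaks):
    """Hypothetical helper: observations per right-closed class (lo, hi]."""
    labels = np.searchsorted(breaks, values, side="left")
    return np.bincount(labels, minlength=len(breaks))


# Quartiles of all 60 values, shared by every time point.
pooled_breaks = np.percentile(data, pcts)
pooled_table = np.array([class_counts(col, pooled_breaks) for col in data.T])
# Quartiles recomputed separately for each time point.
own_table = np.array(
    [class_counts(col, np.percentile(col, pcts)) for col in data.T]
)
print(pooled_table)
print(own_table)
```

With its own quartiles, every period splits into four equally sized classes by construction, so the upward shift over time is visible only under the pooled breaks.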

## Non-default classifier: BoxPlot

In [14]:
res = mc.Pooled(data, classifier='BoxPlot', hinge=1.5)

In [15]:
res

Pooled Classifier

Pooled BoxPlot          

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |    15
( 34.75,  49.50] |     5
( 49.50,  64.25] |     0
( 64.25, 108.50] |     0

Pooled BoxPlot          

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |     0
( 34.75,  49.50] |    10
( 49.50,  64.25] |    10
( 64.25, 108.50] |     0

Pooled BoxPlot          

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |     0
( 34.75,  49.50] |     0
( 49.50,  64.25] |     5
( 64.25, 108.50] |    15

In [16]:
c0, c1, c2 = res.col_classifiers

In [17]:
c0.yb

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2])

In [18]:
c00 = mc.BoxPlot(c0.y, hinge=3)

In [19]:
c00.yb

array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4])

In [20]:
c00

BoxPlot               

   Interval      Count
----------------------
( -inf, -3.75] |     0
(-3.75, 24.75] |     5
(24.75, 29.50] |     5
(29.50, 34.25] |     5
(34.25, 62.75] |     5

In [21]:
c0

Pooled BoxPlot          

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |    15
( 34.75,  49.50] |     5
( 49.50,  64.25] |     0
( 64.25, 108.50] |     0
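
The break values in these tables follow the usual box plot construction: the three quartiles plus two fences placed `hinge` interquartile ranges outside the first and third quartiles. The NumPy sketch below reproduces them under the assumption of percentile-based quartiles. `boxplot_breaks` is a hypothetical helper, and it omits the extra class that values beyond the upper fence would need. Note that the pooled call above uses `hinge=1.5` while the column-wise comparison uses `hinge=3`, so the two sets of fences differ both in the data and in the hinge.

```python
import numpy as np

n = 20
data = np.array([np.arange(n) + i * n for i in range(1, 4)]).T


def boxplot_breaks(values, hinge=1.5):
    """Hypothetical helper: lower fence, quartiles, upper fence."""
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    return np.array([q1 - hinge * iqr, q1, q2, q3, q3 + hinge * iqr])


pooled = boxplot_breaks(data, hinge=1.5)   # breaks from all 60 values
own = boxplot_breaks(data[:, 0], hinge=3)  # breaks from the first column only
# Labels for the first column under the pooled breaks
# (class 0 would hold values at or below the lower fence).
labels = np.searchsorted(pooled, data[:, 0], side="left")
print(pooled, own, labels, sep="\n")
```

Because no observation falls below the lower fence, the labels start at 1, matching the `yb` array of the pooled first-column classifier.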

## Non-default classifier: FisherJenks

In [22]:
res = mc.Pooled(data, classifier='FisherJenks', k=5)

In [23]:
res

Pooled Classifier

Pooled FisherJenks    

   Interval      Count
----------------------
[20.00, 31.00] |    12
(31.00, 43.00] |     8
(43.00, 55.00] |     0
(55.00, 67.00] |     0
(67.00, 79.00] |     0

Pooled FisherJenks    

   Interval      Count
----------------------
( -inf, 31.00] |     0
(31.00, 43.00] |     4
(43.00, 55.00] |    12
(55.00, 67.00] |     4
(67.00, 79.00] |     0

Pooled FisherJenks    

   Interval      Count
----------------------
( -inf, 31.00] |     0
(31.00, 43.00] |     0
(43.00, 55.00] |     0
(55.00, 67.00] |     8
(67.00, 79.00] |    12

In [24]:
c0, c1, c2 = res.col_classifiers
mc.FisherJenks(c0.y, k=5)

FisherJenks           

   Interval      Count
----------------------
[20.00, 23.00] |     4
(23.00, 27.00] |     4
(27.00, 31.00] |     4
(31.00, 35.00] |     4
(35.00, 39.00] |     4
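
Fisher-Jenks chooses breaks that minimize the within-class sum of squared deviations, so one way to quantify the cost of pooling is to evaluate that criterion on a single column under both sets of breaks. The sketch below does this with NumPy. The break values are copied from the two tables above, and `within_class_ssd` is a hypothetical helper, not part of mapclassify.

```python
import numpy as np

first_col = np.arange(20, 40)  # first time point of the sample data


def within_class_ssd(values, upper_bounds):
    """Hypothetical helper: sum of squared deviations from each class mean."""
    labels = np.searchsorted(upper_bounds, values, side="left")
    total = 0.0
    for c in np.unique(labels):
        members = values[labels == c]
        total += float(((members - members.mean()) ** 2).sum())
    return total


# Upper bounds taken from the pooled and the column-specific tables above.
pooled_ssd = within_class_ssd(first_col, [31, 43, 55, 67, 79])
own_ssd = within_class_ssd(first_col, [23, 27, 31, 35, 39])
print(pooled_ssd, own_ssd)
```

The pooled breaks fit any single period less tightly than that period's own breaks. That is the price paid for classes that are comparable across periods.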

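## Non-default classifier: MaximumBreaks

Four entries of the sample data are first overwritten with the value 10, which opens a wide gap at the low end of the sorted values.
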
In [25]:
data[1, 0] = 10
data[1, 1] = 10
data[1, 2] = 10
data[9, 2] = 10
data

array([[20, 40, 60],
       [10, 10, 10],
       [22, 42, 62],
       [23, 43, 63],
       [24, 44, 64],
       [25, 45, 65],
       [26, 46, 66],
       [27, 47, 67],
       [28, 48, 68],
       [29, 49, 10],
       [30, 50, 70],
       [31, 51, 71],
       [32, 52, 72],
       [33, 53, 73],
       [34, 54, 74],
       [35, 55, 75],
       [36, 56, 76],
       [37, 57, 77],
       [38, 58, 78],
       [39, 59, 79]])

In [26]:
res = mc.Pooled(data, classifier='MaximumBreaks', k=5)

In [27]:
res

Pooled Classifier

Pooled MaximumBreaks  

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     1
(21.00, 41.00] |    18
(41.00, 61.00] |     0
(61.00, 79.00] |     0

Pooled MaximumBreaks  

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     0
(21.00, 41.00] |     1
(41.00, 61.00] |    18
(61.00, 79.00] |     0

Pooled MaximumBreaks  

   Interval      Count
----------------------
[10.00, 15.00] |     2
(15.00, 21.00] |     0
(21.00, 41.00] |     0
(41.00, 61.00] |     1
(61.00, 79.00] |    17

In [28]:
c0, c1, c2 = res.col_classifiers

In [29]:
c0

Pooled MaximumBreaks  

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     1
(21.00, 41.00] |    18
(41.00, 61.00] |     0
(61.00, 79.00] |     0

In [30]:
mc.MaximumBreaks(c0.y, k=5)

Insufficient number of unique diffs. Breaks are random.


MaximumBreaks         

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     1
(21.00, 22.50] |     1
(22.50, 28.50] |     6
(28.50, 39.00] |    11
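
Maximum breaks places class boundaries at the midpoints of the widest gaps between consecutive sorted values. The NumPy sketch below illustrates the idea on the pooled data; it is not mapclassify's implementation. Working on unique values and resolving ties in favor of the lowest gap are assumptions of this sketch, and `max_gap_breaks` is a hypothetical name. Ties matter here because several gaps share the same width, which is also what the warning above reports for the single column.

```python
import numpy as np

n = 20
data = np.array([np.arange(n) + i * n for i in range(1, 4)]).T
data[1, :] = 10  # same modifications as in the cell above
data[9, 2] = 10


def max_gap_breaks(values, k=5):
    """Hypothetical helper: midpoints of the k-1 widest gaps, plus the maximum."""
    u = np.unique(values)  # sorted distinct values
    gaps = np.diff(u)
    # Stable sort on negated gaps: widest first, earliest gap wins a tie (assumption).
    widest = np.argsort(-gaps, kind="stable")[: k - 1]
    mids = np.sort((u[widest] + u[widest + 1]) / 2.0)
    return np.append(mids, u[-1])


breaks = max_gap_breaks(data, k=5)
print(breaks)
```

With this tie rule the sketch recovers the pooled upper bounds shown earlier. A different tie rule can select other gaps of equal width, so the single-column breaks above need not be reproduced by it.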