## Overview of the mapclassify API

There are several ways to access the functionality in `mapclassify`.

We first load the example dataset that we have seen earlier.

In [1]:
import geopandas
import libpysal
import mapclassify

Check the installed `mapclassify` version:

In [2]:
mapclassify.__version__

'2.5.0+8.g34341a22.dirty'

In [3]:
pth = libpysal.examples.get_path("columbus.shp")
gdf = geopandas.read_file(pth)
y = gdf.HOVAL
gdf.head()

,AREA,PERIMETER,COLUMBUS_,COLUMBUS_I,POLYID,NEIG,HOVAL,INC,CRIME,OPEN,...,DISCBD,X,Y,NSA,NSB,EW,CP,THOUS,NEIGNO,geometry
0,0.309441,2.440629,2,5,1,5,80.467003,19.531,15.72598,2.850747,...,5.03,38.799999,44.07,1.0,1.0,1.0,0.0,1000.0,1005.0,"POLYGON ((8.62413 14.23698, 8.55970 14.74245, ..."
1,0.259329,2.236939,3,1,2,1,44.567001,21.232,18.801754,5.29672,...,4.27,35.619999,42.380001,1.0,1.0,0.0,0.0,1000.0,1001.0,"POLYGON ((8.25279 14.23694, 8.28276 14.22994, ..."
2,0.192468,2.187547,4,6,3,6,26.35,15.956,30.626781,4.534649,...,3.89,39.82,41.18,1.0,1.0,1.0,0.0,1000.0,1006.0,"POLYGON ((8.65331 14.00809, 8.81814 14.00205, ..."
3,0.083841,1.427635,5,2,4,2,33.200001,4.477,32.38776,0.394427,...,3.7,36.5,40.52,1.0,1.0,0.0,0.0,1000.0,1002.0,"POLYGON ((8.45950 13.82035, 8.47341 13.83227, ..."
4,0.488888,2.997133,6,7,5,7,23.225,11.252,50.73151,0.405664,...,2.83,40.009998,38.0,1.0,1.0,1.0,0.0,1000.0,1007.0,"POLYGON ((8.68527 13.63952, 8.67758 13.72221, ..."


## Original API (< 2.4.0)


In [4]:
bp = mapclassify.BoxPlot(y)
bp

BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5
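
The breaks in the table above are consistent with the usual box-plot rule: the quartiles, plus fences placed `hinge` times the interquartile range below the first quartile and above the third. The following is a minimal sketch of that rule, not the library's own code. The function name `box_plot_breaks` is hypothetical, and the quartile and maximum values are assumptions read off the rounded table.

```python
def box_plot_breaks(q1, median, q3, maximum, hinge=1.5):
    """Return upper class bounds for a box-plot style classification."""
    iqr = q3 - q1
    lower_fence = q1 - hinge * iqr
    upper_fence = q3 + hinge * iqr
    breaks = [lower_fence, q1, median, q3]
    if upper_fence < maximum:
        # Values beyond the upper fence get their own class.
        breaks += [upper_fence, maximum]
    else:
        # No value exceeds the fence, so the fence closes the last class.
        breaks.append(upper_fence)
    return breaks


# Quartiles and maximum as read (rounded) from the printed table above.
breaks = box_plot_breaks(q1=25.7, median=33.5, q3=43.3, maximum=96.4)
print([round(b, 2) for b in breaks])
```

With these inputs the rounded breaks agree with the upper bounds printed by `BoxPlot`: the lower fence is 25.7 - 1.5 × 17.6 = -0.7 and the upper fence is 43.3 + 1.5 × 17.6 = 69.7.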

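## Extended API (>= 2.4.0)

The original API remains available, so the extension is backwards compatible. The `classify` function takes the data and the name of a classification scheme and returns the same classifier object that the corresponding class would produce.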

In [5]:
bp = mapclassify.classify(y, "box_plot")
bp

BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5

In [6]:
type(bp)

mapclassify.classifiers.BoxPlot

In [7]:
q5 = mapclassify.classify(y, "quantiles", k=5)
q5

Quantiles

   Interval      Count
----------------------
[17.90, 23.08] |    10
(23.08, 30.48] |    10
(30.48, 39.10] |     9
(39.10, 45.83] |    10
(45.83, 96.40] |    10
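
To illustrate what a quantile scheme with `k=5` does, here is a small standard-library sketch. It is an assumption-laden illustration rather than the `mapclassify` implementation: the helper `quantile_breaks` and the data are hypothetical, and the cut points use linear interpolation between order statistics.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_breaks(values, k=5):
    """Return k upper class bounds: k - 1 quantile cut points plus the maximum."""
    cuts = quantiles(values, n=k, method="inclusive")
    return cuts + [max(values)]


values = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10
breaks = quantile_breaks(values, k=5)

# Count observations per class; each class is closed on the right.
counts = [0] * len(breaks)
for v in values:
    counts[bisect_left(breaks, v)] += 1

print(breaks)
print(counts)
```

Each class receives the same number of observations here. With real data such as `HOVAL`, ties and a sample size that is not a multiple of `k` make the counts only approximately equal, as in the table above.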

### Robustness of the `scheme` argument

In [8]:
mapclassify.classify(y, "boxPlot")

BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5

In [9]:
mapclassify.classify(y, "Boxplot")

BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5

In [10]:
mapclassify.classify(y, "Box_plot")

BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5
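
The three calls above show that `scheme` tolerates differences in case and underscores. One plausible way to get this behavior is to normalize the name before looking it up. The sketch below assumes that approach; `normalize_scheme`, `resolve`, and `REGISTRY` are hypothetical names and do not come from `mapclassify`.

```python
def normalize_scheme(scheme):
    """Lower-case the name and drop underscores so spelling variants coincide."""
    return scheme.lower().replace("_", "")


# Hypothetical registry mapping normalized names to classifier labels.
REGISTRY = {
    "boxplot": "BoxPlot",
    "quantiles": "Quantiles",
    "userdefined": "UserDefined",
}


def resolve(scheme):
    """Look up a classifier label by a loosely spelled scheme name."""
    key = normalize_scheme(scheme)
    if key not in REGISTRY:
        raise ValueError(f"Unknown scheme: {scheme!r}")
    return REGISTRY[key]


for name in ["box_plot", "boxPlot", "Boxplot", "Box_plot", "User_Defined"]:
    print(name, "->", resolve(name))
```

All four spellings of the box-plot scheme resolve to the same entry, and an unrecognized name raises a `ValueError` instead of silently falling back to a default.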

In [11]:
mapclassify.classify?

Signature:
mapclassify.classify(
    y,
    scheme,
    k=5,
    pct=[1, 10, 50, 90, 99, 100],
    pct_sampled=0.1,
    truncate=True,
    hinge=1.5,
    multiples=[-2, -1, 1, 2],
    mindiff=0
    ...

In [12]:
mapclassify.classify(y, "User_Defined", bins=[0, 50, 100])

UserDefined

    Interval       Count
------------------------
(  -inf,   0.00] |     0
(  0.00,  50.00] |    40
( 50.00, 100.00] |     9

### Finding bins for new data

In [13]:
r = mapclassify.classify(y, "User_Defined", bins=[0, 50, 100])

In [14]:
r.find_bin??

Signature: r.find_bin(x)
Source:
    def find_bin(self, x):
        """
        Sort input or inputs according to the current bin estimate.

        Parameters
        ----------

        x : numpy.array, int, float
            A value or array of values to fit within the estimated bins.

        Returns
        -------

        right : numpy.array, int
            A bin index or array of bin indices that classify the
            input into one of the classifiers' bins.

        Notes
        -----

        This differs from similar functionality in
        ...

In [15]:
r.bins, r.counts

(array([  0,  50, 100]), array([ 0, 40,  9]))

In [16]:
r.find_bin([7, 0, 51, 33])

array([1, 0, 2, 1])

Note that `find_bin` does not recalibrate the classifier:

In [17]:
r.bins, r.counts

(array([  0,  50, 100]), array([ 0, 40,  9]))
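
The lookup can be pictured as a search over right-closed intervals that reads the fitted breaks without touching them. The class below is a hypothetical stand-in written with the standard library, not the `mapclassify` classifier. The name `FixedBins` and the fitting data are assumptions chosen for illustration.

```python
from bisect import bisect_left


class FixedBins:
    """Hypothetical stand-in for a fitted classifier with right-closed bins."""

    def __init__(self, values, bins):
        # Assumes every fitted value is at most the last break.
        self.bins = list(bins)
        self.counts = [0] * len(self.bins)
        for v in values:
            self.counts[bisect_left(self.bins, v)] += 1

    def find_bin(self, xs):
        # bisect_left returns the first i with bins[i] >= x, which is the
        # index of the interval (bins[i - 1], bins[i]] containing x.
        # Nothing is written back to self.bins or self.counts.
        return [bisect_left(self.bins, x) for x in xs]


clf = FixedBins(values=[10, 20, 60], bins=[0, 50, 100])  # hypothetical data
before = (list(clf.bins), list(clf.counts))
found = clf.find_bin([7, 0, 51, 33])
after = (list(clf.bins), list(clf.counts))

print(found)
print(before == after)
```

The lookup returns the same indices as `In [16]` for the same query values, and the stored breaks and counts are identical before and after the call. A query value above the last break would get the index `len(bins)` in this sketch, which is one past the last class.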