diff --git a/notebooks/04_spatial_weights.md b/notebooks/04_spatial_weights.md index c3726346..7d4f5f50 100644 --- a/notebooks/04_spatial_weights.md +++ b/notebooks/04_spatial_weights.md @@ -6,11 +6,11 @@ jupyter: extension: .md format_name: markdown format_version: '1.1' - jupytext_version: 1.1.6 + jupytext_version: 1.1.7 kernelspec: - display_name: Python 3 + display_name: Analysis language: python - name: python3 + name: ana --- # Spatial Weights @@ -85,7 +85,7 @@ question share an *edge*. According to this definition, polygon $0$ would be a r ```python # do a regular 3x3 lattice and draw it here -w = weights.Contiguity.Rook.from_dataframe(gdf) +w = weights.contiguity.Rook.from_dataframe(gdf) w.neighbors ``` @@ -150,7 +150,7 @@ neighbor relations for this same configuration as follows: ```python # do a regular 3x3 lattice and draw it here -w = weights.Contiguity.Queen.from_dataframe(gdf) +w = weights.contiguity.Queen.from_dataframe(gdf) w.neighbors ``` @@ -250,7 +250,7 @@ structures to extract these. ```python san_diego_tracts = geopandas.read_file(bookdata.san_diego_tracts()) -wq = weights.Contiguity.Queen.from_dataframe(san_diego_tracts) +wq = weights.contiguity.Queen.from_dataframe(san_diego_tracts) ``` Note the warning about disconnected observations (no neighbors). The disconnected observation warning indicates that there are polygons in the shapefile that do not share any vertices with other polygons. Any time there is a disconnected observation, sometimes called a topological *island*, the graph is also disconnected into at least two one components. Further, graphs can have more than one disconnected component even without any islands. We will return to deal with this later in this chapter. For now we focus on the object just created. @@ -300,7 +300,7 @@ queen neighbors. The most common number of neighbors is 6. There is also a function to create the rook weights for the same dataframe: ```python -wr = weights.Contiguity.Rook.from_dataframe(san_diego_tracts) +wr = weights.contiguity.Rook.from_dataframe(san_diego_tracts) print(wr.pct_nonzero) s = pandas.Series(wr.cardinalities) s.plot.hist(bins=s.unique().shape[0]); @@ -371,7 +371,7 @@ between these polygon objects, however. To do so we develop a representative point for each of the polygons using the so called "center of mass" or centroid. ```python -wk4 = weights.Distance.KNN.from_dataframe(gdf, k=4) +wk4 = weights.distance.KNN.from_dataframe(gdf, k=4) ``` The centroids are attributes of the polygon shapes that PySAL calculates from @@ -419,7 +419,7 @@ The simplest way to compute Kernel weights in PySAL involves a single function call: ```python -w_kernel = weights.Distance.Kernel.from_dataframe(gdf) +w_kernel = weights.distance.Kernel.from_dataframe(gdf) ``` Like KNN weights, the Kernel weights are based on distances between observations. By default, if the input data is an areal unit, we use a central representative point (like the centroid) for that polygon. @@ -469,7 +469,7 @@ ax.set_axis_off() We can see that the adaptive bandwidth adjusts for this: ```python -w_adaptive = weights.Distance.Kernel.from_dataframe(top_30, fixed=False, k=15) +w_adaptive = weights.distance.Kernel.from_dataframe(top_30, fixed=False, k=15) w_adaptive.bandwidth ``` @@ -512,7 +512,7 @@ other observations are within the buffer. If they are, they are assigned a weight of one in the spatial weights matrix, if not they receive a zero. 
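Before turning to the library call, it may help to see what the binary criterion amounts to. The sketch below is a minimal illustration rather than the PySAL implementation: it uses a hypothetical 3x3 grid of centroid coordinates (standing in for the lattice centroids) and marks a pair of observations as connected whenever the distance between them falls within the 1.5 buffer.

```python
# Minimal sketch: binary distance-band weights as a thresholded distance matrix.
# The 3x3 grid of points is a hypothetical stand-in for the lattice centroids.
import numpy
from scipy.spatial import distance_matrix

points = numpy.array([[x, y] for x in range(3) for y in range(3)])
d = distance_matrix(points, points)      # pairwise Euclidean distances
within_band = (d > 0) & (d <= 1.5)       # inside the buffer, excluding self-pairs
print(within_band.astype(int))           # 0/1 matrix analogous to the binary weights
```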
```python -w_bdb = weights.Distance.DistanceBand.from_dataframe(gdf, 1.5, binary=True) +w_bdb = weights.distance.DistanceBand.from_dataframe(gdf, 1.5, binary=True) ``` This creates a binary distance weights where every other observation within @@ -528,7 +528,7 @@ everyone else. For this example we will return to the small lattice example covered in the beginning: ```python -w_hy = weights.Distance.DistanceBand.from_dataframe(gdf, 1.5, binary=False) +w_hy = weights.distance.DistanceBand.from_dataframe(gdf, 1.5, binary=False) ``` We apply a threshold of 1.5 for this illustration. PySAL truncates continuous @@ -562,14 +562,13 @@ curvature implicit in non-projected reference systems (e.g. longitude/latitude) can be a sufficient workaround. PySAL provides such approximation as part of its functionality. -[**LJW: we've gotta keep this section as is, not upgrading to use the `KNN` class, until [libpysal#82](https://github.com/pysal/libpysal/pull/82) is fixed**] - To illustrate the relevance of ignoring this aspect altogether we will examine distance based weights for the case of counties in the state of Texas. First, let us compute a KNN-4 object that ignores the curvature of the Earth's surface: ```python -knn4_bad = weights.user.knnW_from_shapefile(bookdata.texas(), k=4) # ignore curvature of the earth +texas = geopandas.read_file(bookdata.texas()) +knn4_bad = weights.distance.KNN.from_dataframe(texas, k=4) # ignore curvature of the earth ``` ```python @@ -591,7 +590,7 @@ expressed in the units we have used for the radious, that is in miles in our case: ```python -knn4 = weights.user.knnW_from_shapefile(bookdata.texas(), k=4, radius=radius) +knn4 = weights.distance.KNN.from_dataframe(texas, k=4, radius=radius) ``` ```python @@ -661,7 +660,7 @@ From this, we can see *why* our tract is disconnected, rather than simply *that* That aside, we can still re-attach the observation to the graph by connecting it to its nearest neighbor. 
To do this, we can construct the KNN graph as we did above, but set `k=1`, so observations are only assigned to their nearest neighbor: ```python -wk1 = weights.Distance.KNN.from_dataframe(san_diego_tracts, k=1) +wk1 = weights.distance.KNN.from_dataframe(san_diego_tracts, k=1) ``` In this graph, all our observations are connected to *at least* one other observation, so island observation is connected to: @@ -707,7 +706,7 @@ A more elegant approach to the island problem makes use of PySAL's support for *set theoretic operations* on PySAL weights: ```python -w_fixed_sets = weights.Wsets.w_union(wr, wk1) +w_fixed_sets = weights.set_operations.w_union(wr, wk1) w_fixed_sets.histogram ``` @@ -798,7 +797,7 @@ Beginning with Queen weights: ```python -queen_mx = weights.Contiguity.Queen.from_dataframe(mx) +queen_mx = weights.contiguity.Queen.from_dataframe(mx) f, ax = plt.subplots(1, figsize=(9, 9)) mx.plot(ax=ax) queen_mx.plot(mx,edge_kws=dict(linewidth=1.5, color='orangered'), node_kws=dict(marker='*'), ax=ax, ) @@ -833,7 +832,7 @@ plt.show() Next, we construct the union of queen contiguity and block weights ```python -union_mx = weights.Wsets.w_union(block_mx, queen_mx) +union_mx = weights.set_operations.w_union(block_mx, queen_mx) f, ax = plt.subplots(1, figsize=(9, 9)) mx.plot(column='INEGI2', categorical=True, cmap='Pastel2', ax=ax) @@ -999,7 +998,7 @@ plt.hist(adjlist_wealth['diff'], histtype='step', bins=50, label='Neighbors') seaborn.despine() plt.ylabel("Density") -plt.xlabel("Dollars ($)") +plt.xlabel("Dollar Differences ($)") plt.legend() ``` @@ -1047,7 +1046,7 @@ plt.hist(adjlist_wealth['diff'], histtype='step', linewidth=2) seaborn.despine() plt.ylabel("Density") -plt.xlabel("Dollars ($)") +plt.xlabel("Dollar Differences ($)") plt.show() ``` @@ -1098,6 +1097,61 @@ ax[2].axis([west, south, east, north]) These are the starkest contrasts in the map, and result in the most distinctive divisions between adjacent tracts' household incomes. +# Questions + + +1. Rook contiguity & Queen contiguity are two of three kinds of contiguity that are defined in terms of chess analogies. The third kind, *Bishop contiguity*, applies when two observations are considered connected when they share single vertices, but are considered *disconnected* if they share an edge. This means that observations that exhibit Queen contiguity are those that exhibit either Rook or Bishop contiguity. Using the Rook and Queen contiguity matrices we built for San Diego and the `Wsets.w_difference` function, are there any Bishop-contiguous observations in San Diego? + +2. Different kinds of spatial weights objects can result in very different kinds of graph structures. Considering the `cardinalities` of the Queen, Block, and union of Queen & Block, which graph type has the highest average cardinality? Why might this be the case? Which graph has more nonzero entries? + +3. Comparing their average cardinality and percentage of nonzero links, which graph in this chapter has the *most sparse* structure? That is, which graph is the most sparsely connected? + +4. In this chapter, we worked with regular *square* lattices using the `lat2W` function. In the same manner, the `hexLat2W` function can generate *hexagonal regular lattices*. For lattices of size (3,3), (6,6), and (9,9) for Rook & Queen `lat2W`, as well as for `hexLat2W`: + + 1. examine the average cardinality. Does `lat2W` or `hexLat2W` have higher average cardinality? + 2. Further, make a histogram of the cardinalities. 
Which type of lattice has higher variation in its number of neighbors? + 3. Why is there no `rook=True` option in `hexLat2W`, as there is in `lat2W`? + +5. The *Voronoi diagram* is a common method to construct polygons from a point dataset. A Voronoi diagram is built up from *Voronoi cells*, each of which contains the area that is closer to its source point than any other source point in the diagram. Further, the Queen contiguity graph for a *Voronoi diagram* obeys a number of useful properties, since it is the *Delaunay Triangulation* of a set of points. + 1. Using the following code, build and plot the voronoi diagram for the *centroids* of Mexican states, with the states and their centroids overlayed: + ```python + from pysal.lib.weights.distance import get_points_array + from pysal.lib.cg import voronoi_frames + + centroid_coordinates = get_points_array(mx.centroid) + cells, centers = voronoi_frames(centroid_coordinates) + + ax = cells.plot(facecolor='none', edgecolor='k') + mx.plot(ax=ax, edgecolor='red', facecolor='whitesmoke', alpha=.5) + mx.centroid.plot(ax=ax,color='red', alpha=.5, markersize=10) + ax.axis(mx.total_bounds[[0,2,1,3]]) + plt.show() + ``` + 2. Using the `weights.Voronoi` function, build the Voronoi weights for the Mexico states data. + 3. Compare the connections in the Voronoi and Queen weights for the Mexico states data. Which form is more connected? + 4. Make a plot of the Queen contiguity and Voronoi contiguity graphs to compare them visually, like we did with the block weights & Queen weights. How do the two graphs compare in terms of the length of their links and how they connect the Mexican states? + 5. Using `weights.set_operations`, find any links that are in the Voronoi contiguity graph, but not in the Queen contiguity graph. Alternatively, find any links that are in the Queen contiguity graph, but not the Voronoi contiguity graph. + +6. Interoperability is important for the Python scientific stack. Thanks to standardization around the `numpy` array and the `scipy.sparse` array datastructures, it is simple and computationally-easy to convert objects from one representation to another: + 1. Using `w.to_networkx()`, convert the Mexico Regions Queen+Block weights matrix to a `networkx` graph. Compute the Eigenvector Centrality of that new object using `networkx.eigenvector_centrality` + 2. Using `w.sparse`, compute the number of connected components in the Mexico Regions Block weights matrix using the `connected_components` function in `scipy.sparse.csgraph`. + 3. Using `w.sparse`, compute the all-pairs shortest path matrix in the Mexico Queen weights matrix using the `shortest_path` function in `scipy.sparse.csgraph`. + +7. While every node in a $k$-nearest neighbor graph has 5 neighbors, there is a conceptual difference between *indegree* and *outdegree* of nodes in a graph. The *outdegree* of a node is the number of outgoing links from a node; for a K-Nearest Neighbor graph, this is $k$ for every variable. The *indegree* of a node in a graph is the number of *incoming* links to that node; for a K-Nearest Neighbor graph, this is the number of other observations that pick the target as their nearest neighbor. The *indegree* of a node in the K-Nearest Neighbor graph can provide a measure of *hubbiness*, or how central a node is to other nodes. + 1. Using the San Diego Tracts data, build a $k=6$ nearest neighbor weight and call it `knn_6`. + 2. Verify that the $k=6$ by taking the row sum over the weights matrix in `knn_6.sparse`. + 3. 
Compute the indegree of each observation by taking the *column sum* over the weights matrix in `knn_6.sparse`, and divide by 6, the outdegree for all observations. + 4. Make a histogram of the indegrees for the $k=6$ weights. How evenly-distributed is the distribution of indegrees? + 5. Make a new histogram of the indegree standardized by the outdegree when $k=26$. Does hubbiness reduce when increasing the number of $k$-nearest neighbors? + +8. Sometimes, graphs are not simple to construct. For the `san_diego_neighborhoods` dataset: + 1. Build the Queen contiguity weights, and plot the graph on top of the neighborhoods themselves. How many connected components does this Queen contiguity graph have? + 2. Build the K-Nearest Neighbor graph for the default, $k=2$. How many connected components does this K-Nearest Neighbor graph have? + 3. What is the smallest $k$ that you can find for the K-Nearest Neighbor graph to be fully-connected? + 4. In graph theory, a link whose *removal* will increase the number of connected components in a graph is called a *bridge*. In the fully-connected KNN graph with the smallest $k$, how many bridges are there between the north and south components? *(hint: use the plotting functionality)* + 5. What are the next two values of $k$ required for there to be an *additional* bridge at that $k$? + + # References @@ -1110,6 +1164,18 @@ These are the starkest contrasts in the map, and result in the most distinctive [4] Dean, N., Dong, G., and Price, G. "Frontiers in Residential Segregation: Understanding Neighbourhood Boundaries and their Impacts." *Tijdschrift voor economische en sociale geografie*, in press +# Further Reading + +LeSage, James P., and R. Kelley Pace. 2014. “The Biggest Myth in Spatial Econometrics.” *Econometrics* 2(4): 217–49. https://doi.org/10.3390/econometrics2040217. + +Griffith, D. A. 1996. “Some Guidelines for Specifying the Geographic Weights Matrix Contained in Spatial Statistical Models.” In *Practical Handbook of Spatial Statistics*, edited by Sandra Lach Arlinghaus, 65–82. Boca Raton, FL: CRC Press. + + + --- Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. + +```python + +``` diff --git a/notebooks/05_choropleth.ipynb b/notebooks/05_choropleth.ipynb index 7a43182c..532ccb9d 100644 --- a/notebooks/05_choropleth.ipynb +++ b/notebooks/05_choropleth.ipynb @@ -351,7 +351,7 @@ "$$k=(max-\n", "min)/w.$$\n", "\n", - "Below we present several approaches to create these break points that follow criteria that can be of interest in different contests, as they focus on different priorities.\n", + "Below we present several approaches to create these break points that follow criteria that can be of interest in different contexts, as they focus on different priorities.\n", " \n", "#### Equal Intervals\n", "\n", @@ -360,9 +360,9 @@ "special case of a more general classifier known as \"equal intervals\", where each\n", "of the bins has the same width in the value space. \n", "For a given value of $k$, equal intervals\n", - "classification splits the range of the attribute space into equal lengthened\n", + "classification splits the range of the attribute space into $k$ equal length\n", "intervals, with each interval having a width\n", - "$w = \\frac{x_0 - x_{n-1}}{k-1}$.\n", + "$w = \\frac{x_0 - x_{n-1}}{k}$.\n", "Thus the maximum class is $(x_{n-1}-w, x_{n-1}]$ and the first class is\n", "$(-\\infty, x_{n-1} - (k-1)w]$.\n", "\n", @@ -489,7 +489,7 @@ "not problem free. The varying widths of the intervals can be markedly different\n", "which can lead to problems of interpretation. A second challenge facing quantiles\n", "arises when there are a large number of duplicate values in the distribution\n", - "such that the limits for one or more classes become ambiguous.\n", + "such that the limits for one or more classes become ambiguous. For example, if one had a variable with $n=20$ but 10 of the observations took on the same value which was the minimum observed, then for values of $k>2$, the class boundaries become ill-defined since a simple rule of splitting at the $n/k$ ranked observed value would depend upon how ties are tried when ranking.\n", "\n", "#### Mean-standard deviation\n", "\n", @@ -539,13 +539,19 @@ "source": [ "This classifier is best used when data is normally distributed or, at least, when the sample mean is a meaningful measure to anchor the classification around. Clearly this is\n", "not the case for our income data as the positive skew results in a loss of\n", - "information when we use the standard deviation. The lack of symmetry thus leads to\n", - "inadmissible boundaries for the first class as well as a concentration of the\n", + "information when we use the standard deviation. The lack of symmetry leads to\n", + "an inadmissible upper bound for the first class as well as a concentration of the\n", "vast majority of values in the middle class.\n", "\n", "#### Maximum Breaks\n", "\n", - "The maximum breaks classifier decides where to set the break points between classes by considering the difference between sorted values. That is, rather than considering a value of the dataset in itself, it looks at how appart each value is from the next one in the sorted sequence. The classifier then places the the $k-1$ break points in between the $k$ values most stretched apart from each other in the entire sequence:" + "The maximum breaks classifier decides where to set the break points between\n", + "classes by considering the difference between sorted values. 
That is, rather\n", + "than considering a value of the dataset in itself, it looks at how appart each\n", + "value is from the next one in the sorted sequence. The classifier then places\n", + "the the $k-1$ break points in between the pairs of values most stretched apart from\n", + "each other in the entire sequence, proceeding in descending order relative to\n", + "the size of the breaks:" ] }, { @@ -581,7 +587,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Maximum breaks is an appropriate approach when we are interested in making sure observations in each class are as similar to each other as possible. As such, it works well in cases where the distribution of values is not unimodal. In addition, the algorithm is relatively fast to compute. However, its simplicitly can sometimes cause unexpected results. To the extent in only considers the top $k$ differences between consecutive values, other more nuanced within-group differences and dissimilarities can be ignored.\n", + "Maximum breaks is an appropriate approach when we are interested in making sure\n", + "observations in each class are separated from those in neighboring classes. As\n", + "such, it works well in cases where the distribution of values is not unimodal.\n", + "In addition, the algorithm is relatively fast to compute. However, its\n", + "simplicitly can sometimes cause unexpected results. To the extent in only\n", + "considers the top $k-1$ differences between consecutive values, other more nuanced\n", + "within-group differences and dissimilarities can be ignored.\n", "\n", "#### Box-Plot\n", "\n", @@ -675,7 +687,7 @@ "\n", "#### Head Tail Breaks\n", "\n", - "The head tail algorithm, introduced by Jiang (2013) is based on a recursive partioning of the data using splits around\n", + "The head tail algorithm, introduced by Jiang (2013), is based on a recursive partioning of the data using splits around\n", "iterative means. The splitting process continues until the distributions within each of\n", "the classes no longer display a heavy-tailed distribution in the sense that\n", "there is a balance between the number of smaller and larger values assigned to\n", @@ -857,12 +869,12 @@ "the number of classes and the class boundaries pose a problem to the map\n", "designer. Recall that the Freedman-Diaconis rule was said to be optimal,\n", "however, the optimality necessitates the specification of an objective function.\n", - "In the case of Freedman-Diaconis the objective function is to minimize the\n", + "In the case of Freedman-Diaconis, the objective function is to minimize the\n", "difference between the area under estimated kernel density based on the sample\n", "and the area under the theoretical population distribution that generated the\n", "sample.\n", "\n", - "This notion of statistical fit is an important one, however, not the\n", + "This notion of statistical fit is an important one. However, it is not the\n", "only consideration when evaluating classifiers for the purpose of choropleth\n", "mapping. Also relevant is the spatial distribution of the attribute values and\n", "the ability of the classifier to convey a sense of that spatial distribution. As\n", @@ -2393,11 +2405,27 @@ "to see the references cited.\n", "\n", "At the same time, given the philosophy underlying PySAL the methods we cover\n", - "here are sufficient for exploratory data analysis where rapid and flexible\n", - "generation of views is critical to the work flow. 
Once the analysis is complete\n", + "here are sufficient for exploratory data analysis where the rapid and flexible\n", + "generation of views is critical to the work flow. Once the analysis is complete,\n", "and the final presentation quality maps are to be generated, there are excellent\n", "packages in the data stack that the user can turn to.\n", "\n", + "## Questions\n", + "\n", + "1. A variable such as population density measured for census tracts in a metropolitan area can display a high degree of skewness. What is an appropriate choice for a choropleth classification for such a variable?\n", + "2. Provide two solutions to the problem of ties when applying quantile classification to the following series: $y=[2,2,2,2,2,2,4,7,8,9,20,21]$ and $k=4$. Discuss the merits of each approach.\n", + "3. Which classifiers are appropriate for data that displays a high degree of multi-modality in its statistical distribution?\n", + "4. Contrast and compare classed choropleth maps with class-less choropleth maps? What are the strengths and limitations of each type of visualization for spatial data?\n", + "5. In what ways do choropleth classifiers treat intra-class and inter-class heterogeneity differently? What are the implications of these choices?\n", + "6. To what extent do most commonly employed choropleth classification methods take the geographical distribution of the variable into consideration? Can you think of ways to incorporate the spatial features of a variable into a classification?\n", + "7. Discuss the similarities between the choice of the number of classes in choropleth mapping, on the one hand, and the determination of the number of clusters in a data set on the other. What aspects of choropleth mapping differentiate the former from the latter?\n", + "8. The Fisher-Jenks classifier will always dominate other k-classifiers for a given data set, with respect to statistical fit. Given this, why might one decide on choosing a different k-classifier for a particular data set?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "## References\n", "\n", "Duque, J.C., L. Anselin, and S.J. Rey. (2012) \"The max-p regions problem.\" *Journal of Regional Science*, 52:397-419.\n", diff --git a/notebooks/05_choropleth.md b/notebooks/05_choropleth.md index 23427060..e1dd3280 100644 --- a/notebooks/05_choropleth.md +++ b/notebooks/05_choropleth.md @@ -160,7 +160,7 @@ range of the attribute values. Given $w$ the number of bins ($k$) is: $$k=(max- min)/w.$$ -Below we present several approaches to create these break points that follow criteria that can be of interest in different contests, as they focus on different priorities. +Below we present several approaches to create these break points that follow criteria that can be of interest in different contexts, as they focus on different priorities. #### Equal Intervals @@ -169,9 +169,9 @@ the width and, in turn, the number of bins for the classification. This is a special case of a more general classifier known as "equal intervals", where each of the bins has the same width in the value space. For a given value of $k$, equal intervals -classification splits the range of the attribute space into equal lengthened +classification splits the range of the attribute space into $k$ equal length intervals, with each interval having a width -$w = \frac{x_0 - x_{n-1}}{k-1}$. +$w = \frac{x_0 - x_{n-1}}{k}$. Thus the maximum class is $(x_{n-1}-w, x_{n-1}]$ and the first class is $(-\infty, x_{n-1} - (k-1)w]$. 
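To make the arithmetic concrete, the following minimal sketch uses hypothetical values and $k=5$ to compute the common width and the upper bound of each class; `mapclassify` provides an equal-interval classifier that applies the same rule to a series.

```python
# Worked example of equal-interval breaks on hypothetical values, with k = 5.
import numpy

y = numpy.array([1.2, 2.5, 3.1, 4.8, 7.0, 9.9, 12.4, 18.6, 25.0, 31.3])
k = 5
w = (y.max() - y.min()) / k                           # common class width: (max - min) / k
upper_bounds = y.min() + w * numpy.arange(1, k + 1)   # upper bound of each of the k classes
print(w, upper_bounds)
```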
@@ -215,7 +215,7 @@ While quantiles does avoid the pitfall of sparse classes, this classification
is not problem free. The varying widths of the intervals can be markedly different
which can lead to problems of interpretation. A second challenge facing quantiles
arises when there are a large number of duplicate values in the distribution
-such that the limits for one or more classes become ambiguous.
+such that the limits for one or more classes become ambiguous. For example, if one had a variable with $n=20$ but 10 of the observations took on the same value which was the minimum observed, then for values of $k>2$, the class boundaries become ill-defined since a simple rule of splitting at the $n/k$ ranked observed value would depend upon how ties are treated when ranking.

#### Mean-standard deviation

@@ -236,20 +236,32 @@ msd
This classifier is best used when data is normally distributed or, at least, when the sample mean is a meaningful measure to anchor the classification around. Clearly this is
not the case for our income data as the positive skew results in a loss of
-information when we use the standard deviation. The lack of symmetry thus leads to
-inadmissible boundaries for the first class as well as a concentration of the
+information when we use the standard deviation. The lack of symmetry leads to
+an inadmissible upper bound for the first class as well as a concentration of the
vast majority of values in the middle class.

#### Maximum Breaks

-The maximum breaks classifier decides where to set the break points between classes by considering the difference between sorted values. That is, rather than considering a value of the dataset in itself, it looks at how appart each value is from the next one in the sorted sequence. The classifier then places the the $k-1$ break points in between the $k$ values most stretched apart from each other in the entire sequence:
+The maximum breaks classifier decides where to set the break points between
+classes by considering the difference between sorted values. That is, rather
+than considering a value of the dataset in itself, it looks at how far apart each
+value is from the next one in the sorted sequence. The classifier then places
+the $k-1$ break points in between the pairs of values most stretched apart from
+each other in the entire sequence, proceeding in descending order relative to
+the size of the breaks:

```python
mb5 = mapclassify.Maximum_Breaks(mx['PCGDP1940'], k=5)
mb5
```

-Maximum breaks is an appropriate approach when we are interested in making sure observations in each class are as similar to each other as possible. As such, it works well in cases where the distribution of values is not unimodal. In addition, the algorithm is relatively fast to compute. However, its simplicitly can sometimes cause unexpected results. To the extent in only considers the top $k$ differences between consecutive values, other more nuanced within-group differences and dissimilarities can be ignored.
+Maximum breaks is an appropriate approach when we are interested in making sure
+observations in each class are separated from those in neighboring classes. As
+such, it works well in cases where the distribution of values is not unimodal.
+In addition, the algorithm is relatively fast to compute. However, its
+simplicity can sometimes cause unexpected results. To the extent it only
+considers the top $k-1$ differences between consecutive values, other more nuanced
+within-group differences and dissimilarities can be ignored.
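To see the gap-based logic itself, here is a minimal sketch on hypothetical values; it is only an illustration of the idea, with `Maximum_Breaks` being the reference implementation. It sorts the series, finds the $k-1$ widest gaps between consecutive values, and places a break point at the midpoint of each of those gaps.

```python
# Sketch of the maximum-breaks idea on hypothetical data, with k = 5.
import numpy

y = numpy.sort(numpy.array([3, 4, 4, 5, 11, 12, 13, 26, 27, 41]))
k = 5
gaps = numpy.diff(y)                                  # gaps between consecutive sorted values
widest = numpy.sort(numpy.argsort(gaps)[-(k - 1):])   # positions of the k-1 widest gaps
break_points = (y[widest] + y[widest + 1]) / 2        # break at the midpoint of each gap
print(break_points)
```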
#### Box-Plot @@ -283,7 +295,7 @@ neighboring internal classes. #### Head Tail Breaks -The head tail algorithm, introduced by Jiang (2013) is based on a recursive partioning of the data using splits around +The head tail algorithm, introduced by Jiang (2013), is based on a recursive partioning of the data using splits around iterative means. The splitting process continues until the distributions within each of the classes no longer display a heavy-tailed distribution in the sense that there is a balance between the number of smaller and larger values assigned to @@ -349,12 +361,12 @@ As a special case of clustering, the definition of the number of classes and the class boundaries pose a problem to the map designer. Recall that the Freedman-Diaconis rule was said to be optimal, however, the optimality necessitates the specification of an objective function. -In the case of Freedman-Diaconis the objective function is to minimize the +In the case of Freedman-Diaconis, the objective function is to minimize the difference between the area under estimated kernel density based on the sample and the area under the theoretical population distribution that generated the sample. -This notion of statistical fit is an important one, however, not the +This notion of statistical fit is an important one. However, it is not the only consideration when evaluating classifiers for the purpose of choropleth mapping. Also relevant is the spatial distribution of the attribute values and the ability of the classifier to convey a sense of that spatial distribution. As @@ -632,11 +644,23 @@ choropleth maps. Readers interested in pursuing this literature are encouraged to see the references cited. At the same time, given the philosophy underlying PySAL the methods we cover -here are sufficient for exploratory data analysis where rapid and flexible -generation of views is critical to the work flow. Once the analysis is complete +here are sufficient for exploratory data analysis where the rapid and flexible +generation of views is critical to the work flow. Once the analysis is complete, and the final presentation quality maps are to be generated, there are excellent packages in the data stack that the user can turn to. +## Questions + +1. A variable such as population density measured for census tracts in a metropolitan area can display a high degree of skewness. What is an appropriate choice for a choropleth classification for such a variable? +2. Provide two solutions to the problem of ties when applying quantile classification to the following series: $y=[2,2,2,2,2,2,4,7,8,9,20,21]$ and $k=4$. Discuss the merits of each approach. +3. Which classifiers are appropriate for data that displays a high degree of multi-modality in its statistical distribution? +4. Contrast and compare classed choropleth maps with class-less choropleth maps? What are the strengths and limitations of each type of visualization for spatial data? +5. In what ways do choropleth classifiers treat intra-class and inter-class heterogeneity differently? What are the implications of these choices? +6. To what extent do most commonly employed choropleth classification methods take the geographical distribution of the variable into consideration? Can you think of ways to incorporate the spatial features of a variable into a classification? +7. Discuss the similarities between the choice of the number of classes in choropleth mapping, on the one hand, and the determination of the number of clusters in a data set on the other. 
What aspects of choropleth mapping differentiate the former from the latter? +8. The Fisher-Jenks classifier will always dominate other k-classifiers for a given data set, with respect to statistical fit. Given this, why might one decide on choosing a different k-classifier for a particular data set? + + ## References Duque, J.C., L. Anselin, and S.J. Rey. (2012) "The max-p regions problem." *Journal of Regional Science*, 52:397-419. diff --git a/notebooks/07_local_autocorrelation.md b/notebooks/07_local_autocorrelation.md index 04d7ffc1..d4a2aaf9 100644 --- a/notebooks/07_local_autocorrelation.md +++ b/notebooks/07_local_autocorrelation.md @@ -6,11 +6,11 @@ jupyter: extension: .md format_name: markdown format_version: '1.1' - jupytext_version: 1.1.6 + jupytext_version: 1.1.7 kernelspec: - display_name: Python 3 + display_name: Analysis language: python - name: python3 + name: ana --- # Local Spatial Autocorrelation @@ -95,7 +95,7 @@ And with this elements, we can generate a choropleth to get a quick sense of the f, ax = plt.subplots(1, figsize=(9, 9)) ax.imshow(img, extent=ext, alpha=0.5) choropleth(db, column='Pct_Leave', cmap='viridis', scheme='quantiles', - k=5, edgecolor='white', linewidth=0.1, alpha=0.75, legend=True, ax=ax) + k=5, edgecolor='white', linewidth=0.1, alpha=0.75, legend=True, ax=ax) plt.text(ext[0],ext[2], lic, size=8) ax.set_axis_off() ``` @@ -104,14 +104,11 @@ The final piece we need before we can delve into spatial autocorrelation is the ```python # Generate W from the GeoDataFrame -w = weights.Distance.KNN.from_dataframe(db, k=8) +w = weights.distance.KNN.from_dataframe(db, k=8) # Row-standardization w.transform = 'R' ``` -**Should we adopt some scheme so as to refer to earlier chapters, as in the case of maps and weights here to improve the flow of the book and reduce any repetition of basic tasks?** - - ### Motivating Local Spatial Autocorrelation To better understand the underpinning of local autorocorrelation, we will return to the Moran Plot as a graphical tool. Let us first calculate the spatial lag of our variable of interest: @@ -355,9 +352,9 @@ Similar to the global case, there are more local indicators of spatial correlati ```python # Gi -gaol = esda.getisord.G_Local(db['Pct_Leave'], w) +gostats = esda.getisord.G_Local(db['Pct_Leave'], w) # Gi* -gaols = esda.getisord.G_Local(db['Pct_Leave'], w, star=True) +gostars = esda.getisord.G_Local(db['Pct_Leave'], w, star=True) ``` As the local statistics they are, it is best to explore them by plotting them on a map. Unlike with LISA though, the $G$ statistics only allow to identify positive spatial autocorrelation. When standardized, positive values imply clustering of high values, while negative implies grouping of low values. Unfortunately, it is not possible to discern spatial outliers. @@ -416,7 +413,7 @@ def g_map(g, geog, img, ext, ax): # Setup figure and axes f, axs = plt.subplots(1, 2, figsize=(12, 6)) # Loop over the two statistics and generate the map -for g, ax in zip([gaol, gaols], axs.flatten()): +for g, ax in zip([gostats, gostars], axs.flatten()): ax = g_map(g, db, img, ext, ax) # Render plt.show() @@ -425,6 +422,36 @@ plt.show() As you can see, the results are virtually the same for $G_i$ and $G_i^*$. Also, at first glance, these maps appear to be visually similar to the final LISA map from above, and this leads to the question of why use the $G$ statistics at all. The answer to this question is that the two sets of local statistics, Local $I$ and the local $G$, are complementary statistics. 
This is because the local $I$ by itself cannot distinguish between the two forms of positive spatial association while the G can. At the same time, the G statistic does not consider negative spatial association, while the local I statistic does. +# Questions +1. Do the same Local Moran analysis done for `Pct_Leave`, but using `Pct_Turnout`. Is there a geography to how involved people were in different places? Where was turnout percentage (relatively) higher or lower? +2. Do the same Getis-Ord analysis done for `Pct_Leave`, but using `Pct_Turnout`. +3. Local Moran statistics are premised on a few distributional assumptions. One well-recognized concern with Moran statistics is when they are estimated for *rates*. Rate data is distinct from other kinds of data because it embeds the relationship between two quantities: the event and the population. For instance, in the case of Leave voting, the "event" is a person voting leave, and the "population" could be the number of eligible voters, the number of votes cast, or the total number of people. This usually only poses a problem for analysis when the event outcome is somehow dependent on the population. + 1. Using our past analytical steps, build a new `db` dataframe from `ref` and `lads` that contains the `Electorate`, `Votes_Cast`, and `Leave` columns. + - From this new dataframe, make scatterplots of: + - the number of votes cast and the percent leave vote + - the size of the electorate and the percent of leave vote + 2. Based on your answers to the previous point, does it appear that there is a relationship between the event and the population size? Use `scipy.stats.kendalltau` or `scipy.stats.pearsonr` to confirm your visual intuition. + 3. Using `esda.moran.Moran_Rate`, estimate a global Moran's I that takes into account the rate structure of `Pct_Leave`, using the `Electorate` as the population. Is this estimate different from the one obtained without taking into account the rate structure? What about when `Votes_Cast` is used for the population? + 4. Using `esda.moran.Moran_Local_Rate`, estimate *local* Moran's I treating Leave data as a rate. + - does any site's local I change? Make a scatterplot of the `lisa.Is` you estimated before and this new rate-based local Moran. + - does any site's local I change their outlier/statistical significance classifications? Use `pandas.crosstab` to examine how many classifications change between the two kinds of statistic. Make sure to consider observations' statistical significances in addition to their quadrant classification. + 5. Make two maps, side by side, of the local statistics without rate correction and with rate correction. Does your interpretation of the maps change depending on the correction? +4. Local statistics use *permutation-based* inference for their significance testing. This means that, to test the statistical significance of a local relationship, values of the observed variable are *shuffled* around the map. These large numbers of *random* maps are then used to compare against the observed map. Local inference requires some restrictions on how each shuffle occurs, since each observation must be "fixed" and compared to randomly-shuffle neighboring observations. The distribution of local statistics for each "shuffle" is contained in the `.rlisas` attribute of a Local Moran object. + - For the first observation, make a `seaborn.distplot` of its shuffled local statistics. Add a vertical line to the histogram using `plt.axvline()`. + - Do the same for the last observation as well. 
+ - Looking only at their permutation distributions, do you expect the first LISA statistic to be statistically-significant? Do you expect the last? +5. LISAs have some amount of fundamental uncertainty due to their estimation. This is called the `standard error` of the statistic. + - The standard errors are contained in the `.seI_sim` attribute. Make a map of the standard errors. Are there any areas of the map that appear to be more uncertain about their local statistics? + - compute the standard deviation of each observation's "shuffle" distribution, contained in the `.rlisas` attribute. Verify that the standard deviation of this shuffle distribution is the same as the standard errors in `seI_sim`. +6. Local Getis-Ord statistics come in two forms. As discussed above, Getis-Ord $G_i$ statistics *omit* each site from their own local statistic. In contrast, $G_i^*$ statistics *include* the site in its own local statistic. + - Make a scatterplot of the two types of statistic, contained in `gostats.Zs` and `gostars.Zs` to examine how similar the two forms of the Getis-Ord statistic are. + - The two forms of the Getis-Ord statistic differ by their inclusion of the *site* value, $y_i$, in the value for the $G_i$ statistic at that site. So, make a scatterplot of the percent leave variable and the *difference* of the two statistics. Is there a relationship between the percent leave vote and the difference in the two forms of the Getis-Ord statistic? Confirm this for yourself using `scipy.stats.kendalltau` or `scipy.stats.pearsonr`. + + --- Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. + +```python + +``` diff --git a/notebooks/10_clustering_and_regionalization.ipynb b/notebooks/10_clustering_and_regionalization.ipynb index 509c279e..9dbd3303 100644 --- a/notebooks/10_clustering_and_regionalization.ipynb +++ b/notebooks/10_clustering_and_regionalization.ipynb @@ -2179,13 +2179,20 @@ "Clustering and regionalization are intimately related to the analysis of spatial autocorrelation as well,\n", "since the spatial structure and covariation in multivariate spatial data is what\n", "determines the spatial structure and data profile of discovered clusters or regions.\n", - "Thus, clustering and regionalization are essential tools for the spatial data scientist." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ + "Thus, clustering and regionalization are essential tools for the spatial data scientist.\n", + "\n", + "## Questions\n", + "\n", + "1. What disciplines employ regionalization? Cite concrete examples for each discipline you list.\n", + "2. Contrast and compare the concepts of *clusters* and *regions*?\n", + "3. In evaluating the quality of the solution to a regionalization problem, how might traditional measures of cluster evaluation be used? In what ways might those measures be limited and need expansion to consider the geographical dimensions of the problem?\n", + "4. Discuss the implications for the processes of regionalization that follow from the number of *connected components* in the spatial weights matrix that would be used.\n", + "5. True or false: The average silhouette score for a spatially constrained solution will be no larger than the average silhouette score for an unconstrained solution. Why, or why not? (add reference and or explain silhouette)\n", + "6. Consider two possible weights matrices for use in a spatially constrained clustering problem. Both form a single connected component for all the areal units. However, they differ in the sparsity of their adjacency graphs (think Rook versus queen). How might this sparsity affect the quality of the clustering solution?\n", + "7. What are the challenges and opportunities that spatial dependence pose for spatial cluster formation?\n", + "8. In other areas of spatial analysis, the concept of multilevel modeling (cites) exploits the hierarchical nesting of spatial units at different levels of aggregation. How might such nesting be exploited in the implementation of regionalization algorithms? What are some possible limitations/challenges that such nesting imposes/represents in obtaining a regionalization solution.\n", + "9. Using a spatial weights object obtained as `w = libpysal.weights.lat2W(20,20)`, what are the number of unique ways to partition the graph into 20 clusters of 20 units each, subject to each cluster being a connected component? What are the unique number of possibilities for `w = libpysal.weights.lat2W(20,20, rook=False)` ?\n", + "\n", "---\n", "\n", "\"Creative
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License." diff --git a/notebooks/10_clustering_and_regionalization.md b/notebooks/10_clustering_and_regionalization.md index 63c5849d..688a4fc0 100644 --- a/notebooks/10_clustering_and_regionalization.md +++ b/notebooks/10_clustering_and_regionalization.md @@ -877,6 +877,17 @@ since the spatial structure and covariation in multivariate spatial data is what determines the spatial structure and data profile of discovered clusters or regions. Thus, clustering and regionalization are essential tools for the spatial data scientist. +## Questions + +1. What disciplines employ regionalization? Cite concrete examples for each discipline you list. +2. Contrast and compare the concepts of *clusters* and *regions*? +3. In evaluating the quality of the solution to a regionalization problem, how might traditional measures of cluster evaluation be used? In what ways might those measures be limited and need expansion to consider the geographical dimensions of the problem? +4. Discuss the implications for the processes of regionalization that follow from the number of *connected components* in the spatial weights matrix that would be used. +5. True or false: The average silhouette score for a spatially constrained solution will be no larger than the average silhouette score for an unconstrained solution. Why, or why not? (add reference and or explain silhouette) +6. Consider two possible weights matrices for use in a spatially constrained clustering problem. Both form a single connected component for all the areal units. However, they differ in the sparsity of their adjacency graphs (think Rook versus queen). How might this sparsity affect the quality of the clustering solution? +7. What are the challenges and opportunities that spatial dependence pose for spatial cluster formation? +8. In other areas of spatial analysis, the concept of multilevel modeling (cites) exploits the hierarchical nesting of spatial units at different levels of aggregation. How might such nesting be exploited in the implementation of regionalization algorithms? What are some possible limitations/challenges that such nesting imposes/represents in obtaining a regionalization solution. +9. Using a spatial weights object obtained as `w = libpysal.weights.lat2W(20,20)`, what are the number of unique ways to partition the graph into 20 clusters of 20 units each, subject to each cluster being a connected component? What are the unique number of possibilities for `w = libpysal.weights.lat2W(20,20, rook=False)` ? --- diff --git a/notebooks/booktools.py b/notebooks/booktools.py index 6b652512..03aba81c 100644 --- a/notebooks/booktools.py +++ b/notebooks/booktools.py @@ -54,4 +54,4 @@ def choropleth(df, column, scheme='Quantiles', k=5, cmap='BluGrn', legend=False, edgecolor=edgecolor, linewidth=linewidth, \ alpha=alpha, ax=ax) return ax - \ No newline at end of file +
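Several of the questions above (on `lat2W` lattices and connected components) can be approached through the sparse representation of a weights object. As a minimal sketch, assuming `libpysal` and `scipy` are installed:

```python
# Sketch: counting connected components of a lattice weights object with scipy.
import libpysal
from scipy.sparse.csgraph import connected_components

w = libpysal.weights.lat2W(20, 20)                 # 20x20 regular lattice, rook contiguity
n_components, labels = connected_components(w.sparse)
print(n_components)                                # a full lattice forms a single component
```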