Draft questions for clustering and regionalization chapter #14

Merged: 16 commits, Jun 28, 2019

@sjsrey sjsrey commented May 29, 2019

No description provided.

@sjsrey sjsrey requested review from darribas and ljwolf May 29, 2019 18:06
@sjsrey sjsrey added the WIP work in progress, do not merge - for discussion only label Jun 19, 2019

sjsrey commented Jun 19, 2019

Sorry for blending the cluster and choro chapters. I got confused by the Docker setup and got the branches out of sync.

I will separate these into different branches and PRs in the next few days.

@sjsrey sjsrey mentioned this pull request Jun 19, 2019

ljwolf commented Jun 24, 2019

Not sure how this happened, but this & the previous PR now need to be rebased because #16/#17 went in ahead? Not sure what changed, but I'm in the process of rebasing #13 now

ljwolf commented Jun 24, 2019

Hey @sjsrey, I've rebased your PR and synced the jupytext representations up. This was particularly tricky, given that jupytext --sync works by last modified date, which isn't stable when you're also dealing with merge conflicts 😦

I've made sure that this has the questions for the weights, choro, local autocorrelation, & regionalization chapters in both markdown & ipynb representations, and I believe this should be merged now that it's been rebased & synced. If you could double-check that your questions are there, I think this has been successfully merged with the changes from #16 and #17.

@sjsrey sjsrey left a comment

All the questions I drafted made it through. Sorry for the merge issues. I think this will smooth out once the workflow is settled.

ljwolf commented Jun 24, 2019

No prob, this stuff happens 😄 @darribas can you check to be sure that this looks right to you before it merges?

@darribas

Will look at them this week as my first task for book matters!

@darribas darribas left a comment

I think overall they're cool questions. My only concern is whether they're too open-ended. They don't really help solidify the concepts seen in the chapter, but they are very good at inviting the reader to think above and beyond those. This might be more of the purpose we might want to pursue.

3. In evaluating the quality of the solution to a regionalization problem, how might traditional measures of cluster evaluation be used? In what ways might those measures be limited and need expansion to consider the geographical dimensions of the problem?
4. Discuss the implications for the processes of regionalization that follow from the number of *connected components* in the spatial weights matrix that would be used.
5. True or false: The average silhouette score for a spatially constrained solution will be no larger than the average silhouette score for an unconstrained solution. Why, or why not? (add reference and/or explain silhouette)
6. Consider two possible weights matrices for use in a spatially constrained clustering problem. Both form a single connected component for all the areal units. However, they differ in the sparsity of their adjacency graphs (think rook versus queen). How might this sparsity affect the quality of the clustering solution?

We could recast this question to make it more practical, along the lines @ljwolf does in other chapters. For example, on this one, it could be:

  1. Re-run the analysis in the chapter with a different set of weights.
  2. Compare the resulting clusters visually.
  3. What are the key differences between the two W's?
  4. How do you think such differences affect the final result?
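
Steps 3 and 4 above could be prototyped roughly as follows. This is only a minimal sketch under stated assumptions: it uses a hypothetical 5x5 regular lattice rather than the chapter's data, and the helper names `lattice_neighbors` and `summarize` are made up for illustration. With real polygons, one would presumably build the two W objects with libpysal's rook and queen contiguity builders instead.

```python
from itertools import product


def lattice_neighbors(nrows, ncols, criterion="rook"):
    """Return {cell: set of neighboring cells} for a regular lattice.

    Hypothetical helper: rook neighbors share an edge, queen neighbors
    share an edge or a corner.
    """
    if criterion == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif criterion == "queen":
        offsets = [
            (dr, dc)
            for dr, dc in product((-1, 0, 1), repeat=2)
            if (dr, dc) != (0, 0)
        ]
    else:
        raise ValueError("criterion must be 'rook' or 'queen'")

    neighbors = {}
    for r, c in product(range(nrows), range(ncols)):
        # Keep only the offsets that stay inside the lattice.
        neighbors[(r, c)] = {
            (r + dr, c + dc)
            for dr, dc in offsets
            if 0 <= r + dr < nrows and 0 <= c + dc < ncols
        }
    return neighbors


def summarize(neighbors):
    """Summarize how dense an adjacency structure is."""
    n = len(neighbors)
    links = sum(len(v) for v in neighbors.values())  # directed joins
    return {
        "n": n,
        "links": links,
        "mean_neighbors": links / n,
        # Share of all possible off-diagonal pairs that are joined.
        "density": links / (n * (n - 1)),
    }


rook_w = lattice_neighbors(5, 5, "rook")
queen_w = lattice_neighbors(5, 5, "queen")
rook = summarize(rook_w)
queen = summarize(queen_w)

for name, stats in (("rook", rook), ("queen", queen)):
    print(name, stats)
```

On a lattice, every rook join is also a queen join, so the queen graph is the denser of the two. A denser graph admits more feasible contiguous regions, which is one way to start reasoning about step 4.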

4. Discuss the implications for the processes of regionalization that follow from the number of *connected components* in the spatial weights matrix that would be used.
5. True or false: The average silhouette score for a spatially constrained solution will be no larger than the average silhouette score for an unconstrained solution. Why, or why not? (add reference and/or explain silhouette)
6. Consider two possible weights matrices for use in a spatially constrained clustering problem. Both form a single connected component for all the areal units. However, they differ in the sparsity of their adjacency graphs (think rook versus queen). How might this sparsity affect the quality of the clustering solution?
7. What are the challenges and opportunities that spatial dependence poses for spatial cluster formation?

I think this one is pretty hard for an introductory text.

5. True or false: The average silhouette score for a spatially constrained solution will be no larger than the average silhouette score for an unconstrained solution. Why, or why not? (add reference and/or explain silhouette)
6. Consider two possible weights matrices for use in a spatially constrained clustering problem. Both form a single connected component for all the areal units. However, they differ in the sparsity of their adjacency graphs (think rook versus queen). How might this sparsity affect the quality of the clustering solution?
7. What are the challenges and opportunities that spatial dependence poses for spatial cluster formation?
8. In other areas of spatial analysis, the concept of multilevel modeling (cites) exploits the hierarchical nesting of spatial units at different levels of aggregation. How might such nesting be exploited in the implementation of regionalization algorithms? What are some possible limitations or challenges that such nesting imposes in obtaining a regionalization solution?

I'm not sure I understand this one.

@darribas

OK, I've had a look at them and left some comments. I'd merge this anyway to avoid further conflicts and, if needed, we can open future PRs that feed into the chapter.

@ljwolf ljwolf merged commit b7fb4d3 into gdsbook:master Jun 28, 2019