Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate MGWR #5

Merged
merged 80 commits into from Apr 4, 2018
Merged

Consolidate MGWR #5

merged 80 commits into from Apr 4, 2018

Conversation

TaylorOshan
Copy link
Collaborator

This PR:
(1) Merges together several copies of MGWR that were beginning to diverge from each other.
(2) Merges in several updates to the underlying GWR routines to the MGWR codebase
(3) Adds additional code to estimate a hat matrix in parallel with estimating the MGWR parameters.
(4) Adds some basic tests for MGWR (not including the hat matrix)

It would be good for everyone to a close look at this and make sure your favorite edits to the MGWR code are still here. I did my best to make sure everything was going in, but could have accidentally deleted something.

This should be seen as the starting point for the MGWR code that will underly the GUI and be released to the public, rather than a finished product. Still need to clean code up, add diagnostics, and more tests.

@TaylorOshan
Copy link
Collaborator Author

It should be noted, that the MGWR routine we used for the GA paper had changed the initial bandwidth parameterization so that line 331 in sel_bw.py is if a > self.bw_min: rather than if a < self.bw_min: as it currently is in this PR (and the public GWR repo). Changing it breaks some of the pre-existing tests, which take a manual BW. I think the tests were based on results from GWR4, so I think this is one of the scenarios where we will need to break from GWR4, which I think has it wrong (at least in some circumstances) on how to specify what the minimum BW search space should be. Just be advised for ongoing research that you may need to change line 331 to if a > self.bw_min: if you want to effectively be able to manually set a very small minimum bandwidth.

Copy link
Member

@Ziqi-Li Ziqi-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good job! good to me.

gwr/sel_bw.py Outdated
def _mbw(self):
y = self.y
if self.constant:
X = USER.check_constant(self.X_loc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is not consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be consistent now.

bw_max = self.bw_max
interval = self.interval
tol = self.tol
max_iter = self.max_iter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether there is a need to define new variables since they are not transformed in any way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not, but I am going to leave them for now for the sake of getting a canonical version of MGWR set up - any PR's cleaning it up are welcome!

gwr/search.py Outdated
err = optim_model.resid_response.reshape((-1,1))
est = optim_model.params
else:
model = GLM(y, X, family=self.family, constant=False).fit()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no self.family.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it should just be family=family.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its now just family. Should we think about taking out the GLM initialization?

@weikang9009
Copy link
Member

The interpretation of the optimal bandwidths produced from MGWR calibration depends on the magnitude of the independent variables (both mean and variance). A suggestion has been made in Note 5 of the MGWR paper (AAAG) to standardize them before calibrating an MGWR model to facilitate interpretation in terms of linking the values of the optimal bandwidths to the spatial scales of conditional relationships. I am just wondering if we are going to add some optional parameters to standardize X.

@TaylorOshan
Copy link
Collaborator Author

TaylorOshan commented Mar 15, 2018 via email

@ljwolf
Copy link
Member

ljwolf commented Mar 15, 2018

While an option for mean/variance standardization important, I think we would definitely need to make sure a "summary" function or the MGWR __repr__disclaims that this has been done... I believe users see those betas and assume they're in the units of their input unless they've standardized it themselves.

I wrestled with this same question in spvcm and decided to not standardize by default after shooting myself in the foot a few times and handing the gun to others for a few shots as well.

Copy link
Member

@ljwolf ljwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor changes/tiny questions.

@@ -1,5 +1,5 @@
"""
Diagnostics for estimated gwr models
Diagnostics for estimated gwr modesl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

models.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed it up.

gwr/search.py Outdated
err = optim_model.resid_response.reshape((-1,1))
est = optim_model.params
else:
model = GLM(y, X, family=self.family, constant=False).fit()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it should just be family=family.

gwr/gwr.py Outdated
@cache_readonly
def predictions(self):
P = self.model.P
if P is None:
raise NotImplementedError('predictions only avaialble if predict'
'method called on GWR model')
raise NotImplementedError('predictions only avaialble if predict'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a TypeError or Exception? feels weird to call it a NotImplementedError, since it's only possible to predict if you've fit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, TypeError makes a bit more sense. Might be a better way to implement in this in the future that avoids an ambiguous error. But just want to make sure the user gets some sort of message that indicates that GWRResults will only provide predictions if you've run gwr_model.predict() prior.

@weikang9009
Copy link
Member

I agree. The summary output for MGWR definitely needs to incorporate such standardization if it has been made (either it is made by default or not). Even if the users choose not to standardize, I think it is still important to remind them of a different interpretation of the optimal bandwidths in the summary output.

@Ziqi-Li
Copy link
Member

Ziqi-Li commented Mar 15, 2018

@weikang9009 that can be easily done in the GUI's summary output, but for the console, we don't currently have a summary output unless do something like what statsmodel does. I know GWmodel has a text summary output to the R console after finish running.

@Ziqi-Li
Copy link
Member

Ziqi-Li commented Mar 15, 2018

Perhaps can just do like print ("X and Y are standardized.") to the console if the standardization option is checked?

@weikang9009
Copy link
Member

+1. A summary output similar to spregmight be useful.

@TaylorOshan
Copy link
Collaborator Author

I've migrated the conversation about standardization to #6 where we can keep discussing, but I was hoping not to hamstring this PR on that issue. Also fixed up a bunch of minor revisions suggest over review, so I think this is ready to merge and become the up-to-date version of gwr/mgwr.

@TaylorOshan
Copy link
Collaborator Author

I also added a PR so that if a user manually inputs min or max bandwidth search parameters, then those are automatically set to the starting points of the search procedure. As it currently stands the default min/max are both zero and then this value is compared to the "rule of thumb" min/max and depending on if the manual min is lower than the standard min then the manual min becomes the min. However, this causes issues depending on whether the use wants to set a manual bandwidth that is smaller/larger than the standard min. So now the default is None and if they set min and/or the max then that will definitively become the search min/max.

@TaylorOshan TaylorOshan merged commit 8665c8c into pysal:master Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants