Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weights for rasters or arrays with missing data #895

Closed
ozak opened this issue Dec 17, 2016 · 12 comments
Closed

Weights for rasters or arrays with missing data #895

ozak opened this issue Dec 17, 2016 · 12 comments

Comments

@ozak
Copy link

ozak commented Dec 17, 2016

Hi,

I was wondering if there is a way to assign weights to an array or raster using one of the weight functions? Or to change weights to take into account that there are missing data (e.g., GIS rasters). Here's a simple example of what I have in mind...

from scipy.stats import norm
import numpy as np

#Create raster without missing values
yp = norm.rvs(0,10,size=(5,5))
yp_f=yp.flatten()

# Weights
w=pysal.lat2W(5,5, rook=False)

# Works fine
lm = pysal.Moran_Local(yp_f,w)

# Include missing values
yp[:,0] = np.nan
yp_f=yp.flatten()

# Wrong answer, since it maintains weights
lm = pysal.Moran_Local(yp_f,w)

# Identify missing 
miss = np.isnan(yp_f)
missn = np.arange(0,len(miss))[miss]

# Drop missing from weight object?

missn shows which cells should not be included. Is there a way to edit the w object to take this into account? Or is there a function that will create the weights based on the actual raster?

Thanks for the help.

@ozak
Copy link
Author

ozak commented Dec 17, 2016

The following code changes the dictionary of weight by dropping all NaN cells. Assigning it to w.neighbors seems to correctly change the structure of w, but it is not able to compute things correctly and throws back an error at the end. Any help would be appreciate it.

cneighbors = {}
for i in missn:
    for key, value in w.neighbors.items():
        if key != i:
            if i in value:
                value = value.remove(i)
            if value is not None:
                cneighbors[key] = value

w.neighbors = cneighbors
lm = pysal.Moran_Local(yp_f,w)

ValueError: row, column, and data array must all be the same length

@ozak
Copy link
Author

ozak commented Dec 17, 2016

Ok...looking at the code I figured out what was causing the error. I also needed to change the weights. There was also a mistake in the previous code for dropping missing data

missn = set(np.arange(0,len(miss))[miss])
cneighbors = {}
for key, value in w.neighbors.items():
    if key not in missn:
        value = list(set(value).difference(missn))
        cneighbors[key] = value

cweights = {}
for key, value in cneighbors.items():
    cweights[key] = (np.ones_like(value)/len(value)).tolist()
w2 = pysal.W(cneighbors, cweights)
yp_f2 = yp_f[np.isnan(yp_f)==False]
lm2 = pysal.Moran_Local(yp_f2,w2)

Now it produces the data. Still trying to figure out if the output is correct.

@ozak
Copy link
Author

ozak commented Dec 17, 2016

Here's a function to implement this. May create a pull request once I figure out where it could go, and if there's interest in this.

def raster_weights(raster, rook=False, *args, **kwargs):
    """
    Construct PySal weights for rasters
    It drops weights for all cells that have no data or are Inf/NaN
    Usage:

    w = raster_weights(raster, rook=False, *args, **kwargs)

    where 
        raster: (Masked) Numpy array for which weights are to be constructed
        rook: Boolean, type of contiguity. Default is queen. For rook, rook = True
        the rest of arguments are passed to pysal.lat2W
    """
    rasterf=raster.flatten()
    w=pysal.lat2W(*raster.shape, rook=rook, *args, **kwargs)

    # Identify missing/no data
    if type(rasterf)==numpy.ma.core.MaskedArray:
        miss = rasterf.mask
    else:
        miss = np.logical_or(np.isnan(rasterf),np.isinf(rasterf))
    missn = set(np.arange(0,len(miss))[miss])

    cneighbors = {}
    for key, value in w.neighbors.items():
        if key not in missn:
            value = list(set(value).difference(missn))
            cneighbors[key] = value

    cweights = {}
    for key, value in cneighbors.items():
        cweights[key] = (np.ones_like(value)/len(value)).tolist()
    w = pysal.W(cneighbors, cweights)
    return w

@sjsrey
Copy link
Member

sjsrey commented Dec 18, 2016

Does this get to what you are interested in:

import pysal as ps
import numpy as np

def raster_weights(raster, missing, rook=False):
    w = ps.lat2W(*raster.shape, rook=rook)
    neighbors = w.neighbors.copy()
    for unit in missing:
        u_neighbors = neighbors[unit]
        for u_neighbor in u_neighbors:
            neighbors[u_neighbor].remove(unit)
        neighbors.pop(unit)
    return(ps.W(neighbors))

>>> raster = np.arange(9) * 10
>>> raster.shape = (3, 3)
>>> raster

array([[ 0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]])
>>> w_0 = ps.lat2W(*raster.shape, rook=False)
>>> w_0.transform = 'r'
>>> y = raster.flatten()
>>> ps.lag_spatial(w_0, y)

array([ 26.66666667,  28.        ,  33.33333333,  36.        ,
        40.        ,  44.        ,  46.66666667,  52.        ,  53.33333333])
>>> (10 + 30 + 40) /3.
26.666666666666668
>>> missing = [1, 8]
>>> w_1 = raster_weights(raster, missing)
>>> y_1 = np.delete(y, missing)
>>> y_1
array([ 0, 20, 30, 40, 50, 60, 70])
>>> y
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80])
>>> w_1.transform = 'r'
>>> ps.lag_spatial(w_1, y_1)

array([ 35.        ,  45.        ,  42.5       ,  38.33333333,
        43.33333333,  46.66666667,  45.        ])





@ozak
Copy link
Author

ozak commented Dec 18, 2016

@sjsrey Thanks for getting back to me.

Indeed, it generates the same result as in my function above. The main difference is the inputs and the order of some of the neighbors (although that should affect any of the results). Any reason there is no material on rasters in PySal (or for that matter there seems to be an emphasis in the literature /software packages on spatial correlation on geometries, but nothing on rasters)?

@ozak
Copy link
Author

ozak commented Dec 18, 2016

I've included part of your in my function

def raster_weights(raster, rook=False, transform = 'r', **kwargs):
    """
    Construct PySal weights for rasters
    It drops weights for all cells that have no data or are Inf/NaN
    Usage:

    w = raster_weights(raster, rook=False, *args, **kwargs)

    where 
        raster: (Masked) Numpy array for which weights are to be constructed
        rook: Boolean, type of contiguity. Default is queen. For rook, rook = True
        the rest of arguments are passed to pysal.lat2W
    """
    rasterf=raster.flatten()
    if len(raster.shape) == 1:
        shape = (np.sqrt(raster.shape[0]) * np.array([1,1])).astype(int)
    else:
        shape = raster.shape
    w=pysal.lat2W(*shape, rook=rook, **kwargs)

    # Identify missing/no data
    if type(rasterf)==numpy.ma.core.MaskedArray:
        miss = rasterf.mask
    else:
        miss = np.logical_or(np.isnan(rasterf),np.isinf(rasterf))
    missn = set(np.arange(0,len(miss))[miss])

    cneighbors = {}
    for key, value in w.neighbors.items():
        if key not in missn:
            value = list(set(value).difference(missn))
            cneighbors[key] = value
    w = pysal.W(cneighbors)
    w.transform = transform 
    return w

which seems to be much faster than your implementation (at least on the 5x5 array above).

%time raster_weights2(raster, missn)
CPU times: user 1.65 ms, sys: 563 µs, total: 2.21 ms
Wall time: 1.75 ms
Out[311]: <pysal.weights.weights.W at 0x10fc8ed10>

%time raster_weights(raster)
CPU times: user 715 µs, sys: 25 µs, total: 740 µs
Wall time: 920 µs
Out[312]: <pysal.weights.weights.W at 0x10c9e6f10>

@sjsrey
Copy link
Member

sjsrey commented Dec 20, 2016

@ozak thanks for pushing forward on this.

On the ordering of the weights, it is important to ensure that the order matches that in the corresponding attribute vector, otherwise the calculation of things like spatial autocorrelation coefficients and the spatial lag will be incorrect.

On the raster question, PySAL's origins were in vector based spatial analysis so rasters were not a key focus of the original developers. That said, if interested developers want to include support for rasters, we are open to it :->

On that score, for raster weights it may be more memory-efficient to calculate them on the fly rather than storing all the neighbors. Otherwise as the raster size grows, so too will the memory requirements of the weights structure. Of course there will be a trade-off in loss of speed for doing the weights on the fly rather than in-memory, but something to consider.

@ozak
Copy link
Author

ozak commented Dec 21, 2016

@sjsrey I am still trying to figure out if there is way of doing it cleanly. I was trying to implement some spatial correlation methods of PySal for my GeoRasters package, where I can ensure this done correctly using the a function map_vector and doing it internally. I am still trying go over PySal to know what else is there and how to implement it. I may try to link as many functions as possible or that could be useful to people using GIS rasters. Otherwise I would only do it for some functions I need at the moment and leave people to use similar code for their own use case by case.

One option could be to do all this under the hood in PySal, i.e. that PySal constructs and performs all the computations using the flattened raster. But I imagine this may require a lot of reqriting, unless one can adapt the basic classes to take care of this. I am not knowledgable enough about PySal at the moment, so I am not sure.

@ozak
Copy link
Author

ozak commented Dec 22, 2016

@sjsrey I have added the global spatial autocorrelation measures to my package (ozak/georasters@d6b1169). Since GeoRasters keeps properties of the raster with it, these were easy to add. Using the above function it is easy to compute any of these measures for any raster by executing

rasterf = raster.flatten()
nonmiss = rasterf.mask==False
gamma = pysal.Gamma(rasterf[nonmiss], w)

and similarly for the other global measures. I will continue with the local ones, which require a bit of more thought.

@sjsrey
Copy link
Member

sjsrey commented Dec 22, 2016

@ozak very nice to see the work on georasters 👍

I would be happy to help out over on georasters on the pysal interface if that would be of interest.

If so, I think one place that would be a good target would be to add some tests on the missing/mask properties for the raster and the W objects to ensure the alignment is correct. I think it is best to iron that out before moving on to the local autocorrelation stats.

Let me know if this makes sense. If it does, we can close the ticket here and move the discussion over to
georasters.

@ozak
Copy link
Author

ozak commented Dec 22, 2016

@sjsrey that sounds perfect!

@sjsrey
Copy link
Member

sjsrey commented Dec 24, 2016

Closing this here and working on downstream issue ozak/georasters#13

@sjsrey sjsrey closed this as completed Dec 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants