# Univariate local join counts (LJC), bivariate LJC, multivariate LJC

## Univariate local join counts (LJC)


To review, the global black-black (1-1) count is the total across the entire study area:

$$BB = \sum_{i} \sum_j w_{ij} x_{i} x_{j}$$

The local version, however, is computed separately for each location $i$:

$$BB_i = x_i \sum_{j} w_{ij} x_j$$

...where, for each location with $x_i=1$, $BB_i$ counts the neighbors that have an observation of $x_j=1$; locations with $x_i=0$ always get a count of 0. This focuses on the BB joins of a given polygon $i$.
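To make the two formulas concrete, here is a minimal sketch that evaluates them with a dense weights matrix. It assumes a 4x4 rook lattice with row-major IDs and a binary vector whose last eight entries are 1 (the same setup built in the cells below). The Kronecker-product construction of `W` is only an illustrative stand-in for a libpysal weights object, not the method used in this workbook.

```python
import numpy as np

# Rook-contiguity weights for a 4x4 lattice with row-major IDs (illustrative
# construction; the workbook itself uses libpysal for this).
path = np.eye(4, k=1, dtype=int)
path = path + path.T                  # neighbors along a line of 4 cells
eye = np.eye(4, dtype=int)
W = np.kron(eye, path) + np.kron(path, eye)

x = np.array([0] * 8 + [1] * 8)

# Local statistic: BB_i = x_i * sum_j w_ij x_j
BB_local = x * (W @ x)
# Global statistic: BB = sum_i sum_j w_ij x_i x_j
BB_global = x @ W @ x

print(BB_local)   # [0 0 0 0 0 0 0 0 2 3 3 2 2 3 3 2]
print(BB_global)  # 20
```

Because `W` is symmetric, the double sum as written visits every 1-1 join from both ends, so the global value of 20 equals the sum of the local counts and corresponds to 10 distinct joins.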

We will now recreate the data as it appears in the docstrings of the join_count function and then try to get the bivariate, and after that the multivariate, versions up and running.



In [2]:
import numpy as np
import libpysal
import pandas as pd

# Create a 4x4 grid (16 cells)
w = libpysal.weights.lat2W(4, 4)
y_1 = np.ones(16)
# Set the first 8 of the ones to 0
y_1[0:8] = 0
print('new y_1', y_1)

new y_1 [0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1.]


Carry out some standardization processes

In [3]:
# Flatten the input vector y
y_1 = np.asarray(y_1).flatten()
print(y_1)
# ensure weights are binary transformed
w.transformation = 'b'

[0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1.]


Identify univariate local join counts

In [4]:
adj_list = w.to_adjlist(remove_symmetric=False) 
print(adj_list)
zseries_var1 = pd.Series(y_1, index=w.id_order)
focal_var1 = zseries_var1.loc[adj_list.focal].values
neighbor_var1 = zseries_var1.loc[adj_list.neighbor].values
# Identify the focal-neighbor pairs where both values are equal to 1
BBs = (focal_var1 == 1) & (neighbor_var1 == 1)
BBs
# also convert to a 0/1 array
BBs.astype('uint8')

    focal  neighbor  weight
0       0         4     1.0
1       0         1     1.0
2       1         0     1.0
3       1         5     1.0
4       1         2     1.0
5       2         1     1.0
6       2         6     1.0
7       2         3     1.0
8       3         2     1.0
9       3         7     1.0
10      4         0     1.0
11      4         8     1.0
12      4         5     1.0
13      5         1     1.0
14      5         4     1.0
15      5         9     1.0
16      5         6     1.0
17      6         2     1.0
18      6         5     1.0
19      6        10     1.0
20      6         7     1.0
21      7         3     1.0
22      7         6     1.0
23      7        11     1.0
24      8         4     1.0
25      8        12     1.0
26      8         9     1.0
27      9         5     1.0
28      9         8     1.0
29      9        13     1.0
30      9        10     1.0
31     10         6     1.0
32     10         9     1.0
33     10        14     1.0
34     10        11 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1], dtype=uint8)

Now we need to map these True/False values back onto the original vector, i.e. sum them per focal unit. The output is correct, but the approach is a bit messy. This part is tricky and will likely need optimization: can we exploit the natural structure of the adjacency list without needing to make a new DataFrame and run a groupby?

In [5]:
# Create a df that pairs each adjacency-list row's focal ID with its 0/1 BB flag
temp = pd.DataFrame({'ID': adj_list.focal.values, 'BB': BBs.astype('uint8')})
temp = temp.groupby(by='ID').sum()
temp.BB.values

array([0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 3, 2, 2, 3, 3, 2], dtype=uint64)
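On the question of avoiding the extra DataFrame and groupby: one possible route is `np.bincount`, which counts how often each focal ID occurs among the matching pairs. The sketch below assumes the IDs are consecutive integers starting at 0 (as they are for this lattice); other ID schemes would first need mapping to positions. The adjacency pairs are rebuilt by hand here, so their order may differ from the libpysal adjacency list, which does not affect the counts.

```python
import numpy as np

# (focal, neighbor) pairs of a 4x4 rook lattice with row-major IDs, rebuilt
# by hand so the sketch runs without libpysal.
nrows, ncols = 4, 4
pairs = []
for i in range(nrows * ncols):
    r, c = divmod(i, ncols)
    if r > 0:
        pairs.append((i, i - ncols))   # cell above
    if c > 0:
        pairs.append((i, i - 1))       # cell to the left
    if c < ncols - 1:
        pairs.append((i, i + 1))       # cell to the right
    if r < nrows - 1:
        pairs.append((i, i + ncols))   # cell below
focal, neighbor = np.array(pairs).T

x = np.array([0] * 8 + [1] * 8)
BBs = (x[focal] == 1) & (x[neighbor] == 1)

# Count how often each focal ID appears among the matching pairs;
# minlength keeps units without any match in the result as 0.
BB_local = np.bincount(focal[BBs], minlength=x.size)
print(BB_local)  # [0 0 0 0 0 0 0 0 2 3 3 2 2 3 3 2]
```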

We will still need to sort out inference!

## Bivariate local join counts

Moving on to bivariate local join counts, we use `x` and `z` to denote the two variables.

First, the **co-location** of the events at location $i$ needs to be taken into account. As explained in the other workbook, two kinds of bivariate local join counts are considered.

**Case 1: no in-situ co-location**: where $x_i$ and $z_i$ do NOT take on the same value at either location $i$ (itself) or $j$ (neighbors).

- Example: when $x_i=1$ for location $i$, then $z_i=0$. We count the number of neighbors of $i$ when $x_i=1$ for which the value of $z_j=1$ but the value of $x_j=0$

This effectively combines two identifications: `x` is in a black(1)-white(0) join, and `z` is in a white(0)-black(1) join.
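Written in the same notation as $BB_i$, the Case 1 conditions can be sketched as a single expression. This is a transcription of the example above (focal $x_i=1$, $z_i=0$; neighbor $x_j=0$, $z_j=1$) and is worth checking against Anselin and Li 2019 before relying on it:

$$BJC_i = x_i (1 - z_i) \sum_j w_{ij} z_j (1 - x_j)$$

The leading factor $x_i (1 - z_i)$ is 1 only when the focal location has $x_i=1$ and $z_i=0$, and each term $z_j (1 - x_j)$ is 1 only for neighbors with $z_j=1$ and $x_j=0$.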

Let's experiment...

In [6]:
x = y_1
z = [0,1,0,1,1,1,1,1,0,0,1,1,0,0,1,1]

print('x', x)
print('z', z)

x [0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1.]
z [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]


Carry out some standardization procedures

In [7]:
# Flatten the input vectors x and z
x = np.asarray(x).flatten()
z = np.asarray(z).flatten()
print(x)
print(z)
# ensure weights are binary transformed
w.transformation = 'b'

[0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1.]
[0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1]


Create adjacency list

In [8]:
adj_list = w.to_adjlist(remove_symmetric=False) 
print(adj_list)

    focal  neighbor  weight
0       0         4     1.0
1       0         1     1.0
2       1         0     1.0
3       1         5     1.0
4       1         2     1.0
5       2         1     1.0
6       2         6     1.0
7       2         3     1.0
8       3         2     1.0
9       3         7     1.0
10      4         0     1.0
11      4         8     1.0
12      4         5     1.0
13      5         1     1.0
14      5         4     1.0
15      5         9     1.0
16      5         6     1.0
17      6         2     1.0
18      6         5     1.0
19      6        10     1.0
20      6         7     1.0
21      7         3     1.0
22      7         6     1.0
23      7        11     1.0
24      8         4     1.0
25      8        12     1.0
26      8         9     1.0
27      9         5     1.0
28      9         8     1.0
29      9        13     1.0
30      9        10     1.0
31     10         6     1.0
32     10         9     1.0
33     10        14     1.0
34     10        11 

We should now be able to use this adjacency list to run comparisons between `x` and `z`

In [9]:
# First, set up series that map the x and z values to the IDs of the weights object
zseries_x = pd.Series(x, index=w.id_order)
zseries_z = pd.Series(z, index=w.id_order)

# Next, map the x and z values to the focal (i) values
focal_x = zseries_x.loc[adj_list.focal].values
focal_z = zseries_z.loc[adj_list.focal].values

# Repeat the mapping but for the neighbor (j) values
neighbor_x = zseries_x.loc[adj_list.neighbor].values
neighbor_z = zseries_z.loc[adj_list.neighbor].values

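Now the `x` and `z` vectors have been mapped to focal and neighbor arrays (`focal_x`, `focal_z`, `neighbor_x`, and `neighbor_z`). Case 1 requires $x_i=1$ and $z_i=0$ at the focal location together with $x_j=0$ and $z_j=1$ at the neighbor: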


In [10]:
BJC = (focal_x == 1) & (focal_z == 0) & (neighbor_x == 0) & (neighbor_z == 1)
BJC

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False,  True, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])

In [11]:
# Create a df that pairs each adjacency-list row's focal ID with its 0/1 BJC flag
temp = pd.DataFrame({'ID': adj_list.focal.values, 'BJC': BJC.astype('uint8')})
temp = temp.groupby(by='ID').sum()
temp.BJC.values

array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=uint64)

Appears to be working! Now onto the next case...

**Case 2: co-location cluster (CLC)**. This is when the interest is in co-located events being surrounded by other co-located events.

This requires $x_i=z_i=1$ as well as $x_j=z_j=1$ for the neighbors. Reviewing, we formally write this as:

$$ CLC_i = x_i \cdot z_i \sum_j w_{ij} x_j z_j $$

Given that $x_i=z_i=1$, this becomes:

$$ CLC_i = 1 \cdot 1 \sum_j w_{ij} x_j z_j $$

Let's now implement this from the above code. The only thing we need to change is how `BJC` are calculated (now `CLC`). 

In [12]:
CLC = (focal_x == 1) & (focal_z == 1) & (neighbor_x == 1) & (neighbor_z == 1)
CLC

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False,  True,  True, False,
        True,  True, False, False, False, False, False,  True, False,
        True,  True,  True])

In [13]:
# Create a df that pairs each adjacency-list row's focal ID with its 0/1 CLC flag
temp = pd.DataFrame({'ID': adj_list.focal.values, 'CLC': CLC.astype('uint8')})
temp = temp.groupby(by='ID').sum()
temp.CLC.values

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2], dtype=uint64)
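One way to double-check both bivariate results is to evaluate the conditions directly as matrix products instead of going through the adjacency list. The sketch below assumes the same 4x4 rook lattice and the `x` and `z` vectors used above; the Kronecker-product construction of `W` is an illustrative stand-in for the libpysal weights, and the Case 1 expression is a direct transcription of the boolean conditions in `BJC`.

```python
import numpy as np

# Dense rook-contiguity matrix for a 4x4 lattice with row-major IDs
path = np.eye(4, k=1, dtype=int)
path = path + path.T
eye = np.eye(4, dtype=int)
W = np.kron(eye, path) + np.kron(path, eye)

x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
z = np.array([0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1])

# Case 1: focal has x=1, z=0; neighbors are counted when x=0, z=1
BJC_local = x * (1 - z) * (W @ (z * (1 - x)))
# Case 2: focal has x=z=1; neighbors are counted when x=z=1
CLC_local = x * z * (W @ (x * z))

print(BJC_local)  # [0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0]
print(CLC_local)  # [0 0 0 0 0 0 0 0 0 0 2 2 0 0 2 2]
```

Both vectors agree with the groupby outputs shown above.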

Appears to be working! Now onto the multivariate case...

## Multivariate local join counts


Multivariate local join counts, at least as laid out in Anselin and Li 2019, are specific to extending co-location clusters to more than two variables. Formally:

$$ CLC_i = \prod_{h=1}^{m} x_{hi} \sum_j w_{ij} \prod_{h=1}^{m} x_{hj} $$

From our example above, let's consider the variables `x`, `z`, and a new third variable called `y`.

In [14]:
x = x.astype(np.int32)
print('x', x)
print('z', z)
y = [0,1,1,1,1,1,1,1,0,0,0,1,0,0,1,1]
y = np.asarray(y).flatten()
print('y', y)

x [0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]
z [0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1]
y [0 1 1 1 1 1 1 1 0 0 0 1 0 0 1 1]


While we could expand the conditions above, I want to build this out from the outset to handle several input variables. So let's make a quick toy function that handles multiple inputs:

In [15]:
def multipleinputs(inputs):
    # Printing them...
    print(inputs)    
    # looping through each input...
    for i in inputs:
        print(i)

In [16]:
multipleinputs([x,y,z])

[array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1]), array([0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1])]
[0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]
[0 1 1 1 1 1 1 1 0 0 0 1 0 0 1 1]
[0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1]


So we need some kind of function that can:
- create the zseries
- map the focal values
- map the neighbor values
- run equivalency checks

We should be able to do this with a few list comprehensions. Let's try:

In [18]:
variables = [x,y,z]

In [69]:
# The zseries: one Series per input variable, indexed by the weights IDs
zseries = [pd.Series(v, index=w.id_order) for v in variables]
# The focal values (one row per variable)
focal = np.array([s.loc[adj_list.focal].values for s in zseries])
# The neighbor values (one row per variable)
neighbor = np.array([s.loc[adj_list.neighbor].values for s in zseries])

Print out the results and manually validate!

In [70]:
print(zseries)
print(focal)
print(neighbor)

[0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     1
9     1
10    1
11    1
12    1
13    1
14    1
15    1
dtype: int32, 0     0
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     0
9     0
10    0
11    1
12    0
13    0
14    1
15    1
dtype: int32, 0     0
1     1
2     0
3     1
4     1
5     1
6     1
7     1
8     0
9     0
10    1
11    1
12    0
13    0
14    1
15    1
dtype: int32]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1 1 1 1 1]
 [0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1
  1 1 0 0 0 0 0 1 1 1 1 1]
 [0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1
  1 1 0 0 0 0 0 1 1 1 1 1]]
[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1]), array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 0, 0, 1, 0, 

Now comes the most important part: we need to expand the following expression to handle as many variables as are input. The former co-location cluster (CLC) now becomes the multivariate co-location cluster (MCLC):

`MCLC = (focal_x == 1) & (focal_z == 1) & (focal_y == 1) & (neighbor_x == 1) & (neighbor_z == 1) & (neighbor_y == 1) & 
... (focal_m == 1) & (neighbor_m == 1)`

From this point we can borrow a trick from the original pysal esda join count implementation. Because the `focal` and `neighbor` values must all equal 1, we could multiply them: any 0 automatically reduces the product to 0, leaving only the valid all-1 candidates. The cells below apply the equivalent check with `np.all`.

In [168]:
focal_all = np.all(np.dstack(focal)==1, axis=2)
neighbor_all = np.all(np.dstack(neighbor)==1, axis=2)

In [169]:
print(focal_all)

[[False False False False False False False False False False False False
  False False False False False False False False False False False False
  False False False False False False False False False False False  True
   True  True False False False False False  True  True  True  True  True]]


In [170]:
print(neighbor_all)

[[False False False False False False False False False False False False
  False False False False False False False False False False False  True
  False False False False False False False False False  True  True False
  False  True False False False False  True False False  True  True  True]]
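As a small illustration of why multiplying and checking "all equal to 1" are interchangeable here, the sketch below compares the two on a made-up set of three binary variables at five locations (the values are hypothetical and unrelated to the lattice data). The equivalence relies on the inputs being strictly 0/1.

```python
import numpy as np

# Three hypothetical binary variables observed at five locations
# (rows = variables, columns = locations).
stacked = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0],
])

# Multiplication trick: a single 0 in a column forces the product to 0
by_product = np.prod(stacked, axis=0)
# Explicit check: True only where every variable equals 1
by_all = np.all(stacked == 1, axis=0)

print(by_product)  # [1 0 0 1 0]
print(by_all)      # [ True False False  True False]
```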


Now we can return to the original implementation of `CLC` and identify those that are both 1...

In [217]:
MCLC = focal_all & neighbor_all
# Convert the boolean array to 0/1 integers (list() keeps the single row as one array)
MCLC = list(MCLC*1)
MCLC

[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
        0, 1, 1, 1])]

In [219]:
# Create a df that pairs each adjacency-list row's focal ID with its 0/1 MCLC flag
temp = pd.DataFrame({'ID': adj_list.focal.values, 'MCLC': MCLC[0]})
temp = temp.groupby(by='ID').sum()
temp.MCLC.values

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 2], dtype=int64)

Amazingly, it works! I thought this part would be the hardest for sure (which must mean that the hardest is yet to come!). There is likely some optimization to be done, but I'm quite relieved to have at least pseudo-coded all of the functions in about 3-4 working days. There is still a major hurdle, inference, but I am going to get supervisor input on that.

The next step is migrating the pseudo-code into functions that match the structure of existing pysal esda functions. That will be in a new workbook called `migration.ipynb`. After they have been migrated I'll then isolate each into a `.py` file.
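As a possible starting point for that migration, the multivariate steps above could be collected into a single function. This is only a sketch: the name `multivariate_ljc` and its signature are hypothetical and do not match any existing esda API. It takes the focal and neighbor columns as plain integer arrays (IDs assumed to run from 0 to n-1) and replaces the groupby with `np.bincount`.

```python
import numpy as np

def multivariate_ljc(variables, focal, neighbor):
    """Hypothetical helper: local co-location counts for any number of 0/1 variables.

    focal and neighbor are integer arrays holding the two columns of an
    adjacency list whose IDs run from 0 to n-1.
    """
    stacked = np.vstack([np.asarray(v).flatten() for v in variables])
    # A pair counts only if every variable is 1 at both ends
    focal_all = np.all(stacked[:, focal] == 1, axis=0)
    neighbor_all = np.all(stacked[:, neighbor] == 1, axis=0)
    # Sum the qualifying pairs per focal ID
    return np.bincount(focal[focal_all & neighbor_all], minlength=stacked.shape[1])

# Adjacency pairs of a 4x4 rook lattice (illustrative stand-in for libpysal)
path = np.eye(4, k=1, dtype=int)
path = path + path.T
eye = np.eye(4, dtype=int)
focal, neighbor = np.nonzero(np.kron(eye, path) + np.kron(path, eye))

x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
y = np.array([0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1])
z = np.array([0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1])

print(multivariate_ljc([x, y, z], focal, neighbor))  # [0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 2]
print(multivariate_ljc([x, z], focal, neighbor))     # [0 0 0 0 0 0 0 0 0 0 2 2 0 0 2 2]
```

With two variables the function reproduces the bivariate co-location cluster result, so one implementation could cover both cases.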