# Segregation calculation

**Instructions**

To run the notebook quickly, you only need to change the file name/path in the **Input file** section (the file to read)  
and in the **Save results to a local file** section (the file the results are written to).

*Make sure you don't reuse an existing file name, or that file will be overwritten.*

Then open the **Cell** menu and select **Run All**.

In [1]:
import numpy as np
import pandas as pd
from decimal import Decimal
import time

In [2]:
# Import python script with segreg functions
from segregationMetrics import Metrics

In [3]:
cc = Metrics()

**Input file**

In [4]:
'''
Expected columns: id, x, y, sum, attribute 1, attribute 2, attribute 3, ..., attribute n
'''

# cc.readAttributesFile('res/ks201_oa_ethnic.csv')
# cc.readAttributesFile('data/AP2010_CEM_RMSP_EDU.csv')
cc.readAttributesFile('data/oa2011_ks201_final.csv')

matrix([['E00000095', 549358.25887, 184837.90058, ..., 139, 23, 53],
        ['E00000096', 548892.82629, 184783.76665, ..., 59, 21, 15],
        ['E00000097', 548740.34314, 184792.6435, ..., 55, 19, 29],
        ..., 
        ['E00176593', 546272.3182399999, 180352.39283, ..., 194, 17, 67],
        ['E00176594', 539345.40842, 177707.68125999998, ..., 2, 9, 28],
        ['E00176595', 539768.07279, 179191.6625, ..., 44, 52, 58]], dtype=object)
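The input layout described in the cell above can be checked before running the (slow) metric steps. The sketch below is an illustration, not the `segregationMetrics` reader: it assumes the file has a header row, that the `sum` column is the row total of the group columns, and it uses a made-up two-row table and a hypothetical helper `load_attributes`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical input in the expected layout: id, x, y, sum, then one
# count column per population group (names and values are placeholders).
CSV_TEXT = """id,x,y,sum,group_a,group_b,group_c
A1,1000.0,2000.0,60,30,20,10
A2,1500.0,2500.0,45,5,15,25
"""

def load_attributes(source):
    """Read the table and split it into ids, coordinates and group counts."""
    df = pd.read_csv(source)
    ids = df.iloc[:, 0].to_numpy()
    coords = df.iloc[:, 1:3].to_numpy(dtype=float)   # x, y
    totals = df.iloc[:, 3].to_numpy(dtype=float)     # 'sum' column
    groups = df.iloc[:, 4:].to_numpy(dtype=float)    # one column per group
    # Assumption: 'sum' should equal the row total of the group columns
    if not np.allclose(groups.sum(axis=1), totals):
        raise ValueError("'sum' column does not match the group totals")
    return ids, coords, groups

ids, coords, groups = load_attributes(io.StringIO(CSV_TEXT))
print(groups.shape)  # (2, 3)
```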

**Compute population intensity**

The distance matrix is calculated in this step.

In [5]:
'''
Parameters:
bandwidth - search radius in metres; 5000 m by default (this run uses 3000 m)
weightmethod - 1 for Gaussian, 2 for bi-square; omit it for a moving window
'''

start_time = time.time()
cc.locality = cc.cal_localityMatrix(bandwidth=3000, weightmethod=1)
print("--- %s seconds ---" % (time.time() - start_time))

--- 302.97800517082214 seconds ---
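The three `weightmethod` options can be illustrated with a small NumPy sketch. It assumes one common formulation (Gaussian `exp(-0.5 (d/b)^2)`, bi-square `(1 - (d/b)^2)^2` inside the bandwidth, and a 0/1 moving window) and assumes the population intensity is the kernel-weighted mean of each group's counts; `segregationMetrics` may compute it differently. The helper names are hypothetical. It also builds the full n x n distance matrix, which is fine for a toy example but would need roughly 20 GB of float64 memory for 49,890 rows, so a real run has to work row by row or in chunks.

```python
import numpy as np

def kernel_weights(dist, bandwidth, weightmethod=None):
    """Spatial weights for a matrix of distances (one common formulation)."""
    ratio = dist / bandwidth
    if weightmethod == 1:  # Gaussian: decays smoothly, never reaches zero
        return np.exp(-0.5 * ratio ** 2)
    if weightmethod == 2:  # bi-square: zero at and beyond the bandwidth
        return np.where(dist < bandwidth, (1 - ratio ** 2) ** 2, 0.0)
    return (dist < bandwidth).astype(float)  # moving window: 1 inside, 0 outside

def locality_matrix(coords, pop, bandwidth=5000, weightmethod=1):
    """Kernel-weighted mean of each group's count around every location."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # n x n Euclidean distances
    weights = kernel_weights(dist, bandwidth, weightmethod)
    return (weights @ pop) / weights.sum(axis=1, keepdims=True)

# Toy data: two nearby locations and one far away (coordinates in metres)
coords = np.array([[0.0, 0.0], [3000.0, 0.0], [100000.0, 0.0]])
pop = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
print(locality_matrix(coords, pop, bandwidth=5000, weightmethod=2))
```

With the bi-square kernel the distant third location only "sees" itself, so its intensity stays at its own counts, while the first two locations blend each other's populations.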


In [6]:
# List of coordinates from file (x, y) for validation
cc.location

matrix([[ 549358.25887,  184837.90058],
        [ 548892.82629,  184783.76665],
        [ 548740.34314,  184792.6435 ],
        ..., 
        [ 546272.31824,  180352.39283],
        [ 539345.40842,  177707.68126],
        [ 539768.07279,  179191.6625 ]])

In [7]:
# Display local population intensity for groups
cc.locality

array([[ 179.55986237,   60.90641023,   61.39497257,   35.41755751],
       [ 170.41920096,   63.32206618,   69.43066773,   37.10399423],
       [ 167.4614122 ,   63.9095381 ,   72.30135194,   37.64640359],
       ..., 
       [ 145.28325008,   81.11422858,   64.61622497,   45.01071513],
       [ 130.03201168,   70.58640076,   49.1570287 ,   65.4631778 ],
       [ 118.01015482,   68.68969255,   69.86525667,   67.67724396]])

In [8]:
# To inspect the locality of a single row (for validation), use the index [x, :]
cc.locality[5,:]

array([ 192.63810859,   40.5961701 ,   68.65265029,   32.70552713])

**Compute Local Dissimilarity**

In [9]:
# Function to compute local dissimilarity
d_localsimilarity = cc.cal_localDissimilarity()
# To select a specific element use the index [x], or a range [x:y]
d_localsimilarity[0:9]

array([  8.59266977e-06,   6.46957977e-06,   6.90275639e-06,
         7.46691569e-06,   5.51224838e-06,   4.46852226e-06,
         4.70113570e-06,   7.14297804e-06,   7.00274516e-06])
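A possible formulation of the local dissimilarity is sketched below, assuming the multigroup spatial index of Feitosa et al. (2007): for location j, `d_j = sum_m N_j / (2 N I) * |L_jm / L_j - N_m / N|`, where `L` is the population intensity and `I = sum_m (N_m/N)(1 - N_m/N)`. This is an assumption about what `cal_localDissimilarity` computes, not its actual code; the function name is hypothetical.

```python
import numpy as np

def local_dissimilarity(pop, locality):
    """Local multigroup dissimilarity d_j (assumed Feitosa et al. form)."""
    n_j = pop.sum(axis=1)                       # population at each location
    total = pop.sum()                           # N
    tau = pop.sum(axis=0) / total               # overall share of each group
    interaction = (tau * (1 - tau)).sum()       # I
    local_share = locality / locality.sum(axis=1, keepdims=True)
    return (n_j / (2 * total * interaction)) * np.abs(local_share - tau).sum(axis=1)

# Fully segregated toy data, with no spatial smoothing (locality = pop)
pop = np.array([[10.0, 0.0], [0.0, 10.0]])
print(local_dissimilarity(pop, pop))
```

Each local value is a small share of the total, which fits the very small magnitudes (around 1e-5) printed above for about 50,000 areas.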

**Compute Global Dissimilarity**

In [10]:
# Function to compute global dissimilarity
d_globaldis = cc.cal_globalDissimilarity()
# Display the global value
d_globaldis

0.395221077266248
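Under the same assumed Feitosa et al. formulation, the global dissimilarity is the sum of the local values. The sketch below checks that, without spatial smoothing and with only two groups, it reduces to the classic index `0.5 * sum_j |a_j/A - b_j/B|`. Both function names are hypothetical and the toy counts are made up.

```python
import numpy as np

def global_dissimilarity(pop, locality):
    """Spatial multigroup dissimilarity D as the sum of local d_j (assumed)."""
    n_j = pop.sum(axis=1)
    total = pop.sum()
    tau = pop.sum(axis=0) / total
    interaction = (tau * (1 - tau)).sum()
    local_share = locality / locality.sum(axis=1, keepdims=True)
    d_j = (n_j / (2 * total * interaction)) * np.abs(local_share - tau).sum(axis=1)
    return d_j.sum()

def classic_dissimilarity(a, b):
    """Two-group index of dissimilarity: 0.5 * sum |a_j/A - b_j/B|."""
    return 0.5 * np.abs(a / a.sum() - b / b.sum()).sum()

pop = np.array([[30.0, 10.0], [5.0, 25.0], [15.0, 15.0]])
print(global_dissimilarity(pop, pop))               # approximately 0.4
print(classic_dissimilarity(pop[:, 0], pop[:, 1]))  # approximately 0.4
```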

**Compute Local Exposure/Isolation**

In [11]:
'''
expo holds n_group * n_group exposure values per location, flattened row by row:
column m * n_group + n is the exposure of group m to group n
(m1 to n1, m1 to n2, ..., then m2 to n1, m2 to n2, ...)
- (m, m) = isolation index of group m
- (m, n) = exposure index of group m to group n
'''
expo_local = cc.cal_localExposure()
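The row-by-row layout described in the cell above can be reproduced with a small sketch. It assumes the local exposure of group m to group n at location j is `(N_jm / N_m) * (L_jn / L_j)`, with `L` the population intensity, and that the global value is the sum over locations; this is one published formulation, not necessarily what `cal_localExposure` implements, and `local_exposure` is a hypothetical name.

```python
import numpy as np

def local_exposure(pop, locality):
    """Local exposure of group m to n, flattened row by row (assumed formula)."""
    group_share = pop / pop.sum(axis=0)                           # N_jm / N_m
    local_share = locality / locality.sum(axis=1, keepdims=True)  # L_jn / L_j
    # One g x g outer product per location, then flatten m-major:
    # column m * g + n holds the exposure of group m to group n
    expo = group_share[:, :, None] * local_share[:, None, :]
    return expo.reshape(len(pop), -1)

pop = np.array([[30.0, 10.0], [5.0, 25.0], [15.0, 15.0]])
expo = local_exposure(pop, pop)                    # no smoothing, 2 groups
global_expo = expo.sum(axis=0).reshape(2, 2)       # rows: m, columns: n
print(expo.shape)                                  # (3, 4)
print(global_expo.sum(axis=1))                     # each row sums to 1
```

With this formulation each row m of the global matrix sums to 1, because the shares `N_jm / N_m` of a group add up to 1 across locations.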

In [12]:
# For validation: the shape should be (rows in the input file, number of groups squared)
expo_local.shape

(49890, 16)

In [13]:
# Local exposure/isolation values for all group combinations
expo_local

array([[  1.43918361e-05,   4.88168714e-06,   4.92084572e-06, ...,
          4.15072582e-06,   4.18402098e-06,   2.41367978e-06],
       [  1.06834205e-05,   3.96960118e-06,   4.35254371e-06, ...,
          1.21056839e-06,   1.32735043e-06,   7.09340760e-07],
       [  1.03212364e-05,   3.93896984e-06,   4.45618687e-06, ...,
          2.35492895e-06,   2.66414923e-06,   1.38718896e-06],
       ..., 
       [  5.56774449e-06,   3.10857101e-06,   2.47631183e-06, ...,
          7.01415289e-06,   5.58752871e-06,   3.89219059e-06],
       [  8.83956008e-06,   4.79845480e-06,   3.34168874e-06, ...,
          2.71902922e-06,   1.89355734e-06,   2.52167969e-06],
       [  3.39888557e-06,   1.97837555e-06,   2.01223372e-06, ...,
          5.32873413e-06,   5.41993076e-06,   5.25019149e-06]])

In [14]:
# To select one location (row), use the index [x]
# Each value is the result for one (m, n) group combination
expo_local[0]

array([  1.43918361e-05,   4.88168714e-06,   4.92084572e-06,
         2.83873954e-06,   4.87612186e-05,   1.65397252e-05,
         1.66723991e-05,   9.61798056e-06,   6.67172881e-06,
         2.26303945e-06,   2.28119248e-06,   1.31597527e-06,
         1.22368689e-05,   4.15072582e-06,   4.18402098e-06,
         2.41367978e-06])

**Compute Global Exposure/Isolation**

In [15]:
# Calculate the global exposure; the result holds m*m values, flattened row by row
expo_global = cc.cal_globalExposure()

In [16]:
expo_global

array([ 0.71621982,  0.07410737,  0.08921072,  0.12046208,  0.46530758,
        0.1708179 ,  0.16108029,  0.20279423,  0.45474975,  0.13282653,
        0.2283729 ,  0.18405081,  0.50775785,  0.13388819,  0.14682096,
        0.211533  ])
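Because the global result is a flat array of m*m values, reshaping it into an m x m matrix makes it easier to read. The sketch below uses the 16 values printed above, assuming the row-by-row order described for the local exposure (row m, column n). Each row summing to about 1 is consistent with that order, and the diagonal then holds the isolation indices.

```python
import numpy as np

# Flattened global exposure values as printed above (4 groups -> 16 values)
expo_global = np.array([
    0.71621982, 0.07410737, 0.08921072, 0.12046208, 0.46530758,
    0.1708179, 0.16108029, 0.20279423, 0.45474975, 0.13282653,
    0.2283729, 0.18405081, 0.50775785, 0.13388819, 0.14682096,
    0.211533,
])

n_groups = int(round(np.sqrt(expo_global.size)))
exposure = expo_global.reshape(n_groups, n_groups)  # row m, column n

isolation = np.diag(exposure)       # (m, m) entries: isolation of each group
print(isolation)
print(exposure.sum(axis=1))         # each row sums to about 1
```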

**Prepare data for saving on a local file**

In [17]:
d_localsimilarity = np.asmatrix(d_localsimilarity).transpose()

In [18]:
d_localsimilarity.shape

(49890, 1)

In [19]:
# Join population intensity, local dissimilarity and local exposure column-wise
rs1 = np.concatenate((cc.locality, d_localsimilarity, expo_local), axis=1)

In [20]:
N_attri = cc.locality.shape[1]

names = ['MSOA_ID','x','y','sum']

for i in range(N_attri):
    names.append('group_'+str(i))
    
for i in range(N_attri):
    names.append('locality_'+str(i))
    
names.append('local_dissimilarity')

for i in range(N_attri**2):
    names.append('expo_'+str(i))
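Before building the DataFrame, the number of column names can be checked against the number of data columns. This sketch assumes `attriMatrix` holds id, x, y, sum and one column per group (as the input layout describes) and uses four groups as in this run. It also shows an optional, more descriptive naming `expo_m_n`, which follows the row-by-row order of the exposure columns; that naming is a suggestion, not part of the original notebook.

```python
n_groups = 4  # cc.locality.shape[1] in this run

names = (["MSOA_ID", "x", "y", "sum"]
         + [f"group_{i}" for i in range(n_groups)]
         + [f"locality_{i}" for i in range(n_groups)]
         + ["local_dissimilarity"]
         + [f"expo_{i}" for i in range(n_groups ** 2)])

# attriMatrix: 4 + n_groups columns; rs1: n_groups + 1 + n_groups**2 columns
expected = (4 + n_groups) + (n_groups + 1 + n_groups ** 2)
assert len(names) == expected   # 29 for four groups

# Optional: name each exposure column after its (m, n) pair, row by row
pair_names = [f"expo_{m}_{n}" for m in range(n_groups) for n in range(n_groups)]
print(pair_names[:5])
```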

In [21]:
#sf = shapefile.Reader('res/resolution_msoa_2011.shp')
#shp = sf.records()
#shp = np.asmatrix(shp)
#rs = np.concatenate((shp[:,0], rs), axis=1)

rs = np.concatenate((cc.attriMatrix,rs1),axis = 1)

**Save results to a local file**

The first parameter of `to_csv` ("result/rs1test.csv" below) is the output path/filename. You can give only a file name if the results  
are saved in the same folder as the notebook, but saving them to a separate folder is recommended.

*Change the name for each new run, or the existing file will be overwritten!*

In [22]:
# Change the name of the output file before each new run
# It's recommended to use the input file name with the suffix _result
import os
os.makedirs("result", exist_ok=True)  # to_csv does not create missing folders
rs = pd.DataFrame(rs, columns=names)
rs.to_csv("result/rs1test.csv", sep=",", index=False)
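To avoid overwriting earlier results by accident, the output path can be adjusted automatically when the file already exists. This is a minimal sketch with hypothetical helpers `safe_output_path` and `save_results`: it appends `_1`, `_2`, ... to the file name and creates the output folder if needed.

```python
from pathlib import Path

import pandas as pd

def safe_output_path(path):
    """Return `path`, or `path` with a numeric suffix if that file exists."""
    path = Path(path)
    candidate = path
    counter = 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        counter += 1
    return candidate

def save_results(df, path):
    """Write `df` as CSV without replacing an existing file; return the path used."""
    out = safe_output_path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out, sep=",", index=False)
    return out
```

For example, `save_results(rs, "result/oa2011_ks201_final_result.csv")` writes that file on the first run and `oa2011_ks201_final_result_1.csv` on the next one.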