# Segregation calculation

**Instructions**

For a quick run, you only need to change two things: the file name/path under **Input file** (the file you want to read)  
and the file name/path under **Save Local and global results to a file** (the file the results are written to).

*Make sure the output name is not already in use, or the existing file will be overwritten.*

Once both are set, open the **Cell** menu and select **Run All**.

In [None]:
import numpy as np
import pandas as pd
from decimal import Decimal
import time

In [2]:
# Import python script with segreg functions
from segregationMetrics import Segreg

In [3]:
cc = Segreg()

**Input file**

In [4]:
'''
Input columns, in order: id, x, y, sum, attribute 1, attribute 2, attribute 3, ..., attribute n
'''
# cc.readAttributesFile('res/ks201_oa_ethnic.csv')
cc.readAttributesFile('data/AP2010_CEM_RMSP_EDU copy.csv')
# cc.readAttributesFile('data/oa2011_ks201_final.csv')

matrix([[  2.00000000e+00,   3.67940727e+05,   7.41220127e+06,
           2.50940000e+04,   9.66000000e+03,   4.63400000e+03],
        [  1.00000000e+00,   3.65279005e+05,   7.41308577e+06,
           1.31420000e+04,   5.65900000e+03,   2.53900000e+03],
        [  3.00000000e+00,   3.62939248e+05,   7.41398731e+06,
           2.51200000e+04,   1.20450000e+04,   4.60000000e+03],
        ..., 
        [  6.09000000e+02,   3.25291157e+05,   7.36486407e+06,
           2.67410000e+04,   1.40210000e+04,   5.28300000e+03],
        [  6.25000000e+02,   3.18200304e+05,   7.38684501e+06,
           1.96940000e+04,   6.14300000e+03,   4.15000000e+03],
        [  6.29000000e+02,   3.16921299e+05,   7.38553520e+06,
           3.42580000e+04,   1.74130000e+04,   7.91000000e+03]])

**Compute population intensity**

The distance matrix is calculated at this step.

In [5]:
'''
Parameters:
bandwidth - 5000 m by default; change it here if needed
weightmethod - 1 for Gaussian, 2 for bi-square, leave empty for a moving window
'''

start_time = time.time()
cc.locality = cc.cal_localityMatrix(bandwidth=3000, weightmethod=1)
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.15311193466186523 seconds ---
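
The idea behind a Gaussian-weighted population intensity can be sketched as follows. This is a minimal illustration, not the code inside `Segreg`: the function name `gaussian_locality`, the kernel form `exp(-0.5 * (d / bandwidth)**2)` and the absence of any weight normalization are all assumptions, so absolute values will not match `cc.locality`.

```python
import numpy as np

def gaussian_locality(coords, groups, bandwidth=5000.0):
    """Hypothetical sketch: kernel-weighted group counts around each unit.

    coords : (n, 2) array of x, y coordinates
    groups : (n, g) array of population counts per group
    """
    coords = np.asarray(coords, dtype=float)
    groups = np.asarray(groups, dtype=float)
    # Pairwise Euclidean distances between all units (n x n)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Assumed Gaussian kernel; Segreg's constants may differ
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)
    # Each row is the weighted sum of every unit's group counts
    return weights @ groups
```

A smaller bandwidth makes the weights fall off faster, so each unit's intensity is dominated by its own population and that of its nearest neighbours.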


In [6]:
# List of coordinates from file (x, y) for validation
cc.location

matrix([[  367940.7271,  7412201.274 ],
        [  365279.0048,  7413085.769 ],
        [  362939.2484,  7413987.312 ],
        ..., 
        [  325291.1567,  7364864.069 ],
        [  318200.3041,  7386845.008 ],
        [  316921.2989,  7385535.2   ]])

In [7]:
# Display local population intensity for groups
cc.locality

array([[  8279.84619685,   3600.89806411],
       [  8511.69270302,   3566.92997643],
       [  9577.65961291,   3886.71902318],
       ..., 
       [ 14997.8947738 ,   5860.56757989],
       [ 11150.70551962,   5233.81757421],
       [ 11290.66083373,   5200.98309482]])

In [8]:
# To inspect the locality of a specific row (for validation), index with [x, :]
# where x is the number of the desired row
cc.locality[5,:]

array([ 8523.54764634,  3116.99774104])

**Compute Local Dissimilarity**

In [9]:
# Function to compute local dissimilarity
d_localsimilarity = cc.cal_localDissimilarity()
# To select a specific element use the index [x], or a slice [x:y] for a range
d_localsimilarity

array([  7.14573749e-05,   5.57999743e-05,   1.38988788e-04,
         1.05060453e-04,   1.86438725e-04,   1.68444762e-04,
         8.25550239e-05,   7.61601793e-05,   1.41802077e-04,
         1.07025744e-04,   7.76303207e-06,   2.06381702e-05,
         2.51199105e-05,   6.07698962e-05,   3.06600357e-05,
         1.43203176e-05,   8.48289622e-05,   3.26700886e-05,
         6.56582769e-05,   4.06300299e-05,   1.13260842e-04,
         9.25864081e-05,   3.87934122e-05,   4.64446465e-05,
         8.17682385e-05,   6.67028918e-05,   6.17075771e-05,
         3.00484979e-05,   7.80219294e-05,   3.33439841e-05,
         4.15305566e-05,   5.64679184e-05,   2.59539063e-06,
         1.18722601e-04,   0.00000000e+00,   1.27441999e-05,
         0.00000000e+00,   0.00000000e+00,   1.70888985e-04,
         6.14931630e-05,   5.91500487e-05,   3.67476498e-05,
         2.51007871e-04,   9.76044362e-05,   1.52247649e-05,
         3.81463637e-05,   2.08537349e-04,   1.04956261e-04,
         8.07846662e-06,

**Compute Global Dissimilarity**

In [10]:
# Function to compute global dissimilarity
d_globaldis = cc.cal_globalDissimilarity()
# Display global value
d_globaldis

0.038405091788094411
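
How local values can add up to a single global index can be sketched with one common formulation of a spatial dissimilarity index. Whether `Segreg` uses exactly this formula is not confirmed by this notebook; the function name `local_dissimilarity` and its arguments are hypothetical, and the sketch assumes every row of `locality` has a non-zero sum.

```python
import numpy as np

def local_dissimilarity(pop, locality):
    """Hypothetical sketch of a per-unit dissimilarity contribution.

    pop      : (n, g) raw population counts per unit and group
    locality : (n, g) local population intensity per unit and group
    """
    pop = np.asarray(pop, dtype=float)
    locality = np.asarray(locality, dtype=float)
    n_j = pop.sum(axis=1)              # total population of each unit
    n_total = n_j.sum()                # total population of the study area
    tau = pop.sum(axis=0) / n_total    # overall proportion of each group
    interaction = np.sum(tau * (1.0 - tau))
    # Proportion of each group in the locality of each unit
    local_tau = locality / locality.sum(axis=1, keepdims=True)
    deviation = np.abs(local_tau - tau).sum(axis=1)
    return n_j / (2.0 * n_total * interaction) * deviation

# Under this formulation the global index is the sum of the local values
pop = np.array([[100.0, 0.0], [0.0, 100.0]])
d_local = local_dissimilarity(pop, pop)
d_global = d_local.sum()
```

In this toy case the two groups never share a unit, so the global value reaches its maximum of 1; identical group proportions in every locality would give 0.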

**Compute Local Exposure/Isolation**

In [11]:
'''
expo is a matrix of n_group * n_group, therefore exposure (m,n) = rs[m,n]
the columns are exposure of m1 to n1, to n2 ... n5, then m2 to n1 ... n5
- m,m = isolation index of group m
- m,n = exposure index of group m to n
'''
expo_local = cc.cal_localExposure()

In [12]:
# For validation, shape must be total number of lines from file
# by number of groups squared (m, n)
expo_local.shape

(633, 4)

In [13]:
# Result of all combinations of local group exposure/isolation
expo_local

array([[ 0.00101566,  0.00044171,  0.00101377,  0.00044089],
       [ 0.00060163,  0.00025212,  0.00056165,  0.00023537],
       [ 0.00129262,  0.00052456,  0.00102716,  0.00041683],
       ..., 
       [ 0.00152096,  0.00059433,  0.00119243,  0.00046596],
       [ 0.00063073,  0.00029604,  0.00088659,  0.00041614],
       [ 0.00179854,  0.00082849,  0.00169996,  0.00078308]])

In [14]:
# To select a specific row of m-to-n values, use the index [x]
# Each value is the result of one combination m,n
# e.g.: g1xg1, g1xg2, g2xg1, g2xg2 = isolation, exposure, exposure, isolation
expo_local[0]

array([ 0.00101566,  0.00044171,  0.00101377,  0.00044089])
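
The flattened layout described above can be made easier to read by reshaping one row into an `n_group` by `n_group` table. The sketch below uses the values printed for `expo_local[0]` and assumes the row-major ordering given in the comment (g1xg1, g1xg2, g2xg1, g2xg2); the variable names are illustrative only.

```python
import numpy as np

# Values copied from the output of expo_local[0] above
row = np.array([0.00101566, 0.00044171, 0.00101377, 0.00044089])
n_group = 2

# Entry [m, n] is the exposure of group m to group n
pairs = row.reshape(n_group, n_group)

# The diagonal holds the isolation index of each group
isolation = np.diag(pairs)

# Off-diagonal entries are exposure values, e.g. group 1 to group 2
exposure_g1_to_g2 = pairs[0, 1]
```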

**Compute Global Exposure/Isolation**

In [15]:
# Calculate the global exposure; the result is an m*m matrix
expo_global = cc.cal_globalExposure()

In [16]:
expo_global

array([[ 0.67631941,  0.32368059],
       [ 0.67362344,  0.32637656]])
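
In the output above, each row sums to 1: a group's isolation plus its exposure to the other group. A quick check along those lines might look like the sketch below. It uses the printed values and assumes this property is expected of `cal_globalExposure` in general, which this notebook does not state.

```python
import numpy as np

# Values copied from the expo_global output above
expo_global_example = np.array([[0.67631941, 0.32368059],
                                [0.67362344, 0.32637656]])

# Sum across each row: isolation of group m plus exposure of m to the others
row_sums = expo_global_example.sum(axis=1)
rows_sum_to_one = bool(np.allclose(row_sums, 1.0))
```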

**Prepare data for saving on a local file**

In [17]:
d_localsimilarity = np.asmatrix(d_localsimilarity).transpose()

In [18]:
d_localsimilarity.shape

(633, 1)

In [19]:
# Join population intensity, local dissimilarity and local exposure/isolation
rs1 = np.concatenate((cc.locality, d_localsimilarity, expo_local), axis=1)

In [21]:
# N_attri = cc.locality.shape[1]

names = ['ID','x','y','sum']

for i in range(cc.n_group):
    names.append('group_'+str(i))
    
for i in range(cc.n_group):
    names.append('locality_'+str(i))
    
names.append('local_dissimilarity')

# for i in range(N_attri**2):
#     names.append('expo_'+str(i))
    
for i in range(cc.n_group):
    for j in range(cc.n_group):
        if i == j:
            names.append('iso' + str(i) + str(j))
        else:
            names.append('exp' + str(i) + str(j))
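
Because the column names are built by hand, it can help to confirm that their count matches the width of the result matrix before creating the DataFrame. The sketch below wraps the same naming logic in a hypothetical helper, `build_column_names`, which is not part of `Segreg`.

```python
def build_column_names(n_group):
    """Hypothetical helper mirroring the naming loops above."""
    names = ['ID', 'x', 'y', 'sum']
    names += ['group_' + str(i) for i in range(n_group)]
    names += ['locality_' + str(i) for i in range(n_group)]
    names.append('local_dissimilarity')
    for i in range(n_group):
        for j in range(n_group):
            # Same group twice is isolation, two different groups is exposure
            prefix = 'iso' if i == j else 'exp'
            names.append(prefix + str(i) + str(j))
    return names

example_names = build_column_names(2)
# Expected width: 4 fixed columns + n_group + n_group + 1 + n_group**2
expected_width = 4 + 2 + 2 + 1 + 2 ** 2
```

With two groups this gives 13 names, which matches the 6 input columns plus the 7 columns of `rs1`.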

In [24]:
# Concatenate the results with original data
rs = np.concatenate((cc.attributeMatrix,rs1),axis = 1)

matrix([[  2.00000000e+00,   3.67940727e+05,   7.41220127e+06, ...,
           4.41708205e-04,   1.01377291e-03,   4.40888976e-04],
        [  1.00000000e+00,   3.65279005e+05,   7.41308577e+06, ...,
           2.52120421e-04,   5.61651926e-04,   2.35367178e-04],
        [  3.00000000e+00,   3.62939248e+05,   7.41398731e+06, ...,
           5.24559546e-04,   1.02715727e-03,   4.16831655e-04],
        ..., 
        [  6.09000000e+02,   3.25291157e+05,   7.36486407e+06, ...,
           5.94329843e-04,   1.19243483e-03,   4.65955057e-04],
        [  6.25000000e+02,   3.18200304e+05,   7.38684501e+06, ...,
           2.96044128e-04,   8.86589675e-04,   4.16139464e-04],
        [  6.29000000e+02,   3.16921299e+05,   7.38553520e+06, ...,
           8.28488059e-04,   1.69995696e-03,   7.83076168e-04]])

**Save Local and global results to a file**

The parameter **fname** is the folder/filename for the output; change it as needed.
To save the results in the same folder as the code, use only the filename: remove the part before the "/" (and the slash itself). The global results are saved automatically under the same name with the suffix "_globals".
Saving to a different folder is recommended.

**Change the fname value for every new run, or the existing local file will be overwritten!**

In [75]:
# Change the value of "fname" to choose where the local results are saved
# It's recommended to use the name of the input file plus the suffix ..._result

fname = "result/test2groups"

rs = pd.DataFrame(rs, columns=names)
rs.to_csv("%s.csv" % fname, sep=",", index=False)

# The global results are written to "<fname>_globals.csv", derived from fname above
with open("%s_globals.csv" %fname, "w") as f:
    f.write('Global dissimilarity: ' + str(d_globaldis))
    f.write('\nGlobal exposure: \n')
    f.write(str(expo_global))
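
Since a reused **fname** silently replaces earlier results, one option is to check for an existing file before writing. The helper below, `safe_output_path`, is a hypothetical sketch and not part of the notebook or of `Segreg`; it also creates the target folder if it is missing, which is an added convenience.

```python
from pathlib import Path

def safe_output_path(fname, suffix=".csv"):
    """Hypothetical helper: return the output path, refusing to overwrite."""
    path = Path(f"{fname}{suffix}")
    if path.exists():
        raise FileExistsError(f"{path} already exists; choose a different fname")
    # Create the target folder (e.g. "result/") if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

It could be used as `rs.to_csv(safe_output_path(fname), sep=",", index=False)`, and with `suffix="_globals.csv"` for the global results.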