# Segregation calculation

**Instructions**

For a quick run, you only need to change the file name/path under **Input file** (the file to read)  
and the file name/path under **Save results to a local file** (the file the results are written to).

*Make sure the output name is not already in use, or the existing file will be replaced.*

With this in mind, open the **Cell** menu and select **Run All**.

In [1]:
import numpy as np
import pandas as pd
from decimal import Decimal
import time

In [2]:
# Import the Python script with the segregation functions
from segregationMetrics import Metrics

In [3]:
cc = Metrics()

**Input file**

In [4]:
'''
Data must be prepared as: id, x, y, sum, attribute 1, attribute 2, attribute 3, ..., attribute n
'''

# cc.readAttributesFile('res/ks201_oa_ethnic.csv')
cc.readAttributesFile('data/AP2010_CEM_RMSP_EDU.csv')

matrix([[  2.00000000e+00,   3.67940727e+05,   7.41220127e+06, ...,
           7.21700000e+03,   3.22600000e+03,   3.57000000e+02],
        [  1.00000000e+00,   3.65279005e+05,   7.41308577e+06, ...,
           3.94100000e+03,   8.16000000e+02,   1.87000000e+02],
        [  3.00000000e+00,   3.62939248e+05,   7.41398731e+06, ...,
           6.76300000e+03,   1.40000000e+03,   3.12000000e+02],
        ..., 
        [  6.09000000e+02,   3.25291157e+05,   7.36486407e+06, ...,
           5.57900000e+03,   1.09400000e+03,   7.64000000e+02],
        [  6.25000000e+02,   3.18200304e+05,   7.38684501e+06, ...,
           6.72800000e+03,   2.58500000e+03,   8.80000000e+01],
        [  6.29000000e+02,   3.16921299e+05,   7.38553520e+06, ...,
           7.60000000e+03,   1.03900000e+03,   2.96000000e+02]])
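
As a rough illustration of the column layout described above, the sketch below builds a tiny table in memory and checks it. The header row, the column names `group_0` and `group_1`, and the idea that `sum` equals the row total of the group columns are all assumptions made for this example; the notebook does not state what `readAttributesFile` requires beyond the column order.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row input in the order id, x, y, sum, attribute 1..n.
csv_text = """id,x,y,sum,group_0,group_1
1,1000.0,2000.0,30,10,20
2,1500.0,2500.0,50,25,25
"""

table = pd.read_csv(io.StringIO(csv_text))

# Columns 1-2 are the coordinates, columns 4 onward are the group counts.
coords = table[["x", "y"]].to_numpy()
groups = table.iloc[:, 4:].to_numpy()

# Assumption: the `sum` column is the total of the group columns.
totals_match = np.allclose(table["sum"], groups.sum(axis=1))
print(coords.shape, groups.shape, totals_match)
```

A check like this can catch a misplaced column before the file is passed to the segregation functions.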

**Compute population intensity**

The distance matrix is calculated at this step.

In [5]:
'''
Parameters:
bandwidth - 5000 m by default; the call below sets it to 3000 m
weightmethod - 1 for Gaussian, 2 for bi-square; leave empty for a moving window
'''

start_time = time.time()
cc.locality = cc.cal_localityMatrix(bandwidth=3000, weightmethod=1)
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.27097201347351074 seconds ---
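
To make the two parameters more concrete, here is a minimal sketch of how a distance matrix, a kernel weight and a population intensity could fit together. The kernel formulas and the helper `kernel_weights` are assumptions chosen for illustration (common textbook forms of the Gaussian, bi-square and moving-window kernels); they are not taken from `segregationMetrics` and may differ from what `cal_localityMatrix` does.

```python
import numpy as np

def kernel_weights(dist, bandwidth, weightmethod=None):
    """Hypothetical helper: turn distances into weights.

    weightmethod 1 = Gaussian, 2 = bi-square, anything else = moving window.
    """
    dist = np.asarray(dist, dtype=float)
    if weightmethod == 1:
        return np.exp(-0.5 * (dist / bandwidth) ** 2)
    inside = dist <= bandwidth
    if weightmethod == 2:
        return np.where(inside, (1.0 - (dist / bandwidth) ** 2) ** 2, 0.0)
    # Moving window: every unit inside the bandwidth counts fully.
    return inside.astype(float)

# Three toy units with (x, y) in metres and two population groups.
xy = np.array([[0.0, 0.0], [3000.0, 0.0], [0.0, 8000.0]])
groups = np.array([[100.0, 20.0], [50.0, 50.0], [10.0, 90.0]])

# Pairwise Euclidean distance matrix.
diff = xy[:, None, :] - xy[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Intensity of each group around each unit: weighted sum over all units.
weights = kernel_weights(dist, bandwidth=3000, weightmethod=1)
intensity = weights @ groups
print(intensity.shape)
```

With a smaller bandwidth the weights fall off faster, so each unit's intensity reflects a tighter neighbourhood.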


In [6]:
# List of coordinates from file (x, y) for validation
cc.location

matrix([[  367940.72713,  7412201.27442],
        [  365279.00478,  7413085.76939],
        [  362939.24837,  7413987.31233],
        ..., 
        [  325291.15667,  7364864.06947],
        [  318200.30414,  7386845.00788],
        [  316921.29889,  7385535.20008]])

In [7]:
# Display local population intensity for groups
cc.locality

array([[  8279.8462652 ,   3600.89814857,   5388.00321297,   1689.44639886,
           268.56572077],
       [  8511.69276191,   3566.9300309 ,   5274.27540802,   1377.54685989,
           262.73938253],
       [  9577.65965392,   3886.71905402,   5551.25579861,   1203.69229422,
           277.13216749],
       ..., 
       [ 14997.89485778,   5860.56765295,   6171.27284685,   1068.69496683,
           605.38805821],
       [ 11150.70559268,   5233.81760202,   7196.39321957,   2867.14714079,
           304.8247775 ],
       [ 11290.66084299,   5200.98310512,   6712.36558737,   2011.08553609,
           304.41121067]])

In [8]:
# To select the locality for a specific row (validation), use the index [x, :]
cc.locality[5,:]

array([ 8523.5478202 ,  3116.99777089,  3780.57772084,  2042.23614182,
          54.44648988])

**Compute Local Dissimilarity**

In [14]:
# Compute local dissimilarity
d_localsimilarity = cc.cal_localDissimilarity()
# To select a specific element use the index [x], or a slice [x:y] for a range
d_localsimilarity[0:9]

array([  9.55860422e-05,   6.61707339e-05,   1.59987254e-04,
         1.26048662e-04,   2.87121813e-04,   1.41013920e-04,
         1.40268287e-04,   1.57788605e-04,   4.80449181e-04])

**Compute Global Dissimilarity**

In [15]:
# Function to compute global dissimilarity
d_globaldis = cc.cal_globalDissimilarity()
# Display global value
d_globaldis

0.12053398250135673
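
The notebook does not show the formula behind the two dissimilarity calls. Purely as an illustration, the sketch below uses one generalized spatial dissimilarity formulation, in which each unit contributes its population share times the absolute gap between its local group proportions and the overall proportions, and the global value is the sum of the local ones. The formula and the helper `local_dissimilarity` are assumptions, not the code in `segregationMetrics`.

```python
import numpy as np

def local_dissimilarity(pop, intensity):
    """Hypothetical helper; pop and intensity have shape (n_units, n_groups)."""
    pop = np.asarray(pop, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    unit_totals = pop.sum(axis=1)                 # population of each unit
    city_total = unit_totals.sum()                # total population
    city_share = pop.sum(axis=0) / city_total     # overall share of each group
    local_share = intensity / intensity.sum(axis=1, keepdims=True)

    # Normalising term based on the overall group shares.
    diversity = np.sum(city_share * (1.0 - city_share))
    deviation = np.abs(local_share - city_share).sum(axis=1)
    return unit_totals * deviation / (2.0 * city_total * diversity)

# Toy data: three units, two groups. Raw counts stand in for the
# intensity here, which corresponds to no spatial smoothing at all.
pop = np.array([[90.0, 10.0], [50.0, 50.0], [10.0, 90.0]])
d_local = local_dissimilarity(pop, pop)
d_global = d_local.sum()
print(d_local, d_global)
```

In this toy case the middle unit mirrors the overall composition, so its local value is zero and only the two uneven units contribute to the global figure.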

**Compute Local Exposure/Isolation**

In [21]:
'''
For each row, expo is a matrix of n_group * n_group values stored flattened,
so exposure (m,n) is the value for the ordered pair of groups m, n
the columns are exposure of m1 to n1, n2, ... n5, then m2 to n1, ... n5, and so on
- m,m = isolation index of group m
- m,n = exposure index of group m to n
'''
expo_local = cc.cal_localExposure()

In [23]:
# For validation, the shape must be (number of rows in the file, number of groups squared)
expo_local.shape

(633, 25)

In [24]:
# Local exposure/isolation for all combinations of groups
expo_local

array([[  6.27602205e-04,   2.72943669e-04,   4.08404044e-04, ...,
          5.13685003e-04,   1.61069555e-04,   2.56046958e-05],
       [  3.82603561e-04,   1.60334750e-04,   2.37080521e-04, ...,
          2.66632785e-04,   6.96397376e-05,   1.32823806e-05],
       [  8.49138490e-04,   3.44589688e-04,   4.92164593e-04, ...,
          4.33884285e-04,   9.40801846e-05,   2.16605569e-05],
       ..., 
       [  1.10525065e-03,   4.31887028e-04,   4.54784049e-04, ...,
          8.43403419e-04,   1.46054309e-04,   8.27359883e-05],
       [  3.86281001e-04,   1.81309092e-04,   2.49296330e-04, ...,
          1.21544078e-04,   4.84249187e-05,   5.14836329e-06],
       [  1.16228292e-03,   5.35399468e-04,   6.90984164e-04, ...,
          3.99762423e-04,   1.19772443e-04,   1.81295494e-05]])

In [79]:
# To select all m,n values for a specific row, use the index [x]
# Each value is the result for one combination m,n
expo_local[0]

array([  6.27602205e-04,   2.72943669e-04,   4.08404044e-04,
         1.28057968e-04,   2.03569527e-05,   6.26438201e-04,
         2.72437445e-04,   4.07646583e-04,   1.27820461e-04,
         2.03191970e-05,   6.71914198e-04,   2.92214917e-04,
         4.37239502e-04,   1.37099529e-04,   2.17942598e-05,
         6.22340933e-04,   2.70655546e-04,   4.04980339e-04,
         1.26984441e-04,   2.01862977e-05,   7.89389442e-04,
         3.43304802e-04,   5.13685003e-04,   1.61069555e-04,
         2.56046958e-05])
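
Since each row lists the m,n combinations in order (m1 to n1 ... n5, then m2 to n1 ... n5), a single pair can be looked up by position. The sketch below assumes zero-based group numbers and uses placeholder numbers rather than real results; `expo_column` is a hypothetical helper.

```python
import numpy as np

n_groups = 5

def expo_column(m, n, n_groups):
    """Hypothetical helper: column of exposure m -> n in a flattened row."""
    return m * n_groups + n

# Placeholder array standing in for expo_local: 2 rows, 25 columns.
toy_expo = np.arange(2 * n_groups ** 2, dtype=float).reshape(2, n_groups ** 2)

# View each row as an n_groups x n_groups matrix instead of a flat list.
per_unit = toy_expo.reshape(-1, n_groups, n_groups)

# The diagonal of each matrix holds the isolation values (m,m).
isolation = np.diagonal(per_unit, axis1=1, axis2=2)

print(toy_expo[0, expo_column(2, 3, n_groups)], per_unit[0, 2, 3])
```

Both lookups return the same element, so the reshaped view is mainly a convenience when many pairs are needed.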

**Compute Global Exposure/Isolation**

In [26]:
# Calculate the global exposure; the result holds m*m values, one per pair of groups
expo_global = cc.cal_globalExposure()

In [73]:
expo_global

array([ 0.41024676,  0.1941277 ,  0.27317189,  0.11046947,  0.01198418,
        0.40289845,  0.19287   ,  0.27481578,  0.11760663,  0.01180913,
        0.38953139,  0.18881944,  0.27740802,  0.13282396,  0.0114172 ,
        0.32715575,  0.16767302,  0.27551352,  0.21947063,  0.01018708,
        0.40602988,  0.19276895,  0.27157291,  0.11685554,  0.01277272])
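
The 25 values printed above can be rearranged into a 5 x 5 table, which makes the isolation values (the diagonal) easier to read. This sketch copies the printed numbers and assumes the same m,n ordering as the local result; the variable names are chosen for the example.

```python
import numpy as np

# Values copied from the output above.
expo_global = np.array([
    0.41024676, 0.1941277, 0.27317189, 0.11046947, 0.01198418,
    0.40289845, 0.19287, 0.27481578, 0.11760663, 0.01180913,
    0.38953139, 0.18881944, 0.27740802, 0.13282396, 0.0114172,
    0.32715575, 0.16767302, 0.27551352, 0.21947063, 0.01018708,
    0.40602988, 0.19276895, 0.27157291, 0.11685554, 0.01277272,
])

n_groups = 5
# Row m holds the exposure of group m to every group n.
expo_matrix = expo_global.reshape(n_groups, n_groups)

# Diagonal entries (m,m) are the isolation indices.
isolation = np.diag(expo_matrix)

# Each row adds up to roughly 1 in this output.
row_sums = expo_matrix.sum(axis=1)
print(isolation)
print(row_sums)
```

The row sums give a quick sanity check when the calculation is rerun on another input file.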

**Prepare data for saving on a local file**

In [37]:
d_localsimilarity = np.asmatrix(d_localsimilarity).transpose()

In [38]:
d_localsimilarity.shape

(633, 1)

In [67]:
# Join the results: population intensity, local dissimilarity and local exposure
rs1 = np.concatenate((cc.locality, d_localsimilarity, expo_local), axis=1)

In [68]:
N_attri = cc.locality.shape[1]

names = ['MSOA_ID','x','y','sum']

for i in range(N_attri):
    names.append('group_'+str(i))
    
for i in range(N_attri):
    names.append('locality_'+str(i))
    
names.append('local_dissimilarity')

for i in range(N_attri**2):
    names.append('expo_'+str(i))

In [70]:
#sf = shapefile.Reader('res/resolution_msoa_2011.shp')
#shp = sf.records()
#shp = np.asmatrix(shp)
#rs = np.concatenate((shp[:,0], rs), axis=1)

rs = np.concatenate((cc.attriMatrix,rs1),axis = 1)
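
Because the column names are built separately from the data, it can help to confirm that both have the same length before creating the table. The sketch below repeats the naming pattern on placeholder arrays; the sizes and the arrays `attri` and `results` are made up for the example.

```python
import numpy as np
import pandas as pd

n_units, n_groups = 3, 2

# Placeholder inputs: id, x, y, sum plus one column per group ...
attri = np.ones((n_units, 4 + n_groups))
# ... and locality, local dissimilarity and exposure results.
results = np.zeros((n_units, n_groups + 1 + n_groups ** 2))
table = np.concatenate((attri, results), axis=1)

names = ["id", "x", "y", "sum"]
names += ["group_%d" % i for i in range(n_groups)]
names += ["locality_%d" % i for i in range(n_groups)]
names.append("local_dissimilarity")
names += ["expo_%d" % i for i in range(n_groups ** 2)]

# Fail early with a readable message if names and columns disagree.
if len(names) != table.shape[1]:
    raise ValueError("%d names for %d columns" % (len(names), table.shape[1]))

frame = pd.DataFrame(table, columns=names)
print(frame.shape)
```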

**Save results to a local file**

The first parameter ("result/rs1.csv") is the path/file name. You can use only a file name if the results  
are being saved in the same folder as the code, but saving to a different folder is recommended.

*The name should be changed for new executions or the local file will be replaced!*

In [71]:
# Change the name of the file where the results are saved
# It's recommended to use the name of the input file with the suffix _result
rs = pd.DataFrame(rs, columns=names)
rs.to_csv("result/rs1.csv", sep=",", index = False)
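
One way to follow both recommendations above (derive the name from the input file and avoid replacing an earlier result) is sketched here. The helpers `result_path` and `save_without_overwrite` and the default folder `result` are assumptions for the example, not part of the original notebook.

```python
from pathlib import Path

def result_path(input_path, out_dir="result"):
    """Hypothetical helper: data/name.csv -> result/name_result.csv."""
    source = Path(input_path)
    return Path(out_dir) / (source.stem + "_result.csv")

def save_without_overwrite(frame, path):
    """Hypothetical helper: write a DataFrame, refusing to replace a file."""
    path = Path(path)
    if path.exists():
        raise FileExistsError("%s already exists; choose another name" % path)
    # Create the output folder if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(path, sep=",", index=False)

out_path = result_path("data/AP2010_CEM_RMSP_EDU.csv")
print(out_path)
# save_without_overwrite(rs, out_path)  # rs is the DataFrame built above
```

Raising an error instead of silently replacing the file keeps earlier runs intact when the notebook is executed again with **Run All**.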