# Segregation calculation

**Instructions**

For a quick run, change only the following **before running**:
* the path/name in the **Input file** cell (the file you want to process)
* the bandwidth and weight method in the **Compute Population Intensity** cell
* the file name in the variable **fname** in the **Save Local and global results to a file** section (where the results will be saved)

*Make sure the name is not already in use, or the existing file will be replaced.*

With these steps done, open the **Cell** menu and select **Run All**.

In [19]:
import numpy as np
import pandas as pd
from decimal import Decimal
import time

In [20]:
# Import python script with segreg functions
from segregationMetrics import Segreg

In [3]:
cc = Segreg()

**Input file**

In [4]:
'''
Change the path/name of the input file to be processed.
Data must be prepared as: id, x, y, attribute 1, attribute 2, attribute 3, ..., attribute n
'''
cc.readAttributesFile('data/AP2010_CEM_RMSP_EDU copy.csv')

matrix([[  2.00000000e+00,   3.67940727e+05,   7.41220127e+06,
           9.66000000e+03,   4.63400000e+03],
        [  1.00000000e+00,   3.65279005e+05,   7.41308577e+06,
           5.65900000e+03,   2.53900000e+03],
        [  3.00000000e+00,   3.62939248e+05,   7.41398731e+06,
           1.20450000e+04,   4.60000000e+03],
        ..., 
        [  6.09000000e+02,   3.25291157e+05,   7.36486407e+06,
           1.40210000e+04,   5.28300000e+03],
        [  6.25000000e+02,   3.18200304e+05,   7.38684501e+06,
           6.14300000e+03,   4.15000000e+03],
        [  6.29000000e+02,   3.16921299e+05,   7.38553520e+06,
           1.74130000e+04,   7.91000000e+03]])
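
Before pointing `readAttributesFile` at a new file, it can help to check that the table follows the id, x, y, attributes layout described above. The following is a minimal sketch using pandas only; the helper `check_layout`, the sample values, and the presence of a header row are assumptions made for illustration and are not part of `segregationMetrics`.

```python
import io
import pandas as pd

# Hypothetical sample in the expected layout: id, x, y, then one column per group.
# A header row is assumed here; the real input file may not have one.
sample = io.StringIO(
    "id,x,y,group_0,group_1\n"
    "1,365279.0,7413085.7,5659,2539\n"
    "2,367940.7,7412201.2,9660,4634\n"
)

def check_layout(df):
    """Return a list of problems found in the input table (empty if none)."""
    problems = []
    if df.shape[1] < 4:
        problems.append("expected id, x, y and at least one attribute column")
    if df.isnull().values.any():
        problems.append("missing values present")
    non_numeric = [c for c in df.columns
                   if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append("non-numeric columns: %s" % non_numeric)
    return problems

df = pd.read_csv(sample)
problems = check_layout(df)

# Every column after id, x, y is treated as one population group
n_group = df.shape[1] - 3
print(problems, n_group)
```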

**Compute Population Intensity**

The distance matrix is calculated at this step.

In [5]:
'''
Change the parameters for the population intensity according to your needs.
Parameters:
bandwidth - 5000 m by default; change it here
weightmethod - 1 for Gaussian, 2 for bi-square, empty for moving window
'''

start_time = time.time()
cc.locality = cc.cal_localityMatrix(bandwidth=3000, weightmethod=1)
print("--- %s seconds for processing ---" % (time.time() - start_time))

--- 0.06775212287902832 seconds for processing ---
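
To give an idea of what the three weighting options mean, the sketch below builds a distance matrix for three made-up points and applies textbook Gaussian, bi-square, and moving-window kernels. It is an assumption about the general technique, not a copy of what `cal_localityMatrix` does internally; the function `kernel_weights`, the coordinates, and the population counts are all hypothetical.

```python
import numpy as np

def kernel_weights(dist, bandwidth=5000, weightmethod=1):
    """Textbook kernel weights (assumed forms, not taken from Segreg)."""
    dist = np.asarray(dist, dtype=float)
    if weightmethod == 1:
        # Gaussian: weight decays smoothly with distance
        return np.exp(-0.5 * (dist / bandwidth) ** 2)
    if weightmethod == 2:
        # Bi-square: smooth decay that reaches zero at the bandwidth
        w = (1 - (dist / bandwidth) ** 2) ** 2
        return np.where(dist < bandwidth, w, 0.0)
    # Moving window: every location within the bandwidth counts fully
    return (dist <= bandwidth).astype(float)

# Hypothetical coordinates (metres) and population counts for two groups
coords = np.array([[0.0, 0.0], [3000.0, 0.0], [0.0, 4000.0]])
pop = np.array([[100.0, 50.0], [200.0, 10.0], [30.0, 60.0]])

# Pairwise Euclidean distance matrix
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Population intensity: weighted sum of each group's population around every location
weights = kernel_weights(dist, bandwidth=3000, weightmethod=1)
intensity = weights @ pop
print(intensity.shape)
```

A smaller bandwidth concentrates the weight on nearby locations, so the intensity of each location stays closer to its own population counts.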


**For validation only**

Remove the comment markers (#) to print the values for validation.

In [6]:
# np.set_printoptions(threshold=np.inf)
# print('Location (coordinates from data):\n', cc.location)
# print()
# print('Population intensity for all groups:\n', cc.locality)

In [7]:
'''To inspect the locality of a specific row (validation), use the index [x, :]'''
# where x is the number of the desired row

# cc.locality[5,:]

'To inspect the locality of a specific row (validation), use the index [x, :]'

**Compute local Dissimilarity**

In [8]:
diss_local = cc.cal_localDissimilarity()
diss_local = np.asmatrix(diss_local).transpose()

# To select a specific element use the index [x], or a range [x:y]
# diss_local

**Compute global Dissimilarity**

In [9]:
diss_global = cc.cal_globalDissimilarity()

# Display global value
# diss_global

**Compute local Exposure/Isolation**

In [10]:
'''
expo is an n_group * n_group matrix, so exposure(m, n) = rs[m, n]
The columns are the exposure of m1 to n1, n2, ... n5, then m2 to n1, ... n5
- m,m = isolation index of group m
- m,n = exposure index of group m to n

Result of all combinations of local group exposure/isolation
To select a specific line of m to n, use the index [x]
Each value is the result of one combination m,n
e.g.: g1xg1, g1xg2, g2xg1, g2xg2 = isolation, exposure, exposure, isolation
'''
expo_local = cc.cal_localExposure()

# expo_local
# expo_local[0]
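
The column order described in the docstring (m1 to n1, m1 to n2, ..., then m2 to n1, ...) means that the pair (m, n) sits at position `m * n_group + n` when groups are counted from zero. The sketch below illustrates this with made-up numbers; it assumes one row per location and `n_group * n_group` columns, and the helper `pair_column` is hypothetical.

```python
import numpy as np

n_group = 2

def pair_column(m, n, n_group):
    """Column position of the pair (m, n), with groups counted from zero."""
    return m * n_group + n

# Hypothetical local result: one row per location,
# columns ordered as g0xg0, g0xg1, g1xg0, g1xg1
expo_local_demo = np.array([
    [0.7, 0.3, 0.4, 0.6],
    [0.5, 0.5, 0.2, 0.8],
])

# Isolation of group 1 (pair 1,1) and exposure of group 0 to group 1 (pair 0,1)
iso_g1 = expo_local_demo[:, pair_column(1, 1, n_group)]
exp_g0_g1 = expo_local_demo[:, pair_column(0, 1, n_group)]
print(iso_g1, exp_g0_g1)
```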

**Compute global Exposure/Isolation**

In [11]:
# Calculate the global exposure; the result is an m*m matrix
expo_global = cc.cal_globalExposure()

# expo_global

**Compute local Entropy**

In [12]:
entro_local = cc.cal_localEntropy()
# entro_local

**Compute global Entropy**

In [13]:
entro_global = cc.cal_globalEntropy()
entro_global

0.6306056239696376

**Compute local Index H**

In [14]:
idxh_local = cc.cal_localIndexH()
# idxh_local

**Compute global Index H**

In [15]:
idxh_global = cc.cal_globalIndexH()
idxh_global

array([ 0.6666163 ,  0.32290138])

**Prepare data for saving on a local file**

In [16]:
# Concatenate local values from measures
if len(cc.locality) == 0:
    rs1 = np.concatenate((diss_local, expo_local, entro_local, idxh_local), axis=1)
else:
    rs1 = np.concatenate((cc.locality, diss_local, expo_local, entro_local, idxh_local), axis=1)

# Concatenate the results with original data
rs = np.concatenate((cc.attributeMatrix, rs1), axis=1)

In [17]:
# Define names for columns

# N_attri = cc.locality.shape[1]

names = ['id','x','y']

for i in range(cc.n_group):
    names.append('group_'+str(i))
    
for i in range(cc.n_group):
    names.append('intens_'+str(i))
    
names.append('diss_loc')
    
for i in range(cc.n_group):
    for j in range(cc.n_group):
        if i == j:
            names.append('iso_' + str(i) + str(j))
        else:
            names.append('exp_' + str(i) + str(j))
            
for i in range(cc.n_group):
    names.append('entrop_'+str(i))
    
for i in range(cc.n_group):
    names.append('idxh_'+str(i))

**Save Local and global results to a file**

The parameter **fname** is the folder/filename prefix for the output; change it as needed.
To save the results in the same folder as the code, use only the filename: remove the part before the "/" (and the slash itself). The local results are saved under this name with the suffix "_local.csv", and the global results use the same name with the suffix "_global.csv".
Saving to a different folder is recommended.

**Change the fname value for every new run, or the existing files will be overwritten!**

In [18]:
# Change the value of the variable "fname" to choose where the local results are saved
# You can use the name of the input file plus the suffix ..._result

fname = "t2groups"

rs = pd.DataFrame(rs, columns=names)
rs.to_csv("%s_local.csv" % fname, sep=",", index=False)

# The global results are saved under the same fname, with the suffix _global.csv
with open("%s_global.csv" % fname, "w") as f:
    f.write('Global dissimilarity: ' + str(diss_global))
    f.write('\nGlobal entropy: ' + str(entro_global))
    f.write('\nGlobal Index H: \n' + str(idxh_global))
    f.write('\nGlobal isolation/exposure: \n')
    f.write(str(expo_global))
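
Because reusing **fname** silently replaces earlier results, one option is to pick a free name automatically before saving. This is a minimal sketch, not part of the original workflow: the helper `unique_name` is hypothetical, and it assumes the two output suffixes used in the cell above (`_local.csv` and `_global.csv`).

```python
import os

def unique_name(fname, suffixes=("_local.csv", "_global.csv")):
    """Return fname, or fname_1, fname_2, ... so that no output file exists yet."""
    candidate = fname
    counter = 1
    # Keep trying new names while any of the output files is already on disk
    while any(os.path.exists(candidate + s) for s in suffixes):
        candidate = "%s_%d" % (fname, counter)
        counter += 1
    return candidate

# Only checks for existing files; nothing is written here
fname = unique_name("t2groups")
print(fname)
```

The returned value can then be used in place of the hard-coded `fname` when calling `to_csv` and `open`.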