# 5. Segregation Indexes
Following Bojanowski & Corten (2014), _Measuring Segregation in Social Networks_, we calculate several segregation indexes for our graphs over time. First, in the Prerequisites section, we import the data and the libraries used to build the functions. In the following sections we then calculate the Freeman Segregation Index and the Spectral Segregation Index. This notebook is divided into the following sections.

1. Prerequisites
2. Freeman Segregation
	- Basic Freeman Segregation
	- Global Freeman Segregation Index (for K groups)
	- Freeman Segregation Index for a specific group
	- Freeman Segregation Index for a specific group (taking weights into account)
3. Assortativity
4. Results
4.1. 3 Day Rolling Window
5. Conclusion


## 1. Prerequisites

In [1]:
# Mathematical and Data Management
import numpy as np
import pandas as pd

# Graph Management
import graph_tool.all as gt
import utils.Freeman as Fr
import utils.Segregation as Seg

# Miscellaneous
from glob import glob
from tqdm import tqdm
import concurrent.futures
from functools import partial



For the calculation of the segregation indexes, we define some notation based on Bojanowski & Corten (2014).

We define a graph with node set

$$\mathbb{N}= \{1, \dots, i, \dots, N\}$$

and then define the set

$$\mathbb{G} = \{G_1, G_2,\dots, G_K\}$$

as the set of $K$ groups, in which every $G_k$ is a subset of $\mathbb{N}$ that contains all the nodes belonging to group $k$. We define $\eta_{k} = |G_k|$ as the number of nodes in group $G_k$.

Now we define the type vector as 

$$\textbf{t} = [t_1,\dots, t_i, \dots, t_N]$$

where $t_i \in \{1,\dots,K\}$. This vector matches every node with its corresponding group. Using this notation, we can define a type indicator vector for each group $k$ as follows:

$$\textbf{v}_k = [v_1, \dots, v_i, \dots, v_N]$$ 

where $v_i \in \{0,1\}$. This vector has one entry per node, and the entry for node $i$ is 1 if that node belongs to group $G_k$. Formally:

$$ v_i = \begin{cases} 1 &\text{ if }t_i = k \\ 0 &\text{ if }t_i \neq k \end{cases} $$

Now we define the Types Matrix $T_{N\times K}$ as a matrix that records, for each node, which group it belongs to. Every column of the matrix corresponds to a _type indicator vector_ $\textbf{v}_k$.
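As an illustration of the notation above, the following is a minimal NumPy sketch that builds the types matrix from a type vector. The function name `types_matrix` and the toy type vector are assumptions made for this example; they are not part of the `utils` modules used in this notebook.

```python
import numpy as np

def types_matrix(t, K):
    """Build the N x K types matrix from a type vector t with labels 1..K."""
    t = np.asarray(t)
    T = np.zeros((t.size, K), dtype=int)
    # Row i gets a 1 in column t_i - 1, so column k - 1 is the indicator vector v_k.
    T[np.arange(t.size), t - 1] = 1
    return T

# Toy example: five nodes, three groups.
t = [1, 2, 2, 3, 1]
T = types_matrix(t, K=3)
eta = T.sum(axis=0)  # group sizes eta_k = |G_k|
print(T)
print(eta)
```

Each row of `T` contains exactly one 1, and the column sums recover the group sizes $\eta_k$.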

In the context of this research, we use a directed weighted graph. Our nodes are X (formerly Twitter) users, and user $i$ is related to user $j$ if $i$ retweeted, without comment, a tweet by $j$. Formally, we describe the relationship $R$ over $\mathbb{N}\times \mathbb{N}$, which induces our square adjacency matrix $X = [x_{ij}]_{N\times N}$.

For the segregation calculations we will consider the graph as either weighted or unweighted. When the weight of each edge is taken into account, the entries of the adjacency matrix are defined as follows:

$$x_{ij} = \dfrac{\text{\# Tweets from }j\text{ that }i\text{ Retweeted without comments}}{\text{\# of Retweets without comments from }i}$$

For the unweighted graph, we define our simple adjacency matrix as:

$$ x_{ij} = \begin{cases} 1 & \text{if } i\text{ retweeted }j \\ 0 & \text{otherwise} \end{cases} $$
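Both adjacency matrices can be derived from a matrix of raw retweet counts. The sketch below assumes a hypothetical array `counts`, where `counts[i, j]` is the number of tweets by $j$ that $i$ retweeted without comment; the function names are also illustrative only.

```python
import numpy as np

def weighted_adjacency(counts):
    """Divide each row by its total so the outgoing weights of a user sum to 1."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Users with no retweets keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)

def simple_adjacency(counts):
    """1 if i retweeted j at least once, 0 otherwise."""
    return (np.asarray(counts) > 0).astype(int)

# Toy retweet counts for three users (user 1 retweeted nobody).
counts = np.array([[0, 3, 1],
                   [0, 0, 0],
                   [2, 2, 0]])
X_w = weighted_adjacency(counts)
X_s = simple_adjacency(counts)
print(X_w)
print(X_s)
```

Rows of `X_w` sum to 1 for users who retweeted at least once and to 0 otherwise, which matters later when counting retweet-active users.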

Finally, we define the _Mixing Matrix_ ($M_{ghy}$), where $g$ and $h$ are two generic groups and $y$ indexes two layers. The first layer of the _Mixing Matrix_ is the _Contact Layer_, defined as follows (using either the weighted or the unweighted adjacency matrix):

$$M_{gh1} = \sum_{i\in G_g}\sum_{j\in G_h} x_{ij}$$

For the **unweighted** case, we can define the _No Contact Layer_ as follows:

$$M_{gh0} = \sum_{i\in G_g}\sum_{j\in G_h} (1-x_{ij})$$

Finally, the entry $M_{gh1}$ of this matrix shows the amount of attention that group $h$ gets from group $g$.

For convenience, we define the following notation:

- $M_{g+1} = \sum_{h=1}^K M_{gh1}$: sum of row $g$ (over all columns $h$)

- $M_{+h1} = \sum_{g=1}^K M_{gh1}$: sum of column $h$ (over all rows $g$)

- $M_{++1} = \sum_{g=1}^K \sum_{h=1}^K M_{gh1}$: sum of the whole layer
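The contact layer and its marginals can be sketched in a few lines of NumPy, using the identity $M_{\cdot\cdot1} = T^\top X T$ with $T$ the types matrix. The function name `contact_layer` and the toy graph are assumptions for illustration, not the implementation in `utils`.

```python
import numpy as np

def contact_layer(X, t, K):
    """Contact layer: entry [g-1, h-1] is the sum of x_ij over i in G_g, j in G_h."""
    t = np.asarray(t)
    T = np.zeros((t.size, K))
    T[np.arange(t.size), t - 1] = 1.0  # one-hot types matrix (N x K)
    return T.T @ np.asarray(X, dtype=float) @ T

# Toy directed, unweighted graph: nodes 0-1 are in group 1, nodes 2-3 in group 2.
X = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 0]])
t = [1, 1, 2, 2]

M1 = contact_layer(X, t, K=2)
row_totals = M1.sum(axis=1)   # M_{g+1}: attention sent by each group
col_totals = M1.sum(axis=0)   # M_{+h1}: attention received by each group
grand_total = M1.sum()        # M_{++1}: total number (or weight) of ties
print(M1, row_totals, col_totals, grand_total)
```

For an unweighted graph `grand_total` equals the number of edges, which is the denominator used by the Freeman indexes.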

In [2]:
# Indexes
master_id = pd.read_csv('/mnt/disk2/Data/3_Day_Graphs/Master_Index.csv', sep = ';')
date_range = pd.date_range(start='2021-04-28', end='2021-06-27', freq='D')
categories = master_id['Political Affiliation'].unique().tolist()
group_index = pd.MultiIndex.from_product([date_range, categories], names=['Date', 'Political Label'])
individual_index = pd.MultiIndex.from_product([range(0,len(master_id)), categories], names=['Node', 'Political Label'])

# DataFrames with statistics
global_segregation = pd.DataFrame(index=date_range).sort_index()
group_segregation = pd.DataFrame(index=group_index).sort_index()
individual_segregation = pd.DataFrame(index=individual_index).sort_index()

# Load graphs
files = glob('/mnt/disk2/Data/3_Day_Graphs/Graphs/*.graphml')

---
## 2. Freeman Segregation

### Basic Freeman Segregation

The basic segregation index proposed by Freeman (1978) compares the proportion of ties between two different groups with the proportion expected if ties were formed at random. This basic index is calculated for undirected, unweighted graphs and is the first approach to segregation in this family of indexes. We define $\eta_1$ as the number of nodes belonging to group $G_1$ and $\eta_2$ as the number belonging to group $G_2$. Recalling our notation, the number of between-group ties is the entry $M_{121}$ of the contact layer of the mixing matrix. Dividing this number by the total number of edges (note that the number of edges in the graph equals the sum of the whole contact layer, $M_{++1}$) gives the proportion of between-group ties:

$$p = \frac{M_{121}}{M_{++1}}$$

Now we calculate the expected value of $p$, that is, the probability that a randomly chosen dyad joins a node of one group with a node of the other. Thinking of it as the ratio of favorable cases to total cases, the favorable cases are all the possible cross-group ties, which number $\eta_1 \eta_2$. The total cases are all the possible dyads in the graph, which number $\binom{N}{2}$. Combining the two values gives:

$$\pi = \frac{\eta_1 \eta_2}{\frac{N(N-1)}{2}} = \frac{2\eta_1 \eta_2}{N(N-1)}$$

Finally, the Freeman Segregation Index is defined as:

$$S_{Freeman} = 1- \frac{p}{\pi}$$
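The formulas above can be checked on a small example. The following is a minimal sketch for an undirected, unweighted graph given as a symmetric adjacency matrix; each tie is counted once by reading only the upper triangle. The function name `freeman_basic` is an assumption and is not the `Fr.Freeman_Classic` implementation used in the next cell.

```python
import numpy as np

def freeman_basic(X, t):
    """Basic Freeman index S = 1 - p / pi for two groups on an undirected graph."""
    X = np.asarray(X)
    t = np.asarray(t)
    groups = np.unique(t)
    if groups.size != 2:
        raise ValueError("the basic index is defined for exactly two groups")
    # Read each undirected tie once from the upper triangle.
    rows, cols = np.nonzero(np.triu(X, k=1))
    p = np.mean(t[rows] != t[cols])          # observed share of cross-group ties
    eta1 = np.sum(t == groups[0])
    eta2 = np.sum(t == groups[1])
    N = t.size
    pi = 2 * eta1 * eta2 / (N * (N - 1))     # expected share under random ties
    return 1 - p / pi

# Toy graph: ties 0-1, 2-3 (within groups) and 1-2 (between groups).
X = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (2, 3), (1, 2)]:
    X[i, j] = X[j, i] = 1
t = [1, 1, 2, 2]

S = freeman_basic(X, t)
print(S)
```

Here $p = 1/3$ and $\pi = 2/3$, so the index is $0.5$: the graph has half as many cross-group ties as expected at random.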

In [3]:
def process_file(file, categories):
    results = []
    for pol in categories:
        g = gt.load_graph(file)
        graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
        seg = Fr.Freeman_Classic(g, types=pol)
        results.append(((graph_name, pol), seg))
    return results

# Run processing in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(process_file, file, categories) for file in files]
    
    # Process results as they complete
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(files)):
        try:
            results = future.result()
            # Update group_segregation DataFrame with results
            for key, value in results:
                group_segregation.loc[key, 'Classic Freeman'] = value
        except Exception as e:
            print(f'Generated an exception: {e}')

100%|██████████| 61/61 [03:55<00:00,  3.87s/it]


---
### Global Freeman Segregation Index (for K groups)

For the Freeman Segregation Index, we use the formula from Bojanowski & Corten (2014), in which they generalize this index to $K$ groups. The index is defined as follows.

Let $p$ be the proportion of _between_-group ties in the graph. This corresponds to the off-diagonal entries of the contact layer of the $M$ matrix (the diagonal contains the _within_-group ties).

$$p = \frac{\sum_{g,h:g\neq h}M_{gh1}}{\sum_{g=1}^K\sum_{h=1}^K M_{gh1}}$$

Now we define the expected proportion of between-group ties in a random graph. In the generalized case of $K$ groups, it is:

$$\pi = \frac{\left( \sum_{k=1}^K \eta_k\right)^2 - \sum_{k=1}^K \eta_k^2}{N(N-1)}$$

Finally, Freeman Segregation Index is defined as:

$$S_{Freeman} = 1 -\frac{p}{\pi} = 1- \frac{pN(N-1)}{\left( \sum_{k=1}^K \eta_k\right)^2 - \sum_{k=1}^K \eta_k^2}$$

This index applies to the **unweighted case**.
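A minimal sketch of the $K$-group formula, assuming an unweighted adjacency matrix and a type vector as inputs. The function name `freeman_global` is hypothetical; the cell below uses `Fr.Freeman_Global`, whose internals are not shown in this notebook.

```python
import numpy as np

def freeman_global(X, t):
    """Freeman index for K groups: S = 1 - p / pi on an unweighted graph."""
    X = np.asarray(X)
    t = np.asarray(t)
    src, dst = np.nonzero(X)                  # one entry per tie
    p = np.mean(t[src] != t[dst])             # share of between-group ties
    _, eta = np.unique(t, return_counts=True) # group sizes eta_k
    N = t.size
    pi = (eta.sum() ** 2 - np.sum(eta ** 2)) / (N * (N - 1))
    return 1 - p / pi

# Toy directed graph with three groups of sizes 2, 2 and 1.
edges = [(0, 1), (1, 0), (2, 3), (3, 2), (1, 2), (4, 0)]
X = np.zeros((5, 5), dtype=int)
for i, j in edges:
    X[i, j] = 1
t = [1, 1, 2, 2, 3]

S = freeman_global(X, t)
print(S)
```

In this example two of the six ties cross groups, so $p = 1/3$, while $\pi = (25 - 9)/20 = 0.8$, giving $S = 7/12$.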

In [6]:
# Storage in DataFrame
def process_file(file):
    results = []
    g = gt.load_graph(file)
    graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
    seg = Fr.Freeman_Global(g,property_label = 'Political Label')
    results.append((graph_name, seg))
    return results

# Run processing in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(process_file, file) for file in files]
    
    # Process results as they complete
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(files)):
        try:
            results = future.result()
            # Update group_segregation DataFrame with results
            for key, value in results:
                global_segregation.loc[key, 'Freeman Global'] = value
        except Exception as e:
            print(f'Generated an exception: {e}')

100%|██████████| 61/61 [00:59<00:00,  1.02it/s]


---
### Freeman Segregation Index for a specific group

The Freeman Segregation Index was originally designed to measure segregation between two groups. This function computes the index between one group and all the others using the _Basic Freeman Segregation Index_ formula, which gives a measure of how segregated one group is from all the others. In this case the contact layer considers only two groups: the group $g$ for which the index is calculated, and the group $-g$, which contains all the nodes that do not belong to $g$. Recall that our _Contact Matrix_ looks like this:

$$
M_{gh1} = 
\begin{bmatrix}
    M_{1,1,1} & M_{1,2,1} & \dots & M_{1,k,1} \\
    M_{2,1,1} & M_{2,2,1} & \dots & M_{2,k,1} \\
    \vdots & \vdots & \ddots & \vdots \\
    M_{k,1,1} & M_{k,2,1} & \dots & M_{k,k,1} \\
\end{bmatrix}
$$

For our calculation, we build another _Contact Matrix_ called "Me vs Others" and denoted $M^*$. It is a $2\times 2$ matrix, similar to the original _Contact Matrix_ but with only two groups, $g$ and $-g$. This matrix is defined as follows:

$$
M^* = 
\begin{bmatrix}
    M^*_{gg} & M^*_{g-g} \\
    M^*_{-gg} & M^*_{-g-g} \\
\end{bmatrix}
$$

Where:
- $M^*_{gg} = M_{gg1}$
- $M^*_{g-g} = \sum_{h = 1}^k M_{gh1} - M_{gg1}$
- $M^*_{-gg} = \sum_{h = 1}^k M_{hg1} - M_{gg1}$
- $M^*_{-g-g} = \sum_{a}\sum_{b} \hat{M}_{ab}$

To calculate $M^*_{-g-g}$ we remove from the original _Contact Matrix_ the row and the column of group $g$; the result is denoted $\hat{M}$. This is the contact matrix that would be observed if group $g$ did not exist. With it we can count all the ties among nodes that are not in $g$, which is the sum of all the entries of $\hat{M}$. Formally,
$$
\hat{M} = 
    \begin{bmatrix}
    a_{1,1} & \dots & a_{1,g-1} & a_{1,g+1} & \dots & a_{1,k} \\
    a_{2,1} & \dots & a_{2,g-1} & a_{2,g+1} & \dots & a_{2,k} \\
    \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
    a_{g-1,1} & \dots & a_{g-1,g-1} & a_{g-1,g+1} & \dots & a_{g-1,k} \\
    a_{g+1,1} & \dots & a_{g+1,g-1} & a_{g+1,g+1} & \dots & a_{g+1,k} \\
    \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
    a_{k,1} & \dots & a_{k,g-1} & a_{k,g+1} & \dots & a_{k,k} \\
\end{bmatrix}
$$

Now, for the Freeman Formula we compute both $P$ and $\pi$ and calculate $1-\frac{P}{\pi}$

$$P = \frac{M^*_{g-g}}{M^*_{++}}$$

$$\pi = \frac{2|G_g|*|G_{-g}|}{N(N-1)}$$

$$S_{Freeman}^g = 1- \frac{N(N-1)M^*_{g-g}}{2M^*_{++}|G_g||G_{-g}|}$$
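The collapse of the $K\times K$ contact matrix into the $2\times 2$ "one vs others" matrix can be sketched as follows. This is an illustrative sketch under the formulas of this section, with hypothetical names (`one_vs_others`, `freeman_group`) and a made-up contact matrix; it is not the code of `Fr.Freeman_Groups`.

```python
import numpy as np

def one_vs_others(M, g):
    """Collapse a K x K contact matrix into the 2 x 2 matrix for g and -g."""
    M = np.asarray(M, dtype=float)
    # M_hat: contact matrix with the row and column of group g removed.
    M_hat = np.delete(np.delete(M, g, axis=0), g, axis=1)
    return np.array([
        [M[g, g],                    M[g, :].sum() - M[g, g]],
        [M[:, g].sum() - M[g, g],    M_hat.sum()],
    ])

def freeman_group(M, sizes, g):
    """S^g = 1 - P / pi, with P and pi as defined in this section."""
    sizes = np.asarray(sizes)
    M_star = one_vs_others(M, g)
    P = M_star[0, 1] / M_star.sum()
    N = sizes.sum()
    n_g = sizes[g]
    pi = 2 * n_g * (N - n_g) / (N * (N - 1))
    return 1 - P / pi

# Toy contact matrix for three groups of sizes 3, 2 and 2 (g is a 0-based index).
M = [[4, 1, 1],
     [2, 3, 0],
     [0, 1, 2]]
sizes = [3, 2, 2]

M_star = one_vs_others(M, g=0)
S_g = freeman_group(M, sizes, g=0)
print(M_star)
print(S_g)
```

Collapsing preserves the grand total ($M^*_{++} = M_{++1} = 14$ here), and the example yields $P = 1/7$, $\pi = 4/7$ and $S^g = 0.75$.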

In [6]:
def process_file(file, categories):
    results = []
    for pol in categories:
        g = gt.load_graph(file)
        graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
        seg = Fr.Freeman_Groups(g, 'Political Label', pol)
        results.append(((graph_name, pol), seg))
    return results

# Run processing in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(process_file, file, categories) for file in files]
    
    # Process results as they complete
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(files)):
        try:
            results = future.result()
            # Update group_segregation DataFrame with results
            for key, value in results:
                group_segregation.loc[key, 'Freeman One vs Others'] = value
        except Exception as e:
            print(f'Generated an exception: {e}')

100%|██████████| 61/61 [04:01<00:00,  3.96s/it]


---
Given a weighted graph in which the nonnegative weights of each individual's outgoing links sum to $1$, we can define the proximity of individual $j$ to group $k$ as:

$$Prox_{j\to k}=\frac{W_{jk}}{(T_k/ \sum_{m\in G} T_m)}$$

where $W_{jk}$ is the sum of all the weights that $j$ puts on members of group $k$ and, for each group $m$, $T_m$ denotes the total number of original tweets by members of group $m$ (tweets made in the time period, i.e. the day, in question). $G$ denotes the set of groups in the population. The denominator therefore captures the fraction of $j$'s outgoing mass (which equals 1) that would have gone to group $k$ if agent $j$ had simply distributed it uniformly at random among the $\sum_{m\in G} T_m$ original tweets written that day.
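The proximity of one individual to one group can be sketched directly from its definition. The inputs here are assumptions for illustration: `X` is a weighted adjacency matrix whose rows sum to 1, `t` is the type vector, and `tweets[i]` is the number of original tweets by node `i`. The function name is hypothetical and does not reproduce `Seg.individual_attention_to_h` or `Seg.at_random_scenario`.

```python
import numpy as np

def individual_proximity(X, t, tweets, j, k):
    """Prox_{j->k}: weight j puts on group k, relative to group k's share of tweets."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t)
    tweets = np.asarray(tweets, dtype=float)
    in_k = t == k
    W_jk = X[j, in_k].sum()                    # observed weight from j to group k
    baseline = tweets[in_k].sum() / tweets.sum()  # T_k / sum_m T_m
    return W_jk / baseline

# Toy data: node 0 splits its weight evenly between node 1 (group 1) and node 2 (group 2).
X = np.array([[0.0, 0.5, 0.5, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
t = [1, 1, 2, 2]
tweets = [1, 3, 2, 4]

prox_to_1 = individual_proximity(X, t, tweets, j=0, k=1)
prox_to_2 = individual_proximity(X, t, tweets, j=0, k=2)
print(prox_to_1, prox_to_2)
```

Values above 1 mean that $j$ gives group $k$ more attention than its share of tweets would predict (here $0.5 / 0.4 = 1.25$ for group 1), and values below 1 mean less ($0.5 / 0.6 \approx 0.83$ for group 2).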

In [8]:
for file in tqdm(files):
    g = gt.load_graph(file)    
    den_dict = {cat: Seg.at_random_scenario(g, 'Political Label', cat, 'Proximity to Group') for cat in categories}

    def process_individual_segregation(params,den_dict):
        i, cat = params
        num = Seg.individual_attention_to_h(g, i, 'Political Label', cat)
        date = g.gp['Starting Date']
        den = den_dict[cat]
        seg = num/den
        return (i, cat), seg, date
    def main():
        params = [(i, cat) for i in range(36964) for cat in categories]

        # Wrap the function call to include den_dict
        individual_segregation_process = partial(process_individual_segregation, den_dict=den_dict)
        # Use ProcessPoolExecutor to parallelize the computation
        with concurrent.futures.ProcessPoolExecutor() as executor:
            # Map the function over the parameters and wrap with tqdm for progress bar
            results = executor.map(individual_segregation_process, params)

        # Populate the DataFrame with results
        for row_index, result, date in results:
            individual_segregation.loc[row_index, f'Proximity index on {date}'] = result

    if __name__ == '__main__':
        main()

100%|██████████| 61/61 [36:23<00:00, 35.79s/it] 


---
### Index of attention that group $g$ devotes to others ($-g$)

Following the same philosophy as the Freeman index, here we calculate the ratio between the proportion of cross-group ties in the graph and the proportion expected in the random scenario. In this case we take into account the weights and the directionality of the graph. To that end, consider a _Contact Layer_ in which the entry $M^*_{g-g}$ is the sum of all the weights that nodes from group $g$ devote to any other group $-g$. We then define $P$ as:

$$P = \frac{M^*_{g-g}}{M^*_{++}}$$

Recalling the construction of the weights, the weights leaving a node sum to one (every row of the weighted adjacency matrix sums to one).

For the expected value of $P$, which we call $\pi$, the cross-group weight is the weight that group $g$ would devote to the others $-g$ at random. An edge is created between two nodes $i$ and $j$ if $i$ retweeted $j$, so the expected weight that $i$ devotes to another user $j$ depends on the number of original tweets that $j$ made and that $i$ could retweet. Without loss of generality, the expected share of weight from $g$ to $-g$ is the total number of original tweets made by nodes in $-g$ divided by the total number of tweets made that day.

We define $T_i$ as the number of original tweets made by $i$, and the number of tweets made by group $g$ as
$$T^g = \sum_{i\in G_g} T_i$$

Consequently, the number of tweets made by groups other than $g$ is
$$T^{-g} = \sum_{i\notin G_g} T_i$$

Writing $T^{+} = T^g + T^{-g}$ for the total number of tweets, the expected proportion is
$$\pi = \frac{T^{-g}}{T^{+}}$$
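The ratio of observed to expected cross-group attention can be sketched from the two formulas of this section. This is only an illustration that follows the formulas literally; the function name and toy inputs are assumptions, and the `Seg.attention_g_others` and `Seg.at_random_scenario` helpers used in the next cell may normalize differently.

```python
import numpy as np

def attention_to_others(X, t, tweets, g):
    """Return (P, pi, P / pi) for the attention group g devotes to the others."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t)
    tweets = np.asarray(tweets, dtype=float)
    in_g = t == g
    cross_weight = X[np.ix_(in_g, ~in_g)].sum()   # weight sent from g to -g
    P = cross_weight / X.sum()                     # share of all weight in the graph
    pi = tweets[~in_g].sum() / tweets.sum()        # T^{-g} / T^{+}
    return P, pi, P / pi

# Toy weighted graph: nodes 0-1 in group 1, nodes 2-3 in group 2.
X = np.array([[0.0, 0.5, 0.5, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.25, 0.75, 0.0]])
t = [1, 1, 2, 2]
tweets = [2, 2, 3, 3]

P, pi, ratio = attention_to_others(X, t, tweets, g=1)
print(P, pi, ratio)
```

In this example group 1 sends $0.5$ of the total weight of $4$ to the other group, so $P = 0.125$, while the others wrote $6$ of the $10$ tweets, so $\pi = 0.6$.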

In [8]:
def process_file(file, categories):
    results = []
    for pol in categories:
        g = gt.load_graph(file)
        graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
        num = Seg.attention_g_others(g, 'Political Label', 'Normal Weight', pol)
        den = Seg.at_random_scenario(g, 'Political Label', pol, 'Proximity to Others')
        seg = num/den
        results.append(((graph_name, pol), seg))
    return results

# Run processing in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(process_file, file, categories) for file in files]
    
    # Process results as they complete
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(files)):
        try:
            results = future.result()
            # Update group_segregation DataFrame with results
            for key, value in results:
                group_segregation.loc[key, 'Attention to Others'] = value
        except Exception as e:
            print(f'Generated an exception: {e}')

100%|██████████| 61/61 [01:13<00:00,  1.20s/it]


In [10]:
def process_file(file, categories):
    results = []
    for pol in categories:
        g = gt.load_graph(file)
        graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
        num = Seg.attention_g_others(g, 'Political Label', 'Normal Weight', pol, in_attention=False)
        den = Seg.at_random_scenario(g, 'Political Label', pol, 'Proximity to Others')
        seg = num/den
        results.append(((graph_name, pol), seg))
    return results

# Run processing in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(process_file, file, categories) for file in files]
    
    # Process results as they complete
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(files)):
        try:
            results = future.result()
            # Update group_segregation DataFrame with results
            for key, value in results:
                group_segregation.loc[key, "Other's Attention"] = value
        except Exception as e:
            print(f'Generated an exception: {e}')

100%|██████████| 61/61 [01:19<00:00,  1.30s/it]


---
### Index of attention that group $h$ devotes to a specific group ($k$)

$$Prox_{h\to k}=\frac{(W_{hk}/A_h)}{(T_k/ \sum_{m\in G} T_m)}$$

where $W_{hk}$ is the sum of all the weights that members of group $h$ put on members of group $k$, and $A_h$ is the number of retweet-active members of group $h$ (on the day in question). That is, the number of members of group $h$ who retweeted at least one original tweet by some member of the whole community on that day or, equivalently, the number of members of group $h$ whose rows in the adjacency matrix (of that day) sum to 1.
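A minimal sketch of the group-to-group proximity, in which retweet-active members are detected as rows of the weighted adjacency matrix that sum to 1. The function name and the toy inputs are assumptions; this is not the code of `Seg.attention_g_h`.

```python
import numpy as np

def group_proximity(X, t, tweets, h, k):
    """Prox_{h->k} = (W_hk / A_h) / (T_k / sum_m T_m)."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t)
    tweets = np.asarray(tweets, dtype=float)
    in_h, in_k = t == h, t == k
    W_hk = X[np.ix_(in_h, in_k)].sum()            # weight from group h to group k
    active = np.isclose(X.sum(axis=1), 1.0)       # rows that sum to 1
    A_h = np.sum(in_h & active)                   # retweet-active members of h
    if A_h == 0:
        return float("nan")                       # undefined when nobody in h retweeted
    baseline = tweets[in_k].sum() / tweets.sum()  # T_k / sum_m T_m
    return (W_hk / A_h) / baseline

# Toy data: node 1 (group 1) is not retweet-active, so A_1 = 1.
X = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.25, 0.75, 0.0]])
t = [1, 1, 2, 2]
tweets = [2, 2, 3, 3]

prox_1_to_2 = group_proximity(X, t, tweets, h=1, k=2)
prox_2_to_2 = group_proximity(X, t, tweets, h=2, k=2)
print(prox_1_to_2, prox_2_to_2)
```

Dividing by $A_h$ rather than by $|G_h|$ keeps inactive users from diluting the average attention of the group.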

In [11]:
def process_file(file, categories):
    results = []
    for pol_in in categories:
        for pol_out in categories:
            g = gt.load_graph(file)
            graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
            seg = Seg.attention_g_h(g, 'Political Label', 'Normal Weight', pol_in, pol_out)
            results.append(((graph_name, pol_out), pol_in, seg))
    return results

# Run processing in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(process_file, file, categories) for file in files]
    
    # Process results as they complete
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(files)):
        try:
            results = future.result()
            # Update group_segregation DataFrame with results
            for key, var, value in results:
                group_segregation.loc[key, f'Attention From {var} To'] = value
        except Exception as e:
            print(f'Generated an exception: {e}')

100%|██████████| 61/61 [04:54<00:00,  4.84s/it]


In [12]:
def process_file(file, categories):
    results = []
    for pol1 in categories:
        for pol2 in categories:
            g = gt.load_graph(file)
            graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
            seg = Seg.attention_g_h(g, 'Political Label', 'Normal Weight', pol1, pol2, in_attention=False)
            results.append(((graph_name, pol2), pol1, seg))
    return results

# Run processing in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(process_file, file, categories) for file in files]
    
    # Process results as they complete
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(files)):
        try:
            results = future.result()
            # Update group_segregation DataFrame with results
            for key, var, value in results:
                group_segregation.loc[key, f'Attention {var} Took From'] = value
        except Exception as e:
            print(f'Generated an exception: {e}')

100%|██████████| 61/61 [04:48<00:00,  4.72s/it]


---
## 3. Assortativity

- **Assortativity:** is a preference for a network's nodes to attach to others that are similar in some way. Though the specific measure of similarity may vary, network theorists often examine assortativity in terms of a node's degree.

    The **assortativity coefficient** is the Pearson correlation coefficient of degree between pairs of linked nodes. Positive values of `r` indicate a correlation between nodes of similar degree, while negative values indicate relationships between nodes of different degree. In general, `r` lies between `−1` and `1`. When `r = 1`, the network is said to have perfect assortative mixing patterns, when `r = 0` the network is non-assortative, while at `r = −1` the network is completely disassortative.

    The *assortativity coefficient* is given by 

    $$
    r = \frac{\sum_{jk}{jk (e_{jk} - q_j q_k)}}{\sigma_{q}^{2}}
    $$

    In this equation:

    - $ \sum_{jk} $ denotes the summation over all degrees $ j $ and $ k $ in the network.
    - $ jk $ represents the product of degrees $ j $ and $ k $.
    - $ e_{jk} $ is the joint probability distribution of the remaining degrees of two connected vertices. In an undirected graph, this is symmetric and must satisfy the sum rules:
        - $ \sum_{jk}{e_{jk}} = 1 $, ensuring that the total probability is 1.
        - $ \sum_{j}{e_{jk}} = q_{k} $, linking it to the distribution of the remaining degree.
    - $ q_j $ and $ q_k $ are the distributions of the remaining degree for vertices of degrees $ j $ and $ k $, respectively. 
    - $ \sigma_{q}^{2} $ is the variance of the distribution of the remaining degree.

    The term $ q_{k} $ represents the distribution of the *remaining degree*, which captures the number of edges leaving a node, excluding the edge that connects the pair in question. This distribution is derived from the degree distribution $ p_{k} $ as follows:

    $$
    q_{k} = \frac{(k+1)p_{k+1}}{\sum_{j \geq 1} j p_j}
    $$

    - Here, $ p_{k} $ is the degree distribution of the network, and $ p_{k+1} $ refers to the probability of a node having $ k+1 $ connections.


- **Categorical Assortativity (assortativity by attribute):** is a measure used to determine how often nodes with a certain categorical attribute, like color or type, connect to other nodes with the same attribute. It is given by:

    $$
    r = \frac{\sum_{i}{e_{ii}} - \sum_{i}{q_i^2}}{1 - \sum_{i}{q_i^2}}
    $$

    Where:

    - $ e_{ij} $ is the proportion of edges in the network that connect nodes of type $ i $ to nodes of type $ j $.
    - $ q_i $ is the proportion of ends of edges that are attached to nodes of type $ i $, that is, the probability of finding a node of type $ i $ at the end of a randomly chosen edge. In a directed graph, $ q_i^2 $ is replaced by $ a_i b_i $, with $ a_i = \sum_j e_{ij} $ and $ b_i = \sum_j e_{ji} $.

    In this context:

    - A positive value of $ r $ indicates assortative mixing, where nodes tend to connect to others that are similar.
    - A negative value of $ r $ indicates disassortative mixing, where nodes tend to connect to others that are different.
    - A value of $ r $ close to 0 suggests no particular preference for nodes to connect to others based on the categorical attribute.
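Categorical assortativity can be computed from a contact (mixing) matrix alone. The sketch below uses Newman's standard definition with row and column marginals $a_i$ and $b_i$, which coincide with $q_i$ in an undirected graph. It is an independent NumPy illustration with a hypothetical function name, not the `gt.assortativity` routine called in the next cell.

```python
import numpy as np

def attribute_assortativity(M):
    """Assortativity coefficient from a K x K matrix of tie counts or weights."""
    e = np.asarray(M, dtype=float)
    e = e / e.sum()                 # e_ij: share of ties from type i to type j
    a = e.sum(axis=1)               # share of ties starting at type i
    b = e.sum(axis=0)               # share of ties ending at type i
    expected = np.sum(a * b)        # within-type share expected at random
    return (np.trace(e) - expected) / (1 - expected)

r_mixed = attribute_assortativity([[4, 1], [1, 4]])    # mostly within-group ties
r_perfect = attribute_assortativity([[5, 0], [0, 5]])  # only within-group ties
print(r_mixed, r_perfect)
```

With 80% of ties inside groups and 50% expected at random, the first matrix gives $r = 0.6$; a purely block-diagonal matrix gives $r = 1$.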

In [15]:
# Storage in DataFrame
for file in tqdm(files):
    g = gt.load_graph(file)
    graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
    for pol in categories:
        # Unweighted assortativity
        seg_unweighted = gt.assortativity(g, g.vp[pol])
        group_segregation.loc[(graph_name, pol), 'Non Weighted Assortativity'] = seg_unweighted[0]

        # Weighted by the 'Normal Weight' edge property
        seg_normal_w = gt.assortativity(g, g.vp[pol], eweight=g.ep['Normal Weight'])
        group_segregation.loc[(graph_name, pol), 'Normal Weighted Assortativity'] = seg_normal_w[0]

        # Weighted by the 'Number of rts' edge property
        seg_rts_w = gt.assortativity(g, g.vp[pol], eweight=g.ep['Number of rts'])
        group_segregation.loc[(graph_name, pol), 'Weighted Assortativity'] = seg_rts_w[0]
        
    #Global
    seg = gt.assortativity(g, g.vp['Political Label'], eweight=g.ep['Normal Weight'])
    global_segregation.loc[(graph_name), 'Normal Weighted Assortativity'] = seg[0]
    
    seg = gt.assortativity(g, g.vp['Political Label'], eweight=g.ep['Number of rts'])
    global_segregation.loc[(graph_name), 'Weighted Assortativity'] = seg[0]
    
    seg = gt.assortativity(g, g.vp['Political Label'])
    global_segregation.loc[(graph_name), 'Non Weighted Assortativity'] = seg[0]

100%|██████████| 61/61 [02:39<00:00,  2.62s/it]


---
## Homophily Index

Homophily refers to the tendency of individuals (or nodes in a network) to associate and bond with similar others. The similarity can be based on various attributes such as social characteristics, behaviors, or beliefs. In the context of a network, this implies that nodes are more likely to form connections with other nodes that belong to the same group or share similar attributes.

Measuring homophily. We begin with some simple definitions that are important in measuring homophily and also in presenting the model.

Let $ N_i $ denote the number of type $ i $ individuals in the population, and let $ w_i = \frac{N_i}{N} $ be the relative fraction of type $ i $ in the population, where $ N = \sum_k N_k $.

Let $ s_i $ denote the average number of friendships that agents of type $ i $ have with agents who are of the same type, and let $ d_i $ be the average number of friendships that type $ i $ agents form with agents of types different from $ i $. Let $ t_i = s_i + d_i $ be the average total number of friendships that type $ i $ agents form.

The homophily index $ H_i $ measures the fraction of the ties of individuals of type $ i $ that are with that same type.

**Definition 1** The homophily index $ H_i $ is defined by

$$ H_i = \frac{s_i}{s_i + d_i} $$

The profile $ (s, d) $ exhibits *baseline homophily* for type $ i $ if $ H_i = w_i $.

The profile $ (s, d) $ exhibits *inbreeding homophily* for type $ i $ if $ H_i > w_i $.

Generally, there is a difficulty in simply measuring homophily according to $ H_i $. For example, consider a group that comprises 95% of a population. Suppose that its same-type friendships are 96% of its friendships. Compare this to a group that comprises 5% of a population and has 96% of its friendships being same-type. Although both have the same homophily index, they are very different in terms of how homophilous they are relative to how homophilous they could be. Comparing the homophily index, $ H_i $, to the baseline, $ w_i $, provides some information, but even that does not fully capture the idea of how biased a group is compared to how biased it could potentially be. To take care of this we use the inbreeding homophily index introduced by Coleman [Coleman J. (1958) *Human Organization* 17:28–36] that normalizes the homophily index by the potential extent to which a group could be biased.

**Definition 2** Coleman's inbreeding homophily index of type $i$ is

$$IH_i = \frac{H_i - w_i}{1 - w_i}$$

This index measures the amount of bias with respect to baseline homophily as it relates to the maximum possible bias (the term $ 1 - w_i $). It can be easily checked that we have inbreeding homophily for type $ i $ if and only if $ IH_i > 0 $, and inbreeding heterophily for type $ i $ if and only if $ IH_i < 0 $. The index of inbreeding homophily is 0 if there is pure baseline homophily, and 1 if a group completely inbreeds.
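Both definitions can be evaluated from a contact matrix and the group sizes. The sketch below assumes that ties are counted from the perspective of the row group (its outgoing ties), so that $H_i$ is the diagonal entry divided by the row total; the function name and the numbers are illustrative, and `Seg.homophily_index` may count ties differently.

```python
import numpy as np

def homophily_indexes(M, sizes):
    """Return (H, w, IH): homophily, population share and Coleman's index per group."""
    M = np.asarray(M, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    H = np.diag(M) / M.sum(axis=1)   # s_i / (s_i + d_i): same-type share of ties
    w = sizes / sizes.sum()          # w_i = N_i / N
    IH = (H - w) / (1 - w)           # bias relative to the maximum possible bias
    return H, w, IH

# Toy contact matrix for two groups with 6 and 4 members.
M = [[8, 2],
     [3, 3]]
sizes = [6, 4]

H, w, IH = homophily_indexes(M, sizes)
print(H, w, IH)
```

In this example both groups show inbreeding homophily ($IH_i > 0$), but the larger group is much closer to its maximum possible bias ($0.5$ versus about $0.17$).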

In [16]:
def process_file(file, categories):
    results = []
    g = gt.load_graph(file)
    graph_name = file.split('/')[-1].split('.')[0].split('_')[-1]
    Homiphily_dict = Seg.homophily_index(graph = g, property_name = "Political Label")
    H = Homiphily_dict ['H_i']
    IH = Homiphily_dict ['IH_i']
    for pol in categories:
        results.append(((graph_name, pol), H[pol], IH[pol]))
    return results

# Run processing in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(process_file, file, categories) for file in files]
    
    # Process results as they complete
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(files)):
        try:
            results = future.result()
            # Update group_segregation DataFrame with results
            for key, H_value, IH_value in results:
                group_segregation.loc[key, 'Homiphily Index'] = H_value
                group_segregation.loc[key, 'Inbreeding Homiphily Index'] = IH_value
        except Exception as e:
            print(f'Generated an exception: {e}')

100%|██████████| 61/61 [00:18<00:00,  3.24it/s]


---
## Spectral Segregation Index

Explanation pending

In [None]:
# TODO: code pending

---
## Outputs

In [17]:
group_segregation.head()

| Date | Political Label | Classic Freeman | Attention to Others | Other's Attention | Attention From Izquierda To | Attention From Derecha To | Attention From Centro To | Attention From Sin Clasificar To | Attention Izquierda Took From | Attention Derecha Took From | Attention Centro Took From | Attention Sin Clasificar Took From | Non Weighted Assortativity | Normal Weighted Assortativity | Weighted Assortativity | Homiphily Index | Inbreeding Homiphily Index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-04-28 | Centro | 0.231295 | 0.637107 | 1.141433 | 1.585731 | 0.84438 | 5.189496 | 1.083448 | 0.560056 | 0.381721 | 5.189496 | 6.399353 | 0.157806 | 0.250145 | 0.961728 | 0.401065 | 0.337611 |
| 2021-04-28 | Derecha | 0.797677 | 0.338138 | 0.108538 | 0.065723 | 4.097602 | 0.356637 | 0.562548 | 0.053359 | 4.097602 | 0.788893 | 5.013155 | 0.794928 | 0.752882 | 1.042385 | 0.748014 | 0.691135 |
| 2021-04-28 | Izquierda | 0.573422 | 0.464827 | 0.134231 | 1.578295 | 0.065587 | 0.643164 | 0.537307 | 1.578295 | 0.080784 | 1.821039 | 3.545422 | 0.558983 | 0.571562 | 1.100028 | 0.762949 | 0.367111 |
| 2021-04-28 | Sin Clasificar | 0.291349 | 0.60246 | 4.205248 | 0.392408 | 0.682009 | 0.813383 | 2.110595 | 0.059469 | 0.076531 | 0.137711 | 2.110595 | 0.033214 | 0.187617 | 0.951325 | 0.520936 | 0.470878 |
| 2021-04-29 | Centro | 0.207262 | 0.603253 | 1.211583 | 1.710093 | 0.936834 | 5.756704 | 1.430013 | 0.566692 | 0.349552 | 5.756704 | 6.044028 | 0.150752 | 0.260061 | 1.0038 | 0.413519 | 0.351384 |


In [18]:
global_segregation.head()

| Date | Freeman Global | Normal Weighted Assortativity | Weighted Assortativity | Non Weighted Assortativity |
|---|---|---|---|---|
| 2021-04-28 | 0.537315 | 0.5016 | 1.182048 | 0.501394 |
| 2021-04-29 | 0.54888 | 0.508322 | 0.997113 | 0.48994 |
| 2021-04-30 | 0.561385 | 0.511913 | 0.947548 | 0.484228 |
| 2021-05-01 | 0.55082 | 0.511692 | -1.844281 | 0.46701 |
| 2021-05-02 | 0.524167 | 0.493515 | 0.792956 | 0.449811 |


In [13]:
individual_segregation.sort_index(axis=1, inplace=True)
individual_segregation.head()

| Node | Political Label | Proximity index on 2021-04-28 | Proximity index on 2021-04-29 | Proximity index on 2021-04-30 | Proximity index on 2021-05-01 | Proximity index on 2021-05-02 | Proximity index on 2021-05-03 | Proximity index on 2021-05-04 | Proximity index on 2021-05-05 | Proximity index on 2021-05-06 | Proximity index on 2021-05-07 | ... | Proximity index on 2021-06-18 | Proximity index on 2021-06-19 | Proximity index on 2021-06-20 | Proximity index on 2021-06-21 | Proximity index on 2021-06-22 | Proximity index on 2021-06-23 | Proximity index on 2021-06-24 | Proximity index on 2021-06-25 | Proximity index on 2021-06-26 | Proximity index on 2021-06-27 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Centro | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 0 | Derecha | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 0 | Izquierda | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 0 | Sin Clasificar | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Centro | 0.0 | 6.494631 | 2.176593 | 0.903936 | 1.468899 | 2.237959 | 3.303213 | 3.411673 | 3.78605 | 2.412731 | ... | 3.320311 | 2.011789 | 1.813559 | 2.807586 | 5.121759 | 5.15352 | 3.979674 | 2.349102 | 0.0 | 0.0 |


In [14]:
# Run to save
#group_segregation.to_pickle('/mnt/disk2/Data/Pickle/group_segregation.pkl')
#global_segregation.to_pickle('/mnt/disk2/Data/Pickle/global_segregation.pkl')
individual_segregation.to_pickle('/mnt/disk2/Data/Pickle/individual_segregation.pkl')

In [7]:
group_segregation = pd.read_pickle('/mnt/disk2/Data/Pickle/group_segregation.pkl')
global_segregation = pd.read_pickle('/mnt/disk2/Data/Pickle/global_segregation.pkl')