<a href="https://colab.research.google.com/github/moshe-hadad/knesset-network/blob/main/knessset_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="text-align: center;">
    <img src="https://en.idi.org.il/media/9162/knesset.jpg" alt="Knesset" />
    <h1>Knesset Network Project - Finding the rebel</h1>
</div>

# Introduction
As part of the Social Network Analysis course at the University of Haifa, we are undertaking the Knesset Network Project. <br>
This project aims to reveal connections between members of the Knesset that are not immediately apparent. Using graph algorithms, we analyze the relationships and interactions between members to uncover patterns in their social network. Through this analysis, we hope to gain a deeper understanding of the political landscape and the dynamics within the Knesset. <br>

### The first step is to install external libraries for supporting visualizations




In [2]:
import pandas as pd
from tqdm import tqdm
# Run this once, on the first run, to install pyvis for the graph display
!pip install pyvis



### Next, we set up constants for the links to the data sets on GitHub


In [1]:
DATA_BASE_URL = 'https://raw.githubusercontent.com/moshe-hadad/knesset-network/main/data'

FACTION_URL = f'{DATA_BASE_URL}/factions.csv'
FACTION_PEOPLE_URL = f'{DATA_BASE_URL}/faction_people.csv'
PERSONS_URL = f'{DATA_BASE_URL}/KNS_Person_data.csv'
BILLS_INITIATORS_URL = f'{DATA_BASE_URL}/KNS_BillInitiator.csv'
BILLS_URL = f'{DATA_BASE_URL}/KNS_Bill.csv'
CHUNK_SIZE = 10000

KNESSET_NUM = 24

In [None]:
# Define a function that loads a remote data set in chunks, with a progress bar
def remote_loader(url: str, chunk_size: int = CHUNK_SIZE) -> pd.DataFrame:
    # Initialize the tqdm progress bar (the total number of lines is unknown)
    progress_bar = tqdm(unit="lines")
    # Read the CSV lazily, chunk_size rows at a time
    chunk_iterator = pd.read_csv(url, chunksize=chunk_size)

    # Collect the chunks while updating the progress bar
    chunks = []
    for chunk in chunk_iterator:
        chunks.append(chunk)
        progress_bar.update(len(chunk))

    # Close the progress bar
    progress_bar.close()

    # Concatenate once at the end; fall back to an empty DataFrame if no chunk was read
    return pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()
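
The chunked-reading pattern can be tried offline before pointing it at the GitHub URLs. The sketch below is only an illustration: it uses a hypothetical in-memory CSV (`csv_text`, not project data) and a chunk size of 10, and it collects the chunks in a list before concatenating them once.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a remote file (not project data)
csv_text = "BillID,KnessetNum\n" + "\n".join(f"{i},24" for i in range(25))

# Read 10 rows at a time and collect the chunks in a list
chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=10):
    chunks.append(chunk)

# Concatenate once at the end and rebuild a continuous index
loaded = pd.concat(chunks, ignore_index=True)
print(len(chunks), loaded.shape)
```

With 25 rows and a chunk size of 10, the reader yields three chunks (10, 10 and 5 rows).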

# Merging Data Sets: Bills and Bill Initiators

In this section, we will merge two datasets: **Bills** and **Bill_Initiators**. The **Bills** dataset contains information about bills issued since the beginning of the Knesset, including the unique `Bill ID` and the `Knesset Number`. The **Bill_Initiators** dataset contains information about the Knesset members who initiated bills, with `Bill ID` serving as a reference to the corresponding bill.

Our goal is to:
1. Filter the **Bills** dataset to include only bills issued during a specific Knesset number.
2. Merge the filtered **Bills** dataset with the **Bill_Initiators** dataset.

---

## Example Datasets

### Bills Dataset
| Knesset Number | Bill ID | Bill Title       |
|----------------|---------|------------------|
| 20             | 101     | Education Reform |
| 20             | 102     | Health Act       |
| 21             | 201     | Tax Reduction    |
| 21             | 202     | Environmental Law|

### Bill_Initiators Dataset
| Bill ID | Initiator Name    |
|---------|-------------------|
| 101     | John Doe          |
| 102     | Jane Smith        |
| 201     | Michael Johnson   |
| 202     | Emily Davis       |

---

## Steps to Achieve the Goal

### Step 1: Filter Bills by Knesset Number
We will filter the **Bills** dataset to include only bills from a specific Knesset number. For example, if we are interested in bills from Knesset number `20`, we will extract all rows where `Knesset Number` equals `20`.

Filtered Bills Dataset for Knesset Number `20`:
| Knesset Number | Bill ID | Bill Title       |
|----------------|---------|------------------|
| 20             | 101     | Education Reform |
| 20             | 102     | Health Act       |

### Step 2: Merge Datasets
Next, we will merge the filtered **Bills** dataset with the **Bill_Initiators** dataset using the `Bill ID` column as the key. This is an inner join, meaning that only rows with a match in both datasets are kept and the rest are filtered out.

Merged Dataset:
| Knesset Number | Bill ID | Bill Title       | Initiator Name |
|----------------|---------|------------------|----------------|
| 20             | 101     | Education Reform | John Doe       |
| 20             | 102     | Health Act       | Jane Smith     |
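
The two steps can be reproduced on the toy tables above. This is a minimal sketch, not the project code: the column names (`KnessetNum`, `BillID`, `BillTitle`, `InitiatorName`) and the variable names are chosen here for illustration only.

```python
import pandas as pd

# Toy data copied from the example tables above (not the real Knesset data)
bills_example = pd.DataFrame({
    "KnessetNum": [20, 20, 21, 21],
    "BillID": [101, 102, 201, 202],
    "BillTitle": ["Education Reform", "Health Act", "Tax Reduction", "Environmental Law"],
})
initiators_example = pd.DataFrame({
    "BillID": [101, 102, 201, 202],
    "InitiatorName": ["John Doe", "Jane Smith", "Michael Johnson", "Emily Davis"],
})

# Step 1: keep only the bills of Knesset 20
bills_20 = bills_example[bills_example["KnessetNum"] == 20]

# Step 2: inner join on BillID; the initiators of bills 201 and 202 have no match and are dropped
merged_example = pd.merge(bills_20, initiators_example, on="BillID", how="inner")
print(merged_example)
```

Because the join is an inner join, the order of filtering and merging does not change the result here; filtering first simply keeps the merge smaller.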

---

## Python Code Implementation

Below is an example Python code snippet that demonstrates how to achieve this using pandas:


In [57]:
# Load data sets from the remote location
bills_initiators = remote_loader(BILLS_INITIATORS_URL)
bills = remote_loader(BILLS_URL)

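| Unnamed: 0 | FactionID | Name | KnessetNum | StartDate | EndDate | IsCurrent | NumberOfMembers | StatusID | StatusDesc | LastUpdatedDate |
|------------|-----------|------|------------|-----------|---------|-----------|-----------------|----------|------------|-----------------|
| 0 | 3 | ישראל אחת | 15 | 1999-06-07T00:00:00 | | False | | | | 2019-01-24T11:45:06.46 |
| 1 | 4 | הליכוד | 15 | 1999-06-07T00:00:00 | | False | | | | 2019-01-24T11:45:06.46 |
| 2 | 5 | ש"ס | 15 | 1999-06-07T00:00:00 | | False | | | | 2019-01-24T11:45:06.46 |
| 3 | 6 | מרצ | 15 | 1999-06-07T00:00:00 | | False | | | | 2019-01-24T11:45:06.46 |
| 4 | 7 | האיחוד הלאומי - ישראל ביתנו | 15 | 2000-02-01T00:00:00 | | False | | | | 2022-01-23T15:10:00 |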


In [None]:
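# Filter bills to contain only the bills for the specific Knesset term (number).
# KnessetNum is coerced to numeric so the comparison works whether the column was read as text or as numbers
filter_per_knesset_term_df = bills[pd.to_numeric(bills['KnessetNum'], errors='coerce') == KNESSET_NUM]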

# Merge the data sets
bills_and_members = pd.merge(bills_initiators, filter_per_knesset_term_df, on='BillID', how='inner')

# Merging Datasets: Factions and Faction People

Next, we will merge two datasets: **Factions** and **Faction People**. The **Factions** dataset contains information about Knesset parties (factions) for the entire history of the Knesset, including `Faction ID`, `Faction Name`, and `Knesset Number`. The **Faction People** dataset maps Knesset members to their respective factions during specific Knesset terms, and includes `Faction ID` and `Knesset Number`.

Our goal is to:
1. Filter both datasets for a specific Knesset number.
2. Merge the filtered datasets using `Faction ID` as the key.

---

## Example Datasets

### Factions Dataset
| Knesset Number | Faction ID | Faction Name         |
|----------------|------------|----------------------|
| 20             | F001       | Likud               |
| 20             | F002       | Yesh Atid           |
| 21             | F003       | Shas                |
| 21             | F004       | National Unity      |

### Faction People Dataset
| Knesset Number | Faction ID | Member Name         |
|----------------|------------|---------------------|
| 20             | F001       | Benjamin Netanyahu  |
| 20             | F002       | Yair Lapid          |
| 21             | F003       | Aryeh Deri          |
| 21             | F004       | Benny Gantz         |

---

## Steps to Achieve the Goal

### Step 1: Filter Data for a Specific Knesset Number
We will filter both datasets to include only rows corresponding to a specific Knesset number. For example, if we are interested in data from Knesset number `20`, we will extract rows where `Knesset Number` equals `20`.

Filtered **Factions** Dataset for Knesset Number `20`:
| Knesset Number | Faction ID | Faction Name   |
|----------------|------------|----------------|
| 20             | F001       | Likud          |
| 20             | F002       | Yesh Atid      |

Filtered **Faction People** Dataset for Knesset Number `20`:
| Knesset Number | Faction ID | Member Name         |
|----------------|------------|---------------------|
| 20             | F001       | Benjamin Netanyahu  |
| 20             | F002       | Yair Lapid          |

### Step 2: Merge Datasets
Next, we will merge the filtered **Factions** dataset with the filtered **Faction People** dataset using the `Faction ID` column as the key.

Merged Dataset:
| Knesset Number | Faction ID | Faction Name   | Member Name         |
|----------------|------------|----------------|---------------------|
| 20             | F001       | Likud          | Benjamin Netanyahu  |
| 20             | F002       | Yesh Atid      | Yair Lapid          |
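
The same steps on the toy tables above can be sketched as follows. The column and variable names are illustrative assumptions. Since both tables carry a Knesset number, the sketch drops it from one side before merging so that the result has a single, unsuffixed `KnessetNum` column.

```python
import pandas as pd

# Toy data copied from the example tables above (not the real Knesset data)
factions_example = pd.DataFrame({
    "KnessetNum": [20, 20, 21, 21],
    "FactionID": ["F001", "F002", "F003", "F004"],
    "FactionName": ["Likud", "Yesh Atid", "Shas", "National Unity"],
})
faction_people_example = pd.DataFrame({
    "KnessetNum": [20, 20, 21, 21],
    "FactionID": ["F001", "F002", "F003", "F004"],
    "MemberName": ["Benjamin Netanyahu", "Yair Lapid", "Aryeh Deri", "Benny Gantz"],
})

term = 20  # hypothetical term used for this example

# Step 1: filter both tables; drop KnessetNum from one side to avoid duplicate columns
factions_20 = factions_example[factions_example["KnessetNum"] == term].drop(columns="KnessetNum")
people_20 = faction_people_example[faction_people_example["KnessetNum"] == term]

# Step 2: inner join on FactionID
factions_members_example = pd.merge(factions_20, people_20, on="FactionID", how="inner")
print(factions_members_example)
```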

---

## Python Code Implementation

Below is an example Python code snippet that demonstrates how to achieve this using pandas:


In [None]:
members = remote_loader(FACTION_PEOPLE_URL)
factions_df = remote_loader(FACTION_URL)

# members.set_index('PersonID', inplace=True)
members = members[members['KnessetNum'] == KNESSET_NUM]
factions_df = factions_df[factions_df['KnessetNum'] == KNESSET_NUM]
factions_df = factions_df.drop('KnessetNum', axis=1)

factions_and_members = pd.merge(factions_df, members, on='FactionID', how='inner')

# Merging the Two Main Datasets: Bills and Factions

Now we will merge the two main datasets: **Bills** (merged with Bill Initiators) and **Factions** (merged with Faction People). The merge is based on a shared key, `PersonID`, which uniquely identifies individuals across both datasets.

---

## Goal

The goal is to combine the legislative data (Bills and their Initiators) with the factional data (Factions and their Members) to create a unified dataset that provides insights into the legislative activities of Knesset members alongside their factional affiliations during specific Knesset terms.

---

## Example Datasets

### Merged Bills Dataset (Bills + Bill Initiators)
| Knesset Number | Bill ID | Bill Title        | PersonID | Initiator Name    |
|----------------|---------|-------------------|----------|-------------------|
| 20             | 101     | Education Reform  | P001     | John Doe          |
| 20             | 102     | Health Act        | P002     | Jane Smith        |
| 21             | 201     | Tax Reduction     | P003     | Michael Johnson   |
| 21             | 202     | Environmental Law | P004     | Emily Davis       |

### Merged Factions Dataset (Factions + Faction People)
| Knesset Number | Faction ID | Faction Name   | PersonID | Member Name         |
|----------------|------------|----------------|----------|---------------------|
| 20             | F001       | Likud          | P001     | John Doe            |
| 20             | F002       | Yesh Atid      | P002     | Jane Smith          |
| 21             | F003       | Shas           | P003     | Michael Johnson     |
| 21             | F004       | National Unity | P004     | Emily Davis         |

---

## Steps to Achieve the Goal

### Step 1: Filter Data by Knesset Number
Before merging, ensure that both datasets are filtered for a specific Knesset number if needed. For example, if we are interested in data from Knesset number `20`, we filter both datasets accordingly.

Filtered Merged Bills Dataset for Knesset Number `20`:
| Knesset Number | Bill ID | Bill Title       | PersonID | Initiator Name    |
|----------------|---------|------------------|----------|-------------------|
| 20             | 101     | Education Reform | P001     | John Doe          |
| 20             | 102     | Health Act       | P002     | Jane Smith        |

Filtered Merged Factions Dataset for Knesset Number `20`:
| Knesset Number | Faction ID | Faction Name   | PersonID | Member Name         |
|----------------|------------|----------------|----------|---------------------|
| 20             | F001       | Likud          | P001     | John Doe            |
| 20             | F002       | Yesh Atid      | P002     | Jane Smith          |

### Step 2: Merge Datasets on `PersonID`
Merge the two filtered datasets using `PersonID` as the key. This will combine legislative data with factional affiliations.

Merged Dataset:
| Knesset Number_x | Bill ID   | Bill Title       | PersonID   | Initiator Name    | Knesset Number_y   | Faction ID   | Faction Name   |
|------------------|-----------|------------------|------------|-------------------|--------------------|--------------|----------------|
| 20               | 101       | Education Reform | P001       | John Doe          | 20                 | F001         | Likud          |
| 20               | 102       | Health Act       | P002       | Jane Smith        | 20                 | F002         | Yesh Atid      |
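
The `_x` and `_y` suffixes in the merged table come from pandas: when both inputs contain a non-key column with the same name, `pd.merge` keeps both and appends its default suffixes. A minimal sketch on the toy rows above, with illustrative column and variable names:

```python
import pandas as pd

# Toy rows for Knesset 20, taken from the example tables above
bills_side = pd.DataFrame({
    "KnessetNum": [20, 20],
    "BillID": [101, 102],
    "PersonID": ["P001", "P002"],
})
factions_side = pd.DataFrame({
    "KnessetNum": [20, 20],
    "FactionID": ["F001", "F002"],
    "FactionName": ["Likud", "Yesh Atid"],
    "PersonID": ["P001", "P002"],
})

# KnessetNum exists on both sides and is not the join key,
# so pandas keeps both copies as KnessetNum_x and KnessetNum_y
combined_example = pd.merge(bills_side, factions_side, on="PersonID", how="inner")
print(list(combined_example.columns))
```

Dropping the duplicated column from one side before merging, or passing an explicit `suffixes=` argument, are two ways to keep the column names readable.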

---

## Python Code Implementation

Below is an example Python code snippet that demonstrates how to achieve this using pandas:



In [59]:
# Drop the bill's Name column, since it would collide with the faction's Name column
bills_and_members = bills_and_members.drop('Name', axis=1)
bills_and_factions = pd.merge(bills_and_members, factions_and_members, on='PersonID', how='left')
print(bills_and_factions.head())
# Keep only the columns needed later; Name now refers to the faction name
bills_and_factions = bills_and_factions[['BillInitiatorID', 'BillID', 'PersonID', 'Name']]

| PersonID | LastName | FirstName | GenderID | GenderDesc | Email | IsCurrent | LastUpdatedDate |
|----------|----------|-----------|----------|------------|-------|-----------|-----------------|
| 48 | מלר-הורוביץ | ירדנה | 250 | נקבה | | False | 2015-11-16T09:33:02.47 |
| 122 | פורדס | גיורא | 251 | זכר | | False | 2020-03-23T23:33:55.387 |
| 128 | לב | דוד | 251 | זכר | | False | 2020-03-23T23:40:56.46 |
| 134 | ימין | אלינור | 250 | נקבה | אלינור ימין | True | 2020-11-19T13:43:11.277 |
| 136 | קפלן | רות | 250 | נקבה | | False | 2020-03-23T23:40:27.3 |


In [69]:
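import networkx as nx

# Load the persons data set, indexed by PersonID so rows can be looked up by ID
# (assumption: this loading step is not shown elsewhere in the notebook)
persons_df = remote_loader(PERSONS_URL).set_index('PersonID')

# Function to get a person's full name for an ID
def person(person_id):
    return f"{persons_df.loc[person_id, 'FirstName']} {persons_df.loc[person_id, 'LastName']}"

# Function to get a person's gender description for an ID
def gender(person_id):
    return persons_df.loc[person_id, 'GenderDesc']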

G = nx.Graph()

# Function to add a faction
def add_faction(faction_id, name):
    G.add_node(str(faction_id), type='Faction', Name=name,
               NumberOfMembers=0, label=name
              #  ,
              #  shape='icon',
              #  icon={'face': 'FontAwesome', 'code': '\uf0c0', 'size': 50, 'color': 'blue'}
               )

# Function to add a member and connect to a faction
def add_member(person_id, full_name, gender_desc, faction_id):
  faction_id = str(faction_id)
  person_id = str(person_id)
  if faction_id in G and G.nodes[faction_id]['type'] == 'Faction':
      G.add_node(person_id, type='Member', FullName=full_name,
                 GenderDesc=gender_desc, label=full_name
                #  ,
                #  shape='icon',
                #  icon={'face': 'FontAwesome', 'code': '\uf007', 'size': 40, 'color': 'green'}
                 )
      G.add_edge(person_id, faction_id)
      # Update the NumberOfMembers for the faction
      G.nodes[faction_id]['NumberOfMembers'] = G.degree[faction_id]

Creating the Knesset Graph.<br>
In this graph, members and factions are nodes, and an edge connects each member to the faction they belong to.
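
To make the structure concrete, here is a minimal sketch of such a graph on toy data. The IDs, names, and the assignment of the third member to a faction are hypothetical and serve only to illustrate the idea that a faction's degree equals its number of members.

```python
import networkx as nx

toy_graph = nx.Graph()

# Faction nodes (toy data, not the real Knesset data)
toy_graph.add_node("F001", type="Faction", label="Likud")
toy_graph.add_node("F002", type="Faction", label="Yesh Atid")

# Member nodes, each connected to one faction (hypothetical assignments)
toy_members = [
    ("P001", "John Doe", "F001"),
    ("P002", "Jane Smith", "F002"),
    ("P003", "Michael Johnson", "F001"),
]
for member_id, full_name, faction_id in toy_members:
    toy_graph.add_node(member_id, type="Member", label=full_name)
    toy_graph.add_edge(member_id, faction_id)

# Every edge of a faction node leads to one member, so its degree is its member count
members_per_faction = {
    node: toy_graph.degree[node]
    for node, data in toy_graph.nodes(data=True)
    if data["type"] == "Faction"
}
print(members_per_faction)
```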

In [70]:
# factions_df and members were already limited to Knesset KNESSET_NUM when they were loaded above
factions_ppl_df = members

def knesset_graph():
  # Add factions to the graph from the DataFrame
  for _, row in factions_df.iterrows():
      add_faction(row['FactionID'], row['Name'])

  # Add members to the graph from the DataFrame and connect them to factions
  for _, row in factions_ppl_df.iterrows():
    person_id = row['PersonID']
    add_member(row['PersonID'], person(person_id),gender(person_id),
               row['FactionID'])

  # # Display the graph nodes with attributes
  # for node, data in G.nodes(data=True):
  #     print(f"{node}: {data}")

  return G

In [71]:
from pyvis import network
from IPython.display import HTML

# Build the Knesset graph
G = knesset_graph()

# Create a network object that loads its JavaScript resources from a remote CDN
net = network.Network(notebook=True, cdn_resources='remote')

# Load the NetworkX graph into the network object
net.from_nx(G)

# net.prep_notebook()
# net.show_buttons(filter_=['nodes'])

# Save the network to an HTML file and display it in the notebook
net.save_graph("networkx-pyvis.html")
HTML(filename="networkx-pyvis.html")