# How bills moved in the 20th National Assembly

This notebook answers one question:
- Where was the bottleneck in the legislative process of the 20th National Assembly in South Korea?

#### Summary
- This notebook uses one publicly available dataset of the National Assembly's legislation.
- This notebook analyzes data from the 20th National Assembly (May 2016 to May 2020).
- For data analysis, this notebook uses pandas DataFrames.
- For visualization, this notebook uses a Plotly Sankey diagram (https://plotly.com/python/sankey-diagram/#what-about-dash).

```python
# Import modules (only pandas and Plotly are used in this notebook).
import pandas as pd
import plotly.graph_objects as go
```

### 1. Getting and Cleaning the Data

For analysis, this notebook needs one dataset:
- List of all bills submitted to the National Assembly (`all_bills.xlsx`)

Then, using pandas, this notebook keeps only the bills proposed in the 20th National Assembly.

```python
# Load the local Excel file as a DataFrame.
bills = pd.read_excel("all_bills.xlsx")
```

```python
# Keep only bills from the 20th National Assembly ("대수" = Assembly number). Check the DataFrame.
twenty_bills = bills[bills["대수"] == 20]
twenty_bills.head(1)
```

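| Column | Meaning | Value (row 8) |
|---|---|---|
| (index) | row label | 8 |
| 의안활동구분 | bill category | 법률안 (legislative bill) |
| 대수 | Assembly number | 20 |
| 의안번호 | bill number | 2002850 |
| 의안명 | bill title | 정치자금법 일부개정법률안(이원욱의원 등 15인) (partial amendment to the Political Funds Act, proposed by Rep. Lee Won-wook and 15 others) |
| 제안자 | proposer(s) | 이원욱의원 등 15인 (Rep. Lee Won-wook and 15 others) |
| 소관위원회 | committee in charge | 행정안전위원회 (Public Administration and Security Committee) |
| 의결결과 | outcome | 임기만료폐기 (discarded at the end of the term) |
| 총투표수 | total votes cast | |
| 찬성 | votes in favor | |
| 반대 | votes against | |
| 기권 | abstentions | |
| 제안일 | proposal date | 2016-10-24 |
| 위원회심사_회부일 | committee review: referral date | 2019-09-02 |
| 위원회심사_의결일 | committee review: decision date | |
| 법사위체계자구심사_회부일 | Legislation and Judiciary Committee review: referral date | |
| 법사위체계자구심사_의결일 | Legislation and Judiciary Committee review: decision date | |
| 본회의심의_의결일 | general session: decision date | 2020-05-29 |
| 정부이송일 | date sent to the government | |
| 공포일 | promulgation date | |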


### 2. Analysis

Legislation in the National Assembly can be divided into four steps.
- One. One or more lawmakers propose a bill, which is referred to the relevant committee.
- Two. The committee reviews the bill and puts it to a vote. If it passes, the bill moves to the Legislation and Judiciary Committee.
- Three. The Legislation and Judiciary Committee decides whether the bill conflicts with existing laws or the Constitution. If it passes, the bill moves to the general session.
- Four. The general session puts the bill to a vote. If it passes, the bill is sent to the presidential Blue House, where the president must sign or veto it within 15 days.
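Because a bill can only reach a step after clearing the earlier ones, each bill can be summarized by the furthest step it reached. The sketch below is a minimal illustration under stated assumptions: the helper `furthest_stage` and the toy data are hypothetical, and the column names follow the English renaming used later in this notebook. It counts the unbroken run of non-missing stages from the start, so a stray value in a later column does not skip over a missing earlier one.

```python
import numpy as np
import pandas as pd

# Stage columns in legislative order (names follow this notebook's renaming).
STAGES = ["PROPOSE", "COMMITEE_VOTE", "LAW_VOTE", "GENERAL_SESSION", "ANNOUNCEMENT"]

def furthest_stage(bills: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: last stage in each bill's unbroken run of non-missing values."""
    reached = bills[STAGES].notna().astype(int)
    # cumprod keeps 1s up to the first missing stage and turns the rest into 0s.
    n_passed = reached.cumprod(axis=1).sum(axis=1)
    return n_passed.map(lambda k: STAGES[k - 1] if k > 0 else "NONE")

# Hypothetical toy data: three bills that stopped at different steps.
toy = pd.DataFrame({
    "PROPOSE":         ["2016-10-24", "2018-09-12", "2017-03-01"],
    "COMMITEE_VOTE":   [np.nan,       "2018-11-20", "2017-05-02"],
    "LAW_VOTE":        [np.nan,       np.nan,       "2017-06-01"],
    "GENERAL_SESSION": [np.nan,       np.nan,       250],
    "ANNOUNCEMENT":    [np.nan,       np.nan,       "2017-07-01"],
})
print(furthest_stage(toy).tolist())
# → ['PROPOSE', 'COMMITEE_VOTE', 'ANNOUNCEMENT']
```

Applying `furthest_stage(...).value_counts()` to the real data would give the number of bills that stalled at each step.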

For each bill, the dataframe records when it passed each stage of the legislative process. If a bill never reached a stage, the corresponding cell is NaN (a missing value). Using `groupby()` and `count()`, this notebook counts how many bills passed each step in each committee. `count()` counts only non-missing values, so NaN cells are skipped.
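The difference between counting non-missing values and counting rows is the whole trick here. A minimal sketch on a hypothetical toy DataFrame (committee names "A" and "B" and the dates are made up) contrasts `groupby().count()`, which skips NaN per column, with `groupby().size()`, which counts every row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: NaN means the bill never reached the committee vote.
toy = pd.DataFrame({
    "COMMITTEE":     ["A", "A", "A", "B", "B"],
    "PROPOSE":       ["2016-06-01", "2016-07-01", "2017-01-05", "2018-02-01", "2019-03-01"],
    "COMMITEE_VOTE": ["2016-09-01", np.nan,       np.nan,       "2018-05-01", np.nan],
})

# count(): non-missing values per column within each group,
# i.e. how many bills reached each stage.
stage_counts = toy.groupby("COMMITTEE").count()

# size(): number of rows per group, regardless of NaN.
rows_per_committee = toy.groupby("COMMITTEE").size()

print(stage_counts)
print(rows_per_committee)
```

Committee "A" has three proposed bills but only one committee vote, which is exactly the kind of drop the Sankey diagram later visualizes.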

```python
# Keep only the columns needed for the analysis and rename them from Korean to English.
columns_sub = {'소관위원회': 'COMMITTEE',                  # committee in charge
               '제안일': 'PROPOSE',                         # proposal date
               '위원회심사_회부일': 'COMMITEE_SUBMIT',       # referred to the committee
               '위원회심사_의결일': 'COMMITEE_VOTE',         # committee decision
               '법사위체계자구심사_회부일': 'LAW_SUBMIT',    # referred to Legislation and Judiciary
               '법사위체계자구심사_의결일': 'LAW_VOTE',      # Legislation and Judiciary decision
               '총투표수': 'GENERAL_SESSION',               # total votes cast in the general session
               '공포일': 'ANNOUNCEMENT'}                    # promulgation date

twenty_bills_2 = (twenty_bills[list(columns_sub)]
                  .rename(columns=columns_sub)
                  .reset_index(drop=True))
```

```python
# Check the dataframe.
twenty_bills_2.head(5)
```

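| | COMMITTEE | PROPOSE | COMMITEE_SUBMIT | COMMITEE_VOTE | LAW_SUBMIT | LAW_VOTE | GENERAL_SESSION | ANNOUNCEMENT |
|---|---|---|---|---|---|---|---|---|
| 0 | 행정안전위원회 (Public Administration and Security) | 2016-10-24 | 2019-09-02 | | | | | |
| 1 | 정무위원회 (National Policy) | 2018-09-12 | 2018-09-13 | | | | | |
| 2 | 산업통상자원중소벤처기업위원회 (Trade, Industry, Energy, SMEs and Startups) | 2019-05-02 | 2019-05-03 | | | | | |
| 3 | 환경노동위원회 (Environment and Labor) | 2016-11-01 | 2016-11-02 | | | | | |
| 4 | 국토교통위원회 (Land, Infrastructure and Transport) | 2019-12-23 | 2019-12-24 | | | | | |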


```python
# Group bills by the committee they were referred to. count() returns, per column,
# the number of non-missing values, i.e. how many bills reached each stage.
twenty_bills_count = twenty_bills_2.groupby(by='COMMITTEE').count().sort_values(by=['PROPOSE'])

# For readability, add a "기타위원회" (other committees) row that sums the
# six committees with the fewest proposed bills.
twenty_bills_count.loc["기타위원회", :] = twenty_bills_count.iloc[0:6].sum(axis=0)

# Drop the minor committees. This list has seven entries while the sum above
# covers six rows, so at least one dropped committee is not counted in 기타위원회.
twenty_bills_count = twenty_bills_count.drop(["사법개혁 특별위원회", "정치개혁 특별위원회",
                                              "헌법개정 및 정치개혁 특별위원회",
                                              "미래창조과학방송통신위원회", "정보위원회",
                                              "안전행정위원회", "산업통상자원위원회"])

# Convert all counts back to integers and sort by the number of proposed bills.
twenty_bills_count = twenty_bills_count.astype(int).sort_values(by=['PROPOSE'])
```

```python
# Assign one color per committee row, cycling through a six-color palette.
palette = ['#845ec2', '#D65DB1', '#FF6F91', '#FF9671', '#FFC75F', '#F9F871']
twenty_bills_count["COLOR"] = [palette[i % len(palette)] for i in range(len(twenty_bills_count))]
```

```python
# Check the dataframe.
twenty_bills_count
```

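| COMMITTEE | PROPOSE | COMMITEE_SUBMIT | COMMITEE_VOTE | LAW_SUBMIT | LAW_VOTE | GENERAL_SESSION | ANNOUNCEMENT | COLOR |
|---|---|---|---|---|---|---|---|---|
| 기타위원회 (other committees, combined) | 115 | 88 | 66 | 65 | 64 | 65 | 65 | #845ec2 |
| 교육문화체육관광위원회 (Education, Culture, Sports and Tourism) | 149 | 108 | 119 | 119 | 119 | 125 | 125 | #D65DB1 |
| 외교통일위원회 (Foreign Affairs and Unification) | 255 | 234 | 57 | 44 | 43 | 43 | 43 | #FF6F91 |
| 여성가족위원회 (Gender Equality and Family) | 263 | 225 | 75 | 75 | 74 | 74 | 74 | #FF9671 |
| 국방위원회 (National Defense) | 412 | 377 | 82 | 69 | 68 | 68 | 68 | #FFC75F |
| 국회운영위원회 (House Steering) | 468 | 450 | 27 | 26 | 26 | 26 | 26 | #F9F871 |
| 문화체육관광위원회 (Culture, Sports and Tourism) | 694 | 661 | 113 | 109 | 103 | 103 | 103 | #845ec2 |
| 과학기술정보방송통신위원회 (Science, ICT, Broadcasting and Communications) | 826 | 776 | 181 | 125 | 124 | 126 | 126 | #D65DB1 |
| 교육위원회 (Education) | 835 | 801 | 95 | 98 | 90 | 97 | 97 | #FF6F91 |
| 산업통상자원중소벤처기업위원회 (Trade, Industry, Energy, SMEs and Startups) | 1055 | 976 | 229 | 219 | 215 | 216 | 214 | #FF9671 |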
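Raw counts make large and small committees hard to compare, so it can help to turn them into pass rates. A minimal sketch, using three rows copied from the table above (the variable names `counts` and `rates` are illustrative), computes the share of proposed bills that came to a committee vote and the share that were promulgated:

```python
import pandas as pd

# Three rows copied from the committee table above (bill counts per stage).
counts = pd.DataFrame(
    {"PROPOSE":       [468, 412, 1055],
     "COMMITEE_VOTE": [27,  82,  229],
     "ANNOUNCEMENT":  [26,  68,  214]},
    index=["국회운영위원회", "국방위원회", "산업통상자원중소벤처기업위원회"],
)

# Share of proposed bills that reached a committee vote / were promulgated.
rates = pd.DataFrame({
    "committee_vote_rate": counts["COMMITEE_VOTE"] / counts["PROPOSE"],
    "promulgation_rate":   counts["ANNOUNCEMENT"] / counts["PROPOSE"],
}).round(3)
print(rates)
```

The same two divisions applied to the full `twenty_bills_count` give the rates for every committee at once. For the House Steering Committee, only about 5.8% of proposed bills came to a committee vote.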


### 3. Visualization

This notebook uses Plotly's Sankey diagram to visualize how bills moved through the 20th National Assembly. The diagram shows at which stage of the legislative process the largest number of bills stalled.
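A Sankey diagram in Plotly is described by three parallel lists passed to `go.Sankey(link=dict(source=..., target=..., value=...))`: link *i* runs from node `source[i]` to node `target[i]` with width `value[i]`. As a sketch, the hypothetical helper `build_sankey_links` below builds those lists for the layout used in the next cell (proposers, then one node per committee, then shared stage nodes). It derives the node indices from the number of committee rows instead of assuming exactly 18.

```python
import pandas as pd

def build_sankey_links(counts: pd.DataFrame, stage_cols: list):
    """Hypothetical helper: link lists for proposers -> committees -> shared stages.

    Node 0 is the proposers, nodes 1..n are the committees (in row order),
    and nodes n+1, n+2, ... are one shared node per stage after the first.
    stage_cols[0] sizes the proposer -> committee links; each later column
    sizes the links leaving the previous node(s).
    """
    n = len(counts)
    committees = list(range(1, n + 1))
    shared = list(range(n + 1, n + len(stage_cols)))

    source = [0] * n
    target = committees[:]
    value = counts[stage_cols[0]].tolist()

    prev = committees
    for node, col in zip(shared, stage_cols[1:]):
        source += prev            # every link leaves the previous node(s)
        target += [node] * n      # and enters the shared stage node
        value += counts[col].tolist()
        prev = [node] * n
    return source, target, value

# Hypothetical two-committee example.
example = pd.DataFrame({"PROPOSE": [10, 20], "COMMITEE_VOTE": [3, 5],
                        "GENERAL_SESSION": [2, 4], "ANNOUNCEMENT": [2, 4]},
                       index=["A", "B"])
source, target, value = build_sankey_links(
    example, ["PROPOSE", "COMMITEE_VOTE", "GENERAL_SESSION", "ANNOUNCEMENT"])
print(source, target, value)
```

Called with `twenty_bills_count` and the same four column names, it should produce the same lists as the hand-built version in the next cell.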

```python
# Build a Sankey diagram.
# Node 0 = lawmakers, nodes 1..n = committees (in the row order of twenty_bills_count),
# and the last three nodes are the later stages. The labels below must follow that order.
labels = ["LAWMAKERS",
          "ETC", "EDU&CULTURE", "FOREIGN&UNIFICATION", "WOMEN&FAMILY", "DEFENSE",
          "CONGRESS MGMT", "CULTURE&SPORTS&TOUR", "SCI&TECH", "EDUCATION", "INDUSTRY",
          "AGRI&FOOD", "BUDGET", "POLICY", "ENVIRON&LABOR", "LAND&TRANSPORT",
          "JUDICIARY", "HEALTH&WELFARE", "SAFETY&ADMIN",
          "LAW&JUDICIARY REVIEW", "GENERAL", "DECLARED"]

n = len(twenty_bills_count)                      # number of committee rows (18)
committee_nodes = list(range(1, n + 1))
law_node, general_node, declared_node = n + 1, n + 2, n + 3

# Links: lawmakers -> committee -> Legislation and Judiciary review -> general session -> declared.
source_list = [0] * n + committee_nodes + [law_node] * n + [general_node] * n
target_list = committee_nodes + [law_node] * n + [general_node] * n + [declared_node] * n

# Link widths: bills proposed, passed by the committee, voted on in the general session, promulgated.
value_list = (twenty_bills_count['PROPOSE'].tolist()
              + twenty_bills_count['COMMITEE_VOTE'].tolist()
              + twenty_bills_count['GENERAL_SESSION'].tolist()
              + twenty_bills_count['ANNOUNCEMENT'].tolist())

# One color per node: lawmakers, each committee, then the three later stages.
tab_color_list = ["gray"] + twenty_bills_count['COLOR'].tolist() + ["yellow", "orange", "red"]

fig = go.Figure(data=[go.Sankey(
    node=dict(pad=10,
              thickness=10,
              line=dict(color="black", width=0.5),
              label=labels,
              color=tab_color_list),
    link=dict(source=source_list,
              target=target_list,
              value=value_list))])

fig.update_layout(title_text="How bills moved in the 20th National Assembly (Simplified)",
                  font_size=10)
fig.show()
```

### 4. Conclusion

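- The bottleneck in the legislative process of the 20th National Assembly was the committee to which each bill was referred: in every committee except 교육문화체육관광위원회 in the table above, most proposed bills never came to a committee vote. In some cases, the share of proposed bills that passed was below 10%. For example, the House Steering Committee (국회운영위원회) voted on only 27 of 468 proposed bills (about 6%), and only 26 were promulgated.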

- Next time, I will look at which factors cause a bottleneck in each committee.