# A graph of the articles of the Constitution of India

**Data source**: https://web.archive.org/web/20081022080607/http://www.commonlii.org/in/legis/const/2004/index.html

In [1]:
import re
import networkx as nx
from collections import defaultdict

The text of the constitution was manually copied into a single text file.

## Reading in the file

In [2]:
with open('constitution.txt', 'r') as f:
    lines = f.readlines()

## Splitting into Articles

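This relies on the observation that each Article begins with a number, optionally followed by a letter, and then a dot (for example `1.` or `31A.`).

Any line that begins this way is treated as the start of a new Article, and every following line is appended to that Article until the next such line. As a result, the splitting ignores the lines that delimit PARTs: they simply end up in the text of the preceding Article. Anything before the first Article is collected under the key `0`.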

In [3]:
articles = defaultdict(str)
articleno = 0  # text before the first Article heading is collected under key 0
for line in lines:
    # An Article heading: digits, an optional suffix character, then a dot
    match = re.match(r'^\d+\w?\.', line)
    if match:
        articleno = match.group().replace('.', '')
    articles[articleno] += line
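
To see what the heading rule does on a small input, here is a minimal sketch that runs the same logic over a handful of made-up lines. The names `sample_lines`, `sample_articles` and `heading` are hypothetical and the lines only imitate the layout of `constitution.txt`.

```python
import re
from collections import defaultdict

# Hypothetical sample lines standing in for constitution.txt.
sample_lines = [
    "PREAMBLE\n",
    "1. Name and territory of the Union.\n",
    "(1) India, that is Bharat, shall be a Union of States.\n",
    "2A. [Repealed.]\n",
    "PART II\n",
    "5. Citizenship at the commencement of the Constitution.\n",
]

# Digits, an optional suffix character, then a dot, at the start of the line.
heading = re.compile(r'^\d+\w?\.')

sample_articles = defaultdict(str)
current = 0  # lines before the first heading are filed under key 0
for line in sample_lines:
    m = heading.match(line)
    if m:
        current = m.group().rstrip('.')
    sample_articles[current] += line

print(list(sample_articles))  # [0, '1', '2A', '5']
```

Note that the `PART II` line lands in the text of `'2A'`, and that the clause line starting with `(1)` is not mistaken for a heading because it does not begin with a digit.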

In [4]:
print(f'List of articles: {list(articles.keys())}')

List of articles: [0, '1', '2', '2A', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '31A', '31B', '31C', '31D', '32', '32A', '33', '34', '35', '36', '37', '38', '39', '39A', '40', '41', '42', '43', '43A', '44', '45', '46', '47', '48', '48A', '49', '50', '51', '51A', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '131A', '132', '133', '134', '134A', '135', '136', '137', '138', '139', '139A', '140', '141', '142',

## Creating a graph based on mentions

This relies on the patterns in which Articles are mentioned in the text, e.g. "article 234", "articles 123 and 145", "articles 124, 125 and 135".

Whenever such a mention is found in the text of an Article, an edge is created between that Article and each Article mentioned.

In [5]:
patterns = [r"article \d+\w?",
            # "articles A and B", "articles A, B and C", ... with up to 13
            # comma-separated numbers before the final "X and Y" pair
            r"articles (?:\d+\w?, ){0,13}\d+\w? and \d+\w?"]

In [6]:
g = nx.Graph()

In [7]:
for article, text in articles.items():
    for pattern in patterns:
        # e.g. "articles 12, 33A and 53"
        mentions = [m.group() for m in re.finditer(pattern, text)]
        for mention in mentions:
            # e.g. ['12', '33A', '53']
            for mentioned_article in re.findall(r'\d+\w?', mention):
                g.add_edge(mentioned_article, article)
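
As a rough illustration of the two steps (find the mention, then pull the numbers out of it), here is a sketch on a made-up sentence. The sentence and the names `sample`, `single` and `multiple` are hypothetical, and `multiple` uses `*` to allow any number of comma-separated entries, which is an assumption rather than the notebook's own pattern list.

```python
import re

# Hypothetical sentence, written to resemble the constitution's phrasing.
sample = ("Nothing in article 13 shall apply, and articles 14, 19 and 31A "
          "are subject to article 368.")

single = r"article \d+\w?"
multiple = r"articles (?:\d+\w?, )*\d+\w? and \d+\w?"

# Step 1: collect the full mentions for each pattern.
mentions = [m.group() for p in (single, multiple) for m in re.finditer(p, sample)]

# Step 2: extract the Article numbers from each mention.
numbers = [n for mention in mentions for n in re.findall(r'\d+\w?', mention)]

print(mentions)  # ['article 13', 'article 368', 'articles 14, 19 and 31A']
print(numbers)   # ['13', '368', '14', '19', '31A']
```

Each entry of `numbers` would become one end of an edge, with the Article whose text contained the sentence at the other end.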

## Analyzing the Graph

Of the 452 Articles discovered (including those with letter suffixes), 236 appear as nodes in the graph, i.e. they seem to have some connection to another Article.

In [8]:
len(g.nodes)

236
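
One way to sanity-check these counts is to compare the Article keys with the graph's nodes. The sketch below does this on made-up stand-ins; `article_keys` and `graph_nodes` are hypothetical and would be `articles.keys()` and `g.nodes` in the notebook.

```python
# Hypothetical stand-ins for the notebook's `articles` keys and `g.nodes`.
article_keys = [0, '1', '2', '2A', '3', '4']
graph_nodes = {'2', '3', '4', '7'}  # '7' is mentioned but was never split out

# Articles that were discovered but have no edge at all.
unconnected = [a for a in article_keys if a not in graph_nodes]

# Nodes that came only from a mention and match no discovered Article.
unknown = sorted(graph_nodes - set(article_keys))

print(unconnected)  # [0, '1', '2A']
print(unknown)      # ['7']
```

A non-empty `unknown` list would suggest that some nodes come from mentions the splitting step never saw as headings, so the node count is not necessarily a subset of the discovered Articles.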

Article 394 scores high on various graph centrality measures, but that's mostly because of this one historic line:

**"This article and articles 5, 6, 7, 8, 9, 60, 324, 366, 367, 379, 380, 388, 391, 392 and 393 shall come into force at once, and the remaining provisions of this Constitution shall come into force on the twenty-sixth day of January, 1950, which day is referred to in this Constitution as the commencement of this Constitution."**

Below are the top three nodes for a few of the measures, so that 394 doesn't hog all the limelight.
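
To make the effect concrete, here is a dependency-free sketch of how a single long list inflates an Article's degree. The `toy_mentions` mapping is a hypothetical, heavily reduced stand-in for the real graph, and the degree is counted by hand rather than with networkx.

```python
from collections import defaultdict

# Hypothetical mentions: source Article -> Articles it names.
toy_mentions = {
    '394': ['5', '6', '7', '8', '9', '60'],  # one long commencement-style list
    '368': ['13', '239A'],
    '13': ['31A'],
}

# Build an undirected adjacency map: an edge counts for both endpoints.
neighbours = defaultdict(set)
for source, targets in toy_mentions.items():
    for target in targets:
        neighbours[source].add(target)
        neighbours[target].add(source)

degree = {node: len(adj) for node, adj in neighbours.items()}
top = sorted(degree, key=degree.get, reverse=True)[:3]

print(top)  # ['394', '368', '13']
```

In this toy graph `'394'` gets degree 6 from one sentence alone, while every Article it lists has degree 1.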

### Most connected Articles

In [9]:
d = dict(nx.degree(g))
sorted(d, key=d.get, reverse=True)[:3]

['394', '368', '239A']

### Articles with high betweenness centrality

In [10]:
d = dict(nx.betweenness_centrality(g))
sorted(d, key=d.get, reverse=True)[:3]

['368', '13', '394']

### Articles with high eigenvector centrality

In [11]:
d = dict(nx.eigenvector_centrality(g))
sorted(d, key=d.get, reverse=True)[:3]

['394', '5', '6']

### Edges with high betweenness

In [12]:
d = dict(nx.edge_betweenness_centrality(g))
sorted(d, key=d.get, reverse=True)[:3]

[('368', '13'), ('368', '239A'), ('13', '31A')]

## Creating an Obsidian Vault

Using the same ideas as above, we create an [Obsidian](https://obsidian.md) vault (a bunch of `.md` files in a folder), with the Article numbers wrapped in double square brackets so that Obsidian treats them as links.

In [13]:
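import os

os.makedirs('obsidian', exist_ok=True)  # the vault folder must exist before writing

for article, text in articles.items():
    for pattern in patterns:
        mentions = [m.group() for m in re.finditer(pattern, text)]
        for mention in mentions:
            # "articles 12 and 33A" -> "articles [[12]] and [[33A]]"
            new_mention = re.sub(r'(\d+\w?)', r'[[\1]]', mention)
            # The lookahead requires a following comma, dot or space (so that
            # "article 13" is not rewritten inside "article 131") without
            # consuming that character.
            text = re.sub(re.escape(mention) + r'(?=[,. ])', new_mention, text)

    with open(f'obsidian/{article}.md', 'w') as f:
        f.write(text)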
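
For comparison, the same kind of links can be produced in a single pass by giving `re.sub` a function as the replacement. This is an alternative sketch, not the notebook's method; `sample`, `mention_pattern` and `link` are hypothetical names, and the combined pattern is an assumption.

```python
import re

# Hypothetical sentence, written to resemble the constitution's phrasing.
sample = "Subject to article 13, articles 14, 19 and 31A apply. See article 131."

# One pattern for "article N", "articles A and B", "articles A, B and C", ...
mention_pattern = re.compile(r"articles? (?:\d+\w?, )*(?:\d+\w? and )?\d+\w?")

def link(match):
    """Wrap every Article number inside one matched mention in [[...]]."""
    return re.sub(r'(\d+\w?)', r'[[\1]]', match.group())

linked = mention_pattern.sub(link, sample)
print(linked)
# Subject to article [[13]], articles [[14]], [[19]] and [[31A]] apply. See article [[131]].
```

Because each mention is rewritten where it was matched, "article 13" cannot accidentally be replaced inside "article 131", and the surrounding punctuation is left untouched.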

And now, when the folder is opened in Obsidian, we get a novel UI for interacting with the Constitution of India, with links, backlinks, search, and a cool graph!