## More on Analyzing Constitutions

In the previous workbook, our analysis did not get very far, partly because we made no serious attempt to break the constitutions down into their parts. The files have a definite structure, so here we take that structure into account and see where it gets us. Accordingly:

In [1]:
import os

import numpy as np
import pandas as pd

We pick one file arbitrarily (here, the entry at index 16 of the directory listing), just to look at its structure. So:

In [16]:
filelist = os.listdir()
file     = filelist[16]
print(file)

AR1864_final_parts_0.txt
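
Because `os.listdir()` returns entries in arbitrary order, a fixed index may point to a different file on another machine. The following is a minimal sketch of a more reproducible selection. It assumes the constitution files all end in `.txt`, which is inferred only from the single filename shown above, and `pick_file` is a hypothetical helper, not part of the original workbook.

```python
import os
import random

def pick_file(names, seed=None, index=0):
    """Pick one .txt file name from `names` reproducibly.

    With a seed, the choice is random but repeatable; otherwise the
    file at `index` in sorted order is returned.
    """
    # Assumption: data files end in ".txt"; sorting fixes the order.
    txt_files = sorted(n for n in names if n.endswith(".txt"))
    if not txt_files:
        raise ValueError("no .txt files found")
    if seed is not None:
        return random.Random(seed).choice(txt_files)
    return txt_files[index]

# Possible use on the working directory:
# file = pick_file(os.listdir(), seed=0)
```

Sorting first means the same index or seed selects the same file wherever the notebook runs, provided the directory contents are the same.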


Let's read the file in line by line, strip surrounding whitespace, and keep only the non-blank lines in a list.

In [17]:
with open(file) as f:
    content = f.readlines()
    
content = [x.strip() for x in content]
content = list(filter(None, content))

In [18]:
len(content)

983

The file contains explicit markers for the beginning and end of its parts: CSTART and CEND delimit the constitution, ASTART and AEND delimit an article, and SSTART and SEND delimit a section. We record where each marker occurs in an indicator array with one entry per line:

In [19]:
artbeg = np.zeros(len(content))
artend = np.zeros(len(content))

secbeg = np.zeros(len(content))
secend = np.zeros(len(content))

conbeg = np.zeros(len(content))
conend = np.zeros(len(content))

# str.find returns -1 when the marker is absent and 0 when it starts the line,
# so a "> 0" test would miss a marker at position 0; "in" handles both cases.
for count, line in enumerate(content):
    if "CSTART" in line:
        conbeg[count] = 1
    if "CEND" in line:
        conend[count] = 1
    if "ASTART" in line:
        artbeg[count] = 1
    if "AEND" in line:
        artend[count] = 1
    if "SSTART" in line:
        secbeg[count] = 1
    if "SEND" in line:
        secend[count] = 1
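
An alternative to substring checks is a regular expression with word boundaries, which matches a marker wherever it sits in the line, including position 0, and avoids false hits inside longer words. This is only a sketch: `MARKER_RE` and `marker_of` are hypothetical names, the first two sample lines are copied from the output shown in this workbook, and the last sample line is made up to illustrate a marker at the start of a line.

```python
import re

# The six marker keywords used in this workbook, matched as whole words.
MARKER_RE = re.compile(r"\b(CSTART|CEND|ASTART|AEND|SSTART|SEND)\b")

def marker_of(line):
    """Return the marker keyword found in `line`, or None if there is none."""
    match = MARKER_RE.search(line)
    return match.group(1) if match else None

sample = [
    "*** CSTART AR 03/04/1864 02/11/1868 ***",
    "*** ASTART 9001.0 AR 1864 ***",
    "CONSTITUTION OF ARKANSAS-1864",
    "ASTART at the very start of a line",  # made-up line for illustration
]
markers = [marker_of(line) for line in sample]
print(markers)  # ['CSTART', 'ASTART', None, 'ASTART']
```

Returning the keyword itself, rather than filling six separate arrays, would also allow a single categorical column later on.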

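We can now build a dataframe for this constitution. After transposing, column 0 holds the text of each line and columns 1 through 6 hold the article, section, and constitution begin/end indicators, in that order: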

In [20]:
Foo = pd.DataFrame([content, list(artbeg), list(artend), list(secbeg), list(secend), list(conbeg), list(conend)])

In [21]:
Foo = Foo.T
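
Building the frame from a list of rows and then transposing mixes strings and numbers in each column, which typically leaves every column with the generic `object` dtype. A minimal sketch of an alternative, assuming descriptive column names are acceptable, is to pass a dictionary of columns instead. The names `frame`, `text`, `artbeg`, and `artend` as column labels are illustrative choices, and the two lines of stand-in data are taken from the output shown in this workbook.

```python
import numpy as np
import pandas as pd

# Stand-in data; in the notebook these would be `content`, `artbeg`, etc.
content = ["*** ASTART 9001.0 AR 1864 ***", "CONSTITUTION OF ARKANSAS-1864"]
artbeg = np.array([1.0, 0.0])
artend = np.array([0.0, 0.0])

# One dictionary entry per column: no transpose needed, and the
# indicator columns keep an integer dtype.
frame = pd.DataFrame({
    "text": content,
    "artbeg": artbeg.astype(int),
    "artend": artend.astype(int),
})
print(frame.dtypes)
```

Named columns also make later selections such as `frame["artbeg"]` easier to read than positional labels like `Foo[1]`.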

In [22]:
text_lines = np.zeros(len(content))
for count, line in enumerate(Foo[0]):
    # Flag lines containing at least one lowercase character as body text.
    if any(c.islower() for c in line):
        text_lines[count] = 1

In [23]:
TL = pd.DataFrame(text_lines)

In [24]:
Foo['textdum'] = TL
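
The same indicator can be computed without an explicit loop. The sketch below uses the pandas string method `str.contains` with the pattern `[a-z]`. Note one difference from `str.islower`: the pattern only recognizes ASCII lowercase letters, so a line whose only lowercase characters are accented would not be flagged. The sample lines are copied from the output shown in this workbook, and the variable names are illustrative.

```python
import pandas as pd

lines = pd.Series([
    "It is ready to go.",
    "JW 04/05/2005",
    "*** ASTART 9001.0 AR 1864 ***",
    "CONSTITUTION OF ARKANSAS-1864",
])

# True where the line contains an ASCII lowercase letter, then cast to 0/1.
textdum = lines.str.contains(r"[a-z]").astype(int)
print(textdum.tolist())  # [1, 0, 0, 0]
```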

In [25]:
# Collect the lines flagged as body text (textdum == 1).
# Grouping them into actual paragraphs is left for a later step.
paragraphs = []
for line, is_text in zip(Foo[0], Foo['textdum']):
    if is_text == 1:
        paragraphs.append(line)
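
One way to turn the text indicator into paragraphs is to join each run of consecutive text lines. This is a sketch under a loud assumption: that a paragraph is exactly an unbroken run of lines with `textdum == 1`, which the workbook does not state. `group_paragraphs` is a hypothetical helper, and the sample lines are short stand-ins rather than real constitution text.

```python
from itertools import groupby

def group_paragraphs(lines, flags):
    """Join each run of consecutive flagged lines into one paragraph string."""
    paragraphs = []
    # groupby splits the (line, flag) pairs into runs with the same flag value.
    for is_text, run in groupby(zip(lines, flags), key=lambda pair: pair[1] == 1):
        if is_text:
            paragraphs.append(" ".join(text for text, _ in run))
    return paragraphs

# Stand-in data: uppercase lines play the role of headings or markers.
lines = ["A HEADING", "first text line", "second text line", "A MARKER", "third text line"]
flags = [0, 1, 1, 0, 1]
print(group_paragraphs(lines, flags))
# ['first text line second text line', 'third text line']
```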

In [26]:
Foo

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | textdum |
|---|---|---|---|---|---|---|---|---|
| 0 | Arkansas was done by Doug Campbell | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
| 1 | It is ready to go. | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
| 2 | JW 04/05/2005 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 3 | *** CSTART AR 03/04/1864 02/11/1868 *** | 0 | 0 | 0 | 0 | 1 | 0 | 0.0 |
| 4 | *** ASTART 9001.0 AR 1864 *** | 1 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 5 | CONSTITUTION OF ARKANSAS-1864 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 6 | We, the people of the State of Arkansas, havin... | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
| 7 | in conformity with the Constitution of the Uni... | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
| 8 | consequences of the existing rebellion, do her... | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
| 9 | of the State of Arkansas, which assembled in t... | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 |
