## More on Analyzing Constitutions

In the previous workbook, our analysis did not get very far, partly because we did not seriously attempt to break the constitutions down into their parts. The texts have a definite structure, so we should take it into account and see where it gets us. We start with the imports:

In [27]:
import os

import numpy as np
import pandas as pd

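We pick one file, just to look at its structure. The choice is arbitrary rather than truly random: we take the entry at index 10 of the directory listing, and `os.listdir()` returns entries in arbitrary order. So: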

In [6]:
filelist = os.listdir()
file     = filelist[10]
print(file)

AL1875_final_parts_0.txt
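If a reproducible choice is wanted, the selection above could be sketched as follows. This is only an illustration under assumptions: the helper names `list_constitution_files` and `pick_file` are hypothetical, the `.txt` filter assumes every constitution file uses that extension, and the second file name in the demo is made up.

```python
import os
import random
import tempfile


def list_constitution_files(directory="."):
    """Return the .txt files in `directory`, sorted by name (hypothetical helper)."""
    return sorted(name for name in os.listdir(directory) if name.endswith(".txt"))


def pick_file(files, seed=0):
    """Pick one file name pseudo-randomly; the same seed gives the same pick."""
    return random.Random(seed).choice(files)


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    # "GA1877_final_parts_0.txt" and "notes.md" are invented names for the demo.
    for name in ["AL1875_final_parts_0.txt", "GA1877_final_parts_0.txt", "notes.md"]:
        open(os.path.join(tmp, name), "w").close()
    demo_files = list_constitution_files(tmp)
    demo_choice = pick_file(demo_files)

print(demo_files)
print(demo_choice)
```

Sorting the listing first makes an index such as `filelist[10]` refer to the same file on every run and every machine.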


Let's read the file in line by line, strip the surrounding whitespace from each line, and keep only the non-blank lines.

In [37]:
with open(file) as f:
    content = f.readlines()
    
content = [x.strip() for x in content]
content = list(filter(None, content))

In [38]:
content[0:26]

['This constitution is ready to go.',
 'JW 12.29.01',
 'CONSTITUTION OF 1875',
 'WHOLE TEXT',
 '*** CSTART AL 1/1/1875 11/28/1901 ***',
 '*** ASTART  9001.0 AL 1875 ***',
 'PREAMBLE',
 'We, the people of the State of Alabama, in order to establish justice, insure domestic tranquillity,',
 'provide for the common defense, promote the general welfare, and secure to ourselves and to our',
 'posterity life, liberty, and property; profoundly grateful to Almighty God for this inestimable right,',
 'and invoking His favor and guidance, do ordain and establish the following Constitution and form',
 'of government for the State of Alabama:',
 '*** AEND ***',
 '*** ASTART 001.0 AL 1875 ***',
 'ARTICLE I.',
 'DECLARATION OF RIGHTS.',
 'That the great, general, and essential principles of liberty and free government may be recognized',
 'and established, we declare:',
 '*** SSTART 001.0 001.0 0 AL 1875 ***',
 'Section 1. That all men are equally free and independent; that they are endowed by their

We see that explicit markers delimit the beginning and end of each structural unit. ASTART marks the beginning of an article and AEND marks its end; CSTART and CEND do the same for the constitution as a whole, and SSTART and SEND for sections. We record each marker in an indicator array with one entry per line:

In [43]:
n = len(content)

# One 0/1 indicator array per marker type, with one entry per line of text.
artbeg, artend = np.zeros(n), np.zeros(n)
secbeg, secend = np.zeros(n), np.zeros(n)
conbeg, conend = np.zeros(n), np.zeros(n)

for count, line in enumerate(content):
    # Use `in` rather than `find(...) > 0`: find() returns 0 for a match at
    # the very start of the line, which the `> 0` test would miss.
    if "CSTART" in line:
        conbeg[count] = 1
    if "CEND" in line:
        conend[count] = 1
    if "ASTART" in line:
        artbeg[count] = 1
    if "AEND" in line:
        artend[count] = 1
    if "SSTART" in line:
        secbeg[count] = 1
    if "SEND" in line:
        secend[count] = 1
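The marker lines also carry extra fields after the tag (for example `AL 1875`). A minimal sketch of pulling those apart with a regular expression is given below. The names `MARKER_RE` and `parse_marker` are hypothetical, and since the meaning of the individual fields is not documented here, the sketch returns them as raw strings without interpreting them.

```python
import re

# Matches lines such as "*** ASTART  9001.0 AL 1875 ***" or "*** AEND ***".
# Group 1: level (C, A or S); group 2: START or END; group 3: remaining fields.
MARKER_RE = re.compile(r"^\*\*\*\s+([CAS])(START|END)\b\s*(.*?)\s*\*\*\*$")


def parse_marker(line):
    """Return (level, kind, fields) for a marker line, or None for ordinary text."""
    match = MARKER_RE.match(line.strip())
    if match is None:
        return None
    level, kind, rest = match.groups()
    return level, kind, rest.split()


sample_lines = [
    "*** CSTART AL 1/1/1875 11/28/1901 ***",
    "*** ASTART  9001.0 AL 1875 ***",
    "PREAMBLE",
    "*** AEND ***",
]
parsed = [parse_marker(line) for line in sample_lines]
for line, result in zip(sample_lines, parsed):
    print(repr(line), "->", result)
```

Anchoring the pattern to the whole line means ordinary text that merely contains a word like "SEND" would not be mistaken for a marker, which a plain substring test cannot guarantee.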

We can now assemble the text lines and their indicator arrays into a dataframe:

In [44]:
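# Each list becomes one row here; after transposing (next cell) the columns are
# 0: text, 1: artbeg, 2: artend, 3: secbeg, 4: secend, 5: conbeg, 6: conend.
Foo = pd.DataFrame([content, list(artbeg), list(artend), list(secbeg), list(secend), list(conbeg), list(conend)])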

In [45]:
Foo = Foo.T

In [46]:
Foo

                                                   0  1  2  3  4  5  6
0                  This constitution is ready to go.  0  0  0  0  0  0
1                                        JW 12.29.01  0  0  0  0  0  0
2                               CONSTITUTION OF 1875  0  0  0  0  0  0
3                                         WHOLE TEXT  0  0  0  0  0  0
4              *** CSTART AL 1/1/1875 11/28/1901 ***  0  0  0  0  1  0
5                      *** ASTART 9001.0 AL 1875 ***  1  0  0  0  0  0
6                                           PREAMBLE  0  0  0  0  0  0
7  We, the people of the State of Alabama, in ord...  0  0  0  0  0  0
8  provide for the common defense, promote the ge...  0  0  0  0  0  0
9  posterity life, liberty, and property; profoun...  0  0  0  0  0  0
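The numeric column labels make the frame hard to read. One possible alternative, sketched here on a small stand-in list rather than the full `content`, is to build the frame from named columns directly. The column name `text`, the `markers` mapping, and the variable names `sample` and `frame` are assumptions made for this illustration.

```python
import pandas as pd

# A few lines copied from the file excerpt, standing in for the full list.
sample = [
    "CONSTITUTION OF 1875",
    "*** CSTART AL 1/1/1875 11/28/1901 ***",
    "*** ASTART  9001.0 AL 1875 ***",
    "PREAMBLE",
    "*** AEND ***",
]

# Column name -> marker tag searched for in each line.
markers = {
    "conbeg": "CSTART", "conend": "CEND",
    "artbeg": "ASTART", "artend": "AEND",
    "secbeg": "SSTART", "secend": "SEND",
}

frame = pd.DataFrame({"text": sample})
for column, tag in markers.items():
    # 1 where the tag occurs in the line, 0 otherwise.
    frame[column] = [int(tag in line) for line in sample]

print(frame)
```

Building the frame column by column also keeps the indicator columns as integers, whereas stacking mixed lists and transposing leaves every column with the generic `object` dtype.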


In [47]:
np.sum(conbeg)

1.0

In [48]:
np.sum(conend)

1.0
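The two sums above confirm that this file has exactly one constitution start and one constitution end. The same sanity check could be extended to all three marker pairs, as in the sketch below. The helper names `marker_counts` and `unbalanced_pairs` are hypothetical, and the sample is a deliberately truncated excerpt, so every pair is expected to come out unbalanced.

```python
PAIRS = [("CSTART", "CEND"), ("ASTART", "AEND"), ("SSTART", "SEND")]


def marker_counts(lines):
    """Count how many lines contain each start and end tag."""
    tags = [tag for pair in PAIRS for tag in pair]
    return {tag: sum(tag in line for line in lines) for tag in tags}


def unbalanced_pairs(counts):
    """Return the (start, end) pairs whose counts differ."""
    return [(start, end) for start, end in PAIRS if counts[start] != counts[end]]


# Truncated excerpt: the constitution, the second article and the section
# are opened but never closed.
excerpt = [
    "*** CSTART AL 1/1/1875 11/28/1901 ***",
    "*** ASTART  9001.0 AL 1875 ***",
    "PREAMBLE",
    "*** AEND ***",
    "*** ASTART 001.0 AL 1875 ***",
    "*** SSTART 001.0 001.0 0 AL 1875 ***",
]

counts = marker_counts(excerpt)
problems = unbalanced_pairs(counts)
print(counts)
print(problems)
```

On a complete file, an empty list from `unbalanced_pairs` would suggest that every opened unit is also closed, although equal counts alone do not prove the markers are properly nested.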