# Estrada SONAs

This notebook processes the collated State of the Nation Addresses (SONAs) of Joseph Estrada. Reminder: run the [Philippines SONA](https://github.com/pmagtulis/ph-sona.git) scraper first to generate the **merged** CSV file used here.

## Do all your imports

In [1]:
import pandas as pd
import numpy as np
import re
import altair as alt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
import stopwordsiso as stopwords

## Read CSV

In [2]:
merged = pd.read_csv('../csv/merged.csv')
merged

Unnamed: 0,president,date,title,link,venue,session,speech
0,Manuel L. Quezon,"November 25, 1935",Message to the First Assembly on National Defense,http://www.officialgazette.gov.ph/1935/11/25/m...,"Legislative Building, Manila","First National Assembly, First Session","Mr. Speaker, gentlemen of the National Assemb..."
1,Manuel L. Quezon,"June 16, 1936",On the Country’s Conditions and Problems,http://www.officialgazette.gov.ph/1936/06/16/m...,"Legislative Building, Manila","First National Assembly, First Session","Mr. Speaker, Gentlemen of the National Assemb..."
2,Manuel L. Quezon,"October 18, 1937","Improvement of Philippine Conditions, Philippi...",http://www.officialgazette.gov.ph/1937/10/18/m...,"Legislative Building, Manila","First National Assembly, Second Session","Mr. Speaker, Gentlemen of the National Assemb..."
3,Manuel L. Quezon,"January 24, 1938",Revision of the System of Taxation,http://www.officialgazette.gov.ph/1938/01/24/m...,"Legislative Building, Manila","First National Assembly, Third Session",Gentlemen of the National Assembly: The state...
4,Manuel L. Quezon,"January 24, 1939",The State of the Nation and Important Economic...,http://www.officialgazette.gov.ph/1939/01/24/m...,"Legislative Building, Manila","Second National Assembly, First Session",Gentlemen of the National Assembly: I take pl...
...,...,...,...,...,...,...,...
79,Rodrigo Roa Duterte,"July 23, 2018",Third State of the Nation Address,https://www.officialgazette.gov.ph/2018/07/23/...,"Batasang Pambansa, Quezon City","Seventeenth Congress, Third Session",Kindly sit down. Thank you for your courtesy....
80,Rodrigo Roa Duterte,"July 22, 2019",Fourth State of the Nation Address,https://www.officialgazette.gov.ph/2019/07/22/...,"Batasang Pambansa, Quezon City","Eighteenth Congress, First Session",Thank you. Kindly sit down. Kumusta po kayo...
81,Rodrigo Roa Duterte,"July 27, 2020",Fifth State of the Nation Address,https://www.officialgazette.gov.ph/2020/07/27/...,"Batasang Pambansa, Quezon City","Eighteenth Congress, Second Session",Kindly… Senate President Vicente Sotto III an...
82,Rodrigo Roa Duterte,"July 26, 2021",Sixth State of the Nation Address,https://www.officialgazette.gov.ph/2021/07/26/...,"Batasang Pambansa, Quezon City","Eighteenth Congress, Third Session",Kindly sit down. By far this is the most bea...


# Initial analysis

## regex

We are now ready for an **initial analysis** of the texts that we have. The examples below use **regex**.

An important note on this method: as used here, **str.contains** **ONLY** tells us *how many speeches* contain the word, *not how many times* the word was mentioned in each speech. Likewise, **str.extractall**, with the patterns used here, returns each matching passage once no matter how often the word recurs in it. We look at the actual word counts later, in the deeper analysis.
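To make the distinction in the note above concrete, here is a minimal sketch on made-up text (the three strings are hypothetical, not taken from the SONAs). It contrasts the speech-level flag from `str.contains` with a per-speech mention count from `str.count`, which is one possible way to count mentions and is not used elsewhere in this notebook.

```python
import re
import pandas as pd

# Hypothetical stand-in for the `speech` column
speeches = pd.Series([
    "The elite resist reform. The elite always do.",
    "No mention here.",
    "Elites and the people.",
])

pattern = r"\belite"

# Speech-level: does the word appear at all?
has_word = speeches.str.contains(pattern, case=False, regex=True)

# Mention-level: how many times does it appear in each speech?
mentions = speeches.str.count(pattern, flags=re.IGNORECASE)

print(has_word.sum())      # number of speeches containing the word
print(mentions.tolist())   # mentions per speech
```

The first number is what the `value_counts()` cells in this section report; the second is closer to what the vectorizer-based counts later in the notebook measure.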

The words searched here are drawn from peer-reviewed textual studies that gauge **populism.**

### 'elite'

Studies have found that populist leaders often use the word "elite". This initial analysis shows that three Philippine presidents (one of whom was **dictator** Ferdinand Marcos Sr.) used the word in their SONAs.

In [3]:
merged[merged.speech.str.contains(r"\belite", case=False, regex=True)].president.value_counts()

Ferdinand E. Marcos        2
Joseph Ejercito Estrada    1
Rodrigo Roa Duterte        1
Name: president, dtype: int64

In [4]:
pd.set_option('display.max_colwidth', None)
merged.speech.str.extractall(r'(.*\belite.+)', re.IGNORECASE)

Unnamed: 0_level_0,Unnamed: 1_level_0,0
Unnamed: 0_level_1,match,Unnamed: 2_level_1
31,0,"It is fortunate that the nation will, just two years from now, call a constitutional convention. I leave it to the delegates of that convention to evolve a truly democratic system, one which will not merely bend, as our system does today, to the wishes of a traditional elite and perpetuate the status quo. Democratic institutions must be instruments of national advancement. Democracy must symbolize change."
37,0,"Clearly, we face here the danger that our New Society is giving birth to a new government elite, who resurrect in our midst the privileges we fought in the past, who employ the powers of high office for their personal enrichment, as well as of their business colleagues, relatives, and friends."
60,0,"Our war on poverty is in the acceleration of the land redistribution processes under the agrarian reform program. We distributed more than 266,000 hectares of land to 175,000 landless farmers, including land owned by the traditional rural elite. []"
81,0,Great wealth enables economic elites and corporations to influence public policy to their advantage. Media is a powerful tool in the hands of oligarchs like the Lopezes who used their media outlets in their battles with political figures. I am a casualty of the Lopezes during the 2016 election.


### 'democracy' and 'demokrasya'

Dictator Ferdinand E. Marcos mentioned the word **"democracy"** in 10 of his SONAs, followed by Gloria Arroyo (7 of 9 SONAs). In Filipino, Benigno Aquino III mentioned **"demokrasya"** in two of his six speeches.

**Joseph Estrada**, whose term was cut short by a popular revolt in 2001, and **Rodrigo Duterte** each mentioned "democracy" in a single SONA.

In [5]:
# non-capturing group (?:...) avoids the pandas warning about match groups in str.contains
merged[merged.speech.str.contains(r"(?:.*\bdemocracy.+)", case=False, regex=True)].president.value_counts()

Ferdinand E. Marcos        10
Gloria Macapagal-Arroyo     7
Manuel L. Quezon            5
Corazon C. Aquino           5
Fidel V. Ramos              5
Ramon Magsaysay             4
Diosdado Macapagal          4
Manuel Roxas                3
Elpidio Quirino             3
Carlos P. Garcia            2
Joseph Ejercito Estrada     1
Rodrigo Roa Duterte         1
Name: president, dtype: int64

In [6]:
# non-capturing group (?:...) avoids the pandas warning about match groups in str.contains
merged[merged.speech.str.contains(r"(?:.*\bdemokrasya.+)", case=False, regex=True)].president.value_counts()

Benigno S. Aquino III      2
Ferdinand E. Marcos        1
Corazon C. Aquino          1
Gloria Macapagal-Arroyo    1
Name: president, dtype: int64

In [7]:
merged.speech.str.extractall(r'(.*\bdemocracy.+)', re.IGNORECASE).head(7)

Unnamed: 0_level_0,Unnamed: 1_level_0,0
Unnamed: 0_level_1,match,Unnamed: 2_level_1
1,0,"In our day and generation democracy, as an effective system of government, is being challenged. Let this new democracy of ours show to the world that democracy can be as efficient as a dictatorship, without trespassing upon individual liberty and the sacred rights of the people."
2,0,"Still more: The Filipino workingman has heard, if he is not able to read, of the equality before the law of the poor and the rich. He has heard of democracy, liberty, and justice, since every candidate for an elective office discourses on these topics, painting to him in glowing terms the meaning of these words."
2,1,"One of the discoveries which we have made since the establishment of the Government of the Commonwealth is that, despite the large number of children that have gone through our public schools, as shown in the reports of the Bureau of Education, the literacy of the Islands has not increased proportionally, and the knowledge of those rudimentary subjects which the citizen of a democracy should have, has not been acquired by a population corresponding to the number of children that appear to have entered the public schools. The reason for this is simple. A large proprtion of the boys and girls who have been admitted to the schools have not remained long enough to acquire any kind of useful knowledge."
2,2,"Gentlemen of the National Assembly, before closing, allow me to emphasize the need to of giving the common man in the Philippines the benefits that the citizenry of every progressive democracy is entitled to receive. I am sure that every one of you will give to this noble task the best that is in him. An opportunity has been offered us that no past or coming generation has had or will ever have –that of creating a nation where there will be no privileged class, where poverty will be unknown, where every citizen will be duly equipped with the knowledge that will enable him to perform his duties and to exercise his rights properly and conscientiously, and where every man, woman, and child his fireside will be thankful to God for living in this beautiful and blessed land."
3,0,"We are earnestly concerned with social justice. Without a strict application of social justice to all elements of the community, general satisfaction of the people with their government is impossible to achieve. Here, in the just and equitable solution of social problems, is the real test of the sufficiency of democracy to meet present-day conditions of society."
4,0,"As a final word respecting the Army, I want to urge you, once again, to give to all matters concerning our future security the earnest consideration their fundamental importance deserves. If eternal vigilance is the price of freedom, let us then be ceaselessly vigilant. Our defensive system requires no unusual sacrifice by any individual, but its success depends primarily and almost exclusively upon a unification of the efforts of all toward this common and vital purpose. To attain such unification in a democracy, the military plan must be supported by popular intelligence, confidence, and enthusiasm. It is a special function of Government to see that this confidence is fairly earned and assiduously sustained. To this end let us see to it that every law we pass and every military measure we adopt shall reflect an unselfish and national purpose, that it shall impose injustice on none, and that it shall promote the security and defend the peace, the possessions and the liberty of all."
4,1,"Gentlemen of the National Assembly, the world in which we live today is an entirely different world from that which we knew only a few years ago. Whereas before the World War, democracy was gaining ground everywhere, mankind is now divided into two great camps—those who believe in democracy and those who feel contempt for it as a completely discredited system of government. By our political education, by our convictions and by our inclinations, we are a democracy. We have established a democratic system of government and the perpetuation of this system will depend upon our ability to convince our people that democracy can be freed from those vices which have destroyed it in some countries, and that it can be made as efficient as any other system of government known to man. It behooves us; therefore, to prove that through a wise use of democratic processes, the welfare and the safety of the people can be promoted, thus contributing our share to the preservation of democracy in the world."


In [8]:
merged.speech.str.extractall(r'(.*\bdemokrasya.+)', re.IGNORECASE).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,0
Unnamed: 0_level_1,match,Unnamed: 2_level_1
40,0,"Nasa harap ng kapulungang ito ngayon ang katipunan ng mga hamon at pagsubok sa nakalipas na mga Kongreso, at ito na sana ang pangwakas na pagsubok kung makakaya natin gamitin ang demokrasya bilang mabisang sangkap ng katatagan at kaunlarang pambansa. Bagaman at kailangan pa ring magpatuloy ang pansamantalang pamahalaan, taglay ng kapulungang ito ang binhi ng matatag at masiglang lehislaturang tutugon sa ating pangangailangan kung ihahandog natin dito ang lahat ng ating talino at kakayahan."
40,1,"Tayo ngayon ay isang bansang pinalakas ng mga pagsubok na ating pinagdaanan, higit na nagkakaisa pagkaraan ng mga sigalutang dinanas, at higit na handa sa anumang uri ng pagsubok at suliranin. Natapos nating lampasan ang mahihigpit na balakid sa nakaraang lima-at-kalahating taon. Sa liwanag ng makabuluhang yugtong ito ng ating buhay bilang bansa at lahi, magagawa natin ang ating tungkuling pagtahak sa landas ng katuparan ng ating matayog na pangarap na pag-unlad, pagkakapantay-pantay, at ng tunay na demokrasya."
51,0,Binigyang buhay ng mga Kabisig nating ito ang diwa ng ating Saligang Batas; binigyang halimbawa nila ang tunay na kahulugan ng demokrasya.
51,1,"May katiyakan ang ating tagumpay kung tayo’y magkakaisa. Kung kaya’t hinihimok ko kayo—kagalang-galang na mga Senador, Kongresista, at ang iba pang mga pinuno ng bayan—na muli tayong manumpa sa pangarap na nagbigkis sa atin noong 1986: ibalik at panatiliin ang demokrasya, kalayaan, karapatan, katatagang pangkabuhayan, at katarungang panlipunan."
65,0,"Pinapangako ko ang isang bagong direksyon: mamamayan muna. Ang taong bayan ang pinakamalaki nating yaman. Ngunit madalas, kaunti lang ang atensyon na binibigay sa kanilang pag-unlad. Di tuloy matawid ang agwat ng mayaman at mahirap. Di tuloy mapa-abot sa lahat ang biyaya ng demokrasya."


## Segregating by president

We create a separate dataframe for each of a select number of presidents so their speeches can be analyzed individually.

In [3]:
#Post-martial law
cory = merged[(merged['president'] == 'Corazon C. Aquino')] #Cory Aquino
ramos = merged[(merged['president'] == 'Fidel V. Ramos')] #Fidel Ramos
aquino = merged[(merged['president'] == 'Benigno S. Aquino III')] #Aquino
duterte = merged[(merged['president'] == 'Rodrigo Roa Duterte')] #Duterte
erap = merged[(merged['president'] == 'Joseph Ejercito Estrada')] #Erap
arroyo = merged[(merged['president'] == 'Gloria Macapagal-Arroyo')] #Arroyo
marcosjr = merged[(merged['president'] == 'Ferdinand R. Marcos Jr.')] #Marcos Jr.

marcos = merged[(merged['president'] == 'Ferdinand E. Marcos')] #Marcos Sr.

# Pre-martial law
macapagal = merged[(merged['president'] == 'Diosdado Macapagal')] #Diosdado Macapagal
garcia = merged[(merged['president'] == 'Carlos P. Garcia')] #Carlos Garcia
magsaysay = merged[(merged['president'] == 'Ramon Magsaysay')] #Ramon Magsaysay
quirino = merged[(merged['president'] == 'Elpidio Quirino')] #Elpidio Quirino

## Isolate 'Erap' speeches

The merged file contains all speeches by Philippine presidents since 1935, so we filter it down to Estrada's.

In [4]:
erap = merged[(merged['president'] == 'Joseph Ejercito Estrada')] #Erap

## Text analysis

Now we can proceed with the text analysis proper. First, we set the parameters in the cell immediately below, most importantly the stopwords we want the analysis to disregard.

In [5]:
def preprocess_text(text):
    text = text.lower() #lowercases the text
    text = re.sub(r'\d+', '', text) #removes all numbers
    return text

In [6]:
y_columns = ['president', 'speeches']
BINARY=False
NGRAM_RANGE=(1,1)
MIN_DF=1 #newer scikit-learn releases may reject an integer min_df of 0; 1 is the default and likewise keeps every term
STPWORDS=stopwords.stopwords(["en", 'tl']) #English and Tagalog stopwords
STPWORDS.update(['yung', 'iyan', 'yan', 'diyan', 'applause', 'laughter', 'palakpakan', 'rin', 'din', 'po',
                'pong', 'pang', 'pa', 'nang', 'ng', 'pag',
                'kapag', 'nga', 'naman', 'natin', 'kayo',
                'nating', 'tayong', 'lang']) #adds more Tagalog stopwords and transcript annotations not included in the package

vectorizer = CountVectorizer(
    stop_words=list(STPWORDS), #a list, since newer scikit-learn releases may not accept a set
    ngram_range=NGRAM_RANGE,
    binary=BINARY,
    min_df=MIN_DF,
    preprocessor=preprocess_text
)

## Vectorizing

Here we simply count how many times each word occurs in each speech.

In [7]:
X = vectorizer.fit_transform(erap['speech'])
X



<3x3280 sparse matrix of type '<class 'numpy.int64'>'
	with 4173 stored elements in Compressed Sparse Row format>

In [8]:
erap_vectors = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())
erap_vectors

Unnamed: 0,aangat,abdication,ability,abolish,abolished,abraham,abril,abruptly,absolute,abstention,...,yaman,yata,yield,yo,yon,youth,yugto,yumaman,zone,zooming
0,0,0,0,1,0,0,0,0,0,0,...,1,0,0,0,0,0,1,1,0,0
1,1,0,0,0,1,1,2,1,0,0,...,0,1,0,2,0,1,0,0,0,1
2,0,1,2,0,0,1,0,0,1,1,...,0,0,1,0,1,0,0,0,2,0


In [9]:
erap_vectors = erap_vectors.transpose() #swapping columns and row positions

In [12]:
erap_vectors.columns = ['SONA1', 'SONA2', 'SONA3']
erap_vectors.sort_values('SONA3', ascending=False).head(20)

Unnamed: 0,SONA1,SONA2,SONA3
government,18,21,40
peace,3,5,27
mindanao,1,3,26
percent,17,2,18
country,4,15,15
economy,2,5,15
philippines,1,10,14
war,0,15,13
congress,5,5,13
development,1,8,12


## Add a 'total' mention column

This step is optional: it adds up the number of mentions of each word across the three SONAs.

In [17]:
erap_vectors['total'] = erap_vectors.SONA1 + erap_vectors.SONA2 + erap_vectors.SONA3 


In [18]:
erap_vectors = erap_vectors.sort_values('total', ascending=False)
erap_vectors.head(15)

Unnamed: 0,SONA1,SONA2,SONA3,total
government,18,21,40,79
percent,17,2,18,37
peace,3,5,27,35
country,4,15,15,34
poverty,0,27,4,31
mindanao,1,3,26,30
war,0,15,13,28
philippines,2,11,15,28
congress,6,5,14,25
time,4,16,5,25
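As an alternative sketch, the same total can be computed with a row-wise sum, which avoids listing each column by hand if more speeches are added. The two rows below are copied from the output above; the `toy` and `sona_cols` names are introduced here only for illustration.

```python
import pandas as pd

# Two rows copied from the table above, in the same term-by-speech layout
toy = pd.DataFrame(
    {"SONA1": [3, 0], "SONA2": [5, 27], "SONA3": [27, 4]},
    index=["peace", "poverty"],
)

# Pick up every SONA column, then sum across them for each term
sona_cols = [c for c in toy.columns if c.startswith("SONA")]
toy["total"] = toy[sona_cols].sum(axis=1)

print(toy.sort_values("total", ascending=False))
```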


# TF-IDF

## Erap speeches

In [19]:
vectorizer = TfidfVectorizer(
    stop_words=list(STPWORDS), #a list, since newer scikit-learn releases may not accept a set
    ngram_range=NGRAM_RANGE,
    binary=BINARY,
    min_df=MIN_DF,
    preprocessor=preprocess_text
)
X = vectorizer.fit_transform(erap['speech'])
erap_idf = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())
erap_idf.round(2)



Unnamed: 0,aangat,abdication,ability,abolish,abolished,abraham,abril,abruptly,absolute,abstention,...,yield,yo,yon,youth,youtube,yugto,yumaman,zone,zooming,zyrjgxpyo
0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.04,0.02,0.02,0.0,0.0,0.02
1,0.01,0.0,0.0,0.0,0.01,0.01,0.03,0.01,0.0,0.0,...,0.0,0.03,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0
2,0.0,0.01,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.01,...,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.0
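For readers unfamiliar with how these scores are produced, the sketch below rebuilds TF-IDF by hand on three made-up documents and compares it with `TfidfVectorizer`. It assumes scikit-learn's default settings (raw counts as term frequency, smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, then L2 normalisation of each row); the notebook's own cell does not override these defaults.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical documents standing in for the three speeches
docs = ["peace peace war", "peace economy", "economy budget budget"]

# Term frequency: raw counts per document
counts = CountVectorizer().fit_transform(docs).toarray().astype(float)

n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)               # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1   # smoothed inverse document frequency

manual = counts * idf
manual /= np.linalg.norm(manual, axis=1, keepdims=True)  # L2-normalise each row

auto = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(manual, auto))
```

Because each row is normalised separately, scores are best compared within a speech; a word used in every speech gets the lowest possible IDF weight, which is why common words fall down the ranking relative to raw counts.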


In [20]:
erap_idf2 = erap_idf.transpose()
erap_idf2.columns = ['SONA1', 'SONA2', 'SONA3']

In [21]:
erap_idf2.sort_values('SONA1', ascending=False).head(15)

Unnamed: 0,SONA1,SONA2,SONA3
government,0.210256,0.181823,0.318928
percent,0.198575,0.017316,0.143517
national,0.151851,0.043291,0.039866
pilipino,0.138442,0.0,0.0
wala,0.138442,0.0,0.0
upang,0.12849,0.017316,0.007973
galang,0.118665,0.0,0.0
kagalang,0.118665,0.0,0.0
lalo,0.105289,0.011149,0.0
budget,0.105128,0.069266,0.015946


## Looking for specific words

In this part, we look for specific words that we think left a mark on Estrada's SONAs, whether because they were mentioned often or because it is unusual for the Chief Executive to say them.

We also include words that we think were said because they were topical at the time the speech was delivered.

In [22]:
# erap_slice = erap_idf[['mahirap', 'government']] # you can change this
# erap_slice.sort_index().round(decimals=2)

In [23]:
# erap_slice = erap_slice.stack().reset_index()
# erap_slice = erap_slice.rename(columns={'level_0': 'sona_no', 'level_1': 'term', 0: 'tfidf'})
# erap_slice.head()

In [24]:
# top_tfidf = erap_slice.sort_values(by=['sona_no','tfidf'], ascending=[True,False]).groupby(['sona_no']).head(10)
# top_tfidf.head()

## Chart it

In [25]:
# # # Terms in this list will get a red dot in the visualization
# term_list = ['boss', 'wangwang'] # you can change this

# # adding a little randomness to break ties in term ranking
# top_tfidf_plusRand = top_tfidf.copy()
# top_tfidf_plusRand['tfidf'] = top_tfidf_plusRand['tfidf'] + np.random.rand(top_tfidf.shape[0])*0.0001

# # base for all visualizations, with rank calculation
# base = alt.Chart(top_tfidf_plusRand).encode(
#     x = 'rank:O',
#     y = 'sona_no:N'
# ).transform_window(
#     rank = "rank()",
#     sort = [alt.SortField("tfidf", order="descending")],
#     groupby = ["sona_no"],
# )

# # heatmap specification
# heatmap = base.mark_rect().encode(
#     color = 'tfidf:Q'
# )

# # red circle over terms in above list
# circle = base.mark_circle(size=100).encode(
#     color = alt.condition(
#         alt.FieldOneOfPredicate(field='term', oneOf=term_list),
#         alt.value('red'),
#         alt.value('#FFFFFF00')        
#     )
# )

# # text labels, white for darker heatmap colors
# text = base.mark_text(baseline='middle').encode(
#     text = 'term:N',
#     color = alt.condition(alt.datum.tfidf >= 0.23, alt.value('white'), alt.value('black'))
# )

# # display the three superimposed visualizations
# (heatmap + circle + text).properties(width = 600, height=400)

## Entire SONAs

Here, we do the same thing for the full text of each SONA, *without* isolating key words.

In [26]:
erap_idf = erap_idf.stack().reset_index()
erap_idf

Unnamed: 0,level_0,level_1,0
0,0,aangat,0.000000
1,0,abdication,0.000000
2,0,ability,0.000000
3,0,abolish,0.019777
4,0,abolished,0.000000
...,...,...,...
9883,2,yugto,0.000000
9884,2,yumaman,0.000000
9885,2,zone,0.027000
9886,2,zooming,0.000000


In [27]:
erap_idf = erap_idf.rename(columns={'level_0': 'sona_no','level_1': 'term', 0: 'tfidf'})
erap_idf

Unnamed: 0,sona_no,term,tfidf
0,0,aangat,0.000000
1,0,abdication,0.000000
2,0,ability,0.000000
3,0,abolish,0.019777
4,0,abolished,0.000000
...,...,...,...
9883,2,yugto,0.000000
9884,2,yumaman,0.000000
9885,2,zone,0.027000
9886,2,zooming,0.000000


In [28]:
all_erap = erap_idf.sort_values(by=['sona_no','tfidf'], ascending=[True,False]).groupby(['sona_no']).head(10)
all_erap.head()

Unnamed: 0,sona_no,term,tfidf
1120,0,government,0.210256
2219,0,percent,0.198575
1976,0,national,0.151851
2244,0,pilipino,0.138442
3233,0,wala,0.138442
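The reshaping in the last three cells (wide to long, then the top terms per speech) can be sketched on a tiny made-up table. The scores below are hypothetical, and `head(2)` stands in for the notebook's `head(10)`.

```python
import pandas as pd

# Hypothetical TF-IDF scores: rows are speeches, columns are terms
wide = pd.DataFrame(
    [[0.2, 0.7, 0.1], [0.6, 0.0, 0.4]],
    columns=["budget", "peace", "war"],
)

# Wide to long: one row per (speech, term) pair
long = (
    wide.stack()
    .reset_index()
    .rename(columns={"level_0": "sona_no", "level_1": "term", 0: "tfidf"})
)

# Sort within each speech by score, then keep the best two terms per speech
top2 = (
    long.sort_values(["sona_no", "tfidf"], ascending=[True, False])
    .groupby("sona_no")
    .head(2)
)
print(top2)
```

Sorting before `groupby(...).head(n)` is what makes `head` return the highest-scoring terms rather than the first terms alphabetically.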


In [29]:
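# Terms in this list will get a red dot in the visualization.
# Change these to terms relevant to Estrada's speeches; a term is only marked if it is in a speech's top 10.
term_list = ['boss', 'wangwang']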

# adding a little randomness to break ties in term ranking
all_erap_plusRand = all_erap.copy()
all_erap_plusRand['tfidf'] = all_erap_plusRand['tfidf'] + np.random.rand(all_erap.shape[0])*0.0001

# base for all visualizations, with rank calculation
base = alt.Chart(all_erap_plusRand).encode(
    x = 'rank:O',
    y = 'sona_no:N'
).transform_window(
    rank = "rank()",
    sort = [alt.SortField("tfidf", order="descending")],
    groupby = ["sona_no"],
)

# heatmap specification
heatmap = base.mark_rect().encode(
    color = 'tfidf:Q'
)

# red circle over terms in above list
circle = base.mark_circle(size=100).encode(
    color = alt.condition(
        alt.FieldOneOfPredicate(field='term', oneOf=term_list),
        alt.value('red'),
        alt.value('#FFFFFF00')        
    )
)

# text labels, white for darker heatmap colors
text = base.mark_text(baseline='middle').encode(
    text = 'term:N',
    color = alt.condition(alt.datum.tfidf >= 0.23, alt.value('white'), alt.value('black'))
)

# display the three superimposed visualizations
(heatmap + circle + text).properties(width = 600, height=400)
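The small random jitter in the chart cell exists because tied scores would otherwise share a rank and be drawn in the same cell. The sketch below imitates this in pandas, assuming that the chart's `rank()` window gives tied values the same rank (which `method="min"` mimics here). The three rows are copied from the SONA1 output earlier; the seeded random generator is an illustrative choice.

```python
import numpy as np
import pandas as pd

# Three SONA1 rows from the TF-IDF table above; the first two are tied
toy = pd.DataFrame({
    "sona_no": [0, 0, 0],
    "term": ["pilipino", "wala", "upang"],
    "tfidf": [0.138442, 0.138442, 0.128490],
})

# Without jitter: the tied terms share rank 1 and no term gets rank 2
tied = toy.groupby("sona_no")["tfidf"].rank(method="min", ascending=False)

# With jitter far smaller than any real score gap, every term gets its own rank
rng = np.random.default_rng(0)
jittered = toy["tfidf"] + rng.random(len(toy)) * 0.0001
unique = jittered.groupby(toy["sona_no"]).rank(method="min", ascending=False)

print(tied.tolist())
print(sorted(unique.tolist()))
```

Which of the two tied terms ends up first is arbitrary, so their order in the chart may change between runs unless the random seed is fixed.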