# Text Summarization and Natural Language Generation Assignment

In [1]:
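# !pip install markovify
# gensim.summarization was removed in Gensim 4.0, so this notebook needs gensim < 4.0.
# !pip install "gensim<4.0"

import re

import markovify
import nltk
import requests
from bs4 import BeautifulSoup
from gensim.summarization import summarize
from nltk import sent_tokenize

# sent_tokenize and nltk.pos_tag need NLTK data; uncomment on first run.
# (Resource names can differ between NLTK versions; these are the classic ones.)
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')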

### Scrape and clean the text from the 3 Presidential State of the Union Address URLs below and save them into a list.

In [2]:
lincoln = 'https://en.wikisource.org/wiki/Abraham_Lincoln%27s_First_State_of_the_Union_Address'
roosevelt = 'https://en.wikisource.org/wiki/Theodore_Roosevelt%27s_First_State_of_the_Union_Address'
obama = 'https://en.wikisource.org/wiki/Barack_Obama%27s_Second_State_of_the_Union_Address'

In [3]:
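# Download the page and parse the HTML.
response = requests.get(lincoln)
content = response.text

soup = BeautifulSoup(content, 'lxml')
# The body of the address is taken from <div class="mw-parser-output">.
article = soup.find_all('div', class_='mw-parser-output')
text = [tag.get_text() for tag in article]
# Caution: dropping newlines without inserting a space fuses adjacent paragraphs,
# which can hide sentence boundaries from sent_tokenize later on.
lincoln_text = text[0].replace('\n', '')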

In [4]:
# Same scrape-and-clean steps for Roosevelt's address.
response = requests.get(roosevelt)
content = response.text

soup = BeautifulSoup(content, 'lxml')
article = soup.find_all('div', class_='mw-parser-output')
text = [tag.get_text() for tag in article]
roosevelt_text = text[0].replace('\n', '')

In [5]:
# Same scrape-and-clean steps for Obama's address.
response = requests.get(obama)
content = response.text

soup = BeautifulSoup(content, 'lxml')
article = soup.find_all('div', class_='mw-parser-output')
text = [tag.get_text() for tag in article]
obama_text = text[0].replace('\n', '')
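
The three scraping cells differ only in the URL, and `replace('\n', '')` joins paragraphs with nothing between them, which is why fused tokens such as `panels.But` show up in the Obama summary further down. A minimal sketch of one alternative, assuming a hypothetical helper named `clean_text` that is not part of the original notebook: collapse every run of whitespace into a single space so sentence boundaries survive. The sample string is a toy stand-in for the output of `tag.get_text()`.

```python
import re

def clean_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()

# Toy input standing in for the text returned by tag.get_text().
raw = "making solar panels.\nBut to create more jobs,\n\nwe need more production."

fused = raw.replace("\n", "")   # what the notebook cells do
spaced = clean_text(raw)        # whitespace-preserving alternative

print(fused)
print(spaced)
```

Switching the notebook to this kind of cleaning would change the summaries and generated sentences, so the recorded outputs would no longer match.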

### For each State of the Union Address, use the Gensim `summarize` function and print a summary of each address approximately 200 words long.

In [6]:
print(summarize(lincoln_text, word_count=200))

I am informed by some whose opinions I respect that all the acts of Congress now in force and of a permanent and general nature might be revised and rewritten so as to be embraced in one volume (or at most two volumes) of ordinary and convenient size; and I respectfully recommend to Congress to consider of the subject, and if my suggestion be approved to devise such plan as to their wisdom shall seem most proper for the attainment of the end proposed.
But the powers of Congress, I suppose, are equal to the anomalous occasion, and therefore I refer the whole matter to Congress, with the hope that a plan may be devised for the administration of justice in all such parts of the insurgent States and Territories as may be under the control of this Government, whether by a voluntary return to allegiance and order or by the power of our arms; this, however, not to be a permanent institution, but a temporary substitute, and to cease as soon as the ordinay courts can be reestablished in peace.


In [7]:
print(summarize(roosevelt_text, word_count=200))

Just how far this is must be determined according to the individual case, remembering always that every application of our tariff policy to meet our shifting national needs must be conditioned upon the cardinal fact that the duties must never be reduced below the point that will cover the difference between the labor cost here and abroad.
The Congressmen who voted years in advance the money to lay down the ships, to build the guns, to buy the armor-plate; the Department officials and the business men and wage-workers who furnished what the Congress had authorized; the Secretaries of the Navy who asked for and expended the appropriations; and finally the officers who, in fair weather and foul, on actual sea service, trained and disciplined the crews of the ships when there was no war in sight—all are entitled to a full share in the glory of Manila and Santiago, and the respect accorded by every true American to those who wrought such signal triumph for our country.


In [8]:
print(summarize(obama_text, word_count=200))

(Applause.)Now, as we stabilized the financial system, we also took steps to get our economy growing again, save as many jobs as possible, and help Americans who had become unemployed.
(Applause.)  And to encourage these and other businesses to stay within our borders, it is time to finally slash the tax breaks for companies that ship our jobs overseas, and give those tax breaks to companies that create jobs right here in the United States of America.
You can see the results of last year's investments in clean energy -– in the North Carolina company that will create 1,200 jobs nationwide helping to make advanced batteries; or in the California business that will put a thousand people to work making solar panels.But to create more of these clean energy jobs, we need more production, more efficiency, more incentives.
(Applause.)  So tonight, we set a new goal:  We will double our exports over the next five years, an increase that will support two million jobs in America.
(Applause.)  Thi
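
Most lines of the Obama summary start with an `(Applause.)` transcript marker, which takes up part of the 200-word budget without adding content. A minimal sketch of stripping such markers before summarizing, assuming a hypothetical helper `strip_reactions` and assuming that only `(Applause.)` and `(Laughter.)` need handling; other markers in the transcript would need their own alternatives in the pattern.

```python
import re

# Assumption: only these two reaction markers are removed.
REACTION = re.compile(r"\((?:Applause|Laughter)\.\)\s*")

def strip_reactions(text: str) -> str:
    """Remove markers such as '(Applause.)' and tidy the leftover spaces."""
    without = REACTION.sub(" ", text)
    return re.sub(r"\s{2,}", " ", without).strip()

sample = "(Applause.)  So tonight, we set a new goal:  We will double our exports."
print(strip_reactions(sample))
```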

### Sentence tokenize each address and save the tokenized sentences to a separate list.

In [9]:
lincoln_sents = sent_tokenize(lincoln_text)
roosevelt_sents = sent_tokenize(roosevelt_text)
obama_sents = sent_tokenize(obama_text)

### Train a Markov chain model for each tokenized address and generate 5 sentences based on the language used for each one.

In [10]:
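# state_size=4: each state is a sequence of four consecutive words.
model = markovify.Text(lincoln_sents, state_size=4)
for _ in range(5):
    # make_short_sentence returns None if no sentence within the length limits
    # is found after `tries` attempts.
    print(model.make_short_sentence(max_chars=200,
                                    min_chars=30,
                                    tries=100), '\n')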

If useful, no State should be denied them; if not useful, no State should be denied them; if not useful, no State should have them. 

It is gratifying to know that the patriotism of the people has placed at the disposal of the Government the whole of their limited acquisitions. 

If useful, no State should be denied them; if not useful, no State should be denied them; if not useful, no State should have them. 

Since, however, it is apparent that the attention of Congress to our great lakes and rivers. 

I respectfully recommend to the consideration of Congress the interests of the District of Columbia. 



In [11]:
model = markovify.Text(roosevelt_sents, state_size=4)
for i in range(5):
    print(model.make_short_sentence(max_chars=500,
                                    min_chars=30,
                                    tries=500), '\n')    

To be permanently effective, aid must always take the form of the acquisition of territory by any non-American power. 

In all industries carried on directly or indirectly for the United States to assist in this work. 

Already the largest single collection of books on the Western Hemisphere, and give them an increasing sense of unity. 

Their ability to purchase our products should as far as possible be repaid by the land reclaimed. 

The National Government should be to aid irrigation in the several States and Territories where they may add materially to our resources. 



In [12]:
model = markovify.Text(obama_sents, state_size=4)
for i in range(5):
    print(model.make_short_sentence(max_chars=200,
                                    min_chars=30,
                                    tries=100), '\n')    

And according to the Congressional Budget Office -– the independent organization that both parties have fed divisions that are deeply entrenched. 

You can see the results of last year's investments in clean energy because they want those jobs. 

And according to the Congressional Budget Office -– the independent organization that both parties have fed divisions that are deeply entrenched. 

The steps we took last year to shore up the same banks that helped cause this crisis. 

And according to the Congressional Budget Office -– the independent organization that both parties have fed divisions that are deeply entrenched. 
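
Several of the generated sentences above repeat verbatim. One likely reason is that with `state_size=4` on a single speech, most four-word states have only one possible next word, so the chain rarely gets to choose. The sketch below illustrates this on a toy corpus in plain Python. It is not markovify's internal implementation (it ignores the begin and end markers markovify adds, for example), and `branching_states` is a hypothetical helper name.

```python
from collections import defaultdict

def branching_states(sentences, state_size):
    """Return (total states, states with more than one distinct successor)."""
    successors = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            successors[state].add(words[i + state_size])
    branching = sum(1 for nxt in successors.values() if len(nxt) > 1)
    return len(successors), branching

# Toy corpus standing in for one tokenized address.
corpus = [
    "we will create jobs in America",
    "we will create more clean energy jobs",
    "we will double our exports",
]

for size in (1, 2, 4):
    total, branching = branching_states(corpus, size)
    print(f"state_size={size}: {total} states, {branching} with a choice")
```

On this toy corpus the number of states that offer a choice drops to zero at `state_size=4`, so a chain could only replay the original word sequences. A smaller `state_size` or a larger corpus gives the model more places to branch.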



### Add part of speech tags to the Markov chain model and regenerate 5 sentences for each address.

In [13]:
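class POSifiedText(markovify.Text):
    """Markov model whose tokens carry their part-of-speech tag."""

    def word_split(self, sentence):
        # Split into words, then store each token as "word::TAG".
        words = re.split(self.word_split_pattern, sentence)
        words = ["::".join(tag) for tag in nltk.pos_tag(words)]
        return words

    def word_join(self, words):
        # Drop the "::TAG" suffix when turning tokens back into a sentence.
        sentence = " ".join(word.split("::")[0] for word in words)
        return sentence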
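
The class relies on a simple round trip: `word_split` encodes each token as `word::TAG`, so the chain's states distinguish the same word used as different parts of speech, and `word_join` strips the tag again for display. A small sketch of that round trip, using hand-written tags as a stand-in for `nltk.pos_tag` output (the tags shown are illustrative assumptions, not recorded tagger output):

```python
# Hand-written (word, tag) pairs standing in for the output of nltk.pos_tag.
tagged = [("I", "PRP"), ("respectfully", "RB"), ("recommend", "VBP")]

# Same encoding as word_split: each token becomes "word::TAG".
encoded = ["::".join(pair) for pair in tagged]

# Same decoding as word_join: keep only the part before "::".
decoded = " ".join(token.split("::")[0] for token in encoded)

print(encoded)
print(decoded)
```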

In [14]:
model = POSifiedText(lincoln_sents, state_size=4)

for i in range(5):
    print(model.make_short_sentence(max_chars=200,
                                    min_chars=30,
                                    tries=100), '\n')    

I respectfully recommend to the consideration of Congress the interests of the District of Columbia. 

I respectfully recommend to the consideration of Congress the interests of the District of Columbia. 

If useful, no State should be denied them; if not useful, no State should be denied them; if not useful, no State should have them. 

In the exercise of my best discretion I have adhered to the act of Congress to confiscate property used for insurrectionary purposes. 

I respectfully recommend to the consideration of Congress the interests of the District of Columbia. 



In [15]:
model = POSifiedText(roosevelt_sents, state_size=4)

for i in range(5):
    print(model.make_short_sentence(max_chars=200,
                                    min_chars=30,
                                    tries=100), '\n')    

There should be a continuous reduction in the number of very large individual, and especially of very large corporate, fortunes. 

It is just that the great agricultural population should share in the improvement of the Shanghai River and the control of its navigation. 

The administration of these islands should be as wholly free from the bitter animosities incident to public life. 

There should be a continuous reduction in the number of very large individual, and especially of very large corporate, fortunes. 

The conditions of modern war are such as to offer great prizes as the rewards of success. 



In [16]:
model = POSifiedText(obama_sents, state_size=4)

for i in range(5):
    print(model.make_short_sentence(max_chars=200,
                                    min_chars=30,
                                    tries=100), '\n')    

There are projects like that all across this country have faced this year. 

You can see the results of last year's investments in clean energy because they want those jobs. 

The steps we took last year to shore up the same banks that helped cause this crisis. 

You can see the results of last year's investments in clean energy because they want those jobs. 

The steps we took last year to shore up the same banks that helped cause this crisis. 
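
Because each call to `make_short_sentence` is independent, nothing stops the loop from printing the same sentence more than once, and a failed attempt prints `None`. A minimal sketch of collecting distinct sentences instead, assuming a hypothetical helper `unique_sentences`; the `fake_outputs` iterator is a stand-in for a trained model so the example runs without markovify.

```python
from itertools import cycle

def unique_sentences(generate, n, max_attempts=100):
    """Call generate() until n distinct non-None results are collected."""
    seen = []
    for _ in range(max_attempts):
        sentence = generate()
        if sentence is not None and sentence not in seen:
            seen.append(sentence)
        if len(seen) == n:
            break
    return seen

# Stand-in for model.make_short_sentence: cycles through fixed outputs,
# including a repeat and a None (a failed generation).
fake_outputs = cycle(["A b c.", "A b c.", None, "D e f.", "G h i."])
result = unique_sentences(lambda: next(fake_outputs), n=3)
print(result)
```

With a real model, `generate` could be something like `lambda: model.make_short_sentence(max_chars=200, min_chars=30, tries=100)`. The function may return fewer than `n` sentences if the model cannot produce enough distinct ones within `max_attempts`.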

