# Introduction

In [1]:
# Warnings
import warnings
warnings.filterwarnings('ignore')

# BEGIN: fix Python or Notebook SSL CERTIFICATE_VERIFY_FAILED
import os, ssl
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None)):
    ssl._create_default_https_context = ssl._create_unverified_context
# END: fix Python or Notebook SSL CERTIFICATE_VERIFY_FAILED

## Installing prerequisite libraries
* https://pypi.org/project/bert-extractive-summarizer/

In [2]:
!pip -q install sumy transformers sentencepiece



### Import libraries

In [3]:
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.summarizers.lex_rank import LexRankSummarizer

In [4]:
content = [
    "Text_Summarize_Text/Technology_and_Engineering.txt", #  Introduction
    "Text_Summarize_Text/Enough_With_the_Trolley_Problem.txt",  # Introduction to Ethics
    "Text_Summarize_Text/How_Algorithms_Can_Learn_to_Discredit.txt", # Disinformation
    "Text_Summarize_Text/A_Framework_for_Understanding_Sources of_Harm_throughout_the_Machine_Learning_LifeCycle.txt", # Bias/Fairness
    "Text_Summarize_Text/Your_apps_know_where_you_were_last_night.txt",  # Privacy
    "Text_Summarize_Text/The_fundamental_problem_with_Silicon_Valleys_favorite_growth_strategy.txt", # Technological Colonialism
    "Text_Summarize_Text/Computing_in_the_Image_of_God.txt" # Wrapping up
]

output_sentences_count = 10

with open(content[5], "r", encoding="utf-8") as f: # open(r'C:\Users\...site_1.html', "r") as f:
    article = f.read()  
    
# article

In [5]:
my_parser = PlaintextParser.from_string(article, Tokenizer('english'))

# Creating a summary of `output_sentences_count` sentences.
lex_rank_summarizer = LexRankSummarizer()
lexrank_summary = lex_rank_summarizer(my_parser.document, sentences_count = output_sentences_count)

# Printing the summary
for sentence in lexrank_summary:
    print(sentence)

Would these companies now be profitable instead of hemorrhaging billions of dollars a year?
So we set out to democratize the web, with the premise that everyone who had a browser should also be able to have a server.
Even though Uber had far more money, the price war between the two companies cost Uber far more in markets where its share was large and Lyft’s was small.
While Hoffman and Yeh’s book claims that companies like Google, Facebook, Microsoft, Apple, and Amazon are icons of the blitzscaling approach, this idea is plausible only with quite a bit of revisionist history.
But is a business really a business if it can’t pay its own way?
They can build a business that they love, like I did, and continue to operate it for many years.
In our earlier funds, though, we were trying to shoehorn these companies into a traditional venture model when what we really needed was a new approach to financing.
While Google has continued to grow the business for its partners, the company has grown 
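
LexRank, used in the cell above, ranks sentences by how central they are in a graph of sentence similarities. The following is a minimal sketch of that idea, not sumy's implementation: the function names, the naive sentence splitter, the raw term-frequency cosine (LexRank itself uses TF-IDF weights), and the `threshold` and `damping` values are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexrank_sketch(text, sentences_count=2, threshold=0.1, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]

    # Connect two sentences when their similarity exceeds the threshold.
    adj = [[1.0 if i != j and cosine(bags[i], bags[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]

    # PageRank-style power iteration over the similarity graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] / degree[j] for j in range(n) if adj[j][i])
            for i in range(n)
        ]

    # Keep the highest-scoring sentences, returned in document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:sentences_count]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no vocabulary with the rest of the text has no edges in the graph, so it only ever receives the baseline score and is the last to be selected.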

## LSA (Latent semantic analysis)

In [6]:
from sumy.summarizers.lsa import LsaSummarizer

# creating the summarizer
lsa_summarizer = LsaSummarizer()
lsa_summary = lsa_summarizer(my_parser.document, sentences_count = output_sentences_count)

# Printing the summary
for sentence in lsa_summary:
    print(sentence)

They also persuaded most customers to make those payments via direct bank transfers, which have much lower fees than credit cards.
Too often, though, those passing new laws have given insufficient thought to their implementation, leaving existing bureaucratic processes in place.
So we’re asking ourselves, is it enough to show what’s possible and hope that other do the right thing?
And even though we abandoned these two opportunities when the blitzscalers arrived, O’Reilly Media has had enormous financial success, becoming a leader in each of our chosen markets.
To too many investors and entrepreneurs, it means building companies that achieve massive financial exits, either by being sold or going public.
Funding dries up, and companies that could have built a sustainable niche if they’d grown organically go out of business instead.
Peter Rahal, one of the two founders, recalls that he and co-founder Jared Smith were in his parents’ kitchen, discussing how they would go about raising cap

## Luhn summarization

The Luhn algorithm scores sentences by the frequency of the significant words they contain, an approach closely related to TF-IDF (Term Frequency-Inverse Document Frequency).

In [7]:
from sumy.summarizers.luhn import LuhnSummarizer

#  Creating the summarizer
luhn_summarizer = LuhnSummarizer()
luhn_summary = luhn_summarizer(my_parser.document, sentences_count = output_sentences_count)

# Printing the summary
for sentence in luhn_summary:
    print(sentence)

The internet is awash in billionaires who made their fortune by following a strategy summed up in Mark Zuckerberg’s advice to “move fast and break things.” Hoffman and Yeh invoke the storied successes of Apple, Microsoft, Amazon, Google, and Facebook, all of whom have dominated their respective markets and made their founders rich in the process, and suggest that it is blitzscaling that got them there.
If they’d waited to figure out the business model, someone else might have beat them to the customer that made them a success: eBay, which went on to buy Paypal for $1.5 billion (which everyone thought was a lot of money in those days), launching Thiel and Hoffman on their storied careers as serial entrepreneurs and investors.
However, Hoffman and Yeh do a good job of explaining the conditions in which blitzscaling makes sense: The market has to be really big; there has to be a sustainable competitive advantage (e.g., network effects) from getting bigger faster than the competition; you 
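
The frequency-based scoring behind the Luhn summarizer can be sketched in a few lines. This is a simplified illustration rather than sumy's code: it uses plain term frequency (no inverse document frequency), treats each sentence as a single cluster spanning its first to last significant word, and the `STOPWORDS` set, `min_freq` value, and function name are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it", "that"}


def luhn_sketch(text, sentences_count=1, min_freq=2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]

    # A word is "significant" if it is not a stop word and occurs often enough.
    freq = Counter(w for words in tokenized for w in words if w not in STOPWORDS)
    significant = {w for w, count in freq.items() if count >= min_freq}

    def score(words):
        positions = [i for i, w in enumerate(words) if w in significant]
        if not positions:
            return 0.0
        # Luhn-style score: (significant words)^2 divided by the span they cover.
        span = positions[-1] - positions[0] + 1
        return len(positions) ** 2 / span

    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:sentences_count])]
```

Squaring the count rewards sentences that pack many significant words into a short span, which is why dense sentences tend to win over long ones that mention a keyword only once.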

## KL-Sum: another extractive method

In [8]:
from sumy.summarizers.kl import KLSummarizer
kl_summarizer = KLSummarizer()
kl_summary = kl_summarizer(my_parser.document, sentences_count = output_sentences_count)

# Printing the summary
for sentence in kl_summary:
    print(sentence)

Would these companies now be profitable instead of hemorrhaging billions of dollars a year?
We see the consequences of this selection bias in the history of the on-demand ride-hailing market.
In the real world, though, while Sunil Paul’s company went out of business, it was Travis Kalanick of Uber who got fired.
Meanwhile, the forced bloom of Uber’s market share lead became a liability even in the US.
Even Amazon, which lost billions before achieving profitability, raised only $108 million in venture capital before its IPO.
Hypergrowth was the result rather than the cause of these companies’ success.
But is a business really a business if it can’t pay its own way?
Is it a business or a financial instrument?
In the long run it’s a weighing machine.” That is, in the short term, investors vote (or more accurately, place bets) on the present value of the future earnings of a company.
What is the responsibility of the winners?
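
KL-Sum builds a summary greedily: at each step it adds the sentence that keeps the summary's word distribution closest, in Kullback-Leibler divergence, to the distribution of the whole document. The sketch below illustrates that loop under stated assumptions; the additive `smoothing` constant, the regex tokenizer, and all function names are hypothetical and do not mirror sumy's internals.

```python
import math
import re
from collections import Counter


def distribution(words, vocab, smoothing=1e-3):
    # Smoothed unigram distribution over a fixed vocabulary.
    counts = Counter(words)
    total = len(words) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def kl_divergence(p, q):
    # KL(p || q) for two distributions defined over the same vocabulary.
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def kl_sum_sketch(text, sentences_count=2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    doc_words = [w for words in tokenized for w in words]
    vocab = set(doc_words)
    doc_dist = distribution(doc_words, vocab)

    chosen, summary_words = [], []
    while len(chosen) < min(sentences_count, len(sentences)):
        # Pick the sentence whose addition minimizes KL(document || summary).
        best = min(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: kl_divergence(
                doc_dist, distribution(summary_words + tokenized[i], vocab)
            ),
        )
        chosen.append(best)
        summary_words += tokenized[best]

    # Return the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined even for words the partial summary does not contain yet.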


## Summarization with T5 Transformers

In [9]:
from transformers import T5Tokenizer, T5Config, T5ForConditionalGeneration

my_model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# T5 expects a task prefix; truncate explicitly to the 512-token length it was trained with
input_ids = tokenizer.encode("summarize: " + article, return_tensors='pt', max_length=512, truncation=True)
summary_ids = my_model.generate(input_ids, max_length=150, min_length=40, num_beams=4, early_stopping=True)

# Drop special tokens such as <pad> from the decoded text
t5_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(t5_summary)
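
Because the article is much longer than the model's input window, anything past the token limit is cut off before T5 sees it. One possible workaround, sketched here as an assumption rather than an established recipe, is to split the article into chunks and summarize each chunk separately. The helper name `chunk_words` and the `max_words` value are hypothetical, and word count is only a rough proxy for token count.

```python
def chunk_words(text, max_words=350):
    # Split on whitespace and regroup into chunks of at most `max_words` words.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# Possible usage with the tokenizer and model loaded in the cell above:
# for chunk in chunk_words(article):
#     ids = tokenizer.encode("summarize: " + chunk, return_tensors="pt",
#                            max_length=512, truncation=True)
#     out = my_model.generate(ids, max_length=80)
#     print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Each chunk is summarized in isolation, so the combined result may repeat points or lose context that crosses chunk boundaries.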


## GPT-2 Transformers

In [10]:
# Importing model and tokenizer
from transformers import GPT2Tokenizer,GPT2LMHeadModel

# Instantiating the model and tokenizer with gpt-2
tokenizer=GPT2Tokenizer.from_pretrained('gpt2')
model=GPT2LMHeadModel.from_pretrained('gpt2')

# Encoding text to get input ids & pass them to model.generate()
inputs = tokenizer.batch_encode_plus([article], return_tensors='pt', max_length=750, truncation=True)
# max_new_tokens bounds the continuation; the default max_length of 20 is shorter than the 750-token prompt
summary_ids = model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'], max_new_tokens=60, pad_token_id=tokenizer.eos_token_id)

GPT_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(GPT_summary)


Look no further than the race between Lyft and Uber to dominate the online ride-hailing market. Both companies are gearing up for their IPOs in the next few months. Street talk has Lyft shooting for a valuation between $15 and $30 billion dollars, and Uber valued at an astonishing $120 billion dollars. Neither company is profitable; their enormous valuations are based on the premise that if a company grows big enough and fast enough, profits will eventually follow.

Most monopolies or duopolies develop over time, and have been considered dangerous to competitive markets; now they are sought after from the start and are the holy grail for investors. If LinkedIn co-founder Reid Hoffman and entrepreneur Chris Yeh’s new book Blitzscaling is to be believed, the Uber-style race to the top (or the bottom, depending on your point of view) is the secret of success for today’s technology businesses.

Blitzscaling promises to teach techniques that are “the lightning fast path to building massivel

## Huggingface sentence transformers

https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2

In [11]:
%pip -q install -U sentence-transformers

Note: you may need to restart the kernel to use updated packages.


In [12]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')
embeddings = model.encode(article)
# encode() returns a NumPy array, which has no `unique_consecutive` attribute (that is a PyTorch function); inspect its shape instead
embeddings.shape

In [None]:
embeddings
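
One way sentence embeddings could feed an extractive summary is to embed each sentence separately and keep the sentences closest to the average (centroid) embedding. The sketch below shows only the ranking step on plain lists of floats; it is an assumption about how such vectors might be used, not something this notebook or the sentence-transformers library prescribes, and `rank_by_centroid` is a hypothetical helper.

```python
import math


def cosine_similarity(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_centroid(vectors, top_k=2):
    # Average all vectors, then keep the indices most similar to that average.
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(vectors[i], centroid),
        reverse=True,
    )
    # Return the selected indices in their original order.
    return sorted(ranked[:top_k])
```

With the model loaded above, the vectors could come from calling `model.encode` on a list of sentences, which returns one embedding per sentence; the returned indices then select the sentences to print.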

## XLM Transformers

In [None]:
# Importing model and tokenizer
from transformers import XLMWithLMHeadModel, XLMTokenizer

# Instantiating the model and tokenizer 
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
model = XLMWithLMHeadModel.from_pretrained('xlm-mlm-en-2048')

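# Encoding text to get input ids & pass them to model.generate()
# Truncate the input so that prompt plus generated tokens stay within the model's 512 positions
inputs = tokenizer.batch_encode_plus([article], return_tensors='pt', max_length=400, truncation=True)
summary_ids = model.generate(inputs['input_ids'], max_new_tokens=50)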

# Decode and print the summary
XLM_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(XLM_summary)