# Generate Summaries from UnpackAI's Medium articles

Run all cells, then replace the `article` variable with the text of the article you want to summarize.

In [1]:
from transformers import BartTokenizerFast, BartForConditionalGeneration
import re

In [2]:
import torch

model_name = "facebook/bart-large-cnn"
# Use the GPU when one is available; the CPU works too, but generation will be slow
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)
tokenizer = BartTokenizerFast.from_pretrained(model_name)

In [3]:
def summarize_text(ipt, max_length=200):
    def capitalize(match):
        # Re-emit the punctuation mark, a space, and the upper-cased next character
        return match.group(1) + " " + match.group(2).upper()
    def format_summary(summary_text):
        summary_text = re.sub(r'\s\.', '.', summary_text) # remove whitespace before periods (.)
        summary_text = re.sub(r'\s+', ' ', summary_text) # collapse runs of whitespace
        summary_text = re.sub(r'([.!?"]) (\w)', capitalize, summary_text) # capitalize after sentence-ending punctuation
        summary_text = summary_text.strip()
        if summary_text: # guard against an empty summary
            summary_text = summary_text[0].capitalize() + summary_text[1:]
        return summary_text
    
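    # https://huggingface.co/transformers/main_classes/model.html#transformers.generation_utils.GenerationMixin.generate
    # Deterministic beam search: the top-scoring beam is reported as the "BEST" summary
    outputs = model.generate(**ipt,
                             max_length=max_length,
                             num_beams=10,
                             temperature=1.0, # default value; only matters when sampling
                             length_penalty=1.2, # > 1.0 favors longer summaries than the default
                            )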
    summary_best = "BEST: \n" + format_summary(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
    # Sampled alternatives: do_sample with top-k / top-p filtering returns three varied candidates
    outputs = model.generate(**ipt,
                             max_length=max_length,
                             no_repeat_ngram_size=5,
                             length_penalty=1.2,
                             do_sample=True,
                             top_k=50,
                             top_p=0.85,
                             num_return_sequences=3
                            )
    
    summaries = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    summaries = list(map(format_summary, summaries))
    summaries.insert(0, summary_best)
    
    return summaries
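
The cleanup step inside `summarize_text` can be tried on its own, without loading the model. The block below is a minimal sketch: a standalone copy of `format_summary` run on a made-up string (`raw` is an invented example, not model output). The guard for empty input is an assumption about how an empty summary should be handled.

```python
import re

def format_summary(summary_text):
    """Standalone copy of the cleanup step, for experimenting without the model."""
    def capitalize(match):
        # Keep the punctuation mark, then upper-case the character that follows it
        return match.group(1) + " " + match.group(2).upper()

    summary_text = re.sub(r"\s\.", ".", summary_text)  # remove whitespace before periods
    summary_text = re.sub(r"\s+", " ", summary_text)   # collapse runs of whitespace
    summary_text = re.sub(r'([.!?"]) (\w)', capitalize, summary_text)
    summary_text = summary_text.strip()
    if summary_text:  # assumption: an empty summary is returned unchanged
        summary_text = summary_text[0].capitalize() + summary_text[1:]
    return summary_text

# Invented input that mimics the spacing and casing quirks of generated text
raw = "machine learning is a branch of AI .  it lets computers learn . "
print(format_summary(raw))
# Machine learning is a branch of AI. It lets computers learn.
```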

In [4]:
def summarize_article(article):
    article = re.sub(r"\s+", " ", article) # collapse newlines and repeated whitespace
    ipt = tokenizer(article,
                   truncation=True, # drop tokens beyond the model's maximum input length
                   return_tensors="pt").to(device)
    
    summaries = summarize_text(ipt)
    for summary in summaries:
        print(summary, "-"*89, sep="\n")
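
Because `truncation=True` discards everything past the model's input limit, the end of a long article never reaches the model. One possible workaround, sketched below, is to split the article into pieces and summarize each piece separately. `split_into_chunks` and its `max_words` default are hypothetical and not part of the notebook. A word count is only a rough proxy for the tokenizer's token count, so the limit would need tuning.

```python
def split_into_chunks(text, max_words=600):
    """Hypothetical helper: split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()  # splits on any whitespace, so newlines are handled too
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Tiny demonstration with an artificially small limit
chunks = split_into_chunks("one two three four five", max_words=2)
print(chunks)
# ['one two', 'three four', 'five']
```

Each chunk could then be passed to `summarize_article` in turn, at the cost of getting one set of summaries per chunk rather than a single summary of the whole article.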

In [5]:
article = """This post will take a tour of the different machine learning techniques, but before going more in-depth, it’s always good to define machine learning for newbies.

    According to Arthur Samuel, the term Machine learning means the ability of computers to learn on their own.

    Of course, it’s crazy how a computer can learn by itself?

In other words, Machine learning is just a branch of artificial intelligence (AI) that allows computers to “learn” from some experience and develop on their own without having to be directly programmed.

    Learning for humans means finding knowledge either by going to school and being trained by a teacher, either by self-training.

The computer also uses the same way to learn; it must find data that need to be trained by someone(supervised) or can train itself(unsupervised).

Machine learning algorithms are often classified as supervised or unsupervised learning.
    Supervised learning means to be trained by a teacher; in other words, the computer program needs to learn from the data and its target.
    Unsupervised learning means the computer doesn’t need to be trained by someone (Self-training). In this case, the computer program learns from data without a target.
    Reinforcement Learning means the computer will learn from the mistakes of different trials. It’s like when a child doesn’t go to school and doesn’t learn independently but keeps taking the exam. Of course, there is 100% that he will continue to fail unless he is a genius or a gifted one. However, he can learn from the different exams he has taken and maybe finally pass the exam.

    Let’s have a look at Machine learning techniques.

Supervised Learning
As we have seen previously, supervised needs data and its target; therefore, based on the specific target, two types of supervised learning exist Regression and Classification.

Regression is used when we want to predict a continuous outcome, while classification is used when the target is a class; below are some most used techniques or algorithms.

·Support-vector machines.

· Linear regression.

· Logistic regression.

· Random Forest

· Naive Bayes.

· Linear discriminant analysis.

· Decision trees.

· K-nearest neighbor algorithm.

Ensemble approaches are models made up of several weak models that are individually trained and whose predictions are merged to produce the overall forecast. Much work is being made to combine the forms of vulnerable learners and the contexts they can combine. This is an advantageous technique and is very common as such.

    Boosting
    Bootstrapped Aggregation (Bagging)
    AdaBoost
    Weighted Average (Blending)
    Stacked Generalization (Stacking)
    Gradient Boosting Machines (GBM)
    Gradient Boosted Regression Trees (GBRT)
    Random Forest

Real-life application of classification:

Malware classification (malware or not malware)

Email classification (spam or not spam)

Customer behavior classification (Good, Excellent or Bad Customer)

Classification image (dog or cat)

Real-life application of regression:

Weather prediction

Stock price prediction

Blood pressure prediction
Unsupervised Learning

We don’t have any outcome variables to forecast in this methodology. The machine is trained with unmarked results. Unsupervised methods tend to discover hidden mechanisms, such as finding clusters of images, but they are a little difficult to execute and not as commonly used as supervised learning.

Example of Unsupervised learning

· k-Means

· k-Medians

· Expectation Maximisation (EM)

· Hierarchical Clustering

Real-life application of unsupervised learning:

Anomaly detection

Biology gene grouping

Grouping
Other machine learning techniques

Dimension reduction: This method helps to reduce the data into a small dimension.

    Principal Component Analysis (PCA)
    Principal Component Regression (PCR)
    Partial Least Squares Regression (PLSR)
    Sammon Mapping
    Multidimensional Scaling (MDS)
    Projection Pursuit
    Linear Discriminant Analysis (LDA)
    Mixture Discriminant Analysis (MDA)
    Quadratic Discriminant Analysis (QDA)
    Flexible Discriminant Analysis (FDA)"""

In [6]:
summarize_article(article)

BEST: 
Machine learning is a branch of artificial intelligence (AI) that allows computers to “learn” from some experience and develop on their own. Machine learning algorithms are often classified as supervised or unsupervised. Unsupervised learning means the computer doesn’t need to be trained by someone (Self-training)
-----------------------------------------------------------------------------------------
Machine learning is a branch of artificial intelligence (AI) that allows computers to “learn” from some experience and develop on their own. Machine learning algorithms are often classified as supervised or unsupervised. Unsupervised learning means the computer doesn’t need to be trained by someone (Self-training). In this case, the computer program learns from data without a target.
-----------------------------------------------------------------------------------------
Machine learning is a branch of artificial intelligence (AI) that allows computers to “learn” from some experi