In [2]:
import numpy as np
import nltk
import re
from collections import Counter

In [3]:
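# Lower-case the text so that words match regardless of capitalisation.
def casefolding(sentence):
    return sentence.lower()

# Remove the curly apostrophe (so "uk’s" becomes "uks"), then replace every character
# that is not a lower-case letter (digits, punctuation, symbols) with a space.
# Run this after casefolding, otherwise capital letters are stripped as well.
def cleaning(sentence):
    return re.sub(r'[^a-z]', ' ', re.sub("’", '', sentence))

# Split the cleaned sentence into word tokens on whitespace.
def tokenization(sentence):
    return sentence.split()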
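The three helpers are meant to be chained as `tokenization(cleaning(casefolding(...)))`. A quick self-contained check on a phrase adapted from the article shows what survives; the helpers are redefined here so the snippet runs on its own.

```python
import re

def casefolding(sentence):
    return sentence.lower()

def cleaning(sentence):
    # Drop the curly apostrophe, then turn every non a-z character into a space
    return re.sub(r'[^a-z]', ' ', re.sub("’", '', sentence))

def tokenization(sentence):
    return sentence.split()

tokens = tokenization(cleaning(casefolding("The UK’s $100m investment by US-based Nvidia.")))
print(tokens)  # ['the', 'uks', 'm', 'investment', 'by', 'us', 'based', 'nvidia']
```

Digits disappear entirely ('$100m' leaves just 'm'), hyphenated words split in two, and calling `cleaning` on text that has not been casefolded would blank out every capital letter, since the pattern only keeps `a-z`.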

In [4]:
# Split the whole story into a list of sentences with NLTK's Punkt sentence tokenizer.
def sentence_split(paragraph):
    return nltk.sent_tokenize(paragraph)
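`nltk.sent_tokenize` relies on the pre-trained Punkt model, which has to be available locally; a one-off `nltk.download('punkt')` fetches it (recent NLTK releases may ask for `'punkt_tab'` instead). For quick experiments without NLTK data, a crude regex splitter can stand in. This is a hypothetical helper, not part of the notebook, and it is much less accurate than Punkt:

```python
import re

def naive_sentence_split(paragraph):
    # Crude stand-in for nltk.sent_tokenize: split on whitespace that follows
    # ., ! or ?, optionally with a closing quote in between. It knows nothing
    # about abbreviations such as "e.g.", which Punkt is trained to handle.
    parts = re.split(r'(?<=[.!?])\s+|(?<=[.!?][”"])\s+', paragraph.strip())
    return [p for p in parts if p]

print(naive_sentence_split("The UK has the data. It also has the NHS."))
# ['The UK has the data.', 'It also has the NHS.']
```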

In [12]:
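# Count every word in the article; these counts are used to rank the sentences.
# `news` is defined in the next cell (In [7]), so run that cell first.
# Note: keys keep any attached punctuation (e.g. 'nvidia.'), unlike the cleaned tokens.
wordfreq = Counter(news.lower().split())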

# Score each sentence by summing the article-wide frequency of each of its tokens.
# The highest-scoring sentences are taken as the ones that best summarise the story.
def sentence_weight(data):
    weights = []
    for words in data:
        temp = 0
        for word in words:
            temp += wordfreq[word]
        weights.append(temp)
    return weights
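A worked example makes the scoring concrete. This sketch passes the frequency table in as a parameter (a small variation on the notebook, which reads the global `wordfreq`) and uses a toy corpus rather than the article:

```python
from collections import Counter

def sentence_weight(data, wordfreq):
    # Same scoring as above: sum the corpus frequency of every token in a sentence
    return [sum(wordfreq[word] for word in words) for words in data]

# Toy corpus: 'the' occurs 3 times; cat, sat, on, mat and end once each
wordfreq = Counter("the cat sat on the mat the end".split())
data = [['the', 'cat', 'sat'], ['the', 'end'], ['a', 'dog']]
print(sentence_weight(data, wordfreq))  # [5, 4, 0]
```

A `Counter` returns 0 for a missing key instead of raising `KeyError`, so any token absent from the table silently contributes nothing to its sentence's score.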

In [7]:
news = """
The UK’s most powerful supercomputer, which its creators hope will make the process of preventing, diagnosing and treating disease better, faster and cheaper, is operational.

Christened Cambridge-1, the supercomputer represents a $100m investment by US-based computing company Nvidia. The idea capitalises on artificial intelligence (AI) – which combines big data with computer science to facilitate problem-solving – in healthcare.

“If you could imagine ganging up 10 refrigerators in a row and then having several rows of those refrigerators – that is the size and shape of this computer,” said Kimberly Powell, vice-president of healthcare at Nvidia.

The UK has already made strides with massive datasets such as the UK Biobank, which encompasses anonymised of medical and lifestyle records from half a million middle-aged Britons.

AI for healthcare is booming in the UK, with a range of startups and larger pharmaceutical companies mining the vast quantities of data available to discover potential drugs, pinpoint why some people are susceptible to certain diseases, and improve and personalise patient care. 
Cambridge-1’s first projects will be with AstraZeneca, GSK, Guy’s and St Thomas’ NHS foundation trust, King’s College London and Oxford Nanopore. They will seek to develop a deeper understanding of diseases such as dementia, design new drugs, and improve the accuracy of finding disease-causing variations in human genomes.

A key way the supercomputer can help, said Dr Kim Branson, global head of artificial intelligence and machine learning at GSK, is in patient care.

In the field of immuno-oncology, for instance, existing medicines harness the patient’s own immune system to fight cancer. But it isn’t always apparent which patients will gain the most benefit from these drugs – some of that information is hidden in the imaging of the tumours and in numerical clues found in blood. Cambridge-1 can be key to helping fuse these different datasets, and building large models to help determine the best course of treatment for patients, Branson said.

Traditional drug development is expensive, protracted and comes with a low chance of success. The idea that machines can wean researchers away from the breadth of resources it takes to discover a potentially potent compound and pinpoint the medical conditions it could be used to treat has long been promised.

But the technology is still in its early days and the hype is yet to be fully realised, said Roel Bulthuis, head of the healthcare team at Inkef Capital, a Dutch venture capital fund focused on technology and healthcare. That being said, Bulthuis is excited about the promise of the supercomputer.

“It’s great to see it in the UK ecosystem,” he said. “Many European healthcare systems are not as advanced in their thinking about using data and integrating that into the healthcare system.”

The UK has the ingredients to take advantage of this computing ability because of its huge resources of data. As well as large structured datasets like the UK Biobank, it has access to a wide range of clinicians via the NHS. No such comparable infrastructure exists at such a size elsewhere, GSK’s Branson said. “This is why GSK built an AI team and AI hubs in London and not in San Francisco, where I am."""

In [9]:

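sentence_list = sentence_split(news)
data = []
for sentence in sentence_list:
    data.append(tokenization(cleaning(casefolding(sentence))))
# Keep one token list per sentence (even an empty one) so that data[i] lines up with
# sentence_list[i]; filtering out empty lists would shift the indices used later.
data[:3]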

[['the',
  'uks',
  'most',
  'powerful',
  'supercomputer',
  'which',
  'its',
  'creators',
  'hope',
  'will',
  'make',
  'the',
  'process',
  'of',
  'preventing',
  'diagnosing',
  'and',
  'treating',
  'disease',
  'better',
  'faster',
  'and',
  'cheaper',
  'is',
  'operational'],
 ['christened',
  'cambridge',
  'the',
  'supercomputer',
  'represents',
  'a',
  'm',
  'investment',
  'by',
  'us',
  'based',
  'computing',
  'company',
  'nvidia'],
 ['the',
  'idea',
  'capitalises',
  'on',
  'artificial',
  'intelligence',
  'ai',
  'which',
  'combines',
  'big',
  'data',
  'with',
  'computer',
  'science',
  'to',
  'facilitate',
  'problem',
  'solving',
  'in',
  'healthcare']]

In [10]:
wordfreq

Counter({'the': 31,
         'uk’s': 1,
         'most': 2,
         'powerful': 1,
         'supercomputer,': 1,
         'which': 4,
         'its': 3,
         'creators': 1,
         'hope': 1,
         'will': 4,
         'make': 1,
         'process': 1,
         'of': 22,
         'preventing,': 1,
         'diagnosing': 1,
         'and': 21,
         'treating': 1,
         'disease': 1,
         'better,': 1,
         'faster': 1,
         'cheaper,': 1,
         'is': 10,
         'operational.': 1,
         'christened': 1,
         'cambridge-1,': 1,
         'supercomputer': 2,
         'represents': 1,
         'a': 11,
         '$100m': 1,
         'investment': 1,
         'by': 1,
         'us-based': 1,
         'computing': 2,
         'company': 1,
         'nvidia.': 2,
         'idea': 2,
         'capitalises': 1,
         'on': 2,
         'artificial': 2,
         'intelligence': 2,
         '(ai)': 1,
         '–': 4,
         'combines': 1,
         'big': 1
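The keys above keep their punctuation ('supercomputer,', 'uk’s', 'nvidia.'), while the tokens in `data` are cleaned ('supercomputer', 'uks', 'nvidia'). Because `Counter` returns 0 for missing keys, a token such as 'nvidia' scores 0 here: both occurrences in the article end with a full stop. One way to make the keys match, sketched below as a hypothetical helper, is to build the table with the same cleaning pipeline used for the sentences:

```python
import re
from collections import Counter

def casefolding(sentence):
    return sentence.lower()

def cleaning(sentence):
    return re.sub(r'[^a-z]', ' ', re.sub("’", '', sentence))

def tokenization(sentence):
    return sentence.split()

def cleaned_word_freq(text):
    # Count words after the same casefolding/cleaning/tokenization applied to
    # each sentence, so the Counter keys match the tokens being scored
    return Counter(tokenization(cleaning(casefolding(text))))

sample = "Christened Cambridge-1, the supercomputer is by Nvidia. Thanks, Nvidia."
raw = Counter(sample.lower().split())
clean = cleaned_word_freq(sample)
print(raw['nvidia'], clean['nvidia'])  # 0 2
```

Swapping this in for `wordfreq` changes the sentence scores, so the `rank` values and summaries shown in this notebook, which were produced with the raw table, would need to be regenerated.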

In [13]:
rank = sentence_weight(data)
rank

[160,
 51,
 92,
 214,
 165,
 286,
 65,
 163,
 141,
 123,
 252,
 134,
 76,
 228,
 266,
 110,
 77,
 272,
 146,
 34,
 103]
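Because each score is a plain sum, longer sentences tend to score higher, and frequent function words such as 'the' (31 occurrences in the table above) and 'of' (22) contribute most of the weight. A common variation, which is not the method used in this notebook, averages the frequencies over the remaining words after dropping stopwords. The stopword set and counts below are toy values for illustration:

```python
from collections import Counter

# Tiny, hypothetical stopword list; a real project would likely use a fuller one
STOPWORDS = {'the', 'of', 'and', 'a', 'is', 'to', 'in'}

def normalised_sentence_weight(data, wordfreq, stopwords=STOPWORDS):
    weights = []
    for words in data:
        content = [w for w in words if w not in stopwords]
        # Average frequency of the remaining words; a sentence with none scores 0.0
        weights.append(sum(wordfreq[w] for w in content) / len(content) if content else 0.0)
    return weights

wordfreq = Counter({'the': 5, 'cat': 3, 'mat': 1})
data = [['the', 'cat', 'the', 'mat'], ['the', 'cat'], []]
print(normalised_sentence_weight(data, wordfreq))  # [2.0, 3.0, 0.0]
```

With the raw sum the first toy sentence would win (14 against 8); after removing 'the' and averaging, the shorter sentence ranks first.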

In [14]:
n = 3
result = ''
sort_list = np.argsort(rank)[::-1][:n]
for i in range(n):
    result += '{} '.format(sentence_list[sort_list[i]])
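`np.argsort(rank)[::-1][:n]` returns the indices of the `n` largest scores, highest first, so the summary comes out in score order rather than story order: in the printed result, the sentence about the technology still being in its early days follows "The UK has the ingredients...", although it appears earlier in the article. A small sketch that re-sorts the chosen indices, using the standard library's `heapq.nlargest` as a stand-in for the NumPy call (the helper name is hypothetical):

```python
import heapq

def top_n_in_order(sentences, rank, n=3):
    # Indices of the n highest scores; on ties the earlier sentence is kept first
    best = heapq.nlargest(n, range(len(rank)), key=rank.__getitem__)
    # Re-sort the chosen indices so the summary follows the article's order
    return ' '.join(sentences[i] for i in sorted(best))

sentences = ['First.', 'Second.', 'Third.']
print(top_n_in_order(sentences, [7, 1, 9], n=2))  # First. Third.
```

Using `' '.join` also avoids the trailing space that the `+=` loop leaves, and asking for more sentences than exist simply returns all of them instead of raising `IndexError`.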

In [15]:
print(result)

AI for healthcare is booming in the UK, with a range of startups and larger pharmaceutical companies mining the vast quantities of data available to discover potential drugs, pinpoint why some people are susceptible to certain diseases, and improve and personalise patient care. “Many European healthcare systems are not as advanced in their thinking about using data and integrating that into the healthcare system.”

The UK has the ingredients to take advantage of this computing ability because of its huge resources of data. But the technology is still in its early days and the hype is yet to be fully realised, said Roel Bulthuis, head of the healthcare team at Inkef Capital, a Dutch venture capital fund focused on technology and healthcare. 


In [20]:
# Now we put it all into one function and run it on the news story.
def full_summary(news, n=3):
    def casefolding(sentence):
        return sentence.lower()

    def cleaning(sentence):
        return re.sub(r'[^a-z]', ' ', re.sub("’", '', sentence))

    def tokenization(sentence):
        return sentence.split()

    def sentence_split(paragraph):
        return nltk.sent_tokenize(paragraph)

    # Word counts for this article only
    word_freq = Counter(news.lower().split())

    def sentence_weight(data):
        # Sum the article-wide frequency of every token in each sentence
        return [sum(word_freq[word] for word in words) for words in data]

    sentence_list = sentence_split(news)
    # One token list per sentence, so data[i] always matches sentence_list[i]
    data = [tokenization(cleaning(casefolding(sentence))) for sentence in sentence_list]

    rank = sentence_weight(data)
    # Indices of the n highest-scoring sentences, best first
    sort_list = np.argsort(rank)[::-1][:n]
    return ' '.join(sentence_list[i] for i in sort_list)


In [21]:
# This is the result
print(full_summary(news))

AI for healthcare is booming in the UK, with a range of startups and larger pharmaceutical companies mining the vast quantities of data available to discover potential drugs, pinpoint why some people are susceptible to certain diseases, and improve and personalise patient care. “Many European healthcare systems are not as advanced in their thinking about using data and integrating that into the healthcare system.”

The UK has the ingredients to take advantage of this computing ability because of its huge resources of data. But the technology is still in its early days and the hype is yet to be fully realised, said Roel Bulthuis, head of the healthcare team at Inkef Capital, a Dutch venture capital fund focused on technology and healthcare. 
