<a href="https://colab.research.google.com/github/mfaridn03/TextSummariser/blob/main/text_summarisier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Summarisation

Recall an earlier notebook where you were given a task to summarise online content and produce a report. We converted an audio file to text. Similarly, we could write a notebook to convert a PDF, a Word document, or a web page to text. The strategy is to convert everything to text, summarise the text, and use the summary in the final report.

# The Challenge

Create a project to summarise text and publish the project on Binder.

You can choose how to input the text. Some ideas include pasting it into a string, reading it from a file, or extracting it from a PDF or a webpage.
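As one illustration of the webpage idea, here is a minimal sketch that strips the tags from an HTML string using only the standard library. The names `TextExtractor` and `html_to_text` are hypothetical helpers invented for this example, not part of the notebook, and real pages usually need more cleanup (menus, footers, adverts) than this handles.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = "<p>Hello world.</p><script>var x = 1;</script><p>Bye.</p>"
print(html_to_text(page))
```

The HTML itself could come from a download, for example with `urllib.request.urlopen(url).read().decode()`, and the resulting text can then be summarised like any other string.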
 
It is okay to follow an online tutorial or YouTube video, but make sure you have some understanding of what you are doing. You can ask your tutor for help if needed. They will either help you search or explain the code in a tutorial.


# Task 0 - Initialise a NEW repository

We are going to deploy this notebook using Binder.

* Initialise a new PUBLIC GitHub repository, for example one called text_summariser.
* Import this notebook into the new repository.



# Install required libraries

In [None]:
!pip install PyInputPlus
!pip install spacy
!pip install pyperclip

Import the modules into the program

In [None]:
import os
import pyinputplus as pyip
import pyperclip
import spacy


# Initialising spacy
# https://github.com/explosion/spaCy/issues/4577
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    import en_core_web_sm
    nlp = en_core_web_sm.load()

Writing some helper functions

In [None]:
# Get content from a valid file, with surrounding whitespace removed.
# UTF-8 is assumed so the result does not depend on the platform default.
def get_file_content(filename: str):
  with open(filename, 'r', encoding='utf-8') as op:
    return op.read().strip()

# Custom input prompt that validates filename input because
# pyip's mustExist=True parameter in inputFilePath does not work
def valid_file_prompt(prompt: str):
  fn = input(prompt)
  while not os.path.isfile(fn):
    print("Not a valid file.\n")
    fn = input(prompt)
  
  return fn

# Get content from the user's clipboard. Does not work in Colab, only
# on a local machine
def get_clipboard_content():
  return pyperclip.paste()

Summarise function

In [None]:
# Code is from the tutorial lesson
# Returns the first two sentences of the article as a very basic 'summary'
def summarise(string: str):
  doc = nlp(string)
  sentences = [sentence.text for sentence in doc.sents][:2]
  return ' '.join(sentences)
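
Taking the first two sentences works well for news articles, which tend to put the key facts first, but less well for other text. A possible next step, sketched below, is to score each sentence by how often its words appear in the whole text and keep the highest scoring ones. This is not the tutorial's method: `summarise_by_frequency`, `split_sentences` and the tiny `STOP_WORDS` set are hypothetical names made up for this example, and the regex sentence split is a rough stand-in for spaCy's `doc.sents` so that the sketch runs without a language model.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real one is much longer)
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}


def split_sentences(text: str) -> list:
    # Naive split: break after '.', '!' or '?' when followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarise_by_frequency(text: str, max_sentences: int = 2) -> str:
    sentences = split_sentences(text)

    # Count every word in the text except the stop words
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence: str) -> int:
        # Stop words were never counted, so they contribute 0
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentence indexes by score, then restore the original order
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


sample = ("Sharks are near the beach. I like tea. "
          "Sharks scare beach swimmers. The end.")
print(summarise_by_frequency(sample))
```

Because the score is a plain sum, longer sentences are favoured. Dividing by the sentence length is one way to reduce that bias.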

Main function

In [65]:
def main():
  choice = pyip.inputMenu(
      ["Paste from clipboard", "Load from file", "Exit"],
      prompt="Enter text input method:\n",
      numbered=True
    )
  
  if "clipboard" in choice:
    # Only works when run on a local machine
    try:
      paste = get_clipboard_content()
    except pyperclip.PyperclipException as error:
      if 'could not find a copy/paste mechanism for your system' in str(error):
        print("Copy and paste does not work on Google Colab. Run the notebook on a local machine to use this option.")
      else:
        print(error)
    else:
      print(summarise(paste))

  elif "from file" in choice:
    fn = valid_file_prompt("Enter filename: ")
    fcontent = get_file_content(fn)
    print(summarise(fcontent))

  else:
    print("Exiting.")

In [67]:
if __name__ == "__main__":
  main()
  

Enter text input method:
1. Paste from clipboard
2. Load from file
3. Exit
2
Enter filename: abc.txt
Not a valid file.

Enter filename: article.txt
Perth's beautiful beaches are facing a new danger: shark attacks. Local surfers and swimmers are reporting a sudden rise in shark activity in recent weeks.
