Text Simplification

Overview

This is a rule-based approach to sentence simplification, implemented from the paper “A Novel System for Generating Simple Sentences from Complex and Compound Sentences”.

This method uses the Stanford Parser and Stanford CoreNLP to simplify sentences. The parser provides the dependency parse, and the coreference resolution system of CoreNLP helps to resolve coreference.

According to the Stanford Typed Dependencies Manual, a simple sentence will have only one ‘nsubj’ or ‘nsubjpass’ relation.
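
As a minimal sketch of this rule, the check below counts subject relations in a parse. It assumes the dependencies are already available as (relation, head, dependent) tuples; the names `SUBJECT_RELATIONS`, `count_subjects` and `is_simple_sentence` are illustrative and are not taken from this repository.

```python
SUBJECT_RELATIONS = {"nsubj", "nsubjpass"}


def count_subjects(dependencies):
    """Count subject relations in (relation, head, dependent) tuples."""
    return sum(1 for relation, _head, _dependent in dependencies
               if relation in SUBJECT_RELATIONS)


def is_simple_sentence(dependencies):
    """Treat a sentence as simple when it has exactly one subject relation."""
    return count_subjects(dependencies) == 1


# A parse with two subject relations, so it is not a simple sentence.
complex_parse = [
    ("nsubj", "played", "Tendulkar"),
    ("nsubj", "MP", "who"),
    ("dobj", "played", "cricket"),
]
print(is_simple_sentence(complex_parse))  # False
```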

Algorithm

> Break the text into a group of sentences { S1, S2, ..., Si }.
> For each sentence in { S1, ..., Si }:
>> Create the Basic and Collapsed Stanford Dependencies (SD) of Si using the Stanford Parser.
>> Generate the Modified SD (MSD) of Si by taking the subject relations (‘nsubj’ or ‘nsubjpass’) from the Collapsed SD and the other linked parts from the Basic SD.
>> Remove the dependencies marked as ‘acl’, ‘appos’, ‘advcl’, ‘cc’, ‘ccomp’, ‘conj’, ‘dep’, ‘mark’, ‘parataxis’ and ‘ref’.
>> Identify each subject clause (‘nsubj’ or ‘nsubjpass’).
>> Traverse all the parts that have links to each subject clause and find the linked words.
>> Rearrange the words found in the previous step to generate the complete simple sentences.
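
The MSD construction and the removal step can be sketched as follows. This is an illustration under stated assumptions, not the code in main.py: dependencies are modelled as (relation, (head, index), (dependent, index)) tuples, the helper names `build_msd` and `drop_removed` are hypothetical, and subtypes such as ‘acl:relcl’ are assumed to be matched by their base type ‘acl’. The sample data is a subset of the relations for the example sentence in this README, with the collapsed subject relations inferred from its Modified column.

```python
SUBJECT_RELATIONS = {"nsubj", "nsubjpass"}
REMOVED_RELATIONS = {"acl", "appos", "advcl", "cc", "ccomp", "conj",
                     "dep", "mark", "parataxis", "ref"}


def build_msd(basic, collapsed):
    """Take subject relations from the collapsed parse, the rest from the basic parse."""
    subjects = [dep for dep in collapsed if dep[0] in SUBJECT_RELATIONS]
    others = [dep for dep in basic if dep[0] not in SUBJECT_RELATIONS]
    return subjects + others


def drop_removed(dependencies):
    """Drop relations on the removal list (assumption: match on the base type)."""
    return [dep for dep in dependencies
            if dep[0].split(":")[0] not in REMOVED_RELATIONS]


basic = [
    ("compound", ("Tendulkar", 2), ("Sachin", 1)),
    ("nsubj", ("played", 12), ("Tendulkar", 2)),
    ("nsubj", ("MP", 7), ("who", 4)),
    ("cop", ("MP", 7), ("is", 5)),
    ("acl:relcl", ("Tendulkar", 2), ("MP", 7)),
    ("root", ("ROOT", 0), ("played", 12)),
    ("dobj", ("played", 12), ("cricket", 13)),
]
# Only the subject relations of the collapsed parse are needed here.
collapsed = [
    ("nsubj", ("MP", 7), ("Tendulkar", 2)),
    ("nsubj", ("played", 12), ("Tendulkar", 2)),
]

msd = drop_removed(build_msd(basic, collapsed))
for relation, head, dependent in msd:
    print(relation, head, dependent)
```

In this sketch the relative pronoun ‘who’ disappears from the MSD because its subject relation is replaced by the collapsed one, and the ‘acl:relcl’ link between the two clauses is dropped by the removal step.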

For example: ‘Sachin Tendulkar, who is a MP of Indian Parliament, played cricket’.

The following are the basic and modified dependencies obtained from the Stanford Parser:

Basic Stanford Dependencies	Modified Stanford Dependencies
compound(Tendulkar-2, Sachin-1)	nsubj(MP-7, Tendulkar-2)
nsubj(played-12, Tendulkar-2)	nsubj(played-12, Tendulkar-2)
nsubj(MP-7, who-4)	compound(Tendulkar-2, Sachin-1)
cop(MP-7, is-5)	cop(MP-7, is-5)
det(MP-7, a-6)	det(MP-7, a-6)
acl:relcl(Tendulkar-2, MP-7)	case(Parliament-10, of-8)
case(Parliament-10, of-8)	compound(Parliament-10, Indian-9)
compound(Parliament-10, Indian-9)	nmod(MP-7, Parliament-10)
nmod(MP-7, Parliament-10)	root(ROOT-0, played-12)
root(ROOT-0, played-12)	dobj(played-12, cricket-13)
dobj(played-12, cricket-13)
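
The relations above use the textual form relation(head-index, dependent-index). A small sketch of reading that form into tuples is given below; the regular expression and the name `parse_dependency` are illustrative assumptions, and the pattern only covers the plain form shown in this table.

```python
import re

# relation(head-index, dependent-index), e.g. "nsubj(played-12, Tendulkar-2)"
DEPENDENCY_PATTERN = re.compile(r"([\w:]+)\((.+)-(\d+), (.+)-(\d+)\)")


def parse_dependency(text):
    """Parse one typed dependency string into (relation, (head, i), (dependent, j))."""
    match = DEPENDENCY_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a typed dependency: {text!r}")
    relation, head, head_index, dependent, dependent_index = match.groups()
    return relation, (head, int(head_index)), (dependent, int(dependent_index))


print(parse_dependency("acl:relcl(Tendulkar-2, MP-7)"))
print(parse_dependency("root(ROOT-0, played-12)"))
```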

The modified Stanford dependencies obtained are arranged according to the index attached to each word. In the current example, we obtain two ‘nsubj’ relations:

nsubj(MP-7, Tendulkar-2)
nsubj(played-12, Tendulkar-2)

Now, for each nsubj, we arrange the words from the remaining dependencies in index order:

Sachin{1} Tendulkar{2} is{5} a{6} MP{7} of{8} Indian{9} Parliament{10}
Sachin{1} Tendulkar{2} played{12} cricket{13}
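
The traversal and rearrangement for this example can be sketched as below. This is one possible reading of the steps, not the repository's implementation: starting from each subject relation, it collects every word reachable by following head-to-dependent links and then sorts the words by index. The function name `simple_sentences` and the tuple layout are assumptions made for illustration.

```python
SUBJECT_RELATIONS = {"nsubj", "nsubjpass"}

# Modified Stanford Dependencies from the table, as (relation, head, dependent).
MSD = [
    ("nsubj", ("MP", 7), ("Tendulkar", 2)),
    ("nsubj", ("played", 12), ("Tendulkar", 2)),
    ("compound", ("Tendulkar", 2), ("Sachin", 1)),
    ("cop", ("MP", 7), ("is", 5)),
    ("det", ("MP", 7), ("a", 6)),
    ("case", ("Parliament", 10), ("of", 8)),
    ("compound", ("Parliament", 10), ("Indian", 9)),
    ("nmod", ("MP", 7), ("Parliament", 10)),
    ("root", ("ROOT", 0), ("played", 12)),
    ("dobj", ("played", 12), ("cricket", 13)),
]


def simple_sentences(msd):
    """Build one sentence per subject relation from the words linked to it."""
    subjects = [dep for dep in msd if dep[0] in SUBJECT_RELATIONS]
    links = [dep for dep in msd
             if dep[0] not in SUBJECT_RELATIONS and dep[0] != "root"]
    sentences = []
    for _relation, head, subject in subjects:
        words = {head, subject}
        changed = True
        while changed:  # keep adding dependents of already collected words
            changed = False
            for _rel, link_head, link_dependent in links:
                if link_head in words and link_dependent not in words:
                    words.add(link_dependent)
                    changed = True
        ordered = sorted(words, key=lambda token: token[1])
        sentences.append(" ".join(word for word, _index in ordered))
    return sentences


print(simple_sentences(MSD))
# ['Sachin Tendulkar is a MP of Indian Parliament', 'Sachin Tendulkar played cricket']
```

Links are followed only from head to dependent, so the shared subject ‘Tendulkar’ does not pull the words of one clause into the other.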

Disadvantage

This is a heavily rule-based method, and even a small error in the input can produce incorrect results. Below are outputs from the algorithm.

Original Sentence : Do not drive forward if obstacles are closer than 5m

Result: ['Obstacles are closer 5m ahead','Do not drive forward']

But if there is a grammatical error in the sentence, the result changes:

Original Sentence : Do not drive forward if obstacles closer than 5m

Result: ['Do not drive forward']

Here, omitting ‘are’ is a small grammatical mistake, yet the conditional clause is dropped from the output and the meaning of the sentence changes completely.

Usage

Download Stanford CoreNLP

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip

Unzip the downloaded archive

unzip stanford-corenlp-full-2018-10-05.zip

Install requirements

pip install -r requirements.txt

Run

python main.py

rajatvajpayee/syntactic-simplification-StanfordNLP
