# Simplifying with OpenIE6

The OpenIE6 software takes as input a possibly complex or compound sentence X
and returns a list of simple sentences that together contain all the
information in the original sentence X.

Anton Alekseev (AA) and Anastasia Predelina (AP) wrote a Jupyter notebook
that installs and runs the code in AA's OpenIE6 fork, https://github.com/alexeyev/openie6.
An exact copy of the AA/AP notebook is included in this folder. It is also publicly
available on AA's Google Drive at

 https://colab.research.google.com/drive/1samvO-SH6Xgjf9ItlhAF1EmBZo5grBQb?usp=sharing



This notebook adds new code to the end of the AA/AP notebook. The purpose of the
new code is to simplify short stories and movie scripts.

In [1]:
# decide whether to use GPU or not at the beginning
NB_WITH_GPU = True

# IGL-CA: inference pipeline
Coordination analysis inference using the OpenIE6 model.

* Anton's [OpenIE6 fork](https://github.com/alexeyev/openie6)
* [OpenIE6 original repo](https://github.com/dair-iitd/openie6)
* [OpenIE6 original paper](https://aclanthology.org/2020.emnlp-main.306/)

Prepared by [Anton Alekseev](https://github.com/alexeyev) and [Anastasia Predelina](https://github.com/PredelinaAsya). Quite a bit of effort, tbh.

**NOTA BENE**: A GPU runtime should be enabled before running the code! If that is not possible, set `NB_WITH_GPU = False` in the first cell so that the CPU-only cells (section "CPU version" below) run instead of the GPU ones.
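If you would rather not edit `NB_WITH_GPU` by hand in the first cell, it could be set automatically. The following is a minimal sketch, not part of the AA/AP notebook, and it rests on an assumption: that a working `nvidia-smi` on the PATH means a GPU runtime is attached. The helper name `gpu_available` is ours.

```python
import shutil
import subprocess


def gpu_available():
    """Heuristic GPU check: True if `nvidia-smi` is on PATH and exits cleanly.

    Assumption (not from the original notebook): `nvidia-smi` is present and
    succeeds only when a GPU runtime is attached.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


NB_WITH_GPU = gpu_available()
print("NB_WITH_GPU =", NB_WITH_GPU)
```

If the heuristic guesses wrong on some runtime, overriding `NB_WITH_GPU` manually afterwards still works.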

## Getting the OpenIE6 code

In [2]:
# ! rm -rf openie6  # uncomment to remove an existing clone (rmdir fails on a non-empty directory)

In [3]:
! git clone https://github.com/alexeyev/openie6.git

Cloning into 'openie6'...
remote: Enumerating objects: 1882, done.
remote: Counting objects: 100% (62/62), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 1882 (delta 40), reused 26 (delta 18), pack-reused 1820
Receiving objects: 100% (1882/1882), 8.52 MiB | 22.31 MiB/s, done.
Resolving deltas: 100% (387/387), done.


In [4]:
! mkdir openie6/models

## Downloading the model

Let's get the CA (coordination analysis) model trained by the paper authors. I re-uploaded it to my Google Drive to reduce the download time.

In [5]:
# nb = 'amazon'
nb = 'google'
if nb == 'google':  # colab
    # copies from Anton's Google Drive to /content/conj_model.zip (3.61 GB)
    ! gdown 185DK6eWKT6j6Oes04OIVFwJOKri8EVXV
elif nb == 'amazon':  # sagemaker studio lab
    # copies from rrtucci's Google Drive to ./conj_model.zip
    !wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1JAyUs9jvRrpFgw6W8pl2uHVoJPxugEf4' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1JAyUs9jvRrpFgw6W8pl2uHVoJPxugEf4" -O conj_model.zip && rm -rf /tmp/cookies.txt

Downloading...
From: https://drive.google.com/uc?id=185DK6eWKT6j6Oes04OIVFwJOKri8EVXV
To: /content/conj_model.zip
100% 3.61G/3.61G [00:31<00:00, 114MB/s]


In [6]:
! unzip conj_model.zip

Archive:  conj_model.zip
   creating: conj_model/
  inflating: conj_model/epoch=28_eval_acc=0.854.ckpt  
   creating: conj_model/logs/
   creating: conj_model/logs/test/
  inflating: conj_model/logs/test.txt  
   creating: conj_model/logs/test/checkpoints/
  inflating: conj_model/logs/test/events.out.tfevents.1601263425.vsky017.5076.0  
  inflating: conj_model/logs/test/events.out.tfevents.1601263521.vsky017.5784.0  
  inflating: conj_model/logs/test/hparams.yaml  


In [7]:
! rm conj_model.zip*
! mv conj_model* openie6/models/
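The `unzip`, `rm` and `mv` cells above can also be done from Python with the standard library, which additionally lets you check the 3.61 GB archive for corruption before extracting it. This is a sketch with a hypothetical helper name (`install_conj_model`); it assumes `openie6/models/` does not already contain a `conj_model` folder. Note that `testzip()` reads every member, so on an archive this large it adds noticeable time.

```python
import shutil
import zipfile
from pathlib import Path


def install_conj_model(zip_path="conj_model.zip", models_dir="openie6/models"):
    """Unzip the CA checkpoint archive and move its top-level folder(s) into
    models_dir, mirroring the `unzip`, `rm` and `mv` cells (hypothetical helper)."""
    zip_path = Path(zip_path)
    models_dir = Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in archive: {bad}")
        # top-level entries, e.g. 'conj_model' for the archive above
        tops = sorted({name.split("/")[0] for name in zf.namelist()})
        zf.extractall(zip_path.parent)
    for top in tops:
        shutil.move(str(zip_path.parent / top), str(models_dir / top))
    zip_path.unlink()  # free the disk space used by the archive
    return [models_dir / top for top in tops]
```

In the notebook this would be called as `install_conj_model()` from `/content`.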

## Creating a text sample

In [8]:
ztz1 = 'The man, who had never liked the words' \
  ' "booby" and "boobyhatch,"' \
  ' and who liked them even less on a shining morning when there' \
  ' was a unicorn in the garden, thought for a moment.'
ztz2 = "I love Luciano Pavarotti and Jose Carreras."
ztz12 = ztz1 + "\n" + ztz2
print(ztz12)
with open("sentences.txt", "w") as f:
  f.write(ztz12)

The man, who had never liked the words "booby" and "boobyhatch," and who liked them even less on a shining morning when there was a unicorn in the garden, thought for a moment.
I love Luciano Pavarotti and Jose Carreras.
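The cell above writes one sentence per line, and the results further down show one output group per input line. For longer texts, a small helper can enforce that layout. The name `write_sentences` is hypothetical; it collapses internal line breaks so a wrapped sentence is not split into fragments, and, like the cell above, it writes no trailing newline.

```python
def write_sentences(sentences, path="sentences.txt"):
    """Write one sentence per line (hypothetical helper, not from the notebook).

    Runs of whitespace, including line breaks inside a sentence, are collapsed
    to single spaces; empty sentences are skipped.
    """
    lines = [" ".join(s.split()) for s in sentences]
    lines = [line for line in lines if line]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))  # no trailing newline, as in the cell above
    return len(lines)
```

For example, `write_sentences([ztz1, ztz2])` reproduces the `sentences.txt` written above.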


## Setting up Python 3.6

In [9]:
! sudo apt-get update -y -qq
! sudo apt-get install python3.6 python3.6-distutils python3.6-dev -qq

debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 9.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package libpython3.6-minimal:amd64.
(Reading database ... 123069 files and directories currently installed.)
Preparing to unpack .../0-libpython3.6-minimal_3.6.15-1+focal3_amd64.deb ...
Unpacking libpython3.6-minimal:amd64 (3.6.15-1+focal3) ...
Selecting previously unselected package python3.6-minimal.
Preparing to unpack .../1-python3.6-minimal_3.6.15-1+focal3_amd64.deb ...
Unpacking python3.6-minimal (3.6.15-1+focal3) ...
Selecting previously unselected package libpython3.6-stdlib:amd64.
Preparing to unpack .../

In [10]:
! ls /usr/bin/python3*

/usr/bin/python3	    /usr/bin/python3.6m
/usr/bin/python3.10	    /usr/bin/python3.6m-config
/usr/bin/python3.10-config  /usr/bin/python3.8
/usr/bin/python3.6	    /usr/bin/python3.8-config
/usr/bin/python3.6-config   /usr/bin/python3-config


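Here you will have to select `python3.6` manually at the prompt. In the run shown below it is already the auto-mode choice (marked `*`), so pressing Enter keeps it.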

In [11]:
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 3
!sudo update-alternatives --config python3

update-alternatives: using /usr/bin/python3.6 to provide /usr/bin/python3 (python3) in auto mode
There are 3 choices for the alternative python3 (providing /usr/bin/python3).

  Selection    Path                 Priority   Status
------------------------------------------------------------
* 0            /usr/bin/python3.6    3         auto mode
  1            /usr/bin/python3.10   2         manual mode
  2            /usr/bin/python3.6    3         manual mode
  3            /usr/bin/python3.8    1         manual mode

Press <enter> to keep the current choice[*], or type selection number: 
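Because the interactive prompt is easy to answer wrongly, it can be worth confirming which interpreter `python3` resolves to before installing packages. A minimal sketch; the helper names are ours, not from the notebook. `stderr` is merged into `stdout` because very old Pythons printed `--version` to stderr, and `universal_newlines` is used instead of `text=` so the code also runs on Python 3.6.

```python
import re
import subprocess


def parse_python_version(text):
    """Extract (major, minor, micro) from output such as 'Python 3.6.15'."""
    m = re.search(r"Python (\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(g) for g in m.groups())


def python3_version(exe="python3"):
    """Run `<exe> --version` and return the parsed version tuple."""
    out = subprocess.run([exe, "--version"], stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True,
                         check=True).stdout
    return parse_python_version(out)
```

In the notebook, `assert python3_version()[:2] == (3, 6)` would then fail loudly if the wrong alternative was selected.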


## Setting up the dependencies

In [12]:
! rm get-pip*

rm: cannot remove 'get-pip*': No such file or directory


In [13]:
! python3 --version
! wget https://bootstrap.pypa.io/pip/3.6/get-pip.py
! python get-pip.py

Python 3.6.15
--2023-06-21 20:22:56--  https://bootstrap.pypa.io/pip/3.6/get-pip.py
Resolving bootstrap.pypa.io (bootstrap.pypa.io)... 151.101.0.175, 151.101.64.175, 151.101.128.175, ...
Connecting to bootstrap.pypa.io (bootstrap.pypa.io)|151.101.0.175|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2159061 (2.1M) [text/x-python]
Saving to: ‘get-pip.py’


2023-06-21 20:22:56 (72.5 MB/s) - ‘get-pip.py’ saved [2159061/2159061]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pip<22.0
  Downloading pip-21.3.1-py3-none-any.whl (1.7 MB)
     |████████████████████████████████| 1.7 MB 11.3 MB/s
Collecting setuptools
  Downloading setuptools-59.6.0-py3-none-any.whl (952 kB)
     |████████████████████████████████| 952 kB 59.4 MB/s
Collecting wheel
  Downloading wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Installing collected packages: wheel, setuptools, pip
Successfully inst

Satisfying a few more dependencies just in case

In [14]:
! python -m pip install -q wheel setuptools pip --upgrade



In [15]:
! python -m pip install -q -r openie6/requirements.txt

     |████████████████████████████████| 12.0 MB 11.6 MB/s
  Preparing metadata (setup.py) ... done
     |████████████████████████████████| 776.8 MB 16 kB/s
     |████████████████████████████████| 123 kB 74.8 MB/s
     |████████████████████████████████| 104 kB 76.9 MB/s
  Preparing metadata (setup.py) ... done
     |████████████████████████████████| 3.7 MB 54.4 MB/s
     |████████████████████████████████| 156 kB 77.5 MB/s
     |████████████████████████████████| 133 kB 62.1 MB/s
     |████████████████████████████████| 82 kB 1.1 MB/s
  Preparing metadata (setup.py) ... done
     |████████████████████████████████| 829 kB 56.0 MB/s
  Preparing metadata (setup.py) ... done
     |████████████████████████████████| 90 kB 8.9 MB/s             
     |████████████████████████████████| 3.0 MB 68.8 MB/s            
     

In [16]:
! python -m nltk.downloader stopwords
! python -m nltk.downloader punkt

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Colab-specific dependencies

In [17]:
! python -m pip install -q ipykernel google-colab ipywidgets

     |████████████████████████████████| 121 kB 11.0 MB/s            
     |████████████████████████████████| 72 kB 1.2 MB/s
  Preparing metadata (setup.py) ... done
     |████████████████████████████████| 123 kB 58.1 MB/s
     |████████████████████████████████| 427 kB 57.7 MB/s
     |████████████████████████████████| 130 kB 66.2 MB/s
     |████████████████████████████████| 64 kB 2.4 MB/s
     |████████████████████████████████| 104 kB 58.3 MB/s
     |████████████████████████████████| 758 kB 55.8 MB/s
     |████████████████████████████████| 8.0 MB 51.6 MB/s
     |████████████████████████████████| 10.1 MB 59.1 MB/s
  Preparing metadata (setup.py) ... done
     |████████████████████████████████| 57 kB 5.1 MB/s
     |████████████████████████████████| 484 kB 63.8 MB/s
  Preparing metadata (setup.py) ... do

## Running coordination analysis model

Please ignore the `UnknownBackend` tracebacks: the predictions are still written, as the results below show.

The sentences in `sentences.txt` will be split into shorter sentences, which are written to `predictions.txt.conj`.

### GPU version (20 seconds or so)

In [18]:
if NB_WITH_GPU:
  ! cd openie6 && CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 PYTHONPATH=imojie:imojie/allennlp:imojie/pytorch_transformers:$PYTHONPATH python run.py --save models/conj_model --mode predict --inp ../sentences.txt --batch_size 1 --model_str bert-large-cased --task conj --gpus 1 --out ../predictions.txt

  self.config = config

UnknownBackend                            Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py in enable_matplotlib(self, gui)
   2953         # Now we must activate the gui pylab wants to use, and fix %run to take
   2954         # plot updates into account
-> 2955         self.enable_gui(gui)
   2956         self.magics_manager.registry['ExecutionMagics'].default_runner = \
   2957             pt.mpl_runner(self.safe_execfile

Finally, the results.

In [19]:
if NB_WITH_GPU:
  ! cat predictions.txt.conj

The man , who had never liked the words `` booby '' and `` boobyhatch , '' and who liked them even less on a shining morning when there was a unicorn in the garden , thought for a moment .
The man , '' , thought for a moment .
The man , who liked them even less on a shining morning when there was a unicorn in the garden , thought for a moment .
The man , who had never liked the words `` booby , thought for a moment .
The man , who had never liked the words `` boobyhatch , thought for a moment .

I love Luciano Pavarotti and Jose Carreras .
I love Luciano Pavarotti .
I love Jose Carreras .
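To use these results downstream, the output file has to be split back into groups. The sketch below assumes the layout visible above: groups separated by blank lines, the first line of each group echoing the input sentence, and the remaining lines being its simplifications. It also undoes the most visible tokenization artifacts (`` and '' quotes, spaces before punctuation); `detokenize` is deliberately naive and is not part of OpenIE6. Both helper names are ours.

```python
import re


def parse_conj_output(text):
    """Split `--task conj` output into (original, [simplified, ...]) pairs.

    A group containing only one line yields an empty list of simplifications.
    """
    groups = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines:
            groups.append((lines[0], lines[1:]))
    return groups


def detokenize(sentence):
    """Naively undo the tokenization seen above (quotes, spaced punctuation)."""
    for old, new in (("`` ", '"'), (" ''", '"'), (" ,", ","),
                     (" .", "."), (" ?", "?"), (" !", "!")):
        sentence = sentence.replace(old, new)
    return sentence
```

Usage: `with open("predictions.txt.conj") as f: groups = parse_conj_output(f.read())`. Note that the model output can include malformed simplifications (such as the `The man , '' , thought for a moment .` line above), so some filtering may still be needed.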



### CPU version (40 seconds or so)

In [20]:
if not NB_WITH_GPU:
  ! cd openie6 && PYTHONPATH=imojie:imojie/allennlp:imojie/pytorch_transformers:$PYTHONPATH python run.py --save models/conj_model --mode predict --inp ../sentences.txt --batch_size 1 --model_str bert-large-cased --task conj --out ../predictions.txt

In [21]:
if not NB_WITH_GPU:
  ! cat predictions.txt.conj
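The GPU and CPU cells above differ only in the `CUDA_*` environment variables and the `--gpus 1` flag. A sketch that builds either command in one place; the flags are copied from those cells, while the helper name `openie6_conj_command` is ours. It is meant to be run with `openie6/` as the working directory, like the `cd openie6 && ...` cells.

```python
import os


def openie6_conj_command(use_gpu, inp="../sentences.txt",
                         out="../predictions.txt", batch_size=1):
    """Return (env, argv) for the `run.py --task conj` call used above."""
    env = dict(os.environ)
    extra = "imojie:imojie/allennlp:imojie/pytorch_transformers"
    old = env.get("PYTHONPATH")
    env["PYTHONPATH"] = extra + (":" + old if old else "")
    argv = ["python", "run.py", "--save", "models/conj_model",
            "--mode", "predict", "--inp", inp,
            "--batch_size", str(batch_size),
            "--model_str", "bert-large-cased", "--task", "conj",
            "--out", out]
    if use_gpu:
        env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        env["CUDA_VISIBLE_DEVICES"] = "0"
        argv[-2:-2] = ["--gpus", "1"]  # same position as in the GPU cell
    return env, argv
```

It could then be run with `env, argv = openie6_conj_command(NB_WITH_GPU)` followed by `subprocess.run(argv, cwd="openie6", env=env, check=True)`.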

## Application of this software in Mappa Mundi

In [22]:
! git clone https://github.com/rrtucci/mappa_mundi.git

Cloning into 'mappa_mundi'...
remote: Enumerating objects: 830, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 830 (delta 27), reused 39 (delta 13), pack-reused 774
Receiving objects: 100% (830/830), 7.65 MiB | 20.10 MiB/s, done.
Resolving deltas: 100% (524/524), done.


In [23]:
%ls

get-pip.py    openie6/              sample_data/
mappa_mundi/  predictions.txt.conj  sentences.txt


In [24]:
reload_it = False  # set to True to re-import simp_openie6 after editing it (needs the sys.path setup below)
if reload_it:
  import importlib as ilib
  import simp_openie6
  ilib.reload(simp_openie6)

In [25]:
import sys
sys.path.insert(0,'/content/mappa_mundi')
from my_globals import *
from utils import my_listdir
from simp_openie6 import openie6_simplify_batch_of_m_scripts
print(ZTZ_SIMPLIFIER)

simp_openie6


In [29]:
USE_GPU = NB_WITH_GPU
from time import time


def main3():
    """Simplify the first 3 short stories in mappa_mundi/short_stories_spell."""
    start_time = time()
    print("************ simplifier:", ZTZ_SIMPLIFIER)
    in_dir = "short_stories_spell"
    out_dir = "short_stories_simp"  # relative to the current directory
    in_dir = 'mappa_mundi/' + in_dir
    batch_file_names = my_listdir(in_dir)[0:3]  # first 3 files only
    openie6_simplify_batch_of_m_scripts(
        in_dir, out_dir,
        batch_file_names,
        verbose=False)
    run_time = (time() - start_time) / 60  # elapsed minutes
    print(f"run time for short story simp: {run_time:.2f} minutes\n")


def main4():
    """Simplify the first 3 movie scripts; remove_dialogs selects the *_RD_* dirs."""
    start_time = time()
    print("************ simplifier:", ZTZ_SIMPLIFIER)
    remove_dialogs = False
    # SPELL_DIR, SPELL_RD_DIR, SIMP_DIR, SIMP_RD_DIR come from my_globals
    in_dir = SPELL_DIR if not remove_dialogs else SPELL_RD_DIR
    out_dir = SIMP_DIR if not remove_dialogs else SIMP_RD_DIR
    in_dir = 'mappa_mundi/' + in_dir
    batch_file_names = my_listdir(in_dir)[0:3]  # first 3 files only
    openie6_simplify_batch_of_m_scripts(
        in_dir, out_dir,
        batch_file_names)
    run_time = (time() - start_time) / 60  # elapsed minutes
    print(f"run time for movie script simp: {run_time:.2f} minutes\n")
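`main3()` and `main4()` repeat the same start/stop timing code. If more runs are added, that pattern could be factored into a small context manager. This is only a sketch: the name `report_minutes` is ours, and the printed format copies the two functions above.

```python
from contextlib import contextmanager
from time import time


@contextmanager
def report_minutes(label):
    """Print the wall-clock time spent inside the `with` block, in minutes."""
    start_time = time()
    try:
        yield
    finally:
        minutes = (time() - start_time) / 60
        print(f"run time for {label}: {minutes:.2f} minutes\n")
```

For example, the body of `main3()` could wrap its call as `with report_minutes("short story simp"): openie6_simplify_batch_of_m_scripts(in_dir, out_dir, batch_file_names, verbose=False)`. Because the report sits in `finally`, the time is printed even if the simplifier raises.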


In [30]:
main3()

************ simplifier: simp_openie6
run time for short story simp: 2.00 minutes



In [31]:
main4()

************ simplifier: simp_openie6
run time for movie script simp: 5.26 minutes



In [32]:
# from google.colab import drive
# drive.mount('/content/drive')

In [33]:
!zip -r m_scripts_simp.zip m_scripts_simp
!zip -r short_stories_simp.zip short_stories_simp

  adding: m_scripts_simp/ (stored 0%)
  adding: m_scripts_simp/toy-story.txt (deflated 67%)
  adding: m_scripts_simp/up.txt (deflated 66%)
  adding: m_scripts_simp/wall-e.txt (deflated 64%)
  adding: short_stories_simp/ (stored 0%)
  adding: short_stories_simp/bill-the-bloodhound.txt (deflated 67%)
  adding: short_stories_simp/extricating-young-gussie.txt (deflated 69%)
  adding: short_stories_simp/wiltons-holiday.txt (deflated 68%)
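The two `zip -r` commands can also be run from Python with the standard library's `shutil.make_archive`, which avoids depending on the `zip` binary. A sketch; the helper name `zip_dirs` is ours. Each archive keeps `<name>/` as its top-level folder, as in the `zip` output above.

```python
import shutil
from pathlib import Path


def zip_dirs(dir_names, root="."):
    """Zip each directory under root into <root>/<name>.zip, like `zip -r`."""
    archives = []
    for name in dir_names:
        archive = shutil.make_archive(
            base_name=str(Path(root) / name),  # output path without ".zip"
            format="zip",
            root_dir=root,
            base_dir=name,  # keep '<name>/' as the top-level folder
        )
        archives.append(archive)  # make_archive returns the archive's path
    return archives
```

Run from `/content`, `zip_dirs(["m_scripts_simp", "short_stories_simp"])` would produce the same two archives.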
