# Calculating ROUGE scores
This notebook provides an easy-to-install environment for calculating ROUGE scores on the provided summaries.

## Prerequisites
To calculate ROUGE scores, connect Google Drive and make sure it contains the following directories and files:
* directory `result/<model_name>` containing a zipped directory of the summaries generated by the model after running the `main.py` script, and a zipped directory of gold standard summaries.
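
Before running the evaluation, it can help to confirm that the archives are where the notebook expects them. The sketch below is only an illustration: `missing_prerequisites` is a hypothetical helper, and the default archive names and layout (`summary.zip` inside the model directory, `targets.zip` directly under `result/`) are assumptions taken from the environment variables set later in this notebook.

```python
from pathlib import Path


def missing_prerequisites(result_dir, model_name,
                          summary_zip="summary.zip", target_zip="targets.zip"):
    """Return the expected archives that are not present on disk.

    The layout is an assumption based on the paths used later in this notebook.
    """
    result_dir = Path(result_dir)
    expected = [
        result_dir / model_name / summary_zip,  # summaries generated by the model
        result_dir / target_zip,                # gold standard summaries
    ]
    return [path for path in expected if not path.is_file()]
```

For example, `missing_prerequisites('/content/drive/MyDrive/result', 'pso_first_model')` returns an empty list when both archives are in place.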

## Mounting Google Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Installing ROUGE and Python wrapper - `pyrouge`

In [2]:
!git clone https://github.com/andersjo/pyrouge.git rouge

Cloning into 'rouge'...
remote: Enumerating objects: 393, done.
remote: Total 393 (delta 0), reused 0 (delta 0), pack-reused 393
Receiving objects: 100% (393/393), 298.74 KiB | 11.49 MiB/s, done.
Resolving deltas: 100% (109/109), done.


In [3]:
!pip install pyrouge

Collecting pyrouge
  Downloading pyrouge-0.1.3.tar.gz (60 kB)
Building wheels for collected packages: pyrouge
  Building wheel for pyrouge (setup.py) ... done
  Created wheel for pyrouge: filename=pyrouge-0.1.3-py3-none-any.whl size=191621 sha256=610d59b99c53dd90c9e6232ce9394e64b2f49a830df77f9d034f0e3ba2cab65e
  Stored in directory: /root/.cache/pip/wheels/68/35/6a/ffb9a1f51b2b00fee42e7f67f5a5d8e10c67d048cda09ccd57
Successfully built pyrouge
Installing collected packages: pyrouge
Successfully installed pyrouge-0.1.3


In [4]:
!pyrouge_set_rouge_path '/content/rouge/tools/ROUGE-1.5.5'

2022-05-10 17:08:29,950 [MainThread  ] [INFO ]  Set ROUGE home directory to /content/rouge/tools/ROUGE-1.5.5.


In [None]:
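# Rebuild the WordNet exception database shipped with ROUGE-1.5.5;
# the bundled copy often cannot be opened by the local Perl installation.
%cd rouge/tools/ROUGE-1.5.5/data
!rm "WordNet-2.0.exc.db"
!perl ./WordNet-2.0-Exceptions/buildExeptionDB.pl ./WordNet-2.0-Exceptions ./smart_common_words.txt ./WordNet-2.0.exc.db
# ROUGE-1.5.5 needs the XML::DOM Perl module.
!cpan install XML::DOM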

In [6]:
from pyrouge import Rouge155
import csv
import os

## Testing on MatchSum summaries

In [None]:
r = Rouge155()
r.system_dir = '/content/drive/MyDrive/result/MatchSum_cnndm_bert.ckpt/dec/'
r.model_dir = '/content/drive/MyDrive/result/MatchSum_cnndm_bert.ckpt/ref/'
r.system_filename_pattern = r'(\d+)\.dec'  # raw string; the group captures the document ID
r.model_filename_pattern = '#ID#.ref'

output = r.convert_and_evaluate()
output_dict = r.output_to_dict(output)
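
The two filename patterns decide which reference file is scored against which system summary: the first group of the system pattern captures a document ID, and that ID replaces `#ID#` in the model pattern. The sketch below illustrates the idea in plain Python. It is not pyrouge's code; `pair_files` is a hypothetical helper, and matching from the start of the filename with `re.match` is an assumption about how the matching behaves.

```python
import re


def pair_files(system_files, model_files, system_pattern, model_pattern):
    """Pair each system summary with the reference files that share its ID."""
    system_re = re.compile(system_pattern)
    pairs = {}
    for name in sorted(system_files):
        match = system_re.match(name)
        if match is None:
            continue  # file does not follow the system naming scheme
        doc_id = match.group(1)
        # Substitute the captured ID into the model pattern, then match references.
        model_re = re.compile(model_pattern.replace("#ID#", re.escape(doc_id)))
        pairs[name] = [ref for ref in sorted(model_files) if model_re.match(ref)]
    return pairs
```

Calling `pair_files(["0.dec", "1.dec"], ["0.ref", "1.ref"], r"(\d+)\.dec", "#ID#.ref")` pairs `0.dec` with `0.ref` and `1.dec` with `1.ref`. A system file that ends up with an empty list has no matching reference.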

In [14]:
print(output)

---------------------------------------------
1 ROUGE-1 Average_R: 0.52030 (95%-conf.int. 0.51769 - 0.52309)
1 ROUGE-1 Average_P: 0.40144 (95%-conf.int. 0.39880 - 0.40394)
1 ROUGE-1 Average_F: 0.43960 (95%-conf.int. 0.43750 - 0.44185)
---------------------------------------------
1 ROUGE-2 Average_R: 0.24209 (95%-conf.int. 0.23948 - 0.24490)
1 ROUGE-2 Average_P: 0.18796 (95%-conf.int. 0.18555 - 0.19047)
1 ROUGE-2 Average_F: 0.20497 (95%-conf.int. 0.20260 - 0.20741)
---------------------------------------------
1 ROUGE-3 Average_R: 0.14129 (95%-conf.int. 0.13887 - 0.14387)
1 ROUGE-3 Average_P: 0.11112 (95%-conf.int. 0.10882 - 0.11329)
1 ROUGE-3 Average_F: 0.12028 (95%-conf.int. 0.11801 - 0.12251)
---------------------------------------------
1 ROUGE-4 Average_R: 0.09416 (95%-conf.int. 0.09201 - 0.09645)
1 ROUGE-4 Average_P: 0.07523 (95%-conf.int. 0.07317 - 0.07724)
1 ROUGE-4 Average_F: 0.08070 (95%-conf.int. 0.07859 - 0.08270)
---------------------------------------------
1 ROUGE-L Aver
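
Each line of the report carries a metric, a measure (recall, precision or F-score), its value and a 95% confidence interval. The sketch below shows one way such lines could be turned into a flat dictionary. It is an independent illustration, not pyrouge's implementation: `parse_rouge_report` is a hypothetical helper, and the key names (`rouge_1_f_score`, with `_cb` and `_ce` suffixes for the interval bounds) are an assumption about what `output_to_dict` produces, worth confirming with `output_dict.keys()`.

```python
import re

# One report line, e.g.:
# 1 ROUGE-1 Average_F: 0.43960 (95%-conf.int. 0.43750 - 0.44185)
LINE_RE = re.compile(
    r"(\d+) (ROUGE-\S+) (Average_[RPF]): (\d\.\d+) "
    r"\(95%-conf\.int\. (\d\.\d+) - (\d\.\d+)\)"
)
MEASURES = {"Average_R": "recall", "Average_P": "precision", "Average_F": "f_score"}


def parse_rouge_report(report):
    """Map each metric/measure pair to its value and confidence bounds."""
    scores = {}
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # separator line or truncated entry
        _, rouge_type, measure, value, low, high = match.groups()
        key = f"{rouge_type.lower().replace('-', '_')}_{MEASURES[measure]}"
        scores[key] = float(value)
        scores[key + "_cb"] = float(low)   # lower bound of the interval
        scores[key + "_ce"] = float(high)  # upper bound of the interval
    return scores
```

Separator lines and incomplete entries are skipped, so a truncated report still yields the metrics that were printed in full.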

In [15]:
with open("/content/drive/MyDrive/result/MatchSum_cnndm_bert.ckpt/rouge.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["metric", "value"])
    writer.writeheader()
    for metric, value in output_dict.items():
        writer.writerow({"metric": metric, "value":value})

## Our models' results

In [7]:
%env SUMMARY_DIR=/content/drive/MyDrive/result/pso_first_model/
%env SUMMARY_FILE=summary.zip
%env TARGET_DIR=/content/drive/MyDrive/result/
%env TARGET_FILE=targets.zip

env: SUMMARY_DIR=/content/drive/MyDrive/result/pso_first_model/
env: SUMMARY_FILE=summary.zip
env: TARGET_DIR=/content/drive/MyDrive/result/
env: TARGET_FILE=targets.zip


In [8]:
! unzip $SUMMARY_DIR$SUMMARY_FILE -d $SUMMARY_DIR

Archive:  /content/drive/MyDrive/result/pso_first_model/summary.zip
replace /content/drive/MyDrive/result/pso_first_model/summary/00000.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: N


In [9]:
! unzip $TARGET_DIR$TARGET_FILE -d $TARGET_DIR

Archive:  /content/drive/MyDrive/result/targets.zip
replace /content/drive/MyDrive/result/targets/00000.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: N


In [None]:
r = Rouge155()
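# The archives unpack into directories named after the zip files (without ".zip").
r.system_dir = os.path.join(os.environ["SUMMARY_DIR"], os.path.splitext(os.environ["SUMMARY_FILE"])[0])
r.model_dir = os.path.join(os.environ["TARGET_DIR"], os.path.splitext(os.environ["TARGET_FILE"])[0])
r.system_filename_pattern = r'(\d+)\.txt'  # raw string; the group captures the document ID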
r.model_filename_pattern = '#ID#.txt'

output = r.convert_and_evaluate()
output_dict = r.output_to_dict(output)

In [11]:
print(output)

---------------------------------------------
1 ROUGE-1 Average_R: 0.49291 (95%-conf.int. 0.49023 - 0.49575)
1 ROUGE-1 Average_P: 0.28480 (95%-conf.int. 0.28253 - 0.28719)
1 ROUGE-1 Average_F: 0.34204 (95%-conf.int. 0.33993 - 0.34433)
---------------------------------------------
1 ROUGE-2 Average_R: 0.18437 (95%-conf.int. 0.18194 - 0.18683)
1 ROUGE-2 Average_P: 0.10853 (95%-conf.int. 0.10692 - 0.11040)
1 ROUGE-2 Average_F: 0.12963 (95%-conf.int. 0.12794 - 0.13162)
---------------------------------------------
1 ROUGE-3 Average_R: 0.09730 (95%-conf.int. 0.09544 - 0.09935)
1 ROUGE-3 Average_P: 0.05871 (95%-conf.int. 0.05739 - 0.06033)
1 ROUGE-3 Average_F: 0.06942 (95%-conf.int. 0.06797 - 0.07111)
---------------------------------------------
1 ROUGE-4 Average_R: 0.06228 (95%-conf.int. 0.06063 - 0.06406)
1 ROUGE-4 Average_P: 0.03818 (95%-conf.int. 0.03696 - 0.03958)
1 ROUGE-4 Average_F: 0.04475 (95%-conf.int. 0.04346 - 0.04624)
---------------------------------------------
1 ROUGE-L Aver

In [12]:
with open(os.environ["SUMMARY_DIR"] + "rouge.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["metric", "value"])
    writer.writeheader()
    for metric, value in output_dict.items():
        writer.writerow({"metric": metric, "value":value})
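
With both `rouge.csv` files written, the two systems can be compared metric by metric. The following is a minimal sketch under stated assumptions: `load_rouge_csv` and `compare_scores` are hypothetical helpers, the CSV layout is the `metric`/`value` format written above, and filtering on the `_f_score` suffix assumes that is how the F-score keys are named.

```python
import csv


def load_rouge_csv(path):
    """Read a metric/value CSV, as written above, into a dictionary."""
    with open(path, newline="") as csvfile:
        return {row["metric"]: float(row["value"]) for row in csv.DictReader(csvfile)}


def compare_scores(baseline, candidate, suffix="_f_score"):
    """List (metric, baseline, candidate, difference) for metrics ending in suffix."""
    rows = []
    for metric in sorted(baseline):
        if metric.endswith(suffix) and metric in candidate:
            difference = candidate[metric] - baseline[metric]
            rows.append((metric, baseline[metric], candidate[metric], difference))
    return rows
```

A possible use is to load `/content/drive/MyDrive/result/MatchSum_cnndm_bert.ckpt/rouge.csv` as the baseline and the `rouge.csv` in `SUMMARY_DIR` as the candidate, then print the rows returned by `compare_scores`. A negative difference means the candidate scores below the baseline on that metric.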