Skip to content
master
Switch branches/tags
Code

Source Code and data used for CIKM Readability 2015 short paper

You can obtain the text at:

  1. ACM

  2. Research Gate

Dependences

The code used in this work takes advantage of the ReadabilityCalculator python module, that can be downloaded using pip:

$ pip install ReadabilityCalculator

Data:

readability_scores*.tar.gz

It contains the readability scores for every document from CLEF eHealth 2014/2015 dataset.

distrib.tar.gz

It contains the distribution of words and sentences for each preprocessing variant for the documents in CLEF eHealth 2014/2015 dataset.

lucene_html.out

It is the lucene result list based on a default VSM search using the topics from CLEF eHealth 2014.


Code:

check_num_words.py

Script that creates table 2 from the paper.

unpack_dat.py

Script to unpack the original '.dat' files from CLEF and preprocess them using any of the boilerplate removal options.

calculate_readability.py

Python script used to create the files in readability_score.tar.gz

correlations.py

Calculates the correlations between the ranking list generated by different readability measure for the same Lucene based initial ranked list.

About

This repo contains the data and code used in this CIKM 2015 paper: http://dl.acm.org/citation.cfm?id=2806613

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages