Releases · kmpoon/hlta

04 Nov 07:47

jef33

v2.3

3297b1a

v2.3 Latest


Version 2.3 Hotfix 3 (updated on Jan 28, 2019)

Requirement: Java 8

Features:
Switched to using Stanford NLP to preprocess English
Added correlation test in HLTA model building (optional)
Removed unnecessary .dict.csv files in converting text to data
Seed words can now read .dict.csv files directly
Reduced HLTA-deps.jar size

Bug fixes:
Tried to upgrade pdfbox to solve the pdfbox.baseParser.pushBackSize issue
Fixed the issue of processing "bro-\r\nken word" in PDF
Fixed the library issue in TopicCompactness
Fixed the issue of having infinity in TopicCoherence
Fixed wrong directory for website dependencies
Fixed large memory consumption in reading .sparse.txt
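
The "bro-\r\nken word" fix above concerns words that PDF extraction splits across a line break with a hyphen. The following is a minimal sketch of one way to rejoin such words; it is not the HLTA implementation, and the class name `Dehyphenate` and method `join` are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

public class Dehyphenate {
    // A letter, a hyphen, a line break (\r\n or \n), then another letter.
    private static final Pattern BROKEN_WORD =
        Pattern.compile("(\\p{L})-\\r?\\n(\\p{L})");

    /** Joins words that were hyphenated across a line break. */
    public static String join(String text) {
        // Keep the two letters and drop the hyphen and the line break.
        return BROKEN_WORD.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(join("bro-\r\nken word")); // prints "broken word"
    }
}
```

A rule this simple also merges genuine compounds that happen to wrap at the hyphen (for example "well-\nknown"), so a real preprocessor may need a dictionary check.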

This version includes all functions and new features from v2.2


13 Aug 14:01

tianzhiliang

v2.2

d072c19

v2.2: release multi-core version

Features:

multi-core training
remove island bridging (old feature)
release multi-machine training via MPI on the MultiMachineByMPI branch
update readme (explain the options and assemble)
release new HLTA.jar

Notice that:

The HLTA.jar of this version is only for training (subroutine2). Please use the HLTA.jar from an older version for topic building and predicting.


06 May 10:10

jef33

v2.1

47e3f01

v2.1

Please download the HLTA-deps.jar from v2.0

Features:
Added loglikelihood evaluation support for various data formats
Added support for training set and testing set split
Made number of keywords in NDT Doc2VecAssignment adjustable
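
As an illustration of the training/testing split mentioned above, here is a minimal sketch of a seeded random split. It does not reproduce HLTA's own option or file handling; `TrainTestSplit`, `split`, `testRatio`, and `seed` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TrainTestSplit {
    /**
     * Shuffles a copy of docs with the given seed and returns two lists:
     * index 0 is the training set, index 1 is the testing set.
     * testRatio is the fraction of documents held out, in [0, 1].
     */
    public static <T> List<List<T>> split(List<T> docs, double testRatio, long seed) {
        List<T> shuffled = new ArrayList<>(docs);
        Collections.shuffle(shuffled, new Random(seed));
        int testSize = (int) Math.round(shuffled.size() * testRatio);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(testSize, shuffled.size()))); // training set
        parts.add(new ArrayList<>(shuffled.subList(0, testSize)));               // testing set
        return parts;
    }
}
```

Fixing the seed makes the split reproducible, which matters when loglikelihood is compared across runs on the same held-out documents.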

Bug fixes:
Fixed a bug where all LDA data became binary
Fixed failure in ExtractTopicTree --broad option
Fixed failure in preprocessing Chinese


19 Mar 14:25

jef33

v2.0

b11d4e8

2.0

Requirement: Java 8

New Features:

All-in-one command for hierarchical topic detection
Webpage visualization with direct link to corresponding documents
Evaluation metrics: topic coherence, topic compactness (Scala version)
Allow input document to be listed line by line
Supports non-ascii characters
Supports LDA data format
Added option to skip tree level
Simplified HLTA parameters
Supports seedwords of any word length
Parallel computation in computing word-pair MI
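
To make the word-pair MI item above concrete, the sketch below computes mutual information between pairs of binary word-occurrence variables and distributes the rows of the pairwise matrix over a parallel stream. It is an assumption-laden illustration, not HLTA's code: the class `WordPairMI`, its methods, and the dense `boolean[][]` input (one row per document, one column per word) are all hypothetical.

```java
import java.util.stream.IntStream;

public class WordPairMI {
    /** Mutual information (in nats) between binary word columns i and j. */
    public static double mi(boolean[][] data, int i, int j) {
        int[][] counts = new int[2][2];
        for (boolean[] doc : data) {
            counts[doc[i] ? 1 : 0][doc[j] ? 1 : 0]++;
        }
        double n = data.length;
        double sum = 0.0;
        for (int a = 0; a < 2; a++) {
            for (int b = 0; b < 2; b++) {
                if (counts[a][b] == 0) continue; // 0 * log 0 is taken as 0
                double pab = counts[a][b] / n;
                double pa = (counts[a][0] + counts[a][1]) / n; // marginal of word i
                double pb = (counts[0][b] + counts[1][b]) / n; // marginal of word j
                sum += pab * Math.log(pab / (pa * pb));
            }
        }
        return sum;
    }

    /** Fills a symmetric matrix of pairwise MI; each row i is one parallel task. */
    public static double[][] allPairs(boolean[][] data, int numWords) {
        double[][] result = new double[numWords][numWords];
        IntStream.range(0, numWords).parallel().forEach(i -> {
            // Task i is the only writer of cells (i, j) and (j, i) for j > i.
            for (int j = i + 1; j < numWords; j++) {
                double v = mi(data, i, j);
                result[i][j] = v;
                result[j][i] = v;
            }
        });
        return result;
    }
}
```

Each pair is computed independently and every matrix cell has a single writer, so no locking is needed; the work per row is uneven, which the stream's work-stealing pool tolerates reasonably well.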

Other changes:

Narrowly Defined Topics are now used by default
Scala calls use Stepwise EM for parameter estimation
User defined encoding scheme in data conversion
Pre-processor now removes punctuation instead of replacing it with underscore
Subroutines now accept all data formats, while sparse data will be the default format
Data Conversion now outputs only the sparse data format by default
Data Conversion now reads PDF directly
Sparse data format now counts docId from 0
HLCM data format now uses extension .hlcm
Legacy fixes of collision with .bif format reserved words
Fixed invalid json format
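
The change in punctuation handling listed above affects which tokens end up in the vocabulary. The sketch below contrasts the two behaviours on a single string. It is only an approximation under stated assumptions: the actual pre-processor rules are not given in these notes, and `PunctuationDemo`, `remove`, and `underscore` are hypothetical names.

```java
public class PunctuationDemo {
    // \p{P} matches any Unicode punctuation character, not only ASCII ones.

    /** New behaviour (assumed): punctuation is dropped. */
    public static String remove(String s) {
        return s.replaceAll("\\p{P}", "");
    }

    /** Old behaviour (assumed): punctuation becomes an underscore. */
    public static String underscore(String s) {
        return s.replaceAll("\\p{P}", "_");
    }

    public static void main(String[] args) {
        String text = "topic-model, v2.0!";
        System.out.println(remove(text));     // prints "topicmodel v20"
        System.out.println(underscore(text)); // prints "topic_model_ v2_0_"
    }
}
```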


27 Mar 14:14

kmpoon

v1.4.1

b8001ec

1.4.1

Bug fixes:

fixed a bug that caused frequent words not to be considered in building n-grams
fixed a bug that outputs a filename that may crash the reader in the HLCM data format

(HLTA-deps.jar has not been changed)


08 Mar 04:17

kmpoon

v1.4

a89f325

1.4

Bug fixes


25 Jan 07:35

kmpoon

v1.3

2735291

1.3

Performance update:

Added StepwiseEMHLTA
Also outputs the sparse data format during the conversion


16 Dec 07:02

kmpoon

v1.2

d4948ad

1.2

General improvements and fixes:

Added support for StochasticPEM
Added support for narrowly defined topics
Updated the n-gram algorithm, which now controls the number of concatenations used for building n-grams
Moved the Convert object from tm.pdf package to tm.text package
Combined the steps for extracting topics and generating Javascript topic tree
Shows 3 decimal places in topic size in topic tree

Only the HLTA.jar has been updated. The HLTA-deps.jar in the previous release can be used.


31 Jul 15:15

kmpoon

v1.1

93d71d2

1.1

Split package files into core file (HLTA.jar) and dependency file (HLTA-deps.jar).
Reduced memory footprint substantially for tm.pdf.Convert.
Better option parser and logger support for tm.pdf.Convert.


21 Jul 16:17

kmpoon

v1.0.1

3f88c7f

1.0.1

Fixed some possibly missing dependencies.

