Releases: svilupp/LLMTextAnalysis.jl
v0.5.0
LLMTextAnalysis v0.5.0
Added
- Added a classification function `train_classifier` to train a model to classify documents into a set of predefined labels (as opposed to the more open-ended topic modeling in `build_clusters!`). You can either provide a small set of labeled documents to train the model (that are in the `index`), or just specify `num_samples` and the LLM will generate its own training data based on the `labels` and `labels_description` provided.
- Added a new template `TextWriterFromLabel` to generate synthetic documents for any given label (=topic).
- Added methods for `build_clusters!` to add custom topic levels, eg, from a `TrainedClassifier` (`build_clusters!(index, cls; topic_level="MyTopics")`) or directly by providing a vector of document `assignments` (`build_clusters!(index, assignments; topic_level="MyTopics")`). The convention is to use `topic_level::Integer` for auto-generated topics, and `topic_level::String` for custom topics.
Updated
- Updated to use `PromptingTools` 0.15.
Fixed
- Fixed a bug where `keywords` were not properly filtered before being provided to the auto-labeling function.
v0.4.0
LLMTextAnalysis v0.4.0
Added
- Added a new example on topic label customization (`examples/3_customize_topic_labels.jl`) and the corresponding sections in the FAQ.
- Added a few string cleanup tricks in the `build_topic` function to strip unnecessary repetition of the prompt template in the generated labels.
- Added new templates `TopicLabelerWithInstructions` and `TopicSummarizerWithInstructions` that include the placeholder `instructions` to allow users to easily customize the labels and summaries, respectively.
Fixed
- Fixed small typos in templates `TopicLabelerBasic` and `TopicSummarizerBasic`.
Updated
- Updated logic in the `plot` function to ensure topic labels are generated only when necessary. Use `build_clusters!` to force the generation of topic labels, or `plot` to generate them only if necessary.
- Increased compatibility for PromptingTools to 0.12.
v0.3.2
v0.3.1
LLMTextAnalysis v0.3.1
Fixed
- The `wrap_string` utility would error with `SubString` chunks. Now it works with any `AbstractString` type.
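
For readers unfamiliar with this class of error, here is a minimal, self-contained sketch of how it typically arises in Julia. It does not reproduce the package's `wrap_string` code: the helpers `shout_strict` and `shout_generic` are hypothetical, and it is only an assumption that the original bug came from an overly narrow `String` annotation.

```julia
# Hypothetical helpers for illustration only; this is NOT the package's `wrap_string`.

# Too narrow: accepts only `String`, so `SubString` chunks raise a MethodError.
shout_strict(chunk::String) = uppercase(chunk)

# Wider: accepts any `AbstractString`, including `SubString{String}`.
shout_generic(chunk::AbstractString) = uppercase(chunk)

# `split` returns views into the original text, ie, a Vector{SubString{String}}.
chunks = split("wrap this text")

# Works, because `SubString{String} <: AbstractString`.
generic_result = map(shout_generic, chunks)

# Fails, because no method of `shout_strict` matches a `SubString{String}` argument.
strict_failed = try
    map(shout_strict, chunks)
    false
catch err
    err isa MethodError
end
```

Annotating text arguments as `AbstractString` rather than `String` is a common way to keep utilities usable with the output of `split`, `SubString`, and similar view-producing functions.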
v0.3.0
LLMTextAnalysis v0.3.0
Added
- Changed compat for PromptingTools to 0.10.0, which comes with new default models (ie, default embeddings will not match those from the previous version).
v0.2.1
v0.2.0
LLMTextAnalysis v0.2.0
Added
- Added `train_concept`. Introduces the ability to train a model focusing on a single, specific concept within a collection of documents. This function helps in identifying and scoring the presence or intensity of the selected concept across the document set. Ideal for thematic studies, sentiment analysis, or tracking specific ideas in the text.
- Added `train_spectrum`. Adds functionality to analyze documents across a spectrum defined by two contrasting concepts. This feature allows for a comparative analysis, providing insights into how documents align or contrast with two polar themes or sentiments.
- Spectrum and concept can be plotted using the `plot` function.
- Improved plotting support:
  - Added package extension for `PlotlyJS` for the `plot` function.
  - Enabled the `plot` function to accept an arbitrary `hoverdata` table with information to be added to the tooltip for each document (expects Tables.jl-compatible data).