v0.18.0 (2024-03-29)

Feature

  • methodology: Add detailed explanation for Aspect-Based Sentiment Analysis and Generative language models application (49f87bd)

Fix

  • dependencies: Update python and hyfi versions; update thematos version (671ae1b)

Documentation

  • implementation: Refine methodology detail (685f3e3)
  • book: Un-comment implementation chapter in table of contents (c070ae7)
  • book: Add implementation details to the book paper (e98572c)

v0.17.0 (2023-09-09)

Feature

  • pipeline: Add datasets save predictions configuration (d29c1d2)
  • config: Add datasets-save-predictions.yaml (bb93396)

v0.16.0 (2023-09-04)

Feature

  • corprep-runner: Add corprep-gpt4-train_year agent (2d7e1c5)

Fix

  • dependencies: Upgrade hyfi-absa to 0.4.0 (5dc8ca8)

v0.15.0 (2023-09-04)

Feature

  • corprep-gpt3.yaml: Add newsId to configuration (623012f)
  • corprep: Update text parsing instructions (42d7134)
  • workflow: Add corprep-runner.yaml config file (d9b7ff7)
  • config: Add input_filename attribute to corprep-gpt3-sample.yaml, use input_filename for file path (6dd5b65)

Fix

  • dependencies: Upgrade hyfi to 1.32.1, lexikanon to 0.6.4 and hyfi-absa to 0.3.3 (82f72bb)

v0.14.0 (2023-08-08)

Feature

  • runner: Add new corprep-gpt4-train_2019.yaml configuration file (444b8c9)
  • pipeline: Add dataframe_select_columns task to datasets.yaml (2f796d7)
  • corprep.yaml: Add datasets-filter-year to pipeline, add filter_year variable (dc215a7)
  • pipeline: Add datasets filter configuration (db5f422)

v0.13.0 (2023-08-07)

Feature

  • corprep: Add remove_columns function to dataset config (dd89120)
  • workflow: Add datasets-save pipeline (7a9f1d1)
  • pipeline: Add dataset_remove_columns to datasets.yaml (9cdb2c8)
  • pipeline: Add new pipeline for saving datasets (eff2232)

Fix

  • config: Correct dataset_path in corprep-gpt3-sample.yaml (837b9d9)
  • dependencies: Upgrade lexikanon to 0.5.2 (44fa3df)
  • dependencies: Upgrade hyfi to 1.20.1 (b95917d)

v0.12.2 (2023-08-04)

Fix

  • pipeline: Replace find_similar_docs_ac with find_similar_docs_by_clustering in datasets, change column names in datasets-similar.yaml (78ad328)
  • pipeline: Update tokenizer parameter name (f96b9b1)

v0.12.1 (2023-08-04)

Fix

  • tokenizer: Update stopwords path in kakao config (8158062)
  • pipeline: Increase sample size and adjust worker count, remove specific columns from removal list (28529da)
  • workflow: Enable noun and similar pipelines (f242a63)
  • pipeline: Update filter_dataset to filter_and_sample_data (1e97c46)
  • pipeline: Change num_samples to sample_size in datasets-test.yaml (c9ba5cb)
  • pipeline: Rename num_samples to sample_size in datasets-similar config (2cdc541)
  • pipeline: Change num_samples to sample_size in datasets-noun configuration (5c32059)
  • pipeline: Add sample filename in datasets filter config (b0609d0)

v0.12.0 (2023-08-03)

Feature

  • corprep: Add revision1 for more detailed sentiment task description (03edfb2)

v0.11.0 (2023-08-03)

Feature

  • config/runner: Add new configuration files for GPT3 and GPT4 (6d83abc)
  • pipeline: Add new dataset filter and load steps (441e063)
  • corprep: Add filter pipeline in config and comment out noun and similar pipelines (6bbd818)
  • filter: Add verbose print statements (08cf03e)
  • corprep: Add filter_dataset configuration files (6edb66f)
  • corprep/datasets: Add filter functionality to datasets (3a2835a)

v0.10.0 (2023-08-03)

Feature

  • config/runner: Add new gpt3 and gpt4 test configuration files (be6e168)
  • corprep-conf-runner: Add corprep-gpt3 and corprep-gpt4 configuration files (c82df5e)
  • corprep: Add gpt3 and gpt4 agent configurations (b2a52cd)
  • corprep: Add config for TRIPLE and QUAD tasks in prompts (cd8d7e4)

v0.9.3 (2023-08-03)

Fix

  • corprep: Add secrets directory initialization (0bce74f)
  • pipeline: Replace dataframe_save with save_dataframes, remove redundant dataframe_save and save_dataframes files (666226a)
  • dependencies: Upgrade hyfi to 1.13.0 (41eb5e8)

v0.9.2 (2023-07-30)

Fix

  • corprep: Add hyabsa to plugins (6dbea3f)
  • dependencies: Upgrade hyfi to 1.12.5 and add hyfi-absa 0.1.0 (b0c679c)

v0.9.1 (2023-07-28)

Fix

  • dependencies: Upgrade hyfi to 1.12.1 (21c83f5)
  • pipeline: Rename dataset_load_raw to load_raw_dataset, add file_pattern and set verbose to false (74cd0c2)
  • corprep: Add load_raw_dataset configuration (fa0b9be)
  • corprep: Remove specific tasks and columns in absa_agent_predict.yaml (2debf24)
  • absa_agent_predict: Add null value to pipe_obj_arg_name property (99a6473)
  • find_similar_docs: Rename to find_similar_docs_ac.yaml (fc62043)
  • pipeline: Rename dataset related functions for clarity (90ab34e)
  • pipeline: Reduce defaults in absa-kakao and gpt35 configurations (91b4e7c)
  • dependencies: Upgrade lexikanon to 0.3.2, thematos to 0.2.1 (85257e4)
  • corprep: Add 'cluster' column to DataFrame in similar_docs functions (1903b6f)

v0.9.0 (2023-07-27)

Feature

  • corprep: Add thematos plugin (45ae7f3)
  • pyproject: Add thematos dependency (6253d99)
  • book: Add data.md in supplementary with filtering details (a8ed8ca)
  • corprep: Add dataset_to_pandas and pandas_print_head configuration files (834f55b)
  • similarity.py: Add multiple data-processing and plotting functions, switch clustering method from DBSCAN to Agglomerative Clustering (4f38cba)
  • corprep: Add yaml configurations for saving dataframes (5e5b235)
  • corprep: Add find_similar_docs configuration (c468597)
  • config: Add new pipeline and task configuration for dataset simulation (3710f93)

v0.8.0 (2023-07-26)

Feature

  • corprep: Add run configurations for absa_agent_predict, filter_dataset and load_raw_dataset (7290fca)
  • workflow: Add datasets-test configuration file (b612cc4)
  • pipeline/config: Enhance datasets.yaml (28a6108)
  • tokenize: Add extract_tokens function to handle part-of-speech tagging (c6482fb)
  • corprep: Add dataset_extract_nouns configuration, add dataset_extract_tokens configuration (5ddf017)
  • tokenizer: Add kakao configuration (9ae964f)
  • pipeline: Add extract tokens step with kakao tokenizer config (04e66d2)

Fix

  • workflow: Add workflow_name field to workflows (f9dfb5b)
  • dependencies: Upgrade hyfi to 1.9.4 (98e0228)

v0.7.1 (2023-07-25)

Fix

  • corprep: Replace package name with path (adb8932)
  • dependencies: Upgrade hyfi to 1.9.3 and lexikanon to 0.2.3 (1dd2766)

v0.7.0 (2023-07-24)

Feature

  • tokenize: Add load_from_cache_file option (f4e0057)
  • pipeline: Add load from cache option in tokenizer config (9617708)
  • corprep: Add lexikanon plugin to HyFI initialization (2fe976a)

Fix

  • dependencies: Upgrade hyfi to 1.9.0 and ekonlpy to 2.0.1 (f53fa18)

v0.6.0 (2023-07-23)

Feature

  • tests: Add tokenizer test in corprep module (5487cf7)
  • corprep/datasets: Add similarity.py file with similarity analysis functions (708e79f)
  • tokenizer: Add strip_pos option to SimpleTokenizer configuration (7fc3991)
  • corprep: Add tokenizer_config_name and token_col to dataset tokenize configuration (4691e71)
  • config/task: Add new datasets-tokenize.yaml file (c3f2af5)
  • pipeline: Create datasets-tokenize.yaml for tokenization in pipeline (743b8d7)
  • tokenizer: Add flatten option to MecabTokenizer config (070740b)
  • tokenizer: Add new tokenizer configurations for SimpleTokenizer, MecabTokenizer, NLTKTokenizer, add new tagger configurations for mecab and nltk (26496c1)
  • tokenizer: Add new tokenizer classes and methods (df773b9)
  • tokenizer: Add hanja table loading function (b420649)
  • tests: Add stopwords test in tokenizer (30335b3)
  • tokenizer: Add stopwords functionality (073a176)
  • corprep: Add new stopwords configuration (bb79233)
  • pyproject.toml: Add nltk dependency (cf00d4c)
  • corprep: Add new configuration files for text normalization (ce82466)
  • normalizer: Add new configurations for text normalization (ac53a45)
  • corprep: Add new about information (bf79ee0)
  • corprep/resources/dictionaries/mecab: Add new ekon_v1.dic file (d408222)
  • tokenizer: Add utils for text normalization and string metrics (6b1d818)
  • tokenizer: Add hangle encoder with normalization and decomposition functions (afe155b)
  • corprep/tokenizer/hanja: Add new translation functions and character handling for Hangul and Hanja (490c763)
  • tokenizer: Add normalizer.py with normalizer functionality (8e579aa)
  • dependencies: Add scikit-learn version 1.3.0 (6c34d44)

Fix

  • pipeline: Correct typo in tokenize step (c9158d8)
  • NLTKTokenizer: Modify parse method return type (48ee9cd)
  • corprep: Streamline main function and import statements (11533a2)
  • corprep: Change how HyFI is initialized and used (c2e0344)

v0.5.0 (2023-07-20)

Feature

  • config: Add new configuration files for absa, pipeline, task, and workflow (0883120)

Documentation

  • Update URL from Github pages to subdomain (f6b5f59)

v0.4.0 (2023-07-19)

Feature

  • corprep: Add new absa workflow configuration (c0fb068)
  • corprep: Add gpt35 pipeline to absa task configuration (d6c2fe8)
  • pipeline: Add new absa-kakao-gpt35.yaml configuration file (594c317)
  • corprep: Add new gpt35.yaml configuration file for ABSA task (ceb35ab)

Fix

  • absa/config: Handle additional exceptions in call_api function (7a53ed4)
  • corprep: Handle api responses and modify related functions (3400a26)
  • corprep/absa: Handle InvalidRequestError in call_api function (4d70fbf)
  • corprep/datasets: Add number of samples logging (872879c)
  • absa: Adjust agent call function and return structure (d10ad61)

v0.3.0 (2023-07-19)

Feature

  • corprep: Add absa-kakao pipeline configuration (bc1418d)
  • corprep: Add absa_agent_predict configuration file (62ef975)
  • absa/prompts: Add default.yaml configuration for TRIPLE and QUAD tasks (1c72bff)
  • corprep: Add new absa default configuration (68b6a58)
  • corprep/absa: Add config module with API logic and data models (623bc76)
  • corprep/absa: Add agent module with predict functionalities (1e9285b)
  • corprep/absa: Add new file __init__.py (32dc886)
  • corprep: Add dataset_load and absa configurations (f40c599)
  • corprep: Add absa task (653e525)
  • corprep/datasets/io.py: Add load_dataset function (3cb4294)
  • pipe: Add dataset_sample and a second dataset_save to steps (57cc25c)
  • corprep: Add sample_dataset function and related configuration (4b64a40)
  • pipeline: Add tokenize step to datasets pipeline (1536430)
  • corprep: Add new file for tokenizing dataset (6fb827e)
  • corprep/datasets/preprocessing: Add tokenize_dataset function (2db1824)

Fix

  • .tasks.toml: Lower coverage fail threshold to 1% (b2b7718)
  • corprep: Introduce setLogger for HyFI (c78e6e2)
  • datasets: Add path and file_pattern parameters to load_raw_dataset (2bf977b)
  • dependencies: Upgrade hyfi to 1.2.14 (ca9ac2b)

v0.2.0 (2023-07-17)

Feature

  • corprep: Add save_dataset pipe (b7a3dff)
  • corprep: Add save_raw_dataset configuration (7e331d2)
  • corprep: Add datasets.yaml configuration for pipeline (4cf7bd4)
  • corprep: Add new project configuration file (be8890d)
  • corprep: Add new configuration file (4cdf806)
  • datasets: Add function to save raw datasets (8a5319c)
  • datasets: Create corprep/datasets/__init__.py file (5808b35)

Fix

  • corprep: Add global_workspace_name to yaml config file (ffe0c12)
  • dependencies: Upgrade hyfi to 1.2.13 (d1b7658)
  • dependencies: Upgrade hyfi to 1.2.10 (68314de)
  • datasets: Add Dataset import to raw.py (c959747)
  • dependencies: Upgrade hyfi to 1.2.7 (82ab4e6)
  • dependencies: Upgrade hyfi to 1.2.6 (cfb2f3f)

Documentation

  • book: Add new sections (introduction, literature, methodology, results, conclusion, supplementary materials) (e1f0aaf)

v0.1.2 (2023-07-11)

Fix

  • dependencies: Upgrade hyfi to 1.2.2 (dfdc822)

v0.1.1 (2023-06-28)

Fix

  • dependencies: Upgrade hyfi to 0.15.0 (cc9463b)

v0.1.0 (2023-06-07)

Feature

Fix