Releases: masanorihirano/llm-japanese-dataset

1.0.3

18 Jan 07:35
66804cf
  • dropped samples with NaN input and output in alt

CC-BY-SA 4.0 version has 9,074,340 lines
MIT version has 1,481,970 lines
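
The NaN filtering noted above could look roughly like the following. This is a minimal sketch, not the repository's actual code: it assumes each sample is a dict with "input" and "output" fields, that a missing value appears as a float NaN, and that a sample is dropped when either field is NaN. The helper names `is_nan` and `drop_nan_records` are hypothetical.

```python
import math


def is_nan(value):
    """Return True only for float NaN; strings and None are left alone."""
    return isinstance(value, float) and math.isnan(value)


def drop_nan_records(records, keys=("input", "output")):
    """Keep only records where none of the given fields is NaN.

    The field names in `keys` are assumptions for illustration.
    """
    return [r for r in records if not any(is_nan(r.get(k)) for k in keys)]


# Hypothetical samples: one clean, one with NaN input, one with NaN output.
samples = [
    {"input": "こんにちは", "output": "Hello"},
    {"input": float("nan"), "output": "Hello"},
    {"input": "ありがとう", "output": float("nan")},
]

cleaned = drop_nan_records(samples)
print(f"dropped {len(samples) - len(cleaned)} of {len(samples)} samples")
```

An explicit `math.isnan` check is used because NaN never compares equal to itself, so an equality test would not catch it.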

This release is automatically generated.
Please see the pull request for more details.
#89

1.0.2

04 Jan 03:31
70c4734

In Wikipedia summary:

  • removed samples with blank output
  • updated the Wikipedia version to 20240101

CC-BY-SA 4.0 version has 9,074,350 lines
MIT version has 1,481,980 lines
This release is automatically generated.
Please see the pull request for more details.
#85

vanilla-1.0.2

04 Jan 03:42
5e7f6f0

In Wikipedia summary:

  • removed samples with blank output
  • updated the Wikipedia version to 20240101

CC-BY-SA 4.0 version has 2,492,588 lines
MIT version has 296,422 lines
This release is automatically generated.
Please see the pull request for more details.
#86

1.0.1

04 Jul 09:29
4da339a
  • dropped alpaca

This release is automatically generated.
Please see the pull request for more details.
#79

vanilla-1.0.1

04 Jul 08:44
21b3643
  • dropped alpaca

CC-BY-SA 4.0 version has 2,463,624 lines
MIT version has 296,422 lines

This release is automatically generated.
Please see the pull request for more details.
#78

1.0.0

03 Jul 16:08
6c4da8b

Added:

  • jqac
  • wikipedia ja typo corpus

CC-BY-SA 4.0 version has 9,097,388 lines
MIT version has 1,533,982 lines
This release is automatically generated.
Please see the pull request for more details.
#72

0.1.1

03 Jul 14:29
bc43176

Bug fix in JESC datasets.

CC-BY-SA 4.0 version has 1,811,964 lines
MIT version has 348,424 lines
This release is automatically generated.
Please see the pull request for more details.
#67

vanilla-1.0.0

03 Jul 13:58
b7f1561

Added:

  • jqac
  • wikipedia ja typo corpus

CC-BY-SA 4.0 version has 2,515,626 lines
MIT version has 348,424 lines
This release is automatically generated.
Please see the pull request for more details.
#69

vanilla-0.1.0

28 Jun 14:07
a885a98

CC-BY-SA 4.0 version has 1,811,964 lines
MIT version has 348,424 lines
This release is automatically generated.
Please see the pull request for more details.
#65

0.1.0

30 Apr 15:33
8f1f65d

CC-BY-SA 4.0 version has 8,393,726 lines
MIT version has 1,533,982 lines
This release is automatically generated.
Please see the pull request for more details.
#49