ko-nlp · lovit · Nov 11, 2020 · Nov 5, 2020 · Nov 5, 2020 · Nov 5, 2020
diff --git a/en-docs/corpuslist/modu_written.md b/en-docs/corpuslist/modu_written.md
@@ -4,4 +4,49 @@ sort: 19
 
 # Modu: Written
 
-TBD
+Modu: Written is a dataset released by the National Institute of Korean Language.
+The data specification is as follows.
+
+- author: National Institute of Korean Language
+- repository: [https://corpus.korean.go.kr](https://corpus.korean.go.kr)
+- references: [document](https://rlkujwkk7.toastcdn.net/NIKL_WRITTEN(v1.0).pdf)
+- size:
+  - train: 20,188 examples
+
+```warning
+Due to the license terms of the Modu corpus, `Korpora` does not provide a download function for this corpus; it only offers a load function.
+If you wish to use this corpus, please complete the authentication process required by the National Institute of Korean Language and download the corpus manually.
+```
+
+You can load the corpus from your Python console as follows.
+
+```python
+from Korpora import Korpora
+corpus = Korpora.load("modu_written")
+```
+
+```warning
+The code assumes that the corpus has already been unzipped into the `NIKL_WRITTEN` directory within `~/Korpora` (`~/Korpora/NIKL_WRITTEN`).
+If the root directory is not `~/Korpora`, add the `root_dir=custom_path` argument to the `load` method.
+```
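+
+Because this corpus must be downloaded manually, it can help to confirm that the directory exists before loading. The following is a minimal sketch, not part of `Korpora` itself: the helper names `nikl_written_dir` and `load_modu_written` are hypothetical, and only `Korpora.load` with `root_dir` comes from the description above.
+
+```python
+from pathlib import Path
+
+
+def nikl_written_dir(root_dir="~/Korpora"):
+    """Return the directory where the unzipped corpus is expected (hypothetical helper)."""
+    return Path(root_dir).expanduser() / "NIKL_WRITTEN"
+
+
+def load_modu_written(root_dir="~/Korpora"):
+    """Load the corpus, failing early with a clear message if it is missing."""
+    corpus_dir = nikl_written_dir(root_dir)
+    if not corpus_dir.is_dir():
+        raise FileNotFoundError(
+            f"Expected the unzipped corpus at {corpus_dir}; download it manually first."
+        )
+    # Imported here so the path check works even when Korpora is not installed.
+    from Korpora import Korpora
+
+    return Korpora.load("modu_written", root_dir=str(Path(root_dir).expanduser()))
+```
+
+Calling `load_modu_written("/data/Korpora")` then behaves like the `load` call above, with `/data/Korpora` (an example path) as the root directory.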
+
+You can also load the corpus as follows.
+The result is identical to that of the previous example.
+
+```python
+from Korpora import ModuWrittenKorpus
+corpus = ModuWrittenKorpus()
+```
+
+```warning
+The code assumes that the corpus has already been unzipped into `~/Korpora/NIKL_WRITTEN` under the current user's home directory.
+If the corpus is in another directory, pass the `root_dir_or_paths=custom_path` argument when instantiating `ModuWrittenKorpus`.
+```
+
+With either of the examples above, the corpus is loaded into the variable `corpus`.
+`train` refers to the training dataset of the corpus, and you can check its first training instance as follows.
+
+```
+>>> corpus.train[0]
+01범보다 무서운 곶감
+```
diff --git a/en-docs/corpuslist/namuwikitext.md b/en-docs/corpuslist/namuwikitext.md
@@ -4,4 +4,115 @@ sort: 8
 
 # NamuWikiText
 
-TBD
+NamuWikiText is a dataset released by lovit@github. It provides Namu Wikipedia in text format.
+The data specification is as follows.
+
+- author: lovit@github
+- repository: [https://github.com/lovit/namuwikitext](https://github.com/lovit/namuwikitext)
+- size:
+  - train: 31,235,096 lines (500,104 docs, 4.6G)
+  - dev: 153,605 lines (2,525 docs, 23M)
+  - test: 160,233 lines (2,527 docs, 24M)
+
+The data structure is as follows:
+
+|Attribute|Description|
+|---|---|
+|text|the body of a section|
+|pair|the title of a section|
+
+
+## 1. Using in Python
+
+You can download and load the corpus from a Python console.
+
+### Downloading the corpus
+
+You can download the NamuWikiText corpus into your local directory with the following Python code.
+
+```python
+from Korpora import Korpora
+Korpora.fetch("namuwikitext")
+```
+
+```note
+By default, the corpus is downloaded to a Korpora directory within the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
+add the `root_dir=custom_path` argument to the `fetch` method.
+```
+
+```tip
+When the `fetch` method is executed with the `force_download=True` argument, it ignores the existing corpus in the local directory and re-downloads the corpus. The default value of `force_download` is `False`.
+```
+
+
+### Loading the corpus
+
+You can load the NamuWikiText corpus from your Python console with the following code.
+If the corpus does not exist in the local directory, it is downloaded first.
+
+```python
+from Korpora import Korpora
+corpus = Korpora.load("namuwikitext")
+```
+
+You can also load the corpus as follows.
+The result is identical to that of the previous example.
+
+```python
+from Korpora import NamuwikiTextKorpus
+corpus = NamuwikiTextKorpus()
+```
+
+With either of the examples above, the corpus is loaded into the variable `corpus`.
+`train` refers to the training dataset of the corpus, and you can check its first training instance as follows.
+
+```
+>>> corpus.train[0]
+SentencePair(text='상위 문서: 아스날 FC\n2009-10 시즌 2011-12 시즌\n2010 -11 시즌...', pair=' = 아스날 FC/2010-11 시즌 =')
+>>> corpus.train[0].text
+상위 문서: 아스날 FC\n2009-10 시즌 2011-12 시즌\n2010 -11 시즌...
+>>> corpus.train[0].pair
+= 아스날 FC/2010-11 시즌 =
+```
+
+`dev` and `test` refer to the validation and test datasets of the corpus, respectively. The first instance of each can be accessed as follows.
+
+```
+>>> corpus.dev[0]
+SentencePair(text='상위 항목: 축구 관련 인물, 외국인 선수/역대 프로축구\n...', pair=' = 소말리아(축구선수) =')
+>>> corpus.test[0]
+SentencePair(text='', pair=' = 덴덴타운 =')
+```
+
+By executing the `get_all_texts` method, you can access all texts (bodies of sections) within the corpus.
+
+```
+>>> corpus.get_all_texts()
+['상위 문서: 아스날 FC\n2009-10 시즌 2011-12 시즌\n2010 -11 시즌...', ... ]
+```
+
+By executing the `get_all_pairs` method, you can access all pairs (titles of sections) within the corpus.
+
+```
+>>> corpus.get_all_pairs()
+['= 아스날 FC/2010-11 시즌 =', ... ]
+```
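+
+Section titles keep their wiki heading markers (for example `' = 덴덴타운 ='`), and some sections have an empty body, as `corpus.test[0]` shows. The following is a minimal sketch of one way to clean titles and skip empty sections. `SentencePair` here is a local stand-in built with `namedtuple` so that the snippet runs without the corpus, the sample values are abbreviated from the output above, and `clean_title` and `non_empty_sections` are hypothetical helpers, not part of `Korpora`.
+
+```python
+from collections import namedtuple
+
+# Local stand-in mirroring the fields shown above; the real class lives in Korpora.
+SentencePair = namedtuple("SentencePair", ["text", "pair"])
+
+
+def clean_title(pair):
+    """Strip surrounding spaces and '=' heading markers from a section title."""
+    return pair.strip(" =")
+
+
+def non_empty_sections(examples):
+    """Yield (title, body) for every section whose body is not blank."""
+    for example in examples:
+        if example.text.strip():
+            yield clean_title(example.pair), example.text
+
+
+samples = [
+    SentencePair(text="상위 문서: 아스날 FC\n2009-10 시즌 2011-12 시즌", pair=" = 아스날 FC/2010-11 시즌 ="),
+    SentencePair(text="", pair=" = 덴덴타운 ="),
+]
+for title, body in non_empty_sections(samples):
+    print(title, len(body))
+```
+
+With a loaded corpus, `non_empty_sections(corpus.train)` would be used in the same way, assuming that iterating over `corpus.train` yields `SentencePair` items.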
+
+
+## 2. Using in a terminal
+
+You can download the corpus directly from a terminal, without starting a Python console.
+To do so, use the following command.
+
+```bash
+korpora fetch --corpus namuwikitext
+```
+
+```note
+By default, the corpus is downloaded to a Korpora directory within the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
+add the `--root_dir custom_path` argument to the `fetch` command.
+```
+
+```tip
+If you add the `--force_download` argument when executing the `fetch` command in the terminal, it ignores the existing corpus in the local directory and re-downloads the corpus.
+```
diff --git a/en-docs/corpuslist/naver_changwon_ner.md b/en-docs/corpuslist/naver_changwon_ner.md
@@ -4,4 +4,116 @@ sort: 9
 
 # NAVER x Changwon NER
 
-TBD
+NAVER x Changwon NER is a named entity recognition (NER) dataset released by NAVER and Changwon National University.
+The data specification is as follows.
+
+- author: Naver + Changwon National University
+- repository: [https://github.com/naver/nlp-challenge/tree/master/missions/ner](https://github.com/naver/nlp-challenge/tree/master/missions/ner)
+- reference: [http://air.changwon.ac.kr/?page_id=10](http://air.changwon.ac.kr/?page_id=10)
+- size:
+  - train: 90,000 examples
+
+The data structure is as follows:
+
+|Attribute|Description|
+|---|---|
+|text|a string of space-delimited words|
+|words|a sequence of words|
+|tags|a sequence of entity tags, one per word|
+
+
+## 1. Using in Python
+
+You can download and load the corpus from a Python console.
+
+### Downloading the corpus
+
+You can download the NAVER x Changwon NER corpus into your local directory with the following Python code.
+
+```python
+from Korpora import Korpora
+Korpora.fetch("naver_changwon_ner")
+```
+
+```note
+By default, the corpus is downloaded to a Korpora directory within the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
+add the `root_dir=custom_path` argument to the `fetch` method.
+```
+
+```tip
+When the `fetch` method is executed with the `force_download=True` argument, it ignores the existing corpus in the local directory and re-downloads the corpus. The default value of `force_download` is `False`.
+```
+
+
+### Loading the corpus
+
+You can load the NAVER x Changwon NER corpus from your Python console with the following code.
+If the corpus does not exist in the local directory, it is downloaded first.
+
+```python
+from Korpora import Korpora
+corpus = Korpora.load("naver_changwon_ner")
+```
+
+You can also load the corpus as follows.
+The result is identical to that of the previous example.
+
+```python
+from Korpora import NaverChangwonNERKorpus
+corpus = NaverChangwonNERKorpus()
+```
+
+With either of the examples above, the corpus is loaded into the variable `corpus`.
+`train` refers to the training dataset of the NAVER x Changwon NER corpus, and you can check its first training instance as follows.
+
+```
+>>> corpus.train[0]
+WordTag(text='비토리오 양일 만에 영사관 감호 용퇴, 항룡 압력설 의심만 가율 ', words=['비토리오', '양일', '만에', '영사관', '감호', '용퇴,', '항룡', '압력설', '의심만', '가율'], tags=['PER_B', 'DAT_B', '-', 'ORG_B', 'CVL_B', '-', '-', '-', '-', '-'])
+>>> corpus.train[0].text
+비토리오 양일 만에 영사관 감호 용퇴, 항룡 압력설 의심만 가율 
+>>> corpus.train[0].words
+['비토리오', '양일', '만에', '영사관', '감호', '용퇴,', '항룡', '압력설', '의심만', '가율']
+>>> corpus.train[0].tags
+['PER_B', 'DAT_B', '-', 'ORG_B', 'CVL_B', '-', '-', '-', '-', '-']
+```
+
+By executing the `get_all_words` method, you can access all words (word sequences) within the NAVER x Changwon NER corpus.
+
+```
+>>> corpus.get_all_words()
+[['비토리오', '양일', '만에', '영사관', '감호', '용퇴,', '항룡', '압력설', '의심만', '가율'], ... ]
+```
+
+By executing the `get_all_tags` method, you can access all tags (a sequence of entity tags of words) within the corpus.
+
+```
+>>> corpus.get_all_tags()
+[['PER_B', 'DAT_B', '-', 'ORG_B', 'CVL_B', '-', '-', '-', '-', '-'], ... ]
+```
+
+By executing the `get_all_texts` method, you can access all texts (a string of space delimited words) within the corpus.
+
+```
+>>> corpus.get_all_texts()
+['비토리오 양일 만에 영사관 감호 용퇴, 항룡 압력설 의심만 가율 ', ... ]
+```
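+
+Since `words` and `tags` are aligned position by position, the entity-bearing words can be recovered by zipping the two sequences. The following is a minimal sketch using the sample values shown above. The helper names are hypothetical, and treating `-` as "no entity" and the part before `_` as the entity type are assumptions read off the sample output, not a documented specification.
+
+```python
+from collections import Counter
+
+
+def tagged_words(words, tags):
+    """Return (word, tag) pairs for words whose tag is not '-'."""
+    return [(word, tag) for word, tag in zip(words, tags) if tag != "-"]
+
+
+def entity_type_counts(all_tags):
+    """Count entity types (e.g. 'PER' from 'PER_B') across tag sequences."""
+    counts = Counter()
+    for tags in all_tags:
+        counts.update(tag.split("_")[0] for tag in tags if tag != "-")
+    return counts
+
+
+words = ['비토리오', '양일', '만에', '영사관', '감호', '용퇴,', '항룡', '압력설', '의심만', '가율']
+tags = ['PER_B', 'DAT_B', '-', 'ORG_B', 'CVL_B', '-', '-', '-', '-', '-']
+
+print(tagged_words(words, tags))
+print(entity_type_counts([tags]))
+```
+
+With a loaded corpus, `entity_type_counts(corpus.get_all_tags())` would give the distribution of entity types over the whole training set.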
+
+
+
+## 2. Using in a terminal
+
+You can download the corpus directly from a terminal, without starting a Python console.
+To do so, use the following command.
+
+```bash
+korpora fetch --corpus naver_changwon_ner
+```
+
+```note
+By default, the corpus is downloaded to a Korpora directory within the user's home directory (`~/Korpora`). If you wish to download the corpus to another directory,
+add the `--root_dir custom_path` argument to the `fetch` command.
+```
+
+```tip
+If you add the `--force_download` argument when executing the `fetch` command in the terminal, it ignores the existing corpus in the local directory and re-downloads the corpus.
+```