Data conversion tool for the Ministry of Education's Taiwanese Southern Min character and word frequency survey


sih4sing5hong5/KIPsupin_doc2yaml



Following the program at /home/luibenghan/src/kip/ke-si/doc2db on 允言老師's host ip093, this tool uses wvWare to convert .doc files.

sudo apt-get install -y wv g++ libxml2-dev libxslt1-dev python3-dev
virtualenv --python python3 venv
source venv/bin/activate
pip install --upgrade pip
pip install KIPsupin_doc2yaml
轉換doc到json <doc_folder> <json_folder>
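For scripted use, the command above can also be driven from Python. This is a minimal sketch, assuming the `轉換doc到json` command is on the PATH once the package is installed and that it writes files ending in `.json` into the output folder (the README does not state the file extension). The helper names `build_command` and `convert_folder` are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path


def build_command(doc_dir, json_dir):
    """Return the argument list for the converter command shown above."""
    return ["轉換doc到json", str(doc_dir), str(json_dir)]


def convert_folder(doc_dir, json_dir):
    """Run the converter, then list the JSON files found under json_dir.

    Assumption: output files end in .json; this is not confirmed by the README.
    """
    # check=True raises CalledProcessError if the converter exits non-zero.
    subprocess.run(build_command(doc_dir, json_dir), check=True)
    return sorted(Path(json_dir).rglob("*.json"))
```

Passing the arguments as a list avoids shell quoting problems with folder names that contain spaces or non-ASCII characters.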

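Related format project

Converted JSON format

  • The leading fields, such as 出版年 (publication year), 文類 (genre), …, may or may not be present; this depends on the corpus.
  • 資料 (data) is always present.
  • Each entry inside 資料 always has content, but 作者 (author), 文類, 出版年, … are not guaranteed; this depends on whether the corpus provides them for each entry.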
{
  "出版年": "2007",
  "文類": "報導文學",
  "書名": "臺灣閩南語朗讀文章選輯",
  "書寫系統": "漢羅",
  "資料": [
    {
      "作者": "林文平",
      "段": [
        [
          "漢字",
          "白話字"
        ],
        [
          "漢字",
          "白話字"
        ]
      ],
      "篇名": "芎蕉王國──旗山"
    },
    {
      "作者": "江榮慶",
      "段": [
        [
          "漢字",
          "白話字"
        ],
        [
          "漢字",
          "白話字"
        ]
      ],
      "篇名": "毋免放尿的囡仔"
    }
  ]
}
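Because most fields are optional, code that reads this format has to tolerate missing keys. The sketch below flattens one document into rows, one per [漢字, 白話字] pair. It assumes that a field missing from an entry may be inherited from the top level of the document; that fallback is an illustrative choice and is not stated by this README. The names `lookup`, `iter_pairs` and `SAMPLE` are hypothetical, and the sample is a shortened variant of the example above with 作者 omitted from the second entry.

```python
import json

# Shortened sample; the second entry deliberately has no 作者.
SAMPLE = """
{
  "出版年": "2007",
  "文類": "報導文學",
  "資料": [
    {"作者": "林文平", "篇名": "芎蕉王國──旗山", "段": [["漢字", "白話字"]]},
    {"篇名": "毋免放尿的囡仔", "段": [["漢字", "白話字"], ["漢字", "白話字"]]}
  ]
}
"""


def lookup(document, entry, key):
    """Prefer the per-entry value, then the document-level one, else None."""
    if key in entry:
        return entry[key]
    return document.get(key)


def iter_pairs(document):
    """Yield one dict per [漢字, 白話字] pair in the document."""
    for entry in document["資料"]:  # 資料 is always present
        for hanji, poj in entry.get("段", []):
            yield {
                "篇名": lookup(document, entry, "篇名"),
                "作者": lookup(document, entry, "作者"),
                "出版年": lookup(document, entry, "出版年"),
                "漢字": hanji,
                "白話字": poj,
            }


rows = list(iter_pairs(json.loads(SAMPLE)))
print(len(rows))  # 3
```

Rows from the second entry carry `None` for 作者 and inherit 出版年 from the top level.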

Development

sudo apt-get install -y wv g++ libxml2-dev libxslt1-dev python3-dev
virtualenv --python python3 venv
source venv/bin/activate
pip install --upgrade pip
pip install beautifulsoup4 lxml
python -m unittest
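A test in the same `unittest` style could check the documented guarantee that 資料 is always present. The sketch below is not taken from the project's test suite. `format_problems` is a hypothetical helper, and the check that every 段 item is a two-string pair is an assumption based on the sample output.

```python
import unittest


def format_problems(document):
    """Return a list of readable problems; an empty list means all checks passed."""
    data = document.get("資料")
    if not isinstance(data, list):
        return ["資料 is missing or is not a list"]
    problems = []
    for index, entry in enumerate(data):
        for pair in entry.get("段", []):
            is_pair = (
                isinstance(pair, list)
                and len(pair) == 2
                and all(isinstance(text, str) for text in pair)
            )
            if not is_pair:
                problems.append(f"entry {index}: 段 item is not a [漢字, 白話字] pair")
    return problems


class FormatTest(unittest.TestCase):
    def test_valid_document(self):
        document = {"資料": [{"篇名": "芎蕉王國──旗山", "段": [["漢字", "白話字"]]}]}
        self.assertEqual(format_problems(document), [])

    def test_missing_data(self):
        document = {"書名": "臺灣閩南語朗讀文章選輯"}
        self.assertEqual(format_problems(document), ["資料 is missing or is not a list"])
```

Saved in a file whose name starts with `test`, this is picked up by the `python -m unittest` command above.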
