Skip to content

qishe-nlp/subtitle-analyzer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Installation from pip3

pip3 install --verbose subtitle_analyzer
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf
python -m spacy download de_dep_news_trf

Usage

Please refer to api docs.

Excutable usage

Since the x2cdict needs environment variables DICT_DB_HOST and DEEPL_AUTH_KEY, so Don't forget!!!.

  • Write ass file with vocabulary information
sta_vocab --srtfile movie.srt --lang en --assfile en_vocab.ass --external False
  • Write ass file with phrase information
sta_phrase --srtfile movie.srt --lang en --assfile en_phrase.ass --external False

Package usage

from subtitlecore import Subtitle
from subtitle_analyzer import VocabAnalyzer, PhraseAnalyzer
from subtitle_analyzer import VocabASSWriter, PhraseASSWriter
import json

def subtitle_vocab(srtfile, lang, assfile, external):

  phase = {"step": 1, "msg": "Start sentenizing"}
  print(json.dumps(phase), flush=True)

  sf = Subtitle(srtfile, lang)
  sens = sf.sentenize()
  for e in sens:
    print(e)

  phase = {"step": 2, "msg": "Finish sentenizing"}
  print(json.dumps(phase), flush=True)

  analyzer = VocabAnalyzer(lang)
  exs = analyzer.get_line_vocabs(sens, external)
  shown = exs[:20]

  phase = {"step": 3, "msg": "Finish vocabs dictionary lookup", "vocabs": shown}
  print(json.dumps(phase), flush=True)

  if assfile:
    ass_writer = VocabASSWriter(srtfile)
    ass_writer.write(exs, assfile, {"animation": False})
    
    phase = {"step": 4, "msg": "Finish ass saving"} 
    print(json.dumps(phase), flush=True)

def subtitle_phrase(srtfile, lang, assfile, external):

  phase = {"step": 1, "msg": "Start sentenizing"}
  print(json.dumps(phase), flush=True)

  sf = Subtitle(srtfile, lang)
  sens = sf.sentenize()
  for e in sens:
    print(e)

  phase = {"step": 2, "msg": "Finish sentenizing"}
  print(json.dumps(phase), flush=True)

  analyzer = PhraseAnalyzer(lang)
  exs = analyzer.get_line_phrases(sens, external)

  phase = {"step": 3, "msg": "Finish phrases dictionary lookup", "vocabs": exs[:10]}
  print(json.dumps(phase), flush=True)

  if assfile:
    ass_writer = PhraseASSWriter(srtfile)
    ass_writer.write(exs, assfile, {"animation": False})
    
    phase = {"step": 4, "msg": "Finish ass saving"} 
    print(json.dumps(phase), flush=True)

Development

Clone project

git clone https://github.com/qishe-nlp/subtitle-analyzer.git

Install poetry

Install dependencies

poetry update

Test

poetry run pytest -rP

which run tests under tests/*

Execute

poetry run sta_vocab --help
poetry run sta_phrase --help

Create sphinx docs

poetry shell
cd apidocs
sphinx-apidoc -f -o source ../subtitle_analyzer
make html
python -m http.server -d build/html

Host docs on github pages

cp -rf apidocs/build/html/* docs/

Build

  • Change version in pyproject.toml and subtitle_analyzer/__init__.py
  • Build python package by poetry build

Git commit and push

Publish from local dev env

  • Set pypi test environment variables in poetry, refer to poetry doc
  • Publish to pypi test by poetry publish -r test

Publish through CI

git tag [x.x.x]
git push origin master

About

Analyze subtitle into vocabularies, phrases, sentence structures.

Resources

Stars

Watchers

Forks

Packages

No packages published