changelog.md

5.6.0

  • update deps

5.5.0

  • [fix] - allow setting mongodb url #92 (Thanks Vid!)
  • update deps

5.4.0

  • [fix] - encoding in references #89
  • [new] - support for custom urls #88
  • update deps

5.1.0

  • update to wtf_wikipedia 7.2.9
  • [fix] - doc.title() in custom parser
  • update deps

v5

  • more consistent template json, via wtf_wikipedia@7
  • remove empty [] results in Section
  • fs fixes for node > 9

v4

  • major json format changes from wtf_wikipedia v6.0.0
  • fix skip_redirects so that it actually works
  • reduce default batch_size further
  • add verbose_skip option, to log disambig/redirect skipping

3.6.0

  • ⚠️ remove .infoboxes and .citations from the top-level result, since this data is duplicated. find both in section[i].templates
  • improve handling of redirect pages
  • refactor encoding logic

3.4.2

  • update deps, wtf library improvements
  • relicense as MIT
  • use latest mongo api

3.3.0

  • bugfix for runtime parsing error

v3.2.0

v3.1.0

  • fix connection time-outs & improve logging output
  • change default collection name to pages
  • add .custom() function support

v3

  • MASSIVE SPEEDUP! full re-write by @devrim 🙏 to fix #59
  • rename from wikipedia-to-mongo to dumpster-dive
  • use wtf_wikipedia v3 (a big refactor too!)
  • use line-by-line, and worker-nodes to run parsing in parallel

v2.4.0

  • add a 3s 'break' to avoid build-up of mongo inserts
  • add new --verbose and --skip_first options

v2.3.0

  • add try/catch
  • support --skip_redirects and --skip_disambig

v2

  • update to use wtf_wikipedia@2.0.0 - a major result-format change
  • rename bin cmd to wiki2mongo
  • support use from the cli, or via javascript require()
  • support --plaintext flag