Refactor into micro-libraries #131
Did a first try with wtf_fetch as an npm package. Hope that was OK for you. https://www.npmjs.com/package/wtf_fetch
oh hey, sorry for the delay. And of course, go nuts. Make all the stuff you want! ;)
`wtf_fetch`, `wtf_parse`, `wtf_output` - chaining in `wtf_wikipedia`
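A minimal sketch of that chaining, assuming each micro-library exposes a single function. The names `fetchSource`, `parseSource`, `renderOutput` and `wtfWikipedia` are hypothetical stand-ins, not the real exports of `wtf_fetch` or any other package, and the fetch stage is stubbed so the sketch runs offline:

```javascript
// Hypothetical stand-ins for the three proposed micro-libraries. In the
// proposed layout each would be installed from npm instead of defined here.

// wtf_fetch stand-in: resolves to the wiki source of a page. Stubbed with a
// fixed string so the sketch runs without network access.
function fetchSource(title, { lang, domain }) {
  return Promise.resolve(
    `'''${title}''' is a page on ${lang}.${domain}.org.\n\nSecond paragraph.`
  );
}

// wtf_parse stand-in: turns wiki source into a minimal Document object.
function parseSource(wikiSource) {
  const paragraphs = wikiSource
    .split(/\n{2,}/) // blank lines separate paragraphs in wikitext
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  return { type: "Document", paragraphs };
}

// wtf_output stand-in: renders a Document into one output format.
function renderOutput(doc, format) {
  if (format === "plaintext") {
    // Strip '' / ''' emphasis markup and keep blank lines between paragraphs.
    return doc.paragraphs.map((p) => p.replace(/'{2,}/g, "")).join("\n\n");
  }
  throw new Error(`unsupported output format: ${format}`);
}

// wtf_wikipedia as the chain manager: it only wires the stages together.
async function wtfWikipedia(
  title,
  { lang = "en", domain = "wikipedia", format = "plaintext" } = {}
) {
  const source = await fetchSource(title, { lang, domain });
  const doc = parseSource(source);
  return renderOutput(doc, format);
}
```

Calling `wtfWikipedia("Toronto", { lang: "de" })` resolves to the rendered plain text. Because the chain manager only passes results from one stage to the next, a further stage such as the `wtf_citation` module mentioned in the proposal could be slotted in without changing the other modules.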
I am working in parallel on Wiki2Reveal; the current implementation of
The
hey Engelbert, yes, this is correct. Order is lost at the Section level. The initial goal of this library was getting data out of Wikipedia and into a database. I'm not sure AST representations are in scope. Preserving chronology and treating paragraphs as first-class objects are in the long-term plan. Both will be tremendous tasks, and there is a lot of QA, testing, and hardening of the library to do before then. Template parsing, in particular, is really rough, and the infobox/table parsers have a lot of repeated code. That's the current focus for the time being.
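To make "order is lost at the Section level" concrete: if a section keeps paragraphs, tables and templates in separate per-type lists, each list stays ordered but the interleaving between them is gone. Below is a hypothetical sketch (not the library's actual data model; the `Section` class and its methods are illustrative names) that keeps one ordered list of typed nodes per section, from which per-type views can still be derived:

```javascript
// Hypothetical ordered section model: every child node keeps its position.
class Section {
  constructor(title) {
    this.title = title;
    this.nodes = []; // paragraphs, templates, tables... in source order
  }

  // Append a typed node; returns the section so calls can be chained.
  add(type, value) {
    this.nodes.push({ type, value });
    return this;
  }

  // Per-type view (e.g. all paragraphs), derived without losing the order.
  ofType(type) {
    return this.nodes.filter((node) => node.type === type).map((node) => node.value);
  }

  // The chronology of the section is simply the order of the nodes array.
  sequence() {
    return this.nodes.map((node) => node.type);
  }
}
```

With separate `paragraphs` and `tables` arrays, the fact that a table sat between two paragraphs could not be recovered; here `sequence()` still reports it.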
The benefit of
You can remove the label
yeah, I think the best way to go forward with the AST is for you to create a library. That's what I'd do. There will be cases where this is wrong, but I should get around to doing this proper-order stuff somehow, under an API structure similar to the current setup. How does that sound?
Good idea, thank you for that. I thought about Parsoid doing

CONCLUSIONS:
Added Conclusion to Wiki
Hi Spencer,
you explained to me how the integration of "promises" led to the broken build mechanism on MacOSX. When I tried to find a solution for that, I thought it might be an option to split `wtf_wikipedia` into the following 3 repositories:

- `wtf_fetch`, which fetches the wiki source from Wikipedia, Wikiversity, ... (MediaWiki domains) with the parameters language (e.g. `en`, `de`, ...) and domain (e.g. `wikipedia`, `wikiversity`, `wikivoyage`, ...)
- `wtf_parse`, which parses wiki source into a `Document` object (Abstract Syntax Tree)
- `wtf_output`, which generates/renders the output for a specific format from a given `Document` object.

`wtf_wikipedia` will integrate all 3 submodules. At least `wtf_parse` and `wtf_output` may still support the build process on MacOSX. Furthermore, the split improves maintenance and the reusability of submodules, and it separates the individual tasks, handled in the recommended submodules `wtf_fetch`, `wtf_parse` and `wtf_output`, from chaining those tasks here in `wtf_wikipedia`. Citation management would be a submodule `wtf_citation` that would be chained here as well. Your modular structure in `src/` can be preserved; the change would mainly replace a local `require` within `src/` with a `require` of the recommended submodules from `npm`.

This could be documented in the `README.md` as a developer recommendation, which helps developers understand the way forward and how they could add new `wtf_modules` to the chaining process. In this sense, `wtf_wikipedia` will become the chain management module of the `wtf_submodules`.

Hope that makes sense to you and will attract more developers to support your work. Thank you for all your contributions to the open-source community for handling MediaWiki content.
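The `wtf_fetch` parameters proposed above (a language such as `en` or `de` and a domain such as `wikipedia` or `wikiversity`) map naturally onto MediaWiki host names. A minimal sketch, assuming the `https://<language>.<domain>.org` host pattern used by Wikimedia projects and the standard MediaWiki `index.php?action=raw` endpoint for raw wiki source; `rawSourceUrl` is a hypothetical name, not part of the published `wtf_fetch` package:

```javascript
// Hypothetical helper: builds the URL for the raw wiki source of a page.
// Assumes the Wikimedia host pattern https://<lang>.<domain>.org and the
// standard MediaWiki index.php endpoint with action=raw.
function rawSourceUrl(title, { lang = "en", domain = "wikipedia" } = {}) {
  // Reject obviously malformed parameters before they reach the host name.
  if (!/^[a-z][a-z-]*$/.test(lang)) {
    throw new Error(`invalid language code: ${lang}`);
  }
  if (!/^[a-z]+$/.test(domain)) {
    throw new Error(`invalid domain: ${domain}`);
  }
  const url = new URL(`https://${lang}.${domain}.org/w/index.php`);
  // MediaWiki page titles use underscores in place of spaces.
  url.searchParams.set("title", title.replace(/ /g, "_"));
  url.searchParams.set("action", "raw");
  return url.toString();
}
```

In runtimes that provide a global `fetch` (browsers, Node 18+), retrieving the source would then be `fetch(rawSourceUrl("Toronto", { lang: "de" })).then((res) => res.text())`. In this sketch the promise-based network call stays inside the fetch stage, which is the separation the proposal aims for.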