Parser Update #55

Closed
wants to merge 226 commits into from

Conversation

Contributor

@mpevner mpevner commented Feb 19, 2013

This is a near-complete rewrite of the parser. It adds proper multi-element support and is substantially easier to work with, as well as more Pythonic.
It does have run-time overhead: parse(catalog) consumes ~1 GB of RAM on a 250 MB RDF file, but a reasonable computer should process the file quickly and release the memory promptly.
It should hopefully resolve Issue #20 as well.

# get the function out of the lookup_table that matches 'tag'
#func = new_book.lookup_table[tag]
# call the function on the child element
#func(child)
Contributor

delete my example code that you commented out
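
For readers following the thread, here is a minimal sketch of the dispatch pattern those commented-out lines describe, assuming `lookup_table` maps tag names to handler methods on the book object. The `Ebook` fields, the handler names, and the tag strings are hypothetical and are not taken from the PR's code.

```python
import xml.etree.ElementTree as ET


class Ebook:
    """Hypothetical record type; field and handler names are assumptions."""

    def __init__(self):
        self.title = None
        self.creators = []
        # Map each tag name to the method that knows how to handle it.
        self.lookup_table = {
            "title": self._handle_title,
            "creator": self._handle_creator,
        }

    def _handle_title(self, element):
        self.title = element.text

    def _handle_creator(self, element):
        # Appending lets a repeated element contribute more than one value.
        self.creators.append(element.text)


def parse_book(xml_text):
    new_book = Ebook()
    for child in ET.fromstring(xml_text):
        # Get the function out of lookup_table that matches the tag.
        func = new_book.lookup_table.get(child.tag)
        if func is not None:
            # Call the function on the child element.
            func(child)
    return new_book


book = parse_book(
    "<ebook><title>Emma</title><creator>A</creator>"
    "<creator>B</creator><other/></ebook>"
)
```

Tags without an entry in the table are skipped rather than raising, which is one possible choice; the real parser may treat unknown tags differently.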

statics/fixes is_bag
strips comments, etc
forgot to set pgcat data (type elements)
This fixes the bag issue in two ways:
one, I was passing the 'root' element instead of the iterated item
two, I added a static function, leaf_element, which returns the outermost 'first' element, so be careful if your branch splits along the way and that matters to you
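
One possible reading of `leaf_element`, sketched under the assumption that it follows the first child at each level until it reaches an element with no children. The `Parser` class name and the sample XML are hypothetical, and the PR's actual function may behave differently.

```python
import xml.etree.ElementTree as ET


class Parser:
    """Hypothetical holder for the helper; only leaf_element is sketched."""

    @staticmethod
    def leaf_element(element):
        # Descend through the first child at each level. Siblings are
        # ignored, which is why a branch that splits loses information.
        while len(element) > 0:
            element = element[0]
        return element


# Bag-like nesting with namespaces omitted for brevity (sample data only).
wrapper = ET.fromstring(
    "<subject><Description><value>Fiction</value>"
    "<memberOf>LCSH</memberOf></Description></subject>"
)
leaf = Parser.leaf_element(wrapper)
```

Here `memberOf` is never visited, which illustrates the caveat in the commit message about branches that split.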
@mpevner mpevner closed this Feb 20, 2013
@mpevner mpevner reopened this Feb 20, 2013
Contributor Author

mpevner commented Feb 20, 2013

Oops, wrong button. The traversal issue is also fixed.

This implements __setitem__ and __getitem__ for an Ebook item and drops all the set_ methods. It looks a little kludgier, but it makes the class much more flexible.
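
A minimal sketch of what item-style access on an Ebook could look like, assuming the fields live in a plain dict. The field names and the validation step are hypothetical; returning `None` for a missing field follows a later commit message in this PR.

```python
class Ebook:
    """Hypothetical sketch; the allowed field names are assumptions."""

    FIELDS = ("title", "author", "subject", "language")

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        # One generic setter replaces per-field set_ methods.
        if key not in self.FIELDS:
            raise KeyError(key)
        self._data[key] = value

    def __getitem__(self, key):
        # Fields the book does not have come back as None.
        return self._data.get(key)


book = Ebook()
book["author"] = "Austen, Jane"
```

With this shape, the parser can set any field by name, for example `book[tag] = value`, without needing a dedicated method per field.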
Contributor Author

mpevner commented Feb 21, 2013

Ok, so this parser
a) is not yet a drop-in replacement
b) has some minor(?) differences
Concerning A: making it a drop-in replacement would require replicating the Gutenberg class insofar as GITenberg.py cares; this is not an issue.
Concerning B: it ignores file info right now, which is potentially important, given that the original culls books that have no associated filenames. This can be replicated, and even extended: a proper File class can be created, with links between files and ebooks, at which point you'd cull any book with no linked files.
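
The File idea in point B could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `File` and `Ebook` attributes, `add_file`, and `cull_books_without_files` are all hypothetical names.

```python
class File:
    """Hypothetical file record that links back to its ebook."""

    def __init__(self, filename, mdate=None):
        self.filename = filename
        self.mdate = mdate
        self.ebook = None


class Ebook:
    """Hypothetical ebook record that tracks its linked files."""

    def __init__(self, book_id):
        self.book_id = book_id
        self.files = []

    def add_file(self, file):
        # Maintain the link in both directions.
        self.files.append(file)
        file.ebook = self


def cull_books_without_files(books):
    # Keep only the books that have at least one linked file.
    return [book for book in books if book.files]


with_file = Ebook(1)
with_file.add_file(File("1.txt"))
without_file = Ebook(2)
kept = cull_books_without_files([with_file, without_file])
```

Keeping the link on both sides means culling can be driven from either the book list or the file list.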

headed to different workstation
This should be able to drop-in replace RDFparse.py now. It generates the pickle differently, so here be dragons.
Contributor Author

mpevner commented Feb 21, 2013

Updated it to make it fit into GITenberg.py so it can just drop right in. In theory.

mpevner and others added 14 commits February 21, 2013 12:53
__getitem__ now returns None for items the object doesn't have
This does not yet set mdate/filename for a book, but it comes close.
Still doesn't set ebook mdate/filename data
Added book culling, so this should now operate as a true drop-in replacement for rdfparse.py
Unsure how necessary this is now, but it's in the original and is EASILY removed.
Add navigational support to help orient newcomers
Fix link to web site in contributing template
Contributor Author

mpevner commented Aug 20, 2014

Closing due to obsolescence.

@mpevner mpevner closed this Aug 20, 2014
4 participants