diff --git a/README.md b/README.md index 6ab997d..619b798 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,16 @@ # Slaw [![Build Status](https://travis-ci.org/longhotsummer/slaw.svg)](http://travis-ci.org/longhotsummer/slaw) -Slaw is a lightweight library for generating and rendering Akoma Ntoso 2.0 Act XML from plain text and PDF documents. -It is used to power [openbylaws.org.za](http://openbylaws.org.za) and [steno.openbylaws.org.za](http://steno.openbylaws.org.za) -and uses grammars developed for South African acts and by-laws. +Slaw is a lightweight library for generating Akoma Ntoso 2.0 Act XML from plain text and PDF documents. +It is used to power [Indigo](https://github.com/OpenUpSA/indigo) and uses grammars developed for the legal +traditions in these countries: + +* South Africa +* Poland Slaw allows you to: -1. extract plain text from PDFs and clean up that text -2. parse plain text and transform it into an Akoma Ntoso Act XML document -3. render the XML document into HTML +1. parse plain text and transform it into an Akoma Ntoso Act XML document +2. unparse Akoma Ntoso XML into a plain-text format suitable for re-parsing Slaw is lightweight because it wraps around a Nokogiri XML representation of the parsed document. It provides some support methods for manipulating these @@ -40,7 +42,7 @@ installed by default on most systems (including Mac). On Ubuntu you can use: The simplest way to use Slaw is via the commandline: - $ slaw parse myfile.pdf + $ slaw parse myfile.pdf --grammar za ## Overview @@ -61,152 +63,15 @@ formats. The grammar cannot catch some subtleties of an act or by-law -- such as nested list numbering -- so Slaw performs some post-processing on the XML produced by the parser. In particular, -it nests lists correctly and looks for specially defined terms and their occurrences in the document. 
- -## Quick Start - -Install the gem using - - gem install slaw - -Extract text from a PDF and parse it as a South African by-law: - -```ruby -require 'slaw' - -# extract text from a PDF file and clean it up -extractor = Slaw::Extract::Extractor.new -text = extractor.extract_from_pdf('/path/to/file.pdf') - -# parse the text into a XML and -generator = Slaw::ActGenerator.new -bylaw = generator.generate_from_text(text) -puts bylaw.to_xml(indent: 2) - -# render the by-law as HTML, using / as the root -# for relative URLs -renderer = Slaw::Render::HTMLRenderer.new -puts renderer.render(bylaw.doc, '/') -``` - -## Extraction - -Extraction is done by the `Slaw::Extract::Extractor` class. It currently handles -PDF and plain text files. Slaw uses `pdftotext` from the `xpdf` package to extract -the plain text from PDFs. PDFs are great for presentation, but suck for accurately storing -text. As a result, the extraction can produce oddities, such as lines broken in weird -places (or not broken when they should be). Slaw gets around this by running -some cleanup routines on the extracted text. - -For example, it knows that these lines: - - (b) any wall, swimming pool, reservoir or bridge - or any other structure connected therewith; (c) any fuel pump or any - tank used in connection therewith - -should probably be broken at the section numbers: - - (b) any wall, swimming pool, reservoir or bridge or any other structure connected therewith; - (c) any fuel pump or any tank used in connection therewith - -If your region's numbering format differs significantly from this, these rules might not work. 
- -Some other steps Slaw takes after extraction include (check `Slaw::Parse::Cleanser` for the full set): - -* changing newlines to `\n`, and normalising quotation characters -* removing page numbers and other boilerplate -* stripping the table of contents (we can generate our own from the parsed document) -* changing tabs to spaces, stripping leading and trailing spaces and removing blank lines +it nests lists correctly. ## Parsing Slaw uses Treetop to compile a grammar into a backtracking parser. The parser builds a parse -tree, each node of which knows how to serialize itself in XML format. - -While most South African by-laws are superficially very similar, there are a sufficient differences -in their typesetting to make parsing them difficult. The grammar handles most -edge cases but may not catch them all. The one thing it cannot yet detect well is the difference -between section titles before and after a section number: - - 1. Definitions - In this by-law, the following words ... - - Definitions - 1. In this by-law, the following words ... +tree, the nodes of which know how to serialize themselves in XML format. -This must be set by the user before parsing: - -```ruby -generator = Slaw::ZA::BylawGenerator.new -generator.parser.options = {section_number_after_title: true} -``` - -The parser does its best not to choke on input it doesn't understand, preferring a best effort -to a completely accurate result. For example it may not be able to work out a section heading -and so will treat it as simply another statement in the previous section. This causes the parser -to use a lot of backtracking and negative lookahead assertions, which can be slow for large documents. - -The grammar supports a number of subsection numbering formats, which are often mixed -in a document to indicate different levels of nesting. - - (a) - (2) - (3b) - (ii) - 3.4 - -During post-processing it works out how to nest these appropriately. 
- -Special words, such as ``part`` and ``chapter`` are ignored if the line starts with a backslash ``\``. - -For more information see the South African by-law grammar at -[lib/slaw/za/bylaw.treetop](lib/slaw/za/bylaw.treetop) and the list nesting -at [lib/slaw/parse/blocklists.rb](lib/slaw/parse/blocklists.rb). - -## Rendering - -Slaw renders XML to HTML using XSLT. For the most part there is a direct mapping between -Akoma Ntoso structure and the HTML layout, so most AN nodes are simply mapped to `div` or `span` -elements with a class attribute derived from the name of the AN element and an ID element taken -from the node, if any. This makes it both fast and flexible, since it's easy to -apply layout rules with CSS. - -Slaw can render either an entire document like this, or just a portion of the XML tree. - -```ruby -# render an entire document -renderer = Slaw::Render::HTMLRenderer.new -puts renderer.render(bylaw.doc, '/') - -# render the first section only -puts renderer.render(bylaw.sections[0], '/') -``` - -For more information, see [/lib/slaw/render/html.rb](/lib/slaw/render/html.rb). - -## Meta-data - -Acts and by-laws have metadata which it is not possible to get from their plain text representations, -such as their title, date and format of publication or act number. Slaw provides some helpers -for manipulating this meta-data. For example, - -```ruby -bylaw = Slaw::ByLaw.new('spec/fixtures/community-fire-safety.xml') -print bylaw.id_uri -bylaw.title = 'A new title' -bylaw.name = 'a-new-title' -bylaw.published!(date: '2014-09-28') -print bylaw.id_uri -``` - -## Schedules - -South African acts and by-laws can have addendums called schedules. They are technically a part of -the act but are not part of the primary body and have more relaxed formatting. Slaw finds schedules -by looking for section headings, but makes no effort to capture the format of their contents. - -Akoma Ntoso has no explicit support for schedules. 
Instead, Slaw stores all schedules under a single -Akoma Ntoso `component` elements at the end of the XML document, with a name of `schedules`. +Supporting formats from other countries' legal traditions probably requires creating a new grammar +and parser. ## Contributing @@ -218,6 +83,15 @@ Akoma Ntoso `component` elements at the end of the XML document, with a name of ## Changelog +### 1.0.0 + +* Improved support for other legal traditions / grammars. +* Add Polish legal tradition grammar. +* Slaw no longer does too much introspection of a parsed document, since that can be so tradition-dependent. +* Move reformatting out of Slaw since it's tradition-dependent. +* Remove definition linking; Slaw no longer supports it. +* Remove unused code for interacting with the internals of acts. + ### 0.17.2 * Match defined terms in 'definition' section. diff --git a/bin/slaw b/bin/slaw index aefa03e..d896684 100755 --- a/bin/slaw +++ b/bin/slaw @@ -17,19 +17,14 @@ class SlawCLI < Thor desc "parse FILE", "Parse FILE into Akoma Ntoso XML" option :input, enum: ['text', 'pdf'], desc: "Type of input if it can't be determined automatically" option :pdftotext, desc: "Location of the pdftotext binary if not in PATH" - option :definitions, type: :boolean, desc: "Find and link definitions (this can be slow). Default: false" option :fragment, type: :string, desc: "Akoma Ntoso element name that the imported text represents. Support depends on the grammar." option :id_prefix, type: :string, desc: "Prefix to be used when generating ID elements when parsing a fragment." option :section_number_position, enum: ['before-title', 'after-title', 'guess'], desc: "Where do section titles come in relation to the section number? Default: before-title" - option :reformat, type: :boolean, desc: "Reformat common formatting issues to make grammar matching better. Default: true for PDF files, false otherwise" option :crop, type: :string, desc: "Crop box for PDF files, as 'left,top,width,height'."
+ option :grammar, type: :string, desc: "Grammar name (usually a two-letter country code). Default is za." def parse(name) logging - if options[:fragment] and options[:definitions] - raise Thor::Error.new("--definitions can't be used together with --fragment") - end - Slaw::Extract::Extractor.pdftotext_path = options[:pdftotext] if options[:pdftotext] extractor = Slaw::Extract::Extractor.new @@ -43,16 +38,13 @@ class SlawCLI < Thor case options[:input] when 'pdf' text = extractor.extract_from_pdf(name) - options[:reformat] = true if options[:reformat].nil? when 'text' text = extractor.extract_from_text(name) else text = extractor.extract_from_file(name) end - generator = Slaw::ActGenerator.new - - text = generator.reformat(text) if options[:reformat] + generator = Slaw::ActGenerator.new(options[:grammar] || 'za') if options[:fragment] generator.document_class = Slaw::Fragment @@ -94,25 +86,13 @@ class SlawCLI < Thor exit 1 end - # definitions? - generator.builder.link_definitions(act.doc) if options[:definitions] - puts act.to_xml(indent: 2) end - desc "link-definitions FILE", "Find and link defined terms in FILE" - def link_definitions(name) - builder = Slaw::ActGenerator.new.builder - - doc = File.open(name, 'r') { |f| doc = builder.parse_xml(f.read) } - builder.link_definitions(doc) - - puts builder.to_xml(doc) - end - desc "unparse FILE", "Unparse FILE from Akoma Ntoso XML back into text suitable for re-parsing" + option :grammar, type: :string, desc: "Grammar name (usually a two-letter country code). Default is za." 
def unparse(name) - generator = Slaw::ActGenerator.new + generator = Slaw::ActGenerator.new(options[:grammar] || 'za') doc = File.open(name, 'r') { |f| doc = generator.builder.parse_xml(f.read) } puts generator.text_from_act(doc) diff --git a/lib/slaw.rb b/lib/slaw.rb index 39ff64e..30c1d36 100644 --- a/lib/slaw.rb +++ b/lib/slaw.rb @@ -4,14 +4,8 @@ require 'slaw/namespace' require 'slaw/logging' -require 'slaw/act' -require 'slaw/bylaw' -require 'slaw/collection' - require 'slaw/xml_support' -require 'slaw/lifecycle_event' -require 'slaw/render/html' require 'slaw/parse/blocklists' require 'slaw/parse/builder' require 'slaw/parse/cleanser' diff --git a/lib/slaw/act.rb b/lib/slaw/act.rb deleted file mode 100644 index 0997cc7..0000000 --- a/lib/slaw/act.rb +++ /dev/null @@ -1,452 +0,0 @@ -module Slaw - class AknBase - include Slaw::Namespace - - attr_accessor :doc - - # Serialise the XML for this act, passing `args` to the Nokogiri serialiser. - # The most useful argument is usually `indent: 2` if you like your XML perdy. - # - # @return [String] serialized XML - def to_xml(*args) - @doc.to_xml(*args) - end - - # Parse the XML contained in the file-like or String object `io` - # - # @param io [String, file-like] io object or String with XML - def parse(io) - self.doc = Nokogiri::XML(io) - end - end - - # A fragment is a part of a larger document and doesn't have the context associated - # with the document. - class Fragment < AknBase - alias_method :fragment, :doc - end - - # An Act wraps a single {http://www.akomantoso.org/ AkomaNtoso 2.0 XML} act document in the form of a - # Nokogiri::XML::Document object. - # - # The Act object provides quick access to certain sections of the document, - # such as the metadata and the body, as well as common operations such as - # identifying whether it has been amended ({#amended?}), repealed - # ({#repealed?}) or what chapters ({#chapters}), parts ({#parts}) and - # sections ({#sections}) it contains. 
- class Act < AknBase - - # Allow us to jump from the XML document for an act to the - # Act instance itself - @@acts = {} - - # [Nokogiri::XML::Document] The underlying {Nokogiri::XML::Document} instance - attr_accessor :doc - - # [Nokogiri::XML::Node] The `meta` XML node - attr_reader :meta - - # [Nokogiri::XML::Node] The `body` XML node - attr_reader :body - - # [String] The year this act was published - attr_reader :year - - # [String] The act number in the year this act was published - attr_reader :num - - # [String] The FRBR URI of this act, which uniquely identifies it globally - attr_reader :id_uri - - # [String, nil] The source filename, or nil - attr_reader :filename - - # [Time, nil] The mtime of when the source file was last modified - attr_reader :mtime - - # [String] The underlying nature of this act, usually `act` although subclasses my override this. - attr_reader :nature - - # [Nokogiri::XML::Schema] schema to validate against - attr_accessor :schema - - # Get the act that wraps the document that owns this XML node - # @param node [Nokogiri::XML::Node] - # @return [Act] owning act - def self.for_node(node) - @@acts[node.document] - end - - # Create a new instance, loading from `filename` if given. - # @param filename [String] filename to load XML from - def initialize(filename=nil) - self.load(filename) if filename - @schema = nil - end - - # Load the XML in `filename` into this instance - # @param filename [String] filename - def load(filename) - @filename = filename - @mtime = File::mtime(@filename) - - File.open(filename) { |f| parse(f) } - end - - # Set the XML document backing this bylaw. - # - # @param doc [Nokogiri::XML::Document] document - def doc=(doc) - @doc = doc - @meta = @doc.at_xpath('/a:akomaNtoso/a:act/a:meta', a: NS) - @body = @doc.at_xpath('/a:akomaNtoso/a:act/a:body', a: NS) - - @@acts[@doc] = self - - extract_id_uri - end - - # Directly set the FRBR URI of this act. This must be a well-formed URI, - # such as `/za/act/2002/2`. 
This will, in turn, update the {#year}, {#nature}, - # {#country} and {#num} attributes. - # - # You probably don't want to use this method. Instead, set each component - # (such as {#date}) manually. - # - # @param uri [String] new URI - def id_uri=(uri) - for component, xpath in [['main', '//a:act/a:meta/a:identification'], - ['schedules', '//a:component/a:doc/a:meta/a:identification']] do - ident = @doc.at_xpath(xpath, a: NS) - next if not ident - - # work - ident.at_xpath('a:FRBRWork/a:FRBRthis', a: NS)['value'] = "#{uri}/#{component}" - ident.at_xpath('a:FRBRWork/a:FRBRuri', a: NS)['value'] = uri - - # expression - ident.at_xpath('a:FRBRExpression/a:FRBRthis', a: NS)['value'] = "#{uri}/#{component}/eng@" - ident.at_xpath('a:FRBRExpression/a:FRBRuri', a: NS)['value'] = "#{uri}/eng@" - - # manifestation - ident.at_xpath('a:FRBRManifestation/a:FRBRthis', a: NS)['value'] = "#{uri}/#{component}/eng@" - ident.at_xpath('a:FRBRManifestation/a:FRBRuri', a: NS)['value'] = "#{uri}/eng@" - end - - extract_id_uri - end - - # The date at which this act was first created/promulgated. - # - # @return [String] date, YYYY-MM-DD - def date - node = @meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRdate[@name="Generation"]', a: NS) - node && node['date'] - end - - # Set the date at which this act was first created/promulgated. This is usually the same - # as the publication date but this is not enforced. - # - # This also updates the {#year} of this act, which in turn updates the {#id_uri}. - # - # @param date [String] date, YYYY-MM-DD - def date=(value) - for frbr in ['FRBRWork', 'FRBRExpression'] do - @meta.at_xpath("./a:identification/a:#{frbr}/a:FRBRdate[@name=\"Generation\"]", a: NS)['date'] = value - end - - self.year = value.split('-')[0] - end - - # Set the year for this act. You probably want to call {#date=} instead. - # - # This will also update the {#id_uri} but will not change {#date} at all. 
- # - # @param year [String, Number] year - def year=(year) - @year = year.to_s - rebuild_id_uri - end - - # An applicable short title for this act, either from the `FRBRalias` element - # or based on the act number and year. - # @return [String] - def title - node = @meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRalias', a: NS) - node ? node['value'] : "Act #{num} of #{year}" - end - - # Change the title of this act. - def title=(value) - node = @meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRalias', a: NS) - unless node - node = @doc.create_element('FRBRalias') - @meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRuri', a: NS).after(node) - end - - node['value'] = value - end - - # Has this act been amended? This is determined by testing the `contains` - # attribute of the `act` root element. - # - # @return [Boolean] - def amended? - @doc.at_xpath('/a:akomaNtoso/a:act', a: NS)['contains'] != 'originalVersion' - end - - # Get a list of {Slaw::LifecycleEvent} objects for amendment events, in date order. - # @return [Array] possibly empty list of lifecycle events - def amendment_events - @meta.xpath('./a:lifecycle/a:eventRef[@type="amendment"]', a: NS).map do |event| - LifecycleEvent.new(event) - end.sort_by { |e| e.date } - end - - # Mark this act as being amended by another act, either `act` - # or the details in `opts`. - # - # It is assumed that there can be only one amendment event on a particular - # date. An existing amendment on this date is overwritten. 
- # - # @option opts [String] :uri uri of the amending act - # @option opts [String] :title title of the amending act - # @option opts [String] :date date of the amendment (YYYY-MM-DD) - def amended_by!(act, opts={}) - if act - opts[:uri] ||= act.id_uri - opts[:title] ||= act.short_title - opts[:date] ||= act.publication['date'] - end - - date = opts[:date] - source_id = "amendment-#{date}" - - # assume we now hold a single version and not the original version - @doc.at_xpath('/a:akomaNtoso/a:act', a: NS)['contains'] = 'singleVersion' - - # add the lifecycle event - lifecycle = @meta.at_xpath('./a:lifecycle', a: NS) - if not lifecycle - lifecycle = @doc.create_element('lifecycle', source: "#this") - @meta.at_xpath('./a:publication', a: NS).after(lifecycle) - end - - event = lifecycle.at_xpath('./a:eventRef[@date="' + date + '"][@type="amendment"]', a: NS) - if event - # clear up old event - src = @doc.at_css(event['source']) - src.remove if src - else - # new event - event = @doc.create_element('eventRef', type: 'amendment') - lifecycle << event - end - - event['date'] = date - event['id'] = "amendment-event-#{date}" - event['source'] = '#' + source_id - - # add reference - ref = @doc.create_element('passiveRef', - id: source_id, - href: opts[:uri], - showAs: opts[:title]) - - @meta.at_xpath('./a:references/a:TLCTerm', a: NS).before(ref) - end - - # Does this Act have parts? - # @return [Boolean] - def parts? - !parts.empty? - end - - # Top-level parts of this act. Parts inside chapters are ignored. - # @return [Array] part nodes - def parts - @body.xpath('./a:part', a: NS) - end - - # Does this Act have chapters? - # @return [Boolean] - def chapters? - !chapters.empty? - end - - # Top-level chapters of this act. Chapters inside parts are ignored. 
- # @return [Array] chapter nodes - def chapters - @body.xpath('./a:chapter', a: NS) - end - - # Sections of this act - # @return [Array] section nodes - def sections - @body.xpath('.//a:section', a: NS) - end - - # The primary definitions section of this act, identified by - # either an `id` of `definitions` or the first section with a heading - # of `Definitions`. - # - # @return [Nokogiri::XML::Node, nil] definitions node or nil - def definitions - # try looking for the definition list - defn = @body.at_css('#definitions') - return defn.parent if defn - - # try looking for the heading - defn = @body.at_xpath('.//a:section/a:heading[text() = "Definitions"]', a: NS) - return defn.parent if defn - - nil - end - - # An act can contain schedules, additional (generally free-form) documents - # that are addendums to the the main body. A definition element must be - # part of a separate `component` and have a `doc` element with a name attribute - # of `schedules`. - # - # @return [Nokogiri::XML::Node, nil] schedules document node - def schedules - @doc.at_xpath('/a:akomaNtoso/a:components/a:component/a:doc[@name="schedules"]/a:mainBody', a: NS) - end - - # Get a map from term ids to `[term, defn]` pairs, - # where `term+ is the plain text term and `defn` is - # the {Nokogiri::XML::Node} containing the definition. - # - # @return {String => List(String, Nokogiri::XML::Node)} map from strings to `[term, definition]` pairs - def term_definitions - terms = {} - - @meta.xpath('a:references/a:TLCTerm', a: NS).each do |node| - # - - # find the point with id 'def-term-foo' - defn = @body.at_xpath(".//*[@id='def-#{node['id']}']", a: NS) - next unless defn - - terms[node['id']] = [node['showAs'], defn] - end - - terms - end - - # Returns the publication element, if any. - # - # @return [Nokogiri::XML::Node, nil] - def publication - @meta.at_xpath('./a:publication', a: NS) - end - - # Update the publication details of the act. All elements are optional. 
- # - # @option details [String] :name name of the publication - # @option details [String] :number publication number - # @option details [String] :date date of publication (YYYY-MM-DD) - def published!(details) - node = @meta.at_xpath('./a:publication', a: NS) - unless node - node = @doc.create_element('publication') - @meta.at_xpath('./a:identification', a: NS).after(node) - end - - node['showAs'] = details[:name] if details.has_key? :name - node['name'] = details[:name] if details.has_key? :name - node['date'] = details[:date] if details.has_key? :date - node['number'] = details[:number] if details.has_key? :number - end - - # Has this by-law been repealed? - # - # @return [Boolean] - def repealed? - !!repealed_on - end - - # The date on which this act was repealed, or nil if never repealed - # - # @return [String] date of repeal or nil - def repealed_on - repeal_el = repeal - repeal_el ? Time.parse(repeal_el['date']) : nil - end - - # The element representing the reference that caused the repeal of this - # act, or nil. - # - # @return [Nokogiri::XML::Node] element of reference to repealing act, or nil - def repealed_by - repeal_el = repeal - return nil unless repeal_el - - source_id = repeal_el['source'].sub(/^#/, '') - @meta.at_xpath("./a:references/a:passiveRef[@id='#{source_id}']", a: NS) - end - - # The XML element representing the event of repeal of this act, or nil - # - # @return [Nokogiri::XML::Node] - def repeal - # - # - # - # - # - @meta.at_xpath('./a:lifecycle/a:eventRef[@type="repeal"]', a: NS) - end - - # The date at which this particular XML manifestation of this document was generated. - # - # @return [String] date, YYYY-MM-DD - def manifestation_date - node = @meta.at_xpath('./a:identification/a:FRBRManifestation/a:FRBRdate[@name="Generation"]', a: NS) - node && node['date'] - end - - # Validate the XML behind this document against the Akoma Ntoso schema and return - # any errors. 
- # - # @return [Object] array of errors, possibly empty - def validate - @schema ||= Dir.chdir(File.dirname(__FILE__) + "/schemas") { Nokogiri::XML::Schema(File.read('akomantoso20.xsd')) } - @schema.validate(@doc) - end - - # Does this document validate against the schema? - # - # @see {#validate} - def validates? - validate.empty? - end - - def inspect - "<#{self.class.name} @id_uri=\"#{@id_uri}\">" - end - - protected - - # Parse the FRBR Uri into its constituent parts - def extract_id_uri - @id_uri = @meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRuri', a: NS)['value'] - empty, @country, @nature, date, @num = @id_uri.split('/') - - # yyyy-mm-dd - @year = date.split('-', 2)[0] - end - - def build_id_uri - # /za/act/2002/3 - "/#{@country}/#{@nature}/#{@year}/#{@num}" - end - - # This rebuild's the FRBR uri for this document using its constituent components. It will - # update the XML then re-split the URI and grab its components. - def rebuild_id_uri - self.id_uri = build_id_uri - end - end - -end diff --git a/lib/slaw/bylaw.rb b/lib/slaw/bylaw.rb deleted file mode 100644 index 4852caf..0000000 --- a/lib/slaw/bylaw.rb +++ /dev/null @@ -1,62 +0,0 @@ -require 'slaw/act' - -module Slaw - # An extension of {Slaw::Act} which wraps an AkomaNtoso XML document describing an By-Law. - # - # There are minor differences between Acts and By-laws, the most notable being that a by-law - # is not identified by a year and a number, and therefore has a different FRBR uri structure. - class ByLaw < Act - - # [String] The code of the region this by-law applies to - attr_reader :region - - # [String] A short file-like name of this by-law, unique within its year and region - attr_reader :name - - # ByLaws don't have numbers, use their short-name instead - def num - name - end - - def title - node = @meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRalias', a: NS) - title = node ? node['value'] : "(Unknown)" - - if amended? 
and not title.end_with?("as amended") - title = title + " as amended" - end - - title - end - - # Set the short (file-like) name for this bylaw. This changes the {#id_uri}. - def name=(value) - @name = value - rebuild_id_uri - end - - # Set the region code for this bylaw. This changes the {#id_uri}. - def region=(value) - @region = value - rebuild_id_uri - end - - protected - - def extract_id_uri - # /za/by-law/cape-town/2010/public-parks - - @id_uri = @meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRuri', a: NS)['value'] - empty, @country, @nature, @region, date, @name = @id_uri.split('/') - - # yyyy[-mm-dd] - @year = date.split('-', 2)[0] - end - - def build_id_uri - # /za/by-law/cape-town/2010/public-parks - "/#{@country}/#{@nature}/#{@region}/#{@year}/#{@name}" - end - - end -end diff --git a/lib/slaw/collection.rb b/lib/slaw/collection.rb deleted file mode 100644 index 1cc3c8e..0000000 --- a/lib/slaw/collection.rb +++ /dev/null @@ -1,60 +0,0 @@ -require 'forwardable' - -module Slaw - # A collection of Act instances. - # - # This is useful for looking up acts by their FRBR uri and for - # loading a collection of XML act documents. - # - # This collection is enumerable and can be iterated over. Use {#items} to - # access the underlying array of objects. - # - # @example Load a collection of acts and then iterate over them. - # - # acts = Slaw::DocumentCollection.new - # acts.discover('/path/to/acts/') - # - # for act in acts - # puts act.short_name - # end - # - class DocumentCollection - - include Enumerable - extend Forwardable - - # [Array] The underlying array of acts - attr_accessor :items - - def_delegators :items, :each, :<<, :length - - def initialize(items=nil) - @items = items || [] - end - - # Find all XML files in `path` and add them into this - # collection. 
- # - # @param path [String] the path to glob for xml files - # @param cls [Class] the class to instantiate for each file - # - # @return [DocumentCollection] this collection - def discover(path, cls=Slaw::Act) - for fname in Dir.glob("#{path}/**/*.xml") - @items << cls.new(fname) - end - - self - end - - # Try to find an act who's FRBRuri matches this one, - # returning nil on failure - # - # @param uri [String] the uri to look for - # - # @return [Act, nil] the act, or nil - def for_uri(uri) - return @items.find { |doc| doc.id_uri == uri } - end - end -end diff --git a/lib/slaw/generator.rb b/lib/slaw/generator.rb index f50e41e..bcd17a1 100644 --- a/lib/slaw/generator.rb +++ b/lib/slaw/generator.rb @@ -1,33 +1,43 @@ module Slaw # Base class for generating Act documents class ActGenerator - Treetop.load(File.dirname(__FILE__) + "/za/act.treetop") - # [Treetop::Runtime::CompiledParser] compiled parser attr_accessor :parser # [Slaw::Parse::Builder] builder used by the generator attr_accessor :builder - # The type that will hold the generated document - attr_accessor :document_class + @@parsers = {} + + def initialize(grammar) + @grammar = grammar - def initialize - @parser = Slaw::ZA::ActParser.new + @parser = build_parser @builder = Slaw::Parse::Builder.new(parser: @parser) + @parser = @builder.parser @cleanser = Slaw::Parse::Cleanser.new - @document_class = Slaw::Act + end + + def build_parser + unless @@parsers[@grammar] + # load the grammar + grammar_file = File.dirname(__FILE__) + "/grammars/#{@grammar}/act.treetop" + Treetop.load(grammar_file) + + grammar_class = "Slaw::Grammars::#{@grammar.upcase}::ActParser" + @@parsers[@grammar] = eval(grammar_class) + end + + @parser = @@parsers[@grammar].new end # Generate a Slaw::Act instance from plain text. 
# # @param text [String] plain text # - # @return [Slaw::Act] the resulting act + # @return [Nokogiri::XML::Document] the resulting XML document def generate_from_text(text) - act = @document_class.new - act.doc = @builder.parse_and_process_text(cleanup(text)) - act + @builder.parse_and_process_text(cleanup(text)) end # Run basic cleanup on text, such as ensuring clean newlines @@ -66,8 +76,7 @@ def guess_section_number_after_title(text) # Transform an Akoma Ntoso XML document back into a plain-text version # suitable for re-parsing back into XML with no loss of structure. def text_from_act(doc) - here = File.dirname(__FILE__) - xslt = Nokogiri::XSLT(File.read(File.join([here, 'za/act_text.xsl']))) + xslt = Nokogiri::XSLT(File.read(File.join([File.dirname(__FILE__), "grammars/#{@grammar}/act_text.xsl"]))) xslt.transform(doc).child.to_xml end end diff --git a/lib/slaw/grammars/core_nodes.rb b/lib/slaw/grammars/core_nodes.rb new file mode 100644 index 0000000..e262da6 --- /dev/null +++ b/lib/slaw/grammars/core_nodes.rb @@ -0,0 +1,17 @@ +module Slaw + module Grammars + class GroupNode < Treetop::Runtime::SyntaxNode + def to_xml(b, *args) + children.elements.each { |e| e.to_xml(b, *args) } + end + end + + class Body < Treetop::Runtime::SyntaxNode + def to_xml(b) + b.body { |b| + children.elements.each_with_index { |e, i| e.to_xml(b, '', i) } + } + end + end + end +end diff --git a/lib/slaw/grammars/inlines.treetop b/lib/slaw/grammars/inlines.treetop new file mode 100644 index 0000000..ac62e9d --- /dev/null +++ b/lib/slaw/grammars/inlines.treetop @@ -0,0 +1,45 @@ +# encoding: UTF-8 + +require 'slaw/grammars/terminals' +require 'slaw/grammars/inlines_nodes' + +module Slaw + module Grammars + grammar Inlines + ########## + # inline content + + rule inline_statement + space? '\\'?
clauses eol + + end + + # one or more words, allowing inline elements + rule clauses + (remark / image / ref / [^\n])+ + + end + + rule remark + '[[' content:(ref / (!']]' .))+ ']]' + + end + + rule image + # images like markdown + # eg. ![title text](image url) + # + # the title text is optional, but the enclosing square brackets aren't + '![' content:(!'](' [^\n])* '](' href:([^)\n]+) ')' + + end + + rule ref + # links like markdown + # eg. [link text](link url) + '[' content:(!'](' [^\n])+ '](' href:([^)\n]+) ')' + + end + end + end +end diff --git a/lib/slaw/grammars/inlines_nodes.rb b/lib/slaw/grammars/inlines_nodes.rb new file mode 100644 index 0000000..84ea039 --- /dev/null +++ b/lib/slaw/grammars/inlines_nodes.rb @@ -0,0 +1,58 @@ +module Slaw + module Grammars + module Inlines + class NakedStatement < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix, i=0) + b.p { |b| clauses.to_xml(b, idprefix) } if clauses + end + + def content + clauses + end + end + + class Clauses < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix=nil) + for e in elements + if e.respond_to? :to_xml + e.to_xml(b, idprefix) + else + b << e.text_value + end + end + end + end + + class Remark < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix) + b.remark(status: 'editorial') do |b| + b << '[' + for e in content.elements + if e.respond_to? :to_xml + e.to_xml(b, idprefix) + else + b << e.text_value + end + end + b << ']' + end + end + end + + class Image < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix) + attrs = {src: href.text_value} + attrs[:alt] = content.text_value unless content.text_value.empty? 
+ b.img(attrs) + end + end + + class Ref < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix) + b.ref(content.text_value, href: href.text_value) + end + end + + end + end +end diff --git a/lib/slaw/grammars/pl/act.treetop b/lib/slaw/grammars/pl/act.treetop new file mode 100644 index 0000000..43fa302 --- /dev/null +++ b/lib/slaw/grammars/pl/act.treetop @@ -0,0 +1,240 @@ +# encoding: UTF-8 + +require 'slaw/parse/grammar_helpers' +require 'slaw/grammars/pl/act_nodes' + +require 'slaw/grammars/terminals' +require 'slaw/grammars/tables' +require 'slaw/grammars/schedules' +require 'slaw/grammars/inlines' + +module Slaw + module Grammars + module PL + grammar Act + include Slaw::Parse::GrammarHelpers + + ######## + # major containers + + rule act + empty_line* + preface:preface? + preamble:preamble? + body + schedules:schedules_container? + end + + rule preface + !'PREAMBLE' + ('PREFACE'i space? eol)? + statements:(!'PREAMBLE' pre_body_statement)* + end + + rule preamble + 'PREAMBLE'i space? eol + statements:pre_body_statement* + end + + rule body + children:(division / subdivision / chapter / article / section / paragraph / point / litera / indents / block_paragraphs)+ + end + + rule division + heading:division_heading + children:(subdivision / chapter / article / section / paragraph / point / litera / indents / block_paragraphs)* + + end + + rule subdivision + heading:subdivision_heading + children:(chapter / article / section / paragraph / point / litera / indents / block_paragraphs)* + + end + + rule chapter + heading:chapter_heading + children:(article / section / paragraph / point / litera / indents / block_paragraphs)* + + end + + rule article + # Art. 55. some optional text + # 1. first paragraph etc. + article_prefix intro + children:(section / paragraph / point / litera / indents / block_paragraphs)*
+ end + + rule section + # § 55. foo + section_prefix intro + children:(paragraph / point / litera / indents / block_paragraphs)*
+ end + + rule paragraph + # ustęp: + # 34. ... + paragraph_prefix intro + children:(point / litera / indents / block_paragraphs)* + end + + rule point + # 12) aoeuaoeu + # 12a) aoeuaoeu + point_prefix intro + children:(litera / indents / block_paragraphs)* + end + + rule litera + # a) aoeuaoeu + litera_prefix intro + children:(indents / block_paragraphs)* + end + + rule indents + # - foo + # - bar + children:indent_item+ + end + + rule indent_item + indent_prefix item_content:inline_block_element? eol? + end + + rule intro + (intro_inline:inline_block_element / (eol intro_block:block_element))? eol? + end + + ########## + # group elements + # + # these are used externally and provide support when parsing just + # a particular portion of a document + + rule articles + children:article+ + end + + rule chapters + children:chapter+ + end + + rule divisions + children:division+ + end + + rule paragraphs + children:paragraph+ + end + + rule sections + children:section+ + end + + rule subdivisions + children:subdivision+ + end + + ########## + # headings + + rule division_heading + space? prefix:division_heading_prefix heading:(newline? content)? eol + + end + + rule subdivision_heading + space? prefix:subdivision_heading_prefix heading:(newline? content)? eol + + end + + rule chapter_heading + space? prefix:chapter_heading_prefix heading:(newline? content)? eol + + end + + ########## + # blocks of content inside containers + + rule block_paragraphs + block_element+ + end + + rule block_element + table / naked_statement + end + + # Block elements that don't have to appear at the start of a line. + # ie. we don't need to guard against the start of a chapter, section, etc. + rule inline_block_element + table / inline_statement + end + + ########## + # statements - single lines of content + # + # If a statement starts with a backslash, it's considered to have escaped the subsequent word, + # and is ignored. This allows escaping of section headings, etc. 
+ + rule naked_statement + space? !(division_heading / subdivision_heading / chapter_heading / article_prefix / section_prefix / schedule_title / paragraph_prefix / point_prefix / litera_prefix / indent_prefix) '\\'? clauses eol + + end + + rule pre_body_statement + space? !(division_heading / subdivision_heading / chapter_heading / article_prefix / section_prefix / schedule_title) '\\'? clauses eol + + end + + ########## + # prefixes + + rule division_heading_prefix + 'dzia'i ('ł'/'Ł') space alphanums [ :-]* + end + + rule subdivision_heading_prefix + 'oddzia'i ('ł'/'Ł') space alphanums [ :.-]* + end + + rule chapter_heading_prefix + 'rozdzia'i ('ł'/'Ł') space alphanums [ :.-]* + end + + rule article_prefix + ('Art.'i / ('Artyku'i 'ł'/'Ł')) space number_letter '.'? + end + + rule section_prefix + '§' space alphanums '.'? + end + + rule paragraph_prefix + number_letter '.' + end + + rule point_prefix + # 1) foo + # 2A) foo + number_letter ')' + end + + rule litera_prefix + # a) foo + # bb) foo + letters:letter+ ')' + end + + rule indent_prefix + # these are two different dash characters + '–' / '-' space + end + + include Slaw::Grammars::Inlines + include Slaw::Grammars::Tables + include Slaw::Grammars::Schedules + include Slaw::Grammars::Terminals + end + end + end +end diff --git a/lib/slaw/grammars/pl/act_nodes.rb b/lib/slaw/grammars/pl/act_nodes.rb new file mode 100644 index 0000000..cb83531 --- /dev/null +++ b/lib/slaw/grammars/pl/act_nodes.rb @@ -0,0 +1,441 @@ +require 'slaw/grammars/core_nodes' + +module Slaw + module Grammars + module PL + module Act + class Act < Treetop::Runtime::SyntaxNode + FRBR_URI = '/pl/act/1980/01' + WORK_URI = FRBR_URI + EXPRESSION_URI = "#{FRBR_URI}/pol@" + MANIFESTATION_URI = EXPRESSION_URI + + def to_xml(b, idprefix=nil, i=0) + b.act(contains: "originalVersion") { |b| + write_meta(b) + write_preface(b) + write_preamble(b) + write_body(b) + } + write_schedules(b) + end + + def write_meta(b) + b.meta { |b| + 
write_identification(b) + + b.references(source: "#this") { + b.TLCOrganization(id: 'slaw', href: 'https://github.com/longhotsummer/slaw', showAs: "Slaw") + b.TLCOrganization(id: 'council', href: '/ontology/organization/za/council', showAs: "Council") + } + } + end + + def write_identification(b) + b.identification(source: "#slaw") { |b| + # use stub values so that we can generate a validating document + b.FRBRWork { |b| + b.FRBRthis(value: "#{WORK_URI}/main") + b.FRBRuri(value: WORK_URI) + b.FRBRalias(value: 'Short Title') + b.FRBRdate(date: '1980-01-01', name: 'Generation') + b.FRBRauthor(href: '#council') + b.FRBRcountry(value: 'za') + } + b.FRBRExpression { |b| + b.FRBRthis(value: "#{EXPRESSION_URI}/main") + b.FRBRuri(value: EXPRESSION_URI) + b.FRBRdate(date: '1980-01-01', name: 'Generation') + b.FRBRauthor(href: '#council') + b.FRBRlanguage(language: 'eng') + } + b.FRBRManifestation { |b| + b.FRBRthis(value: "#{MANIFESTATION_URI}/main") + b.FRBRuri(value: MANIFESTATION_URI) + b.FRBRdate(date: Time.now.strftime('%Y-%m-%d'), name: 'Generation') + b.FRBRauthor(href: '#slaw') + } + } + end + + def write_preface(b) + preface.to_xml(b) if preface.respond_to? :to_xml + end + + def write_preamble(b) + preamble.to_xml(b) if preamble.respond_to? :to_xml + end + + def write_body(b) + body.to_xml(b) + end + + def write_schedules(b) + if schedules.text_value != "" + schedules.to_xml(b) + end + end + end + + class Preface < Treetop::Runtime::SyntaxNode + def to_xml(b, *args) + if text_value != "" + b.preface { |b| + statements.elements.each { |element| + for e in element.elements + e.to_xml(b, "") if e.is_a? 
Slaw::Grammars::Inlines::NakedStatement + end + } + } + end + end + end + + class Preamble < Treetop::Runtime::SyntaxNode + def to_xml(b, *args) + if text_value != "" + b.preamble { |b| + statements.elements.each { |e| + e.to_xml(b, "") + } + } + end + end + end + + class Part < Treetop::Runtime::SyntaxNode + def num + heading.num + end + + def to_xml(b, *args) + id = "part-#{num}" + + # include a chapter number in the id if our parent has one + if parent and parent.parent.is_a?(Chapter) and parent.parent.num + id = "chapter-#{parent.parent.num}.#{id}" + end + + b.part(id: id) { |b| + heading.to_xml(b) + children.elements.each_with_index { |e, i| e.to_xml(b, id + '.', i) } + } + end + end + + class GenericHeading < Treetop::Runtime::SyntaxNode + def num + prefix.alphanums.text_value + end + + def title + if heading.text_value and heading.respond_to? :content + heading.content.text_value.strip + end + end + + def to_xml(b) + b.num(num) + b.heading(title) if title + end + end + + class Division < Treetop::Runtime::SyntaxNode + def num + heading.num + end + + def to_xml(b, *args) + id = "division-#{num}" + + b.division(id: id) { |b| + heading.to_xml(b) + children.elements.each_with_index { |e, i| e.to_xml(b, id + '.', i) } + } + end + end + + class Subdivision < Treetop::Runtime::SyntaxNode + def num + heading.num + end + + def to_xml(b, *args) + id = "subdivision-#{num}" + + b.subdivision(id: id) { |b| + heading.to_xml(b) + children.elements.each_with_index { |e, i| e.to_xml(b, id + '.', i) } + } + end + end + + class Chapter < Treetop::Runtime::SyntaxNode + def num + heading.num + end + + def to_xml(b, *args) + id = "chapter-#{num}" + + # TODO: do this for the oddzial and zial + # include a part number in the id if our parent has one + if parent and parent.parent.is_a?(Part) and parent.parent.num + id = "part-#{parent.parent.num}.#{id}" + end + + b.chapter(id: id) { |b| + heading.to_xml(b) + children.elements.each_with_index { |e, i| e.to_xml(b, id + '.', i) } + } + 
end + end + + class BlockWithIntroAndChildren < Treetop::Runtime::SyntaxNode + def intro_node + if intro.elements.length >= 1 + el = intro.elements[0] + + if el.respond_to? :intro_inline + el.intro_inline + elsif el.respond_to? :intro_block + el.intro_block + end + end + end + + def intro_and_children_xml(b, idprefix) + if intro_node and !intro_node.empty? + if not children.empty? + b.intro { |b| intro_node.to_xml(b, idprefix) } + else + b.content { |b| intro_node.to_xml(b, idprefix) } + end + elsif children.empty? + b.content { |b| b.p } + end + + children.elements.each_with_index { |e, i| e.to_xml(b, idprefix, i) } + end + end + + class Article < BlockWithIntroAndChildren + def num + article_prefix.number_letter.text_value + end + + def to_xml(b, *args) + id = "article-#{num}" + idprefix = "#{id}." + + b.article(id: id) { |b| + b.num("#{num}.") + intro_and_children_xml(b, idprefix) + } + end + end + + class Section < BlockWithIntroAndChildren + def num + section_prefix.alphanums.text_value + end + + def to_xml(b, *args) + id = "section-#{num}" + idprefix = "#{id}." + + b.section(id: id) { |b| + b.num("#{num}.") + intro_and_children_xml(b, idprefix) + } + end + end + + class Paragraph < BlockWithIntroAndChildren + def num + paragraph_prefix.number_letter.text_value + end + + def to_xml(b, idprefix='', *args) + id = "#{idprefix}paragraph-#{num}" + idprefix = id + "." + + b.paragraph(id: id) { |b| + b.num(paragraph_prefix.text_value) + intro_and_children_xml(b, idprefix) + } + end + end + + class Point < BlockWithIntroAndChildren + def num + point_prefix.number_letter.text_value + end + + def to_xml(b, idprefix='', i) + id = "#{idprefix}point-#{num}" + idprefix = id + "." + + b.point(id: id) { |b| + b.num(point_prefix.text_value) + intro_and_children_xml(b, idprefix) + } + end + end + + class Litera < BlockWithIntroAndChildren + def num + litera_prefix.letters.text_value + end + + def to_xml(b, idprefix='', i) + id = "#{idprefix}alinea-#{num}" + idprefix = id + "." 
+ + b.alinea(id: id) { |b| + b.num(litera_prefix.text_value) + intro_and_children_xml(b, idprefix) + } + end + end + + class BlockParagraph < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix='', i=0) + id = "#{idprefix}subparagraph-0" + idprefix = id + "." + + b.subparagraph(id: id) { |b| + b.content { |b| + elements.each_with_index { |e, i| e.to_xml(b, idprefix, i) } + } + } + end + end + + class Indents < Treetop::Runtime::SyntaxNode + # Render a list of indent items. + def to_xml(b, idprefix, i=0) + id = idprefix + "list-#{i}" + idprefix = id + '.' + + b.list(id: id) { |b| + children.elements.each_with_index { |e, i| e.to_xml(b, idprefix, i) } + } + end + end + + class IndentItem < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix, i) + id = idprefix + "indent-#{i}" + idprefix = id + '.' + + b.indent(id: id) { |b| + b.content { |b| + if not item_content.empty? + item_content.to_xml(b, idprefix) + else + b.p + end + } + } + end + end + + class ScheduleContainer < Treetop::Runtime::SyntaxNode + def to_xml(b) + b.components { |b| + schedules.children.elements.each_with_index { |e, i| + e.to_xml(b, "", i+1) + } + } + end + end + + class Schedule < Treetop::Runtime::SyntaxNode + def num + n = schedule_title.num.text_value + return (n && !n.empty?) ? n : nil + end + + def alias + if not schedule_title.title.text_value.blank? + schedule_title.title.text_value + elsif num + "Schedule #{num}" + else + "Schedule" + end + end + + def heading + if schedule_title.heading.respond_to? 
:content + schedule_title.heading.content.text_value + else + nil + end + end + + def to_xml(b, idprefix=nil, i=1) + if num + n = num + component = "schedule#{n}" + else + n = i + # make a component name from the schedule title + component = self.alias.downcase().strip().gsub(/[^a-z0-9]/i, '').gsub(/ +/, '') + end + + id = "#{idprefix}#{component}" + + b.component(id: "component-#{id}") { |b| + b.doc_(name: component) { |b| + b.meta { |b| + b.identification(source: "#slaw") { |b| + b.FRBRWork { |b| + b.FRBRthis(value: "#{Act::WORK_URI}/#{component}") + b.FRBRuri(value: Act::WORK_URI) + b.FRBRalias(value: self.alias) + b.FRBRdate(date: '1980-01-01', name: 'Generation') + b.FRBRauthor(href: '#council') + b.FRBRcountry(value: 'za') + } + b.FRBRExpression { |b| + b.FRBRthis(value: "#{Act::EXPRESSION_URI}/#{component}") + b.FRBRuri(value: Act::EXPRESSION_URI) + b.FRBRdate(date: '1980-01-01', name: 'Generation') + b.FRBRauthor(href: '#council') + b.FRBRlanguage(language: 'eng') + } + b.FRBRManifestation { |b| + b.FRBRthis(value: "#{Act::MANIFESTATION_URI}/#{component}") + b.FRBRuri(value: Act::MANIFESTATION_URI) + b.FRBRdate(date: Time.now.strftime('%Y-%m-%d'), name: 'Generation') + b.FRBRauthor(href: '#slaw') + } + } + } + + b.mainBody { |b| + idprefix = "#{id}." + + # there is no good AKN hierarchy container for schedules, so we + # just use article because we don't use it anywhere else. + b.article(id: id) { |b| + b.heading(heading) if heading + body.children.elements.each_with_index { |e| e.to_xml(b, idprefix, i) } if body.is_a? 
Body + } + } + } + } + end + end + + class ScheduleStatement < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix) + b.p { |b| clauses.to_xml(b, idprefix) } if clauses + end + end + end + end + end +end diff --git a/lib/slaw/grammars/pl/act_text.xsl b/lib/slaw/grammars/pl/act_text.xsl new file mode 100644 index 0000000..e6a4fe4 --- /dev/null +++ b/lib/slaw/grammars/pl/act_text.xsl @@ -0,0 +1,271 @@ + + + + + + + + + + + + + + + + \ + + + + + + + + + + + + + + PREFACE + + + + + + + + PREAMBLE + + + + + + + + Dział + + - + + + + + + + + + Rozdział + + - + + + + + + + + + Art. + + + + + + + + + § + + + + + + + + + + + + + + + + + + - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Schedule - + + + + + + + + + + + + + + + + + + + + + {| + + + + + =" + + " + + +|- + + + +|} + + + + + + + +|- + + + + + + +! + + + +| + + + + + + + + =" + + " + + | + + + + + + + + + + + + + + [ + + ] + + + + [ + + ]( + + ) + + + + ![ + + ]( + + ) + + + + + + + + + + + + + + diff --git a/lib/slaw/grammars/schedules.treetop b/lib/slaw/grammars/schedules.treetop new file mode 100644 index 0000000..59503ad --- /dev/null +++ b/lib/slaw/grammars/schedules.treetop @@ -0,0 +1,33 @@ +# encoding: UTF-8 + +require 'slaw/grammars/schedules_nodes' + +module Slaw + module Grammars + grammar Schedules + rule schedules_container + schedules:schedules + end + + rule schedules + children:schedule+ + end + + rule schedule + schedule_title + body:body? + + end + + rule schedule_title + space? schedule_title_prefix space? "\""? num:alphanums? "\""? [ \t:.-]* title:(content)? + heading:(newline space? content)? + eol + end + + rule schedule_title_prefix + 'schedule'i 's'i? 
+ end + end + end +end diff --git a/lib/slaw/grammars/schedules_nodes.rb b/lib/slaw/grammars/schedules_nodes.rb new file mode 100644 index 0000000..2972ae4 --- /dev/null +++ b/lib/slaw/grammars/schedules_nodes.rb @@ -0,0 +1,107 @@ +require 'slaw/grammars/core_nodes' + +module Slaw + module Grammars + module Schedules + FRBR_URI = '/za/act/1980/01' + WORK_URI = FRBR_URI + EXPRESSION_URI = "#{FRBR_URI}/eng@" + MANIFESTATION_URI = EXPRESSION_URI + + class ScheduleContainer < Treetop::Runtime::SyntaxNode + def to_xml(b) + b.components { |b| + schedules.children.elements.each_with_index { |e, i| + e.to_xml(b, "", i+1) + } + } + end + end + + class Schedule < Treetop::Runtime::SyntaxNode + def num + n = schedule_title.num.text_value + return (n && !n.empty?) ? n : nil + end + + def alias + if not schedule_title.title.text_value.blank? + schedule_title.title.text_value + elsif num + "Schedule #{num}" + else + "Schedule" + end + end + + def heading + if schedule_title.heading.respond_to? :content + schedule_title.heading.content.text_value + else + nil + end + end + + def to_xml(b, idprefix=nil, i=1) + if num + n = num + component = "schedule#{n}" + else + n = i + # make a component name from the schedule title + component = self.alias.downcase().strip().gsub(/[^a-z0-9]/i, '').gsub(/ +/, '') + end + + id = "#{idprefix}#{component}" + + b.component(id: "component-#{id}") { |b| + b.doc_(name: component) { |b| + b.meta { |b| + b.identification(source: "#slaw") { |b| + b.FRBRWork { |b| + b.FRBRthis(value: "#{WORK_URI}/#{component}") + b.FRBRuri(value: WORK_URI) + b.FRBRalias(value: self.alias) + b.FRBRdate(date: '1980-01-01', name: 'Generation') + b.FRBRauthor(href: '#council') + b.FRBRcountry(value: 'za') + } + b.FRBRExpression { |b| + b.FRBRthis(value: "#{EXPRESSION_URI}/#{component}") + b.FRBRuri(value: EXPRESSION_URI) + b.FRBRdate(date: '1980-01-01', name: 'Generation') + b.FRBRauthor(href: '#council') + b.FRBRlanguage(language: 'eng') + } + b.FRBRManifestation { |b| + 
b.FRBRthis(value: "#{MANIFESTATION_URI}/#{component}") + b.FRBRuri(value: MANIFESTATION_URI) + b.FRBRdate(date: Time.now.strftime('%Y-%m-%d'), name: 'Generation') + b.FRBRauthor(href: '#slaw') + } + } + } + + b.mainBody { |b| + idprefix = "#{id}." + + # there is no good AKN hierarchy container for schedules, so we + # just use article because we don't use it anywhere else. + b.article(id: id) { |b| + b.heading(heading) if heading + body.children.elements.each_with_index { |e| e.to_xml(b, idprefix, i) } if body.is_a? Body + } + } + } + } + end + end + + class ScheduleStatement < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix) + b.p { |b| clauses.to_xml(b, idprefix) } if clauses + end + end + end + end +end diff --git a/lib/slaw/grammars/tables.treetop b/lib/slaw/grammars/tables.treetop new file mode 100644 index 0000000..bf72829 --- /dev/null +++ b/lib/slaw/grammars/tables.treetop @@ -0,0 +1,59 @@ +# encoding: UTF-8 + +require 'slaw/grammars/terminals' +require 'slaw/grammars/tables_nodes' + +module Slaw + module Grammars + grammar Tables + ########## + # wikimedia-style tables + # + # this grammar doesn't support inline table cells (eg: | col1 || col2 || col3) + # instead, the builder preprocesses tables to break inline cells onto their own + # lines, which we do support. + + rule table + space? '{|' eol + table_body + '|}' eol + + end + + rule table_body + (table_row / table_cell)* + end + + rule table_row + '|-' space? eol + end + + rule table_cell + # don't match end-of-table + !'|}' + [!|] attribs:table_attribs? space? + # first content line, then multiple lines + content:(line:table_line (![!|] space? line:table_line)*) + + end + + rule table_line + clauses:clauses? eol + + end + + rule table_attribs + space? attribs:(table_attrib+) '|' + end + + rule table_attrib + name:([a-z_-]+) '=' value:( + ('"' (!'"' .)* '"') / + ("'" (!"'" .)* "'")) + space? 
+ end + + include Terminals + end + end +end diff --git a/lib/slaw/grammars/tables_nodes.rb b/lib/slaw/grammars/tables_nodes.rb new file mode 100644 index 0000000..7af5703 --- /dev/null +++ b/lib/slaw/grammars/tables_nodes.rb @@ -0,0 +1,74 @@ +module Slaw + module Grammars + module Tables + class Table < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix, i=0) + b.table(id: "#{idprefix}table#{i}") { |b| + # we'll gather cells into this row list + rows = [] + cells = [] + + for child in table_body.elements + if child.is_a? TableCell + # cell + cells << child + else + # new row marker + rows << cells unless cells.empty? + cells = [] + end + end + rows << cells unless cells.empty? + + for row in rows + b.tr { |tr| + for cell in row + cell.to_xml(tr, "") + end + } + end + } + end + end + + class TableCell < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix) + tag = text_value[0] == '!' ? 'th' : 'td' + + attrs = {} + if not attribs.empty? + for item in attribs.attribs.elements + # key=value (strip quotes around value) + attrs[item.name.text_value.strip] = item.value.text_value[1..-2] + end + end + + b.send(tag.to_sym, attrs) { |b| + b.p { |b| + # first line, and the rest + lines = [content.line] + content.elements.last.elements.map(&:line) + + lines.each_with_index do |line, i| + line.to_xml(b, i, i == lines.length-1) + end + } + } + end + end + + class TableLine < Treetop::Runtime::SyntaxNode + # line of table content + def to_xml(b, i, tail) + clauses.to_xml(b) unless clauses.empty? + + # add trailing newlines. + # for the first line, eat whitespace at the start + # for the last line, eat whitespace at the end + if not tail and (i > 0 or not clauses.empty?) 
+ eol.text_value.count("\n").times { b.eol } + end + end + end + end + end +end diff --git a/lib/slaw/grammars/terminals.treetop b/lib/slaw/grammars/terminals.treetop new file mode 100644 index 0000000..24e78ee --- /dev/null +++ b/lib/slaw/grammars/terminals.treetop @@ -0,0 +1,84 @@ +# encoding: UTF-8 + +module Slaw + module Grammars + grammar Terminals + ######### + ## one line of basic content + + rule content + # anything but a newline, followed by a + # newline or end of file (without consuming the newline) + [^\n]+ &eol + end + + ########## + # terminals + + # eg. 2, 2A, 2b + rule number_letter + number letter* + end + + rule letter_ordinal + letter (letter / digit)* + end + + rule dotted_number_3 + number '.' number ('.' number)+ + end + + rule dotted_number_2 + number '.' number + end + + rule number + digit+ + end + + rule digit + [0-9] + end + + rule letter + [a-zA-Z] + end + + rule alphanums + [a-zA-Z0-9]+ + end + + rule quotes + ["“”] + end + + rule non_quotes + [^"“”] + end + + ########## + # whitespace + + rule space + [ \t]+ + end + + rule whitespace + [ \t\n]* + end + + rule empty_line + space? newline + end + + rule eol + newline + empty_line* + end + + rule newline + "\n" + end + end + end +end diff --git a/lib/slaw/grammars/za/act.treetop b/lib/slaw/grammars/za/act.treetop new file mode 100644 index 0000000..50f04ef --- /dev/null +++ b/lib/slaw/grammars/za/act.treetop @@ -0,0 +1,222 @@ +# encoding: UTF-8 + +require 'slaw/parse/grammar_helpers' +require 'slaw/grammars/za/act_nodes' + +require 'slaw/grammars/terminals' +require 'slaw/grammars/tables' +require 'slaw/grammars/schedules' +require 'slaw/grammars/inlines' + +module Slaw + module Grammars + module ZA + grammar Act + include Slaw::Parse::GrammarHelpers + + ######## + # major containers + + rule act + empty_line* + preface:preface? + preamble:preamble? + body + schedules:schedules_container? + end + + rule preface + !'PREAMBLE' + ('PREFACE'i space? eol)? 
+ statements:(!'PREAMBLE' pre_body_statement)* + end + + rule preamble + 'PREAMBLE'i space? eol + statements:pre_body_statement* + end + + rule body + children:(chapter / part / section / subsection / block_paragraphs)+ + end + + rule chapter + heading:chapter_heading + children:(part / section / subsection / block_paragraphs)* + + end + + rule part + heading:part_heading + children:(section / subsection / block_paragraphs)* + + end + + rule section + section_title + children:(subsection / block_paragraphs)*
+ end + + rule subsection + space? subsection_prefix space? + # eg: (2) (a) foo + first_child:inline_block_element? + # eg: (2) + eol? + children:block_element* + end + + ########## + # group elements + # + # these are used externally and provide support when parsing just + # a particular portion of a document + + rule chapters + children:chapter+ + end + + rule parts + children:part+ + end + + rule sections + children:section+ + end + + ########## + # headings + + rule chapter_heading + space? chapter_heading_prefix heading:(newline? content)? eol + + end + + rule part_heading + space? part_heading_prefix heading:(newline? content)? eol + + end + + rule section_title + section_title_1 / section_1_title + end + + rule section_title_1 + &{ |s| options[:section_number_after_title] } + # Section title + # 1. Section content + content eol + section_title_prefix whitespace + end + + rule section_1_title + # 1. Section title + # Section content + # + # Additionally, the section title is optional. + !{ |s| options[:section_number_after_title] } + space? section_title_prefix section_title:section_title_content? eol? + + end + + rule section_title_content + # don't match subsections, eg. + # + # 10. (1) subsection content... + space !subsection_prefix content eol + end + + ########## + # blocks of content inside containers + + rule block_paragraphs + block_element+ + end + + rule block_element + (table / blocklist / naked_statement) + end + + # Block elements that don't have to appear at the start of a line. + # ie. we don't need to guard against the start of a chapter, section, etc. + rule inline_block_element + (table / blocklist / inline_statement) + end + + rule blocklist + blocklist_item+ + end + + rule blocklist_item + # TODO: this whitespace should probably be space, to allow empty blocklist items followed by plain text + space? blocklist_item_prefix whitespace item_content:(!blocklist_item_prefix clauses:clauses? eol)? eol? 
+ + end + + rule blocklist_item_prefix + ('(' letter_ordinal ')') / dotted_number_3 + end + + ########## + # statements - single lines of content + # + # If a statement starts with a backslash, it's considered to have escaped the subsequent word, + # and is ignored. This allows escaping of section headings, etc. + + rule naked_statement + space? !(chapter_heading / part_heading / section_title / schedule_title / subsection_prefix) '\\'? clauses eol + + end + + rule pre_body_statement + space? !(chapter_heading / part_heading / section_title / schedule_title) '\\'? clauses eol + + end + + ########## + # prefixes + + rule part_heading_prefix + 'part'i space alphanums [ :-]* + end + + rule chapter_heading_prefix + 'chapter'i space alphanums [ :-]* + end + + rule section_title_prefix + number_letter '.'? + end + + rule subsection_prefix + # there are two subsection handling syntaxes: + # + # (1) foo + # (2A) foo + # + # and + # + # 8.2 for + # 8.3 bar + # + # The second is less common, but this allows us to handle it. + # Note that it is usually accompanied by a similar list number format: + # + # 8.2.1 item 1 + # 8.2.2 item 2 + # + # which aren't subsections, but lists, so force the space at the end + # of the number to catch this case. + num:('(' number_letter ')') + / + num:dotted_number_2 '.'? 
space + end + + include Slaw::Grammars::Inlines + include Slaw::Grammars::Tables + include Slaw::Grammars::Schedules + include Slaw::Grammars::Terminals + end + end + end +end diff --git a/lib/slaw/grammars/za/act_nodes.rb b/lib/slaw/grammars/za/act_nodes.rb new file mode 100644 index 0000000..30a4b0f --- /dev/null +++ b/lib/slaw/grammars/za/act_nodes.rb @@ -0,0 +1,307 @@ +require 'slaw/grammars/core_nodes' + +module Slaw + module Grammars + module ZA + module Act + class Act < Treetop::Runtime::SyntaxNode + FRBR_URI = '/za/act/1980/01' + WORK_URI = FRBR_URI + EXPRESSION_URI = "#{FRBR_URI}/eng@" + MANIFESTATION_URI = EXPRESSION_URI + + def to_xml(b, idprefix=nil, i=0) + b.act(contains: "originalVersion") { |b| + write_meta(b) + write_preface(b) + write_preamble(b) + write_body(b) + } + write_schedules(b) + end + + def write_meta(b) + b.meta { |b| + write_identification(b) + + b.references(source: "#this") { + b.TLCOrganization(id: 'slaw', href: 'https://github.com/longhotsummer/slaw', showAs: "Slaw") + b.TLCOrganization(id: 'council', href: '/ontology/organization/za/council', showAs: "Council") + } + } + end + + def write_identification(b) + b.identification(source: "#slaw") { |b| + # use stub values so that we can generate a validating document + b.FRBRWork { |b| + b.FRBRthis(value: "#{WORK_URI}/main") + b.FRBRuri(value: WORK_URI) + b.FRBRalias(value: 'Short Title') + b.FRBRdate(date: '1980-01-01', name: 'Generation') + b.FRBRauthor(href: '#council') + b.FRBRcountry(value: 'za') + } + b.FRBRExpression { |b| + b.FRBRthis(value: "#{EXPRESSION_URI}/main") + b.FRBRuri(value: EXPRESSION_URI) + b.FRBRdate(date: '1980-01-01', name: 'Generation') + b.FRBRauthor(href: '#council') + b.FRBRlanguage(language: 'eng') + } + b.FRBRManifestation { |b| + b.FRBRthis(value: "#{MANIFESTATION_URI}/main") + b.FRBRuri(value: MANIFESTATION_URI) + b.FRBRdate(date: Time.now.strftime('%Y-%m-%d'), name: 'Generation') + b.FRBRauthor(href: '#slaw') + } + } + end + + def write_preface(b) + 
preface.to_xml(b) if preface.respond_to? :to_xml + end + + def write_preamble(b) + preamble.to_xml(b) if preamble.respond_to? :to_xml + end + + def write_body(b) + body.to_xml(b) + end + + def write_schedules(b) + if schedules.text_value != "" + schedules.to_xml(b) + end + end + end + + class Preface < Treetop::Runtime::SyntaxNode + def to_xml(b, *args) + if text_value != "" + b.preface { |b| + statements.elements.each { |element| + for e in element.elements + e.to_xml(b, "") if e.is_a? Slaw::Grammars::Inlines::NakedStatement + end + } + } + end + end + end + + class Preamble < Treetop::Runtime::SyntaxNode + def to_xml(b, *args) + if text_value != "" + b.preamble { |b| + statements.elements.each { |e| + e.to_xml(b, "") + } + } + end + end + end + + class Part < Treetop::Runtime::SyntaxNode + def num + heading.num + end + + def to_xml(b, *args) + id = "part-#{num}" + + # include a chapter number in the id if our parent has one + if parent and parent.parent.is_a?(Chapter) and parent.parent.num + id = "chapter-#{parent.parent.num}.#{id}" + end + + b.part(id: id) { |b| + heading.to_xml(b) + children.elements.each_with_index { |e, i| e.to_xml(b, id + '.', i) } + } + end + end + + class PartHeading < Treetop::Runtime::SyntaxNode + def num + part_heading_prefix.alphanums.text_value + end + + def title + if heading.text_value and heading.respond_to? 
:content + heading.content.text_value.strip + end + end + + def to_xml(b) + b.num(num) + b.heading(title) if title + end + end + + class Chapter < Treetop::Runtime::SyntaxNode + def num + heading.num + end + + def to_xml(b, *args) + id = "chapter-#{num}" + + # include a part number in the id if our parent has one + if parent and parent.parent.is_a?(Part) and parent.parent.num + id = "part-#{parent.parent.num}.#{id}" + end + + b.chapter(id: id) { |b| + heading.to_xml(b) + children.elements.each_with_index { |e, i| e.to_xml(b, id + '.', i) } + } + end + end + + class ChapterHeading < Treetop::Runtime::SyntaxNode + def num + chapter_heading_prefix.alphanums.text_value + end + + def title + if heading.text_value and heading.respond_to? :content + heading.content.text_value.strip + end + end + + def to_xml(b) + b.num(num) + b.heading(title) if title + end + end + + class Section < Treetop::Runtime::SyntaxNode + def num + section_title.num + end + + def title + section_title.title + end + + def to_xml(b, *args) + id = "section-#{num}" + b.section(id: id) { |b| + b.num("#{num}.") + b.heading(title) + + idprefix = "#{id}." + + children.elements.each_with_index { |e, i| e.to_xml(b, idprefix, i) } + } + end + end + + class SectionTitleType1 < Treetop::Runtime::SyntaxNode + # a section title of the form: + # + # Definitions + # 1. In this act... + + def num + section_title_prefix.number_letter.text_value + end + + def title + content.text_value + end + end + + class SectionTitleType2 < Treetop::Runtime::SyntaxNode + # a section title of the form: + # + # 1. Definitions + # In this act... + # + # In this format, the title is optional and the section content may + # start where we think the title is. + + def num + section_title_prefix.number_letter.text_value + end + + def title + section_title.empty? ? 
"" : section_title.content.text_value + end + end + + class BlockParagraph < Treetop::Runtime::SyntaxNode + def to_xml(b, idprefix='', i=0) + id = "#{idprefix}paragraph-0" + idprefix = "#{id}." + + b.paragraph(id: id) { |b| + b.content { |b| + elements.each_with_index { |e, i| e.to_xml(b, idprefix, i) } + } + } + end + end + + class Subsection < Treetop::Runtime::SyntaxNode + def num + subsection_prefix.num.text_value + end + + def to_xml(b, idprefix, i) + id = idprefix + num.gsub(/[()]/, '') + idprefix = id + "." + + kids = children.elements + kids = [first_child] + kids if first_child and !first_child.empty? + + b.subsection(id: id) { |b| + b.num(num) + b.content { |b| + if kids.empty? + # schema requires a non-empty content element + b.p + else + kids.each_with_index { |e, i| e.to_xml(b, idprefix, i) } + end + } + } + end + end + + class Blocklist < Treetop::Runtime::SyntaxNode + # Render a block list to xml. If a block is given, + # yield to it a builder to insert a listIntroduction node + def to_xml(b, idprefix, i=0, &block) + id = idprefix + "list#{i}" + idprefix = id + '.' + + b.blockList(id: id) { |b| + b.listIntroduction { |b| yield b } if block_given? + + elements.each { |e| e.to_xml(b, idprefix) } + } + end + end + + class BlocklistItem < Treetop::Runtime::SyntaxNode + def num + blocklist_item_prefix.text_value + end + + def to_xml(b, idprefix) + b.item(id: idprefix + num.gsub(/[()]/, '')) { |b| + b.num(num) + b.p { |b| + item_content.clauses.to_xml(b, idprefix) if respond_to? :item_content and item_content.respond_to? 
:clauses + } + } + end + end + + end + end + end +end diff --git a/lib/slaw/za/act_text.xsl b/lib/slaw/grammars/za/act_text.xsl similarity index 100% rename from lib/slaw/za/act_text.xsl rename to lib/slaw/grammars/za/act_text.xsl diff --git a/lib/slaw/lifecycle_event.rb b/lib/slaw/lifecycle_event.rb deleted file mode 100644 index a919b91..0000000 --- a/lib/slaw/lifecycle_event.rb +++ /dev/null @@ -1,23 +0,0 @@ -module Slaw - # An event in the lifecycle of an act - class LifecycleEvent - include Slaw::Namespace - - # Date of the event - attr_accessor :date - - # type of the event - attr_accessor :type - - # the source of the event, an XML reference element - attr_accessor :source - - def initialize(element) - @date = element['date'] - @type = element['type'] - - source_id = element['source'][1..-1] - @source = element.document.at_xpath("//a:references/*[@id=\"#{source_id}\"]", a: NS) - end - end -end diff --git a/lib/slaw/parse/builder.rb b/lib/slaw/parse/builder.rb index 765fab6..579a851 100644 --- a/lib/slaw/parse/builder.rb +++ b/lib/slaw/parse/builder.rb @@ -23,11 +23,12 @@ class Builder include Slaw::Namespace include Slaw::Logging - @@parsers = {} - # Additional hash of options to be provided to the parser when parsing. attr_accessor :parse_options + # The parser to use + attr_accessor :parser + # Prefix to use when generating IDs for fragments attr_accessor :fragment_id_prefix @@ -36,26 +37,10 @@ class Builder # Specify either `:parser` or `:grammar_file` and `:grammar_class`. 
# # @option opts [Treetop::Runtime::CompiledParser] :parser parser to use - # @option opts [String] :grammar_file grammar filename to load a parser from - # @option opts [String] :grammar_class name of the class that the grammar will generate + # @option opts [Hash] :parse_options options to pass to the parser def initialize(opts={}) - if opts[:parser] - @parser = opts[:parser] - elsif opts[:grammar_file] and opts[:grammar_class] - if @@parsers[opts[:grammar_class]] - # already compiled the grammar, just use it - @parser = @@parsers[opts[:grammar_class]] - else - # load the grammar - Treetop.load(opts[:grammar_file]) - cls = eval(opts[:grammar_class]) - @parser = cls.new - end - else - raise ArgumentError.new("Specify either :parser or :grammar_file and :grammar_class") - end - - @parse_options = {} + @parser = opts[:parser] + @parse_options = opts[:parse_options] || {} end # Do all the work necessary to parse text into a well-formed XML document. @@ -167,7 +152,6 @@ def to_xml(doc) # @return [Nokogiri::XML::Document] the updated document def postprocess(doc) normalise_headings(doc) - find_short_title(doc) adjust_blocklists(doc) doc @@ -189,186 +173,6 @@ def normalise_headings(doc) end end - # Find the short title and add it as an FRBRalias element in the meta section - # - # @param doc [Nokogiri::XML::Document] - def find_short_title(doc) - logger.info("Finding short title") - - # Short title and commencement - # 8. This Act shall be called the Legal Aid Amendment Act, 1996, and shall come - # into operation on a date fixed by the President by proclamation in the Gazette. 
- - doc.xpath('//a:body//a:heading[contains(text(), "hort title")]', a: NS).each do |heading| - section = heading.parent.at_xpath('a:subsection', a: NS) - if section and section.text =~ /this act (is|shall be called) the (([a-zA-Z\(\)]\s*)+, \d\d\d\d)/i - short_title = $2 - - logger.info("+ Found title: #{short_title}") - - node = doc.at_xpath('//a:meta//a:FRBRalias', a: NS) - node['value'] = short_title - break - end - end - end - - # Find definitions of terms and introduce them into the - # meta section of the document. - # - # @param doc [Nokogiri::XML::Document] - def link_definitions(doc) - logger.info("Finding and linking definitions") - - terms = find_definitions(doc) - add_terms_to_references(doc, terms) - find_term_references(doc, terms) - renumber_terms(doc) - end - - # Find `def` elements in the document and return a Hash from - # term ids to the text of each term - # - # @param doc [Nokogiri::XML::Document] - # - # @return [Hash{String, String}] - def find_definitions(doc) - guess_at_definitions(doc) - - terms = {} - doc.xpath('//a:def', a: NS).each do |defn| - #

"affected land" means land in respect of which an application has been lodged in terms of section 17(1);

- if defn['refersTo'] - id = defn['refersTo'].sub(/^#/, '') - term = defn.content - terms[id] = term - - logger.info("+ Found definition for: #{term}") - end - end - - terms - end - - # Find defined terms in the document. - # - # This looks for heading elements with the words 'definitions' or 'interpretation', - # and then looks for phrases like - # - # "this word" means something... - # - # It identifies "this word" as a defined term and wraps it in a def tag with a refersTo - # attribute referencing the term being defined. The surrounding block - # structure is also has its refersTo attribute set to the term. This way, the term - # is both marked as defined, and the container element with the full - # definition of the term is identified. - def guess_at_definitions(doc) - doc.xpath('//a:section', a: NS).select do |section| - # sections with headings like Definitions - heading = section.at_xpath('a:heading', a: NS) - heading && heading.content =~ /definition|interpretation/i - end.each do |section| - # find items like "foo" means blah... - - section.xpath('.//a:p|.//a:listIntroduction', a: NS).each do |container| - # only if we don't already have a definition here - next if container.at_xpath('a:def', a: NS) - - # get first text node - text = container.children.first - next if (not text or not text.text?) - - match = /^\s*["“”](.+?)["“”]/.match(text.text) - if match - term = match.captures[0] - term_id = 'term-' + term.gsub(/[^a-zA-Z0-9_-]/, '_') - - #

"affected land" means land in respect of which an application has been lodged in terms of section 17(1);

- refersTo = "##{term_id}" - defn = doc.create_element('def', term, refersTo: refersTo) - rest = match.post_match - - text.before(defn) - defn.before(doc.create_text_node('"')) - text.content = '"' + rest - - # adjust the container's refersTo attribute - parent = find_up(container, ['item', 'point', 'blockList', 'list', 'paragraph', 'subsection', 'section', 'chapter', 'part']) - parent['refersTo'] = refersTo - end - end - end - end - - def add_terms_to_references(doc, terms) - refs = doc.at_xpath('//a:meta/a:references', a: NS) - unless refs - refs = doc.create_element('references', source: "#this") - doc.at_xpath('//a:meta/a:identification', a: NS).after(refs) - end - - # nuke all existing term reference elements - refs.xpath('a:TLCTerm', a: NS).each { |el| el.remove } - - for id, term in terms - # - refs << doc.create_element('TLCTerm', - id: id, - href: "/ontology/term/this.eng.#{id.gsub(/^term-/, '')}", - showAs: term) - end - end - - # Find and decorate references to terms in the document. - # The +terms+ param is a hash from term_id to actual term. - def find_term_references(doc, terms) - logger.info("+ Finding references to terms") - - i = 0 - - # sort terms by the length of the defined term, desc, - # so that we don't find short terms inside longer - # terms - terms = terms.to_a.sort_by { |pair| -pair[1].size } - - # look for each term - for term_id, term in terms - doc.xpath('//a:body//text()', a: NS).each do |text| - # replace all occurrences in this text node - - # unless we're already inside a def or term element - next if (["def", "term"].include?(text.parent.name)) - - # don't link to a term inside its own definition - owner = find_up(text, 'subsection') - next if owner and owner.at_xpath(".//a:def[@refersTo='##{term_id}']", a: NS) - - while posn = (text.content =~ /\b#{Regexp::escape(term)}\b/) - #

A delegation under subsection (1) shall not prevent the Minister from exercising the power himself or herself.

- node = doc.create_element('term', term, refersTo: "##{term_id}", id: "trm#{i}") - - pre = (posn > 0) ? text.content[0..posn-1] : nil - post = text.content[posn+term.length..-1] - - text.before(node) - node.before(doc.create_text_node(pre)) if pre - text.content = post - - i += 1 - end - end - end - end - - # recalculate ids for elements - def renumber_terms(doc) - logger.info("Renumbering terms") - - doc.xpath('//a:term', a: NS).each_with_index do |term, i| - term['id'] = "trm#{i}" - end - end - # Adjust blocklists: # # - nest them correctly diff --git a/lib/slaw/parse/cleanser.rb b/lib/slaw/parse/cleanser.rb index a922ad7..82e6284 100644 --- a/lib/slaw/parse/cleanser.rb +++ b/lib/slaw/parse/cleanser.rb @@ -14,23 +14,11 @@ class Cleanser def cleanup(s) s = scrub(s) s = correct_newlines(s) - s = fix_quotes(s) s = expand_tabs(s) s = chomp(s) s = enforce_newline(s) end - # Run deeper introspections and reformat the text, such as - # unwrapping/re-wrapping lines. These may not be safe to run - # multiple times. - def reformat(s) - s = remove_boilerplate(s) - s = unbreak_lines(s) - s = break_lines(s) - s = strip_toc(s) - s = enforce_newline(s) - end - # ------------------------------------------------------------------------ def remove_empty_lines(s) @@ -50,36 +38,12 @@ def scrub(s) .gsub(/\n* /, '') end - # change weird quotes to normal ones - def fix_quotes(s) - s.gsub(/‘‘|’’|''/, '"') - end - # tabs to spaces def expand_tabs(s) s.gsub(/\t/, ' ')\ .gsub("\u00A0", ' ') # non-breaking space end - # Try to remove boilerplate lines found in many files, such as page numbers. 
- def remove_boilerplate(s) - # nuke any line to do with Sabinet and the government printer - s.gsub(/^.*Sabinet.*Government Printer.*$/i, '')\ - .gsub(/^.*Provincial Gazette \d+.*$/i, '')\ - .gsub(/^.*Provinsiale Koerant \d+.*$/i, '')\ - .gsub(/^.*PROVINCIAL GAZETTE.*$/, '')\ - .gsub(/^.*PROVINSIALE KOERANT.*$/, '')\ - .gsub(/^\s*\d+\s*$/, '')\ - .gsub(/^.*This gazette is also available.*$/, '')\ - # get rid of date lines - .gsub(/^\d{1,2}\s+\w+\s+\d{4}$/, '')\ - # get rid of page number lines - .gsub(/^\s*page \d+( of \d+)?\s*\n/i, '')\ - .gsub(/^\s*\d*\s*No\. \d+$/, '')\ - # get rid of lines with lots of ____ or ---- chars, they're usually pagebreaks - .gsub(/^.*[_-]{5}.*$/, '') - end - # Get rid of whitespace at the end of lines and at the start and end of the # entire string. def chomp(s) @@ -94,106 +58,6 @@ def enforce_newline(s) # ensure string ends with a newline s.end_with?("\n") ? s : (s + "\n") end - - # Make educated guesses about lines that should - # have been broken but haven't, and break them. - # - # This is very dependent on a locale's legislation grammar, there are - # lots of rules of thumb that make this work. - def break_lines(s) - # often we find a section title munged onto the same line as its first statement - # eg: - # foo bar. New section title 62. (1) For the purpose - s = s.gsub(/\. ([^.]+) (\d+\. ?\(1\) )/, ".\n" + '\1' + "\n" + '\2') - - # New section title 62. (1) For the purpose - s = s.gsub(/(\w) (\d+\. ?\(1\) )/, '\1' + "\n" + '\2') - - # (1) foo; (2) bar - # (1) foo. (2) bar - s = s.gsub(/(\w{3,}[;.]) (\([0-9a-z]+\))/, "\\1\n\\2") - - # (1) foo; and (2) bar - # (1) foo; or (2) bar - s = s.gsub(/; (and|or) \(/, "; \\1\n(") - - # The officer-in-Charge may – (a) remove all withered natural... \n(b) - # We do this last, because by now we should have reconised that (b) should already - # be on a new line. 
- s = s.gsub(/ (\(a\) .+?\n\(b\))/, "\n\\1") - - # "foo" means ...; "bar" means - s = s.gsub(/; (["”“][^"”“]+?["”“] means)/, ";\n\\1") - - # CHAPTER 4 PARKING METER PARKING GROUNDS Place of parking - s = s.gsub(/([A-Z0-9 ]{5,}) ([A-Z][a-z ]{5,})/, "\\1\n\\2") - - s - end - - # Find likely candidates for unnecessarily broken lines - # and unbreak them. - def unbreak_lines(s) - lines = s.split(/\n/) - output = [] - - # set of regex matcher pairs, one for the prev line, one for the current line - matchers = [ - [/[a-z0-9]$/, /^\s*[a-z]/], # line ends with and starts with lowercase - [/;$/, /^\s*(and|or)/], # ends with ; then and/or on new line - ] - - prev = nil - lines.each_with_index do |line, i| - if i == 0 - output << line - else - prev = output[-1] - unbreak = false - - for prev_re, curr_re in matchers - if prev =~ prev_re and line =~ curr_re - unbreak = true - break - end - end - - if unbreak - output[-1] = prev + ' ' + line - else - output << line - end - end - end - - output.join("\n") - end - - # Do our best to remove table of contents at the start, - # it really confuses the grammar. 
- def strip_toc(s) - # first, try to find 'TABLE OF CONTENTS' anywhere within the first 4K of text, - if toc_start = s[0..4096].match(/TABLE OF CONTENTS/i) - - # grab the first non-blank line after that, it's our end-of-TOC marker - if eol = s.match(/^(.+?)$/, toc_start.end(0)) - marker = eol[0] - - # search for the first line that is a prefix of marker (or vv), and delete - # everything in between - posn = eol.end(0) - while m = s.match(/^(.+?)$/, posn) - if marker.start_with?(m[0]) or m[0].start_with?(marker) - return s[0...toc_start.begin(0)] + s[m.begin(0)..-1] - end - - posn = m.end(0) - end - end - end - - s - end end end end diff --git a/lib/slaw/render/html.rb b/lib/slaw/render/html.rb deleted file mode 100644 index 91dbf72..0000000 --- a/lib/slaw/render/html.rb +++ /dev/null @@ -1,70 +0,0 @@ -module Slaw - module Render - - # Support for transforming XML AN documents into HTML. - # - # This rendering is done using XSLT stylesheets. Both an entire - # document and fragments can be rendered. - class HTMLRenderer - - # [Hash] A Hash of Nokogiri::XSLT objects - attr_accessor :xslt - - def initialize - here = File.dirname(__FILE__) - - @xslt = { - act: Nokogiri::XSLT(File.open(File.join([here, 'xsl/act.xsl']))), - fragment: Nokogiri::XSLT(File.open(File.join([here, 'xsl/fragment.xsl']))), - } - end - - # Transform an entire XML document (a Nokogiri::XML::Document object) into HTML. - # Specify `base_url` to manage the base for relative URLs generated by - # the transform. - # - # @param doc [Nokogiri::XML::Document] document to render - # @param base_url [String] root URL for relative URLs (cannot be empty) - # - # @return [String] - def render(doc, base_url='') - params = _transform_params({'base_url' => base_url}) - _run_xslt(:act, doc, params) - end - - # Transform just a single node and its children into HTML. - # - # If +elem+ has an id, we use xpath to tell the XSLT which - # element to transform. 
Otherwise we copy the node into a new - # tree and apply the XSLT to that. - # - # @param node [Nokogiri::XML::Node] node to render - # @param base_url [String] root URL for relative URLs (cannot be empty) - # - # @return [String] - def render_node(node, base_url='') - params = _transform_params({'base_url' => base_url}) - - if node.id - params += ['root_elem', "//*[@id='#{node.id}']"] - doc = node.document - else - # create a new document with just this element at the root - doc = Nokogiri::XML::Document.new - doc.root = node - params += ['root_elem', '*'] - end - - _run_xslt(:fragment, doc, params) - end - - def _run_xslt(xslt, doc, params) - @xslt[xslt].transform(doc, params).to_s - end - - def _transform_params(params) - Nokogiri::XSLT.quote_params(params) - end - end - end -end diff --git a/lib/slaw/render/xsl/act.xsl b/lib/slaw/render/xsl/act.xsl deleted file mode 100644 index 8cf8ed5..0000000 --- a/lib/slaw/render/xsl/act.xsl +++ /dev/null @@ -1,15 +0,0 @@ - - - - - - - - - - - - - diff --git a/lib/slaw/render/xsl/elements.xsl b/lib/slaw/render/xsl/elements.xsl deleted file mode 100644 index 9811def..0000000 --- a/lib/slaw/render/xsl/elements.xsl +++ /dev/null @@ -1,120 +0,0 @@ - - - - - - an-act - - - - - - - -
- (deleted XSL templates; markup lost in extraction. Surviving text: "Part", "Chapter", "Schedule", "/definitions/#def-")
diff --git a/lib/slaw/render/xsl/fragment.xsl b/lib/slaw/render/xsl/fragment.xsl deleted file mode 100644 index fd2467c..0000000 --- a/lib/slaw/render/xsl/fragment.xsl +++ /dev/null @@ -1,16 +0,0 @@ - - - - - - - - - - - - - - diff --git a/lib/slaw/version.rb b/lib/slaw/version.rb index f496ad9..34aac3f 100644 --- a/lib/slaw/version.rb +++ b/lib/slaw/version.rb @@ -1,3 +1,3 @@ module Slaw - VERSION = "0.17.2" + VERSION = "1.0.0" end diff --git a/lib/slaw/za/act.treetop b/lib/slaw/za/act.treetop deleted file mode 100644 index 96b5a7e..0000000 --- a/lib/slaw/za/act.treetop +++ /dev/null @@ -1,393 +0,0 @@ -# encoding: UTF-8 - -require 'slaw/parse/grammar_helpers' -require 'slaw/za/act_nodes' - -module Slaw - module ZA - grammar Act - include Slaw::Parse::GrammarHelpers - - ######## - # major containers - - rule act - empty_line* - preface:preface? - preamble:preamble? - body - schedules:schedules_container? - end - - rule preface - !'PREAMBLE' - ('PREFACE'i space? eol)? - statements:(!'PREAMBLE' pre_body_statement)* - end - - rule preamble - 'PREAMBLE'i space? eol - statements:pre_body_statement* - end - - rule body - children:(chapter / part / section / subsection / block_paragraphs)+ - end - - rule chapter - heading:chapter_heading - children:(part / section / subsection / block_paragraphs)* - - end - - rule part - heading:part_heading - children:(section / subsection / block_paragraphs)* - - end - - rule section - section_title - children:(subsection / block_paragraphs)*
- end - - rule subsection - space? subsection_prefix space? - # eg: (2) (a) foo - first_child:inline_block_element? - # eg: (2) - eol? - children:block_element* - end - - rule schedules_container - schedules:schedules - end - - rule schedules - children:schedule+ - end - - rule schedule - schedule_title - body:body? - - end - - ########## - # group elements - # - # these are used externally and provide support when parsing just - # a particular portion of a document - - rule chapters - children:chapter+ - end - - rule parts - children:part+ - end - - rule sections - children:section+ - end - - ########## - # headings - - rule chapter_heading - space? chapter_heading_prefix heading:(newline? content)? eol - - end - - rule part_heading - space? part_heading_prefix heading:(newline? content)? eol - - end - - rule section_title - section_title_1 / section_1_title - end - - rule section_title_1 - &{ |s| options[:section_number_after_title] } - # Section title - # 1. Section content - content eol - section_title_prefix whitespace - end - - rule section_1_title - # 1. Section title - # Section content - # - # Additionally, the section title is optional. - !{ |s| options[:section_number_after_title] } - space? section_title_prefix section_title:section_title_content? eol? - - end - - rule section_title_content - # don't match subsections, eg. - # - # 10. (1) subsection content... - space !subsection_prefix content eol - end - - rule schedule_title - space? schedule_title_prefix space? "\""? num:alphanums? "\""? [ \t:.-]* title:(content)? - heading:(newline space? content)? - eol - end - - ########## - # blocks of content inside containers - - rule block_paragraphs - block_element+ - end - - rule block_element - (table / blocklist / naked_statement) - end - - # Block elements that don't have to appear at the start of a line. - # ie. we don't need to guard against the start of a chapter, section, etc. 
- rule inline_block_element - (table / blocklist / inline_statement) - end - - rule blocklist - blocklist_item+ - end - - rule blocklist_item - # TODO: this whitespace should probably be space, to allow empty blocklist items followed by plain text - space? blocklist_item_prefix whitespace item_content:(!blocklist_item_prefix clauses:clauses? eol)? eol? - - end - - rule blocklist_item_prefix - ('(' letter_ordinal ')') / dotted_number_3 - end - - ########## - # wikimedia-style tables - # - # this grammar doesn't support inline table cells (eg: | col1 || col2 || col3) - # instead, the builder preprocesses tables to break inline cells onto their own - # lines, which we do support. - - rule table - space? '{|' eol - table_body - '|}' eol -
- end - - rule table_body - (table_row / table_cell)* - end - - rule table_row - '|-' space? eol - end - - rule table_cell - # don't match end-of-table - !'|}' - [!|] attribs:table_attribs? space? - # first content line, then multiple lines - content:(line:table_line (![!|] space? line:table_line)*) - - end - - rule table_line - clauses:clauses? eol - - end - - rule table_attribs - space? attribs:(table_attrib+) '|' - end - - rule table_attrib - name:([a-z_-]+) '=' value:( - ('"' (!'"' .)* '"') / - ("'" (!"'" .)* "'")) - space? - end - - ########## - # statements - single lines of content - # - # If a statement starts with a backslash, it's considered to have escaped the subsequent word, - # and is ignored. This allows escaping of section headings, etc. - - rule naked_statement - space? !(chapter_heading / part_heading / section_title / schedule_title / subsection_prefix) '\\'? clauses eol - - end - - rule pre_body_statement - space? !(chapter_heading / part_heading / section_title / schedule_title) '\\'? clauses eol - - end - - rule inline_statement - space? '\\'? clauses eol - - end - - ########## - # inline content - - # one or more words, allowing inline elements - rule clauses - (remark / image / ref / [^\n])+ - - end - - rule remark - '[[' content:(ref / (!']]' .))+ ']]' - - end - - rule image - # images like markdown - # eg. ![title text](image url) - # - # the title text is optional, but the enclosing square brackets aren't - '![' content:(!'](' [^\n])* '](' href:([^)\n]+) ')' - - end - - rule ref - # links like markdown - # eg. [link text](link url) - '[' content:(!'](' [^\n])+ '](' href:([^)\n]+) ')' - - end - - ########## - # prefixes - - rule part_heading_prefix - 'part'i space alphanums [ :-]* - end - - rule chapter_heading_prefix - 'chapter'i space alphanums [ :-]* - end - - rule schedule_title_prefix - 'schedule'i 's'i? - end - - rule section_title_prefix - number_letter '.'? 
- end - - rule subsection_prefix - # there are two subsection handling syntaxes: - # - # (1) foo - # (2A) foo - # - # and - # - # 8.2 for - # 8.3 bar - # - # The second is less common, but this allows us to handle it. - # Note that it is usually accompanied by a similar list number format: - # - # 8.2.1 item 1 - # 8.2.2 item 2 - # - # which aren't subsections, but lists, so force the space at the end - # of the number to catch this case. - num:('(' number_letter ')') - / - num:dotted_number_2 '.'? space - end - - ######### - ## one line of basic content - - rule content - # anything but a newline, followed by a - # newline or end of file (without consuming the newline) - [^\n]+ &eol - end - - ########## - # terminals - - # eg. 2, 2A, 2b - rule number_letter - number letter* - end - - rule letter_ordinal - letter (letter / digit)* - end - - rule dotted_number_3 - number '.' number ('.' number)+ - end - - rule dotted_number_2 - number '.' number - end - - rule number - digit+ - end - - rule digit - [0-9] - end - - rule letter - [a-zA-Z] - end - - rule alphanums - [a-zA-Z0-9]+ - end - - rule quotes - ["“”] - end - - rule non_quotes - [^"“”] - end - - ########## - # whitespace - - rule space - [ \t]+ - end - - rule whitespace - [ \t\n]* - end - - rule empty_line - space? 
newline - end - - rule eol - newline - empty_line* - end - - rule newline - "\n" - end - end - end -end diff --git a/lib/slaw/za/act_nodes.rb b/lib/slaw/za/act_nodes.rb deleted file mode 100644 index a2bbf11..0000000 --- a/lib/slaw/za/act_nodes.rb +++ /dev/null @@ -1,532 +0,0 @@ -module Slaw - module ZA - module Act - class Act < Treetop::Runtime::SyntaxNode - FRBR_URI = '/za/act/1980/01' - WORK_URI = FRBR_URI - EXPRESSION_URI = "#{FRBR_URI}/eng@" - MANIFESTATION_URI = EXPRESSION_URI - - def to_xml(b, idprefix=nil, i=0) - b.act(contains: "originalVersion") { |b| - write_meta(b) - write_preface(b) - write_preamble(b) - write_body(b) - } - write_schedules(b) - end - - def write_meta(b) - b.meta { |b| - write_identification(b) - - b.references(source: "#this") { - b.TLCOrganization(id: 'slaw', href: 'https://github.com/longhotsummer/slaw', showAs: "Slaw") - b.TLCOrganization(id: 'council', href: '/ontology/organization/za/council', showAs: "Council") - } - } - end - - def write_identification(b) - b.identification(source: "#slaw") { |b| - # use stub values so that we can generate a validating document - b.FRBRWork { |b| - b.FRBRthis(value: "#{WORK_URI}/main") - b.FRBRuri(value: WORK_URI) - b.FRBRalias(value: 'Short Title') - b.FRBRdate(date: '1980-01-01', name: 'Generation') - b.FRBRauthor(href: '#council') - b.FRBRcountry(value: 'za') - } - b.FRBRExpression { |b| - b.FRBRthis(value: "#{EXPRESSION_URI}/main") - b.FRBRuri(value: EXPRESSION_URI) - b.FRBRdate(date: '1980-01-01', name: 'Generation') - b.FRBRauthor(href: '#council') - b.FRBRlanguage(language: 'eng') - } - b.FRBRManifestation { |b| - b.FRBRthis(value: "#{MANIFESTATION_URI}/main") - b.FRBRuri(value: MANIFESTATION_URI) - b.FRBRdate(date: Time.now.strftime('%Y-%m-%d'), name: 'Generation') - b.FRBRauthor(href: '#slaw') - } - } - end - - def write_preface(b) - preface.to_xml(b) if preface.respond_to? :to_xml - end - - def write_preamble(b) - preamble.to_xml(b) if preamble.respond_to? 
:to_xml - end - - def write_body(b) - body.to_xml(b) - end - - def write_schedules(b) - if schedules.text_value != "" - schedules.to_xml(b) - end - end - end - - class Body < Treetop::Runtime::SyntaxNode - def to_xml(b) - b.body { |b| - children.elements.each_with_index { |e, i| e.to_xml(b, '', i) } - } - end - end - - class GroupNode < Treetop::Runtime::SyntaxNode - def to_xml(b, *args) - children.elements.each { |e| e.to_xml(b, *args) } - end - end - - class Preface < Treetop::Runtime::SyntaxNode - def to_xml(b, *args) - if text_value != "" - b.preface { |b| - statements.elements.each { |element| - for e in element.elements - e.to_xml(b, "") if e.is_a? NakedStatement - end - } - } - end - end - end - - class Preamble < Treetop::Runtime::SyntaxNode - def to_xml(b, *args) - if text_value != "" - b.preamble { |b| - statements.elements.each { |e| - e.to_xml(b, "") - } - } - end - end - end - - class Part < Treetop::Runtime::SyntaxNode - def num - heading.num - end - - def to_xml(b, *args) - id = "part-#{num}" - - # include a chapter number in the id if our parent has one - if parent and parent.parent.is_a?(Chapter) and parent.parent.num - id = "chapter-#{parent.parent.num}.#{id}" - end - - b.part(id: id) { |b| - heading.to_xml(b) - children.elements.each_with_index { |e, i| e.to_xml(b, id + '.', i) } - } - end - end - - class PartHeading < Treetop::Runtime::SyntaxNode - def num - part_heading_prefix.alphanums.text_value - end - - def title - if heading.text_value and heading.respond_to? 
:content - heading.content.text_value.strip - end - end - - def to_xml(b) - b.num(num) - b.heading(title) if title - end - end - - class Chapter < Treetop::Runtime::SyntaxNode - def num - heading.num - end - - def to_xml(b, *args) - id = "chapter-#{num}" - - # include a part number in the id if our parent has one - if parent and parent.parent.is_a?(Part) and parent.parent.num - id = "part-#{parent.parent.num}.#{id}" - end - - b.chapter(id: id) { |b| - heading.to_xml(b) - children.elements.each_with_index { |e, i| e.to_xml(b, id + '.', i) } - } - end - end - - class ChapterHeading < Treetop::Runtime::SyntaxNode - def num - chapter_heading_prefix.alphanums.text_value - end - - def title - if heading.text_value and heading.respond_to? :content - heading.content.text_value.strip - end - end - - def to_xml(b) - b.num(num) - b.heading(title) if title - end - end - - class Section < Treetop::Runtime::SyntaxNode - def num - section_title.num - end - - def title - section_title.title - end - - def to_xml(b, *args) - id = "section-#{num}" - b.section(id: id) { |b| - b.num("#{num}.") - b.heading(title) - - idprefix = "#{id}." - - children.elements.each_with_index { |e, i| e.to_xml(b, idprefix, i) } - } - end - end - - class SectionTitleType1 < Treetop::Runtime::SyntaxNode - # a section title of the form: - # - # Definitions - # 1. In this act... - - def num - section_title_prefix.number_letter.text_value - end - - def title - content.text_value - end - end - - class SectionTitleType2 < Treetop::Runtime::SyntaxNode - # a section title of the form: - # - # 1. Definitions - # In this act... - # - # In this format, the title is optional and the section content may - # start where we think the title is. - - def num - section_title_prefix.number_letter.text_value - end - - def title - section_title.empty? ? 
"" : section_title.content.text_value - end - end - - class BlockParagraph < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix='', i=0) - id = "#{idprefix}paragraph-0" - idprefix = "#{id}." - - b.paragraph(id: id) { |b| - b.content { |b| - elements.each_with_index { |e, i| e.to_xml(b, idprefix, i) } - } - } - end - end - - class Subsection < Treetop::Runtime::SyntaxNode - def num - subsection_prefix.num.text_value - end - - def to_xml(b, idprefix, i) - id = idprefix + num.gsub(/[()]/, '') - idprefix = id + "." - - kids = children.elements - kids = [first_child] + kids if first_child and !first_child.empty? - - b.subsection(id: id) { |b| - b.num(num) - b.content { |b| - if kids.empty? - # schema requires a non-empty content element - b.p - else - kids.each_with_index { |e, i| e.to_xml(b, idprefix, i) } - end - } - } - end - end - - class NakedStatement < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix, i=0) - b.p { |b| clauses.to_xml(b, idprefix) } if clauses - end - - def content - clauses - end - end - - class Clauses < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix=nil) - for e in elements - if e.respond_to? :to_xml - e.to_xml(b, idprefix) - else - b << e.text_value - end - end - end - end - - class Remark < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix) - b.remark(status: 'editorial') do |b| - b << '[' - for e in content.elements - if e.respond_to? :to_xml - e.to_xml(b, idprefix) - else - b << e.text_value - end - end - b << ']' - end - end - end - - class Image < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix) - attrs = {src: href.text_value} - attrs[:alt] = content.text_value unless content.text_value.empty? - b.img(attrs) - end - end - - class Ref < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix) - b.ref(content.text_value, href: href.text_value) - end - end - - class Blocklist < Treetop::Runtime::SyntaxNode - # Render a block list to xml. 
If a block is given, - # yield to it a builder to insert a listIntroduction node - def to_xml(b, idprefix, i=0, &block) - id = idprefix + "list#{i}" - idprefix = id + '.' - - b.blockList(id: id) { |b| - b.listIntroduction { |b| yield b } if block_given? - - elements.each { |e| e.to_xml(b, idprefix) } - } - end - end - - class BlocklistItem < Treetop::Runtime::SyntaxNode - def num - blocklist_item_prefix.text_value - end - - def to_xml(b, idprefix) - b.item(id: idprefix + num.gsub(/[()]/, '')) { |b| - b.num(num) - b.p { |b| - item_content.clauses.to_xml(b, idprefix) if respond_to? :item_content and item_content.respond_to? :clauses - } - } - end - end - - class Table < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix, i=0) - b.table(id: "#{idprefix}table#{i}") { |b| - # we'll gather cells into this row list - rows = [] - cells = [] - - for child in table_body.elements - if child.is_a? TableCell - # cell - cells << child - else - # new row marker - rows << cells unless cells.empty? - cells = [] - end - end - rows << cells unless cells.empty? - - for row in rows - b.tr { |tr| - for cell in row - cell.to_xml(tr, "") - end - } - end - } - end - end - - class TableCell < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix) - tag = text_value[0] == '!' ? 'th' : 'td' - - attrs = {} - if not attribs.empty? - for item in attribs.attribs.elements - # key=value (strip quotes around value) - attrs[item.name.text_value.strip] = item.value.text_value[1..-2] - end - end - - b.send(tag.to_sym, attrs) { |b| - b.p { |b| - # first line, and the rest - lines = [content.line] + content.elements.last.elements.map(&:line) - - lines.each_with_index do |line, i| - line.to_xml(b, i, i == lines.length-1) - end - } - } - end - end - - class TableLine < Treetop::Runtime::SyntaxNode - # line of table content - def to_xml(b, i, tail) - clauses.to_xml(b) unless clauses.empty? - - # add trailing newlines. 
- # for the first line, eat whitespace at the start - # for the last line, eat whitespace at the end - if not tail and (i > 0 or not clauses.empty?) - eol.text_value.count("\n").times { b.eol } - end - end - end - - class ScheduleContainer < Treetop::Runtime::SyntaxNode - def to_xml(b) - b.components { |b| - schedules.children.elements.each_with_index { |e, i| - e.to_xml(b, "", i+1) - } - } - end - end - - class Schedule < Treetop::Runtime::SyntaxNode - def num - n = schedule_title.num.text_value - return (n && !n.empty?) ? n : nil - end - - def alias - if not schedule_title.title.text_value.blank? - schedule_title.title.text_value - elsif num - "Schedule #{num}" - else - "Schedule" - end - end - - def heading - if schedule_title.heading.respond_to? :content - schedule_title.heading.content.text_value - else - nil - end - end - - def to_xml(b, idprefix=nil, i=1) - if num - n = num - component = "schedule#{n}" - else - n = i - # make a component name from the schedule title - component = self.alias.downcase().strip().gsub(/[^a-z0-9]/i, '').gsub(/ +/, '') - end - - id = "#{idprefix}#{component}" - - b.component(id: "component-#{id}") { |b| - b.doc_(name: component) { |b| - b.meta { |b| - b.identification(source: "#slaw") { |b| - b.FRBRWork { |b| - b.FRBRthis(value: "#{Act::WORK_URI}/#{component}") - b.FRBRuri(value: Act::WORK_URI) - b.FRBRalias(value: self.alias) - b.FRBRdate(date: '1980-01-01', name: 'Generation') - b.FRBRauthor(href: '#council') - b.FRBRcountry(value: 'za') - } - b.FRBRExpression { |b| - b.FRBRthis(value: "#{Act::EXPRESSION_URI}/#{component}") - b.FRBRuri(value: Act::EXPRESSION_URI) - b.FRBRdate(date: '1980-01-01', name: 'Generation') - b.FRBRauthor(href: '#council') - b.FRBRlanguage(language: 'eng') - } - b.FRBRManifestation { |b| - b.FRBRthis(value: "#{Act::MANIFESTATION_URI}/#{component}") - b.FRBRuri(value: Act::MANIFESTATION_URI) - b.FRBRdate(date: Time.now.strftime('%Y-%m-%d'), name: 'Generation') - b.FRBRauthor(href: '#slaw') - } - } - } - - 
b.mainBody { |b| - idprefix = "#{id}." - - # there is no good AKN hierarchy container for schedules, so we - # just use article because we don't use it anywhere else. - b.article(id: id) { |b| - b.heading(heading) if heading - body.children.elements.each_with_index { |e| e.to_xml(b, idprefix, i) } if body.is_a? Body - } - } - } - } - end - end - - class ScheduleStatement < Treetop::Runtime::SyntaxNode - def to_xml(b, idprefix) - b.p { |b| clauses.to_xml(b, idprefix) } if clauses - end - end - end - end -end diff --git a/slaw.gemspec b/slaw.gemspec index 0e685d4..e279729 100644 --- a/slaw.gemspec +++ b/slaw.gemspec @@ -8,9 +8,9 @@ Gem::Specification.new do |spec| spec.version = Slaw::VERSION spec.authors = ["Greg Kempe"] spec.email = ["greg@kempe.net"] - spec.summary = %q{A lightweight library for using Akoma Ntoso acts in Ruby.} - spec.description = %q{Slaw is a lightweight library for rendering and generating Akoma Ntoso acts from plain text and PDF documents.} - spec.homepage = "" + spec.summary = "A lightweight library for using Akoma Ntoso acts in Ruby." + spec.description = "Slaw is a lightweight library for rendering and generating Akoma Ntoso acts from plain text and PDF documents." 
+  spec.homepage = "https://github.com/longhotsummer/slaw"
   spec.license = "MIT"

   spec.files = `git ls-files -z`.split("\x0")
diff --git a/spec/act_spec.rb b/spec/act_spec.rb
deleted file mode 100644
index d60e2c0..0000000
--- a/spec/act_spec.rb
+++ /dev/null
@@ -1,56 +0,0 @@
-# encoding: UTF-8
-
-require 'spec_helper'
-require 'slaw'
-
-describe Slaw::Act do
-  let(:filename) { File.dirname(__FILE__) + "/fixtures/community-fire-safety.xml" }
-  subject { Slaw::Act.new(filename) }
-
-  it 'should have correct basic properties' do
-    subject.title.should == 'Community Fire Safety By-law'
-    subject.amended?.should be_true
-  end
-
-  it 'should set the title correctly' do
-    subject.title = 'foo'
-    subject.meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRalias', a: Slaw::NS)['value'].should == 'foo'
-  end
-
-  it 'should set the title if it doesnt exist' do
-    subject.meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRalias', a: Slaw::NS).remove
-    subject.title = 'bar'
-    subject.title.should == 'bar'
-  end
-
-  it 'should set the publication details' do
-    subject.meta.at_xpath('./a:publication', a: Slaw::NS).remove
-
-    subject.published!(name: 'foo', number: '1234', date: '2014-01-01')
-    subject.publication['name'].should == 'foo'
-    subject.publication['showAs'].should == 'foo'
-    subject.publication['number'].should == '1234'
-  end
-
-  it 'should get/set the work date' do
-    subject.date.should == '2002-02-28'
-
-    subject.date = '2014-01-01'
-    subject.date.should == '2014-01-01'
-    subject.meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRdate[@name="Generation"]', a: Slaw::NS)['date'].should == '2014-01-01'
-    subject.meta.at_xpath('./a:identification/a:FRBRExpression/a:FRBRdate[@name="Generation"]', a: Slaw::NS)['date'].should == '2014-01-01'
-
-    subject.id_uri.should == '/za/by-law/2014/2002'
-  end
-
-  it 'should update the uri when the year changes' do
-    subject.id_uri.should == '/za/by-law/cape-town/2002/community-fire-safety'
-    subject.year = '1980'
-    subject.id_uri.should == '/za/by-law/1980/2002'
-  end
-
-  it 'should validate' do
-    subject.validate.should == []
-    subject.validates?.should be_true
-  end
-end
diff --git a/spec/bylaw_spec.rb b/spec/bylaw_spec.rb
deleted file mode 100644
index cdb4147..0000000
--- a/spec/bylaw_spec.rb
+++ /dev/null
@@ -1,49 +0,0 @@
-# encoding: UTF-8
-
-require 'spec_helper'
-require 'slaw'
-
-describe Slaw::ByLaw do
-  let(:filename) { File.dirname(__FILE__) + "/fixtures/community-fire-safety.xml" }
-  subject { Slaw::ByLaw.new(filename) }
-
-  it 'should have correct basic properties' do
-    subject.title.should == 'Community Fire Safety By-law as amended'
-    subject.amended?.should be_true
-  end
-
-  it 'should update the uri when the region changes' do
-    subject.id_uri.should == '/za/by-law/cape-town/2002/community-fire-safety'
-    subject.region = 'foo-bar'
-    subject.id_uri.should == '/za/by-law/foo-bar/2002/community-fire-safety'
-  end
-
-  it 'should update the uri when the name changes' do
-    subject.id_uri.should == '/za/by-law/cape-town/2002/community-fire-safety'
-    subject.name = 'foo-bar'
-    subject.id_uri.should == '/za/by-law/cape-town/2002/foo-bar'
-  end
-
-  it 'should set the title if it doesnt exist' do
-    subject.meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRalias', a: Slaw::NS).remove
-    subject.title = 'bar'
-    subject.title.should == 'bar as amended'
-  end
-
-  it 'should get/set the work date' do
-    subject.date.should == '2002-02-28'
-
-    subject.date = '2014-01-01'
-    subject.date.should == '2014-01-01'
-    subject.meta.at_xpath('./a:identification/a:FRBRWork/a:FRBRdate[@name="Generation"]', a: Slaw::NS)['date'].should == '2014-01-01'
-    subject.meta.at_xpath('./a:identification/a:FRBRExpression/a:FRBRdate[@name="Generation"]', a: Slaw::NS)['date'].should == '2014-01-01'
-
-    subject.id_uri.should == '/za/by-law/cape-town/2014/community-fire-safety'
-  end
-
-  it 'should update the uri when the year changes' do
-    subject.id_uri.should == '/za/by-law/cape-town/2002/community-fire-safety'
-    subject.year = '1980'
-    subject.id_uri.should == '/za/by-law/cape-town/1980/community-fire-safety'
-  end
-end
diff --git a/spec/generator_spec.rb b/spec/generator_spec.rb
index b13821d..a4199c5 100644
--- a/spec/generator_spec.rb
+++ b/spec/generator_spec.rb
@@ -4,6 +4,8 @@ require 'slaw'

 describe Slaw::ActGenerator do
+  subject { Slaw::ActGenerator.new('za') }
+
   describe 'guess_section_number_after_title' do
     context 'section number after title' do
       it 'should work' do
diff --git a/spec/parse/builder_spec.rb b/spec/parse/builder_spec.rb
index 01bb1e9..9e44cf5 100644
--- a/spec/parse/builder_spec.rb
+++ b/spec/parse/builder_spec.rb
@@ -715,54 +715,6 @@
     end
   end

-  describe '#guess_at_definitions' do
-    it 'should find definitions in p elements' do
-      doc = xml2doc(section(<Definitions - - -

“authorised official” means any official of the Council who has been authorised by it to administer, implement and enforce the provisions of these By-laws;

-
-
- - - - -“Council” means – - - (a) -

the Metropolitan Municipality of the City of Johannesburg established by Provincial Notice No. 6766 of 2000 dated 1 October 2000, as amended, exercising its legislative and executive authority through its municipal Council; or

-
-
-
-
-XML - )) - - subject.guess_at_definitions(doc) - doc.to_s.should == section(<Definitions - - -

"authorised official" means any official of the Council who has been authorised by it to administer, implement and enforce the provisions of these By-laws;

-
-
- - - - "Council" means – - - (a) -

the Metropolitan Municipality of the City of Johannesburg established by Provincial Notice No. 6766 of 2000 dated 1 October 2000, as amended, exercising its legislative and executive authority through its municipal Council; or

-
-
-
-
-XML - ) - end - end - describe '#normalise_headings' do it 'should normalise ALL CAPS headings' do doc = xml2doc(section(< + + I + + +

Projekt ustawy

+
+
+ + 7 + Oznaczanie przepisów ustawy i ich systematyzacja +
+ 54. + +

Podstawową jednostką redakcyjną ustawy jest artykuł.

+
+
+
+ 55. + + 1. + +

Każdą samodzielną myśl ujmuje się w odrębny artykuł.

+
+
+ + 2. + +

Artykuł powinien być w miarę możliwości jednozdaniowy.

+
+
+ + 3. + +

Jeżeli samodzielną myśl wyraża zespół zdań, dokonuje się podziału artykułu na ustępy. W ustawie określanej jako "kodeks" ustępy oznacza się paragrafami (§).

+
+
+ + 4. + +

Podział artykułu na ustępy wprowadza się także w przypadku, gdy między zdaniami wyrażającymi samodzielne myśli występują powiązania treściowe, ale treść żadnego z nich nie jest na tyle istotna, aby wydzielić ją w odrębny artykuł.

+
+
+
+
+ 56. + + 1. + +

W obrębie artykułu (ustępu) zawierającego wyliczenie wyróżnia się dwie części: wprowadzenie do wyliczenia oraz punkty. Wyliczenie może kończyć się częścią wspólną, odnoszącą się do wszystkich punktów. Po części wspólnej nie dodaje się kolejnej samodzielnej myśli; w razie potrzeby formułuje się ją w kolejnym ustępie.

+
+
+ + 2. + +

W obrębie punktów można dokonać dalszego wyliczenia, wprowadzając litery.

+
+
+
+
+
+' + end + end + + #------------------------------------------------------------------------------- + # Articles + + describe 'articles' do + it 'should handle articles' do + node = parse :article, < + 1. + +

Ustawa reguluje opodatkowanie podatkiem dochodowym dochodów osób fizycznych

+
+' + end + + it 'should handle articles with blank lines' do + node = parse :article, < + 1. + +

Ustawa reguluje opodatkowanie podatkiem dochodowym dochodów osób fizycznych

+
+' + end + + it 'should handle consecutive articles' do + node = parse :body, < +
+ 1. + +

Ustawa reguluje opodatkowanie podatkiem dochodowym dochodów osób fizycznych

+
+
+
+ 2. + +

Something else

+
+
+' + end + + it 'should handle nested content' do + node = parse :article, < + 2. + + 1. + +

Przepisów ustawy nie stosuje się do:

+
+ + 1) + +

przychodów z działalności rolniczej, z wyjątkiem przychodów z działów specjalnych produkcji rolnej;

+
+
+ + 2) + +

przychodów z gospodarki leśnej w rozumieniu ustawy o lasach;

+
+
+
+' + end + end + + #------------------------------------------------------------------------------- + # Divisions + + describe 'divisions' do + it 'should handle divisions' do + node = parse :division, < + I + Projekt ustawy + + 7 + Oznaczanie przepisów ustawy i ich systematyzacja +
+ 54. + +

Podstawową jednostką redakcyjną ustawy jest artykuł.

+
+
+
+' + end + end + + #------------------------------------------------------------------------------- + # Subdivisions + + describe 'subdivisions' do + it 'should handle subdivisions' do + node = parse :subdivision, < + I + Projekt ustawy +
+ 54. + +

Podstawową jednostką redakcyjną ustawy jest artykuł.

+
+
+' + end + end + + #------------------------------------------------------------------------------- + # Paragraph + + describe 'paragraph' do + it 'should handle simple para' do + node = parse :paragraph, < + 1. + +

Każdą samodzielną myśl ujmuje się w odrębny artykuł.

+
+' + end + + it 'should handle an empty para' do + node = parse :paragraph, < + 1. + +

+ +' + end + + it 'should handle a para with whitespace' do + node = parse :paragraph, < + 1. + +

foo bar

+
+' + end + + it 'should handle paragraphs with points' do + node = parse :paragraph, < + 2. + +

W ustawie należy unikać posługiwania się:

+
+ + 1) + +

określeniami specjalistycznymi, o ile ich użycie nie jest powodowane zapewnieniem należytej precyzji tekstu;

+
+
+ + 2) + +

określeniami lub zapożyczeniami obcojęzycznymi, chyba że nie mają dokładnego odpowiednika w języku polskim;

+
+
+ + 3) + +

nowo tworzonymi pojęciami lub strukturami językowymi, chyba że w dotychczasowym słownictwie polskim brak jest odpowiedniego określenia.

+
+
+' + end + + it 'should not get confused by points with articles' do + node = parse :paragraph, < + 2. + +

W ustawie należy unikać posługiwania się:

+
+ + 1) + +

art. 1

+
+
+ + 2) + +

art. 2

+
+
+' + end + end + + #------------------------------------------------------------------------------- + # Section + + describe 'section' do + it 'should handle section with un-numbered para' do + node = parse :section, < + 5. + +

Przepisy ustawy redaguje si´ zwi´êle i syntetycznie, unikajàc nadmiernej szczegó∏owoÊci, a zarazem w sposób, w jaki opisuje si´ typowe sytuacje wyst´pujàce w dziedzinie spraw regulowanych tà ustawà.

+
+' + end + + it 'should handle section with numbered para on the same line' do + node = parse :section, < + 54. + +

Podstawową jednostką redakcyjną ustawy jest artykuł.

+
+' + end + + it 'should handle section with numbered paras' do + node = parse :section, < + 55. + + 1. + +

Każdą samodzielną myśl ujmuje się w odrębny artykuł.

+
+
+ + 2. + +

Artykuł powinien być w miarę możliwości jednozdaniowy.

+
+
+ + 3. + +

Jeżeli samodzielną myśl wyraża zespół zdań, dokonuje się podziału artykułu na ustępy. W ustawie określanej jako "kodeks" ustępy oznacza się paragrafami (§).

+
+
+' + end + + it 'should not confuse section content with block elements' do + node = parse :section, < + 55. + +

1. Każdą samodzielną myśl ujmuje się w odrębny artykuł.

+
+ + 3. + +

Jeżeli samodzielną myśl wyraża zespół zdań, dokonuje się podziału artykułu na ustępy. W ustawie określanej jako "kodeks" ustępy oznacza się paragrafami (§).

+
+
+' + end + + it 'should handle section with intro, para and points' do + node = parse :section, < + 54. + +

Podstawową jednostką redakcyjną ustawy jest artykuł.

+
+ + +

Something here

+
+
+ + 1) + +

a point

+
+
+ + 2) + +

second point

+
+
+' + end + + it 'should not get confused by sections with articles' do + node = parse :section, < + 54. + +

Art 1. is changed...

+
+' + end + end + + #------------------------------------------------------------------------------- + # Point + + describe 'point' do + it 'should handle basic point' do + node = parse :point, < + 1) + +

szczegółowy tryb i terminy rozpatrywania wniosków o udzielenie finansowego wsparcia;

+
+' + end + + it 'should handle points with litera' do + node = parse :point, < + 1) + +

dokumenty potwierdzające prawo własności albo prawo użytkowania wieczystego nieruchomości, której dotyczy przedsięwzięcie albo na której położony jest budynek, którego budowę, remont lub przebudowę zamierza się przepro- wadzić w ramach realizacji przedsięwzięcia, w tym:

+
+ + a) + +

oryginał albo potwierdzoną za zgodność z oryginałem kopię wypisu i wyrysu z rejestru gruntów wszystkich dzia- łek ewidencyjnych, na których realizowane jest przedsięwzięcie, wydanego nie wcześniej niż 3 miesiące przed dniem złożenia wniosku, oraz

+
+
+ + b) + +

numer księgi wieczystej;

+
+
+' + end + end + + #------------------------------------------------------------------------------- + # Litera + + describe 'litera' do + + it 'should handle litera with indents' do + node = parse :litera, < + b) + +

liczby:

+
+ + + +

tworzonych lokali wchodzących w skład mieszkaniowego zasobu gminy,

+
+
+ + +

mieszkań chronionych,

+
+
+ + +

lokali mieszkalnych powstających z udziałem gminy albo związku międzygminnego w wyniku realizacji przedsięwzięć, o których mowa w art. 5 ust. 1 i art. 5a ust. 1 ustawy,

+
+
+ + +

tymczasowych pomieszczeń,

+
+
+ + +

miejsc w noclegowniach, schroniskach dla bezdomnych i ogrzewalniach,

+
+
+
+' + end + end + + #------------------------------------------------------------------------------- + # Indent + + describe 'indent' do + it 'should handle basic indent' do + node = parse :indents, < + + +

tworzonych lokali wchodzących w skład mieszkaniowego zasobu gminy,

+
+
+' + end + + it 'should handle indents with different dash characters' do + node = parse :indents, < + + +

foo

+
+
+ + +

bar

+
+
+' + end + + it 'should handle empty indents' do + node = parse :indents, < + + +

+ + + + +

+ + +' + end + + it 'should handle multiple indent items' do + node = parse :indents, < + + +

tworzonych lokali wchodzących w skład mieszkaniowego zasobu gminy,

+
+
+ + +

mieszkań chronionych,

+
+
+ + +

lokali mieszkalnych powstających z udziałem gminy albo związku międzygminnego w wyniku realizacji przedsięwzięć, o których mowa w art. 5 ust. 1 i art. 5a ust. 1 ustawy,

+
+
+' + end + end + +end diff --git a/spec/za/act_block_spec.rb b/spec/za/act_block_spec.rb index 8b6e0f3..c39643c 100644 --- a/spec/za/act_block_spec.rb +++ b/spec/za/act_block_spec.rb @@ -3,6 +3,8 @@ require 'slaw' describe Slaw::ActGenerator do + subject { Slaw::ActGenerator.new('za') } + def parse(rule, s) subject.builder.text_to_syntax_tree(s, {root: rule}) end @@ -1897,12 +1899,12 @@ def to_xml(node, *args) it 'should handle a clause with a remark' do node = parse :clauses, "simple [[remark]]. text" node.text_value.should == "simple [[remark]]. text" - node.elements[7].is_a?(Slaw::ZA::Act::Remark).should be_true + node.elements[7].is_a?(Slaw::Grammars::ZA::Act::Remark).should be_true node = parse :clauses, "simple [[remark]][[another]] text" node.text_value.should == "simple [[remark]][[another]] text" - node.elements[7].is_a?(Slaw::ZA::Act::Remark).should be_true - node.elements[7].is_a?(Slaw::ZA::Act::Remark).should be_true + node.elements[7].is_a?(Slaw::Grammars::ZA::Act::Remark).should be_true + node.elements[7].is_a?(Slaw::Grammars::ZA::Act::Remark).should be_true end end end diff --git a/spec/za/act_inline_spec.rb b/spec/za/act_inline_spec.rb index 275d5b6..b1996b2 100644 --- a/spec/za/act_inline_spec.rb +++ b/spec/za/act_inline_spec.rb @@ -3,6 +3,8 @@ require 'slaw' describe Slaw::ActGenerator do + subject { Slaw::ActGenerator.new('za') } + def parse(rule, s) subject.builder.text_to_syntax_tree(s, {root: rule}) end diff --git a/spec/za/act_schedules_spec.rb b/spec/za/act_schedules_spec.rb index 38b2617..7f89d35 100644 --- a/spec/za/act_schedules_spec.rb +++ b/spec/za/act_schedules_spec.rb @@ -3,6 +3,8 @@ require 'slaw' describe Slaw::ActGenerator do + subject { Slaw::ActGenerator.new('za') } + def parse(rule, s) subject.builder.text_to_syntax_tree(s, {root: rule}) end diff --git a/spec/za/act_table_spec.rb b/spec/za/act_table_spec.rb index 37e278e..8f168a7 100644 --- a/spec/za/act_table_spec.rb +++ b/spec/za/act_table_spec.rb @@ -3,6 +3,8 @@ require 
'slaw'

 describe Slaw::ActGenerator do
+  subject { Slaw::ActGenerator.new('za') }
+
   def parse(rule, s)
     subject.builder.text_to_syntax_tree(s, {root: rule})
   end