Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assorted bug fixes. #3

Closed
wants to merge 7 commits into from
Closed

Conversation

ruricolist
Copy link
Contributor

This is a compilation of bug fixes cherry-picked from FXML. Some are mine, most were gathered from others over the years.

dmitryvk and others added 6 commits September 16, 2018 17:33
Signed-off-by: Paul M. Rodriguez <pmr@ruricolist.com>
That is, if a document has a UTF-16 byte order mark, complain if the
XML header specifies UTF-8. The XML header can specify which of
several versions of the current encoding the document uses, but it
can't change the encoding completely.

The actual issue here is dumb XML serializers that print "UTF-8"
whatever the actual encoding may be.
@scymtym scymtym self-assigned this Sep 22, 2018
@scymtym
Copy link
Member

scymtym commented Sep 22, 2018

@slyrus Do you want to look at this?

If you don't have any objections, I'm going to merge this with (probably) minor modifications.

Copy link
Member

@slyrus slyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes @ruricolist ! I still think it would be good to either unify fxml and cxml or deprecate cxml and use fxml, but I haven't convinced myself that that's a reasonable thing to do yet.

(let* ((cxml
(slot-value (asdf:find-system :cxml) 'asdf::relative-pathname))
(dtd (merge-pathnames "catalog.dtd" cxml)))
(let* ((load-truename #.(or *compile-file-truename* *load-truename*))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this. What's wrong with having the path by asdf-relative?

@@ -162,20 +162,20 @@
(setf (aref r i) 1))))) )

`(progn
(DEFINLINE NAME-RUNE-P (RUNE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a separate commit for things that are just case changes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the case changes all in the "Allegro CL modern mode fixes" commit?

@scymtym
Copy link
Member

scymtym commented Sep 22, 2018

My only concern is with "Fix memory leak in klacks". The commit message doesn't explain

  1. where the memory leak comes from
  2. how to reproduce it
  3. how the fix works

I used this modification

modified   xml/xml-parse.lisp
@@ -179,12 +179,12 @@
 
 (defvar *ctx* nil)
 
-(defstruct (context (:conc-name nil))
+(defstruct (context (:conc-name nil) (:constructor make-context (&key handler dtd model-stack base-stack referenced-notations id-table name-hashtable standalone-p entity-resolver disallow-internal-subset main-zstream &aux (%base-stack base-stack))))
   handler
   (dtd nil)
   model-stack
   ;; xml:base machen wir fuer klacks mal gleich als expliziten stack:
-  base-stack
+  %base-stack
   (referenced-notations '())
   (id-table (%make-rod-hash-table))
   ;; FIXME: Wofuer ist name-hashtable da?  Will man das wissen?
@@ -194,6 +194,16 @@
   (disallow-internal-subset nil)
   main-zstream)
 
+(defun base-stack (context)
+  (%base-stack context))
+
+(defun (setf base-stack) (new-value context)
+  (when (> (length new-value) (length (%base-stack context)))
+    (break))
+  (format t "~V<~>~A~%" (length (%base-stack context)) (first (%base-stack context)))
+  (setf (%base-stack context) new-value))
+
+
 (defvar *expand-pe-p* nil)
 
 (defparameter *initial-namespace-bindings*

and

(loop with source = (klacks:make-tapping-source (cxml:make-source #P"SOMEFILE"))
      for event = (multiple-value-list (klacks::consume source))
      until (eq (first event) :end-document))

to understand what is going on: klacks/element calls p/sztag which pushes onto base-stack, but in the context of klacks, there is no corresponding pop (and probably similarly for namespace-stack). As far as I can tell, that answers 1), 2) and 3).

That said, I don't like how the proposed fix addresses the problem. I'm now trying whether using the following macro in p/element and klacks/element works:

(defmacro with-restored-base-stack ((context) &body body)
  (let ((context-var (gensym "CONTEXT"))
        (old-base-stack-var (gensym "OLD-BASE-STACK"))) ; TODO use alexandria later
    `(let* ((,context-var ,context)
            (,old-base-stack-var (base-stack ,context-var)))
       (prog1
           (progn ,@body)
         (setf (base-stack ,context-var) ,old-base-stack-var)))))

Opinions?

@ruricolist
Copy link
Contributor Author

@scymtym The klacks memory leak isn't just a memory leak. It prevents klacks from working at all with XML that uses namespaces. When klacks encounters an element that declares a namespace, it sets the current namespace, but it never pops it so it continues reading elements in that namespace until it encounters another element declaration.

@scymtym
Copy link
Member

scymtym commented Sep 24, 2018

@ruricolist Thanks for the comment, I think we are on the same page in that regard. In my comment, I was suggesting to use a with-restored-base-stack macro instead of adding the required pops.

@ruricolist
Copy link
Contributor Author

ruricolist commented Sep 24, 2018

@scymtym I do think the macro is an improvement. (Although maybe it should use multiple-value-prog1?) I was trying to suggest where you might look for test cases: basically any XML that uses namespaces.

I think I first ran into the problem for myself when trying to parse Feedburner feeds.

Unfortunately CXML doesn't have a test suite that can be added to yet, but that is what I plan to port next.

@scymtym
Copy link
Member

scymtym commented Sep 26, 2018

@scymtym I do think the macro is an improvement. (Although maybe it should use multiple-value-prog1?) I was trying to suggest where you might look for test cases: basically any XML that uses namespaces.

I think I first ran into the problem for myself when trying to parse Feedburner feeds.

Thank you for the clarification and suggestion. I will move forward with the macro-approach.

Unfortunately CXML doesn't have a test suite that can be added to yet, but that is what I plan to port next.

Great.

@ruricolist
Copy link
Contributor Author

I missed one -- a fix for the fix to DTD embedding, which can result in duplicate internal subsets.

@ruricolist
Copy link
Contributor Author

The test suite is ready:

https://github.com/ruricolist/cxml-tests

I've made a separate repository for it because it's quite large (>30 Mb). You might decide (as I did) that keeping the test suite with the rest of the code is worth the extra size.

(The other advantage of a separate repository, of course, is that you can easily test against older versions of CXML.)

@scymtym scymtym mentioned this pull request Sep 27, 2018
@scymtym
Copy link
Member

scymtym commented Sep 27, 2018

The test suite is ready:

https://github.com/ruricolist/cxml-tests

That's great, thank you.

I've made a separate repository for it because it's quite large (>30 Mb). You might decide (as I did) that keeping the test suite with the rest of the code is worth the extra size.

(The other advantage of a separate repository, of course, is that you can easily test against older versions of CXML.)

I opened #4 for discussing those questions so we can close this one at some point.

@scymtym
Copy link
Member

scymtym commented Sep 29, 2018

I merged everything but "Remove runtime dependency on ASDF" (about which @slyrus had doubts) into master.

@ruricolist Thanks again for the patches. Please open a separate PR for discussing the "Remove runtime dependency on ASDF" change if you need it.

@scymtym scymtym closed this Sep 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants