Is streaming encoding possible with Saxy? #109

Closed
thbar opened this issue Jun 16, 2022 · 6 comments

@thbar
Contributor

thbar commented Jun 16, 2022

Maybe this is already supported, but I'm not 100% clear on it, and I will be very happy to document the findings one way or another, so I'm opening the discussion :-)

I am currently manipulating potentially large XML responses provided by third-party servers inside a proxy.

Basically, someone queries our proxy with a small XML query, then we modify it, and send the payload to a third-party server.

What I would like to do is stream the response of the third-party server as a client, modify it on the fly (e.g. redacting sensitive elements) and send it back to the client, also in streaming fashion.

This means I would need to stream the large response, but also generate a large streaming encoded document out of it, all with minimal memory.

I think @JoeZ99 paved the way in #100 and https://joez99.medium.com/stream-output-when-parsing-big-xml-with-elixir-92baff37e607, and maybe everything is more or less already here.

Is "streaming encoding" currently possible with Saxy? If I can combine it with the above work, I'll have what I need.
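For illustration, here is a minimal sketch of the pass-through described above, using only Saxy's handler behaviour and `Saxy.parse_stream/4`. The module name `RedactingHandler`, the `emit` callback, the `"secret"` element name and the `~> 1.4` version constraint are all assumptions made for the example. It also assumes that character data reaches the handler with entities already decoded, so text is escaped again on the way out. The XML declaration and comments are not re-emitted.

```elixir
# Assumption: Saxy 1.4 or later is available from Hex.
Mix.install([{:saxy, "~> 1.4"}])

defmodule RedactingHandler do
  @behaviour Saxy.Handler

  # State (hypothetical shape):
  #   emit:       1-arity function that receives each output chunk as iodata
  #   redact:     MapSet of element names to drop, together with their content
  #   skip_depth: nesting depth inside a redacted element (0 = not skipping)

  @impl true
  def handle_event(:start_element, {name, attrs}, %{skip_depth: 0} = state) do
    if MapSet.member?(state.redact, name) do
      {:ok, %{state | skip_depth: 1}}
    else
      state.emit.(["<", name, Enum.map(attrs, &attribute/1), ">"])
      {:ok, state}
    end
  end

  # Already inside a redacted element: only track nesting.
  def handle_event(:start_element, _data, state) do
    {:ok, %{state | skip_depth: state.skip_depth + 1}}
  end

  def handle_event(:end_element, name, %{skip_depth: 0} = state) do
    state.emit.(["</", name, ">"])
    {:ok, state}
  end

  def handle_event(:end_element, _name, state) do
    {:ok, %{state | skip_depth: state.skip_depth - 1}}
  end

  def handle_event(event, text, %{skip_depth: 0} = state)
      when event in [:characters, :cdata] do
    state.emit.(escape(text))
    {:ok, state}
  end

  # Document start/end, and any text inside a redacted element.
  def handle_event(_event, _data, state), do: {:ok, state}

  defp attribute({name, value}), do: [" ", name, "=\"", escape(value), "\""]

  defp escape(text) do
    String.replace(text, ["&", "<", ">", "\""], fn
      "&" -> "&amp;"
      "<" -> "&lt;"
      ">" -> "&gt;"
      "\"" -> "&quot;"
    end)
  end
end

# Any enumerable of binaries can stand in for the upstream response body.
chunks = [
  "<root><item>ok</item>",
  "<secret>hidden</secret><item a=\"1\">x &amp; y</item></root>"
]

state = %{
  emit: &IO.binwrite(:stdio, &1),
  redact: MapSet.new(["secret"]),
  skip_depth: 0
}

{:ok, _state} = Saxy.parse_stream(chunks, RedactingHandler, state)
```

Because each chunk is handed to `emit` as soon as its event arrives, memory use stays bounded by the size of a single event rather than the whole document. In a proxy, `emit` could write to the outgoing connection instead of standard output.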

Thanks for your input!

@thbar
Contributor Author

thbar commented Jun 16, 2022

Note: I am aware of this (Understanding IOData):

saxy/test/saxy_test.exs

Lines 136 to 145 in 98e2c9e

describe "encode_to_iodata!/2" do
  import Saxy.XML

  test "encodes XML document into IO data" do
    root = element("foo", [], "foo")
    assert xml = Saxy.encode_to_iodata!(root, version: "1.0")
    assert is_list(xml)
    assert IO.iodata_to_binary(xml) == ~s(<?xml version="1.0"?><foo>foo</foo>)
  end
end

Although it is already a nice optimisation, having real streaming would be even better for arbitrarily large & diverse XML documents.
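As a rough illustration of the gap between the two, the sketch below approximates streaming encoding with the existing API: the root tags are written by hand and each child is encoded lazily with `Saxy.encode_to_iodata!/1`, so only one child is held in memory at a time. The `ChunkedEncoder` module, its `stream_document/2` function and the `~> 1.4` version constraint are assumptions made for the example, not part of Saxy. The root name is interpolated as-is and is assumed to be a valid XML name.

```elixir
# Assumption: Saxy 1.4 or later is available from Hex.
Mix.install([{:saxy, "~> 1.4"}])

defmodule ChunkedEncoder do
  import Saxy.XML, only: [element: 3]

  # Returns a lazy stream of iodata chunks forming one XML document.
  # `items` is any enumerable of {element_name, text} tuples (hypothetical shape).
  def stream_document(root_name, items) do
    Stream.concat([
      [~s(<?xml version="1.0"?>), "<#{root_name}>"],
      Stream.map(items, fn {name, text} ->
        # Each child is encoded on demand, without a prolog of its own.
        Saxy.encode_to_iodata!(element(name, [], text))
      end),
      ["</#{root_name}>"]
    ])
  end
end

items = Stream.map(1..3, fn i -> {"item", "value #{i}"} end)

"root"
|> ChunkedEncoder.stream_document(items)
|> Enum.each(&IO.binwrite(:stdio, &1))
```

This only works when the document has a flat "root with many children" shape. Arbitrarily nested documents would still need an event-level encoder.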

@JoeZ99
Contributor

JoeZ99 commented Jun 16, 2022

@thbar
I'm having a hard time understanding what you want to do. (Sorry about that :-) ) Can you please try to explain it again? Maybe it would be easier for me to understand with a concrete example. (Again, I'm sorry. English is not my mother tongue, so maybe that's the reason...)

@JoeZ99
Contributor

JoeZ99 commented Jun 17, 2022

@thbar, what would the elements of the output stream be? Binary chunks of XML? That way, you could process these chunks as they come without worrying too much about memory usage. Something like this:

# input_file.xml
<root_element>
  <item>
    <!-- lots of tags and things belonging to "item" that you may want to process -->
    ....
  </item>
  <item>...</item>
  <item>...</item>
  ...
</root_element>

File.stream!("input_file.xml")
|> Saxy.parse_stream(EventParserModule, element: "item")
|> Enum.to_list()

Here EventParserModule is a SAX event-handling module of the kind Saxy already needs for Saxy.parse_string and the like, except that the "string" it parses is the content of each <item> element.
The elements that Enum.to_list() receives would then be whatever Saxy.parse_string(item_content, EventParserModule, initial_state) produces, where item_content is the content of each <item> element.

Something like that?
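The `element: "item"` option above is a proposal in this thread. A minimal sketch of similar per-item processing with the handler behaviour and `Saxy.parse_stream/4` might look like the following. The `ItemHandler` module, its state shape, the `on_item` callback and the `~> 1.4` version constraint are assumptions made for the example. For simplicity it hands over only the text content of each `<item>`, not its nested markup.

```elixir
# Assumption: Saxy 1.4 or later is available from Hex.
Mix.install([{:saxy, "~> 1.4"}])

defmodule ItemHandler do
  @behaviour Saxy.Handler

  # State (hypothetical shape):
  #   target:  name of the element to extract
  #   on_item: 1-arity function called with the text of each completed element
  #   depth:   nesting depth inside the current target element (0 = outside)
  #   buffer:  iodata collected for the current target element

  @impl true
  def handle_event(:start_element, {name, _attrs}, %{depth: 0, target: name} = state) do
    {:ok, %{state | depth: 1, buffer: []}}
  end

  def handle_event(:start_element, _data, %{depth: depth} = state) when depth > 0 do
    {:ok, %{state | depth: depth + 1}}
  end

  def handle_event(:characters, chars, %{depth: depth} = state) when depth > 0 do
    {:ok, %{state | buffer: [state.buffer, chars]}}
  end

  # Closing tag of the target element: hand the item over and reset.
  def handle_event(:end_element, _name, %{depth: 1} = state) do
    state.on_item.(IO.iodata_to_binary(state.buffer))
    {:ok, %{state | depth: 0, buffer: []}}
  end

  def handle_event(:end_element, _name, %{depth: depth} = state) when depth > 1 do
    {:ok, %{state | depth: depth - 1}}
  end

  # Everything outside a target element is ignored.
  def handle_event(_event, _data, state), do: {:ok, state}
end

# Chunk boundaries may fall anywhere, including inside an element's text.
chunks = [
  "<root_element><item>fi",
  "rst</item><item>second</item></root_element>"
]

state = %{target: "item", on_item: &IO.puts/1, depth: 0, buffer: []}
{:ok, _state} = Saxy.parse_stream(chunks, ItemHandler, state)
```

Each item is passed to `on_item` as soon as its closing tag is parsed, so only one item is buffered at a time. Turning this push-style callback into a lazy output `Stream` is the part that would need additional work.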

@thbar
Contributor Author

thbar commented Jun 27, 2022

@JoeZ99 re:

I'm having a hard time understanding what you want to do. (Sorry about that :-) ) Can you please try to explain it again?

I will come back to this a bit later, and will improve my explanation!

@thbar
Contributor Author

thbar commented Jul 13, 2022

My bandwidth is too limited for this at the moment, but I will likely reopen in the future. Closing to avoid polluting the project!

@thbar thbar closed this as completed Jul 13, 2022
@JoeZ99
Contributor

JoeZ99 commented Jul 13, 2022

OK, @thbar, nevertheless, thank you for making me look into the issue. The explanation I proposed above is a good starting point for thinking about a more proper "stream output" for Saxy.
@qcam, what do you think?
