Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accessing user_state inside partial to allow streaming output #100

Merged
merged 4 commits into from May 16, 2022

Conversation

JoeZ99
Copy link
Contributor

@JoeZ99 JoeZ99 commented May 4, 2022

with the use of Saxy.Partial.parse_stream() we can also "streamize" the output (i.e. not having to wait until some provided empty state at the beginning of the parse is fully accumulated), but we should be able to access user state so we can keep it "clean"

It's hard to explain, but see this code

xml_stream = File.stream!("huge_xml_file.xml")
initial_user_state = %{elements: []}

{:ok, partial} = Saxy.Partial.new(DooFeeds.Parser.XmlEventHandler, initial_user_state)

Stream.chunk_while(
  xml_stream,
  partial,
  fn chunk, partial ->
    {:cont, partial} = Saxy.Partial.parse(partial, chunk)
    case Saxy.Partial.get_state(partial) do
      %{elements: elements} when length(elements) > 0 -> 
        {:cont, elements, Saxy.Partial.set_state(partial, {elements: []})}
      _ -> 
        {:cont, partial}
    end
  end,
  fn partial ->
    partial = Saxy.Partial.terminate(partial)
    case Saxy.Partial.get_state(partial) do
      %{elements: elements} when length(elements) > 0 -> {:cont, elements, partial}
      _ -> {:cont, partial}
    end
  end
)

that would allow us to "streamize" parsing output, and partially it fixes #95

@qcam
Copy link
Owner

qcam commented May 15, 2022

@JoeZ99 Thanks for the PR! It looks great to me but could you please run the formatter?

@JoeZ99
Copy link
Contributor Author

JoeZ99 commented May 15, 2022

done!

@JoeZ99
Copy link
Contributor Author

JoeZ99 commented May 15, 2022

@qcam , it's been very hard for me to explain what this PR is for, and about the function names, do you think they are correct and good indicators of function are doing? or should they be improved??
this article summarizes how I reached to that
https://joez99.medium.com/stream-output-when-parsing-big-xml-with-elixir-92baff37e607

@qcam
Copy link
Owner

qcam commented May 15, 2022

I think what you proposed is a very valid use case.

get_state sounds like a good name and API though TBH I'm not completely sure about introducing set_state. I assume the only reason we would want to set a new state is to continue parsing right? In that case, can we make Partial.parse to accept the third argument which is a new state like this?

Saxy.Partial.parse(partial, binary) # continue with previous accumulated state.
Saxy.Partial.parse(partial, binary, new_state) # continue previous partial with a new state.

Do you think it would work?

Partials have very constrained use case and I would be more conservative about introducing new API (so it's harder to use in a wrong way) unless it's unavoidable.

@qcam
Copy link
Owner

qcam commented May 15, 2022

And great blog post ❤️ ! It's really cool to see the library is used in different creative ways :D.

@JoeZ99
Copy link
Contributor Author

JoeZ99 commented May 16, 2022

Yes, @qcam , you're right. I'll modify the code to do what you suggest and yes, the only reason to "set the state" is to prevent it from growing when parsing large xml files.
and txs for reading the blog post!

@JoeZ99
Copy link
Contributor Author

JoeZ99 commented May 16, 2022

@qcam , how do you like it now??

Copy link
Owner

@qcam qcam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ❤️

lib/saxy/partial.ex Outdated Show resolved Hide resolved
@qcam qcam merged commit f680914 into qcam:master May 16, 2022
@qcam
Copy link
Owner

qcam commented May 16, 2022

Thanks @JoeZ99!

@JoeZ99 JoeZ99 deleted the access_state branch May 17, 2022 02:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Output Stream
2 participants