Performance on Large Files #342
noahssarcastic asked this question in Help
Hi, I'm new to the Ohmlang community, but I love how accessible Ohm is to pick up compared to other frameworks (especially for developers with some background in programming languages). I'm using Ohm for a work project where I've written a grammar for a geoscience file type called .grdecl. Files in this format commonly exceed 100k lines. I can parse smaller dummy files, but when I try to parse large files, I run out of RAM. Even after upgrading to 8 GB, I still can't parse a file of ~150k lines.
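One workaround I'm considering, sketched here under assumptions: Ohm is a packrat-style parser that memoizes rule results per input position, so if the memory growth comes from that memo table, splitting the file into keyword records and matching each record separately should keep every individual match small. The sketch below only does the splitting, using nothing but the standard library. `splitGrdecl` and `GrdeclRecord` are hypothetical names, and the code assumes that each keyword sits on its own line, `--` starts a comment, every keyword's data ends with `/`, and no `/` appears inside quoted strings (flag-only keywords such as NOECHO would break it).

```typescript
// Hypothetical record type: one GRDECL keyword plus the raw text of its data.
interface GrdeclRecord {
  keyword: string;
  body: string;
}

// Sketch: split GRDECL text into keyword records so each one can be matched
// on its own. Assumes "--" comments, keywords alone on a line, and data
// sections terminated by "/" (no "/" inside quoted strings).
function splitGrdecl(text: string): GrdeclRecord[] {
  const records: GrdeclRecord[] = [];
  let keyword: string | null = null;
  let bodyLines: string[] = [];

  for (const rawLine of text.split(/\r?\n/)) {
    // Strip any "--" comment, then surrounding whitespace.
    const commentAt = rawLine.indexOf("--");
    const line = (commentAt >= 0 ? rawLine.slice(0, commentAt) : rawLine).trim();
    if (line === "") continue;

    if (keyword === null) {
      // Outside a record: the first token on the line names the next keyword.
      keyword = line.split(/\s+/)[0];
      continue;
    }

    const slashAt = line.indexOf("/");
    if (slashAt >= 0) {
      // "/" ends the current keyword's data; emit the finished record.
      bodyLines.push(line.slice(0, slashAt).trim());
      records.push({ keyword, body: bodyLines.join(" ").trim() });
      keyword = null;
      bodyLines = [];
    } else {
      bodyLines.push(line);
    }
  }
  return records;
}
```

Each record's `body` could then go to a smaller grammar through Ohm's `grammar.match()` (for example a hypothetical `recordGrammar.match(rec.body)`), so no single match ever sees all ~150k lines. This is untested against real files.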
I haven't been able to find any guidelines for optimizing performance with Ohm, nor much discussion about performance improvements to the project. I'm wondering whether I've missed something or whether the topic simply hasn't come up much yet. I would love to engage with the community and possibly contribute in my spare time if it means Ohm could support larger files like these in the future.
I'm unsure whether this is outside the scope of what the project is currently trying to accomplish, but I'd like to point out that I see Ohm potentially solving a major problem in the wider scientific community. Scientific developers routinely work with proprietary or somewhat arcane data formats that are difficult to parse in custom software. Often the only way to solve this is to write a one-off parser that suits the needs of a single project. JavaScript projects have historically been underrepresented in the scientific community, and I think projects such as this one can help bridge that data gap. Being able to write a more general parser that converts these formats into something more easily serializable could pave the way for more JavaScript projects in fields such as the geosciences and solid-state physics. I have personally run into this problem with the Crystallographic Information File format, the Protein Data Bank format, and more recently the Eclipse grid format.