Performance degradation with noddy csv parser #269

Open
ehiggs opened this Issue Oct 24, 2017 · 14 comments

ehiggs commented Oct 24, 2017

The performance of this csv parser code went from reading 1M lines in ~1 second to taking minutes with version 0.13.1.

The grammar is:

grammar;

pub Record = Comma<Field>;

Comma<T>: Vec<T> = {
    <v:(<T> ",")*> <e:T?> => match e {
        None => v,
        Some(e) => {
            let mut v = v;
            v.push(e);
            v
        }
    }
};

Field: String = {
    <s:r#""[^"]+""#> => {
        s[1..s.len()-1].to_string()
    },
    <s:r#"[^",][^,]*"#> => {
        s.to_string()
    },
};

The test program counts the number of fields encountered:

pub mod csv;

use std::io::BufReader;
use std::io::BufRead;
use std::fs::File;

fn main() {
    let fpath = ::std::env::args().nth(1).unwrap();
    let f = File::open(fpath).unwrap();
    let file = BufReader::new(&f);
    let mut sum = 0;
    for line in file.lines() {
        let l = line.unwrap();
        let rec = csv::parse_Record(&l).unwrap();
        sum += rec.len();
    }
    println!("{}", sum);
}

Results:

$ for t in 1 10 100 1000 10000 100000 ; do time ./target/release/lalr-reader <(head -${t} /tmp/hello.csv ); done
5

real    0m0.006s
user    0m0.002s
sys     0m0.009s
50

real    0m0.006s
user    0m0.004s
sys     0m0.006s
500

real    0m0.014s
user    0m0.014s
sys     0m0.001s
5000

real    0m0.085s
user    0m0.084s
sys     0m0.001s
50000

real    0m0.838s
user    0m0.837s
sys     0m0.002s
500000

real    0m8.274s
user    0m8.261s
sys     0m0.007s

$ time ./target/release/lalr-reader /tmp/hello.csv 
5000000

real    1m22.654s
user    1m22.554s
sys     0m0.007s

ehiggs changed the title from Performance divergence with noddy csv parser to Performance degradation with noddy csv parser on Nov 2, 2017

nikomatsakis commented Jan 24, 2018

As reported in #273 -- believed to be due to regex recompilation. I can presumably cache these.

nikomatsakis commented Jan 26, 2018

OK, this is a problem we really have to fix, obviously.

There are two options. One option, which preserves the API, is to create a thread-local value containing the compiled regular expressions that we can reuse between executions.

Another option would be to modify the API so that the user is responsible for instantiating the tokenizer and giving it to us (giving them control of how long it lives).

I'm inclined towards the first option -- or, perhaps, ultimately, a hybrid. That is, we could generate an "easy entry point" that uses the thread-local, and later add a second entry point that takes an explicit tokenizer, and you can call whichever one you prefer.

So let's start with the thread-local. The internal tokenizer is created here:

pub fn compile<W: Write>(
    grammar: &Grammar,
    intern_token: &InternToken,
    out: &mut RustWrite<W>)
    -> io::Result<()>
{

You can see that it generates a bunch of calls into the regex APIs, and in particular creates a RegexSet here:

rust!(out, "let {}regex_set = {}regex::RegexSet::new({}strs).unwrap();",
      prefix, prefix, prefix);

We could modify this code to use lazy_static! or we could just write something like it ourselves using thread_local!. If we use lazy_static!, we'd have to advise the user to add it, and I'm not 100% sure it would work with RegexSet. I think using a thread-local feels easier. Something like:

thread_local! {
    static REGEX_SET: RefCell<Option<RegexSet>> = RefCell::new(None);
}

Then, if it is Some, we would reuse the existing set; otherwise we would create one, and we would access it as needed later on.
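
Fleshed out a bit, the get-or-init pattern could look like this (the accessor name and the unwraps are just illustrative, not what the generated code would literally say):

extern crate regex;

use std::cell::RefCell;
use regex::RegexSet;

thread_local! {
    // One compiled set per thread, built lazily on first use.
    static REGEX_SET: RefCell<Option<RegexSet>> = RefCell::new(None);
}

// Illustrative accessor: the generated code would call this instead of
// rebuilding the RegexSet on every parse; `strs` is the same pattern
// list it passes to RegexSet::new today.
fn with_regex_set<R, F: FnOnce(&RegexSet) -> R>(strs: &[&str], f: F) -> R {
    REGEX_SET.with(|cell| {
        let mut slot = cell.borrow_mut();
        if slot.is_none() {
            *slot = Some(RegexSet::new(strs).unwrap());
        }
        f(slot.as_ref().unwrap())
    })
}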

The only downside of this is that the set will never get freed. But of course people always have the option of making their own lexer (and we can also add the more flexible option later).

ahmedcharles commented Jan 26, 2018

Just curious, why was this faster before 13.1?

federicomenaquintero commented Jan 26, 2018

lazy_static! seems marginally better to me, since (I think?) a thread-local one would cause each thread that uses the parser to recompile the regexps the first time. Unless I'm getting the behavior of lazy_static! completely wrong for the threaded case :)
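
For concreteness, here is a minimal sketch of the lazy_static! variant (the two patterns are just the ones from the csv grammar above, standing in for whatever lalrpop generates; RegexSet is Sync, so a process-wide static does work):

#[macro_use]
extern crate lazy_static;
extern crate regex;

use regex::RegexSet;

lazy_static! {
    // Compiled once per process and shared across all threads.
    static ref TOKEN_REGEX_SET: RegexSet = RegexSet::new(&[
        r#""[^"]+""#,
        r#"[^",][^,]*"#,
    ]).unwrap();
}

fn main() {
    // The first access compiles the set; every later access reuses it.
    assert!(TOKEN_REGEX_SET.is_match("hello"));
}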

If I may suggest something:

  • Implement either lazy_static! or thread_local! for now, as a stop-gap measure.

  • Decide whether we really need an API to let the user create the tokenizer. My experience with APIs that initially required no setup object but later grew one is that you end up with two parallel APIs: a new one with a "context" parameter all over the place, and the old one without. I haven't thought through how this differs in the lalrpop case, where it just generates functions you call directly, instead of giving you back objects upon which you later do stuff.

  • If we really need that new API, just remove the old one and do the new one in a major version?

(The only case where I think it's worthwhile to actually free the regexps is if a program were generating parsers dynamically at runtime... an interactive program to let you explore grammars?) :)

willmurphyscode commented Feb 6, 2018

I tried to use git bisect to help answer @ahmedcharles' question (why was this formerly faster?). I couldn't get the bisect script to work because there are quite a few commits in master that I wasn't able to compile locally, so I can't bench them. I think the first commit to master that had the slowdown is one I can't build locally, so I can't tell which of the unbuildable commits introduced the problem.

Do we know what changed? It seems like we have two options: cache regexes in lazy_static, or cache regexes in thread_local. Is there a third option: revert whatever made 0.13.1 slower?

I also started trying to implement the lazy_static approach, but ran into a question: where is the crate root for the generated code? Is it the main file of the user's crate? I'm asking because master...willmurphyscode:try-staticdiff-aaa35664da81cd904ea0c482dca7dd60R52 won't compile, failing with, for example:

error[E0468]: an `extern crate` loading macros must be at the crate root
     --> doc/pascal/lalrpop/src/pascal.rs:93272:18
      |
93272 |     #[macro_use] extern crate lazy_static as __lazy_static;
      |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
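
If I'm reading E0468 right, a #[macro_use] extern crate can only appear at the crate root, so the generated module can't import the macro for itself; the import would have to go in the user's own crate root, something like:

// main.rs (crate root) -- the macro import has to live here, not
// inside the generated parser module.
#[macro_use]
extern crate lazy_static;
extern crate regex;

pub mod pascal; // the lalrpop-generated parser, which would use lazy_static!

fn main() {
    // ...call the generated parse functions as usual...
}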

ahmedcharles commented Feb 7, 2018

There's actually another option, which is to change the interface to the parser. Like:

pub mod csv;

use std::io::BufReader;
use std::io::BufRead;
use std::fs::File;

fn main() {
    let fpath = ::std::env::args().nth(1).unwrap();
    let f = File::open(fpath).unwrap();
    let file = BufReader::new(&f);
    let mut sum = 0;
    let parser = csv::Parser::new();
    for line in file.lines() {
        let l = line.unwrap();
        let rec = parser.parse_Record(&l).unwrap();
        sum += rec.len();
    }
    println!("{}", sum);
}

After seeing the contentiousness of the choice to use TLS in futures, I think it's best to avoid that in favor of 'simpler' solutions that more directly address the issue. That way people don't show up in two years confused about why they have random bugs or can't understand why the parser doesn't work the way they expect.

nikomatsakis commented Feb 7, 2018

@willmurphyscode

I'm asking because master...willmurphyscode:try-staticdiff-aaa35664da81cd904ea0c482dca7dd60R52 won't compile, failing with, for example:

Ah, hmm, that is an obstacle. We'd have to generate our own code.


@ahmedcharles

There's actually another option, which is to change the interface to the parser. Like:

Yes, I've been wanting to avoid that, just because I continue to want writing LALRPOP code to be "drop dead" simple where possible, but to be honest now that I see it written out it doesn't seem "more complex" per se. And I do like giving users the control.

(It might still be nice to offer a convenience form, but that can wait.)

Of course if we do this we have to update the docs and tests and so forth.

It might also be a good idea to offer the existing methods as wrappers, I suppose, perhaps with a "deprecation" warning:

#[deprecated(note = "prefer the new `FooParser::new().parse(...)` style of invocation")]
fn parse_Foo(...) {
    FooParser::new().parse(...)
}

nikomatsakis commented Feb 7, 2018

@willmurphyscode would you be interested in changing to match what @ahmedcharles is suggesting? This is the rough area of the code that would have to be changed.

The start_parser_fn method emits the parse_Foo functions:

pub fn start_parser_fn(&mut self) -> io::Result<()> {

Similarly the define_tokens method emits the code that creates the tokenizer (this is the code we want to share):

pub fn define_tokens(&mut self) -> io::Result<()> {
    if self.grammar.intern_token.is_some() {
        // if we are generating the tokenizer, create a matcher as our input iterator
        rust!(self.out,
              "let mut {}tokens = {}::{}intern_token::{}Matcher::new(input);",
              self.prefix,
              self.action_module,
              self.prefix,
              self.prefix);
    } else {
        // otherwise, convert one from the `IntoIterator`
        // supplied, using the `ToTriple` trait which inserts
        // errors/locations etc if none are given
        let clone_call = if self.repeatable { ".clone()" } else { "" };
        rust!(self.out,
              "let {}tokens = {}tokens0{}.into_iter();",
              self.prefix,
              self.prefix,
              clone_call);
        rust!(self.out,
              "let mut {}tokens = {}tokens.map(|t| {}ToTriple::to_triple(t));",
              self.prefix,
              self.prefix,
              self.prefix);
    }
    Ok(())
}

They are invoked from both the recursive ascent and parse table driven code roughly around here.

Finally, the code that defines the Matcher (i.e., internal tokenizer) is in intern_token/mod.rs. Somewhat annoyingly, the Matcher struct currently pairs together the RegexSet (which is what we want to be caching across parses) and the text from a single parse:

pub struct __Matcher<'input> {
    text: &'input str,
    consumed: usize,
    regex_set: __regex::RegexSet,
    regex_vec: Vec<__regex::Regex>,
}

So I imagine we would want to do the following:

  • Break apart Matcher so that the RegexSet can be cached. Perhaps have a MatcherBuilder or some such thing (see the sketch after this list). In this first step, the generated code stays much the same except that instead of doing Matcher::new(input) it does MatcherBuilder::new().matcher(input).

  • Create a new "sibling method" (i.e., in the base.rs file) that generates the new parser type. Initially, ParseFoo::new() would do nothing and all the work would be in the parse(input) method, which is basically the same as today's parse_Foo method.

  • Finally, in the case where intern_token.is_some(), move the creation of the builder into the ParseFoo struct so it is cached across invocations.
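
To make the first step concrete, a rough sketch of the split (all of the names, fields, and signatures here are guesses at what the generated code could look like, not the real thing):

extern crate regex;

use regex::{Regex, RegexSet};

// The builder owns the compiled regexes and is what we would cache
// across parses.
pub struct MatcherBuilder {
    regex_set: RegexSet,
    regex_vec: Vec<Regex>,
}

impl MatcherBuilder {
    pub fn new(strs: &[&str]) -> MatcherBuilder {
        MatcherBuilder {
            regex_set: RegexSet::new(strs).unwrap(),
            regex_vec: strs.iter().map(|s| Regex::new(s).unwrap()).collect(),
        }
    }

    // Cheap to call once per input: no regex compilation happens here.
    pub fn matcher<'input, 'builder>(&'builder self, text: &'input str)
                                     -> Matcher<'input, 'builder> {
        Matcher {
            text: text,
            consumed: 0,
            regex_set: &self.regex_set,
            regex_vec: &self.regex_vec,
        }
    }
}

// The per-parse state now borrows the compiled regexes instead of
// owning them.
pub struct Matcher<'input, 'builder> {
    text: &'input str,
    consumed: usize,
    regex_set: &'builder RegexSet,
    regex_vec: &'builder Vec<Regex>,
}

fn main() {
    let builder = MatcherBuilder::new(&[r#""[^"]+""#, r#"[^",][^,]*"#]);
    let _m1 = builder.matcher("a,b,c");
    let _m2 = builder.matcher("d,e"); // reuses the compiled regexes
}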

nikomatsakis commented Feb 7, 2018

@ahmedcharles

Just curious, why was this faster before 13.1?

I imagine the turning point was actually release 0.13; that is when we changed from generating our own regex to using the regex library. I imagine this is what made it slower (but it also vastly improved compilation time).

ahmedcharles commented Feb 7, 2018

Ah, that makes sense.

nikomatsakis commented Feb 7, 2018

All of that said, I do feel like some convenience that uses thread_local! would be a fine stopgap -- and probably useful later on too. lazy_static seems harder and not worth it -- I would stick with thread-local for now. (Later on, if we cared, we could implement our own lazy_static using Once, but it seems like more trouble than it's worth.)

I am imagining that even with a ParserFoo type, it'd be nice to be able to do ParserFoo::cached().parse(...) basically and have it create (and re-use) a shared ParserFoo instance. (Of course we would want to document this behavior.)
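
For illustration, cached() could be written along these lines (RecordParser and its parse signature are invented here to mirror the csv example, not the actual generated API):

use std::cell::RefCell;
use std::rc::Rc;

pub struct RecordParser {
    // ...compiled RegexSet etc. would live here...
}

impl RecordParser {
    pub fn new() -> RecordParser {
        // ...the expensive regex compilation happens here...
        RecordParser {}
    }

    // Convenience entry point: build the parser once per thread and
    // reuse it on every later call.
    pub fn cached() -> Rc<RecordParser> {
        thread_local! {
            static CACHED: RefCell<Option<Rc<RecordParser>>> = RefCell::new(None);
        }
        CACHED.with(|cell| {
            cell.borrow_mut()
                .get_or_insert_with(|| Rc::new(RecordParser::new()))
                .clone()
        })
    }

    pub fn parse(&self, input: &str) -> Result<Vec<String>, ()> {
        // ...the real parse_Record body would go here...
        Ok(input.split(',').map(|s| s.to_string()).collect())
    }
}

fn main() {
    // The first call pays the construction cost; later calls on the
    // same thread reuse the shared instance.
    let rec = RecordParser::cached().parse("a,b,c").unwrap();
    println!("{}", rec.len());
}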

ahmedcharles commented Feb 7, 2018

thread_local! isn't really a trivial feature, and nowhere near as simple as creating a struct and calling a function on it or passing a parameter to a function. The implications of using it within a library affect every user, whether they know it or not, and it's not even easy to tell that it's there. It seems like the ideal example of turning a short-term problem into a long-term one, or trading a minor increase in up-front complexity for a larger increase in hidden complexity. Granted, it's really nice for implementing a memory manager like jemalloc, but I'm not sure many problems come with exactly the unique set of requirements that a drop-in replacement for malloc has.

nikomatsakis commented Feb 7, 2018

Well, my general concern is that integrating LALRPOP shouldn't involve a lot of ceremony to achieve decent performance etc. That said, I definitely agree that caching values across invocations is an "orthogonal concern", and ideally we would not force the use of thread-locals or other means to achieve it (though I'd be ok with offering it as a convenience measure).

The main thing is to do the refactoring I described earlier. Whether or not we offer ParserFoo::cached() or ask the user to do the caching I think is a smaller matter.

ahmedcharles commented Feb 7, 2018

I agree.

@willmurphyscode If you want to work on this, respond here. If not, I'll take a look when I get a chance.
