
PGtokenizer ParseError #2050

Closed
rtrier opened this issue Feb 8, 2021 · 2 comments · Fixed by hmcts/rse-case-framework#187 or PatilShreyas/NotyKT#144

Comments

@rtrier
Contributor

rtrier commented Feb 8, 2021

Describe the issue
If the string to parse contains brackets inside a double-quoted token, the result is not correct: the unmatched bracket increases the nesting counter, the closing double quote does not bring it back to zero, and the remainder of the string is never split.

Driver Version?
latest

Java Version?
doesn't matter

OS Version?
doesn't matter

PostgreSQL Version?
doesn't matter

To Reproduce
Steps to reproduce the behaviour:

PGtokenizer tokenizer = new PGtokenizer(",,d,\"f(10\",\"(mime,pdf,pdf)\",test,2018-10-11,1010", ',');

for (int i = 0, c = tokenizer.getSize(); i < c; i++) {
  System.out.println(i + "  " + tokenizer.getToken(i));
}

Results:

0  
1  
2  d
3  "f(10","(mime,pdf,pdf)",test,2018-10-11,1010

Expected behaviour

0  
1  
2  d
3  "f(10"
4  "(mime,pdf,pdf)"
5  test
6  2018-10-11
7  1010

Solution
A quick fix: track the opening characters on a stack, so that a closing character (or the closing double quote) only decreases the nesting level when it matches the most recent opener.

package de.gdiservice.wfs.test;

import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

import org.postgresql.util.PGtokenizer;

public class Tokenizer extends PGtokenizer {

  // Maps each closing character to the opening character it must match on the stack.
  static final Map<Character, Character> closing2OpeningCharacter = new HashMap<>();
  static {
    closing2OpeningCharacter.put(')', '(');
    closing2OpeningCharacter.put(']', '[');
    closing2OpeningCharacter.put('>', '<');
    closing2OpeningCharacter.put('"', '"');
  }

  public Tokenizer(String string, char delim) {
    super(string, delim);
  }

  /**
   * This resets this tokenizer with a new string and/or delimiter.
   *
   * @param string containing tokens
   * @param delim single character to split the tokens
   * @return number of tokens
   */
  public int tokenize(String string, char delim) {
    tokens.clear();

    // Stack of the opening characters seen so far; a closing character only
    // reduces the nesting level if it matches the opener on top of the stack.
    final Stack<Character> stack = new Stack<>();

    // nest holds how many levels we are in the current token.
    // If this is > 0 then we don't split a token when delim is matched.
    //
    // The geometric data types use this, because a type may have others
    // (usually PGpoint) embedded within a token.
    //
    // Peter 1998 Jan 6 - Added < and > to the nesting rules
    int nest = 0;
    int p;
    int s;
    boolean skipChar = false;
    boolean nestedDoubleQuote = false;
    char c = (char) 0;
    for (p = 0, s = 0; p < string.length(); p++) {
      c = string.charAt(p);

      // increase nesting if an open character is found
      if (c == '(' || c == '[' || c == '<' || (!nestedDoubleQuote && !skipChar && c == '"')) {
        nest++;
        stack.push(c);
        if (c == '"') {
          nestedDoubleQuote = true;
          skipChar = true;
        }
      }

      // decrease nesting if a close character is found
      if (c == ')' || c == ']' || c == '>' || (nestedDoubleQuote && !skipChar && c == '"')) {
        if (c == '"') {
          // The closing quote ends the quoted section; any brackets opened
          // inside the quotes are unbalanced and are discarded from the stack.
          while (stack.size() > 0 && stack.peek().charValue() != '"') {
            nest--;
            stack.pop();
          }
          nestedDoubleQuote = false;
          stack.pop();
          nest--;
        } else {
          // Only decrease nesting if the closer matches the most recent opener.
          if (stack.size() > 0 && stack.peek().charValue() == closing2OpeningCharacter.get(c).charValue()) {
            stack.pop();
            nest--;
          }
        }
      }

      skipChar = c == '\\';

      if (nest == 0 && c == delim) {
        tokens.add(string.substring(s, p));
        s = p + 1; // +1 to skip the delimiter
      }
    }

    // Don't forget the last token ;-)
    if (s < string.length()) {
      tokens.add(string.substring(s));
    }

    // check for last token empty
    if (s == string.length() && c == delim) {
      tokens.add("");
    }

    return tokens.size();
  }

  public static void main(String[] args) {
    PGtokenizer tokenizer = new Tokenizer(",,d,\"f(10\",\"(mime,pdf,pdf)\",test,2018-10-11,1010", ',');
    for (int i = 0, c = tokenizer.getSize(); i < c; i++) {
      System.out.println(i + "  " + tokenizer.getToken(i));
    }
  }
}
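
For comparison, a minimal sketch that runs both the stock PGtokenizer and the Tokenizer subclass above on the same input (it assumes it sits in the same package as the Tokenizer class; the class name CompareTokenizers is just a placeholder):

package de.gdiservice.wfs.test;

import org.postgresql.util.PGtokenizer;

public class CompareTokenizers {

  public static void main(String[] args) {
    String s = ",,d,\"f(10\",\"(mime,pdf,pdf)\",test,2018-10-11,1010";

    // Stock tokenizer: the '(' inside "f(10" leaves the nesting level at 1,
    // so everything after it ends up in a single token (4 tokens in total).
    PGtokenizer stock = new PGtokenizer(s, ',');
    System.out.println("PGtokenizer: " + stock.getSize() + " tokens");

    // Stack-based subclass: unmatched brackets inside quotes are discarded,
    // so the string splits into the 8 expected tokens.
    PGtokenizer fixed = new Tokenizer(s, ',');
    System.out.println("Tokenizer:   " + fixed.getSize() + " tokens");
  }
}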
@davecramer
Member

@rtrier can you turn this into a PR with the additional test?
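
A minimal sketch of what such a test could look like, assuming JUnit 4 and that the stack-based fix is applied to org.postgresql.util.PGtokenizer itself (the test class and package names below are placeholders):

package org.postgresql.test.util;

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.postgresql.util.PGtokenizer;

public class PGtokenizerQuotedBracketsTest {

  // Brackets inside a double-quoted token must not change the nesting level,
  // so the string splits into the 8 tokens listed under "Expected behaviour" above.
  @Test
  public void tokenizeWithBracketsInsideQuotes() {
    PGtokenizer t = new PGtokenizer(",,d,\"f(10\",\"(mime,pdf,pdf)\",test,2018-10-11,1010", ',');
    assertEquals(8, t.getSize());
    assertEquals("d", t.getToken(2));
    assertEquals("\"f(10\"", t.getToken(3));
    assertEquals("\"(mime,pdf,pdf)\"", t.getToken(4));
    assertEquals("1010", t.getToken(7));
  }
}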

@rtrier
Contributor Author

rtrier commented Feb 9, 2021

I have a pull request ready now. The "Travis CI - Pull Request" check didn't succeed, and I have no idea how to fix it.
