
Problem with non-latin characters #29

Closed
BestianCode opened this issue Sep 18, 2015 · 11 comments

Comments

@BestianCode (Contributor)

Hi! The problem appears when the search string contains Cyrillic (non-ASCII) characters.


Example with Latin (ASCII) characters:

Search request: (|(displayName=*Smith*)(cn=*Smith*))

Debug output from the ldap library:

Attribute: (Universal, Primitive, Octet String) Len=11 "displayName"
Substrings Any: (Context, Primitive, 0x01) Len=5 "Smith"
Attribute: (Universal, Primitive, Octet String) Len=2 "cn"
Substrings Any: (Context, Primitive, 0x01) Len=5 "Smith"

Debug output from my printf in gopkg.in/ldap.v1/filter.go (function compileFilter, line 193):

Filter:(|(displayName=*Smith*)(cn=*Smith*)) / filter[newPos]: * / newPos: 15
Filter:(|(displayName=*Smith*)(cn=*Smith*)) / filter[newPos]: S / newPos: 16
Filter:(|(displayName=*Smith*)(cn=*Smith*)) / filter[newPos]: m / newPos: 17
Filter:(|(displayName=*Smith*)(cn=*Smith*)) / filter[newPos]: i / newPos: 18
Filter:(|(displayName=*Smith*)(cn=*Smith*)) / filter[newPos]: t / newPos: 19
Filter:(|(displayName=*Smith*)(cn=*Smith*)) / filter[newPos]: h / newPos: 20
Filter:(|(displayName=*Smith*)(cn=*Smith*)) / filter[newPos]: * / newPos: 21

Debug output from the LDAP server:

55fbcb6d ==>backsql_search(): base="ou=sl it,ou=aup,ou=tsg,ou=quadra,o=enterprise", filter="(|(displayName=*smith*)(cn=*smith*))", scope=2, deref=0, attrsonly=0, attributes to load: custom list

55fbcb72 Constructed query: SELECT DISTINCT ldap_entries.id,ldapx_persons.id,text('inetOrgPerson') AS objectClass,ldap_entries.dn AS dn FROM ldap_entries,ldapx_persons WHERE ldapx_persons.id=ldap_entries.keyval AND ldap_entries.oc_map_id=? AND lower(ldap_entries.dn) LIKE lower('%'||?) AND ldapx_persons.lang=0 AND ((lower(text(ldapx_persons.fullname)) LIKE '%SMITH%') OR (lower(text(ldapx_persons.surname||' '||ldapx_persons.name)) LIKE '%SMITH%'))

All good!



Example with Cyrillic (non-ASCII) characters:

I'm posting screenshots to avoid problems with Cyrillic and non-printable characters.

Search request:
[screenshot]

Debug output from the ldap library:
[screenshot]

Debug output from my printf in gopkg.in/ldap.v1/filter.go (function compileFilter, line 193):
[screenshot]

Debug output from the LDAP server:
[screenshot]

...


@BestianCode (Contributor, Author)

Since the library works in ASCII mode (it copies the filter one byte at a time), I propose a small fix (or hack) to make it work with Cyrillic and possibly other characters.

In a UTF-8 encoded Go string, each Cyrillic character takes 2 bytes, whereas a Latin character takes 1 byte.

Here is what I did:

  1. Debugged the library for half a day :)
  2. Opened gopkg.in/ldap.v1/filter.go.
  3. Imported the "regexp" package.
  4. Wrote the regexp [0-9A-Za-z\ \*\-\_] to detect ASCII characters (this regexp needs to be expanded).
  5. Added code to the "compileFilter" function to tell Latin and non-Latin characters apart: if a character is non-Latin, append 2 bytes to "condition" and increment "newPos" one extra time; if it is Latin, append 1 byte.
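The width rule from steps 4 and 5 can be reproduced in isolation to see where its 2-byte assumption holds. This is a hypothetical standalone sketch (`hackWidth` and `utf8Width` are illustrative names, not names from the patch) that compares the rule against `utf8.DecodeRuneInString`, which reports the real encoded width. Cyrillic letters are 2 bytes, so the rule works for them; a 3-byte character such as "€", or ASCII punctuation outside the character class such as ".", gets the wrong width, which can make the loop consume the closing parenthesis.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// asciiCheck is the character class from step 4 of the patch.
var asciiCheck = regexp.MustCompile(`[0-9A-Za-z\ \*\-\_]`)

// hackWidth (hypothetical name) reproduces the rule from step 5: if the
// first byte, formatted with %c, matches the class it counts as 1 byte,
// otherwise the character is assumed to be 2 bytes. s must be non-empty.
func hackWidth(s string) int {
	if asciiCheck.MatchString(fmt.Sprintf("%c", s[0])) {
		return 1
	}
	return 2
}

// utf8Width returns the real encoded width of the first character of s.
func utf8Width(s string) int {
	_, size := utf8.DecodeRuneInString(s)
	return size
}

func main() {
	// "S" (Latin), "\u0421" (Cyrillic Es), "€" (3 bytes), "." (ASCII, not in the class)
	for _, s := range []string{"S", "\u0421", "€", "."} {
		fmt.Printf("%q: hack=%d utf8=%d\n", s, hackWidth(s), utf8Width(s))
	}
}
```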

Diff for gopkg.in/ldap.v1/filter.go:

*** filter.go.orig  2015-09-17 10:40:06.000000000 +0300
--- filter.go   2015-09-17 13:31:42.000000000 +0300
***************
*** 8,13 ****
--- 8,14 ----
    "errors"
    "fmt"
    "strings"
+   "regexp"

    "gopkg.in/asn1-ber.v1"
  )
***************
*** 164,169 ****
--- 165,172 ----
        }
    }()

+   asciiChechRegexp:=regexp.MustCompile(`[0-9A-Za-z\ \*\-\_]`)
+ 
    newPos := pos
    switch filter[pos] {
    case '(':
***************
*** 190,196 ****
        for newPos < len(filter) && filter[newPos] != ')' {
            switch {
            case packet != nil:
-               condition += fmt.Sprintf("%c", filter[newPos])
            case filter[newPos] == '=':
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterEqualityMatch, nil, FilterMap[FilterEqualityMatch])
            case filter[newPos] == '>' && filter[newPos+1] == '=':
--- 193,207 ----
        for newPos < len(filter) && filter[newPos] != ')' {
            switch {
            case packet != nil:
+               if !asciiChechRegexp.MatchString(fmt.Sprintf("%c", filter[newPos])) {
+ //                    fmt.Printf("DEBUG RU:%s / %s / %d\n", filter, filter[newPos:newPos+2], newPos)
+                   condition += fmt.Sprintf("%s", filter[newPos:newPos+2])
+                   newPos++
+               }else{ 
+ //                    fmt.Printf("DEBUG EN:%s / %c / %d\n", filter, filter[newPos], newPos)
+                   condition += fmt.Sprintf("%c", filter[newPos])
+               }
+ //                fmt.Printf("DEBUG TT:%v\n", condition)
            case filter[newPos] == '=':
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterEqualityMatch, nil, FilterMap[FilterEqualityMatch])
            case filter[newPos] == '>' && filter[newPos+1] == '=':

@BestianCode (Contributor, Author)

Example with Cyrillic (non-ASCII) characters after applying the patch:

Search request:
[screenshot]

Debug output from the ldap library:
[screenshot]

Debug output from my printf in gopkg.in/ldap.v1/filter.go (function compileFilter, line 193):
[screenshot]

Debug output from the LDAP server:
[screenshot]

Victory! :)

@johnweldon (Member)

Rather than using a hack to guess at non-ASCII strings, I'd like to see correct handling of Unicode. In some quick googling I found RFC 4518 and a Go library for normalizing Unicode to Normalization Form KC (NFKC).

I will try to work on this in the next few weeks; anyone else interested is welcome to work on it too.

@liggitt (Contributor) commented Sep 18, 2015

+1 for proper escaping instead of regex tests/fixups
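One way to read "proper escaping" is RFC 4515's rule for assertion values in string filters: the octets for `*`, `(`, `)`, `\` and NUL are written as a backslash followed by two hex digits, and other octets may be escaped the same way, which is how the RFC's own example `(sn=Lu\c4\8di\c4\87)` spells "Lučić" in UTF-8. The sketch below shows both directions; `escapeFilterValue` and `unescapeFilterValue` are hypothetical helpers, not go-ldap's API. Because every byte of a multi-byte UTF-8 sequence is 0x80 or above, working byte by byte never confuses part of a Cyrillic letter with one of the special ASCII characters.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// escapeFilterValue (hypothetical) escapes the octets RFC 4515 requires:
// '*', '(', ')', '\' and NUL. All other bytes are copied unchanged.
func escapeFilterValue(v string) string {
	var b strings.Builder
	for i := 0; i < len(v); i++ {
		switch c := v[i]; c {
		case '*', '(', ')', '\\', 0:
			fmt.Fprintf(&b, "\\%02x", c)
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

// unescapeFilterValue (hypothetical) turns every "\XX" escape back into
// the raw byte it stands for, so escaped UTF-8 sequences become text again.
func unescapeFilterValue(v string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(v); i++ {
		if v[i] != '\\' {
			b.WriteByte(v[i])
			continue
		}
		if i+2 >= len(v) {
			return "", errors.New("truncated escape sequence")
		}
		n, err := strconv.ParseUint(v[i+1:i+3], 16, 8)
		if err != nil {
			return "", fmt.Errorf("bad escape %q: %w", v[i:i+3], err)
		}
		b.WriteByte(byte(n))
		i += 2 // skip the two hex digits
	}
	return b.String(), nil
}

func main() {
	fmt.Println(escapeFilterValue("Смит (Jr.)")) // Смит \28Jr.\29
	s, err := unescapeFilterValue(`Lu\c4\8di\c4\87`)
	fmt.Println(s, err) // Lučić <nil>
}
```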

@BestianCode (Contributor, Author)

This is very good news ;) Proper Unicode support would be great and versatile! I don't have experience working with Unicode, but I'm happy to learn. To start, I will read the documentation and look into the normalization library.

@BestianCode (Contributor, Author)

Here is a slightly modified version of the code that decodes the filter with the unicode/utf8 package:

This code works; I tested it with different search terms. It could still be optimized and improved, though :)

*** filter.go.orig  2015-09-17 10:40:06.000000000 +0300
--- filter.go   2015-09-21 16:34:04.042139038 +0300
***************
*** 8,13 ****
--- 8,14 ----
    "errors"
    "fmt"
    "strings"
+   "unicode/utf8"

    "gopkg.in/asn1-ber.v1"
  )
***************
*** 187,211 ****
    default:
        attribute := ""
        condition := ""
!       for newPos < len(filter) && filter[newPos] != ')' {
            switch {
            case packet != nil:
!               condition += fmt.Sprintf("%c", filter[newPos])
!           case filter[newPos] == '=':
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterEqualityMatch, nil, FilterMap[FilterEqualityMatch])
!           case filter[newPos] == '>' && filter[newPos+1] == '=':
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterGreaterOrEqual, nil, FilterMap[FilterGreaterOrEqual])
                newPos++
!           case filter[newPos] == '<' && filter[newPos+1] == '=':
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterLessOrEqual, nil, FilterMap[FilterLessOrEqual])
                newPos++
!           case filter[newPos] == '~' && filter[newPos+1] == '=':
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterApproxMatch, nil, FilterMap[FilterLessOrEqual])
                newPos++
            case packet == nil:
!               attribute += fmt.Sprintf("%c", filter[newPos])
            }
!           newPos++
        }
        if newPos == len(filter) {
            err = NewError(ErrorFilterCompile, errors.New("ldap: unexpected end of filter"))
--- 188,239 ----
    default:
        attribute := ""
        condition := ""
!       xfilter   := filter
!       xckl      := 0
!       xsize     := 0
! //        fmt.Printf("X1: %v / %v / %d /bytes x: %d /bytes f: %d/rune: %d\n", xfilter, filter, newPos, len(xfilter), len(filter), utf8.RuneCountInString(xfilter))
!       for {
!           xckl        += xsize
!           if xckl>newPos-xsize {
!               break
!           }
!           _, xsize    = utf8.DecodeRuneInString(xfilter)
!           xfilter     = xfilter[xsize:len(xfilter)]
!       }
! //        fmt.Printf("X2: %v / %v / %d /bytes x: %d /bytes f: %d/rune: %d\n", xfilter, filter, newPos, len(xfilter), len(filter), utf8.RuneCountInString(xfilter))
!       xsymbol, _              := utf8.DecodeRuneInString("")
!       xstop, _                := utf8.DecodeRuneInString(")")
!       xpar, _         := utf8.DecodeRuneInString("=")
!       xpar_more, _            := utf8.DecodeRuneInString(">")
!       xpar_less, _            := utf8.DecodeRuneInString("<")
!       xpar_tld, _     := utf8.DecodeRuneInString("~")
!       xsymbol, xsize   = utf8.DecodeRuneInString(xfilter)
!       xfilter          = xfilter[xsize:len(xfilter)]
!       xsymbol2, _             := utf8.DecodeRuneInString(xfilter)
!       for len(xfilter)>0 && xsymbol != xstop {
! //            fmt.Printf("X3: %c,%c\n", xsymbol, xsymbol2)
            switch {
            case packet != nil:
!               condition += fmt.Sprintf("%c", xsymbol)
! //                fmt.Printf("X4:%c -> %s\n", xsymbol, condition)
!           case xsymbol == xpar:
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterEqualityMatch, nil, FilterMap[FilterEqualityMatch])
!           case xsymbol == xpar_more   && xsymbol2 == xpar:
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterGreaterOrEqual, nil, FilterMap[FilterGreaterOrEqual])
                newPos++
!           case xsymbol == xpar_less   && xsymbol2 == xpar:
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterLessOrEqual, nil, FilterMap[FilterLessOrEqual])
                newPos++
!           case xsymbol == xpar_tld    && xsymbol2 == xpar:
                packet = ber.Encode(ber.ClassContext, ber.TypeConstructed, FilterApproxMatch, nil, FilterMap[FilterLessOrEqual])
                newPos++
            case packet == nil:
!               attribute += fmt.Sprintf("%c", xsymbol)
            }
!           xsymbol, xsize  = utf8.DecodeRuneInString(xfilter)
!           xfilter         = xfilter[xsize:len(xfilter)]
!           xsymbol2, _     = utf8.DecodeRuneInString(xfilter)
!           newPos          += xsize
        }
        if newPos == len(filter) {
            err = NewError(ErrorFilterCompile, errors.New("ldap: unexpected end of filter"))
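The patch above replaces byte indexing with `utf8.DecodeRuneInString`. The same idea can be shown in a smaller, self-contained form: decode one rune, advance by its byte width, and slice the original string for the value instead of rebuilding it with `%c`. `parseItem` is a hypothetical, simplified stand-in for the default branch of `compileFilter` (it ignores substring and presence handling) and is not the code that was eventually merged.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// parseItem (hypothetical) splits a filter item such as "cn=*Смит*" or
// "age>=5" into attribute, operator and value. It decodes one rune at a
// time and advances by the rune's byte width, so multi-byte characters
// are never split or re-encoded.
func parseItem(item string) (attr, op, value string, err error) {
	var name strings.Builder
	for i := 0; i < len(item); {
		r, size := utf8.DecodeRuneInString(item[i:])
		if r == utf8.RuneError && size == 1 {
			return "", "", "", errors.New("invalid UTF-8 in filter")
		}
		switch {
		case r == '=':
			op = "="
		case strings.ContainsRune("><~", r) && strings.HasPrefix(item[i+size:], "="):
			op = string(r) + "="
		}
		if op != "" {
			// Slice the original string for the value so its bytes are kept as-is.
			return name.String(), op, item[i+len(op):], nil
		}
		name.WriteRune(r)
		i += size
	}
	return "", "", "", errors.New("no operator in filter item")
}

func main() {
	for _, item := range []string{"cn=*Смит*", "displayName~=Иван", "age>=5"} {
		attr, op, value, err := parseItem(item)
		fmt.Printf("%q %q %q %v\n", attr, op, value, err)
	}
}
```

Slicing the value out of the original string keeps the bytes exactly as the caller supplied them, so no per-character width bookkeeping is needed once the operator has been found.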

@liggitt (Contributor) commented Sep 21, 2015

Thanks for the investigation. If you have working code, could you put it in a branch and open a pull request? That would make it easier to review and check out for local testing.

@BestianCode (Contributor, Author)

I created a pull request.

@liggitt (Contributor) commented Sep 21, 2015

#31 for reference

@BestianCode (Contributor, Author)

#32 for reference

@liggitt (Contributor) commented Oct 5, 2015

Fixed by #31/#32/#34

liggitt closed this as completed on Oct 5, 2015.