Discussion needed for the next steps: User-solutions for approximate-position-searching? #12
Replies: 6 comments 10 replies
-
The SCID code in the orig directory does a lot more than just parse. Much, much more. Keep in mind what I mentioned in an earlier e-mail. There are at least a couple of fresh CQL implementations on the near horizon, and their licensing might be more amenable to your needs. At least one of those has been written from scratch (partly so that it can handle variants) and would not carry any Scid baggage with it. The formal announcement should be just around the corner. Also, Gregor began implementing his own CQL for Scidb. https://sourceforge.net/projects/scidb/ I'm not sure how far along he got but you might want to take a look at it. His language has a different syntax and might give you some ideas for whatever you decide on for your own expressive chess query language.
Just to let others compare the above with the CQL equivalent syntax:
|
Beta Was this translation helpful? Give feedback.
-
I have started creating a BNF (Backus Naur Form) for the syntax of the expression/condition parser, used for position searching. I have tried to make it to be compatible as much as possible with CQL.
A condition/expression may have some chess piece types. For evaluating, they are cardinalities/total numbers of those chess pieces on the chessboard. For examples:
The condition may be implicit or explicit:
Some other examples:
|
Beta Was this translation helpful? Give feedback.
-
On 1/7/22 04:08, nguyenpham wrote:
I have started creating a BNF (Backus Naur Form) for the syntax of the expression/condition parser, used for position searching. I have tried to make it to be compatible as much as possible with CQL.
|condition = expression | expression (“=“|“<”|”<=“|”>”|”>=“ | ”and” | ”or”) expression expression = [“+”|”-“] term {(“+”|”-“) term } term = factor {(“*”|”/“]) factor} factor = piececardinality | number | “(“ expression “)” piececardinality = piece { <empty> | square|squareset } piece = “K” | “Q” | “R” | “B” | “N” | “P” | “k” | “q” | “r” | “b” | “n” | “p” | “W” | “B”|
|Is this valid BNF? "B" for Bishop and for Black?|
… |square = column row | column | row column = “a” | “b” | “c” | “d” | “e” | “f” | “g” | “h” row = “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” squareset = “[“ (square {“,” square} | square "-" square | column “-“ column | row “-“ row) “]” |
Below are examples:
|// find all positions having 3 White Queens Q = 3 // Find all positions having two Black Rooks in the middle squares r[e4, e5, f4, f5] = 2 // White Pawns in d4, e5, f4, g4, Black King in b7 P[d4, e5, f4, g4] = 4 and kb7 |
—
Reply to this email directly, view it on GitHub <#12 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ALI6WKZC3UIZBCG6GHLIFH3UU3CRHANCNFSM5LNQPVIA>.
Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you commented.Message ID: ***@***.***>
|
Beta Was this translation helpful? Give feedback.
-
If you go into that direction, it might be really worthwhile though to convert the PGN into a binary format on the fly. It should be not much slower during conversion; it certainly won't be slower when searching, and it saves a lot of space! Just for the sake of the example I took a one game out of millionbase 2.22, which as 33 moves. No comments/variations, which is probably true for most games stored in large databases. The string of moves has 409 chars, i.e. 409 bytes. The PGN is approx 1.4 GB Using a two-byte (source destination) encoding of moves, we require 66 bytes for all 33 moves. That is a reduction down to 16 percent of the original size. In other words instead of requiring 1.4 GB to store all games, a user would only require space roughly in the 150 to 250 MB range. That is a huge(!) difference, especially if we consider that Mega has over 9 million games. It might be even possible to then further accelarate the search by using SQlite's memmaping ( https://www.sqlite.org/mmap.html ), as we can keep a database in memory for most current hardware systems. It also should not be too difficult to design a (simple) binary format. Basically something like this:
i.e. something like this:
(this should be further refined, but basically more or less directly correspond to PGN features to make conversion simple and without loosing information) The great advantage of SQLite btw is that everything is in one file. A program author could create additional "indices" into byte structures / extra tables which are not required for interop, i.e. purely optional/proprietary, but accelarate certain search queries at the cost of space. I think Chessbase did something similar what they dubbed "search booster". An alternative to SQL might be HDF5, but it certainly is not as widespread as SQlite. PS: Have a look at your 'typical' club player. They will never be able to type in a CQL query. They will always need a GUI, no matter if the backend is CQL or something else. |
Beta Was this translation helpful? Give feedback.
-
From a post by Fulvio on TalkChess:
Here's the CQL syntax for comparison: |
Beta Was this translation helpful? Give feedback.
-
I have implemented the parser for that language (Attempt 10th). All works so well! |
Beta Was this translation helpful? Give feedback.
-
In this post, we will make a brief about what we have/done and then what issues we are facing. We would like to hear your opinions, suggestions and get help from some experts.
A. Current status
1. Building databases
With the latest Attempt 9th, we store game contents into SQLite databases in a simple, straightforward way. Each game contains two important fields, one is a string of FEN (empty if the game started from the start position), the other is a string of all moves. The last string is not yet split nor parsed into individual moves.
This way makes the building process be very fast. A PGN file of 3.45 million games required under a minute to be converted into a SQL database stored on the hard disk (the program used only one thread, running on my old computer Quad-Core i7, 3.6 GHz, 16 GB RAM).
2. Extract moves/positions
Because we don’t store any information about moves nor their positions we much create all those information on the fly, whenever we need. Basically to do position matching or searching, we need to read all games from the database, parse the text of moves into moves, make all those moves in a chessboard to get all information about their positions.
In our tests, the time to read all games from the SQLite database is just similar to the time to read them from the PGN file. Total time to read and parse all games of a database 3.45 million games is under 4 minutes on my computer (IMHO, that is fast, somewhat is comparable to SCID - the fastest chess database program so far).
The code to read all games, get their pair strings of FEN-moves without splitting and parsing as the following:
Below is the code to parse the pair strings into a chessboard:
3. Exact-position-matching and approximate-position-searching
Both matching and searching need the information of all positions which are created on the fly as mentioned in the above part. After parsing/making each move the program calls a callback function, provides it with redundant parameters, including a full set of bitboards, king squares, hash key of the position. We can add code for that callback function to do any task we need.
For example, to find all positions with 3-White-Queens we create the callback as the following. The code takes the bitboard of all Queens, do operator AND with the bitboard of side White and then count all bit 1, using the function popCount:
All works well. The callback function runs very fast. It takes some extra time but just a little bit, thus the total time is quite close to the time needed to read and parse games (about 4 minutes).
B. The issue and considering solutions
Even the way to match/search as above part works, it requires hard-coding. We need to write some code, compile and run it. That is impossible for average users.
We need solutions for this issue. It should help users to query what they want without coding.
One of the simplest solutions is to use dialog boxes. The program creates some dialog boxes, users tick some boxes, select something from dropdowns. After all the program will run the callback function with parameters from that dialog box (using some If-Switch commands). However, using dialog boxes has a huge limit since they can’t cover all cases and any effort to expand covering may pay by heavy complexity. Dialog boxes are suitable for the programs with GUI but not for ones in the form of console/command lines.
We think solutions should relate to the text-only: users can query what they want just by entering some simple text without compiling and rerunning the program.
At the moment we have been considering two solutions as the below.
1. Using CQL (Chess Query Language)
Frankly speaking, even though I have known CQL for a while but I didn’t high evaluate it nor plan to use it since it looked so complicated for me. I didn’t learn to use it and was confused about its license. Clearly, I didn’t have enough information and avoided it as much as possible. However, some experts have given me some simple examples of usage, enough for me to understand how strong it is and how simple to use it is (at least there are some simple ways to use it).
If we could integrate CQL with SQL databases all will be done in a quick way. CQL has been developing for a long time by a good team thus it should be good for its tasks.
We have started finding more information, got the latest code as well as some confirmations from CQL authors (thanks a lot and high appreciation to them). The authors don’t have any objection about using/integrating their code and may support us too.
The good news is that the code written by those authors is an MIT license, compatible with the license of this project, and could be used in any other project.
However, they warned us that CQL is using some code from SCID with a GPL license which conflicts with our license. They can’t contact Shane Hudson (the first author of SCID) to change the license. That code is mostly for PGN parsing and may take a lot of effort to replace. Yes, we have code to parse PGN but looks like not an easy task (to replace).
2. Developing a simple expression parser
Instead of coding, users can enter an expression as a text. The program will compile, auto changes it to parameters, and run with the callback function. For example, users can enter such as below expression strings:
Since that is just simple expressions, short strings, focusing on working with bitboards, and some specific chess constants (such as d4), it should be much easier to implement than CQL. Of course, that is our code we won’t have any problem with license conflicts.
C. Discussion
We love to hear your opinions, especially someone who has experience working and/or implementing CQL. Any suggestion, help on implementation is highly appreciated.
Beta Was this translation helpful? Give feedback.
All reactions