Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numerical sorting #226

Merged
merged 1 commit into from
Jan 5, 2015
Merged

Numerical sorting #226

merged 1 commit into from
Jan 5, 2015

Conversation

dantleech
Copy link
Contributor

This PR adds an extra column numerical_props which is populated with LONG, DECIMAL and FLOAT values (in an XML doc).

When sorting the query will ORDER first by casting the value of the property in the numberical_props column and then by the standard props column.

SELECT ... WHERE ...  ORDER BY 
CAST(EXTRACTVALUE(n0.numerical_props, '//sv:property[@sv:name="sulu:order"]/sv:value[1]') AS DECIMAL),
EXTRACTVALUE(n0.props, '//sv:property[@sv:name="sulu:order"]/sv:value[1]')

Benchmarking

  • The performance should be the same when no ORDER BY is issued.
  • We should expect a small performance penalty when ORDER BY is issued.

Master:

SELECT \* FROM [nt:unstructured];
2143 rows in set (1.91051 sec)
2143 rows in set (1.96289 sec)
2143 rows in set (1.87208 sec)
SELECT \* FROM [nt:unstructured] ORDER BY [i18n:de-title];
2143 rows in set (1.99825 sec)
2143 rows in set (1.99817 sec)
2143 rows in set (2.02555 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE i18n:de-title IS NOT NULL ORDER BY [i18n:de-title];
330 rows in set (1.63120 sec)
330 rows in set (1.63655 sec)
330 rows in set (1.65118 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service') AND i18n:de-title IS NOT NULL ORDER BY [sulu:order], [i18n:de-title] DESC;
57 rows in set (0.58143 sec)
57 rows in set (0.58995 sec)
57 rows in set (0.59365 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service');
57 rows in set (0.51940 sec)
57 rows in set (0.51984 sec)
57 rows in set (0.49899 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service') ORDER BY [i18n:de-title];
57 rows in set (0.53426 sec)
57 rows in set (0.51620 sec)
57 rows in set (0.53260 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service') AND i18n:de-title IS NOT NULL ORDER BY [i18n:de-title];
57 rows in set (0.54159 sec)
57 rows in set (0.55744 sec)
57 rows in set (0.55175 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service') AND i18n:de-title IS NOT NULL ORDER BY [sulu:order], [i18n:de-title] DESC;
57 rows in set (0.55607 sec)
57 rows in set (0.59970 sec)
57 rows in set (0.58270 sec)

This PR:

SELECT \* FROM [nt:unstructured];
2143 rows in set (1.84274 sec)
2143 rows in set (1.94709 sec)
2143 rows in set (1.86058 sec)
SELECT \* FROM [nt:unstructured] ORDER BY [i18n:de-title];
2143 rows in set (2.04875 sec)
2143 rows in set (2.04055 sec)
2143 rows in set (2.00308 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE i18n:de-title IS NOT NULL ORDER BY [i18n:de-title];
330 rows in set (1.63383 sec)
330 rows in set (1.63438 sec)
330 rows in set (1.63745 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service') AND i18n:de-title IS NOT NULL ORDER BY [sulu:order], [i18n:de-title] DESC;
57 rows in set (0.57677 sec)
57 rows in set (0.60930 sec)
57 rows in set (0.58204 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service');
57 rows in set (0.48024 sec)
57 rows in set (0.47330 sec)
57 rows in set (0.48214 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service') ORDER BY [i18n:de-title];
57 rows in set (0.52143 sec)
57 rows in set (0.52957 sec)
57 rows in set (0.52393 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service') AND i18n:de-title IS NOT NULL ORDER BY [i18n:de-title];
57 rows in set (0.54349 sec)
57 rows in set (0.55270 sec)
57 rows in set (0.54978 sec)
SELECT \* FROM [nt:unstructured] AS a WHERE ISDESCENDANTNODE(a, '/cmf/stanton/contents/service') AND i18n:de-title IS NOT NULL ORDER BY [sulu:order], [i18n:de-title] DESC;
57 rows in set (0.56815 sec)
57 rows in set (0.58603 sec)
57 rows in set (0.55674 sec)

@dantleech
Copy link
Contributor Author

/cc @wachterjohannes

@@ -837,70 +845,85 @@ private function syncReferences()
}
}

public static function xmlToProps($xml, ValueConverter $valueConverter, $filter = null)
public static function xmlToProps($xmlData, ValueConverter $valueConverter, $filter = null)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We split the properties into 3 separate DOM documents, for LONG, DECIMAL and STRING

@dbu
Copy link
Member

dbu commented Dec 2, 2014

wow, pretty involved. i wonder if we should not use http://www.postgresql.org/docs/9.3/static/datatype-json.html or something similar, if we have to go away from simply serializing to xml anyways. the json field should be efficient for search - and it might better allow for mixed data (unless i did not grasp the full issue).

@dantleech
Copy link
Contributor Author

@dbu would storing in JSON hold any advantages here? (from what I can gather it might), but I don't think JSON column types are supported in most versions of MySQL.

@dbu
Copy link
Member

dbu commented Dec 2, 2014

hm, indeed mysql might be a problem here. and sqlite, and old postgres versions.

just wondering if there is something simpler than xml if all we store is just a "hashmap" with a known value format. but indeed, xml works and anything else opens a new can of worms.

@dantleech
Copy link
Contributor Author

The other option would be to have a separate table for the properties, which is something similar to what EZ publish does I think. But then thats basically a different transport.

@dbu
Copy link
Member

dbu commented Dec 2, 2014

and we end up with EAV with its performance problems.

@dantleech
Copy link
Contributor Author

Maybe it could be an option if we adopt a more modular architecture in Jackalope "2.0".

@dantleech
Copy link
Contributor Author

@dbu @lsmith77 would you have any objections to this solution assuming that performance would not be affected? i.e. shall I continue?

@lsmith77
Copy link
Member

lsmith77 commented Dec 4, 2014

I am generally open to this approach. Performance is important and also the behavior of Jackrabbit in this case.

@dantleech
Copy link
Contributor Author

Updated. Query tests are failing locally because of query parsing issues.. which I don't think are related.

Next I will run this through @lsmith77 s benchmarking script and see what happens.

@@ -207,6 +207,10 @@ public function __construct(FactoryInterface $factory, Connection $conn)
private function registerSqliteFunctions(PDOConnection $sqliteConnection)
{
$sqliteConnection->sqliteCreateFunction('EXTRACTVALUE', function ($string, $expression) {
if (null === $string) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numerical_prop can be nullable.

@dantleech
Copy link
Contributor Author

Have updated and added some performance stats. Basically it is expensive enough to be a problem at the moment. Although, the MySQL perf. penalties ~0.02s > ~0.04s whilst the roundtrip penalties are maybe 5x greater, so I expect that we can do some code optimization.

That said -- the penalty is always going to be significant. Shall I make this an option?

@lsmith77
Copy link
Member

yeah .. I think it will need to be an option, ideally changeable at runtime.

if (!empty($xmlData['long_props'])) {
$xmlFields[] = $xmlData['long_props'];
}
if (!empty($xmlData['decimal_props'])) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we do not actually need separate columns here.. we can just have a numberical_props column cast to decimal regardless.

@dantleech
Copy link
Contributor Author

@lsmith77 at one level we need to know at code-time because we need to persistently populate the number column, or I guess we could always populate that column. A runtime option could be used to toggle if we select from / order by it.

We could also always store all the value in props and copy the numerical values into the other column. Not very efficient, but if its an option than I guess its not too important.

@lsmith77
Copy link
Member

will we only populate those properties when the values are actually numeric? but yeah I think its ok to always populate them.

@dantleech dantleech changed the title WIP allow sorting by integers WIP allow sorting by numbers Dec 11, 2014
@@ -0,0 +1,32 @@
SELECT \* FROM [nt:unstructured];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of committing this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything in this folder is an accidental commit.

@dbu
Copy link
Member

dbu commented Dec 28, 2014

restarted the build. but the errors are only about query parsing tests.

@dantleech can you squash the commits? i would say this is ready to merge now unless you know of something we missed.

$propsData = array('dom' => $dom);
if ($row['numerical_props']) {
$numericalDom = new \DOMDocument('1.0', 'UTF-8');
$numericalDom->loadXML($row['props']);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong.

@dantleech dantleech force-pushed the numberical_sorting branch 2 times, most recently from 433adb9 to a378a50 Compare December 28, 2014 10:18
@dantleech
Copy link
Contributor Author

ok.. tests are fine locally after rebasing and I fixed an issue and added a test when copying nodes. Lets see what happens with travis.

@@ -1055,7 +1120,6 @@ public function getNode($path)
$query = '
SELECT * FROM phpcr_nodes
WHERE path = :path
AND workspace_name = :workspace
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh. wtf.

@dantleech dantleech force-pushed the numberical_sorting branch 3 times, most recently from 27988c9 to f8862a5 Compare December 28, 2014 11:45
@dantleech dantleech changed the title WIP Numerical sorting Numerical sorting Dec 28, 2014
@dantleech
Copy link
Contributor Author

ok! The tests are fine now -- the failing postgres test corresponds to the existing failing test in master.

@dantleech dantleech mentioned this pull request Dec 29, 2014
@@ -0,0 +1,32 @@
SELECT \* FROM [nt:unstructured];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

benchmark.back - did you accidentally commit this?

@dbu
Copy link
Member

dbu commented Dec 29, 2014

apart from the benchmark.back folder, this looks good to me. @lsmith77 can you also do a final review?

@@ -48,7 +48,7 @@ public function testDefaultQuery()
$this->nodeTypeManager->expects($this->once())->method('getSubtypes')->will($this->returnValue( array() ));

$query = $this->factory->createQuery($this->factory->selector('nt:unstructured', 'nt:unstructured'), null, array(), array());
list($selectors, $selectorAliases, $sql) = $this->walker->walkQOMQuery($query);
list($telectors, $selectorAliases, $sql) = $this->walker->walkQOMQuery($query);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tlectors??

- New column "numerical_props" which is populated by numerical properies
  and which is used exlusively for ordering.
@dantleech
Copy link
Contributor Author

Removed the accidentally comitted benchmark files.

@lsmith77
Copy link
Member

wondering if we could make the dbal:init command smart enough to detect if the schema needs to be updated. or maybe we should have a specific command for this? then again .. for now I guess we will just need to provide alter statements and data migration for this change.

@dantleech
Copy link
Contributor Author

I think that it makes sense that init:dbal should migate the DB (e.g. if the schema has changed we should check with the user and ask them to specify --force).

But we could do that in another PR I think.

@dantleech
Copy link
Contributor Author

Created #237

@dantleech
Copy link
Contributor Author

@lsmith77 @dbu what do you think?

@dbu
Copy link
Member

dbu commented Jan 4, 2015

the travis fail is only the postgres problem. +1 to merge from me.

(and maybe we should skip the failing test when the db is postgres and the env travis, if we can't find the problem, to have a reliable build)

lsmith77 added a commit that referenced this pull request Jan 5, 2015
@lsmith77 lsmith77 merged commit 93de594 into master Jan 5, 2015
@lsmith77 lsmith77 deleted the numberical_sorting branch January 5, 2015 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants