Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4.0] Restructuring Installation, splitting up SQL #28350

Merged
merged 19 commits into from
Apr 4, 2020

Conversation

Hackwar
Copy link
Member

@Hackwar Hackwar commented Mar 15, 2020

Pull Request for Issue #28332.

When the installation was redone for 4.0, it seems as if the serverside scripts were more or less bypassed and partially duplicated. I tried to clean this up and also split up the huge SQL file into smaller, more manageable chunks.

This PR is "final", but if we follow this path, we would have to do some more work. Instead of the loader animation, we would have to add a modal that displays what is currently done. (Base SQL, Support SQL, Extensions, custom SQL, localisation SQL, writing configuration) and especially the last step, the config step, would have to be split up further.

…4install

# Conflicts:
#	installation/sql/mysql/joomla.sql
#	installation/sql/postgresql/base.sql
@wilsonge
Copy link
Contributor

wilsonge commented Mar 15, 2020

SQL seems dodgy - we're going to add assets and extension table rows for extensions that don't have a valid DB table :/

Also not sure what happens during rollback here. Currently if the tables fail to populate or the db fails to create we call the configurationmodel::deleteconfiguration method. Which your now trashing which I think would mean if something fails would now end up with us stuck (as the trigger for the postinstallation view is the configuration.php file existence).

Overall not against the concept of splitting some of this up (assume this would also help with drone installation failures). But I think there's still quite some work to be done in this PR before it's ready

@Hackwar
Copy link
Member Author

Hackwar commented Mar 15, 2020

Concerning the "dodgy" SQL: Yes, we could split this up further and have stuff like the extension entries moved to the extensions SQL file, but I wanted to keep it simple in the first iteration. Especially since stuff like nested sets require quite a set of queries to insert new rows...

Regarding the rollback: Right now we are writing the configuration.php and executing all the SQL stuff plus some more stuff in just that config step and that is pretty crappy. The whole code structure is quite a mess. Anyway: Right now we are first writing the configuration.php and then we are executing all the queries. This changes that to first setup the DB and THEN to write the configuration.php. So in case of a "rollback", we are simply starting in the installation screen again.

@richard67
Copy link
Member

TBH, I am not sure if this is the right approach.

If I'd split up joomla.sql, I'd first split it up into two sql files only for the beginning: One for the schema and one for data.

Then we could see if that is already enough.

Then, as far as I remember each SQL statement is executed separately in PHP because they are checked and if necessary modified for an utf8mb4 to utf8 downgrade if necessary. So it should be easy to add some error log statement in PHP for each statement to see by the time stamps of these logs where we lose so much time.

If necessary then we could change the installation of extensions so that the extensions installer is used, using extension-specific installation sql, like it would be with a discovery installation, so PHP code would care for stuff like insert into extensions table, menus and assets and so on, that would save us some SQL statements in the data file.

Possibly it could also help with time to create indexes first with offline status and later after having inserted initial data set them online.

I will try to find time on weekend to do that, find out where we lose the time with use of some error log or debug output, then I'll report back here the results and my recommendations and eventually make an alternative PR.

@alikon
Copy link
Contributor

alikon commented Mar 17, 2020

plus don't forget that mysql is not able to rollback DDL

@richard67
Copy link
Member

To show what advantages it would have to split schema and data, I've made a branch where I have split the joomla.sql file into joomla_schema.sql and joomla_data.sql. Up to now it has only this change, i.e. nothing changed in PHP yet: https://github.com/richard67/joomla-cms/tree/4.0-dev-split-joomla-sql-1.

But you can immediately see the advantage when comparing the 2 joomla_data.sql files of the 2 database types with a good comparison tool, e.g. BeyondCompare, TotalCommander or whatever: You see more quickly the differences between them, and there are a lot, e.g. for PostgreSQL we do not set assed_id for diverse table data. And there are other differences. So it seems I have to make at least 1 PR to fix PostgreSQL data.

And I think maintenance for the pure data file will be easier, no need to scoll up and down miles to look up some asset_id for some menu.

Another advantage of separate file(s) for data and schema is that for the data files we don't need to check for stuff like utf8mb4 to utf8 query downgrade.

When I execute the schemal sql in PhpMyAdmin here, it takes some 1 second for all. For the data script it takes fractions of a second. But when I install J4, using the same database server, the database step takes some 30 to 40 seconds. So I think we don't lose the time in SQL, we lose it in PHP.

This all just for discussion. @Hackwar I think it is related to this PR, but if you don't want that to be discussed here I will stop and continue elsewhere.

@Hackwar
Copy link
Member Author

Hackwar commented Mar 17, 2020

At the beginning of this PR, the goal was to split up the SQL to make it run on underpowered hosts. But now that it is "done", it is rather a cleanup of the installation. The installation currently has a semi-sophisticated system to process a list of tasks for the actual installation and we are first dbcheck and then... mash everything else into the task "config". As if that wasn't opaque enough, that config step uses methods from several models and those are basically distributed randomly between these models. I'm not claiming that this is perfect already, but at least a lot better than before. How to split up the SQL I don't really care.

@richard67
Copy link
Member

@Hackwar

But now that it is "done"

What do you mean with it is done? Here in this PR?

How to split up the SQL I don't really care.

But with this PR you determine how it will be split. And I think it's not a good way.

@Hackwar
Copy link
Member Author

Hackwar commented Mar 17, 2020

But now that it is "done"

What do you mean with it is done? Here in this PR?

I mean the way the PHP has been rearranged.

How to split up the SQL I don't really care.

But with this PR you determine how it will be split. And I think it's not a good way.

Feel free to do a PR against this PR to re-order this. But I have to say that I don't think the distinction between schema and content of a table is a good one either. We will at one point come to a situation where we are supporting installations with different feature sets and that will mean for example a reduced number of tables what in here is in the extensions.sql. So a split along the lines of schema and data will not help here.

@richard67
Copy link
Member

Feel free to do a PR against this PR to re-order this.

@Hackwar Not sure if that would make sense because too different approach, but I will check and do so if useful. Regarding the conflicts: They come from my recently merged PR #28345 . Shall I help you with solving the conflicts? The changes from my PR will still be necessary with your PR, I think.

@Hackwar
Copy link
Member Author

Hackwar commented Mar 18, 2020

That would be good, yes.

@richard67
Copy link
Member

@Hackwar Hmm, it's not simply solving conflicts at the same place. Because you restructured things in your PR, my change has to be applied differently. In general the change was that the createDatabase function of the databae model throws exceptions, but in the calling functions those were not handled, and so the configuration file was not deleted after failure. If you think you can apply this, do so and then just solve conflicts by taking yours, otherwise I will have to check later after work when I'm at home if I can make a PR for your branch for that purpose.

@richard67
Copy link
Member

@Hackwar Additional merge conflicts now in joomla.sql files, coming from your merged PR #28319 . I think those you can fix yourself. If not, ping me. But it might take some time until I can help, latest on weekend.

@richard67
Copy link
Member

@Hackwar Have done my best, see Hackwar#37.

@richard67
Copy link
Member

@strawhatgami You have tested with the last changes? Those should have fixed that.

@richard67
Copy link
Member

@strawhatgami If that helps you: You can download full installation packages "Joomla_4.0.0-beta1-dev+pr.28350-Development-Full_Package" here: https://ci.joomla.org:444/artifacts/joomla/joomla-cms/4.0-dev/28350/downloads/30772/. Just download the package again, or if you use a git clone or Patchtester, revert the patch of this PR if you still have it applied, then fetch the latest changes and apply the patch again. The last change should have solved the issue with the falsely reported database error related to UTF-8 Multibyte conversion.

@strawhatgami
Copy link

@richard67 I download the full updated package for each test, so I used your link, thanks.
Here is my new screen:
Capture d’écran de 2020-03-31 02-41-52

@richard67
Copy link
Member

@strawhatgami As you can see it doesn't show the error about missing UTF-8 Multibyte conversion. The error about wrong schema version is normal for these packages build for the PR's. So if you want you can give this PR here a good test result.

Merge upstream/4.0-dev into origin/j4install (2)
@strawhatgami
Copy link

I have tested this item ✅ successfully on 5feaae1


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/28350.

@richard67
Copy link
Member

I have tested this item ✅ successfully on 5feaae1

I'm still not a fan of how the sql is split up to the files, but that might be optimized later with another PR.

I have tested new installation with and without the patch of this PR applied, and both of this on MySQL 5.7 and PostgreSQL 11.

In each case I have exported structure and data of the database into an SQL file.

I have verified that there is no difference in structure and data between a new install with and without this PR, except of those differences in data which are expected for different installations (e.g. id of super user, update times, ...).

I can also say that the database installation part feels a bit faster with the patch of this PR applied in case of new installation.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/28350.

@richard67
Copy link
Member

RTC


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/28350.

@joomla-cms-bot joomla-cms-bot added the RTC This Pull Request is Ready To Commit label Apr 2, 2020
@rdeutz rdeutz added this to the Joomla 4.0 milestone Apr 4, 2020
@rdeutz rdeutz merged commit f3d7f6b into joomla:4.0-dev Apr 4, 2020
@joomla-cms-bot joomla-cms-bot removed the RTC This Pull Request is Ready To Commit label Apr 4, 2020
@alikon
Copy link
Contributor

alikon commented Apr 4, 2020

without a strong separation between DDL and DML
this is only a cosmetic one
ymmv

@richard67
Copy link
Member

@alikon I think about doing that, separate DDL and DML oir at least reorganise DML so it is easier to be maintained. But when I have tested this PR here, it really seemed to me that database installation was faster than before, and PHP has been cleaned up a bit, so I think it wasn't only cosmetic.

@Hackwar
Copy link
Member Author

Hackwar commented Apr 4, 2020

Again, the goal wasn't to get a perfectly split up SQL, but to fix our installation steps. A next step now is to visualise the progress. So instead of just having a rotating logo, it should display a modal with a list of steps and a checkmark next to them.

@richard67
Copy link
Member

richard67 commented Apr 4, 2020

it should display a modal with a list of steps and a checkmark next to them.

@Hackwar Exactly what I want. Will you do that, or should we ping some JS guru?

Update: Maybe not a modal.

@Hackwar
Copy link
Member Author

Hackwar commented Apr 4, 2020

I also disagree that we should split this into structure and data. Instead I want minimal viable system and then the different extensions. The current split is an attempt at that.

But if you are unhappy with the current split, feel free to propose a better one in another PR. I don't mind.

@richard67
Copy link
Member

We can have both, split into minimal and extended functionality and in addition a de-coupling of the order of processing for DDL and DML, which never had another reason than our tradition and maybe the ability to look up column names for insert statements in the create table statements above, but that never was a good reason. E.g. I think instead of hard-coding asset_id columns we could get them by clever select, such things.

@richard67
Copy link
Member

richard67 commented Apr 4, 2020

I think this PR here was a good step and later changes I have in mind will be small optimizations maybe, nothing fundamental.

@alikon
Copy link
Contributor

alikon commented Apr 4, 2020

as you may already knows Mysql doesn't allow to rollback for DDL instructions
that's why in my view we strictly need to separate DDL from DML
if i can work with only postgresql then this is not an issues but as postgresql it is used by me and probably from another couple of users
this should be a main point of attentention
it's not about performance as we are in installation phase its about logic
my 0.02€ cent

@Hackwar
Copy link
Member Author

Hackwar commented Apr 4, 2020

@alikon I disagree here. This is about performance, about user experience and about error handling. Before this was merged, the installation had just 2 steps (checking the DB and everything else) and that "everything else" step was slightly resource intensive. On a slow machine this could time out. By splitting this up into far more and more logical steps, this practically prevents the PHP timeouts that some people experienced in the past.

By splitting this up, you also have ways to identify an error. Right now it could fail basically everywhere and the error response would always be identical. Did it fail when populating the tables or when converting tables to utf8mb4 or when creating the configuration.php? Since it is just one big lump of a step, you wouldn't really know.

Last but not least, even on faster hosts, the installation takes some time. It isn't obvious if the process is still running or if it hangs. With the above mentioned modal, we would now be able to give the user feedback, that it is still processing the installation.

@Hackwar Hackwar deleted the j4install branch April 4, 2020 14:40
@Hackwar
Copy link
Member Author

Hackwar commented Apr 4, 2020

Anyway, thanks @rdeutz for merging. :-)

@alikon
Copy link
Contributor

alikon commented Apr 4, 2020

i'm not questioning if this is better or worst than before----- i trust you and people who made test on it
i'm just questioning on how we can do it even better for the next step just that
and sorry if i'm off.topic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants