Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHERRY-PICK from dev to point-release/23103 PR #17526 #17811

Conversation

nick-l-o3de
Copy link
Contributor

Improve ap startup on linux (#17526)

  • Improves the startup time of AP on linux.

Note that this could also potentially improve it on mac and windows, as well as metafile processing in general.

Most of this change is just moving the initialization of the catalog until the file cache is warmed up and then using the file cache to fix path cases on case sensitive file systems.

The rest of it is fixing the unit tests to account for this - the mock for the file system wasn't sufficient to actually use this new code as it would do things like always answer false for "is directory".

With this change, all unit tests pass in debug on linux, with no unexpected assertions and no failures. In addition, AP starts and idles very quickly without any long freezes during startup.

  • Fixes a warning I'm not sure I created

During Createjobs, Asset Processor Manager does this:

Q_EMIT CreateJobsDurationChanged(sourceAsset.RelativePath().c_str(), sourceAsset.ScanFolderId());

However, scan folder ID of type AZ::s64 which hasn't been registered to Qt serializer's meta object system, so it does not know how to convert it to a byte array to emit messages across threads. This causes warnings and probably causes that message not to be emitted.

  • Fixes for windows

Most of the problems were in oversights in existing tests. For example the asset catalog tests use a virtual file system but were using CreateDummyFile which makes real files.

Likewise, the asset catalog was using SystemFile instead of FileIO which was the mock file IO. So it would be making mock files in memory but then checking real files on disk, etc.

Note that the Pass-thru file cache is only used in unit tests, or when specifically asked for by a command line parameter, which is not used in any of the test scripts.

Improve ap startup on linux (o3de#17526)

* Improves the startup time of AP on linux.

Note that this could also potentially improve it on mac and windows,
as well as metafile processing in general.

Most of this change is just moving the initialization of the catalog
until the file cache is warmed up and then using the file cache to
fix path cases on case sensitive file systems.

The rest of it is fixing the unit tests to account for this -
the mock for the file system wasn't sufficient to actually use this
new code as it would do things like always answer false for
"is directory".

With this change, all unit tests pass in debug on linux, with no
unexpected assertions and no failures.  In addition, AP starts
and idles very quickly without any long freezes during startup.

Signed-off-by: Nicholas Lawson <70027408+nick-l-o3de@users.noreply.github.com>

* Fixes a warning I'm not sure I created

During Createjobs,  Asset Processor Manager does this:
```cpp
Q_EMIT CreateJobsDurationChanged(sourceAsset.RelativePath().c_str(), sourceAsset.ScanFolderId());
```

However, scan folder ID of type AZ::s64 which hasn't been registered to Qt
serializer's meta object system, so it does not know how to convert it to
a byte array to emit messages across threads.  This causes warnings
and probably causes that message not to be emitted.

* Fixes for windows

Most of the problems were in oversights in existing tests.
For example the asset catalog tests use a virtual file system
but were using CreateDummyFile which makes real files.

Likewise, the asset catalog was using SystemFile instead of FileIO
which was the mock file IO.  So it would be making mock files in memory
but then checking real files on disk, etc.

Note that the Pass-thru file cache is only used in unit tests, or when
specifically asked for by a command line parameter, which is not used
in any of the test scripts.

Signed-off-by: Nicholas Lawson <70027408+nick-l-o3de@users.noreply.github.com>
@nick-l-o3de nick-l-o3de requested review from a team as code owners April 23, 2024 16:13
@byrcolin byrcolin added sig/content Categorizes an issue or PR as relevant to SIG Content. sig/core Categorizes an issue or PR as relevant to SIG Core labels Apr 23, 2024
Comment on lines +246 to +260
// The reason this is so expensive is that it could also be the case that the DIRECTORY path is the wrong case.
// For example, the actual path might be /SomeFolder/textures/whatever/texture.dat
// but the real file on disk is actually /SomeFolder/Textures/Whatever/texture.dat
// Note the case difference of the directories. If you were to ask the operating system just to list all of
// the files in the input folder (/SomeFolder/textures/whatever/*) it would not find any since the directory
// does not itself exist. Not only that, but many operating system calls on case-insensitive file systems
// actually use the input given in their return - that is, if you ask a WINAPI call to construct an absolute path
// to a file and give it the wrong case as input, the output absolute path will also be the wrong case.
// Thus, to actually correct a path it has to start at the bottom and whenever it encounters
// a path segment, it has to do a read of the actual directory information to see which files exist in that
// directory, and whether a file or directory exists with the correct name but with different case.

// if checkEntirePath is false, we only check the file name, and nothing else. This is for the case where
// the path is known to be good except for the file name, such as when the relPathFromRoot comes from just
// taking an existing, known-good file and replacing its extension only.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically if this function was part of a class or struct with state. Or if we had a caching structure, we could "index" any file paths that were found with the correct case in a set<AZ::IO::Path> or unordered_set<AZ::IO::Path> where each AZ::IO::Path uses the Windows path separator.
Alternatively even an in-memory only sqlite table could be populated as the application is running to store of the correct cased path and it's case-insensitive hash value

When an AZ::IO::Path uses the Windows path separator it hashes it's contents case-insensitively.
Also comparing two paths that uses Windows path separators would compare each path segment case-insensitively as well.

The HashPathCompareValidation parameterized unit-test verifies that paths that compare equal case-insensitively also hashes to the same value.

Technically a index of file paths with the correct case can be built up lazily (as this function is called) or it can be built as paths are added to the Asset Catalog.
As this function is calling FileIOBase::FindFiles and visiting every file within a directory recursively, the index of file paths can be built for each file in a directory the first time it is visited and then every time afterwards, the file index can be checked to see if the correct case exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - in this case, I'm not sure its quite necessary until we rearchitect the thing, since the system calling into this function does have state, and does have its own case insensitive hash - its only calling this function when it encounters a case insensitive hash miss to make sure that what it then adds into the cache is authoritatively the correct case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is not something we need to fix for point release before this goes in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well there is one low-hanging fruit, which we could tackle if we need to improve the performance in the future even more.
That is the "potential" repeated calls to FileIOBase::FindFiles. Since that could be called multiple times for the same directory, that directory information with the correct case and case-insensitive directory hash could be a quick win without storing too much data in-memory.

However we can wait on that.

Signed-off-by: Nicholas Lawson <70027408+nick-l-o3de@users.noreply.github.com>
@nick-l-o3de nick-l-o3de merged commit 33a77b7 into o3de:point-release/23103 Apr 24, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/content Categorizes an issue or PR as relevant to SIG Content. sig/core Categorizes an issue or PR as relevant to SIG Core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants