CHERRY-PICK from dev to point-release/23103 PR #17526 #17811

nick-l-o3de · 2024-04-23T16:13:47Z

Improve ap startup on linux (#17526)

Improves the startup time of AP on linux.

Note that this could also potentially improve it on mac and windows, as well as metafile processing in general.

Most of this change is just moving the initialization of the catalog until the file cache is warmed up and then using the file cache to fix path cases on case sensitive file systems.

The rest of it is fixing the unit tests to account for this - the mock for the file system wasn't sufficient to actually use this new code as it would do things like always answer false for "is directory".

With this change, all unit tests pass in debug on linux, with no unexpected assertions and no failures. In addition, AP starts and idles very quickly without any long freezes during startup.

Fixes a warning I'm not sure I created

During Createjobs, Asset Processor Manager does this:

Q_EMIT CreateJobsDurationChanged(sourceAsset.RelativePath().c_str(), sourceAsset.ScanFolderId());

However, scan folder ID of type AZ::s64 which hasn't been registered to Qt serializer's meta object system, so it does not know how to convert it to a byte array to emit messages across threads. This causes warnings and probably causes that message not to be emitted.

Fixes for windows

Most of the problems were in oversights in existing tests. For example the asset catalog tests use a virtual file system but were using CreateDummyFile which makes real files.

Likewise, the asset catalog was using SystemFile instead of FileIO which was the mock file IO. So it would be making mock files in memory but then checking real files on disk, etc.

Note that the Pass-thru file cache is only used in unit tests, or when specifically asked for by a command line parameter, which is not used in any of the test scripts.

Improve ap startup on linux (o3de#17526) * Improves the startup time of AP on linux. Note that this could also potentially improve it on mac and windows, as well as metafile processing in general. Most of this change is just moving the initialization of the catalog until the file cache is warmed up and then using the file cache to fix path cases on case sensitive file systems. The rest of it is fixing the unit tests to account for this - the mock for the file system wasn't sufficient to actually use this new code as it would do things like always answer false for "is directory". With this change, all unit tests pass in debug on linux, with no unexpected assertions and no failures. In addition, AP starts and idles very quickly without any long freezes during startup. Signed-off-by: Nicholas Lawson <70027408+nick-l-o3de@users.noreply.github.com> * Fixes a warning I'm not sure I created During Createjobs, Asset Processor Manager does this: ```cpp Q_EMIT CreateJobsDurationChanged(sourceAsset.RelativePath().c_str(), sourceAsset.ScanFolderId()); ``` However, scan folder ID of type AZ::s64 which hasn't been registered to Qt serializer's meta object system, so it does not know how to convert it to a byte array to emit messages across threads. This causes warnings and probably causes that message not to be emitted. * Fixes for windows Most of the problems were in oversights in existing tests. For example the asset catalog tests use a virtual file system but were using CreateDummyFile which makes real files. Likewise, the asset catalog was using SystemFile instead of FileIO which was the mock file IO. So it would be making mock files in memory but then checking real files on disk, etc. Note that the Pass-thru file cache is only used in unit tests, or when specifically asked for by a command line parameter, which is not used in any of the test scripts. Signed-off-by: Nicholas Lawson <70027408+nick-l-o3de@users.noreply.github.com>

Code/Tools/AssetProcessor/native/AssetManager/FileStateCache.cpp

lemonade-dm · 2024-04-24T19:59:41Z

Code/Framework/AzToolsFramework/AzToolsFramework/Asset/AssetUtils.cpp

+        // The reason this is so expensive is that it could also be the case that the DIRECTORY path is the wrong case.
+        // For example, the actual path might be /SomeFolder/textures/whatever/texture.dat
+        // but the real file on disk is actually /SomeFolder/Textures/Whatever/texture.dat 
+        // Note the case difference of the directories.  If you were to ask the operating system just to list all of
+        // the files in the input folder (/SomeFolder/textures/whatever/*) it would not find any since the directory 
+        // does not itself exist.  Not only that, but many operating system calls on case-insensitive file systems
+        // actually use the input given in their return - that is, if you ask a WINAPI call to construct an absolute path
+        // to a file and give it the wrong case as input, the output absolute path will also be the wrong case.
+        // Thus, to actually correct a path it has to start at the bottom and whenever it encounters
+        // a path segment, it has to do a read of the actual directory information to see which files exist in that
+        // directory, and whether a file or directory exists with the correct name but with different case.
+
+        // if checkEntirePath is false, we only check the file name, and nothing else.  This is for the case where
+        // the path is known to be good except for the file name, such as when the relPathFromRoot comes from just
+        // taking an existing, known-good file and replacing its extension only.


Technically if this function was part of a class or struct with state. Or if we had a caching structure, we could "index" any file paths that were found with the correct case in a set<AZ::IO::Path> or unordered_set<AZ::IO::Path> where each AZ::IO::Path uses the Windows path separator.
Alternatively even an in-memory only sqlite table could be populated as the application is running to store of the correct cased path and it's case-insensitive hash value

When an AZ::IO::Path uses the Windows path separator it hashes it's contents case-insensitively.
Also comparing two paths that uses Windows path separators would compare each path segment case-insensitively as well.

The HashPathCompareValidation parameterized unit-test verifies that paths that compare equal case-insensitively also hashes to the same value.

Technically a index of file paths with the correct case can be built up lazily (as this function is called) or it can be built as paths are added to the Asset Catalog.
As this function is calling FileIOBase::FindFiles and visiting every file within a directory recursively, the index of file paths can be built for each file in a directory the first time it is visited and then every time afterwards, the file index can be checked to see if the correct case exist.

Interesting - in this case, I'm not sure its quite necessary until we rearchitect the thing, since the system calling into this function does have state, and does have its own case insensitive hash - its only calling this function when it encounters a case insensitive hash miss to make sure that what it then adds into the cache is authoritatively the correct case.

I assume this is not something we need to fix for point release before this goes in?

Well there is one low-hanging fruit, which we could tackle if we need to improve the performance in the future even more.
That is the "potential" repeated calls to FileIOBase::FindFiles. Since that could be called multiple times for the same directory, that directory information with the correct case and case-insensitive directory hash could be a quick win without storing too much data in-memory.

However we can wait on that.

Signed-off-by: Nicholas Lawson <70027408+nick-l-o3de@users.noreply.github.com>

nick-l-o3de requested review from a team as code owners April 23, 2024 16:13

byrcolin added sig/content Categorizes an issue or PR as relevant to SIG Content. sig/core Categorizes an issue or PR as relevant to SIG Core labels Apr 23, 2024

nick-l-o3de requested a review from AMZN-daimini April 23, 2024 16:29

AMZN-daimini approved these changes Apr 23, 2024

View reviewed changes

lemonade-dm approved these changes Apr 24, 2024

View reviewed changes

Fix spelling mistake for PR

e08911d

Signed-off-by: Nicholas Lawson <70027408+nick-l-o3de@users.noreply.github.com>

nick-l-o3de merged commit 33a77b7 into o3de:point-release/23103 Apr 24, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CHERRY-PICK from dev to point-release/23103 PR #17526 #17811

CHERRY-PICK from dev to point-release/23103 PR #17526 #17811

nick-l-o3de commented Apr 23, 2024

lemonade-dm Apr 24, 2024 •

edited

nick-l-o3de Apr 24, 2024

nick-l-o3de Apr 24, 2024

lemonade-dm Apr 24, 2024

CHERRY-PICK from dev to point-release/23103 PR #17526 #17811

CHERRY-PICK from dev to point-release/23103 PR #17526 #17811

Conversation

nick-l-o3de commented Apr 23, 2024

lemonade-dm Apr 24, 2024 • edited

Choose a reason for hiding this comment

nick-l-o3de Apr 24, 2024

Choose a reason for hiding this comment

nick-l-o3de Apr 24, 2024

Choose a reason for hiding this comment

lemonade-dm Apr 24, 2024

Choose a reason for hiding this comment

lemonade-dm Apr 24, 2024 •

edited