Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Archive format complete ArchiveWriter implementation #16052

Merged
merged 16 commits into from Jun 5, 2023

Conversation

lemonade-dm
Copy link
Contributor

@lemonade-dm lemonade-dm commented Jun 1, 2023

Archive Writer Changes

The ArchiveWriter implementation can mount an archive file, where it can
add/remove content files from the archive, as well as optionally compress the
content files being added.

It also supports compressing the Archive Table of Contents using a
compression algorithm registered with the Compression gem.

The ability to write to a new archive or an existing archive is
supported by the implementation.

Test Environment changes

Updated the GemTestEnvironment to support specifying gems as active in UnitTest

A new AddActiveGems function has been added that accepts a list of gem
names and for each gem query there gem root location using the o3de
manifest.json files and add there gem root locations to the Settings
Registry and as a FileIO @gemroot:<gem-name>@ alias.

Path API changes

Adding AZ_PATH_ARG macro to make it easier to use AZ::IO::Path in format specifiers

AutomatedTesting Active Gem changes

Adding the Compression and Archive gem to the AutomatedTesting project

o3de CLI changes

The edit-engine/project-properties command now sorts it's "gem_names" array field

The edit-engine-properties and edit-project-properties command has
been updated to sort the gem names array inside of their respective
manifest .json files for consistency

Task System changes

Fixed deadlock when Submitting a TaskGraph with zero tasks and a TaskGraphEvent.
When the TaskGraphEvent was waited on using the TaskGraphEvent::Wait call it would attempt to acquire a semaphore that will never be released as there were no tasks that could complete to signal the semaphore.

How was this PR tested?

Unit Test have been added for the Archive Writer implementation.
More UnitTest are on the way as the current set of test are nowhere near sufficient

For the edit-engine-properties/edit-project-properties changes, the enable-gem command was used to update the AutomatedTesting project list of active gems.
As can be seen in the project.json change, the list of gems are sorted.

The `edit-engine-properties` and `edit-project-properties` command has
been updated to sort the gem names array inside of their respective
manifest .json files for consistency

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
…mat specifiers

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
UnitTest

A new `AddActiveGems` function has been added that accepts a list of gem
names and for each gem query there gem root location using the o3de
manifest.json files and add there gem root locations to the Settings
Registry and as a FileIO `@gemroot:<gem-name>@` alias.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
The ArchiveWriter implementation can mount an archive file, where it can
add/remove contentfiles from the archive, as well as optionally compress the
content files being added.

It also supports compressing the Archive Table of Contents using a
compression algorithm registered with the Compression gem.

The ability to write to a new archive or an existing archive is
supported by the implementation.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
@lemonade-dm lemonade-dm requested review from AMZN-alexpete and a team June 1, 2023 03:31
@lemonade-dm lemonade-dm requested review from a team as code owners June 1, 2023 03:31
That test found a bug in the AddFileToResult::operator bool logic where
it would return false when a file was successfully added.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
When the `TaskGraphEvent::Wait` function is used to wait for the
completion of task, if there are no task added to the TaskGraph, a
deadlock will occur.

This changes fixes the issue by invoking `TaskGraphEvent::Signal` if
there are no tasks on the Compiled Task Graph when submitted to the
TaskExecutor.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
content file as part of an Archive.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
The start of each compressed block were being overridden with \0 bytes
instead of being added as padding bytes to the compressed block to reach
512-byte alignment.

Set the compression algorithm ID in the  AddFileToArchive result value
structure.
It is now properly returned as part of the result

Renamed the `ArchiveBlockSize` constant to
`ArchiveBlockSizeForCompression` to indicate that the block size is for viewing a 2 MiB chunk of an uncompressed file when sending it through a compression algorithm.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
The test writes a single file to the archive using the LZ4 compression
algorithm

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
…em is part

of the "gem_names" array for the project.json and engine.json files

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
Comment on lines +429 to +434
auto& compiledTasks = graph.Tasks();
if (compiledTasks.empty() && event)
{
event->Signal();
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch

…::vector

An ambiguity would occur if an `AZStd::vector` was used with a std::
namespace type such as `std::byte`(which AZStd::byte is just an alias
and not a real type)

Therefore ADL brings in both AZStd::equal and std::equal when making an
unalified call to `equal` in the AZStd::vector `operator=` function.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
open/close state

The boolean is just used to track whether a call to `Close()` has
occured on a ByteContainerStream

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
Added a test to validate that an uncompressed file > 6 MIB creates 2 block line entries.

Added test to validate that a file can be added and removed from the
same mounted archive instance in a single Archive Writer.

Also validated file additional and removal across multiple Archive
writen instances, where the first instance, adds a file and the next instance remove a file from the archive.

A test was added to make sure that an existing file in the Archive can be updated using the `ArchiveWriter::AddFileToArchive` function.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
Public API Files that can be used in both the Client and Tool modules of
other Gems  have been moved to the `Include/Archive/Clients` directory
Public API that can be used in only the Tools modules have been moved to
the `Include/Archive/Tools` directory.

The ArchiveToc* files have been moved to the Archive.Private.Object
target as they only need to be used as part of implementing the Archive
Reader and Writer logic and should not be part ofthe public API.
They have been relocated to the `Source/Clients` directory.

Add a new ArchiveWriter test to validate that it can write out a
compressed table of contents(TOC) to a GenericStream.
Reading of an archive with a compressed TOC was also tested.

The testing found issues in the Compression and Decompression logic of
the Table of Contents in the ArchiveWriter.cpp and those logic have been
addressed.

Updated the Archive format to write out an 8-byte magic byte sequence at the beginning of the Table of Contents section in order to detect when an invalid archive TOC of contents has been read.

This is used to avoid crashes within the Archive code and instead inform
the user of an invalid Archive table of contents.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
virtual ArchiveAddFileResult AddFileToArchive(AZStd::span<const AZStd::byte> inputSpan,
const ArchiveWriterFileSettings& fileSettings = {}) = 0;

//! Searches for a relative path within the archive
//! @param relativePath Relative path within archive to search for
//! @return A token that identifies the Archive file if it exist
//! if the with the specified path doesn't exist InvalidArchiveFileToken is returned
virtual ArchiveFileToken FindFile(AZ::IO::PathView relativePath) = 0;
virtual ArchiveFileToken FindFile(AZ::IO::PathView relativePath) const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually going to recommend const initially, but these days I'm on the fence about using const in early API iterations because of the situations when a dev may want to do something non-consty like say caching requests to FindFile or ContainsFile, or something else I'm not thinking of. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the API here for the IArchiveWriter is meant to allow access to the ArchiveWriter across DLL boundaries and be available as part of the Archive.Editor.API target.
It is not an abstract interface meant for other Gems or libraries to implement, so the API is made in mind with the current implementation.

I prefer using const with these APIs as to the outside caller it should appear constant to them.
For a caching request, what should be done is to NOT make the function non-const, but instead add a mutable member that performs the caching

I am of the believe that the internal implementation details of an interface should not leak into the signature of the interface.

{
// Resize the compress block buffer to be the same size as the input buffer
AZStd::vector<AZStd::byte> tocCompressBuffer;
tocCompressionAlgorithmId = m_settings.m_tocCompressionAlgorithm.value();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_settings can override the toc compression algo id from the m_archiveheader? is it possible for the archive header to specify a different toc compression algo than m_settings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the Compression Algorithm is set by the previous instance of the ArchiveWriterSettings that was used to write the Archive before.
So if the a previous use of the ArchiveWriterSettings set the table of contents(TOC) compression algorithm to Zlib, but then the next use of the ArchiveWriterSettings set the TOC compression algorithm to LZ4, then the compression algorithm will be different than the one in the set in the archiveHeader

@lemonade-dm lemonade-dm merged commit 0d766fc into o3de:development Jun 5, 2023
3 checks passed
@lemonade-dm lemonade-dm deleted the archive-writer-impl branch June 5, 2023 15:58
Comment on lines +89 to +91
//! Used to track if the ByteContainerStream has received a Close() call
//! By default the ByteContainerStream is open
bool m_opened{ true };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?

It seems to me that if this container is going to change to support opening and closing, the following should happen:

  1. It should probably get set to true by the constructor that takes in a buffer pointer instead of defaulting to open, so that way if other constructors (like default constructors) get added, they would correctly start as closed, same as the other types of streams.
  2. All of the operations (seek / read / write) should handle the case that they're called when the stream isn't open. Either assert, return back an error or noop, or both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary because the checking if the stream is open via the AZ::IO::GenericStream::IsOpen function would return true even if a Close() call was performed.

This was causing an issue in the ArchiveWrite::UnmountArchive as it invokes Commit to write the Archive Header and Table of Contents if the file was open, which it would continuously do even if a previous unmount call has occurred.

  1. It should probably get set to true by the constructor that takes in a buffer pointer instead of defaulting to open, so that way if other constructors (like default constructors) get added, they would correctly start as closed, same as the other types of streams.

Well I defaulted it to true in the member initializer due to the maintaining the current the ByteContainerStream API is working with being associated with a single stream from the constructor.
I do like the flexibility however supporting an Open function that would take in reference to the buffer type, in which case the state a container being "open" is no longer tied to construction.
With that change it would make sense to no longer default the m_opened to true and instead set it in the constructor/Open

  1. All of the operations (seek / read / write) should handle the case that they're called when the stream isn't open. Either assert, return back an error or noop, or both.

This was kept inline with the other GenericStream implementations where their Seek, Read and Writes don't check if a file is open before performing the operations: https://github.com/o3de/o3de/blob/development/Code/Framework/AzCore/AzCore/IO/GenericStreams.cpp#L223-L238

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not done in GenericStream, but there's some amount of checking in general:
https://github.com/o3de/o3de/blob/development/Code/Framework/AzCore/AzCore/IO/SystemFile.cpp#L118
https://github.com/o3de/o3de/blob/development/Code/Framework/AzCore/AzCore/IO/GenericStreams.cpp#L278-L281

I guess what I was thinking is that Close() should probably clear out the m_buffer pointer, and read/write/seek should be accounting for this, because otherwise you can keep reading/writing even though it's "closed" making the tracking of "closed" not really correct or functional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the ByteContainerStream, to have proper lifetime semantics, so that when the stream is not in the "open state", it will no-op if trying to seek, read or write.

@@ -15,12 +15,26 @@ namespace AZ::IO
enum class OpenMode : AZ::u32
{
Invalid = 0,
//! Open the file in read mode ("r")
//! if ModeRead is specified without ModeWrite, the file is open in file share write mode
//! when on platforms that uses WinAPI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! when on platforms that uses WinAPI
//! when on platforms that use WinAPI

ModeWrite = (1 << 1),
//! Opens the file in append mode ("a")
//! The file is opened and and set the write offset to the end of the file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! The file is opened and and set the write offset to the end of the file
//! The file is opened and sets the write offset to the end of the file

ModeBinary = (1 << 3),
//! Opens the file with the text option("t") set
//! This may affect how newlines are handled on Windows platforms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion:

Suggested change
//! This may affect how newlines are handled on Windows platforms
//! This affects how newlines are handled on different platforms ("\n" vs "\r\n")

@@ -764,4 +764,8 @@ namespace AZ
AZ_TYPE_INFO_SPECIALIZE(AZ::IO::FixedMaxPath, "{FA6CA49F-376A-417C-9767-DD50744DF203}");
}

//! Use this macro to simplify safe printing of a PathView or BasicPath* which may not be null-terminated.
//! Example: AZStd::fixed_string<1024>::format("Safely formatted: %.*s", AZ_PATH_ARG(myString));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion:

Suggested change
//! Example: AZStd::fixed_string<1024>::format("Safely formatted: %.*s", AZ_PATH_ARG(myString));
//! Example: AZStd::fixed_string<1024>::format("Safely formatted: " AZ_STRING_FORMAT, AZ_PATH_ARG(myPathView));

};

//! vector storing a copy of each file path in memory
//! It's length matches the value of m_fileCount
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! It's length matches the value of m_fileCount
//! Its length matches the value of m_fileCount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that alignment issue is bad.
Will fix.

Gems/Archive/Code/Source/Clients/ArchiveTOC.h Show resolved Hide resolved
Comment on lines +104 to +105
tocValidationResult.m_errorMessage = ArchiveTocValidationResult::ErrorString::format(
"The Archive TOC has an invalid magic byte sequence of %llu", tocView.m_magicBytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: %llx for byte sequence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change to use hex.

lemonade-dm added a commit to aws-lumberyard-dev/o3de that referenced this pull request Jun 5, 2023
Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
lemonade-dm added a commit that referenced this pull request Jun 6, 2023
* Added support to the ByteContainerStream class to open a new stream post construction

With this change, a default constructor has been added to the
ByteContainerStream class.
The Seek, Read, Write, GetCurPos, GetLength and Truncate functions have
been updated to now trigger an `AZ_Error` when attempting to perform
their operations on closed byte container stream.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>

* Moved increment of the TaskExecutor::m_graphsRemaining member

The number of graphs being processed now only increments if there is at
least one compiled task to run.

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>

* Addressed feedback in PR #16052 around comments and documentation

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>

* Update Gems/Archive/Code/Include/Archive/Tools/ArchiveWriterAPI.h

Co-authored-by: Mike Balfour <82224783+mbalfour-amzn@users.noreply.github.com>
Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>

* Update Gems/Archive/Code/Source/Clients/ArchiveTOC.h

Co-authored-by: Mike Balfour <82224783+mbalfour-amzn@users.noreply.github.com>
Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>

* Update Gems/Archive/Code/Source/Clients/ArchiveTOC.h

Co-authored-by: Mike Balfour <82224783+mbalfour-amzn@users.noreply.github.com>
Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>

* Update Code/Framework/AzCore/AzCore/IO/ByteContainerStream.h

Co-authored-by: Alex Peterson <26804013+AMZN-alexpete@users.noreply.github.com>
Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>

---------

Signed-off-by: lumberyard-employee-dm <56135373+lumberyard-employee-dm@users.noreply.github.com>
Co-authored-by: Mike Balfour <82224783+mbalfour-amzn@users.noreply.github.com>
Co-authored-by: Alex Peterson <26804013+AMZN-alexpete@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants