Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rebuild support for VersionInfo related structures #34

Merged
merged 1 commit into from
Jul 26, 2023

Conversation

swismer
Copy link
Collaborator

@swismer swismer commented Jul 12, 2023

I introduced a new interface to mark structures that can be updated and written back to a byte stream: ModifiableStructure. All VersionInfo related structures implement it now.

In addition, I removed some fields containing padding information as this should not be stored but recalculated.

Copy link
Owner

@kichik kichik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this require users to call rebuild() before write() so they get the write length? What would that improve?

*
* @return the updated length of this structure in bytes (might include padding)
*/
int rebuild();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do any of the implementations actually rebuild anything? At least in this diff it seems to only calculate the length and not modify the structure itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, in the current PR, there are only length calculations. The next PR will tackle the resources tree, where this method will layout the directories and entries, and updating all pointers between them.


import java.io.IOException;

public interface ModifiableStructure {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the interface makes it modifiable? I feel like I'm missing something with the naming of this.

Copy link
Collaborator Author

@swismer swismer Jul 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, all of the structure classes are modifiable already now. I would propose to rename it to RebuildableStructure, so it better reflects its main method (rebuild).
I will also add a JavaDoc to make it clearer.

@swismer
Copy link
Collaborator Author

swismer commented Jul 15, 2023

The idea behind splitting rebuild and write is to leave the user full control over all the fields. A researcher might deliberately change the length to some unexpected value and still being able to write that to a file. In addition, the executables I analyzed showed that there is not the one and only way of calculating these values. Some use extra padding, some not. Some store empty strings as 1 zero character, others just set the length to 0 and skip all characters.

@swismer
Copy link
Collaborator Author

swismer commented Jul 15, 2023

I updated the PR with a suggestion. I would squash the commits, once the PR is OK for you. Just let me know before merging it...

@kichik
Copy link
Owner

kichik commented Jul 21, 2023

So is that length that researches may want to change in the PE header or the physical length? How do we plan on giving them this control? Right now the length field is private. Would it make sense to expect a subclass for such changes? MyWeirdlyLengthedSection?

I updated the PR with a suggestion. I would squash the commits, once the PR is OK for you. Just let me know before merging it...

I always squash merge. No need to work extra to force push on your side. It's nice when the PR has a full history too.

@swismer
Copy link
Collaborator Author

swismer commented Jul 22, 2023

I'm not sure, I'm getting your point... The original author created all structure classes as simple POJOs having getter and setter methods, so all fields can be controlled from the outside (including all length fields - unless there are some inconsistencies, which I did not see yet). So although the length field is private, I can call getLength() and setLength(len), so there is no need for subclassing.
The length of a structure in memory (as returned by rebuild and not stored anywhere) is not meant to be customizable. In most cases, there is only one correct value, otherwise the PE is corrupt.

Regarding squash merges, I usually prefer keeping all commits if they are well done (what I try to do): Separate commits for separate sub tasks, put refactorings in dedicated commits, do NOT use it as diary ("first I did this, than that, then changed my mind") etc. When having this information in git, it makes investigations easier than having to browse the corresponding PRs to get these details. Github handles force pushes well and shows the diffs without any problems. Why not keeping the discussions and review iterations in the PR page on Github and push a clean, understandable, non-aggregated history to the git tree?

@kichik
Copy link
Owner

kichik commented Jul 24, 2023

I'm not sure, I'm getting your point... The original author created all structure classes as simple POJOs having getter and setter methods, so all fields can be controlled from the outside (including all length fields - unless there are some inconsistencies, which I did not see yet). So although the length field is private, I can call getLength() and setLength(len), so there is no need for subclassing.
The length of a structure in memory (as returned by rebuild and not stored anywhere) is not meant to be customizable. In most cases, there is only one correct value, otherwise the PE is corrupt.

So rebuild() basically calls setLength() for you with the correct value? I don't see the old code calling setLength() in write() or anything like that. But I did miss setLength() so I might have missed that too.

If that's the case, I think just the name needs to change to something like recalculateLength() and we're good. We do want to have correct lengths easily. Do you plan on putting anything else in rebuild() in the future that will be more than length?

Or maybe something like the following would be easier for the user. Each structure can calculate its own length, but also has an optional override length. So getLength() would return the overridden value or calculate the value as it should technically be. That way the user doesn't have to always remember to set the length, unless they want to actually override it. The only weirdness here would be when reading a structure that has the "incorrect" length. Do we set that as the overridden value? Do we ignore it?

Regarding squash merges, I usually prefer keeping all commits if they are well done (what I try to do): Separate commits for separate sub tasks, put refactorings in dedicated commits, do NOT use it as diary ("first I did this, than that, then changed my mind") etc. When having this information in git, it makes investigations easier than having to browse the corresponding PRs to get these details. Github handles force pushes well and shows the diffs without any problems. Why not keeping the discussions and review iterations in the PR page on Github and push a clean, understandable, non-aggregated history to the git tree?

Sounds like we agree on this one. Pushes to main branch should be clean. I usually take the PR description for the commit message of the squashed commit, not the aggregated thing GitHub auto-generates.

@swismer
Copy link
Collaborator Author

swismer commented Jul 24, 2023

I don't see the old code calling setLength() in write() or anything like that.

Correct, the old code just wrote the values as they are.

Do you plan on putting anything else in rebuild() in the future that will be more than length?

Yes, the next PR will tackle the resources tree, where this method will layout the directories and entries, and updating all pointers between them. So recalculateLength() is not an option.

I had a look at how other libraries handle this topic. LIEF, for example, also requires the user to explicitly call an additional method - see https://lief-project.github.io/doc/stable/tutorials/02_pe_from_scratch.html at the bottom: "By default the import table is not rebuilt so we have to configure the builder to rebuild it".

Sounds like we agree on this one.

Well, I actually wanted to provide arguments against squashing. If you look at the squashed PR 31 commit, you see in the commit message that 5 tasks have been done (Java update, refactoring, new functionality, add tests etc.), but the information which of these changes affected (let's say) the ResourceDirectory class is lost. Maybe it was just the refactoring task and is therefore not of great interest? Could it even be related to the Java update? We don't know anymore... Especially when looking at the history of a single file, this makes IMO a huge difference when trying to understand what and why something actually changed.

@kichik
Copy link
Owner

kichik commented Jul 26, 2023

OK, so is this PR ready to merge as-is? Or do you want to make any changes based on this discussion? I can normal merge it.

@swismer swismer force-pushed the versioninfo-rebuild-support branch from da2ba1e to 54db1fb Compare July 26, 2023 20:47
@swismer
Copy link
Collaborator Author

swismer commented Jul 26, 2023

The branch is now ready to be merged. I still think, rebuild should be suitable for the things to come. Maybe let's wait for the next PR before creating a release - we then might have a better picture, and maybe also new ideas...

@kichik kichik merged commit 454e1de into kichik:master Jul 26, 2023
7 checks passed
@swismer swismer deleted the versioninfo-rebuild-support branch August 24, 2023 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants