Rewrite PDBs? #42

Closed
ohz10 opened this issue Feb 20, 2023 · 11 comments

@ohz10
Contributor

ohz10 commented Feb 20, 2023

Hi, I'm new to the PDB file format. I found your project as well as Microsoft's. I found your project interesting b/c it actually compiles. I am poking around the example code now.

I'm looking for some advice about rewriting PDB files. I posted a question on Stack Overflow, hoping to get some advice from someone with more experience. Do you think it's possible to read a PDB file and write a new PDB with the source file paths changed? If so, how would you recommend going about this?

Any advice you can offer would be great, thanks.

@MolecularMatters
Owner

RawPDB is intended to be a consumer of PDB files, but not a producer, so I would not recommend using RawPDB for this task.
If you want to rewrite PDBs, your best bet might be to use the PDB support found in LLVM, since their lld-link is able to produce conforming PDB files.

However, there are still two drawbacks with this approach:

  • LLVM does not understand every little part that makes up a PDB file. If your source PDB file uses anything that LLVM does not understand, it will probably not be able to reproduce this data in the destination PDB.
  • LLVM is a large dependency.

I read your question on Stack Overflow, and my recommendation would be to fix the underlying issue (why does editing the source using the junction path corrupt the file?) rather than trying to fix the symptoms by patching the paths in the PDB.

@ohz10
Contributor Author

ohz10 commented Feb 21, 2023

Thank you for your input, I greatly appreciate it.

> I read your question on Stack Overflow and my recommendation would be to fix the underlying issue (why does editing the source using the junction path corrupt the file?) and not try to fix the symptoms by patching the paths in the PDB.

The files get corrupted because Perforce is the revision control system. It gets very unhappy if you edit files before checking them out, which can cause corruption, and it doesn't understand junctions.

I 100% agree with the sentiment of fixing the actual problem, but that's not my decision in this case.

@MolecularMatters
Owner

Thanks for clearing that up. I'm familiar with P4 and use it myself, but didn't know that it doesn't support junctions.
Thinking about this some more, I wonder if you could change anything in your (distributed) build system to make this work out of the box, without having to mess with paths.

E.g. if you look at FASTBuild, it supports distributed builds and produces PDBs that will contain paths to local files - what matters is how everything is linked together, producing the final PDB.
How is this currently done on your end? How does your distributed build system roughly work?

@ohz10
Contributor Author

ohz10 commented Feb 22, 2023 via email

@MolecularMatters
Owner

MolecularMatters commented Feb 23, 2023

Regarding distributed build systems, Incredibuild definitely does caching and sharing of build artefacts.
Maybe SN-DBS does that too, but I'd have to check myself.

Back to the topic, I had another idea:
Instead of rewriting the (potentially huge) PDB after it has been produced by the linker, why not try changing the paths before the PDB is built?

More specifically, you could look into building everything distributed as usual, but with the /Z7 compiler option in case you don't already do that.
Before linking, you could then "massage" the paths which are stored in the debug sections of the .obj files to be linked. Might be easier to do it this way instead of rewriting the PDB, since you can probably leave the rest of the debug information in the .obj alone.
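
As a starting point for the .obj approach, here is a minimal C++ sketch that lists the debug sections of an object file. It assumes a classic COFF object (not one compiled with /bigobj, which uses a different header) and a little-endian host. `ListDebugSections` is a hypothetical helper, not part of RawPDB or any toolchain. With /Z7, the CodeView debug information should end up in sections named .debug$S and .debug$T, so those are the sections a path-rewriting step would have to look at.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not a RawPDB API): returns the names of all sections
// whose name starts with ".debug$" in a classic COFF object file.
// Assumes a non-/bigobj object and a little-endian host.
std::vector<std::string> ListDebugSections(const std::vector<std::uint8_t>& obj)
{
    constexpr std::size_t kFileHeaderSize = 20;     // sizeof(IMAGE_FILE_HEADER)
    constexpr std::size_t kSectionHeaderSize = 40;  // sizeof(IMAGE_SECTION_HEADER)
    constexpr std::size_t kSectionNameSize = 8;     // Name field of a section header

    std::vector<std::string> names;
    if (obj.size() < kFileHeaderSize)
        return names;

    // NumberOfSections is at offset 2, SizeOfOptionalHeader at offset 16.
    std::uint16_t sectionCount = 0;
    std::uint16_t optionalHeaderSize = 0;
    std::memcpy(&sectionCount, obj.data() + 2, sizeof(sectionCount));
    std::memcpy(&optionalHeaderSize, obj.data() + 16, sizeof(optionalHeaderSize));

    // The section table follows the file header and the (usually empty) optional header.
    const std::size_t tableOffset = kFileHeaderSize + optionalHeaderSize;
    for (std::size_t i = 0; i < sectionCount; ++i)
    {
        const std::size_t offset = tableOffset + i * kSectionHeaderSize;
        if (offset + kSectionHeaderSize > obj.size())
            break;  // truncated or malformed file

        // The name is null-padded to 8 bytes, but not null-terminated when it
        // is exactly 8 characters long (as ".debug$S" is).
        const char* raw = reinterpret_cast<const char*>(obj.data() + offset);
        const std::string name(raw, std::find(raw, raw + kSectionNameSize, '\0'));
        if (name.rfind(".debug$", 0) == 0)
            names.push_back(name);
    }
    return names;
}
```

Actually rewriting the paths would additionally require parsing the CodeView data inside those sections, which this sketch does not attempt.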

@ohz10
Contributor Author

ohz10 commented Feb 23, 2023 via email

@MolecularMatters
Owner

> The Incredibuild FAQ suggests they reproduce the user’s directory structure on remote build nodes, so it’s unclear to me how they’d share PDBs in that scenario (though I could see them still sharing OBJs).

Yes, they do so through virtualization AFAIK.
Though I don't understand why that should be at odds with sharing build artefacts?
Caching and building are two orthogonal things IMO.

> That said, my understanding is Incredibuild doesn’t use cl to create OBJ files or PDBs any more, they have their own toolchain, and thus have more flexibility.

I don't think so.
They might be using MSBuild underneath and tricking it into being better at parallel builds (which isn't that hard), but I doubt they have their own toolchain(s) for Windows, Xbox, PlayStation, etc.
Pretty sure they use the platform's native toolchain for compiling & linking.

@ohz10
Contributor Author

ohz10 commented Feb 23, 2023

> Though I don't understand why that should be at odds with sharing build artefacts?

Other than source file paths being incorrect in PDBs? No.

> Pretty sure they use the platform's native toolchain for compiling & linking.

I may have misinterpreted what they said in their FAQ about why OBJ and EXE files look different than those from native builds.

@MolecularMatters
Owner

MolecularMatters commented Feb 23, 2023 via email

@ohz10
Contributor Author

ohz10 commented Feb 23, 2023

> RawPDB works with memory-mapped files directly, and is able to read the compiland info of all translation units, which stores the source paths, object paths, etc.
>
> If you can guarantee that your junction paths are always going to be as long as or longer than the actual local paths (e.g. by making a junction to D:\SomeReallyLongDevPathHereOrSomethingLikeThis), you could grab the const char* from RawPDB and simply overwrite them with whatever you want.

Actually, this was my plan for the first cut =) This will probably work for us as a stop-gap solution, but ultimately, we won't be able to guarantee the junction paths are longer.
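
For reference, the in-place overwrite could look roughly like the C++ sketch below. `OverwriteInPlace` is a hypothetical helper, not a RawPDB function, and it assumes the bytes are writable: RawPDB hands out const pointers into the mapped file, so this would need a writable mapping or a private copy of the file contents. The remaining bytes of the old path are zeroed so that no sizes or offsets in the file change. Whether every PDB consumer tolerates that padding is an assumption I have not verified.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a RawPDB API): overwrites the null-terminated
// string at `dst` with `replacement`, but only if the replacement is not
// longer than the existing string. Leftover bytes are zeroed so the total
// number of bytes occupied stays the same.
// `dst` must point into writable memory.
bool OverwriteInPlace(char* dst, const char* replacement)
{
    const std::size_t oldLength = std::strlen(dst);
    const std::size_t newLength = std::strlen(replacement);
    if (newLength > oldLength)
        return false;  // would spill into the following data, so refuse

    std::memcpy(dst, replacement, newLength);
    std::memset(dst + newLength, 0, oldLength - newLength);
    return true;
}
```

The early return is the limitation mentioned above: as soon as a local path is longer than the junction path stored in the PDB, this approach has nowhere to put the extra characters.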

@ohz10
Contributor Author

ohz10 commented Feb 24, 2023

I think we covered everything.

@ohz10 ohz10 closed this as completed Feb 24, 2023