Skip to content

Commit

Permalink
[NFC] Refactor symbol table parsing.
Browse files Browse the repository at this point in the history
Symbol table parsing has evolved over the years and many plug-ins contained duplicate code in the ObjectFile::GetSymtab() that used to be pure virtual. With this change, the "Symbtab *ObjectFile::GetSymtab()" is no longer virtual and will end up calling a new "void ObjectFile::ParseSymtab(Symtab &symtab)" pure virtual function to actually do the parsing. This helps centralize the code for parsing the symbol table and allows the ObjectFile base class to do all of the common work, like taking the necessary locks and creating the symbol table object itself. Plug-ins now just need to parse when they are asked to parse as the ParseSymtab function will only get called once.

This is a retry of the original patch https://reviews.llvm.org/D113965 which was reverted. There was a deadlock in the Manual DWARF indexing code during symbol preloading where the module was asked on the main thread to preload its symbols, and this would in turn cause the DWARF manual indexing to use a thread pool to index all of the compile units, and if there were relocations on the debug information sections, these threads could ask the ObjectFile to load section contents, which could cause a call to ObjectFileELF::RelocateSection() which would ask for the symbol table from the module and it would deadlock. We can't lock the module in ObjectFile::GetSymtab(), so the solution I am using is to use a llvm::once_flag to create the symbol table object once and then lock the Symtab object. Since all APIs on the symbol table use this lock, this will prevent anyone from using the symbol table before it is parsed and finalized and will avoid the deadlock I mentioned. ObjectFileELF::GetSymtab() was never locking the module lock before and would put off creating the symbol table until somewhere inside ObjectFileELF::GetSymtab(). Now we create it one time inside of the ObjectFile::GetSymtab() and immediately lock it which should be safe enough. This avoids the deadlocks and still provides safety.

Differential Revision: https://reviews.llvm.org/D114288
  • Loading branch information
clayborg committed Nov 30, 2021
1 parent 80cdf0d commit 7e6df41
Show file tree
Hide file tree
Showing 22 changed files with 353 additions and 373 deletions.
28 changes: 24 additions & 4 deletions lldb/include/lldb/Symbol/ObjectFile.h
Expand Up @@ -19,6 +19,7 @@
#include "lldb/Utility/FileSpec.h"
#include "lldb/Utility/UUID.h"
#include "lldb/lldb-private.h"
#include "llvm/Support/Threading.h"
#include "llvm/Support/VersionTuple.h"

namespace lldb_private {
Expand Down Expand Up @@ -322,12 +323,26 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
/// Gets the symbol table for the currently selected architecture (and
/// object for archives).
///
/// Symbol table parsing can be deferred by ObjectFile instances until this
/// accessor is called the first time.
/// This function will manage when ParseSymtab(...) is called to actually do
/// the symbol table parsing in each plug-in. This function will take care of
/// taking all the necessary locks and finalizing the symbol table when the
/// symbol table does get parsed.
///
/// \return
/// The symbol table for this object file.
virtual Symtab *GetSymtab() = 0;
Symtab *GetSymtab();

/// Parse the symbol table into the provides symbol table object.
///
/// Symbol table parsing will be done once when this function is called by
/// each object file plugin. All of the necessary locks will already be
/// acquired before this function is called and the symbol table object to
/// populate is supplied as an argument and doesn't need to be created by
/// each plug-in.
///
/// \param
/// The symbol table to populate.
virtual void ParseSymtab(Symtab &symtab) = 0;

/// Perform relocations on the section if necessary.
///
Expand Down Expand Up @@ -708,7 +723,12 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
const lldb::addr_t m_memory_addr;
std::unique_ptr<lldb_private::SectionList> m_sections_up;
std::unique_ptr<lldb_private::Symtab> m_symtab_up;
uint32_t m_synthetic_symbol_idx;
/// We need a llvm::once_flag that we can use to avoid locking the module
/// lock and deadlocking LLDB. See comments in ObjectFile::GetSymtab() for
/// the full details. We also need to be able to clear the symbol table, so we
/// need to use a std::unique_ptr to a llvm::once_flag so if we clear the
/// symbol table, we can have a new once flag to use when it is created again.
std::unique_ptr<llvm::once_flag> m_symtab_once_up;

/// Sets the architecture for a module. At present the architecture can
/// only be set if it is invalid. It is not allowed to switch from one
Expand Down
9 changes: 1 addition & 8 deletions lldb/include/lldb/Symbol/Symtab.h
Expand Up @@ -119,20 +119,13 @@ class Symtab {
lldb::addr_t file_addr, std::function<bool(Symbol *)> const &callback);
void FindFunctionSymbols(ConstString name, uint32_t name_type_mask,
SymbolContextList &sc_list);
void CalculateSymbolSizes();

void SortSymbolIndexesByValue(std::vector<uint32_t> &indexes,
bool remove_duplicates) const;

static void DumpSymbolHeader(Stream *s);

void Finalize() {
// Shrink to fit the symbols so we don't waste memory
if (m_symbols.capacity() > m_symbols.size()) {
collection new_symbols(m_symbols.begin(), m_symbols.end());
m_symbols.swap(new_symbols);
}
}
void Finalize();

void AppendSymbolNamesToMap(const IndexCollection &indexes,
bool add_demangled, bool add_mangled,
Expand Down
11 changes: 7 additions & 4 deletions lldb/source/Core/Module.cpp
Expand Up @@ -1379,12 +1379,15 @@ void Module::PreloadSymbols() {
if (!sym_file)
return;

// Prime the symbol file first, since it adds symbols to the symbol table.
sym_file->PreloadSymbols();

// Now we can prime the symbol table.
// Load the object file symbol table and any symbols from the SymbolFile that
// get appended using SymbolFile::AddSymbols(...).
if (Symtab *symtab = sym_file->GetSymtab())
symtab->PreloadSymbols();

// Now let the symbol file preload its data and the symbol table will be
// available without needing to take the module lock.
sym_file->PreloadSymbols();

}

void Module::SetSymbolFileFileSpec(const FileSpec &file) {
Expand Down
Expand Up @@ -116,9 +116,10 @@ bool ObjectFileBreakpad::ParseHeader() {
return true;
}

Symtab *ObjectFileBreakpad::GetSymtab() {
// TODO
return nullptr;
void ObjectFileBreakpad::ParseSymtab(Symtab &symtab) {
// Nothing to do for breakpad files, all information is parsed as debug info
// which means "lldb_private::Function" objects are used, or symbols are added
// by the SymbolFileBreakpad::AddSymbols(...) function in the symbol file.
}

void ObjectFileBreakpad::CreateSections(SectionList &unified_section_list) {
Expand Down
Expand Up @@ -71,7 +71,7 @@ class ObjectFileBreakpad : public ObjectFile {
return AddressClass::eInvalid;
}

Symtab *GetSymtab() override;
void ParseSymtab(lldb_private::Symtab &symtab) override;

bool IsStripped() override { return false; }

Expand Down
244 changes: 110 additions & 134 deletions lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
Expand Up @@ -2687,155 +2687,131 @@ unsigned ObjectFileELF::RelocateDebugSections(const ELFSectionHeader *rel_hdr,
return 0;
}

Symtab *ObjectFileELF::GetSymtab() {
void ObjectFileELF::ParseSymtab(Symtab &lldb_symtab) {
ModuleSP module_sp(GetModule());
if (!module_sp)
return nullptr;
return;

Progress progress(
llvm::formatv("Parsing symbol table for {0}",
m_file.GetFilename().AsCString("<Unknown>")));
ElapsedTime elapsed(module_sp->GetSymtabParseTime());

// We always want to use the main object file so we (hopefully) only have one
// cached copy of our symtab, dynamic sections, etc.
ObjectFile *module_obj_file = module_sp->GetObjectFile();
if (module_obj_file && module_obj_file != this)
return module_obj_file->GetSymtab();

if (m_symtab_up == nullptr) {
Progress progress(
llvm::formatv("Parsing symbol table for {0}",
m_file.GetFilename().AsCString("<Unknown>")));
ElapsedTime elapsed(module_sp->GetSymtabParseTime());
SectionList *section_list = module_sp->GetSectionList();
if (!section_list)
return nullptr;
return module_obj_file->ParseSymtab(lldb_symtab);

uint64_t symbol_id = 0;
std::lock_guard<std::recursive_mutex> guard(module_sp->GetMutex());

// Sharable objects and dynamic executables usually have 2 distinct symbol
// tables, one named ".symtab", and the other ".dynsym". The dynsym is a
// smaller version of the symtab that only contains global symbols. The
// information found in the dynsym is therefore also found in the symtab,
// while the reverse is not necessarily true.
Section *symtab =
section_list->FindSectionByType(eSectionTypeELFSymbolTable, true).get();
if (symtab) {
m_symtab_up = std::make_unique<Symtab>(symtab->GetObjectFile());
symbol_id += ParseSymbolTable(m_symtab_up.get(), symbol_id, symtab);
}

// The symtab section is non-allocable and can be stripped, while the
// .dynsym section which should always be always be there. To support the
// minidebuginfo case we parse .dynsym when there's a .gnu_debuginfo
// section, nomatter if .symtab was already parsed or not. This is because
// minidebuginfo normally removes the .symtab symbols which have their
// matching .dynsym counterparts.
if (!symtab ||
GetSectionList()->FindSectionByName(ConstString(".gnu_debugdata"))) {
Section *dynsym =
section_list->FindSectionByType(eSectionTypeELFDynamicSymbols, true)
.get();
if (dynsym) {
if (!m_symtab_up)
m_symtab_up = std::make_unique<Symtab>(dynsym->GetObjectFile());
symbol_id += ParseSymbolTable(m_symtab_up.get(), symbol_id, dynsym);
}
}
SectionList *section_list = module_sp->GetSectionList();
if (!section_list)
return;

// DT_JMPREL
// If present, this entry's d_ptr member holds the address of
// relocation
// entries associated solely with the procedure linkage table.
// Separating
// these relocation entries lets the dynamic linker ignore them during
// process initialization, if lazy binding is enabled. If this entry is
// present, the related entries of types DT_PLTRELSZ and DT_PLTREL must
// also be present.
const ELFDynamic *symbol = FindDynamicSymbol(DT_JMPREL);
if (symbol) {
// Synthesize trampoline symbols to help navigate the PLT.
addr_t addr = symbol->d_ptr;
Section *reloc_section =
section_list->FindSectionContainingFileAddress(addr).get();
if (reloc_section) {
user_id_t reloc_id = reloc_section->GetID();
const ELFSectionHeaderInfo *reloc_header =
GetSectionHeaderByIndex(reloc_id);
if (reloc_header) {
if (m_symtab_up == nullptr)
m_symtab_up =
std::make_unique<Symtab>(reloc_section->GetObjectFile());

ParseTrampolineSymbols(m_symtab_up.get(), symbol_id, reloc_header,
reloc_id);
}
}
}
uint64_t symbol_id = 0;

if (DWARFCallFrameInfo *eh_frame =
GetModule()->GetUnwindTable().GetEHFrameInfo()) {
if (m_symtab_up == nullptr)
m_symtab_up = std::make_unique<Symtab>(this);
ParseUnwindSymbols(m_symtab_up.get(), eh_frame);
// Sharable objects and dynamic executables usually have 2 distinct symbol
// tables, one named ".symtab", and the other ".dynsym". The dynsym is a
// smaller version of the symtab that only contains global symbols. The
// information found in the dynsym is therefore also found in the symtab,
// while the reverse is not necessarily true.
Section *symtab =
section_list->FindSectionByType(eSectionTypeELFSymbolTable, true).get();
if (symtab)
symbol_id += ParseSymbolTable(&lldb_symtab, symbol_id, symtab);

// The symtab section is non-allocable and can be stripped, while the
// .dynsym section which should always be always be there. To support the
// minidebuginfo case we parse .dynsym when there's a .gnu_debuginfo
// section, nomatter if .symtab was already parsed or not. This is because
// minidebuginfo normally removes the .symtab symbols which have their
// matching .dynsym counterparts.
if (!symtab ||
GetSectionList()->FindSectionByName(ConstString(".gnu_debugdata"))) {
Section *dynsym =
section_list->FindSectionByType(eSectionTypeELFDynamicSymbols, true)
.get();
if (dynsym)
symbol_id += ParseSymbolTable(&lldb_symtab, symbol_id, dynsym);
}

// DT_JMPREL
// If present, this entry's d_ptr member holds the address of
// relocation
// entries associated solely with the procedure linkage table.
// Separating
// these relocation entries lets the dynamic linker ignore them during
// process initialization, if lazy binding is enabled. If this entry is
// present, the related entries of types DT_PLTRELSZ and DT_PLTREL must
// also be present.
const ELFDynamic *symbol = FindDynamicSymbol(DT_JMPREL);
if (symbol) {
// Synthesize trampoline symbols to help navigate the PLT.
addr_t addr = symbol->d_ptr;
Section *reloc_section =
section_list->FindSectionContainingFileAddress(addr).get();
if (reloc_section) {
user_id_t reloc_id = reloc_section->GetID();
const ELFSectionHeaderInfo *reloc_header =
GetSectionHeaderByIndex(reloc_id);
if (reloc_header)
ParseTrampolineSymbols(&lldb_symtab, symbol_id, reloc_header, reloc_id);
}
}

// If we still don't have any symtab then create an empty instance to avoid
// do the section lookup next time.
if (m_symtab_up == nullptr)
m_symtab_up = std::make_unique<Symtab>(this);

// In the event that there's no symbol entry for the entry point we'll
// artificially create one. We delegate to the symtab object the figuring
// out of the proper size, this will usually make it span til the next
// symbol it finds in the section. This means that if there are missing
// symbols the entry point might span beyond its function definition.
// We're fine with this as it doesn't make it worse than not having a
// symbol entry at all.
if (CalculateType() == eTypeExecutable) {
ArchSpec arch = GetArchitecture();
auto entry_point_addr = GetEntryPointAddress();
bool is_valid_entry_point =
entry_point_addr.IsValid() && entry_point_addr.IsSectionOffset();
addr_t entry_point_file_addr = entry_point_addr.GetFileAddress();
if (is_valid_entry_point && !m_symtab_up->FindSymbolContainingFileAddress(
entry_point_file_addr)) {
uint64_t symbol_id = m_symtab_up->GetNumSymbols();
// Don't set the name for any synthetic symbols, the Symbol
// object will generate one if needed when the name is accessed
// via accessors.
SectionSP section_sp = entry_point_addr.GetSection();
Symbol symbol(
/*symID=*/symbol_id,
/*name=*/llvm::StringRef(), // Name will be auto generated.
/*type=*/eSymbolTypeCode,
/*external=*/true,
/*is_debug=*/false,
/*is_trampoline=*/false,
/*is_artificial=*/true,
/*section_sp=*/section_sp,
/*offset=*/0,
/*size=*/0, // FDE can span multiple symbols so don't use its size.
/*size_is_valid=*/false,
/*contains_linker_annotations=*/false,
/*flags=*/0);
// When the entry point is arm thumb we need to explicitly set its
// class address to reflect that. This is important because expression
// evaluation relies on correctly setting a breakpoint at this
// address.
if (arch.GetMachine() == llvm::Triple::arm &&
(entry_point_file_addr & 1)) {
symbol.GetAddressRef().SetOffset(entry_point_addr.GetOffset() ^ 1);
m_address_class_map[entry_point_file_addr ^ 1] =
AddressClass::eCodeAlternateISA;
} else {
m_address_class_map[entry_point_file_addr] = AddressClass::eCode;
}
m_symtab_up->AddSymbol(symbol);
if (DWARFCallFrameInfo *eh_frame =
GetModule()->GetUnwindTable().GetEHFrameInfo()) {
ParseUnwindSymbols(&lldb_symtab, eh_frame);
}

// In the event that there's no symbol entry for the entry point we'll
// artificially create one. We delegate to the symtab object the figuring
// out of the proper size, this will usually make it span til the next
// symbol it finds in the section. This means that if there are missing
// symbols the entry point might span beyond its function definition.
// We're fine with this as it doesn't make it worse than not having a
// symbol entry at all.
if (CalculateType() == eTypeExecutable) {
ArchSpec arch = GetArchitecture();
auto entry_point_addr = GetEntryPointAddress();
bool is_valid_entry_point =
entry_point_addr.IsValid() && entry_point_addr.IsSectionOffset();
addr_t entry_point_file_addr = entry_point_addr.GetFileAddress();
if (is_valid_entry_point && !lldb_symtab.FindSymbolContainingFileAddress(
entry_point_file_addr)) {
uint64_t symbol_id = lldb_symtab.GetNumSymbols();
// Don't set the name for any synthetic symbols, the Symbol
// object will generate one if needed when the name is accessed
// via accessors.
SectionSP section_sp = entry_point_addr.GetSection();
Symbol symbol(
/*symID=*/symbol_id,
/*name=*/llvm::StringRef(), // Name will be auto generated.
/*type=*/eSymbolTypeCode,
/*external=*/true,
/*is_debug=*/false,
/*is_trampoline=*/false,
/*is_artificial=*/true,
/*section_sp=*/section_sp,
/*offset=*/0,
/*size=*/0, // FDE can span multiple symbols so don't use its size.
/*size_is_valid=*/false,
/*contains_linker_annotations=*/false,
/*flags=*/0);
// When the entry point is arm thumb we need to explicitly set its
// class address to reflect that. This is important because expression
// evaluation relies on correctly setting a breakpoint at this
// address.
if (arch.GetMachine() == llvm::Triple::arm &&
(entry_point_file_addr & 1)) {
symbol.GetAddressRef().SetOffset(entry_point_addr.GetOffset() ^ 1);
m_address_class_map[entry_point_file_addr ^ 1] =
AddressClass::eCodeAlternateISA;
} else {
m_address_class_map[entry_point_file_addr] = AddressClass::eCode;
}
lldb_symtab.AddSymbol(symbol);
}

m_symtab_up->CalculateSymbolSizes();
}

return m_symtab_up.get();
}

void ObjectFileELF::RelocateSection(lldb_private::Section *section)
Expand Down

0 comments on commit 7e6df41

Please sign in to comment.