workaround for std::tmpfile() returning nullptr on Windows #45

Open · wants to merge 10 commits into main
Conversation

MichalLeonBorsuk
Collaborator

Plus "0ULL" changed to static_cast, as C++20 interpretations differ between Windows and Linux.

PS Please excuse the line formatting, I applied clang-tidy on some files.

@MichalLeonBorsuk
Collaborator Author

Also, find_if() seems to be C++20, not C++17, so I changed the required version in CMake.

@Simplxss
Collaborator

Notice: size_t differs between architectures.

// On Windows
#ifdef _WIN64
    typedef unsigned __int64 size_t;
#else
    typedef unsigned int     size_t;
#endif

// On Linux
#define __SIZE_TYPE__ long unsigned int
typedef __SIZE_TYPE__ size_t;

I recommend using a fixed-width integer such as uint32_t or uint64_t instead of size_t.
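A minimal illustration of the point (not from the library itself): the fixed-width types have a guaranteed size on every platform, while size_t only tracks the target's pointer width, so it changes between 32-bit and 64-bit builds.

```cpp
#include <cstddef>
#include <cstdint>

// uint32_t and uint64_t are guaranteed to be exactly 4 and 8 bytes on
// every conforming platform, which matters for on-disk formats such as
// MDF block sizes.
static_assert(sizeof(std::uint32_t) == 4, "uint32_t is 4 bytes everywhere");
static_assert(sizeof(std::uint64_t) == 8, "uint64_t is 8 bytes everywhere");
// No such guarantee exists for size_t: on common platforms it simply
// follows the pointer width (4 bytes on 32-bit, 8 bytes on 64-bit targets).
```

Serializing a size_t directly therefore produces files that cannot be read back on an architecture with a different word size.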

@ihedvall
Owner
Hint: There is a .clang_format file (Google formatting) in the root of the repository. You can set up your editor to use that file if it exists, or to use Google formatting.

@ihedvall
Owner

This change is OK. It might be better to wrap the code and move it to platform.h/cpp.
This temp-file solution should be redone, but that is a little tricky when the DT block is split into several DT/DZ blocks. If the standard said that each split must contain whole records, it would be straightforward, but currently many files split at random positions. The function can probably be changed to work without temp files. Anyone like for loops?

@ihedvall
Owner

I have changed from C++20 to C++17 at Murmele's request. A lot of files changed because of that.

@ihedvall
Owner

I will add a lot of changes when the bus logging writing is ready; I have the C# interface left. If anyone can figure out why I, as the repository owner, cannot create a fork/branch in GitHub, I'd like to know!

Going back to C++20 is a problem for Murmele. Maybe it's OK for the time being; we can deal with the problem when Murmele actually uses it.

Note that this implementation of MDF reading uses a lot of primary memory. The previous version of this library was designed for 32-bit Windows, which has a limit of 2 GB of primary memory. Sure enough, someone wanted to read a >4 GB file. The solution was to cache the observers' data buffers into files. That was truly a real mess.

Just curious about the WebAssembly support. Is it possible to read a file from WebAssembly?

@NoelBruno
Collaborator

> Plus "0ULL" changed to static_cast, as C++20 interpretations differ between Windows and Linux.
>
> PS Please excuse the line formatting, I applied clang-tidy on some files.

I am facing the same null-pointer trouble with MinGW/Cygwin. I have modified the code to create a string variable "data_file_name" storing a file name in the current directory, generated with the std::tmpnam() function, excluding a root character '/' or '\' as the first character.

Here is the modified function:

void Dg4Block::ReadData(std::FILE* file) const {
  if (file == nullptr) {
    throw std::invalid_argument("File pointer is null");
  }
  const auto& block_list = DataBlockList();
  if (block_list.empty()) {
    return;
  }

  // First scan through all CN blocks and read any SD/VLSD related data
  // bytes into memory.
  for (const auto& cg : cg_list_) {
    if (!cg) {
      continue;
    }
    const auto channel_list = cg->Channels();
    for (const auto* channel : channel_list) {
      if (channel == nullptr) {
        continue;
      }
      const auto* cn_block = dynamic_cast<const Cn4Block*>(channel);
      if (cn_block == nullptr) {
        continue;
      }
      const auto data_link = cn_block->DataLink();
      if (data_link == 0) {
        continue;  // No data to read into the system
      }
      const auto* block = Find(data_link);
      if (block == nullptr) {
        MDF_DEBUG() << "Missing data block in channel. Channel: "
                    << cn_block->Name() << ", Data Link: " << data_link;
        continue;  // Strange that the block doesn't exist
      }

      // Check if it relates to a VLSD CG block or an SD block
      if (cn_block->Type() == ChannelType::VariableLength &&
          block->BlockType() == "CG") {
        const auto* cg_block = dynamic_cast<const Cg4Block*>(block);
        if (cg_block != nullptr) {
          cn_block->CgRecordId(cg_block->RecordId());
        }
        // No point in reading the data as it is inside this DG/DT block
        continue;
      }
      cn_block->ReadData(file);
    }
  }

  // Convert everything to samples in a file. A single DT block can be read
  // directly, but the remaining block types are streamed to a temporary
  // file. The main reason is that linked data blocks are not aligned to a
  // record or, even worse, to a channel's value bytes. Converting
  // everything to a simple DT block solves that problem.
  bool close_data_file = false;
  std::FILE* data_file = nullptr;
  size_t data_size = 0;
  std::string data_file_name;
  if (block_list.size() == 1 && block_list[0] &&
      block_list[0]->BlockType() == "DT") {  // If DT, read from file directly
    const auto* dt = dynamic_cast<const Dt4Block*>(block_list[0].get());
    if (dt != nullptr) {
      SetFilePosition(file, dt->DataPosition());
      data_file = file;
      data_size = dt->DataSize();
    }
  } else {
    close_data_file = true;
    // std::tmpnam() may return nullptr, so check before assigning.
    const char* temp_name = std::tmpnam(nullptr);
    data_file_name = temp_name != nullptr ? temp_name : "";
    if (!data_file_name.empty() &&
        (data_file_name.front() == '\\' || data_file_name.front() == '/')) {
      data_file_name.erase(data_file_name.begin());
    }
    data_file = std::fopen(data_file_name.c_str(), "wb+");
    if (data_file == nullptr) {
      MDF_ERROR() << "Dg4Block::ReadData: unable to open temporary file "
                  << data_file_name;
    }
    data_size = CopyDataToFile(block_list, file, data_file);
    std::rewind(data_file);  // SetFilePosition(data_file, 0);
  }

  auto pos = GetFilePosition(data_file);
  // Read through all records
  ParseDataRecords(data_file, data_size);
  if (data_file != nullptr && close_data_file) {
    std::fclose(data_file);
    std::remove(data_file_name.c_str());
  }

  for (const auto& cg : cg_list_) {
    if (!cg) {
      continue;
    }
    for (const auto& cn : cg->Cn4()) {
      if (!cn) {
        continue;
      }
      cn->ClearData();
    }
  }
}

@ihedvall
Owner

Maybe it is better for everyone if the code were rewritten so it can read the file without opening a temporary file. Faster, but a little bit of spaghetti code; well, maybe not so little.

@ihedvall
Owner

I have made version 2.1 of the MDF library. It includes Channel Array (CA) support as well as a fix for non-aligned numbers such as 6-bit signed numbers. These changes affect the iChannel/Cn4Block files, which is why a merge into the fork is difficult to perform.

The temp file function returning null has never occurred on Windows 11 64-bit or Ubuntu 64-bit. I have added GitHub actions so the unit tests run on Windows 11 and Ubuntu. I also have the possibility to test on macOS, but there are some issues with the Xcode compiler: macOS/CMake uses an old Clang version from just before C++20, which causes problems for all my libraries.

It's a problem to test all OS and compiler combinations with GitHub actions because a run stops at the first error, so it takes a long time to find all errors. Most compiler errors are easy to fix, typical Clang/GCC/VC issues. If anyone has a suitable compiler/OS combination, it would help to test the library + tools before check-in to GitHub and add a build YML file (see the existing YML files). I recommend avoiding the matrix parallel build, as the jobs seem to cancel each other if something fails.

For the tmpfile problem, I suggest moving the code snippets to the MdfHelper or Platform class and keeping all the #ifdefs there.
Then write a unit test that stress-tests those code snippets in all environments.
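A minimal sketch of that idea, with hypothetical names (this is not the library's actual API): on Windows, std::tmpfile() tries to create the file in the root of the drive, which fails without administrator rights, so the Windows branch builds a name via the MSVC CRT function _tempnam(), which honours the TMP environment variable instead.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper, e.g. for MdfHelper/Platform, keeping all the
// platform #ifdefs in one place as suggested above.
std::FILE* CreatePortableTempFile() {
#ifdef _WIN32
  // std::tmpfile() creates the file in the drive root on Windows, which
  // needs admin rights. _tempnam() honours the TMP environment variable
  // (MSVC CRT). The caller must remove the file after closing it.
  char* name = _tempnam(nullptr, "mdf");
  if (name == nullptr) {
    return nullptr;
  }
  std::FILE* file = std::fopen(name, "wb+");
  std::free(name);  // _tempnam allocates the returned string
  return file;
#else
  // POSIX tmpfile() creates the file in /tmp and deletes it on close.
  return std::tmpfile();
#endif
}
```

A unit test would then call this helper on every CI platform and assert that the returned pointer is non-null and writable.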

@ihedvall
Owner

This pull request needs to be synchronized with the main branch, but that will be tricky as there are overlapping fixes.

I do have a proposal that somewhat solves the tmpfile problem. The extra file is needed to handle the case where the "DT" block is split or compressed.

  1. If it is an uncompressed single DT block, use the current record-by-record read from the input file. This minimizes memory consumption and reads the file quite fast.
  2. If it is not a single data block (HL/DL/DZ/DT), try to allocate the whole required data size in memory and copy the bytes from file to memory.
  3. If 2 fails, use the current strategy with a temporary file.

Maybe step 1 can be skipped. This will speed up the reading somewhat.

Any ideas?
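The three-step proposal above could be sketched roughly as follows (names are illustrative, not the library's actual API):

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <vector>

// Which backing store the reader ends up using.
enum class DataSource { kInputFile, kMemoryBuffer, kTempFile };

// Hypothetical decision function for the fallback order described above.
DataSource ChooseDataSource(bool single_uncompressed_dt,
                            std::size_t required_bytes,
                            std::vector<std::uint8_t>& buffer) {
  if (single_uncompressed_dt) {
    return DataSource::kInputFile;   // 1. Read record by record from input
  }
  try {
    buffer.resize(required_bytes);   // 2. Copy HL/DL/DZ/DT bytes to memory
    return DataSource::kMemoryBuffer;
  } catch (const std::exception&) {  // bad_alloc or length_error
    return DataSource::kTempFile;    // 3. Fall back to the temporary file
  }
}
```

The try/catch around the allocation is what makes step 3 a graceful fallback rather than an out-of-memory crash on multi-gigabyte files.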

@ihedvall
Owner

ihedvall commented Jan 4, 2024

FYI
I will do a ReadCache class that solves the problem with tmpfile() i.e. it reads directly from the input file. This is primary due to long read times for large files (> 10G). I will do this on the main branch. Someone else is working on the emscripten (WASM) solution.

@ihedvall
Owner

ihedvall commented Jan 6, 2024

I have checked in the Read Cache functionality for MDF 4 files. It solves the tmpfile() problem as well as improving the read speed, quite a lot actually. Most of the above changes are now obsolete. I don't know if a merge is possible/needed at this time.

Please let me know if you need my help with the merge.

4 participants