Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SD 22 and SD 23 occurences have increased drastically lately #1340

Open
zo3n opened this issue Apr 2, 2020 · 17 comments
Open

SD 22 and SD 23 occurences have increased drastically lately #1340

zo3n opened this issue Apr 2, 2020 · 17 comments
Labels
bug Something isn't working

Comments

@zo3n
Copy link

zo3n commented Apr 2, 2020

Describe the bug
I am not really sure how to describe this other than the title. While administering another server I can notice that SD 22 and SD 23 are shown in like 15 players out of 150. This is not normal.
SD 22 seems to happen the most. I have recieved the error now myself:

HTTP server file mismatch (admin) admin_spectator.lua [Got size:0 MD5:00000000000000000000000000000000, wanted MD5:8DB609CB32CA48997CE2CF5BAF9DDB7D]
Download error: HTTP server file mismatch (admin) admin_spectator.lua [Got size:0 MD5:00000000000000000000000000000000, wanted MD5:8DB609CB32CA48997CE2CF5BAF9DDB7D]

Last time I encountered HTTP errors like this was in 2015
Some of my friends have also encountered this MD5 0000 thing within last month.

I won't say with absolute certainty that this is not normal, but I have a good feeling that the frequency of it has increased within last week or month. That is the only reason I'm reporting it; so that it can get some attention, or even better, be solved or explained.

To reproduce
I don't know

Expected behaviour
To not happen

Version
I don't know

Additional Information
Restarting the resource affected doesn't solve the issue, and still spits out errors:

Unable to delete old file admin\client\gui\admin_spectator.lua
HTTP server file mismatch (admin) admin_spectator.lua [Got size:0 MD5:00000000000000000000000000000000, wanted MD5:8DB609CB32CA48997CE2CF5BAF9DDB7D]
Download error: HTTP server file mismatch (admin) admin_spectator.lua [Got size:0 MD5:00000000000000000000000000000000, wanted MD5:8DB609CB32CA48997CE2CF5BAF9DDB7D]

This results in admin panel being bugged without having any other fix other than restarting the server, which can become a problem if it's a production server.

@zo3n zo3n added the bug Something isn't working label Apr 2, 2020
@zo3n
Copy link
Author

zo3n commented Apr 3, 2020

image

I have noticed that when i try to delete affected resource from cache, it doesn't allow me to delete it "because it's in use", even if i am not connected to any server

Additionally, when I tried to copy the affected resource (ipb in my newer case) it says that it cannot do that because it is open in Multi Theft Auto WOW64 Helper

@ccw808
Copy link
Member

ccw808 commented Apr 3, 2020

What happens if you try deleting the individual files in the folder. Do you get the same error for every file?

@zo3n
Copy link
Author

zo3n commented Apr 3, 2020

Other files (which weren't corrupted) can be deleted successfully, while affected one(s) cannot due to being used by WOW64 Helper

This is what I got when trying to delete hook.lua from ipb

image

@ccw808
Copy link
Member

ccw808 commented Apr 3, 2020

WOW64 helper should not be using any files.
What is the MD5 of MTA\wow64_helper.exe ?
What is the MD5 of MTA\libwow64.dll ?
What is the version of MTA\libwow64.dll ? (Right click->Properties->Details->File version)

@zo3n
Copy link
Author

zo3n commented Apr 3, 2020

MTA\wow64_helper.exe: FE8544D6C350C0559741029F8E1A3607
MTA\libwow64.dll: 7324074994D1C9F8850EF56F1D5D3C37
MTA\libwow64.dll Version: 1.5.7.20447

I should note that I have accidentally (brain on autopilot) closed my MTA and now I won't be getting the errors for a while, but I'm sure they'll reappear like they always do lately (saying this in case if there will be another test with cache files, that i'll need to get the error again to do the test)

@ccw808
Copy link
Member

ccw808 commented Apr 3, 2020

If it happens again, try terminating wow64_helper.exe using Windows Task Manager

@qaisjp
Copy link
Contributor

qaisjp commented Oct 30, 2020

I just encountered this randomly now, on Multi Theft Auto v1.5.8-release-20704:

image

Extra context:

  • I've been having very bad computer slowdowns recently (not because of MTA) and I think is because of my HDD, but I am not sure. My PC is due a reinstall/refresh of Windows.
  • The problem in the image above coincides with the 100% active time seen on the left most part of this graph
    image
  • Client resource files is on my SSD (E:\Documents\mtasa-blue\Bin\mods\deathmatch), which has a consistently very low active time %. My badly behaving HDD is on the D drive, so maybe it's not linked. But perhaps whatever is causing the system-wide slowdowns also caused the md5 00000 issue to appear for me.
  • I can't reproduce it.

@Pirulax
Copy link
Contributor

Pirulax commented Oct 30, 2020

CCheckSum is default initalized to null. (this is why the md5 is full null)
And CCheckSum remains null if the file coudn't be opened(due to maybe the HTTP server still writing to it, or something) when the checksum needs to be generated (client side), and we don't retry, thus the generated checksum is null, and it's considered valid, so this pops up.
Now, the fix would be to log the error, and retry generating the checksum.
I have a pr #1511 which could address this issue.

@Dutchman101
Copy link
Member

Dutchman101 commented Oct 30, 2020

This issue is almost guaranteed to happen when you join a server with a decent total download size for the first time.. as in, near completion it will read/write lock a random file, which will remain stuck (and in use) until you restart MTA, even if you remain disconnected from anything for a while. As in, with a decently sized DL, it's probably going to happen to some file.

I can tell the majority of occurences happens in this way - after which player is forced to restart MTA and can then completely download server resources, but in multiple runs. Idk if the file being stuck is a result of bad implementation (e.g download error > file remains in use and locked) or if the download error is a result of the file becoming stuck for some reason.

Anyways, servers using both internal and external HTTP are affected, so we're probably looking for a client downloader bug.

Having a null MD5 string can indicate it opens a handle to the file that's about to be downloaded, but doesn't write the contents to it (due to being read/write locked.. other such cases also show this), after which it also doesn't release the handle so it remains stuck and in use.

@qaisjp
Copy link
Contributor

qaisjp commented Oct 30, 2020

This issue is almost guaranteed to happen when you join a server with a decent total download size for the first time.

Sounds like a potential race condition between downloader and checker. (I don't know how distinct those two things are in our code.)

player is forced to restart MTA [..] Idk if the file being stuck is a result of bad implementation (e.g download error > file remains in use and locked) or if the download error is a result of the file becoming stuck for some reason.

In my case I just tried to reconnect and it was fine. And yeah I agree that maybe some file handle is being left open somewhere.


One obvious issue is that this code just returns null if File::Fopen returns null for some reason (which will happen for any error and not just the file being missing, as the comment suggests):

unsigned long CRCGenerator::GetCRCFromFile ( const char* szFilename, unsigned long ucOldCRC )
{
// Open the file
FILE* pFile = File::Fopen ( szFilename, "rb" );
if ( pFile )
{
// Start at the old CRC
unsigned long ulCRC = ucOldCRC;
// While we're not at the end
char pBuffer [65536];
do
{
// Try to read 65536 bytes. It returns number of bytes actually read.
// Then CRC that using the CRC from last loop as beginning. This should
// be faster/more compatible than allocating a huge buffer for the entire
// file.
size_t sizeRead = fread ( pBuffer, 1, 65536, pFile );
ulCRC = crc32 ( ulCRC, (Bytef*) pBuffer, sizeRead );
}
while ( !feof ( pFile ) );
// Close it and return the CRC
fclose ( pFile );
return ulCRC;
}
// Not exist
return 0;
}

And we just use that result without doing any error checking:

// static generators
static CChecksum GenerateChecksumFromFile(const SString& strFilename)
{
CChecksum result;
result.ulCRC = CRCGenerator::GetCRCFromFile(strFilename);
CMD5Hasher().Calculate(strFilename, result.md5);
return result;
}

@brzys
Copy link

brzys commented Oct 30, 2020

I can say it also depends on HTTP server performance.
With internal HTTP server the issue can occur quite frequently, especially with larger downloads. Can't really see this on nginx or Apache.

@AfuSensi
Copy link

This has been happening quite regularly on our server, using the internal HTTP server.
The only thing that works for our players is removing the affected file from the client's resources folder and reconnecting.

I have confirmed the checksum is correct in both the server's resource and cache directories, but is corrupted on the client's PC.

Unfortunately i am unable to do more tests since i have not found a way to reproduce.

@qaisjp
Copy link
Contributor

qaisjp commented Oct 31, 2020

One easy to way to trigger the 00000 error message is by following these instructions:

  1. Open PowerShell
  2. Type $file = [System.IO.File]::Open("C:\Users\qaisjp\Documents\mtasa-blue\Bin\mods\deathmatch\resources\freeroam\gui.lua", "Open", "Read", "None")
    • Update the path to your own client resources folder
    • The final parameter "None" means:

      Declines sharing of the current file. Any request to open the file (by this process or another process) will fail until the file is closed.

    • When you are done testing, in your PowerShell window, type $file.close() to close your handle to the file. You don't need to type this command now, but just make sure you don't overwrite the $file variable by executing this command multiple times. (You could could try to executing it again, assigning to a different variable name, and it should say that it can't open the file.)
  3. Make sure you have <enablesd>22</enablesd> in your server config.
  4. Start your server, start the freeroam resource if you've changed the defaults, and join the server
  5. You should get this:
    image
  6. When you are done testing, in your PowerShell window, type $file.close() to close your handle to the file.

You can also simulate an invalid checksum by simply changing "None" to "Read", which only allows other processes (like MTA) to read the file, but not write to it. (Note that if you do this, you should probably change the content of gui.lua so that it does not just pick up a valid script with a valid checksum.)

I've got a pull request ready that will improve error messages (for now, only for internal HTTP servers), which you'll probably see in the timeline here in the next couple of minutes.

This issue is almost guaranteed to happen when you join a server with a decent total download size for the first time

So some of the errors will be something from https://docs.microsoft.com/en-us/cpp/c-runtime-library/errno-doserrno-sys-errlist-and-sys-nerr?view=msvc-160 so any of these error messages could describe a user's exact issue. E.g. memory problems, too many files open (unlikely since this is apparently done sequentially, see #1511), etc.


When reading files for checksumming, we open it using File::Fopen (see my previous comment). That looks like this:

FILE* SharedUtil::File::Fopen(const char* szFilename, const char* szMode)
{
#ifdef WIN32
return _wfsopen(FromUTF8(szFilename), FromUTF8(szMode), _SH_DENYNO);
#else
return fopen(szFilename, szMode);
#endif
}

The _SH_DENYNO option means "Permits [other processes] read and write access". In this particular scenario we could use _SH_DENYWR, which allows read, but denies write access. This should help alleviate some issues. This option is only available for Windows -- I don't think Linux fully supports file locking.

@Pirulax
Copy link
Contributor

Pirulax commented Oct 31, 2020


dassert is a debug-mode-only assert. So it does nothing in release.

Basically what I think is happening, that a resource include tries to start this resource while it's files are still being downloaded. Then this line

if (CChecksum::GenerateChecksumFromFile(pResourceFile->GetName()) == pResourceFile->GetServerChecksum())
{
}
else
{
HandleDownloadedFileTrouble(pResourceFile, false);
}
tries to generate the checksum, and it fails, because (probably) the HTTP server has locked the file, because it's writing data to it.

The solution might be to somehow delay the resource loading until CanBeLoaded() returns true, because im pretty sure it returns false.
We could queue resources up - so they still start in order - if one of them can' be loaded at the moment.

Edit: Looking thru the comments have confirmed that it really is the HTTP server. As the checksum matches afterwards, and the file can be deleted, which means that it coudn't be opened before because the HTTP server was writing to it.

@Pirulax
Copy link
Contributor

Pirulax commented Dec 11, 2020

This also seems to happen if theres a space in the filename. Space in the path causes inconsistencies. Eg.: files/vehicle 10.dff would cause the server to not update the file, or something..
Eg.:
image

@Dutchman101
Copy link
Member

Dutchman101 commented Feb 9, 2021

This also seems to happen if theres a space in the filename. Space in the path causes inconsistencies. Eg.: files/vehicle 10.dff would cause the server to not update the file, or something..

That's issue #1466

Anyways, here's an update.. after 11f94de (PR #1778) the error messages that fit my story: #1340 (comment) .. of the most common download failure, which is when you first download a server's resources or need to download a significant update

Have changed to more descriptive:
#1002 HTTP server file mismatch! (resourcename) scriptfile.lua [CRC could not open file: Permission denied]

So does "CRC could not open file: Permission denied" mean something, or is the logging that we added with PR #1778 still not sufficient to find out?

It still sounds like affected files are in use (being occupied) by MTA while the HTTP server wants to modify/overwrite it into its final form. Can that be the meaning of "Permission denied" though?

@Pirulax
Copy link
Contributor

Pirulax commented Jan 18, 2022

We could make the error more specific by using Win32 API stuff to open the file instead of C stdio (So like using CreateFile instead of fopen).
Also.. Maybe by using CreateFile we could mitigate the issue, not sure.
Yeah, not the most ideal solution, but could work.

Could also use fopen_s which hands back an error code.

Edit: Took a look at it. It uses errno which should (in theory) have the same functionality.
It's a simple "access denied", I don't think we can get any more info out of it.
We could perhaps.. retry a few times? Instead of just erroring out?
Also.. people say only a restart solves the issue, which leads me to believe it's some file handle leak issue in the HTTP server on the client-side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants