answers.log contains incomplete or malformed lines #2016
It may be useful to compare against this course's transaction log from the server. Ask your server admin for the file.
Thanks for taking an interest. I will follow your suggestion immediately and report back when I have the file you specify.
The IT crew have sent me the transaction log. (I used repeated a's to hide the students' internal alphanumeric identifiers, at the request of our IT team.)

The exceptions come in various forms, including essentially blank lines, but a substantial fraction of them look like the middle line in the 3-line snapshot above. That is, they look like the tail end of a line that could make a reasonable log entry. The scale of my course is such that the WW server is pretty busy. The timestamps of good lines before and after the problematic ones are often quite close together, though sometimes a gap of several seconds can be observed. Finally (for now), the number of strange lines in ...

What else can I report on to help diagnose this behaviour? Thanks!
I'm just curious, but do you observe any relationship between the corresponding lines from the two files? For example, in the recent post at ..., do strange lines happen with about the same frequency among the most recent entries in either file?

Probably there is an issue with writing these lines in the first place, but it's possible they are altered later. If you don't see this among recent timestamps, that would be a clue. If you find recent weird lines in either file, take note of the timestamp, and then check other logs on the server for anything curious at those times. The first to look at would be the Apache error log. Assuming you have Apache log file rotation going, if the most recent instance is already a few days old, you will have to uncompress that day's log file. And if it's too far in the past, that day's log file will be gone by now.

I searched all my 2.17 production server transaction logs for lines that don't start with ...
Note that if we can't determine that this is some sort of WeBWorK bug or deficiency, then this shouldn't be an issue here in GitHub. Posting about it in the forum may be more helpful, as people with Linux experience may have relevant wisdom to share.
I suspect that the WeBWorK logging code was written assuming that file-system race conditions would not cause problems, and that under enough load some sort of conflict/race condition is occurring on your system. Looking at https://metacpan.org/pod/Path::Tiny, it supports locked appends. @drgrice1 - What do you think? There is a lower-level discussion of locking at https://docstore.mik.ua/orelly/perl4/cook/ch07_19.htm
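A minimal sketch of that higher-level approach, assuming Path::Tiny's documented behavior of taking an exclusive flock before appending (the path below is a placeholder, not WeBWorK's configured log location):

```perl
use strict;
use warnings;
use Path::Tiny;

# Path::Tiny acquires an exclusive flock on the handle before appending,
# so concurrent writers should not interleave partial lines.
# Placeholder path; WeBWorK's real log location comes from its config.
my $log = path('/opt/webwork/courses/myCourse/logs/answers.log');
$log->append("example log entry\n");
```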
It is very possible that the lack of file locking could cause something like this. When multiple server processes can write to the same file, file locking should always be utilized to prevent this sort of thing. I have written quite a bit of code that uses ...
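For reference, the lower-level pattern from the Perl Cookbook recipe linked above looks roughly like the following; the helper name and paths are made up for illustration and are not WeBWorK's actual logging interface:

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek O_WRONLY O_APPEND O_CREAT);

# Hypothetical helper illustrating flock-guarded appends.
sub append_log_line {
    my ($path, $line) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_APPEND | O_CREAT)
        or die "cannot open $path: $!";
    flock($fh, LOCK_EX) or die "cannot lock $path: $!";
    # Re-seek to EOF after the lock is granted, in case another process
    # appended while we were blocked waiting for it.
    seek($fh, 0, SEEK_END);
    print {$fh} $line, "\n";
    close($fh) or die "cannot close $path: $!";    # close releases the lock
}

append_log_line('/tmp/answers.log', '[timestamp] example entry');
```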
More is probably needed with this as there are probably other files that multiple server processes could be attempting to write to at the same time. Finding these would take careful review of the code. This is to address issue openwebwork#2016.
If (if only) we were elite athletes in a relay race, I would be gratefully passing the baton to faster runners at this point. I'll try to respond to some of the points above, then step back and watch with admiration as the experts move forward.

@Alex-Jordan asks for a comparison between answers.log and transactions.log. @taniwallach suggests that there may be a race condition to worry about. The examples below seem to support this hypothesis: I just picked 3 instances using no selection criteria at all, and in all cases there were multiple actions involving different student users in the same second. Races are notoriously hard to debug. Thanks to @drgrice1 for proposing some edits that might work to reduce them.

As noted above, I am well out of my depth at this point. I will continue to watch with interest and respond to direct questions, but that's about all I can offer. Here are the 3 case studies I promised earlier.
Commentary: The mangled line in transactions.log includes the string "4!" that also appears in the correct first line taken from answers.log.
Commentary: 1. The problem line in ...
Commentary: It looks like the bad line in ...

Hearty thanks to the experts named above, and to anyone else who comes to help out in the future!
On my production server, in one of our big courses, I am seeing some of these truncated lines in both the answers.log and transactions.log files. Here's a snippet from an ...
It looks like questions 8, 11, and 12 for user1 were interrupted.
I'm having second thoughts about what to do about the behaviour reported above. There seems to be general agreement that something here is not quite right. My hesitation comes from looking at the "Issues" tab and seeing that it shows 140 other topics. Some of the others seem like more effective ways to transform developer effort into improved student learning. And surely student learning is the key metric here. So perhaps it would be wise to do some kind of quick triage on this issue and then pivot to higher priorities.

The minimal version of "quick triage" would be simply to close this issue and walk away. Another possibility might be to pre-populate the directory containing WW logs with a short permanent README file noting that the log files can be slightly unreliable in high-load situations, pointing to this discussion, and indicating which database file contains a rock-solid version of the information that is typically also present in the logs.

I could invent other alternatives, but these two may be enough to inspire a pragmatic and effective intervention from someone who knows the system inside out. Thanks again.
Today I noticed that the timestamps in my file answers.log do not form a nondecreasing sequence. That is, there are a few instances where the time shown in line N is later than the time shown in line N+1. (The error is on the order of just 1 second in the cases I have seen.) This corroborates the hypothesis that races are the problem and locking the log files might solve it. Thanks, @drgrice1.
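A quick way to hunt for such inversions, as a rough sketch: it assumes each well-formed line carries an epoch timestamp that a site-specific regex can extract (the 10-digit pattern below is a guess, not WeBWorK's documented format):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Report lines whose extracted timestamp is earlier than the previous one.
# The regex is an assumption; adjust it to the log's real timestamp format.
my $prev;
while (my $line = <>) {
    my ($ts) = $line =~ /\b(\d{10})\b/;
    next unless defined $ts;
    print "out of order at line $.: $ts < $prev\n"
        if defined $prev && $ts < $prev;
    $prev = $ts;
}
```

Run as `perl check_order.pl answers.log`; any output lines point at the inversions described above.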
Hello. I couldn't seem to reproduce this with just multiple processes on a single computer, but I was able to reproduce it with two separate servers using shared storage. It seems to be fixed with @drgrice1's fix applied, but a more conclusive test/confirmation will come when students start hitting the servers in the upcoming term.
Here are 3 consecutive lines from the file answers.log in my course:

...

Here are 3 others:

...
In both cases there is something wrong with the line in the middle. My answers.log file has 1326 lines that don't have the expected structure, out of 3052734 lines in total. For many of these, including the two examples shown above, the lines before and after show timestamps that are very close together (sometimes identical).
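For scale, a count like the one above can be reproduced with a short script; the leading-bracket test below assumes well-formed entries start with a bracketed timestamp, which is a guess that should be adjusted to the log's real line format:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count lines that fail an assumed well-formed-entry pattern.
# /^\[/ is a placeholder test; substitute the log's actual format.
my ($bad, $total) = (0, 0);
while (my $line = <>) {
    $total++;
    $bad++ unless $line =~ /^\[/;
}
print "$bad malformed lines out of $total\n";
```

Invoked as `perl count_bad.pl answers.log`, it prints a single summary line.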
Unfortunately I do not have the knowledge or skill to help fix this. All I can do is report it. Thanks for reading.
I harvested the file named "answers.log" from the File Manager in the user-facing web interface. To make sure I got a valid copy, I downloaded the file twice. Both copies (size 301 MB) show the same md5 hash.
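For anyone repeating that integrity check, a Perl equivalent of md5sum might look like this sketch (the file name is assumed to match the downloaded copy):

```perl
use strict;
use warnings;
use Digest::MD5;

# Print the MD5 digest of the downloaded copy, like `md5sum answers.log`.
open my $fh, '<:raw', 'answers.log' or die "cannot open answers.log: $!";
print Digest::MD5->new->addfile($fh)->hexdigest, "\n";
```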
This is WeBWorK © 1996-2022 | theme: math4 | ww_version: 2.17 | pg_version 2.17