Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

answers.log contains incomplete or malformed lines #2016

Open
PhilipLoewen opened this issue Jun 1, 2023 · 13 comments
Open

answers.log contains incomplete or malformed lines #2016

PhilipLoewen opened this issue Jun 1, 2023 · 13 comments

Comments

@PhilipLoewen
Copy link

Here are 3 consecutive lines from the file answers.log in my course:

[Thu Mar 09 21:48:44 2023] |ZI7UJNQDCR03|WW06|18|100    1678427324      -ky
x+32)/2
[Thu Mar 09 21:48:44 2023] |7XTJHYSEKZ08|WW06|13|100000 1678427324      tan(arctanx)    [0,pi]

Here are 3 others:

[Thu Mar 09 21:49:11 2023] |C93SRBEKNW04|WW06|19|1110   1678427351      7       1       3       3.67
8(T-22) 106.3
[Thu Mar 09 21:49:12 2023] |BA05O2XZTR07|WW06|13|111000 1678427352      x       (-inf,inf)      tan(arctanx+arctan(-5))

In both cases there is something wrong with the line in the middle. My answers.log file has 1326 lines that don't have the expected structure, out of 3052734 lines in total. For many of these, including the two examples shown above, the lines before and after show timestamps that are very close together (sometimes identical).

Unfortunately I do not have the knowledge or skill to help fix this. All I can do is report it. Thanks for reading.

I harvested the file named "answers.log" from the File Manager in the user-facing web interface. To make sure I got a valid copy I downloaded the file twice. Both copies (size 301MB) show the same md5 hash.

This is WeBWorK © 1996-2022 | theme: math4 | ww_version: 2.17 | pg_version 2.17

@Alex-Jordan
Copy link
Contributor

It may be useful to compare against this course's transaction log from the server. Ask your server admin for the file /opt/webwork/webwork2/logs/course_name_transaction.log where of course course_name is the name of the course. Just to see if there was anything strange about the student's input in these instances; or if the student inputs here have something in common; or if those lines in the transaction log are also corrupted.

@PhilipLoewen
Copy link
Author

Thanks for taking an interest. I will follow your suggestion immediately, and report back when I have the file you specify.

@PhilipLoewen
Copy link
Author

The IT crew have sent me *_transactions.log. I expect each line in this file to start with a timestamp in square brackets. This is true for most of the 1,579,982 lines, but there are around 700 exceptions. The middle line of the following 3 provides a sample; I provide discussion further below:

[Thu Jan 12 11:09:23 2023] 1	WW00	Xaaaaaaaaaa7	set101-WW0/WW0explanation.pg	1	-1	1873	1	1	["AnSwEr0001","I have read and understood this explanation."]	1	0
0004","E","AnSwEr0005","C","AnSwEr0006","A"]	1	0
[Thu Jan 12 11:09:24 2023] 10	WW00	Taaaaaaaaaa0	set101-WW0/via-local-Library-ASU-topics-setIntroduction_to_WeBWorK-prob4-prob4.pg	1	-1	4144	1	1	["AnSwEr0001","D","AnSwEr0002","A","AnSwEr0003","C","AnSwEr0004","B"]	1	0

(I used repeated a's to hide the students' internal alphanumeric identifiers, at the request of our IT team.)

The exceptions come in various forms, including essentially blank lines, but a substantial fraction of them look like the middle line in the 3-line snapshot above. That is, they look like the tail end of a line that could make a reasonable log entry.

The scale of my course is such that the WW server is pretty busy. The timestamps of good lines before and after the problematic ones are often quite close together ... though sometimes a gap of several seconds can be observed.

Finally (for now), the number of strange lines in *_transactions.log is smaller than the number of oddities in answer.log.

What else I can report on to help diagnose this behaviour? Thanks!

@Alex-Jordan
Copy link
Contributor

I'm just curious, but do you observe any relationship between the corresponding lines from the two files? For example in the recent post, at Thu Jan 12 11:09:23 2023 (or maybe a second later) something strange happens. Is there anything strange at about that time in the answer log?

In either file, do strange lines happen with about the same frequency among the most recent entries? Probably there is an issue with writing these lines in the first place, but it's possible they are altered later. If you don't see this among recent timestamps, that would be a clue.

If you find recent weird lines in either file, take note of the timestamp, and then check other logs on the server for anything curious at those times. The first to look at would be the apache error log. Assuming you have apache log file rotation going, if the most recent instance is a few days ago already you will have to uncompress that day's log file. And if it's too far in the past, that day's log file will be gone by now.

I searched all my 2.17 production server transaction logs for lines that don't start with [, and I found no instances of this.

@Alex-Jordan
Copy link
Contributor

Note that if we can't determine that this is some sort of WeBWorK bug or deficiency, then this shouldn't be an issue here in GitHub. Posting about it in thee forum may be more helpful, as people with Linux experience may have relevant wisdom to share.

@taniwallach
Copy link
Member

taniwallach commented Jun 2, 2023

I suspect that WeBWorK logging code was written assuming the file system race conditions would not cause problems, and under enough load some sort of conflict/race-condition is occurring on your system.

Looking at writeLog and writeCourseLog in lib/WeBWorK/Utils.pm https://github.com/openwebwork/webwork2/blob/65073f9d48c9661f1c19ee9bb95b7131843ec229/lib/WeBWorK/Utils.pm it is noticeable that there is no attempt to lock the log files before writing to them. I suspect that the problems are caused by multiple Apache workers attempting to append to the log file in parallel when the system is under stress.

https://metacpan.org/pod/Path::Tiny supports using flock to lock files and using append and/pr filehandle with locked => 1 may help prevent the problem.

@drgrice1 - What do you think?

There is a lower level discussion of locking in https://docstore.mik.ua/orelly/perl4/cook/ch07_19.htm

@drgrice1
Copy link
Sponsor Member

drgrice1 commented Jun 2, 2023

It is very possible that the lack of file locking could cause something like this. When multiple server processes could write to the same file, file locking should always be utilized to prevent this sort of thing. I have written quite a bit of code that uses flock. Another option would be to switch to using Mojo::Log for writing logs, which takes care of that under the hood for you.

drgrice1 added a commit to drgrice1/webwork2 that referenced this issue Jun 2, 2023
More is probably needed with this as there are probably other files that
multiple server processes could be attempting to write to at the same
time.  Finding these would take careful review of the code.

This is to address issue openwebwork#2016.
@PhilipLoewen
Copy link
Author

If (if only) we were elite athletes in a relay race, I would be gratefully passing the baton to faster runners at this point. I'll try to respond to some of the points above, then step back and watch with admiration as the experts move forward.

@Alex-Jordan asks for a comparison between answer.log and transaction.log, possibly backed with a review of apache server logs. I don't have easy access to the apache server logs, but I present a kind of ad-hoc comparison between the WW log files later in this response. After posting this reply, I will take Alex's advice to nip over to the forum and mention this in case someone there can help out here.

@taniwallach suggests that there may be a race condition to worry about. The examples below seem to support this hypothesis. I just picked 3 instances using no selection criteria at all, and in all cases there were multiple actions involving different student users in the same second. Races are notoriously hard to debug. Thanks to @drgrice1 for proposing some edits that might work to reduce them.

As noted above, I am well out of my depth at this point. I will continue to watch with interest and respond to direct questions, but that's about all I can offer.

Here are the 3 case studies I promised earlier.

Problematic 4-line excerpt from transaction.log:
[Thu Apr 06 23:11:46 2023] 15	WW10	N3OYTM1E2K06	set101-WW10/via_ur_sr_9_6a.pg	1	-1	4655	0.8	1	["AnSwEr0001","sqrt3/2","AnSwEr0002","1/2","AnSwEr0003","(-1/2sqrt3)/3","AnSwEr0004","(1/6(-1))/2","AnSwEr0005","1/24sqrt3/2"]	0	1
4!"]	14	1
[Thu Apr 06 23:11:46 2023] 16	WW10	5KXGZ6D4PC00	set101-WW10/limitbyseries4.pg	1	-1	4670	1	1	["AnSwEr0001","e^(14/6)"]	1	0
[Thu Apr 06 23:11:46 2023] 19	WW10	EVNM3WFAZH02	set101-WW10/via-ubcLibrary-setTaylor-templatemod.pg	1	-1	2446	0.25	1	["AnSwEr0001","3+1/3","AnSwEr0002","","AnSwEr0003","?","AnSwEr0004",""]	0	1

answer.log has 4 lines bracketed by that same timestamp, and here they are:
[Thu Apr 06 23:11:46 2023] |YV..........|WW10|15|11111  1680847906      sin(pi/6)       cos(pi/6)       -sin(pi/6)/2!   -cos(pi/6)/3!   sin(pi/6)/4!
[Thu Apr 06 23:11:46 2023] |N3..........|WW10|15|11011  1680847906      sqrt3/2 1/2     (-1/2sqrt3)/3   (1/6(-1))/2     1/24sqrt3/2
[Thu Apr 06 23:11:46 2023] |5K..........|WW10|16|1      1680847906      e^(14/6)
[Thu Apr 06 23:11:46 2023] |EV..........|WW10|19|1000   1680847906      3+1/3           ?

Commentary: The mangled line in transactions.log includes the string "4!" that appears also in the correct first line taken from answer.log. If this attribution is correct, the first two lines in answer.log and transaction.log describe the same event, but they appear in different orders in the two files. This supports the race-condition hypothesis.

Problematic 10-line excerpt from transaction.log:
[Fri Apr 07 17:57:08 2023] 1	Quiz01	R6..........	set101-WW1/via_Stewart5_5_1_19.pg	1	-1	4050	1	1	["Q0001_AnSwEr0001","B1"]	1	0
[Fri Apr 07 17:57:08 2023] 2	Quiz01	R6..........	set101-WW1/problemJ.pg	1	-1	3470	1	1	["Q0002_AnSwEr0001","E\u0000F"]	1	0
[Fri Apr 07 17:57:08 2023] 3	Quiz01	R6..........	set101-WW1/GeometricIntegral2.pg	1	-1	440	0	1	["Q0003_AnSwEr0001","0+(8/7)-((4-(8/7) ) * 5 /2) - (5 * ( (10/9) - 11) /2 ) + ( (2 - (10/9) ) * 2) "]	0	1
[Fri Apr 07 17:57:08 2023] 1	Quiz09	EW..........	set101-WW9/via_s_11_6_10.pg	1	-1	2270	1	1	["Q0001_AnSwEr0001","inf","Q0001_AnSwEr0002","Divergent","Q0001_AnSwEr0003","Divergent"]	1	0
r0004","6 - 6/6 * ( 1 + 1/n ) * (2+1/n)","Q0004_AnSwEr0005","4"]	1	0
[Fri Apr 07 17:57:08 2023] 5	Quiz01	R6..........	set101-WW1/problemA.pg	1	-1	4592	1	1	["Q0005_AnSwEr0001","B\u0000E"]	1	0
[Fri Apr 07 17:57:08 2023] 2	Quiz09	EW..........	set101-WW9/via-Library-ma123DB-set12-s11_8_5.pg	1	-1	4621	1	1	["Q0002_AnSwEr0001","(-1,1)"]	1	0
[Fri Apr 07 17:57:08 2023] 3	Quiz09	EW..........	set101-WW9/via_s_11_6_6.pg	1	-1	3598	1	1	["Q0003_AnSwEr0001","1","Q0003_AnSwEr0002","Inconclusive","Q0003_AnSwEr0003","Absolutely Convergent"]	1	0
[Fri Apr 07 17:57:08 2023] 4	Quiz09	EW..........	set101-WW9/PDL_altseries.pg	1	-1	1065	1	1	["Q0004_AnSwEr0001","Converges","Q0004_AnSwEr0002","Converges","Q0004_AnSwEr0003","Diverges","Q0004_AnSwEr0004","Converges Conditionally","Q0004_AnSwEr0005","Diverges"]	1	0
[Fri Apr 07 17:57:08 2023] 5	Quiz09	EW..........	set101-WW9/PDL_seriestests.pg	1	-1	3482	1	1	["Q0005_AnSwEr0001","Converges","Q0005_AnSwEr0002","Converges","Q0005_AnSwEr0003","Converges","Q0005_AnSwEr0004","Diverges","Q0005_AnSwEr0005","Diverges"]	1	0


answer.log has 19 lines bracketed by that same timestamp. Some include null bytes (^@). Here they are:
[Fri Apr 07 17:57:08 2023] |LK..........|Quiz09,v1|1|preview |111       1680915427      (12/10,14/10)   (-(23tan(1))/4, -(23tan(-1))/4) ((88/25arcsin((-sqrt(1/2)))),(88/25arcsin((sqrt(1/2)))))
[Fri Apr 07 17:57:08 2023] |LK..........|Quiz09,v1|2|preview |1 1680915427      (-1,1)
[Fri Apr 07 17:57:08 2023] |LK..........|Quiz09,v1|3|preview |111       1680915427      0       Convergent      Absolutely Convergent
[Fri Apr 07 17:57:08 2023] |LK..........|Quiz09,v1|4|preview |111111    1680915427      True    False   True    False   True    True
[Fri Apr 07 17:57:08 2023] |LK..........|Quiz09,v1|5|preview |11111     1680915427      Converges       Converges       Converges       Diverges        Diverges
[Fri Apr 07 17:57:08 2023] |7W..........|Quiz01,v2|1|autosave|1 1680915428      C^@D
[Fri Apr 07 17:57:08 2023] |7W..........|Quiz01,v2|2|autosave|1111      1680915428      6       15      -1      11
[Fri Apr 07 17:57:08 2023] |7W..........|Quiz01,v2|3|autosave|1 1680915428      70(5-0) - 5^3/3
[Fri Apr 07 17:57:08 2023] |7W..........|Quiz01,v2|4|autosave|1111100   1680915428      2/n     2/n,4/n,6//n    2k/n    36(k/n)^2+8     2/n(36(k/n)^2+8)        2/n(361/6 n (n+1) (2n+1) (1/n)^2+8)
[Fri Apr 07 17:57:08 2023] |7W..........|Quiz01,v2|5|autosave|000       1680915428      No answer entered
[Fri Apr 07 17:57:08 2023] |R6..........|Quiz01,v4|1|1  1680915428      B1
[Fri Apr 07 17:57:08 2023] |R6..........|Quiz01,v4|2|1  1680915428      E^@F
[Fri Apr 07 17:57:08 2023] |R6..........|Quiz01,v4|3|0  1680915428      0+(8/7)-((4-(8/7) ) * 5 /2) - (5 * ( (10/9) - 11) /2 ) + ( (2 - (10/9) ) * 2)
[Fri Apr 07 17:57:08 2023] |R6..........|Quiz01,v4|4|11111      1680915428      1/n     i/n     6 -6 * (i/n)^2  6 - 6/6 * ( 1 + 1/n ) * (2+1/n) 4
[Fri Apr 07 17:57:08 2023] |R6..........|Quiz01,v4|5|1  1680915428      B^@E
1)
[Fri Apr 07 17:57:08 2023] |EW..........|Quiz09,v1|3|111        1680915428      1       Inconclusive    Absolutely Convergent
[Fri Apr 07 17:57:08 2023] |EW..........|Quiz09,v1|4|11111      1680915428      Converges       Converges       Diverges        Converges Conditionally Diverges
[Fri Apr 07 17:57:08 2023] |EW..........|Quiz09,v1|5|11111      1680915428      Converges       Converges       Converges       Diverges        Diverges

Commentary: 1. The problem line in transaction.log seems to match a valid line from student "R6..." in answer.log.
2. There is a problem line in answer.log, too.

Problematic 10-line excerpt from transaction.log:
[Fri Apr 14 18:56:46 2023] 1	Quiz08	R1..........	set101-WW8/telescope2terms.pg	1	-1	2566	1	1	["Q0001_AnSwEr0001","13/4"]	1	0
[Fri Apr 14 18:56:46 2023] 2	Quiz08	R1..........	set101-WW8/via_ns8_2_9.pg	1	-1	2473	1	1	["Q0002_AnSwEr0001","4","Q0002_AnSwEr0002","inf"]	1	0
[Fri Apr 14 18:56:46 2023] 3	Quiz08	R1..........	set101-WW8/via_jj14.pg	1	-1	4610	0	1	["Q0003_AnSwEr0001","531444/708588"]	0	1
[Fri Apr 14 18:56:46 2023] 4	Quiz08	R1..........	set101-WW8/Cristoph-sierpinski.pg	1	-1	2725	1	1	["Q0004_AnSwEr0001","3*0.75^k","Q0004_AnSwEr0002","0","Q0004_AnSwEr0003","12","Q0004_AnSwEr0004","6","Q0004_AnSwEr0005","9","Q0004_AnSwEr0006","13.5","Q0004_AnSwEr0007","20.25","Q0004_AnSwEr0008","6*1.5^k","Q0004_AnSwEr0009","inf"]	1	0
[Fri Apr 14 18:56:46 2023] 1	Quiz05	81..........	set101-WW5/viaStewart5_7_5_81.pg	1	-1	2924	1	1	["Q0001_AnSwEr0001","5xe^(5x^2)+C"]	1	0
,"inf","Q0005_AnSwEr0003","38/5","Q0005_AnSwEr0004","inf","Q0005_AnSwEr0005","-16807/61440","Q0005_AnSwEr0006","1/6"]	1	0
[Fri Apr 14 18:56:46 2023] 2	Quiz05	81..........	set101-WW5/Q23-JacobUpdates.pg	1	-1	399	1	1	["Q0002_AnSwEr0001","16448/75"]	1	0
[Fri Apr 14 18:56:46 2023] 3	Quiz05	81..........	set101-WW5/via-Library-UCSB-Stewart5_7_1-Stewart5_7_1_34.pg	1	-1	4663	1	1	["Q0003_AnSwEr0001","8e^2"]	1	0
[Fri Apr 14 18:56:46 2023] 4	Quiz05	81..........	set101-WW5/Library-UCSB-Stewart5_7_7-Stewart5_7_7_20_kmpEdit.pg	1	-1	696	1	1	["Q0004_AnSwEr0001","0.838296684377488","Q0004_AnSwEr0002","0.895758896143971","Q0004_AnSwEr0003","0.904501265751175","Q0004_AnSwEr0004","0.020833","Q0004_AnSwEr0005","0.000933","Q0004_AnSwEr0006","183","Q0004_AnSwEr0007","14"]	1	0
[Fri Apr 14 18:56:46 2023] 5	Quiz05	81..........	set101-WW5/FindingPi-LeftRiemannsumremoved-JC.pg	1	-1	2012	1	1	["Q0005_AnSwEr0001","pi","Q0005_AnSwEr0002","0.05464","Q0005_AnSwEr0003","0.004645","Q0005_AnSwEr0004","0.001817"]	1	0

answer.log has 10 lines bracketed by that same timestamp, and here they are:
[Fri Apr 14 18:56:46 2023] |R1..........|Quiz08,v1|1|1	1681523804	13/4
[Fri Apr 14 18:56:46 2023] |R1..........|Quiz08,v1|2|11	1681523804	4	inf
[Fri Apr 14 18:56:46 2023] |R1..........|Quiz08,v1|3|0	1681523804	531444/708588
[Fri Apr 14 18:56:46 2023] |R1..........|Quiz08,v1|4|111111111	1681523804	3*0.75^k	0	12	6	9	13.5	20.25	6*1.5^k	inf
[Fri Apr 14 18:56:46 2023] |81..........|Quiz05,v3|1|1	1681523805	5xe^(5x^2)+C
[Fri Apr 14 18:56:46 2023] |R1..........|Quiz08,v1|5|111111	1681523804	10/97	inf	38/5	inf	-16807/61440	1/6
[Fri Apr 14 18:56:46 2023] |81..........|Quiz05,v3|2|1	1681523805	16448/75
[Fri Apr 14 18:56:46 2023] |81..........|Quiz05,v3|3|1	1681523805	8e^2
[Fri Apr 14 18:56:46 2023] |81..........|Quiz05,v3|4|1111111	1681523805	0.838296684377488	0.895758896143971	0.904501265751175	0.020833	0.000933	183	14
[Fri Apr 14 18:56:46 2023] |81..........|Quiz05,v3|5|1111	1681523805	pi	0.05464	0.004645	0.001817

Commentary: It looks like the bad line in transaction.log corresponds to a good line in answer.log.

Hearty thanks to the experts named above, and to anyone else who comes to help out in the future!

@dlglin
Copy link
Member

dlglin commented Jun 2, 2023

On my production server in one of our big courses I am seeing some of these truncated lines in both the answer.log and transaction.log, though not the same entries.

Here's a snippet from an answer.log file that supports the theory that different apache processes are interrupting each other when writing to the log files. It corresponds to two students submitting a gateway at the same time.

[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|2|1	1675836097	-15
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|3|0	1675836097	B
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|4|0	1675836097	E
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|5|1	1675836097	E
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|6|0	1675836097	No answer entered	
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|7|0	1675836097	No answer entered	
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|1|1	1675836097	E
No answer entered	
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|9|0	1675836097	No answer entered	
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|10|0	1675836097	No answer entered	
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|4|1	1675836097	E
o answer entered	
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|5|1	1675836097	B
o answer entered	
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|13|0	1675836097	No answer entered	
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|14|0	1675836097	No answer entered	
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|7|1	1675836097	A
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|8|111	1675836097	4*( 4/5 )^10	10	2
[Tue Feb 07 23:01:38 2023] |user1|Term_Exam,v1|16|0	1675836097	E
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|9|1	1675836097	D
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|10|1	1675836097	D
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|11|1	1675836097	C
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|12|1	1675836097	B
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|13|1	1675836097	B
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|14|1	1675836097	E
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|15|1	1675836097	C
[Tue Feb 07 23:01:38 2023] |user2|Term_Exam,v1|16|0	1675836097	D

It looks like questions 8, 11 and 12 for user1 were interrupted.

@PhilipLoewen
Copy link
Author

I'm having second thoughts about what to do about the behaviour reported above.

There seems to be general agreement that something here is not quite right. My hesitation comes from looking at the "Issues" tab and seeing that it shows 140 other topics. Some of the others seem like more effective ways to transform developer effort into improved student learning. And surely student learning is the key metric here. So perhaps it would be wise to do some kind of quick triage on this issue and then pivot to higher priorities.

The minimal version of "quick triage" would be simply to close this issue and walk away. Another possibility might be to pre-populate the directory containing WW logs with a short permanent README file noting that the log files can be slightly unreliable in high-load situations, pointing to this discussion, and indicating which database file contains a rock-solid version of the information that is typically also present in the logs. I could invent other alternatives, but these two may be enough to inspire a pragmatic and effective intervention from someone who knows the system inside-out.

Thanks again.

@Alex-Jordan
Copy link
Contributor

@drgrice1 has a pull request in queue (#2025) that should, in theory, fix this. So don't close the issue, and hopefully quick triage is not necessary

drgrice1 added a commit to drgrice1/webwork2 that referenced this issue Jun 6, 2023
More is probably needed with this as there are probably other files that
multiple server processes could be attempting to write to at the same
time.  Finding these would take careful review of the code.

This is to address issue openwebwork#2016.
drgrice1 added a commit to drgrice1/webwork2 that referenced this issue Jun 6, 2023
More is probably needed with this as there are probably other files that
multiple server processes could be attempting to write to at the same
time.  Finding these would take careful review of the code.

This is to address issue openwebwork#2016.
drgrice1 added a commit to drgrice1/webwork2 that referenced this issue Jun 6, 2023
More is probably needed with this as there are probably other files that
multiple server processes could be attempting to write to at the same
time.  Finding these would take careful review of the code.

This is to address issue openwebwork#2016.
@PhilipLoewen
Copy link
Author

PhilipLoewen commented Jun 7, 2023

Today I noticed that the timestamps in my file 'answer.log' do not form a nondecreasing sequence. That is, there are a few instances where the time shown in line N is later than the time shown in line N+1. (The error is on the order of just 1 second in the cases I have seen.) This corroborates the hypothesis that races are the problem and locking the log files might solve it. Thanks, @drgrice1.

drgrice1 added a commit to drgrice1/webwork2 that referenced this issue Jun 8, 2023
More is probably needed with this as there are probably other files that
multiple server processes could be attempting to write to at the same
time.  Finding these would take careful review of the code.

This is to address issue openwebwork#2016.
drgrice1 added a commit to drgrice1/webwork2 that referenced this issue Jun 9, 2023
More is probably needed with this as there are probably other files that
multiple server processes could be attempting to write to at the same
time.  Finding these would take careful review of the code.

This is to address issue openwebwork#2016.
drgrice1 added a commit to drgrice1/webwork2 that referenced this issue Jun 9, 2023
More is probably needed with this as there are probably other files that
multiple server processes could be attempting to write to at the same
time.  Finding these would take careful review of the code.

This is to address issue openwebwork#2016.
ionparticle pushed a commit to ubc/webwork2 that referenced this issue Aug 31, 2023
More is probably needed with this as there are probably other files that
multiple server processes could be attempting to write to at the same
time.  Finding these would take careful review of the code.

This is to address issue openwebwork#2016.
@ionparticle
Copy link
Contributor

Hello, I couldn't seem to reproduce this with just multiple processes on a single computer. But I was able to reproduce this when it was two separate servers using shared storage. It seems to be fixed with @drgrice1's fix applied. But a more conclusive test/confirmation will be when students start hitting the servers in the upcoming term.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants