Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang sometimes immediately after connecting #75

Closed
iceiix opened this issue Jan 5, 2019 · 7 comments
Closed

Hang sometimes immediately after connecting #75

iceiix opened this issue Jan 5, 2019 · 7 comments

Comments

@iceiix
Copy link
Owner

iceiix commented Jan 5, 2019

Occasionally, soon after connecting to a server, the game hangs. Sometimes the world is partially rendered:

screen shot 2019-01-05 at 1 22 13 pm

other times, it is not. If this problem does not occur, the world loads quickly, and reliably and repeatedly when disconnecting and reconnecting.

This may have occurred since the first version of Steven I've tried (specifically, the first version I could run from the updates1 branch Thinkofname/steven#75), but I've been working around it by relaunching until it doesn't hang. Sometimes this takes several attempts to get a non-hanging launch, very cumbersome. Other times it doesn't hang at all. Race condition?

@iceiix
Copy link
Owner Author

iceiix commented Jan 5, 2019

Last message before the hang is [server/mod.rs:192][DEBUG] Login, then missing state/models if applicable, likely unrelated. The server eventually times out the connection. Occurs with both release and debug builds. Attaching in lldb, here is a backtrace in release mode - deadlock?:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff795fca86 libsystem_kernel.dylib`__psynch_rw_wrlock + 10
    frame #1: 0x00007fff796b3de8 libsystem_pthread.dylib`_pthread_rwlock_lock_wait + 63
    frame #2: 0x00007fff796b1bdb libsystem_pthread.dylib`_pthread_rwlock_lock_slow + 529
    frame #3: 0x000000010721a8cd steven`steven::main::h0fc486bea166835e + 8861
    frame #4: 0x00000001071a0c96 steven`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h1203d5565d0cacb8 + 6
    frame #5: 0x00000001076658d8 steven`std::panicking::try::do_call::h8a1df18972bca627 [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h38c6d35a8a74d01a at rt.rs:59 [opt]
    frame #6: 0x00000001076658cc steven`std::panicking::try::do_call::h8a1df18972bca627 at panicking.rs:310 [opt]
    frame #7: 0x000000010767169f steven`__rust_maybe_catch_panic at lib.rs:102 [opt]
    frame #8: 0x000000010765862d steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 [inlined] std::panicking::try::he784830af71690b3 at panicking.rs:289 [opt]
    frame #9: 0x00000001076585fa steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 [inlined] std::panic::catch_unwind::h17cb4012512d83ac at panic.rs:392 [opt]
    frame #10: 0x00000001076585fa steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 at rt.rs:58 [opt]
    frame #11: 0x000000010721f38c steven`main + 44
    frame #12: 0x00007fff794beed9 libdyld.dylib`start + 1

@iceiix
Copy link
Owner Author

iceiix commented Jan 7, 2019

Another observation: when Steven is launched, if it connects to a server and hangs, it takes about 15 seconds before the client is kicked with disconnect.timeout, and the problem persists when reconnecting. What state is generated at startup that could cause this kind of issue? Taking a backtrace in debug mode shows it is in the hasher:

(lldb) c
Process 15996 resuming
Process 15996 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x0000000107fe5341 steven`_$LT$std..collections..hash..table..RawTable$LT$K$C$$u20$V$GT$$GT$::raw_bucket_at::he7ffeee16497e692(self=0x00007ffee7e4a950, index=108) at table.rs:728 [opt]
Target 0: (steven) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000107fe5341 steven`_$LT$std..collections..hash..table..RawTable$LT$K$C$$u20$V$GT$$GT$::raw_bucket_at::he7ffeee16497e692(self=0x00007ffee7e4a950, index=108) at table.rs:728 [opt]
    frame #1: 0x0000000107ff17e0 steven`_$LT$std..collections..hash..table..Bucket$LT$K$C$$u20$V$C$$u20$M$GT$$GT$::at_index::h724c172917eb019a(table=0x00007ffee7e4a950, ib_index=<unavailable>) at table.rs:352 [opt]
    frame #2: 0x0000000107fecf3e steven`_$LT$std..collections..hash..table..Bucket$LT$K$C$$u20$V$C$$u20$M$GT$$GT$::new::h21ad9cd40e5828f7(table=<unavailable>, hash=<unavailable>) at table.rs:333 [opt]
    frame #3: 0x0000000107f4fda2 steven`std::collections::hash::map::search_hashed_nonempty::hc829934463d43c18(table=<unavailable>, hash=(hash = 14818755690024606316), is_match=<unavailable>) at map.rs:451 [opt]
    frame #4: 0x0000000107f546b1 steven`_$LT$std..collections..hash..map..HashMap$LT$K$C$$u20$V$C$$u20$S$GT$$GT$::search::h5d1c1a2aa0efd79c(self=0x00007ffee7e4a950, q=<unavailable>) at map.rs:596 [opt]
    frame #5: 0x0000000107f54481 steven`_$LT$std..collections..hash..map..HashMap$LT$K$C$$u20$V$C$$u20$S$GT$$GT$::get::h2c7ec1298143c913(self=<unavailable>, k=<unavailable>) at map.rs:1256 [opt]
    frame #6: 0x0000000107f5660d steven`steven::world::World::get_block_light::hc6459c6b5551190d(self=<unavailable>, pos=(x = 161, y = 16, z = -212)) at mod.rs:169 [opt]
    frame #7: 0x0000000107f564b4 steven`steven::world::World::update_range::h39ddc78c5f8e6bc3(self=0x00007ffee7e4a950, x1=159, y1=<unavailable>, z1=-225, x2=177, y2=<unavailable>, z2=-207) at mod.rs:143 [opt]
    frame #8: 0x0000000107f58d0c steven`steven::world::World::dirty_chunks_by_bitmask::h32ee1ee12b33af45(self=0x00007ffee7e4a950, x=10, z=-14, mask=31) at mod.rs:581 [opt]
    frame #9: 0x0000000107f5b575 steven`steven::world::World::load_chunk19::h3c4be158aa79a261(self=0x00007ffee7e4a950, x=10, z=-14, new=<unavailable>, mask=31, data=Vec<u8> @ 0x00007ffee7e47910) at mod.rs:917 [opt]
    frame #10: 0x0000000107def9b8 steven`steven::server::Server::on_chunk_data::hfe94246c09175282(self=0x00007ffee7e49ad0, chunk_data=ChunkData @ 0x00007ffee7e47fa0) at mod.rs:1091 [opt]
    frame #11: 0x0000000107de9da4 steven`steven::server::Server::entity_tick::hdb299baf636c1dcd(self=<unavailable>, renderer=<unavailable>, delta=<unavailable>) at mod.rs:386 [opt]
    frame #12: 0x0000000107de7c3f steven`steven::server::Server::tick::h98d8ce432872e7b6(self=0x00007ffee7e49ad0, renderer=0x00007ffee7e496f0, delta=0.093505619999999998) at mod.rs:349 [opt]
    frame #13: 0x0000000107e03450 steven`steven::main::h4c2580aa73bcf23a at main.rs:268 [opt]
    frame #14: 0x0000000107db5036 steven`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::ha4acbf996f218010 at rt.rs:74 [opt]
    frame #15: 0x00000001084ff298 steven`std::panicking::try::do_call::h8a1df18972bca627 [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h38c6d35a8a74d01a at rt.rs:59 [opt]
    frame #16: 0x00000001084ff28c steven`std::panicking::try::do_call::h8a1df18972bca627 at panicking.rs:310 [opt]
    frame #17: 0x000000010850b05f steven`__rust_maybe_catch_panic at lib.rs:102 [opt]
    frame #18: 0x00000001084f1fed steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 [inlined] std::panicking::try::he784830af71690b3 at panicking.rs:289 [opt]
    frame #19: 0x00000001084f1fba steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 [inlined] std::panic::catch_unwind::h17cb4012512d83ac at panic.rs:392 [opt]
    frame #20: 0x00000001084f1fba steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 at rt.rs:58 [opt]
    frame #21: 0x0000000107db5028 steven`std::rt::lang_start::hbf60d5a13fa3283e(main=<unavailable>, argc=<unavailable>, argv=<unavailable>) at rt.rs:74 [opt]
    frame #22: 0x00007fff794beed9 libdyld.dylib`start + 1
    frame #23: 0x00007fff794beed9 libdyld.dylib`start + 1
(lldb) 

@iceiix
Copy link
Owner Author

iceiix commented Jan 7, 2019

The timeout occurs because, as shown with added logging in src/server/mod.rs on_chunk_data(), loading chunks is slower in debug mode. The chunks stop loading as soon as the client is kicked. However, in release mode, the chunks load quickly enough, and the hang occurs later, but since the app is compiled in release mode, debugging symbols are limited (same pthread rwlock wait from steven`steven::main::hac396abb5a81b8e3 + 8880).

@iceiix
Copy link
Owner Author

iceiix commented Jan 7, 2019

diff --git a/src/server/mod.rs b/src/server/mod.rs
index 1a7cb20..d8764f3 100644
--- a/src/server/mod.rs
+++ b/src/server/mod.rs
@@ -77,6 +77,7 @@ pub struct Server {
 
     sun_model: Option<sun::SunModel>,
     target_info: target::Info,
+    loaded_chunks: usize,
 }
 
 pub struct PlayerInfo {
@@ -311,6 +312,7 @@ impl Server {
             sun_model: None,
 
             target_info: target::Info::new(),
+            loaded_chunks: 0,
         }
     }
 
@@ -1088,6 +1090,9 @@ impl Server {
     }
 
     fn on_chunk_data(&mut self, chunk_data: packet::play::clientbound::ChunkData) {
+        println!("on_chunk_data about to call self.world.load_chunk19 {}", self.loaded_chunks);
+        self.loaded_chunks += 1;
+        if self.loaded_chunks > 100 && self.loaded_chunks < 300 { return }
         self.world.load_chunk19(
             chunk_data.chunk_x,
             chunk_data.chunk_z,
@@ -1095,7 +1100,9 @@ impl Server {
             chunk_data.bitmask.0 as u16,
             chunk_data.data.data
         ).unwrap();
+        println!("on_chunk_data about to update block entities"); 
         for optional_block_entity in chunk_data.block_entities.data {
+            println!("optional_block_entity = {:?}", optional_block_entity);
             if let Some(block_entity) = optional_block_entity {
                 let x = block_entity.1.get("x").unwrap().as_int().unwrap();
                 let y = block_entity.1.get("y").unwrap().as_int().unwrap();
@@ -1115,6 +1122,7 @@ impl Server {
                 });
             }
         }
+        println!("on_chunk_data returning");
     }
 
     fn on_chunk_data_no_entities(&mut self, chunk_data: packet::play::clientbound::ChunkData_NoEntities) {
diff --git a/src/world/mod.rs b/src/world/mod.rs
index 9900846..1e14dad 100644
--- a/src/world/mod.rs
+++ b/src/world/mod.rs
@@ -914,7 +914,9 @@ impl World {
             chunk.calculate_heightmap();
         }
 
+        println!("about to self.dirty_chunks_by_bitmask");
         self.dirty_chunks_by_bitmask(x, z, mask);
+        println!("returned from self.dirty_chunks_by_bitmask");
         Ok(())
     }

Skipping loading some chunks to speed up game loading, attempting to reproduce in debug mode, but doesn't seem to stick. If this is a timing bug, may have to debug in release mode.

https://stackoverflow.com/questions/38803760/how-to-get-a-release-build-with-debugging-information-when-using-cargo

@iceiix
Copy link
Owner Author

iceiix commented Jan 7, 2019

With debug symbols in release mode, added in Cargo.toml:

[profile.release]
debug = true

backtrace shows it is in ticking resources. Is this because resources are ticked in the main loop, or an underlying concurrency issue or other bug in resource ticking?

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff795fca86 libsystem_kernel.dylib`__psynch_rw_wrlock + 10
    frame #1: 0x00007fff796b3de8 libsystem_pthread.dylib`_pthread_rwlock_lock_wait + 63
    frame #2: 0x00007fff796b1bdb libsystem_pthread.dylib`_pthread_rwlock_lock_slow + 529
    frame #3: 0x0000000105d5683d steven`steven::main::h46d12621e7a50021 [inlined] std::sys::unix::rwlock::RWLock::write::h436ef511507b3b85(self=0x0000000106fbf000) at rwlock.rs:82 [opt]
    frame #4: 0x0000000105d56835 steven`steven::main::h46d12621e7a50021 [inlined] std::sys_common::rwlock::RWLock::write::h0a439576880db706(self=0x0000000106fbf000) at rwlock.rs:51 [opt]
    frame #5: 0x0000000105d56835 steven`steven::main::h46d12621e7a50021 [inlined] _$LT$std..sync..rwlock..RwLock$LT$T$GT$$GT$::write::h10ab77ee7cbfad69(self=<unavailable>) at rwlock.rs:277 [opt]
    frame #6: 0x0000000105d56831 steven`steven::main::h46d12621e7a50021 at main.rs:253 [opt]
    frame #7: 0x0000000105cdcb56 steven`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h64e736b1aa35af7b at rt.rs:74 [opt]
    frame #8: 0x00000001061a9678 steven`std::panicking::try::do_call::h8a1df18972bca627 [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h38c6d35a8a74d01a at rt.rs:59 [opt]
    frame #9: 0x00000001061a966c steven`std::panicking::try::do_call::h8a1df18972bca627 at panicking.rs:310 [opt]
    frame #10: 0x00000001061b543f steven`__rust_maybe_catch_panic at lib.rs:102 [opt]
    frame #11: 0x000000010619c3cd steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 [inlined] std::panicking::try::he784830af71690b3 at panicking.rs:289 [opt]
    frame #12: 0x000000010619c39a steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 [inlined] std::panic::catch_unwind::h17cb4012512d83ac at panic.rs:392 [opt]
    frame #13: 0x000000010619c39a steven`std::rt::lang_start_internal::h7d0f1ce4d4ec6d26 at rt.rs:58 [opt]
    frame #14: 0x0000000105d5b2dc steven`main + 44
    frame #15: 0x00007fff794beed9 libdyld.dylib`start + 1
    frame #16: 0x00007fff794beed9 libdyld.dylib`start + 1
(lldb) f 6
steven was compiled with optimization - stepping may behave oddly; variables may not be available.
frame #6: 0x0000000105d56831 steven`steven::main::h46d12621e7a50021 at main.rs:253 [opt]
   250 	        let (physical_width, physical_height) = window.get_inner_size().unwrap().to_physical(game.dpi_factor).into();
   251 	
   252 	        let version = {
-> 253 	            let mut res = game.resource_manager.write().unwrap();
   254 	            res.tick(&mut resui, &mut ui_container, delta);
   255 	            res.version()
   256 	        };
(lldb) 

@iceiix
Copy link
Owner Author

iceiix commented Jan 7, 2019

        let version = {
            let mut res = game.resource_manager.write().unwrap();
            res.tick(&mut resui, &mut ui_container, delta);
            res.version()
        };

resource_manager is an Arc<RwLock<resources::Manager>>. Obtains a writable lock, ticks, version. If I step through, the thread ends here with "invalid thread", even though it is the main thread:

(lldb)  
Process 18305 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x00007fff796b3de3 libsystem_pthread.dylib`_pthread_rwlock_lock_wait + 58
libsystem_pthread.dylib`_pthread_rwlock_lock_wait:
->  0x7fff796b3de3 <+58>: callq  0x7fff796b7b82            ; symbol stub for: __psynch_rw_wrlock
    0x7fff796b3de8 <+63>: movl   %eax, %ecx
    0x7fff796b3dea <+65>: cmpl   $-0x1, %ecx
    0x7fff796b3ded <+68>: jne    0x7fff796b3e0b            ; <+98>
Target 0: (steven) stopped.
(lldb)  
(lldb) 
error: invalid thread
(lldb) 
error: invalid thread
(lldb) 

It is indeed hanging on obtaining the mutable reference to the resource lock:

--- a/Cargo.toml
+++ b/Cargo.toml
@@ -4,6 +4,9 @@ version = "0.0.1"
 authors = [ "Thinkofdeath <thinkofdeath@spigotmc.org>" ]
 edition = "2018"
 
+[profile.release]
+debug = true
+
 [profile.dev]
 # Steven runs horrendously slow with no optimizations, and often freezes.
 # However, building with full -O3 optimizations takes too long for a debug build.
diff --git a/src/main.rs b/src/main.rs
index 22ba351..819c9ab 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -250,10 +250,14 @@ fn main() {
         let (physical_width, physical_height) = window.get_inner_size().unwrap().to_physical(game.dpi_factor).into();
 
         let version = {
+            println!("obtaining mutable reference to resource manager");
             let mut res = game.resource_manager.write().unwrap();
+            println!("about to tick resource");
             res.tick(&mut resui, &mut ui_container, delta);
+            println!("ticked resources, returning version");
             res.version()
         };
+        println!("version = {}", version);
 
         let vsync_changed = *game.vars.get(settings::R_VSYNC);
         if vsync != vsync_changed {

output:

ticked resources, returning version
version = 3
obtaining mutable reference to resource manager

hangs there

https://doc.rust-lang.org/std/sync/struct.RwLock.html#method.write

Locks this rwlock with exclusive write access, blocking the current thread until it can be acquired.

This function will not return while other writers or other readers currently have access to the lock.

Who else already has the resource manager lock? game.resource_manager is cloned in connect_to(), cloned in main passed to render::Renderer::new, server::Server::dummy_server, Game initialization, and chunk_builder::ChunkBuilder::new.

@iceiix
Copy link
Owner Author

iceiix commented Jan 7, 2019

Where is resources::Manager used for write() besides in the main loop? I can only find for read():

src/render/ui.rs
139:            let version = self.resources.read().unwrap().version();
250:        let res = self.resources.read().unwrap();

src/render/mod.rs
157:            res.read().unwrap().version()
244:            let rm = self.resources.read().unwrap();
802:                let ver = res.read().unwrap().version();
997:        let img = if let Some(mut val) = res.read().unwrap().open("minecraft", "textures/entity/steve.png") {
1040:        if let Some(mut val) = res.read().unwrap().open(plugin, &path) {
1070:        if let Some(val) = res.read().unwrap().open(plugin, &path) {

src/ui/logo.rs
27:            let res = resources.read().unwrap();
95:            let res = resources.read().unwrap();

src/model/mod.rs
74:        let mut val = match res.read().unwrap().open("minecraft", &format!("textures/colormap/{}.png", name)) {
190:        let file = match self.resources.read().unwrap().open(plugin, &format!("blockstates/{}.json", name)) {
278:        let file = match self.resources.read().unwrap().open(plugin, &format!("models/block/{}.json", model_name)) {
305:            let file = match self.resources.read().unwrap().open(plugin, &format!("models/{}.json", parent)) {

src/server/mod.rs
274:        let version = resources.read().unwrap().version();
331:        let version = self.resources.read().unwrap().version();

except in main:

src/main.rs
254:            let mut res = game.resource_manager.write().unwrap();

How can RwLock deadlock with only one write()?

@iceiix iceiix transferred this issue from iceiix/steven Jan 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant