Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket state corrupts when original socket object goes out of scope in a different thread #70890

Open
JoshN mannequin opened this issue Apr 6, 2016 · 10 comments
Open
Labels
docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error

Comments

@JoshN
Copy link
Mannequin

JoshN mannequin commented Apr 6, 2016

BPO 26703
Nosy @pitrou, @vadmium, @MojoVampire

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2016-04-06.18:07:05.803>
labels = ['type-bug', 'docs']
title = 'Socket state corrupts when original socket object goes out of scope in a different thread'
updated_at = <Date 2016-04-08.08:58:54.305>
user = 'https://bugs.python.org/JoshN'

bugs.python.org fields:

activity = <Date 2016-04-08.08:58:54.305>
actor = 'pitrou'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation = <Date 2016-04-06.18:07:05.803>
creator = 'JoshN'
dependencies = []
files = []
hgrepos = []
issue_num = 26703
keywords = []
message_count = 10.0
messages = ['262951', '262957', '262958', '262968', '262969', '262973', '262976', '262980', '263009', '263010']
nosy_count = 5.0
nosy_names = ['pitrou', 'docs@python', 'martin.panter', 'josh.r', 'JoshN']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue26703'
versions = ['Python 3.5']

@JoshN
Copy link
Mannequin Author

JoshN mannequin commented Apr 6, 2016

Creating a socket in one thread and sharing it with another will cause the socket to corrupt as soon as the thread it was created in exits.

Example code:
import socket, threading, time, os

def start():
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.bind(("", 8080))
    a.set_inheritable(True)

    thread = threading.Thread(target=abc, args=(a.fileno(),))
    thread.start()

    time.sleep(2)
    print("Main thread exiting, socket is still valid: " + str(a) + "\n")

def abc(b):
    sock = socket.socket(fileno=b)
    for _ in range(3):
        print("Passed as an argument:" + str(sock) + "\n=====================")

        time.sleep(1.1)

start()

Note that, as soon as the main thread exits, the socket isn't closed, nor is the fd=-1, etc. Doing anything with this corrupted object throws WinError 10038 ('operation performed on something that is not a socket').

I should note that the main thread exiting doesn't seem to be the cause, it is the original object containing the socket going out of scope that causes the socket to become corrupted.

-JoshN

@JoshN JoshN mannequin added type-crash A hard crash of the interpreter, possibly with a core dump topic-IO labels Apr 6, 2016
@JoshN JoshN mannequin changed the title Socket state corrupts when original assignment goes out of scope Socket state corrupts when original socket object goes out of scope in a different thread Apr 6, 2016
@MojoVampire
Copy link
Mannequin

MojoVampire mannequin commented Apr 6, 2016

You used the fileno based initialization in the child, which creates a wrapper around the same file descriptor without duplicating it, so when the first socket disappears, that file descriptor becomes invalid.

I think this is a doc bug more than a behavior bug; the docs say "Unlike socket.fromfd(), fileno will return the same socket and not a duplicate." which almost seems like the idea is that the Python level socket object it returns is cached in some way that allows it to be looked up by file descriptor (making mysock2 = socket.socket(fileno=mysock.fileno()) equivalent to mysock2 = mysock), but what it really means is that there are two Python level socket objects referencing the same C level file descriptor; the normal cleanup behavior still applies though, so the first Python level socket object to be destroyed also closes the file descriptor, leaving the other socket object in a broken state.

The correct approach to this would be to just pass the socket object to the thread directly, or pass along the address family and type and use socket.fromfd (which dups the underlying file descriptor).

@MojoVampire
Copy link
Mannequin

MojoVampire mannequin commented Apr 6, 2016

For source reference, the behavior for this case is to just copy out the file descriptor and stick it in a new socket object ( https://hg.python.org/cpython/file/3.5/Modules/socketmodule.c#l4289 ); no work is being done to somehow collaboratively manage the file descriptor to ensure it remains alive for the life of the socket object you're creating.

@vadmium
Copy link
Member

vadmium commented Apr 6, 2016

The documentation already says “Sockets are automatically closed when they are garbage-collected”. If for some reason you want to release a socket object but keep the file descriptor open, I suggest socket.detach(). Otherwise, pass the original socket, not the fileno.

I think this is at best a documentation issue, if you have any suggestions.

@vadmium vadmium added docs Documentation in the Doc dir and removed topic-IO labels Apr 6, 2016
@vadmium vadmium added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Apr 6, 2016
@vadmium
Copy link
Member

vadmium commented Apr 6, 2016

Also, if you enable warnings (e.g. python -Wall), you should see that the socket is being closed:

-c:22: ResourceWarning: unclosed <socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 8080)>

@JoshN
Copy link
Mannequin Author

JoshN mannequin commented Apr 7, 2016

I do understand that the docs are a bit strange on the issue. For example, actually testing the line you referenced ("...fileno will return the same socket and not a duplicate.") by creating 2 sockets and testing sameness with the 'is' operator returns false.

I tried to trim the example code as much as possible - I did test disabling the garbage collector, playing with inheritance, etc, but trimmed them out as they didn't have any effect on my system.

I think my main issue was, when this occurs, the socket 'breaks' as you mentioned instead of closing. Was almost sure it was a bug. Using detach works for this UDP example, but I wasn't sure if detaching the socket actually closes it (e.g. in a stream oriented connection).

So this is considered normal behavior then?

@vadmium
Copy link
Member

vadmium commented Apr 7, 2016

Yes, I think this is the expected behaviour, and I can’t think of any improvements that could be made. If you call fileno(), you have to ensure that you don’t close the file descriptor until you have finished using it. It is a bit like accessing memory after it has been freed. Python doesn’t make raw memory addresses easily accessible, but it does make fileno() accessible without much protection.

Perhaps there is some confusion about the term socket. Normally (without using the fileno=... parameter), Python’s socket() constructor does two things. First, it creates a new OS socket using the socket() system call (or Winsock equivalent), which returns a file descriptor or handle (an integer). Then, it creates a Python socket object, which wraps the file descriptor.

When you use socket(fileno=...), only the second step is taken. You get a _new_ socket object, which wraps the given existing OS socket file descriptor. So when it says “the same socket”, I think it means the same OS-level socket. It still creates a new Python object.

@pitrou
Copy link
Member

pitrou commented Apr 7, 2016

The general answer here is you should avoid mixing calls to different abstraction layers. Either use only the file descriptor or only the socket object.

This is not limited to lifetime issues, other issues can occur. For example, setting a timeout on a socket puts the underlying file descriptor in non-blocking mode. So code using the file descriptor can fail with EAGAIN.

If you really want to use *both* a file descriptor and a socket object, you can use os.dup() on the file descriptor, so that the OS resources are truly independent.

@JoshN
Copy link
Mannequin Author

JoshN mannequin commented Apr 8, 2016

Josh/Martin/Antoine: Thank you for the tips - I was not aware of the underlying mechanics, especially the separate abstraction layers.

I did RTFM up and down before posting this, to be sure. My apologies for the inconvenience.

@pitrou
Copy link
Member

pitrou commented Apr 8, 2016

No need to apologize :) Perhaps we need to make the docs a bit clearer.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error
Projects
Status: No status
Development

No branches or pull requests

2 participants