
process SysV-Semaphore support #49975

Open
jvdias mannequin opened this issue Apr 9, 2009 · 13 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments


jvdias mannequin commented Apr 9, 2009

BPO 5725
Nosy @pitrou, @osvenskan, @bitdancer
Files
  • psem_example.py: Example "Log Compressor" psem using application
  • psempy.c: C source code implementing "psem.*" python module functions
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2009-04-09.05:20:51.435>
    labels = ['type-feature', 'library']
    title = 'process SysV-Semaphore support'
    updated_at = <Date 2010-12-29.17:45:30.026>
    user = 'https://bugs.python.org/jvdias'

    bugs.python.org fields:

    activity = <Date 2010-12-29.17:45:30.026>
    actor = 'jnoller'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2009-04-09.05:20:51.435>
    creator = 'jvdias'
    dependencies = []
    files = ['13658', '13662']
    hgrepos = []
    issue_num = 5725
    keywords = []
    message_count = 13.0
    messages = ['85791', '85792', '85793', '85795', '85809', '85810', '85811', '85817', '85836', '85846', '85955', '95744', '95771']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'osvenskan', 'jnoller', 'r.david.murray', 'jvdias']
    pr_nums = []
    priority = 'low'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue5725'
    versions = ['Python 3.1', 'Python 2.7']


    jvdias mannequin commented Apr 9, 2009

    Please could we have an API in the Python core for PROCESS
    (as opposed to THREAD) semaphores, to enable facilities such
    as those in the attached example "psempy.so" compiled
    C Python module.

    i.e. I'd like to be able to:

        import sys.semget sys.semop sys.semctl

    Because this is not possible, the attached "psem.*"
    module provides a workaround for my application.
    Should I expand this into a Python add-on, or will full
    support for SysV PROCESS semaphores be coming to the Python core soon?

    @jvdias jvdias mannequin added the type-feature A feature request or enhancement label Apr 9, 2009

    jvdias mannequin commented Apr 9, 2009

    To Build:

    $ gcc -fPIC -shared -o psempy.so psempy.c -I/usr/include/python2.6
    -L/usr/lib/python2.6 -lpython2.6 && mv psempy.so psem.so
    $ dd if=/dev/urandom of=app1_20090407.01.log bs=1000000 count=1
    $ python
    >>> import sys, os, re, datetime, psem, psem_example
    >>> psem_example.compress_log( "app1", "2009", "04", "07", "01", "bzip",
    "app1_20090407.01.log");
    0

    Example program using psem.so that compresses logs
    named *${YEAR}-${MONTH}-${DAY}* in a psem.*-based
    parallel for.
    On a 32-processor 2 GHz SPARC, the time taken to compress
    32 1 MB files using the psem parallel-for (for 32 CPUs)
    was on the order of the time taken to compress one
    1 MB file, i.e. roughly 1/32nd of the time taken to compress
    32 files serially.
    The number of processes was made secure and "run-away" safe
    ONLY because direct access was available to the
    semop(2), semget(2), and semctl(2) system calls.
    Please can Python put this API into sys, or I will create a
    Python add-on module to do so. Let me know whether this is
    a good idea or not. Thank you, Jason.


    jvdias mannequin commented Apr 9, 2009

    Example Python use of psem.so Parallel ForEach : Log Compressor

    Usage:

    $ gcc -o psem.so -fPIC -shared psempy.c -I/usr/include/python2.6
    -L/usr/lib/python2.6 -lpython2.6
    $ dd if=/dev/urandom bs=1000000 count=1 of=app1_20090407.01.log
    $ python 
    Python 2.6 (r26:66714, Oct 16 2008, 00:21:12)
    [GCC 4.2.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys, os, re, datetime, psem, psem_example
    >>> psem_example.compress_log( "app1", "2009", "04", "07", "01", "bzip",
    "app2_20090407.02.log");
    0
    >>> quit
    Use quit() or Ctrl-D (i.e. EOF) to exit
    >>>

    Now, one can time for example 16 runs of the above command with
    16 different input files, and on a 16 CPU machine one would expect
    the elapsed time between start and finish to be of the order of
    1/16th of the time taken to compress all of the 16 files sequentially.
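    [Editorial note: on a modern Python, this parallel-for over files can be written with the standard library alone. The sketch below uses multiprocessing.Pool and bz2; the file names and sizes are illustrative stand-ins, not taken from the original test.]

    ```python
    import bz2
    import os
    import multiprocessing as mp

    def compress(path):
        """bzip2-compress `path` into `path`.bz2 and return the output name."""
        with open(path, "rb") as f:
            data = f.read()
        out = path + ".bz2"
        with open(out, "wb") as f:
            f.write(bz2.compress(data))
        return out

    if __name__ == "__main__":
        # Generate a few small files of random bytes as stand-ins for logs.
        names = ["app1_20090407.%02d.log" % i for i in range(4)]
        for name in names:
            with open(name, "wb") as f:
                f.write(os.urandom(100_000))
        # One worker per file: wall time approaches that of a single file.
        with mp.Pool(len(names)) as pool:
            print(pool.map(compress, names))
    ```

    With one CPU per file, the elapsed time is roughly that of compressing a single file, which is the speedup the original comment measured.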


    jvdias mannequin commented Apr 9, 2009

    $ time /usr/bin/python2.6 ./psem_example.py 2>&1 | tee log
    Do you really want to run test using 16 1MB log files ? Y/Ny
    generating files 0..15
    generating file 0
    generating file 1
    generating file 2
    generating file 3
    generating file 4
    generating file 5
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.783862 s, 1.3 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.890675 s, 1.1 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.831693 s, 1.2 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.84914 s, 1.2 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.885601 s, 1.1 MB/s
    generating file 6
    generating file 7
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.942455 s, 1.1 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.282143 s, 3.5 MB/s
    generating file 8
    generating file 9
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.41776 s, 2.4 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.292488 s, 3.4 MB/s
    generating file 10
    generating file 11
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.396643 s, 2.5 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.2736 s, 3.7 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.316026 s, 3.2 MB/s
    generating file 12
    generating file 13
    generating file 14
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.349368 s, 2.9 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.364177 s, 2.7 MB/s
    generating file 15
    compressing files 0..15
    compressing file 0
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.495831 s, 2.0 MB/s
    1+0 records in
    1+0 records out
    1000000 bytes (1.0 MB) copied, 0.229301 s, 4.4 MB/s
    compressing file 1
    compressing file 2
    compressing file 3
    compressing file 4
    compressing file 5
    compressing file 6
    compressing file 7
    compressing file 8
    compressing file 9
    compressing file 10
    compressing file 11
    compressing file 12
    compressing file 13
    compressing file 14
    compressing file 15

    real 0m10.700s
    user 0m7.987s
    sys 0m5.130s


    jvdias mannequin commented Apr 9, 2009

    I suggest a new sys SysV-semaphore API:

      sys.semget( sem_key, sem_nsems, sem_flags)
      sys.SEM_KEY_ANY = 0
      sys.SEM_UNDO    = 0x1000   /*chain of atomic kernel UNDO operations*/ 
      sys.SEM_GETPID  = 11   /* get sempid */
      sys.SEM_GETVAL  = 12   /* get semval */
      sys.SEM_GETALL  = 13   /* get all semval's */
      sys.SEM_GETNCNT = 14   /* get semncnt */
      sys.SEM_GETZCNT = 15   /* get semzcnt */
      sys.SEM_SETVAL  = 16   /* set semval */
      sys.SEM_SETALL  = 17   /* set all semval's */
    #if ( ! defined(__sun__) ) || defined ( __GNU__ )
      sys.SEM_STAT    = 18
      sys.SEM_INFO    = 19
    #endif
      
      sys.semop(semid, sops, nsops)
      sys.semtimedop(semid, sops, nsops,
                     timeout
                    )
      
      sys.semctl(semid, semnum, cmd, ...);
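    [Editorial note: readers finding this issue today can reach the proposed calls without a C extension via ctypes. A minimal sketch, assuming Linux/glibc: the IPC_CREAT/GETVAL/SETVAL numeric values below are Linux-specific, and passing a plain int where semctl(2) expects a union semun relies on the common x86-64 calling convention.]

    ```python
    import ctypes
    import ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

    # Linux <sys/ipc.h> / <sys/sem.h> constants (values differ on other OSes)
    IPC_PRIVATE = 0
    IPC_CREAT = 0o1000
    IPC_RMID = 0
    GETVAL = 12
    SETVAL = 16

    class sembuf(ctypes.Structure):
        # struct sembuf from semop(2)
        _fields_ = [("sem_num", ctypes.c_ushort),
                    ("sem_op", ctypes.c_short),
                    ("sem_flg", ctypes.c_short)]

    semid = libc.semget(IPC_PRIVATE, 1, IPC_CREAT | 0o600)  # one new semaphore
    libc.semctl(semid, 0, SETVAL, 3)        # initialise its count to 3
    op = sembuf(0, -1, 0)                   # a "P" (decrement/wait) operation
    libc.semop(semid, ctypes.byref(op), 1)
    print(libc.semctl(semid, 0, GETVAL))    # 2
    libc.semctl(semid, 0, IPC_RMID)         # remove the set when done
    ```

    The third-party sysv_ipc module mentioned later in this thread wraps the same calls with proper error handling.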


    jvdias mannequin commented Apr 9, 2009

    PS: Timings for x86{-32,-64} or ia{32,64} architectures are likely
    to show a significantly smaller speedup, because they truly are
    MIMD CISC pipeline machines (the multiple-core feature and
    IA64 "instruction triplet groups" mean they are more optimized
    for "hyperthreading" WITHIN the same process).

    Please remember that Python must also be capable of taking advantage
    of SIMD parallelism on architectures such as the SPARC, where
    fork()-ing a new process is often vastly more efficient than
    creating a new LWP with true shared memory.

    On amd64 machines too, which do not have such a huge
    microcode overhead headache as IA64 machines,
    running multiple processes instead of multiple threads
    can often be more efficient.

    But on ia64, and of course on single-processor machines of all types,
    there is no improvement to be found in running multiple processes,
    and running multiple threads is greatly to be preferred.


    jvdias mannequin commented Apr 9, 2009

    C source code implementing "psem.*" python module functions

    @jvdias jvdias mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Apr 9, 2009

    jvdias mannequin commented Apr 9, 2009

    Contrast what I had to do to perform a process semaphore operation
    in Python with how one would do it in Perl:

    -- Perl 5.10.0 documentation --

    * semop KEY,OPSTRING

      Calls the System V IPC function semop to perform semaphore
      operations such as signalling and waiting. OPSTRING must be a packed
      array of semop structures. Each semop structure can be generated with
      pack("s!3", $semnum, $semop, $semflag). The length of OPSTRING implies
      the number of semaphore operations. Returns true if successful, or false
      if there is an error. As an example, the following code waits on
      semaphore $semnum of semaphore id $semid:

          $semop = pack("s!3", $semnum, -1, 0);
          die "Semaphore trouble: $!\n" unless semop($semid, $semop);
    
      To signal the semaphore, replace -1 with 1. See also "SysV IPC"
      in perlipc, IPC::SysV, and IPC::SysV::Semaphore documentation.

    Nice! Why can't Python provide something similar?

    Then my example psempy.c module could be implemented in 100% pure
    Python.

    I'm bringing this issue up here so as to gain some feedback from
    the Python development team as to the likelihood of Python's core
    'sys' module ever supporting process-scope semaphores. If I don't
    hear back from them within three days I'll submit a patch for
    Python to support the sys.semget(), sys.semctl(), and
    sys.semop()/sys.semtimedop() operations as described above.

    @bitdancer
    Member

    In bpo-5672 Martin said:

    If somebody would provide a patch that adds prctl to the posix module,
    that would be fine with me - we have a long tradition of exposing all
    available system calls if somebody wants them.

    However, you are talking about a System V call, not a posix call, so
    it's not clear to me if the same rules apply.

    I suggest bringing this up on python-ideas.

    (BTW, sys is for python system stuff, not OS system stuff (which goes in
    the 'os' module).

    @bitdancer bitdancer added stdlib Python modules in the Lib dir and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 9, 2009

    jvdias mannequin commented Apr 11, 2009

    Thanks for responding !

    I also think that Python on Solaris should provide support for prctl.
    Actually, in modern Solaris what can be achieved with "prctl" can be
    achieved by mmap()-ping and manipulating the procfs(4) mounted under /proc.
    What I meant was not that process-scope semaphore support was required
    to solve the problem solved in the example code for this bug report;
    there are many other ways of doing that on either Linux or Solaris,
    such as with rlimit(1). Rather, had process-scope semaphore support
    been available in Python, the algorithm used by the example would have
    been a simple and robust one, suitable for use in Python code. In any
    case, the application the example code was originally part of required
    process-scope semaphores anyway, and for other reasons too (for example,
    when a process is killed with -9 its semaphore count is automatically
    adjusted if the SEM_UNDO option was specified, meaning another "slot"
    becomes available), this was the best among several methods investigated.

    Actually, the example code I submitted for this bug report also raises
    another, completely separate issue: any parent using Python's fork()
    MUST somehow fflush() its standard output and error C stdio file
    descriptors before forking; otherwise the child's first Python "print"
    statement is prefixed by whatever was in the parent's stdio buffer, so
    that more than one initial "log" line can occasionally appear if "print"
    is used to produce the log. If the parent does 'print "\n"' immediately
    before calling fork(), then both processes proceed with clean buffers
    and there are no doubled log lines.
    Can't Python provide a better print() implementation, or an fflush()
    implementation that will enable print()'s buffers to be flushed?
    Perhaps something like Perl's IO::Handle::autoflush()?

    Maybe I should raise another "fflush() support required" bug?

    Thanks & Regards,
    Jason


    @bitdancer
    Member

    On Sat, 11 Apr 2009 at 09:29, jvdias wrote:

    jvdias <jason.vas.dias@gmail.com> added the comment:

    Thanks for responding !

    You are welcome. If you want something to happen, though,
    you'll have to get support from the community and submit
    a patch.

    Can't Python provide a better print() implementation or a fflush()
    implementation that will enable print()'s buffers to be flushed ?
    Perhaps something like PERL's IO::Handle::autoflush() ?

    Is there some way in which sys.stdout.flush() is not equivalent to "an
    fflush implementation that will enable print()'s buffers to be flushed"?

    Maybe I should raise another "fflush() support required" bug ?

    If it's a real issue, yes. But I don't think it is.

    --David
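    [Editorial note: the buffering behaviour discussed in this exchange is easy to demonstrate. In the illustration below, a helper runs a tiny fork() script in a subprocess so that stdout is a pipe and therefore block-buffered, the condition that triggers the bug; without a flush before fork(), the pending text is emitted by both the parent and the child.]

    ```python
    import subprocess
    import sys
    import textwrap

    def run_child(flush_first):
        # Run the demo in a subprocess so its stdout is a pipe (block-buffered).
        code = textwrap.dedent("""
            import os, sys
            sys.stdout.write("pending")   # no newline -> stays in the buffer
            %s
            pid = os.fork()
            if pid == 0:
                sys.stdout.write(" child\\n")
                sys.stdout.flush()        # child also emits its inherited buffer
                os._exit(0)
            os.waitpid(pid, 0)
            sys.stdout.flush()
        """ % ("sys.stdout.flush()" if flush_first else "pass"))
        return subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True).stdout

    print(run_child(False).count("pending"))  # 2 -- buffered text is doubled
    print(run_child(True).count("pending"))   # 1 -- flushing before fork fixes it
    ```

    This is exactly the sys.stdout.flush()-before-fork() discipline David describes; os.fork() makes the demo POSIX-only.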


    osvenskan mannequin commented Nov 26, 2009

    I stumbled across this bug report while looking for an mmap-related
    issue. I thought I'd mention that I have a module for SysV IPC that's
    about a year old.

    Obviously it's not in the standard library, but it is pretty well
    fleshed out. It's in active use and I consider it fairly well debugged.

    It's for Python 2.x only.

    http://semanchuk.com/philip/sysv_ipc/

    Hope this helps


    pitrou commented Nov 27, 2009

    jvdias, have you looked at what the multiprocessing module offers?
    http://docs.python.org/library/multiprocessing.html#synchronization-primitives
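    [Editorial note: the "run-away safe" worker cap from the original example maps directly onto multiprocessing.Semaphore. A minimal sketch with illustrative names; one genuine difference from the SysV design is flagged in the comments.]

    ```python
    import multiprocessing as mp

    def worker(sem, counter):
        # Take a "slot" before doing the (stand-in) work, so at most
        # `limit` jobs run concurrently. Caveat: unlike a SysV semaphore
        # acquired with SEM_UNDO, a slot is NOT returned automatically if
        # a worker is killed with SIGKILL while holding it.
        with sem:
            with counter.get_lock():
                counter.value += 1

    if __name__ == "__main__":
        limit = 4
        sem = mp.Semaphore(limit)    # shared with the child processes
        done = mp.Value("i", 0)      # shared completion counter
        procs = [mp.Process(target=worker, args=(sem, done))
                 for _ in range(16)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(done.value)            # 16
    ```

    The semaphore bounds concurrency to `limit` workers at a time while all 16 jobs still complete, which is the gating the psem example implements with semget()/semop().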

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022