Merge branch 'docs'

commit 02d219bf0e7ae27704da919a68f7f4265f5f0c52 2 parents 84eedc5 + ec51136
@Byron Byron authored
51 README.rst
@@ -1,51 +0,0 @@
-####################
-Sliding MMap (smmap)
-####################
-A straight forward implementation of a slidinging memory map.
-The idea is that every access to a file goes through a memory map manager, which will on demand map a region of a file and provide a string-like object for reading.
-
-When reading from it, you will have to check whether you are still within your window boundary, and possibly obtain a new window as required.
-
-The great benefit of this system is that you can use it to map files of any size even on 32 bit systems. Additionally it will be able to close unused windows right away to return system resources. If there are multiple clients for the same file and location, the same window will be reused as well.
-
-As there is a global management facility, you are also able to forcibly free all open handles which is handy on windows, which would otherwise prevent the deletion of the involved files.
-
-For convenience, a stream class is provided which hides the usage of the memory manager behind a simple stream interface.
-
-************
-LIMITATIONS
-************
-* The access is readonly by design.
-* In python below 2.6, memory maps will be created in compatability mode which works, but creates inefficient memory maps as they always start at offset 0.
-
-************
-REQUIREMENTS
-************
-* runs Python 2.4 or higher, but needs Python 2.6 or higher to run properly as it needs the offset parameter of the mmap.mmap function.
-
-*******
-Install
-*******
-TODO
-
-******
-Source
-******
-The source is available at git://github.com/Byron/smmap.git and can be cloned using::
-
- git clone git://github.com/Byron/smmap.git
-
-************
-MAILING LIST
-************
-http://groups.google.com/group/git-python
-
-*************
-ISSUE TRACKER
-*************
-https://github.com/Byron/smmap/issues
-
-*******
-LICENSE
-*******
-New BSD License
1  README.rst
42 doc/source/api.rst
@@ -0,0 +1,42 @@
+.. _api-label:
+
+#############
+API Reference
+#############
+
+***********************
+Mapped Memory Managers
+***********************
+
+.. automodule:: smmap.mman
+ :members:
+ :undoc-members:
+
+*******
+Buffers
+*******
+
+.. automodule:: smmap.buf
+ :members:
+ :undoc-members:
+
+**********
+Exceptions
+**********
+
+.. automodule:: smmap.exc
+ :members:
+ :undoc-members:
+
+*********
+Utilities
+*********
+
+.. automodule:: smmap.util
+ :members:
+ :undoc-members:
+
+
+
+
+
2  doc/source/conf.py
@@ -16,7 +16,7 @@
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
-#sys.path.append(os.path.abspath('.'))
+sys.path.append(os.path.abspath('../../'))
# -- General configuration -----------------------------------------------------
3  doc/source/index.rst
@@ -12,6 +12,9 @@ Contents:
.. toctree::
:maxdepth: 2
+ intro
+ tutorial
+ api
changes
Indices and tables
79 doc/source/intro.rst
@@ -0,0 +1,79 @@
+###########
+Motivation
+###########
+When reading from many, possibly large, files in a random-access fashion, memory maps are usually the fastest and most efficient option.
+
+Although memory maps have many advantages, they are a very limited system resource: every map uses one file descriptor, and the number of file descriptors is limited per process. On 32 bit systems, the amount of memory you can have mapped at a time is limited to a theoretical 4GB, which may not be enough for some applications.
+
+########
+Overview
+########
+
+Smmap wraps an interface around mmap and tracks the mapped files as well as the number of clients that use them. If the system runs out of resources, or if a memory limit is reached, it will automatically unload unused maps to allow continued operation.
+
+To allow processing large files even on 32 bit systems, it can map only portions of a file. Once the user reads beyond the mapped region, smmap will automatically map the next required region, unloading unused regions using an LRU algorithm.
+
+The interface also works around the missing offset parameter in python implementations up to python 2.5.
+
+Although the library can be used most efficiently with its native interface, a Buffer implementation is provided to hide these details behind a simple string-like interface.
+
+For performance critical 64 bit applications, a simplified version of memory mapping is provided which always maps the whole file, but still provides the benefit of unloading unused mappings on demand.
+
+#############
+Prerequisites
+#############
+* Python 2.4, 2.5 or 2.6
+* OSX, Windows or Linux
+
+The package was tested on all of the previously mentioned configurations.
+
+###########
+Limitations
+###########
+* The memory access is read-only by design.
+* In python below 2.6, memory maps will be created in compatibility mode which works, but creates inefficient memory mappings as they always start at offset 0.
+* It has not been tested on Python 2.7 or 3.x.
+
+################
+Installing smmap
+################
+It's easiest to install smmap using the *easy_install* or *pip* program, which are part of `setuptools`_ and `pip`_ respectively::
+
+ $ easy_install smmap
+ # or
+ $ pip install smmap
+
+As the command will install smmap in your respective python distribution, you will most likely need root permissions to authorize the required changes.
+
+If you have downloaded the source archive, the package can be installed by running the ``setup.py`` script::
+
+ $ python setup.py install
+
+It is advised to have a look at the :ref:`Usage Guide <tutorial-label>` for a brief introduction to the different memory manager implementations.
+
+##################
+Homepage and Links
+##################
+The project is hosted on GitHub at `https://github.com/Byron/smmap <https://github.com/Byron/smmap>`_.
+
+The latest source can be cloned from github as well:
+
+ * git://github.com/gitpython-developers/smmap.git
+
+
+For support, please use the git-python mailing list:
+
+ * http://groups.google.com/group/git-python
+
+
+Issues can be filed on github:
+
+ * https://github.com/Byron/smmap/issues
+
+###################
+License Information
+###################
+*smmap* is licensed under the New BSD License.
+
+.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
+.. _pip: http://www.pip-installer.org/en/latest/
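To illustrate the bookkeeping described in the Overview of intro.rst above (client counts plus LRU unloading of unused maps), here is a minimal Python 3 sketch. `RegionCache`, `acquire`, `release` and `max_regions` are hypothetical names chosen for this illustration; they are not part of smmap's API, and the real manager also limits mapped memory size, which is left out here.

```python
from collections import OrderedDict


class RegionCache:
    """Toy cache: counts clients per key and evicts idle entries, oldest first."""

    def __init__(self, max_regions):
        self.max_regions = max_regions
        # key -> client count; insertion order doubles as the LRU order
        self._regions = OrderedDict()

    def acquire(self, key):
        # Re-inserting the key marks it as most recently used.
        count = self._regions.pop(key, 0)
        self._regions[key] = count + 1
        self._evict_idle()
        return key

    def release(self, key):
        # The entry stays cached until eviction; it is merely marked idle.
        self._regions[key] -= 1

    def _evict_idle(self):
        # Drop least recently used entries that no client holds anymore.
        for key in list(self._regions):
            if len(self._regions) <= self.max_regions:
                break
            if self._regions[key] == 0:
                del self._regions[key]

    def keys(self):
        return list(self._regions)


cache = RegionCache(max_regions=2)
cache.acquire("a")
cache.release("a")
cache.acquire("b")
cache.release("b")
cache.acquire("c")  # exceeds the limit, so the idle, oldest entry "a" is dropped
```

Entries that still have clients are never evicted in this sketch, so the cache may temporarily exceed `max_regions` when everything is in use.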
118 doc/source/tutorial.rst
@@ -0,0 +1,118 @@
+.. _tutorial-label:
+
+###########
+Usage Guide
+###########
+This text briefly introduces you to the basic design decisions and accompanying classes.
+
+******
+Design
+******
+Per application, there is one *MemoryManager*, which is held as a static instance and used throughout the application. It can be configured to keep your resources within certain limits.
+
+To access mapped regions, you require a cursor. A cursor points to exactly one file and serves as a handle into it. As long as the cursor exists, the respective memory region will remain available.
+
+For convenience, a buffer implementation is provided which handles cursors and resource allocation behind its simple buffer-like interface.
+
+***************
+Memory Managers
+***************
+There are two types of memory managers: one uses *static* windows, the other uses *sliding* windows. A window is a region of a file mapped into memory. The names might be somewhat misleading, as technically windows are always static: the *sliding* version allocates relatively small windows, whereas the *static* version always maps the whole file.
+
+The *static* manager does nothing more than keep a client count on the respective memory maps, which always map the whole file. This allows some assumptions that simplify data access and increase performance, but reduces compatibility with 32 bit systems and giant files.
+
+The *sliding* memory manager therefore should be the default manager when preparing an application for handling huge amounts of data on 32 bit and 64 bit platforms::
+
+ import smmap
+ # This instance should be globally available in your application
+ # It is configured to be well suited for 32 bit or 64 bit applications.
+ mman = smmap.SlidingWindowMapManager()
+
+ # the manager provides useful information about its current state,
+ # like the number of open file handles or the amount of mapped memory
+ mman.num_file_handles()
+ mman.mapped_memory_size()
+ # and many more ...
+
+
+Cursors
+*******
+*Cursors* are handles that point to a window, i.e. a region of a file mapped into memory. From them you may obtain a buffer through which the data of that window can actually be accessed::
+
+ import smmap.test.lib
+ fc = smmap.test.lib.FileCreator(1024*1024*8, "test_file")
+
+ # obtain a cursor to access some file.
+ c = mman.make_cursor(fc.path)
+
+ # the cursor is now associated with the file, but not yet usable
+ assert c.is_associated()
+ assert not c.is_valid()
+
+ # before you can use the cursor, you have to specify a window you want to
+ # access. The following just says you want as much data as possible starting
+ # from offset 0.
+ # To be sure your region could be mapped, query for validity
+ assert c.use_region().is_valid() # use_region returns self
+
+ # once a region is mapped, you must query its dimensions regularly
+ # to ensure you don't try to access its buffer out of its bounds
+ assert c.size()
+ c.buffer()[0] # first byte
+ c.buffer()[1:10] # 9 bytes starting at offset 1
+ c.buffer()[c.size()-1] # last byte
+
+ # it's recommended not to create big slices when feeding the buffer
+ # into consumers (e.g. struct or zlib).
+ # Instead, either pass the buffer directly, or use Python's buffer builtin.
+ buffer(c.buffer(), 1, 9) # 9 bytes starting at offset 1, without copying them
+
+ # you can query absolute offsets, and check whether an offset is included
+ # in the cursor's data.
+ assert c.ofs_begin() < c.ofs_end()
+ assert c.includes_ofs(100)
+
+ # If one of your region requests is out of bounds, the
+ # cursor will become invalid. It cannot be used in that state
+ assert not c.use_region(fc.size, 100).is_valid()
+ # map as much as possible after skipping the first 100 bytes
+ assert c.use_region(100).is_valid()
+
+ # You can explicitly free cursor resources by unusing the cursor's region
+ c.unuse_region()
+ assert not c.is_valid()
+
+
+Now you would have to write your algorithms around this interface to properly slide through huge amounts of data.
+
+Alternatively you can use a convenience interface.
+
+*******
+Buffers
+*******
+To make first use easier, at the expense of performance, there is a Buffer implementation which uses a cursor underneath.
+
+With it, you can access all data in a possibly huge file without having to take care of setting the cursor to different regions yourself::
+
+ # Create a default buffer which can operate on the whole file
+ buf = smmap.SlidingWindowMapBuffer(mman.make_cursor(fc.path))
+
+ # you can use it right away
+ assert buf.cursor().is_valid()
+
+ buf[0] # access the first byte
+ buf[-1] # access the last byte of the file
+ buf[-10:] # access the last ten bytes
+
+ # If you want to keep the instance between different accesses, use the
+ # dedicated methods
+ buf.end_access()
+ assert not buf.cursor().is_valid() # you cannot use the buffer anymore
+ assert buf.begin_access(offset=10) # start using the buffer at an offset
+
+ # it will stop using resources automatically once it goes out of scope
+
+Disadvantages
+*************
+Buffers cannot be used in place of strings or maps, hence you have to slice them to obtain valid input for consumers such as struct and zlib. Slicing copies data, which adds handling overhead and makes buffers slower than using cursors directly.
+
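As a rough illustration of what "sliding through" a file means in the tutorial above, the following Python 3 sketch maps one window at a time using only the standard library `mmap` module. It does not use smmap itself; `iter_windows` and `window_size` are names invented for this example, and the window size is assumed to be a multiple of `mmap.ALLOCATIONGRANULARITY` so that every mapping offset stays aligned.

```python
import mmap
import os
import tempfile


def iter_windows(path, window_size=16 * mmap.ALLOCATIONGRANULARITY):
    """Yield (offset, data) pairs, mapping only one window of the file at a time."""
    if window_size <= 0 or window_size % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("window_size must be a positive multiple of ALLOCATIONGRANULARITY")
    file_size = os.path.getsize(path)
    with open(path, "rb") as fp:
        offset = 0
        while offset < file_size:
            # The last window may be shorter than window_size.
            length = min(window_size, file_size - offset)
            with mmap.mmap(fp.fileno(), length, access=mmap.ACCESS_READ, offset=offset) as window:
                data = window[:]  # copy the bytes out before the window is unmapped
            yield offset, data
            offset += length


# Usage: write a small scratch file and reassemble it from its windows.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    payload = os.urandom(3 * mmap.ALLOCATIONGRANULARITY + 5)
    tmp.write(payload)
rebuilt = b"".join(chunk for _, chunk in iter_windows(tmp.name, mmap.ALLOCATIONGRANULARITY))
os.remove(tmp.name)
```

Unlike smmap's cursors, this sketch copies each window and never shares or reuses mappings between clients; it only shows the windowing arithmetic.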
10 smmap/buf.py
@@ -47,6 +47,8 @@ def __len__(self):
def __getitem__(self, i):
c = self._c
assert c.is_valid()
+ if i < 0:
+ i = self._size + i
if not c.includes_ofs(i):
c.use_region(i, 1)
# END handle region usage
@@ -57,6 +59,12 @@ def __getslice__(self, i, j):
# fast path, slice fully included - saves a concatenation operation and
# should be the default
assert c.is_valid()
+ if i < 0:
+ i = self._size + i
+ if j == sys.maxint:
+ j = self._size
+ if j < 0:
+ j = self._size + j
if (c.ofs_begin() <= i) and (j < c.ofs_end()):
b = c.ofs_begin()
return c.buffer()[i-b:j-b]
@@ -68,6 +76,7 @@ def __getslice__(self, i, j):
md = str()
while l:
c.use_region(ofs, l)
+ assert c.is_valid()
d = c.buffer()[:l]
ofs += len(d)
l -= len(d)
@@ -102,6 +111,7 @@ def begin_access(self, cursor = None, offset = 0, size = sys.maxint, flags = 0):
self._size = size
#END set size
return res
+ # END use our cursor
return False
def end_access(self):
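The index handling added to `__getitem__` and `__getslice__` in smmap/buf.py above can be tried in isolation with the sketch below. `normalize_bounds` is a hypothetical helper written for this illustration, not a function of smmap, and it assumes Python 3, where `sys.maxsize` stands in for the `sys.maxint` sentinel that Python 2 passes for an open-ended slice.

```python
import sys


def normalize_bounds(i, j, size):
    """Translate possibly negative or open-ended slice bounds into absolute offsets."""
    if i < 0:
        i = size + i  # count from the end, as for lists and strings
    if j == sys.maxsize:
        j = size  # open-ended slice such as buf[i:]
    if j < 0:
        j = size + j
    return i, j


# buf[-10:] on a 100 byte buffer covers offsets 90..99
last_ten = normalize_bounds(-10, sys.maxsize, 100)
```

Like the hunk above, this does not clamp out-of-range values; a negative index larger than the buffer size still yields a negative offset.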
50 smmap/mman.py
@@ -11,21 +11,22 @@
import sys
from sys import getrefcount
-__all__ = ["StaticWindowMapManager", "SlidingWindowMapManager"]
+__all__ = ["StaticWindowMapManager", "SlidingWindowMapManager", "WindowCursor"]
#{ Utilities
#}END utilities
-
class WindowCursor(object):
- """Pointer into the mapped region of the memory manager, keeping the map
+ """
+ Pointer into the mapped region of the memory manager, keeping the map
alive until it is destroyed and no other client uses it.
-
+
Cursors should not be created manually, but are instead returned by the SlidingWindowMapManager
- :note: The current implementation is suited for static and sliding window managers, but it also means
- that it must be suited for the somewhat quite different sliding manager. It could be improved, but
- I see no real need to do so."""
+
+ **Note**: The current implementation is suited for static and sliding window managers, but it also means
+ that it must be suited for the somewhat quite different sliding manager. It could be improved, but
+ I see no real need to do so."""
__slots__ = (
'_manager', # the manager keeping all file regions
'_rlist', # a regions list with regions for our file
@@ -85,14 +86,16 @@ def assign(self, rhs):
def use_region(self, offset = 0, size = 0, flags = 0):
"""Assure we point to a window which allows access to the given offset into the file
+
:param offset: absolute offset in bytes into the file
:param size: amount of bytes to map. If 0, all available bytes will be mapped
:param flags: additional flags to be given to os.open in case a file handle is initially opened
for mapping. Has no effect if a region can actually be reused.
:return: this instance - it should be queried for whether it points to a valid memory region.
This is not the case if the mapping failed because we reached the end of the file
- :note: The size actually mapped may be smaller than the given size. If that is the case,
- either the file has reached its end, or the map was created between two existing regions"""
+
+ **note**: The size actually mapped may be smaller than the given size. If that is the case,
+ either the file has reached its end, or the map was created between two existing regions"""
need_region = True
man = self._manager
fsize = self._rlist.file_size()
@@ -123,9 +126,10 @@ def use_region(self, offset = 0, size = 0, flags = 0):
def unuse_region(self):
"""Unuse the ucrrent region. Does nothing if we have no current region
- :note: the cursor unuses the region automatically upon destruction. It is recommended
- to unuse the region once you are done reading from it in persistent cursors as it
- helps to free up resource more quickly"""
+
+ **note** the cursor unuses the region automatically upon destruction. It is recommended
+ to unuse the region once you are done reading from it in persistent cursors as it
+ helps to free up resource more quickly"""
self._region = None
# note: should reset ofs and size, but we spare that for performance. Its not
# allowed to query information if we are not valid !
@@ -133,9 +137,11 @@ def unuse_region(self):
def buffer(self):
"""Return a buffer object which allows access to our memory region from our offset
to the window size. Please note that it might be smaller than you requested when calling use_region()
- :note: You can only obtain a buffer if this instance is_valid() !
- :note: buffers should not be cached passed the duration of your access as it will
- prevent resources from being freed even though they might not be accounted for anymore !"""
+
+ **note** You can only obtain a buffer if this instance is_valid() !
+
+ **note** buffers should not be cached past the duration of your access as it will
+ prevent resources from being freed even though they might not be accounted for anymore !"""
return buffer(self._region.buffer(), self._ofs, self._size)
def map(self):
@@ -155,7 +161,8 @@ def is_associated(self):
def ofs_begin(self):
""":return: offset to the first byte pointed to by our cursor
- :note: only if is_valid() is True"""
+
+ **note** only if is_valid() is True"""
return self._region._b + self._ofs
def ofs_end(self):
@@ -177,7 +184,8 @@ def region_ref(self):
def includes_ofs(self, ofs):
""":return: True if the given absolute offset is contained in the cursors
current region
- :note: cursor must be valid for this to work"""
+
+ **note** cursor must be valid for this to work"""
# unroll methods
return (self._region._b + self._ofs) <= ofs < (self._region._b + self._ofs + self._size)
@@ -199,7 +207,8 @@ def path(self):
def fd(self):
""":return: file descriptor used to create the underlying mapping.
- :note: it is not required to be valid anymore
+
+ **note** it is not required to be valid anymore
:raise ValueError: if the mapping was not created by a file descriptor"""
if isinstance(self._rlist.path_or_fd(), basestring):
raise ValueError("File descriptor queried although mapping was generated from path")
@@ -354,8 +363,9 @@ def _obtain_region(self, a, offset, size, flags, is_recursive):
#{ Interface
def make_cursor(self, path_or_fd):
- """:return: a cursor pointing to the given path or file descriptor.
- It can be used to map new regions of the file into memory
+ """
+ :return: a cursor pointing to the given path or file descriptor.
+ It can be used to map new regions of the file into memory
:note: if a file descriptor is given, it is assumed to be open and valid,
but may be closed afterwards. To refer to the same file, you may reuse
your existing file descriptor, but keep in mind that new windows can only
4 smmap/test/test_buf.py
@@ -50,6 +50,10 @@ def test_basics(self):
assert data[offset] == buf[0]
assert data[offset:offset*2] == buf[0:offset]
+ # negative indices, partial slices
+ assert buf[-1] == buf[len(buf)-1]
+ assert buf[-10:] == buf[len(buf)-10:len(buf)]
+
# end access makes its cursor invalid
buf.end_access()
assert not buf.cursor().is_valid()
83 smmap/test/test_tutorial.py
@@ -0,0 +1,83 @@
+from lib import TestBase
+
+class TestTutorial(TestBase):
+
+ def test_example(self):
+ # Memory Managers
+ ##################
+ import smmap
+ # This instance should be globally available in your application
+ # It is configured to be well suited for 32 bit or 64 bit applications.
+ mman = smmap.SlidingWindowMapManager()
+
+ # the manager provides useful information about its current state,
+ # like the number of open file handles or the amount of mapped memory
+ assert mman.num_file_handles() == 0
+ assert mman.mapped_memory_size() == 0
+ # and many more ...
+
+ # Cursors
+ ##########
+ import smmap.test.lib
+ fc = smmap.test.lib.FileCreator(1024*1024*8, "test_file")
+
+ # obtain a cursor to access some file.
+ c = mman.make_cursor(fc.path)
+
+ # the cursor is now associated with the file, but not yet usable
+ assert c.is_associated()
+ assert not c.is_valid()
+
+ # before you can use the cursor, you have to specify a window you want to
+ # access. The following just says you want as much data as possible starting
+ # from offset 0.
+ # To be sure your region could be mapped, query for validity
+ assert c.use_region().is_valid() # use_region returns self
+
+ # once a region is mapped, you must query its dimensions regularly
+ # to ensure you don't try to access its buffer out of its bounds
+ assert c.size()
+ c.buffer()[0] # first byte
+ c.buffer()[1:10] # 9 bytes starting at offset 1
+ c.buffer()[c.size()-1] # last byte
+
+ # it's recommended not to create big slices when feeding the buffer
+ # into consumers (e.g. struct or zlib).
+ # Instead, either pass the buffer directly, or use Python's buffer builtin.
+ buffer(c.buffer(), 1, 9) # 9 bytes starting at offset 1, without copying them
+
+ # you can query absolute offsets, and check whether an offset is included
+ # in the cursor's data.
+ assert c.ofs_begin() < c.ofs_end()
+ assert c.includes_ofs(100)
+
+ # If one of your region requests is out of bounds, the
+ # cursor will become invalid. It cannot be used in that state
+ assert not c.use_region(fc.size, 100).is_valid()
+ # map as much as possible after skipping the first 100 bytes
+ assert c.use_region(100).is_valid()
+
+ # You can explicitly free cursor resources by unusing the cursor's region
+ c.unuse_region()
+ assert not c.is_valid()
+
+ # Buffers
+ #########
+ # Create a default buffer which can operate on the whole file
+ buf = smmap.SlidingWindowMapBuffer(mman.make_cursor(fc.path))
+
+ # you can use it right away
+ assert buf.cursor().is_valid()
+
+ buf[0] # access the first byte
+ buf[-1] # access the last byte of the file
+ buf[-10:] # access the last ten bytes
+
+ # If you want to keep the instance between different accesses, use the
+ # dedicated methods
+ buf.end_access()
+ assert not buf.cursor().is_valid() # you cannot use the buffer anymore
+ assert buf.begin_access(offset=10) # start using the buffer at an offset
+
+ # it will stop using resources automatically once it goes out of scope
+
4 smmap/util.py
@@ -20,7 +20,9 @@
#{ Utilities
def align_to_mmap(num, round_up):
- """Align the given integer number to the closest page offset, which usually is 4096 bytes.
+ """
+ Align the given integer number to the closest page offset, which usually is 4096 bytes.
+
:param round_up: if True, the next higher multiple of page size is used, otherwise
the lower page_size will be used (i.e. if True, 1 becomes 4096, otherwise it becomes 0)
:return: num rounded to closest page"""
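The rounding behaviour described in the `align_to_mmap` docstring above can be sketched as follows. This is an assumption-laden illustration rather than the function's actual body: `align_to_page` and its `page_size` parameter are hypothetical, with the default taken from the standard library's `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap


def align_to_page(num, round_up, page_size=mmap.ALLOCATIONGRANULARITY):
    """Round num down to a multiple of page_size, or up if round_up is True."""
    res = (num // page_size) * page_size
    if round_up and res != num:
        # num was not already aligned, so move to the next higher multiple
        res += page_size
    return res


# With a 4096 byte page, 1 becomes 4096 when rounding up and 0 otherwise.
up = align_to_page(1, True, page_size=4096)
down = align_to_page(1, False, page_size=4096)
```

Values that are already aligned are returned unchanged in both modes.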