# Operating System Tools

There are several modules in the standard library that you will usually reach for when dealing with operating system resources, such as files, processes, standard input and output, and with Python internals at a general level.

In [1]:
import sys, os, grp, stat, pathlib

## Module: sys

The `sys` module exposes **many** details of the currently running Python interpreter.  Having programmed in Python for over 20 years, I have never had reason to use most of the attributes and functions in the module.  By contrast, a few objects in the `sys` module appear in nearly every Python program I write.

Let us look at a few of the attributes and functions that introspect details of the Python process.  There are approximately 100 of these.  Some are very commonly relevant to your task, others only rarely so.

Let us inquire about the Python version.

In [2]:
print(sys.version_info)
print(sys.version)

sys.version_info(major=3, minor=8, micro=2, releaselevel='final', serial=0)
3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) 
[GCC 7.3.0]


Information on the *operating system* underlying Python is mostly in the `os` module, although `sys.platform` also names the platform.

In [3]:
print(sys.platform, os.name)
print(os.uname())

linux posix
posix.uname_result(sysname='Linux', nodename='popkdm', release='5.4.0-7634-generic', version='#38~1595345317~20.04~a8480ad-Ubuntu SMP Wed Jul 22 15:13:45 UTC ', machine='x86_64')


We commonly use these values to branch on the environment, for example:

In [4]:
info = sys.version_info
if info.major == 3 and info.minor > 7:
    if sys.platform.startswith('linux'):
        print("Happy operating system and Python version")
    if 'Ubuntu' in os.uname().version:
        print("Ubuntu-based distribution")

Happy operating system and Python version
Ubuntu-based distribution


Ideally your script will run on any operating system, so explicit checks like these should be reserved for special-purpose code.
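
Because `sys.version_info` is a named tuple, it also compares directly against an ordinary tuple, which is more compact than testing `major` and `minor` separately.  A minimal sketch; the `(3, 8)` threshold and the variable names are only illustrative:

```python
import sys

# sys.version_info compares element-wise with a plain tuple
MINIMUM = (3, 8)  # illustrative threshold, not a requirement of this lesson
new_enough = sys.version_info >= MINIMUM

# sys.platform is a short string such as 'linux', 'darwin', or 'win32'
on_linux = sys.platform.startswith("linux")

print("Python is new enough:", new_enough)
print("Running on Linux:", on_linux)
```

The tuple form also keeps working if the major version ever changes, which the `major == 3 and minor > 7` test does not.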

Some capabilities in `sys` really probe deeply into Python internals.  For example, we can get a dictionary mapping every thread's identifier to its topmost stack frame.  Don't worry if that makes no sense to you yet.

In [5]:
sys._current_frames()

{140534182033152: <frame at 0x7fd0b048e420, file '/home/dmertz/miniconda3/envs/INE/lib/python3.8/site-packages/ipykernel/parentpoller.py', line 39, code run>,
 140534190425856: <frame at 0x55cbd54d4720, file '/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py', line 309, code wait>,
 140534215603968: <frame at 0x7fd0b0d0dbe0, file '/home/dmertz/miniconda3/envs/INE/lib/python3.8/site-packages/ipykernel/heartbeat.py', line 100, code run>,
 140534296385280: <frame at 0x55cbd54d53e0, file '/home/dmertz/miniconda3/envs/INE/lib/python3.8/selectors.py', line 468, code select>,
 140534374414144: <frame at 0x7fd0b04b4040, file '<ipython-input-5-7750e22009c1>', line 1, code <module>>}

Listing the modules that are compiled directly into the interpreter isn't quite as obscure, but it is also something you rarely need to worry about.

In [6]:
print(sys.builtin_module_names)



There is a lot more along these lines.

Let us focus on the capabilities in `sys` that you are genuinely likely to use.  `sys.argv` provides the command-line arguments passed (`argparse`, in lesson 5, is a higher-level approach).  `sys.stdin`, `sys.stdout`, and `sys.stderr` are the usual streams of a command-line program.  We run this simple program, saved as `streams.py`:

```python
import sys
print("Arguments:", sys.argv)
data = sys.stdin.read()
print("Standard input:", data)
print("Standard output", file=sys.stdout)
print("Error output", file=sys.stderr)
```

In [7]:
%%bash
echo "Piped in data" | python streams.py arg1 arg2

Arguments: ['streams.py', 'arg1', 'arg2']
Standard input: Piped in data

Standard output


Error output
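
Keeping results on standard output and diagnostics on standard error lets a shell user redirect the two independently.  The sketch below captures both streams in memory only to make the separation visible; `report` is a hypothetical helper, not part of `streams.py`:

```python
import contextlib
import io
import sys

def report(message, error=False):
    """Send diagnostics to stderr and regular results to stdout."""
    stream = sys.stderr if error else sys.stdout
    print(message, file=stream)

# Capture each stream separately to show where each message goes
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    report("result line")
    report("warning line", error=True)

captured_out = out.getvalue()
captured_err = err.getvalue()
print("stdout got:", repr(captured_out))
print("stderr got:", repr(captured_err))
```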


Some others fall into the "use with caution" category.  For example, you can see (and modify) the list of directories where modules might be imported from.

In [8]:
sys.path

['/home/dmertz/git/INE/stdlib/01-OS-Tools',
 '/home/dmertz/miniconda3/envs/INE/lib/python38.zip',
 '/home/dmertz/miniconda3/envs/INE/lib/python3.8',
 '/home/dmertz/miniconda3/envs/INE/lib/python3.8/lib-dynload',
 '',
 '/home/dmertz/miniconda3/envs/INE/lib/python3.8/site-packages',
 '/home/dmertz/miniconda3/envs/INE/lib/python3.8/site-packages/IPython/extensions',
 '/home/dmertz/.ipython']

In [9]:
sys.path.append('/no/such/path')
sys.path[-2:]

['/home/dmertz/.ipython', '/no/such/path']
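
One way to limit the risk of modifying `sys.path` is to make the change temporary.  This is a sketch under that assumption; `extra_import_path` is a made-up helper name, and the directory is the same nonexistent one used above:

```python
import contextlib
import sys

@contextlib.contextmanager
def extra_import_path(directory):
    """Temporarily put `directory` at the front of sys.path."""
    sys.path.insert(0, directory)
    try:
        yield
    finally:
        # Remove the entry we added, even if an import inside the block failed
        with contextlib.suppress(ValueError):
            sys.path.remove(directory)

before = list(sys.path)
with extra_import_path("/no/such/path"):
    inside = sys.path[0]   # imports here would search this directory first
after = list(sys.path)

print(inside, after == before)
```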

Writing *deeply* recursive programs in Python is not really a good idea, because some other languages optimize recursion in ways that Python does not.  Moderate recursion is nonetheless very often useful, and `sys` lets you inspect and adjust the limit on its depth.

In [10]:
sys.getrecursionlimit()

3000

In [11]:
sys.setrecursionlimit(4000)
sys.getrecursionlimit()

4000
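
When the limit is exceeded, Python raises `RecursionError` rather than crashing.  The following sketch shows that failure and an iterative rewrite that avoids it; both function names are invented for the illustration:

```python
import sys

def count_down_recursive(n):
    """Recurse n levels deep; fails once n exceeds the recursion limit."""
    return 0 if n == 0 else 1 + count_down_recursive(n - 1)

def count_down_iterative(n):
    """Same result with a loop, so depth is not a concern."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps

too_deep = sys.getrecursionlimit() + 100
try:
    count_down_recursive(too_deep)
    hit_limit = False
except RecursionError:
    hit_limit = True

print("Recursive version hit the limit:", hit_limit)
print("Iterative version:", count_down_iterative(too_deep))
```

Raising the limit with `sys.setrecursionlimit()` only postpones the problem, so rewriting as a loop is usually the safer choice for unbounded depth.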

## Module: os

The `os` module has a number of operating-system-specific functions at its root, but it also has a submodule called `os.path` which is equally important.  In most ways, the newer `pathlib` can replace `os.path` with a friendlier API.

Within `os`, most functionality deals with paths, processes, and users.  Probably the capability I use most is working with environment variables.  For example, here are a few variables I have set.

In [12]:
list(os.environ.keys())[:5]

['SHELL',
 'SESSION_MANAGER',
 'QT_ACCESSIBILITY',
 'CONDA_BACKUP_GFORTRAN',
 'COLORTERM']

Some scripts use environment variables to configure their actions.  While the functions `os.getenv()`, `os.putenv()`, and `os.unsetenv()` exist, it is better simply to use dictionary-style modifications of `os.environ`.

In [13]:
user = os.environ['USER']
print("USER is", user)

if altuser := os.environ.get('ALTUSER'):
    print("ALTUSER is", altuser)
else:
    print("Setting ALTUSER to 'guest'")
    os.environ['ALTUSER'] = 'guest'
    
print(os.environ['ALTUSER'])
del os.environ['ALTUSER']
print(os.environ.get('ALTUSER'))

USER is dmertz
Setting ALTUSER to 'guest'
guest
None
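
Values in `os.environ` are always strings, so a script that treats a variable as an on/off switch has to interpret the text itself.  A small sketch; `env_flag`, the accepted spellings, and the `DEMO_*` variable names are all assumptions made for the example:

```python
import os

def env_flag(name, default=False):
    """Interpret an environment variable as a boolean (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}

os.environ["DEMO_VERBOSE"] = "yes"      # made-up variable for the demo
verbose = env_flag("DEMO_VERBOSE")
missing = env_flag("DEMO_FLAG_THAT_IS_NOT_SET", default=False)
del os.environ["DEMO_VERBOSE"]          # leave the environment as we found it

print(verbose, missing)
```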


We can look at users, groups, permissions, and so on.  Another support module called `grp` gives more information on user groups.

In [14]:
for mygroup in os.getgroups():
    print(f"Group #{mygroup}")
    print(grp.getgrgid(mygroup))

Group #4
grp.struct_group(gr_name='adm', gr_passwd='x', gr_gid=4, gr_mem=['syslog', 'dmertz'])
Group #27
grp.struct_group(gr_name='sudo', gr_passwd='x', gr_gid=27, gr_mem=['dmertz'])
Group #130
grp.struct_group(gr_name='docker', gr_passwd='x', gr_gid=130, gr_mem=['dmertz'])
Group #1000
grp.struct_group(gr_name='dmertz', gr_passwd='x', gr_gid=1000, gr_mem=[])
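
Often only the group names are wanted.  The sketch below collects them and falls back to the numeric ID when the system has no entry for a group, which can happen inside containers; `group_names` is a hypothetical helper, and `grp` is available on Unix-like systems only:

```python
import grp
import os

def group_names(gids):
    """Map numeric group IDs to names, keeping the number if no entry exists."""
    names = []
    for gid in gids:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # No matching entry in the group database
            names.append(str(gid))
    return names

my_groups = group_names(os.getgroups())
print(my_groups)
```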


We can also query various details about the machine and configuration we are running on.

In [15]:
print("Process ID:", os.getpid())
print("Term size: ", os.get_terminal_size())
print("CPU count: ", os.cpu_count())

Process ID: 1683057
Term size:  os.terminal_size(columns=93, lines=12)
CPU count:  8
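
Two of these calls deserve care in scripts: `os.get_terminal_size()` raises `OSError` when output is not attached to a terminal (for example, when piped), and `os.cpu_count()` may return `None`.  A defensive sketch, where the fallback values are arbitrary choices:

```python
import os
import shutil

# shutil.get_terminal_size() returns the fallback instead of raising
size = shutil.get_terminal_size(fallback=(80, 24))

# os.cpu_count() can return None if the count cannot be determined
cpus = os.cpu_count() or 1

print("Columns:", size.columns, "Lines:", size.lines)
print("Usable CPU count:", cpus)
```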


The usual Unix-style file and directory manipulation commands are in the `os` module.  The `stat` module provides named constants and helper functions for interpreting the otherwise cryptic numbers that describe file modes.

In [16]:
cwd = os.getcwd()
print("Current dir:", cwd)

# Make dir and switch (os.mkdir() assumes not there already)
os.makedirs("tmp", exist_ok=True)
os.chdir('tmp')

# Create a file (its permissions are changed in the next cell)
with open('test', 'w') as fh:
    print("TEST", file=fh)
    
# Check content
print("Location:", os.getcwd())
print("File content:", open('test').read())

Current dir: /home/dmertz/git/INE/stdlib/01-OS-Tools
Location: /home/dmertz/git/INE/stdlib/01-OS-Tools/tmp
File content: TEST



In [17]:
# Change file mode
oldmode = os.stat('test').st_mode
os.chmod('test', stat.S_IRUSR | stat.S_IWUSR)
newmode = os.stat('test').st_mode
print("Permissions:", oldmode, "⟶", newmode)
print("Dir style:  ", stat.filemode(oldmode), "⟶", stat.filemode(newmode))

Permissions: 33204 ⟶ 33152
Dir style:   -rw-rw-r-- ⟶ -rw-------
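
The integers printed above are easier to read in octal, where the last three digits are the familiar `chmod` permissions and the leading digits encode the file type.  A sketch on a throwaway file in a temporary directory, assuming a POSIX system:

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test")
    with open(path, "w") as fh:
        fh.write("TEST\n")
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    mode = os.stat(path).st_mode

permission_bits = stat.S_IMODE(mode)   # strip the file-type bits
octal_form = oct(permission_bits)      # the form `chmod 600` uses
is_regular = stat.S_ISREG(mode)        # file-type test on the same number
notebook_mode = oct(33152)             # the new mode printed in the cell above

print(octal_form, is_regular, notebook_mode)
```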


In [18]:
# Cleanup
os.remove('test')
os.chdir(cwd)
os.rmdir('tmp')

# See what is here
list(os.walk('.'))

[('.',
  ['.ipynb_checkpoints'],
  ['streams.py', '01-OS-Tools.ipynb', '01-Exercise.ipynb']),
 ('./.ipynb_checkpoints',
  [],
  ['01-OS-Tools-checkpoint.ipynb', '01-Exercise-checkpoint.ipynb'])]
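
`os.walk()` is a generator, and editing the directory list it yields in place controls which subdirectories it descends into.  The sketch below uses that to skip checkpoint directories; `walk_files` and the demo file names are invented for the illustration:

```python
import os
import tempfile

def walk_files(top, skip_dirs=(".ipynb_checkpoints",)):
    """Yield file paths under `top`, not descending into `skip_dirs`."""
    for dirpath, dirnames, filenames in os.walk(top):
        # In-place assignment prunes the walk; rebinding the name would not
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            yield os.path.join(dirpath, name)

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, ".ipynb_checkpoints"))
    for rel in ("streams.py", os.path.join(".ipynb_checkpoints", "copy.py")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("# demo\n")
    found = [os.path.relpath(p, top) for p in walk_files(top)]

print(found)
```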

## Module: shutil

The module `shutil` provides higher-level operations on files than those in `os` or `os.path`.  This is a fairly small module with a few wrappers.

In [19]:
import shutil
shutil.disk_usage('.')

usage(total=236068233216, used=122818617344, free=101186588672)

In [20]:
try:
    os.mkdir('/tmp/os-tools')
except FileExistsError as err:
    print(err)
    
shutil.copytree('.', '/tmp/os-tools/', dirs_exist_ok=True)

print(os.listdir('/tmp/os-tools/'))

shutil.rmtree('/tmp/os-tools/')

['streams.py', '01-OS-Tools.ipynb', '.ipynb_checkpoints', '01-Exercise.ipynb']
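
The copy above also brought along the `.ipynb_checkpoints` directory.  `shutil.copytree()` accepts an `ignore` callable, and `shutil.ignore_patterns()` builds one from glob-style patterns.  A sketch on a temporary tree whose file names are made up for the demo:

```python
import pathlib
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small source tree that includes a checkpoint directory
    src = pathlib.Path(tmp, "src")
    (src / ".ipynb_checkpoints").mkdir(parents=True)
    (src / "streams.py").write_text("import sys\n")
    (src / ".ipynb_checkpoints" / "old.ipynb").write_text("{}")

    # Copy everything except names matching the ignored pattern
    dst = pathlib.Path(tmp, "dst")
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(".ipynb_checkpoints"))
    copied = sorted(p.name for p in dst.iterdir())

print(copied)
```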


In [21]:
shutil.which('python')

'/home/dmertz/miniconda3/envs/INE/bin/python'

## Module: pathlib

The `pathlib` module provides a `Path` abstraction that can represent either a file or a directory, and whose objects have the methods you would expect for each.  This module contains modernized interfaces covering almost everything in `os.path`.

In [22]:
home = pathlib.Path('~').expanduser()
home

PosixPath('/home/dmertz')

In [23]:
ine = home / "git" / "INE"
ine

PosixPath('/home/dmertz/git/INE')
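
The `/` operator is shorthand for `joinpath()`, and both correspond to `os.path.join()`.  The sketch uses `PurePosixPath` and `posixpath` so that it behaves the same on any platform without touching the filesystem; the directory names are simply the ones from the cell above:

```python
import posixpath
from pathlib import PurePosixPath

home = PurePosixPath("/home/dmertz")

with_operator = home / "git" / "INE"
with_joinpath = home.joinpath("git", "INE")
with_os_path = posixpath.join("/home/dmertz", "git", "INE")

print(with_operator == with_joinpath)      # same path object either way
print(str(with_operator) == with_os_path)  # same string as the os.path style
```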

One thing we often want to do with a path is match filenames, which we can do without the separate `glob` module this way.

In [24]:
list(ine.glob('**/*.py'))

[PosixPath('/home/dmertz/git/INE/stdlib/01-OS-Tools/streams.py'),
 PosixPath('/home/dmertz/git/INE/regex/src/timeout.py'),
 PosixPath('/home/dmertz/git/INE/regex/src/setup.py'),
 PosixPath('/home/dmertz/git/INE/regex/src/.ipynb_checkpoints/setup-checkpoint.py')]
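
The match above includes a file inside `.ipynb_checkpoints`.  `Path.rglob(pattern)` is equivalent to `glob('**/' + pattern)`, and the results can be filtered on `parts` to drop such directories.  A sketch on a temporary tree with made-up file names:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "src" / ".ipynb_checkpoints").mkdir(parents=True)
    (root / "src" / "setup.py").write_text("")
    (root / "src" / ".ipynb_checkpoints" / "setup-checkpoint.py").write_text("")

    # Recursive match, skipping anything under a checkpoint directory
    matches = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.py")
        if ".ipynb_checkpoints" not in p.parts
    )

print(matches)
```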

We can ask questions about the nature of the object the path represents.

In [25]:
streams = next(ine.glob('**/*.py'))
print("Exists?", streams.exists())
print("File?  ", streams.is_file())
print("Dir?   ", streams.is_dir())
print("Link?  ", streams.is_symlink())
print("Socket?", streams.is_socket())

Exists? True
File?   True
Dir?    False
Link?   False
Socket? False


Or inquire into aspects of the path.

In [26]:
print([streams.name, streams.stem, streams.suffix])
list(streams.parents)

['streams.py', 'streams', '.py']


[PosixPath('/home/dmertz/git/INE/stdlib/01-OS-Tools'),
 PosixPath('/home/dmertz/git/INE/stdlib'),
 PosixPath('/home/dmertz/git/INE'),
 PosixPath('/home/dmertz/git'),
 PosixPath('/home/dmertz'),
 PosixPath('/home'),
 PosixPath('/')]

In [27]:
print(streams.parts)
streams.relative_to('/home/dmertz/git')

('/', 'home', 'dmertz', 'git', 'INE', 'stdlib', '01-OS-Tools', 'streams.py')


PosixPath('INE/stdlib/01-OS-Tools/streams.py')
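
Paths are immutable, so methods such as `with_suffix()` and `with_name()` return new path objects rather than renaming anything on disk.  A sketch using `PurePosixPath`, which never touches the filesystem; the replacement names are arbitrary:

```python
from pathlib import PurePosixPath

streams = PurePosixPath("/home/dmertz/git/INE/stdlib/01-OS-Tools/streams.py")

backup = streams.with_suffix(".bak")           # swap the extension
renamed = streams.with_name("streams_v2.py")   # swap the final component
sibling = streams.parent / "01-Exercise.ipynb" # another file in the same directory

print(backup.name, renamed.name, sibling.name)
```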

Path objects have methods to read and write content, and to open file handles.

In [28]:
streams.read_bytes()

b'import sys\nprint("Arguments:", sys.argv)\ndata = sys.stdin.read()\nprint("Standard input:", data)\nprint("Standard output", file=sys.stdout)\nprint("Error output", file=sys.stderr)\n'

In [29]:
with streams.open() as fh:
    fh.seek(11)
    print(fh.readline())

print("Arguments:", sys.argv)
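
For text rather than bytes, `write_text()` and `read_text()` handle opening and closing the file for you.  A round-trip sketch on a temporary file; the file name and contents are invented for the example:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    note = pathlib.Path(tmp) / "note.txt"

    # write_text() returns the number of characters written
    written = note.write_text("first line\nsecond line\n", encoding="utf-8")
    text = note.read_text(encoding="utf-8")

    # A file handle is still available when finer control is needed
    with note.open(encoding="utf-8") as fh:
        fh.seek(len("first line\n"))
        second = fh.readline()

print(written, repr(second))
```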

