In [None]:
"""
Using the filesystem

1. Working with files involves one of two things: basic I/O (Reading and writing files) and working with the filesystem (for example,
   naming, creating, moving, or referring to files), which is a bit tricky, because different operating systems have different filesystem
   conventions.

2. It would be easy enough to learn how to perform basic file I/O without learning all the features Python has provided to simplify
   cross-platform filesystem interaction—but I wouldn’t recommend it. Instead, the first part of this chapter gives you the tools you
   need to refer to files in a manner that doesn’t depend on your particular operating system. Then, when you use the basic I/O
   operations, you can open the relevant files in this manner. 
"""

In [13]:
"""
Path (1)

1. The traditional way that file paths and filesystem operations have been handled in Python is by using functions included in the os
   and os.path modules. These functions have worked well enough but often resulted in more verbose code than necessary. Since Python 3.5,
   a new library, pathlib, has been added; it offers a more object-oriented and more unified way of doing the same operations.

2. All operating systems refer to files and directories with strings naming a given file or directory. Strings used in this manner are
   usually called pathnames (or sometimes just paths). The fact that pathnames are strings introduces possible complications into working
   with them.

3. Pathname semantics across operating systems are very similar because the filesystem on almost all operating systems is modeled as a tree
   structure, with a disk being the root and folders, subfolders, and so on being branches, subbranches, and so on. This means that most
   operating systems refer to a specific file in fundamentally the same manner: with a pathname that specifies the path to follow from
   the root of the filesystem tree (the disk) to the file in question. This pathname consists of a series of folders to descend into to
   get to the desired file.

4. Different operating systems have different conventions regarding the precise syntax of pathnames. The character used to separate
   sequential file or directory names in a Linux/UNIX pathname is /, whereas the character used to separate file or directory names in a
   Windows pathname is \. In addition, the UNIX filesystem has a single root, whereas the Windows filesystem has a separate root for
   each drive, labeled A:\, B:\, C:\, and so forth. Because of these differences, files have different pathname representations on different
   operating systems. A file called C:\data\myfile in MS Windows might be called /data/myfile on UNIX and on the Mac OS.

5. These operating systems allow two types of pathnames:
   -- Absolute Pathnames: specify the exact location of a file in a filesystem without any ambiguity; they do this by listing the entire path
      to that file, starting from the root of the filesystem.
   -- Relative Pathnames: specify the position of a file relative to some other point in the filesystem, and that other point isn't specified
      in the relative pathname itself; instead, the absolute starting point for relative pathnames is provided by the context in which
      they’re used.
"""
import os

# The anchor for that relative path is the current working directory, and the result of the command is a list of the filenames
# in the directory whose path is formed by appending the current working directory with the relative path argument
relative_path = 'data'
print(os.listdir(relative_path))

# Get current working directory
print(os.getcwd())

['word_count.txt', '.ipynb_checkpoints']
/data/home/zhangmu/workspace/workspace_python/quick_python


In [22]:
"""
Path(2)

1. The constant os.curdir returns whatever string your system happens to use as the same directory indicator. On both UNIX and Windows,
   the current directory is represented as a single dot, but to keep your programs portable, you should always use os.curdir instead of
   typing just the dot. This string is a relative path, meaning that os.listdir appends it to the path for the current working directory,
   giving the same path.

2. The os.path.join function interprets its arguments as a series of directory names or filenames, which are to be joined to form a single
   string understandable as a relative path by the underlying operating system. In other words, os.path.join lets you form file paths from
   a sequence of directory or filenames without any worry about the conventions of the underlying operating system.

3. In Linux/UNIX, an absolute path always begins with a / (because a single slash denotes the topmost directory of the entire system,
   which contains everything else, including the various floppy and CD drives that might be available). A relative path in UNIX is any
   legal path that does not begin with a slash.

4. The rules for Windows paths:
   -- A pathname beginning with a drive letter followed by a colon and a backslash, then a path is an absolute path: C:\Program Files\Doom. 
   -- A pathname beginning with neither a drive letter nor a backslash is a relative path: mydirectory\letters\business
   -- A pathname beginning with \\ followed by the name of a server is the path to a network resource
   -- Anything else can be considered to be an invalid pathname
"""
# Returns a list of all the files or folders inside the current working directory
print(os.curdir)
print(os.listdir(os.curdir))

# Python moves into the folder specified as an argument of the os.chdir function
os.chdir('/data/home/zhangmu/')
print(os.getcwd())
os.chdir('/data/home/zhangmu/workspace/workspace_python/quick_python')

# There’s no way for pathlib to change the current directory in the way that os.chdir() does
import pathlib
cur_path = pathlib.Path()
print(cur_path.cwd())

# Construct a few pathnames on different operating systems, using the os.path.join function
print(os.path.join('bin', 'utils', 'disktools'))

# The arguments to os.path.join need not be a single directory or filename; they may also be subpaths that are then joined to
# make a longer pathname. 
path1 = os.path.join('mydir', 'bin')
path2 = os.path.join('utils', 'disktools', 'chkdist')
print(os.path.join(path1, path2))

.
['5) dictionary.ipynb', '9) python programs', '3) list tuple set.ipynb', '8) modules and scoping rules.ipynb', '1) quick overview.ipynb', '4) strings.ipynb', 'module', 'data', '10) using the filesystem.ipynb', '2) absolute basics.ipynb', '.ipynb_checkpoints', '6) control flow.ipynb', '7) function.ipynb']
/data/home/zhangmu
/data/home/zhangmu/workspace/workspace_python/quick_python
bin/utils/disktools
mydir/bin/utils/disktools/chkdist


In [44]:
"""
Path (3)

1. Regardless of the operating system used, the os.path.join command doesn’t perform sanity checks on the names it’s constructing. It's
   possible to construct pathnames containing characters that, according to your OS, are forbidden in pathnames. If such checks are
   a requirement, probably the best solution is to write a small path-validitychecker function yourself. 

2. Other functions:
   -- os.path.commonprefix(path1, path2, ...) finds the common prefix for a set of paths. This technique is useful if you want to find
      the lowest-level directory that contains every file in a set of files.
   -- os.path.expanduser expands username shortcuts in paths, such as for UNIX. 
   -- os.path.expandvars does the same for environment variables.

3. You can access several useful path-related constants and functions to make your Python code more system-independent than it
   otherwise would be. The most basic of these constants are os.curdir and os.pardir, which respectively define the symbol used by
   the operating system for the directory and parent directory path indicators. The os.name constant returns the name of the Python module
   imported to handle the operating system–specific details.

4. All your environment variables and the values associated with them are available in a dictionary called os.environ. On most operating
   systems, this directory includes variables related to paths—typically, search paths for binaries and so forth. 
"""

# The os.path.split command returns a two-element tuple splitting the basename of a path (the single file or directory name
# at the end of the path) from the rest of the path.
print(os.path.split('/data/home/zhangmu/workspace/workspace_python/quick_python'))

# The os.path.basename function returns only the basename of the path
print(os.path.basename(os.path.join('some', 'directory', 'path.jpg')))

# The os.path.dirname function returns the path up to but not including the last name
print(os.path.dirname(os.path.join('some', 'directory', 'path.jpg')))

# To handle the dotted extension notation used by most filesystems to indicate file type
print(os.path.splitext(os.path.join('some', 'directory', 'path.jpg')))

from pathlib import Path

cur_path = Path()
print(cur_path.joinpath('bin', 'utils', 'disktools'))
print(cur_path / 'bin' / 'utils' / 'disktools')

# The parts property returns a tuple of all the components of a path;
# The name property returns only the basename of the path;
# The parent property returns the path up to but not including the last name;
# The suffix property handles the dotted extension notation used by most filesystems to indicate file type
a_path = Path('bin/utils/disktools/x.jpg')
print(a_path.parts)
print(a_path.name)
print(a_path.parent)
print(a_path.suffix)

# The most basic constants
print(os.pardir, os.curdir)
path = '/data/home/zhangmu'
print(os.path.isdir(os.path.join(path, os.pardir, os.curdir)))

# os.curdir is a relative path, so os.listdir always takes relative paths as being relative to the current working directory
print(os.listdir(os.curdir))

import os
import sys
print(os.name)
print(sys.platform)
#print(os.environ)

('/data/home/zhangmu/workspace/workspace_python', 'quick_python')
path.jpg
some/directory
('some/directory/path', '.jpg')
bin/utils/disktools
bin/utils/disktools
('bin', 'utils', 'disktools', 'x.jpg')
x.jpg
bin/utils/disktools
.jpg
.. .
True
['5) dictionary.ipynb', '9) python programs', '3) list tuple set.ipynb', '8) modules and scoping rules.ipynb', '1) quick overview.ipynb', '4) strings.ipynb', 'module', 'data', '10) using the filesystem.ipynb', '2) absolute basics.ipynb', '.ipynb_checkpoints', '6) control flow.ipynb', '7) function.ipynb']
posix
linux


In [47]:
"""
Getting information about files

1. File paths are supposed to indicate actual files and directories on your hard drive.

2. The most commonly used Python path-information functions are os.path.exists, os.path.isfile, and os.path.isdir, all of which take
   a single path as an argument:
   -- os.path.exists returns True if its argument is a path corresponding to something that exists in the filesystem.
   -- os.path.isfile returns True if and only if the path it’s given indicates a normal data file of some sort (executables
      fall under this heading), and it returns False otherwise, including the possibility that the path argument doesn’t indicate
      anything in the filesystem.
   -- os.path.isdir returns True if and only if its path argument indicates a directory; it returns False otherwise.

3. Several similar functions provide more specialized queries:
   -- os.path.islink and os.path.ismount are useful in the context of Linux and other UNIX operating systems that provide file links
      and mount points; they return True if, respectively, a path indicates a file that’s a link or a mount point.
   -- os.path.islink does not return True on Windows shortcuts files (files ending with .lnk), for the simple reason that such
      files aren’t true links. However, os.path.islink returns True on Windows systems for true symbolic links created with the mklink()
      command. The OS doesn’t assign them a special status, and programs can’t transparently use them as though they were the actual file. 
   -- os.path.samefile(path1, path2) returns True if and only if the two path arguments point to the same file.
   -- os.path.isabs(path) returns True if its argument is an absolute path; it returns False otherwise.
   -- os.path.getsize(path), os.path.getmtime(path), and os.path.getatime(path) return the size, last modify time, and last access
      time of a pathname, respectively. 

4. In addition to the os.path functions listed, you can get more complete information about the files in a directory by using os.scandir,
   which returns an iterator of os.DirEntry objects. os.DirEntry objects expose the file attributes of a directory entry, so using os.scandir
   can be faster and more efficient than combining os.listdir (discussed in the next section) with the os.path operations.

5. os.DirEntry objects have methods that correspond to the os.path functions mentioned in the previous section, including exists,
   is_dir, is_file, is_socket, and is_symlink.
"""
import os

print(os.path.exists("/data/home/zhangmu/workspace/"))
print(os.path.isdir("/data/home/zhangmu/workspace/locals"))
print(os.path.isfile("/data/home/zhangmu/workspace/locals/g"))

# os.scandir also supports a context manager using with, and using one is recommended to ensure resources are properly disposed of. 
with os.scandir(".") as my_dir:
    for entry in my_dir:
        print(entry.name, entry.is_file())

True
False
False
5) dictionary.ipynb True
9) python programs False
3) list tuple set.ipynb True
8) modules and scoping rules.ipynb True
1) quick overview.ipynb True
4) strings.ipynb True
module False
data False
10) using the filesystem.ipynb True
2) absolute basics.ipynb True
.ipynb_checkpoints False
6) control flow.ipynb True
7) function.ipynb True


In [13]:
"""
More filesystem operations

1. The glob function from the glob module expands Linux/UNIX shell-style wildcard characters and character sequences in a pathname,
   returning the files in the current working directory that match. A * matches any sequence of characters. A ? matches any single character.
   A character sequence ([h,H] or [0-9]) matches any single character in that sequence.

2. You can’t use os.remove to delete directories. This restriction is a safety feature, to ensure that you don’t accidentally delete
   an entire directory substructure.

3. To create a directory, use os.makedirs or os.mkdir. The difference between them is that os.mkdir doesn’t create any necessary
   intermediate directories, but os.makedirs does.

4. To remove a directory, use os.rmdir. This function removes only empty directories. Attempting to use it on a nonempty directory raises
   an exception. 
"""
import glob
import os

print(glob.glob("*"))
print(glob.glob("*.ipynb"))

# rename & remove
os.rename('data/registery.txt', 'data/registery.txt.bk')
os.remove('data/book.tmp')
os.listdir('data')

# create a directory
os.makedirs('data/mydir')
os.listdir('data')

os.rmdir('data/mydir')
os.listdir('data')

['5) dictionary.ipynb', '9) python programs', '3) list tuple set.ipynb', '8) modules and scoping rules.ipynb', '1) quick overview.ipynb', '4) strings.ipynb', 'module', 'data', '10) using the filesystem.ipynb', '2) absolute basics.ipynb', '6) control flow.ipynb', '7) function.ipynb']
['5) dictionary.ipynb', '3) list tuple set.ipynb', '8) modules and scoping rules.ipynb', '1) quick overview.ipynb', '4) strings.ipynb', '10) using the filesystem.ipynb', '2) absolute basics.ipynb', '6) control flow.ipynb', '7) function.ipynb']


['registery.txt.bk', 'word_count.txt', '.ipynb_checkpoints']

In [36]:
"""
More filesystem operations with pathlib

1. Path objects have most of the same methods mentioned earlier. Some differences exist, however. The iterdir method is similar to
   the os.path.listdir function except that it returns an iterator of paths rather than a list of strings. Note that in a Windows
   environment, the paths returned are WindowsPath objects, whereas on Mac OS or Linux, they’re PosixPath objects.

2. The pathlib path objects also have a glob method built in, which again returns not a list of strings but an iterator of path objects.
   Otherwise, this function behaves very much like the glob.glob function.

3. Note that as with os.remove, you can’t use the unlink method to delete directories. This restriction is a safety feature, to ensure
   that you don’t accidentally delete an entire directory substructure.

4. To create a directory by using a path object, use the path object’s mkdir method. If you give the mkdir method a parents=True parameter,
   it creates any necessary intermediate directories; otherwise, it raises a FileNotFoundError if an intermediate directory isn’t there.
"""

# iterdir
from pathlib import Path

cur_path = Path()
data_path = cur_path.joinpath('/data', 'home', 'zhangmu', 'workspace', 'workspace_python', 'quick_python', 'data')
print(list(data_path.iterdir()))

# glob
print(list(new_path.glob("*")))
print(list(new_path.glob("*.txt")))

# rename
old_path = Path('data','registery.txt')
new_path = Path('data','registery.txt.bk')
old_path.rename(new_path)
print(list(data_path.iterdir()))

# remove
more_new_path = Path ('data', 'mydir')
more_new_path.mkdir(parents=True)
print(list(data_path.iterdir()))

more_new_path = Path ('data', 'mydir')
more_new_path.rmdir()
print(list(data_path.iterdir()))

[PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/word_count.txt'), PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/.ipynb_checkpoints'), PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/registery.txt')]
[]
[]
[PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/registery.txt.bk'), PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/word_count.txt'), PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/.ipynb_checkpoints')]
[PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/mydir'), PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/registery.txt.bk'), PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/word_count.txt'), PosixPath('/data/home/zhangmu/workspace/workspace_python/quick_python/data/.ipynb_checkpoints')]
[PosixPath('/data/home/zhangmu/workspace/workspace_pyth

In [37]:
"""
Processing all files in a directory subtree

1. A highly useful function for traversing recursive directory structures is the os.walk function. You can use it to walk through an entire
   directory tree, returning three things for each directory it traverses: the root, or path, of that directory; a list of its subdirectories;
   and a list of its files.

2. The os.walk is called with the path of the starting, or top, directory and can have three optional arguments, 
   os.walk(directory, topdown=True, onerror=None, followlinks= False):
   -- directory is a starting directory path;
   -- if topdown is True or not present, the files in each directory are processed before its subdirectories, resulting in a listing that
      starts at the top and goes down; whereas if topdown is False, the subdirectories of each directory are processed first, giving a
      bottom-up traversal of the tree.
   -- onerror parameter can be set to a function to handle any errors that result from calls to os.listdir, which are ignored by default.
   -- os.walk by default doesn’t walk down into folders that are symbolic links unless you give it the followlinks=True parameter

3. When called, os.walk creates an iterator that recursively applies itself to all the directories contained in the top parameter. In other
   words, for each subdirectory subdir in names, os.walk recursively invokes a call to itself, of the form os.walk(subdir, ...). 
"""
for root, dirs, files in os.walk(os.curdir):
    print("{0} has {1} files".format(root, len(files)))
    if ".git" in dirs:
        dirs.remove(".git") 

. has 9 files
./9) python programs has 14 files
./module has 3 files
./module/__pycache__ has 3 files
./data has 2 files
./data/.ipynb_checkpoints has 0 files
./.ipynb_checkpoints has 9 files


In [None]:
"""
Others

1. To remove nonempty directories, use the shutil.rmtree function. It recursively removes all files in a directory tree.

2. The copytree function of the shutil module recursively makes copies of all the files in a directory and all of its subdirectories,
   preserving permission mode and stat (that is, access/modify times) information. shutil also has the already-mentioned rmtree function
   for removing a directory and all of its subdirectories, as well as several functions for making copies of individual files. 
"""