# File handling by Python

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin and Chang-Yu Pan is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

## Introduction to command line

A command line interface (CLI)  
provides **efficient** (though not intuitive) ways  
to control the machine.  

#### CLI is more than enough for many basic tasks
For example, Linux or Mac machines  
use `ls`, `cd`, ... to navigate the file system.  
(MS-DOS uses `dir`, `cd`, ...)  

Various apps are available for high-level tasks:  
Emacs and Vim are text editors,  
`zip` (or `tar`) compresses files,  
`latex` generates high-quality PDFs,  
`ssh` provides a secure connection to a remote server, and  
`git` handles version control.

#### Default interface on many workstations
A graphical user interface (GUI)  
is slower and uses  
more bandwidth of the connection.  

As a result,  
many workstations by default  
only have the CLI,  
while the GUI is limited  
to certain tasks (e.g., YouTube, FTP).

#### CLI allows you to talk to different programming languages
In general, different programming languages  
do not talk to each other directly.  
(However, Python can call R, and vice versa.)

Almost all programming languages  
allow you to store your data  
in a text file (e.g., `txt` or `csv`).

Using CLI,  
you may easily run an R script  
(which generates some files)  
and then ask Python to do further tasks  
(based on the generated files).
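
As a minimal sketch of this hand-off,  
the code below writes a small CSV file by hand  
to stand in for the output of another program (such as an R script)  
and then reads it back with the standard `csv` module.  
The file name `scores.csv` and its columns are hypothetical.

```python
import csv
import os
import tempfile

# Work in a temporary folder so nothing in your project is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'scores.csv')  # hypothetical file name

    # Stand-in for the file another program (e.g., an R script) would generate.
    with open(path, 'w', newline='') as f:
        f.write('name,score\nAlice,90\nBob,85\n')

    # Python picks up where the other program left off.
    with open(path, newline='') as f:
        rows = list(csv.DictReader(f))

# Every value read from a CSV file is a str, so convert before adding.
total = sum(int(row['score']) for row in rows)
print(total)
```

The two programs never call each other;  
the text file is the only thing they share.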

## Interact with system shell

Use the **subprocess** module  
to open a process in your OS  
and interact with it.

In [1]:
import subprocess

This is a piece of Python code  
that asks for your name  
and says hello to you.

In [2]:
print('Please tell me your name:')
name = input()
print('Hello %s!'%name)

Please tell me your name:
Taiwan
Hello Taiwan!


With the cell magic `%%writefile`  
you may write the content of the cell  
into a file.

In [3]:
%%writefile hello.py

print('Please tell me your name:')
name = input()
print('Hello %s!'%name)

Writing hello.py


Now `subprocess.run(command)`  
allows you to run your `command`.  

The `input` keyword allows you to  
provide your input (in `bytes` format).  

By default the output goes to the terminal  
(so you cannot see it in the notebook).  
Use `capture_output=True` (available since Python 3.7) to keep the output.

In [4]:
run = subprocess.run(['python', 'hello.py'], input=b'Taiwan', capture_output=True)

By default, the standard input and output  
are in `bytes`.

In [5]:
run.stdout

b'Please tell me your name:\nHello Taiwan!\n'

Use `.decode()` to change it to `str`.

In [6]:
print(run.stdout.decode())

Please tell me your name:
Hello Taiwan!
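
If you would rather work with `str` from the start,  
`subprocess.run` also accepts `text=True` (Python 3.7 or later).  
The sketch below is one way to use it;  
it passes a one-line stand-in for `hello.py` through `-c`  
so that it does not depend on the file existing,  
and it uses `sys.executable` on the assumption  
that you want the same interpreter that runs your notebook.

```python
import subprocess
import sys

# A one-line stand-in for hello.py, passed to the interpreter with -c.
code = "name = input(); print('Hello %s!' % name)"

# sys.executable is the Python interpreter running this code.
result = subprocess.run(
    [sys.executable, '-c', code],
    input='Taiwan',        # a str rather than bytes, because text=True
    capture_output=True,
    text=True,
)

# result.stdout is already a str, so no .decode() is needed.
print(result.stdout)
```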



You may try other commands.  
To list the content of the folder,  
use `ls` for Linux and Mac,  
or `dir` for Windows  
(`dir` is built into the Windows command shell, so it also needs `shell=True`).

In [7]:
run = subprocess.run('ls', capture_output=True)
print(run.stdout.decode())

256px-Colored_neural_network.svg.png
256px-SVM_margin.png
Algorithms-data-to-graph.ipynb
Algorithms-k-mean-clustering.ipynb
Algorithms-linear-classifier.ipynb
Algorithms-neural-network-feedforward-and-accuracy.ipynb
Algorithms-searching-algorithms.ipynb
Algorithms-spectral-embedding.ipynb
A-taste-of-data-science.ipynb
A-taste-of-feature-engineering.ipynb
Complexity-sorting-and-vectorization.ipynb
eball.png
File-handling-by-Python.ipynb
hello.py
Introduction-to-NetworkX.ipynb
Introduction-to-scikit-learn.ipynb
kmean.png
kNN.png
LICENSE
linear_classifier.png
NeuralNetwork1.ipynb
README.md
spectral_embedding.png
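
To make the same cell work on different systems,  
you could pick the command based on `os.name`.  
This is only a sketch:  
it assumes `ls` is installed on non-Windows machines,  
and it relies on `shell=True` on Windows  
because `dir` is part of the command shell rather than a separate program.

```python
import os
import subprocess

if os.name == 'nt':
    # Windows: 'dir' only exists inside the command shell.
    listing = subprocess.run('dir', shell=True, capture_output=True, text=True)
else:
    # Linux and Mac: 'ls' is a standalone program.
    listing = subprocess.run(['ls'], capture_output=True, text=True)

print(listing.stdout)
```

For listing files, `os.listdir` avoids the problem entirely,  
since it does not depend on any external command.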



In [8]:
### remove the file hello.py
### returncode 0 means the process is done without error
### a nonzero returncode (e.g., 1) means there is an error
run = subprocess.run(['rm', 'hello.py'])
run.returncode

0
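
Any nonzero return code signals a failure.  
The sketch below starts a child process  
that deliberately exits with code 3,  
first inspecting `returncode` by hand  
and then letting `check=True` raise `subprocess.CalledProcessError`.  
The exit code 3 is an arbitrary choice for illustration.

```python
import subprocess
import sys

# A child process that deliberately exits with code 3.
failing = [sys.executable, '-c', 'import sys; sys.exit(3)']

# Without check=True, a failure is only visible through returncode.
run = subprocess.run(failing)
print(run.returncode)

# With check=True, a nonzero return code raises an exception instead.
try:
    subprocess.run(failing, check=True)
except subprocess.CalledProcessError as err:
    caught = err.returncode
    print('command failed with return code', caught)
```

Using `check=True` is handy in longer scripts,  
where a silently failed command could otherwise go unnoticed.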

## Handle files by Python
The **os** module allows you  
to do many basic file-system tasks in Python.  

The same syntax works across platforms.

In [9]:
import os

Use `os.getcwd()`  
to get the **current working directory**.

In [10]:
os.getcwd()

'/home/jephian/cache/ModularPython'

Use `os.listdir(path)`  
to **list** the content in `path`.  

The default path is `'.'`  
which means the current working directory.

In [11]:
os.listdir()

['kmean.png',
 'Introduction-to-NetworkX.ipynb',
 'Complexity-sorting-and-vectorization.ipynb',
 'Introduction-to-scikit-learn.ipynb',
 'Algorithms-k-mean-clustering.ipynb',
 'linear_classifier.png',
 '.ipynb_checkpoints',
 'A-taste-of-data-science.ipynb',
 'kNN.png',
 'spectral_embedding.png',
 'File-handling-by-Python.ipynb',
 'NeuralNetwork1.ipynb',
 'Algorithms-data-to-graph.ipynb',
 'A-taste-of-feature-engineering.ipynb',
 '.git',
 '256px-SVM_margin.png',
 'Algorithms-linear-classifier.ipynb',
 'LICENSE',
 'Algorithms-spectral-embedding.ipynb',
 'README.md',
 '256px-Colored_neural_network.svg.png',
 'eball.png',
 'Algorithms-neural-network-feedforward-and-accuracy.ipynb',
 'Algorithms-searching-algorithms.ipynb']
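
`os.listdir` returns the names in arbitrary order  
and includes hidden entries such as `.git`.  
A minimal sketch of sorting and filtering the result is given below;  
the file names are hypothetical  
and live in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files with hypothetical names.
    for name in ['b.ipynb', 'a.ipynb', 'figure.png', '.hidden']:
        open(os.path.join(tmp, name), 'w').close()

    # Sort the names, since os.listdir does not guarantee any order.
    names = sorted(os.listdir(tmp))

    # Keep only the notebooks, or only the entries that are not hidden.
    notebooks = [n for n in names if n.endswith('.ipynb')]
    visible = [n for n in names if not n.startswith('.')]

print(notebooks)
print(visible)
```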

Use `os.mkdir(folder_name)`  
to **create a folder** called `folder_name`.

In [13]:
os.mkdir('folder_1')
os.listdir()

['kmean.png',
 'Introduction-to-NetworkX.ipynb',
 'Complexity-sorting-and-vectorization.ipynb',
 'Introduction-to-scikit-learn.ipynb',
 'Algorithms-k-mean-clustering.ipynb',
 'linear_classifier.png',
 '.ipynb_checkpoints',
 'A-taste-of-data-science.ipynb',
 'kNN.png',
 'spectral_embedding.png',
 'File-handling-by-Python.ipynb',
 'NeuralNetwork1.ipynb',
 'folder_1',
 'Algorithms-data-to-graph.ipynb',
 'A-taste-of-feature-engineering.ipynb',
 '.git',
 '256px-SVM_margin.png',
 'Algorithms-linear-classifier.ipynb',
 'LICENSE',
 'Algorithms-spectral-embedding.ipynb',
 'README.md',
 '256px-Colored_neural_network.svg.png',
 'eball.png',
 'Algorithms-neural-network-feedforward-and-accuracy.ipynb',
 'Algorithms-searching-algorithms.ipynb']
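
Running `os.mkdir` twice on the same name raises `FileExistsError`.  
If that is a concern,  
`os.makedirs` with `exist_ok=True` is one alternative;  
it also creates missing parent folders.  
The sketch below works in a temporary folder,  
and the nested names `a` and `b` are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'folder_1')
    os.mkdir(folder)

    # Calling os.mkdir again on an existing folder raises FileExistsError.
    try:
        os.mkdir(folder)
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

    # os.makedirs creates missing parent folders too, and
    # exist_ok=True keeps it quiet when the folder is already there.
    nested = os.path.join(folder, 'a', 'b')
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)
    nested_exists = os.path.isdir(nested)

print(second_call_failed, nested_exists)
```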

Use `os.chdir(path)`  
to **change the working directory** to `path`.

In [14]:
os.chdir('folder_1/')
os.listdir()

[]
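
Changing the working directory affects every relative path  
used afterwards, so it is easy to lose track of where you are.  
One possible pattern, sketched below,  
is to remember the starting directory  
and return to it in a `finally` block.

```python
import os
import tempfile

start = os.getcwd()  # remember where we began

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        # Relative paths are now resolved against the temporary folder.
        contents = os.listdir()
    finally:
        # Always go back, even if something above raised an error.
        os.chdir(start)

print(contents)
print(os.getcwd() == start)
```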

Create and write into a file.

In [15]:
with open('sample.txt','w+') as f:
    f.write("This is a line.")
os.listdir()

['sample.txt']
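
The second argument of `open` is the mode.  
The sketch below shows three common modes  
(`'w'` to write, `'a'` to append, and the default `'r'` to read)  
on a file in a temporary folder.  
The `with` statement closes the file automatically  
at the end of each block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')

    # 'w' creates the file (or empties an existing one) for writing.
    with open(path, 'w') as f:
        f.write('This is a line.\n')

    # 'a' appends to the end instead of overwriting.
    with open(path, 'a') as f:
        f.write('This is another line.\n')

    # 'r' is the default mode and opens the file for reading.
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)
```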

Use `os.rename(name1, name2)`  
to **rename** the file.

In [16]:
os.rename('sample.txt','sample_new.txt')
os.listdir()

['sample_new.txt']

`os.rename` can also be used  
to **move** a file.

Here `'..'` stands for the parent folder.

In [17]:
os.chdir('..')
os.rename('folder_1/sample_new.txt','sample.txt')
os.listdir()

['kmean.png',
 'Introduction-to-NetworkX.ipynb',
 'Complexity-sorting-and-vectorization.ipynb',
 'Introduction-to-scikit-learn.ipynb',
 'Algorithms-k-mean-clustering.ipynb',
 'linear_classifier.png',
 '.ipynb_checkpoints',
 'A-taste-of-data-science.ipynb',
 'kNN.png',
 'spectral_embedding.png',
 'File-handling-by-Python.ipynb',
 'NeuralNetwork1.ipynb',
 'folder_1',
 'Algorithms-data-to-graph.ipynb',
 'A-taste-of-feature-engineering.ipynb',
 'sample.txt',
 '.git',
 '256px-SVM_margin.png',
 'Algorithms-linear-classifier.ipynb',
 'LICENSE',
 'Algorithms-spectral-embedding.ipynb',
 'README.md',
 '256px-Colored_neural_network.svg.png',
 'eball.png',
 'Algorithms-neural-network-feedforward-and-accuracy.ipynb',
 'Algorithms-searching-algorithms.ipynb']
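
`os.rename` may fail on some systems  
when the source and the destination are on different file systems.  
`shutil.move` is an alternative that falls back  
to copying and then deleting in that case.  
A minimal sketch, with hypothetical folder names  
inside a temporary folder:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, 'folder_1')
    dst_dir = os.path.join(tmp, 'folder_2')
    os.mkdir(src_dir)
    os.mkdir(dst_dir)

    src = os.path.join(src_dir, 'sample.txt')
    with open(src, 'w') as f:
        f.write('This is a line.')

    # When the destination is an existing folder,
    # the file is moved into it and keeps its name.
    moved_to = shutil.move(src, dst_dir)

    still_in_source = os.listdir(src_dir)
    now_in_destination = os.listdir(dst_dir)

print(still_in_source)
print(now_in_destination)
```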

Use `os.remove(file_name)`  
to **remove** a file `file_name`.

In [18]:
os.remove('sample.txt')
os.listdir()

['kmean.png',
 'Introduction-to-NetworkX.ipynb',
 'Complexity-sorting-and-vectorization.ipynb',
 'Introduction-to-scikit-learn.ipynb',
 'Algorithms-k-mean-clustering.ipynb',
 'linear_classifier.png',
 '.ipynb_checkpoints',
 'A-taste-of-data-science.ipynb',
 'kNN.png',
 'spectral_embedding.png',
 'File-handling-by-Python.ipynb',
 'NeuralNetwork1.ipynb',
 'folder_1',
 'Algorithms-data-to-graph.ipynb',
 'A-taste-of-feature-engineering.ipynb',
 '.git',
 '256px-SVM_margin.png',
 'Algorithms-linear-classifier.ipynb',
 'LICENSE',
 'Algorithms-spectral-embedding.ipynb',
 'README.md',
 '256px-Colored_neural_network.svg.png',
 'eball.png',
 'Algorithms-neural-network-feedforward-and-accuracy.ipynb',
 'Algorithms-searching-algorithms.ipynb']
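
Removing a file that does not exist raises `FileNotFoundError`.  
The sketch below shows this on a file in a temporary folder  
and catches the error instead of letting the program stop.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    open(path, 'w').close()  # create an empty file

    os.remove(path)
    exists_after_remove = os.path.exists(path)

    # Removing it a second time raises FileNotFoundError.
    try:
        os.remove(path)
        second_remove_failed = False
    except FileNotFoundError:
        second_remove_failed = True

print(exists_after_remove, second_remove_failed)
```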

Use `os.rmdir(folder_name)`  
to **remove** a folder `folder_name`.

In [19]:
os.rmdir('folder_1')
os.listdir()

['kmean.png',
 'Introduction-to-NetworkX.ipynb',
 'Complexity-sorting-and-vectorization.ipynb',
 'Introduction-to-scikit-learn.ipynb',
 'Algorithms-k-mean-clustering.ipynb',
 'linear_classifier.png',
 '.ipynb_checkpoints',
 'A-taste-of-data-science.ipynb',
 'kNN.png',
 'spectral_embedding.png',
 'File-handling-by-Python.ipynb',
 'NeuralNetwork1.ipynb',
 'Algorithms-data-to-graph.ipynb',
 'A-taste-of-feature-engineering.ipynb',
 '.git',
 '256px-SVM_margin.png',
 'Algorithms-linear-classifier.ipynb',
 'LICENSE',
 'Algorithms-spectral-embedding.ipynb',
 'README.md',
 '256px-Colored_neural_network.svg.png',
 'eball.png',
 'Algorithms-neural-network-feedforward-and-accuracy.ipynb',
 'Algorithms-searching-algorithms.ipynb']

`os.rmdir` only removes an empty folder.  
If a folder is not empty,  
you have to **remove the whole directory tree** with `shutil.rmtree`.

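In [None]:
### folder_1 was removed above, so recreate it with a file inside first
import shutil
os.mkdir('folder_1')
open('folder_1/sample.txt', 'w').close()
shutil.rmtree('folder_1/')
os.listdir()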
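
The difference between the two functions  
can be checked safely in a temporary folder.  
In the sketch below, `os.rmdir` raises `OSError`  
because the folder still contains a file,  
while `shutil.rmtree` removes the folder together with its content.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'folder_1')
    os.mkdir(folder)
    with open(os.path.join(folder, 'sample.txt'), 'w') as f:
        f.write('This is a line.')

    # os.rmdir refuses to delete a folder that still has content.
    try:
        os.rmdir(folder)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # shutil.rmtree deletes the folder and everything inside it.
    shutil.rmtree(folder)
    folder_exists = os.path.exists(folder)

print(rmdir_failed, folder_exists)
```

Since `shutil.rmtree` deletes everything under the given path without asking,  
double-check the path before running it.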