# File handling by Python

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin and Chang-Yu Pan is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

## Introduction to the command line

The command line interface (CLI)  
provides **efficient** (though not always intuitive) ways  
to control a machine.  

#### CLI is more than enough for many basic tasks
For example, Linux or Mac machines  
use `ls`, `cd`, ... to navigate the file systems.  
(MS-DOS uses `dir`, `cd`, ...)  

Various apps are available for high-level tasks:  
Emacs and Vim are text editors,  
`zip` (or `tar`) compresses files,  
`latex` generates high-quality PDFs,  
`ssh` provides a secure connection to a remote server, and  
`git` handles version control.

#### Default interface on many workstations
A graphical user interface (GUI)  
is slower and uses  
more of the connection's bandwidth.  

As a result,  
many workstations by default  
only have the CLI,  
while the GUI is limited  
to certain tasks (e.g., YouTube, FTP).

#### CLI allows you to talk to different programming languages
In general, different programming languages  
do not talk to each other.  
(However, Python can call R, and vice versa.)

Almost all programming languages  
allow you to store your data  
in a text file (e.g., `txt` or `csv`).

Using the CLI,  
you may easily run an R script  
(which generates some files)  
and then ask Python to do further tasks  
(based on the generated files).
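
A minimal sketch of this workflow, under stated assumptions:  
a second Python process stands in for the R script  
(so the example runs without R installed),  
and the names `data.csv` and `make_data.R` are made up for illustration.

```python
import csv
import os
import subprocess
import sys
import tempfile

# Stand-in for an R script: a tiny program that writes a CSV file.
# With R installed, the command might instead look like
# ['Rscript', 'make_data.R'] (make_data.R is a hypothetical file).
producer = (
    "import csv\n"
    "with open('data.csv', 'w', newline='') as f:\n"
    "    csv.writer(f).writerows([['x', 'y'], [1, 2], [3, 4]])\n"
)

with tempfile.TemporaryDirectory() as workdir:
    # Step 1: run the other program; cwd makes it write into workdir.
    subprocess.run([sys.executable, '-c', producer], cwd=workdir, check=True)

    # Step 2: Python picks up the generated file and continues the work.
    with open(os.path.join(workdir, 'data.csv'), newline='') as f:
        rows = list(csv.reader(f))

print(rows)
```

Values read back from a CSV file are strings,  
so numeric columns need to be converted before further computation.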

## Interact with the system shell

Use the **subprocess** module  
to open a process in your OS  
and interact with it.

In [91]:
import subprocess

This is a piece of Python code  
that asks for your name  
and says hello to you.

In [84]:
print('Please tell me your name:')
name = input()
print('Hello %s!'%name)

Please tell me your name:
Taiwan
Hello Taiwan!


With the cell magic `%%writefile`  
you may write the content of the cell  
into a file.

In [85]:
%%writefile hello.py

print('Please tell me your name:')
name = input()
print('Hello %s!'%name)

Writing hello.py


Now `subprocess.run(command)`  
allows you to run your `command`.  

The `input` keyword allows you to  
provide your input (in `bytes` format).  

By default the output goes to the terminal,  
not to the notebook (so you cannot see it).  
Use `capture_output=True` to keep the output.

In [86]:
run = subprocess.run(['python', 'hello.py'], input=b'Taiwan', capture_output=True)

By default, the standard input and output  
are in `bytes` format.

In [87]:
run.stdout

b'Please tell me your name:\nHello Taiwan!\n'

Use `.decode()` to change it to `str`.

In [88]:
print(run.stdout.decode())

Please tell me your name:
Hello Taiwan!
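
As an alternative to calling `.decode()`,  
`subprocess.run` accepts `text=True` (Python 3.7 or later),  
in which case both `input` and `stdout` are `str`.  
The sketch below passes the program with `-c` instead of using `hello.py`,  
and uses `sys.executable` in place of `'python'`;  
both are choices made here only to keep the example self-contained.

```python
import subprocess
import sys

# sys.executable is the Python interpreter running this code,
# so the example does not depend on a `python` command being on the PATH.
code = "name = input(); print('Hello %s!' % name)"
run = subprocess.run(
    [sys.executable, '-c', code],
    input='Taiwan\n',      # a str, because text=True
    capture_output=True,
    text=True,
)
print(run.stdout)
```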



You may try other commands.  
To list the content of the folder  
use `ls` for Linux and Mac,  
or `dir` for Windows  
(`dir` is built into the Windows shell,  
so it also needs `shell=True`).

In [89]:
run = subprocess.run('ls', capture_output=True)
print(run.stdout.decode())

137669277.docx
4034658.pdf
A2-1-Linear-algebra-k-mean-clustering.pdf
AerLingusReceipt2-2U3ML6.pdf
AerLingusReceipt.pdf
AG84.pdf
AK97.pdf
BLS07.pdf
Chladni.ipynb
fernandes2009.pdf
File-handling-by-Python.ipynb
gssp
hello.py
HLS_IEPG-ZF_part-draft.pdf
ILAS2020-abstract.pdf
ILAS2020-abstract.tex
JephianLin.aux
JephianLin.log
JephianLin.out
JephianLin.pdf
JephianLin.tex
JOC1901-002RA0.pdf
JT2020.pdf
learnc
Manual
MRChen-REU.pdf
Pauline-Jephian
photos
Photos (1).zip
Photos.zip
research
Response_to_reviewers.tex
sdsafe
ssp
ssp-0210.zip
ssp-1227.zip
ssp-1229.zip
ssp (1).pdf
ssp (1).tex
ssp-active.pdf
ssp.bib
ssp.pdf
ssp.tex
test
test2
Victoria-pix
website
xgboost.ipynb
Zpower.ipynb
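
One possible way to make the listing work on either kind of system  
is to pick the command from `os.name`.  
This is only a sketch: running `dir` through `cmd /c` is one option among several  
(passing `shell=True` is another).

```python
import os
import subprocess

# `ls` is a program on Linux and Mac; `dir` on Windows is a built-in
# of the cmd shell, so it is run through `cmd /c`.
command = ['cmd', '/c', 'dir'] if os.name == 'nt' else ['ls']
run = subprocess.run(command, capture_output=True, text=True)
print(run.stdout)
```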



In [90]:
### remove the file hello.py
### returncode 0 means the process finished without error
### a nonzero returncode means there was an error
run = subprocess.run(['rm', 'hello.py'])
run.returncode

0
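
Instead of inspecting `returncode` by hand,  
you may pass `check=True`,  
which raises `subprocess.CalledProcessError` on a nonzero return code.  
The failing command below is artificial (a child Python process that exits with code 3),  
chosen so that the sketch behaves the same on every system.

```python
import subprocess
import sys

# A child process that exits with code 3 (an arbitrary nonzero value).
failing = [sys.executable, '-c', 'import sys; sys.exit(3)']

run = subprocess.run(failing)
print(run.returncode)

# With check=True, a nonzero return code raises an exception instead.
try:
    subprocess.run(failing, check=True)
except subprocess.CalledProcessError as err:
    caught = err.returncode
    print('command failed with return code', caught)
```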

## Handle files by Python
The **os** module allows you  
to do many basic tasks in Python.  

The syntax is cross-platform.

In [94]:
import os
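
A small illustration of what cross-platform means here:  
`os.path.join` builds a path with the separator of the current system,  
so the same code runs on Linux, Mac, and Windows.  
The names `folder_1` and `sample.txt` are only examples;  
nothing is created on disk.

```python
import os

# os.path.join inserts the separator of the current OS:
# '/' on Linux and Mac, '\\' on Windows.
path = os.path.join('folder_1', 'sample.txt')
print(path)

# os.path.split undoes the join and returns (folder, file name).
folder, name = os.path.split(path)
print(folder, name)
```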

Use `os.getcwd()`  
to get the **current working directory**.

In [95]:
os.getcwd()

'/home/jephian/Downloads'

Use `os.listdir(path)`  
to **list** the content in `path`.  

The default path is `'.'`  
which means the current working directory.

In [98]:
os.listdir()

['ssp.bib',
 'MRChen-REU.pdf',
 'A2-1-Linear-algebra-k-mean-clustering.pdf',
 'Zpower.ipynb',
 'JephianLin.tex',
 'AerLingusReceipt2-2U3ML6.pdf',
 'JephianLin.pdf',
 'Photos (1).zip',
 'Victoria-pix',
 'ssp-0210.zip',
 'test',
 'photos',
 'BLS07.pdf',
 'Manual',
 'Chladni.ipynb',
 'ssp (1).tex',
 '137669277.docx',
 '.ipynb_checkpoints',
 'ssp',
 'JephianLin.log',
 'xgboost.ipynb',
 'ILAS2020-abstract.pdf',
 'Pauline-Jephian',
 'File-handling-by-Python.ipynb',
 'fernandes2009.pdf',
 'ILAS2020-abstract.tex',
 '..html',
 'ssp.pdf',
 'JT2020.pdf',
 'ssp (1).pdf',
 'JephianLin.out',
 'sdsafe',
 'Photos.zip',
 'ssp-1227.zip',
 'AK97.pdf',
 '4034658.pdf',
 'test2',
 '._files',
 'research',
 'ssp-1229.zip',
 'gssp',
 'learnc',
 'AerLingusReceipt.pdf',
 'JephianLin.aux',
 'JOC1901-002RA0.pdf',
 'Response_to_reviewers.tex',
 'HLS_IEPG-ZF_part-draft.pdf',
 'website',
 'ssp-active.pdf',
 'AG84.pdf',
 'ssp.tex']
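
The list returned by `os.listdir` is in arbitrary order  
and mixes files with folders.  
A sketch of how one might sort and filter it,  
run in a temporary folder with made-up file names  
so that it does not depend on your own files:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty files and one subfolder to list.
    for name in ['a.pdf', 'b.tex', 'c.pdf']:
        open(os.path.join(folder, name), 'w').close()
    os.mkdir(os.path.join(folder, 'photos'))

    # os.listdir returns names in arbitrary order, so sort them.
    names = sorted(os.listdir(folder))
    pdfs = [n for n in names if n.endswith('.pdf')]
    # Names are relative to `folder`, so join before testing them.
    folders = [n for n in names if os.path.isdir(os.path.join(folder, n))]

print(names)
print(pdfs)
print(folders)
```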

Use `os.mkdir(folder_name)`  
to **create a folder** called `folder_name`.

In [99]:
os.mkdir('folder_1')
os.listdir()

['ssp.bib',
 'MRChen-REU.pdf',
 'A2-1-Linear-algebra-k-mean-clustering.pdf',
 'Zpower.ipynb',
 'JephianLin.tex',
 'AerLingusReceipt2-2U3ML6.pdf',
 'JephianLin.pdf',
 'Photos (1).zip',
 'Victoria-pix',
 'ssp-0210.zip',
 'test',
 'photos',
 'folder_2',
 'BLS07.pdf',
 'Manual',
 'Chladni.ipynb',
 'ssp (1).tex',
 '137669277.docx',
 '.ipynb_checkpoints',
 'ssp',
 'JephianLin.log',
 'xgboost.ipynb',
 'ILAS2020-abstract.pdf',
 'Pauline-Jephian',
 'File-handling-by-Python.ipynb',
 'fernandes2009.pdf',
 'folder_1',
 'ILAS2020-abstract.tex',
 '..html',
 'ssp.pdf',
 'JT2020.pdf',
 'ssp (1).pdf',
 'JephianLin.out',
 'sdsafe',
 'Photos.zip',
 'ssp-1227.zip',
 'AK97.pdf',
 '4034658.pdf',
 'test2',
 '._files',
 'research',
 'ssp-1229.zip',
 'gssp',
 'learnc',
 'AerLingusReceipt.pdf',
 'JephianLin.aux',
 'JOC1901-002RA0.pdf',
 'Response_to_reviewers.tex',
 'HLS_IEPG-ZF_part-draft.pdf',
 'website',
 'ssp-active.pdf',
 'AG84.pdf',
 'ssp.tex']
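
Note that `os.mkdir` raises `FileExistsError` if the folder is already there,  
and it does not create missing parent folders.  
The sketch below shows both cases,  
together with `os.makedirs(path, exist_ok=True)` as a more forgiving alternative;  
it works inside a temporary folder, and the nested names `a` and `b` are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'folder_1')
    os.mkdir(target)

    # Creating the same folder again raises FileExistsError.
    try:
        os.mkdir(target)
        second_try = 'created'
    except FileExistsError:
        second_try = 'already exists'

    # os.makedirs creates missing parent folders as well;
    # exist_ok=True makes it silent when the folder is already there.
    nested = os.path.join(target, 'a', 'b')
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)
    nested_exists = os.path.isdir(nested)

print(second_try, nested_exists)
```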

Use `os.chdir(path)`  
to **change working directory** to `path`.

In [100]:
os.chdir('folder_1/')
os.listdir()

[]
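
Since `os.chdir` changes the directory for the whole Python process,  
every relative path used afterwards is affected.  
One common pattern (sketched here with a temporary folder)  
is to remember the starting directory and go back to it when done.

```python
import os
import tempfile

start = os.getcwd()
with tempfile.TemporaryDirectory() as folder:
    os.chdir(folder)
    try:
        inside = os.getcwd()      # relative paths now refer to `folder`
    finally:
        os.chdir(start)           # always return to where we started
back = os.getcwd()

print(inside)
print(back == start)
```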

Create and write into a file.

In [8]:
# the with statement closes the file automatically
with open('sample.txt', 'w+') as f:
    f.write("This is a line.")
os.listdir()

['sample.txt']
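
The mode string decides what `open` does with the file.  
A short sketch of the three most common modes  
(`'w'` to write, `'a'` to append, `'r'` to read),  
run in a temporary folder so that no file is left behind:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')

    # 'w' creates the file (or empties an existing one).
    with open(path, 'w') as f:
        f.write('This is a line.\n')

    # 'a' appends to the end instead of overwriting.
    with open(path, 'a') as f:
        f.write('This is another line.\n')

    # 'r' (the default mode) reads the content back.
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)
```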

Use `os.rename(name1, name2)`  
to **rename** the file.

In [10]:
os.rename('sample.txt','sample_new.txt')
os.listdir()

['sample_new.txt']

`os.rename` can also be used  
to **move** a file.

Here `'..'` stands for the parent folder.

In [11]:
os.chdir('..')
os.rename('folder_1/sample_new.txt','sample.txt')
os.listdir()

['.ipynb_checkpoints',
 'File-handling-by-Python.ipynb',
 'folder_1',
 'folder_2',
 'sample.txt',
 'Untitled.ipynb']
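
`os.rename` may fail when the source and the destination  
are on different file systems.  
`shutil.move` is a possible alternative:  
it falls back to copying and then deleting in that case.  
The sketch below repeats the move above inside a temporary folder.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, 'folder_1'))
    source = os.path.join(base, 'folder_1', 'sample_new.txt')
    with open(source, 'w') as f:
        f.write('This is a line.')

    # Move the file one level up and rename it at the same time.
    target = os.path.join(base, 'sample.txt')
    shutil.move(source, target)

    left_behind = os.listdir(os.path.join(base, 'folder_1'))
    moved = os.path.exists(target)

print(left_behind, moved)
```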

Use `os.remove(file_name)`  
to **remove** a file `file_name`.

In [12]:
os.remove('sample.txt')
os.listdir()

['.ipynb_checkpoints',
 'File-handling-by-Python.ipynb',
 'folder_1',
 'folder_2',
 'Untitled.ipynb']
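
`os.remove` raises `FileNotFoundError` when the file does not exist.  
A sketch of how that might be handled,  
again in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')
    open(path, 'w').close()

    os.remove(path)
    exists_after = os.path.exists(path)

    # Removing a file that is not there raises FileNotFoundError.
    try:
        os.remove(path)
        second_try = 'removed'
    except FileNotFoundError:
        second_try = 'nothing to remove'

print(exists_after, second_try)
```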

Use `os.rmdir(folder_name)`  
to **remove** a folder `folder_name`.

In [13]:
os.rmdir('folder_1')
os.listdir()

['.ipynb_checkpoints',
 'File-handling-by-Python.ipynb',
 'folder_1',
 'Untitled.ipynb']

`os.rmdir` only removes an empty folder.  
If a folder is not empty,  
you have to **remove the whole directory tree**  
with `shutil.rmtree(folder_name)`.

In [14]:
import shutil
shutil.rmtree('folder_1/')
os.listdir()

['.ipynb_checkpoints', 'File-handling-by-Python.ipynb', 'Untitled.ipynb']
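
The difference between the two calls can be seen in a small experiment.  
The sketch below builds a non-empty `folder_1` inside a temporary folder,  
shows that `os.rmdir` refuses to remove it,  
and then removes it with `shutil.rmtree`.  
Be careful with `shutil.rmtree` on real data: the deletion cannot be undone.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'folder_1')
    os.mkdir(folder)
    open(os.path.join(folder, 'sample.txt'), 'w').close()

    # os.rmdir refuses to remove a folder that still has content.
    try:
        os.rmdir(folder)
        rmdir_result = 'removed'
    except OSError:
        rmdir_result = 'refused'

    # shutil.rmtree deletes the folder together with everything in it.
    shutil.rmtree(folder)
    remaining = os.listdir(base)

print(rmdir_result, remaining)
```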