# Introduction to File Manipulation

- Operating system interaction: `os`
- File manipulation: `os.path`
- File Open / Close / Read

## Linux shell commands

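- To run a shell command from a notebook, prepend an exclamation point (`!`) to it.
- This is not standard Python! It only works in Jupyter (IPython) notebooks.
- Note the scope of the operation: each `!` command runs in its own temporary subshell, so a change such as `cd` does not carry over to the next command or cell.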

In [4]:
# Current Directory
!pwd

/home/rhf/git/IPythonNotebookTutorial/notebooks


In [7]:
# Let's change our directory
!cd ..

In [8]:
# Current Directory
!pwd

/home/rhf/git/IPythonNotebookTutorial/notebooks
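
The unchanged `pwd` output follows from how child processes work: a directory change made inside a child never reaches the parent. Here is a minimal sketch of the same effect in plain Python, assuming a child interpreter started with `subprocess` stands in for the `!` subshell:

```python
import os
import subprocess
import sys

before = os.getcwd()

# Start a child Python process that changes ITS OWN working directory.
child = subprocess.run(
    [sys.executable, "-c", "import os; os.chdir('..'); print(os.getcwd())"],
    capture_output=True,
    text=True,
    check=True,
)
child_cwd = child.stdout.strip()

after = os.getcwd()

print("child process ended in:", child_cwd)
print("parent is still in:    ", after)
```

In a notebook, the `%cd` magic changes the directory of the kernel itself, so its effect does persist across cells.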


In [10]:
# We can run several shell commands in the same cell
!pwd
!ls

/home/rhf/git/IPythonNotebookTutorial/notebooks
'0.0 jupyter_notebooks_introduction.ipynb'	 '4 - 2D Imaging - FITS.ipynb'
'1.0 - jupyter dashboard.ipynb'			 '4 - 2D Imaging - TIFF.ipynb'
'1.1 - jupyter not a book.ipynb'		 '5 - Gaussian Fitting.ipynb'
'1.2 - markdown syntax.ipynb'			 '6 - HDF files.ipynb'
'1.3 - introduction to python.ipynb'		 '7 - widgets.ipynb'
'2.0 - Introduction to File Manipulation.ipynb'   Data
'2.1 - Introduction to Plotting.ipynb'		  hdfview.png
'3 - Multiple plots - Glob.ipynb'		  Mantid.ipynb


## Python `os` module

- In Python files (scripts) the `!` prefix does not work.
- Use the `os` module instead.

In [14]:
# import the module
import os

In [15]:
# Current Directory
os.getcwd()

'/home/rhf/git/IPythonNotebookTutorial/notebooks'

In [16]:
# Let's change our directory
os.chdir('..')

- The scope of an `os` call is not the cell but the whole notebook: the directory change persists.

In [18]:
# Current Directory
os.getcwd()

'/home/rhf/git/IPythonNotebookTutorial'

In [19]:
# Let's get back to our directory
os.chdir('notebooks')
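
Because `os.chdir` affects the whole notebook, it is easy to forget to change back. One possible pattern is a small context manager that restores the previous directory automatically. This is a sketch only; `working_directory` is a hypothetical helper name, not part of the `os` module:

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch to `path`, then restore the previous directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so we always end up where we started.
        os.chdir(previous)


start = os.getcwd()
with working_directory(".."):
    inside = os.getcwd()   # we are in the parent directory here
end = os.getcwd()

print("inside the block:", inside)
print("back where we started:", start == end)
```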

In [20]:
# Let's list the contents of `notebooks`
os.listdir()

['.ipynb_checkpoints',
 '4 - 2D Imaging - FITS.ipynb',
 '1.3 - introduction to python.ipynb',
 '3 - Multiple plots - Glob.ipynb',
 'Data',
 '6 - HDF files.ipynb',
 '2.1 - Introduction to Plotting.ipynb',
 '1.2 - markdown syntax.ipynb',
 '5 - Gaussian Fitting.ipynb',
 '1.1 - jupyter not a book.ipynb',
 '0.0 jupyter_notebooks_introduction.ipynb',
 '1.0 - jupyter dashboard.ipynb',
 'hdfview.png',
 '7 - widgets.ipynb',
 'Mantid.ipynb',
 '2.0 - Introduction to File Manipulation.ipynb',
 '4 - 2D Imaging - TIFF.ipynb']

## File Manipulation: `os.path` module

In [22]:
# let's import our module
import os.path

In [25]:
# Let's pick one file
file_path = "Data/Glob/f1.txt"
file_path

'Data/Glob/f1.txt'

- Alternatively, we can use `os.path.join`.
- Remember that the folder separator differs between Windows (`\`) and Linux (`/`)! `os.path.join` picks the right one for the current system.

In [26]:
file_path = os.path.join("Data", "Glob", "f1.txt")
file_path

'Data/Glob/f1.txt'
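
To see the separator difference without switching machines, the standard library exposes both flavours of `os.path` directly: `posixpath` (Linux/macOS) and `ntpath` (Windows). A small sketch, assuming the same three path components as above:

```python
import ntpath      # Windows flavour of os.path
import posixpath   # Linux/macOS flavour of os.path

parts = ("Data", "Glob", "f1.txt")

linux_path = posixpath.join(*parts)
windows_path = ntpath.join(*parts)

print(linux_path)    # Data/Glob/f1.txt
print(windows_path)  # Data\Glob\f1.txt
```

`os.path` is simply whichever of these two modules matches the running system, which is why code built on `os.path.join` stays portable.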

In [27]:
# Check if the file exists
os.path.exists(file_path)

False

In [31]:
# Let's change the path
file_path = os.path.join("..", "Data", "Glob", "f1.txt")
file_path

'../Data/Glob/f1.txt'

In [32]:
# Check if the file exists again!!
os.path.exists(file_path)

True

In [33]:
# Get File name
os.path.basename(file_path)

'f1.txt'

In [35]:
# Get Full path
os.path.abspath(file_path)

'/home/rhf/git/IPythonNotebookTutorial/Data/Glob/f1.txt'

In [36]:
# Get the directory where the file is
os.path.dirname(file_path)

'../Data/Glob'

In [37]:
# Were we expecting something different?
# Absolute directory where the file is
os.path.dirname(
    os.path.abspath(file_path)
)

'/home/rhf/git/IPythonNotebookTutorial/Data/Glob'

- There are plenty of functions (and a few constants) available in the `os.path` module

In [39]:
# show what is available in os.path: <tab> or dir(...)
dir(os.path)

['__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_get_sep',
 '_joinrealpath',
 '_varprog',
 '_varprogb',
 'abspath',
 'altsep',
 'basename',
 'commonpath',
 'commonprefix',
 'curdir',
 'defpath',
 'devnull',
 'dirname',
 'exists',
 'expanduser',
 'expandvars',
 'extsep',
 'genericpath',
 'getatime',
 'getctime',
 'getmtime',
 'getsize',
 'isabs',
 'isdir',
 'isfile',
 'islink',
 'ismount',
 'join',
 'lexists',
 'normcase',
 'normpath',
 'os',
 'pardir',
 'pathsep',
 'realpath',
 'relpath',
 'samefile',
 'sameopenfile',
 'samestat',
 'sep',
 'split',
 'splitdrive',
 'splitext',
 'stat',
 'supports_unicode_filenames',
 'sys']
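
A few entries from that list are worth a quick try. The sketch below picks `split`, `normpath` and `isabs` for illustration; the choice is arbitrary, and the `messy` path is a made-up example:

```python
import os.path

file_path = os.path.join("..", "Data", "Glob", "f1.txt")

# split: directory part and file name in a single call
head, tail = os.path.split(file_path)
print(head, tail)

# normpath: collapse redundant ".." and "." segments (pure string work, no disk access)
messy = os.path.join("notebooks", "..", "Data", "Glob", "f1.txt")
normalized = os.path.normpath(messy)
print(normalized)

# isabs: tell relative paths from absolute ones
relative_is_abs = os.path.isabs(file_path)
absolute_is_abs = os.path.isabs(os.path.abspath(file_path))
print(relative_is_abs, absolute_is_abs)
```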

### Exercise

```
/SNS/lustre/EXAMPLES/IPythonNotebookTutorial/Data/Glob/f1.txt
                                                       ------ 1
------------------------------------------------------------- 2
                                            -----------       3
-------------------------------------------------------       4
```

1. Get only the file name of `file_path`
2. Get the full path of `file_path`
3. Get the (relative) directory where `file_path` is
4. Get the absolute directory where `file_path` is

In [52]:
# 1. File name


In [53]:
# 2. Full path


In [54]:
# 3. Relative directory where the file is


In [38]:
# 4. Absolute directory where the file is


In [None]:
# Bonus: Check if it is a file


How to get the **file extension**:

In [41]:
# Using: os.path.splitext
prefix, suffix = os.path.splitext(file_path)
print("prefix = {}; suffix = {}; suffix without '.' = {}.".format(prefix, suffix, suffix[1:]))

prefix = ../Data/Glob/f1; suffix = .txt; suffix without '.' = txt.


In [43]:
# Using: string split
filename_only = os.path.basename(file_path)
print(filename_only)
print(filename_only.split("."))
print(filename_only.split(".")[-1])

f1.txt
['f1', 'txt']
txt
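
Both approaches agree for `f1.txt`, but they can differ on less regular names. A short comparison, using a few made-up file names chosen to show the edge cases:

```python
import os.path

# Hypothetical file names: a regular one, a double extension,
# one without any extension, and a hidden "dot" file.
names = ["f1.txt", "archive.tar.gz", "README", ".bashrc"]

results = {}
for name in names:
    by_splitext = os.path.splitext(name)[1][1:]   # extension without the dot
    by_split = name.split(".")[-1]                # last dot-separated piece
    results[name] = (by_splitext, by_split)
    print("{!r}: splitext -> {!r}; split -> {!r}".format(name, by_splitext, by_split))
```

For `README` and `.bashrc`, `os.path.splitext` reports no extension (an empty string), while the plain string split returns `'README'` and `'bashrc'`. This is one reason to prefer `os.path.splitext` when the file names are not under your control.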


## File Management

In [44]:
# Open file, read file, close file

In [45]:
# Find in file

In [46]:
# Find in file and use string split substitution
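
The three steps named in the cells above could look roughly like the following. This is a sketch under stated assumptions: the file content is a hypothetical `key = value` sample written to a temporary file, not one of the tutorial's data files.

```python
import os
import tempfile

# Hypothetical sample content, used only to keep the example self-contained.
sample = "run = 346\ninstrument = CG2\ntitle = 482 EOC\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

# Open file, read file, close file (all three steps written out).
f = open(tmp_path)
content = f.read()
f.close()

# Find in file: the `with` statement closes the file for us, even on errors.
found = None
with open(tmp_path) as f:
    for line in f:
        if line.startswith("instrument"):
            # String split: keep what follows the "=" and trim the whitespace.
            found = line.split("=")[1].strip()
            break

os.remove(tmp_path)  # clean up the temporary file

print(content)
print("instrument:", found)
```

Iterating over the file object reads one line at a time, which keeps memory use low for large files.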

## XML Example

In [47]:
# Import the XML parser from the standard library
import xml.etree.ElementTree as ET

In [54]:
# Let's see where we are:
os.getcwd()

'/home/rhf/git/IPythonNotebookTutorial/notebooks'

In [61]:
file_name = "../Data/XML/f1.xml"

In [62]:
os.path.isfile(file_name)


True

Let's get the root element of the tree, i.e. the outermost tag (`root` in this schematic example):
```
<root>
    <tag1>xxxx</tag1>
</root>
```

In [64]:
tree = ET.parse(file_name)
root = tree.getroot()

In [66]:
!head ../Data/XML/f1.xml

<SPICErack SPICE_version="1.7" filename="CG2_exp346_scan0001_0001.xml" start_time="2018-11-09 14:33:20" end_time="2018-11-09 14:34:21">
  <Header>
    <Instrument>CG2</Instrument>
    <Start_Time>2018-11-09 14:33:20</Start_Time>
    <End_Time>2018-11-09 14:34:21</End_Time>
    <Experiment_Title>482 EOC</Experiment_Title>
    <Experiment_number type="INT32">346</Experiment_number>
    <IPTS_number>0000</IPTS_number>
    <Cycle_Number>482</Cycle_Number>
    <Command>scan preset time 60</Command>


In [70]:
element = root.find('Header/Experiment_number')

In [71]:
element.text

'346'
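
The same calls can be tried without the data file by parsing a string. The snippet below is a sketch that uses a shortened copy of the header shown by `head`, closed by hand so that it parses; `Does_not_exist` is a made-up tag used to show what `find` returns when nothing matches.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the file header; closing tags added by hand.
xml_text = """
<SPICErack SPICE_version="1.7" filename="CG2_exp346_scan0001_0001.xml">
  <Header>
    <Instrument>CG2</Instrument>
    <Experiment_number type="INT32">346</Experiment_number>
  </Header>
</SPICErack>
"""

root = ET.fromstring(xml_text)

print(root.tag)                        # SPICErack
print(root.attrib["SPICE_version"])    # 1.7

element = root.find("Header/Experiment_number")
print(element.attrib["type"])          # INT32

# element.text is always a string; convert it before doing arithmetic.
experiment_number = int(element.text)
print(experiment_number + 1)           # 347

# find() returns None when the path matches nothing, so check before using it.
missing = root.find("Header/Does_not_exist")
print(missing is None)                 # True
```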