# Introduction to OS and ASCII + XML File Manipulation

- Operating system interaction: `os`
- File manipulation: `os.path`
- File Open / Close / Read

## Linux shell commands

- To call the shell from a notebook, prepend an exclamation point (`!`) to a shell command.
- This is not standard Python, just a Jupyter notebook feature!
- Note the scope of the operation: each `!` command runs in its own temporary shell, so a `!cd` does not carry over to later commands or cells.

**NB**: This is only available in notebooks, not in Python scripts!

In [1]:
# Current Directory
!pwd

/Users/rhf/git/IPythonNotebookTutorial/notebooks


In [2]:
# Let's change our directory
!cd ..

In [3]:
# Current Directory
!pwd

/Users/rhf/git/IPythonNotebookTutorial/notebooks


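Note that a Linux command started by `!` runs in its own temporary shell, so its scope is at most the **cell** and never the entire notebook: the `!cd ..` above did not change the notebook's directory.

More about this in the `os` module section below.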
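The same behaviour can be reproduced from plain Python. The sketch below is only an illustration and assumes a POSIX shell is available (as on Linux or macOS); it uses the standard `subprocess` module rather than anything Jupyter-specific. Each `subprocess.run` call starts a fresh shell, much like each `!` line does.

```python
import os
import subprocess

start_dir = os.getcwd()

# Each call starts a fresh shell, so this `cd` is forgotten as soon as it ends.
subprocess.run("cd ..", shell=True, check=True)
after_cd = os.getcwd()  # still the starting directory

# Chaining with `&&` keeps both commands inside the same shell.
result = subprocess.run(
    "cd .. && pwd", shell=True, check=True, capture_output=True, text=True
)
parent_seen_by_shell = result.stdout.strip()

print(start_dir == after_cd)   # True: the Python process never moved
print(parent_seen_by_shell)    # the parent directory, as seen inside that one shell
```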

In [1]:
# We can chain Linux commands
!pwd
!echo "---------------------------------------------------------"
!ls

/Users/rhf/git/IPythonNotebookTutorial/notebooks
---------------------------------------------------------
0.0 - jupyter_notebooks_introduction.ipynb
1.0 - jupyter dashboard.ipynb
1.1 - jupyter not a book.ipynb
1.2 - markdown syntax.ipynb
1.3 - introduction to python - part 1.ipynb
1.3 - introduction to python - part2.ipynb
2.0 - Introduction to File Manipulation.ipynb
2.1 - Introduction to Plotting.ipynb
3 - Multiple plots - Glob.ipynb
4 - 2D Imaging - FITS.ipynb
4 - 2D Imaging - TIFF.ipynb
5 - Gaussian Fitting.ipynb
6 - HDF files.ipynb
7 - widgets.ipynb
Data
Mantid.ipynb
hdfview.png


## Python `os` module

- In Python files (scripts) the `!` does not work.
- Use the `os` module instead.

In [2]:
# import the module
import os

In [3]:
# Current Directory
os.getcwd()

'/Users/rhf/git/IPythonNotebookTutorial/notebooks'

In [4]:
# Let's change our directory
os.chdir('..')

- The scope of `os` is not the cell but the whole notebook.
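Because `os.chdir` affects everything that runs afterwards, it can be handy to change directory only temporarily. The helper below is a minimal sketch; `list_directory` is a hypothetical name, not part of the `os` module.

```python
import os

def list_directory(path):
    """Return the sorted contents of `path`, then go back to where we started."""
    previous_dir = os.getcwd()
    os.chdir(path)
    try:
        return sorted(os.listdir())
    finally:
        # Runs even if listing fails, so the notebook never stays "lost".
        os.chdir(previous_dir)

before = os.getcwd()
parent_contents = list_directory("..")
print(os.getcwd() == before)  # True: the working directory was restored
```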

In [5]:
# Current Directory
os.getcwd()

'/Users/rhf/git/IPythonNotebookTutorial'

In [6]:
# Let's get back to our directory
os.chdir('notebooks')

In [7]:
# Let's list the contents of `notebooks`
os.listdir()

['5 - Gaussian Fitting.ipynb',
 '6 - HDF files.ipynb',
 '1.1 - jupyter not a book.ipynb',
 '2.0 - Introduction to File Manipulation.ipynb',
 '0.0 - jupyter_notebooks_introduction.ipynb',
 '4 - 2D Imaging - TIFF.ipynb',
 'Mantid.ipynb',
 '3 - Multiple plots - Glob.ipynb',
 '7 - widgets.ipynb',
 '2.1 - Introduction to Plotting.ipynb',
 '1.2 - markdown syntax.ipynb',
 '1.3 - introduction to python - part 1.ipynb',
 '4 - 2D Imaging - FITS.ipynb',
 '.ipynb_checkpoints',
 'hdfview.png',
 '1.3 - introduction to python - part2.ipynb',
 '1.0 - jupyter dashboard.ipynb',
 'Data']
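Note that `os.listdir` returns the names in arbitrary order, unlike `!ls`. A small sketch of how the listing could be sorted and filtered follows; `list_notebooks` is a hypothetical helper and the files are created in a scratch directory just for the demonstration.

```python
import os
import tempfile

def list_notebooks(directory):
    """Return the `.ipynb` file names found in `directory`, sorted by name."""
    return sorted(
        name for name in os.listdir(directory) if name.endswith(".ipynb")
    )

with tempfile.TemporaryDirectory() as demo_dir:
    # Create a few empty files that mimic the listing above.
    for name in ["Mantid.ipynb", "7 - widgets.ipynb", "hdfview.png"]:
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, ".ipynb_checkpoints"))
    notebooks = list_notebooks(demo_dir)

print(notebooks)  # ['7 - widgets.ipynb', 'Mantid.ipynb']
```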

## File Manipulation: `os.path` module

In [14]:
# show what is available in os.path: <tab> or dir(...)
dir(os.path)

['__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_get_sep',
 '_joinrealpath',
 '_varprog',
 '_varprogb',
 'abspath',
 'altsep',
 'basename',
 'commonpath',
 'commonprefix',
 'curdir',
 'defpath',
 'devnull',
 'dirname',
 'exists',
 'expanduser',
 'expandvars',
 'extsep',
 'genericpath',
 'getatime',
 'getctime',
 'getmtime',
 'getsize',
 'isabs',
 'isdir',
 'isfile',
 'islink',
 'ismount',
 'join',
 'lexists',
 'normcase',
 'normpath',
 'os',
 'pardir',
 'pathsep',
 'realpath',
 'relpath',
 'samefile',
 'sameopenfile',
 'samestat',
 'sep',
 'split',
 'splitdrive',
 'splitext',
 'stat',
 'supports_unicode_filenames',
 'sys']

There are plenty of functions available in the `os.path` module.

**For this example, let's use:**

```
 'abspath',
 'basename',
 'dirname',
 'exists',
 'getsize',
 'isdir',
 'isfile',
 'sep',
 'splitext',
```

In [15]:
# Let's pick one file
file_path = "Data/Glob/f1.txt"
file_path

'Data/Glob/f1.txt'

In [16]:
# Alternatively we can use `os.path.join`.
file_path = os.path.join("Data", "Glob", "f1.txt")
file_path

'Data/Glob/f1.txt'

In [17]:
# Path separator: It's different in Windows
os.path.sep

'/'
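To see the difference without a Windows machine, the standard library ships both flavours of `os.path` as separate modules: `posixpath` and `ntpath`. This is just an illustration of why `os.path.join` is safer than gluing strings together with `/`.

```python
import ntpath      # the Windows flavour of os.path
import posixpath   # the Linux / macOS flavour of os.path

parts = ("Data", "Glob", "f1.txt")

posix_path = posixpath.join(*parts)
windows_path = ntpath.join(*parts)

print(posix_path)    # Data/Glob/f1.txt
print(windows_path)  # Data\Glob\f1.txt
```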

In [18]:
# Check if the file exists
os.path.exists(file_path)

False

In [19]:
# Where are we?
os.getcwd()

'/Users/rhf/git/IPythonNotebookTutorial/notebooks'

In [20]:
# Let's change the path
file_path = os.path.join("..", "Data", "Glob", "f1.txt")
file_path

'../Data/Glob/f1.txt'

In [21]:
# Check if the file exists again!!
os.path.exists(file_path)

True

In [22]:
# Is it a directory?
os.path.isdir(file_path)

False

In [23]:
# Is it a file?
os.path.isfile(file_path)

True

In [32]:
# What's its size? 
file_size_bytes = os.path.getsize(file_path)
file_size_kbytes = file_size_bytes / 1024
print(file_path, file_size_bytes, "Bytes", file_size_kbytes, "KBytes")

../Data/Glob/f1.txt 7375 Bytes 7.2021484375 KBytes
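The division by 1024 above can be wrapped in a small formatting helper. This is a sketch only: `format_size` is a hypothetical function, and the choice of binary units (KiB, MiB, ...) is an assumption.

```python
def format_size(num_bytes):
    """Format a size in bytes using binary units (1 KiB = 1024 B)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024.
        if size < 1024 or unit == units[-1]:
            return "{:.2f} {}".format(size, unit)
        size /= 1024

print(format_size(7375))  # 7.20 KiB
```

In the notebook it could be called as `format_size(os.path.getsize(file_path))`.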


In [33]:
# Get File name
os.path.basename(file_path)

'f1.txt'

In [34]:
# Get Full path
os.path.abspath(file_path)

'/Users/rhf/git/IPythonNotebookTutorial/Data/Glob/f1.txt'

In [35]:
# Get the directory where the file is
os.path.dirname(file_path)

'../Data/Glob'

In [36]:
# We were expecting something different????
# Absolute directory where the file is
os.path.dirname(
    os.path.abspath(file_path)
)

'/Users/rhf/git/IPythonNotebookTutorial/Data/Glob'

### Exercise

```
/SNS/lustre/EXAMPLES/IPythonNotebookTutorial/Data/Glob/f1.txt
                                                       ------ 1
------------------------------------------------------------- 2
                                            -----------       3
-------------------------------------------------------       4                                                    
```

1. Get only the file name of file_path
2. Get the full path of file_path
3. Get the directory where the file_path is
4. Absolute directory where file_path is

In [42]:
file_path = "../Data/Glob/f1.txt"

In [43]:
# 1. File name
os.path.basename(file_path)

'f1.txt'

In [44]:
# 2. Full path
os.path.abspath(file_path)

'/Users/rhf/git/IPythonNotebookTutorial/Data/Glob/f1.txt'

In [45]:
# 3. Relative directory where the file is
os.path.dirname(file_path)

'../Data/Glob'

In [46]:
# 4. Absolute directory where the file is
os.path.dirname(
    os.path.abspath(file_path)
)

'/Users/rhf/git/IPythonNotebookTutorial/Data/Glob'

In [47]:
# 5. Check if it is a file
os.path.isfile(file_path)

True

How to get the **file extension**:

In [48]:
# `os.path.splitext` returns a tuple
os.path.splitext(file_path)

('../Data/Glob/f1', '.txt')

### List vs Tuple

- This is a tuple `(1, 2, 3)`
- This is a list/array `[1, 2, 3]`

Lists are mutable (items can be changed or deleted in place); tuples are immutable:

In [52]:
l = [1, 2, 3]
type(l)

list

In [54]:
del l[1]
l

[1, 3]

In [51]:
t = (1, 2, 3)
type(t)

tuple

In [55]:
del t[1]
t

TypeError: 'tuple' object doesn't support item deletion

In [58]:
t[1] = 4

TypeError: 'tuple' object does not support item assignment
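Since a tuple cannot be modified in place, the usual workaround is to build a new one. A short sketch of two common ways to do that, plus the unpacking that is used with `os.path.splitext` later on:

```python
t = (1, 2, 3)

# Build a new tuple from slices instead of deleting in place.
without_middle = t[:1] + t[2:]

# Or go through a list (mutable) and convert back to a tuple.
as_list = list(t)
as_list[1] = 4
with_four = tuple(as_list)

# Unpacking works the same way for tuples and lists.
first, second, third = t

print(without_middle)  # (1, 3)
print(with_four)       # (1, 4, 3)
print(t)               # (1, 2, 3): the original tuple is untouched
```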

In [32]:
# Using: os.path.splitext
prefix, suffix = os.path.splitext(file_path)
print("prefix = {}; suffix = {}; suffix without '.' = {}.".format(prefix, suffix, suffix[1:]))

prefix = ../Data/Glob/f1; suffix = .txt; suffix without '.' = txt.


In [33]:
# Using: string split
filename_only = os.path.basename(file_path)
print(filename_only)
print(filename_only.split("."))
print(filename_only.split(".")[-1])

f1.txt
['f1', 'txt']
txt
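The two approaches agree for `f1.txt`, but not for every file name. The comparison below is a small sketch; both helper names are hypothetical, and the extra file names are made up for the illustration.

```python
import os

def extension_by_splitext(path):
    """Extension without the leading dot, using os.path.splitext."""
    return os.path.splitext(path)[1][1:]

def extension_by_split(path):
    """Text after the last dot of the file name, using str.split."""
    return os.path.basename(path).split(".")[-1]

for path in ["../Data/Glob/f1.txt", "archive.tar.gz", "README"]:
    print(path, repr(extension_by_splitext(path)), repr(extension_by_split(path)))
```

For a name without any dot, such as `README`, `os.path.splitext` reports an empty extension while the string split returns the whole name, which is one reason to prefer `os.path.splitext`.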


# ASCII File Management

In [34]:
# Let's get a file
file_path = os.path.join("..", "Data", "Glob", "f1.txt")
file_path

'../Data/Glob/f1.txt'

In [57]:
# Let's see the contents of the file with a Linux command. Try to use the variable `file_path` defined above :)
!head $file_path

# X , Y , E , DX
1
0.00232478,8.22832,0.677097,0.0020133
0.00458718,5.8915,0.193922,0.00192328
0.00684958,12.573,0.413909,0.00189569
0.00911198,37.3161,0.768022,0.00199785
0.0113744,156.672,2.59008,0.00205609
0.0136368,567.555,11.7842,0.00217489
0.0158992,1401.89,16.1017,0.00229348
0.0181616,1334.2,13.4719,0.00239931


<hr/>

**Reading the file contents:**


```
file_object = open("filename", "mode")
```

Where:

`file_object` is the variable that holds the file object returned by `open`.

`mode` is one of:

* `r` – Read mode, used when the file is only being read
* `w` – Write mode, used to write new information to the file (an existing file with the same name is erased when it is opened in this mode)
* `a` – Append mode, used to add new data to the end of the file; new information is automatically appended after the existing content
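The three modes can be tried out on a scratch file. The sketch below uses the `with` statement, which closes the file automatically at the end of the block; the file name `demo.txt` and its temporary location are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(demo_path, "w") as fh:   # "w": create the file (or erase an existing one)
    fh.write("first line\n")

with open(demo_path, "a") as fh:   # "a": add to the end of the file
    fh.write("second line\n")

with open(demo_path, "r") as fh:   # "r": read only
    demo_lines = fh.readlines()

print(demo_lines)  # ['first line\n', 'second line\n']
```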


In [36]:
# Open file, read file, close file
fh = open(file_path, "r")
lines = fh.readlines()
fh.close()

In [37]:
# What do we have in `lines`?
lines[:10]

['# X , Y , E , DX\n',
 '1\n',
 '0.00232478,8.22832,0.677097,0.0020133\n',
 '0.00458718,5.8915,0.193922,0.00192328\n',
 '0.00684958,12.573,0.413909,0.00189569\n',
 '0.00911198,37.3161,0.768022,0.00199785\n',
 '0.0113744,156.672,2.59008,0.00205609\n',
 '0.0136368,567.555,11.7842,0.00217489\n',
 '0.0158992,1401.89,16.1017,0.00229348\n',
 '0.0181616,1334.2,13.4719,0.00239931\n']

In [38]:
# get rid of the `1` after the header (position 1 in the list)
del lines[1]
lines[:10]

['# X , Y , E , DX\n',
 '0.00232478,8.22832,0.677097,0.0020133\n',
 '0.00458718,5.8915,0.193922,0.00192328\n',
 '0.00684958,12.573,0.413909,0.00189569\n',
 '0.00911198,37.3161,0.768022,0.00199785\n',
 '0.0113744,156.672,2.59008,0.00205609\n',
 '0.0136368,567.555,11.7842,0.00217489\n',
 '0.0158992,1401.89,16.1017,0.00229348\n',
 '0.0181616,1334.2,13.4719,0.00239931\n',
 '0.020424,1621.73,20.0042,0.00247893\n']

In [39]:
# Change the header to `X, Y, E(Y), Dx`. Don't forget the new line!!
lines[0] = 'X, Y, E(Y), Dx\n'
lines[:10]


['X, Y, E(Y), Dx\n',
 '0.00232478,8.22832,0.677097,0.0020133\n',
 '0.00458718,5.8915,0.193922,0.00192328\n',
 '0.00684958,12.573,0.413909,0.00189569\n',
 '0.00911198,37.3161,0.768022,0.00199785\n',
 '0.0113744,156.672,2.59008,0.00205609\n',
 '0.0136368,567.555,11.7842,0.00217489\n',
 '0.0158992,1401.89,16.1017,0.00229348\n',
 '0.0181616,1334.2,13.4719,0.00239931\n',
 '0.020424,1621.73,20.0042,0.00247893\n']

In [40]:
# Save the file with a new name (put your name on the file!!!)
my_file_path = '/tmp/file_1234.txt'
fh = open(my_file_path, "w")
fh.writelines(lines)
fh.close()

In [41]:
# visualize its contents to make sure everything is OK
!head $my_file_path

X, Y, E(Y), Dx
0.00232478,8.22832,0.677097,0.0020133
0.00458718,5.8915,0.193922,0.00192328
0.00684958,12.573,0.413909,0.00189569
0.00911198,37.3161,0.768022,0.00199785
0.0113744,156.672,2.59008,0.00205609
0.0136368,567.555,11.7842,0.00217489
0.0158992,1401.89,16.1017,0.00229348
0.0181616,1334.2,13.4719,0.00239931
0.020424,1621.73,20.0042,0.00247893
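Once the header is clean, the text lines can be turned into numbers. This is a minimal sketch using the first few lines shown above; the `columns` dictionary layout is just one possible choice.

```python
# The first lines of the cleaned file, as held in `lines`.
lines = [
    "X, Y, E(Y), Dx\n",
    "0.00232478,8.22832,0.677097,0.0020133\n",
    "0.00458718,5.8915,0.193922,0.00192328\n",
    "0.00684958,12.573,0.413909,0.00189569\n",
]

header = [name.strip() for name in lines[0].split(",")]
# float() ignores surrounding whitespace, including the trailing new line.
rows = [[float(value) for value in line.split(",")] for line in lines[1:]]

# Transpose the rows into one list per column, keyed by the header names.
columns = {name: list(values) for name, values in zip(header, zip(*rows))}

print(header)        # ['X', 'Y', 'E(Y)', 'Dx']
print(columns["Y"])  # [8.22832, 5.8915, 12.573]
```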


### Exercise

- Remove the header of the file and save it.

In the end it should look something like:
```
0.00232478,8.22832,0.677097,0.0020133
0.00458718,5.8915,0.193922,0.00192328
0.00684958,12.573,0.413909,0.00189569
(...)
```

<hr/>

# XML File Management

## The ElementTree XML API

The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.
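The "cross between a list and a dictionary" can be seen on a tiny document. The XML text below is a made-up fragment modelled on the file used in this section, not the real file.

```python
import xml.etree.ElementTree as ET

# A made-up fragment with the same layout as the real file.
xml_text = """
<SPICErack SPICE_version="1.7">
  <Header>
    <Instrument>CG2</Instrument>
    <Experiment_number type="INT32">206</Experiment_number>
  </Header>
</SPICErack>
""".strip()

demo_root = ET.fromstring(xml_text)
demo_header = demo_root[0]                           # list-like: index the children

child_tags = [child.tag for child in demo_header]    # list-like: iterate
n_children = len(demo_header)                        # list-like: len()
version = demo_root.attrib["SPICE_version"]          # dict-like: attributes by key

print(child_tags)  # ['Instrument', 'Experiment_number']
print(n_children)  # 2
print(version)     # 1.7
```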


In [42]:
# Let's import it 
import xml.etree.ElementTree as ET

In [43]:
# Let's see where we are:
os.getcwd()

'/Users/rhf/git/IPythonNotebookTutorial/notebooks'

In [44]:
file_name = "../Data/XML/f1.xml"

In [45]:
# Does the file exist?
os.path.isfile(file_name)

True

In [46]:
# Let's have a look at the file:

!head -20 $file_name

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sansstyle.xsl"?>
<SPICErack SPICE_version="1.7" filename="CG2_exp206_scan0021_0001.xml" start_time="2017-06-13 23:21:26" end_time="2017-06-14 01:22:10">
  <Header>
    <Instrument>CG2</Instrument>
    <Start_Time>2017-06-13 23:21:26</Start_Time>
    <End_Time>2017-06-14 01:22:10</End_Time>
    <Experiment_Title>Beamline calibration activities</Experiment_Title>
    <Experiment_number type="INT32">206</Experiment_number>
    <IPTS_number>828</IPTS_number>
    <Cycle_Number>473</Cycle_Number>
    <Command>scan n=@(__active_setup.frame_count)-1 preset time @(__active_setup.scan_time)</Command>
    <Users>Lakeisha Walker, Shuo Qian, Daisuke Sawada, Yuri Melnichenko, William Heller, Lowell Crow, Gary Lynn, George Wignall, Lilin He, Volker Urban, Ryan Oliver, Durgesh Rai, Hugh O'Neill, Ken Littrell, Katherine Bailey, Sai Venkatesh Pingali, Lisa Debeer-Schmitt</Users>
    <Local_Contact>LDS, KMB</Local_

<hr/>

Let's get the tag `root`:
```
<root> 
    <tag1>xxxx</tag1>
</root>
```

In [47]:
# Let's initialize it and get the `root` node = SPICErack
tree = ET.parse(file_name)
root = tree.getroot()

In [48]:
# Make sure the root is SPICErack
root.tag

'SPICErack'

In [49]:
# Show all attributes of root
root.attrib

{'SPICE_version': '1.7',
 'filename': 'CG2_exp206_scan0021_0001.xml',
 'start_time': '2017-06-13 23:21:26',
 'end_time': '2017-06-14 01:22:10'}

In [50]:
# root.attrib is a dictionary: we can read / modify its values by key
root.attrib['filename']

'CG2_exp206_scan0021_0001.xml'

## Pseudo-Exercise

Update the file:

1. In `SPICErack/Header/Users` remove the `-` in `Lisa Debeer-Schmitt`.
2. Add an attribute `type="STRING"` to `SPICErack/Header/Users`



In [51]:
# Get the user list: xpath = Header/Users
element = root.find('Header/Users')
element.text

"Lakeisha Walker, Shuo Qian, Daisuke Sawada, Yuri Melnichenko, William Heller, Lowell Crow, Gary Lynn, George Wignall, Lilin He, Volker Urban, Ryan Oliver, Durgesh Rai, Hugh O'Neill, Ken Littrell, Katherine Bailey, Sai Venkatesh Pingali, Lisa Debeer-Schmitt"

In [52]:
# Let's get the names to a list
names_str = element.text
names_list = names_str.split(",")
names_list

['Lakeisha Walker',
 ' Shuo Qian',
 ' Daisuke Sawada',
 ' Yuri Melnichenko',
 ' William Heller',
 ' Lowell Crow',
 ' Gary Lynn',
 ' George Wignall',
 ' Lilin He',
 ' Volker Urban',
 ' Ryan Oliver',
 ' Durgesh Rai',
 " Hugh O'Neill",
 ' Ken Littrell',
 ' Katherine Bailey',
 ' Sai Venkatesh Pingali',
 ' Lisa Debeer-Schmitt']

In [53]:
# Are the names well formatted?
names_list = [name.strip() for name in names_list]
names_list

['Lakeisha Walker',
 'Shuo Qian',
 'Daisuke Sawada',
 'Yuri Melnichenko',
 'William Heller',
 'Lowell Crow',
 'Gary Lynn',
 'George Wignall',
 'Lilin He',
 'Volker Urban',
 'Ryan Oliver',
 'Durgesh Rai',
 "Hugh O'Neill",
 'Ken Littrell',
 'Katherine Bailey',
 'Sai Venkatesh Pingali',
 'Lisa Debeer-Schmitt']

In [54]:
# Let's remove the `-` from Lisa
lisa = names_list[-1]
lisa
lisa = lisa.replace("-", " ")
lisa

'Lisa Debeer Schmitt'

In [55]:
# Let's update the list of names
names_list[-1] = lisa
names_list

['Lakeisha Walker',
 'Shuo Qian',
 'Daisuke Sawada',
 'Yuri Melnichenko',
 'William Heller',
 'Lowell Crow',
 'Gary Lynn',
 'George Wignall',
 'Lilin He',
 'Volker Urban',
 'Ryan Oliver',
 'Durgesh Rai',
 "Hugh O'Neill",
 'Ken Littrell',
 'Katherine Bailey',
 'Sai Venkatesh Pingali',
 'Lisa Debeer Schmitt']

In [56]:
# Join the list back into a single string, with the names separated by ', '
names_str = ", ".join(names_list)
names_str

"Lakeisha Walker, Shuo Qian, Daisuke Sawada, Yuri Melnichenko, William Heller, Lowell Crow, Gary Lynn, George Wignall, Lilin He, Volker Urban, Ryan Oliver, Durgesh Rai, Hugh O'Neill, Ken Littrell, Katherine Bailey, Sai Venkatesh Pingali, Lisa Debeer Schmitt"

In [57]:
# Let's update the file in memory
element.text = names_str
element.text

"Lakeisha Walker, Shuo Qian, Daisuke Sawada, Yuri Melnichenko, William Heller, Lowell Crow, Gary Lynn, George Wignall, Lilin He, Volker Urban, Ryan Oliver, Durgesh Rai, Hugh O'Neill, Ken Littrell, Katherine Bailey, Sai Venkatesh Pingali, Lisa Debeer Schmitt"

In [58]:
# Let's add the attribute. First, check the current attributes:
element.attrib

{}

In [59]:
element.attrib["type"] = "STRING"
element.attrib

{'type': 'STRING'}
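Besides the `attrib` dictionary, an element also offers the `get` and `set` methods for the same job. A small sketch on a stand-alone element (the one-name `Users` element here is a shortened stand-in for the real one):

```python
import xml.etree.ElementTree as ET

users = ET.fromstring("<Users>Lisa Debeer Schmitt</Users>")

users.set("type", "STRING")            # same effect as users.attrib["type"] = "STRING"
users_type = users.get("type")
missing = users.get("units", "n/a")    # a default instead of a KeyError

serialized = ET.tostring(users, encoding="unicode")
print(serialized)  # <Users type="STRING">Lisa Debeer Schmitt</Users>
```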

In [60]:
# Let's save our XML Tree as a new file
new_file_path = '/tmp/file_xml_1234.xml'
fh = open(new_file_path, 'w')
tree.write(fh, encoding='unicode')
fh.close()

In [61]:
# visualize the content
!head -20 $new_file_path

<SPICErack SPICE_version="1.7" end_time="2017-06-14 01:22:10" filename="CG2_exp206_scan0021_0001.xml" start_time="2017-06-13 23:21:26">
  <Header>
    <Instrument>CG2</Instrument>
    <Start_Time>2017-06-13 23:21:26</Start_Time>
    <End_Time>2017-06-14 01:22:10</End_Time>
    <Experiment_Title>Beamline calibration activities</Experiment_Title>
    <Experiment_number type="INT32">206</Experiment_number>
    <IPTS_number>828</IPTS_number>
    <Cycle_Number>473</Cycle_Number>
    <Command>scan n=@(__active_setup.frame_count)-1 preset time @(__active_setup.scan_time)</Command>
    <Users type="STRING">Lakeisha Walker, Shuo Qian, Daisuke Sawada, Yuri Melnichenko, William Heller, Lowell Crow, Gary Lynn, George Wignall, Lilin He, Volker Urban, Ryan Oliver, Durgesh Rai, Hugh O'Neill, Ken Littrell, Katherine Bailey, Sai Venkatesh Pingali, Lisa Debeer Schmitt</Users>
    <Local_Contact>LDS, KMB</Local_Contact>
    <Scan_Number type="INT32">21</Scan_Number>
    <Scan_Point_Number ty
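Notice that the saved file no longer starts with the `<?xml ...?>` line of the original. If that declaration is wanted, `tree.write` accepts `xml_declaration=True`. The sketch below uses a tiny stand-in tree and a hypothetical output path; it does not bring back the `xml-stylesheet` line, which would have to be handled separately.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# A tiny stand-in for the real tree.
demo_tree = ET.ElementTree(ET.fromstring("<SPICErack><Header/></SPICErack>"))

# Hypothetical output path in a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "with_declaration.xml")

# Passing a path (instead of an open file) lets ElementTree handle the encoding.
demo_tree.write(out_path, encoding="UTF-8", xml_declaration=True)

with open(out_path, "r", encoding="utf-8") as fh:
    first_line = fh.readline()

print(first_line)  # the XML declaration comes first
```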