# Introduction to OS and ASCII + XML File Manipulation

- Operating system interaction: `os`
- File manipulation: `os.path`
- File Open / Close / Read

## Linux shell commands

- To call the shell from a notebook, prepend an exclamation point (`!`) to a shell command.
- This is not standard Python! It is a Jupyter notebook feature only.
- Note the scope of the operation: a directory change made with `!cd` does not persist beyond the cell!

**NB**: This is only available in notebooks, not in Python scripts!

In [119]:
# Current Directory


In [120]:
# Let's change our directory


In [121]:
# Current Directory


Note that the scope of the Linux commands started by `!` is the **cell** and not the entire notebook!

More about this in the section on the `os` module.
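
The scope rule can also be reproduced outside a notebook. The sketch below assumes a POSIX shell is available and uses `subprocess` only as a stand-in for `!` (it is not necessarily how Jupyter implements it): a `cd` executed in a child shell does not change the working directory of the Python process that launched it.

```python
import os
import subprocess

start_dir = os.getcwd()

# Run `cd` and `pwd` chained in ONE child shell (stand-in for `!cd / && pwd`).
result = subprocess.run("cd / && pwd", shell=True, capture_output=True, text=True)
child_dir = result.stdout.strip()

print("child shell was in:", child_dir)    # the directory the child shell moved to
print("Python is still in:", os.getcwd())  # unchanged by the child shell
```

Chaining with `&&` is what keeps `cd` and `pwd` in the same shell; run as two separate commands, the second one would start again from the original directory.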

In [122]:
# We can chain Linux commands


## Python `os` module

- In Python files (scripts) the `!` does not work.
- Use the `os` module instead.

In [123]:
# import the module
import os

In [124]:
# Current Directory


In [125]:
# Let's change our directory


- The scope of `os` is not the cell but the whole notebook: a directory change persists across cells.
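
As a minimal sketch of that behaviour, the snippet below changes directory with `os.chdir` and changes back. The temporary directory is only an assumption: it is used because it should exist on any machine, whereas the tutorial's own folders may not.

```python
import os
import tempfile

original_dir = os.getcwd()            # remember where we started

# tempfile.gettempdir() is just a directory that should exist everywhere
os.chdir(tempfile.gettempdir())
moved_dir = os.getcwd()
print(moved_dir)                      # now inside the temporary directory

# The change applies to the whole process (every later notebook cell)
# until we change it back explicitly.
os.chdir(original_dir)
print(os.getcwd())

# List a few entries of the current directory
print(sorted(os.listdir("."))[:5])
```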

In [126]:
# Current Directory


In [127]:
# Let's get back to our directory


In [128]:
# Let's list the contents of `notebooks`


## File Manipulation: `os.path` module

In [129]:
# show what is available in os.path: <tab> or dir(...)
dir(os.path)

['__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_get_sep',
 '_joinrealpath',
 '_varprog',
 '_varprogb',
 'abspath',
 'altsep',
 'basename',
 'commonpath',
 'commonprefix',
 'curdir',
 'defpath',
 'devnull',
 'dirname',
 'exists',
 'expanduser',
 'expandvars',
 'extsep',
 'genericpath',
 'getatime',
 'getctime',
 'getmtime',
 'getsize',
 'isabs',
 'isdir',
 'isfile',
 'islink',
 'ismount',
 'join',
 'lexists',
 'normcase',
 'normpath',
 'os',
 'pardir',
 'pathsep',
 'realpath',
 'relpath',
 'samefile',
 'sameopenfile',
 'samestat',
 'sep',
 'split',
 'splitdrive',
 'splitext',
 'stat',
 'supports_unicode_filenames',
 'sys']

There are plenty of functions and constants available in the `os.path` module.

**For this example, let's use:**

```
 'abspath',
 'basename',
 'dirname',
 'exists',
 'getsize',
 'isdir',
 'isfile',
 'sep',
 'splitext',
```
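
To try some of these without depending on the tutorial's data, here is a small sketch that builds a throwaway `Data/Glob/f1.txt` inside a temporary directory. The file content and the `.bak` path are assumptions made up for the illustration.

```python
import os
import tempfile

# Build a throwaway directory tree that mimics Data/Glob/f1.txt
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "Data", "Glob"))
file_path = os.path.join(base, "Data", "Glob", "f1.txt")

# newline="" disables newline translation, so exactly 4 bytes are written
with open(file_path, "w", newline="") as f:
    f.write("Q,I\n")

print(os.path.sep)                          # '/' on Linux and macOS, a backslash on Windows
print(os.path.exists(file_path))            # True
print(os.path.exists(file_path + ".bak"))   # False: this path was never created
print(os.path.isdir(file_path))             # False
print(os.path.isfile(file_path))            # True
print(os.path.getsize(file_path))           # 4 (size in bytes)
```

Building the path with `os.path.join` rather than a hard-coded `/` keeps the code portable across operating systems.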

In [130]:
# Let's pick one file
file_path = "Data/Glob/f1.txt"
file_path

'Data/Glob/f1.txt'

In [131]:
# Alternatively we can use `os.path.join`.


In [132]:
# Path separator: It's different in Windows


In [133]:
# Check if the file exists


In [134]:
# Let's build a fake path and see if the file exists


In [135]:
# Is it a directory?


In [136]:
# Is it a file?


In [137]:
# What's the file size in Bytes? 


In [138]:
# Let's confirm that


In [139]:
# Get File name


In [140]:
# Get Full path


In [141]:
# Get the directory where the file is


In [142]:
# We were expecting something different????
# Absolute directory where the file is
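
The surprise comes from `os.path.dirname` working purely on the string it is given: for a relative path it returns a relative directory. A minimal sketch, assuming the relative path used above (these calls do not require the file to exist):

```python
import os

file_path = "Data/Glob/f1.txt"     # relative path

file_name = os.path.basename(file_path)        # 'f1.txt'
full_path = os.path.abspath(file_path)         # current directory + relative path
relative_dir = os.path.dirname(file_path)      # 'Data/Glob' (still relative!)

# To get an absolute directory, make the path absolute first
absolute_dir = os.path.dirname(os.path.abspath(file_path))

print(file_name)
print(full_path)
print(relative_dir)
print(absolute_dir)
```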


### Exercise

```
/SNS/lustre/EXAMPLES/IPythonNotebookTutorial/Data/Glob/f1.txt
                                                       ------ 1
------------------------------------------------------------- 2
                                            -----------       3
-------------------------------------------------------       4
```

1. Get only the file name of `file_path`
2. Get the full path of `file_path`
3. Get the directory where `file_path` is
4. Get the absolute directory where `file_path` is

In [143]:
file_path = "../Data/Glob/f1.txt"

In [144]:
# 1. File name


In [145]:
# 2. Full path


In [146]:
# 3. Relative directory where the file is


In [147]:
# 4. Absolute directory where the file is


In [148]:
# 5. Check if it is a file


How to get the **file extension**:

In [149]:
# `os.path.splitext` returns a tuple


### List vs Tuple

- This is a tuple: `(1, 2, 3)`
- This is a list: `[1, 2, 3]`

Python tuples vs lists: mutability.
* The major difference between tuples and lists is that a list is mutable, whereas a tuple is immutable. This means that a list can be changed in place, but a tuple cannot.
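
A short sketch of what this means in practice, using the tuple returned by `os.path.splitext`. The file names are only illustrative.

```python
import os

parts = os.path.splitext("f1.txt")   # a tuple: ('f1', '.txt')
stem, ext = parts                     # tuples can be unpacked into variables

try:
    parts[1] = ".dat"                 # item assignment is not allowed on a tuple
except TypeError as err:
    print("cannot modify a tuple:", err)

as_list = list(parts)                 # a list copy, on the other hand, is mutable
as_list[1] = ".dat"
new_name = "".join(as_list)
print(new_name)                       # f1.dat

# A plain string split treats every '.' alike, splitext only the last one
print("archive.tar.gz".split("."))          # ['archive', 'tar', 'gz']
print(os.path.splitext("archive.tar.gz"))   # ('archive.tar', '.gz')
```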

In [150]:
# Using: os.path.splitext


In [151]:
# Using: string split


In [152]:
# What is the type of the file name?


In [153]:
# Let's use string split


# ASCII File Management

In [154]:
# Let's get a file
file_path = os.path.join("..", "Data", "Glob", "f1.txt")
file_path

'../Data/Glob/f1.txt'

In [155]:
# Let's see the contents of the file with a Linux command. Try to use the variable `file_path` defined above :)


<hr/>

**Reading the file contents:**


```
file_object = open("filename", "mode")
```

Where:

`file_object` is the variable that holds the returned file object.

`mode` is: 

* `r` – Read mode which is used when the file is only being read 
* `w` – Write mode which is used to edit and write new information to the file (any existing files with the same name will be erased when this mode is activated) 
* `a` – Append mode, which is used to add new data to the end of the file; that is, new information is automatically appended to the end
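
The three modes can be compared on a scratch file. This is a minimal sketch: `demo.txt` and its content are hypothetical, and the `with` statement is used so that each file is closed automatically, which is equivalent to calling `file_object.close()` by hand.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical scratch file

with open(demo_path, "w") as f:     # "w": create the file, or erase an existing one
    f.write("first line\n")

with open(demo_path, "a") as f:     # "a": keep the content, add at the end
    f.write("second line\n")

with open(demo_path, "r") as f:     # "r": read only
    lines = f.readlines()           # each element keeps its trailing '\n'

print(lines)   # ['first line\n', 'second line\n']
```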


In [156]:
# Open file, read file, close file


In [157]:
# What do we have in `lines`? Note '\n'


In [158]:
# get rid of the `1` after the header (position 1 in the list)


In [159]:
# Change the header to `Q, I, E(I), E(Q)`. Don't forget the new line!!



In [160]:
# Save the file with a new name (put a random string (your name?) in the file name!!!)


In [161]:
# visualize its contents to make sure everything is OK
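
The whole read, edit, save cycle can be sketched as follows. The real content of `f1.txt` is not reproduced here, so the sketch writes a stand-in file with an assumed layout (a placeholder header, a line holding `1`, then comma-separated data rows); the output name `f1_edited.txt` is also hypothetical.

```python
import os
import tempfile

work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "f1.txt")   # stand-in for ../Data/Glob/f1.txt

# Assumed layout: header, a line holding "1", then comma-separated data
with open(source_path, "w") as f:
    f.write("placeholder header\n"
            "1\n"
            "0.00232478,8.22832,0.677097,0.0020133\n"
            "0.00458718,5.8915,0.193922,0.00192328\n")

with open(source_path) as f:
    lines = f.readlines()            # list of strings, each ending in '\n'

del lines[1]                         # drop the "1" at position 1
lines[0] = "Q, I, E(I), E(Q)\n"      # new header, newline included

new_path = os.path.join(work_dir, "f1_edited.txt")
with open(new_path, "w") as f:
    f.writelines(lines)              # writelines adds no newlines of its own

with open(new_path) as f:
    saved = f.read()
print(saved)
```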


### Exercise

- Remove the header of the file and save it in the /tmp directory with some random name.

In the end it should look something like:
```
0.00232478,8.22832,0.677097,0.0020133
0.00458718,5.8915,0.193922,0.00192328
0.00684958,12.573,0.413909,0.00189569
(...)
```

<hr/>

# XML File Management

## The ElementTree XML API

The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.
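
The "cross between a list and a dictionary" description can be made concrete with a tiny, hypothetical document (the tag and attribute names below are invented for the illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical two-child document parsed from a string
root = ET.fromstring('<root version="1"><tag1>xxxx</tag1><tag2>yyyy</tag2></root>')

# List-like: children are ordered and can be counted, indexed and iterated
print(len(root))                           # 2
print(root[0].tag)                         # tag1
child_tags = [child.tag for child in root]
print(child_tags)                          # ['tag1', 'tag2']

# Dictionary-like: attributes are looked up by key
print(root.attrib)                         # {'version': '1'}
print(root.get("version"))                 # 1

# The text between the opening and closing tag is stored in .text
print(root[0].text)                        # xxxx
```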


In [162]:
# Let's import it 
import xml.etree.ElementTree as ET

In [163]:
# Let's see where we are:


In [164]:
# Does the file exist?


In [165]:
# Let's have a look at the file:



<hr/>

Let's get the tag `root`:
```
<root> 
    <tag1>xxxx</tag1>
</root>
```

In [166]:
# Let's initialize it and get the `root` node = SPICErack


In [167]:
# Make sure the root is SPICErack


In [168]:
# Show all attributes of root


In [169]:
# root.attrib is a dictionary: we can read / modify its values by key
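
A minimal sketch of these steps on a miniature file written on the fly. Only the root tag name `SPICErack` comes from this tutorial; the attribute names and values are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical miniature file; the real one has many more elements
xml_path = os.path.join(tempfile.mkdtemp(), "example.xml")
with open(xml_path, "w") as f:
    f.write('<SPICErack filename="example.xml"><Header/></SPICErack>')

tree = ET.parse(xml_path)      # read the file into memory
root = tree.getroot()          # the top-level element

print(root.tag)                # SPICErack
print(root.attrib)             # {'filename': 'example.xml'}

root.attrib["filename"] = "renamed.xml"   # modify an existing key
root.set("checked", "yes")                # add a new key; same effect as root.attrib[...] = ...
print(root.attrib)
```

These changes only affect the tree in memory; nothing is written back to disk until `tree.write(...)` is called.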


## Pseudo-Exercise

Update the file:

1. In `SPICErack/Header/Users` remove the `-` in `Lisa Debeer-Schmitt`.
2. Add an attribute `type="STRING"` to `SPICErack/Header/Users`



In [170]:
# Get the user list: xpath = Header/Users


In [171]:
# Let's get the names string to a list
# Hint: use split by ','


In [172]:
# Are the names well formatted?
# Hint: Probably there are some ' '....


In [173]:
# Let's remove the `-` from Lisa


In [174]:
# Let's update the list of names


In [175]:
# Get the list into a string with ',' separation of the names


In [176]:
# Let's update the file in memory


In [177]:
# Let's Add the attribute: "type"="STRING"


In [178]:
# Let's save our XML Tree as a new file. Remember: Use random file names


In [179]:
# visualize the content
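
One possible way through the steps above, shown on an in-memory stand-in for the file. Only the path `SPICErack/Header/Users` and the name `Lisa Debeer-Schmitt` come from the exercise; `Jane Doe`, the exact spacing and the output file name are assumptions.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical content standing in for the real file
root = ET.fromstring(
    "<SPICErack><Header><Users>Jane Doe, Lisa Debeer-Schmitt</Users></Header></SPICErack>"
)
tree = ET.ElementTree(root)

users = root.find("Header/Users")                    # path relative to the root element
names = [n.strip() for n in users.text.split(",")]   # split, then drop stray spaces

# Remove the '-' only from the entry that starts with "Lisa"
names = [n.replace("-", "") if n.startswith("Lisa") else n for n in names]

users.text = ",".join(names)       # back to one comma-separated string
users.set("type", "STRING")        # add the attribute type="STRING"

out_path = os.path.join(tempfile.mkdtemp(), "updated.xml")  # hypothetical file name
tree.write(out_path)

with open(out_path) as f:
    print(f.read())
```

`str.replace("-", "")` deletes the hyphen outright; use `replace("-", " ")` instead if the two parts of the surname should stay separated.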
