# Introduction to OS and ASCII + XML File Manipulation

- Operating system interaction: `os`
- File manipulation: `os.path`
- File Open / Close / Read

## Linux shell commands

- To call Linux shell commands from a notebook, prepend an exclamation point (`!`) to the command.
- This is not standard Python: it is a Jupyter/IPython feature.
- Each `!` command runs in its own subshell, so its effects (e.g. a `cd`) end as soon as that command finishes.

**NB**: This is only available in notebooks, not in Python scripts!

In [31]:
# Current Directory
!pwd

/SNS/users/rhf/git/IPythonNotebookTutorial/notebooks


In [32]:
# Let's change our directory
!cd ..

In [33]:
# Current Directory
!pwd

/SNS/users/rhf/git/IPythonNotebookTutorial/notebooks


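Note that each command started by `!` runs in a separate shell that exits when the command finishes. The `!cd ..` above therefore had no lasting effect, and `!pwd` still reports the original directory (this would also be true for two `!` lines in the same cell).

To change the working directory for the whole notebook, use the `os` module described below.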
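Outside a notebook, the same scoping rule can be seen with the standard `subprocess` module: a `cd` run in a child shell only affects that shell, never the Python process that started it. This sketch assumes a POSIX shell (`/bin/sh`) is available; it illustrates the rule and is not meant to show how IPython implements `!` internally.

```python
import os
import subprocess

before = os.getcwd()

# "cd .." and "pwd" run in the SAME child shell, so pwd sees the new directory
result = subprocess.run("cd .. && pwd", shell=True,
                        capture_output=True, text=True, check=True)
print("child shell ended in:", result.stdout.strip())

# The Python process (like the notebook kernel) has not moved
after = os.getcwd()
print("Python is still in:  ", after)
```

In a notebook, `!cd .. && pwd` on a single line behaves the same way: the `pwd` shows the parent directory, while a following `!pwd` does not.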

## Python `os` module

- In Python files (scripts), the `!` syntax does not work.
- Use the `os` module instead.

In [34]:
# import the module
import os

In [35]:
# Current Directory
os.getcwd()

'/SNS/users/rhf/git/IPythonNotebookTutorial/notebooks'

In [36]:
# Let's change our directory
os.chdir('..')

- Unlike `!cd`, `os.chdir` changes the working directory of the Python process (the notebook kernel), so the change persists across cells:

In [37]:
# Current Directory
os.getcwd()

'/SNS/users/rhf/git/IPythonNotebookTutorial'

In [38]:
# Let's get back to our directory
os.chdir('notebooks')
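Because `os.chdir` moves the whole kernel, forgetting to change back can break later cells that use relative paths. A minimal sketch of a helper that always restores the previous directory using `try`/`finally`; the name `working_directory` is hypothetical (Python 3.11+ ships a similar `contextlib.chdir`).

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the working directory (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so the kernel never stays "lost"
        os.chdir(previous)


start = os.getcwd()
with working_directory(".."):
    print("inside:", os.getcwd())
print("restored:", os.getcwd() == start)
```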

In [39]:
# Let's list the contents of `notebooks`
os.listdir()

['1.4 - introduction to python - part2.ipynb',
 '5 - Gaussian Fitting.ipynb',
 'Data',
 '.ipynb_checkpoints',
 '1.3 - introduction to python - part 1.ipynb',
 '6 - HDF files.ipynb',
 '4.1 - 2D Imaging - TIFF.ipynb',
 '0.0 - jupyter_notebooks_introduction.ipynb',
 'hdfview.png',
 '1.2 - markdown syntax.ipynb',
 '1.0 - jupyter dashboard.ipynb',
 'command-line',
 '7.0 - Introduction to widgets.ipynb',
 '3 - Multiple plots - Glob.ipynb',
 'PyONCat.ipynb',
 '1.1 - jupyter not a book.ipynb',
 'Mantid.ipynb',
 '4.0 - 2D Imaging - FITS.ipynb']
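`os.listdir` returns names in arbitrary order (note the unsorted output above) and mixes files and directories. A small sketch that keeps only notebook files and sorts them; `list_notebooks` is a hypothetical name and the `.ipynb` filter is just an example.

```python
import os


def list_notebooks(directory="."):
    """Return the sorted names of the .ipynb files in `directory`."""
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(".ipynb")
        # skip directories that happen to end in .ipynb
        and os.path.isfile(os.path.join(directory, name))
    )


print(list_notebooks())
```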

## File Manipulation: `os.path` module

In [40]:
# show what is available in os.path: <tab> or dir(...)
dir(os.path)

['__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_get_sep',
 '_joinrealpath',
 '_varprog',
 '_varprogb',
 'abspath',
 'altsep',
 'basename',
 'commonpath',
 'commonprefix',
 'curdir',
 'defpath',
 'devnull',
 'dirname',
 'exists',
 'expanduser',
 'expandvars',
 'extsep',
 'genericpath',
 'getatime',
 'getctime',
 'getmtime',
 'getsize',
 'isabs',
 'isdir',
 'isfile',
 'islink',
 'ismount',
 'join',
 'lexists',
 'normcase',
 'normpath',
 'os',
 'pardir',
 'pathsep',
 'realpath',
 'relpath',
 'samefile',
 'sameopenfile',
 'samestat',
 'sep',
 'split',
 'splitdrive',
 'splitext',
 'stat',
 'supports_unicode_filenames',
 'sys']

The `os.path` module provides many functions (plus a few constants such as `sep`).

**For this example, let's use only:**

```
'abspath',
'basename',
'dirname',
'exists',
'isdir',
'isfile',
'sep',
'splitext',
```


In [11]:
# Let's pick one file
file_path = "Data/Glob/f1.txt"
file_path

'Data/Glob/f1.txt'

In [13]:
# Alternatively, use `os.path.join`, which inserts the correct separator for the OS
file_path = os.path.join("Data", "Glob", "f1.txt")
file_path

'Data/Glob/f1.txt'
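One `os.path.join` behaviour worth knowing: it does not double separators, and if a later piece is an absolute path, everything before it is discarded. The sketch imports `posixpath` (the module `os.path` refers to on Linux and macOS) so the results are the same on any platform.

```python
import posixpath  # os.path on Linux/macOS; Windows uses ntpath instead

print(posixpath.join("Data", "Glob", "f1.txt"))   # Data/Glob/f1.txt
print(posixpath.join("Data/", "Glob", "f1.txt"))  # Data/Glob/f1.txt (no doubled '/')
print(posixpath.join("Data", "/tmp", "f1.txt"))   # /tmp/f1.txt ("Data" is dropped)
```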

In [14]:
# Path separator: '/' on Linux and macOS, '\' on Windows
os.path.sep

'/'

In [15]:
# Check if the file exists
os.path.exists(file_path)

True

In [12]:
# Let's build a fake path and check whether it exists
os.path.exists("/tmp/this_does_not.exist")

False

In [13]:
# Is it a directory?
os.path.isdir(file_path)

False

In [40]:
# Is it a file?
os.path.isfile(file_path)

True
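`exists`, `isdir` and `isfile` are often combined. A small sketch (the function name `describe_path` is hypothetical) that reports what a path points to; note that `exists` is `True` for both files and directories, so the more specific checks come after it.

```python
import os


def describe_path(path):
    """Return 'file', 'directory', 'other' or 'missing' for `path`."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "other"  # exists but is neither, e.g. a device or a socket


for p in ["Data/Glob/f1.txt", "Data/Glob", "/tmp/this_does_not.exist"]:
    print(p, "->", describe_path(p))
```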

In [14]:
# Get File name
os.path.basename(file_path)

'f1.txt'

In [15]:
# Get Full path
os.path.abspath(file_path)

'/SNS/users/rhf/git/IPythonNotebookTutorial/notebooks/Data/Glob/f1.txt'

In [16]:
# Get the directory where the file is
os.path.dirname(file_path)

'Data/Glob'

In [17]:
# Not an absolute path? dirname only strips the file name from the string,
# so combine it with abspath to get the absolute directory where the file is
os.path.dirname(
    os.path.abspath(file_path)
)

'/SNS/users/rhf/git/IPythonNotebookTutorial/notebooks/Data/Glob'
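The reason is that `dirname` and `basename` work purely on the string and never look at the disk: `dirname` of a relative path stays relative, and `dirname` of a bare file name is empty. `abspath` joins the path to the current working directory first, which is what makes the result absolute.

```python
import os

name = "f1.txt"                      # a bare file name, no directory part
print(repr(os.path.dirname(name)))   # '' (nothing to strip)

# abspath prepends os.getcwd(), so the directory becomes the current one
absolute_dir = os.path.dirname(os.path.abspath(name))
print(absolute_dir == os.getcwd())   # True

# Purely string based: the path does not need to exist
print(os.path.basename("/no/such/dir/f1.txt"))  # f1.txt
```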

## Exercise

```
/SNS/lustre/EXAMPLES/IPythonNotebookTutorial/Data/Glob/f1.txt
                                                       ------ 1
------------------------------------------------------------- 2
                                            -----------       3
-------------------------------------------------------       4                                                    
```

1. Get only the file name of `file_path`
2. Get the full (absolute) path of `file_path`
3. Get the (relative) directory where `file_path` is
4. Get the absolute directory where `file_path` is
5. Check whether `file_path` is a file

In [18]:
file_path = "../Data/Glob/f1.txt"

In [19]:
# 1. File name
os.path.basename(file_path)

'f1.txt'

In [20]:
# 2. Full path
os.path.abspath(file_path)

'/SNS/users/rhf/git/IPythonNotebookTutorial/Data/Glob/f1.txt'

In [21]:
# 3. Relative directory where the file is
os.path.dirname(file_path)

'../Data/Glob'

In [22]:
# 4. Absolute directory where the file is
os.path.dirname(
    os.path.abspath(file_path)
)

'/SNS/users/rhf/git/IPythonNotebookTutorial/Data/Glob'

In [23]:
# 5. Check if it is a file
os.path.isfile(file_path)

True

## How to get the **file extension**:

In [25]:
file_path = "../Data/Glob/f1.txt"

In [27]:
# first let's get the file name
file_name = os.path.basename(file_path)

In [28]:
# file_name is a string
type(file_name)

str

In [42]:
# Strings have a `split` method
file_name.split(".")

['f1', 'txt']

In [47]:
# Let's split it into variables
prefix, suffix = file_name.split(".")
suffix

'txt'

In [48]:
# Or keep the pieces in a list and take the last one
file_name_split = file_name.split(".")
file_name_split[-1]

'txt'
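Splitting on `'.'` works for `f1.txt`, but the two-variable unpacking fails for names with more than one dot (or none). `os.path.splitext`, already in the list of functions above, splits at the last dot only and keeps the dot in the extension. A short comparison (the extra file names are illustrative):

```python
import os

for name in ["f1.txt", "scan.0072.dat", "README", ".bashrc"]:
    root, ext = os.path.splitext(name)
    print(f"{name!r:17} -> root={root!r}, ext={ext!r}")

# The str.split version breaks as soon as there are two dots
try:
    prefix, suffix = "scan.0072.dat".split(".")
except ValueError as err:
    print("split('.') failed:", err)
```

`'scan.0072.dat'` gives `('scan.0072', '.dat')`, and a leading dot, as in `'.bashrc'`, is not treated as an extension.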

# ASCII File Management

In [51]:
# Let's get a file
file_path = os.path.join("Data", "Fitting", "HB1_exp0762_scan0072.dat")
file_path

'Data/Fitting/HB1_exp0762_scan0072.dat'

In [54]:
# Let's see the contents of the file with a Linux command. Try to use the variable `file_path` defined above :)
!head -15 $file_path

# scan = 72
# date = 9/19/2017
# time = 10:30:01 PM
# proposal = 19284
# experiment = Polarized neutron study on the UPt2Si2
# experiment_number = 762
# command = scan h 0 k 0.97 1.02 0.0025 l 0 e 0 preset countfile offon_mcu15
# builtin_command = scan h 0 k 0.97 1.02 0.0025 l 0 e 0 preset countfile offon_mcu15
# users = Garrett Granroth, Karel Prokes, Masaaki Matsuda, Jooseop Lee, Sachith Dissanayake
# local_contact = Masaaki Matsuda
# scan_title = (0,1,0) off/on 4.5 K
# monochromator = Heusler
# analyzer = Heusler
# sense = +-+
# collimation = 48-80-60-240


<hr/>

**Reading the file contents:**


```
file_object = open("filename", "mode")
```

Where:

`file_object` is the variable that holds the file object returned by `open`, and `filename` is the path to the file.

`mode` is one of:

* `r`: read mode, used when the file is only read (this is the default)
* `w`: write mode, used to write new information to the file. The file is created if needed, and any existing content is erased when it is opened in this mode
* `a`: append mode, used to add new data to the end of the file; existing content is kept
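A short sketch of the three modes on a throwaway file, created in a temporary directory with the standard `tempfile` module so nothing in `Data` is touched (`modes_demo.txt` is a hypothetical name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

fh = open(path, "w")       # creates the file (or erases an existing one)
fh.write("first line\n")
fh.close()

fh = open(path, "a")       # keeps the content, writes at the end
fh.write("second line\n")
fh.close()

fh = open(path, "r")       # read only; "r" is also the default mode
content = fh.read()
fh.close()
print(content)

fh = open(path, "w")       # opening with "w" again erases everything
fh.close()
print(os.path.getsize(path))  # 0
```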


In [55]:
# Open file, read file, close file
fh = open(file_path, "r")
lines = fh.readlines()
fh.close()
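In practice the file is usually opened in a `with` block, which closes it automatically, even if an error occurs while reading. The same idea also gives a pure-Python stand-in for `!head`, useful in scripts where `!` is not available. The sketch first writes a small sample file (its lines are copied from the header above) so it runs anywhere.

```python
import os
import tempfile
from itertools import islice

# Self-contained sample file with three header lines from the scan file
sample = os.path.join(tempfile.mkdtemp(), "sample.dat")
with open(sample, "w") as fh:
    fh.write("# scan = 72\n# date = 9/19/2017\n# time = 10:30:01 PM\n")

# Read everything; the file is closed when the block ends
with open(sample, "r") as fh:
    lines = fh.readlines()
print(fh.closed)  # True

# Like `head -2`: islice stops after 2 lines without reading the rest
with open(sample) as fh:
    first_two = list(islice(fh, 2))
print(first_two)
```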

In [59]:
# What do we have in `lines`? Note the trailing '\n' on each line
lines[:20]

['# scan = 72\n',
 '# date = 9/19/2017\n',
 '# time = 10:30:01 PM\n',
 '# proposal = 19284\n',
 '# experiment = Polarized neutron study on the UPt2Si2\n',
 '# experiment_number = 762\n',
 '# command = scan h 0 k 0.97 1.02 0.0025 l 0 e 0 preset countfile offon_mcu15\n',
 '# builtin_command = scan h 0 k 0.97 1.02 0.0025 l 0 e 0 preset countfile offon_mcu15\n',
 '# users = Garrett Granroth, Karel Prokes, Masaaki Matsuda, Jooseop Lee, Sachith Dissanayake\n',
 '# local_contact = Masaaki Matsuda\n',
 '# scan_title = (0,1,0) off/on 4.5 K\n',
 '# monochromator = Heusler\n',
 '# analyzer = Heusler\n',
 '# sense = +-+\n',
 '# collimation = 48-80-60-240\n',
 '# samplename = UPt2Si2\n',
 '# sampletype = crystal\n',
 '# samplemosaic = 30.000000\n',
 '# latticeconstants = 4.189588,4.189588,9.662000,90.000000,90.000000,90.000000\n',
 '# ubmatrix = 0.066991,0.228983,0.003071,-0.229093,0.066946,0.001089,0.000423,-0.007503,0.103447\n']

### Objective:

Find the collimation and split it by `-`

In [58]:
# let's find a line starting with "# collimation"
for line in lines:
    if line.startswith("# collimation"):
        collimation = line
        print("Found:", collimation)

Found: # collimation = 48-80-60-240
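The loop above keeps scanning after the match, and `collimation` is never defined if no line matches. A compact alternative uses `next()` with a default value; here `lines` is a shortened stand-in for the list read from the file.

```python
# Shortened stand-in for the `lines` read from the file
lines = [
    "# sense = +-+\n",
    "# collimation = 48-80-60-240\n",
    "# samplename = UPt2Si2\n",
]

# First matching line, or None if there is none; stops at the first match
collimation = next(
    (line for line in lines if line.startswith("# collimation")), None
)
if collimation is None:
    print("No collimation line found")
else:
    print("Found:", collimation.strip())
```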



In [60]:
# Let's get the part of the line to the right of `=`
prefix, suffix = collimation.split("=")
suffix

' 48-80-60-240\n'

In [61]:
# There is a leading space and a trailing newline character: strip() removes both
suffix = suffix.strip()
suffix

'48-80-60-240'

In [62]:
# Let's get the collimation values
collimation_values = suffix.split("-")
collimation_values

['48', '80', '60', '240']

In [64]:
# Aren't they strings? What about converting them to integers?
# Let's introduce list comprehension here!
[int(v) for v in collimation_values]

[48, 80, 60, 240]
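The same steps (drop the `#`, split at `=`, strip whitespace) can be applied to every header line at once, building a dictionary of metadata. This sketch uses `str.partition`, which splits at the first `=` only, so a value containing `=` is kept intact; `parse_header` is a hypothetical name and the sample lines are taken from the file above.

```python
def parse_header(lines):
    """Turn '# key = value' lines into a {key: value} dict (hypothetical helper)."""
    header = {}
    for line in lines:
        if not line.startswith("#"):
            continue  # skip data lines
        key, sep, value = line[1:].partition("=")
        if sep:  # ignore comment lines without '='
            header[key.strip()] = value.strip()
    return header


sample = [
    "# scan = 72\n",
    "# collimation = 48-80-60-240\n",
    "# samplename = UPt2Si2\n",
]
header = parse_header(sample)
print(header["collimation"])                                # 48-80-60-240
print([int(v) for v in header["collimation"].split("-")])   # [48, 80, 60, 240]
```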

### Exercise

- In the same file, can you get the users of this experiment one by one?

You need to split by `,` this time.

```
'# users = Garrett Granroth, Karel Prokes, Masaaki Matsuda, Jooseop Lee, Sachith Dissanayake\n',
```
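One possible solution, sketched on the single `users` line shown above (in the notebook you would first find it in `lines`, as was done for the collimation): split once at `=` to isolate the names, then at `,`, stripping the spaces and the trailing newline from each name.

```python
line = "# users = Garrett Granroth, Karel Prokes, Masaaki Matsuda, Jooseop Lee, Sachith Dissanayake\n"

# Everything to the right of the first '='
names = line.split("=", 1)[1]

# One entry per user, without surrounding spaces or '\n'
users = [name.strip() for name in names.split(",")]

for user in users:
    print(user)
```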