### Python OS

In this brief notebook we discuss how to use Python for elementary operating-system (OS) tasks. Then we take a closer look at the `pd.read_excel` function.

Python is not the best language for OS-dependent functionality, but it is very simple and ready to use. Python is easy to install from https://www.python.org/, but I personally suggest installing Anaconda from https://www.anaconda.com/products/individual: it is a data science toolkit and it also installs pandas, which we will use at the end of the notebook.

The first example shows the functions `os.getcwd()` and `os.chdir()`:

In [1]:
import os

# Print the current working directory with a label
def current_path(label):
    print("Current working directory " + label)
    print(os.getcwd())
    print()

# Printing CWD before
current_path("before")

# Move to the parent directory
os.chdir('../')

# Printing CWD after
current_path("after moving up")

# Move back into the notebook folder (relative path)
os.chdir('./python_SO')

# Printing CWD after
current_path("after moving back")

Current working directory before
C:\Users\mattia.diiorio\Desktop\python_SO

Current working directory after moving up
C:\Users\mattia.diiorio\Desktop

Current working directory after moving back
C:\Users\mattia.diiorio\Desktop\python_SO



This is very useful when working with relative paths!
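Because relative paths are resolved against the current working directory, a script that calls `os.chdir()` and forgets to go back can silently break every relative path that follows. A minimal sketch, using a hypothetical helper named `working_directory`, that changes directory only temporarily and always restores the previous one (Python 3.11+ ships a similar `contextlib.chdir`):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily switch the CWD to `path` (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original CWD even if the body raised an exception
        os.chdir(previous)

start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        # Inside the block a relative path is resolved against tmp
        print(os.path.abspath('example.txt'))
    print(os.getcwd() == start)  # True: the CWD has been restored
```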

Let's look at `os.makedirs()` and `os.rmdir()`, which create and delete directories:

In [2]:
# Names of the directory and of its subdirectory
directory = "An_example"
subdirectory = 'in an example'

# Create the directory
os.makedirs(directory)
print("Directory '%s' created" % directory)

# Build the subdirectory path: os.path.join inserts the separator
# of the current OS (hint: print path to see it)
path = os.path.join(directory, subdirectory)

# Create the subdirectory
os.makedirs(path)
print("Directory '%s' created" % subdirectory)

# Delete the subdirectory
os.rmdir(path)
print("Directory '%s' deleted" % path)

Directory 'An_example' created
Directory 'in an example' created
Directory 'An_example\in an example' deleted


**Exercise 1.** Try to delete the folder 'An_example'

In [3]:
# Solution

os.rmdir(directory)

# What happens if the directory is not empty?
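To explore the question in the comment: `os.rmdir()` only removes *empty* directories and raises an `OSError` otherwise. A sketch, run inside a throw-away temporary directory so nothing in the notebook folder is touched, that triggers the error and then removes the whole tree with `shutil.rmtree()`:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
folder = os.path.join(base, 'An_example')
os.makedirs(folder)

# Put a file inside so the folder is no longer empty
with open(os.path.join(folder, 'note.txt'), 'w') as f:
    f.write('not empty')

try:
    os.rmdir(folder)
except OSError as err:  # raised because the directory is not empty
    print('os.rmdir failed:', type(err).__name__)

# shutil.rmtree deletes the folder and everything inside it: use it with care
shutil.rmtree(folder)
print(os.path.exists(folder))  # False
shutil.rmtree(base)  # clean up the temporary base directory
```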

We have talked about directories; now let's look at files:

In [4]:
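# Opening a file in write mode ("w") creates it, or empties it if it already exists
with open("example.txt", "w") as f:
    f.write("Hello")
# Leaving the with block closes the file, so the text is flushed to disk

with open("example.txt", "r") as file:
    text = file.read()
print(text)

Hello
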
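There are different modes for opening a file:
* w -> write (creates the file, or truncates it if it already exists)
* r -> read
* a -> append (writes at the end of the existing content)

Like `w`, the append mode `a` also creates the file if it doesn't exist.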
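The three modes can be compared on a scratch file. A minimal sketch that writes into a fresh temporary directory (so the notebook folder is not touched) and uses `with`, which closes the file automatically:

```python
import os
import tempfile

demo_file = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(demo_file, 'w') as f:   # 'w' creates the file (or empties an existing one)
    f.write('Hello')
with open(demo_file, 'a') as f:   # 'a' keeps the content and writes at the end
    f.write(' world')
with open(demo_file, 'r') as f:   # 'r' reads; it raises an error if the file is missing
    print(f.read())               # Hello world

with open(demo_file, 'w') as f:   # opening with 'w' again truncates the file
    f.write('Bye')
with open(demo_file) as f:        # 'r' is the default mode
    content = f.read()
print(content)                    # Bye
```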

Let's see an example of handling an exception when reading a file:

In [5]:
filename = 'non_esisto.txt'
try:
    # If the file does not exist, open() raises FileNotFoundError,
    # a subclass of OSError (IOError is an alias of OSError in Python 3)
    f = open(filename, 'r')
    text = f.read()
    f.close()

# An IOError raised by any of the lines above is handled here
except IOError:
    print('Problem reading: ' + filename)

Problem reading: non_esisto.txt
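Catching the broad `IOError`/`OSError` also hides unrelated problems, such as missing permissions. When only a missing file is expected, a narrower sketch catches `FileNotFoundError`; `read_text_or_none` is a hypothetical helper name:

```python
def read_text_or_none(filename):
    """Return the content of `filename`, or None if it does not exist (hypothetical helper)."""
    try:
        with open(filename, 'r') as f:
            return f.read()
    except FileNotFoundError as err:
        # err.filename is the path that could not be opened
        print('Problem reading: ' + err.filename)
        return None

result = read_text_or_none('non_esisto.txt')  # assumes this file does not exist
```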


We can also check the size of a file:

In [6]:
os.stat('re.png').st_size  # size in bytes

62667
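`st_size` is measured in bytes (`os.path.getsize()` returns the same value). A sketch of a hypothetical `human_size` helper that converts it to larger units, assuming the binary convention 1 KB = 1024 bytes (strictly speaking KiB):

```python
def human_size(num_bytes):
    """Format a size in bytes with binary units (1 KB = 1024 bytes here)."""
    for unit in ['B', 'KB', 'MB', 'GB']:
        if num_bytes < 1024:
            return f'{num_bytes:.1f} {unit}'
        num_bytes /= 1024
    return f'{num_bytes:.1f} TB'

print(human_size(62667))  # 61.2 KB  (the size of re.png above)
```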

and, similarly, its creation date:

In [7]:
import datetime as dt
dt.datetime.fromtimestamp(os.path.getctime('re.png'))

datetime.datetime(2021, 7, 22, 16, 21, 54, 351205)

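Here we use the `datetime` library to convert the timestamp (seconds since the epoch) into a readable date. Note that `os.path.getctime()` returns the creation time on Windows, while on Unix systems it returns the time of the last metadata change.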
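Besides `getctime()`, `os.path.getmtime()` returns the time of the last modification, and `strftime()` turns the resulting `datetime` into text. A sketch on a freshly created temporary file (not a file from the notebook):

```python
import datetime as dt
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'data')
    name = tmp.name

modified = dt.datetime.fromtimestamp(os.path.getmtime(name))
print(modified.strftime('%Y-%m-%d %H:%M:%S'))  # format: YYYY-MM-DD HH:MM:SS
age = dt.datetime.now() - modified               # a timedelta
print(age.total_seconds() < 60)                  # True: the file was just written
os.remove(name)
```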

Let's see how to handle paths as strings. I suggest using a raw string:

In [8]:
path = r'data/'  # the r prefix makes a raw string: backslashes are not treated as escape characters
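The effect of the `r` prefix is easiest to see with a Windows-style path, where `\n` inside a normal string becomes a newline character (the folder name is only illustrative):

```python
plain = 'C:\new_folder'   # '\n' is turned into a single newline character
raw = r'C:\new_folder'    # the backslash and the 'n' are kept as-is

print(len(plain), len(raw))         # 12 13
print('\n' in plain, '\n' in raw)   # True False
```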

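In general, a portable way to build a path is:

`path_file = os.sep.join([path_dir, filename])`

where `os.sep` is the path separator of the current operating system (`\` on Windows, `/` on Linux and macOS), so the same code works on any OS. `os.path.join(path_dir, filename)` gives the same result and also avoids doubled separators when `path_dir` already ends with one.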
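A sketch comparing `os.sep.join` with `os.path.join` and with the standard-library `pathlib` module (an alternative the notebook does not otherwise use):

```python
import os
from pathlib import Path

path_dir, filename = 'data', 'file_0.csv'

p1 = os.sep.join([path_dir, filename])   # data/file_0.csv on Linux/macOS, data\file_0.csv on Windows
p2 = os.path.join(path_dir, filename)    # same string
p3 = str(Path(path_dir) / filename)      # same string, using pathlib's / operator
print(p1, p2, p3)

# When the folder name already ends with a separator, os.sep.join doubles it
doubled = os.sep.join([path_dir + os.sep, filename])   # two separators in a row
joined = os.path.join(path_dir + os.sep, filename)     # a single separator
print(doubled, joined)
```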

We can also list all the items in a folder:

In [9]:
os.listdir(path)

['file_0.csv',
 'file_1.csv',
 'file_10.csv',
 'file_11.csv',
 'file_12.csv',
 'file_13.csv',
 'file_14.csv',
 'file_15.csv',
 'file_16.csv',
 'file_17.csv',
 'file_18.csv',
 'file_19.csv',
 'file_2.csv',
 'file_20.csv',
 'file_3.csv',
 'file_4.csv',
 'file_5.csv',
 'file_6.csv',
 'file_7.csv',
 'file_8.csv',
 'file_9.csv',
 'sample']
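`os.listdir()` returns both files and sub-folders (note `'sample'` above). A sketch that separates them with `os.scandir()`, on a throw-away folder whose names are only illustrative:

```python
import os
import shutil
import tempfile

# Throw-away folder with two files and one sub-folder
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'sample'))
for name in ['file_1.csv', 'file_0.csv']:
    open(os.path.join(base, name), 'w').close()

# os.scandir yields DirEntry objects that know whether they are files or folders
with os.scandir(base) as it:
    entries = list(it)
files = sorted(e.name for e in entries if e.is_file())
folders = sorted(e.name for e in entries if e.is_dir())

print(files)    # ['file_0.csv', 'file_1.csv']
print(folders)  # ['sample']

shutil.rmtree(base)
```

Plain `sorted` is lexicographic, which is also why `file_10.csv` comes before `file_2.csv` in the listing above.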

Now let's introduce a very handy library for listing the files in a folder.

With the `glob` module you can describe pathnames using Unix shell-style wildcards, which are simpler than full regular expressions, for example:

In [10]:
from glob import glob

glob(path+'*.csv')

['data\\file_0.csv',
 'data\\file_1.csv',
 'data\\file_10.csv',
 'data\\file_11.csv',
 'data\\file_12.csv',
 'data\\file_13.csv',
 'data\\file_14.csv',
 'data\\file_15.csv',
 'data\\file_16.csv',
 'data\\file_17.csv',
 'data\\file_18.csv',
 'data\\file_19.csv',
 'data\\file_2.csv',
 'data\\file_20.csv',
 'data\\file_3.csv',
 'data\\file_4.csv',
 'data\\file_5.csv',
 'data\\file_6.csv',
 'data\\file_7.csv',
 'data\\file_8.csv',
 'data\\file_9.csv']

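Here are some examples of regular expressions for reference. Keep in mind that `glob` patterns are simpler: for instance, `*` in `glob` corresponds to `.*` in a regular expression.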

<img src="re.png">

In this case `*.csv` matches all files whose names end with `.csv`. The function returns a list of paths, which makes it very powerful.
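`glob` understands only a few shell-style wildcards: `*` (any characters), `?` (exactly one character), `[...]` (one character from a set) and, with `recursive=True`, `**` (any number of sub-folders). A sketch on a temporary folder that mimics `data/` and `data/sample/`:

```python
import os
import shutil
import tempfile
from glob import glob

# Build a small throw-away folder that mimics data/ and data/sample/
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'sample'))
for name in ['file_0.csv', 'file_1.csv', 'file_10.csv', os.path.join('sample', 'file_2.csv')]:
    open(os.path.join(base, name), 'w').close()

def names(pattern, **kwargs):
    """Sorted base names of the paths matching `pattern` inside base."""
    return sorted(os.path.basename(p) for p in glob(os.path.join(base, pattern), **kwargs))

star = names('*.csv')                           # any characters
single = names('file_?.csv')                    # exactly one character
char_set = names('file_1[0-9].csv')             # one character from the set 0-9
recursive = names('**/*.csv', recursive=True)   # also looks inside sub-folders

print(star)       # ['file_0.csv', 'file_1.csv', 'file_10.csv']
print(single)     # ['file_0.csv', 'file_1.csv']
print(char_set)   # ['file_10.csv']
print(recursive)  # ['file_0.csv', 'file_1.csv', 'file_10.csv', 'file_2.csv']

shutil.rmtree(base)  # remove the throw-away folder
```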

**Example:** moving files into a subfolder with `os.rename`:

In [11]:
new_path = r'data/sample/'

# Move every .csv file from data/ to data/sample/, keeping its name
for i in glob(path + '*.csv'):
    os.rename(i, new_path + os.path.basename(i))
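Both `os.rename` and `os.replace` move a file, but they differ when the destination already exists: `os.rename` raises `FileExistsError` on Windows, while `os.replace` silently overwrites the destination on every OS. A sketch on temporary files:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'new.txt')
dst = os.path.join(base, 'old.txt')
for file_path, text in [(src, 'new'), (dst, 'old')]:
    with open(file_path, 'w') as f:
        f.write(text)

# dst already exists: os.replace overwrites it on every OS,
# whereas os.rename would raise FileExistsError on Windows
os.replace(src, dst)

with open(dst) as f:
    content = f.read()
src_exists = os.path.exists(src)
print(content)     # new
print(src_exists)  # False: the file has been moved, not copied

shutil.rmtree(base)
```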

Then we move them back:

In [12]:
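# Without recursive=True, '**' behaves like '*', so this matches data/<one folder>/*.csv
for i in glob('data/**/*.csv'):
    # os.replace overwrites the destination if it already exists
    os.replace(i, path + os.path.basename(i))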

**Example:** reading all the files with pandas:

In [13]:
import pandas as pd
dfs = [pd.read_csv(i, index_col=0, parse_dates=True) for i in glob('data/*.csv')]
# Can you guess what dfs is? And what is this kind of expression called?

# To join them into a single DataFrame use:
df = pd.concat(dfs)
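The expression that builds `dfs` is a *list comprehension*: it builds a list by evaluating an expression for every item of an iterable, so `dfs` is a list of DataFrames. A pandas-free sketch of the equivalence with an explicit loop, where `os.path.basename` stands in for `pd.read_csv` only to keep the example self-contained:

```python
import os

paths = ['data/file_0.csv', 'data/file_1.csv']  # illustrative paths, e.g. returned by glob

# Explicit loop: start from an empty list and append one result per path
results_loop = []
for p in paths:
    results_loop.append(os.path.basename(p))

# The same computation written as a list comprehension
results_comp = [os.path.basename(p) for p in paths]

print(results_comp)                  # ['file_0.csv', 'file_1.csv']
print(results_loop == results_comp)  # True
```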

Now we will explore some `pandas` functions that help us manage Excel files.

In [14]:
# This is our DataFrame
df.head()

| Unnamed: 0 | feature1 | feature2 | feature3 |
|---|---|---|---|
| 1991-12-25 00:00:00 | 0.397651 | 0.156372 | 0.773794 |
| 1991-12-25 00:01:00 | 0.988181 | 0.26959 | 0.799375 |
| 1991-12-25 00:02:00 | 0.717102 | 0.374019 | 0.494222 |
| 1991-12-25 00:03:00 | 0.10627 | 0.939622 | 0.426496 |
| 1991-12-25 00:04:00 | 0.259952 | 0.691947 | 0.819431 |


In [15]:
# We can save a DataFrame to an .xlsx file (this requires an Excel engine such as openpyxl)
df.to_excel('df.xlsx',sheet_name='Sheet_test',startrow=0, startcol=0)

In [16]:
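# and similarly we can read it back
# thousands=None is the default; set e.g. thousands='.' when numbers stored as text in Excel use '.' as thousands separator
# Note: older pandas versions also accepted convert_float=True (turning integral floats such as 1.0 into int 1);
# it was deprecated in pandas 1.3 and removed in pandas 2.0
my_df = pd.read_excel('df.xlsx', index_col=0, thousands=None)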

In [18]:
os.remove('df.xlsx')

In [19]:
my_df.head()

| Unnamed: 0 | feature1 | feature2 | feature3 |
|---|---|---|---|
| 1991-12-25 00:00:00 | 0.397651 | 0.156372 | 0.773794 |
| 1991-12-25 00:01:00 | 0.988181 | 0.26959 | 0.799375 |
| 1991-12-25 00:02:00 | 0.717102 | 0.374019 | 0.494222 |
| 1991-12-25 00:03:00 | 0.10627 | 0.939622 | 0.426496 |
| 1991-12-25 00:04:00 | 0.259952 | 0.691947 | 0.819431 |
