## Reading files with Open
We can open the file example.txt with the built-in open function, as follows. <br/>
The first argument is the file path. The second argument is the mode. Common values include "r" for reading, "w" for writing, and "a" for appending.

In [9]:
File = open("./example.txt", "r")

# Once the file is open we can use its attributes

print(File.name, File.mode)

# We should always close the file object using the method close.
File.close()

./example.txt r


Closing every file by hand is tedious and easy to forget. A better practice is to open the file with the "with" statement, which closes it automatically: Python runs everything in the indented block and then closes the file, even if an error occurs inside the block.

In [10]:
with open("./example.txt","r") as File1:
    file_stuff = File1.read()
    print(file_stuff)
    # All other operations on the file go here

print("\nIs closed: ", File1.closed, "\n") # We can check whether the file is closed
print(file_stuff)   # The content we read is still available outside the with block

Hey I'm an example
I'm other line

Is closed:  True 

Hey I'm an example
I'm other line
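
The automatic closing described above also applies when something goes wrong inside the block. The following is a minimal sketch, assuming a throwaway file `demo.txt` in a temporary directory (a hypothetical stand-in for example.txt) and a simulated error:

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
# The name demo.txt is hypothetical; it stands in for example.txt.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "w") as f:
    f.write("Hey I'm an example\nI'm other line")

try:
    with open(path, "r") as demo_file:
        first_line = demo_file.readline()
        # Simulate a failure in the middle of processing the file
        raise ValueError("simulated failure while processing")
except ValueError as err:
    print("Caught:", err)

# The with statement closed the file when the block exited, despite the error
print("Is closed:", demo_file.closed)
```

Without "with", the same guarantee would need an explicit try/finally that calls close.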


In [11]:
# We can also output every line as an element in a list using the method readlines
with open("./example.txt", "r") as File1:
    file_stuff = File1.readlines()
    print(file_stuff)
    print("===========")
    
# If we just want to read the first line of the file we can use the readline method
with open("./example.txt", "r") as File1:
    file_stuff = File1.readline()
    print(file_stuff)
    print("===========")

# We can call readline repeatedly to get one line at a time, or use a for loop to iterate over every line
with open("./example.txt", "r") as File1:
    file_stuff = File1.readline()
    print(file_stuff)
    file_stuff = File1.readline()
    print(file_stuff)

["Hey I'm an example\n", "I'm other line"]
===========
Hey I'm an example

===========
Hey I'm an example

I'm other line
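
The for loop mentioned in the last comment is not shown in the cell, so here is a minimal sketch of it. It assumes a throwaway file `demo.txt` with the same two lines as example.txt; the file name and the `lines` list are illustrative choices.

```python
import os
import tempfile

# Recreate the two-line sample in a temporary file (hypothetical name)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "w") as f:
    f.write("Hey I'm an example\nI'm other line")

lines = []
with open(path, "r") as demo_file:
    # A file object is iterable: each iteration yields the next line
    for number, line in enumerate(demo_file, start=1):
        clean = line.rstrip("\n")  # drop the trailing newline, if any
        lines.append(clean)
        print(number, clean)
```

Iterating this way reads one line at a time, so it also works for files that are too large to load with read or readlines.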


In [12]:
# We can also pass the maximum number of characters to read as an argument
# With readline, this reads at most that many characters of the current line

with open("./example.txt", "r") as File1:
    file_stuff = File1.readline(2)
    print(file_stuff)
    file_stuff = File1.readline(5)
    print(file_stuff)

He
y I'm
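
The read method accepts a size argument as well. The sketch below assumes a throwaway file `demo.txt` (a hypothetical stand-in for example.txt) and shows that each call continues from where the previous one stopped:

```python
import os
import tempfile

# Recreate the two-line sample in a temporary file (hypothetical name)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "w") as f:
    f.write("Hey I'm an example\nI'm other line")

with open(path, "r") as demo_file:
    first = demo_file.read(3)   # the first 3 characters
    second = demo_file.read(4)  # the next 4 characters
    rest = demo_file.read()     # no argument: everything that is left

print(repr(first))
print(repr(second))
print(repr(rest))
```

Unlike readline, read does not stop at the end of a line, so `rest` spans the newline between the two lines.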


## Creating and writing files

In [13]:
# We can create a file example2.txt by opening it in write mode
# File2 = open("./example2.txt","w")
# As before, we are going to use the "with" statement

with open("./example2.txt","w") as File2:
    File2.write("This is a line A") # With the write method we can write on the new file

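# We can set the mode to append using "a"
with open("./example2.txt", "a") as File2:
    File2.write("This is a line B")
# Append mode does not overwrite the file: it adds to the end of the existing content
# (and creates the file if it does not exist yet). write does not add a newline,
# so the file now contains "This is a line AThis is a line B"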
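
To see the difference between the write and append modes, it helps to read the file back after each step. This is a minimal sketch, assuming a throwaway file `demo2.txt` in a temporary directory (a hypothetical name) and explicit newlines so each write lands on its own line:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo2.txt")  # hypothetical file name

# "w" creates the file, or empties it if it already exists
with open(path, "w") as out_file:
    out_file.write("This is a line A\n")

# "a" keeps the existing content and adds to the end
with open(path, "a") as out_file:
    out_file.write("This is a line B\n")

with open(path, "r") as in_file:
    content = in_file.read()
print(content)

# Opening in "w" mode again discards everything written so far
with open(path, "w") as out_file:
    out_file.write("Fresh start\n")

with open(path, "r") as in_file:
    after_rewrite = in_file.read()
print(after_rewrite)
```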

## Importing Pandas
pandas is a Python library for data manipulation and analysis.
### Pandas csv reading example
<br/>
<em>
A CSV (comma-separated values) file is a common format for storing tabular data<br/>
We are going to see how to read one in a few simple steps <br/>
</em> <br/>
<code>import pandas as pd</code>

pandas can also read a CSV directly from a URL, but for this example we are going to use a local file <br/>
<code>csv_path = "file1.csv"
df = pd.read_csv(csv_path)</code>
    
One way pandas lets you work with data is through dataframes <br/>
You can see the first rows of your dataframe (five by default) using the head method <br/>
<code>csv_path="file1.csv"
df = pd.read_csv(csv_path)
df.head()
</code>

We can read an Excel file in the same way with read_excel<br/>
<code>xlsx_path = "file1.xlsx"
df = pd.read_excel(xlsx_path)
df.head()</code>
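
The read_csv call can also be tried without any file on disk. This is a minimal sketch, assuming a small in-memory CSV wrapped in `io.StringIO`; the `csv_text` rows are illustrative sample data, not the contents of file1.csv.

```python
import io

import pandas as pd

# Illustrative sample data; file1.csv itself is not reproduced here
csv_text = """album,released,length
thriller,1982,00:42:19
back in black,1980,00:42:11
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())    # the first rows of the dataframe
print(df.shape)     # (number of rows, number of columns)
```

The first line of the CSV is used as the column headers, and numeric columns such as released are parsed as numbers.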

pandas offers many more functions, but covering them is beyond the scope of this guide.


In [14]:
# We can create a dataframe out of a dictionary.
import pandas as pd

songs = {
    "album": ["thriller", "back in black", "the dark side of the moon"],
    "released": [1982, 1980, 1973],
    "length": ["00:42:19", "00:42:11", "00:42:49"],
}
songs_frame = pd.DataFrame(songs)
# Once we have created the new dataframe we can use all the methods pandas gives us
songs_frame.head()

                       album  released    length
0                   thriller      1982  00:42:19
1              back in black      1980  00:42:11
2  the dark side of the moon      1973  00:42:49


A dataframe is made up of rows and columns. When we build one from a dictionary, the keys become the column labels (the table headers) and each list of values fills that column, one entry per row. We can create a new dataframe consisting of a single column by indexing the dataframe with the column name inside double brackets.

In [15]:
x = songs_frame[["released"]]
print(x.head())

# We can do this with multiple column headers

y = songs_frame[["released", "length"]]
print(y.head())

   released
0      1982
1      1980
2      1973
   released    length
0      1982  00:42:19
1      1980  00:42:11
2      1973  00:42:49
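
The double brackets matter. As a minimal sketch, rebuilding the same `songs_frame` so the block runs on its own, compare single and double brackets:

```python
import pandas as pd

songs_frame = pd.DataFrame({
    "album": ["thriller", "back in black", "the dark side of the moon"],
    "released": [1982, 1980, 1973],
    "length": ["00:42:19", "00:42:11", "00:42:49"],
})

as_series = songs_frame["released"]    # single brackets: a one-dimensional Series
as_frame = songs_frame[["released"]]   # double brackets: a one-column DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

Single brackets are the usual choice for working with the values of one column, while double brackets keep the tabular structure.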


In [16]:
# Older pandas code used the "ix" indexer to access individual elements
# ix was deprecated and then removed in pandas 1.0, so use loc or iloc instead
# loc is for label-based indexing and iloc is for purely positional indexing

# We select the item at row position 0, column position 0
print(songs_frame.iloc[0,0])

# With loc we can use the name of the column instead of its position
# We select the 3rd element (row label 2) of the released column
print(songs_frame.loc[2,"released"])

thriller
1973


In [17]:
# We can also slice dataframes and assign the values to a new dataframe
# With iloc the end of the slice is excluded, so 0:1 selects only row 0
z = songs_frame.iloc[0:1, 0:3]
print(z.head())

# We can also slice by label with loc, using the column names
# With loc both ends of the slice are included
y = songs_frame.loc[0:0, "album": "length"]
print(y.head())

      album  released    length
0  thriller      1982  00:42:19
      album  released    length
0  thriller      1982  00:42:19
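
The difference between positional and label-based slicing is easy to miss when the row labels are the default 0, 1, 2. The following sketch rebuilds `songs_frame` so it runs on its own, and uses `set_index` to give the rows non-numeric labels; the variable names are illustrative.

```python
import pandas as pd

songs_frame = pd.DataFrame({
    "album": ["thriller", "back in black", "the dark side of the moon"],
    "released": [1982, 1980, 1973],
    "length": ["00:42:19", "00:42:11", "00:42:49"],
})

# The same slice bounds give different results
positional = songs_frame.iloc[0:1]   # position 1 is excluded: one row
label_based = songs_frame.loc[0:1]   # label 1 is included: two rows
print(len(positional), len(label_based))

# With album names as row labels, loc looks rows up by name
by_album = songs_frame.set_index("album")
released = by_album.loc["thriller", "released"]
print(released)
```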


In [18]:
# pandas has the unique method to find the unique elements in a column of a dataframe
songs_frame['released'].unique() 
# The result is an array of all the unique elements in the released column

array([1982, 1980, 1973])

In [19]:
# We can apply comparison operators to a whole column; the result is a Series of boolean values
print(songs_frame['released'] >= 1980)

# we can use this output to create a new dataframe
newDf = songs_frame[songs_frame['released'] >= 1980 ]
print(newDf.head())

# We can save the new dataframe to a file using the to_csv method, for example newDf.to_csv("name.csv")

0     True
1     True
2    False
Name: released, dtype: bool
           album  released    length
0       thriller      1982  00:42:19
1  back in black      1980  00:42:11
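
The to_csv step is only mentioned in a comment, so here is a minimal sketch of a full round trip. It assumes a temporary output path (the file name `songs_1980_onwards.csv` is hypothetical) and passes `index=False` so the row labels are not written as an extra column.

```python
import os
import tempfile

import pandas as pd

songs_frame = pd.DataFrame({
    "album": ["thriller", "back in black", "the dark side of the moon"],
    "released": [1982, 1980, 1973],
    "length": ["00:42:19", "00:42:11", "00:42:49"],
})

# Keep only the rows where the boolean mask is True
new_df = songs_frame[songs_frame["released"] >= 1980]

# Hypothetical output location in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "songs_1980_onwards.csv")
new_df.to_csv(out_path, index=False)

# Read the file back to check what was saved
reloaded = pd.read_csv(out_path)
print(reloaded)
```

Leaving out `index=False` would store the original row labels as an unnamed first column, which then shows up as an extra column when the file is read back.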
