<center>
  <a href="1.9-Working-with-data-sources.ipynb">Previous Page</a> | <a href="./">Content Page</a> | <a href="11.1-working-with-databases.ipynb">Next Page</a></center>

# 1.10 Working with files

Let's explore how to work with files and use loops to iterate through lists. We will be working with a data set from https://data.gov.sg, in particular the `Number of Parcels cleared at the Parcel Post Centre` data set. This data set presents the yearly breakdown of the number of parcels cleared at the Parcel Post Centre from 2004 onwards.

This data set has already been downloaded as a CSV file named `parcels.csv` in the same directory as this notebook.


To open a file in Python, we use the `open()` function. This function accepts two arguments, in the following order:

* the name of the file (as a string)
* the mode of working with the file (also a string)

For now we use `"r"` as the mode, which opens the file for reading. `open()` returns a file object whose methods we can call.

Let's open the CSV file and assign the file object to a variable:

In [23]:
f = open("parcels.csv", "r")

File objects have a `read()` method that returns the text in the file as a string. Let's use the `read()` method to read the contents into another variable called `data`.

In [24]:
data = f.read()

# let's see what the data object is
data

'year,parcels_count\n2004,1175900\n2005,1148900\n2006,1190384\n2007,1346800\n2008,1432700\n2009,1527400\n2010,1564000\n2011,2176900\n2012,2602700\n2013,2987800\n2014,3646600\n2015,4829700\n2016,5162576\n'
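A file opened with `open()` stays open until its `close()` method is called. One common alternative, sketched below, is the `with` statement, which closes the file automatically when the block ends. So that the sketch runs on its own, it first writes a two-row sample file; the name `sample_parcels.csv`, the temporary directory and the variable names are illustrative assumptions, not part of this notebook's data.

```python
import os
import tempfile

# Hypothetical sample file, created only so this sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_parcels.csv")
with open(sample_path, "w") as out:
    out.write("year,parcels_count\n2004,1175900\n2005,1148900\n")

# The file is closed automatically when the with block ends.
with open(sample_path, "r") as sample_file:
    sample_data = sample_file.read()

print(sample_file.closed)  # True
print(sample_data)
```

The same pattern would work with `parcels.csv` in place of `sample_path`.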

Since `data` is a string, we can use the `print()` function to display the contents of the file:

In [25]:
print(data)

year,parcels_count
2004,1175900
2005,1148900
2006,1190384
2007,1346800
2008,1432700
2009,1527400
2010,1564000
2011,2176900
2012,2602700
2013,2987800
2014,3646600
2015,4829700
2016,5162576



In [26]:
type(data)

str

From the above outputs, we know that `data` is a string and that each line of the file ends with a newline character `\n`. Let's convert `data` into a list so that we have a list of rows to manipulate.

Because `data` is a string, we can use its `.split()` method to break it into the elements of a list. `.split()` takes a delimiter as its argument. Let's split on the newline character `\n` and display the resulting rows.

In [27]:
listrows = data.split("\n")
type(listrows)
listrows

['year,parcels_count',
 '2004,1175900',
 '2005,1148900',
 '2006,1190384',
 '2007,1346800',
 '2008,1432700',
 '2009,1527400',
 '2010,1564000',
 '2011,2176900',
 '2012,2602700',
 '2013,2987800',
 '2014,3646600',
 '2015,4829700',
 '2016,5162576',
 '']

In [28]:
len(listrows)

15
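The length is 15 rather than 14 (one header plus 13 years) because the text ends with `\n`, so `.split("\n")` leaves an empty string as the last element. As a small sketch of one way around this, the string method `.splitlines()` does not produce that trailing empty string. The short `sample` string below stands in for `data` and is an assumption made only for illustration.

```python
# A shortened stand-in for the contents of parcels.csv.
sample = "year,parcels_count\n2004,1175900\n2005,1148900\n"

rows_split = sample.split("\n")   # keeps a trailing empty string
rows_lines = sample.splitlines()  # no trailing empty string

print(rows_split)  # ['year,parcels_count', '2004,1175900', '2005,1148900', '']
print(rows_lines)  # ['year,parcels_count', '2004,1175900', '2005,1148900']
```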

We can see that each element is itself a string, with its two fields separated by a comma `,` character. For now, let's practice writing a `for` loop to iterate through the rows.

In [29]:
# print each row of the list in turn
for lrow in listrows:
    print(lrow)

This is not much different from printing the data straight after reading the file, but we can do more with a list. What else can we do with the row data?
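One possibility, sketched here under the assumption that every data row has exactly two comma-separated fields, is to split each row and convert both fields to integers. The list `sample_rows` is a shortened stand-in for `listrows`, and the names `records`, `year_text` and `count_text` are illustrative.

```python
# A shortened stand-in for listrows, including the trailing empty string.
sample_rows = ["year,parcels_count", "2004,1175900", "2005,1148900", ""]

records = []
for row in sample_rows[1:]:  # skip the header row
    if not row:              # skip empty strings such as the trailing one
        continue
    year_text, count_text = row.split(",")
    records.append((int(year_text), int(count_text)))

print(records)  # [(2004, 1175900), (2005, 1148900)]
```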

Let's say we want to count the total number of parcels in the data set. How would you do this?

In [66]:
total_parcels = 0
for row in listrows[1:-1]:
    year, parcels_string = row.split(',')
    total_parcels = total_parcels + int(parcels_string)
    
print("The total number of parcels is " + str(total_parcels))



The total number of parcels is 30792360
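To see why the loop works on the slice `[1:-1]` rather than the whole list, the sketch below runs the same steps over every row and catches the errors instead of stopping. The header row fails inside `int()`, and the empty string fails when unpacking the result of `.split(',')`; both raise `ValueError`. The list `sample_rows` and the names `sample_total` and `skipped` are assumptions for illustration.

```python
# A shortened stand-in for listrows, including header and trailing empty string.
sample_rows = ["year,parcels_count", "2004,1175900", "2005,1148900", ""]

sample_total = 0
skipped = []
for row in sample_rows:
    try:
        year_text, count_text = row.split(",")
        sample_total = sample_total + int(count_text)
    except ValueError as error:
        # Reached for the header row and for the empty string.
        skipped.append(row)
        print("Skipping " + repr(row) + ": " + str(error))

print(sample_total)  # 2324800
```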


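The slice `listrows[1:-1]` skips the header row and the trailing empty string; without it, `int()` would raise a `ValueError` on the header. The same total can also be computed by looping over index positions with `range()`: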

In [31]:
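# reset the running total so re-running this cell does not double count
total_parcels = 0
# start at index 1 to skip the header; stop before the trailing empty string
for i in range(1, len(listrows)-1):
    print(listrows[i])
    year, parcels_string = listrows[i].split(',')
    total_parcels = total_parcels + int(parcels_string)
print("The total number of parcels is " + str(total_parcels))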

2004,1175900
2005,1148900
2006,1190384
2007,1346800
2008,1432700
2009,1527400
2010,1564000
2011,2176900
2012,2602700
2013,2987800
2014,3646600
2015,4829700
2016,5162576
The total number of parcels is 30792360
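For comparison, Python's standard `csv` module can handle the header and the splitting for us. The following is a minimal sketch, not a replacement for the loops above: it reads from a short sample string through `io.StringIO` so that it runs without `parcels.csv`, and the names `sample_text`, `reader` and `sample_total` are illustrative.

```python
import csv
import io

# A shortened stand-in for the contents of parcels.csv.
sample_text = "year,parcels_count\n2004,1175900\n2005,1148900\n"

# io.StringIO lets csv read from a string as if it were an open file.
# DictReader uses the first row as the column names.
reader = csv.DictReader(io.StringIO(sample_text))
sample_total = sum(int(record["parcels_count"]) for record in reader)

print(sample_total)  # 2324800
```

With the real file, an open file object could be passed to `csv.DictReader` instead of the `io.StringIO` wrapper.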



In [35]:
total_year = 0
for i in range(1, len(listrows)-1):
    print(listrows[i])
    year, parcels_string = listrows[i].split(',')
    total_year = int(year) + total_year
print("The total year is " + str(total_year))

2004,1175900
2005,1148900
2006,1190384
2007,1346800
2008,1432700
2009,1527400
2010,1564000
2011,2176900
2012,2602700
2013,2987800
2014,3646600
2015,4829700
2016,5162576
The total year is 26130
