# Foreign Exchange Rate Dataset File Reading
Datafile: "14_Foreign_Exchange_Rates_PureNumeric.csv"

Originally developed by Jingwei Liu (2020-08-29)


### Let's first define some helper functions to display the data we read and report basic information about the dataset.
In this example, the dataset is read as<font color = "red"> **a list of lists** </font>, so we define a small function that prints the values in a list of lists.

In [2]:
# Define a "ShowData" function - note the default value for the (now) optional parameter.
#  dataset is a list of lists
def ShowData(dataset = [["No dataset sent"]]):
    for r in dataset:
        # print elements in a tab-separated format
        print ("\t".join(r))

# sample calls
ShowData([["one", "two", "three"], ["four", "five", "six"], ["seven", "eight", "nine"]])
#show()

one	two	three
four	five	six
seven	eight	nine


### Also, it is always good to check the shape of the dataset you read 
*The function below will show the number of rows and columns in the list of lists.*
<br>
**Here, the row count is the number of elements in the outer list, and the column count is the number of elements in each inner list.**

In [3]:
# Define a "ShowRowsAndCols" function which shows the number of rows and columns in the dataset.
# dataset is a list of lists. The row count is the number of elements in the outer list;
# the column count is the number of elements in the first inner list.
def ShowRowsAndCols(dataset = [["No dataset sent"]]):
    print("There are {} rows in the dataset".format(len(dataset)))
    # use the first row (index 0) so that a one-row dataset, including the default, also works
    print("There are {} columns in the dataset".format(len(dataset[0])))
    
# sample calls
ShowRowsAndCols([["one", "two", "three"], ["four", "five", "six"]])
#show()

There are 2 rows in the dataset
There are 3 columns in the dataset
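If you want the shape as values rather than printed text, a small variant can return it instead. The sketch below is only an illustration and not part of the original notebook: the name `get_shape` is hypothetical, and it assumes that rows of unequal length should be reported as an error.

```python
def get_shape(dataset):
    """Return (rows, cols) for a list of lists (hypothetical helper)."""
    # an empty dataset has no rows and no columns
    if not dataset:
        return (0, 0)
    n_cols = len(dataset[0])
    # assumption: every row should have the same length as the first row
    for i, row in enumerate(dataset):
        if len(row) != n_cols:
            raise ValueError(
                "row {} has {} columns, expected {}".format(i, len(row), n_cols)
            )
    return (len(dataset), n_cols)

# sample call, using the same toy data as above
shape = get_shape([["one", "two", "three"], ["four", "five", "six"]])
print(shape)
```

Returning a tuple makes it easy to compare shapes in later cells, and the length check catches lines that were split into the wrong number of fields.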


### Now, let's read the data set into a list of lists using different Python methods

Note that each element in the list of lists is stored <font color = "red">**as a string**</font>, even if it represents a number.

In [None]:
# Initial version - "standard programming"
#
# Define a list for the data.  Will be a list of lists.
data = []
# open the file
fname = "../data/14_Foreign_Exchange_Rates_PureNumeric.csv"
f = open(fname, "r")
# read six lines: the first five are discarded, and the sixth stays in "line" as the first record
for i in range(6):
    line = f.readline()
# loop until we run out of lines
while (line):
    # strip the newline and tokenize (split on commas, in this case)
    tokens = line.rstrip().split(',')
    # append this record to the dataset
    data.append(tokens)
    # read the next line
    line = f.readline()
# close the file
f.close()
# show the data
ShowData(data)

After running the cell above, you should see that the data has been read as a list of lists. Each row of the dataset is a list, and each of those lists is an element of one larger list. **That is why we call this a list of lists.**

Now let's check the value and the data type of the first element of the first row *(keep in mind that indexing in Python starts from 0)*.

In [None]:
data[0][0]

In [None]:
type(data[0][0])
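Because every field is read as a string, numeric work needs an explicit conversion. The sketch below shows one possible way to convert a single row; it is an illustration rather than part of the original notebook. The name `convert_row` is hypothetical, the sample row holds only the first few fields of a row printed in this notebook, and the code assumes the first field is an integer index, the second is a date kept as text, and all remaining fields are numeric.

```python
def convert_row(tokens):
    """Convert one tokenized row to typed values (hypothetical helper)."""
    # assumption: field 0 is the integer row index
    index = int(tokens[0])
    # assumption: field 1 is the date, left as a string here
    date = tokens[1]
    # assumption: all remaining fields are exchange rates;
    # float() raises ValueError if a field is not numeric
    rates = [float(t) for t in tokens[2:]]
    return [index, date] + rates

# first five fields of one row from the dataset, as strings
sample = ["4", "1/7/2000", "1.5272", "0.9714", "1.938"]
converted = convert_row(sample)
print(converted)
print(type(converted[2]))
```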

#### A Python-esque version of the code
This cell does the same work in fewer lines. For your assignment, you are free to use any of the code versions as a starting point.

In [4]:
#
# Python-esque version 1
#
# Grab all the lines from the file starting with line 6, strip
# the newline and tokenize
with open("../data/14_Foreign_Exchange_Rates_PureNumeric.csv") as f:
    vdataset = [line.rstrip().split(',') for line in f.readlines()[5:]]
# show the data
ShowData(vdataset)


4	1/7/2000	1.5272	0.9714	1.938	0.6104	1.831	1.4505	8.2794	7.7783	43.55	1138	9.52	6.057	1.6625	7.2285	105.17	3.8	7.966	8.415	73.15	1.5623	30.85	37.3
5	1/10/2000	1.5242	0.9754	1.935	0.6107	1.819	1.4568	8.2794	7.7785	43.55	1133.5	9.445	6.0765	1.6618	7.254	105.28	3.8	8.024	8.449	73.3	1.5704	30.83	37.27
6	1/11/2000	1.5209	0.9688	1.9365	0.6068	1.8225	1.457	8.2795	7.7785	43.6	1147	9.4825	6.09	1.669	7.214	106.09	3.8	7.969	8.397	73.35	1.5605	30.83	37.61
7	1/12/2000	1.5202	0.9727	1.9286	0.6073	1.835	1.455	8.2796	7.7787	43.6	1144.5	9.515	6.0685	1.669	7.236	105.76	3.8	7.977	8.418	73.35	1.566	30.8	37.54
8	1/13/2000	1.4954	0.9737	1.9084	0.6067	1.814	1.4495	8.2798	7.7788	43.55	1135.5	9.51	6.057	1.67	7.252	106.09	3.8	7.9425	8.417	73.35	1.5707	30.8	37.49
9	1/14/2000	1.5004	0.9874	1.9186	0.6115	1.805	1.4497	8.2797	7.7789	43.55	1125	9.455	6.075	1.6752	7.347	105.86	3.8	7.997	8.474	73.35	1.5945	30.83	37.55
11	1/18/2000	1.506	0.988	1.9342	0.6105	1.7942	1.4502	8.2793	7.779	43.6	1127	9.441	6.089	1.6735	7.356	
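Note that `f.readlines()` loads the whole file into memory before the slice is taken. As a possible alternative, `itertools.islice` skips lines while iterating over the file. The sketch below is an illustration only: the name `read_rows` and its `skip` parameter are hypothetical, and an in-memory `io.StringIO` holding the first three columns of a few rows stands in for the real file so that the example runs on its own.

```python
import io
from itertools import islice

def read_rows(f, skip):
    """Tokenize every line of f after the first `skip` lines (hypothetical helper)."""
    # islice(f, skip, None) yields lines lazily, starting after the skipped ones
    return [line.rstrip().split(',') for line in islice(f, skip, None)]

# stand-in for the real file: header plus three rows, first three columns only
sample_file = io.StringIO(
    "Index,Time Serie,AUSTRALIA - AUSTRALIAN DOLLAR/US$\n"
    "0,1/3/2000,1.5172\n"
    "1,1/4/2000,1.5239\n"
    "2,1/5/2000,1.5267\n"
)

# skip the header and the first data row
rows = read_rows(sample_file, 2)
print(rows)
```

With the real file, the equivalent call would pass the open file object and `5` as the number of lines to skip.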

#### Another Python-esque version of the code
This time we use the `csv` module to read the dataset, and we read all rows. Note that this version retains the column heading row.

In [5]:
#
# Python-esque version 2 
# 
# use the csv module
import csv
ds = []
with open("../data/14_Foreign_Exchange_Rates_PureNumeric.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        ds.append(row)
# show the data
ShowData(ds)


Index	Time Serie	AUSTRALIA - AUSTRALIAN DOLLAR/US$	EURO AREA - EURO/US$	NEW ZEALAND - NEW ZELAND DOLLAR/US$	UNITED KINGDOM - UNITED KINGDOM POUND/US$	BRAZIL - REAL/US$	CANADA - CANADIAN DOLLAR/US$	CHINA - YUAN/US$	HONG KONG - HONG KONG DOLLAR/US$	INDIA - INDIAN RUPEE/US$	KOREA - WON/US$	MEXICO - MEXICAN PESO/US$	SOUTH AFRICA - RAND/US$	SINGAPORE - SINGAPORE DOLLAR/US$	DENMARK - DANISH KRONE/US$	JAPAN - YEN/US$	MALAYSIA - RINGGIT/US$	NORWAY - NORWEGIAN KRONE/US$	SWEDEN - KRONA/US$	SRI LANKA - SRI LANKAN RUPEE/US$	SWITZERLAND - FRANC/US$	TAIWAN - NEW TAIWAN DOLLAR/US$	THAILAND - BAHT/US$
0	1/3/2000	1.5172	0.9847	1.9033	0.6146	1.805	1.4465	8.2798	7.7765	43.55	1128	9.4015	6.126	1.6563	7.329	101.7	3.8	7.964	8.443	72.3	1.5808	31.38	36.97
1	1/4/2000	1.5239	0.97	1.9238	0.6109	1.8405	1.4518	8.2799	7.7775	43.55	1122.5	9.457	6.085	1.6535	7.218	103.09	3.8	7.934	8.36	72.65	1.5565	30.6	37.13
2	1/5/2000	1.5267	0.9676	1.9339	0.6092	1.856	1.4518	8.2798	7.778	43.55	1135	9.535	6.07	1.656	7.208	103.77	3.8
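Since the `csv` version keeps the heading row, it is often convenient to separate the header from the data while reading. The sketch below is one way to do that, using `next()` on the reader; it is an illustration, not part of the original notebook. An in-memory `io.StringIO` with the first three columns of a few rows stands in for the real file, and the variable names are assumptions.

```python
import csv
import io

# stand-in for the real file: header plus three rows, first three columns only
sample_file = io.StringIO(
    "Index,Time Serie,AUSTRALIA - AUSTRALIAN DOLLAR/US$\n"
    "0,1/3/2000,1.5172\n"
    "1,1/4/2000,1.5239\n"
    "2,1/5/2000,1.5267\n"
)

reader = csv.reader(sample_file)
# the first call to next() consumes the heading row
header = next(reader)
# the remaining rows are the data records
rows = list(reader)

print(header)
print(len(rows))
```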

### After reading the file, check the number of rows and columns in the list of lists (all three versions)

In [None]:
ShowRowsAndCols(data)
ShowRowsAndCols(vdataset)
ShowRowsAndCols(ds)

### We can do some simple calculations with the dataset we read
Here, we calculate the mean of the Australia column (index 2 in each row).

In [None]:
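# use "total" rather than "sum" so the built-in sum() function is not shadowed
total = 0
# iterate from first row to last row
for row in data:
    # add the Australia value (column index 2) of every row to the running total
    total = total + float(row[2])
mean = total/len(data)
mean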
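The same calculation can be generalized to any column. The sketch below uses `statistics.mean` from the standard library; it is an illustration, not part of the original notebook. The name `column_mean` is hypothetical, and the sample rows hold only the first three fields of three rows from the dataset.

```python
from statistics import mean

def column_mean(dataset, col):
    """Mean of one column of a list of lists of strings (hypothetical helper)."""
    # convert each string in the chosen column to float before averaging
    return mean(float(row[col]) for row in dataset)

# first three fields of three rows from the dataset
sample = [
    ["0", "1/3/2000", "1.5172"],
    ["1", "1/4/2000", "1.5239"],
    ["2", "1/5/2000", "1.5267"],
]

australia_mean = column_mean(sample, 2)
print(round(australia_mean, 4))
```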

### Look at the column headers

In [None]:
# Use the dataset that includes the headers (ds)
ds[0]
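Once the header row is available, a column can be looked up by name instead of by a hard-coded index. The sketch below builds a name-to-index dictionary; it is an illustration only. The header list contains just the first four column names from the dataset, and the variable names are assumptions.

```python
# first four column names from the dataset's header row
header = [
    "Index",
    "Time Serie",
    "AUSTRALIA - AUSTRALIAN DOLLAR/US$",
    "EURO AREA - EURO/US$",
]

# map each column name to its position in a row
col_index = {name: i for i, name in enumerate(header)}

euro_col = col_index["EURO AREA - EURO/US$"]
print(euro_col)
```

With the full dataset, the same dictionary could be built from `ds[0]` and then used to index into any data row.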