# Learning Objectives

- [ ]  2.2.2 Use common library functions for input/output, strings and mathematical operations.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/njc-cz2-2021/Materials/blob/main/365-Days-of-H2-Computing/Day_028.ipynb)

# D28 Basic CSV Processing

CSV files are plain text files that use a specific format to store tabular data. CSV stands for "Comma-Separated Values".

* Each line of the file is a data record. 
* Each record consists of one or more fields, separated by commas.
* Normally, the first line of the file gives the table header.

>``` text
>year, sex, type_of_course, no_of_graduates
>1993, Males, Humanities & Social Sciences, 481
>1993, Males, Mass Communication, na
>1993, Males, Accountancy, 295
>1993, Males, Business & Administration, 282
>```

## D28.1 Why Use CSV?

* CSV is a common format for data exchange because it is simple and compact.
* Most relational databases provide tools to import and export CSV files.
* CSV files can be easily opened in Excel.

## D28.2 Read File into List

* Read the CSV file using the `readlines()` method, which returns the lines of the file as a list of strings.
* Use list slicing to remove the header row.
* Use the string `strip()` method to remove any surrounding whitespace (space, tab, newline characters).


#### Example
* Read `sample-sales-data.csv` file into a list.
* Discard header row.
* Strip any leading & trailing white space from each line. You might want to use the string `.strip()` method for this.
* Print out the first 3 items of the list.

>``` python
>with open('./resources/sample-sales-data.csv') as f:
>    x = f.readlines()
>    x = x[1 : ]
>    
>x = [i.strip() for i in x]
>x[:3]
>```

In [None]:
with open('./resources/sample-sales-data.csv') as f:
    x = f.readlines()
    x = x[1:]

x = [i.strip() for i in x]
x[:3]

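Each row in a CSV file is a record, and the fields within a record are delimited by commas (`,`).
- Use the string `split(delimiter)` method to split a record into a list of fields, which can then be converted to a tuple if needed.
- Use multi-level indexing (row index first, then column index) to get a cell value.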
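As a minimal sketch of splitting and indexing, the snippet below works on a few in-memory lines instead of a file. The sample lines and the names `lines` and `records` are made up for illustration and are not taken from `sample-sales-data.csv`.

```python
# Hypothetical lines, as they might look after strip() has been applied.
lines = [
    'A001,Acme,Pencil,10',
    'A002,Globex,Notebook,25',
]

# split(',') returns a list of fields; tuple() turns that list into a tuple.
records = [tuple(line.split(',')) for line in lines]

print(records[0])     # the whole first record
print(records[0][1])  # multi-level indexing: row 0, column 1
```

Note that every field is still a string after splitting, including the last column.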

### Exercise 

- Read the file into a list such that each record is represented as a tuple.
- Print out the first 2 records in the list.
- Print out the company name of the 1st record.

In [None]:
#YOUR_CODE_HERE

## D28.3 Python `csv` Module

While we could use the built-in `open()` function and string methods alone to work with CSV files in Python, the dedicated `csv` module makes working with CSV files much easier. It provides the following:

* `csv.reader`
* `csv.writer`
* `writerow()`, a method of the object returned by `csv.writer`

### D28.3.1 Read CSV Files using `csv.reader`

After opening a CSV file, pass the file object to `csv.reader`, which returns an iterable object that yields the CSV data one record at a time. The syntax is:

>``` python
>csv.reader(your_file_here)
>```

* Each record is represented as a list.
* All fields are `str` type.

#### Example

* Use `csv.reader` to read and print out all rows in `'sample-sales-data.csv'`.
* Instead of printing out, save all rows in `'sample-sales-data.csv'` into a list `data`.

>``` python
>import csv
>
>with open('./resources/sample-sales-data.csv') as f:
>    reader = csv.reader(f)
>    data = [row for row in reader]
>
>print(data)
>```

The character used to separate values is called a **delimiter**. Apart from comma (`,`), other delimiters include the tab (`\t`), colon (`:`) and semi-colon (`;`) characters.

Files of tab-separated values are commonly saved with the extension `.tsv`.

#### Exercise
* Use `csv.reader` to read the file `'olympics-medals-sample.tsv'`; save both the header and the data in a list.

In [None]:
#YOUR_CODE_HERE

The `csv.reader()` function has only one required argument, the file object, but it also accepts several optional keyword arguments:

* `delimiter`: The character that separates fields. It defaults to `','`, but you can set it to any other single character.
* `quotechar`: The character used to quote fields that contain special characters such as the delimiter. It defaults to `'"'`.
* `escapechar`: The character that removes the special meaning of the character following it, e.g. a delimiter inside an unquoted field. It defaults to `None`.
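The sketch below illustrates `delimiter` and `quotechar` on a small piece of text held in memory. The sample text is invented for this illustration, and `io.StringIO` is used only so that no file is needed.

```python
import csv
import io

# Hypothetical semicolon-delimited text; single quotes protect the ';' inside a field.
text = "name;motto\nAcme;'Fast; cheap; good'\n"

# io.StringIO lets csv.reader treat the string like an open file.
reader = csv.reader(io.StringIO(text), delimiter=';', quotechar="'")
rows = list(reader)

print(rows)
```

The semicolons inside the quoted field are kept as part of the field instead of being treated as separators.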

### D28.3.2 Write CSV Files using `csv.writer`

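A `csv.writer` can be used to write a CSV file. The `csv.writer()` function returns a writer object that converts the user's data into delimited strings and writes them to the file through its `writerow()` method.

When opening the file for a `csv.writer`, set the `newline` argument to `''` so that the writer controls the line endings itself and each row is written on exactly one line. Without it, an extra blank line can appear after each row on platforms such as Windows.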
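A minimal sketch of this pattern follows. The file name `demo_output.csv` and the rows are hypothetical, and a temporary folder is used here only so that no existing file is overwritten.

```python
import csv
import os
import tempfile

# Write into a temporary folder (the file name is hypothetical).
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo_output.csv')

    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'remark'])
        writer.writerow([1, 'small, round'])  # the comma inside the field forces quoting

    # Read the raw text back (again with newline='') to see exactly what was written.
    with open(path, newline='') as f:
        content = f.read()

print(repr(content))
```

The writer ends each row with `\r\n` by default and automatically quotes the field that contains a comma.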

#### Exercise
* Use `csv.writer` to save the following data into a CSV file `'sample.csv'`.

>``` text
>["Symbol", "Name", "Price (Intraday)"]
>["TMVWY", "TeamViewer AG", 21.05]
>["AXSM", "Axsome Therapeutics, Inc.", 88.87]
>["SAGE", "Sage Therapeutics, Inc.", 53.36]
>```

In [None]:
#YOUR_CODE_HERE

The `csv.writer()` function has only one required parameter, the file object. You can also pass the following optional keyword arguments:

* `delimiter`: The delimiter the writer will use. It defaults to `','`, but you can set it to any other single character.
* `quotechar`: The character used for quoting. It defaults to `'"'`.
* `escapechar`: The character used to escape the delimiter if quoting is not being used. It defaults to `None`.

The `quoting` argument specifies which fields should be quoted. The options are:
* `csv.QUOTE_ALL`: All fields are quoted.
* `csv.QUOTE_MINIMAL`: Only fields containing special characters, such as the delimiter, the quotechar or a line break, are quoted. This is the default.
* `csv.QUOTE_NONNUMERIC`: The writer quotes all non-numeric fields. When used with `csv.reader`, all unquoted fields are converted to `float`.
* `csv.QUOTE_NONE`: No fields are quoted; the writer escapes delimiters instead. If you use this value and the data contains special characters, you also need to provide the `escapechar` argument.
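To compare the four options side by side, the sketch below writes the same row with each quoting mode into an in-memory buffer. The helper `write_row` is a hypothetical name introduced for this illustration; the row is one of the sample records used earlier.

```python
import csv
import io

row = ['AXSM', 'Axsome Therapeutics, Inc.', 88.87]

def write_row(quoting, **kwargs):
    """Write `row` to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, **kwargs).writerow(row)
    return buffer.getvalue()

quote_all = write_row(csv.QUOTE_ALL)
minimal = write_row(csv.QUOTE_MINIMAL)
nonnumeric = write_row(csv.QUOTE_NONNUMERIC)
# QUOTE_NONE needs an escapechar here because one field contains a comma.
no_quotes = write_row(csv.QUOTE_NONE, escapechar='\\')

for label, text in [('ALL', quote_all), ('MINIMAL', minimal),
                    ('NONNUMERIC', nonnumeric), ('NONE', no_quotes)]:
    print(label, repr(text))
```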

#### Example
>``` python
>import csv
>with open('stock_sample.tsv', 'w', newline = '') as file:
>    writer = csv.writer(
>        file,
>        delimiter = '\t',
>        quotechar = '|',
>        quoting = csv.QUOTE_ALL
>    )
>    writer.writerow(['stock', 'price', 'cost', 'profit'])
>    writer.writerow(['21', '121.34', '45.34', '76'])
>```

The `writerows()` method of a `csv.writer` object allows you to write a 2-dimensional list to a CSV file in a single call.
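A short sketch of `writerows()` is given below. The table is hypothetical, and an in-memory buffer stands in for a file opened with `newline=''`.

```python
import csv
import io

# Hypothetical 2-dimensional list: one inner list per row.
table = [
    ['item', 'qty'],
    ['pen', 3],
    ['ruler', 1],
]

buffer = io.StringIO()  # in-memory stand-in for a file
csv.writer(buffer).writerows(table)  # writes every inner list as one row

output = buffer.getvalue()
print(output)
```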

#### Exercise

Save the following data to a CSV file `stock_sample.csv` using `csv.writer`.

>``` python
>[['stock', 'price', 'cost', 'profit'], ['21', '121.34', '45.34', '76']]
>```

In [None]:
#YOUR_CODE_HERE

## D28.4 Common Things to Do with CSV

### D28.4.1 Load Data into List

#### Exercise

Read `sample-sales-data.csv` file; save its header into variable `header` and its data into variable `data`.

In [None]:
#YOUR_CODE_HERE

### D28.4.2 Find Distinct Values

You can use the `set()` constructor to find the distinct values of a column.
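As a minimal sketch, assuming the records are already loaded as tuples, the snippet below collects one column into a set. The records shown are invented and do not come from `sample-sales-data.csv`.

```python
# Hypothetical records: (company, product, units).
data = [
    ('Acme', 'Pencil', '10'),
    ('Globex', 'Notebook', '25'),
    ('Acme', 'Eraser', '4'),
]

# Collect column 0 of every record; the set keeps one copy of each value.
companies = set(row[0] for row in data)

# A set is unordered, so sort it when a stable order is wanted.
print(sorted(companies))
```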

#### Exercise

* List all the companies contained in the file `sample-sales-data.csv`.

In [None]:
#YOUR_CODE_HERE

### D28.4.3 Filter Data

The list can be filtered based on condition(s) by using: 
* `for` loop, or
* list comprehension.
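The two approaches can be sketched as follows on a small hypothetical list of records (the data and variable names are made up for illustration). Both produce the same result.

```python
# Hypothetical records: (company, product, units).
data = [
    ('Acme', 'Pencil', '10'),
    ('Globex', 'Notebook', '25'),
    ('Acme', 'Eraser', '4'),
]

# Approach 1: for loop with an if condition.
acme_loop = []
for row in data:
    if row[0] == 'Acme':
        acme_loop.append(row)

# Approach 2: list comprehension with the same condition.
acme_comp = [row for row in data if row[0] == 'Acme']

print(acme_loop == acme_comp)
```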

#### Exercise
* Find all sales records by company `Initech` and print out the first 3 records.
* Find all sales done on date '2015-01-06'.

In [None]:
#YOUR_CODE_HERE

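### D28.4.4 Check Numeric Strings

Both `isdigit()` and `isnumeric()` can be used to check whether a string such as `'1234'` can be converted to **a non-negative integer**.
* Both return `False` for `'-1234'` and `'12.34'`, because `-` and `.` are not digits.
* `isnumeric()` also returns `True` for numeric characters in other scripts, e.g. Chinese numerals, which `int()` cannot convert.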

#### Example
>``` python
>print('1234'.isdigit(),'1234'.isnumeric())
>print('-1234'.isdigit(),'-1234'.isnumeric())
>print('12.34'.isdigit(),'12.34'.isnumeric())
>print('一二三四五'.isdigit(),'一二三四五'.isnumeric())
>```
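A possible alternative, sketched here under the assumption that negative and decimal values should also count as valid, is to attempt the conversion and catch the failure. The helper name `is_number` is hypothetical.

```python
def is_number(text):
    """Return True if float() can convert the string, otherwise False."""
    try:
        float(text)
        return True
    except ValueError:
        return False

results = [is_number(s) for s in ['1234', '-1234', '12.34', 'na']]
print(results)
```

Be aware that `float()` also accepts strings such as `'nan'` and `'inf'`, so this check is more permissive than `isdigit()`.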

### D28.4.5 Compute on Records

You can perform simple data analysis on the data:
* Use `sum()`, `len()`, `min()`, `max()`, the list method `count()` etc.
* Remember to convert the data to numerical values before computation or comparison.
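The sketch below shows why the conversion matters, using a hypothetical list of values read from a CSV column. Strings are compared character by character, so the results differ from numeric comparison.

```python
# Hypothetical column values; fields read from a CSV file are always strings.
units = ['9', '10', '25']

largest_as_text = max(units)                   # compares strings character by character
largest_as_number = max(int(u) for u in units) # compares numbers
total = sum(int(u) for u in units)             # sum() needs numbers, not strings

print(largest_as_text, largest_as_number, total)
```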

#### Exercise
* Remove records with an invalid `Units` value.
* Find total units of sales on "Hardware".

In [None]:
#YOUR_CODE_HERE