# Data Files
Two very common file formats used to store data are `.txt` (text) and `.csv` (Comma-Separated Values) files.
- Both file types are just ASCII encoded **strings**
    - neither includes formatting
    - a `.csv` file is frequently opened and displayed with a spreadsheet, while a `.txt` file can be opened with a simple text editor like `Notepad`
- The data is **organized by the programmer** when the file is exported, using symbols to delineate the data elements: `,` `;` `\n` `\t` or a space, etc.
    - Data in **one column of many rows** uses the `\n` symbol to indicate a *new line*.
    - Data in **one row of many columns** often uses a `,`
- Data files can contain column or row labels, if the programmer chooses, but it is also reasonable to store each set of data in its own file and use **self-documenting file names**.
    - When different data sets refer to different attributes of the same record, and thus are of the same length, they are often called **parallel files**.
    - It behooves the programmer to export **clean** data files; otherwise they are obliged to deal with **dirty data** (empty trailing elements, missing elements, elements of unexpected or incorrect type) when they write code to import the data into lists.
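
As a minimal sketch of how delimiters organize data, the two strings below stand in for the contents of a column file and a row file. The variable names and sample values are assumptions made for illustration only.

```python
# Hypothetical sample strings standing in for the contents of two data files.
columnText = "5\n44\n8.8"   # one column of many rows, separated by \n
rowText = "5,44,8.8"        # one row of many columns, separated by ,

# Splitting on the delimiter recovers the individual data elements.
columnItems = columnText.split("\n")
rowItems = rowText.split(",")

print(columnItems)  # ['5', '44', '8.8']
print(rowItems)     # ['5', '44', '8.8']
```

Either layout yields the same list of strings, so the choice of delimiter is a matter of how the programmer wants the file to look when opened.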
    
## Algorithm to Write a File

### Pseudocode

1. Have data (for example, in a list)
1. Open a file object in writing mode
1. Having decided the file's organization (column or row):
    - For each item in the data set
        1. Convert the data to strings, adding delimiters as required, and...
        1. Use a file object's method to write to the file
1. Close file.

### Flowchart
![writingFile.png](writingFile.png)

### Example Code to Write a Clean File
```python
demoList=[5,44,8.8]
fileObjectName = open('dataInColumnLabelled.csv', 'w')
# optional chance to send a label to the file (*)
fileObjectName.write("label\n")
i=0
while i<len(demoList)-1:  # all but the last item
    dataToWrite=str(demoList[i])+'\n'  # string on a new row (**)
    fileObjectName.write(dataToWrite)  # write this item to the file
    i+=1
dataToWrite=str(demoList[i])  # last item without a trailing newline
fileObjectName.write(dataToWrite)
fileObjectName.close()
```

\* label is optional<br>
\** Instead of `\n`, the program could use `,` to make the file into a single row.
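
The single-row variant might look like the sketch below. It uses the string method `join`, which places the delimiter only between items, so there is no trailing comma to clean up later. The file name `dataInRowClean.csv` and the temporary folder are assumptions chosen so the example does not overwrite anything.

```python
import os
import tempfile

demoList = [5, 44, 8.8]

# Hypothetical file name, placed in a temporary folder for this demonstration.
filePath = os.path.join(tempfile.mkdtemp(), 'dataInRowClean.csv')

# Convert each item to a string first; join only works on strings.
stringItems = []
for item in demoList:
    stringItems.append(str(item))

# join puts a comma between items, but not after the last one.
rowText = ','.join(stringItems)

fileObject = open(filePath, 'w')
fileObject.write(rowText)
fileObject.close()

print(rowText)  # 5,44,8.8
```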

## Algorithms that Read a File

### Pseudocode

1. Determine/know the structure of the data
    - When in doubt, open it in a spreadsheet (such as Excel) or a text editor and look.
1. Open a file object for the file in read mode (the default)
1. Read all of it as one string (or read it line by line, if the data is in columns)
1. Parse the resulting string:
    - Split the string on `,` and/or `\n`
1. Close the file before anything goes wrong
1. Deal with dirty data as needed.
1. Convert the resulting list into the required data type if needed.
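
One way to make sure the file is closed even if something goes wrong is Python's `with` statement, which closes the file object automatically when its block ends. The sketch below walks through the steps above for a clean row file. The file name and its contents are assumptions, and the file is created first only so the example runs on its own.

```python
import os
import tempfile

# Set-up: create a hypothetical clean row file to read back.
filePath = os.path.join(tempfile.mkdtemp(), 'dataInRowClean.csv')
with open(filePath, 'w') as fileObject:
    fileObject.write('5,44,8.8')

# Open in read mode (the default) and read everything as one string.
with open(filePath) as fileObject:
    data = fileObject.read()
# The file is closed here, even if read() had raised an error.

# Parse the string, then convert each item to the required type.
newList = data.split(',')
i = 0
while i < len(newList):
    newList[i] = float(newList[i])
    i += 1

print(newList)  # [5.0, 44.0, 8.8]
```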

### Flowchart
![readingFile.png](readingFile.png)

### Example of Reading a Dirty File in Rows

```python
newList=[] 
fileObjectName = open('dataInRowDirty.csv')
data = fileObjectName.read()
newList=data.split(",")
fileObjectName.close()

del(newList[0])  # We looked, the file had a string label
newList.pop()    # Trailing empty item? pop it

while '' in newList:
    newList.remove('')  # 'elegant' way to remove all accidentally empty items

i=0
while i<len(newList):
    newList[i]=float(newList[i])  # or other data type as needed
    i+=1
```
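
When the position of the dirty items is not known in advance, an alternative is to attempt the conversion on every item and set aside whatever fails. The sketch below assumes that any item `float()` cannot convert counts as dirty. The list `rawItems` is a hypothetical stand-in for the result of `data.split(",")`.

```python
# Hypothetical result of splitting a dirty row file on commas.
rawItems = ['label', '5', '', '44', 'n/a', '8.8', '']

cleanList = []   # items that converted successfully
rejected = []    # items that could not be converted

for item in rawItems:
    try:
        cleanList.append(float(item))
    except ValueError:
        # float() raises ValueError for labels, empty strings, and the like
        rejected.append(item)

print(cleanList)  # [5.0, 44.0, 8.8]
print(rejected)   # ['label', '', 'n/a', '']
```

Keeping the rejected items in their own list makes it possible to inspect what was dropped, rather than discarding it silently.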

### Example of Reading a Clean File in Columns
The file object method `readlines()` automatically puts each line of the file into its own list item.

```python
newList=[]
fileObjectName = open('dataInColumnClean.csv') 
newList=fileObjectName.readlines()
fileObjectName.close()

i=0
while i< len(newList):
    newList[i]=float(newList[i])
    i+=1
```
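
Each item returned by `readlines()` keeps its trailing `\n`. `float()` ignores surrounding whitespace, so the conversion above still works, but data that should remain strings would need the newline stripped. A minimal sketch follows, using a hypothetical file created in a temporary folder so the example runs on its own.

```python
import os
import tempfile

# Set-up: create a hypothetical clean column file.
filePath = os.path.join(tempfile.mkdtemp(), 'dataInColumnClean.csv')
fileObject = open(filePath, 'w')
fileObject.write('5\n44\n8.8')
fileObject.close()

fileObject = open(filePath)
rawLines = fileObject.readlines()
fileObject.close()

print(rawLines)  # ['5\n', '44\n', '8.8']

# strip() removes leading and trailing whitespace, including the newline.
strippedLines = []
for line in rawLines:
    strippedLines.append(line.strip())

print(strippedLines)  # ['5', '44', '8.8']
```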