# Reading and Writing files

<b>Resources:</b> 

Google is your best friend! Searching in English is highly recommended.  

Numpy - http://www.numpy.org/

Numpy arrays - https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

http://www.pythonforbeginners.com


# File Types

In Python, a file is categorized as <b>either text or binary</b>, and the difference between the two file types is important. 

Text files are structured as a sequence of lines, where each line is a sequence of characters. Source code files and plain data tables are typical examples. 

Each line is terminated with a special character, called the EOL or End of Line character (written as '\n' in Python strings). It ends the current line and signals that a new one has begun. 

A binary file is any type of file that is not a text file. Binary files can only be processed by an application that knows the file's structure. There are several Python libraries available to read and write binary files, e.g. pandas (https://pandas.pydata.org/) and astropy (http://www.astropy.org/). 



# Open function

Reading and writing files is handled natively in Python.

You can use the built-in <b>open()</b> function to open a file. 

When you use the open function, it returns something called a file object. File objects contain methods and attributes that can be used to collect information about the file you opened. They can also be used to manipulate said file.

For example, the mode attribute of a file object tells you which mode a file was opened in. And the name attribute tells you the name of the file that the file object has opened. 

The syntax to open a file object in Python is the following: 

<b>file_object = open("filename", "mode")</b> where file_object is the variable that holds the file object. 

mode – tells the interpreter (and anyone reading the code) how the file will be used.

Including a mode argument is optional because a default value of ‘r’ will be assumed if it is omitted. The ‘r’ value stands for read mode, which is just one of many. 

The modes are: 

- ‘r’ – Read mode, which is used when the file is only being read 
- ‘w’ – Write mode, which is used to write new information to the file (an existing file with the same name will be erased when this mode is used) 
- ‘a’ – Append mode, which is used to add new data to the end of the file; that is, new information is automatically appended to the end 
- ‘r+’ – Special read and write mode, which is used to handle both actions when working with a file 
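
The modes listed above, together with the name and mode attributes mentioned earlier, can be tried out with a short sketch. The file name 'demo_modes.txt' is an assumption made only for this illustration; any writable file name would do.

```python
# 'demo_modes.txt' is a hypothetical file name chosen for this sketch.
filename = 'demo_modes.txt'

# 'w' creates the file, or erases it if it already exists
f = open(filename, 'w')
print(f.name, f.mode)      # the name and mode attributes of the file object
f.write('first line\n')
f.close()

# 'a' keeps the existing content and adds new data to the end
f = open(filename, 'a')
f.write('second line\n')
f.close()

# 'r+' allows reading and writing without erasing the file first
f = open(filename, 'r+')
first = f.readline()       # read the first line
f.seek(0, 2)               # move to the end of the file before writing
f.write('third line\n')
f.close()

# 'r' is the default mode, so it can be omitted
f = open(filename)
content = f.read()
f.close()

print(content)
```

Opening the same file again with 'w' would discard all three lines, so 'a' is usually the safer choice when existing data must be kept.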

# Create a text file

Let’s create our own file: “testfile.txt”. 


In [247]:
# create a file:
file = open('testfile.txt','w') 
 
file.write('Hello World') 
file.write('This is our new text file') # this will be appended directly after 'Hello World', without a line break 

# to include a line break use: \n
file.write('\n and this is another line.') 

# close the file object:
file.close() 

# Reading a Text File in Python

There are a number of ways to read a text file in Python. 

If you need to extract a string that contains all characters in the file, you can use the following method: 


<b>file.read()</b> 

In [248]:
file = open('testfile.txt', 'r') 
print(file.read())

Hello WorldThis is our new text file
 and this is another line.


Another way to read a file is to read only a certain number of characters.  

For example, with the following code the interpreter will read the first five characters of stored data and return it as a string: 

In [249]:
file = open('testfile.txt', 'r')
 
print(file.read(5)) 

Hello


If you want the file split into lines, as opposed to one string holding the entire content, you can use the <b>readlines()</b> method. It reads the whole file and returns a list with one string per line. 

In [250]:
file = open('testfile.txt', 'r')
print(file.readlines()) 

['Hello WorldThis is our new text file\n', ' and this is another line.']


In [251]:
# to access a specific line:

file = open('testfile.txt', 'r')
lines = file.readlines()

print(lines[1]) 

 and this is another line.


# Looping over a file

When you want to go through all the lines of a file in a memory-efficient and fast manner, you can loop over the file object directly. 

In [252]:
file = open('testfile.txt', 'r')
for line in file: 
    print(line) 

Hello WorldThis is our new text file

 and this is another line.
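
The empty lines in the output above appear because each line read from the file still ends with '\n', and print() adds a line break of its own. A minimal sketch of one way to avoid this, assuming a small hypothetical file 'demo_loop.txt' that is created here only for the example:

```python
# 'demo_loop.txt' is a hypothetical file created only for this sketch.
f = open('demo_loop.txt', 'w')
f.write('Hello World\n')
f.write('and this is another line.\n')
f.close()

cleaned = []
f = open('demo_loop.txt', 'r')
for line in f:
    line = line.rstrip('\n')   # remove the trailing line break
    cleaned.append(line)
    print(line)
f.close()
```

An alternative is print(line, end=''), which keeps the line unchanged and tells print() not to add its own line break.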


To close a file use the <b>close()</b> function. 


In [253]:
file.close()
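
Forgetting to call close() is a common mistake. As a sketch of an alternative, Python's with statement closes the file automatically when the indented block ends. The file name 'demo_with.txt' is an assumption made for this example.

```python
# 'demo_with.txt' is a hypothetical file name used only for this sketch.
with open('demo_with.txt', 'w') as f:
    f.write('Hello World\n')
# the file is closed here, even if an error had occurred inside the block

print(f.closed)   # the closed attribute tells whether the file is closed

with open('demo_with.txt', 'r') as f:
    text = f.read()

print(text)
```

The examples in this section keep the explicit open() and close() calls, but both styles do the same job.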

# Another way to write into a file

Another way to write into a file is using the <b>print()</b> function.

In [254]:
file = open("testfile.txt", "w")   # using the "w" argument will rewrite the file, use "a" to append
print('Hello World!', file=file)

file.close()

In [255]:
# Let's see what is in our file? 

file = open('testfile.txt', 'r')
print(file.read())     
    
file.close()    

Hello World!



# Creating a longer file or data table

A longer file or data table can be easily created by using loops and the <b>print()</b> or <b>write()</b> functions.

In [256]:
# create some data:

numbers = range(0,10)
numbers_2 = range(10,20)
numbers_3 = range(20,30)

for i in numbers:
    print(i)

0
1
2
3
4
5
6
7
8
9


In [257]:
# empty out the file:
file = open("testfile.txt", "w")
file.close()

# write data into the file with a loop:
file = open('testfile.txt', 'a')   # using "a" will append to the file
for i in range(len(numbers)):
    print(numbers[i], numbers_2[i], numbers_3[i], file=file)

file.close()
    
file = open('testfile.txt', 'r')
for line in file: 
    print(line)     

0 10 20

1 11 21

2 12 22

3 13 23

4 14 24

5 15 25

6 16 26

7 17 27

8 18 28

9 19 29



In [258]:
# Another Way to write a table into a file
file=open("testfile.txt", "w")

for i in range(len(numbers)):
    file.write(str(numbers[i])+' '+str(numbers_2[i])+' '+str(numbers_3[i])+'\n') 

file.close()
    
file = open('testfile.txt', 'r')
for line in file: 
    print(line) 

0 10 20

1 11 21

2 12 22

3 13 23

4 14 24

5 15 25

6 16 26

7 17 27

8 18 28

9 19 29



How to make a data table with ',' separated values:

In [259]:
# creating a data table
file=open("testfile.txt", "w")  # this can also be saved as testfile.csv and opened in Excel 
file.write('ones, tens, twenties\n') 

for i in range(len(numbers)):
    file.write(str(numbers[i])+', '+str(numbers_2[i])+', '+str(numbers_3[i])+'\n') 

file.close()
    
file = open('testfile.txt', 'r')
for line in file: 
    print(line) 

ones, tens, twenties

0, 10, 20

1, 11, 21

2, 12, 22

3, 13, 23

4, 14, 24

5, 15, 25

6, 16, 26

7, 17, 27

8, 18, 28

9, 19, 29
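
As an alternative to building each line by hand, the csv module from Python's standard library can write and read comma-separated tables. The sketch below is not the approach used in the rest of this section; the file name 'demo_table.csv' is an assumption made for the example.

```python
import csv

numbers = range(0, 10)
numbers_2 = range(10, 20)
numbers_3 = range(20, 30)

# 'demo_table.csv' is a hypothetical file name used only for this sketch.
# newline='' is recommended by the csv documentation when opening csv files.
with open('demo_table.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['ones', 'tens', 'twenties'])     # header row
    for row in zip(numbers, numbers_2, numbers_3):    # one row per triple
        writer.writerow(row)

with open('demo_table.csv', 'r', newline='') as f:
    rows = list(csv.reader(f))    # each row becomes a list of strings

print(rows[0])
print(rows[1])
```

Note that csv.writer separates values with ',' only, without the extra space used in the hand-written table above.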



# Splitting lines and reading in data files

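For this we can use the <b>split()</b> function. Called without an argument it splits a line at whitespace, so the commas stay attached to the values; split(',') splits at the commas instead.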

In [260]:
file = open('testfile.txt', 'r')
data = file.readlines()
for line in data:
    numbers = line.split()
    print(numbers) 
    
file.close()  

['ones,', 'tens,', 'twenties']
['0,', '10,', '20']
['1,', '11,', '21']
['2,', '12,', '22']
['3,', '13,', '23']
['4,', '14,', '24']
['5,', '15,', '25']
['6,', '16,', '26']
['7,', '17,', '27']
['8,', '18,', '28']
['9,', '19,', '29']
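
The commas in the output above remain because split() without an argument only splits at whitespace. A small sketch of the difference, using one example line written out by hand (the variable names are chosen for this illustration only):

```python
line = '0, 10, 20\n'   # one line as it would be read from the file

print(line.split())       # splits at whitespace, commas stay attached
print(line.split(','))    # splits at commas, spaces and '\n' stay attached

# strip() removes leading and trailing whitespace, including '\n'
fields = [item.strip() for item in line.split(',')]
print(fields)
```

The last step uses a list comprehension, which is a compact way of writing a loop that fills a list.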


In [261]:
# define lists
ones = []
tens = []
twenties = []

# read data from each line
file = open('testfile.txt', 'r')
data = file.readlines()
for line in data:
    numbers = line.split()
    ones.append(numbers[0])   # we use append() to append elements to a list
    tens.append(numbers[1])
    
file.close()  

print(ones)
print(tens)

['ones,', '0,', '1,', '2,', '3,', '4,', '5,', '6,', '7,', '8,', '9,']
['tens,', '10,', '11,', '12,', '13,', '14,', '15,', '16,', '17,', '18,', '19,']


We can also loop over the lines by index, letting i run over the line numbers. 

How do we know how many lines we have in the file? We can use the <b>len()</b> function to count the elements in 'data', e.g. len(data).

Then we can create a sequence of indexes with the same number of elements as 'data' with <b>range()</b>:

<b>range(len(data))</b>

In [262]:
indexes = range(len(data))

In [263]:
# define lists
ones = []
tens = []

# read data from each line
file = open('testfile.txt', 'r')
data = file.readlines()
for i in range(1,len(data)):        # start at 1 to skip the header line
    numbers = data[i].split(',')    # split the line at the commas
    ones.append(numbers[0]) 
    tens.append(numbers[1])
    
file.close()  

print(ones)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


# Python modules

A module is a Python object with arbitrarily named attributes that you can bind and reference. Simply, a module is a file consisting of Python code. A module can define functions, classes and variables.

<b>Important note: Most commonly used modules don't come with the basic Python distribution, so you will need to install them separately. The good news is, this can be done in the same way as you installed Python (e.g. macports, pip, anaconda ...).</b>

One exception is the "math" module, which is included with Python. It provides basic mathematical functions, a bit like a calculator. 

A list of included modules is here: https://docs.python.org/3/library/index.html

## Most popular modules

### Matplotlib

Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.

Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code.

Matplotlib: https://matplotlib.org/index.html

Examples of plots: https://matplotlib.org/gallery.html   (This is super useful!)

### Scipy

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages:

- Numpy
- Pandas
- Scipy library
- matplotlib
- IPython
- Sympy

The Scipy library provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization.

### Numpy 

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

### Pandas

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

https://pandas.pydata.org/

### Astropy

All sorts of astronomy-related tools, e.g. for working with coordinates, units, FITS files ...

http://www.astropy.org/

### Various interfaces to other programs


## Modules I will use

- numpy
- matplotlib

## How to import a module

Very simple:

<b>import module_name</b> - e.g. "import math" loads the math module, one of the simplest modules

It is possible to import specific functions and to make shortcuts:

<b>import numpy as np</b>  - from here on we can reference all numpy functions as "np.function()" e.g. "np.array()"

<b>import matplotlib.pyplot as plt</b> - this is very commonly used to make plots (more about this later)

Importing only a specific part of a module (here the pyplot submodule of matplotlib):

<b>from matplotlib import pyplot</b>
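
The three import styles can be compared side by side. This sketch uses the math module for all of them, since it is included with Python; the shortcut name "m" is chosen here only for illustration.

```python
import math                 # plain import: use math.sqrt()
import math as m            # import with a shortcut: use m.sqrt()
from math import sqrt, pi   # import only specific names: use sqrt() directly

a = math.sqrt(16)
b = m.sqrt(16)
c = sqrt(16)

print(a, b, c)
print(pi)
```

All three calls run the same function, so the choice is mostly a matter of readability.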

In [264]:
import math

In [265]:
math.pi

3.141592653589793

In [266]:
math.sqrt(2)

1.4142135623730951

In [267]:
math.sin(45)   # the argument is in radians, not degrees

0.8509035245341184

In [268]:
math.log10(10)

1.0
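
Note that the trigonometric functions in math expect angles in radians, so math.sin(45) above is the sine of 45 radians. A short sketch of how to work with degrees instead, using math.radians() (the variable names are chosen for this example):

```python
import math

angle_deg = 45
angle_rad = math.radians(angle_deg)   # convert degrees to radians

print(math.sin(45))          # sine of 45 radians
print(math.sin(angle_rad))   # sine of 45 degrees, about 0.7071
```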

# Converting lists into arrays

One of the most fundamental data structures in any language is the array. Python doesn't have a native array data structure, but it has the list which is much more general and can be used as a multidimensional array quite easily.

Lists are a very useful feature of Python, however if we would like to perform more serious mathematical operations, arrays are much better to use.  

<b>More information:</b>

Numpy - http://www.numpy.org/

Numpy arrays - https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

A quick example using the numpy Python package (more about numpy later):

In [269]:
import numpy as np

ones_ar = np.array(ones,dtype=float)

print(ones_ar)

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]


In [270]:
print(ones)
print(ones * 2)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


In [271]:
# converting strings to numbers: float() gives floats (about 16 significant decimal digits), int() gives integers

for i in range(len(ones)):
    ones[i] = float(ones[i])
    
print(ones)    

[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]


In [272]:
print(ones * 2)

[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]


In [273]:
#print(ones + 2)   # this would raise a TypeError: a number cannot be added to a list

## Simple operations with arrays

array + number -> array[j] + number 

array * number -> array[j] * number

array1 * array2 -> array1[j] * array2[j]

In [274]:
print(ones_ar * 2.)

[ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18.]


In [275]:
print(ones_ar + 2)

[ 2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]


In [276]:
tens_ar = np.array(tens,dtype=float)

#print(ones_ar)
#print(tens_ar)

print(ones_ar * tens_ar)

[  0.  11.  24.  39.  56.  75.  96. 119. 144. 171.]
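
The difference between lists and arrays can be summarised in one self-contained sketch. It assumes numpy is installed; the variable names and the three-element data are made up for this illustration.

```python
import numpy as np

values = [0.0, 1.0, 2.0]                  # a plain Python list
values_ar = np.array(values)              # the same numbers as a NumPy array
other_ar = np.array([10.0, 11.0, 12.0])   # a second array of the same length

print(values * 2)             # list: the elements are repeated
print(values_ar * 2)          # array: each element is multiplied by 2
print(values_ar + 2)          # array: 2 is added to each element
print(values_ar * other_ar)   # arrays: multiplied element by element
```

Multiplying two arrays this way requires them to have compatible shapes, here simply the same length.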


# Exercise

Read the data table (iris.csv) into lists, one list for each column (sepal_length, sepal_width, petal_length, petal_width, species).

1. Define the lists
2. Open the file
3. Read in the individual lines of the file
4. Split up the lines based on the delimiter (',')
5. Add the data to the lists
6. Print your lists to check them
