# Importing and exporting data
Python offers several built-in ways to import and export data. This notebook gives a quick overview of the different methods.

## Importing data

### open(), read(), close()

In [None]:
input_file_name = "../data/test_textfile.txt"  # your file name here

To open a file, use `open(file_name, mode)`. The `mode` argument defines how the file is handled, for example `'r'` for reading, `'w'` for writing (which truncates an existing file), and `'a'` for appending. "file_handling_modes.png" ([source](https://stackoverflow.com/questions/6648493/how-to-open-a-file-for-both-reading-and-writing)) gives an overview of the possible modes.

In [None]:
f = open(input_file_name, 'r')
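The effect of the most common modes can be tried out on a scratch file. The following is a minimal sketch; the file name `modes_demo.txt` and the temporary directory are assumptions chosen for illustration and are not part of the course data.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, so no real data is overwritten
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file, or empties it if it already exists
f = open(demo_path, "w")
f.write("first line\n")
f.close()  # closing makes sure the data is written to disk

# 'a' appends to the end and keeps the existing content
f = open(demo_path, "a")
f.write("second line\n")
f.close()

# 'r' opens the file for reading only; the file must already exist
f = open(demo_path, "r")
demo_content = f.read()
f.close()

print(demo_content)
```

Opening the same file again with `"w"` instead of `"a"` would discard the first line.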

Using `read()`, the contents of `f` can be accessed. Note that `f.read()` returns the entire contents as one long string.

In [None]:
content = f.read()

In [None]:
content
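Because `read()` returns one string, the individual lines have to be separated afterwards if they are needed. A small sketch of one way to do this, using a made-up string in place of the real file contents:

```python
# Stand-in for the string returned by f.read(); the text itself is made up
content_demo = "first line\nsecond line\nthird line\n"

# splitlines() breaks the string at line boundaries and drops the newline characters
lines = content_demo.splitlines()

print(lines)
print(len(lines))
```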

When the file is no longer needed, it has to be closed again using `close()`.

In [None]:
f.close()

### with open()
To close the file automatically after use, open it inside a `with` statement. The file is closed as soon as the indented block ends.

In [None]:
with open(input_file_name, 'r') as f:
    content = f.read()
    print(content)
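Whether the file really is closed can be checked with the `closed` attribute of the file object. The sketch below writes to a hypothetical scratch file (`with_demo.txt` in a temporary directory, an assumption for illustration) and inspects `f.closed` inside and after the `with` block.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration
demo_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(demo_path, "w") as f:
    f.write("hello\n")
    closed_inside = f.closed  # still open inside the block

closed_after = f.closed  # closed automatically once the block has ended

print(closed_inside, closed_after)
```

The file is also closed if an error occurs inside the block, which is the main advantage over calling `close()` by hand.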

### csv-Reader

When dealing with tabular data, the above methods are impractical, because the whole content is returned as one single string. The `csv` module provides tools for reading and writing delimited data row by row. It is part of the Python standard library and doesn't need to be installed separately.

In [None]:
import csv

In [None]:
input_file_name = "../data/test_dataset.csv"

We can create a csv reader using `csv.reader(file, delimiter=...)`. `delimiter` should be set to the single character separating the values. There are several other arguments that can be used, see [the official documentation](https://docs.python.org/3/library/csv.html#csv.reader) for more.

In [None]:
with open(input_file_name, 'r', newline='') as f:  # newline='' is recommended by the csv documentation
    reader = csv.reader(f, delimiter=',')
    content = []

    for row in reader:
        content.append(row)
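The behaviour of the reader can be explored without a file on disk. The sketch below uses `io.StringIO` to wrap a made-up string so that it behaves like an open file, and uses a semicolon as delimiter to show that the separator is configurable. The sample values are assumptions for illustration.

```python
import csv
import io

# Made-up sample data; io.StringIO lets the reader treat a string like an open file
sample = io.StringIO("x;y\n1;2.5\n3;4.5\n")

reader = csv.reader(sample, delimiter=";")
rows = list(reader)  # same result as appending row by row in a loop

# The first row is the header, the remaining rows hold the data
header = rows[0]
data = rows[1:]

print(header)
print(data)
```

All values come back as strings, so numbers need to be converted before doing arithmetic.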

After executing the code above, `content` is a list of all rows, where each row is a list of the values without the delimiters. All values are still strings. To access a specific column, we can convert `content` into a NumPy array and use array slicing.

In [None]:
import numpy as np

In [None]:
content = np.array(content)

# Extract all x-values (excluding the header)
x_arr = content[1:, 0]

print(x_arr)
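The sliced column still contains strings, so it usually has to be converted before any calculation. A minimal sketch with made-up rows, assuming every data value can be parsed as a number:

```python
import numpy as np

# Made-up rows in the shape csv.reader produces: a header followed by string values
rows = [["x", "y"], ["1", "2.5"], ["3", "4.5"]]
table = np.array(rows)

# Skip the header row, pick one column, and convert the strings to floats
x_values = table[1:, 0].astype(float)
y_values = table[1:, 1].astype(float)

print(x_values)
print(y_values.mean())
```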

## Exporting data
Opening a file in `"w"` mode (or another writable `mode`) allows data from the code to be written into a file.

In [None]:
# Create some sample data
import numpy as np

x = np.arange(-10, 10.5, 0.5)
y = x ** 2

In [None]:
# Declare name of output file
output_file_name = "../data/test_export.csv"

### open(), write()
We open the file in a similar fashion as before. `write()` takes a string as an argument and writes it into the file. Line breaks are not added automatically, so each line has to end with `\n`.

In [None]:
with open(output_file_name, "w") as f:
    
    # Write a header
    f.write("x,y\n")

    for i in range(len(x)):
        f.write(f"{x[i]},{y[i]}\n")
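Instead of looping over an index, the two arrays can be iterated in parallel with `zip()`. The following sketch writes a small made-up data set to a hypothetical scratch file (`zip_demo.csv` in a temporary directory) and reads it back to check the result.

```python
import os
import tempfile

import numpy as np

# Small made-up data set: 0.0, 0.5, 1.0, 1.5 and the squares of these values
x_demo = np.arange(0, 2, 0.5)
y_demo = x_demo ** 2

# Hypothetical scratch file, used only for this demonstration
demo_path = os.path.join(tempfile.mkdtemp(), "zip_demo.csv")

with open(demo_path, "w") as f:
    f.write("x,y\n")  # header
    # zip() pairs up the i-th elements of both arrays
    for x_val, y_val in zip(x_demo, y_demo):
        f.write(f"{x_val},{y_val}\n")

# Read the file back to inspect what was written
with open(demo_path, "r") as f:
    written = f.read()

print(written)
```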

### csv-Writer
The `csv`-module also provides a writer for .csv-files. When using `open()`, the keyword argument `newline` should be set to an empty string. Otherwise, on platforms that use `\r\n` line endings (such as Windows), an empty line can appear after each written row. \
The `writerow()`-method of the `csv.writer` takes a list (or another iterable) as its argument. As with the `csv.reader`, every list stands for one written line.

In [None]:
with open(output_file_name, 'w', newline='') as f:
    
    writer = csv.writer(f, delimiter=',')
    
    # Write a header
    writer.writerow(['x','y'])
    
    for i in range(len(x)):
        writer.writerow([x[i],y[i]])
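When all rows are already available as a list of lists, `writerows()` writes them in one call. The sketch below uses made-up values and a hypothetical scratch file (`writerows_demo.csv` in a temporary directory), then reads the file back with `csv.reader` to check the round trip.

```python
import csv
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration
demo_path = os.path.join(tempfile.mkdtemp(), "writerows_demo.csv")

# Made-up table: a header row followed by three data rows
rows_out = [["x", "y"], [1, 1], [2, 4], [3, 9]]

with open(demo_path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerows(rows_out)  # one written line per inner list

# Read the file back; the numbers are returned as strings
with open(demo_path, "r", newline="") as f:
    rows_back = list(csv.reader(f, delimiter=","))

print(rows_back)
```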

## Example
`students_dataset.csv` (uploaded to StudIP) contains a list of students with their age and their study course. \
Using the `csv`-module
- calculate the average age of the students (*Hint: Transform the age-column into a `numpy`-array with integers*)

**Extra**
- count the number of participants in each course \
(*Hint: Create a list `course_names` that stores each unique course name and a list `course_counts` that stores the counts. Iterate over the "study course"-column.*) \
(*Hint 2: Use the `course_names.index(course)` method to find the index of the `course` in the `course_names`-list.*)

- create a csv file that gives an overview of the number of participants in each course (columns: study course, number of participants)

In [None]:
students_file = "../data/students_dataset.csv"

### average age

In [None]:
with open(students_file, 'r', newline='') as f:  # newline='' is recommended by the csv documentation
    csv_reader = csv.reader(f, delimiter=',')
    students_data = []

    for row in csv_reader:
        students_data.append(row)

print(students_data)

In [None]:
students_data = np.array(students_data)
age_column = np.array(students_data[1:,1], dtype=int)

In [None]:
print("Average age:", np.average(age_column))

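# Alternative: sum the ages by hand
# (the variable is called `total` so that the built-in sum() is not overwritten)
total = 0
for age in age_column:
    total += age

average = total / len(age_column)
print("Average age:", average)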

### Number of course participants

In [None]:
course_column = np.array(students_data[1:,2], dtype=str)

print(course_column)

In [None]:
course_names = []
course_counts = []

for i in course_column:
    if i not in course_names:
        course_names.append(i)
        course_counts.append(1)
    else:
        index_course = course_names.index(i)
        course_counts[index_course] += 1

print(course_names)
print(course_counts)
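The same counting can also be done with `collections.Counter` from the standard library, which is a different technique from the two-list approach suggested in the hints. The sketch below uses made-up course names in place of the real "study course"-column.

```python
from collections import Counter

# Made-up stand-in for the "study course"-column
course_column_demo = ["Physics", "Biology", "Physics", "Chemistry", "Biology", "Physics"]

# Counter maps each distinct entry to the number of times it occurs
counts = Counter(course_column_demo)

# Split into two lists, in order of first appearance, as in the approach above
course_names_demo = list(counts.keys())
course_counts_demo = list(counts.values())

print(course_names_demo)
print(course_counts_demo)
```

Both approaches give the same result; the manual loop shows the mechanics, while `Counter` is shorter once the idea is clear.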

In [None]:
with open("../data/number_of_participants.csv", "w", newline="") as f:
    csv_writer = csv.writer(f, delimiter=',')

    csv_writer.writerow(['study course', 'number of participants'])

    for i in range(len(course_names)):
        csv_writer.writerow([course_names[i], course_counts[i]])