### CS4102 - Geometric Foundations of Data Analysis I
Prof. Götz Pfeiffer<br />
School of Mathematical and Statistical Sciences<br />
University of Galway

# Week 1: Least Squares Fitting

## Questions

1. Python - how?
2. Data - how does it get into a python session?
3. How to multiply (transpose, invert) matrices in python?
4. Visualization - packages?

## 1. Python

Python is

* a **programming language** with built-in **functions** (`sum(list)`),

* accompanied by a **standard library** of **modules** (`from math import pi`),

* supported by a large **ecosystem** of **packages** (`import numpy as np`).
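Here is a tiny sketch of the first two layers: the built-in function `sum`, and a name imported from the standard-library module `math`. The list of numbers is arbitrary, and no third-party package is needed to run it:

```python
from math import pi  # a name imported from the standard-library module math

numbers = [17, 4, 24]   # an arbitrary example list
total = sum(numbers)    # built-in function: available without any import

print(total)
print(pi)
```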

More info is on the web:

* [python.org](https://www.python.org)

* [ipython.org](https://ipython.org)

* [anaconda.com](https://www.anaconda.com)

* [jupyter.org](https://jupyter.org)

### Python as a Pocket Calculator

* Sums of numbers:

```python
17 + 4
```

* Sums of strings:

```python
'17' + '4'
```

* String repetition:

```python
24 * '7'
```

* The **sum** of a number and some text, however, is **not defined**. It produces an **error message** (a `TypeError`) instead of a result:
```python
24 + '7'
```

* Conversion to type `str`:

```python
str(24)
```

```python
str('24')
```

* Conversion to type `int`:

```python
int('24')
```
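With these conversions, the mixed sum of a number and some text can be repaired. The conversion you choose decides whether the result is arithmetic or concatenation. In the short sketch below, the `try`/`except` block only demonstrates that the unconverted sum really raises a `TypeError`:

```python
# Mixing int and str with + raises a TypeError ...
try:
    24 + '7'
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# ... so convert one of the operands explicitly:
as_number = 24 + int('7')   # integer addition
as_text = str(24) + '7'     # string concatenation
print(as_number, as_text)
```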

* Not every text can be converted to a number:
```python
int('three')
```
will produce an error (a `ValueError`).
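When the input might not be a number, you can catch the `ValueError` instead of letting it stop the program. Below is a minimal sketch. The helper name `to_int` is hypothetical, and it returns `None` for invalid text:

```python
def to_int(text):
    """Hypothetical helper: convert text to int, or return None if that fails."""
    try:
        return int(text)
    except ValueError:  # raised e.g. by int('three')
        return None

print(to_int('24'))     # int() accepts a plain integer literal
print(to_int(' 24\n'))  # surrounding whitespace, including a newline, is ignored
print(to_int('three'))  # not a number, so the helper returns None
```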

A more detailed tutorial on basic python is to follow ...

## 2. Data

* The $x$- and $y$-values for today's example are contained in a file `production.csv`.

### Magic

* The jupyter magic command `%cat` prints the content of a file, but does not load the data into the current session ...

```python
%cat production.csv
```

### Python

* Python's basic **file handling** commands are the built-in function `open` and the file methods `read` and `close`:

```python
f = open('production.csv')
print(f.read())
f.close()
```
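A common alternative to the explicit `open`/`close` pair is the `with` statement. It closes the file automatically when the indented block ends, even if an error occurs inside it. To keep the sketch self-contained, it first writes a small temporary file. This file holds made-up values and is a hypothetical stand-in for `production.csv`, assumed to have an `x,y` header:

```python
import os
import tempfile

# Hypothetical stand-in for production.csv (made-up values, only so there is a file to read)
path = os.path.join(tempfile.mkdtemp(), 'example.csv')
with open(path, 'w') as f:
    f.write('x,y\n1,3\n2,5\n3,7\n')

# Read it back; the file is closed as soon as the with block ends
with open(path) as f:
    text = f.read()

print(text)
print(f.closed)  # the with statement has already closed the file
```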

* `readline` reads a file one line at a time, moving on to the next line with each call ...

```python
f = open('production.csv')
print(f.readline())
print(f.readline())
f.close()
```
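At the end of the file, `readline` returns the empty string `''`. A blank line inside the file still reads as `'\n'`, so the two cases can be told apart. The sketch below uses this fact to read all lines in a loop. `io.StringIO` wraps a string in a file-like object, so the sketch runs without `production.csv`. Its contents are hypothetical:

```python
import io

# A file-like object with made-up contents, standing in for production.csv
f = io.StringIO('x,y\n1,3\n2,5\n')

lines = []
while True:
    line = f.readline()
    if line == '':      # only at the end of the file is the result empty
        break
    lines.append(line)
f.close()

print(lines)
```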

* a `for` loop over the file has a similar effect:

```python
f = open('production.csv')
for row in f:
    print(row)
f.close()
```

* `readlines` (plural!) produces a (python) list of all the lines in the file:

```python
f = open('production.csv')
print(f.readlines())
f.close()
```

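* Note how every line ends in a **newline** character `\n`, except possibly the last one if the file does not end with a newline. `print` adds a newline of its own, which is why the `for` loop above prints an empty line after each row.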
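Stripping the newline and splitting at the comma is enough to parse a simple file like this by hand. It does not handle quoted fields that contain commas themselves, which is one reason the `csv` module exists. Here is a sketch on hypothetical contents, assuming an `x,y` header and integer values:

```python
import io

f = io.StringIO('x,y\n1,3\n2,5\n3,7\n')  # hypothetical contents, standing in for production.csv

header = f.readline().rstrip('\n').split(',')  # the first line holds the column names
xs, ys = [], []
for row in f:                                  # the remaining lines hold the data
    x, y = row.rstrip('\n').split(',')         # drop the newline, split at the comma
    xs.append(int(x))
    ys.append(int(y))
f.close()

print(header, xs, ys)
```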

### CSV: Values Separated by Commas

* The standard library contains the `csv` module for dealing with `csv` files: https://docs.python.org/3/library/csv.html

* `csv.reader` turns each row (including the header) into a list of (string) values:

```python
import csv
with open('production.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
```
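A `csv.reader` is an iterator, so `next` can take the header row off before the remaining rows are converted to numbers. Here is a sketch on hypothetical contents (again assuming an `x,y` header and integer values):

```python
import csv
import io

csvfile = io.StringIO('x,y\n1,3\n2,5\n3,7\n')  # hypothetical contents
reader = csv.reader(csvfile)

header = next(reader)                           # the first row: the column names
pairs = [(int(x), int(y)) for x, y in reader]   # all remaining rows, as numbers
print(header, pairs)
```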

* `csv.DictReader` interprets the header row as **keys** and turns each following row into a (python) **dictionary**, a collection of **key/value** pairs:

```python
with open('production.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    rows = [row for row in reader]

rows
```

* From this list of dictionaries, the $x$- and $y$-values can be extracted, for example by (python) **list comprehension**:

```python
xs = [int(row['x']) for row in rows]
xs
```

```python
ys = [int(row['y']) for row in rows]
ys
```
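Both columns can also be extracted in a single pass. `zip(*...)` "unzips" a list of `(x, y)` pairs into one tuple per column. Here is a sketch on hypothetical contents with `x` and `y` columns:

```python
import csv
import io

csvfile = io.StringIO('x,y\n1,3\n2,5\n3,7\n')  # hypothetical contents
rows = list(csv.DictReader(csvfile))           # same result as [row for row in reader]

# one pass over the rows, then split the pairs into two columns
xs, ys = zip(*[(int(row['x']), int(row['y'])) for row in rows])
xs, ys = list(xs), list(ys)                    # zip yields tuples; convert back to lists
print(rows[0], xs, ys)
```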

### The `pandas` Package

* `pandas` is a large python package for data manipulation and analysis (https://pandas.pydata.org/)
* It contains its own `csv` reader, and many methods for working with the resulting table of data:

```python
import pandas as pd
```

```python
data = pd.read_csv('production.csv')
data
```
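Columns of the resulting table can be selected by name. They come with their own methods, such as `sum` and `tolist`, and can be multiplied elementwise. Here is a sketch on hypothetical contents, read from a string via `io.StringIO` instead of `production.csv` and assuming columns named `x` and `y`:

```python
import io

import pandas as pd

# Hypothetical contents standing in for production.csv
data = pd.read_csv(io.StringIO('x,y\n1,3\n2,5\n3,7\n'))

xs = data['x'].tolist()                     # a column as a plain python list
ys = data['y'].tolist()
sum_x = data['x'].sum()                     # sum of one column
sum_xy = (data['x'] * data['y']).sum()      # elementwise product, then sum
print(data.shape, sum_x, sum_xy)            # shape is (number of rows, number of columns)
```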

## 3. Matrix Algebra

* Plain python has no built-in matrix type, and hence no matrix multiplication or other linear algebra.
* Built-in functions such as `sum` can nevertheless compute the sums needed for least squares fitting:

* $\sum_i x_i$ and $\sum_i y_i$

```python
sum(xs)
```

```python
sum(ys)
```

* $\sum_i x_i^2$

```python
sum(x**2 for x in xs)
```

* $\sum_i x_i y_i$

```python
sum(xs[i] * ys[i] for i in range(len(xs)))
```

```python
sum(x * ys[i] for i, x in enumerate(xs))
```

```python
sum(x * y for x, y in zip(xs, ys))
```
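These sums are exactly the entries of the matrices in a straight-line least squares fit. Assume the model $y \approx a + b x$ (the names $a$ and $b$ are our choice). The design matrix $A$ then has one row $(1, x_i)$ per data point. This gives $A^T A = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}$ and $A^T y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}$, and $(a, b)$ solves the normal equations $A^T A \binom{a}{b} = A^T y$.

The following minimal sketch transposes, multiplies and (for the 2x2 case only) inverts matrices stored as plain python lists of rows. The helper names `transpose`, `matmul` and `inverse2` are hypothetical. The data are made up and chosen to lie exactly on $y = 1 + 2x$. A package such as `numpy` provides these matrix operations directly.

```python
def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Multiply matrices given as lists of rows (columns of A must match rows of B)."""
    Bt = transpose(B)  # the rows of Bt are the columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse2(M):
    """Invert a 2x2 matrix; raises ZeroDivisionError if it is singular."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical data, lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

A = [[1, x] for x in xs]    # design matrix: one row (1, x_i) per data point
y = [[v] for v in ys]       # the y-values as a column vector
At = transpose(A)
AtA = matmul(At, A)         # [[n, sum x], [sum x, sum x^2]]
Aty = matmul(At, y)         # [[sum y], [sum x*y]]
(a,), (b,) = matmul(inverse2(AtA), Aty)  # solve the normal equations
print(AtA, Aty)
print(a, b)
```

For larger systems, solving the equations directly is usually preferred over forming an explicit inverse. For a 2x2 illustration, the inverse formula keeps the steps visible.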

## 4. Plotting

* later ...

## Exercises

* Find and study the **documentation** for those elements of python in this notebook which are new to you.
* Write down any **remaining questions** and bring them to our next class.