# Week 1. Tools (rough notes)
Cognitive Systems for Health Technology Applications<br>
16.1.2019, Sakari Lukkarinen<br>
[Helsinki Metropolia University of Applied Sciences](https://www.metropolia.fi/en/)

## Contents

- [Anaconda](https://www.anaconda.com/)
- [Python](https://www.python.org/)
- [numpy](http://www.numpy.org/)
- [scipy](https://www.scipy.org/)
- [matplotlib](https://matplotlib.org/)
- [pandas](http://pandas.pydata.org/)
- [keras](https://keras.io/)

## [1. What is Anaconda?](https://www.anaconda.com/what-is-anaconda/)

Documentation: https://docs.anaconda.com/

### [Anaconda Distribution](https://www.anaconda.com/distribution/)
<img src="https://www.anaconda.com/wp-content/uploads/2017/08/Anaconda-Distribution-Diagram.png" alt="Anaconda Distribution Diagram" width="400">

### [Data science libraries](https://docs.anaconda.com/anaconda/packages/pkg-docs/)
<img src="https://www.anaconda.com/wp-content/uploads/2017/08/DataScienceLibraries-01.png" alt="Data Science Libraries" width="400">

### [Anaconda Navigator](https://docs.anaconda.com/anaconda/navigator/)
<img src="https://www.anaconda.com/wp-content/uploads/2018/06/2018-06-navigator-macos.png" alt="Anaconda Navigator" width="400">
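The Anaconda Distribution bundles many of the libraries listed in the Contents. A quick way to check which of them are installed, and in which version, is sketched below. It only reads package metadata through the standard-library `importlib.metadata` module (Python 3.8 or newer is assumed) and does not import the packages themselves; the helper name `installed_versions` is our own. From an Anaconda prompt, `conda list` gives similar information.

```python
import importlib.metadata as metadata

# Libraries from the Contents list (assumed to match their package names)
LIBRARIES = ["numpy", "scipy", "matplotlib", "pandas", "keras"]


def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(LIBRARIES).items():
    print(f"{name:12s} {ver or 'not installed'}")
```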



## [2. Python tutorial](https://docs.python.org/3/tutorial/index.html)

### [Importing Python modules](http://effbot.org/zone/import-confusion.htm)

In [None]:
%pylab inline
import numpy as np               # numerical arrays
import matplotlib.pyplot as plt  # plotting
import pandas as pd              # tabular data (DataFrame)
from pandas import read_csv      # import a single function by name
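The cell above uses two import forms: `import numpy as np` binds a module to an alias, while `from pandas import read_csv` copies a single name into the current namespace. The same forms are shown here with the standard-library `math` module, so the sketch runs without extra packages.

```python
import math                # whole module; names are reached as math.sqrt
import math as m           # same module bound to a shorter alias
from math import sqrt      # copy one name into the current namespace

print(math.sqrt(16), m.sqrt(16), sqrt(16))  # 4.0 4.0 4.0
print(m is math)           # True: the alias refers to the same module object
```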

### [if statements](https://docs.python.org/3/tutorial/controlflow.html#if-statements)

In [None]:
x = int(input("Please enter an integer: "))

if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
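Because `input()` waits for the user, the branch logic above is awkward to rerun with different values. One option, sketched here with a hypothetical helper `describe`, is to move the same `if`/`elif`/`else` chain into a function that returns its message instead of printing it.

```python
def describe(x):
    """Classify an integer with the same if/elif/else chain as above."""
    if x < 0:
        return 'Negative changed to zero'
    elif x == 0:
        return 'Zero'
    elif x == 1:
        return 'Single'
    else:
        return 'More'


for value in [-5, 0, 1, 42]:
    print(value, describe(value))
```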

### [for statements](https://docs.python.org/3/tutorial/controlflow.html#for-statements)

In [None]:
# Measure some strings:
words = ['cat', 'window', 'defenestrate']
for w in words:
    print(w, len(w))

In [None]:
for i in range(5):
    print(i)

In [None]:
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print(i, a[i])
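Looping over `range(len(a))` works, but Python also offers `enumerate()`, which yields index and item together, and `zip()`, which walks several sequences in parallel. A short sketch with the same list:

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']

# enumerate() yields (index, item) pairs, replacing range(len(a))
for i, word in enumerate(a):
    print(i, word)

# zip() pairs up items from two sequences of the same length
lengths = [len(word) for word in a]
for word, n in zip(a, lengths):
    print(word, n)
```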

### [Functions](https://docs.python.org/3/tutorial/controlflow.html#defining-functions)

In [None]:
def fib(n):    # write Fibonacci series up to n
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a + b
    print()

# Now call the function we just defined:
fib(2000)
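`fib` prints its result, so the numbers cannot be reused by later code. A variant that returns a list instead is sketched below; the name `fib_list` is our own.

```python
def fib_list(n):
    """Return the Fibonacci numbers below n as a list."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result


print(fib_list(100))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```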

### [Lists](https://docs.python.org/3/tutorial/introduction.html#lists)

In [None]:
squares = [1, 4, 9, 16, 25]
squares
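Lists support indexing from either end, slicing, concatenation with `+`, and (unlike strings) changing items in place. These examples follow the Python tutorial's section on lists:

```python
squares = [1, 4, 9, 16, 25]

print(squares[0])     # first item: 1
print(squares[-1])    # last item: 25
print(squares[-3:])   # last three items: [9, 16, 25]
print(squares + [36, 49, 64, 81, 100])  # concatenation returns a new list

cubes = [1, 8, 27, 65, 125]   # 4**3 is 64, not 65
cubes[3] = 64                 # lists are mutable: replace the wrong value in place
cubes.append(216)             # add 6**3 at the end
print(cubes)                  # [1, 8, 27, 64, 125, 216]
```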

### [More on lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)

In [None]:
squares = []
for x in range(10):
    squares.append(x**2)

squares

In [None]:
squares = [x**2 for x in range(10)]

squares
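A list comprehension can also filter with a trailing `if` clause. The comprehension and the explicit loop below build the same list of even squares:

```python
# Keep only the squares of even numbers
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)   # [0, 4, 16, 36, 64]

# Equivalent explicit loop
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)
```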

### [Tuples](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences)

In [None]:
t = 12345, 54321, 'hello!'
t[0]

In [None]:
x, y, z = t
print(x, y, z)
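Tuples differ from lists in that they are immutable: assigning to an item raises a `TypeError`. Tuple packing and unpacking, as in `x, y, z = t`, also give a compact way to swap two variables:

```python
t = 12345, 54321, 'hello!'

try:
    t[0] = 88888          # tuples cannot be changed in place
except TypeError as err:
    print('TypeError:', err)

# Pack (y, x) into a tuple, then unpack it into x and y
x, y = 1, 2
x, y = y, x
print(x, y)   # 2 1
```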

### [Sets](https://docs.python.org/3/tutorial/datastructures.html#sets)

In [None]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
print(basket)                      # show that duplicates have been removed

In [None]:
'orange' in basket   
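Sets also support the usual mathematical operations. The example below, adapted from the Python tutorial, compares the letters of two words; since sets are unordered, the printed order may vary.

```python
a = set('abracadabra')
b = set('alacazam')

print(a - b)   # letters in a but not in b
print(a | b)   # letters in a or b (union)
print(a & b)   # letters in both a and b (intersection)
print(a ^ b)   # letters in a or b but not both (symmetric difference)
```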

### [Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)

In [None]:
tel = {'jack': 4098, 'sape': 4139}
tel['guido'] = 4127
tel

In [None]:
tel['jack']
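Looking up a missing key, such as `tel['irv']`, raises a `KeyError`. `dict.get()` returns `None` (or a chosen default) instead, `del` removes an entry, and `items()` iterates over key-value pairs:

```python
tel = {'jack': 4098, 'sape': 4139, 'guido': 4127}

print(tel.get('irv'))      # missing key: None instead of a KeyError
print(tel.get('irv', 0))   # missing key with a default value: 0
del tel['sape']            # remove an entry
for name, number in tel.items():
    print(name, number)
print(sorted(tel))         # keys in sorted order: ['guido', 'jack']
```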

## [3. NumPy tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html)

### [Array creation](https://docs.scipy.org/doc/numpy/user/quickstart.html#array-creation)

In [None]:
import numpy as np
a = np.array([2,3,4])
a

In [None]:
np.zeros((3, 4))   # 3 x 4 array of zeros

In [None]:
from numpy import pi
x = np.linspace(0, 2*pi, 10)   # 10 evenly spaced points from 0 to 2*pi (both ends included)
f = np.sin(x)
f

In [None]:
f[0:4]   # first four elements (indices 0 to 3)

In [None]:
f[-1]    # last element
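Beyond creation and slicing, NumPy arrays can be reshaped, combined with elementwise arithmetic, reduced along an axis, and filtered with boolean masks. A small sketch:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3 x 4 array holding 0..11 row by row
print(a.shape)                    # (3, 4)
print(a * 2)                      # arithmetic acts element by element
print(a.sum(axis=0))              # column sums: [12 15 18 21]
print(a[a % 2 == 0])              # boolean mask selects the even entries
```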

## [4. Matplotlib tutorial](https://matplotlib.org/tutorials/index.html)

### [pyplot tutorial](https://matplotlib.org/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py)

In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.figure()
plt.plot(x, y, 'o:')
plt.xlabel('Index')
plt.ylabel('some numbers')
plt.show()
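The cell above uses pyplot's state-based interface (`plt.plot`, `plt.xlabel`). Matplotlib also offers an object-oriented style in which `plt.subplots()` returns a figure and an axes object whose methods are called directly. The same plot in that style, with a legend added (the label text is our own):

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()            # one figure containing one axes
ax.plot(x, y, 'o:', label='squares')
ax.set_xlabel('Index')
ax.set_ylabel('some numbers')
ax.legend()
plt.show()
```

Both styles draw the same figure; the object-oriented one is easier to follow once a figure has several subplots.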

## [5. Ten minutes to Pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)

### [IO tools](http://pandas.pydata.org/pandas-docs/stable/io.html)

Dataset description: http://archive.ics.uci.edu/ml/datasets/Heart+Disease

In [None]:
import pandas as pd

url = r'http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data'
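dataframe = pd.read_csv(url,
                        sep=',',          # comma-separated values
                        header=None,      # the file has no header row
                        index_col=None,   # keep the default 0..n-1 row index
                        na_values='?')    # '?' marks a missing value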
dataframe.head()
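To see what `header=None` and `na_values='?'` do without downloading the file, the same call can be made on a small in-memory string. The rows below are made up for illustration in the file's format; `io.StringIO` stands in for the URL, and `isna().sum()` counts the missing values per column.

```python
import io

import pandas as pd

# Made-up rows in the dataset's format: no header line, '?' for a missing value
raw = io.StringIO(
    "63.0,1.0,1.0,145.0\n"
    "67.0,1.0,4.0,?\n"
    "37.0,1.0,3.0,130.0\n"
)
df = pd.read_csv(raw, header=None, na_values='?')
print(df)                 # columns are labelled 0, 1, 2, 3
print(df.isna().sum())    # missing values per column: only column 3 has one
```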

In [None]:
# There are some missing values (NaN)
dataframe.tail()

### [Filling missing data with some values](http://pandas.pydata.org/pandas-docs/stable/missing_data.html#filling-with-a-pandasobject)

In [None]:
# Fill missing values with the median of each column
dataframe = dataframe.fillna(dataframe.median())
dataframe.tail()
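`fillna` returns a new DataFrame rather than changing the original, which is why its result is assigned back to `dataframe`. A small made-up frame shows that each column's NaNs are replaced by that column's own median (NaNs are skipped when the median is computed):

```python
import numpy as np
import pandas as pd

# Made-up values; one NaN in each column
df = pd.DataFrame({'ca': [0.0, 3.0, np.nan, 1.0],
                   'thal': [6.0, np.nan, 3.0, 3.0]})
medians = df.median()          # column-wise medians: ca 1.0, thal 3.0
filled = df.fillna(medians)    # each NaN gets its own column's median
print(filled)
```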

### (Re)name the columns

In [None]:
name_list = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
             'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num']

dataframe.columns = name_list
dataframe.head()
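Assigning to `dataframe.columns` replaces every label at once. To rename only some columns, `DataFrame.rename` takes a mapping from old to new labels; alternatively, `pd.read_csv` accepts the labels directly through its `names` parameter. A sketch of the `rename` route on a made-up two-column frame:

```python
import pandas as pd

# Integer column labels 0 and 1, as produced by read_csv(..., header=None)
df = pd.DataFrame([[63.0, 1.0], [67.0, 1.0]])
df = df.rename(columns={0: 'age', 1: 'sex'})   # map old label -> new label
print(df.columns.tolist())   # ['age', 'sex']
```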

### [Descriptive statistics](http://pandas.pydata.org/pandas-docs/stable/10min.html#viewing-data)

In [None]:
# Full descriptive statistics
dataframe.describe()

In [None]:
# Mean value of age column
dataframe['age'].mean()
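Statistics can also be computed per group. Assuming, as in the UCI description, that `num` is the diagnosis (0 = no heart disease, 1 to 4 = disease present), `groupby` gives the mean age for each diagnosis value and `value_counts` the number of patients per value. The rows below are made up:

```python
import pandas as pd

# Made-up rows with two of the dataset's columns
df = pd.DataFrame({'age': [63, 67, 37, 41, 56],
                   'num': [0, 2, 0, 0, 1]})
print(df.groupby('num')['age'].mean())   # mean age per diagnosis value
print(df['num'].value_counts())          # number of rows per diagnosis value
```

On the real data the same two lines would be applied to `dataframe` after the columns have been named.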

### [Visualization](http://pandas.pydata.org/pandas-docs/stable/visualization.html)

In [None]:
plt.figure()
dataframe['age'].hist(bins=np.arange(25, 81, 5))  # pandas draws with matplotlib; 5-year bins from 25 to 80
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age histogram')
plt.show()
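Matplotlib's `hist` (which pandas calls here) computes its bins with NumPy, so `np.histogram` with the same edges should return the bar heights as numbers. With `np.arange(25, 81, 5)` the edges are 25, 30, ..., 80, giving eleven 5-year bins; the last bin includes its right edge. The ages below are made up:

```python
import numpy as np

ages = np.array([29, 34, 41, 41, 57, 63, 77])   # made-up ages
bins = np.arange(25, 81, 5)                      # edges 25, 30, ..., 80
counts, edges = np.histogram(ages, bins=bins)

# One line per bin: left edge, right edge, number of ages in the bin
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {n}")
```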

In [None]:
# Try your own!