**UNDER CONSTRUCTION**<br>
Version 01.05.2019, A. S. Lundervold

# Python, NumPy, Pandas, Matplotlib and PyTorch

If you're able to successfully run through this notebook then your Python environment is correctly configured.

Go to **ML - Part 0 - Getting started** in [Canvas](https://hvl.instructure.com) for more information about the software we'll use in the course. If you get any error messages when running the code in this notebook, go to https://github.com/alu042/DAT158ML for instructions.
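
Before running the rest of the notebook, a quick check like the one below can tell you which of the libraries are importable and in which versions. This is only a minimal sketch: the helper `check_packages` and the `REQUIRED` mapping are hypothetical names introduced here, and the mapping from import names to distribution names is an assumption you may need to adjust for your environment.

```python
import importlib.util
from importlib import metadata

# Import names used in this notebook mapped to their distribution (pip/conda) names.
# This mapping is an assumption for illustration; adjust it to your environment.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
}


def check_packages(required):
    """Return {import_name: version string, or None if the package is missing}."""
    report = {}
    for import_name, dist_name in required.items():
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under that name
            report[import_name] = "unknown"
    return report


for name, version in check_packages(REQUIRED).items():
    print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
```

If a library is reported as `NOT INSTALLED`, the instructions linked above are the place to look.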

## How to use Jupyter Notebook?

[Jupyter Notebook](http://jupyter.org/) is a convenient tool for experimenting with code. Text cells are written in Markdown (which may include HTML), and code cells are written in Python.

Use the arrow keys to navigate between cells. Press ENTER on a cell to enter edit mode, and ESC to return to command mode. (Try it now!)

In [None]:
print("This is a Jupyter cell containing Python code. Hit 'Run' in the menu to run the cell. ")

You can also run cells using **Shift+Enter** (run the cell and move to the next one) and **Ctrl+Enter** (run the cell and stay on it). Try running the above cell using both of these.

Use Jupyter's Help menu above for more information.

We'll use Jupyter Notebook for most of the coding in DAT158-ML, so you'll become fluent with it over time.

Here's a nice tutorial on Jupyter Notebook that I recommend you at least skim: [Jupyter Notebook Tutorial: The Definitive Guide](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook).

### Exercise
- Experiment with *Tab completion* and *tooltips* in Jupyter.
- Read about Jupyter *magic commands*.

Hint: Google. Or have a look <a href="http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/A%20quick%20tour%20of%20IPython%20Notebook.ipynb">here</a>.

# Import libraries

These are libraries we'll use frequently:

In [None]:
# To display plots directly in notebooks:
%matplotlib inline

In [None]:
# A commonly used plotting library:
import matplotlib
import matplotlib.pyplot as plt

In [None]:
# An extension of matplotlib that can generate even nicer plots:
import seaborn as sns

In [None]:
# A library for efficient manipulation of matrices (and more):
import numpy as np

In [None]:
# To read, write and process tabular data:
import pandas as pd

In [None]:
# For machine learning:
import sklearn

In [None]:
# For neural networks ("deep learning")
# This has been commented out as we won't need it until later in the course

#import torch
#import torchvision

# Test libraries

**NB:** The purpose of the following is to test your installation. Don't worry if things don't make much sense to you right now. It'll all become familiar during the course.

## `Numpy`

In [None]:
import numpy as np

In [None]:
a = np.array([1, 2, 3])
print(type(a))

In [None]:
e = np.random.random((3,3))
e
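
If the cells above ran without errors, a few more operations can serve as a slightly broader check of the Numpy installation. The sketch below is only illustrative; the array `m` is a name introduced here and is not used elsewhere in the notebook.

```python
import numpy as np

a = np.array([1, 2, 3])
m = np.arange(9).reshape(3, 3)   # 3x3 matrix with the entries 0..8

print(a.shape, m.shape)          # shapes are tuples
print(a * 2)                     # elementwise multiplication
print(m @ a)                     # matrix-vector product
print(m.sum(axis=0))             # sum over rows, giving one value per column
```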

## `matplotlib`: a simple plot 

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

This should result in a figure displaying a sine function.

In [None]:
# Data to be plotted (generated using Numpy)
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

# Create a figure of a certain size
f = plt.figure(figsize=(8,4))

# Plot t versus s
plt.plot(t, s)

# Add title and labels:
plt.title('A simple plot')
plt.xlabel('time (s)')
plt.ylabel('voltage')

# Show the plot:
plt.show()

## `Seaborn`: a more advanced plot

In [None]:
import seaborn as sns

Source: [Link](https://seaborn.pydata.org/examples/scatterplot_categorical.html)

In [None]:
sns.set(style="whitegrid", palette="muted")

# Load the example iris dataset
iris = sns.load_dataset("iris")

# "Melt" the dataset to "long-form" or "tidy" representation
iris = pd.melt(iris, "species", var_name="measurement")

# Set up figure
f, ax = plt.subplots(figsize=(8,8))

# Draw a categorical scatterplot to show each observation
sns.swarmplot(x="measurement", y="value", hue="species", data=iris, size=5, ax=ax)

plt.show()

## `Pandas`

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('0.0-test_data.csv')

In [None]:
df.head()

In [None]:
df['age'].hist()
plt.title("Histogram of age")
plt.xlabel("Age")
plt.show()

## `scikit-learn`: machine learning

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [None]:
data = datasets.load_breast_cancer()

In [None]:
X = data['data']
y = data['target']
features = data['feature_names']
labels = data['target_names']

In [None]:
print(features)

In [None]:
print(labels)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
rf = RandomForestClassifier(n_estimators=100)

In [None]:
rf.fit(X_train, y_train)

In [None]:
predictions = rf.predict(X_test)

In [None]:
accuracy_score(y_test, predictions) * 100
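
To make the last number less of a black box: accuracy is the fraction of test samples for which the predicted label equals the true label. The sketch below computes it in plain Python. The function `accuracy` and the example labels are hypothetical and not taken from the breast cancer data; in practice you would keep using `accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for illustration only
y_true_example = [0, 1, 1, 0, 1]
y_pred_example = [0, 1, 0, 0, 1]

# Four of the five predictions match; multiply by 100 to get a percentage
print(accuracy(y_true_example, y_pred_example) * 100)
```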

## `PyTorch`: Advanced machine learning / neural networks

> YOU CAN SKIP THIS FOR NOW. We won't need PyTorch until the end of the course. If you want to run the code below, you'll have to install PyTorch in your DAT158 environment by running:
```
conda activate dat158
conda install -c pytorch pytorch torchvision
```

The example below is from PyTorch's "Deep learning with PyTorch: A 60 minute blitz": https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html. In the neural networks part of DAT158 you'll learn what neural networks are and how to build them essentially from scratch using PyTorch.

**To test your installation you only have to run the first cell below.** Run the rest only if you're curious. :-)

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # built-in next(); the .next() method is not available in newer PyTorch versions

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
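
The `img / 2 + 0.5` step in `imshow` undoes the `transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` applied when loading the data. The sketch below shows the arithmetic on single pixel values in plain Python. The helper names `normalize` and `unnormalize` are introduced here for illustration and are not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel normalization: subtract the mean, divide by the standard deviation."""
    return (x - mean) / std


def unnormalize(x, mean=0.5, std=0.5):
    """Inverse of normalize; with mean = std = 0.5 this equals x / 2 + 0.5."""
    return x * std + mean


# Pixel values produced by ToTensor lie in [0, 1]
for pixel in (0.0, 0.25, 1.0):
    z = normalize(pixel)
    print(pixel, "->", z, "->", unnormalize(z))
```

With mean and standard deviation both 0.5, values in [0, 1] are mapped to [-1, 1], and unnormalizing maps them back so that `plt.imshow` receives values in the range it expects.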

In [None]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
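
The `16 * 5 * 5` in `fc1` may look arbitrary. It follows from how each layer shrinks the 32x32 CIFAR-10 images. The sketch below traces the spatial size through the layers using the standard output-size formula for convolution and pooling without dilation. The helper `conv_out` is a hypothetical name introduced here, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (square input, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                  # CIFAR-10 images are 32x32 pixels
size = conv_out(size, kernel=5)            # conv1: nn.Conv2d(3, 6, 5)
size = conv_out(size, kernel=2, stride=2)  # pool:  nn.MaxPool2d(2, 2)
size = conv_out(size, kernel=5)            # conv2: nn.Conv2d(6, 16, 5)
size = conv_out(size, kernel=2, stride=2)  # pool again

# conv2 has 16 output channels, each of spatial size `size` x `size`
flat_features = 16 * size * size
print(size, flat_features)
```

The final spatial size is 5, so flattening the 16 channels gives 16 * 5 * 5 = 400 inputs to the first fully connected layer.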

In [None]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

In [None]:
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

In [None]:
dataiter = iter(testloader)
images, labels = next(dataiter)  # built-in next(); the .next() method is not available in newer PyTorch versions

# show images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

In [None]:
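# Run the test images shown above through the network;
# otherwise `outputs` would still hold the last training batch
outputs = net(images)
_, predicted = torch.max(outputs, 1)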

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

In [None]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))