# TM351 VM Installation Test

This notebook provides a series of tests to ensure that the virtual machine is running correctly.

Run each cell in turn by clicking the play button or by using the keyboard shortcut `shift-return`. (A full list of keyboard shortcuts is available from the *Help* menu or via the keyboard shortcut `ESC-h`.)

The cells should run without error.

## Versions

Display the VM build version and build time, along with the database service versions and the `pandas` version.

In [None]:
!cat /opt/version.txt

In [None]:
! psql --version

In [None]:
! mongod --version

In [None]:
import pandas as pd
pd.__version__
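
The version checks above can also be collected from Python rather than with `!` shell escapes. The following is a minimal sketch, not part of the original test suite: `first_line` is a hypothetical helper name, and it assumes only that each tool prints its version on the first line of output when called with `--version`.

```python
import subprocess

def first_line(cmd):
    """Run cmd (a list of arguments) and return the first line it prints.

    Returns None if the command cannot be started (for example, if the
    tool is not installed) or if it prints nothing.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print versions to stderr
            universal_newlines=True,
            check=False,
        )
    except OSError:
        # Raised when the executable is missing or cannot be run
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None

# Report the same database tools checked in the cells above
for cmd in (["psql", "--version"], ["mongod", "--version"]):
    print(" ".join(cmd), "->", first_line(cmd))
```

A result of `None` for either command suggests the corresponding tool is not on the path.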

## Test Core Packages

In [None]:
import pandas as pd

In [None]:
import matplotlib.pyplot as plt

In [None]:
#When this cell is run, a simple line chart should be displayed
plt.plot([1,2,3,4])
plt.ylabel('some numbers')
plt.show()

## Database tests

Check that the database services are running as required.

In [None]:
#SET DATABASE CONNECTION STRINGS
import os
if os.environ.get('DOCKERBUILD')!='1':
    #Database connection strings for monolithic VM
    PGCONN='postgresql://tm351:tm351@localhost:5432/tm351'
    MONGOHOST='localhost'
    MONGOPORT=27351
else:
    #Database connection strings for docker build
    PGCONN='postgresql://postgres:PGPass@postgres:5432/tm351'
    MONGOHOST='mongodb'
    MONGOPORT=27017
MONGOCONN='mongodb://{MONGOHOST}:{MONGOPORT}/'.format(MONGOHOST=MONGOHOST,MONGOPORT=MONGOPORT)
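
Before running the database tests, it can help to confirm that something is listening on the host and port named in each connection string. The sketch below uses only the standard library; `service_reachable` is a hypothetical helper, and the two connection strings are copied from the monolithic VM settings above. A successful TCP connection shows only that the port is open, not that the database accepts the credentials.

```python
import socket
from urllib.parse import urlsplit

def service_reachable(conn_string, timeout=2.0):
    """Return True if a TCP connection to the host and port in conn_string succeeds."""
    parts = urlsplit(conn_string)
    if parts.hostname is None or parts.port is None:
        # No explicit host or port in the connection string, so nothing to test
        return False
    try:
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False

# Connection strings as defined for the monolithic VM in the cell above
checks = [
    ("PostgreSQL", "postgresql://tm351:tm351@localhost:5432/tm351"),
    ("MongoDB", "mongodb://localhost:27351/"),
]
for name, conn in checks:
    print(name, "reachable:", service_reachable(conn))
```

In the notebook itself, `service_reachable(PGCONN)` and `service_reachable(MONGOCONN)` would test whichever settings the cell above selected.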

### PostgreSQL

Check the connection to the PostgreSQL server.

In [None]:
from sqlalchemy import create_engine
engine = create_engine(PGCONN)

In [None]:
#Run a simple query on a default table
from pandas import read_sql_query as psql

psql("SELECT table_schema,table_name FROM information_schema.tables \
    ORDER BY table_schema,table_name LIMIT 3;", engine)
#A table containing three rows should appear

#### SQL Cell Magic

We can use cell magics to allow the writing of SQL statements within a code cell flagged appropriately.

To use the cell magic, we first need to load the extension and give it a connection string, as in the following cell (though we could perhaps autoload this in every notebook?).

In the following example, magic SQL cells are configured to run as the user named in the `PGCONN` connection string defined above:

In [None]:
%load_ext sql
%sql {PGCONN}

In [None]:
%%sql
SELECT table_schema,table_name FROM information_schema.tables ORDER BY table_schema,table_name LIMIT 1;

Test the ability to pull the result of a SQL query directly into a dataframe:

In [None]:
demo=%sql SELECT table_schema FROM information_schema.tables LIMIT 3
demo

### MongoDB

Test that the MongoDB database server is running. This example also shows how to connect to the database.

In [None]:
import pymongo
from pymongo import MongoClient

In [None]:
# Open a connection to the Mongo server using the connection string defined above.
# The accidents database and its collection are opened in a later cell.
c = pymongo.MongoClient(MONGOCONN)

By default, this server should contain an *accidents* database along with any default databases.

In [None]:
c.database_names()
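
The `database_names()` method used above was deprecated in later PyMongo 3.x releases and removed in PyMongo 4, where `list_database_names()` replaces it. If the installed PyMongo version is uncertain, a small wrapper can call whichever method is available. This is a sketch only; `get_database_names` is a hypothetical helper, not part of PyMongo.

```python
def get_database_names(client):
    """Return the database names using whichever PyMongo API the client provides."""
    # Look the method up on the class, not the instance: attribute access on a
    # MongoClient instance returns a Database object for any unknown name, so
    # hasattr() on the instance cannot tell the two APIs apart.
    if callable(getattr(type(client), "list_database_names", None)):
        return client.list_database_names()
    # Fall back to the older method name
    return client.database_names()
```

With the client created above, `get_database_names(c)` should return a list that includes `accidents`.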

In [None]:
db = c.accidents
accidents = db.accidents
accidents.find_one()

### Sharded MongoDB server

A sharded mongo server, populated with content, is also provided:

- start the sharded server: `!/etc/mongo-shards-up`
- stop the sharded server: `!/etc/mongo-shards-down`

In [None]:
#Quick way to kill all mongo processes...
!sudo killall mongod
!sudo killall mongos
#...then bring the base mongo service on port 27351 back up
!sudo systemctl restart mongodb

In [None]:
!sudo /etc/mongo-shards-down
!sudo /etc/mongo-shards-up

Once again, an *accidents* database should be available as well as administrative databases.

In [None]:
c2 = pymongo.MongoClient('mongodb://localhost:27017/')
c2.database_names()

In [None]:
#Test a query on the sharded database
db = c2.accidents
accidents = db.accidents
accidents.find_one()

In [None]:
#Turn the sharded server off
!/etc/mongo-shards-down

## Chart Tests

Viewing data in charts provides a handy way of actually looking at your data...

In [None]:
import seaborn

from numpy.random import randn
data = randn(75)
plt.hist(data);
#Running this cell should produce a histogram.

In [None]:
import numpy as np

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot();
#Running this cell should produce a line chart.

## Maps

Several of the data investigations may benefit from displaying data on a map. Test that the mapping functions work:

In [None]:
import folium
#Note - a network connection is required to retrieve the map tiles,
#so the map will not display if you are offline.
osmap = folium.Map(location=[52.01, -0.71], zoom_start=13,height=500,width=800)
folium.Marker([52.0250, -0.7056], popup='The <b>Open University</b> campus.').add_to(osmap)
osmap

In [None]:
#Example of how to explicitly save map as HTML file
osmap.save('test.html')

## Other VM Services

- [OpenRefine - by default on host port 35181](http://127.0.0.1:35181)

From the notebook home page (`/tree`) you should also be able to launch a terminal as well as a new notebook.