# Network speed test

## Returning to the 20th Century

Jupyter is all well and good, but sometimes what we want are simple Python scripts and a traditional scientific interface. This is where Spyder comes in...

## Starting Spyder

Hit the "Activities" icon in the top-right and type "spyder"<Return>

[Spyder](https://www.spyder-ide.org/) has been growing over the last couple of years and is quite well set up for scientific Python...

On the left you will see a large editor window, like any other IDE. On the bottom right is an IPython console. This is really just the text version (and the origin) of Jupyter. You can use it the same way, except for interactive graphs and plotting (which you can write to a file instead, say). The top right window provides a handy way of accessing object documentation.

# Virtualenv
## Building sandcastles

Virtualenv provides a means of sandboxing a Python installation: setting up a directory containing a particular Python interpreter and all the modules you want, and using everything from there.

* Press the Windows key and start typing "Terminal" - when you see a black computer screen icon, click it.

Put up your arrows when done!

`venv` is a Python module for setting up virtual environments, where you can create an "environment" and install packages completely separately from the system or any other environment. Basically, it's a directory with a load of Python modules.

* In the `python-course` directory execute **`python3 -m venv env`**
  * This creates a new subdirectory called "env"
* Execute **`source env/bin/activate`**
  * This loads the new environment

On Windows, this is basically the same, but the activation command is `env\Scripts\activate` (no `source`, backslashes instead of forward slashes, and `Scripts` instead of `bin`). Mac is the same as Linux.

However, I recommend a tool called Anaconda, which can vastly simplify Python configuration on Windows and Mac. As I only work with Linux at present, I haven't needed it, but it is highly recommended for those systems. Anaconda is a "distribution" of Python, which bundles a version of the Python interpreter, a whole load of modules and a similar, but slightly different, way of working with environments. Instead of `pip`, it uses a package manager called `conda`.

* Run **`pip3 install sympy spyder`**
  * Notice on the third-last line, it is working inside `env`

`pip` is Python's home-grown package manager. `pip3` is the Python3 version, which is the one we want in this course. It installs modules from [PyPI](https://pypi.python.org/pypi), the official online repository for modules. It is similar to, say, *packagist* for PHP, *rubygems* for Ruby or *Hackage* for Haskell. 

* When you want to return to the normal system Python, you can execute **`deactivate`**
* (but don't yet)
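You can also check from inside Python itself that the environment is active. Here is a minimal sketch; the helper name `in_virtualenv` is our own. It compares `sys.prefix`, which points at the active environment, with `sys.base_prefix`, which points at the installation the environment was created from. Inside a `venv` the two differ. This check is aimed at `venv`, and older third-party tools may behave differently.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a venv."""
    # In a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


print("Python executable:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```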

We want to use Spyder within the virtual environment. This means we have to start it from the terminal. To do so, type ``spyder3 & <Return>``. The ampersand (&) tells Linux that we want to continue entering commands while Spyder is running. Go back to the terminal and hit ``<Return>`` once or twice and you should see the normal prompt waiting for more input.

Since many of you are computer scientists, and I'm not, I am going to take a risk and theme this session on measuring latency between a server and client. Please ignore the simplistic methodology; if you find a few minutes spare, feel free to improve the approach! However, the aim here is to show you some tools, so don't worry too much about the numbers.

More broadly, for those who are coming from an engineering background, I can say from experience that if you start to scale up your number of simulations, you will find yourself looking for bottlenecks and profiling. There are better, tailored utilities for doing so, but this should give you an idea about how to think about the problem.

# Latency
## Time tracking

Return to Spyder and open up **`006-latency/network_test_client.py`**

Nothing here should be too shocking. On line 21, you should remove the # and replace TBC with the number on the board beside me. As this is testing on a local server, please keep connection attempt values low, bearing in mind we're multiplying several loops together.

So far, this code simply sets up a few parameters and the logger, as we saw earlier. In the gap near the bottom we will put some code to calculate a variable `return_time`: the total delay (from send to receive) over `repeats` bounces of 1 KiB off a server, all on a single socket connection. Dividing it by `repeats` gives the average delay per bounce.

You can follow along typing with me or, if you find it hard to concentrate on what I am saying while you type, open up **`network_test_client_partial_1.py`**

Now, I will start a function that we wish to time. Anyone who feels I am polluting the purity of the timing function by adding extra calls is welcome to improve their version!

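```python
import os

# Define our actual measured operation
def round_trip(skt):
    # Create a random message to test our connection
    payload = os.urandom(1024)

    # Network-limited part
    skt.sendall(payload)

    # recv may return fewer bytes than requested, so keep reading
    # until the whole echo has arrived (or the server closes)
    received_payload = b""
    while len(received_payload) < len(payload):
        chunk = skt.recv(len(payload) - len(received_payload))
        if not chunk:
            break
        received_payload += chunk

    if received_payload != payload:
        raise IOError("We received an incorrect echo")
```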

We define this as a function that takes a TCP socket. It creates 1024 bytes of random data (`os.urandom` is the crypto-quality generator, which is unnecessary here but useful for you to be aware of). We send our payload off down the wire and expect to see it echoed back to us. For simplicity, any exceptions propagate straight out of the function, but we will catch them later.
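If the course server isn't reachable, you can try `round_trip` against your own machine. Here is a minimal local echo server, sketched with the standard-library `socketserver` module. `EchoHandler` is a hypothetical name, and `round_trip` is restated so the block runs on its own. Port `0` asks the operating system to pick any free port.

```python
import os
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Send every byte received straight back to the client."""

    def handle(self):
        while True:
            data = self.request.recv(4096)
            if not data:  # the client closed the connection
                break
            self.request.sendall(data)


def round_trip(skt):
    # Same idea as above: send 1 KiB of random bytes and expect them back
    payload = os.urandom(1024)
    skt.sendall(payload)
    received = b""
    while len(received) < len(payload):
        chunk = skt.recv(len(payload) - len(received))
        if not chunk:
            break
        received += chunk
    if received != payload:
        raise IOError("We received an incorrect echo")


# Bind to localhost on a free port and serve in a background thread
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(address=(host, port), timeout=5) as skt:
    round_trip(skt)
    print("Echo OK from", host, port)

server.shutdown()
server.server_close()
```

While a server like this is running, pointing the client's `host` and `port` parameters at it should let the rest of the session work locally. The numbers will, of course, say little about real network latency.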

We even raise our own exception, `IOError`, when there is a problem with the payload. (In Python 3, `IOError` is simply an alias of `OSError`, the base class for most I/O-related exceptions.) It can be helpful to subclass `IOError` (or a more appropriate error class) to create more specific exceptions: [see this link for a tutorial](https://docs.python.org/3/tutorial/errors.html#tut-userexceptions). Remember that an `except` clause decides whether to handle an exception or pass it on based on the exception's class, including its superclasses. Creating your own class allows you to catch it, and only it, in a `try-except`.
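Here is a small sketch of such a subclass. `EchoMismatchError` and `check_echo` are hypothetical names for illustration, not part of the course files. It shows that a specific handler catches only our exception, while a broader `OSError` handler would still catch it too.

```python
class EchoMismatchError(IOError):
    """Raised when the echoed payload differs from what we sent."""


def check_echo(sent, received):
    if received != sent:
        raise EchoMismatchError(
            "sent {} bytes, got {} back".format(len(sent), len(received))
        )


try:
    check_echo(b"abc", b"abd")
except EchoMismatchError as e:
    # Only our own exception lands here; other OSErrors pass through
    print("Caught:", e)

# Because it subclasses IOError (i.e. OSError), broader handlers still see it
print(issubclass(EchoMismatchError, OSError))  # prints True
```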

```python
# Use a `with` context to make sure the socket automatically
# gets cleaned up
with socket.create_connection(address=(host, port), timeout=timeout) as skt:
    logger.info("Created connection")
    # we will do some task in this gap
    logger.info("Completed trial")
```

We write this after our function, as part of the main flow. I should mention that a more common pattern in Python, even in scripts, is to have virtually no code at global scope like this. Instead, you would create a `main` or `run` function, as in many compiled languages, and your only top-level call would be to run it. If you're feeling even more adventurous, you would wrap this in a class to create an application object, then call its `run` method, say.
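A minimal sketch of that `main` pattern follows. The body is a placeholder, not the course script. The `if __name__ == "__main__":` guard means `main()` runs when the file is executed directly, but not when it is imported from another module.

```python
import logging

logger = logging.getLogger(__name__)


def main():
    """Everything that used to live at the top level goes in here."""
    logging.basicConfig(level=logging.INFO)
    logger.info("Starting latency test")
    # ... set up parameters, open the socket, run the timing ...
    return 0


# Only run main() when the file is executed as a script, not when imported
if __name__ == "__main__":
    main()
```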

To create the connection, we use the `socket` library convenience function `create_connection`. This saves a few lines (resolving the address, creating the socket, setting its timeout and connecting) but, for those of you who care about your network code, you can easily do those steps explicitly. We have also used three of our parameters: `host`, `port` and `timeout`.
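For comparison, here is a simplified, IPv4-only sketch of those explicit steps. `connect_explicitly` is a hypothetical helper. The real `create_connection` also tries every address that name resolution returns, including IPv6 ones. The throwaway listener exists only so there is something to connect to; note that `bind` and `listen` belong to the *server* side.

```python
import socket


def connect_explicitly(host, port, timeout):
    """Simplified, IPv4-only stand-in for socket.create_connection."""
    skt = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        skt.settimeout(timeout)
        skt.connect((host, port))
    except OSError:
        skt.close()  # don't leak the socket if connecting fails
        raise
    return skt


# A throwaway local listener so the example has something to connect to
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

client = connect_explicitly("127.0.0.1", port, timeout=5)
print("Connected from", client.getsockname(), "to", client.getpeername())
client.close()
listener.close()
```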

Note that `with` has made another appearance: any object implementing the so-called *context manager* methods (`__enter__` and `__exit__`) works with a `with` statement. Here, the socket is guaranteed to be closed on exit, even if we leave via an exception. We name it `skt`.
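To see what those two methods do, here is a toy context manager. `Announce` is a made-up class for illustration. It records when the block is entered and left, and shows that `__exit__` still runs when the block ends with an exception.

```python
class Announce:
    """Toy context manager that records when a block starts and ends."""

    def __init__(self, label):
        self.label = label
        self.events = []

    def __enter__(self):
        self.events.append("enter " + self.label)
        return self  # this is what "as ..." binds to

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on *every* exit from the block, including via an exception
        self.events.append("exit " + self.label)
        return False  # False means: don't swallow the exception


try:
    with Announce("demo") as ctx:
        raise ValueError("something went wrong")
except ValueError:
    pass

print(ctx.events)  # prints ['enter demo', 'exit demo']
```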

```python
try:
    with socket.create_connection(address=(host, port), timeout=timeout) as skt:
        logger.info("Created connection")
        # TESTING CODE HERE
        logger.info("Completed trial")

except OSError:
    logger.error(
        "We could not create a socket connection to the "
        "remote echo server"
    )
    # A bare raise re-raises the original exception unchanged
    raise
```

I mentioned earlier that `try-except` is your friend. Bear in mind that if you highlight several lines in Spyder, Tab will indent them and Shift+Tab will dedent them.

If we get a socket error from anywhere inside the `with` (including our test routine from earlier), it will be caught here. In practice this is only being used to inject an extra logging line, but it illustrates the point. Because `IOError` is just another name for `OSError` in Python 3, this clause also catches the `IOError` that `round_trip` raises for a bad echo. To handle other kinds of exception in the same way, you can add an extra `except` clause or list several classes, e.g. "`except (ValueError, OSError) as e`". Note that you do need the tuple parens (parens == parentheses).
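The snippet below is a quick check of that alias, followed by the tuple form of `except`. The `describe` helper is hypothetical, and `ValueError` simply stands in for any non-`OSError` exception you might want to handle in the same clause.

```python
import socket

# In Python 3, IOError is just another name for OSError
print(IOError is OSError)  # prints True

# Socket problems such as timeouts and refused connections are OSErrors too
print(issubclass(socket.timeout, OSError))          # prints True
print(issubclass(ConnectionRefusedError, OSError))  # prints True


def describe(exc):
    """Raise exc and report which class the tuple-based handler caught."""
    try:
        raise exc
    except (ValueError, OSError) as e:  # the tuple needs its parentheses
        return "caught " + type(e).__name__


print(describe(ConnectionRefusedError("no server")))  # prints caught ConnectionRefusedError
print(describe(ValueError("bad number")))             # prints caught ValueError
```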

```python
        logger.info("Created connection")
        # This is going to add a bit of misleading overhead, but for this
        # purpose we'll use lambda for simplicity.
        # Note: timeit returns the *total* time for all `repeats` calls;
        # divide by `repeats` to get the average per round trip
        return_time = timeit.timeit(
            lambda: round_trip(skt),
            number=repeats
        )
        logger.info("Completed trial")
```

Finally, we make the call that will run our function. We use a module called `timeit` for this purpose. Again, this follows the Python theme of "don't roll your own when experts have done it for you". If you're that ambitious, it is better to improve their code than to start from scratch: everybody wins. `timeit` is part of the standard library and is designed to avoid a number of common function-timing pitfalls. For instance, it uses a high-resolution timer and, by default, temporarily switches off garbage collection while timing.

To use it, you supply the routine to time as the first argument and the number of repeats as the `number` keyword argument. It has to be a keyword, because the second positional parameter is `setup`. Please keep that `number` argument in there: the default is 1,000,000, and I would rather you all didn't each hit my server with a gigabyte of socket traffic at the same time.
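Here is a self-contained sketch of the same call, without a network. `fake_round_trip` is a hypothetical stand-in that sleeps for about a millisecond. The key point is that `timeit.timeit` returns the total for all `number` calls, so we divide to get an average.

```python
import time
import timeit


def fake_round_trip():
    # Stand-in for round_trip(skt): just sleep for about a millisecond
    time.sleep(0.001)


repeats = 20

# timeit.timeit returns the *total* time for all `number` calls, in seconds
total = timeit.timeit(fake_round_trip, number=repeats)
average = total / repeats
print("Total: {:.4f} s, average: {:.6f} s".format(total, average))
```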

Another Python feature has been slipped in at the same time. You can see a `lambda` function. This is a compact way of writing a function whose body is a single expression, of the form:

```python
lambda arg1, arg2: statement_using(arg1 + arg2)
```

It is equivalent to

```python
def func(arg1, arg2):
    return statement_using(arg1 + arg2)
```

Basically, anywhere we would use "func" (passing it as a callback or whatever), we can swap in our anonymous lambda function. In this particular case, `timeit` will always call the function in the first argument with no arguments, but we need to pass the socket to our routine. How do we solve this? By creating a function that `timeit` can call with no arguments, but that forwards the call on to `round_trip` with the `skt` variable shoehorned in.
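The sketch below shows this "shoehorning" in isolation. `round_trip_stub` and `call_with_no_args` are hypothetical helpers, and the string stands in for a real socket. It also shows a standard-library alternative, `functools.partial`, which builds the same kind of zero-argument callable.

```python
import functools

calls = []


def round_trip_stub(skt):
    # Hypothetical stand-in for round_trip: record which "socket" it was given
    calls.append(skt)


def call_with_no_args(fn):
    # Mimics how timeit treats its first argument: fn()
    fn()


skt = "pretend-socket"

# Option 1: a lambda that captures skt from the surrounding scope
call_with_no_args(lambda: round_trip_stub(skt))

# Option 2: functools.partial "freezes" the argument in advance
call_with_no_args(functools.partial(round_trip_stub, skt))

print(calls)  # prints ['pretend-socket', 'pretend-socket']
```

One subtle difference: the lambda looks `skt` up each time it is called, while `partial` stores the value it was given when it was created. Here the two behave identically, because `skt` never changes.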

Why don't we just name a function and forget about lambda? It is subjective, but here a named function would likely make our code harder to follow. What we need is a single-line function that gets called with no arguments and calls `round_trip` with one. If we add another routine called `round_trip_caller` or something like that, at first glance we will wonder where it is being used and why. It also doubles the number of `def` blocks in our code and adds a couple of extra source lines that don't clarify anything a good comment wouldn't.

(If you want a reference version, or to save typing along, open ``network_test_client_complete.py``)

Hit F5, or go to `Run->Run`

If you get a dialog saying "`Run Settings`", choose "`Execute in a new dedicated Python console`" and continue.

You should now see some text appearing on the lower right hand pane. It shows the output of your code. You could also run this script from the command line with "`python3 network_test_client_complete.py`".

The final output line should show the average time taken, which should be somewhere in the hundredths of a second. Well, that's fine, but suppose we want to see whether that holds up under simultaneous calls, instead of just consecutive ones. This is where threading comes in.

If you do not see the average time taken, then make sure you select the **``Console``** tab at the bottom right, and **``Python 1``** before running.

# Threading our way
## Weaving Python

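Python makes threading straightforward, with a couple of caveats. The main one is that the standard CPython interpreter's Global Interpreter Lock lets only one thread run Python code at a time. That matters little for network-bound work like ours, since threads release it while waiting on sockets. We will start with a short example and expand it using the code we have already written.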

Open up "`network_test_client2.py`"

We have already imported the `threading` module. Now we need some threads...

```python
# Create thread_count threads, each of which will execute run() once started
threads = [threading.Thread(target=run) for _ in range(thread_count)]

# Start every thread
for thread in threads:
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()
```

First we introduce the concept of a *list comprehension*, a close relative of the *generator expression*. This is a Python construct that embeds a loop inside an expression. Mostly you will see comprehensions used to create lists and dicts. The format is (for basic use):

```python
[fn(x) for x in iterable]     # general form
[s.upper() for s in strings]  # e.g. uppercase every string in a list
```

This takes each item in an iterable, such as a list, and does something to it to produce each entry in a new list. The example given goes through a list of, say, strings and uppercases all of them; the result is another list of the same size. In our case, we are using a bit of shorthand to say we want `thread_count` threads, with each element in the list being a new instance of the `threading.Thread` class.
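Here are the same ideas side by side: a list comprehension, the explicit loop it replaces, and a generator expression. The generator expression uses the same syntax with round brackets and produces values lazily instead of building a list. The example strings are arbitrary.

```python
strings = ["alpha", "beta", "gamma"]

# List comprehension: builds the whole new list at once
upper = [s.upper() for s in strings]
print(upper)  # prints ['ALPHA', 'BETA', 'GAMMA']

# The equivalent explicit loop
upper_loop = []
for s in strings:
    upper_loop.append(s.upper())

# Generator expression: values are produced one at a time, e.g. for sum()
total_length = sum(len(s) for s in strings)
print(total_length)  # prints 14

# "_" signals that the loop variable itself is not used
placeholders = [None for _ in range(3)]
print(placeholders)  # prints [None, None, None]
```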

We then start each thread, so a new parallel run of the `run` routine heads off. For completeness, we make sure every thread has finished before we reach the final `logging` statement: the `join` method blocks until that thread has completed.

```python
def run():
    # Get the current thread
    currentThread = threading.current_thread()

    # Send out a message
    logging.info("Hi, my name is {name}".format(
        name=currentThread.name
    ))
```

Here we add some simple text to the `run` routine. `threading` provides a handy `current_thread` function (older code uses the deprecated spelling `currentThread`) that each thread can call to get its own `Thread` object. From that, we can get the thread's name. We use a new method on the string class here: `format`. This allows us to name fields in the string and is generally preferred over the older `%` formatting (Python 3.6+ also offers f-strings). The values themselves are passed as named arguments to `format`.

Try running again: you should get a "Hi" from every thread.
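Here is a standalone look at `str.format` with named fields, a format specification, and the equivalent f-string. The name and numbers are arbitrary example values.

```python
name = "Thread-1"
index = 3

# Named fields, filled in from keyword arguments
print("Hi, my name is {name} and my index is {index}".format(name=name, index=index))

# Format specs go after a colon, e.g. 3 decimal places
delay_text = "Average time taken: {delay:.3f} s".format(delay=0.0123456)
print(delay_text)  # prints Average time taken: 0.012 s

# Python 3.6+ f-strings do the same thing inline
message = f"Hi, my name is {name} and my index is {index}"
print(message)  # prints Hi, my name is Thread-1 and my index is 3
```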

Unfortunately, we cannot rely on thread names to be unique and, while every running thread gets a unique ID, those IDs are horrendously user-unfriendly. As such, we will number our threads from 1 up to `thread_count`.

```python
# Generate some simple numeric way to refer to these
thread_indices = range(1, thread_count + 1)
threads = {i: threading.Thread(target=run) for i in thread_indices}

# The items method turns a dict into pairs (tuples) of key and value
for idx, thread in threads.items():
    thread.index = idx
    thread.start()

# Wait for all threads to complete
for thread in threads.values():
    thread.join()
```

First of all, we change our threads list to a dict, which makes more conceptual sense if we are naming them. Our names are just the integers from 1 to `thread_count`, so we can use `range`. On the second line, you can see a variation of the comprehension notation that we saw a minute ago, applied to dicts. The only change is that we now supply a key and a colon before the value. This produces an ordinary dict mapping `thread_count` integers to `thread_count` new threads.

To loop, we introduce a couple of methods. `dict.items` gives a key-value tuple (pair) for each element, and `dict.values` gives all of the values, without keys. As you can imagine, there is also `dict.keys` (in fact, if you use the dict itself as the loop iterable, you get only the keys).
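Here are the dict comprehension and the three views on a toy dict of squares (arbitrary example data). Since Python 3.7, dicts keep their insertion order, so the printed orders below are reliable.

```python
thread_count = 3

# Dict comprehension: key: value for each item
squares = {i: i * i for i in range(1, thread_count + 1)}
print(squares)  # prints {1: 1, 2: 4, 3: 9}

for key, value in squares.items():  # (key, value) pairs
    print(key, "->", value)

print(list(squares.keys()))    # prints [1, 2, 3]
print(list(squares.values()))  # prints [1, 4, 9]
print(list(squares))           # iterating the dict itself gives keys: [1, 2, 3]
```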

We cheekily slip in a dynamic modification to the thread object. This isn't terrible, but it's not the tidiest way of passing information. For instance, we don't know for sure that `threading.Thread` or its superclasses have no `index` member. However, it does highlight the fact that objects in Python, by default, can have members added on the fly.
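As an aside, a tidier alternative is to hand each thread its index through the `args` parameter of `threading.Thread`. This is a sketch of that approach, not what the course files do. The `results` dict and the `worker-N` names are illustrative choices.

```python
import threading

results = {}
results_lock = threading.Lock()


def run(index):
    # The index arrives as an ordinary argument, not a bolted-on attribute
    name = threading.current_thread().name
    with results_lock:
        results[index] = name


thread_count = 4
threads = {
    i: threading.Thread(target=run, args=(i,), name="worker-{}".format(i))
    for i in range(1, thread_count + 1)
}

for thread in threads.values():
    thread.start()
for thread in threads.values():
    thread.join()

print(sorted(results.items()))
```

Note the trailing comma in `args=(i,)`: `args` must be a tuple, and `(i)` on its own is just `i`.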

```python
def run():
    currentThread = threading.current_thread()

    # Send out a message
    logging.info("Hi, my name is {name} and my index is {index}".format(
        name=currentThread.name,
        index=currentThread.index
    ))
```

Now we update `run` with a minor extension. Try executing the code: you should now get a unique number from 1 to `thread_count` from each thread. Check your code matches `network_test_client2_partial2.py`.

# Challenge
## Stars up!

Combine our first piece of code with the `run` function to produce a script that tests 10 times (the `timeit` `number` argument) from each of 10 threads.

Use a global list `results` to store the `average_return_time` for each thread as a tuple `(index, average_return_time)`.

Don't worry about atomic operations for the moment.

# Combined Code

Compare with `network_test_client2_complete.py`

```python
results = []
lock = threading.Lock()
thread_indices = range(1, thread_count + 1)
...
```

This is the approach I used. I told you not to worry about atomic operations: in CPython, a single `list.append` is effectively atomic thanks to the Global Interpreter Lock. However, to illustrate, I have created a lock object, along with the `results` list, just before the `thread_indices` assignment.

```python
    # sockets over-arching try-except here

    # Strictly, a lock isn't required for appending to a list, but this is
    # an opportunity to demonstrate the use of locks
    with lock:
        results.append((currentThread.index, average_return_time))

    logger.info("Average time taken: {delay} s".format(delay=average_return_time))
```

Inside `run`, we have the original try-except. After it, I have updated `results` with the pair I described: the integer index and the average return time. This shows the versatility of `with`. Here, it succinctly acquires and releases the lock around our access to the global `results`. Clear and concise.
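The sketch below shows what `with lock:` saves you from writing. `record` and `record_longhand` are illustrative helpers, and the values are arbitrary. The longhand version needs `try`/`finally` so that the lock is released even if the body raises.

```python
import threading

lock = threading.Lock()
results = []


def record(index, value):
    # "with lock:" is shorthand for the acquire/try/finally/release below
    with lock:
        results.append((index, value))


def record_longhand(index, value):
    lock.acquire()
    try:
        results.append((index, value))
    finally:
        lock.release()  # runs even if append raised


threads = [
    threading.Thread(target=record, args=(i, i * 0.01)) for i in range(1, 6)
]
threads.append(threading.Thread(target=record_longhand, args=(6, 0.06)))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))   # prints 6
print(lock.locked())  # prints False: every acquire was matched by a release
```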

Now... how do we analyse this?

```python
# Now we switch our results list of (index, time) pairs to a numpy
# array with shape (thread_count, 2)
data = np.array(results)

# However, we are likely to want to play around with the statistics, in
# Jupyter or elsewhere, so we save them...
np.save(output_filename, data)
```

Right at the end, we have this. Rather than re-running our code for every analysis, we dump out a numpy array that we can load in a separate script. And there you have it! Running `network_test_client2_complete.py` (or updating your own code) will output a file with this data.
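On the analysis side, reading the file back is one call to `np.load`. The sketch below round-trips a small made-up array (illustrative values, not real measurements) through a temporary directory. `np.save` appends a `.npy` extension if the filename doesn't already have one, so that is the name to load.

```python
import os
import tempfile

import numpy as np

# Illustrative stand-in for `results`: (thread index, average time in s)
results = [(1, 0.012), (2, 0.015), (3, 0.011)]
data = np.array(results)
print(data.shape)  # prints (3, 2)

with tempfile.TemporaryDirectory() as tmp:
    output_filename = os.path.join(tmp, "latency")
    # np.save adds the .npy extension because the name has none
    np.save(output_filename, data)

    # ... later, e.g. in a separate analysis script or Jupyter notebook
    loaded = np.load(output_filename + ".npy")

indices, times = loaded[:, 0], loaded[:, 1]
print("Mean of per-thread averages: {:.4f} s".format(times.mean()))
```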

# The Ghost of Coding Future
## Styling for future you

Only recently, having had to work with code written by new Pythoners who ignored my appeals for *some* code style, have I realised how worthwhile it is to emphasise this at the start. Not that it was worse than in any other language and, as you're mostly computer scientists rather than physical scientists, decent style is probably just the status quo for you.

However, half the point of Python is that this *should not* be a problem, so Python style is part of learning the language. I know for a fact that my Python was pretty ropey when I started out, and I paid the price when I went back six months later to edit some of it. However, I was simultaneously editing my six-month-old mathematician's C++, so even with the ropey Python, I was sold on the benefits.

To begin, I am going to give you a 10-minute group challenge:

 * rewrite the code in `bad_python.py` to print, when `newversion` is `True`:

```
1:PRINTING VALUES

2:0.84 is sin(x), 0.54 is cos(x)
3:0.91 is sin(x), -0.42 is cos(x)
4:0.14 is sin(x), -0.99 is cos(x)
5:-0.76 is sin(x), -0.65 is cos(x)
6:-0.96 is sin(x), 0.28 is cos(x)
7:-0.28 is sin(x), 0.96 is cos(x)
8:0.66 is sin(x), 0.75 is cos(x)
9:0.99 is sin(x), -0.15 is cos(x)
10:0.41 is sin(x), -0.91 is cos(x)
```
 * return to original functionality when `newversion` is `False`

There are a few new features snuck in there - use Google and ask on Etherpad to find out about them. If you have any ideas or hints, put them into Etherpad - exchange ideas! And put up your stars!

This isn't just me being irritating - this is the kind of code accretion that can happen with shortcuts to include a feature - writing code rather than using libraries, taking the first solution rather than looking for a Pythonic one... it's not hard to end up with this sort of thing...

# The Revelation

"*Scrooge hung his head to hear his own words quoted by the Spirit, and was overcome with penitence and grief.*"<br/> ~ A Christmas Carol, Ch. Dickens

(also me after dealing with past-me's code)

We will try this now with slightly more readable code. It is still not ideal, and a few niceties are left out for simplicity in this lesson, so don't take it as perfection!

Now try it with better code:

 * rewrite the code in `better_python.py` to print, when `newversion` is `True`:

```
1:PRINTING VALUES OF SIN AND COS FOR x IN 1, 2,..., 9

2:0.84 is sin(x), 0.54 is cos(x)
3:0.91 is sin(x), -0.42 is cos(x)
4:0.14 is sin(x), -0.99 is cos(x)
5:-0.76 is sin(x), -0.65 is cos(x)
6:-0.96 is sin(x), 0.28 is cos(x)
7:-0.28 is sin(x), 0.96 is cos(x)
8:0.66 is sin(x), 0.75 is cos(x)
9:0.99 is sin(x), -0.15 is cos(x)
10:0.41 is sin(x), -0.91 is cos(x)
```
 * return to original functionality when `newversion` is `False`

TIP 1: [zip](https://docs.python.org/3/library/functions.html#zip) pairs up corresponding items from several lists/arrays/etc. and turns them into a series of tuples (if the lengths differ, it stops at the shortest)

TIP 2: a `for` loop can unpack each N-length tuple from the iterable after `in` into N comma-separated variables before it, e.g.
```python
x = [(1, 3, 1), (4, 2, 9), (1, 0, 10), (9, 18, 1)]
for a, b, c in x:
    print("A+B/C =", a + b / c)
```
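Here is a short illustration of both tips together, using arbitrary names and scores rather than the challenge data.

```python
names = ["alpha", "beta", "gamma"]
scores = [3, 7, 5]

pairs = list(zip(names, scores))
print(pairs)  # prints [('alpha', 3), ('beta', 7), ('gamma', 5)]

# Unpack each tuple straight into two loop variables
for name, score in zip(names, scores):
    print(name, "scored", score)

# zip stops at the shortest input
print(list(zip([1, 2, 3], "ab")))  # prints [(1, 'a'), (2, 'b')]
```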

# Challenge - Ease of extension
## It is easier to extend good code than PhD funding

So, experiment wildly, which Python is awesome for, but also aim to write for panicking future-you, who wants to get last-minute final chapter stuff added quickly and painlessly...

Use `matplotlib` or `bokeh` to add plotting functionality to `better_python.py`. Note that both of them can output to a file (`bokeh` to HTML for interactivity).
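If you want a starting point, here is one possible matplotlib sketch that writes sin and cos for x in 1..9 to a PNG file. The `Agg` backend renders without opening a window, and the filename `sin_cos.png` is only an example. How you wire this into `better_python.py` is up to you.

```python
import math

import matplotlib
matplotlib.use("Agg")  # render straight to files, no window needed
import matplotlib.pyplot as plt

xs = list(range(1, 10))

fig, ax = plt.subplots()
ax.plot(xs, [math.sin(x) for x in xs], marker="o", label="sin(x)")
ax.plot(xs, [math.cos(x) for x in xs], marker="s", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("sin and cos for x in 1..9")
ax.legend()

fig.savefig("sin_cos.png")  # example output filename
plt.close(fig)
```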

Do this whatever way you want - with title, axes labels, interactivity, line colours, separate functionality for newversion on and off (or just when on). Add notes to Etherpad to suggest original ideas for others also, and let us know your method.

# Final Challenge - Combination

Create a system for visualizing the data from the network client. Try to make it flexible, so you can increase or decrease the number of latency tests. Experiment with a few different ways of visualizing the data - explore the matplotlib and bokeh docs. Post anything interesting you find on Etherpad!
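One possible starting point is sketched below, assuming the saved array has the `(index, average time)` layout described above. It uses a handful of made-up stand-in values, not real measurements; in practice you would `np.load` your own file. The sketch draws a per-thread bar chart next to a histogram, and because both are driven by the length of the data, they adapt to however many threads you run.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data in the (index, average time in s) layout; replace with np.load(...)
data = np.array([(1, 0.012), (2, 0.015), (3, 0.011), (4, 0.019)])
indices, times = data[:, 0].astype(int), data[:, 1]

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

ax_bar.bar(indices, times * 1000)  # convert seconds to milliseconds
ax_bar.set_xlabel("thread index")
ax_bar.set_ylabel("average round trip (ms)")
ax_bar.set_xticks(indices)

ax_hist.hist(times * 1000, bins="auto")
ax_hist.set_xlabel("average round trip (ms)")
ax_hist.set_ylabel("number of threads")

fig.tight_layout()
fig.savefig("latency.png")  # example output filename
plt.close(fig)
```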