<h1>Workshop 1 - Get the Data</h1>
<b>For this part of the workshop, you'll need the same Arduino code that we used in Arduino Exercises - Exercise 3</b>

In the next set of Python exercises, we'll look at different ways of reading, recording, and storing data from your Nicla.

Work through the code below by entering it into your own Jupyter Notebook and exploring what it does.

<h2>Python Activity 1 - Data Wrangling the Hard Way</h2>

We'll start by writing a minimal script that gathers data off your Nicla - over the workshop, we'll improve functionality and code quality.

<h3> 1.1 - Import Some Libraries</h3>
First, we'll import the libraries that Python needs for this exercise. Libraries are just bits of code that other people have written for us.

In [2]:
import serial
import serial.tools.list_ports

<h3> 1.2 - Find your Nicla</h3>
Next, let's scan the Serial ports on your computer for any connected devices:

In [3]:
# List the COM ports, and the devices that are connected to them
ports = serial.tools.list_ports.comports()
for n in ports:
    print(n.device, n.description)


COM8 USB Serial Device (COM8)


<h3>1.3 - Talk to Your Nicla</h3>
One of the listed ports should have some clue that it's associated with the Nicla. Copy the name of the most Nicla-looking port into the code below, and it should connect to the Arduino:

In [4]:
dev = 'COM8'   # Note that the name of the port is a string
nicla = serial.Serial(port=dev, baudrate=115200, timeout=.1)

<h3>1.4 - Read Some Data</h3>
Now let's get some data off our Nicla using the code below. 

In [16]:
n_bits = 10   # Number of bytes to read (despite the name, read() counts bytes, not bits)

# Wait for any outgoing data to be sent, then discard old data waiting in the input buffer
nicla.flush()
nicla.reset_input_buffer()

# Read the data
data = nicla.read(n_bits)
data


b'-135.00,39.00\n181533.00,-4092.00,-124.00,21.00\n181537.00,-4073.00,-127.00,10.00\n181541.00,-4090.00,-115.00,36.00\n181545.00,-4103.00,-110.00,25.00\n181549.00,-4103.00,-130.00,11.00\n181553.00,-4105.00,-111.00,31.00\n181556.00,-4069.00,-135.00,7.00\n181559.00,-4084.00,-124.00,-2.00\n181563.00,-4111.00,-100'
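To get a feel for what the size argument of `read()` does without touching the hardware, here is a minimal sketch that uses `io.BytesIO` as a stand-in for the serial port. The stand-in is an assumption for illustration only; the real `nicla.read()` can also return fewer bytes if the timeout expires first. The sample line is copied from the output above.

```python
import io

# One complete line of Nicla output, copied from the example above
line = b'181533.00,-4092.00,-124.00,21.00\n'

# io.BytesIO is a stand-in for the serial port (illustration only)
fake_port = io.BytesIO(line)

chunk = fake_port.read(10)   # Ask for at most 10 bytes
print(chunk)                 # The first 10 bytes of the line
print(len(chunk), 'bytes read out of', len(line), 'in the full line')
```

Each character in the output is one byte, so a request for 10 bytes returns less than one full line of readings.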

<h3>Python Exercise 1.1</h3>
Vary the value of n_bits until you see some repeating patterns.


<h3>1.5 - Tidying up Data</h3>
The output you get above is a bit of a mess, but in amongst the mess you can see some sensible things:

<ol>
  <li>The 'b' at the start - this means it's a Python bytes object (raw bytes) rather than a text string</li>
  <li>Numbers! These are clearly data</li>
  <li>Commas - These separate the data points.</li>
  <li>'\n' - This marks the end of each loop that the Arduino writes.</li>
</ol>
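As an optional aside, the 'b' prefix can be removed by decoding the bytes into a normal string. The sketch below is an alternative to the str() approach used in the rest of this activity, not a replacement for it. It assumes the Nicla only sends plain ASCII text, and it uses a shortened sample of the bytes shown above.

```python
# A shortened sample of the bytes shown above
raw = b'-135.00,39.00\n181533.00,-4092.00,-124.00,21.00\n181537.00,-4073.00,-127.00,10.00\n'

text = raw.decode('ascii')    # bytes -> str; newline bytes become real newlines
lines = text.splitlines()     # Split the string on those newlines

print(type(raw).__name__, '->', type(text).__name__)
for entry in lines:
    print(entry)
```

After decoding, there is no leading b' or trailing quote to worry about, and the newlines can be split without any escaping.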

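Next, let's extract a single line of output from the Arduino.

Our first friend is the <b>split</b> method. Note that str(data) turns each newline byte into the two visible characters \ and n, which is why the code below splits on '\\n' rather than '\n':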

In [17]:
data1 = str(data)

# Print data1 before and after we use split()
print('Before we use split:')
print(data1)
data1 = data1.split('\\n')
print('After we use split:')
for n in data1:
    print(n)


Before we use split:
b'-135.00,39.00\n181533.00,-4092.00,-124.00,21.00\n181537.00,-4073.00,-127.00,10.00\n181541.00,-4090.00,-115.00,36.00\n181545.00,-4103.00,-110.00,25.00\n181549.00,-4103.00,-130.00,11.00\n181553.00,-4105.00,-111.00,31.00\n181556.00,-4069.00,-135.00,7.00\n181559.00,-4084.00,-124.00,-2.00\n181563.00,-4111.00,-100'
After we use split:
b'-135.00,39.00
181533.00,-4092.00,-124.00,21.00
181537.00,-4073.00,-127.00,10.00
181541.00,-4090.00,-115.00,36.00
181545.00,-4103.00,-110.00,25.00
181549.00,-4103.00,-130.00,11.00
181553.00,-4105.00,-111.00,31.00
181556.00,-4069.00,-135.00,7.00
181559.00,-4084.00,-124.00,-2.00
181563.00,-4111.00,-100'


<b>split</b> has broken the data into multiple lines. Now let's rummage through the data:

In [18]:
# You can select each line of the data by using data1[0], data1[1], etc.:
data1[0]            # This selects the first element in data1
print('The first element in data1 is', data1[0])     # Print the first element in data1
print('The second element in data1 is', data1[1])    # Print the second element in data1

# You can find out how long something is using the len() function.
# len(data1) gives the number of lines; len(data1[1]) gives the number of characters in the second line:
print('And it contains the following number of characters:')
len(data1[1])       # The length of the second element in data1


The first element in data1 is b'-135.00,39.00
The second element in data1 is 181533.00,-4092.00,-124.00,21.00
And it contains the following number of characters:


32

If you print an element of data1, you can see that it's composed of values separated by commas. Let's split it up again using the split method:

In [19]:
data1_sub = data1[1]
print('Before splitting, data1[1] looks like', data1_sub)
print('After splitting, data1[1] looks like', data1_sub.split(','))

Before splitting, data1[1] looks like 181533.00,-4092.00,-124.00,21.00
After splitting, data1[1] looks like ['181533.00', '-4092.00', '-124.00', '21.00']
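Note that the values in this list are still strings, not numbers. A minimal sketch of the difference, using the example values above:

```python
fields = ['181533.00', '-4092.00', '-124.00', '21.00']

# Convert each string to a floating-point number
values = [float(f) for f in fields]
print(values)

# Adding strings joins them together; adding floats does arithmetic
print(fields[3] + fields[3])
print(values[3] + values[3])
```

This matters as soon as you want to do any maths on the readings.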


<h3>Python Exercise 1.2</h3>
Use a for loop like the one shown above to write code that runs through each element of the list shown above, i.e., ['181533.00', '-4092.00', '-124.00', '21.00'] (obviously the numbers will be different for your code) and prints out the name of each variable and the associated value.

The first row of output for the example above would look like:

Time - 181533.00 ms

<h3>1.6 - Filtering Out Incomplete Readings</h3>
You can see that each complete data line should have a certain number of elements - in this example 4. 

Let's scan through the data and extract the lines that have the right number of data points:

In [20]:
# Create a list to store our good data points
data2 = []

for n in data1:
    if len(n.split(',')) == 4:
        data2.append(n)

# Let's ignore the first row - which often contains bad data points
data2 = data2[1:]
data2


['181537.00,-4073.00,-127.00,10.00',
 '181541.00,-4090.00,-115.00,36.00',
 '181545.00,-4103.00,-110.00,25.00',
 '181549.00,-4103.00,-130.00,11.00',
 '181553.00,-4105.00,-111.00,31.00',
 '181556.00,-4069.00,-135.00,7.00',
 '181559.00,-4084.00,-124.00,-2.00']

<h3>Python Exercise 1.3</h3>
Modify the above code so instead of skipping the first row, it outputs:

1. Rows 2-5
1. The last row in the list
1. The same list, but in reverse order
1. The second-to-last row.

You'll find the internet is helpful in figuring out how to do this!

<h3> 1.7 - Saving the Data</h3>
This looks like progress! In the above example, we have 4 different datasets - one in each column. Let's take a look at two different ways of turning this into a useful dataset:

In [21]:
# Now let's make 4 lists to store each data point
time = []
acc_x, acc_y, acc_z = [], [], []

for row in data2:
    point = row.split(',')
    time.append(point[0])
    acc_x.append(point[1])
    acc_y.append(point[2])
    acc_z.append(point[3])

print(time)
print(acc_x)
print(acc_y)
print(acc_z)


['181537.00', '181541.00', '181545.00', '181549.00', '181553.00', '181556.00', '181559.00']
['-4073.00', '-4090.00', '-4103.00', '-4103.00', '-4105.00', '-4069.00', '-4084.00']
['-127.00', '-115.00', '-110.00', '-130.00', '-111.00', '-135.00', '-124.00']
['10.00', '36.00', '25.00', '11.00', '31.00', '7.00', '-2.00']


Let's now write that to a data file. We'll first use Python's open function to open up a file to write to (if it doesn't exist already, Python will create it; if it does exist, the 'w' mode overwrites it). We then write each data point in turn and close the file.

We call the file 'data_out.csv'. The './' at the start of the file path tells Python to create the file in the current working directory, which for a Jupyter Notebook is normally the folder the notebook is saved in.



In [22]:
file_out = open('./data_out.csv', 'w')
for n in range(0, len(time)):
    file_out.write(time[n] + ',' + acc_x[n] + ',' + acc_y[n] + ',' + acc_z[n] + '\n')
file_out.close()

<h3>1.8 - Saving the Data a Bit Better</h3>
You <b>should</b> now have a file in the same directory as this notebook called 'data_out.csv' that contains measurements from your Nicla. Open it up in Excel and check it's saved ok.

It would be great if our columns had the names of each variable too. Let's add that to the file.

In [23]:
file_out = open('./data_out.csv', 'w')
column_titles = ['time', 'acc_x', 'acc_y', 'acc_z']
file_out.write(','.join(column_titles) + '\n')

for n in range(0, len(time)):
    file_out.write(time[n] + ',' + acc_x[n] + ',' + acc_y[n] + ',' + acc_z[n] + '\n')
file_out.close()

Note that - in the above code - we've used two different ways of writing a line of data. Both are fine!
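A third option, sketched here as an optional aside, is Python's built-in csv module together with a 'with' statement. The file name and the temporary folder are assumptions chosen so the sketch doesn't overwrite your own data_out.csv; the rows are copied from the example data above.

```python
import csv
import os
import tempfile

column_titles = ['time', 'acc_x', 'acc_y', 'acc_z']
rows = [
    ['181537.00', '-4073.00', '-127.00', '10.00'],
    ['181541.00', '-4090.00', '-115.00', '36.00'],
]

# Write to a temporary folder so this sketch doesn't touch data_out.csv
out_path = os.path.join(tempfile.mkdtemp(), 'data_out_sketch.csv')

# 'with' closes the file automatically, even if an error occurs part-way through
with open(out_path, 'w', newline='') as file_out:
    writer = csv.writer(file_out)
    writer.writerow(column_titles)   # Header row
    writer.writerows(rows)           # One line per reading

# Read the file back to check what was written
with open(out_path, 'r') as file_in:
    written = file_in.read()
print(written)
```

The csv writer adds the commas and line endings for you, so there is no string concatenation to get wrong.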

<h3>Python Exercise 1.4</h3>
Modify the above code so it adds a 5th column of data - one that calculates the sum of the x, y, and z accelerations.

<h3>1.9 - Reading the Data Back In</h3>
Finally - let's read our data back in to check everything's worked fine.

In [24]:
time = []
acc_x, acc_y, acc_z = [], [], []

file_path = './data_out.csv'

# Open the file for reading using 'with' statement
with open(file_path, 'r') as file_in:
    # Read the file line by line using a for loop
    for line in file_in:
        # Process each line of the file
        line = line.strip()
        line = line.split(',')
        time.append(line[0])
        acc_x.append(line[1])
        acc_y.append(line[2])
        acc_z.append(line[3])

print(time)
print(acc_x)
print(acc_y)
print(acc_z)

['time', '181537.00', '181541.00', '181545.00', '181549.00', '181553.00', '181556.00', '181559.00']
['acc_x', '-4073.00', '-4090.00', '-4103.00', '-4103.00', '-4105.00', '-4069.00', '-4084.00']
['acc_y', '-127.00', '-115.00', '-110.00', '-130.00', '-111.00', '-135.00', '-124.00']
['acc_z', '10.00', '36.00', '25.00', '11.00', '31.00', '7.00', '-2.00']
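Notice that each list above starts with its column title, because the header row is read like any other line, and that every value is still a string. One possible way of handling both, sketched with Python's built-in csv module: `io.StringIO` stands in for the real file here (an assumption to keep the example self-contained), and `time_ms` is just an illustrative variable name.

```python
import csv
import io

# Stand-in for the contents of data_out.csv (first two readings only)
file_text = (
    'time,acc_x,acc_y,acc_z\n'
    '181537.00,-4073.00,-127.00,10.00\n'
    '181541.00,-4090.00,-115.00,36.00\n'
)

time_ms, acc_x = [], []

with io.StringIO(file_text) as file_in:
    reader = csv.DictReader(file_in)   # Uses the first row as column names
    for row in reader:
        # Look up each value by column name and convert it to a number
        time_ms.append(float(row['time']))
        acc_x.append(float(row['acc_x']))

print(time_ms)
print(acc_x)
```

With a real file, you would swap `io.StringIO(file_text)` for `open('./data_out.csv', 'r', newline='')`.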


<h2>Python Activity 2 - Data Wrangling the Easy Way</h2>
We can use libraries to handle the data a bit more easily. Here, we'll use some packages that you might have heard of:

<ol>
  <li>Pandas</li>
  <li>Numpy</li>
</ol>

<h3>2.1 - Import Libraries</h3>

In [25]:
import pandas as pd
import numpy as np

<h3> 2.2 - Talk to Your Nicla</h3>

Here, we'll use an <b>if</b> statement to automatically pick the port to connect to:

In [28]:
ports = serial.tools.list_ports.comports()

# For Mac users this works well:
for n in ports:
    print(n.device, n.description)
    if 'Nicla' in n.description:
        dev = n.device
nicla = serial.Serial(port=dev, baudrate=115200, timeout=.1)

# For Windows users, you have to use the code below and manually set the port:
for n in ports:
    print(n.device, n.description)

dev = 'COM8'
nicla = serial.Serial(port=dev, baudrate=115200, timeout=.1)


COM8 USB Serial Device (COM8)
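In the output above, the port description doesn't mention the Nicla at all, which is why the automatic check can't be relied on for every computer. A rough sketch of one way to combine the two approaches is shown below. The function name `pick_port`, its parameters, and the fake port objects are all hypothetical; the fake ports only mimic the `.device` and `.description` attributes used in the cell above, so the sketch runs without a board plugged in.

```python
from types import SimpleNamespace

def pick_port(ports, keyword='Nicla', fallback=None):
    """Return the device of the first port whose description contains keyword.

    If no port matches, return fallback (e.g. a port name set by hand).
    """
    for p in ports:
        if keyword.lower() in (p.description or '').lower():
            return p.device
    return fallback

# Fake port objects for illustration only (hypothetical names and descriptions)
fake_ports = [
    SimpleNamespace(device='COM3', description='Bluetooth link'),
    SimpleNamespace(device='COM8', description='USB Serial Device (COM8)'),
]

print(pick_port(fake_ports))                    # No description mentions 'Nicla'
print(pick_port(fake_ports, fallback='COM8'))   # Falls back to the manual choice
```

With real hardware, you would pass in `serial.tools.list_ports.comports()` instead of `fake_ports`.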


<h3>Python Exercise 2.1</h3>
Modify the above code so that it runs on your computer.

<h3>2.3 - Get Data the Cleaner Way </h3>
We'll use two functions from <b>numpy</b> - <b>zeros</b> and <b>fromstring</b> - to read the data from the Arduino into an array. This is a way of storing data that's easier to manipulate.

In [30]:
# Wait for any outgoing data to be sent, then discard old data waiting in the input buffer
nicla.flush()
nicla.reset_input_buffer()

n_readings = 20

# Create a table to store the data
data_table = np.zeros((n_readings,4))

for n in range(n_readings):
    data = nicla.readline()

    data = np.fromstring(data, sep=',')

    if len(data) == 4:
        data_table[n,:] = data

data_table

array([[    0.,     0.,     0.,     0.],
       [    0.,     0.,     0.,     0.],
       [    0.,     0.,     0.,     0.],
       [    0.,     0.,     0.,     0.],
       [19374., -4100.,   -49.,   171.],
       [19377., -4087.,   -63.,   160.],
       [19382., -4077.,   -98.,   140.],
       [19385., -4100.,   -63.,   144.],
       [19388., -4114.,   -52.,   174.],
       [19391., -4069.,   -99.,   142.],
       [19394., -4069.,   -77.,   141.],
       [    0.,     0.,     0.,     0.],
       [24298., -4104.,   -61.,   168.],
       [24301., -4103.,   -78.,   153.],
       [24305., -4074.,   -91.,   142.],
       [24309., -4118.,   -63.,   169.],
       [24312., -4069.,  -117.,   152.],
       [24316., -4073.,   -91.,   137.],
       [24320., -4117.,   -57.,   179.],
       [24323., -4081.,   -88.,   153.]])
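The rows of zeros in the table above are readings where the line from the Nicla didn't contain 4 values, so the zeros that the table was created with were never overwritten. One possible way around this, sketched below in plain Python, is to keep reading until enough complete lines have arrived. The function names, the `max_attempts` limit, and the fake line source are all assumptions for illustration; with real hardware you would pass in `nicla.readline`.

```python
def parse_reading(raw_line, n_fields=4):
    """Return a list of floats, or None if the line is incomplete or malformed."""
    try:
        fields = raw_line.decode('ascii').strip().split(',')
        values = [float(f) for f in fields]
    except (UnicodeDecodeError, ValueError):
        return None
    return values if len(values) == n_fields else None

def collect_readings(read_line, n_readings, max_attempts=100):
    """Call read_line() until n_readings complete lines have been collected."""
    rows = []
    for _ in range(max_attempts):
        if len(rows) == n_readings:
            break
        values = parse_reading(read_line())
        if values is not None:
            rows.append(values)
    return rows

# Stand-in for nicla.readline: an empty read, a partial line, then good lines
fake_lines = iter([
    b'',
    b'00,-49.00\n',
    b'19374.00,-4100.00,-49.00,171.00\n',
    b'19377.00,-4087.00,-63.00,160.00\n',
])

rows = collect_readings(lambda: next(fake_lines, b''), n_readings=2)
print(rows)
```

The `max_attempts` limit stops the loop from running forever if the board is unplugged or sends nothing useful.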

<h3>Python Exercise 2.2</h3>
Modify the above code so that it records 1, 5, and 20 lines of data from the Nicla.

<h3>2.4 - Save the Data the Cleaner Way</h3>
Next we'll use <b>pandas</b> to create a DataFrame. This is a way of storing data that, again, allows us to manipulate and save data easily.

First, we'll remove the first row of data as that's always a bit rubbish (we'll look at a smarter way of treating this next time) and then we'll save the data file.

In [31]:
# Remove the often-rubbish first row of data
data_table = data_table[1:]
column_titles = ['time', 'acc_x', 'acc_y', 'acc_z']

data_table = pd.DataFrame(data_table, columns=column_titles)
data_table.to_csv('./data_table.csv', header=column_titles, index=False)


<h3>2.5 - Checking Everything's Worked</h3>
Now let's use pandas again to open up the file and check everything's worked ok.

In [5]:
data = pd.read_csv('./data_table.csv')
data

Unnamed: 0,time,acc_x,acc_y,acc_z
0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0
3,19374.0,-4100.0,-49.0,171.0
4,19377.0,-4087.0,-63.0,160.0
5,19382.0,-4077.0,-98.0,140.0
6,19385.0,-4100.0,-63.0,144.0
7,19388.0,-4114.0,-52.0,174.0
8,19391.0,-4069.0,-99.0,142.0
9,19394.0,-4069.0,-77.0,141.0


<h3>2.6 - Saving Multiple Datasets</h3>
Say you want to save a bunch of datasets? Let's write a function to do this:

In [6]:
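n_files = 10
filename = 'data_out'

def data_save(data_in, filename_in, n_files_in):
    # Save data_in to n_files_in numbered files: filename_in + '0.csv', filename_in + '1.csv', ...
    for n in range(0, n_files_in):
        data_in.to_csv(filename_in + str(n) + '.csv')

# Pass the DataFrame in explicitly, rather than relying on a variable defined outside the function
data_save(data, filename, n_files)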


The above example is sort of pointless - it just saves the same file over and over again but gives it a different name.
However - functions are important for keeping your code tidy, and we'll use them over and over again during this course. Speaking of which...

<h1>Final Python Exercise - Writing a Single Script</h1>
Produce a single script that connects to the Nicla, acquires data from it, and saves the data. Combine all the elements above to write a script that saves 1, 5, and 10 different datasets, each with a different filename. Make sure you keep your code tidy by putting it in a function.

<b>End of the Python exercises - you can go back to the Word doc now!</b>