# Experimental physics: data collection
[**Download this notebook**](https://ifa-edu-it.github.io/learning-material/courses/experimental-physics/data.ipynb)

An important part of experimental physics is collecting data and analyzing it. In this notebook we will learn how to collect data, both manually and from files. Later we will also learn how to fit a function to the data and use the fit to extract information from the data.

## Collecting data

### Manual approach
When conducting an experiment, data may be collected in a number of ways. The simplest is manual data collection, where measurements are written down by hand. To analyze such values in Python, a good approach is to type them into numpy arrays. Let's start off by importing numpy and filling in some dummy data.

In [2]:
import numpy as np
# It is a good idea to keep your imports in a separate cell at the top of the
# notebook, so that they do not clutter your code and all dependencies are
# easy to see in one place. Python caches imported modules, so importing
# numpy again later is harmless, but it is unnecessary.

In [6]:
angles = np.array([0, 14, 36, 42, 55, 57, 61, 64, 74, 88, 91])  # degrees
values = np.array([0.05, 0.25, 0.59, 0.64, 0.80, 0.83, 0.87, 0.91, 0.96, 1.00, 1.02])  # mV
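Hand-typed arrays only live as long as the notebook session, so it can help to store them in a file right away. The following is a minimal sketch, an addition rather than part of the original workflow: it checks that the paired arrays have the same length, stacks them into a two-column table with `np.column_stack`, and writes and reads the table with `np.savetxt` and `np.loadtxt`. An in-memory buffer (`io.StringIO`) stands in for a real file so the example runs anywhere; the file name mentioned in the comments is hypothetical.

```python
import io
import numpy as np

angles = np.array([0, 14, 36, 42, 55, 57, 61, 64, 74, 88, 91])  # degrees
values = np.array([0.05, 0.25, 0.59, 0.64, 0.80, 0.83, 0.87, 0.91, 0.96, 1.00, 1.02])  # mV

# Paired measurements must have the same length; otherwise a value was lost or duplicated.
assert angles.shape == values.shape

# Combine the two 1D arrays into one (11 x 2) table: one row per measurement.
table = np.column_stack((angles, values))

# Write the table as comma-separated text. The buffer stands in for a file; pass a path
# such as 'data/manual_data.csv' (hypothetical name) instead to write to disk.
buffer = io.StringIO()
np.savetxt(buffer, table, fmt=['%d', '%.2f'], delimiter=',', header='angle_deg,value_mV')

# Read it back. Lines starting with '#' (the header written above) are skipped by default.
buffer.seek(0)
loaded = np.loadtxt(buffer, delimiter=',')
print(loaded.shape)  # (11, 2)
```

Writing the header as a comment line keeps the file readable by people while still letting `np.loadtxt` read the numbers directly as floats.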

### Automatic approach: Loading data from files

In many cases, data is collected automatically and stored in a file. In this case, we can use numpy to load the data from the file. We have provided a file called `cosmic_data.cvs`, which contains data from an experiment in which cosmic rays were measured at different detector positions. Download the data [here](https://ifa-edu-it.github.io/learning-material/courses/experimental-physics/data/cosmic_data.cvs) and place it in a folder called `data` next to this notebook. Let's open the file and see what it contains. Here are the first 3 lines of the file:

```text
num ; coinc ; date ; time ; sec ; RecTime ; A ; B ; C ; COINC ; Pressure ; Temp ; Humidity ; Altitude

1; A-C; 8/11/2021; 10:10:35; 0; 600; 0; 0; 0; 0; 1013.56; 21.15; 38.89; -2.55

2; A-C; 8/11/2021; 10:21:6; 623; 600; 14875; 129616; 7772; 576; 1013.56; 21.15; 38.89; -2.55
```

This seems somewhat messy, but we can identify a few things:
- The first line contains the names of the columns
- The columns are separated by semicolons (`;`)
- The entries are padded with spaces around the semicolons
- The data is in a table format, with each row containing data for one measurement
- The data has both numbers and text

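To load this we can use `np.loadtxt` with the keyword arguments `dtype=str` and `delimiter=';'`. By default `np.loadtxt` tries to convert every entry to a float, which fails on text entries such as the column names or `A-C`; with `dtype=str` every entry is kept as a string instead. The result is a numpy array of strings with the column names in the first row and one measurement in each following row: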

In [46]:
data = np.loadtxt('data/cosmic_data.cvs', dtype='str', delimiter=';')
print('Headers: ', data[0])
print('---------------------------------------------------')
print('First row: ', data[1])
print('---------------------------------------------------')
print('Last row: ', data[-1])
print('---------------------------------------------------')
print('dtype: ', type(data[0, 0]))
print('---------------------------------------------------')
print('Shape: ', data.shape)

Headers:  ['num ' ' coinc ' ' date ' ' time ' ' sec ' ' RecTime ' ' A ' ' B ' ' C '
 ' COINC ' ' Pressure ' ' Temp ' ' Humidity ' ' Altitude ']
---------------------------------------------------
First row:  ['1' ' A-C' ' 8/11/2021' ' 10:10:35' ' 0' ' 600' ' 0' ' 0' ' 0' ' 0'
 ' 1013.56' ' 21.15' ' 38.89' ' -2.55']
---------------------------------------------------
Last row:  ['7' ' A-C' ' 8/11/2021' ' 11:14:33' ' 3829' ' 600' ' 8905' ' 99329'
 ' 5339' ' 110' ' 1014.32' ' 23.85' ' 31.34' ' -8.90']
---------------------------------------------------
dtype:  <class 'numpy.str_'>
---------------------------------------------------
Shape:  (8, 14)
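The cell above needs the downloaded file. To experiment without it, the same call can be run on the three sample lines shown earlier, read from a string through `io.StringIO`. This sketch also shows the float-parsing failure that `dtype=str` avoids, and uses `np.char.strip` (an extra step, not used in the original cells) to remove the padding spaces visible in the output above.

```python
import io
import numpy as np

# The header and first two data rows of cosmic_data.cvs, copied from the text above.
sample = """num ; coinc ; date ; time ; sec ; RecTime ; A ; B ; C ; COINC ; Pressure ; Temp ; Humidity ; Altitude
1; A-C; 8/11/2021; 10:10:35; 0; 600; 0; 0; 0; 0; 1013.56; 21.15; 38.89; -2.55
2; A-C; 8/11/2021; 10:21:6; 623; 600; 14875; 129616; 7772; 576; 1013.56; 21.15; 38.89; -2.55
"""

# Without dtype=str, loadtxt tries to convert every field to a float and fails on the text.
try:
    np.loadtxt(io.StringIO(sample), delimiter=';')
except ValueError as err:
    print('Float parsing failed:', err)

# Reading everything as strings works, just as in the cell above.
raw = np.loadtxt(io.StringIO(sample), dtype=str, delimiter=';')

# np.char.strip removes the leading and trailing spaces from every entry.
clean = np.char.strip(raw)
print(clean.shape)   # (3, 14)
print(clean[0, 9])   # COINC
print(clean[2, 9])   # 576
```

Stripping the strings early means later comparisons, such as looking up a column by name, do not have to account for the padding.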


We now have the data in an (8 x 14) numpy array, i.e. 8 rows and 14 columns, but it is still a bit messy. We can use **array slicing** to select the data we need. For this experiment we are interested in the number of cosmic rays detected in coincidence by detectors A and C. This number is in the column 'COINC', which is the 10th column, i.e. index 9, since Python starts counting at 0. We can select this column using the following syntax:

In [52]:
coincs = data[:,9]
print('COINCS: ', coincs)

COINCS:  [' COINC ' ' 0' ' 576' ' 491' ' 423' ' 263' ' 172' ' 110']


Notice the double index syntax: the `:` means that we want all rows, and the `9` means that we want the 10th column. We don't need the first row, which contains the column names, so we can select all rows except the first one using the following syntax:

In [53]:
coincs = data[1:,9]
print('COINCS: ', coincs)

COINCS:  [' 0' ' 576' ' 491' ' 423' ' 263' ' 172' ' 110']
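Hard-coding index 9 works, but it silently selects the wrong data if the columns are ever reordered. A possible alternative, sketched below, looks the column up by its header name. The helper `column` and the reduced stand-in table `sample_table` are hypothetical illustrations (only four of the 14 columns, with values copied from the output above), and the names are chosen so they do not overwrite `data` or `coincs` if you run this in the notebook.

```python
import numpy as np

# Reduced stand-in for the loaded string array: header row plus two data rows,
# keeping the padding spaces that np.loadtxt leaves in place.
sample_table = np.array([
    ['num ', ' coinc ', ' RecTime ', ' COINC '],
    ['1', ' A-C', ' 600', ' 0'],
    ['2', ' A-C', ' 600', ' 576'],
])

def column(table, name):
    """Return the stripped data rows of the column whose header equals `name`."""
    headers = np.char.strip(table[0])          # remove padding around the header names
    matches = np.flatnonzero(headers == name)  # exact, case-sensitive comparison
    if matches.size != 1:
        raise KeyError(f'expected exactly one column named {name!r}, found {matches.size}')
    return np.char.strip(table[1:, matches[0]])  # skip the header row

sample_coincs = column(sample_table, 'COINC').astype(int)
print(sample_coincs)  # [  0 576]
```

Because the comparison is case-sensitive, 'coinc' (the detector pair) and 'COINC' (the coincidence count) are treated as different columns, which matches the headers in this file.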


This is a neat syntax which is well worth remembering. Keep in mind that the first index is the row and the second is the column. We can summarize the syntax as:

```python
array[start_row:end_row, start_column:end_column]
```

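By leaving out the start or end index, the slice runs from the beginning or to the end, so a bare `:` selects all rows or all columns. Note that the end index is excluded: `1:4` selects indices 1, 2 and 3. We can also use negative indices to count from the end of the array, so `-1` refers to the last row or column.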
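A small toy array makes these rules easy to check. This sketch uses `np.arange(12).reshape(3, 4)` as an illustrative 3 x 4 table; it is not part of the cosmic-ray data.

```python
import numpy as np

# A 3 x 4 toy table holding the numbers 0..11, filled row by row:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
table = np.arange(12).reshape(3, 4)

print(table[:, 1])    # all rows, column 1                        -> [1 5 9]
print(table[1:, 1])   # skip row 0 (like skipping the header row) -> [5 9]
print(table[0, 1:3])  # row 0, columns 1 and 2 (3 is excluded)    -> [1 2]
print(table[-1, :])   # last row                                  -> [ 8  9 10 11]
print(table[:, -1])   # last column                               -> [ 3  7 11]
```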

The numbers are still strings (you can tell by the `'` symbols surrounding each entry), so we need to convert them to integers. We can do this using the `astype` method:

In [56]:
coincs = coincs.astype(int)
recTimes = data[1:, 5].astype(float)  # Let's also collect the recording times (column 'RecTime', index 5)
print('COINCS: ', coincs)
print('REC TIMES: ', recTimes)

COINCS:  [  0 576 491 423 263 172 110]
REC TIMES:  [600. 600. 600. 600. 600. 600. 600.]
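With both arrays numeric, a natural next step is to turn counts into rates, since a count only means something together with the time over which it was recorded. The sketch below assumes that `RecTime` is given in seconds (the `sec` column, which advances by roughly 600 between rows, suggests this, but the file does not state units). The arrays are re-created from the printed values so the example runs on its own; in the notebook they already exist with the same contents.

```python
import numpy as np

# Values printed in the cell above.
coincs = np.array([0, 576, 491, 423, 263, 172, 110])                  # coincidence counts
recTimes = np.array([600., 600., 600., 600., 600., 600., 600.])       # recording times, assumed in seconds

# Element-wise division gives one rate per measurement.
rates = coincs / recTimes  # counts per second (under the seconds assumption)
print(np.round(rates, 3))
```

Numpy divides the arrays element by element, so no loop is needed; the same pattern works for any pair of equally long columns.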


We have now successfully loaded the data from the file and extracted the relevant information. The next step is to plot the data using matplotlib, which we will do in [the next notebook](https://ifa-edu-it.github.io/learning-material/courses/experimental-physics/plotting.ipynb).