In [None]:
---
title: "Special Data Import"
execute:
    echo: true
    eval: true
--- 

## How to get Data and how to store it? {.unnumbered}

Data can come from experiments, calculations, databases, *etc.* It should be stored so that it is easy to access and analyze. Databases, spreadsheets, and text/CSV files are common choices. <br>
A text file with data often contains a header with the column names, followed by the data in rows. Columns can be separated by different delimiters (spaces, `,`, `;`, tabs, ...). <br> For example, a file with data from an experiment could look like this:

```plaintext
Time [s]  Temperature [°C]
0         20
10        21
...     ...

```
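
The delimiter only changes how a line is cut into columns, not the data itself. A minimal sketch with Python's built-in `csv` module shows this; the two sample strings and the helper `parse` are made up for illustration and mirror the table above:

```python
import csv
import io

# Hypothetical sample data: the same table, once comma- and once semicolon-separated
comma_text = "Time [s],Temperature [°C]\n0,20\n10,21\n"
semicolon_text = "Time [s];Temperature [°C]\n0;20\n10;21\n"


def parse(text, delimiter):
    """Split every line of `text` into a list of column strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows_comma = parse(comma_text, ",")
rows_semicolon = parse(semicolon_text, ";")

print(rows_comma)
# both files hold the same table once the right delimiter is used
print(rows_comma == rows_semicolon)
```

Note that the values are still strings at this point; converting them to numbers is a separate step.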

### How to read data from a file? {.unnumbered}

In Python there are different ways to read data from a file. You can use the built-in `open` function to open a file and then read the data line by line. You can also use libraries like `numpy` or `pandas`.

In [None]:
# open the file for reading
# data is from diechemiker.org (c) Julia Opitz 2015
data_file = open('data/fragmentation.csv', 'r')
# read all lines into a list
fragments = data_file.readlines()
# remove the first line from the list and store it as the header
header = fragments.pop(0)
print(header)
# remove the next line from the list and store it as the names of the columns
names = fragments.pop(0)
# remove the newline character from the end of the line
names = names.strip()
# split the names into a list of values
names = names.split(' ')
print(names)
# strip the newline character from each line
data = [line.strip() for line in fragments]
# split each line into a list of values
data = [line.split(' ') for line in data] 
# convert each value to a float
data = [[float(value) for value in line] for line in data] 
# close the file
data_file.close()
print(data)

#Biochemisches Grundpraktikum Julia Opitz Juni 2015 diechemiker.org

['laenge_fragment_pb', 'Log(bp)', 'lauflaenge_pixel']
[[23130.0, 4.364175633, 19.19], [9416.0, 3.97386645, 31.19], [6557.0, 3.816705184, 43.19], [4361.0, 3.639586087, 60.78], [2322.0, 3.365862215, 108.77], [2027.0, 3.306853749, 123.16], [910.0, 2.959041392, 192.74], [564.0, 2.751279104, 233.53], [540.0, 2.73239376, 235.13], [235.0, 2.371067862, 284.71], [166.0, 2.220108088, 301.51]]
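
The same steps can be written more compactly. The following is a sketch, not a replacement for the cell above: a `with` block closes the file automatically, and `split()` without an argument splits on any run of whitespace. The file name `fragmentation_sample.csv` and its two data rows are stand-ins written to a temporary directory so that the snippet runs on its own; the values are copied from the output above.

```python
import os
import tempfile

# Stand-in for the real data file: comment line, column names, two data rows
sample = (
    "#Biochemisches Grundpraktikum Julia Opitz Juni 2015 diechemiker.org\n"
    "laenge_fragment_pb Log(bp) lauflaenge_pixel\n"
    "23130 4.364175633 19.19\n"
    "9416 3.97386645 31.19\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fragmentation_sample.csv")
    with open(path, "w", encoding="utf-8") as out_file:
        out_file.write(sample)

    # the file is closed automatically when the with block ends
    with open(path, "r", encoding="utf-8") as data_file:
        header = next(data_file).strip()   # first line: comment/header
        names = next(data_file).split()    # second line: column names
        # remaining non-empty lines: one list of floats per row
        data = [[float(value) for value in line.split()]
                for line in data_file if line.strip()]

print(header)
print(names)
print(data)
```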


### Import data with Numpy {.unnumbered}
NumPy provides two functions, `loadtxt` and `genfromtxt`, to import data from a text file. <br>
The main difference is that `genfromtxt` can handle missing data and offers more options for mixed data types. <br>
More information is available at [https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html)  <br> and [https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html](https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html)
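
To illustrate the difference, here is a small sketch with a made-up, comma-separated table in which one value is missing. `genfromtxt` fills the gap with `nan`, while `loadtxt` is expected to raise a `ValueError` because it cannot convert the empty field to a number. The sample string is an assumption for illustration and is not taken from `fragmentation.csv`.

```python
import io

import numpy as np

# Hypothetical table: the middle value of the second row is missing
text = "1,2,3\n4,,6\n"

# genfromtxt replaces the missing entry with nan
filled = np.genfromtxt(io.StringIO(text), delimiter=",")
print(filled)

# loadtxt cannot convert the empty field to a float
try:
    np.loadtxt(io.StringIO(text), delimiter=",")
    loadtxt_failed = False
except ValueError as err:
    loadtxt_failed = True
    print("loadtxt failed:", err)
```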

In [None]:
import numpy as np

# skip the comment line and the line with the column names
data = np.loadtxt('data/fragmentation.csv', delimiter=' ', skiprows=2)
header = np.loadtxt('data/fragmentation.csv', 
                    delimiter=' ', max_rows=1, dtype=str)
print(header)
print(data)

['fragment_runningsize_pb' 'Log(bp)' '' 'runningsize_pixel']
[[2.31300000e+04 4.36417563e+00 1.91900000e+01]
 [9.41600000e+03 3.97386645e+00 3.11900000e+01]
 [6.55700000e+03 3.81670518e+00 4.31900000e+01]
 [4.36100000e+03 3.63958609e+00 6.07800000e+01]
 [2.32200000e+03 3.36586221e+00 1.08770000e+02]
 [2.02700000e+03 3.30685375e+00 1.23160000e+02]
 [9.10000000e+02 2.95904139e+00 1.92740000e+02]
 [5.64000000e+02 2.75127910e+00 2.33530000e+02]
 [5.40000000e+02 2.73239376e+00 2.35130000e+02]
 [2.35000000e+02 2.37106786e+00 2.84710000e+02]
 [1.66000000e+02 2.22010809e+00 3.01510000e+02]]



In [None]:
data = np.genfromtxt('data/fragmentation.csv', delimiter=' ', skip_header=2)
# missing_values=' ' tells genfromtxt to treat a single space as a missing entry
header = np.genfromtxt('data/fragmentation.csv', delimiter=' ', 
                       max_rows=1, dtype=str, missing_values=' ') 
print("header: \n",header)
print("data: \n",data) 

header: 
 ['fragment_runningsize_pb' 'Log(bp)' '' 'runningsize_pixel']
data: 
 [[2.31300000e+04 4.36417563e+00 1.91900000e+01]
 [9.41600000e+03 3.97386645e+00 3.11900000e+01]
 [6.55700000e+03 3.81670518e+00 4.31900000e+01]
 [4.36100000e+03 3.63958609e+00 6.07800000e+01]
 [2.32200000e+03 3.36586221e+00 1.08770000e+02]
 [2.02700000e+03 3.30685375e+00 1.23160000e+02]
 [9.10000000e+02 2.95904139e+00 1.92740000e+02]
 [5.64000000e+02 2.75127910e+00 2.33530000e+02]
 [5.40000000e+02 2.73239376e+00 2.35130000e+02]
 [2.35000000e+02 2.37106786e+00 2.84710000e+02]
 [1.66000000e+02 2.22010809e+00 3.01510000e+02]]
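
The empty string `''` in the header above suggests that two column names are separated by more than one space. A possible workaround, sketched here under that assumption, is to leave out `delimiter` so that NumPy splits on any run of whitespace. The sample text is a stand-in reconstructed from the printed output, not the actual file.

```python
import io

import numpy as np

# Stand-in for the file; the double space before the last name is an assumption
sample = (
    "#Biochemisches Grundpraktikum Julia Opitz Juni 2015 diechemiker.org\n"
    "fragment_runningsize_pb Log(bp)  runningsize_pixel\n"
    "23130 4.364175633 19.19\n"
    "9416 3.97386645 31.19\n"
)

# no delimiter given: consecutive whitespace counts as one separator
data = np.loadtxt(io.StringIO(sample), skiprows=2)
# the comment line starting with '#' is ignored, so the first row read is the names line
header = np.loadtxt(io.StringIO(sample), max_rows=1, dtype=str)

print(header)
print(data)
```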


### Import data with Pandas {.unnumbered}

You can use the `read_csv` function to import data from a file. The function can handle different text-based formats such as CSV or plain text files. It can also handle missing or invalid values. <br>
More information is available at [https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv). <br>
Other pandas reader functions include `read_excel`, `read_json`, `read_sql`, `read_sql_query`, `read_sql_table`, `read_html`, `read_clipboard`, `read_pickle`, `read_stata`, `read_feather`, `read_parquet`, `read_orc`, `read_sas`, `read_spss`, `read_gbq`, `read_hdf`, `read_fwf`, and `read_table`.

In [None]:
import pandas as pd

# skip the first line (the comment) and use the next line as the header
# (header=0 refers to the first line that remains after skipping)
data = pd.read_csv('data/fragmentation.csv', sep=' ', skiprows=1, header=0)

In [None]:
print(data.head()) # print the first 5 rows

   fragment_runningsize_pb   Log(bp)  Unnamed: 2  runningsize_pixel
0                    23130  4.364176       19.19                NaN
1                     9416  3.973866       31.19                NaN
2                     6557  3.816705       43.19                NaN
3                     4361  3.639586       60.78                NaN
4                     2322  3.365862      108.77                NaN
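
In the output above the values are shifted: an extra column `Unnamed: 2` appears and `runningsize_pixel` contains only `NaN`. A likely cause is a double space in the line with the column names, which `sep=' '` interprets as an empty column. The following sketch, using a stand-in string reconstructed from the printed output, shows how the regular expression `sep=r'\s+'` (any run of whitespace) avoids this:

```python
import io

import pandas as pd

# Stand-in for the file; the double space before the last name is an assumption
sample = (
    "#Biochemisches Grundpraktikum Julia Opitz Juni 2015 diechemiker.org\n"
    "fragment_runningsize_pb Log(bp)  runningsize_pixel\n"
    "23130 4.364175633 19.19\n"
    "9416 3.97386645 31.19\n"
)

# a single space as separator: the double space creates an extra, empty column
shifted = pd.read_csv(io.StringIO(sample), sep=" ", skiprows=1, header=0)
print(shifted.columns.tolist())

# any run of whitespace as separator: names and values line up again
fixed = pd.read_csv(io.StringIO(sample), sep=r"\s+", skiprows=1, header=0)
print(fixed)
```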
