# Digitale Techniken: Data Extraction and Plotting
2023-10, johanna.kerch@uni-goettingen.de, goeran.liebs@uni-goettingen.de

<img src="https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-nc-sa.png" style="height:50px" align="left"/> 

https://creativecommons.org/licenses/by-nc-sa/4.0/

### Keep numbers in memory and be ready for higher mathematics with the module ```numpy```:

A module for Python that allows you to work with multidimensional arrays and mathematical calculations.

In [None]:
# import the module for numerical n-dim arrays (fields, vectors, tensors, etc.)
import numpy as np    # conventional abbreviation

In [None]:
# create an array with 2 rows and 3 columns
Array2x3 = np.array([[56,  1,      3],
                     [40,200,300.000]])
print(Array2x3)
type(Array2x3)

In [None]:
np.mean(Array2x3) # mean of all values; only one of many array functions

In [None]:
Array2x3[0,:] # first row

In [None]:
Array2x3[:,1] # second column
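
Many array functions also accept an `axis` argument, so a calculation can run along rows or columns instead of over all values. A minimal sketch, reusing the values of `Array2x3` under the illustrative name `arr`:

```python
import numpy as np

# same values as Array2x3 above; the name "arr" is only for illustration
arr = np.array([[56,   1,   3],
                [40, 200, 300.0]])

shape = arr.shape              # (number of rows, number of columns)
row_means = arr.mean(axis=1)   # one mean per row
col_means = arr.mean(axis=0)   # one mean per column

print(shape)
print(row_means)
print(col_means)
```

`axis=0` collapses the rows (result has one value per column), `axis=1` collapses the columns (one value per row).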

## How to find, open and read data files

Let's define a string containing the path, a wildcard ```*``` and the desired file extension. <br>
Subfolders are separated with ```/``` or ```\```, depending on the operating system of your computer.

In [None]:
import os # portable access to operating-system functionality from Python

In [None]:
path = os.path.join('..','data','sea_ice_extent','*.csv') # just a string; the separator is chosen to match your machine's OS
path
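
To see what `os.path.join` does, the following sketch compares its result with joining the same parts by hand using `os.sep`, the path separator of the current operating system. The names `pattern` and `by_hand` are illustrative only.

```python
import os

parts = ['..', 'data', 'sea_ice_extent', '*.csv']

pattern = os.path.join(*parts)   # lets Python pick the separator
by_hand = os.sep.join(parts)     # the same, spelled out with os.sep

print(repr(os.sep))
print(pattern)
print(pattern == by_hand)
```

Using `os.path.join` keeps the notebook working on Linux, macOS and Windows without editing the path string.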

Find all files matching a pattern, using ```glob```:

In [None]:
from glob import glob
files = sorted(glob(path)) # glob returns matches in arbitrary order; sort so that files[-1] is reproducible
files # check the list
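
If you want to try `glob` without the course data, here is a small self-contained sketch. It creates a few hypothetical files (`a.csv`, `b.csv`, `notes.txt`) in a temporary folder and then searches for the CSV files only. Sorting the result is a choice made here, because `glob` does not guarantee any particular order.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as tmp:
    # create three small demo files (hypothetical names)
    for name in ('b.csv', 'a.csv', 'notes.txt'):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('demo\n')

    # only the *.csv files match the pattern
    matches = sorted(glob(os.path.join(tmp, '*.csv')))
    names = [os.path.basename(m) for m in matches]

print(names)
```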

Data were obtained from https://nsidc.org/data/seaice_index/archives, 2021-02-08.<br>
Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. 2017, updated daily. Sea Ice Index, Version 3. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: https://doi.org/10.7265/N5K072F8. 2020-11-30.

Open a file object, obtain a file handle, that is, not yet the actual content of the file:

In [None]:
open_file = open(files[-1])  # open the last file in the list

header = open_file.readline()  # read the first line (the column names)
# tip: place the cursor inside the parentheses and press SHIFT+TAB to see the docstring
year = []  # create some empty lists
extent = []
area = []
for line in open_file:  # iterate over the remaining lines of the file
    line = line.strip()  # strip() removes leading and trailing whitespace, including the newline
    cols = line.split(',')  # split the line into columns, which are separated by commas

    year.append(float(cols[0]))  # append one element per iteration to the lists, as floats
    extent.append(float(cols[4]))
    area.append(float(cols[5]))
open_file.close()  # release the file handle once all lines are read

year = np.array(year)  # convert to NumPy arrays for convenient array operations
extent = np.array(extent)
area = np.array(area)
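
The same loop can also be written with the standard-library `csv` module and a `with` block, which closes the file automatically. The sketch below is only an illustration: it reads from an in-memory text instead of a real file, the two data rows are made up, and the column layout (year in column 0, extent in column 4, area in column 5) is assumed from the indices used above.

```python
import csv
import io

# made-up demo content, NOT real sea ice data
sample = io.StringIO(
    "year, mo, data-type, region, extent, area\n"
    "2001, 9, demo, N, 6.5, 4.5\n"
    "2002, 9, demo, N, 6.0, 4.0\n"
)

year, extent, area = [], [], []
with sample as f:                       # with a real file: open(filename) as f
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)               # first row holds the column names
    for cols in reader:                 # each row arrives already split into columns
        year.append(float(cols[0]))
        extent.append(float(cols[4]))
        area.append(float(cols[5]))

print(header)
print(year, extent, area)
```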

In [None]:
import matplotlib.pyplot as plt # the pyplot submodule, used to make plots
plt.plot(year,extent,'D-', label='Goddard, DATA-TYPE')
plt.xlabel('Year')
plt.ylabel('extent [unit]')
plt.title('Read the Paper')
plt.grid(True)
plt.legend();

In [None]:


plt.plot(year,area,'D-', label='Goddard, DATA-TYPE')
plt.xlabel('Year')
plt.ylabel('area [unit]')
plt.title('Read the Paper')
plt.grid(True)
plt.legend();
plt.ylim(2,6)        # limit the y-axis because of an outlier

# Alternatively for column-based data (more convenient):

In [None]:
depth, temperature = np.loadtxt('../data/kcctemp.dat', unpack=True, skiprows=1)
depth

In [None]:
data = np.loadtxt('../data/kcctemp.dat', skiprows=1)

In [None]:
plt.plot(data[:,1],data[:,0])
plt.gca().invert_yaxis()
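
The difference between the two `np.loadtxt` calls is easier to see on a tiny example. The sketch below uses made-up numbers in an in-memory text (not the content of `kcctemp.dat`) and assumes two whitespace-separated columns with one header line.

```python
import io
import numpy as np

# made-up demo values with one header line
text = "depth temperature\n0.0 -10.0\n10.0 -12.5\n20.0 -14.0\n"

# unpack=True returns one 1-D array per column
depth, temperature = np.loadtxt(io.StringIO(text), unpack=True, skiprows=1)

# without unpack, a single 2-D array (rows x columns) is returned
data = np.loadtxt(io.StringIO(text), skiprows=1)

print(depth)
print(data.shape)
```

With `unpack=True` the columns get readable names right away; the 2-D form is handy when the number of columns is not known in advance.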

# Try out the possibilities to read files

In [None]:
# Examples of reading a file line by line
# use them to do the same as above

# Method 1: Using readlines() method
with open('filename.txt', 'r') as f:
    lines = f.readlines()

# Method 2: Using for loop
with open('filename.txt', 'r') as f:
    lines = []
    for line in f:
        lines.append(line.strip())

# Method 3: Using list comprehension
with open('filename.txt', 'r') as f:
    lines = [line.strip() for line in f]

# Method 4: Using map() function
with open('filename.txt', 'r') as f:
    lines = list(map(str.strip, f))
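
One detail that distinguishes Method 1 from the others: `readlines()` keeps the newline character at the end of each line, while the variants that call `strip()` remove it. A small sketch with a hypothetical file `demo.txt` in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'demo.txt')   # hypothetical demo file
    with open(filename, 'w') as f:
        f.write('first\nsecond\n')

    with open(filename, 'r') as f:
        raw = f.readlines()                        # newline is kept

    with open(filename, 'r') as f:
        stripped = [line.strip() for line in f]    # newline is removed

print(raw)
print(stripped)
```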

# Masking

Select values by masking with a condition:

In [None]:
# mask from condition
selection = temperature > -13
print(selection)

# apply mask
temp_select = temperature[selection]
print(temp_select)
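
Masks can be combined, and a mask built from one array can select values from another array of the same length. A minimal sketch with made-up `depth` and `temperature` values (not the course data):

```python
import numpy as np

# made-up demo values
depth = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
temperature = np.array([-10.0, -12.5, -14.0, -13.5, -11.0])

# combine two conditions element-wise with & (note the parentheses)
mask = (temperature > -13) & (depth > 5)

selected_depth = depth[mask]   # depths where both conditions hold
n_selected = mask.sum()        # True counts as 1

print(mask)
print(selected_depth)
print(n_selected)
```

Use `&` for "and", `|` for "or" and `~` for "not"; the Python keywords `and`/`or` do not work element-wise on arrays.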

#### Writing files

Exercise (voluntary): find it out yourself

#### Date formatting, time is relative

In [None]:
import datetime
dt = datetime.datetime(2001, 1, 31, 10, 51, 0)  # a datetime object
# different date formats
print(dt.strftime('%d-%m-%Y::%H-%M'))
print(dt.strftime('%Y,%m,%d;%H:%M'))
print(dt.strftime('%Y-%d-%m::%Hh%M'))  # literal text can be mixed with the % directives
   
###  Calculating time in different units    

d1 = datetime.date(1869, 1, 2)  # two date objects
d2 = datetime.date(1869, 10, 2)

# Solution
print(str(d2 - d1) + ' time in days')  # timedelta
print(str((d2 - d1).total_seconds()) + ' time in seconds')
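
Subtracting two dates gives a `timedelta`, which can be converted to other units by dividing it by another `timedelta`. The sketch below reuses the two dates from above and also shows `strptime`, the counterpart of `strftime`, which parses a string back into a datetime object. The variable names are illustrative.

```python
import datetime

d1 = datetime.date(1869, 1, 2)
d2 = datetime.date(1869, 10, 2)
delta = d2 - d1                                  # a timedelta object

days = delta.days                                # whole days as an integer
seconds = delta.total_seconds()                  # float
hours = delta / datetime.timedelta(hours=1)      # timedelta / timedelta gives a float
weeks = delta / datetime.timedelta(weeks=1)

# strptime parses a string using the same format codes as strftime
parsed = datetime.datetime.strptime('31-01-2001::10-51', '%d-%m-%Y::%H-%M')

print(days, seconds, hours, weeks)
print(parsed)
```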
   

### Pandas

https://pandas.pydata.org/

- basically, this is "Excel" functionality within Python
- it is much faster than Excel for large data sets and especially if you use mixed data (numeric and other/factors)

Data is read as a data frame:

In [None]:
import pandas as pd    # conventional alias!

df = pd.read_csv(files[-1], usecols = [0,4,5])

In [None]:
df
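
Once the data is in a data frame, columns can be addressed by name. A minimal sketch on made-up values in an in-memory text (not the real file); the column names `year`, `extent` and `area` are assumptions and may differ in the actual CSV header, for example by leading spaces.

```python
import io
import pandas as pd

# made-up demo content, NOT real sea ice data
sample = io.StringIO(
    "year,mo,data-type,region,extent,area\n"
    "2001,9,demo,N,6.5,4.5\n"
    "2002,9,demo,N,6.0,4.0\n"
)

df = pd.read_csv(sample, usecols=[0, 4, 5])   # keep only three columns

columns = list(df.columns)            # names taken from the header line
mean_extent = df['extent'].mean()     # select a column by name, then aggregate

print(columns)
print(mean_extent)
```

`df.head()` and `df.describe()` are convenient next steps for a first look at a new data set.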

## Exercise

Try this at home:
- open a Jupyter notebook
- take some data you work with (in Excel)
- use the simple routine I've shown above or pandas
- _or_ research other ways of opening/reading data from files with Python
- approach me if it's not working, share your notebook/file and I'll try to help

In [None]:
## ... tracking changes....

In [None]:
#!git checkout Zweigname

In [None]:
#!git add 03_DT_python_easy_plot.ipynb #or use --all

In [None]:
#!git add --all

In [None]:
#!git commit -m 'Datetime in 03'

In [None]:
#!git push --set-upstream origin Zweigname