# Individual Assignment

## Introduction

Welcome! This notebook is your individual assignment and guides you through a data exploration workflow.
In the following cells you will find the instructions for the individual steps. You can either replace these instructions with your solutions or add your solutions in a new cell after the instructions.

**Please make sure to keep the headlines as they are,** so we can recognize which solution belongs to which section for grading.

### About Jupyter Notebooks

Jupyter notebooks allow you to switch between Markdown text and Python code to create an interactive document.
This makes them very well suited for exploratory tasks, tutorials and explanations. 
Please make sure that you are familiar with the basic operation of these notebooks and can use the cells appropriately.

**You may always add additional cells if they help to better structure your text or code.**

### Handing in the Task

Please submit this notebook file (and only the notebook file) with your modifications.
**Make sure you have listed all frameworks you used and their versions** (besides the Python standard library) **in Part 2** so we can run your code.

### Grading

Here you can see the grading criteria, which we will use to establish your final score.

#### Overall

> Each of these criteria gives between 0 and 2 points, with
>
> * 0 = rarely fulfilled
> * 1 = fulfilled to an acceptable level
> * 2 = expectation exceeded

* Code quality
    * PEP-8 compliant code
    * Use of _speaking_ variable names
    * Comments aid understanding without being excessive
    * Use of functions to structure repeated tasks
    * Use of classes to structure complex data and achieve separation of concerns
    * Functions and classes have structured, consistent and informative docstrings
    * Use of established frameworks to handle complex tasks
    * Use of type hints
* Written sections
    * Clear communication of thought process
    * Use of text formatting to structure writeup
    * Sources and references are present and relevant to the context
    * Factual correctness of written contents

#### Individual parts

Each individual part (Part 1 - Part 6) is rated with 0 to 2 points as explained above.
**Reaching at least 3 points in the individual parts is a requirement for passing the assignment.**

#### Bonus Points

* The data processing workflow also works with a different data set of the same kind (+2)
    * Will be tested by changing the link to the data to point to a modified version of the original data set
* Solved Task 7 (+2)

#### Summary

| Section               | Maximum Points |
|-----------------------|----------------|
| Overall               |             24 |
| Individual Parts      |             12 |
| Bonus                 |              4 |
| Total points possible |             40 |
| Requirement for 1.0   |             36 |
| Requirement for 2.0   |             24 |
| Requirement for 3.0   |             18 |
| Requirement for 4.0   |             12 |



## Part 1) Choice of Data Source

### Choose a data set that you wish to analyze.
The data must be available online for cross-referencing (please add a link to where it can be found).
Possible data sources include, but are not limited to, 
* [zenodo](https://zenodo.org/)
* [rodare](https://rodare.hzdr.de/)
* [destatis](https://www.destatis.de/EN/Home/_node.html)
* The [NOAA](https://www.noaa.gov/nodd/datasets)
* [tableau](https://www.tableau.com/learn/articles/free-public-data-sets)
* [kaggle](https://www.kaggle.com/datasets)

**Please make sure to check the size of the data set.** If it is too small, there is not much to analyze; if it is too large, the download and evaluation will take too long.
Data sets between 100 KB and 1 GB should be fine.
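
As a quick sanity check, the size range above can be expressed as a small helper. This is only a sketch: the function name `is_suitable_size` is hypothetical, and it assumes decimal units (1 KB = 1000 bytes), which the text does not specify.

```python
# Assumed bounds taken from the text above, in decimal units.
MIN_BYTES = 100 * 1000   # 100 KB
MAX_BYTES = 1000 ** 3    # 1 GB


def is_suitable_size(num_bytes: int) -> bool:
    """Return True if a file of ``num_bytes`` bytes lies in the suggested range."""
    return MIN_BYTES <= num_bytes <= MAX_BYTES


print(is_suitable_size(5_000_000))  # a 5 MB file
```

For a file that is already on disk, `pathlib.Path(...).stat().st_size` gives the byte count to pass in.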

If the file adheres to a standardized data format, name and link the standard document (if publicly available, otherwise refer to where to get the standard).
Make sure to include the proper citation for the file you are using.

### Characterize the data source you have chosen for your assignment.
Explain the contents, format and structure of the data file.
Highlight potential pitfalls or particular quirks in the data set that you may have to take care of during your implementation.

## Part 2) Choice of Frameworks

Investigate data processing and visualization frameworks and choose which ones to use in your project.
Create a brief comparison of the frameworks you have researched, their benefits and drawbacks and how they can interact / support each other. 
Make sure they can either support the data format you have chosen or outline how you are going to load the data otherwise.
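
One possible way to list the framework versions requested in the introduction is to query the installed package metadata. The sketch below uses the standard library's `importlib.metadata`; the helper name `report_versions` and the package list are assumptions and should be adapted to the frameworks you actually choose.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages: list[str]) -> dict[str, str]:
    """Map each distribution name to its installed version string."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = "not installed"
    return versions


# Distribution names as published on PyPI (assumed list, adjust as needed).
for name, installed in report_versions(["stackview", "scikit-image", "numpy"]).items():
    print(f"{name}: {installed}")
```

Note that the distribution name can differ from the import name, for example `scikit-image` is imported as `skimage`.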

In [None]:
import stackview 
from skimage import data
from skimage.filters import gaussian
from skimage.filters import threshold_otsu
from skimage.measure import label

## Part 3) Loading the Data

In the following cell write the code necessary to acquire the data from your online source.
**Do not copy the data itself into this notebook.**
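
A minimal sketch of such a download step, using only the standard library, could look as follows. The function name `fetch_dataset` and the local cache path are hypothetical; the URL is whatever link you cite in Part 1.

```python
from pathlib import Path
from shutil import copyfileobj
from urllib.request import urlopen


def fetch_dataset(url: str, target: Path) -> Path:
    """Download ``url`` to ``target`` unless the file already exists.

    Returns the path of the local copy.
    """
    if target.exists():
        # Reuse the cached copy so re-running the notebook stays fast.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response, open(target, "wb") as out_file:
        copyfileobj(response, out_file)
    return target
```

Keeping the URL in a single variable near the top of the cell makes it easy to point the workflow at a different data set of the same kind.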

Let us use a [timelapse image dataset of zebrafish optic tectum neurons by Xinyang L. (2022)](https://doi.org/10.5281/zenodo.6339707) licensed [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/deed.en).


* https://zenodo.org/records/6060378
* https://zenodo.org/records/6076614


In [41]:
# Load the human mitosis sample image from scikit-image
image = data.human_mitosis()

# Inspect the image dimensions (rows, columns)
image.shape

(512, 512)

In [45]:
# Load the human mitosis dataset
image = data.human_mitosis()

# Display the image
stackview.insight(image)

| Property | Value      |
|----------|------------|
| shape    | (512, 512) |
| dtype    | uint8      |
| size     | 256.0 kB   |
| min      | 7          |
| max      | 255        |


## Part 4) Cleaning the data

Your data set may have some quirks, strange formatting, incomplete entries or invalid data.
Note down any particularities you find and your intended steps to correct them.
Clean up the raw data you have loaded to bring it into a presentable shape, that can serve as the basis of future processing steps without having to worry about corner cases.
You may create, split, combine, discard, re-format or re-label rows and columns as needed.
Try to keep the cleaning procedure as generic as possible, so it could also work on a different data set of the same kind.

In [46]:
# Add your data cleaning code here
# After cleaning, print out the same sample as before, but this time in the cleaned state

In [47]:
# Gaussian blur
image_gaussian = gaussian(image, sigma=1)
stackview.insight(image_gaussian)

| Property | Value               |
|----------|---------------------|
| shape    | (512, 512)          |
| dtype    | float64             |
| size     | 2.0 MB              |
| min      | 0.02922286013974766 |
| max      | 0.8832697034964642  |


In [48]:
# Otsu threshold
thresh = threshold_otsu(image_gaussian)
image_binary = image_gaussian > thresh
stackview.insight(image_binary)

| Property | Value      |
|----------|------------|
| shape    | (512, 512) |
| dtype    | bool       |
| size     | 256.0 kB   |
| min      | False      |
| max      | True       |


In [49]:
# Connected-component labeling of the binary mask
img_labeled = label(image_binary)
stackview.insight(img_labeled)

| Property | Value      |
|----------|------------|
| shape    | (512, 512) |
| dtype    | int32      |
| size     | 1024.0 kB  |
| min      | 0          |
| max      | 276        |
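
The blur, threshold and labeling cells above can be bundled into one reusable function, which also makes it easier to run the same steps on another image of the same kind. This is a sketch only: the name `label_bright_objects` is hypothetical, and it assumes a 2D grayscale image with bright objects on a dark background.

```python
import numpy as np
from skimage.filters import gaussian, threshold_otsu
from skimage.measure import label


def label_bright_objects(image: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Blur, threshold and label the bright regions of a grayscale image.

    Parameters
    ----------
    image : np.ndarray
        2D grayscale image (assumption: bright objects, dark background).
    sigma : float
        Standard deviation of the Gaussian blur.

    Returns
    -------
    np.ndarray
        Integer label image; 0 is background, 1..n are connected regions.
    """
    blurred = gaussian(image, sigma=sigma)
    binary = blurred > threshold_otsu(blurred)
    return label(binary)
```

Calling `label_bright_objects(image)` with the default `sigma` repeats the three cells above in a single step.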


## Part 5) Fundamental Exploration

Create a statistical analysis for each of the properties recorded in the data set.

* For numerical data calculate the minimum and maximum values and where/when they appear, the mean value as well as the standard deviation.
* For categorical data create a table of how often each category appears
* For time/date columns indicate the covered timespan and the average frequency of events
* For any other kind of data discuss and implement a suitable statistical characterization
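
For numerical array data such as an image, the first bullet above could be sketched with NumPy as follows. The function name `describe_array` is hypothetical, and the standard deviation shown is the population value (`ddof=0`, NumPy's default); use `ddof=1` if you want the sample estimate.

```python
import numpy as np


def describe_array(values: np.ndarray) -> dict:
    """Return min/max with their positions, mean and standard deviation."""
    # argmin/argmax return flat indices; unravel them to array coordinates.
    min_index = np.unravel_index(np.argmin(values), values.shape)
    max_index = np.unravel_index(np.argmax(values), values.shape)
    return {
        "min": values.min(),
        "min_index": tuple(int(i) for i in min_index),
        "max": values.max(),
        "max_index": tuple(int(i) for i in max_index),
        "mean": float(values.mean()),
        "std": float(values.std()),
    }


sample = np.array([[1, 5], [3, 0]])
print(describe_array(sample))
```

If a value occurs several times, `argmin` and `argmax` report only the first position.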

In [3]:
# Implement your statistical analysis here
# Print the analysis results

## Part 6) Visualization

Choose a _value over time_ or _two sequences of values_. 
Briefly discuss how they are related (or how the value behaves in relation to time) and which visualization forms are suitable to present this relationship.
Create a plot to visualize this relationship between these values.

In [4]:
# Implement your visualization here

## Part 7) Bonus: Highlight

Find a section in your data set with notable features (like strong deviations, extreme values, suspicious correlations, …).
Characterize those sections verbally and by statistical means and visualize the notable features.
Use different kinds of visualization to explain the observed features.
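
One simple way to locate extreme values is to flag entries whose z-score exceeds a threshold. The sketch below uses only the standard library; the function name `find_outliers` and the default threshold of 3 are assumptions, and other criteria (for example quantile-based ones) may suit your data better.

```python
from statistics import mean, pstdev


def find_outliers(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return the indices of values more than ``threshold`` standard
    deviations away from the mean."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # All values are identical, so nothing can stand out.
        return []
    return [
        index
        for index, value in enumerate(values)
        if abs(value - center) / spread > threshold
    ]


readings = [10.0] * 20 + [100.0]
print(find_outliers(readings))
```

The returned indices can then be used to select the surrounding section of the data for closer inspection and plotting.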

In [5]:
# Add your highlight code here