## Data Sets
In this notebook we read our first data set into memory. After that, we explore the data a bit and see what we can discover in it.

In [1]:
import numpy as np
import pandas as pd
from sklearn import datasets

### Read the data with sklearn, which ships the data set as part of the library

In [2]:
iris = datasets.load_iris()

### Transform to pd.DataFrame
To use the power of pandas, we have to create a DataFrame from the loaded data.

In [3]:
df = pd.DataFrame(iris.data, columns=['Sepal length', 'Sepal width', 'Petal length', 'Petal width'])

Attach the target class ID to each row.

In [4]:
df["Class ID"] = pd.Series(iris.target)
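
As a side note, the two steps above (building the DataFrame and attaching the target) can probably be combined. The sketch below assumes scikit-learn 0.23 or newer, where `load_iris` accepts `as_frame=True`. The renaming to the column names used in this notebook is our own choice, and `iris_frame` and `df_alt` are illustrative names that do not appear elsewhere in the notebook.

```python
from sklearn import datasets

# Assumption: scikit-learn >= 0.23, which supports as_frame=True
iris_frame = datasets.load_iris(as_frame=True)

# .frame holds the four feature columns plus a "target" column
df_alt = iris_frame.frame.copy()

# Rename to the column names used in this notebook
df_alt.columns = ["Sepal length", "Sepal width", "Petal length", "Petal width", "Class ID"]

print(df_alt.head())
```

The explicit construction used in this notebook shows more clearly what happens, which is why we keep it for the tasks below.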

In [5]:
# Get an overview of the contents of the data set
df.head()

,Sepal length,Sepal width,Petal length,Petal width,Class ID
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


## Task 1:
Link each Class ID to its class name and store the name in a new column, df["Class Name"]. The class names (the names of the iris species) are also included in the loaded data set.

Hints: 
- Look deeper into the input data to get the classes by name
- Iterate over them and/or use list comprehensions and iterators
- Allocate the field df["Class Name"] before you assign values to it
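
To follow the first hint, it helps to look at what the loaded object contains. The snippet below is a minimal sketch that only inspects the data; it reloads the data set so that it runs on its own.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The returned object behaves like a dictionary; list its fields
print(list(iris.keys()))

# Class names, ordered so that position i belongs to class ID i
print(iris.target_names)

# Names of the four measurement columns
print(iris.feature_names)
```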

### Answer 1.1:

In [6]:
# Allocate the column "Class Name". An empty-string column keeps the dtype
# compatible with the names assigned below.
df["Class Name"] = ""

# Loop over target_names and assign each name to the rows with the matching Class ID
for i, name in enumerate(iris.target_names):
    df.loc[df["Class ID"] == i, "Class Name"] = name

In [7]:
# Display the result
df.head()

,Sepal length,Sepal width,Petal length,Petal width,Class ID,Class Name
0,5.1,3.5,1.4,0.2,0,setosa
1,4.9,3.0,1.4,0.2,0,setosa
2,4.7,3.2,1.3,0.2,0,setosa
3,4.6,3.1,1.5,0.2,0,setosa
4,5.0,3.6,1.4,0.2,0,setosa
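
The loop can also be avoided. The following is a sketch of one possible alternative, not one of the notebook's official answers: it builds a dictionary from class ID to class name and applies it with `Series.map`. The name `id_to_name` is our own.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=["Sepal length", "Sepal width", "Petal length", "Petal width"])
df["Class ID"] = pd.Series(iris.target)

# Build the lookup {0: "setosa", 1: "versicolor", 2: "virginica"}
id_to_name = dict(enumerate(iris.target_names))

# map() replaces every ID by its name; no pre-allocation is needed
df["Class Name"] = df["Class ID"].map(id_to_name)

print(df.head())
```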


### Answer 1.2:

In [8]:
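# Allocate the column "Class Name". An empty-string column keeps the dtype
# compatible with the names assigned below.
df["Class Name"] = ""

# Loop over target_names and assign each name to the matching rows.
# Note: this chained indexing (df[col][mask] = ...) triggers the warning shown below,
# and with pandas copy-on-write it does not modify df. Prefer df.loc as in Answer 1.1.
for i in range(len(iris.target_names)):
    df["Class Name"][df["Class ID"] == i] = iris.target_names[i]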

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Class Name"][df["Class ID"] == i] = iris.target_names[i]
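
The warning appears because `df["Class Name"][mask] = ...` performs two indexing steps, and pandas cannot guarantee that the second step writes into the original DataFrame. A minimal sketch of the single-step form on a made-up toy frame (`toy` is a hypothetical example, not part of the iris data):

```python
import pandas as pd

# Hypothetical toy frame, only for illustration
toy = pd.DataFrame({"Class ID": [0, 1, 0], "Class Name": ["", "", ""]})

# One indexing step: the row mask and the column label are passed together,
# so the assignment is applied directly to toy
toy.loc[toy["Class ID"] == 0, "Class Name"] = "setosa"

print(toy)
```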


In [9]:
# Display the result
df.head()

,Sepal length,Sepal width,Petal length,Petal width,Class ID,Class Name
0,5.1,3.5,1.4,0.2,0,setosa
1,4.9,3.0,1.4,0.2,0,setosa
2,4.7,3.2,1.3,0.2,0,setosa
3,4.6,3.1,1.5,0.2,0,setosa
4,5.0,3.6,1.4,0.2,0,setosa


## Task 2:
Determine the mean of the Sepal length for each class. Store the result in a structure of your choice. Make sure to use the class names from the data frame, not from iris.target_names.

Hints:
- Get all distinct class names in the data set
- Data structures have to be defined before you access them, e.g. my_list = [], my_dict = {}, df = pd.DataFrame(), ...

### Answer 2.1:

In [18]:
# Get the name of the class from df, not from iris input data 
classes = df["Class Name"].unique()

means = {}
for iris_class in classes: 
    means[iris_class] = df["Sepal length"][df["Class Name"] == iris_class].mean()

### Answer 2.2:

In [19]:
class_names = df["Class Name"].unique()
means = pd.DataFrame(
    [df["Sepal length"][df["Class Name"] == x].mean() for x in class_names],
    index=class_names,
    columns=["mean"],
)

In [20]:
means

,mean
setosa,5.006
versicolor,5.936
virginica,6.588


### Answer 2.3:

In [39]:
df.groupby("Class Name")["Sepal length"].mean().values

array([5.006, 5.936, 6.588])
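
Note that `.values` returns a bare array, so the class names are no longer attached to the means. A small sketch of how the names could be kept, by working with the grouped Series directly or converting it to a dictionary. The snippet rebuilds the data frame so that it runs on its own; `mean_by_class` and `means_dict` are illustrative names.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=["Sepal length", "Sepal width", "Petal length", "Petal width"])

# Look up the name for every row's class ID in one indexing step
df["Class Name"] = iris.target_names[iris.target]

# Series indexed by class name
mean_by_class = df.groupby("Class Name")["Sepal length"].mean()

# Plain dictionary {class name: mean}, the same structure as in Answer 2.1
means_dict = mean_by_class.to_dict()

print(mean_by_class)
```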