<img style="width:30%; float:right" src="images/dbu_logo.png">

# Manipulating Data

With pandas, we can create, delete, or modify columns and rows using simple commands.

**Recommended reading:**

[Jake VanderPlas (2016): _Python Data Science Handbook_](https://ebookcentral.proquest.com/lib/dbuas/reader.action?docID=4746657)

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

In [2]:
tips = sns.load_dataset("tips")
tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


## Creating and deleting columns

In [3]:
tips["nonsense"] = "abc"
tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,nonsense
0,16.99,1.01,Female,No,Sun,Dinner,2,abc
1,10.34,1.66,Male,No,Sun,Dinner,3,abc
2,21.01,3.5,Male,No,Sun,Dinner,3,abc
3,23.68,3.31,Male,No,Sun,Dinner,2,abc
4,24.59,3.61,Female,No,Sun,Dinner,4,abc


In [4]:
tips["nonsense"] = "xyz"
tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,nonsense
0,16.99,1.01,Female,No,Sun,Dinner,2,xyz
1,10.34,1.66,Male,No,Sun,Dinner,3,xyz
2,21.01,3.5,Male,No,Sun,Dinner,3,xyz
3,23.68,3.31,Male,No,Sun,Dinner,2,xyz
4,24.59,3.61,Female,No,Sun,Dinner,4,xyz


In [5]:
tips.drop("nonsense", axis="columns", inplace=True)
tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
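
The call with `inplace=True` above modifies `tips` directly. As a minimal sketch of an alternative, `drop` can also return a new DataFrame that is then assigned to a name. The small `demo` frame is a hypothetical stand-in built from the first rows shown above, so the snippet runs without loading the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of `tips` (values from the output above).
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
})
demo["nonsense"] = "xyz"

# Without inplace=True, drop() returns a new DataFrame and leaves `demo` unchanged.
cleaned = demo.drop(columns="nonsense")

print(list(demo.columns))
print(list(cleaned.columns))
```

Assigning the result back (`demo = demo.drop(columns="nonsense")`) has the same effect as the in-place call.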


## Computing new columns

In [6]:
tips["tip_percent"] = tips["tip"] / tips["total_bill"]
tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,tip_percent
0,16.99,1.01,Female,No,Sun,Dinner,2,0.059447
1,10.34,1.66,Male,No,Sun,Dinner,3,0.160542
2,21.01,3.5,Male,No,Sun,Dinner,3,0.166587
3,23.68,3.31,Male,No,Sun,Dinner,2,0.13978
4,24.59,3.61,Female,No,Sun,Dinner,4,0.146808


In [7]:
tips["total_in_euro"] = tips["total_bill"] * 0.91
tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,tip_percent,total_in_euro
0,16.99,1.01,Female,No,Sun,Dinner,2,0.059447,15.4609
1,10.34,1.66,Male,No,Sun,Dinner,3,0.160542,9.4094
2,21.01,3.5,Male,No,Sun,Dinner,3,0.166587,19.1191
3,23.68,3.31,Male,No,Sun,Dinner,2,0.13978,21.5488
4,24.59,3.61,Female,No,Sun,Dinner,4,0.146808,22.3769


In [8]:
tips["total_in_euro"] = tips["total_in_euro"].round(2)
tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,tip_percent,total_in_euro
0,16.99,1.01,Female,No,Sun,Dinner,2,0.059447,15.46
1,10.34,1.66,Male,No,Sun,Dinner,3,0.160542,9.41
2,21.01,3.5,Male,No,Sun,Dinner,3,0.166587,19.12
3,23.68,3.31,Male,No,Sun,Dinner,2,0.13978,21.55
4,24.59,3.61,Female,No,Sun,Dinner,4,0.146808,22.38
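
The arithmetic above is applied element-wise to every row. As a small sketch, conversion and rounding can also be combined into a single step. The `demo` frame and the constant name `USD_TO_EUR` are hypothetical; the rate 0.91 is the one used in the cell above.

```python
import pandas as pd

USD_TO_EUR = 0.91  # hypothetical constant name; the rate comes from the cell above

# Hypothetical stand-in for the first rows of `tips`.
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
})

# Division and multiplication work row by row on whole columns.
demo["tip_percent"] = demo["tip"] / demo["total_bill"]

# Convert and round in one expression instead of two separate cells.
demo["total_in_euro"] = (demo["total_bill"] * USD_TO_EUR).round(2)

print(demo)
```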


## Adding a new row

In [9]:
new_row = pd.Series([5, 1, "Sun"], index=["total_bill", "tip", "day"])
new_row

total_bill      5
tip             1
day           Sun
dtype: object

In [10]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
tips = pd.concat([tips, pd.DataFrame([new_row])], ignore_index=True)
tips[-5:]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,tip_percent,total_in_euro
240,27.18,2.0,Female,Yes,Sat,Dinner,2.0,0.073584,24.73
241,22.67,2.0,Male,Yes,Sat,Dinner,2.0,0.088222,20.63
242,17.82,1.75,Male,No,Sat,Dinner,2.0,0.098204,16.22
243,18.78,3.0,Female,No,Thur,Dinner,2.0,0.159744,17.09
244,5.0,1.0,,,Sun,,,,
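
The new row supplies only three of the columns, so pandas fills the remaining ones with NaN, as the last row above shows. A minimal sketch of this behavior on a hypothetical two-row frame, using `pd.concat` (which also works in current pandas versions, where `DataFrame.append` is no longer available):

```python
import pandas as pd

# Hypothetical stand-in with a subset of the `tips` columns.
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
    "day": ["Sun", "Sun"],
    "size": [2, 3],
})
new_row = pd.Series([5, 1, "Sun"], index=["total_bill", "tip", "day"])

# Wrap the Series in a one-row DataFrame so that concat treats it as a row.
demo = pd.concat([demo, pd.DataFrame([new_row])], ignore_index=True)

# "size" was not supplied for the new row, so it is NaN there,
# and the integer column is converted to float to hold the missing value.
print(demo)
print(demo.isna().sum())
```

This also explains why `size` is displayed as `2.0` instead of `2` after the row was added.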


## Handling "wrong" values

In [11]:
tips.loc[tips["tip_percent"] > 0.3]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,tip_percent,total_in_euro
67,3.07,1.0,Female,Yes,Sat,Dinner,1.0,0.325733,2.79
172,7.25,5.15,Male,Yes,Sun,Dinner,2.0,0.710345,6.6
178,9.6,4.0,Female,Yes,Sun,Dinner,2.0,0.416667,8.74


In [12]:
tips.loc[tips["tip_percent"] > 0.3, "tip"] = np.nan

In [13]:
tips.loc[tips["tip_percent"] > 0.3]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,tip_percent,total_in_euro
67,3.07,,Female,Yes,Sat,Dinner,1.0,0.325733,2.79
172,7.25,,Male,Yes,Sun,Dinner,2.0,0.710345,6.6
178,9.6,,Female,Yes,Sun,Dinner,2.0,0.416667,8.74
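
Note that `tip_percent` was computed before the tips were set to NaN, so it still holds the old ratios. That is why the filter above still finds the three rows. A minimal sketch of recomputing the derived column afterwards, on a hypothetical three-row frame built from values shown above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: one plausible row and two rows with very high tips.
demo = pd.DataFrame({
    "total_bill": [16.99, 7.25, 9.60],
    "tip": [1.01, 5.15, 4.00],
})
demo["tip_percent"] = demo["tip"] / demo["total_bill"]

# Mark implausibly high tips as missing.
demo.loc[demo["tip_percent"] > 0.3, "tip"] = np.nan

# Derived columns are not updated automatically, so recompute the ratio.
demo["tip_percent"] = demo["tip"] / demo["total_bill"]

print(demo)
```

Whether the derived column should be recomputed or kept as a record of the original ratio depends on the analysis; the sketch only shows the first option.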


# Exercises

## Exercise 1:

We are interested in how much was spent per person for each meal.

1. Create a new column "total_per_person" for this.
2. As with the euro amounts, the column should be rounded to two decimal places.

## Exercise 2:

Strictly speaking, the column "tip_percent" does not yet contain real percentages: it holds fractions between 0 and 1 rather than values between 0 and 100. Where it now says `0.059447`, we would like to see `5.94` instead (two decimal places are enough for now).

Overwrite the column "tip_percent" so that, for example, `0.059447` becomes `5.94`.