![title](./pic/indexing/index_manipulation/1_title.png)

In [30]:
import pandas as pd

In [31]:
df = pd.read_csv('./csv/titanic.csv')
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S


---

## Manipulation with `.set_index()`

Label-based selection draws its strength from the labels in the **index**. Crucially, the index we use is not fixed: we can change it in whatever way suits the task.

The `set_index()` method does this. Here is what it looks like when we set the index to the `Name` column:

![title](./pic/indexing/index_manipulation/2_index_manipulation.png)

In [32]:
df.set_index("Name")

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"Kelly, Mr. James",892,0,3,male,34.5,0,0,330911,7.8292,,Q
"Wilkes, Mrs. James (Ellen Needs)",893,1,3,female,47.0,1,0,363272,7.0000,,S
"Myles, Mr. Thomas Francis",894,0,2,male,62.0,0,0,240276,9.6875,,Q
"Wirz, Mr. Albert",895,0,3,male,27.0,0,0,315154,8.6625,,S
"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",896,1,3,female,22.0,1,1,3101298,12.2875,,S
...,...,...,...,...,...,...,...,...,...,...,...
"Spector, Mr. Woolf",1305,0,3,male,,0,0,A.5. 3236,8.0500,,S
"Oliva y Ocana, Dona. Fermina",1306,1,1,female,39.0,0,0,PC 17758,108.9000,C105,C
"Saether, Mr. Simon Sivertsen",1307,0,3,male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
"Ware, Mr. Frederick",1308,0,3,male,,0,0,359309,8.0500,,S


In [33]:
df.set_index("PassengerId")

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
...,...,...,...,...,...,...,...,...,...,...,...
1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
1306,1,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9000,C105,C
1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S


<br>

This is especially useful when you decide that a different column would serve better as the index.
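As a minimal sketch of why a different index can pay off, the following builds a small hand-made frame (three rows copied from the output above; the names `passengers` and `by_id` are only illustrative) and looks up a row by its `PassengerId` label with `.loc`.

```python
import pandas as pd

# Three rows copied from the Titanic output above; a stand-in for the full CSV.
passengers = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Name": [
        "Kelly, Mr. James",
        "Wilkes, Mrs. James (Ellen Needs)",
        "Myles, Mr. Thomas Francis",
    ],
    "Age": [34.5, 47.0, 62.0],
})

# With the default index, rows are labelled 0, 1, 2.
# After set_index, the PassengerId values become the row labels.
by_id = passengers.set_index("PassengerId")

# Label-based lookup: fetch the passenger whose PassengerId is 893.
row = by_id.loc[893]
print(row["Name"])  # Wilkes, Mrs. James (Ellen Needs)
print(row["Age"])   # 47.0
```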

Handle this with care, though: the index should still be unique after the change. In most cases the default `Pandas` index is simply kept. If you still want or need to change the index, make sure the new one is `unique`.
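One way to check uniqueness before switching the index is sketched below. The frame `toy` and the flag `duplicate_rejected` are hypothetical names for illustration; `Series.is_unique` and the `verify_integrity` parameter of `set_index()` are part of pandas.

```python
import pandas as pd

# Hypothetical toy data: the value "S" appears twice in "Embarked".
toy = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Embarked": ["Q", "S", "S"],
})

# Series.is_unique tells us whether a column is safe to use as an index.
print(toy["PassengerId"].is_unique)  # True
print(toy["Embarked"].is_unique)     # False

# verify_integrity=True makes set_index raise a ValueError instead of
# silently creating an index with duplicate labels.
try:
    toy.set_index("Embarked", verify_integrity=True)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```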

In [34]:
df = df.set_index("PassengerId")
df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
...,...,...,...,...,...,...,...,...,...,...,...
1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
1306,1,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9000,C105,C
1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S
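
Note that the cell above assigns the result back to `df`. By default, `set_index()` returns a new `DataFrame` and leaves the original unchanged. A minimal sketch with a hypothetical two-row frame named `small`:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
small = pd.DataFrame({"PassengerId": [892, 893], "Survived": [0, 1]})

# Calling set_index without assigning the result leaves `small` untouched.
small.set_index("PassengerId")
columns_before = list(small.columns)
print(columns_before)  # ['PassengerId', 'Survived']

# Assigning the result keeps the new index.
small = small.set_index("PassengerId")
print(list(small.columns))  # ['Survived']
print(small.index.name)     # PassengerId
```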


---

## Manipulation with `.reset_index()`

The `.reset_index()` method resets the index of your `DataFrame`. This is especially useful when the `DataFrame` is the result of an earlier query, as in the following example. (The output below shows `df` with its default integer index again, so it was presumably reloaded after the previous section.)

In [48]:
df2 = df[df['Pclass'] > 1]
df2

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
412,1304,1,3,"Henriksson, Miss. Jenny Lovisa",female,28.0,0,0,347086,7.7750,,S
413,1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
415,1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
416,1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S


If you now try to process the data with a `for` loop, for example (avoid `for` loops in Pandas where possible), you will quickly notice that the loop runs into trouble with the index.

In [51]:
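# Intentionally broken: df2 still carries the row labels of df, so some
# values of i (here 11) no longer exist as labels and .loc raises KeyError.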
for i in range(len(df2)):
    df2.loc[i, 'birth_year'] = 2022 - df2.loc[i, 'Age']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = infer_fill_value(value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)


KeyError: 11
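
The `KeyError` above comes from mixing positions and labels: `range(len(df2))` counts positions, while `.loc` expects labels, and the filter removed some of them. The sketch below reproduces this on a hypothetical four-row frame (`frame`, `filtered` and `lookup_failed` are illustrative names only).

```python
import pandas as pd

# Hypothetical frame with the default labels 0, 1, 2, 3.
frame = pd.DataFrame({
    "Pclass": [3, 1, 2, 3],
    "Age": [34.5, 47.0, 62.0, 27.0],
})

# Boolean filtering keeps the original labels of the surviving rows.
filtered = frame[frame["Pclass"] > 1]
print(list(filtered.index))  # [0, 2, 3]

# range(len(filtered)) yields 0, 1, 2, but label 1 no longer exists,
# so a label-based lookup with .loc fails for i == 1.
try:
    filtered.loc[1, "Age"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(lookup_failed)  # True

# Position-based access with .iloc does not depend on the labels.
print(filtered["Age"].iloc[1])  # 62.0
```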

In [52]:
df2.head(15)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,birth_year
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q,1987.5
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S,1975.0
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q,1960.0
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S,1995.0
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S,2000.0
5,897,0,3,"Svensson, Mr. Johan Cervin",male,14.0,0,0,7538,9.225,,S,2008.0
6,898,1,3,"Connolly, Miss. Kate",female,30.0,0,0,330972,7.6292,,Q,1992.0
7,899,0,2,"Caldwell, Mr. Albert Francis",male,26.0,1,1,248738,29.0,,S,1996.0
8,900,1,3,"Abrahim, Mrs. Joseph (Sophie Halaut Easu)",female,18.0,0,0,2657,7.2292,,C,2004.0
9,901,0,3,"Davies, Mr. John Samuel",male,21.0,2,0,A/4 48871,24.15,,S,2001.0


In [55]:
df2 = df2.reset_index()

In [56]:
df2

Unnamed: 0,index,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,birth_year
0,0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q,1987.5
1,1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S,1975.0
2,2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q,1960.0
3,3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S,1995.0
4,4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S,2000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
306,412,1304,1,3,"Henriksson, Miss. Jenny Lovisa",female,28.0,0,0,347086,7.7750,,S,
307,413,1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S,
308,415,1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S,
309,416,1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S,
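
The `index` column in the output above holds the old row labels, which `reset_index()` keeps by default. If they are not needed, `drop=True` discards them. A minimal sketch on a hypothetical frame named `gappy`:

```python
import pandas as pd

# Hypothetical filtered frame whose labels have a gap (label 1 is missing).
gappy = pd.DataFrame({"Age": [34.5, 62.0, 27.0]}, index=[0, 2, 3])

# Default behaviour: the old labels are kept in a new column named "index".
kept = gappy.reset_index()
print(list(kept.columns))  # ['index', 'Age']

# drop=True discards the old labels instead of storing them.
dropped = gappy.reset_index(drop=True)
print(list(dropped.columns))  # ['Age']
print(list(dropped.index))    # [0, 1, 2]
```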


In [57]:
for i in range(len(df2)):
    df2.loc[i, 'birth_year'] = 2022 - df2.loc[i, 'Age']

In [58]:
df2

Unnamed: 0,index,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,birth_year
0,0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q,1987.5
1,1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S,1975.0
2,2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q,1960.0
3,3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S,1995.0
4,4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S,2000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
306,412,1304,1,3,"Henriksson, Miss. Jenny Lovisa",female,28.0,0,0,347086,7.7750,,S,1994.0
307,413,1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S,
308,415,1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S,1983.5
309,416,1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S,


Once the index has been reset, the loop runs through to the last value without errors.
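
Since `for` loops are best avoided in Pandas, here is a sketch of a vectorized alternative that computes the same column without a loop and without resetting the index. The frame `data` is a hypothetical stand-in for the Titanic data, and the `.copy()` call is an addition of this sketch, not part of the cells above.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic frame.
data = pd.DataFrame({
    "Pclass": [3, 1, 2, 3],
    "Age": [34.5, 47.0, 62.0, None],
})

# .copy() makes the filtered frame independent of `data`, which avoids
# the "set on a copy of a slice" warning when adding a column.
subset = data[data["Pclass"] > 1].copy()

# Column arithmetic works on the whole column at once and is aligned on
# the labels, so gaps in the index do not matter and no reset is needed.
subset["birth_year"] = 2022 - subset["Age"]

print(subset["birth_year"].tolist())  # [1987.5, 1960.0, nan]
```

Rows with a missing `Age` simply get a missing `birth_year`, matching the empty cells in the loop's output above.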