![title](./pic/selecting/bedingte_selektion/1_title.png)

In [18]:
import pandas as pd

In [19]:
df = pd.read_csv('./csv/titanic.csv')
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S


---

So far we have indexed various slices of the data using structural properties of the `DataFrame` itself. To do interesting things with the data, however, we often need to ask questions based on **conditions**.

There are 4 different selectors for this.

![title](./pic/selecting/bedingte_selektion/2_selektoren.png)

How a query is written depends on the data type of the column being queried: `string` values are written in quotes (`" "`), `float` / `int` values without:

In [20]:
df.dtypes

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

Column of type `float64`...

In [23]:
df.Age == 47

0      False
1       True
2      False
3      False
4      False
       ...  
413    False
414    False
415    False
416    False
417    False
Name: Age, Length: 418, dtype: bool

...column of type `object`, in this case a `string`

In [22]:
df.Sex == 'female'

0      False
1       True
2      False
3      False
4       True
       ...  
413    False
414     True
415    False
416    False
417    False
Name: Sex, Length: 418, dtype: bool

This operation produces a Series of `True/False` booleans based on the sex of each passenger. This result can then be used inside `[]` to select the relevant data.
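A minimal sketch of this two-step pattern. The three-row `sample` frame and the names `mask`, `women` and `age_47` are assumptions made for illustration and are not taken from `titanic.csv`:

```python
import pandas as pd

# Tiny stand-in for the Titanic data; the values are made up for illustration.
sample = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Sex": ["male", "female", "female"],
    "Age": [34.5, 47.0, 22.0],
})

# Step 1: the comparison returns a boolean Series, one entry per row.
mask = sample["Sex"] == "female"

# Step 2: passing that Series inside [] keeps only the rows marked True.
women = sample[mask]

# Numeric columns are compared against unquoted literals.
age_47 = sample[sample["Age"] == 47]
```

Storing the mask in a variable first is optional; `sample[sample["Sex"] == "female"]` does the same in one line.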

---

## Equal: `==`

![title](./pic/selecting/bedingte_selektion/3_gleich.png)

<video width="1000" controls src="./pic/selecting/bedingte_selektion/4_gleich_examples.mp4" />

Suppose, for example, that we are specifically interested in the female passengers:

In [7]:
df[df.Sex == 'female']

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
6,898,1,3,"Connolly, Miss. Kate",female,30.0,0,0,330972,7.6292,,Q
8,900,1,3,"Abrahim, Mrs. Joseph (Sophie Halaut Easu)",female,18.0,0,0,2657,7.2292,,C
12,904,1,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23.0,1,0,21228,82.2667,B45,S
...,...,...,...,...,...,...,...,...,...,...,...,...
409,1301,1,3,"Peacock, Miss. Treasteall",female,3.0,1,1,SOTON/O.Q. 3101315,13.7750,,S
410,1302,1,3,"Naughton, Miss. Hannah",female,,0,0,365237,7.7500,,Q
411,1303,1,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37.0,1,0,19928,90.0000,C78,Q
412,1304,1,3,"Henriksson, Miss. Jenny Lovisa",female,28.0,0,0,347086,7.7750,,S


<br>

The resulting `DataFrame` contains **152** rows. The original had **418**. That means about **36 % of the passengers are women**.
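The share can also be computed directly from the boolean Series rather than by counting rows by hand. The sketch below uses a made-up five-row `sample` frame (an assumption for illustration); only the last line uses the figures quoted above:

```python
import pandas as pd

# Made-up sample; only the technique carries over to the Titanic data.
sample = pd.DataFrame({"Sex": ["male", "female", "male", "female", "male"]})

is_female = sample["Sex"] == "female"

# True counts as 1 and False as 0, so sum() counts the matches
# and mean() gives the share of matching rows.
n_female = int(is_female.sum())
share_female = is_female.mean()

# The figures quoted in the text: 152 of 418 rows.
share_in_text = 152 / 418
print(f"{share_in_text:.1%}")
```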

<br>

In [8]:
df[df.Pclass == 1]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
11,903,0,1,"Jones, Mr. Charles Cresson",male,46.0,0,0,694,26.0000,,S
12,904,1,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23.0,1,0,21228,82.2667,B45,S
14,906,1,1,"Chaffee, Mrs. Herbert Fuller (Carrie Constance...",female,47.0,1,0,W.E.P. 5734,61.1750,E31,S
20,912,0,1,"Rothschild, Mr. Martin",male,55.0,1,0,PC 17603,59.4000,,C
22,914,1,1,"Flegenheim, Mrs. Alfred (Antoinette)",female,,0,0,PC 17598,31.6833,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
403,1295,0,1,"Carrau, Mr. Jose Pedro",male,17.0,0,0,113059,47.1000,,S
404,1296,0,1,"Frauenthal, Mr. Isaac Gerald",male,43.0,1,0,17765,27.7208,D40,C
407,1299,0,1,"Widener, Mr. George Dunton",male,50.0,1,1,113503,211.5000,C80,C
411,1303,1,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37.0,1,0,19928,90.0000,C78,Q


---

## Not equal: `!=`

So if 152 of 418 passengers are women, 266 passengers must be *NOT* women (that is, men). This can be checked with the `not equal` operator.

![title](./pic/selecting/bedingte_selektion/5_ungleich.png)

<video width="1000" controls src="./pic/selecting/bedingte_selektion/6_ungleich_examples.mp4" />

In [9]:
df[df.Sex != 'female']

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
5,897,0,3,"Svensson, Mr. Johan Cervin",male,14.0,0,0,7538,9.2250,,S
7,899,0,2,"Caldwell, Mr. Albert Francis",male,26.0,1,1,248738,29.0000,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
407,1299,0,1,"Widener, Mr. George Dunton",male,50.0,1,1,113503,211.5000,C80,C
413,1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
415,1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
416,1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S


In [10]:
df[df.Pclass != 1]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
412,1304,1,3,"Henriksson, Miss. Jenny Lovisa",female,28.0,0,0,347086,7.7750,,S
413,1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
415,1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
416,1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S
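The relationship between `==` and `!=` can be checked in code. This is a sketch on a made-up `sample` frame (an assumption for illustration); the `~` operator, which inverts a boolean Series, is not used elsewhere in this section and appears here only for comparison:

```python
import pandas as pd

# Made-up sample data for illustration.
sample = pd.DataFrame({"Sex": ["male", "female", "male", "female", "male"]})

is_female = sample["Sex"] == "female"
not_female = sample["Sex"] != "female"

# ~ inverts a boolean Series, so it yields the same mask as !=.
same_mask = (~is_female).equals(not_female)

# Every row lands in exactly one of the two groups.
counts_add_up = len(sample[is_female]) + len(sample[not_female]) == len(sample)

# The arithmetic from the text: 418 passengers minus 152 women.
remaining = 418 - 152
```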


---

## Less than or less than or equal: `<` `<=`

If you want to know, for example, how many toddlers were on board, you can query the age feature. To do so, we use the `less than` operator to output all passengers younger than 3 years:

![title](./pic/selecting/bedingte_selektion/7_kleiner.png)

<video width="1000" controls src="./pic/selecting/bedingte_selektion/8_kleiner_examples.mp4" />

In [11]:
df[df['Age'] < 3.0]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
89,981,0,2,"Wells, Master. Ralph Lester",male,2.0,1,1,29103,23.0,,S
117,1009,1,3,"Sandstrom, Miss. Beatrice Irene",female,1.0,1,1,PP 9549,16.7,G6,S
201,1093,0,3,"Danbom, Master. Gilbert Sigvard Emanuel",male,0.33,0,2,347080,14.4,,S
250,1142,1,2,"West, Miss. Barbara J",female,0.92,1,2,C.A. 34651,27.75,,S
263,1155,1,3,"Klasen, Miss. Gertrud Emilia",female,1.0,1,1,350405,12.1833,,S
281,1173,0,3,"Peacock, Master. Alfred Edward",male,0.75,1,1,SOTON/O.Q. 3101315,13.775,,S
284,1176,1,3,"Rosblom, Miss. Salli Helena",female,2.0,1,1,370129,20.2125,,S
296,1188,1,2,"Laroche, Miss. Louise",female,1.0,1,2,SC/Paris 2123,41.5792,,C
307,1199,0,3,"Aks, Master. Philip Frank",male,0.83,0,1,392091,9.35,,S
354,1246,1,3,"Dean, Miss. Elizabeth Gladys ""Millvina""",female,0.17,1,2,C.A. 2315,20.575,,S


In [12]:
df[df.SibSp < 1]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
5,897,0,3,"Svensson, Mr. Johan Cervin",male,14.0,0,0,7538,9.2250,,S
6,898,1,3,"Connolly, Miss. Kate",female,30.0,0,0,330972,7.6292,,Q
...,...,...,...,...,...,...,...,...,...,...,...,...
412,1304,1,3,"Henriksson, Miss. Jenny Lovisa",female,28.0,0,0,347086,7.7750,,S
413,1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
414,1306,1,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9000,C105,C
415,1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S


With this form of query the age 3 is **exclusive**. That is, children aged 3 are excluded here and are not considered by the query. If we also want children aged 3, we can either form the query `Age < 4` (which, because `Age` is a `float64` column, would also match fractional ages such as 3.5) or `Age <= 3`, which **includes** the age of 3:

In [13]:
df[df['Age'] <= 3.0]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
89,981,0,2,"Wells, Master. Ralph Lester",male,2.0,1,1,29103,23.0,,S
117,1009,1,3,"Sandstrom, Miss. Beatrice Irene",female,1.0,1,1,PP 9549,16.7,G6,S
201,1093,0,3,"Danbom, Master. Gilbert Sigvard Emanuel",male,0.33,0,2,347080,14.4,,S
250,1142,1,2,"West, Miss. Barbara J",female,0.92,1,2,C.A. 34651,27.75,,S
263,1155,1,3,"Klasen, Miss. Gertrud Emilia",female,1.0,1,1,350405,12.1833,,S
281,1173,0,3,"Peacock, Master. Alfred Edward",male,0.75,1,1,SOTON/O.Q. 3101315,13.775,,S
284,1176,1,3,"Rosblom, Miss. Salli Helena",female,2.0,1,1,370129,20.2125,,S
296,1188,1,2,"Laroche, Miss. Louise",female,1.0,1,2,SC/Paris 2123,41.5792,,C
307,1199,0,3,"Aks, Master. Philip Frank",male,0.83,0,1,392091,9.35,,S
354,1246,1,3,"Dean, Miss. Elizabeth Gladys ""Millvina""",female,0.17,1,2,C.A. 2315,20.575,,S


Passenger 409 is a three-year-old child, satisfies `Age <= 3.0`, and is therefore now part of the new `DataFrame`.
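The difference between the exclusive and the inclusive bound is easiest to see on a handful of values. This sketch uses a made-up `ages` Series (an assumption for illustration) that contains a fractional age and a missing value:

```python
import pandas as pd

# Made-up ages, including a fractional one and a missing one (stored as NaN).
ages = pd.Series([0.75, 2.0, 3.0, 3.5, None], name="Age")

under_3 = ages[ages < 3.0]    # 3.0 is excluded
up_to_3 = ages[ages <= 3.0]   # 3.0 is included
under_4 = ages[ages < 4.0]    # also lets 3.5 through

print(under_3.tolist())
print(up_to_3.tolist())
print(under_4.tolist())
```

Missing ages compare as `False` in every case, so rows without an age never appear in any of the three results.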

---

## Greater than or greater than or equal: `>` `>=`

Who travelled with siblings or a spouse and who did not? You can find out via the **SibSp** feature. 0 means in this case that the passenger had no sibling or spouse on board (parents and children are counted separately in **Parch**). `> 0` means that at least one sibling or spouse was on board as well.

![title](./pic/selecting/bedingte_selektion/9_groesser.png)

<video width="1000" controls src="./pic/selecting/bedingte_selektion/10_groesser_examples.mp4" />

In [25]:
df[df['SibSp'] > 0]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
7,899,0,2,"Caldwell, Mr. Albert Francis",male,26.0,1,1,248738,29.0000,,S
9,901,0,3,"Davies, Mr. John Samuel",male,21.0,2,0,A/4 48871,24.1500,,S
12,904,1,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23.0,1,0,21228,82.2667,B45,S
...,...,...,...,...,...,...,...,...,...,...,...,...
406,1298,0,2,"Ware, Mr. William Jeffery",male,23.0,1,0,28666,10.5000,,S
407,1299,0,1,"Widener, Mr. George Dunton",male,50.0,1,1,113503,211.5000,C80,C
409,1301,1,3,"Peacock, Miss. Treasteall",female,3.0,1,1,SOTON/O.Q. 3101315,13.7750,,S
411,1303,1,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37.0,1,0,19928,90.0000,C78,Q


The `>=` operator can be used here as well. To get the same result as above, the query `SibSp >= 1` can be used.

In [15]:
df[df['SibSp'] >= 1]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
7,899,0,2,"Caldwell, Mr. Albert Francis",male,26.0,1,1,248738,29.0000,,S
9,901,0,3,"Davies, Mr. John Samuel",male,21.0,2,0,A/4 48871,24.1500,,S
12,904,1,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23.0,1,0,21228,82.2667,B45,S
...,...,...,...,...,...,...,...,...,...,...,...,...
406,1298,0,2,"Ware, Mr. William Jeffery",male,23.0,1,0,28666,10.5000,,S
407,1299,0,1,"Widener, Mr. George Dunton",male,50.0,1,1,113503,211.5000,C80,C
409,1301,1,3,"Peacock, Miss. Treasteall",female,3.0,1,1,SOTON/O.Q. 3101315,13.7750,,S
411,1303,1,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37.0,1,0,19928,90.0000,C78,Q
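That `> 0` and `>= 1` select the same rows holds because `SibSp` is an integer column. A short sketch on a made-up `sibsp` Series (an assumption for illustration) confirms it:

```python
import pandas as pd

# Made-up integer counts standing in for the SibSp column.
sibsp = pd.Series([0, 1, 0, 2, 1], name="SibSp")

greater_than_zero = sibsp > 0
at_least_one = sibsp >= 1

# For whole numbers both conditions mark exactly the same rows.
identical = greater_than_zero.equals(at_least_one)
n_with_sibsp = int(greater_than_zero.sum())
```

For a `float64` column such as `Age` the two forms would differ, because values between 0 and 1 satisfy `> 0` but not `>= 1`.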


---

## Modulo operator `%`

Although it is rarely used in this scenario, for the sake of completeness I also want to cover the last operator for now: the `modulo` operator. The query below keeps the rows whose age is evenly divisible by 2:

In [26]:
df[df['Age'] % 2 == 0]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
5,897,0,3,"Svensson, Mr. Johan Cervin",male,14.0,0,0,7538,9.2250,,S
6,898,1,3,"Connolly, Miss. Kate",female,30.0,0,0,330972,7.6292,,Q
7,899,0,2,"Caldwell, Mr. Albert Francis",male,26.0,1,1,248738,29.0000,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
401,1293,0,2,"Gale, Mr. Harry",male,38.0,1,0,28664,21.0000,,S
402,1294,1,1,"Gibson, Miss. Dorothy Winifred",female,22.0,0,1,112378,59.4000,,C
405,1297,0,2,"Nourney, Mr. Alfred (""Baron von Drachstedt"")",male,20.0,0,0,SC/PARIS 2166,13.8625,D38,C
407,1299,0,1,"Widener, Mr. George Dunton",male,50.0,1,1,113503,211.5000,C80,C
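To see what the modulo query does step by step, the sketch below splits it into the remainder and the comparison. The `ages` Series is made up for illustration (an assumption), with values similar to those in the `Age` column:

```python
import pandas as pd

# Made-up ages, including a fractional one and a missing one (stored as NaN).
ages = pd.Series([62.0, 34.5, 27.0, 22.0, None], name="Age")

# Element-wise remainder of the division by 2.
remainder = ages % 2

# Only a remainder of exactly 0 counts as even; NaN compares as False.
even_age = remainder == 0

selected = ages[even_age]
print(selected.tolist())
```

Fractional ages such as 34.5 leave a remainder of 0.5 and are therefore dropped, just like odd and missing ages.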
