![title](./pic/merging/for_loop/1_title.png)

In [1]:
import pandas as pd
from tqdm.notebook import trange
import time

In [2]:
df = pd.read_csv('./csv/titanic.csv')
df.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q


<br>

Die Funktion `pd.merge()` erkennt, dass jedes `DataFrame` eine Spalte "x" hat, und fügt sie automatisch unter Verwendung dieser Spalte als Schlüssel zusammen. Das Ergebnis der Zusammenführung ist ein neuem `DataFrame`, das die Informationen aus den beiden Eingaben, kombiniert.

---

## Vorbereitung

In [3]:
df_passenger = df[['PassengerId', 'Name', 'Sex']]
df_passenger

Unnamed: 0,PassengerId,Name,Sex
0,892,"Kelly, Mr. James",male
1,893,"Wilkes, Mrs. James (Ellen Needs)",female
2,894,"Myles, Mr. Thomas Francis",male
3,895,"Wirz, Mr. Albert",male
4,896,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female
...,...,...,...
413,1305,"Spector, Mr. Woolf",male
414,1306,"Oliva y Ocana, Dona. Fermina",female
415,1307,"Saether, Mr. Simon Sivertsen",male
416,1308,"Ware, Mr. Frederick",male


In [4]:
df_ids = df[['PassengerId', 'Survived']]
df_ids

Unnamed: 0,PassengerId,Survived
0,892,0
1,893,1
2,894,0
3,895,0
4,896,1
...,...,...
413,1305,0
414,1306,1
415,1307,0
416,1308,0


In [5]:
# df_ids.sample(frac=1).head()
df_ids = df_ids.sample(frac=1).reset_index(drop=True)
df_ids

Unnamed: 0,PassengerId,Survived
0,932,0
1,1011,1
2,1080,1
3,1000,0
4,918,1
...,...,...
413,910,1
414,1290,0
415,1019,1
416,1296,0


---

# Merging mit nested-`for`-loop

In [6]:
start_time = time.time()

merged_df = pd.DataFrame()

for i in trange(len(df_ids)):
    current_id = df_ids.loc[i, 'PassengerId']
    
    for j in range(len(df_passenger)):
        passenger_id = df_passenger.loc[j, 'PassengerId']
        
        if current_id == passenger_id:
            row = {
                'PassengerId': current_id,
                'Name': df_passenger.loc[j, 'Name'],
                'Sex': df_passenger.loc[j, 'Sex']
            }
            merged_df = merged_df.append(row, ignore_index=True)
            
end_time = time.time()

  0%|          | 0/418 [00:00<?, ?it/s]

In [7]:
merged_df

Unnamed: 0,Name,PassengerId,Sex
0,"Karun, Mr. Franz",932.0,male
1,"Chapman, Mrs. John Henry (Sara Elizabeth Lawry)",1011.0,female
2,"Sage, Miss. Ada",1080.0,female
3,"Willer, Mr. Aaron (Abi Weller"")""",1000.0,male
4,"Ostby, Miss. Helene Ragnhild",918.0,female
...,...,...,...
413,"Ilmakangas, Miss. Ida Livija",910.0,female
414,"Larsson-Rondberg, Mr. Edvard A",1290.0,male
415,"McCoy, Miss. Alicia",1019.0,female
416,"Frauenthal, Mr. Isaac Gerald",1296.0,male


---

## Manuelle Stichprobe zur Überprüfung

In [8]:
df[df['PassengerId'] == 1280]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
388,1280,0,3,"Canavan, Mr. Patrick",male,21.0,0,0,364858,7.75,,Q


In [9]:
df[df['Name'] == 'Kelly, Mr. James']

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q


---

## Benötigte Zeit

In [10]:
print(f'{round(end_time-start_time, 2)} Sekunden')

2.2 Sekunden
