### Cleaning and Enriching the data

The scraped data includes the following things that need to be cleaned up:

- {""} in the Character_name columns
- Sometimes ; in the end of a value
- Names within character_name have []
- Duplicates need to be cleaned up
- The order and the index needs to be fixed


In [1]:
import pandas as pd
import numpy as np

In [2]:
df_B_stage2 = pd.read_csv("../Data/B_stage2.csv", delimiter = ",")

In [3]:
df_B_stage2

Unnamed: 0,award_year,category,name1,name2,name3,film_title,character_name,film_title2,character_name2,film_title3,character_name3,film_title4,character_name4,honorary_statement,description,note
0,1989 (62nd),DIRECTING,Oliver Stone,,,Born on the Fourth of July,,,,,,,,,,
1,1957 (30th),MUSIC (Scoring),Malcolm Arnold,,,The Bridge on the River Kwai,,,,,,,,,,
2,1992 (65th),VISUAL EFFECTS,"Ken Ralston, Doug Chiang, Doug Smythe, Tom Woo...",,,Death Becomes Her,,,,,,,,,,
3,2001 (74th),SHORT FILM (Live Action),"Ray McKinnon, Lisa Blount",,,the accountant,,,,,,,,,,
4,2014 (87th),ACTOR IN A SUPPORTING ROLE,J.K. Simmons,,,Whiplash,"{""Fletcher""}",,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2648,1942 (15th),MUSIC (Song),Music and Lyrics by Irving Berlin,,,Holiday Inn,,,,,,,,,,
2649,2019 (92nd),ACTOR IN A LEADING ROLE,Joaquin Phoenix,,,Joker,"{""Arthur Fleck""}",,,,,,,,,
2650,2019 (92nd),ANIMATED FEATURE FILM,"Josh Cooley, Mark Nielsen and Jonas Rivera",,,Toy Story 4,,,,,,,,,,
2651,1935 (8th),WRITING (Original Story),"Ben Hecht, Charles MacArthur",,,The Scoundrel,,,,,,,,,,


### 1. Cleaning of the data

#### 1.1 Changing the order

It becomes quite obvious that the order of the entries is not the same as it is on the website. The aim here is to begin again with the 1st oscars and then go up to the 95th.

The order needs to be fixed and afterwards it needs to be sorted by category. The index will also be resetted. 

In [4]:
df = df_B_stage2.sort_values(by=['award_year', 'category'], ascending = [True, True]).reset_index(drop=True)


#### 1.2 Create separate rows and delete everything after 2017

Split the Years from the number representing the how maniest time the oscars took place. Therefore, two new columns were created the year and the oscars_count. The columns are rearranged. 

All the years after 2017 were deleted. 

In [5]:
#create two separate rows with the years and the number
df[['year', 'oscars_count']] = df['award_year'].str.extract(r'(\d{4}(?:/\d{2})?) \((\d+)\w*\)')
df = df[['year', 'oscars_count', 'award_year','category', 'name1', 'name2', 'name3', 'film_title', 'character_name', 
         'film_title2','character_name2','film_title3','character_name3','film_title4','character_name4',
         'honorary_statement','description','note']]
#delete all the years after 2017
df = df.loc[df.year < '2018']

df

Unnamed: 0,year,oscars_count,award_year,category,name1,name2,name3,film_title,character_name,film_title2,character_name2,film_title3,character_name3,film_title4,character_name4,honorary_statement,description,note
0,1927/28,1,1927/28 (1st),ACTOR,Emil Jannings,,,The Last Command,"{""General Dolgorucki [Grand Duke Sergius Alexa...",The Way of All Flesh,"{""August Schilling""}",,,,,,,
1,1927/28,1,1927/28 (1st),ACTRESS,Janet Gaynor,,,7th Heaven,"{""Diane""};",Street Angel,"{""Angela""};",Sunrise,"{""The Wife""}",,,,,
2,1927/28,1,1927/28 (1st),ART DIRECTION,William Cameron Menzies,,,The Dove;,,Tempest,,,,,,,,
3,1927/28,1,1927/28 (1st),CINEMATOGRAPHY,Charles Rosher,Karl Struss,,Sunrise,,Sunrise,,,,,,,,"[NOTE: For this awards year, awards were prese..."
4,1927/28,1,1927/28 (1st),DIRECTING (Comedy Picture),Lewis Milestone,,,Two Arabian Knights,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2510,2017,90,2017 (90th),SPECIAL AWARD,,,,,,,,,,,,To Alejandro G. Iñárritu's CARNE y ARENA virtu...,,
2511,2017,90,2017 (90th),VISUAL EFFECTS,"John Nelson, Gerd Nefzer, Paul Lambert and Ric...",,,Blade Runner 2049,,,,,,,,,,
2512,2017,90,2017 (90th),WRITING (Adapted Screenplay),Screenplay by James Ivory,,,Call Me by Your Name,,,,,,,,,,
2513,2017,90,2017 (90th),WRITING (Original Screenplay),Written by Jordan Peele,,,Get Out,,,,,,,,,,


The award-year column is no longer needed. 

In [6]:
df = df.drop(['award_year'], axis=1)
df

Unnamed: 0,year,oscars_count,category,name1,name2,name3,film_title,character_name,film_title2,character_name2,film_title3,character_name3,film_title4,character_name4,honorary_statement,description,note
0,1927/28,1,ACTOR,Emil Jannings,,,The Last Command,"{""General Dolgorucki [Grand Duke Sergius Alexa...",The Way of All Flesh,"{""August Schilling""}",,,,,,,
1,1927/28,1,ACTRESS,Janet Gaynor,,,7th Heaven,"{""Diane""};",Street Angel,"{""Angela""};",Sunrise,"{""The Wife""}",,,,,
2,1927/28,1,ART DIRECTION,William Cameron Menzies,,,The Dove;,,Tempest,,,,,,,,
3,1927/28,1,CINEMATOGRAPHY,Charles Rosher,Karl Struss,,Sunrise,,Sunrise,,,,,,,,"[NOTE: For this awards year, awards were prese..."
4,1927/28,1,DIRECTING (Comedy Picture),Lewis Milestone,,,Two Arabian Knights,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2510,2017,90,SPECIAL AWARD,,,,,,,,,,,,To Alejandro G. Iñárritu's CARNE y ARENA virtu...,,
2511,2017,90,VISUAL EFFECTS,"John Nelson, Gerd Nefzer, Paul Lambert and Ric...",,,Blade Runner 2049,,,,,,,,,,
2512,2017,90,WRITING (Adapted Screenplay),Screenplay by James Ivory,,,Call Me by Your Name,,,,,,,,,,
2513,2017,90,WRITING (Original Screenplay),Written by Jordan Peele,,,Get Out,,,,,,,,,,


#### 1.3 Check for duplicates and delete them

In a first step it will be tested if there are any duplicates. 

In [7]:
duplicates =df[df.duplicated(keep=False)]             #to check if there are duplicates
duplicates

Unnamed: 0,year,oscars_count,category,name1,name2,name3,film_title,character_name,film_title2,character_name2,film_title3,character_name3,film_title4,character_name4,honorary_statement,description,note
9,1927/28,1,UNIQUE AND ARTISTIC PICTURE,Fox,,,Sunrise,,,,,,,,,,
10,1927/28,1,UNIQUE AND ARTISTIC PICTURE,Fox,,,Sunrise,,,,,,,,,,
21,1929/30,3,ACTOR,George Arliss,,,Disraeli,"{""Benjamin Disraeli""}",,,,,,,,,[NOTE: As allowed by the award rules for this ...
22,1929/30,3,ACTOR,George Arliss,,,Disraeli,"{""Benjamin Disraeli""}",,,,,,,,,[NOTE: As allowed by the award rules for this ...
75,1934,7,ASSISTANT DIRECTOR,John Waters,,,Viva Villa!,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2359,2012,85,SHORT FILM (Live Action),Shawn Christensen,,,Curfew,,,,,,,,,,
2402,2014,87,BEST PICTURE,"Alejandro G. Iñárritu, John Lesher and James W...",,,Birdman or (The Unexpected Virtue of Ignorance),,,,,,,,,,
2403,2014,87,BEST PICTURE,"Alejandro G. Iñárritu, John Lesher and James W...",,,Birdman or (The Unexpected Virtue of Ignorance),,,,,,,,,,
2513,2017,90,WRITING (Original Screenplay),Written by Jordan Peele,,,Get Out,,,,,,,,,,


Delete all the duplicates and reset the index numbers. 

In [8]:
df = df.drop_duplicates()
df = df.reset_index(drop=True)
len(df.index) # check how many rows there are left

2454

#### 1.4 Delete semicolon in the end of some words

Removal of the semicolon in the end of names in the colums character_name/2 and film_title/2

In [9]:
# Check for missing values in the column
#film_name
mask_missing = df['film_title'].isna()

# Check if values end with a semicolon and are not missing
mask_semicolon = df['film_title'].str.endswith(';') & ~mask_missing

# Remove semicolons in values that end with a semicolon
df.loc[mask_semicolon, 'film_title'] = df.loc[mask_semicolon, 'film_title'].str.rstrip(';')

# redo the steps for film_name2
mask_missing = df['film_title2'].isna()
mask_semicolon = df['film_title2'].str.endswith(';') & ~mask_missing
df.loc[mask_semicolon, 'film_title2'] = df.loc[mask_semicolon, 'film_title2'].str.rstrip(';')

# redo the steps for Character_name
mask_missing = df['character_name'].isna()
mask_semicolon = df['character_name'].str.endswith(';') & ~mask_missing
df.loc[mask_semicolon, 'character_name'] = df.loc[mask_semicolon, 'character_name'].str.rstrip(';')

# redo the steps for Character_name2
mask_missing = df['character_name2'].isna()
mask_semicolon = df['character_name2'].str.endswith(';') & ~mask_missing
df.loc[mask_semicolon, 'character_name2'] = df.loc[mask_semicolon, 'character_name2'].str.rstrip(';')

df

Unnamed: 0,year,oscars_count,category,name1,name2,name3,film_title,character_name,film_title2,character_name2,film_title3,character_name3,film_title4,character_name4,honorary_statement,description,note
0,1927/28,1,ACTOR,Emil Jannings,,,The Last Command,"{""General Dolgorucki [Grand Duke Sergius Alexa...",The Way of All Flesh,"{""August Schilling""}",,,,,,,
1,1927/28,1,ACTRESS,Janet Gaynor,,,7th Heaven,"{""Diane""}",Street Angel,"{""Angela""}",Sunrise,"{""The Wife""}",,,,,
2,1927/28,1,ART DIRECTION,William Cameron Menzies,,,The Dove,,Tempest,,,,,,,,
3,1927/28,1,CINEMATOGRAPHY,Charles Rosher,Karl Struss,,Sunrise,,Sunrise,,,,,,,,"[NOTE: For this awards year, awards were prese..."
4,1927/28,1,DIRECTING (Comedy Picture),Lewis Milestone,,,Two Arabian Knights,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2449,2017,90,SOUND MIXING,"Gregg Landaker, Gary A. Rizzo and Mark Weingarten",,,Dunkirk,,,,,,,,,,
2450,2017,90,SPECIAL AWARD,,,,,,,,,,,,To Alejandro G. Iñárritu's CARNE y ARENA virtu...,,
2451,2017,90,VISUAL EFFECTS,"John Nelson, Gerd Nefzer, Paul Lambert and Ric...",,,Blade Runner 2049,,,,,,,,,,
2452,2017,90,WRITING (Adapted Screenplay),Screenplay by James Ivory,,,Call Me by Your Name,,,,,,,,,,


#### 1.5 Clean all the character_names

All the character Names are written in the following format: {"Character"}. To have clean data the {""} will be removed so that only the name will be left. This can be done in one step for the whole dataframe: 

In [10]:
df = df.applymap(lambda x: x.strip('{}') if isinstance(x, str) else x)
df = df.applymap(lambda x: x.strip('""') if isinstance(x, str) else x)

In [11]:
df

Unnamed: 0,year,oscars_count,category,name1,name2,name3,film_title,character_name,film_title2,character_name2,film_title3,character_name3,film_title4,character_name4,honorary_statement,description,note
0,1927/28,1,ACTOR,Emil Jannings,,,The Last Command,General Dolgorucki [Grand Duke Sergius Alexander],The Way of All Flesh,August Schilling,,,,,,,
1,1927/28,1,ACTRESS,Janet Gaynor,,,7th Heaven,Diane,Street Angel,Angela,Sunrise,The Wife,,,,,
2,1927/28,1,ART DIRECTION,William Cameron Menzies,,,The Dove,,Tempest,,,,,,,,
3,1927/28,1,CINEMATOGRAPHY,Charles Rosher,Karl Struss,,Sunrise,,Sunrise,,,,,,,,"[NOTE: For this awards year, awards were prese..."
4,1927/28,1,DIRECTING (Comedy Picture),Lewis Milestone,,,Two Arabian Knights,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2449,2017,90,SOUND MIXING,"Gregg Landaker, Gary A. Rizzo and Mark Weingarten",,,Dunkirk,,,,,,,,,,
2450,2017,90,SPECIAL AWARD,,,,,,,,,,,,To Alejandro G. Iñárritu's CARNE y ARENA virtu...,,
2451,2017,90,VISUAL EFFECTS,"John Nelson, Gerd Nefzer, Paul Lambert and Ric...",,,Blade Runner 2049,,,,,,,,,,
2452,2017,90,WRITING (Adapted Screenplay),Screenplay by James Ivory,,,Call Me by Your Name,,,,,,,,,,


#### 1.6 Remove not needed Note: in note column

Remove NOTE: and [] from the note column.

In [12]:
# Remove the [Note: prefix from values in the column
df['note'] = df['note'].str.replace('\[NOTE: ', '')
# remove the ] in the end
mask_missing = df['note'].isna()
mask_semicolon = df['note'].str.endswith(']') & ~mask_missing
df.loc[mask_semicolon, 'note'] = df.loc[mask_semicolon, 'note'].str.rstrip(']')

df

  df['note'] = df['note'].str.replace('\[NOTE: ', '')


Unnamed: 0,year,oscars_count,category,name1,name2,name3,film_title,character_name,film_title2,character_name2,film_title3,character_name3,film_title4,character_name4,honorary_statement,description,note
0,1927/28,1,ACTOR,Emil Jannings,,,The Last Command,General Dolgorucki [Grand Duke Sergius Alexander],The Way of All Flesh,August Schilling,,,,,,,
1,1927/28,1,ACTRESS,Janet Gaynor,,,7th Heaven,Diane,Street Angel,Angela,Sunrise,The Wife,,,,,
2,1927/28,1,ART DIRECTION,William Cameron Menzies,,,The Dove,,Tempest,,,,,,,,
3,1927/28,1,CINEMATOGRAPHY,Charles Rosher,Karl Struss,,Sunrise,,Sunrise,,,,,,,,"For this awards year, awards were presented in..."
4,1927/28,1,DIRECTING (Comedy Picture),Lewis Milestone,,,Two Arabian Knights,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2449,2017,90,SOUND MIXING,"Gregg Landaker, Gary A. Rizzo and Mark Weingarten",,,Dunkirk,,,,,,,,,,
2450,2017,90,SPECIAL AWARD,,,,,,,,,,,,To Alejandro G. Iñárritu's CARNE y ARENA virtu...,,
2451,2017,90,VISUAL EFFECTS,"John Nelson, Gerd Nefzer, Paul Lambert and Ric...",,,Blade Runner 2049,,,,,,,,,,
2452,2017,90,WRITING (Adapted Screenplay),Screenplay by James Ivory,,,Call Me by Your Name,,,,,,,,,,


In [13]:
df.loc[12,'note'] # check if the [NOTE: ] is gone.

'This award was not associated with any specific film title.'

#### 1.7 Check if there are special characters to be cleaned in film title column

Look for special characters and clean them if necessary

In [14]:
# all the special characters can be added
pattern = r'[@éàè§^~]'

# create special with all the values that contain a special character
special = df['film_title'].str.contains(pattern, na=False)

# filter the dataframe for the special values
values_with_special_characters = df[special]

# show the rows including a special character in the defined column
values_with_special_characters

Unnamed: 0,year,oscars_count,category,name1,name2,name3,film_title,character_name,film_title2,character_name2,film_title3,character_name3,film_title4,character_name4,honorary_statement,description,note
2279,2012,85,ACTRESS IN A SUPPORTING ROLE,Anne Hathaway,,,Les Misérables,Fantine,,,,,,,,,
2292,2012,85,MAKEUP AND HAIRSTYLING,Lisa Westcott and Julie Dartnell,,,Les Misérables,,,,,,,,,,
2302,2012,85,SOUND MIXING,"Andy Nelson, Mark Paterson and Simon Hayes",,,Les Misérables,,,,,,,,,,


### 2. Enrichment of the data


#### 2.1 Add column True/False

As a first step we would like to know if there are more awards won by an actor/actress or a director/directing position. A new column is created with the Values True/False related to the fact who the winner of the award was. 

In [15]:
# parts of word to be included
words = 'ACTOR|ACTRESS|DIRECT'  

# Create a new column "act_dir" that checks for the specified part of the word
df['act_dir'] = df['category'].str.contains(words, case=False)

df.act_dir.sum()

590

#### 2.2 Award won by single or multiple person

Another column is created if an award was won by only one person or by multiple ones. This is shown in the new column winner_type.

In [16]:
# Create a new column "winner_type" to indicate if it's a single winner or multiple winners
df['winner_type'] = 'Single Winner'

# Check if there are additional names in name2 and name3 columns
df.loc[df['name2'].notna() | df['name3'].notna(), 'winner_type'] = 'Multiple Winners'

df.winner_type.value_counts()


Single Winner       2443
Multiple Winners      11
Name: winner_type, dtype: int64

In [17]:
df[df.columns[-2:]]

Unnamed: 0,act_dir,winner_type
0,True,Single Winner
1,True,Single Winner
2,True,Single Winner
3,False,Multiple Winners
4,True,Single Winner
...,...,...
2449,False,Single Winner
2450,False,Single Winner
2451,False,Single Winner
2452,False,Single Winner


#### Separate two winners to two rows

In [18]:
new_df = pd.DataFrame(columns=['year', 'oscars_count', 'category', 'name', 'film_title', 'character_name', 'honorary_statement',
                               'description', 'note', 'act_dir', 'winner_type'])

# Iterate over each row in the original dataframe
for index, row in df.iterrows():
    # Extract the common values from the first row
    year = row['year']
    oscars_count = row['oscars_count']
    category = row['category']
    honorary_statement = row['honorary_statement']
    description = row['description']
    note = row['note']
    act_dir = row['act_dir']
    winner_type = row['winner_type']
    
    # Extract the values from the first set of columns (name1, film_title, character_name)
    name1 = row['name1']
    film_title = row['film_title']
    character_name = row['character_name']
    
    # Create a new row in the new dataframe with the extracted values
    new_row = pd.DataFrame({'year': [year], 'oscars_count': [oscars_count], 'category': [category],
                            'name': [name1], 'film_title': [film_title], 'character_name': [character_name],
                            'honorary_statement' : [honorary_statement],'description' : [description], 
                            'note' : [note], 'act_dir' : [act_dir], 'winner_type' : [winner_type]})
    new_df = pd.concat([new_df, new_row], ignore_index=True)
    
    # Check if the second set of columns has values and copy them to a new row if conditions are met
    if pd.notna(row['name2']) and pd.notna(row['film_title2']):
        film_title2 = row['film_title2']
        character_name2 = row['character_name2']
        name2 = row['name2']
        
        new_row = pd.DataFrame({'year': [year], 'oscars_count': [oscars_count], 'category': [category],
                                'name': [name2], 'film_title': [film_title2], 'character_name': [character_name2],
                                'honorary_statement' : [honorary_statement],'description' : [description], 
                                'note' : [note], 'act_dir' : [act_dir], 'winner_type' : [winner_type]})
        new_df = pd.concat([new_df, new_row], ignore_index=True)
    
    # Check if the third set of columns has values and copy them to a new row if conditions are met
    if pd.notna(row['name3']) and pd.notna(row['film_title3']):
        film_title3 = row['film_title3']
        character_name3 = row['character_name3']
        name3 = row['name3']
        
        new_row = pd.DataFrame({'year': [year], 'oscars_count': [oscars_count], 'category': [category],
                                'name': [name3], 'film_title': [film_title3], 'character_name': [character_name3],
                                'honorary_statement' : [honorary_statement],'description' : [description], 
                                'note' : [note], 'act_dir' : [act_dir], 'winner_type' : [winner_type]})
        new_df = pd.concat([new_df, new_row], ignore_index=True)
    
    # Check if the fourth set of columns has values and copy them to a new row if conditions are met
    if pd.notna(row['film_title4']):
        film_title4 = row['film_title4']
        character_name4 = row['character_name4']
        
        new_row = pd.DataFrame({'year': [year], 'oscars_count': [oscars_count], 'category': [category],
                                'name': [name1], 'film_title': [film_title4], 'character_name': [character_name4],
                                'honorary_statement' : [honorary_statement],'description' : [description], 
                                'note' : [note], 'act_dir' : [act_dir], 'winner_type' : [winner_type]})
        new_df = pd.concat([new_df, new_row]).reset_index(drop=True)


  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=True)
  new_df = pd.concat([new_df, new_row], ignore_index=Tru

In [19]:
df = new_df

In [20]:
df

Unnamed: 0,year,oscars_count,category,name,film_title,character_name,honorary_statement,description,note,act_dir,winner_type
0,1927/28,1,ACTOR,Emil Jannings,The Last Command,General Dolgorucki [Grand Duke Sergius Alexander],,,,True,Single Winner
1,1927/28,1,ACTRESS,Janet Gaynor,7th Heaven,Diane,,,,True,Single Winner
2,1927/28,1,ART DIRECTION,William Cameron Menzies,The Dove,,,,,True,Single Winner
3,1927/28,1,CINEMATOGRAPHY,Charles Rosher,Sunrise,,,,"For this awards year, awards were presented in...",False,Multiple Winners
4,1927/28,1,CINEMATOGRAPHY,Karl Struss,Sunrise,,,,"For this awards year, awards were presented in...",False,Multiple Winners
...,...,...,...,...,...,...,...,...,...,...,...
2460,2017,90,SOUND MIXING,"Gregg Landaker, Gary A. Rizzo and Mark Weingarten",Dunkirk,,,,,False,Single Winner
2461,2017,90,SPECIAL AWARD,,,,To Alejandro G. Iñárritu's CARNE y ARENA virtu...,,,False,Single Winner
2462,2017,90,VISUAL EFFECTS,"John Nelson, Gerd Nefzer, Paul Lambert and Ric...",Blade Runner 2049,,,,,False,Single Winner
2463,2017,90,WRITING (Adapted Screenplay),Screenplay by James Ivory,Call Me by Your Name,,,,,False,Single Winner


#### 2.3 Number of awards per movie

Create a new list with the movie titles and the number of awards.

In [21]:
# Create an empty dictionary 
film_data = {}

# Iterate over the columns containing the film titles
for column in ['film_title']:
    # Get the film titles from the column and count their occurrences
    film_counts = df[column].value_counts()
    # Update the film_data dictionary with the film titles and their Oscar counts
    film_data.update(film_counts)

# Create a new dataframe from the film_data dictionary
sep_df = pd.DataFrame.from_dict(film_data, orient='index', columns=['oscar_total'])

# Reset the index and rename the columns
sep_df = sep_df.reset_index().rename(columns={'index': 'film_title'})

sep_df

Unnamed: 0,film_title,oscar_total
0,Titanic,12
1,Ben-Hur,11
2,The Lord of the Rings: The Return of the King,11
3,West Side Story,10
4,Gigi,9
...,...,...
1239,Black Fox,1
1240,Dylan Thomas,1
1241,Sundays and Cybele,1
1242,Meredith Willson's The Music Man,1


This would be a possiblility to rename the columns so it fits together with the other datasets. However, this can also be done in a later step. When the data is merged or even when it is uploaded to a database. 

In [22]:
#df = df.rename(columns={'year': 'Year', 'oscars_count': 'OscarsCount', 'category': 'Category', 'name1': 'Name1', 
#                        'name2': 'Name2', 'name3': 'Name3','film_title': 'FilmTitle', 'film_title2': 'FilmTitle2',
#                       'film_title3': 'FilmTitle3', 'film_title4': 'FilmTitle4', 'character_name': 'CharacterName',
#                       'character_name2': 'CharacterName2', 'character_name3': 'CharacterName3', 
#                        'character_name4': 'CharacterName4',  'honorary_statement': 'HonoraryStatement', 
#                        'description': 'Description', 'note': 'Note'})

#df

In [23]:
df = df.rename(columns={'oscars_count': 'oscars_held'})

### 3. Setting an index row

Before saving the file to a csv it was necessary to create a new column containing the index number. This makes it possible to add the whole dataset to the MariaDB. To upload and link a dataset a primary key is needed. As seen above there are some movie titles listed more than once in the dataframe. With the index number and the film title a unique primary key can be created.

In [24]:
df_with_index = df.reset_index()
df_with_index

Unnamed: 0,index,year,oscars_held,category,name,film_title,character_name,honorary_statement,description,note,act_dir,winner_type
0,0,1927/28,1,ACTOR,Emil Jannings,The Last Command,General Dolgorucki [Grand Duke Sergius Alexander],,,,True,Single Winner
1,1,1927/28,1,ACTRESS,Janet Gaynor,7th Heaven,Diane,,,,True,Single Winner
2,2,1927/28,1,ART DIRECTION,William Cameron Menzies,The Dove,,,,,True,Single Winner
3,3,1927/28,1,CINEMATOGRAPHY,Charles Rosher,Sunrise,,,,"For this awards year, awards were presented in...",False,Multiple Winners
4,4,1927/28,1,CINEMATOGRAPHY,Karl Struss,Sunrise,,,,"For this awards year, awards were presented in...",False,Multiple Winners
...,...,...,...,...,...,...,...,...,...,...,...,...
2460,2460,2017,90,SOUND MIXING,"Gregg Landaker, Gary A. Rizzo and Mark Weingarten",Dunkirk,,,,,False,Single Winner
2461,2461,2017,90,SPECIAL AWARD,,,,To Alejandro G. Iñárritu's CARNE y ARENA virtu...,,,False,Single Winner
2462,2462,2017,90,VISUAL EFFECTS,"John Nelson, Gerd Nefzer, Paul Lambert and Ric...",Blade Runner 2049,,,,,False,Single Winner
2463,2463,2017,90,WRITING (Adapted Screenplay),Screenplay by James Ivory,Call Me by Your Name,,,,,False,Single Winner


### 4. Create a csv files

A new csv file with the cleaned data is created and saved as B_stage3. The newly created dataframe with the awards by movies is also exported to a csv file.

In [25]:
B_stage3 = df_with_index.to_csv("../Data/B_stage3.csv", index = False)

In [26]:
B_sum_df = sep_df.to_csv("../Data/B_sum_df.csv", index = False)