# Finding the gender of actors, actresses and directors

After scraping all the data from FilmAffinity, I used an API called [Genderize.io](https://genderize.io/) to detect the gender of the cast and the directors of each movie and TV show, since my main purpose was to measure the gender gap in the film industry.

In this notebook, I obtained that information and saved it in a new .csv file.

In [1]:
import pandas as pd
import numpy as np

import sys
sys.path.insert(0, '/Users/gina/Documents/allWomen/Functions')
from gender import *

## Reading the FilmAffinity dataset

I started by reading the .csv file with all the movie and TV show information extracted from FilmAffinity.

In [59]:
data = pd.read_csv('datasets/filmaffinity_data2.csv', index_col=0)
data.head()

Unnamed: 0,Title,Link,Type,Duration,Country,Directors,Cast,Genres,Description,RatingAverage,Votes,Reviews
0,Money Heist,https://www.filmaffinity.com/us/film879405.html,TV show,70 min.,Spain,"Álex Pina, Jesús Colmenar, Miguel Ángel Vivas,...","Álvaro Morte, Úrsula Corberó, Itziar Ituño, Al...","TV Series, Thriller, Mystery, Heist Film, Kidn...",TV Series (2017-Present Day). 4 Seasons. A mys...,7.1,25691.0,"""[4th Season Review]: [It] is like an extended..."
1,The Blacklist,https://www.filmaffinity.com/us/film573633.html,TV show,42 min.,United States,"Jon Bokenkamp, Michael W. Watkins, Andrew McCa...","James Spader, Megan Boone, Diego Klattenhoff, ...","TV Series, Mystery, Drama, Crime, Spy Film","The world's most wanted criminal, Thomas Raymo...",6.4,5148.0,"""His name is above the title and, depending ho..."
2,Locked Up,https://www.filmaffinity.com/us/film441483.html,TV show,50 min.,Spain,"Iván Escobar, Esther Martínez Lobato, Daniel É...","Maggie Civantos, Najwa Nimri, Roberto Enríquez...","TV Series, Thriller, Drama, Prison Drama",Macarena Ferreiro is a young naive woman who f...,7.0,6941.0,
3,Prison Break,https://www.filmaffinity.com/us/film822756.html,TV show,42 min.,United States,"Paul Scheuring, Bobby Roth, Kevin Hooks, Dwigh...","Wentworth Miller, Dominic Purcell, Robert Knep...","TV Series, Action, Drama, Prison Drama, Cop Mo...",TV Series (2005-2009). 5 Seasons. 90 Episodes....,7.3,71511.0,"""A strong cast led by Wentworth Miller (...) I..."
4,13 Reasons Why,https://www.filmaffinity.com/us/film999360.html,TV show,60 min.,United States,"Brian Yorkey, Tom McCarthy, Kyle Patrick Alvar...","Dylan Minnette, Katherine Langford, Christian ...","TV Series, Drama, Mystery, Teen/coming-of-age,...","'Thirteen Reasons Why', based on the best-sell...",6.8,21496.0,"""[2nd Season Review]: [It] is a frustratingly ..."


In [60]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2383 entries, 0 to 2382
Data columns (total 12 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Title          2383 non-null   object 
 1   Link           2383 non-null   object 
 2   Type           2383 non-null   object 
 3   Duration       2265 non-null   object 
 4   Country        2383 non-null   object 
 5   Directors      2360 non-null   object 
 6   Cast           1969 non-null   object 
 7   Genres         2383 non-null   object 
 8   Description    2334 non-null   object 
 9   RatingAverage  2190 non-null   float64
 10  Votes          2190 non-null   float64
 11  Reviews        1789 non-null   object 
dtypes: float64(2), object(10)
memory usage: 242.0+ KB


## Splitting and transposing the data

Next, I checked whether there were any null values I needed to handle before splitting and transposing the information in the 'Cast' and 'Directors' columns.

I also needed to know the total number of names whose gender I would look up, as the basic Genderize subscription only allows 100 thousand names per month.
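As a rough illustration of that count, the sketch below tallies the names in a comma-separated column. The `cast` Series is a toy stand-in for the real 'Cast' column, and `Series.str.split` plus `Series.explode` is an alternative I am assuming here, not the approach used in the cells that follow.

```python
import pandas as pd

# Toy stand-in for the 'Cast' column (the notebook reads the real one from the CSV).
cast = pd.Series([
    "Álvaro Morte, Úrsula Corberó, Itziar Ituño",
    None,  # titles without cast information
    "James Spader, Megan Boone",
])

# Split each comma-separated string into a list, then give every name its own row.
names = cast.dropna().str.split(",").explode().str.strip()

# Number of names that would have to be sent to the API.
total_names = len(names)
print(total_names)
```

Comparing `total_names` with the monthly quota shows whether the lookups fit in a single billing period.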

### Cast

#### Checking the total number of cast

In [92]:
gender = data.copy()
gender.isnull().sum()

Title              0
Link               0
Type               0
Duration         118
Country            0
Directors         23
Cast             414
Genres             0
Description       49
RatingAverage    193
Votes            193
Reviews          594
dtype: int64

In [22]:
gender = gender.dropna(subset=['Cast'])
gender.isnull().sum()

Title              0
Link               0
Type               0
Duration         103
Country            0
Directors         14
Cast               0
Genres             0
Description       40
RatingAverage    139
Votes            139
Reviews          374
dtype: int64

In [23]:
gender = pd.DataFrame([x.split(',') for x in gender.Cast])
gender.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,98,99,100,101,102,103,104,105,106,107
0,Álvaro Morte,Úrsula Corberó,Itziar Ituño,Alba Flores,Paco Tous,Najwa Nimri,Pedro Alonso,Miguel Herrán,Jaime Lorente,Esther Acebo,...,,,,,,,,,,
1,James Spader,Megan Boone,Diego Klattenhoff,Harry Lennix,Hisham Tawfiq,Amir Arison,Ryan Eggold,Mozhan Marnò,Parminder Nagra,Susan Blommaert,...,,,,,,,,,,
2,Maggie Civantos,Najwa Nimri,Roberto Enríquez,Berta Vázquez,Alba Flores,Ramiro Blas,Carlos Hipólito,Cristina Plazas,María Isabel Díaz,Jesús Castejón,...,,,,,,,,,,
3,Wentworth Miller,Dominic Purcell,Robert Knepper,Sarah Wayne Callies,Amaury Nolasco,Wade Williams,William Fichtner,Paul Adelstein,Robin Tunney,Peter Stormare,...,,,,,,,,,,
4,Dylan Minnette,Katherine Langford,Christian Navarro,Alisha Boe,Brandon Flynn,Justin Prentice,Miles Heizer,Ross Butler,Devin Druid,Amy Hargreaves,...,,,,,,,,,,


In [25]:
gender2 = pd.DataFrame()

for item in gender.index:
    gender2 = pd.concat([gender2, gender.T[item]])

In [28]:
gender2 = gender2.dropna()

In [30]:
gender2.isnull().sum()

0    0
dtype: int64

In [31]:
gender2

Unnamed: 0,0
0,Álvaro Morte
1,Úrsula Corberó
2,Itziar Ituño
3,Alba Flores
4,Paco Tous
...,...
19,Tracy S. Lee
20,B.K. Cannon
21,Judith Light
22,Martina Navratilova


#### Splitting and transposing the data

In [92]:
gender = data.copy()
gender.isnull().sum()

Title              0
Link               0
Type               0
Duration         118
Country            0
Directors         23
Cast             414
Genres             0
Description       49
RatingAverage    193
Votes            193
Reviews          594
dtype: int64

In [93]:
gender['Cast'] = gender['Cast'].fillna('no actor')
gender.isnull().sum()

Title              0
Link               0
Type               0
Duration         118
Country            0
Directors         23
Cast               0
Genres             0
Description       49
RatingAverage    193
Votes            193
Reviews          594
dtype: int64

In [94]:
gender = pd.DataFrame([x.split(',') for x in gender.Cast])
gender.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,98,99,100,101,102,103,104,105,106,107
0,Álvaro Morte,Úrsula Corberó,Itziar Ituño,Alba Flores,Paco Tous,Najwa Nimri,Pedro Alonso,Miguel Herrán,Jaime Lorente,Esther Acebo,...,,,,,,,,,,
1,James Spader,Megan Boone,Diego Klattenhoff,Harry Lennix,Hisham Tawfiq,Amir Arison,Ryan Eggold,Mozhan Marnò,Parminder Nagra,Susan Blommaert,...,,,,,,,,,,
2,Maggie Civantos,Najwa Nimri,Roberto Enríquez,Berta Vázquez,Alba Flores,Ramiro Blas,Carlos Hipólito,Cristina Plazas,María Isabel Díaz,Jesús Castejón,...,,,,,,,,,,
3,Wentworth Miller,Dominic Purcell,Robert Knepper,Sarah Wayne Callies,Amaury Nolasco,Wade Williams,William Fichtner,Paul Adelstein,Robin Tunney,Peter Stormare,...,,,,,,,,,,
4,Dylan Minnette,Katherine Langford,Christian Navarro,Alisha Boe,Brandon Flynn,Justin Prentice,Miles Heizer,Ross Butler,Devin Druid,Amy Hargreaves,...,,,,,,,,,,


In [95]:
gender = gender.fillna('no actor')
gender.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,98,99,100,101,102,103,104,105,106,107
0,Álvaro Morte,Úrsula Corberó,Itziar Ituño,Alba Flores,Paco Tous,Najwa Nimri,Pedro Alonso,Miguel Herrán,Jaime Lorente,Esther Acebo,...,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor
1,James Spader,Megan Boone,Diego Klattenhoff,Harry Lennix,Hisham Tawfiq,Amir Arison,Ryan Eggold,Mozhan Marnò,Parminder Nagra,Susan Blommaert,...,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor
2,Maggie Civantos,Najwa Nimri,Roberto Enríquez,Berta Vázquez,Alba Flores,Ramiro Blas,Carlos Hipólito,Cristina Plazas,María Isabel Díaz,Jesús Castejón,...,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor
3,Wentworth Miller,Dominic Purcell,Robert Knepper,Sarah Wayne Callies,Amaury Nolasco,Wade Williams,William Fichtner,Paul Adelstein,Robin Tunney,Peter Stormare,...,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor
4,Dylan Minnette,Katherine Langford,Christian Navarro,Alisha Boe,Brandon Flynn,Justin Prentice,Miles Heizer,Ross Butler,Devin Druid,Amy Hargreaves,...,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor,no actor


#### Checking the gender and saving the information in a new DataFrame

In [65]:
res = pd.DataFrame()
res['Title'] = data['Title']
res['Women_Cast'] = 0
res['Men_Cast'] = 0
res['Not_Set_Cast'] = 0
res['Total_Cast'] = 0
res.head()

Unnamed: 0,Title,Women_Cast,Men_Cast,Not_Set_Cast,Total_Cast
0,Money Heist,0,0,0,0
1,The Blacklist,0,0,0,0
2,Locked Up,0,0,0,0
3,Prison Break,0,0,0,0
4,13 Reasons Why,0,0,0,0


In [1]:
import sys
sys.path.insert(0, '/Users/gina/Documents/allWomen/Functions/genderize-master')
from genderize import Genderize

In [2]:
genderize = Genderize(
    user_agent='GenderizeDocs/0.0',
    api_key='10566f9437abf33e7005cb4f28c90768',
    timeout=1000.0)

In [6]:
names = genderize.get(['Gina'])
print(names)

[{'name': 'Gina', 'gender': 'female', 'probability': 0.98, 'count': 22233}]
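Since every lookup counts against the monthly quota, one option is to remember the answer for each first name. The sketch below is only an illustration under stated assumptions: `make_cached_lookup` and `fake_lookup` are hypothetical helpers, and `fake_lookup` stands in for the real API call so the example runs offline.

```python
from typing import Callable, Dict, List, Optional


def first_name(full_name: str) -> str:
    """Return the first space-separated token of a full name."""
    return full_name.strip().split(" ")[0]


def make_cached_lookup(lookup: Callable[[str], Optional[str]]) -> Callable[[str], Optional[str]]:
    """Wrap a first-name lookup so each distinct first name is queried only once."""
    cache: Dict[str, Optional[str]] = {}

    def cached(full_name: str) -> Optional[str]:
        key = first_name(full_name)
        if key not in cache:
            cache[key] = lookup(key)
        return cache[key]

    return cached


# Hypothetical stand-in for the API: records every call it receives.
calls: List[str] = []


def fake_lookup(name: str) -> Optional[str]:
    calls.append(name)
    return {"Gina": "female", "Ross": "male"}.get(name)


gender_of = make_cached_lookup(fake_lookup)
results = [gender_of(n) for n in ["Gina Rodriguez", " Ross Butler", "Gina Carano", "K.J. Apa"]]
print(results)  # four answers
print(calls)    # only three lookups were made
```

With the client configured above, the wrapped function could be something like `lambda n: genderize.get([n])[0]['gender']`, which is the same call used in the loops further down.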


In [71]:
data.iloc[6]['Cast']
#Riverdale

'Mädchen Amick, Cole Sprouse, Lochlyn Munro, Marisol Nichols, Luke Perry, Ross Butler, Sarah Habel, Lili Reinhart, K.J. Apa, Tom McBeath, Barclay Hope, Olivia Ryan Stern, Colin Lawrence, Asha Bromfield, Camila Mendes, Madelaine Petsch, Skeet Ulrich, Hart Denton'

In [72]:
for actor in gender.T[6]:
    if actor != 'no actor':
        a = actor.strip()
        print(a.split(' ')[0])
        print(genderize.get([a.split(' ')[0]])[0]['gender'])
        
        # 7 female, 9 male, 2 None

Mädchen
None
Cole
male
Lochlyn
male
Marisol
female
Luke
male
Ross
male
Sarah
female
Lili
female
K.J.
None
Tom
male
Barclay
male
Olivia
female
Colin
male
Asha
female
Camila
female
Madelaine
female
Skeet
male
Hart
male
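The tally in the code comment can be checked with a short sketch. The `labels` list simply copies the gender values printed above, in order, and `collections.Counter` is my choice for counting them rather than something the notebook uses.

```python
from collections import Counter

# Gender labels printed above for the Riverdale cast, in the same order.
labels = [
    None, "male", "male", "female", "male", "male",
    "female", "female", None, "male", "male", "female",
    "male", "female", "female", "female", "male", "male",
]

counts = Counter(labels)
print(counts["female"], counts["male"], counts[None])
```

The two `None` results correspond to "Mädchen" and "K.J.", the first tokens for which the API returned no gender.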


In [96]:
gender = gender.iloc[7:,:]

In [98]:
gender.index

RangeIndex(start=7, stop=2383, step=1)

In [99]:
women_cast = 0
men_cast = 0
not_set_cast = 0

for item in gender.index:
    print('index: ', item)
    
    for actor in gender.T[item]:
        if actor != 'no actor':
            a = actor.strip()
            # Look up the gender of the first name only
            gndr = genderize.get([a.split(' ')[0]])[0]['gender']
        
            if gndr == 'female':
                women_cast += 1
            elif gndr == 'male':
                men_cast += 1
            else:
                not_set_cast += 1
    
    # .loc writes into res directly and avoids chained assignment
    res.loc[item, 'Women_Cast'] = women_cast
    res.loc[item, 'Men_Cast'] = men_cast
    res.loc[item, 'Not_Set_Cast'] = not_set_cast
    
    # Reset the counters for the next title
    women_cast = 0
    men_cast = 0
    not_set_cast = 0

index:  7
index:  8
index:  9
...

In [100]:
res.head()

Unnamed: 0,Title,Women_Cast,Men_Cast,Not_Set_Cast,Total_Cast
0,Money Heist,11,26,2,0
1,The Blacklist,9,15,0,0
2,Locked Up,21,9,2,0
3,Prison Break,4,15,2,0
4,13 Reasons Why,12,22,1,0
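The per-title counting can also be written as a function applied to the 'Cast' column. This is a minimal sketch under stated assumptions: `KNOWN` is a hypothetical lookup table standing in for Genderize responses, and `count_genders` and the toy DataFrame are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical lookup table standing in for Genderize responses.
KNOWN = {"Megan": "female", "James": "male", "Diego": "male"}


def count_genders(cast):
    """Count women, men and unresolved names in one comma-separated cast string."""
    counts = {"Women_Cast": 0, "Men_Cast": 0, "Not_Set_Cast": 0}
    if isinstance(cast, str):  # missing cast values are not strings
        for full_name in cast.split(","):
            gndr = KNOWN.get(full_name.strip().split(" ")[0])
            if gndr == "female":
                counts["Women_Cast"] += 1
            elif gndr == "male":
                counts["Men_Cast"] += 1
            else:
                counts["Not_Set_Cast"] += 1
    return pd.Series(counts)


toy = pd.DataFrame({
    "Title": ["The Blacklist", "Untitled"],
    "Cast": ["James Spader, Megan Boone, Diego Klattenhoff, Mozhan Marnò", None],
})

# apply() returns one row of counts per title; join it to the titles.
res_toy = pd.concat([toy[["Title"]], toy["Cast"].apply(count_genders)], axis=1)
print(res_toy)
```

Titles with no cast keep all three counters at zero, which mirrors what the 'no actor' placeholder achieves in the loop.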


In [125]:
total_cast = 0
for item in gender.iloc[:7, :].index:
    print(item)
    for actor in gender.T[item]:
        if actor != 'no actor':
            total_cast += 1
    
    # .loc writes into res directly and avoids chained assignment
    res.loc[item, 'Total_Cast'] = total_cast
    total_cast = 0

7
8
9
10
11
12
13


In [127]:
total_cast = 0
for item in res.iloc[:7, :].index:
    print(item)
    total_cast = res.Women_Cast[item] + res.Men_Cast[item] + res.Not_Set_Cast[item]
    
    # .loc writes into res directly and avoids chained assignment
    res.loc[item, 'Total_Cast'] = total_cast
    total_cast = 0

0
1
2
3
4
5
6

In [128]:
res.head(8)

Unnamed: 0,Title,Women_Cast,Men_Cast,Not_Set_Cast,Total_Cast
0,Money Heist,11,26,2,39
1,The Blacklist,9,15,0,24
2,Locked Up,21,9,2,32
3,Prison Break,4,15,2,21
4,13 Reasons Why,12,22,1,35
5,Dark,17,19,0,36
6,Riverdale,7,9,2,18
7,Lucifer,12,25,2,39
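Because 'Total_Cast' is just the sum of the other three columns, it can also be computed for every row at once. The sketch below uses a toy DataFrame with three rows copied from the table above; the vectorised `sum(axis=1)` is an alternative I am assuming here, not what the cells above do.

```python
import pandas as pd

# Three rows copied from the table above, without the total.
res_toy = pd.DataFrame({
    "Title": ["Dark", "Riverdale", "Lucifer"],
    "Women_Cast": [17, 7, 12],
    "Men_Cast": [19, 9, 25],
    "Not_Set_Cast": [0, 2, 2],
})

# Row-wise sum of the three count columns.
count_cols = ["Women_Cast", "Men_Cast", "Not_Set_Cast"]
res_toy["Total_Cast"] = res_toy[count_cols].sum(axis=1)
print(res_toy)
```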


In [129]:
res.to_csv('/Users/gina/Documents/allWomen/Final project/datasets/filmaffinity_gender5.csv')

### Directors

#### Checking the total number of directors

In [103]:
directors = data.copy()

In [104]:
directors.isnull().sum()

Title              0
Link               0
Type               0
Duration         118
Country            0
Directors         23
Cast             414
Genres             0
Description       49
RatingAverage    193
Votes            193
Reviews          594
dtype: int64

In [34]:
directors = directors.dropna(subset=['Directors'])
directors.isnull().sum()

Title              0
Link               0
Type               0
Duration         113
Country            0
Directors          0
Cast             405
Genres             0
Description       49
RatingAverage    179
Votes            179
Reviews          577
dtype: int64

In [36]:
directors = pd.DataFrame([x.split(',') for x in directors.Directors])
directors.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,36,37,38,39,40,41,42,43,44,45
0,Álex Pina,Jesús Colmenar,Miguel Ángel Vivas,Alex Rodrigo,Alejandro Bazzano,Koldo Serra,Javier Quintas,,,,...,,,,,,,,,,
1,Jon Bokenkamp,Michael W. Watkins,Andrew McCarthy,Steven A. Adelson,Karen Gaviola,Donald E. Thorin Jr.,Bill Roe,Joe Carnahan,John Terlesky,Vincent Misiano,...,,,,,,,,,,
2,Iván Escobar,Esther Martínez Lobato,Daniel Écija,Álex Pina,Jesús Colmenar,Jesús Rodrigo,Sandra Gallego,Alex Rodrigo,David Molina Encinas,,...,,,,,,,,,,
3,Paul Scheuring,Bobby Roth,Kevin Hooks,Dwight H. Little,Karen Gaviola,Michael Switzer,Greg Yaitanes,Vincent Misiano,Milan Cheylov,Brad Turner,...,,,,,,,,,,
4,Brian Yorkey,Tom McCarthy,Kyle Patrick Alvarez,Gregg Araki,Carl Franklin,Jessica Yu,Helen Shaver,,,,...,,,,,,,,,,


In [37]:
directors2 = pd.DataFrame()

for item in directors.index:
    directors2 = pd.concat([directors2, directors.T[item]])

In [38]:
directors2 = directors2.dropna()

In [39]:
directors2.isnull().sum()

0    0
dtype: int64

In [40]:
directors2

Unnamed: 0,0
0,Álex Pina
1,Jesús Colmenar
2,Miguel Ángel Vivas
3,Alex Rodrigo
4,Alejandro Bazzano
...,...
0,Ryan Murphy
1,Helen Hunt
2,Ryan Murphy
3,Janet Mock
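The list above contains repeated names (Ryan Murphy appears twice), and only the first name is sent to the API. As a sketch of how much quota deduplication could save, the example below counts distinct first names in a toy Series; `directors_toy` is made up for illustration, and the last entry is a hypothetical extra name added to show the effect.

```python
import pandas as pd

# Toy list of director names; the last one is a hypothetical extra entry.
directors_toy = pd.Series([
    "Ryan Murphy", "Helen Hunt", "Ryan Murphy", "Janet Mock", " Ryan Coogler",
])

# Keep only the first token of each name, as the lookup does.
first_names = directors_toy.str.strip().str.split(" ").str[0]

# Distinct first names are the only ones that need an API request.
unique_first = first_names.drop_duplicates().tolist()
print(len(directors_toy), len(unique_first))
```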


#### Splitting and transposing the data

In [None]:
directors = data.copy()
directors.isnull().sum()

In [105]:
directors['Directors'] = directors['Directors'].fillna('no director')
directors.isnull().sum()

Title              0
Link               0
Type               0
Duration         118
Country            0
Directors          0
Cast             414
Genres             0
Description       49
RatingAverage    193
Votes            193
Reviews          594
dtype: int64

In [106]:
directors = pd.DataFrame([x.split(',') for x in directors.Directors])
directors.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,36,37,38,39,40,41,42,43,44,45
0,Álex Pina,Jesús Colmenar,Miguel Ángel Vivas,Alex Rodrigo,Alejandro Bazzano,Koldo Serra,Javier Quintas,,,,...,,,,,,,,,,
1,Jon Bokenkamp,Michael W. Watkins,Andrew McCarthy,Steven A. Adelson,Karen Gaviola,Donald E. Thorin Jr.,Bill Roe,Joe Carnahan,John Terlesky,Vincent Misiano,...,,,,,,,,,,
2,Iván Escobar,Esther Martínez Lobato,Daniel Écija,Álex Pina,Jesús Colmenar,Jesús Rodrigo,Sandra Gallego,Alex Rodrigo,David Molina Encinas,,...,,,,,,,,,,
3,Paul Scheuring,Bobby Roth,Kevin Hooks,Dwight H. Little,Karen Gaviola,Michael Switzer,Greg Yaitanes,Vincent Misiano,Milan Cheylov,Brad Turner,...,,,,,,,,,,
4,Brian Yorkey,Tom McCarthy,Kyle Patrick Alvarez,Gregg Araki,Carl Franklin,Jessica Yu,Helen Shaver,,,,...,,,,,,,,,,


In [107]:
directors = directors.fillna('no director')
directors.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,36,37,38,39,40,41,42,43,44,45
0,Álex Pina,Jesús Colmenar,Miguel Ángel Vivas,Alex Rodrigo,Alejandro Bazzano,Koldo Serra,Javier Quintas,no director,no director,no director,...,no director,no director,no director,no director,no director,no director,no director,no director,no director,no director
1,Jon Bokenkamp,Michael W. Watkins,Andrew McCarthy,Steven A. Adelson,Karen Gaviola,Donald E. Thorin Jr.,Bill Roe,Joe Carnahan,John Terlesky,Vincent Misiano,...,no director,no director,no director,no director,no director,no director,no director,no director,no director,no director
2,Iván Escobar,Esther Martínez Lobato,Daniel Écija,Álex Pina,Jesús Colmenar,Jesús Rodrigo,Sandra Gallego,Alex Rodrigo,David Molina Encinas,no director,...,no director,no director,no director,no director,no director,no director,no director,no director,no director,no director
3,Paul Scheuring,Bobby Roth,Kevin Hooks,Dwight H. Little,Karen Gaviola,Michael Switzer,Greg Yaitanes,Vincent Misiano,Milan Cheylov,Brad Turner,...,no director,no director,no director,no director,no director,no director,no director,no director,no director,no director
4,Brian Yorkey,Tom McCarthy,Kyle Patrick Alvarez,Gregg Araki,Carl Franklin,Jessica Yu,Helen Shaver,no director,no director,no director,...,no director,no director,no director,no director,no director,no director,no director,no director,no director,no director


#### Checking the gender and saving the information in a new DataFrame

In [108]:
res2 = pd.DataFrame()
res2['Title'] = data['Title']
res2['Women_Directors'] = 0
res2['Men_Directors'] = 0
res2['Not_Set_Directors'] = 0
res2['Total_Directors'] = 0
res2.head()

Unnamed: 0,Title,Women_Directors,Men_Directors,Not_Set_Directors,Total_Directors
0,Money Heist,0,0,0,0
1,The Blacklist,0,0,0,0
2,Locked Up,0,0,0,0
3,Prison Break,0,0,0,0
4,13 Reasons Why,0,0,0,0


In [None]:
directors.iloc[1530:,:]

In [114]:
women_directors = 0
men_directors = 0
not_set_directors = 0

for item in directors.iloc[1530:,:].index:
    print('index: ', item)
    
    for director in directors.T[item]:
        if director != 'no director':
            a = director.strip()
            # Look up the gender of the first name only
            gndr = genderize.get([a.split(' ')[0]])[0]['gender']
        
            if gndr == 'female':
                women_directors += 1
            elif gndr == 'male':
                men_directors += 1
            else:
                not_set_directors += 1
    
    # .loc writes into res2 directly and avoids chained assignment
    res2.loc[item, 'Women_Directors'] = women_directors
    res2.loc[item, 'Men_Directors'] = men_directors
    res2.loc[item, 'Not_Set_Directors'] = not_set_directors
    
    # Reset the counters for the next title
    women_directors = 0
    men_directors = 0
    not_set_directors = 0

index:  1530
index:  1531
index:  1532
...

In [115]:
res2.head()

Unnamed: 0,Title,Women_Directors,Men_Directors,Not_Set_Directors,Total_Directors
0,Money Heist,0,5,2,0
1,The Blacklist,4,32,0,0
2,Locked Up,2,3,4,0
3,Prison Break,1,30,1,0
4,13 Reasons Why,2,5,0,0


In [122]:
total_directors = 0
for item in directors.index:
    print(item)
    for director in directors.T[item]:
        if director != 'no director':
            total_directors += 1
    
    # .loc writes into res2 directly and avoids chained assignment
    res2.loc[item, 'Total_Directors'] = total_directors
    total_directors = 0

0
1
2
3
...


In [123]:
res2.head()

Unnamed: 0,Title,Women_Directors,Men_Directors,Not_Set_Directors,Total_Directors
0,Money Heist,0,5,2,7
1,The Blacklist,4,32,0,36
2,Locked Up,2,3,4,9
3,Prison Break,1,30,1,32
4,13 Reasons Why,2,5,0,7


In [124]:
res2.to_csv('/Users/gina/Documents/allWomen/Final project/datasets/filmaffinity_gender6(directors).csv')

## Merging and saving the new dataset

After getting the gender for cast and directors, I merged them and saved the final DataFrame in a new .csv file, in order to start the analysis.
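As a small sketch of that merge, the example below concatenates three toy DataFrames column-wise. It assumes, as in this notebook, that all of them share the same row index; dropping the repeated 'Title' column before concatenating is an alternative to renaming it, and the toy frames are illustrative only.

```python
import pandas as pd

# Toy frames sharing the same index; values copied from the first two titles.
data_toy = pd.DataFrame({
    "Title": ["Money Heist", "The Blacklist"],
    "Country": ["Spain", "United States"],
})
cast_toy = pd.DataFrame({
    "Title": ["Money Heist", "The Blacklist"],
    "Women_Cast": [11, 9],
    "Men_Cast": [26, 15],
})
dir_toy = pd.DataFrame({
    "Title": ["Money Heist", "The Blacklist"],
    "Women_Directors": [0, 4],
    "Men_Directors": [5, 32],
})

# Column-wise concatenation aligns rows by index; drop the repeated Title columns first.
final_toy = pd.concat(
    [data_toy, cast_toy.drop(columns="Title"), dir_toy.drop(columns="Title")],
    axis=1,
)
print(final_toy.columns.tolist())
```

This keeps a single 'Title' column in the result, whereas renaming to 'Title1' and 'Title2' keeps the copies available for checking that the rows line up.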

In [133]:
res.columns = ['Title1', 'Women_Cast', 'Men_Cast','Not_Set_Cast', 'Total_Cast']
res.head()

Unnamed: 0,Title1,Women_Cast,Men_Cast,Not_Set_Cast,Total_Cast
0,Money Heist,11,26,2,39
1,The Blacklist,9,15,0,24
2,Locked Up,21,9,2,32
3,Prison Break,4,15,2,21
4,13 Reasons Why,12,22,1,35


In [134]:
final_df = pd.concat([data, res], axis=1)
final_df

Unnamed: 0,Title,Link,Type,Duration,Country,Directors,Cast,Genres,Description,RatingAverage,Votes,Reviews,Title1,Women_Cast,Men_Cast,Not_Set_Cast,Total_Cast
0,Money Heist,https://www.filmaffinity.com/us/film879405.html,TV show,70 min.,Spain,"Álex Pina, Jesús Colmenar, Miguel Ángel Vivas,...","Álvaro Morte, Úrsula Corberó, Itziar Ituño, Al...","TV Series, Thriller, Mystery, Heist Film, Kidn...",TV Series (2017-Present Day). 4 Seasons. A mys...,7.1,25691.0,"""[4th Season Review]: [It] is like an extended...",Money Heist,11,26,2,39
1,The Blacklist,https://www.filmaffinity.com/us/film573633.html,TV show,42 min.,United States,"Jon Bokenkamp, Michael W. Watkins, Andrew McCa...","James Spader, Megan Boone, Diego Klattenhoff, ...","TV Series, Mystery, Drama, Crime, Spy Film","The world's most wanted criminal, Thomas Raymo...",6.4,5148.0,"""His name is above the title and, depending ho...",The Blacklist,9,15,0,24
2,Locked Up,https://www.filmaffinity.com/us/film441483.html,TV show,50 min.,Spain,"Iván Escobar, Esther Martínez Lobato, Daniel É...","Maggie Civantos, Najwa Nimri, Roberto Enríquez...","TV Series, Thriller, Drama, Prison Drama",Macarena Ferreiro is a young naive woman who f...,7.0,6941.0,,Locked Up,21,9,2,32
3,Prison Break,https://www.filmaffinity.com/us/film822756.html,TV show,42 min.,United States,"Paul Scheuring, Bobby Roth, Kevin Hooks, Dwigh...","Wentworth Miller, Dominic Purcell, Robert Knep...","TV Series, Action, Drama, Prison Drama, Cop Mo...",TV Series (2005-2009). 5 Seasons. 90 Episodes....,7.3,71511.0,"""A strong cast led by Wentworth Miller (...) I...",Prison Break,4,15,2,21
4,13 Reasons Why,https://www.filmaffinity.com/us/film999360.html,TV show,60 min.,United States,"Brian Yorkey, Tom McCarthy, Kyle Patrick Alvar...","Dylan Minnette, Katherine Langford, Christian ...","TV Series, Drama, Mystery, Teen/coming-of-age,...","'Thirteen Reasons Why', based on the best-sell...",6.8,21496.0,"""[2nd Season Review]: [It] is a frustratingly ...",13 Reasons Why,12,22,1,35
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2378,El rey: monarca eterno (Serie de TV),https://www.filmaffinity.com/es/film953955.html,film,60 min.,Corea del Sur,Sang-Hoon Baek,"Lee Min-ho, Kim Go-eun, Do-Hwan Woo, Kyung-Nam...","Serie de TV, Fantástico, Romance",Un drama de fantasía y romance en el que una d...,,,,El rey: monarca eterno (Serie de TV),1,5,0,6
2379,La red Avispa,https://www.filmaffinity.com/es/film826923.html,film,123 min.,Francia,Olivier Assayas,"Penélope Cruz, Edgar Ramirez, Wagner Moura, Ga...","Thriller, Años 90, Espionaje","La Habana, principios de los 90. René González...",5.6,550.0,"""Desconcertante revoltijo de géneros que se ma...",La red Avispa,3,15,1,19
2380,The Sinner 3 (Miniserie de TV),https://www.filmaffinity.com/es/film892789.html,film,,Estados Unidos,"Derek Simonds, Antonio Campos","Bill Pullman, Matt Bomer, Chris Messina, Jessi...","Serie de TV, Thriller, Intriga, Serie de antol...",Miniserie de TV (2020). 8 episodios. El detect...,5.8,322.0,"""Consigue desarrollar su historia con eficacia...",The Sinner 3 (Miniserie de TV),8,9,0,17
2381,Coisa Mais Linda (Serie de TV),https://www.filmaffinity.com/es/film793120.html,film,45 min.,Brasil,Giuliano Cedroni,"Maria Casadevall, Pathy Dejesus, Fernanda Vasc...","Serie de TV, Comedia, Romance",Serie de TV (2019-). 7 episodios. Después de l...,6.2,131.0,"""En gran medida, maneja bien la historia, con ...",Coisa Mais Linda (Serie de TV),6,9,0,15


In [135]:
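# Rename the director-count columns; 'Title2' is a temporary copy of the title that is dropped later.
res2.columns = ['Title2', 'Women_Directors', 'Men_Directors', 'Not_Set_Directors', 'Total_Directors']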
res2.head()

Unnamed: 0,Title2,Women_Directors,Men_Directors,Not_Set_Directors,Total_Directors
0,Money Heist,0,5,2,7
1,The Blacklist,4,32,0,36
2,Locked Up,2,3,4,9
3,Prison Break,1,30,1,32
4,13 Reasons Why,2,5,0,7


In [136]:
# Append the director counts as new columns; with axis=1, pandas aligns rows on the index.
final_df = pd.concat([final_df, res2], axis=1)
final_df

Unnamed: 0,Title,Link,Type,Duration,Country,Directors,Cast,Genres,Description,RatingAverage,...,Title1,Women_Cast,Men_Cast,Not_Set_Cast,Total_Cast,Title2,Women_Directors,Men_Directors,Not_Set_Directors,Total_Directors
0,Money Heist,https://www.filmaffinity.com/us/film879405.html,TV show,70 min.,Spain,"Álex Pina, Jesús Colmenar, Miguel Ángel Vivas,...","Álvaro Morte, Úrsula Corberó, Itziar Ituño, Al...","TV Series, Thriller, Mystery, Heist Film, Kidn...",TV Series (2017-Present Day). 4 Seasons. A mys...,7.1,...,Money Heist,11,26,2,39,Money Heist,0,5,2,7
1,The Blacklist,https://www.filmaffinity.com/us/film573633.html,TV show,42 min.,United States,"Jon Bokenkamp, Michael W. Watkins, Andrew McCa...","James Spader, Megan Boone, Diego Klattenhoff, ...","TV Series, Mystery, Drama, Crime, Spy Film","The world's most wanted criminal, Thomas Raymo...",6.4,...,The Blacklist,9,15,0,24,The Blacklist,4,32,0,36
2,Locked Up,https://www.filmaffinity.com/us/film441483.html,TV show,50 min.,Spain,"Iván Escobar, Esther Martínez Lobato, Daniel É...","Maggie Civantos, Najwa Nimri, Roberto Enríquez...","TV Series, Thriller, Drama, Prison Drama",Macarena Ferreiro is a young naive woman who f...,7.0,...,Locked Up,21,9,2,32,Locked Up,2,3,4,9
3,Prison Break,https://www.filmaffinity.com/us/film822756.html,TV show,42 min.,United States,"Paul Scheuring, Bobby Roth, Kevin Hooks, Dwigh...","Wentworth Miller, Dominic Purcell, Robert Knep...","TV Series, Action, Drama, Prison Drama, Cop Mo...",TV Series (2005-2009). 5 Seasons. 90 Episodes....,7.3,...,Prison Break,4,15,2,21,Prison Break,1,30,1,32
4,13 Reasons Why,https://www.filmaffinity.com/us/film999360.html,TV show,60 min.,United States,"Brian Yorkey, Tom McCarthy, Kyle Patrick Alvar...","Dylan Minnette, Katherine Langford, Christian ...","TV Series, Drama, Mystery, Teen/coming-of-age,...","'Thirteen Reasons Why', based on the best-sell...",6.8,...,13 Reasons Why,12,22,1,35,13 Reasons Why,2,5,0,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2378,El rey: monarca eterno (Serie de TV),https://www.filmaffinity.com/es/film953955.html,film,60 min.,Corea del Sur,Sang-Hoon Baek,"Lee Min-ho, Kim Go-eun, Do-Hwan Woo, Kyung-Nam...","Serie de TV, Fantástico, Romance",Un drama de fantasía y romance en el que una d...,,...,El rey: monarca eterno (Serie de TV),1,5,0,6,El rey: monarca eterno (Serie de TV),0,1,0,1
2379,La red Avispa,https://www.filmaffinity.com/es/film826923.html,film,123 min.,Francia,Olivier Assayas,"Penélope Cruz, Edgar Ramirez, Wagner Moura, Ga...","Thriller, Años 90, Espionaje","La Habana, principios de los 90. René González...",5.6,...,La red Avispa,3,15,1,19,La red Avispa,0,1,0,1
2380,The Sinner 3 (Miniserie de TV),https://www.filmaffinity.com/es/film892789.html,film,,Estados Unidos,"Derek Simonds, Antonio Campos","Bill Pullman, Matt Bomer, Chris Messina, Jessi...","Serie de TV, Thriller, Intriga, Serie de antol...",Miniserie de TV (2020). 8 episodios. El detect...,5.8,...,The Sinner 3 (Miniserie de TV),8,9,0,17,The Sinner 3 (Miniserie de TV),0,2,0,2
2381,Coisa Mais Linda (Serie de TV),https://www.filmaffinity.com/es/film793120.html,film,45 min.,Brasil,Giuliano Cedroni,"Maria Casadevall, Pathy Dejesus, Fernanda Vasc...","Serie de TV, Comedia, Romance",Serie de TV (2019-). 7 episodios. Después de l...,6.2,...,Coisa Mais Linda (Serie de TV),6,9,0,15,Coisa Mais Linda (Serie de TV),0,1,0,1
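
Because the concatenation relies on the row index alone, it may be worth confirming that the helper columns `Title1` and `Title2` really repeat `Title` before they are discarded. The following is only a minimal sketch of such a check: `toy` is a hypothetical stand-in for `final_df` built from a few of the rows shown above, and `misaligned_rows` is an illustrative helper, not part of the notebook's `gender` module.

```python
import pandas as pd

# Hypothetical stand-in for final_df after both concat steps
# (values copied from the first rows displayed above).
toy = pd.DataFrame({
    "Title": ["Money Heist", "The Blacklist", "Locked Up"],
    "Title1": ["Money Heist", "The Blacklist", "Locked Up"],
    "Women_Cast": [11, 9, 21],
    "Title2": ["Money Heist", "The Blacklist", "Locked Up"],
    "Women_Directors": [0, 4, 2],
})

def misaligned_rows(df, reference="Title", copies=("Title1", "Title2")):
    """Return the rows where any copied title column differs from the reference column."""
    mismatch = pd.Series(False, index=df.index)
    for col in copies:
        mismatch = mismatch | (df[col] != df[reference])
    return df[mismatch]

bad = misaligned_rows(toy)
print(f"{len(bad)} misaligned rows")
```

If the result is empty, the copied title columns carry no extra information and can be dropped safely; any rows returned would point to counts attached to the wrong title.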


In [137]:
# Remove the temporary title copies now that the counts sit next to the original 'Title' column.
final_df = final_df.drop(columns=['Title1', 'Title2'])
final_df

Unnamed: 0,Title,Link,Type,Duration,Country,Directors,Cast,Genres,Description,RatingAverage,Votes,Reviews,Women_Cast,Men_Cast,Not_Set_Cast,Total_Cast,Women_Directors,Men_Directors,Not_Set_Directors,Total_Directors
0,Money Heist,https://www.filmaffinity.com/us/film879405.html,TV show,70 min.,Spain,"Álex Pina, Jesús Colmenar, Miguel Ángel Vivas,...","Álvaro Morte, Úrsula Corberó, Itziar Ituño, Al...","TV Series, Thriller, Mystery, Heist Film, Kidn...",TV Series (2017-Present Day). 4 Seasons. A mys...,7.1,25691.0,"""[4th Season Review]: [It] is like an extended...",11,26,2,39,0,5,2,7
1,The Blacklist,https://www.filmaffinity.com/us/film573633.html,TV show,42 min.,United States,"Jon Bokenkamp, Michael W. Watkins, Andrew McCa...","James Spader, Megan Boone, Diego Klattenhoff, ...","TV Series, Mystery, Drama, Crime, Spy Film","The world's most wanted criminal, Thomas Raymo...",6.4,5148.0,"""His name is above the title and, depending ho...",9,15,0,24,4,32,0,36
2,Locked Up,https://www.filmaffinity.com/us/film441483.html,TV show,50 min.,Spain,"Iván Escobar, Esther Martínez Lobato, Daniel É...","Maggie Civantos, Najwa Nimri, Roberto Enríquez...","TV Series, Thriller, Drama, Prison Drama",Macarena Ferreiro is a young naive woman who f...,7.0,6941.0,,21,9,2,32,2,3,4,9
3,Prison Break,https://www.filmaffinity.com/us/film822756.html,TV show,42 min.,United States,"Paul Scheuring, Bobby Roth, Kevin Hooks, Dwigh...","Wentworth Miller, Dominic Purcell, Robert Knep...","TV Series, Action, Drama, Prison Drama, Cop Mo...",TV Series (2005-2009). 5 Seasons. 90 Episodes....,7.3,71511.0,"""A strong cast led by Wentworth Miller (...) I...",4,15,2,21,1,30,1,32
4,13 Reasons Why,https://www.filmaffinity.com/us/film999360.html,TV show,60 min.,United States,"Brian Yorkey, Tom McCarthy, Kyle Patrick Alvar...","Dylan Minnette, Katherine Langford, Christian ...","TV Series, Drama, Mystery, Teen/coming-of-age,...","'Thirteen Reasons Why', based on the best-sell...",6.8,21496.0,"""[2nd Season Review]: [It] is a frustratingly ...",12,22,1,35,2,5,0,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2378,El rey: monarca eterno (Serie de TV),https://www.filmaffinity.com/es/film953955.html,film,60 min.,Corea del Sur,Sang-Hoon Baek,"Lee Min-ho, Kim Go-eun, Do-Hwan Woo, Kyung-Nam...","Serie de TV, Fantástico, Romance",Un drama de fantasía y romance en el que una d...,,,,1,5,0,6,0,1,0,1
2379,La red Avispa,https://www.filmaffinity.com/es/film826923.html,film,123 min.,Francia,Olivier Assayas,"Penélope Cruz, Edgar Ramirez, Wagner Moura, Ga...","Thriller, Años 90, Espionaje","La Habana, principios de los 90. René González...",5.6,550.0,"""Desconcertante revoltijo de géneros que se ma...",3,15,1,19,0,1,0,1
2380,The Sinner 3 (Miniserie de TV),https://www.filmaffinity.com/es/film892789.html,film,,Estados Unidos,"Derek Simonds, Antonio Campos","Bill Pullman, Matt Bomer, Chris Messina, Jessi...","Serie de TV, Thriller, Intriga, Serie de antol...",Miniserie de TV (2020). 8 episodios. El detect...,5.8,322.0,"""Consigue desarrollar su historia con eficacia...",8,9,0,17,0,2,0,2
2381,Coisa Mais Linda (Serie de TV),https://www.filmaffinity.com/es/film793120.html,film,45 min.,Brasil,Giuliano Cedroni,"Maria Casadevall, Pathy Dejesus, Fernanda Vasc...","Serie de TV, Comedia, Romance",Serie de TV (2019-). 7 episodios. Después de l...,6.2,131.0,"""En gran medida, maneja bien la historia, con ...",6,9,0,15,0,1,0,1
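
In every row displayed above, the women, men and not-set counts add up to the corresponding total (for example, 11 + 26 + 2 = 39 for the cast of Money Heist). A quick consistency check along those lines could be run before saving. This is a sketch under the assumption that each total is meant to be the sum of the three counts; `toy` is a hypothetical sample of `final_df` and `inconsistent_totals` is an illustrative helper, not a function from the notebook.

```python
import pandas as pd

# Hypothetical sample of final_df (values copied from the rows displayed above).
toy = pd.DataFrame({
    "Title": ["Money Heist", "The Blacklist", "Locked Up"],
    "Women_Cast": [11, 9, 21],
    "Men_Cast": [26, 15, 9],
    "Not_Set_Cast": [2, 0, 2],
    "Total_Cast": [39, 24, 32],
    "Women_Directors": [0, 4, 2],
    "Men_Directors": [5, 32, 3],
    "Not_Set_Directors": [2, 0, 4],
    "Total_Directors": [7, 36, 9],
})

def inconsistent_totals(df, role):
    """Return rows where Women + Men + Not_Set does not equal Total for the given role."""
    parts = df[[f"Women_{role}", f"Men_{role}", f"Not_Set_{role}"]].sum(axis=1)
    return df[parts != df[f"Total_{role}"]]

for role in ("Cast", "Directors"):
    print(role, len(inconsistent_totals(toy, role)))
```

Rows returned by the helper would deserve a closer look before the dataset is used to measure the gender gap.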


In [138]:
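# Save the enriched dataset; the index is written as the first column, matching index_col=0 used when reading.
final_df.to_csv('/Users/gina/Documents/allWomen/Final project/datasets/final_dataset2.csv')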