# How Diverse is TIME'S 100 Most Influential People List?

Each year, beginning in 1999 (and digitally from 2004), TIMES Magazine releases a list of the 100 most influential people in the world for that year. Chosen exclusively by Time's editors after nominations from alumnae and international staff, the list features people who are "recognized for changing the world, regardless of the consequences of their actions."

Appearing on this list is generally considered an honor, and has thus generated controversy over whom it includes as well as excludes. While the list is presented as a _reflection_ of the current state of the world, the editorial decisions behind it are telling of the power structures that shape our worldviews. 

I will be scraping Time's 100 Most Influential People List from the years 2004, 2007, 2010, 2013, 2015, 2017, 2019 and 2020 and analyze it for changing trends in nationality, gender, age and "category" of influence. 

# 1. Getting the raw data

Time has different formats and graphics on each of their lists, and unfortunately none of them are very consistent. 

For instance, the most recent [list from 2020](https://time.com/collection/100-most-influential-people-2020/) features a beautiful visual design with videos, photographs and interactive buttons. The [list from 2017](https://time.com/collection/2017-time-100/) has an entirely different layout and organization structure, as does the [2015 list](https://time.com/collection/2015-time-100/). And everthing before [2013](https://time100.time.com/2013/04/18/time-100/slide/all/) is archived in yet another format. 

And all of these different designs means dramatically different div trees and tags!

I tried to loop through these URLs while using both BeautifulSoup and Selenium, but found lots of errors and inconsistencies because the differently structured data. It was easier to go through each URL one at a time, troubleshoot, and then copy the functioning code to extract the necessary info.

In [1]:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd
pd.set_option("display.max_rows", 500)
pd.set_option('max_colwidth', 400)
import re
import numpy as np



(I actually did each of these in separate Notebooks, but compiling them here for clarity.)

## Time 2020 
The most recent, and therefore most interactive and graphic-heavy list. Using BeautifulSoup for this made it a lot easier to understand the div-tree, because most (if not all) the links were hover boxes (without their own URLs), making it hard to use Selenium and inspect the page accurately.

In [2]:
response = requests.get("https://time.com/collection/100-most-influential-people-2020/")
doc = BeautifulSoup(response.content, "html.parser")
doc.prettify()

'<!DOCTYPE html>\n<!--\n:////////////////////////: ..-////-..        .//`                 :/-        ..-///////////////////-\n-`````````.////.`````````:   `////.          -///                -///           ////-`````````````.-\n-         .////.         -   `////.          ////:              .////`          ////-              -\n          .////.             `////.         `/////-            `/////-          ////-\n          .////.             `////.         ://////.          `///////          ////-\n          .////.             `////.         //..////.         :/:-////.         ////-           `.\n          .////.             `////.        .//` :////.       :/: `////:         ////:..........:-\n          .////.             `////.        //:   :////`     -/:`  -////`        ////:...........:-\n          .////.             `////.       `//`    :////`   -//`   `////-        ////-           `.\n          .////.             `////.       ://     `/////` -//`     :////        ////-\n          

In [3]:
full = doc.find_all('a')
allnames2020 = []
count = 0
for each in full[56:158]:
    names = each('span')
    names = (names[0].text.strip())
    links = (each['href'])
    if names == 'LASTESIS':
        names = 'Las Tesis'
    if names == names.upper():
        category = names
        #print (category)
        count = count + 1
    if count == 1 and names != category:
        eachnamedict = {'Name': names,
                       'Category': category,
                       'Year': '2020',
                       'Link': links}
        allnames2020.append(eachnamedict)
    if count == 2 and names != category:
        eachnamedict = {'Name': names,
                       'Category': category,
                       'Year': '2020',
                       'Link': links}
        allnames2020.append(eachnamedict)
    if count == 3 and names != category:
        eachnamedict = {'Name': names,
                       'Category': category,
                       'Year': '2020',
                       'Link': links}
        allnames2020.append(eachnamedict)
    if count == 4 and names != category:
        eachnamedict = {'Name': names,
                       'Category': category,
                       'Year': '2020',
                       'Link': links}
        allnames2020.append(eachnamedict)
    if count == 5 and names != category:
        eachnamedict = {'Name': names,
                       'Category': category,
                       'Year': '2020',
                       'Link': links}
        allnames2020.append(eachnamedict)
    if count == 6 and names != category:
        eachnamedict = {'Name': names,
                       'Category': category,
                       'Year': '2020',
                       'Link': links}
        allnames2020.append(eachnamedict)
print (len(allnames2020))
allnames2020

97


[{'Name': 'Megan Thee Stallion',
  'Category': 'PIONEERS',
  'Year': '2020',
  'Link': 'https://time.com/collection/100-most-influential-people-2020/5888165/megan-thee-stallion-pioneer/'},
 {'Name': 'Giannis Antetokounmpo',
  'Category': 'PIONEERS',
  'Year': '2020',
  'Link': 'https://time.com/collection/100-most-influential-people-2020/5888173/giannis-antetokounmpo/'},
 {'Name': 'Ibram X. Kendi',
  'Category': 'PIONEERS',
  'Year': '2020',
  'Link': 'https://time.com/collection/100-most-influential-people-2020/5888207/ibram-x-kendi/'},
 {'Name': 'Nathan Law',
  'Category': 'PIONEERS',
  'Year': '2020',
  'Link': 'https://time.com/collection/100-most-influential-people-2020/5888201/nathan-law/'},
 {'Name': 'Tomi Adeyemi',
  'Category': 'PIONEERS',
  'Year': '2020',
  'Link': 'https://time.com/collection/100-most-influential-people-2020/5888211/tomi-adeyemi/'},
 {'Name': 'Astronauts Christina Koch and Jessica Meir',
  'Category': 'PIONEERS',
  'Year': '2020',
  'Link': 'https://time.co

In [4]:
df20 = pd.DataFrame(allnames2020)

df20

Unnamed: 0,Name,Category,Year,Link
0,Megan Thee Stallion,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888165/megan-thee-stallion-pioneer/
1,Giannis Antetokounmpo,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888173/giannis-antetokounmpo/
2,Ibram X. Kendi,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888207/ibram-x-kendi/
3,Nathan Law,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888201/nathan-law/
4,Tomi Adeyemi,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888211/tomi-adeyemi/
5,Astronauts Christina Koch and Jessica Meir,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888216/all-women-spacewalk-christina-koch-jessica-meir/
6,Julie K. Brown,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888209/julie-k-brown/
7,Cecilia Martinez,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888195/cecilia-martinez/
8,Maya Moore,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888213/maya-moore/
9,Chase Strangio,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888158/chase-strangio/


This dataframe gives me each person's **name**, the **category** for which they are selected, a **link to the blurb** about them, and the **year** of the list (for when I eventually combine all the dataframes).

#### Some manual cleaning:
There were many little errors, especially when multiple names were grouped together into one row. I manually entered some code to clean through them, first by adding in rows for the second/third person in the grouped row, and then renaming the original row to only the first person:

In [5]:
cleaning = [
            {'Name': 'Jessica Meir',
            'Category': 'PIONEERS',
            'Year': '2020',
            'Link': 'https://time.com/collection/100-most-influential-people-2020/5888216/all-women-spacewalk-christina-koch-jessica-meir/'},
             {'Name': ' Patrisse Cullors',
            'Category': 'ICONS',
            'Year': '2020',
            'Link': 'https://time.com/collection/100-most-influential-people-2020/5888228/black-lives-matter-founders/'},
            {'Name': ' Opal Tometi',
            'Category': 'ICONS',
            'Year': '2020',
            'Link':'https://time.com/collection/100-most-influential-people-2020/5888228/black-lives-matter-founders/'}
            ]

df20 = df20.append(cleaning, ignore_index=True)


In [6]:
df20.Name[df20.Name == 'Astronauts Christina Koch and Jessica Meir'] = df20.Name[df20.Name == 'Astronauts Christina Koch and Jessica Meir'].str.replace('Astronauts Christina Koch and Jessica Meir', 'Christina Koch')
df20.Name[df20.Name == 'Black Lives Matter Founders Alicia Garza, Patrisse Cullors and Opal Tometi'] = df20.Name[df20.Name == 'Black Lives Matter Founders Alicia Garza, Patrisse Cullors and Opal Tometi'].str.replace('Black Lives Matter Founders Alicia Garza, Patrisse Cullors and Opal Tometi', 'Alicia Garza')

df20

Unnamed: 0,Name,Category,Year,Link
0,Megan Thee Stallion,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888165/megan-thee-stallion-pioneer/
1,Giannis Antetokounmpo,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888173/giannis-antetokounmpo/
2,Ibram X. Kendi,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888207/ibram-x-kendi/
3,Nathan Law,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888201/nathan-law/
4,Tomi Adeyemi,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888211/tomi-adeyemi/
5,Christina Koch,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888216/all-women-spacewalk-christina-koch-jessica-meir/
6,Julie K. Brown,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888209/julie-k-brown/
7,Cecilia Martinez,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888195/cecilia-martinez/
8,Maya Moore,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888213/maya-moore/
9,Chase Strangio,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888158/chase-strangio/


Saving it to a CSV for good measure.

In [7]:
df20.to_csv('2020_Times_Scraped_List.csv',sep=',',index=False)
df20

Unnamed: 0,Name,Category,Year,Link
0,Megan Thee Stallion,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888165/megan-thee-stallion-pioneer/
1,Giannis Antetokounmpo,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888173/giannis-antetokounmpo/
2,Ibram X. Kendi,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888207/ibram-x-kendi/
3,Nathan Law,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888201/nathan-law/
4,Tomi Adeyemi,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888211/tomi-adeyemi/
5,Christina Koch,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888216/all-women-spacewalk-christina-koch-jessica-meir/
6,Julie K. Brown,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888209/julie-k-brown/
7,Cecilia Martinez,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888195/cecilia-martinez/
8,Maya Moore,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888213/maya-moore/
9,Chase Strangio,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888158/chase-strangio/


#### But wait! 
It turns out that Time's website actually has no demographic data whatsoever about any of the 100 people in any of the 16 or so years they have been publishing it! Sometimes it makes sense – you wouldn't necessarily write a super famous person's gender on a list honoring their life's work. 


So to add basic information about each person, I then extracted their names and reformatted it into a Google search query, which I subsequently looped through using Selenium.

In [8]:
urlnames2020 = []
for eachname in df20.Name:
        eachname = eachname.replace(" ","+")
        if eachname == 'Tourmaline':
            eachname = eachname.replace('Tourmaline', 'Tourmaline+filmmaker')
        urlnames2020.append(f"https://www.google.com/search?q={eachname}")
        
urlnames2020

['https://www.google.com/search?q=Megan+Thee+Stallion',
 'https://www.google.com/search?q=Giannis+Antetokounmpo',
 'https://www.google.com/search?q=Ibram+X.+Kendi',
 'https://www.google.com/search?q=Nathan+Law',
 'https://www.google.com/search?q=Tomi+Adeyemi',
 'https://www.google.com/search?q=Christina+Koch',
 'https://www.google.com/search?q=Julie+K.+Brown',
 'https://www.google.com/search?q=Cecilia+Martinez',
 'https://www.google.com/search?q=Maya+Moore',
 'https://www.google.com/search?q=Chase+Strangio',
 'https://www.google.com/search?q=Zhang+Yongzhen',
 'https://www.google.com/search?q=Tourmaline+filmmaker',
 'https://www.google.com/search?q=Waad+al-Kateab',
 'https://www.google.com/search?q=Abubacarr+Tambadou',
 'https://www.google.com/search?q=Gabriela+Cámara',
 'https://www.google.com/search?q=Camilla+Rothe',
 'https://www.google.com/search?q=Rebecca+Gomperts',
 'https://www.google.com/search?q=Ravindra+Gupta',
 'https://www.google.com/search?q=Lauren+Gardner',
 'https://www.g

#### Switching to Selenium
Just a lot simpler than Beautiful Soup. I used Selenium to extract elements from Google's graph search (the little infobox on the right). The tags are more or less consistent (in some cases they were one of two, so I manually picked either according to the errors I got).

The main issue here was who showed up on Google in the first place and just how much information was available about them! I tested the code for a few artists in each list first (putting just one example of that here), did some troubleshooting, and once it was working, looped through the list of URLs. 

In [102]:
driver = webdriver.Chrome()

In [103]:
#Testing with one artist first:
driver.get('https://www.google.com/search?q=Megan+Thee+Stallion')

try:
    names = driver.find_element_by_class_name("SPZz6b").text 
    name = (names.split('\n'))[0]
except: 
    name = 'Null'

    #use ZxoDOe -- if that class doesn't work 
    
try:
    label = driver.find_element_by_class_name("wwUB2c").text
except: 
    label = 'Null'

    # use EGmpye -- if that class doesn't work 
    
try:
    born = driver.find_element_by_class_name("rVusze").text
except: 
    born = 'Null'

try:
    full_desc = driver.find_element_by_class_name("kno-rdesc").text
except: 
    full_desc = 'Null'

meg_info = {'Name': name,
                'Profession': label,
                'Birth Info': born,
                'Description': full_desc}

print (meg_info)

{'Name': 'Megan Thee Stallion', 'Profession': 'American rapper', 'Birth Info': 'Born: February 15, 1995 (age 25 years), Bexar County, TX', 'Description': 'Description\nMegan Jovon Ruth Pete, known professionally as Megan Thee Stallion, is an American rapper, singer, and songwriter. Originally from Houston, Texas, she first garnered attention when videos of her freestyling became popular on social media platforms such as Instagram. Wikipedia'}


In [10]:
import time

full_list2020 = []

for url in urlnames2020:
    driver.get(url)
    
    try:
        names = driver.find_element_by_class_name("SPZz6b").text 
        name = (names.split('\n'))[0]
    except: 
        name = 'Null'

    try:
        label = driver.find_element_by_class_name("wwUB2c").text
    except: 
        label = 'Null'

    try:
        born = driver.find_element_by_class_name("rVusze").text
    except: 
        born = 'Null'

    try:
        full_desc = driver.find_element_by_class_name("kno-rdesc").text
    except: 
        full_desc = 'Null'

    person_info = {'Name': name,
                    'Profession': label,
                    'Birth Info': born,
                    'Description': full_desc}
    
    full_list2020.append(person_info)
    
    time.sleep(1)

print (len(full_list2020))
full_list2020

100


[{'Name': 'Megan Thee Stallion',
  'Profession': 'American rapper',
  'Birth Info': 'Born: February 15, 1995 (age 25 years), Bexar County, TX',
  'Description': 'Description\nMegan Jovon Ruth Pete, known professionally as Megan Thee Stallion, is an American rapper, singer, and songwriter. Originally from Houston, Texas, she first garnered attention when videos of her freestyling became popular on social media platforms such as Instagram. Wikipedia'},
 {'Name': 'Giannis Antetokounmpo',
  'Profession': 'Basketball player',
  'Birth Info': 'Born: December 6, 1994 (age 26 years), Athens, Greece',
  'Description': 'Description\nGiannis Sina Ugo Antetokounmpo is a Greek professional basketball player for the Milwaukee Bucks of the National Basketball Association. Born in Greece to Nigerian parents, Antetokounmpo began playing basketball for the youth teams of Filathlitikos in Athens. Wikipedia'},
 {'Name': 'Ibram X. Kendi',
  'Profession': 'American author',
  'Birth Info': 'Born: 1982 (age 

I added a `time.sleep(1)` function because Google's API kept thinking I was a robot. 

When I had the full list, I made it into my second dataframe:

In [11]:
df20b = pd.DataFrame(full_list2020)
df20b

Unnamed: 0,Name,Profession,Birth Info,Description
0,Megan Thee Stallion,American rapper,"Born: February 15, 1995 (age 25 years), Bexar County, TX","Description\nMegan Jovon Ruth Pete, known professionally as Megan Thee Stallion, is an American rapper, singer, and songwriter. Originally from Houston, Texas, she first garnered attention when videos of her freestyling became popular on social media platforms such as Instagram. Wikipedia"
1,Giannis Antetokounmpo,Basketball player,"Born: December 6, 1994 (age 26 years), Athens, Greece","Description\nGiannis Sina Ugo Antetokounmpo is a Greek professional basketball player for the Milwaukee Bucks of the National Basketball Association. Born in Greece to Nigerian parents, Antetokounmpo began playing basketball for the youth teams of Filathlitikos in Athens. Wikipedia"
2,Ibram X. Kendi,American author,"Born: 1982 (age 38 years), Jamaica, New York, NY","Description\nIbram Xolani Kendi is an American author, professor, anti-racist activist, and historian of race and discriminatory policy in America. In July 2020, he assumed the position of director of the Center for Antiracist Research at Boston University. Kendi was included in Time Magazine's 100 Most Influential People of 2020. Wikipedia"
3,Nathan Law,Political leader,"Born: July 13, 1993 (age 27 years), Shenzhen, China","Description\nNathan Law Kwun-chung is an activist from Hong Kong. As a former student leader, he has been chairman of the Representative Council of the Lingnan University Students' Union, acting president of the LUSU, and secretary-general of the Hong Kong Federation of Students. Wikipedia"
4,Tomi Adeyemi,American novelist,"Born: August 1, 1993 (age 27 years), United States","Description\nTomi Adeyemi is a bestselling Nigerian-American novelist and creative writing coach. She is known for her #1 NY Times bestselling book Children of Blood and Bone, the first in the Legacy of Orïsha ... Wikipedia"
5,Christina Koch,American engineer,"Born: January 29, 1979 (age 41 years), Grand Rapids, MI",Description\nChristina Hammock Koch is an American engineer and NASA astronaut of the class of 2013. She received Bachelor of Science degrees in Electrical Engineering and Physics and a Master of Science in Electrical Engineering from North Carolina State University. Wikipedia
6,Julie K. Brown,American journalist,"Born: 1961 (age 59 years), Philadelphia, PA","Description\nJulie K. Brown is an American investigative journalist with the Miami Herald best known for pursuing the sex trafficking story surrounding Jeffrey Epstein, who in 2008 was allowed to plead guilty to two state-level prostitution offenses. Wikipedia"
7,Cecilia Martinez,Null,"Born: Taos, NM","Description\nCecilia Martinez is the co-founder and Executive Director of the Center for Earth, Energy, and Democracy. In 2020, she was named one of Time Magazine's 100 Most Influential People, nominated by New Jersey Senator Cory Booker for her work in environmental justice and climate policy. Wikipedia"
8,Maya Moore,American basketball player,"Born: June 11, 1989 (age 31 years), Jefferson City, MO","Description\nMaya April Moore is an American professional basketball player for the Minnesota Lynx of the Women's National Basketball Association who is on sabbatical. Naming her their inaugural Performer of the Year in 2017, Sports Illustrated called Moore the greatest winner in the history of women's basketball. Wikipedia"
9,Chase Strangio,American lawyer,Born: Massachusetts,Description\nChase Strangio is an American lawyer and transgender rights activist. He is a staff attorney with the American Civil Liberties Union. Wikipedia


#### More data cleaning and getting the right columns
The Google graph was not always 100% consistent with what data it returned:


Sometimes it would say "_Born: June 26, 1995 (age 25 years), Mumbai, India_" and sometimes it would say "_Born: Mumbai, India, 1995, XYZ Hospital_" and sometimes it would say "_Born: June_."

So in the code above I just asked Google for everything after the word "_Born:_". Now I had to extract the actual birthdate, place, and age using `Regex`

In [12]:
df20b['Birthplace'] = df20b['Birth Info'].str.extract(r"years\), ([\w\W]+)")
df20b['Birthdate'] = df20b['Birth Info'].str.extract (r"Born: ([\w\W]+) \(age")
df20b['Age'] = df20b['Birth Info'].str.extract(r"\(age ([\d]*) years\)")
df20b['Description'] = df20b.Description.str.replace('Description\n', '')
df20b['Description'] = df20b.Description.str.replace('Wikipedia', '')

try:
    df20b['Pronouns'] = df20b['Description'].str.extract(r"\b(she|he|her|his)(?i)\b")
    
except:
    pass


df20b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Megan Thee Stallion,American rapper,"Born: February 15, 1995 (age 25 years), Bexar County, TX","Megan Jovon Ruth Pete, known professionally as Megan Thee Stallion, is an American rapper, singer, and songwriter. Originally from Houston, Texas, she first garnered attention when videos of her freestyling became popular on social media platforms such as Instagram.","Bexar County, TX","February 15, 1995",25.0,she
1,Giannis Antetokounmpo,Basketball player,"Born: December 6, 1994 (age 26 years), Athens, Greece","Giannis Sina Ugo Antetokounmpo is a Greek professional basketball player for the Milwaukee Bucks of the National Basketball Association. Born in Greece to Nigerian parents, Antetokounmpo began playing basketball for the youth teams of Filathlitikos in Athens.","Athens, Greece","December 6, 1994",26.0,
2,Ibram X. Kendi,American author,"Born: 1982 (age 38 years), Jamaica, New York, NY","Ibram Xolani Kendi is an American author, professor, anti-racist activist, and historian of race and discriminatory policy in America. In July 2020, he assumed the position of director of the Center for Antiracist Research at Boston University. Kendi was included in Time Magazine's 100 Most Influential People of 2020.","Jamaica, New York, NY",1982,38.0,he
3,Nathan Law,Political leader,"Born: July 13, 1993 (age 27 years), Shenzhen, China","Nathan Law Kwun-chung is an activist from Hong Kong. As a former student leader, he has been chairman of the Representative Council of the Lingnan University Students' Union, acting president of the LUSU, and secretary-general of the Hong Kong Federation of Students.","Shenzhen, China","July 13, 1993",27.0,he
4,Tomi Adeyemi,American novelist,"Born: August 1, 1993 (age 27 years), United States","Tomi Adeyemi is a bestselling Nigerian-American novelist and creative writing coach. She is known for her #1 NY Times bestselling book Children of Blood and Bone, the first in the Legacy of Orïsha ...",United States,"August 1, 1993",27.0,She
5,Christina Koch,American engineer,"Born: January 29, 1979 (age 41 years), Grand Rapids, MI",Christina Hammock Koch is an American engineer and NASA astronaut of the class of 2013. She received Bachelor of Science degrees in Electrical Engineering and Physics and a Master of Science in Electrical Engineering from North Carolina State University.,"Grand Rapids, MI","January 29, 1979",41.0,She
6,Julie K. Brown,American journalist,"Born: 1961 (age 59 years), Philadelphia, PA","Julie K. Brown is an American investigative journalist with the Miami Herald best known for pursuing the sex trafficking story surrounding Jeffrey Epstein, who in 2008 was allowed to plead guilty to two state-level prostitution offenses.","Philadelphia, PA",1961,59.0,
7,Cecilia Martinez,Null,"Born: Taos, NM","Cecilia Martinez is the co-founder and Executive Director of the Center for Earth, Energy, and Democracy. In 2020, she was named one of Time Magazine's 100 Most Influential People, nominated by New Jersey Senator Cory Booker for her work in environmental justice and climate policy.",,,,she
8,Maya Moore,American basketball player,"Born: June 11, 1989 (age 31 years), Jefferson City, MO","Maya April Moore is an American professional basketball player for the Minnesota Lynx of the Women's National Basketball Association who is on sabbatical. Naming her their inaugural Performer of the Year in 2017, Sports Illustrated called Moore the greatest winner in the history of women's basketball.","Jefferson City, MO","June 11, 1989",31.0,her
9,Chase Strangio,American lawyer,Born: Massachusetts,Chase Strangio is an American lawyer and transgender rights activist. He is a staff attorney with the American Civil Liberties Union.,,,,He


And save that to a csv!

In [13]:
df20b.to_csv("2020_Demographic_Data.csv",index=False)

#### Checking value counts for null and missing values, more manual cleaning:


In [14]:
df20b.Name[df20b.Name == 'Null'].value_counts()

Null    4
Name: Name, dtype: int64

In [15]:
df20b.Birthplace.isna().value_counts()
#df20b[df20b.Birthplace.isna()]

False    76
True     24
Name: Birthplace, dtype: int64

In [16]:
df20b.Birthplace[df20b.Name == 'Lina Attalah'] = df20b.Birthplace[df20b.Name == 'Lina Attalah'].fillna('Egypt')
df20b.Birthplace[df20b.Name == 'Bilkis Dadi'] = df20b.Birthplace[df20b.Name == 'Bilkis Dadi'].fillna('Shaheen Bagh, India')
df20b.Birthplace[df20b.Name == 'Ady Barkan'] = df20b.Birthplace[df20b.Name == 'Ady Barkan'].fillna('New York, NY')
df20b.Birthplace[df20b.Name == 'Sister Norma Pimentel'] = df20b.Birthplace[df20b.Name == 'Sister Norma Pimentel'].fillna('Brownsville, TX')
df20b.Birthplace[df20b.Name == 'Jean-Jacques Muyembe-Tamfum'] = df20b.Birthplace[df20b.Name == 'Jean-Jacques Muyembe-Tamfum'].fillna('Democratic Repuublic of the Congo')
df20b.Birthplace[df20b.Name == 'Michaela Coel'] = df20b.Birthplace[df20b.Name == 'Michaela Coel'].fillna('London, United Kingdom')
df20b.Birthplace[df20b.Name == 'Nemonte Nenquimo'] = df20b.Birthplace[df20b.Name == 'Nemonte Nenquimo'].fillna('Waorani, Ecuador')
df20b.Birthplace[df20b.Name == 'Ravindra Gupta'] = df20b.Birthplace[df20b.Name == 'Ravindra Gupta'].fillna('Durban, South Africa')
df20b.Birthplace[df20b.Name == 'Cecilia Martinez'] = df20b.Birthplace[df20b.Name == 'Cecilia Martinez'].fillna('Taos, MN')
df20b.Birthplace[df20b.Name == 'Chase Strangio'] = df20b.Birthplace[df20b.Name == 'Chase Strangio'].fillna('Massachusetts')
df20b.Birthplace[df20b.Name == 'Gabriela Cámara'] = df20b.Birthplace[df20b.Name == 'Gabriela Cámara'].fillna('Chihuahua, Mexico')
df20b.Birthplace[df20b.Name == 'Abubacarr Tambadou'] = df20b.Birthplace[df20b.Name == 'Abubacarr Tambadou'].fillna('Gambia')
df20b.Birthplace[df20b.Name == 'Lauren Gardner'] = df20b.Birthplace[df20b.Name == 'Lauren Gardner'].fillna('New York, NY')
df20b.Birthplace[df20b.Name == 'Donald Trump'] = df20b.Birthplace[df20b.Name == 'Donald Trump'].fillna('New York, NY')
df20b.Birthplace[df20b.Name == 'A Rapist in Your Path'] = df20b.Birthplace[df20b.Name == 'A Rapist in Your Path'].fillna('Santiago, Chile')



df20b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Megan Thee Stallion,American rapper,"Born: February 15, 1995 (age 25 years), Bexar County, TX","Megan Jovon Ruth Pete, known professionally as Megan Thee Stallion, is an American rapper, singer, and songwriter. Originally from Houston, Texas, she first garnered attention when videos of her freestyling became popular on social media platforms such as Instagram.","Bexar County, TX","February 15, 1995",25.0,she
1,Giannis Antetokounmpo,Basketball player,"Born: December 6, 1994 (age 26 years), Athens, Greece","Giannis Sina Ugo Antetokounmpo is a Greek professional basketball player for the Milwaukee Bucks of the National Basketball Association. Born in Greece to Nigerian parents, Antetokounmpo began playing basketball for the youth teams of Filathlitikos in Athens.","Athens, Greece","December 6, 1994",26.0,
2,Ibram X. Kendi,American author,"Born: 1982 (age 38 years), Jamaica, New York, NY","Ibram Xolani Kendi is an American author, professor, anti-racist activist, and historian of race and discriminatory policy in America. In July 2020, he assumed the position of director of the Center for Antiracist Research at Boston University. Kendi was included in Time Magazine's 100 Most Influential People of 2020.","Jamaica, New York, NY",1982,38.0,he
3,Nathan Law,Political leader,"Born: July 13, 1993 (age 27 years), Shenzhen, China","Nathan Law Kwun-chung is an activist from Hong Kong. As a former student leader, he has been chairman of the Representative Council of the Lingnan University Students' Union, acting president of the LUSU, and secretary-general of the Hong Kong Federation of Students.","Shenzhen, China","July 13, 1993",27.0,he
4,Tomi Adeyemi,American novelist,"Born: August 1, 1993 (age 27 years), United States","Tomi Adeyemi is a bestselling Nigerian-American novelist and creative writing coach. She is known for her #1 NY Times bestselling book Children of Blood and Bone, the first in the Legacy of Orïsha ...",United States,"August 1, 1993",27.0,She
5,Christina Koch,American engineer,"Born: January 29, 1979 (age 41 years), Grand Rapids, MI",Christina Hammock Koch is an American engineer and NASA astronaut of the class of 2013. She received Bachelor of Science degrees in Electrical Engineering and Physics and a Master of Science in Electrical Engineering from North Carolina State University.,"Grand Rapids, MI","January 29, 1979",41.0,She
6,Julie K. Brown,American journalist,"Born: 1961 (age 59 years), Philadelphia, PA","Julie K. Brown is an American investigative journalist with the Miami Herald best known for pursuing the sex trafficking story surrounding Jeffrey Epstein, who in 2008 was allowed to plead guilty to two state-level prostitution offenses.","Philadelphia, PA",1961,59.0,
7,Cecilia Martinez,Null,"Born: Taos, NM","Cecilia Martinez is the co-founder and Executive Director of the Center for Earth, Energy, and Democracy. In 2020, she was named one of Time Magazine's 100 Most Influential People, nominated by New Jersey Senator Cory Booker for her work in environmental justice and climate policy.","Taos, MN",,,she
8,Maya Moore,American basketball player,"Born: June 11, 1989 (age 31 years), Jefferson City, MO","Maya April Moore is an American professional basketball player for the Minnesota Lynx of the Women's National Basketball Association who is on sabbatical. Naming her their inaugural Performer of the Year in 2017, Sports Illustrated called Moore the greatest winner in the history of women's basketball.","Jefferson City, MO","June 11, 1989",31.0,her
9,Chase Strangio,American lawyer,Born: Massachusetts,Chase Strangio is an American lawyer and transgender rights activist. He is a staff attorney with the American Civil Liberties Union.,Massachusetts,,,He


There were still a bunch of missing or null values, particularly in the **Pronouns** column, since that was dependent entirely on whether Google's description was more than one line and required the use of pronouns. I left the blank values in for this + subsequent dataframes, and eventually combed through the final CSV, manually adding gender and missing age/place (if the info was available in other columns)

#### Merging the two dataframes above into one:
And 1-2 more rows of cleaning

In [17]:
time2020 = df20.join(df20b, rsuffix='_right')

time2020

Unnamed: 0,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Megan Thee Stallion,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888165/megan-thee-stallion-pioneer/,Megan Thee Stallion,American rapper,"Born: February 15, 1995 (age 25 years), Bexar County, TX","Megan Jovon Ruth Pete, known professionally as Megan Thee Stallion, is an American rapper, singer, and songwriter. Originally from Houston, Texas, she first garnered attention when videos of her freestyling became popular on social media platforms such as Instagram.","Bexar County, TX","February 15, 1995",25.0,she
1,Giannis Antetokounmpo,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888173/giannis-antetokounmpo/,Giannis Antetokounmpo,Basketball player,"Born: December 6, 1994 (age 26 years), Athens, Greece","Giannis Sina Ugo Antetokounmpo is a Greek professional basketball player for the Milwaukee Bucks of the National Basketball Association. Born in Greece to Nigerian parents, Antetokounmpo began playing basketball for the youth teams of Filathlitikos in Athens.","Athens, Greece","December 6, 1994",26.0,
2,Ibram X. Kendi,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888207/ibram-x-kendi/,Ibram X. Kendi,American author,"Born: 1982 (age 38 years), Jamaica, New York, NY","Ibram Xolani Kendi is an American author, professor, anti-racist activist, and historian of race and discriminatory policy in America. In July 2020, he assumed the position of director of the Center for Antiracist Research at Boston University. Kendi was included in Time Magazine's 100 Most Influential People of 2020.","Jamaica, New York, NY",1982,38.0,he
3,Nathan Law,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888201/nathan-law/,Nathan Law,Political leader,"Born: July 13, 1993 (age 27 years), Shenzhen, China","Nathan Law Kwun-chung is an activist from Hong Kong. As a former student leader, he has been chairman of the Representative Council of the Lingnan University Students' Union, acting president of the LUSU, and secretary-general of the Hong Kong Federation of Students.","Shenzhen, China","July 13, 1993",27.0,he
4,Tomi Adeyemi,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888211/tomi-adeyemi/,Tomi Adeyemi,American novelist,"Born: August 1, 1993 (age 27 years), United States","Tomi Adeyemi is a bestselling Nigerian-American novelist and creative writing coach. She is known for her #1 NY Times bestselling book Children of Blood and Bone, the first in the Legacy of Orïsha ...",United States,"August 1, 1993",27.0,She
5,Christina Koch,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888216/all-women-spacewalk-christina-koch-jessica-meir/,Christina Koch,American engineer,"Born: January 29, 1979 (age 41 years), Grand Rapids, MI",Christina Hammock Koch is an American engineer and NASA astronaut of the class of 2013. She received Bachelor of Science degrees in Electrical Engineering and Physics and a Master of Science in Electrical Engineering from North Carolina State University.,"Grand Rapids, MI","January 29, 1979",41.0,She
6,Julie K. Brown,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888209/julie-k-brown/,Julie K. Brown,American journalist,"Born: 1961 (age 59 years), Philadelphia, PA","Julie K. Brown is an American investigative journalist with the Miami Herald best known for pursuing the sex trafficking story surrounding Jeffrey Epstein, who in 2008 was allowed to plead guilty to two state-level prostitution offenses.","Philadelphia, PA",1961,59.0,
7,Cecilia Martinez,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888195/cecilia-martinez/,Cecilia Martinez,Null,"Born: Taos, NM","Cecilia Martinez is the co-founder and Executive Director of the Center for Earth, Energy, and Democracy. In 2020, she was named one of Time Magazine's 100 Most Influential People, nominated by New Jersey Senator Cory Booker for her work in environmental justice and climate policy.","Taos, MN",,,she
8,Maya Moore,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888213/maya-moore/,Maya Moore,American basketball player,"Born: June 11, 1989 (age 31 years), Jefferson City, MO","Maya April Moore is an American professional basketball player for the Minnesota Lynx of the Women's National Basketball Association who is on sabbatical. Naming her their inaugural Performer of the Year in 2017, Sports Illustrated called Moore the greatest winner in the history of women's basketball.","Jefferson City, MO","June 11, 1989",31.0,her
9,Chase Strangio,PIONEERS,2020,https://time.com/collection/100-most-influential-people-2020/5888158/chase-strangio/,Chase Strangio,American lawyer,Born: Massachusetts,Chase Strangio is an American lawyer and transgender rights activist. He is a staff attorney with the American Civil Liberties Union.,Massachusetts,,,He


In [18]:

time2020.Profession[time2020.Name == 'Donald Trump'] = time2020.Profession[time2020.Name == 'Donald Trump'].replace('Niece: Mary L. Trump Trending', 'June 14, 1946 (age 74 years)')
time2020.Birthdate[time2020.Name == 'Donald Trump'] = time2020.Birthdate[time2020.Name == 'Donald Trump'].fillna('June 14, 1946')
time2020.Age[time2020.Name == 'Donald Trump'] = time2020.Age[time2020.Name == 'Donald Trump'].fillna('74')


In [19]:
time2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Name         100 non-null    object
 1   Category     100 non-null    object
 2   Year         100 non-null    object
 3   Link         100 non-null    object
 4   Name_right   100 non-null    object
 5   Profession   100 non-null    object
 6   Birth Info   100 non-null    object
 7   Description  100 non-null    object
 8   Birthplace   90 non-null     object
 9   Birthdate    83 non-null     object
 10  Age          83 non-null     object
 11  Pronouns     66 non-null     object
dtypes: object(12)
memory usage: 9.5+ KB


#### And ***finally***... saving it as the merged CSV for Time 2020:

In [20]:
time2020.to_csv("Time2020_Merged_Full.csv")

### --------------------

## Time 2019
It's far from over!!!
Now I'm repeating the steps above for the remaining Time lists -- again, because looping through the Time URLs didn't work at all and it was very messy, and it _was_ actually faster to do it this way.

In [21]:
response = requests.get("https://time.com/collection/100-most-influential-people-2019/")
doc2 = BeautifulSoup(response.content, "html.parser")
doc2.prettify()



In [22]:
full2 = doc2.find_all('li')
count = 0
allnames2019 = []
#print (full2)

for each in full2[3:]:
    linkbucket = each.find_all('a')
    link = linkbucket[0]['href']
    names = (each.text)
    name = names.split('By')[0]
    count = count + 1

#Manually figuring out where the labels are changing

    if 0 < count < 21:
        category = 'PIONEERS'
        
    if 21 < count < 39:
        category = 'ARTISTS'
        
    if 39 < count < 65:
        category = 'LEADERS'
        
    if 65 < count < 79:
        category = 'ICONS'
        
    if 79 < count:
        category = 'TITANS'
    
    eachdict = {'Name': name,
                'Category': category,
                'Year': '2019',
                'Link': link}
    #print (eachdict)
    allnames2019.append(eachdict)
    

print (len(allnames2019))
allnames2019

94


[{'Name': 'Sandra Oh',
  'Category': 'PIONEERS',
  'Year': '2019',
  'Link': '/collection/100-most-influential-people-2019/5567697/sandra-oh'},
 {'Name': 'Indya Moore',
  'Category': 'PIONEERS',
  'Year': '2019',
  'Link': '/collection/100-most-influential-people-2019/5567698/indya-moore'},
 {'Name': 'Marlon James',
  'Category': 'PIONEERS',
  'Year': '2019',
  'Link': '/collection/100-most-influential-people-2019/5567700/marlon-james'},
 {'Name': 'Chrissy Teigen',
  'Category': 'PIONEERS',
  'Year': '2019',
  'Link': '/collection/100-most-influential-people-2019/5567708/chrissy-teigen'},
 {'Name': 'Massimo Bottura',
  'Category': 'PIONEERS',
  'Year': '2019',
  'Link': '/collection/100-most-influential-people-2019/5567706/massimo-bottura'},
 {'Name': 'Hasan Minhaj',
  'Category': 'PIONEERS',
  'Year': '2019',
  'Link': '/collection/100-most-influential-people-2019/5567704/hasan-minhaj'},
 {'Name': 'Samin Nosrat',
  'Category': 'PIONEERS',
  'Year': '2019',
  'Link': '/collection/100-m

In [23]:
df19 = pd.DataFrame(allnames2019)

df19

Unnamed: 0,Name,Category,Year,Link
0,Sandra Oh,PIONEERS,2019,/collection/100-most-influential-people-2019/5567697/sandra-oh
1,Indya Moore,PIONEERS,2019,/collection/100-most-influential-people-2019/5567698/indya-moore
2,Marlon James,PIONEERS,2019,/collection/100-most-influential-people-2019/5567700/marlon-james
3,Chrissy Teigen,PIONEERS,2019,/collection/100-most-influential-people-2019/5567708/chrissy-teigen
4,Massimo Bottura,PIONEERS,2019,/collection/100-most-influential-people-2019/5567706/massimo-bottura
5,Hasan Minhaj,PIONEERS,2019,/collection/100-most-influential-people-2019/5567704/hasan-minhaj
6,Samin Nosrat,PIONEERS,2019,/collection/100-most-influential-people-2019/5567705/samin-nosrat
7,Ninja,PIONEERS,2019,/collection/100-most-influential-people-2019/5567713/ninja
8,Arundhati Katju and Menaka Guruswamy,PIONEERS,2019,/collection/100-most-influential-people-2019/5567711/arundhati-katju-menaka-guruswamy
9,Naomi Osaka,PIONEERS,2019,/collection/100-most-influential-people-2019/5567714/naomi-osaka


In [24]:
cleaning2 = [
            {'Name': 'Menaka Guruswamy',
            'Category': 'PIONEERS',
            'Year': '2019',
            'Link': '/web/20190417112245/http://time.com/collection/100-most-influential-people-2019/5567711/arundhati-katju-menaka-guruswamy/'},
             {'Name': 'Ezra Levin',
            'Category': 'PIONEERS',
            'Year': '2019',
            'Link': '/web/20190417112245/http://time.com/collection/100-most-influential-people-2019/5567710/leah-greenberg-ezra-levin/'},
            {'Name': 'James Monsees',
            'Category': 'PIONEERS',
            'Year': '2019',
            'Link':'/web/20190417112245/http://time.com/collection/100-most-influential-people-2019/5567701/adam-bowen-james-monsees/'},
            {'Name': 'Emily Comer',
            'Category': 'PIONEERS',
            'Year': '2019',
            'Link': '/web/20190417112245/http://time.com/collection/100-most-influential-people-2019/5567709/jay-oneal-emily-comer/'},
            {'Name': 'Ailbhe Smyth',
            'Category': 'ICONS',
            'Year': '2019',
            'Link': '/web/20190417112245/http://time.com/collection/100-most-influential-people-2019/5567678/grainne-griffin-ailbhe-smyth-orla-oconnor/'},
             {'Name': "Orla O'Connor",
            'Category': 'ICONS',
            'Year': '2019',
            'Link': '/web/20190417112245/http://time.com/collection/100-most-influential-people-2019/5567678/grainne-griffin-ailbhe-smyth-orla-oconnor/'}
    
            ]

df19 = df19.append(cleaning2, ignore_index=True)

df19

Unnamed: 0,Name,Category,Year,Link
0,Sandra Oh,PIONEERS,2019,/collection/100-most-influential-people-2019/5567697/sandra-oh
1,Indya Moore,PIONEERS,2019,/collection/100-most-influential-people-2019/5567698/indya-moore
2,Marlon James,PIONEERS,2019,/collection/100-most-influential-people-2019/5567700/marlon-james
3,Chrissy Teigen,PIONEERS,2019,/collection/100-most-influential-people-2019/5567708/chrissy-teigen
4,Massimo Bottura,PIONEERS,2019,/collection/100-most-influential-people-2019/5567706/massimo-bottura
5,Hasan Minhaj,PIONEERS,2019,/collection/100-most-influential-people-2019/5567704/hasan-minhaj
6,Samin Nosrat,PIONEERS,2019,/collection/100-most-influential-people-2019/5567705/samin-nosrat
7,Ninja,PIONEERS,2019,/collection/100-most-influential-people-2019/5567713/ninja
8,Arundhati Katju and Menaka Guruswamy,PIONEERS,2019,/collection/100-most-influential-people-2019/5567711/arundhati-katju-menaka-guruswamy
9,Naomi Osaka,PIONEERS,2019,/collection/100-most-influential-people-2019/5567714/naomi-osaka


In [25]:
df19.Name[df19.Name == 'Arundhati Katju and Menaka Guruswamy'] = df19.Name[df19.Name == 'Arundhati Katju and Menaka Guruswamy'].str.replace('Arundhati Katju and Menaka Guruswamy', 'Arundhati Katju')
df19.Name[df19.Name == 'Leah Greenberg and Ezra Levin'] = df19.Name[df19.Name == 'Leah Greenberg and Ezra Levin'].str.replace('Leah Greenberg and Ezra Levin', 'Leah Greenberg')
df19.Name[df19.Name == 'Adam Bowen and James Monsees'] = df19.Name[df19.Name == 'Adam Bowen and James Monsees'].str.replace('Adam Bowen and James Monsees', 'Adam Bowen')
df19.Name[df19.Name == "Grainne Griffin, Ailbhe Smyth and Orla O'Connor"] = df19.Name[df19.Name == "Grainne Griffin, Ailbhe Smyth and Orla O'Connor"].str.replace("Grainne Griffin, Ailbhe Smyth and Orla O'Connor", "Grainne Griffin")

df19 = df19.drop(labels=20)
df19 = df19.reset_index()

In [26]:
df19.to_csv('2019_Times_Scraped_List.csv',sep=',',index=False)
df19

Unnamed: 0,index,Name,Category,Year,Link
0,0,Sandra Oh,PIONEERS,2019,/collection/100-most-influential-people-2019/5567697/sandra-oh
1,1,Indya Moore,PIONEERS,2019,/collection/100-most-influential-people-2019/5567698/indya-moore
2,2,Marlon James,PIONEERS,2019,/collection/100-most-influential-people-2019/5567700/marlon-james
3,3,Chrissy Teigen,PIONEERS,2019,/collection/100-most-influential-people-2019/5567708/chrissy-teigen
4,4,Massimo Bottura,PIONEERS,2019,/collection/100-most-influential-people-2019/5567706/massimo-bottura
5,5,Hasan Minhaj,PIONEERS,2019,/collection/100-most-influential-people-2019/5567704/hasan-minhaj
6,6,Samin Nosrat,PIONEERS,2019,/collection/100-most-influential-people-2019/5567705/samin-nosrat
7,7,Ninja,PIONEERS,2019,/collection/100-most-influential-people-2019/5567713/ninja
8,8,Arundhati Katju,PIONEERS,2019,/collection/100-most-influential-people-2019/5567711/arundhati-katju-menaka-guruswamy
9,9,Naomi Osaka,PIONEERS,2019,/collection/100-most-influential-people-2019/5567714/naomi-osaka


Doing the same thing with the URLs for Google graph search!

In [27]:
urlnames2019 = []
for eachname in df19.Name:
        eachname = eachname.replace(" ","+")
        urlnames2019.append(f"https://www.google.com/search?q={eachname}")
        
urlnames2019

['https://www.google.com/search?q=Sandra+Oh',
 'https://www.google.com/search?q=Indya+Moore',
 'https://www.google.com/search?q=Marlon+James',
 'https://www.google.com/search?q=Chrissy+Teigen',
 'https://www.google.com/search?q=Massimo+Bottura',
 'https://www.google.com/search?q=Hasan+Minhaj',
 'https://www.google.com/search?q=Samin+Nosrat',
 'https://www.google.com/search?q=Ninja',
 'https://www.google.com/search?q=Arundhati+Katju',
 'https://www.google.com/search?q=Naomi+Osaka',
 'https://www.google.com/search?q=Leah+Greenberg',
 'https://www.google.com/search?q=Fred+Swaniker',
 'https://www.google.com/search?q=Lynn+Nottage',
 'https://www.google.com/search?q=Tara+Westover',
 'https://www.google.com/search?q=Adam+Bowen',
 'https://www.google.com/search?q=Barbara+Rae-Venter',
 'https://www.google.com/search?q=He+Jiankui',
 'https://www.google.com/search?q=Aileen+Lee',
 "https://www.google.com/search?q=Jay+O'Neal+and+Emily+Comer",
 'https://www.google.com/search?q=Shep+Doeleman',
 'htt

In [28]:
import time

full_list2019 = []

for url in urlnames2019:
    driver.get(url)
    
    try:
        names = driver.find_element_by_class_name("ZxoDOe").text 
        name = (names.split('\n'))[0]
    except: 
        name = 'Null'

    try:
        label = driver.find_element_by_class_name("EGmpye").text
    except: 
        label = 'Null'

    try:
        born = driver.find_element_by_class_name("rVusze").text
    except: 
        born = 'Null'

    try:
        full_desc = driver.find_element_by_class_name("kno-rdesc").text
    except: 
        full_desc = 'Null'

    person_info = {'Name': name,
                    'Profession': label,
                    'Birth Info': born,
                    'Description': full_desc}
    
    full_list2019.append(person_info)
    
    time.sleep(1)

print (len(full_list2019))
full_list2019

99


[{'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: July 20, 1971 (age 49 years), Nepean, Ottawa, Canada',
  'Description': "Description\nSandra Miju Oh is a Canadian-American actress. She is best known for her starring roles as Cristina Yang on the ABC medical drama series Grey's Anatomy and Eve Polastri in the spy thriller series Killing Eve. Wikipedia"},
 {'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: January 17, 1995 (age 25 years), The Bronx, New York, NY',
  'Description': 'Description\nIndya Adrianna Moore is an American actor and model. They are known for playing the role of Angel Evangelista in the FX television series Pose. Time magazine named the actor one of the 100 most influential people in the world in 2019. Moore is transgender and non-binary, and uses they/them pronouns. Wikipedia'},
 {'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: November 24, 1970 (age 50 years), Kingston, Jamaica',
  'Description': "Description\nMarlon J

In [29]:
df19b = pd.DataFrame(full_list2019)
df19b

Unnamed: 0,Name,Profession,Birth Info,Description
0,Null,Null,"Born: July 20, 1971 (age 49 years), Nepean, Ottawa, Canada",Description\nSandra Miju Oh is a Canadian-American actress. She is best known for her starring roles as Cristina Yang on the ABC medical drama series Grey's Anatomy and Eve Polastri in the spy thriller series Killing Eve. Wikipedia
1,Null,Null,"Born: January 17, 1995 (age 25 years), The Bronx, New York, NY","Description\nIndya Adrianna Moore is an American actor and model. They are known for playing the role of Angel Evangelista in the FX television series Pose. Time magazine named the actor one of the 100 most influential people in the world in 2019. Moore is transgender and non-binary, and uses they/them pronouns. Wikipedia"
2,Null,Null,"Born: November 24, 1970 (age 50 years), Kingston, Jamaica","Description\nMarlon James is a Jamaican writer. He has written four novels: John Crow's Devil, The Book of Night Women, A Brief History of Seven Killings, winner of the 2015 Man Booker Prize, and Black Leopard, Red Wolf. Wikipedia"
3,Null,Null,"Born: November 30, 1985 (age 35 years), Delta, UT","Description\nChristine Diane Teigen is an American model, television personality, author, and entrepreneur. She made her professional modeling debut in the annual Sports Illustrated Swimsuit Issue in 2010 and later appeared on the 50th anniversary cover alongside Nina Agdal and Lily Aldridge in 2014. Wikipedia"
4,Null,Null,"Born: September 30, 1962 (age 58 years), Modena, Italy","Description\nMassimo Bottura is an Italian restaurateur and the chef patron of Osteria Francescana, a three-Michelin-star restaurant based in Modena, Italy which has been listed in the top 5 at The World's 50 Best Restaurants Awards since 2010 and received top ratings from L'Espresso, Gambero Rosso and the Touring Club guides. Wikipedia"
5,Null,Null,"Born: September 23, 1985 (age 35 years), Davis, CA","Description\nHasan Minhaj is an American comedian, writer, producer, political commentator, actor, and television host. Best known for his Netflix show Patriot Act with Hasan Minhaj, he has won two Peabody Awards and two Webby Awards. Wikipedia"
6,Null,Null,"Born: November 7, 1979 (age 41 years), San Diego, CA","Description\nSamin Nosrat is an American chef, TV host and food writer. She is a regular food columnist for The New York Times Magazine and has a Netflix docu-series based on her cookbook, Salt Fat Acid Heat. Wikipedia"
7,Null,Null,"Born: June 5, 1991 (age 29 years), Detroit, MI","Description\nRichard Tyler Blevins, better known by his online alias Ninja, is an American streamer, YouTuber, professional gamer, Internet personality, author, and actor. Wikipedia"
8,Null,Null,"Born: August 19, 1982 (age 38 years), Prayagraj, India","Description\nArundhati Katju is a lawyer qualified to practice in India and New York. She has litigated many notable cases at the Supreme Court of India and the Delhi High Court, including the Section 377 case, the ... Wikipedia"
9,Null,Null,"Born: October 16, 1997 (age 23 years), Chuo Ward, Osaka, Japan","Description\nNaomi Osaka is a Japanese professional tennis player. Osaka has been ranked No. 1 by the Women's Tennis Association, and is the first Asian player to hold the top ranking in singles. She is a three-time Grand Slam singles champion, and is the reigning champion at the US Open. Wikipedia"


Extracting columns using `regex`

In [30]:
df19b['Birthplace'] = df19b['Birth Info'].str.extract(r"years\), ([\w\W]+)")
df19b['Birthdate'] = df19b['Birth Info'].str.extract (r"Born: ([\w\W]+) \(age")
df19b['Age'] = df19b['Birth Info'].str.extract(r"\(age ([\d]*) years\)")
df19b['Description'] = df19b.Description.str.replace('Description\n', '')
df19b['Description'] = df19b.Description.str.replace('Wikipedia', '')

try:
    df19b['Pronouns'] = df19b['Description'].str.extract(r"\b(she|he|her|his)(?i)\b")
    
except:
    pass


df19b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Null,Null,"Born: July 20, 1971 (age 49 years), Nepean, Ottawa, Canada",Sandra Miju Oh is a Canadian-American actress. She is best known for her starring roles as Cristina Yang on the ABC medical drama series Grey's Anatomy and Eve Polastri in the spy thriller series Killing Eve.,"Nepean, Ottawa, Canada","July 20, 1971",49.0,She
1,Null,Null,"Born: January 17, 1995 (age 25 years), The Bronx, New York, NY","Indya Adrianna Moore is an American actor and model. They are known for playing the role of Angel Evangelista in the FX television series Pose. Time magazine named the actor one of the 100 most influential people in the world in 2019. Moore is transgender and non-binary, and uses they/them pronouns.","The Bronx, New York, NY","January 17, 1995",25.0,
2,Null,Null,"Born: November 24, 1970 (age 50 years), Kingston, Jamaica","Marlon James is a Jamaican writer. He has written four novels: John Crow's Devil, The Book of Night Women, A Brief History of Seven Killings, winner of the 2015 Man Booker Prize, and Black Leopard, Red Wolf.","Kingston, Jamaica","November 24, 1970",50.0,He
3,Null,Null,"Born: November 30, 1985 (age 35 years), Delta, UT","Christine Diane Teigen is an American model, television personality, author, and entrepreneur. She made her professional modeling debut in the annual Sports Illustrated Swimsuit Issue in 2010 and later appeared on the 50th anniversary cover alongside Nina Agdal and Lily Aldridge in 2014.","Delta, UT","November 30, 1985",35.0,She
4,Null,Null,"Born: September 30, 1962 (age 58 years), Modena, Italy","Massimo Bottura is an Italian restaurateur and the chef patron of Osteria Francescana, a three-Michelin-star restaurant based in Modena, Italy which has been listed in the top 5 at The World's 50 Best Restaurants Awards since 2010 and received top ratings from L'Espresso, Gambero Rosso and the Touring Club guides.","Modena, Italy","September 30, 1962",58.0,
5,Null,Null,"Born: September 23, 1985 (age 35 years), Davis, CA","Hasan Minhaj is an American comedian, writer, producer, political commentator, actor, and television host. Best known for his Netflix show Patriot Act with Hasan Minhaj, he has won two Peabody Awards and two Webby Awards.","Davis, CA","September 23, 1985",35.0,his
6,Null,Null,"Born: November 7, 1979 (age 41 years), San Diego, CA","Samin Nosrat is an American chef, TV host and food writer. She is a regular food columnist for The New York Times Magazine and has a Netflix docu-series based on her cookbook, Salt Fat Acid Heat.","San Diego, CA","November 7, 1979",41.0,She
7,Null,Null,"Born: June 5, 1991 (age 29 years), Detroit, MI","Richard Tyler Blevins, better known by his online alias Ninja, is an American streamer, YouTuber, professional gamer, Internet personality, author, and actor.","Detroit, MI","June 5, 1991",29.0,his
8,Null,Null,"Born: August 19, 1982 (age 38 years), Prayagraj, India","Arundhati Katju is a lawyer qualified to practice in India and New York. She has litigated many notable cases at the Supreme Court of India and the Delhi High Court, including the Section 377 case, the ...","Prayagraj, India","August 19, 1982",38.0,She
9,Null,Null,"Born: October 16, 1997 (age 23 years), Chuo Ward, Osaka, Japan","Naomi Osaka is a Japanese professional tennis player. Osaka has been ranked No. 1 by the Women's Tennis Association, and is the first Asian player to hold the top ranking in singles. She is a three-time Grand Slam singles champion, and is the reigning champion at the US Open.","Chuo Ward, Osaka, Japan","October 16, 1997",23.0,She


In [31]:
df19b.to_csv("2019_Demographic_Data.csv",index=False)

Manually cleaning some stuff after checking nulls (not pasting the code for the checks here)

In [32]:
df19b.Birthplace[df19b.Name == 'James Monsees'] = df19b.Birthplace[df19b.Name == 'James Monsees'].fillna('St. Louis, MO')
df19b.Birthplace[df19b.Name == 'Mukesh Ambani'] = df19b.Birthplace[df19b.Name == 'Mukesh Ambani'].fillna('Mumbai, India')
df19b.Birthplace[df19b.Name == 'Radhya Al-Mutawakel'] = df19b.Birthplace[df19b.Name == 'Radhya Al-Mutawakel'].fillna('Sanaa, Yemen')
df19b.Birthplace[df19b.Name == 'Christine Blasey Ford'] = df19b.Birthplace[df19b.Name == 'Christine Blasey Ford'].fillna('Palo Alto, CA')
df19b.Birthplace[df19b.Name == 'Luchita Hurtado'] = df19b.Birthplace[df19b.Name == 'Luchita Hurtado'].fillna('Caracas, Venezuela')
df19b.Birthplace[df19b.Name == 'Fixer Upper'] = df19b.Birthplace[df19b.Name == 'Fixer Upper'].fillna('Waco, TX')
df19b.Birthplace[df19b.Name == 'BTS'] = df19b.Birthplace[df19b.Name == 'BTS'].fillna('Seoul, South Korea')
df19b.Birthplace[df19b.Name == 'Adam Bowen'] = df19b.Birthplace[df19b.Name == 'Adam Bowen'].fillna('Washington, DC')


df19b


Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Null,Null,"Born: July 20, 1971 (age 49 years), Nepean, Ottawa, Canada",Sandra Miju Oh is a Canadian-American actress. She is best known for her starring roles as Cristina Yang on the ABC medical drama series Grey's Anatomy and Eve Polastri in the spy thriller series Killing Eve.,"Nepean, Ottawa, Canada","July 20, 1971",49.0,She
1,Null,Null,"Born: January 17, 1995 (age 25 years), The Bronx, New York, NY","Indya Adrianna Moore is an American actor and model. They are known for playing the role of Angel Evangelista in the FX television series Pose. Time magazine named the actor one of the 100 most influential people in the world in 2019. Moore is transgender and non-binary, and uses they/them pronouns.","The Bronx, New York, NY","January 17, 1995",25.0,
2,Null,Null,"Born: November 24, 1970 (age 50 years), Kingston, Jamaica","Marlon James is a Jamaican writer. He has written four novels: John Crow's Devil, The Book of Night Women, A Brief History of Seven Killings, winner of the 2015 Man Booker Prize, and Black Leopard, Red Wolf.","Kingston, Jamaica","November 24, 1970",50.0,He
3,Null,Null,"Born: November 30, 1985 (age 35 years), Delta, UT","Christine Diane Teigen is an American model, television personality, author, and entrepreneur. She made her professional modeling debut in the annual Sports Illustrated Swimsuit Issue in 2010 and later appeared on the 50th anniversary cover alongside Nina Agdal and Lily Aldridge in 2014.","Delta, UT","November 30, 1985",35.0,She
4,Null,Null,"Born: September 30, 1962 (age 58 years), Modena, Italy","Massimo Bottura is an Italian restaurateur and the chef patron of Osteria Francescana, a three-Michelin-star restaurant based in Modena, Italy which has been listed in the top 5 at The World's 50 Best Restaurants Awards since 2010 and received top ratings from L'Espresso, Gambero Rosso and the Touring Club guides.","Modena, Italy","September 30, 1962",58.0,
5,Null,Null,"Born: September 23, 1985 (age 35 years), Davis, CA","Hasan Minhaj is an American comedian, writer, producer, political commentator, actor, and television host. Best known for his Netflix show Patriot Act with Hasan Minhaj, he has won two Peabody Awards and two Webby Awards.","Davis, CA","September 23, 1985",35.0,his
6,Null,Null,"Born: November 7, 1979 (age 41 years), San Diego, CA","Samin Nosrat is an American chef, TV host and food writer. She is a regular food columnist for The New York Times Magazine and has a Netflix docu-series based on her cookbook, Salt Fat Acid Heat.","San Diego, CA","November 7, 1979",41.0,She
7,Null,Null,"Born: June 5, 1991 (age 29 years), Detroit, MI","Richard Tyler Blevins, better known by his online alias Ninja, is an American streamer, YouTuber, professional gamer, Internet personality, author, and actor.","Detroit, MI","June 5, 1991",29.0,his
8,Null,Null,"Born: August 19, 1982 (age 38 years), Prayagraj, India","Arundhati Katju is a lawyer qualified to practice in India and New York. She has litigated many notable cases at the Supreme Court of India and the Delhi High Court, including the Section 377 case, the ...","Prayagraj, India","August 19, 1982",38.0,She
9,Null,Null,"Born: October 16, 1997 (age 23 years), Chuo Ward, Osaka, Japan","Naomi Osaka is a Japanese professional tennis player. Osaka has been ranked No. 1 by the Women's Tennis Association, and is the first Asian player to hold the top ranking in singles. She is a three-time Grand Slam singles champion, and is the reigning champion at the US Open.","Chuo Ward, Osaka, Japan","October 16, 1997",23.0,She


Merging and saving to CSV

In [34]:
time2019 = df19.join(df19b, rsuffix='_right')

time2019

Unnamed: 0,index,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,0,Sandra Oh,PIONEERS,2019,/collection/100-most-influential-people-2019/5567697/sandra-oh,Null,Null,"Born: July 20, 1971 (age 49 years), Nepean, Ottawa, Canada",Sandra Miju Oh is a Canadian-American actress. She is best known for her starring roles as Cristina Yang on the ABC medical drama series Grey's Anatomy and Eve Polastri in the spy thriller series Killing Eve.,"Nepean, Ottawa, Canada","July 20, 1971",49.0,She
1,1,Indya Moore,PIONEERS,2019,/collection/100-most-influential-people-2019/5567698/indya-moore,Null,Null,"Born: January 17, 1995 (age 25 years), The Bronx, New York, NY","Indya Adrianna Moore is an American actor and model. They are known for playing the role of Angel Evangelista in the FX television series Pose. Time magazine named the actor one of the 100 most influential people in the world in 2019. Moore is transgender and non-binary, and uses they/them pronouns.","The Bronx, New York, NY","January 17, 1995",25.0,
2,2,Marlon James,PIONEERS,2019,/collection/100-most-influential-people-2019/5567700/marlon-james,Null,Null,"Born: November 24, 1970 (age 50 years), Kingston, Jamaica","Marlon James is a Jamaican writer. He has written four novels: John Crow's Devil, The Book of Night Women, A Brief History of Seven Killings, winner of the 2015 Man Booker Prize, and Black Leopard, Red Wolf.","Kingston, Jamaica","November 24, 1970",50.0,He
3,3,Chrissy Teigen,PIONEERS,2019,/collection/100-most-influential-people-2019/5567708/chrissy-teigen,Null,Null,"Born: November 30, 1985 (age 35 years), Delta, UT","Christine Diane Teigen is an American model, television personality, author, and entrepreneur. She made her professional modeling debut in the annual Sports Illustrated Swimsuit Issue in 2010 and later appeared on the 50th anniversary cover alongside Nina Agdal and Lily Aldridge in 2014.","Delta, UT","November 30, 1985",35.0,She
4,4,Massimo Bottura,PIONEERS,2019,/collection/100-most-influential-people-2019/5567706/massimo-bottura,Null,Null,"Born: September 30, 1962 (age 58 years), Modena, Italy","Massimo Bottura is an Italian restaurateur and the chef patron of Osteria Francescana, a three-Michelin-star restaurant based in Modena, Italy which has been listed in the top 5 at The World's 50 Best Restaurants Awards since 2010 and received top ratings from L'Espresso, Gambero Rosso and the Touring Club guides.","Modena, Italy","September 30, 1962",58.0,
5,5,Hasan Minhaj,PIONEERS,2019,/collection/100-most-influential-people-2019/5567704/hasan-minhaj,Null,Null,"Born: September 23, 1985 (age 35 years), Davis, CA","Hasan Minhaj is an American comedian, writer, producer, political commentator, actor, and television host. Best known for his Netflix show Patriot Act with Hasan Minhaj, he has won two Peabody Awards and two Webby Awards.","Davis, CA","September 23, 1985",35.0,his
6,6,Samin Nosrat,PIONEERS,2019,/collection/100-most-influential-people-2019/5567705/samin-nosrat,Null,Null,"Born: November 7, 1979 (age 41 years), San Diego, CA","Samin Nosrat is an American chef, TV host and food writer. She is a regular food columnist for The New York Times Magazine and has a Netflix docu-series based on her cookbook, Salt Fat Acid Heat.","San Diego, CA","November 7, 1979",41.0,She
7,7,Ninja,PIONEERS,2019,/collection/100-most-influential-people-2019/5567713/ninja,Null,Null,"Born: June 5, 1991 (age 29 years), Detroit, MI","Richard Tyler Blevins, better known by his online alias Ninja, is an American streamer, YouTuber, professional gamer, Internet personality, author, and actor.","Detroit, MI","June 5, 1991",29.0,his
8,8,Arundhati Katju,PIONEERS,2019,/collection/100-most-influential-people-2019/5567711/arundhati-katju-menaka-guruswamy,Null,Null,"Born: August 19, 1982 (age 38 years), Prayagraj, India","Arundhati Katju is a lawyer qualified to practice in India and New York. She has litigated many notable cases at the Supreme Court of India and the Delhi High Court, including the Section 377 case, the ...","Prayagraj, India","August 19, 1982",38.0,She
9,9,Naomi Osaka,PIONEERS,2019,/collection/100-most-influential-people-2019/5567714/naomi-osaka,Null,Null,"Born: October 16, 1997 (age 23 years), Chuo Ward, Osaka, Japan","Naomi Osaka is a Japanese professional tennis player. Osaka has been ranked No. 1 by the Women's Tennis Association, and is the first Asian player to hold the top ranking in singles. She is a three-time Grand Slam singles champion, and is the reigning champion at the US Open.","Chuo Ward, Osaka, Japan","October 16, 1997",23.0,She


In [35]:
time2019.to_csv("Time2019_Full_merged.csv")

## Time 2017


In [36]:
response = requests.get("https://time.com/collection/2017-time-100/")
doc3 = BeautifulSoup(response.content, "html.parser")
doc3.prettify()



In [37]:
full3 = doc3.find_all('a')

allnames2017 = []
count = 0

for each in full3[8:103]:
    name = (each.text)
    links = (each['href'])
    link = "https://time.com/" + links
    count = count + 1
    #print (count, name, link)

    
    if 0 < count < 21:
        category = 'PIONEERS'
        
    if 20 < count < 38:
        category = 'ARTISTS'
  
    if 37 < count < 62:
         category = 'LEADERS'

   
    if 61 < count < 76:
        category = 'TITANS'
        
    if 75 < count:
         category = 'ICONS'
    
    eachdict = {'Name': name, 
                 'Category': category,
                 'Year': '2017',
                 'Link': link}
    
    allnames2017.append(eachdict)

allnames2017
    

[{'Name': 'Samantha Bee',
  'Category': 'PIONEERS',
  'Year': '2017',
  'Link': 'https://time.com//collection/2017-time-100/4742710/samantha-bee/'},
 {'Name': 'Chance the Rapper',
  'Category': 'PIONEERS',
  'Year': '2017',
  'Link': 'https://time.com//collection/2017-time-100/4742682/chance-the-rapper/'},
 {'Name': 'Constance Wu',
  'Category': 'PIONEERS',
  'Year': '2017',
  'Link': 'https://time.com//collection/2017-time-100/4742685/constance-wu/'},
 {'Name': 'Gavin Grimm',
  'Category': 'PIONEERS',
  'Year': '2017',
  'Link': 'https://time.com//collection/2017-time-100/4742687/gavin-grimm/'},
 {'Name': 'Kirsten Green',
  'Category': 'PIONEERS',
  'Year': '2017',
  'Link': 'https://time.com//collection/2017-time-100/4742703/kirsten-green/'},
 {'Name': 'Bob Ferguson',
  'Category': 'PIONEERS',
  'Year': '2017',
  'Link': 'https://time.com//collection/2017-time-100/4742678/bob-ferguson/'},
 {'Name': 'Ivanka Trump',
  'Category': 'PIONEERS',
  'Year': '2017',
  'Link': 'https://time.co

In [38]:
len(allnames2017)

95

In [39]:
df17 = pd.DataFrame(allnames2017)

df17

Unnamed: 0,Name,Category,Year,Link
0,Samantha Bee,PIONEERS,2017,https://time.com//collection/2017-time-100/4742710/samantha-bee/
1,Chance the Rapper,PIONEERS,2017,https://time.com//collection/2017-time-100/4742682/chance-the-rapper/
2,Constance Wu,PIONEERS,2017,https://time.com//collection/2017-time-100/4742685/constance-wu/
3,Gavin Grimm,PIONEERS,2017,https://time.com//collection/2017-time-100/4742687/gavin-grimm/
4,Kirsten Green,PIONEERS,2017,https://time.com//collection/2017-time-100/4742703/kirsten-green/
5,Bob Ferguson,PIONEERS,2017,https://time.com//collection/2017-time-100/4742678/bob-ferguson/
6,Ivanka Trump,PIONEERS,2017,https://time.com//collection/2017-time-100/4742699/ivanka-trump/
7,Demis Hassabis,PIONEERS,2017,https://time.com//collection/2017-time-100/4742686/demis-hassabis/
8,Barbara Lynch,PIONEERS,2017,https://time.com//collection/2017-time-100/4742669/barbara-lynch/
9,Hamdi Ulukaya,PIONEERS,2017,https://time.com//collection/2017-time-100/4742698/hamdi-ulukaya/


In [40]:
df17.to_csv('2017_Times_Scraped_List.csv',sep=',',index=False)


In [41]:
urlnames2017 = []
for eachname in df17.Name:
        eachname = eachname.replace(" ","+")
        urlnames2017.append(f"https://www.google.com/search?q={eachname}")
        
urlnames2017

driver = webdriver.Chrome()

In [42]:
import time

full_list2017 = []

for url in urlnames2017:
    driver.get(url)
    
    try:
        names = driver.find_element_by_class_name("ZxoDOe").text 
        name = (names.split('\n'))[0]
        
    except: 
        name = 'Null'

    try:
        label = driver.find_element_by_class_name("EGmpye").text
    except: 
        label = 'Null'

    try:
        born = driver.find_element_by_class_name("rVusze").text
    except: 
        born = 'Null'

    try:
        full_desc = driver.find_element_by_class_name("kno-rdesc").text
    except: 
        full_desc = 'Null'

    person_info = {'Name': name,
                    'Profession': label,
                    'Birth Info': born,
                    'Description': full_desc}
    
    full_list2017.append(person_info)
    
    time.sleep(1)

print (len(full_list2017))
full_list2017

95


[{'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: October 25, 1969 (age 51 years), Toronto, Canada',
  'Description': 'Description\nSamantha Anne Bee is a Canadian-American comedian, writer, producer, political commentator, actress, and television host. Bee rose to fame as a correspondent on The Daily Show with Jon Stewart, where she became the longest-serving regular correspondent. Wikipedia'},
 {'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: April 16, 1993 (age 27 years), Chicago, IL',
  'Description': 'Description\nChancelor Johnathan Bennett, known professionally as Chance the Rapper, is an American rapper, singer, songwriter, record producer, activist, actor, and philanthropist. Born in Chicago, Illinois, Chance the Rapper released his debut mixtape 10 Day in 2012. Wikipedia'},
 {'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: March 22, 1982 (age 38 years), Richmond, VA',
  'Description': "Description\nConstance Wu is an American actre

In [43]:
df17b = pd.DataFrame(full_list2017)
df17b

Unnamed: 0,Name,Profession,Birth Info,Description
0,Null,Null,"Born: October 25, 1969 (age 51 years), Toronto, Canada","Description\nSamantha Anne Bee is a Canadian-American comedian, writer, producer, political commentator, actress, and television host. Bee rose to fame as a correspondent on The Daily Show with Jon Stewart, where she became the longest-serving regular correspondent. Wikipedia"
1,Null,Null,"Born: April 16, 1993 (age 27 years), Chicago, IL","Description\nChancelor Johnathan Bennett, known professionally as Chance the Rapper, is an American rapper, singer, songwriter, record producer, activist, actor, and philanthropist. Born in Chicago, Illinois, Chance the Rapper released his debut mixtape 10 Day in 2012. Wikipedia"
2,Null,Null,"Born: March 22, 1982 (age 38 years), Richmond, VA","Description\nConstance Wu is an American actress. She stars as Jessica Huang in the ABC television comedy Fresh Off the Boat, which is her breakthrough role. She has been nominated for two TCA Awards and four Critics' Choice Television Awards for Fresh Off the Boat. Wikipedia"
3,Null,Null,Null,Null
4,Null,Null,"Education: University of California, Los Angeles","Description\nKirsten Green is an American venture capitalist, the founder and managing partner of Forerunner Ventures. Wikipedia"
5,Null,Null,"Born: February 23, 1965 (age 55 years), Queen Anne, Seattle, WA","Description\nRobert Watson Ferguson is an American lawyer and politician serving as the 18th Attorney General of Washington. A Democrat, he was elected in 2012 and re-elected in 2016. Prior to serving as Attorney General, Ferguson was a member of the King County Council. Wikipedia"
6,Null,Null,"Born: October 30, 1981 (age 39 years), Manhattan, New York, NY","Description\nIvana Marie ""Ivanka"" Trump is an American businesswoman, serving since 2017 as Advisor to the President, her father Donald Trump, and the Director of the Office of Economic Initiatives and Entrepreneurship. Wikipedia"
7,Null,Null,"Born: July 27, 1976 (age 44 years), London, United Kingdom","Description\nDr Demis HassabisCBE FRS FREng FRSA is a British artificial intelligence researcher, neuroscientist, video game designer, entrepreneur, and five times winner of the Pentamind board games championship. He is the chief executive officer and co-founder of DeepMind, and a UK Government AI Advisor since 2018. Wikipedia"
8,Null,Null,"Born: March 19, 1964 (age 56 years), Boston, MA","Description\nBarbara Lynch is a restaurateur. In 2017 she was included in Time magazine's ""Top 100 Most Influential People of the Year"" for her pioneering contributions in the culinary world and her focus on local wealth creation through agronomy. Wikipedia"
9,Null,Null,"Born: October 26, 1972 (age 48 years), Erzincan, Turkey","Description\nHamdi Ulukaya is a Turkish billionaire, philantropist and activist of Kurdish ancestry based in the United States. Ulukaya is the owner, founder, chairman, and CEO of Chobani, the #1-selling strained yogurt brand in the US. He established production facilities first in upstate New York, and since then has expanded. Wikipedia"


In [44]:
df17b['Birthplace'] = df17b['Birth Info'].str.extract(r"\), ([\w\W]+)")
df17b['Birthdate'] = df17b['Birth Info'].str.extract (r"Born: ([\w\s\d\,]+) [\(age]?")
df17b['Age'] = df17b['Birth Info'].str.extract(r"\(age ([\d]*) years\)")
df17b['Description'] = df17b.Description.str.replace('Description\n', '')
df17b['Description'] = df17b.Description.str.replace('Wikipedia', '')

try:
    df17b['Pronouns'] = df17b['Description'].str.extract(r"\b(she|he|her|his)(?i)\b")
    
except:
    pass


df17b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Null,Null,"Born: October 25, 1969 (age 51 years), Toronto, Canada","Samantha Anne Bee is a Canadian-American comedian, writer, producer, political commentator, actress, and television host. Bee rose to fame as a correspondent on The Daily Show with Jon Stewart, where she became the longest-serving regular correspondent.","Toronto, Canada","October 25, 1969",51.0,she
1,Null,Null,"Born: April 16, 1993 (age 27 years), Chicago, IL","Chancelor Johnathan Bennett, known professionally as Chance the Rapper, is an American rapper, singer, songwriter, record producer, activist, actor, and philanthropist. Born in Chicago, Illinois, Chance the Rapper released his debut mixtape 10 Day in 2012.","Chicago, IL","April 16, 1993",27.0,his
2,Null,Null,"Born: March 22, 1982 (age 38 years), Richmond, VA","Constance Wu is an American actress. She stars as Jessica Huang in the ABC television comedy Fresh Off the Boat, which is her breakthrough role. She has been nominated for two TCA Awards and four Critics' Choice Television Awards for Fresh Off the Boat.","Richmond, VA","March 22, 1982",38.0,She
3,Null,Null,Null,Null,,,,
4,Null,Null,"Education: University of California, Los Angeles","Kirsten Green is an American venture capitalist, the founder and managing partner of Forerunner Ventures.",,,,
5,Null,Null,"Born: February 23, 1965 (age 55 years), Queen Anne, Seattle, WA","Robert Watson Ferguson is an American lawyer and politician serving as the 18th Attorney General of Washington. A Democrat, he was elected in 2012 and re-elected in 2016. Prior to serving as Attorney General, Ferguson was a member of the King County Council.","Queen Anne, Seattle, WA","February 23, 1965",55.0,he
6,Null,Null,"Born: October 30, 1981 (age 39 years), Manhattan, New York, NY","Ivana Marie ""Ivanka"" Trump is an American businesswoman, serving since 2017 as Advisor to the President, her father Donald Trump, and the Director of the Office of Economic Initiatives and Entrepreneurship.","Manhattan, New York, NY","October 30, 1981",39.0,her
7,Null,Null,"Born: July 27, 1976 (age 44 years), London, United Kingdom","Dr Demis HassabisCBE FRS FREng FRSA is a British artificial intelligence researcher, neuroscientist, video game designer, entrepreneur, and five times winner of the Pentamind board games championship. He is the chief executive officer and co-founder of DeepMind, and a UK Government AI Advisor since 2018.","London, United Kingdom","July 27, 1976",44.0,He
8,Null,Null,"Born: March 19, 1964 (age 56 years), Boston, MA","Barbara Lynch is a restaurateur. In 2017 she was included in Time magazine's ""Top 100 Most Influential People of the Year"" for her pioneering contributions in the culinary world and her focus on local wealth creation through agronomy.","Boston, MA","March 19, 1964",56.0,she
9,Null,Null,"Born: October 26, 1972 (age 48 years), Erzincan, Turkey","Hamdi Ulukaya is a Turkish billionaire, philantropist and activist of Kurdish ancestry based in the United States. Ulukaya is the owner, founder, chairman, and CEO of Chobani, the #1-selling strained yogurt brand in the US. He established production facilities first in upstate New York, and since then has expanded.","Erzincan, Turkey","October 26, 1972",48.0,He


In [62]:
df17b[df17b.Name == 'Null']
#Now I know which ones are null so I will manually change the .csv in excel

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Null,Null,"Born: October 25, 1969 (age 51 years), Toronto, Canada","Samantha Anne Bee is a Canadian-American comedian, writer, producer, political commentator, actress, and television host. Bee rose to fame as a correspondent on The Daily Show with Jon Stewart, where she became the longest-serving regular correspondent.","Toronto, Canada","October 25, 1969",51.0,she
1,Null,Null,"Born: April 16, 1993 (age 27 years), Chicago, IL","Chancelor Johnathan Bennett, known professionally as Chance the Rapper, is an American rapper, singer, songwriter, record producer, activist, actor, and philanthropist. Born in Chicago, Illinois, Chance the Rapper released his debut mixtape 10 Day in 2012.","Chicago, IL","April 16, 1993",27.0,his
2,Null,Null,"Born: March 22, 1982 (age 38 years), Richmond, VA","Constance Wu is an American actress. She stars as Jessica Huang in the ABC television comedy Fresh Off the Boat, which is her breakthrough role. She has been nominated for two TCA Awards and four Critics' Choice Television Awards for Fresh Off the Boat.","Richmond, VA","March 22, 1982",38.0,She
3,Null,Null,Null,Null,,,,
4,Null,Null,"Education: University of California, Los Angeles","Kirsten Green is an American venture capitalist, the founder and managing partner of Forerunner Ventures.",,,,
5,Null,Null,"Born: February 23, 1965 (age 55 years), Queen Anne, Seattle, WA","Robert Watson Ferguson is an American lawyer and politician serving as the 18th Attorney General of Washington. A Democrat, he was elected in 2012 and re-elected in 2016. Prior to serving as Attorney General, Ferguson was a member of the King County Council.","Queen Anne, Seattle, WA","February 23, 1965",55.0,he
6,Null,Null,"Born: October 30, 1981 (age 39 years), Manhattan, New York, NY","Ivana Marie ""Ivanka"" Trump is an American businesswoman, serving since 2017 as Advisor to the President, her father Donald Trump, and the Director of the Office of Economic Initiatives and Entrepreneurship.","Manhattan, New York, NY","October 30, 1981",39.0,her
7,Null,Null,"Born: July 27, 1976 (age 44 years), London, United Kingdom","Dr Demis HassabisCBE FRS FREng FRSA is a British artificial intelligence researcher, neuroscientist, video game designer, entrepreneur, and five times winner of the Pentamind board games championship. He is the chief executive officer and co-founder of DeepMind, and a UK Government AI Advisor since 2018.","London, United Kingdom","July 27, 1976",44.0,He
8,Null,Null,"Born: March 19, 1964 (age 56 years), Boston, MA","Barbara Lynch is a restaurateur. In 2017 she was included in Time magazine's ""Top 100 Most Influential People of the Year"" for her pioneering contributions in the culinary world and her focus on local wealth creation through agronomy.","Boston, MA","March 19, 1964",56.0,she
9,Null,Null,"Born: October 26, 1972 (age 48 years), Erzincan, Turkey","Hamdi Ulukaya is a Turkish billionaire, philantropist and activist of Kurdish ancestry based in the United States. Ulukaya is the owner, founder, chairman, and CEO of Chobani, the #1-selling strained yogurt brand in the US. He established production facilities first in upstate New York, and since then has expanded.","Erzincan, Turkey","October 26, 1972",48.0,He


In [46]:
df17b.Birthplace[df17b.Name == 'Martin J. Blaser'] = df17b.Birthplace[df17b.Name == 'Martin J. Blaser'].fillna('Rutgers, NJ')
df17b.Birthplace[df17b.Name == 'John Oliver'] = df17b.Birthplace[df17b.Name == 'John Oliver'].fillna('London, United Kingdom')

df17b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Null,Null,"Born: October 25, 1969 (age 51 years), Toronto, Canada","Samantha Anne Bee is a Canadian-American comedian, writer, producer, political commentator, actress, and television host. Bee rose to fame as a correspondent on The Daily Show with Jon Stewart, where she became the longest-serving regular correspondent.","Toronto, Canada","October 25, 1969",51.0,she
1,Null,Null,"Born: April 16, 1993 (age 27 years), Chicago, IL","Chancelor Johnathan Bennett, known professionally as Chance the Rapper, is an American rapper, singer, songwriter, record producer, activist, actor, and philanthropist. Born in Chicago, Illinois, Chance the Rapper released his debut mixtape 10 Day in 2012.","Chicago, IL","April 16, 1993",27.0,his
2,Null,Null,"Born: March 22, 1982 (age 38 years), Richmond, VA","Constance Wu is an American actress. She stars as Jessica Huang in the ABC television comedy Fresh Off the Boat, which is her breakthrough role. She has been nominated for two TCA Awards and four Critics' Choice Television Awards for Fresh Off the Boat.","Richmond, VA","March 22, 1982",38.0,She
3,Null,Null,Null,Null,,,,
4,Null,Null,"Education: University of California, Los Angeles","Kirsten Green is an American venture capitalist, the founder and managing partner of Forerunner Ventures.",,,,
5,Null,Null,"Born: February 23, 1965 (age 55 years), Queen Anne, Seattle, WA","Robert Watson Ferguson is an American lawyer and politician serving as the 18th Attorney General of Washington. A Democrat, he was elected in 2012 and re-elected in 2016. Prior to serving as Attorney General, Ferguson was a member of the King County Council.","Queen Anne, Seattle, WA","February 23, 1965",55.0,he
6,Null,Null,"Born: October 30, 1981 (age 39 years), Manhattan, New York, NY","Ivana Marie ""Ivanka"" Trump is an American businesswoman, serving since 2017 as Advisor to the President, her father Donald Trump, and the Director of the Office of Economic Initiatives and Entrepreneurship.","Manhattan, New York, NY","October 30, 1981",39.0,her
7,Null,Null,"Born: July 27, 1976 (age 44 years), London, United Kingdom","Dr Demis HassabisCBE FRS FREng FRSA is a British artificial intelligence researcher, neuroscientist, video game designer, entrepreneur, and five times winner of the Pentamind board games championship. He is the chief executive officer and co-founder of DeepMind, and a UK Government AI Advisor since 2018.","London, United Kingdom","July 27, 1976",44.0,He
8,Null,Null,"Born: March 19, 1964 (age 56 years), Boston, MA","Barbara Lynch is a restaurateur. In 2017 she was included in Time magazine's ""Top 100 Most Influential People of the Year"" for her pioneering contributions in the culinary world and her focus on local wealth creation through agronomy.","Boston, MA","March 19, 1964",56.0,she
9,Null,Null,"Born: October 26, 1972 (age 48 years), Erzincan, Turkey","Hamdi Ulukaya is a Turkish billionaire, philantropist and activist of Kurdish ancestry based in the United States. Ulukaya is the owner, founder, chairman, and CEO of Chobani, the #1-selling strained yogurt brand in the US. He established production facilities first in upstate New York, and since then has expanded.","Erzincan, Turkey","October 26, 1972",48.0,He


In [47]:
df17b.to_csv("2017_Demographic_Data.csv",index=False)

In [48]:
time2017 = df17.join(df17b, rsuffix='_right')

time2017

Unnamed: 0,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Samantha Bee,PIONEERS,2017,https://time.com//collection/2017-time-100/4742710/samantha-bee/,Null,Null,"Born: October 25, 1969 (age 51 years), Toronto, Canada","Samantha Anne Bee is a Canadian-American comedian, writer, producer, political commentator, actress, and television host. Bee rose to fame as a correspondent on The Daily Show with Jon Stewart, where she became the longest-serving regular correspondent.","Toronto, Canada","October 25, 1969",51.0,she
1,Chance the Rapper,PIONEERS,2017,https://time.com//collection/2017-time-100/4742682/chance-the-rapper/,Null,Null,"Born: April 16, 1993 (age 27 years), Chicago, IL","Chancelor Johnathan Bennett, known professionally as Chance the Rapper, is an American rapper, singer, songwriter, record producer, activist, actor, and philanthropist. Born in Chicago, Illinois, Chance the Rapper released his debut mixtape 10 Day in 2012.","Chicago, IL","April 16, 1993",27.0,his
2,Constance Wu,PIONEERS,2017,https://time.com//collection/2017-time-100/4742685/constance-wu/,Null,Null,"Born: March 22, 1982 (age 38 years), Richmond, VA","Constance Wu is an American actress. She stars as Jessica Huang in the ABC television comedy Fresh Off the Boat, which is her breakthrough role. She has been nominated for two TCA Awards and four Critics' Choice Television Awards for Fresh Off the Boat.","Richmond, VA","March 22, 1982",38.0,She
3,Gavin Grimm,PIONEERS,2017,https://time.com//collection/2017-time-100/4742687/gavin-grimm/,Null,Null,Null,Null,,,,
4,Kirsten Green,PIONEERS,2017,https://time.com//collection/2017-time-100/4742703/kirsten-green/,Null,Null,"Education: University of California, Los Angeles","Kirsten Green is an American venture capitalist, the founder and managing partner of Forerunner Ventures.",,,,
5,Bob Ferguson,PIONEERS,2017,https://time.com//collection/2017-time-100/4742678/bob-ferguson/,Null,Null,"Born: February 23, 1965 (age 55 years), Queen Anne, Seattle, WA","Robert Watson Ferguson is an American lawyer and politician serving as the 18th Attorney General of Washington. A Democrat, he was elected in 2012 and re-elected in 2016. Prior to serving as Attorney General, Ferguson was a member of the King County Council.","Queen Anne, Seattle, WA","February 23, 1965",55.0,he
6,Ivanka Trump,PIONEERS,2017,https://time.com//collection/2017-time-100/4742699/ivanka-trump/,Null,Null,"Born: October 30, 1981 (age 39 years), Manhattan, New York, NY","Ivana Marie ""Ivanka"" Trump is an American businesswoman, serving since 2017 as Advisor to the President, her father Donald Trump, and the Director of the Office of Economic Initiatives and Entrepreneurship.","Manhattan, New York, NY","October 30, 1981",39.0,her
7,Demis Hassabis,PIONEERS,2017,https://time.com//collection/2017-time-100/4742686/demis-hassabis/,Null,Null,"Born: July 27, 1976 (age 44 years), London, United Kingdom","Dr Demis HassabisCBE FRS FREng FRSA is a British artificial intelligence researcher, neuroscientist, video game designer, entrepreneur, and five times winner of the Pentamind board games championship. He is the chief executive officer and co-founder of DeepMind, and a UK Government AI Advisor since 2018.","London, United Kingdom","July 27, 1976",44.0,He
8,Barbara Lynch,PIONEERS,2017,https://time.com//collection/2017-time-100/4742669/barbara-lynch/,Null,Null,"Born: March 19, 1964 (age 56 years), Boston, MA","Barbara Lynch is a restaurateur. In 2017 she was included in Time magazine's ""Top 100 Most Influential People of the Year"" for her pioneering contributions in the culinary world and her focus on local wealth creation through agronomy.","Boston, MA","March 19, 1964",56.0,she
9,Hamdi Ulukaya,PIONEERS,2017,https://time.com//collection/2017-time-100/4742698/hamdi-ulukaya/,Null,Null,"Born: October 26, 1972 (age 48 years), Erzincan, Turkey","Hamdi Ulukaya is a Turkish billionaire, philantropist and activist of Kurdish ancestry based in the United States. Ulukaya is the owner, founder, chairman, and CEO of Chobani, the #1-selling strained yogurt brand in the US. He established production facilities first in upstate New York, and since then has expanded.","Erzincan, Turkey","October 26, 1972",48.0,He


In [49]:
time2017.to_csv("Time2017_Full_merged.csv")

## Time 2015

In [50]:
driver = webdriver.Chrome()

In [51]:
driver.get("https://time.com/collection/2015-time-100/")

In [52]:
full4 = driver.find_element_by_id("article-container")

#Labels change:
# Titans - Kanye 1
# Pioneers - Misty Copeland 19
# Artists - Bradley Cooper 40
# Leaders - Jorge Ramos 56
# Icons - RBG 87

names = full4.find_elements_by_tag_name("h2")
count = 0
allnames2015 = []

for eachname in names:
    name = (eachname.text)
    link = (eachname.find_elements_by_tag_name('a')[0].get_attribute('href'))
    count = count + 1
    #print (name, link, count)
    
    if 0 < count < 19:
        category = 'TITANS'
        
    if 19 < count < 40:
        category = 'PIONEERS'
        
    if 40 < count < 56:
        category = 'ARTISTS'
        
    if 56 < count < 87:
        category = 'LEADERS'
        
    if 87 < count:
        category = 'ICONS'
    
    eachdict = {'Name': name,
                'Category': category,
                'Year': '2015',
                'Link': link}
    
    allnames2015.append(eachdict)

allnames2015
    

[{'Name': 'Kanye West',
  'Category': 'TITANS',
  'Year': '2015',
  'Link': 'https://time.com/collection-post/3822841/kanye-west-2015-time-100/'},
 {'Name': 'Lorne Michaels',
  'Category': 'TITANS',
  'Year': '2015',
  'Link': 'https://time.com/collection-post/3822845/lorne-michaels-2015-time-100/'},
 {'Name': 'Mellody Hobson',
  'Category': 'TITANS',
  'Year': '2015',
  'Link': 'https://time.com/collection-post/3822587/mellody-hobson-2015-time-100/'},
 {'Name': 'Tim Cook',
  'Category': 'TITANS',
  'Year': '2015',
  'Link': 'https://time.com/collection-post/3822599/tim-cook-2015-time-100/'},
 {'Name': 'Elizabeth Holmes',
  'Category': 'TITANS',
  'Year': '2015',
  'Link': 'https://time.com/collection-post/3822734/elizabeth-holmes-2015-time-100/'},
 {'Name': 'Charles Koch & David Koch',
  'Category': 'TITANS',
  'Year': '2015',
  'Link': 'https://time.com/collection-post/3822767/charles-koch-david-koch-2015-time-100/'},
 {'Name': 'Susan Wojcicki',
  'Category': 'TITANS',
  'Year': '201

In [53]:
len(allnames2015)

98

In [54]:
df15 = pd.DataFrame(allnames2015)

df15

Unnamed: 0,Name,Category,Year,Link
0,Kanye West,TITANS,2015,https://time.com/collection-post/3822841/kanye-west-2015-time-100/
1,Lorne Michaels,TITANS,2015,https://time.com/collection-post/3822845/lorne-michaels-2015-time-100/
2,Mellody Hobson,TITANS,2015,https://time.com/collection-post/3822587/mellody-hobson-2015-time-100/
3,Tim Cook,TITANS,2015,https://time.com/collection-post/3822599/tim-cook-2015-time-100/
4,Elizabeth Holmes,TITANS,2015,https://time.com/collection-post/3822734/elizabeth-holmes-2015-time-100/
5,Charles Koch & David Koch,TITANS,2015,https://time.com/collection-post/3822767/charles-koch-david-koch-2015-time-100/
6,Susan Wojcicki,TITANS,2015,https://time.com/collection-post/3822770/susan-wojcicki-2015-time-100/
7,Chanda Kochhar,TITANS,2015,https://time.com/collection-post/3822610/chanda-kochhar-2015-time-100/
8,Tony Fernandes,TITANS,2015,https://time.com/collection-post/3822614/tony-fernandes-2015-time-100/
9,Lee Daniels,TITANS,2015,https://time.com/collection-post/3822623/lee-daniels-2015-time-100/


In [55]:
# Manual cleaning

cleaning = [
            {'Name': 'David Koch',
            'Category': 'TITANS',
            'Year': '2015',
            'Link': 'https://time.com/collection-post/3822767/charles-koch-david-koch-2015-time-100/'},
             {'Name': 'Jennifer Doudna',
            'Category': 'PIONEERS',
            'Year': '2015',
            'Link': 'https://time.com/collection-post/3822554/emmanuelle-charpentier-jennifer-doudna-2015-time-100/'}
    
            ]

df15 = df15.append(cleaning, ignore_index=True)

df15

Unnamed: 0,Name,Category,Year,Link
0,Kanye West,TITANS,2015,https://time.com/collection-post/3822841/kanye-west-2015-time-100/
1,Lorne Michaels,TITANS,2015,https://time.com/collection-post/3822845/lorne-michaels-2015-time-100/
2,Mellody Hobson,TITANS,2015,https://time.com/collection-post/3822587/mellody-hobson-2015-time-100/
3,Tim Cook,TITANS,2015,https://time.com/collection-post/3822599/tim-cook-2015-time-100/
4,Elizabeth Holmes,TITANS,2015,https://time.com/collection-post/3822734/elizabeth-holmes-2015-time-100/
5,Charles Koch & David Koch,TITANS,2015,https://time.com/collection-post/3822767/charles-koch-david-koch-2015-time-100/
6,Susan Wojcicki,TITANS,2015,https://time.com/collection-post/3822770/susan-wojcicki-2015-time-100/
7,Chanda Kochhar,TITANS,2015,https://time.com/collection-post/3822610/chanda-kochhar-2015-time-100/
8,Tony Fernandes,TITANS,2015,https://time.com/collection-post/3822614/tony-fernandes-2015-time-100/
9,Lee Daniels,TITANS,2015,https://time.com/collection-post/3822623/lee-daniels-2015-time-100/


In [56]:
df15.Name[df15.Name == 'Charles Koch & David Koch'] = df15.Name[df15.Name == 'Charles Koch & David Koch'].str.replace('Charles Koch & David Koch', 'Charles Koch')
df15.Name[df15.Name == 'Emmanuelle Charpentier & Jennifer Doudna'] = df15.Name[df15.Name == 'Emmanuelle Charpentier & Jennifer Doudna'].str.replace('Emmanuelle Charpentier & Jennifer Doudna', 'Emmanuelle Charpentier')

df15

Unnamed: 0,Name,Category,Year,Link
0,Kanye West,TITANS,2015,https://time.com/collection-post/3822841/kanye-west-2015-time-100/
1,Lorne Michaels,TITANS,2015,https://time.com/collection-post/3822845/lorne-michaels-2015-time-100/
2,Mellody Hobson,TITANS,2015,https://time.com/collection-post/3822587/mellody-hobson-2015-time-100/
3,Tim Cook,TITANS,2015,https://time.com/collection-post/3822599/tim-cook-2015-time-100/
4,Elizabeth Holmes,TITANS,2015,https://time.com/collection-post/3822734/elizabeth-holmes-2015-time-100/
5,Charles Koch,TITANS,2015,https://time.com/collection-post/3822767/charles-koch-david-koch-2015-time-100/
6,Susan Wojcicki,TITANS,2015,https://time.com/collection-post/3822770/susan-wojcicki-2015-time-100/
7,Chanda Kochhar,TITANS,2015,https://time.com/collection-post/3822610/chanda-kochhar-2015-time-100/
8,Tony Fernandes,TITANS,2015,https://time.com/collection-post/3822614/tony-fernandes-2015-time-100/
9,Lee Daniels,TITANS,2015,https://time.com/collection-post/3822623/lee-daniels-2015-time-100/


In [57]:
df15.to_csv('2015_Times_Scraped_List.csv',sep=',',index=False)
df15

Unnamed: 0,Name,Category,Year,Link
0,Kanye West,TITANS,2015,https://time.com/collection-post/3822841/kanye-west-2015-time-100/
1,Lorne Michaels,TITANS,2015,https://time.com/collection-post/3822845/lorne-michaels-2015-time-100/
2,Mellody Hobson,TITANS,2015,https://time.com/collection-post/3822587/mellody-hobson-2015-time-100/
3,Tim Cook,TITANS,2015,https://time.com/collection-post/3822599/tim-cook-2015-time-100/
4,Elizabeth Holmes,TITANS,2015,https://time.com/collection-post/3822734/elizabeth-holmes-2015-time-100/
5,Charles Koch,TITANS,2015,https://time.com/collection-post/3822767/charles-koch-david-koch-2015-time-100/
6,Susan Wojcicki,TITANS,2015,https://time.com/collection-post/3822770/susan-wojcicki-2015-time-100/
7,Chanda Kochhar,TITANS,2015,https://time.com/collection-post/3822610/chanda-kochhar-2015-time-100/
8,Tony Fernandes,TITANS,2015,https://time.com/collection-post/3822614/tony-fernandes-2015-time-100/
9,Lee Daniels,TITANS,2015,https://time.com/collection-post/3822623/lee-daniels-2015-time-100/


In [58]:
urlnames2015 = []
for eachname in df15.Name:
        eachname = eachname.replace(" ","+")
        urlnames2015.append(f"https://www.google.com/search?q={eachname}")
        
urlnames2015

['https://www.google.com/search?q=Kanye+West',
 'https://www.google.com/search?q=Lorne+Michaels',
 'https://www.google.com/search?q=Mellody+Hobson',
 'https://www.google.com/search?q=Tim+Cook',
 'https://www.google.com/search?q=Elizabeth+Holmes',
 'https://www.google.com/search?q=Charles+Koch',
 'https://www.google.com/search?q=Susan+Wojcicki',
 'https://www.google.com/search?q=Chanda+Kochhar',
 'https://www.google.com/search?q=Tony+Fernandes',
 'https://www.google.com/search?q=Lee+Daniels',
 'https://www.google.com/search?q=Reid+Hoffman',
 'https://www.google.com/search?q=Kim+Kardashian+West',
 'https://www.google.com/search?q=Janet+Yellen',
 'https://www.google.com/search?q=Danny+Meyer',
 'https://www.google.com/search?q=Lei+Jun',
 'https://www.google.com/search?q=Bob+Iger',
 'https://www.google.com/search?q=Satya+Nadella',
 'https://www.google.com/search?q=Jorge+Paulo+Lemann',
 'https://www.google.com/search?q=Misty+Copeland',
 'https://www.google.com/search?q=Scott+Kelly',
 'https:

In [59]:
driver = webdriver.Chrome()

In [60]:
import time

full_list2015 = []

for url in urlnames2015:
    driver.get(url)
    
    try:
        names = driver.find_element_by_class_name("SPZz6b").text 
        name = (names.split('\n'))[0]

    except:
        name = 'Null'

    try:
        label = driver.find_element_by_class_name("wwUB2c").text
    except: 
        label = 'Null'

    try:
        born = driver.find_element_by_class_name("rVusze").text
    except: 
        born = 'Null'

    try:
        full_desc = driver.find_element_by_class_name("kno-rdesc").text
    except: 
        full_desc = 'Null'

    person_info = {'Name': name,
                    'Profession': label,
                    'Birth Info': born,
                    'Description': full_desc}
    
    full_list2015.append(person_info)
    
    time.sleep(1)

print (len(full_list2015))
full_list2015

100


[{'Name': 'Kanye West',
  'Profession': 'American rapper',
  'Birth Info': 'Born: June 8, 1977 (age 43 years), Atlanta, GA',
  'Description': 'Description\nKanye Omari West is an American rapper, record producer, and fashion designer. He has been influential in the 21st-century development of mainstream hip hop and popular music in general. Wikipedia'},
 {'Name': 'Lorne Michaels',
  'Profession': 'American-Canadian television producer',
  'Birth Info': 'Born: November 17, 1944 (age 76 years), Toronto, Canada',
  'Description': 'Description\nLorne Michaels CC is a Canadian-American television producer and screenwriter best known for creating and producing Saturday Night Live and producing the Late Night series, The Kids in the Hall and The Tonight Show. Wikipedia'},
 {'Name': 'Mellody Hobson',
  'Profession': 'American businesswoman',
  'Birth Info': 'Born: April 3, 1969 (age 51 years), Chicago, IL',
  'Description': 'Description\nMellody Hobson is an American businesswoman who is the p

In [63]:
df15b = pd.DataFrame(full_list2015)
df15b

df15b['Birthplace'] = df15b['Birth Info'].str.extract(r"years\), ([\w\W]+)")
df15b['Birthdate'] = df15b['Birth Info'].str.extract (r"Born: ([\w\W]+) \(age")
df15b['Age'] = df15b['Birth Info'].str.extract(r"\(age ([\d]*) years\)")
df15b['Description'] = df15b.Description.str.replace('Description\n', '')
df15b['Description'] = df15b.Description.str.replace('Wikipedia', '')

In [64]:
try:
    df15b['Pronouns'] = df15b['Description'].str.extract(r"\b(she|he|her|his)(?i)\b")
    
except:
    pass


df15b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Kanye West,American rapper,"Born: June 8, 1977 (age 43 years), Atlanta, GA","Kanye Omari West is an American rapper, record producer, and fashion designer. He has been influential in the 21st-century development of mainstream hip hop and popular music in general.","Atlanta, GA","June 8, 1977",43.0,He
1,Lorne Michaels,American-Canadian television producer,"Born: November 17, 1944 (age 76 years), Toronto, Canada","Lorne Michaels CC is a Canadian-American television producer and screenwriter best known for creating and producing Saturday Night Live and producing the Late Night series, The Kids in the Hall and The Tonight Show.","Toronto, Canada","November 17, 1944",76.0,
2,Mellody Hobson,American businesswoman,"Born: April 3, 1969 (age 51 years), Chicago, IL","Mellody Hobson is an American businesswoman who is the president and co-CEO of Ariel Investments. She is the former chairwoman of DreamWorks Animation, having stepped down after negotiating the acquisition of DreamWorks Animation SKG, Inc., by NBCUniversal in August, 2016.","Chicago, IL","April 3, 1969",51.0,She
3,Tim Cook,Chief Executive Officer of Apple,"Born: November 1, 1960 (age 60 years), Mobile, AL","Timothy Donald Cook is an American business executive, philanthropist and engineer. Cook is the chief executive officer of Apple Inc., and previously served as the company's chief operating officer under its cofounder Steve Jobs.","Mobile, AL","November 1, 1960",60.0,
4,Elizabeth Holmes,American businesswoman,"Born: February 3, 1984 (age 36 years), Washington, D.C.","Elizabeth Anne Holmes is an American businesswoman who founded and was the CEO of Theranos, a now-defunct health technology company.","Washington, D.C.","February 3, 1984",36.0,
5,Charles Koch,Chief Executive Officer of Koch Industries,"Born: November 1, 1935 (age 85 years), Wichita, KS","Charles de Ganahl Koch is an American businessman and philanthropist. As of March 2019, he was ranked as the 11th-richest person in the world, with an estimated net worth of $50.5 billion.","Wichita, KS","November 1, 1935",85.0,he
6,Susan Wojcicki,CEO of YouTube,"Born: July 5, 1968 (age 52 years), Santa Clara County, CA","Susan Diane Wojcicki is the CEO of YouTube. She is the longest tenured CEO in the history of YouTube. Wojcicki was involved in the founding of Google, and became Google's first marketing manager in 1999. She later led the company's online advertising business and was put in charge of Google's original video service.","Santa Clara County, CA","July 5, 1968",52.0,She
7,Chanda Kochhar,Executive,"Born: November 17, 1961 (age 59 years), Jodhpur, India","Chanda Kochhar was the managing director and chief executive officer of ICICI Bank. However, on 4 October 2018 she stepped down from her position following allegations of corruption. Amidst investigations related to Videocon bad loans, she was forced by the board of ICICI Bank to take indefinite leave.","Jodhpur, India","November 17, 1961",59.0,she
8,Tony Fernandes,Entrepreneur,"Born: April 30, 1964 (age 56 years), Kuala Lumpur, Malaysia","Anthony Francis Fernandes PSM, CBE is a Malaysian entrepreneur. He is the founder of Tune Air Sdn. Bhd., who introduced the first budget no-frills airline, AirAsia, to Malaysians with the tagline ""Now everyone can fly"".","Kuala Lumpur, Malaysia","April 30, 1964",56.0,He
9,Lee Daniels,American film writer,"Born: December 24, 1959 (age 60 years), Philadelphia, PA","Lee Daniels is an American film and television writer, director, and producer. Daniels produced 2001’s Academy Award winning Monster's Ball.","Philadelphia, PA","December 24, 1959",60.0,


In [65]:
df15b.Name[df15b.Name == 'Null'].value_counts()

Null    1
Name: Name, dtype: int64

In [66]:
df15b.Birthplace.isna().value_counts()

False    90
True     10
Name: Birthplace, dtype: int64

In [67]:
df15b[df15b.Birthplace.isna()]

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
30,Null,Null,Null,Null,,,,
31,Aura Elena Farfán,Guatemalan human rights activists,"Born: February 2, 1940 (age 80 years)","Aura Elena Farfán is a Guatemalan human rights activist. She is one of the founders and Executive Director of FAMDEGUA, a Guatemala City-based organization dedicated to surviving family members of people who have been disappeared by the Guatemalan government. It is one of Guatemala's oldest human rights organizations.",,"February 2, 1940",80.0,She
32,Martin J. Blaser,Professor,h-index: 171,Martin J. Blaser is the director of the Center for Advanced Biotechnology and Medicine at Rutgers Biomedical and Health Sciences and the Henry Rutgers Chair of the Human Microbiome and professor of medicine and microbiology at the Rutgers Robert Wood Johnson Medical School in New Jersey.,,,,
36,See results about,Null,Null,Null,,,,
54,Last Week Tonight with John Oliver,2014 ‧ News ‧ 7 seasons,"First episode date: April 27, 2014","John Oliver won an Emmy for his work as a writer on ""The Daily Show With Jon Stewart,"" but it wasn't until he guest-hosted that show in the summer of 2013 that HBO took notice of his ""singular perspective and distinct voice."" Thanks to that memorable gig, Oliver gets to show off his talent in front … MORE",,,,his
56,Narendra Modi,Prime Minister of India,"Education: Gujarat University (1983), School of Open Learning, University of Delhi (1978), University of Delhi Trending",Narendra Damodardas Modi is an Indian politician serving as the 14th and current Prime Minister of India since 2014. He was the Chief Minister of Gujarat from 2001 to 2014 and is the Member of Parliament for Varanasi.,,,,He
66,Joko Widodo,President of Indonesia,"Children: Gibran Rakabuming Raka, Kaesang Pangarep, Kahiyang Ayu Trending","Joko Widodo, also known as Jokowi, is an Indonesian politician who is the 7th and current president of Indonesia.",,,,
78,Beji Caid Essebsi,Former President of Tunisia,"Born: November 29, 1926, Sidi Bou Said, Carthage, Tunisia","Mohamed Beji Caid Essebsi was a Tunisian politician who was the President of Tunisia from 31 December 2014 until his death on 25 July 2019. Previously, he served as Minister of Foreign Affairs from 1981 to 1986 and as Prime Minister from February 2011 to December 2011.",,,,his
86,Ruth Bader Ginsburg,Former Associate Justice of the Supreme Court of the United States,"Born: March 15, 1933, Brooklyn","Joan Ruth Bader Ginsburg was an associate justice of the Supreme Court of the United States from 1993 until her death on September 18, 2020. She was nominated by President Bill Clinton, replacing retiring justice Byron White, and at the time was generally viewed as a moderate consensus-builder.",,,,her
98,David Koch,American businessman,"Born: May 3, 1940, Wichita, KS","David Hamilton Koch was an American businessman, philanthropist, political activist, and chemical engineer. In 1970, he joined the family business: Koch Industries, the largest privately held company in the United States.",,,,he


In [68]:
#Manually cleaning some null values for place, using the current location / nationality as general birthplace because it will be mapped

df15b.Birthplace[df15b.Name == 'Aura Elena Farfán'] = df15b.Birthplace[df15b.Name == 'Aura Elena Farfán'].fillna('Guatemala City, Guatemala')
df15b.Birthplace[df15b.Name == 'Joko Widodo'] = df15b.Birthplace[df15b.Name == 'Joko Widodo'].fillna('Jakarta, Indonesia')
df15b.Birthplace[df15b.Name == 'Beji Caid Essebsi'] = df15b.Birthplace[df15b.Name == 'Beji Caid Essebsi'].fillna('Carthage, Tunisia')
df15b.Birthplace[df15b.Name == 'Ruth Bader Ginsburg'] = df15b.Birthplace[df15b.Name == 'Ruth Bader Ginsburg'].fillna('Brooklyn, NY')
df15b.Birthplace[df15b.Name == 'David Koch'] = df15b.Birthplace[df15b.Name == 'David Koch'].fillna('Wichita, KS')
df15b.Birthplace[df15b.Name == 'Martin J. Blaser'] = df15b.Birthplace[df15b.Name == 'Martin J. Blaser'].fillna('Rutgers, NJ')

df15b.Name[df15b.Name == 'Last Week Tonight with John Oliver'] = df15b.Name[df15b.Name == 'Last Week Tonight with John Oliver'].str.replace('Last Week Tonight with John Oliver', 'John Oliver')


df15b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Kanye West,American rapper,"Born: June 8, 1977 (age 43 years), Atlanta, GA","Kanye Omari West is an American rapper, record producer, and fashion designer. He has been influential in the 21st-century development of mainstream hip hop and popular music in general.","Atlanta, GA","June 8, 1977",43.0,He
1,Lorne Michaels,American-Canadian television producer,"Born: November 17, 1944 (age 76 years), Toronto, Canada","Lorne Michaels CC is a Canadian-American television producer and screenwriter best known for creating and producing Saturday Night Live and producing the Late Night series, The Kids in the Hall and The Tonight Show.","Toronto, Canada","November 17, 1944",76.0,
2,Mellody Hobson,American businesswoman,"Born: April 3, 1969 (age 51 years), Chicago, IL","Mellody Hobson is an American businesswoman who is the president and co-CEO of Ariel Investments. She is the former chairwoman of DreamWorks Animation, having stepped down after negotiating the acquisition of DreamWorks Animation SKG, Inc., by NBCUniversal in August, 2016.","Chicago, IL","April 3, 1969",51.0,She
3,Tim Cook,Chief Executive Officer of Apple,"Born: November 1, 1960 (age 60 years), Mobile, AL","Timothy Donald Cook is an American business executive, philanthropist and engineer. Cook is the chief executive officer of Apple Inc., and previously served as the company's chief operating officer under its cofounder Steve Jobs.","Mobile, AL","November 1, 1960",60.0,
4,Elizabeth Holmes,American businesswoman,"Born: February 3, 1984 (age 36 years), Washington, D.C.","Elizabeth Anne Holmes is an American businesswoman who founded and was the CEO of Theranos, a now-defunct health technology company.","Washington, D.C.","February 3, 1984",36.0,
5,Charles Koch,Chief Executive Officer of Koch Industries,"Born: November 1, 1935 (age 85 years), Wichita, KS","Charles de Ganahl Koch is an American businessman and philanthropist. As of March 2019, he was ranked as the 11th-richest person in the world, with an estimated net worth of $50.5 billion.","Wichita, KS","November 1, 1935",85.0,he
6,Susan Wojcicki,CEO of YouTube,"Born: July 5, 1968 (age 52 years), Santa Clara County, CA","Susan Diane Wojcicki is the CEO of YouTube. She is the longest tenured CEO in the history of YouTube. Wojcicki was involved in the founding of Google, and became Google's first marketing manager in 1999. She later led the company's online advertising business and was put in charge of Google's original video service.","Santa Clara County, CA","July 5, 1968",52.0,She
7,Chanda Kochhar,Executive,"Born: November 17, 1961 (age 59 years), Jodhpur, India","Chanda Kochhar was the managing director and chief executive officer of ICICI Bank. However, on 4 October 2018 she stepped down from her position following allegations of corruption. Amidst investigations related to Videocon bad loans, she was forced by the board of ICICI Bank to take indefinite leave.","Jodhpur, India","November 17, 1961",59.0,she
8,Tony Fernandes,Entrepreneur,"Born: April 30, 1964 (age 56 years), Kuala Lumpur, Malaysia","Anthony Francis Fernandes PSM, CBE is a Malaysian entrepreneur. He is the founder of Tune Air Sdn. Bhd., who introduced the first budget no-frills airline, AirAsia, to Malaysians with the tagline ""Now everyone can fly"".","Kuala Lumpur, Malaysia","April 30, 1964",56.0,He
9,Lee Daniels,American film writer,"Born: December 24, 1959 (age 60 years), Philadelphia, PA","Lee Daniels is an American film and television writer, director, and producer. Daniels produced 2001’s Academy Award winning Monster's Ball.","Philadelphia, PA","December 24, 1959",60.0,


In [69]:
df15b.Birthplace[df15b.Name == 'Martin J. Blaser'] = df15b.Birthplace[df15b.Name == 'Martin J. Blaser'].fillna('Rutgers, NJ')
df15b.Birthplace[df15b.Name == 'John Oliver'] = df15b.Birthplace[df15b.Name == 'John Oliver'].fillna('London, United Kingdom')

In [70]:
df15b.to_csv("2015_Demographic_Data.csv",index=False)

In [71]:
time2015 = df15.join(df15b, rsuffix='_right')

time2015

Unnamed: 0,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Kanye West,TITANS,2015,https://time.com/collection-post/3822841/kanye-west-2015-time-100/,Kanye West,American rapper,"Born: June 8, 1977 (age 43 years), Atlanta, GA","Kanye Omari West is an American rapper, record producer, and fashion designer. He has been influential in the 21st-century development of mainstream hip hop and popular music in general.","Atlanta, GA","June 8, 1977",43.0,He
1,Lorne Michaels,TITANS,2015,https://time.com/collection-post/3822845/lorne-michaels-2015-time-100/,Lorne Michaels,American-Canadian television producer,"Born: November 17, 1944 (age 76 years), Toronto, Canada","Lorne Michaels CC is a Canadian-American television producer and screenwriter best known for creating and producing Saturday Night Live and producing the Late Night series, The Kids in the Hall and The Tonight Show.","Toronto, Canada","November 17, 1944",76.0,
2,Mellody Hobson,TITANS,2015,https://time.com/collection-post/3822587/mellody-hobson-2015-time-100/,Mellody Hobson,American businesswoman,"Born: April 3, 1969 (age 51 years), Chicago, IL","Mellody Hobson is an American businesswoman who is the president and co-CEO of Ariel Investments. She is the former chairwoman of DreamWorks Animation, having stepped down after negotiating the acquisition of DreamWorks Animation SKG, Inc., by NBCUniversal in August, 2016.","Chicago, IL","April 3, 1969",51.0,She
3,Tim Cook,TITANS,2015,https://time.com/collection-post/3822599/tim-cook-2015-time-100/,Tim Cook,Chief Executive Officer of Apple,"Born: November 1, 1960 (age 60 years), Mobile, AL","Timothy Donald Cook is an American business executive, philanthropist and engineer. Cook is the chief executive officer of Apple Inc., and previously served as the company's chief operating officer under its cofounder Steve Jobs.","Mobile, AL","November 1, 1960",60.0,
4,Elizabeth Holmes,TITANS,2015,https://time.com/collection-post/3822734/elizabeth-holmes-2015-time-100/,Elizabeth Holmes,American businesswoman,"Born: February 3, 1984 (age 36 years), Washington, D.C.","Elizabeth Anne Holmes is an American businesswoman who founded and was the CEO of Theranos, a now-defunct health technology company.","Washington, D.C.","February 3, 1984",36.0,
5,Charles Koch,TITANS,2015,https://time.com/collection-post/3822767/charles-koch-david-koch-2015-time-100/,Charles Koch,Chief Executive Officer of Koch Industries,"Born: November 1, 1935 (age 85 years), Wichita, KS","Charles de Ganahl Koch is an American businessman and philanthropist. As of March 2019, he was ranked as the 11th-richest person in the world, with an estimated net worth of $50.5 billion.","Wichita, KS","November 1, 1935",85.0,he
6,Susan Wojcicki,TITANS,2015,https://time.com/collection-post/3822770/susan-wojcicki-2015-time-100/,Susan Wojcicki,CEO of YouTube,"Born: July 5, 1968 (age 52 years), Santa Clara County, CA","Susan Diane Wojcicki is the CEO of YouTube. She is the longest tenured CEO in the history of YouTube. Wojcicki was involved in the founding of Google, and became Google's first marketing manager in 1999. She later led the company's online advertising business and was put in charge of Google's original video service.","Santa Clara County, CA","July 5, 1968",52.0,She
7,Chanda Kochhar,TITANS,2015,https://time.com/collection-post/3822610/chanda-kochhar-2015-time-100/,Chanda Kochhar,Executive,"Born: November 17, 1961 (age 59 years), Jodhpur, India","Chanda Kochhar was the managing director and chief executive officer of ICICI Bank. However, on 4 October 2018 she stepped down from her position following allegations of corruption. Amidst investigations related to Videocon bad loans, she was forced by the board of ICICI Bank to take indefinite leave.","Jodhpur, India","November 17, 1961",59.0,she
8,Tony Fernandes,TITANS,2015,https://time.com/collection-post/3822614/tony-fernandes-2015-time-100/,Tony Fernandes,Entrepreneur,"Born: April 30, 1964 (age 56 years), Kuala Lumpur, Malaysia","Anthony Francis Fernandes PSM, CBE is a Malaysian entrepreneur. He is the founder of Tune Air Sdn. Bhd., who introduced the first budget no-frills airline, AirAsia, to Malaysians with the tagline ""Now everyone can fly"".","Kuala Lumpur, Malaysia","April 30, 1964",56.0,He
9,Lee Daniels,TITANS,2015,https://time.com/collection-post/3822623/lee-daniels-2015-time-100/,Lee Daniels,American film writer,"Born: December 24, 1959 (age 60 years), Philadelphia, PA","Lee Daniels is an American film and television writer, director, and producer. Daniels produced 2001’s Academy Award winning Monster's Ball.","Philadelphia, PA","December 24, 1959",60.0,


In [72]:
time2015.info()

time2015[time2015.Name_right == 'Null']

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Name         100 non-null    object
 1   Category     100 non-null    object
 2   Year         100 non-null    object
 3   Link         100 non-null    object
 4   Name_right   100 non-null    object
 5   Profession   100 non-null    object
 6   Birth Info   100 non-null    object
 7   Description  100 non-null    object
 8   Birthplace   97 non-null     object
 9   Birthdate    91 non-null     object
 10  Age          91 non-null     object
 11  Pronouns     54 non-null     object
dtypes: object(12)
memory usage: 9.5+ KB


Unnamed: 0,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
30,Kira Orange Jones,PIONEERS,2015,https://time.com/collection-post/3822936/kira-orange-jones-2015-time-100/,Null,Null,Null,Null,,,,


In [73]:
time2015.to_csv("Time2015_Full_merged.csv")

## Time 2013

In [74]:
response = requests.get("https://time100.time.com/2013/04/18/time-100/slide/all/")
doc2013 = BeautifulSoup(response.content, "html.parser")
doc2013.prettify()

'<!DOCTYPE html>\n<!--[if lt IE 7]> <html lang="en-us" class="no-js ie lt-ie9 lt-ie8 lt-ie7"> <![endif]-->\n<!--[if IE 7]>    <html lang="en-us" class="no-js ie lt-ie9 lt-ie8"> <![endif]-->\n<!--[if IE 8]>    <html lang="en-us" class="no-js ie lt-ie9"> <![endif]-->\n<!--[if gt IE 8]><!-->\n<html class="no-js" lang="en-us">\n <!--<![endif]-->\n <head>\n  <meta charset="utf-8"/>\n  <script type="text/javascript">\n   (window.NREUM||(NREUM={})).loader_config={licenseKey:"05d81ee1c4",applicationID:"263618420"};window.NREUM||(NREUM={}),__nr_require=function(e,t,n){function r(n){if(!t[n]){var i=t[n]={exports:{}};e[n][0].call(i.exports,function(t){var i=e[n][1][t];return r(i||t)},i,i.exports)}return t[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var i=0;i<n.length;i++)r(n[i]);return r}({1:[function(e,t,n){function r(){}function i(e,t,n){return function(){return o(e,[u.now()].concat(c(arguments)),t?null:this,n),t?void 0:this}}var o=e("handle"),a=e(6),c=e(7),f=e("ee").g

In [75]:
full2013 = doc2013.find(class_='full-list group col-1')
full2013_2 = doc2013.find(class_='full-list group col-2')
listofnames2013 = full2013.find_all('a')
listofnames2013_2 = full2013_2.find_all('a')

allnames2013 = []

count = 0

for listppl in listofnames2013:
    name = (listppl.text)
    links = listppl['href']
    link = 'content.time.com' + links
    count = count + 1
    #print (count, name, link)
    
    if 0 < count < 21:
        category = 'TITANS'
        
    if 20 < count < 44:
        category = 'LEADERS'
  
    if 43 < count < 59:
         category = 'ARTISTS'

    eachdict = {'Name': name, 
                 'Category': category,
                 'Year': '2013',
                 'Link': link}

    allnames2013.append(eachdict)

for listppl in listofnames2013_2:
    name = (listppl.text)
    links = listppl['href']
    link = 'content.time.com' + links
    count = count + 1
    
    #print (count, name, link)
    if 58 < count < 81:
        category = 'PIONEERS'
        
    if 80 < count < 101:
         category = 'ICONS'
    
    eachdict2 = {'Name': name, 
                 'Category': category,
                 'Year': '2013',
                 'Link': link}
    
    allnames2013.append(eachdict2)

allnames2013
    

[{'Name': 'Jay Z',
  'Category': 'TITANS',
  'Year': '2013',
  'Link': 'content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/jay-z/'},
 {'Name': 'Valerie Jarrett',
  'Category': 'TITANS',
  'Year': '2013',
  'Link': 'content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/valerie-jarrett/'},
 {'Name': 'Elon Musk',
  'Category': 'TITANS',
  'Year': '2013',
  'Link': 'content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/elon-musk/'},
 {'Name': 'Oh-Hyun Kwon',
  'Category': 'TITANS',
  'Year': '2013',
  'Link': 'content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/oh-hyun-kwon/'},
 {'Name': 'Scooter Braun',
  'Category': 'TITANS',
  'Year': '2013',
  'Link': 'content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/scooter-braun/'},
 {'Name': 'Kevin Systrom',
  'Category': 'TITANS',
  'Year': '2013',
  'Link': 'content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/kevin-systrom/'},
 {'Name': 'Michael Kors',
  'Cat

In [76]:
len(allnames2013)

94

In [77]:
df13 = pd.DataFrame(allnames2013)

df13

Unnamed: 0,Name,Category,Year,Link
0,Jay Z,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/jay-z/
1,Valerie Jarrett,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/valerie-jarrett/
2,Elon Musk,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/elon-musk/
3,Oh-Hyun Kwon,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/oh-hyun-kwon/
4,Scooter Braun,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/scooter-braun/
5,Kevin Systrom,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/kevin-systrom/
6,Michael Kors,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/michael-kors/
7,Palaniappan Chidambaram,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/palaniappan-chidambaram/
8,Ren Zhengfei,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/ren-zhengfei/
9,Ted Sarandos,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/ted-sarandos/


In [78]:
df13.to_csv('2013_Times_Scraped_List.csv',sep=',',index=False)

In [79]:
urlnames2013 = []
for eachname in df13.Name:
        eachname = eachname.replace(" ","+")
        urlnames2013.append(f"https://www.google.com/search?q={eachname}")
        
urlnames2013

['https://www.google.com/search?q=Jay+Z',
 'https://www.google.com/search?q=Valerie+Jarrett',
 'https://www.google.com/search?q=Elon+Musk',
 'https://www.google.com/search?q=Oh-Hyun+Kwon',
 'https://www.google.com/search?q=Scooter+Braun',
 'https://www.google.com/search?q=Kevin+Systrom',
 'https://www.google.com/search?q=Michael+Kors',
 'https://www.google.com/search?q=Palaniappan+Chidambaram',
 'https://www.google.com/search?q=Ren+Zhengfei',
 'https://www.google.com/search?q=Ted+Sarandos',
 'https://www.google.com/search?q=Gina+Rinehart',
 'https://www.google.com/search?q=Markus+Persson+and+Jens+Bergensten',
 'https://www.google.com/search?q=Igor+Sechin',
 'https://www.google.com/search?q=Tadashi+Yanai',
 'https://www.google.com/search?q=Sam+Yagan',
 'https://www.google.com/search?q=Shonda+Rhimes',
 'https://www.google.com/search?q=Lebron+James',
 'https://www.google.com/search?q=David+Einhorn',
 'https://www.google.com/search?q=Magnus+Carlsen',
 'https://www.google.com/search?q=Shery

In [80]:
driver = webdriver.Chrome()

In [81]:
import time

full_list2013 = []

for url in urlnames2013:
    driver.get(url)
    
    try:
        names = driver.find_element_by_class_name("SPZz6b").text 
        name = (names.split('\n'))[0]
        
    except: 
        name = 'Null'

    try:
        label = driver.find_element_by_class_name("wwUB2c").text
    except: 
        label = 'Null'

    try:
        born = driver.find_element_by_class_name("rVusze").text
    except: 
        born = 'Null'

    try:
        full_desc = driver.find_element_by_class_name("kno-rdesc").text
    except: 
        full_desc = 'Null'

    person_info = {'Name': name,
                    'Profession': label,
                    'Birth Info': born,
                    'Description': full_desc}
    
    full_list2013.append(person_info)
    
    time.sleep(1)

print (len(full_list2013))
full_list2013

94


[{'Name': 'Jay-Z',
  'Profession': 'American rapper',
  'Birth Info': 'Born: December 4, 1969 (age 51 years), Brooklyn, New York, NY',
  'Description': 'Description\nShawn Corey Carter, known professionally as Jay-Z, is an American rapper, songwriter, record executive, businessman, and record producer. He is widely regarded as one of the most influential hip-hop artists in history, and often cited as one of the greatest rappers of all time. Wikipedia'},
 {'Name': 'Valerie Jarrett',
  'Profession': 'American businesswoman',
  'Birth Info': 'Born: November 14, 1956 (age 64 years), Shiraz, Iran',
  'Description': 'Description\nValerie June Jarrett is an American businesswoman and former government official. She served as the senior advisor to U.S. President Barack Obama and assistant to the president for public engagement and intergovernmental affairs from 2009 to 2017. Wikipedia'},
 {'Name': 'Elon Musk',
  'Profession': 'CEO of SpaceX',
  'Birth Info': 'Net worth: 144.7 billion USD (2020

In [82]:
df13b = pd.DataFrame(full_list2013)
df13b

Unnamed: 0,Name,Profession,Birth Info,Description
0,Jay-Z,American rapper,"Born: December 4, 1969 (age 51 years), Brooklyn, New York, NY","Description\nShawn Corey Carter, known professionally as Jay-Z, is an American rapper, songwriter, record executive, businessman, and record producer. He is widely regarded as one of the most influential hip-hop artists in history, and often cited as one of the greatest rappers of all time. Wikipedia"
1,Valerie Jarrett,American businesswoman,"Born: November 14, 1956 (age 64 years), Shiraz, Iran",Description\nValerie June Jarrett is an American businesswoman and former government official. She served as the senior advisor to U.S. President Barack Obama and assistant to the president for public engagement and intergovernmental affairs from 2009 to 2017. Wikipedia
2,Elon Musk,CEO of SpaceX,"Net worth: 144.7 billion USD (2020) Forbes, Trending","Description\nElon Reeve Musk FRS is a business magnate, industrial designer and engineer. He is the founder, CEO, CTO and chief designer of SpaceX; early investor, CEO and product architect of Tesla, Inc.; founder of The Boring Company; co-founder of Neuralink; and co-founder and initial co-chairman of OpenAI. Wikipedia"
3,Kwon Oh-hyun,Null,"Born: October 15, 1952 (age 68 years), South Korea","Description\nKwon Oh-hyun is the Vice Chairman and CEO of Samsung Electronics. In 2013, Time Magazine added him to their top 100 list of most influential people. In October 2017, Dr. Kwon announced that he would resign in March 2018, citing an ""unprecedented crisis"" Wikipedia"
4,Scooter Braun,American media proprietor,"Born: June 18, 1981 (age 39 years), New York, NY","Description\nScott Samuel ""Scooter"" Braun is an American media proprietor, record executive, philanthropist, and investor. In 2013, Braun was included on the annual Time 100 list of the most influential people in the world and in 2020 Fortune magazine named him to its ""40 Under 40"" list in media and entertainment. Wikipedia"
5,Kevin Systrom,American computer programmer,"Born: December 30, 1983 (age 36 years), Holliston, MA","Description\nKevin Systrom is an American computer programmer and entrepreneur. He co‑founded Instagram, the world's largest photo sharing website, along with Mike Krieger. Systrom was included on the list of America's Richest Entrepreneurs Under 40 2016. Wikipedia"
6,Michael Kors,Honorary Chairman of Michael Kors,"Born: August 9, 1959 (age 61 years), Long Island, NY","Description\nMichael David Kors is an American fashion designer. He is the honorary chairman and chief creative officer of his brand, Michael Kors, which sells men's and women's ready-to-wear, accessories, watches, jewellery, footwear, and fragrance. Wikipedia"
7,P. Chidambaram,"Member of Parliament, Rajya Sabha","Born: September 16, 1945 (age 75 years), Kanadukathan, India","Description\nPalaniappan Chidambaram is an Indian politician and former attorney who currently serves as Member of Parliament, Rajya Sabha and formerly served as the Union Minister of Finance and Union Minister of Home Affairs of India. Wikipedia"
8,Ren Zhengfei,CEO of Huawei,"Born: October 25, 1944 (age 76 years), Zhenning Buyei and Miao Autonomous County, Anshun, China","Description\nRen Zhengfei is a Chinese entrepreneur and engineer. He is the founder and CEO of Shenzhen-based Huawei, the world's largest manufacturer of telecommunications equipment and second largest manufacturer of smartphones. As of February 2019, he had a net worth of US$1.3 billion. Wikipedia"
9,Ted Sarandos,Chief Executive Officer of Netflix,"Born: July 30, 1964 (age 56 years), Phoenix, AZ",Description\nTheodore Anthony Sarandos Jr. is an American businessman who serves as the Co-Chief Executive Officer and Chief Content Officer for Netflix. Sarandos oversees Netflix's annual budget of over $6 billion. Wikipedia


In [83]:
df13b['Birthplace'] = df13b['Birth Info'].str.extract(r"\), ([\w\W]+)")
df13b['Birthdate'] = df13b['Birth Info'].str.extract (r"Born: ([\w\s\d\,]+) [\(age]?")
df13b['Age'] = df13b['Birth Info'].str.extract(r"\(age ([\d]*) years\)")
df13b['Description'] = df13b.Description.str.replace('Description\n', '')
df13b['Description'] = df13b.Description.str.replace('Wikipedia', '')

try:
    df13b['Pronouns'] = df13b['Description'].str.extract(r"\b(she|he|her|his)(?i)\b")
    
except:
    pass


df13b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Jay-Z,American rapper,"Born: December 4, 1969 (age 51 years), Brooklyn, New York, NY","Shawn Corey Carter, known professionally as Jay-Z, is an American rapper, songwriter, record executive, businessman, and record producer. He is widely regarded as one of the most influential hip-hop artists in history, and often cited as one of the greatest rappers of all time.","Brooklyn, New York, NY","December 4, 1969",51.0,He
1,Valerie Jarrett,American businesswoman,"Born: November 14, 1956 (age 64 years), Shiraz, Iran",Valerie June Jarrett is an American businesswoman and former government official. She served as the senior advisor to U.S. President Barack Obama and assistant to the president for public engagement and intergovernmental affairs from 2009 to 2017.,"Shiraz, Iran","November 14, 1956",64.0,She
2,Elon Musk,CEO of SpaceX,"Net worth: 144.7 billion USD (2020) Forbes, Trending","Elon Reeve Musk FRS is a business magnate, industrial designer and engineer. He is the founder, CEO, CTO and chief designer of SpaceX; early investor, CEO and product architect of Tesla, Inc.; founder of The Boring Company; co-founder of Neuralink; and co-founder and initial co-chairman of OpenAI.",,,,He
3,Kwon Oh-hyun,Null,"Born: October 15, 1952 (age 68 years), South Korea","Kwon Oh-hyun is the Vice Chairman and CEO of Samsung Electronics. In 2013, Time Magazine added him to their top 100 list of most influential people. In October 2017, Dr. Kwon announced that he would resign in March 2018, citing an ""unprecedented crisis""",South Korea,"October 15, 1952",68.0,he
4,Scooter Braun,American media proprietor,"Born: June 18, 1981 (age 39 years), New York, NY","Scott Samuel ""Scooter"" Braun is an American media proprietor, record executive, philanthropist, and investor. In 2013, Braun was included on the annual Time 100 list of the most influential people in the world and in 2020 Fortune magazine named him to its ""40 Under 40"" list in media and entertainment.","New York, NY","June 18, 1981",39.0,
5,Kevin Systrom,American computer programmer,"Born: December 30, 1983 (age 36 years), Holliston, MA","Kevin Systrom is an American computer programmer and entrepreneur. He co‑founded Instagram, the world's largest photo sharing website, along with Mike Krieger. Systrom was included on the list of America's Richest Entrepreneurs Under 40 2016.","Holliston, MA","December 30, 1983",36.0,He
6,Michael Kors,Honorary Chairman of Michael Kors,"Born: August 9, 1959 (age 61 years), Long Island, NY","Michael David Kors is an American fashion designer. He is the honorary chairman and chief creative officer of his brand, Michael Kors, which sells men's and women's ready-to-wear, accessories, watches, jewellery, footwear, and fragrance.","Long Island, NY","August 9, 1959",61.0,He
7,P. Chidambaram,"Member of Parliament, Rajya Sabha","Born: September 16, 1945 (age 75 years), Kanadukathan, India","Palaniappan Chidambaram is an Indian politician and former attorney who currently serves as Member of Parliament, Rajya Sabha and formerly served as the Union Minister of Finance and Union Minister of Home Affairs of India.","Kanadukathan, India","September 16, 1945",75.0,
8,Ren Zhengfei,CEO of Huawei,"Born: October 25, 1944 (age 76 years), Zhenning Buyei and Miao Autonomous County, Anshun, China","Ren Zhengfei is a Chinese entrepreneur and engineer. He is the founder and CEO of Shenzhen-based Huawei, the world's largest manufacturer of telecommunications equipment and second largest manufacturer of smartphones. As of February 2019, he had a net worth of US$1.3 billion.","Zhenning Buyei and Miao Autonomous County, Anshun, China","October 25, 1944",76.0,He
9,Ted Sarandos,Chief Executive Officer of Netflix,"Born: July 30, 1964 (age 56 years), Phoenix, AZ",Theodore Anthony Sarandos Jr. is an American businessman who serves as the Co-Chief Executive Officer and Chief Content Officer for Netflix. Sarandos oversees Netflix's annual budget of over $6 billion.,"Phoenix, AZ","July 30, 1964",56.0,


In [84]:
df13b.to_csv("2013_Demographic_Data.csv",index=False)

In [85]:
time2013 = df13.join(df13b, rsuffix='_right')

time2013

Unnamed: 0,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Jay Z,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/jay-z/,Jay-Z,American rapper,"Born: December 4, 1969 (age 51 years), Brooklyn, New York, NY","Shawn Corey Carter, known professionally as Jay-Z, is an American rapper, songwriter, record executive, businessman, and record producer. He is widely regarded as one of the most influential hip-hop artists in history, and often cited as one of the greatest rappers of all time.","Brooklyn, New York, NY","December 4, 1969",51.0,He
1,Valerie Jarrett,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/valerie-jarrett/,Valerie Jarrett,American businesswoman,"Born: November 14, 1956 (age 64 years), Shiraz, Iran",Valerie June Jarrett is an American businesswoman and former government official. She served as the senior advisor to U.S. President Barack Obama and assistant to the president for public engagement and intergovernmental affairs from 2009 to 2017.,"Shiraz, Iran","November 14, 1956",64.0,She
2,Elon Musk,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/elon-musk/,Elon Musk,CEO of SpaceX,"Net worth: 144.7 billion USD (2020) Forbes, Trending","Elon Reeve Musk FRS is a business magnate, industrial designer and engineer. He is the founder, CEO, CTO and chief designer of SpaceX; early investor, CEO and product architect of Tesla, Inc.; founder of The Boring Company; co-founder of Neuralink; and co-founder and initial co-chairman of OpenAI.",,,,He
3,Oh-Hyun Kwon,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/oh-hyun-kwon/,Kwon Oh-hyun,Null,"Born: October 15, 1952 (age 68 years), South Korea","Kwon Oh-hyun is the Vice Chairman and CEO of Samsung Electronics. In 2013, Time Magazine added him to their top 100 list of most influential people. In October 2017, Dr. Kwon announced that he would resign in March 2018, citing an ""unprecedented crisis""",South Korea,"October 15, 1952",68.0,he
4,Scooter Braun,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/scooter-braun/,Scooter Braun,American media proprietor,"Born: June 18, 1981 (age 39 years), New York, NY","Scott Samuel ""Scooter"" Braun is an American media proprietor, record executive, philanthropist, and investor. In 2013, Braun was included on the annual Time 100 list of the most influential people in the world and in 2020 Fortune magazine named him to its ""40 Under 40"" list in media and entertainment.","New York, NY","June 18, 1981",39.0,
5,Kevin Systrom,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/kevin-systrom/,Kevin Systrom,American computer programmer,"Born: December 30, 1983 (age 36 years), Holliston, MA","Kevin Systrom is an American computer programmer and entrepreneur. He co‑founded Instagram, the world's largest photo sharing website, along with Mike Krieger. Systrom was included on the list of America's Richest Entrepreneurs Under 40 2016.","Holliston, MA","December 30, 1983",36.0,He
6,Michael Kors,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/michael-kors/,Michael Kors,Honorary Chairman of Michael Kors,"Born: August 9, 1959 (age 61 years), Long Island, NY","Michael David Kors is an American fashion designer. He is the honorary chairman and chief creative officer of his brand, Michael Kors, which sells men's and women's ready-to-wear, accessories, watches, jewellery, footwear, and fragrance.","Long Island, NY","August 9, 1959",61.0,He
7,Palaniappan Chidambaram,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/palaniappan-chidambaram/,P. Chidambaram,"Member of Parliament, Rajya Sabha","Born: September 16, 1945 (age 75 years), Kanadukathan, India","Palaniappan Chidambaram is an Indian politician and former attorney who currently serves as Member of Parliament, Rajya Sabha and formerly served as the Union Minister of Finance and Union Minister of Home Affairs of India.","Kanadukathan, India","September 16, 1945",75.0,
8,Ren Zhengfei,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/ren-zhengfei/,Ren Zhengfei,CEO of Huawei,"Born: October 25, 1944 (age 76 years), Zhenning Buyei and Miao Autonomous County, Anshun, China","Ren Zhengfei is a Chinese entrepreneur and engineer. He is the founder and CEO of Shenzhen-based Huawei, the world's largest manufacturer of telecommunications equipment and second largest manufacturer of smartphones. As of February 2019, he had a net worth of US$1.3 billion.","Zhenning Buyei and Miao Autonomous County, Anshun, China","October 25, 1944",76.0,He
9,Ted Sarandos,TITANS,2013,content.time.comhttps://time100.time.com/2013/04/18/time-100/slide/ted-sarandos/,Ted Sarandos,Chief Executive Officer of Netflix,"Born: July 30, 1964 (age 56 years), Phoenix, AZ",Theodore Anthony Sarandos Jr. is an American businessman who serves as the Co-Chief Executive Officer and Chief Content Officer for Netflix. Sarandos oversees Netflix's annual budget of over $6 billion.,"Phoenix, AZ","July 30, 1964",56.0,


In [86]:
time2013.to_csv("Time2013_Full_merged.csv")

## Time 2010

In [87]:
response = requests.get("http://content.time.com/time/specials/packages/completelist/0,29569,1984685,00.html")
doc2010 = BeautifulSoup(response.content, "html.parser")
doc2010.prettify()


'<!--[if IE 5]> Vignette StoryServer 5.0 Wed Mar 26 09:36:36 2014 <![endif]-->\n<!DOCTYPE html>\n<!--[if lt IE 7]> <html lang="en-us" class="no-js ie lt-ie9 lt-ie8 lt-ie7"> <![endif]-->\n<!--[if IE 7]> <html lang="en-us" class="no-js ie lt-ie9 lt-ie8"> <![endif]-->\n<!--[if IE 8]> <html lang="en-us" class="no-js ie lt-ie9"> <![endif]-->\n<!--[if gt IE 8]><!-->\n<html class="no-js" lang="en-us">\n <!--<![endif]-->\n <head>\n  <title>\n   Complete List - The 2010 TIME 100 - TIME\n  </title>\n  <link href="http://img.timeinc.net/time/favicon.ico" rel="shortcut icon"/>\n  <meta charset="utf-8"/>\n  <meta content="In our annual TIME 100 issue we name the people who most affect our world..." name="description">\n   <meta content="width=device-width, initial-scale=1.0" name="viewport"/>\n   <meta content="1124331582,512158401,808970553" property="fb:admins"/>\n   <meta content="53177223193" property="fb:app_id"/>\n   <meta content="In our annual TIME 100 issue we name the people who most affe

In [88]:
full2010 = doc2010.find_all(class_='full-list')[0]
listofnames2010 = full2010.find_all('a')

allnames2010 = []
count = 0

for listppl in listofnames2010[:100]:
    name = (listppl.text)
    links = listppl['href']
    link = 'www.content.time.com' + links
    count = count + 1

    
    category = ''
    
    if 0 < count < 26:
        category = 'LEADERS'
        
    if 25 < count < 51:
        category = 'ICONS'
    
    if 50 < count < 76:
        category = 'ARTISTS'
        
    if 75 < count < 101:
        category = 'PIONEERS'
    
    eachdict = {'Name': name, 
                'Category': category,
                'Year': '2010',
                'Link': link}
    
    allnames2010.append(eachdict)

allnames2010
    

[{'Name': 'Luiz Inácio Lula da Silva ',
  'Category': 'LEADERS',
  'Year': '2010',
  'Link': 'www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1984866,00.html'},
 {'Name': 'J.T. Wang ',
  'Category': 'LEADERS',
  'Year': '2010',
  'Link': 'www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985425,00.html'},
 {'Name': 'Admiral Mike Mullen ',
  'Category': 'LEADERS',
  'Year': '2010',
  'Link': 'www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985426,00.html'},
 {'Name': 'Barack Obama ',
  'Category': 'LEADERS',
  'Year': '2010',
  'Link': 'www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985427,00.html'},
 {'Name': 'Ron Bloom ',
  'Category': 'LEADERS',
  'Year': '2010',
  'Link': 'www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985429,00.html'},
 {'Name': 'Yukio Hatoyama ',
  'Category': 'LEADERS',
  'Year': '2010',
  'Link': 'www.content.time.com/ti

In [89]:
df10 = pd.DataFrame(allnames2010)


In [90]:
df10.to_csv('2010_Times_Scraped_List.csv',sep=',',index=False)

In [91]:
urlnames2010 = []
for eachname in df10.Name:
        eachname = eachname.replace(" ","+")
        urlnames2010.append(f"https://www.google.com/search?q={eachname}")
        
urlnames2010

['https://www.google.com/search?q=Luiz+Inácio+Lula+da+Silva+',
 'https://www.google.com/search?q=J.T.+Wang+',
 'https://www.google.com/search?q=Admiral+Mike+Mullen+',
 'https://www.google.com/search?q=Barack+Obama+',
 'https://www.google.com/search?q=Ron+Bloom+',
 'https://www.google.com/search?q=Yukio+Hatoyama+',
 'https://www.google.com/search?q=Dominique+Strauss-Kahn+',
 'https://www.google.com/search?q=Nancy+Pelosi+',
 'https://www.google.com/search?q=Sarah+Palin+',
 'https://www.google.com/search?q=Salam+Fayyad+',
 'https://www.google.com/search?q=Jon+Kyl+',
 'https://www.google.com/search?q=Glenn+Beck+',
 'https://www.google.com/search?q=Annise+Parker+',
 'https://www.google.com/search?q=Tidjane+Thiam+',
 'https://www.google.com/search?q=Jenny+Beth+Martin+',
 'https://www.google.com/search?q=Christine+Lagarde+',
 'https://www.google.com/search?q=Recep+Tayyip+Erdogan+',
 'https://www.google.com/search?q=General+Stanley+McChrystal+',
 'https://www.google.com/search?q=Manmohan+Singh

In [92]:
driver = webdriver.Chrome()

import time

full_list2010 = []

for url in urlnames2010:
    driver.get(url)
    
    try:
        names = driver.find_element_by_class_name("ZxoDOe").text 
        name = (names.split('\n'))[0]

    except:
        name = 'Null'

    try:
        label = driver.find_element_by_class_name("EGmpye").text
    except: 
        label = 'Null'

    try:
        born = driver.find_element_by_class_name("rVusze").text
    except: 
        born = 'Null'

    try:
        full_desc = driver.find_element_by_class_name("kno-rdesc").text
    except: 
        full_desc = 'Null'

    person_info = {'Name': name,
                    'Profession': label,
                    'Birth Info': born,
                    'Description': full_desc}
    
    full_list2010.append(person_info)
    
    time.sleep(1)

print (len(full_list2010))
full_list2010

100


[{'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: October 27, 1945 (age 75 years), Caetés, State of Pernambuco, Brazil',
  'Description': 'Description\nLuiz Inácio Lula da Silva, known as Lula, is a Brazilian politician and former union leader who served as the 35th President of Brazil from 1 January 2003 to 31 December 2010. Wikipedia'},
 {'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: July 1954 (age 66 years)',
  'Description': 'Null'},
 {'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: October 4, 1946 (age 74 years), Los Angeles County, CA',
  'Description': "Description\nMichael Glenn Mullen is a retired United States Navy admiral, who served as the 17th Chairman of the Joint Chiefs of Staff from October 1, 2007, to September 30, 2011. Mullen previously served as the Navy's 28th Chief of Naval Operations from July 22, 2005, to September 29, 2007. Wikipedia"},
 {'Name': 'Null',
  'Profession': 'Null',
  'Birth Info': 'Born: August 4, 1961

In [93]:
df10b = pd.DataFrame(full_list2010)

In [94]:
df10b['Birthplace'] = df10b['Birth Info'].str.extract(r"\), ([\w\W]+)")
df10b['Birthdate'] = df10b['Birth Info'].str.extract (r"Born: ([\w\s\d\,]+) [\(age]?")
df10b['Age'] = df10b['Birth Info'].str.extract(r"\(age ([\d]*) years\)")
df10b['Description'] = df10b.Description.str.replace('Description\n', '')
df10b['Description'] = df10b.Description.str.replace('Wikipedia', '')

try:
    df10b['Pronouns'] = df10b['Description'].str.extract(r"\b(she|he|her|his)(?i)\b")
    
except:
    pass


df10b

Unnamed: 0,Name,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Null,Null,"Born: October 27, 1945 (age 75 years), Caetés, State of Pernambuco, Brazil","Luiz Inácio Lula da Silva, known as Lula, is a Brazilian politician and former union leader who served as the 35th President of Brazil from 1 January 2003 to 31 December 2010.","Caetés, State of Pernambuco, Brazil","October 27, 1945",75.0,
1,Null,Null,Born: July 1954 (age 66 years),Null,,July 1954,66.0,
2,Null,Null,"Born: October 4, 1946 (age 74 years), Los Angeles County, CA","Michael Glenn Mullen is a retired United States Navy admiral, who served as the 17th Chairman of the Joint Chiefs of Staff from October 1, 2007, to September 30, 2011. Mullen previously served as the Navy's 28th Chief of Naval Operations from July 22, 2005, to September 29, 2007.","Los Angeles County, CA","October 4, 1946",74.0,
3,Null,Null,"Born: August 4, 1961 (age 59 years), Kapiʻolani Medical Center for Women & Children, Honolulu, HI","Barack Hussein Obama II is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. A member of the Democratic Party, Obama was the first African-American president of the United States.","Kapiʻolani Medical Center for Women & Children, Honolulu, HI","August 4, 1961",59.0,
4,Null,Null,"Born: 1956 (age 64 years), New York, NY",Ron Bloom was a senior official in the Obama Administration from February 2009 to August 2011. This included working as the Assistant to the President for Manufacturing Policy between February 2011 and ...,"New York, NY",1956,64.0,
5,Null,Null,"Born: February 11, 1947 (age 73 years), Bunkyo City, Tokyo, Japan",Yukio Hatoyama is a former Japanese politician who served as Prime Minister of Japan from 16 September 2009 to 8 June 2010. He was the first Prime Minister from the modern Democratic Party of Japan.,"Bunkyo City, Tokyo, Japan","February 11, 1947",73.0,He
6,Null,Null,"Born: April 25, 1949 (age 71 years), Neuilly-sur-Seine, France","Dominique Gaston André Strauss-Kahn is a French economist, politician, former managing director of the International Monetary Fund, and a controversial figure in the French Socialist Party due to his involvement in several financial and sexual scandals.","Neuilly-sur-Seine, France","April 25, 1949",71.0,his
7,Null,Null,"Born: March 26, 1940 (age 80 years), Baltimore, MD","Nancy Patricia Pelosi is an American politician serving as Speaker of the United States House of Representatives since 2019, and previously from 2007 to 2011. Pelosi has served as a U.S. representative from California since 1987.","Baltimore, MD","March 26, 1940",80.0,
8,Null,Null,"Born: February 11, 1964 (age 56 years), Sandpoint, ID","Sarah Louise Palin is an American retired politician, commentator, author, and reality television personality, who served as the ninth governor of Alaska from 2006 until her resignation in 2009.","Sandpoint, ID","February 11, 1964",56.0,her
9,Null,Null,"Born: April 12, 1952 (age 68 years), Deir al-Ghusun",Salam Fayyad is a Jordanian-Palestinian politician and former Prime Minister of the Palestinian Authority and Finance Minister. He was Finance Minister from June 2002 to November 2005 and from March 2007 to May 2012. Fayyad was Prime Minister between June 2007 and June 2013.,Deir al-Ghusun,"April 12, 1952",68.0,He


In [95]:
df10b.to_csv("2010_Demographic_Data.csv",index=False)

In [96]:
time2010 = df10.join(df10b, rsuffix='_right')

time2010

Unnamed: 0,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Luiz Inácio Lula da Silva,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1984866,00.html",Null,Null,"Born: October 27, 1945 (age 75 years), Caetés, State of Pernambuco, Brazil","Luiz Inácio Lula da Silva, known as Lula, is a Brazilian politician and former union leader who served as the 35th President of Brazil from 1 January 2003 to 31 December 2010.","Caetés, State of Pernambuco, Brazil","October 27, 1945",75.0,
1,J.T. Wang,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985425,00.html",Null,Null,Born: July 1954 (age 66 years),Null,,July 1954,66.0,
2,Admiral Mike Mullen,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985426,00.html",Null,Null,"Born: October 4, 1946 (age 74 years), Los Angeles County, CA","Michael Glenn Mullen is a retired United States Navy admiral, who served as the 17th Chairman of the Joint Chiefs of Staff from October 1, 2007, to September 30, 2011. Mullen previously served as the Navy's 28th Chief of Naval Operations from July 22, 2005, to September 29, 2007.","Los Angeles County, CA","October 4, 1946",74.0,
3,Barack Obama,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985427,00.html",Null,Null,"Born: August 4, 1961 (age 59 years), Kapiʻolani Medical Center for Women & Children, Honolulu, HI","Barack Hussein Obama II is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. A member of the Democratic Party, Obama was the first African-American president of the United States.","Kapiʻolani Medical Center for Women & Children, Honolulu, HI","August 4, 1961",59.0,
4,Ron Bloom,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985429,00.html",Null,Null,"Born: 1956 (age 64 years), New York, NY",Ron Bloom was a senior official in the Obama Administration from February 2009 to August 2011. This included working as the Assistant to the President for Manufacturing Policy between February 2011 and ...,"New York, NY",1956,64.0,
5,Yukio Hatoyama,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985436,00.html",Null,Null,"Born: February 11, 1947 (age 73 years), Bunkyo City, Tokyo, Japan",Yukio Hatoyama is a former Japanese politician who served as Prime Minister of Japan from 16 September 2009 to 8 June 2010. He was the first Prime Minister from the modern Democratic Party of Japan.,"Bunkyo City, Tokyo, Japan","February 11, 1947",73.0,He
6,Dominique Strauss-Kahn,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985437,00.html",Null,Null,"Born: April 25, 1949 (age 71 years), Neuilly-sur-Seine, France","Dominique Gaston André Strauss-Kahn is a French economist, politician, former managing director of the International Monetary Fund, and a controversial figure in the French Socialist Party due to his involvement in several financial and sexual scandals.","Neuilly-sur-Seine, France","April 25, 1949",71.0,his
7,Nancy Pelosi,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1985438,00.html",Null,Null,"Born: March 26, 1940 (age 80 years), Baltimore, MD","Nancy Patricia Pelosi is an American politician serving as Speaker of the United States House of Representatives since 2019, and previously from 2007 to 2011. Pelosi has served as a U.S. representative from California since 1987.","Baltimore, MD","March 26, 1940",80.0,
8,Sarah Palin,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1984871,00.html",Null,Null,"Born: February 11, 1964 (age 56 years), Sandpoint, ID","Sarah Louise Palin is an American retired politician, commentator, author, and reality television personality, who served as the ninth governor of Alaska from 2006 until her resignation in 2009.","Sandpoint, ID","February 11, 1964",56.0,her
9,Salam Fayyad,LEADERS,2010,"www.content.time.com/time/specials/packages/article/0,28804,1984685_1984864_1984897,00.html",Null,Null,"Born: April 12, 1952 (age 68 years), Deir al-Ghusun",Salam Fayyad is a Jordanian-Palestinian politician and former Prime Minister of the Palestinian Authority and Finance Minister. He was Finance Minister from June 2002 to November 2005 and from March 2007 to May 2012. Fayyad was Prime Minister between June 2007 and June 2013.,Deir al-Ghusun,"April 12, 1952",68.0,He


In [97]:
time2010.to_csv("Time2010_Full_merged.csv")

## Time 2007


In [98]:
response = requests.get("http://content.time.com/time/specials/packages/completelist/0,29569,1595326,00.html")
doc2007 = BeautifulSoup(response.content, "html.parser")
doc2007.prettify()

full2007 = doc2007.find_all(class_='full-list')[0]
listofnames2007 = full2007.find_all('a')

allnames2007 = []
count = 0

for listppl in listofnames2007[:100]:
    name = (listppl.text)
    links = listppl['href']
    link = 'content.time.com' + links
    count = count + 1
    

    category = ''
    
    if 0 < count < 23:
        category = 'ARTISTS'
        
    if 22 < count < 44:
        category = 'LEADERS'
  
    if 43 < count < 63:
         category = 'PIONEERS'

    if 62 < count < 82:
        category = 'ICONS AND THINKERS'
        
    if 81 < count < 101:
         category = 'TITANS'
    
    eachdict = {'Name': name, 
                 'Category': category,
                 'Year': '2007',
                 'Link': link}
    
    allnames2007.append(eachdict)

allnames2007
    
#--------------------
#len(allnames2007)

df07 = pd.DataFrame(allnames2007)


df07.to_csv('2007_Times_Scraped_List.csv',sep=',',index=False)
df07

urlnames2007 = []
for eachname in df07.Name:
        eachname = eachname.replace(" ","+")
        urlnames2007.append(f"https://www.google.com/search?q={eachname}")
        
urlnames2007

driver = webdriver.Chrome()



full_list2007 = []

for url in urlnames2007:
    driver.get(url)
    
    try:
        names = driver.find_element_by_class_name("ZxoDOe").text 
        name = (names.split('\n'))[0]

    except:
        name = 'Null'

    try:
        label = driver.find_element_by_class_name("EGmpye").text
    except: 
        label = 'Null'

    try:
        born = driver.find_element_by_class_name("rVusze").text
    except: 
        born = 'Null'

    try:
        full_desc = driver.find_element_by_class_name("kno-rdesc").text
    except: 
        full_desc = 'Null'

    person_info = {'Name': name,
                    'Profession': label,
                    'Birth Info': born,
                    'Description': full_desc}
    
    full_list2007.append(person_info)
    
    time.sleep(1)

print (len(full_list2007))
full_list2007


df07b = pd.DataFrame(full_list2007)
df07b

df07b['Birthplace'] = df07b['Birth Info'].str.extract(r"\), ([\w\W]+)")
df07b['Birthdate'] = df07b['Birth Info'].str.extract (r"Born: ([\w\s\d\,]+) [\(age]?")
df07b['Age'] = df07b['Birth Info'].str.extract(r"\(age ([\d]*) years\)")
df07b['Description'] = df07b.Description.str.replace('Description\n', '')
df07b['Description'] = df07b.Description.str.replace('Wikipedia', '')

try:
    df07b['Pronouns'] = df07b['Description'].str.extract(r"\b(she|he|her|his)(?i)\b")
    
except:
    pass



df07b.to_csv("2007_Demographic_Data.csv",index=False)


time2007 = df07.join(df07b, rsuffix='_right')
time2007.to_csv("Time2007_Full_merged.csv")



100


In [99]:
time2007

Unnamed: 0,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,Tina Fey,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1615973,00.html",Null,Null,"Born: May 18, 1970 (age 50 years), Upper Darby, PA","Elizabeth Stamatina ""Tina"" Fey is an American actress, comedian, writer, producer, and playwright. She is best known for her work on the NBC sketch comedy series Saturday Night Live and for creating the comedy series 30 Rock and Unbreakable Kimmy Schmidt.","Upper Darby, PA","May 18, 1970",50.0,She
1,Youssou N'Dour,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1615978,00.html",Null,Null,"Born: October 1, 1959 (age 61 years), Dakar, Senegal","Youssou N'Dour is a Senegalese singer, songwriter, composer, occasional actor, businessman, and politician. In 2004, Rolling Stone magazine described him as, ""perhaps the most famous singer alive"" in Senegal and much of Africa. From April 2012 to September 2013, he was Senegal's Minister of Tourism.","Dakar, Senegal","October 1, 1959",61.0,he
2,Anna Netrebko,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1615983,00.html",Null,Null,"Born: September 18, 1971 (age 49 years), Krasnodar, Russia","Anna Yuryevna Netrebko is a Russian-Austrian operatic soprano. Discovered and promoted by Valery Gergiev, she began her career at the Mariinsky Theatre, collaborating with the conductor in the theater and performances elsewhere.","Krasnodar, Russia","September 18, 1971",49.0,she
3,Justin Timberlake,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1615988,00.html",Null,Null,"Born: January 31, 1981 (age 39 years), Memphis, TN","Justin Randall Timberlake is an American singer, songwriter, actor, and record producer. Born and raised in Tennessee, he appeared on the television shows Star Search and The All-New Mickey Mouse Club as a child.","Memphis, TN","January 31, 1981",39.0,he
4,Sacha Baron Cohen,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1616201,00.html",Null,Null,"Born: October 13, 1971 (age 49 years), Hammersmith, London, United Kingdom","Sacha Noam Baron Cohen is an English actor, comedian, writer, and producer. He is known for his creation and portrayal of fictional satirical characters, including Ali G, Borat Sagdiyev, Brüno Gehard and Admiral General Aladeen. Baron Cohen adopts a variety of accents and guises for his characters.","Hammersmith, London, United Kingdom","October 13, 1971",49.0,He
5,Leonardo DiCaprio,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1616213,00.html",Null,Null,"Born: November 11, 1974 (age 46 years), Los Angeles, CA","Leonardo Wilhelm DiCaprio is an American actor, producer and environmentalist. He has often played unconventional roles, particularly in biopics and period films. As of 2019, his films have grossed US$7.2 billion worldwide, and he has placed eight times in annual rankings of the highest-paid actors in the world.","Los Angeles, CA","November 11, 1974",46.0,He
6,Nora Roberts,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1616215,00.html",Null,Null,"Born: October 10, 1950 (age 70 years), Silver Spring, MD",Nora Roberts is an American author of more than 225 romance novels. She writes as J. D. Robb for the in Death series and has also written under the pseudonyms Jill March and for publications in the U.K. as Sarah Hardesty.,"Silver Spring, MD","October 10, 1950",70.0,She
7,Rick Rubin,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1616413,00.html",Null,Null,"Born: March 10, 1963 (age 57 years), Lido Beach, NY","Frederick Jay ""Rick"" Rubin is an American record producer and former co-president of Columbia Records. Along with Russell Simmons, he is the co-founder of Def Jam Recordings and also established American Recordings.","Lido Beach, NY","March 10, 1963",57.0,he
8,Martin Scorsese,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1616423,00.html",Null,Null,"Born: November 17, 1942 (age 78 years), Flushing, New York, NY","Martin Charles Scorsese is an American film director, producer, screenwriter, and actor. One of the major figures of the New Hollywood era, he is widely regarded as one of the most significant and influential directors in film history.","Flushing, New York, NY","November 17, 1942",78.0,he
9,Cate Blanchett,ARTISTS,2007,"content.time.com/time/specials/2007/time100/article/0,28804,1595326_1595332_1616643,00.html",Null,Null,"Born: May 14, 1969 (age 51 years), Ivanhoe, Australia","Catherine Elise Blanchett AC is an Australian actress, producer and theatre director. One of the most acclaimed actresses of her generation, she is known for her wide range of roles across blockbusters, independent films and on the stage.","Ivanhoe, Australia","May 14, 1969",51.0,her


## Time 2004

In [100]:
response = requests.get("http://content.time.com/time/specials/packages/completelist/0,29569,1970858,00.html")
doc = BeautifulSoup(response.content, "html.parser")
doc.prettify()

full = doc.find_all(class_='full-list')[0]
listofnames = full.find_all('a')

allnames2004 = []
count = 0

for listppl in listofnames[:100]:
    name = (listppl.text)
    links = listppl['href']
    link = 'www.content.time.com' + links
    count = count + 1

    category = ''
    
    if 0 < count < 21:
        category = 'LEADERS'
        
    if 20 < count < 41:
        category = 'TITANS'
  
    if 40 < count < 61:
         category = 'ARTISTS'

    if 60 < count < 81:
        category = 'ICONS'
        
    if 80 < count < 101:
         category = 'PIONEERS'
    
    eachdict = {'Name': name, 
                 'Category': category,
                 'Year': '2004',
                 'Link': link}
    
    allnames2004.append(eachdict)

allnames2004
    

len(allnames2004)

df = pd.DataFrame(allnames2004)


df.to_csv('2004_Times_Scraped_List.csv',sep=',',index=False)


urlnames = []
for eachname in df.Name:
        eachname = eachname.replace(" ","+")
        urlnames.append(f"https://www.google.com/search?q={eachname}")
        
urlnames

driver = webdriver.Chrome()



import time

full_list = []

for url in urlnames:
    driver.get(url)
    
    try:
        names = driver.find_element_by_class_name("SPZz6b").text 
        name = (names.split('\n'))[0]

    except:
        name = 'Null'

    try:
        label = driver.find_element_by_class_name("wwUB2c").text
    except: 
        label = 'Null'

    try:
        born = driver.find_element_by_class_name("rVusze").text
    except: 
        born = 'Null'

    try:
        full_desc = driver.find_element_by_class_name("kno-rdesc").text
    except: 
        full_desc = 'Null'

    person_info = {'Name': name,
                    'Profession': label,
                    'Birth Info': born,
                    'Description': full_desc}
    
    full_list.append(person_info)
    
    time.sleep(1)

print (len(full_list))
full_list



df2 = pd.DataFrame(full_list)
df2

df2['Birthplace'] = df2['Birth Info'].str.extract(r"\), ([\w\W]+)")
df2['Birthdate'] = df2['Birth Info'].str.extract (r"Born: ([\w\s\d\,]+) [\(age]?")
df2['Age'] = df2['Birth Info'].str.extract(r"\(age ([\d]*) years\)")
df2['Description'] = df2.Description.str.replace('Description\n', '')
df2['Description'] = df2.Description.str.replace('Wikipedia', '')

try:
    df2['Pronouns'] = df2['Description'].str.extract(r"\b(she|he|her|his)(?i)\b")
    
except:
    pass





df2.to_csv("2004_Demographic_Data.csv",index=False)

time2004 = df.join(df2, rsuffix='_right')


time2004.to_csv("Time2004_Full_merged.csv")


time2004







100


Unnamed: 0,Name,Category,Year,Link,Name_right,Profession,Birth Info,Description,Birthplace,Birthdate,Age,Pronouns
0,George W. Bush,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971022,00.html",George W. Bush,43rd U.S. President,"Born: July 6, 1946 (age 74 years), New Haven, CT","George Walker Bush is an American politician and businessman who served as the 43rd president of the United States from 2001 to 2009. A member of the Republican Party, he had previously served as the 46th governor of Texas from 1995 to 2000.","New Haven, CT","July 6, 1946",74.0,he
1,Hu Jintao,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971028,00.html",Hu Jintao,Former President of the People's Republic of China,"Born: December 21, 1942 (age 77 years), Taizhou, China","Hu Jintao is a Chinese politician, who was General Secretary of the Chinese Communist Party from 2002 to 2012, President of the People's Republic of China from 2003 to 2013 and Chairman of the Central Military Commission from 2004 to 2012.","Taizhou, China","December 21, 1942",77.0,
2,Luiz Inácio Lula da Silva,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971031,00.html",Luiz Inácio Lula da Silva (Lula),Former President of Brazil,"Born: October 27, 1945 (age 75 years), Caetés, State of Pernambuco, Brazil","Luiz Inácio Lula da Silva, known as Lula, is a Brazilian politician and former union leader who served as the 35th President of Brazil from 1 January 2003 to 31 December 2010.","Caetés, State of Pernambuco, Brazil","October 27, 1945",75.0,
3,Ali Husaini Sistani,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971034,00.html",Ali al-Sistani,Ayatollah,"Born: August 4, 1930 (age 90 years), Mashhad, Iran","Grand Ayatollah Sayyid Ali al-Husayni al-Sistani, commonly known as Ayatollah Sistani, is one of the most influential Iraqi Shia marja' of Iranian origins living in Iraq. He is described as the leading spiritual leader of Iraqi Shia Muslims, and one of the most senior clerics in Shia Islam.","Mashhad, Iran","August 4, 1930",90.0,He
4,Toshihiko Fukui,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971048,00.html",Toshihiko Fukui,Japanese economist,"Born: September 7, 1935 (age 85 years), Osaka, Osaka, Japan",Toshihiko Fukui is a Japanese economist and central banker. He was the 29th Governor of the Bank of Japan and a Director of the Bank for International Settlements.,"Osaka, Osaka, Japan","September 7, 1935",85.0,He
5,Abu al-Zarqawi,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971049,00.html",Abū Muṣʻab Zarqāwī,Null,"Born: October 30, 1966, Zarqa, Jordan","Abu Musab al-Zarqawi, born Ahmad Fadeel al-Nazal al-Khalayleh, was a Jordanian jihadist who ran a terrorist training camp in Afghanistan.",,"October 30, 1966, Zarqa,",,
6,Kofi Annan,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971056,00.html",Kofi Annan,Ghanaian diplomat,"Born: April 8, 1938, Kumasi, Ghana",Kofi Atta Annan was a Ghanaian diplomat who served as the seventh Secretary-General of the United Nations from January 1997 to December 2006. Annan and the UN were the co-recipients of the 2001 Nobel Peace Prize.,,"April 8, 1938, Kumasi,",,
7,Condoleeza Rice,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971057,00.html",Condoleezza Rice,Former United States Secretary of State,"Born: November 14, 1954 (age 66 years), Birmingham, AL","Condoleezza ""Condi"" Rice is an American diplomat, political scientist, civil servant, and professor who is the current Director of the Hoover Institution at Stanford University.","Birmingham, AL","November 14, 1954",66.0,
8,Recep Tayyip Erdogan,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971059,00.html",Recep Tayyip Erdoğan,President of Turkey,"Born: February 26, 1954 (age 66 years), Kasımpaşa, Beyoğlu, Turkey",Recep Tayyip Erdoğan is a Turkish politician serving as the current President of Turkey. He previously served as Prime Minister of Turkey from 2003 to 2014 and as Mayor of Istanbul from 1994 to 1998.,"Kasımpaşa, Beyoğlu, Turkey","February 26, 1954",66.0,He
9,John Abizaid,LEADERS,2004,"www.content.time.com/time/specials/packages/article/0,28804,1970858_1970888_1971062,00.html",John Abizaid,United States Ambassador to Saudi Arabia,"Born: April 1, 1951 (age 69 years), Redwood City, CA","John Philip Abizaid is a retired United States Army general and former U.S. Central Command commander who is currently the United States Ambassador to Saudi Arabia. Abizaid retired after 34 years of service. As of 2007, Abizaid is employed as a fellow of the Hoover Institution at Stanford University.","Redwood City, CA","April 1, 1951",69.0,
