<h1>Scraping Artist Data</h1>
<ul>
    <li>First, prepare the packages at the core of this scraping process: pandas, selenium and bs4 (BeautifulSoup). Install any that are missing with pip (pip install "package_name"; on PyPI, bs4 is published as beautifulsoup4)</li>
    <li>Second, because I use the Chrome web browser, I need to download a chromedriver (http://chromedriver.chromium.org/downloads) that matches the Chrome version installed on the computer (to check it: open Chrome, open the menu, choose Help, then About Google Chrome)</li>
    <li>Third, because we want data about actors and actresses, we first need a list of their names (https://www.imdb.com/search/name/?gender=male,female&ref_=rlm)</li>
    <li>Fourth, once we have the list of names, we look up each person's biographical data on Wikipedia, using the name as the page keyword (https://en.wikipedia.org/wiki/name)</li>
    <li>Fifth, convert the collected data into a file with a .csv or .xlsx extension</li>
</ul>
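Step 4 comes down to turning a name into a Wikipedia URL. The notebook replaces spaces with underscores; the minimal sketch below also percent-encodes non-ASCII characters with `urllib.parse.quote`, which is an addition of mine rather than part of the original code. The helper name `wikipedia_url` is hypothetical, and it assumes the English Wikipedia article title equals the artist's name.

```python
from urllib.parse import quote

# Assumption: the article lives on English Wikipedia under the artist's exact name
WIKI_BASE = "https://en.wikipedia.org/wiki/"

def wikipedia_url(name):
    """Hypothetical helper: build a Wikipedia article URL from a person's name.

    Spaces become underscores (as in the notebook); quote() then
    percent-encodes non-ASCII characters such as accented letters.
    """
    title = name.strip().replace(" ", "_")
    return WIKI_BASE + quote(title)

print(wikipedia_url("Elizabeth Olsen"))  # https://en.wikipedia.org/wiki/Elizabeth_Olsen
```

Note that a name can still land on a disambiguation page or on a different person with the same name; the URL alone cannot detect that.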

In [1]:
from bs4 import BeautifulSoup
from selenium import webdriver
import re
import datetime
import csv
import time  # only needed if the time.sleep() calls below are enabled

<h5>Initializing the base page URL and the driver</h5>

In [19]:
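# Initialize the page that we want to scrape
url_page = 'https://www.imdb.com/search/name/?gender=male,female&start=1&ref_=rlm'
# Start Chrome through chromedriver; the driver version must match the installed Chrome version.
# Selenium 4 takes the driver path through a Service object
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(r'External File\chromedriver.exe'))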

In [20]:
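# Go to the web page at the base URL
driver.get(url_page)
# Optionally pause a few seconds (needs `import time`) so the page can finish loading
# and requests are spaced out, which may help avoid being flagged as an automated client
#time.sleep(5)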

<h5>Once the Chrome driver has opened the desired page, the scraping can begin</h5>

In [7]:
list_artist = []
result_soup = BeautifulSoup(driver.page_source, 'html.parser')
result_soup.title.string # The page title confirms we are scraping IMDb's list of male/female artists
# print(result_soup.prettify()) # Print the full page source

'IMDb: Males/Females\n(Sorted by Popularity Ascending) - IMDb'

<h5>Next, we locate the list of artists on the page</h5>

In [8]:
# Each artist entry on the page is a <div class="lister-item mode-detail">
list_item = result_soup.find_all("div", {"class": "lister-item mode-detail"})

In [9]:
for item_get in list_item:
    # The artist's name is the text of the first link in the item's header
    list_artist.append(item_get.select("[class~=lister-item-header]")[0].select("a")[0].string.lstrip().replace('\n',''))

<h5>Total number of artists collected so far</h5>

In [10]:
print(len(list_artist))

50


---------------------------------

<h5>This time we want all of the data without changing the page by hand, so we let the automated browser step through the result pages</h5>

In [11]:
# limit = 6094592 # Total number of artists listed on IMDb at the time of writing (6,094,592); change the limit as needed
limit = 1000 # Take only 1,000 names as an example; the data gathered for them is saved to a .csv file later
while len(list_artist) < limit:
    # IMDb shows 50 names per page; 'start' is the 1-based position of the first name on the page
    url_page = 'https://www.imdb.com/search/name/?gender=male,female&start='+str((len(list_artist)+1))+'&ref_=rlm'
    driver.get(url_page)
    #time.sleep(5)
    result_soup = BeautifulSoup(driver.page_source, 'html.parser')
    
    # Find the artist entries on this page
    list_item = result_soup.find_all("div", {"class": "lister-item mode-detail"})
    if not list_item:
        # Stop when a page has no entries; otherwise the same page would be requested forever
        break
    
    for item_get in list_item:
        list_artist.append(item_get.select("[class~=lister-item-header]")[0].select("a")[0].string.lstrip().replace('\n',''))
        
    # Running total of names collected so far
    print("Total Current Data Name that we get -",len(list_artist))

Total Current Data Name that we get - 100
Total Current Data Name that we get - 150
Total Current Data Name that we get - 200
Total Current Data Name that we get - 250
Total Current Data Name that we get - 300
Total Current Data Name that we get - 350
Total Current Data Name that we get - 400
Total Current Data Name that we get - 450
Total Current Data Name that we get - 500
Total Current Data Name that we get - 550
Total Current Data Name that we get - 600
Total Current Data Name that we get - 650
Total Current Data Name that we get - 700
Total Current Data Name that we get - 750
Total Current Data Name that we get - 800
Total Current Data Name that we get - 850
Total Current Data Name that we get - 900
Total Current Data Name that we get - 950
Total Current Data Name that we get - 1000
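Because each results page holds 50 names, the `start` values form a simple arithmetic sequence. The sketch below precomputes the page URLs, assuming the page size stays at 50 and the URL format used above keeps working; the helper name `imdb_page_urls` is hypothetical. A fixed list of URLs also means the number of requests is known up front.

```python
from typing import List

# Assumption: 50 names per results page, as the outputs above show
PAGE_SIZE = 50
BASE = "https://www.imdb.com/search/name/?gender=male,female&start={}&ref_=rlm"

def imdb_page_urls(limit: int, page_size: int = PAGE_SIZE) -> List[str]:
    """Hypothetical helper: the list-page URLs needed to collect `limit` names.

    'start' is the 1-based position of the first name on each page (1, 51, 101, ...),
    the same value the loop computes as len(list_artist) + 1.
    """
    return [BASE.format(start) for start in range(1, limit + 1, page_size)]

urls = imdb_page_urls(1000)
print(len(urls))  # 20
print(urls[1])    # ends with start=51&ref_=rlm
```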


<h5>Once the automation finishes (or after you interrupt the loop because the list is too long), this is what we end up with</h5>

In [21]:
#print(list_artist) 
print(len(list_artist))

1000


---

<h4>With step 3 (collecting the names) complete, it is time to retrieve biographical data for each saved artist name</h4>

In [13]:
dictionary_bio = {'full_name':'','birth_day':'','birth_place':'','age':'','occupation':'','year_active':'','spouse':'',
                  'total_marriage':'','total_divorced':'','partners':'','children':'','alma_mater':'','relatives':'',
                  'total_won':'','total_nominated':''}
list_dictionary_bio = []
index_below_name = 0

In [None]:
# url_page = 'https://en.wikipedia.org/wiki/'+list_artist[1].replace(' ','_')
url_page = 'https://en.wikipedia.org/wiki/Elizabeth_Olsen'
# url_page = 'https://en.wikipedia.org/wiki/Tom_Hardy'
driver.get(url_page)

In [None]:
result_soup = BeautifulSoup(driver.page_source, 'html.parser')
result_soup.title.string

In [None]:
all_bio = result_soup.find("table", {"class": "infobox biography vcard"})
if all_bio is not None:
    all_bio = all_bio.select('tr')

In [None]:
def update_data_dict(name_section, value_section):
    # Store value_section in dictionary_bio under the field matching name_section,
    # which is either an infobox label (e.g. 'Spouse(s)') or an internal key (e.g. 'age')
    if name_section == 'full_name':
        dictionary_bio['full_name'] = value_section
    elif name_section == 'birth_day':
        dictionary_bio['birth_day'] = value_section
    elif name_section == 'birth_place':
        dictionary_bio['birth_place'] = value_section
    elif name_section == 'age':
        dictionary_bio['age'] = value_section
    elif name_section.strip().replace('\xa0', ' ') == 'Years active':
        dictionary_bio['year_active'] = value_section
    elif name_section == 'Occupation':
        dictionary_bio['occupation'] = value_section
    elif name_section == 'Spouse(s)':
        dictionary_bio['spouse'] = value_section.replace('\u200b', '').replace('\xa0', ' ')
    elif name_section == 'total_marriage':
        dictionary_bio['total_marriage'] = value_section
    elif name_section == 'total_divorced':
        dictionary_bio['total_divorced'] = value_section
    elif name_section == 'Partner(s)':
        dictionary_bio['partners'] = value_section
    elif name_section == 'Children':
        dictionary_bio['children'] = re.sub("[^0-9^.]", "", value_section)
    elif name_section.strip().replace('\xa0', ' ') == 'Alma mater':
        dictionary_bio['alma_mater'] = value_section
    elif name_section == 'Relatives':
        dictionary_bio['relatives'] = value_section
    elif name_section == 'total_won':
        dictionary_bio['total_won'] = value_section
    elif name_section == 'total_nominated':
        dictionary_bio['total_nominated'] = value_section
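The long `if`/`elif` chain is essentially a lookup from infobox label to field name. A sketch of the same idea as a dictionary, with a hypothetical `update_bio` that receives the target dict explicitly instead of relying on the global `dictionary_bio`; the label strings are copied from the function above, and the clean-up rules follow its first version:

```python
import re

# Hypothetical lookup table: infobox label (or internal key) -> field name in the bio dict
FIELD_FOR_LABEL = {
    "full_name": "full_name", "birth_day": "birth_day", "birth_place": "birth_place",
    "age": "age", "Years active": "year_active", "Occupation": "occupation",
    "Spouse(s)": "spouse", "total_marriage": "total_marriage",
    "total_divorced": "total_divorced", "Partner(s)": "partners",
    "Children": "children", "Alma mater": "alma_mater", "Relatives": "relatives",
    "total_won": "total_won", "total_nominated": "total_nominated",
}

def update_bio(bio, label, value):
    """Store value in bio under the field mapped from label; unknown labels are ignored."""
    # Infobox labels may contain non-breaking spaces, e.g. "Years\xa0active"
    key = FIELD_FOR_LABEL.get(label.replace("\xa0", " ").strip())
    if key is None:
        return
    if key == "spouse" and isinstance(value, str):
        # Drop zero-width spaces and normalize non-breaking spaces
        value = value.replace("\u200b", "").replace("\xa0", " ")
    elif key == "children" and isinstance(value, str):
        # Keep only digits and dots, close to the regex in update_data_dict
        value = re.sub(r"[^0-9.]", "", value)
    bio[key] = value

bio = {}
update_bio(bio, "Years\xa0active", "2008-present")
update_bio(bio, "Awards", "Full list")  # no matching field, so nothing is stored
print(bio)  # {'year_active': '2008-present'}
```

Adding a new field then means adding one entry to the table rather than another branch.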

In [None]:
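# The award lookup at the end of this cell expects these names, which the full loop
# further below sets for each artist; define them here for the single test page
check_full_list_awards = ''
artist_choosen = 'Elizabeth_Olsen'
for index, artist_bio in enumerate(all_bio):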
    if len(all_bio[index].select('th')) > 0 and all_bio[index].select('th')[0].text == 'Born':
        update_data_dict('full_name', re.sub("[^A-Z a-z^.]", "", all_bio[index].select('[class~=nickname]')[0].text))
        update_data_dict('birth_day', all_bio[index].select('[class~=bday]')[0].string)
        update_data_dict('birth_place', all_bio[index].select('[class~=birthplace]')[0].text)
        update_data_dict('age', datetime.datetime.now().year - datetime.datetime.strptime(all_bio[index].select('[class~=bday]')[0].string, '%Y-%m-%d').year)
        
        index_below_name = index
        # Print out
        print("Section :"+all_bio[index].select('th')[0].string)
        print("Full Name :"+re.sub("[^A-Z a-z^.]", "", all_bio[index].select('[class~=nickname]')[0].text))
        print("Date of Birth :"+all_bio[index].select('[class~=bday]')[0].string)
        print("Age :"+re.sub("[^0-9^.]", "", all_bio[index].select('[class~=noprint]')[0].string.lstrip()))
        print("Place of Birth :"+all_bio[index].select('[class~=birthplace]')[0].text)
        
    elif index > index_below_name:
        temp_list = []
        if len(all_bio[index]) > 1:
            if len(all_bio[index].select('td')[0].select('a')) > 1:
                for iteration_tag_a in range(len(all_bio[index].select('td')[0].select('a'))):
                    temp_list.append(all_bio[index].select('td')[0].select('a')[iteration_tag_a].string)
                
                update_data_dict(all_bio[index].select('th')[0].string, ', '.join(temp_list))
                print("Section :"+all_bio[index].select('th')[0].string)
                print("Value = "+', '.join(temp_list))
            elif len(all_bio[index].select('td')[0].select('li')) > 1:
                if all_bio[index].select('td')[0].select('li')[0].string != None:
                    for iteration_tag_a in range(len(all_bio[index].select('td')[0].select('li'))):
                        temp_list.append(all_bio[index].select('td')[0].select('li')[iteration_tag_a].string)
                    update_data_dict(all_bio[index].select('th')[0].string, ', '.join(temp_list))
                    print("Section :"+all_bio[index].select('th')[0].string)
                    print("Value = "+', '.join(temp_list))
                else:
                    count_maried = 0
                    count_divorced = 0
                    for iteration_tag_a in range(len(all_bio[index].select('td')[0].select('li'))):
                        spouse_item = all_bio[index].select('td')[0].select('div')[0].select('li')[iteration_tag_a]
                        # Text inside the parentheses, e.g. "m. 1981; div. 1996"
                        marriage_note = re.search(r'\(([^)]+)', spouse_item.text).group(1)
                        if 'm.' in marriage_note:
                            count_maried += 1
                        if 'div.' in marriage_note:
                            count_divorced += 1
                        temp_list.append(spouse_item.select('div')[1].string+" ("+marriage_note+")")
                        print(spouse_item.select('div')[1].string+" ("+marriage_note+")")

                    #print("Total Married :",count_maried," | Total Divorced :",count_divorced)
                    update_data_dict(all_bio[index].select('th')[0].string, ', '.join(temp_list))

                    update_data_dict('total_marriage', count_maried)
                    update_data_dict('total_divorced', count_divorced)
            else:
                if len(all_bio[index].select('th')) > 0:
                    if all_bio[index].select('td')[0].text == 'Full list':
                        update_data_dict(all_bio[index].select('th')[0].string, all_bio[index].select('td')[0].find_all(href=True)[0]['href'])
                        print("Section :"+all_bio[index].select('th')[0].string)
                        print("Value : "+"https://en.wikipedia.org"+all_bio[index].select('td')[0].find_all(href=True)[0]['href'])
                    else:
                        count_maried = 0
                        count_divorced = 0
                        update_data_dict(all_bio[index].select('th')[0].string, all_bio[index].select('td')[0].text)
                        print("Section :"+all_bio[index].select('th')[0].string)
                        print("Value : "+all_bio[index].select('td')[0].get_text(strip=True))
                        if 'm.' in all_bio[index].select('td')[0].text and all_bio[index].select('th')[0].string == 'Spouse(s)':
                            count_maried = 1
                            update_data_dict('total_marriage', count_maried)
                        if 'div.' in all_bio[index].select('td')[0].text and all_bio[index].select('th')[0].string == 'Spouse(s)':
                            count_divorced = 1
                            update_data_dict('total_divorced', count_divorced)
                        
# Count awards won and nominations in the page's award tables; if none are found,
# fall back to the artist's separate "List of awards and nominations" page
if check_full_list_awards != '' and len(result_soup.findAll("table", {"class": "wikitable"})) > 0:
    total_win = 0
    total_nominated = 0
    total_awards = result_soup.findAll("table", {"class": "wikitable"})
    for awards in total_awards:
        total_win += len(awards.findAll("td", {"class": "yes table-yes2"}))
        total_nominated += len(awards.findAll("td", {"class": "no table-no2"}))
    update_data_dict('total_won', total_win)
    update_data_dict('total_nominated', total_nominated)
if dictionary_bio.get('total_won') == '' or dictionary_bio.get('total_won') == 0:
    driver.get('https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_'+artist_choosen) # Name Artist
    result_soup = BeautifulSoup(driver.page_source, 'html.parser')
    total_win = 0
    total_nominated = 0
    total_awards = result_soup.findAll("table", {"class": "wikitable"})
    for awards in total_awards:
        total_win += len(awards.findAll("td", {"class": "yes table-yes2"}))
        total_nominated += len(awards.findAll("td", {"class": "no table-no2"}))
    update_data_dict('total_won', total_win)
    update_data_dict('total_nominated', total_nominated)
    
# Clear the dictionary
# dictionary_bio = {}
print(dictionary_bio)
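Both the exploratory cell and the full loop count marriages by checking for `m.` and `div.` in each spouse's parenthesised note, and the loop's last branch sets each count to at most 1. A text-based sketch that counts every note in the spouse field instead; the helper name `count_marriages` is hypothetical, and it assumes the `(m. YEAR; div. YEAR)` format seen in the scraped output:

```python
import re

def count_marriages(spouse_text):
    """Hypothetical helper: (marriages, divorces) found in infobox spouse text.

    Assumes each spouse is followed by a note in parentheses such as
    "(m. 1981; div. 1996)", the format visible in the scraped output.
    """
    notes = re.findall(r"\(([^)]*)\)", spouse_text)
    # \b keeps "m." from matching inside other words
    married = sum(1 for note in notes if re.search(r"\bm\.", note))
    divorced = sum(1 for note in notes if re.search(r"\bdiv\.", note))
    return married, divorced

print(count_marriages("Partner A (m. 1981; div. 1996) Partner B (m. 1997)"))  # (2, 1)
```

Parentheses that contain neither marker (for example a birth year) are simply ignored.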

In [None]:
print(check_full_list_awards != '' and len(result_soup.findAll("table", {"class": "wikitable"})) > 0)
print(dictionary_bio.get('total_won') == '' or dictionary_bio.get('total_won') == 0)
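The award totals rely on two CSS classes: cells with `yes table-yes2` are counted as wins and cells with `no table-no2` as nominations (if the usual Wikipedia award templates are used, these are nominations that did not win). The sketch below performs the same count with Python's built-in `html.parser`, swapped in for BeautifulSoup so it runs without dependencies; it assumes Wikipedia still marks results with these classes, and unlike the notebook it counts matching cells anywhere in the HTML, not only inside `wikitable` tables.

```python
from html.parser import HTMLParser

class AwardCellCounter(HTMLParser):
    """Count <td> cells carrying the won / nominated classes used in the notebook."""

    def __init__(self):
        super().__init__()
        self.won = 0
        self.nominated = 0

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        # attrs is a list of (name, value) pairs; value can be None
        classes = set((dict(attrs).get("class") or "").split())
        if {"yes", "table-yes2"} <= classes:
            self.won += 1
        elif {"no", "table-no2"} <= classes:
            self.nominated += 1

counter = AwardCellCounter()
counter.feed('<table class="wikitable"><tr><td class="yes table-yes2">Won</td>'
             '<td class="no table-no2">Nominated</td><td>2019</td></tr></table>')
print(counter.won, counter.nominated)  # 1 1
```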

---

<h5>To avoid clicking through pages by hand, loop over every artist name we collected</h5>

In [22]:
dictionary_bio = {'full_name':'','birth_day':'','birth_place':'','age':'','occupation':'','year_active':'','spouse':'',
                  'total_marriage':'','total_divorced':'','partners':'','children':'','alma_mater':'','relatives':'',
                  'total_won':'','total_nominated':''}
check_full_list_awards = ''
list_dictionary_bio = []
index_below_name = 0

In [23]:
def update_data_dict(name_section, value_section):
    if isinstance(value_section, str):
        value_section = value_section.strip()
    if name_section == 'full_name':
        dictionary_bio['full_name'] = value_section
    elif name_section == 'birth_day':
        dictionary_bio['birth_day'] = value_section
    elif name_section == 'birth_place':
        dictionary_bio['birth_place'] = value_section
    elif name_section == 'age':
        dictionary_bio['age'] = value_section
    elif name_section.strip().replace('\xa0', ' ') == 'Years active':
        dictionary_bio['year_active'] = value_section
    elif name_section == 'Occupation':
        dictionary_bio['occupation'] = value_section.replace('\n', ',')
    elif name_section == 'Spouse(s)':
        dictionary_bio['spouse'] = value_section
    elif name_section == 'total_marriage':
        dictionary_bio['total_marriage'] = value_section
    elif name_section == 'total_divorced':
        dictionary_bio['total_divorced'] = value_section
    elif name_section == 'Partner(s)':
        dictionary_bio['partners'] = value_section
    elif name_section == 'Children':
        dictionary_bio['children'] = re.sub("[^0-9^.]", "", value_section)
    elif name_section.strip().replace('\xa0', ' ') == 'Alma mater':
        dictionary_bio['alma_mater'] = value_section
    elif name_section == 'Relatives':
        dictionary_bio['relatives'] = value_section
    elif name_section == 'total_won':
        dictionary_bio['total_won'] = value_section
    elif name_section == 'total_nominated':
        dictionary_bio['total_nominated'] = value_section
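For the `Children` field, `re.sub("[^0-9^.]", "", value_section)` keeps every digit in the cell, so if the text carries a numeric footnote marker such as `2[1]` the result becomes `21`. A sketch that keeps only the first whole number instead; the helper name `parse_children` is hypothetical, and a cell that describes children in words rather than as a count yields `None`:

```python
import re
from typing import Optional

def parse_children(value: str) -> Optional[int]:
    """Hypothetical helper: the first whole number in a 'Children' value, or None.

    Later numbers, such as footnote markers ("2[1]") or years, are ignored.
    """
    match = re.search(r"\d+", value)
    return int(match.group()) if match else None

print(parse_children("2[1]"))  # 2
```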

In [None]:
for name_artist_review in list_artist:
    # Reset per-artist state so values from the previous artist do not carry over
    index_below_name = 0
    check_full_list_awards = ''
    artist_choosen = name_artist_review.replace(' ','_')
    url_page = 'https://en.wikipedia.org/wiki/'+artist_choosen
    # Go to the Wikipedia page of this artist
    driver.get(url_page)
    # Get the page source and parse it with BeautifulSoup
    result_soup = BeautifulSoup(driver.page_source, 'html.parser')
    all_bio = result_soup.find("table", {"class": "infobox biography vcard"})
    # Artists without a biography infobox are skipped, so fewer records than names are saved
    if all_bio is not None:
        all_bio = all_bio.select('tr')
        for index, artist_bio in enumerate(all_bio):
            if len(all_bio[index].select('th')) > 0 and all_bio[index].select('th')[0].text == 'Born':
                if len(all_bio[index].select('[class~=nickname]')) > 0:
                    update_data_dict('full_name', re.sub("[^A-Z a-z^.]", "", all_bio[index].select('[class~=nickname]')[0].text))
                else:
                    update_data_dict('full_name', name_artist_review)
                if len(all_bio[index].select('[class~=bday]')) > 0:
                    update_data_dict('birth_day', all_bio[index].select('[class~=bday]')[0].string)
                else:
                    update_data_dict('birth_day', '')
                if len(all_bio[index].select('[class~=birthplace]')) > 0:
                    update_data_dict('birth_place', all_bio[index].select('[class~=birthplace]')[0].text)
                else:
                    update_data_dict('birth_place', '')
                if len(all_bio[index].select('[class~=bday]')) > 0:
                    if len(all_bio[index].select('[class~=bday]')[0].string.split('-')) == 3:
                        update_data_dict('age', datetime.datetime.now().year - datetime.datetime.strptime(all_bio[index].select('[class~=bday]')[0].string, '%Y-%m-%d').year)
                    elif len(all_bio[index].select('[class~=bday]')[0].string.split('-')) == 2:
                        update_data_dict('age', datetime.datetime.now().year - datetime.datetime.strptime(all_bio[index].select('[class~=bday]')[0].string, '%Y-%m').year)
                    elif len(all_bio[index].select('[class~=bday]')[0].string.split('-')) == 1:
                        update_data_dict('age', datetime.datetime.now().year - datetime.datetime.strptime(all_bio[index].select('[class~=bday]')[0].string, '%Y').year)
                else:
                    update_data_dict('age','')
                index_below_name = index

            elif index > index_below_name:
                temp_list = []
                if len(all_bio[index]) > 1:
                    if len(all_bio[index].select('td')) > 1:
                        for iteration_tag_a in range(len(all_bio[index].select('td')[0].select('a'))):
                            temp_list.append(all_bio[index].select('td')[0].select('a')[iteration_tag_a].get_text(strip=True))
                        if len(all_bio[index].select('th')):
                            update_data_dict(all_bio[index].select('th')[0].text, ', '.join(filter(None, temp_list)))
                    # NOTE: this repeats the condition of the branch above, so as written this branch
                    # (including its spouse counting) is never reached; spouse fields and counts come
                    # from the final else branch instead
                    elif len(all_bio[index].select('td')) > 1:
                        if all_bio[index].select('td')[0].select('li')[0].string is not None:
                            for iteration_tag_a in range(len(all_bio[index].select('td')[0].select('li'))):
                                temp_list.append(all_bio[index].select('td')[0].select('li')[iteration_tag_a].get_text(strip=True))
                            if len(all_bio[index].select('th')):
                                update_data_dict(all_bio[index].select('th')[0].text, ', '.join(filter(None, temp_list)))
                        else:
                            count_maried = 0
                            count_divorced = 0
                            for iteration_tag_a in range(len(all_bio[index].select('td')[0].select('li'))):
                                if all_bio[index].select('td')[0].select('div')[0].select('li')[iteration_tag_a].text != "":
                                    spouse_item = all_bio[index].select('td')[0].select('div')[0].select('li')[iteration_tag_a]
                                    # Text inside the parentheses, e.g. "m. 1981; div. 1996"
                                    marriage_note = re.search(r'\(([^)]+)', spouse_item.text).group(1)
                                    if 'm.' in marriage_note:
                                        count_maried += 1
                                    if 'div.' in marriage_note:
                                        count_divorced += 1
                                    # Use the item's second <div> as the spouse's name when it exists
                                    if len(spouse_item.select('div')) > 1:
                                        temp_list.append(spouse_item.select('div')[1].text+" ("+marriage_note+")")
                                    else:
                                        temp_list.append(all_bio[index].select('td')[0].select('div')[0].text+" ("+marriage_note+")")
                                    
                            update_data_dict(all_bio[index].select('th')[0].text, ', '.join(filter(None, temp_list)))
                            update_data_dict('total_marriage', count_maried)
                            update_data_dict('total_divorced', count_divorced)
                    else:
                        if len(all_bio[index].select('th')) > 0:
                            if len(all_bio[index].select('td')) > 0 and all_bio[index].select('td')[0].text == 'Full list':
                                check_full_list_awards = all_bio[index].select('td')[0].find_all(href=True)[0]['href']
                                update_data_dict(all_bio[index].select('th')[0].text, all_bio[index].select('td')[0].find_all(href=True)[0]['href'])
                            else:
                                count_maried = 0
                                count_divorced = 0
                                if len(all_bio[index].select('td')) > 0:
                                    update_data_dict(all_bio[index].select('th')[0].text, all_bio[index].select('td')[0].text)
                                    if 'm.' in all_bio[index].select('td')[0].text and all_bio[index].select('th')[0].string == 'Spouse(s)':
                                        count_maried = 1
                                        update_data_dict('total_marriage', count_maried)
                                    if 'div.' in all_bio[index].select('td')[0].text and all_bio[index].select('th')[0].string == 'Spouse(s)':
                                        count_divorced = 1
                                        update_data_dict('total_divorced', count_divorced)

        total_win = 0
        total_nominated = 0
        # Check the winner and nominated on table award
        if len(result_soup.findAll("table", {"class": "wikitable"})) > 0:
            #total_win = 0
            #total_nominated = 0
            total_awards = result_soup.findAll("table", {"class": "wikitable"})
            for awards in total_awards:
                total_win += len(awards.findAll("td", {"class": "yes table-yes2"}))
                total_nominated += len(awards.findAll("td", {"class": "no table-no2"}))
            update_data_dict('total_won', total_win)
            update_data_dict('total_nominated', total_nominated)
        if total_win == 0 and total_nominated == 0:
            driver.get('https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_'+artist_choosen) # Name Artist
            result_soup = BeautifulSoup(driver.page_source, 'html.parser')
            #total_win = 0
            #total_nominated = 0
            total_awards = result_soup.findAll("table", {"class": "wikitable"})
            for awards in total_awards:
                total_win += len(awards.findAll("td", {"class": "yes table-yes2"}))
                total_nominated += len(awards.findAll("td", {"class": "no table-no2"}))
            update_data_dict('total_won', total_win)
            update_data_dict('total_nominated', total_nominated)

        # Saved dictionary into list
        list_dictionary_bio.append(dictionary_bio)
        dictionary_bio = {'full_name':'','birth_day':'','birth_place':'','age':'','occupation':'','year_active':'','spouse':'',
                  'total_marriage':'','total_divorced':'','partners':'','children':'','alma_mater':'','relatives':'',
                  'total_won':'','total_nominated':''}
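The loop computes age as the current year minus the birth year, which is one too high for anyone whose birthday has not yet come this year. A sketch of a more precise version with a hypothetical `age_from_bday` helper, assuming the `bday` string comes in one of the three formats the loop already handles (`%Y-%m-%d`, `%Y-%m`, `%Y`); for partial dates it keeps the notebook's plain year difference.

```python
import datetime
from typing import Optional

def age_from_bday(bday: str, today: Optional[datetime.date] = None) -> Optional[int]:
    """Hypothetical helper: age in whole years from an infobox bday string.

    For a full 'YYYY-MM-DD' date, one year is subtracted if the birthday
    has not happened yet this year; 'YYYY-MM' and 'YYYY' fall back to the
    year difference. Returns None if the year cannot be parsed.
    """
    today = today or datetime.date.today()
    parts = bday.strip().split("-")
    try:
        year = int(parts[0])
    except ValueError:
        return None
    age = today.year - year
    if len(parts) == 3:
        born = datetime.date(year, int(parts[1]), int(parts[2]))
        if (today.month, today.day) < (born.month, born.day):
            age -= 1
    return age

print(age_from_bday("1990-12-10", datetime.date(2021, 12, 1)))  # 30
```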

<h5>The last step is to convert the dictionary data into a CSV file</h5>

In [25]:
len(list_dictionary_bio)
# list_dictionary_bio

831

In [26]:
csv_columns = ['full_name','birth_day','birth_place','age','occupation','year_active','spouse','total_marriage','total_divorced','partners','children','alma_mater','relatives','total_won','total_nominated']
csv_file_saved = "External Output/artist_data_from_wikipedia.csv"

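# newline='' is what the csv module expects for files; without it, line breaks inside
# fields gain an extra '\r' on Windows
with open(csv_file_saved, mode='w', newline='', encoding="utf-8") as csv_file: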
    writer = csv.DictWriter(csv_file, fieldnames=csv_columns)
    writer.writeheader()
    for data in list_dictionary_bio:
        writer.writerow(data)
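The `csv` documentation recommends opening files with `newline=''`; when it is omitted on Windows, line breaks inside fields are written as `\r\n`, which likely explains the `\r\n` visible in the spouse column of the output below. A small in-memory sketch of the same `DictWriter` pattern, using `io.StringIO` in place of a file and a shortened, made-up column list and rows (not the scraped data), shows a multi-line field surviving a write and read round trip and a missing field being filled with `restval`:

```python
import csv
import io

# A shortened column list for illustration; the notebook uses csv_columns
columns = ["full_name", "birth_day", "spouse", "total_marriage"]
rows = [
    {"full_name": "Example Person", "birth_day": "1990-01-01",
     "spouse": "Partner A\n(m. 2016)", "total_marriage": 1},
    {"full_name": "Another Person"},  # missing fields are written as restval
]

# io.StringIO stands in for the file; newline="" mirrors open(..., newline="")
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
text = buffer.getvalue()

# Reading it back: the embedded line break survives inside the quoted field
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed[1][2] == "Partner A\n(m. 2016)")  # True
```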

<h5>Check the final result</h5>

In [27]:
import pandas as pd
pd.read_csv('External Output/artist_data_from_wikipedia.csv')

,full_name,birth_day,birth_place,age,occupation,year_active,spouse,total_marriage,total_divorced,partners,children,alma_mater,relatives,total_won,total_nominated
0,Teyana Me Shay Jacqueli Taylor,1990-12-10,"New York City, U.S.",31.0,"Actress,singer-songwriter,dancer,choreographer...",2005–present,Iman Shumpert ​(m. 2016)​,1.0,,,2.0,,,3,4
1,Elizabeth Chase Olsen,1989-02-16,"Los Angeles, California, U.S.",32.0,Actress,"1993–1996, 2010–present",,,,Robbie Arnett (2016–present; engaged),,New York University,Mary-Kate Olsen (sister)Ashley Olsen (sister),8,19
2,Brianne Howey,1989-05-24,"Los Angeles, California",32.0,Actress,2008–present,,,,,,,,0,0
3,Kiandra Layne,1991-12-10,"Cincinnati, Ohio",30.0,Actress,2015–present,,,,,,,,0,0
4,Memphis Eve Sunny Day Iris Hewson,1991-07-07,"Dublin, Ireland",30.0,Actress,2008–present,,,,,,,,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
826,William James Murray,1950-09-21,"Evanston, Illinois, U.S.[1]",71.0,"Actor,comedian,writer",1973–present,Margaret Kelly\r\n​ ​(m. 1981; div. 1996)​Jenn...,1.0,1.0,,6.0,,Brian Doyle-Murray (brother)Joel Murray (brother),6,13
827,Ian David McShane,1942-09-29,"Blackburn, Lancashire, England",79.0,Actor,1962–present,Suzan Farmer\r\n​ ​(m. 1965; div. 1968)​Ruth P...,1.0,1.0,,2.0,Royal Academy of Dramatic Art,,3,13
828,Taraji Penda Henson,1970-09-11,"Washington, D.C., U.S.",51.0,Actress,1992–present,,,,,1.0,,,40,66
829,Uma Karuna Thurman,1970-04-29,"Boston, Massachusetts, U.S.",51.0,"Actress,writer,producer,model",1985–present,Gary Oldman\r\n​ ​(m. 1990; div. 1992)​\r\n\r\...,1.0,1.0,Arpad Busson (2007–2009; 2011–2014),3.0,,Max von Schlebrügge (cousin),3,8
