# About this module

This module scrapes Eurovision data from https://eschome.net/index.html, the main site that stores it.

The site presents various slices of the data as HTML tables. It offers no direct access to the underlying data.

At first glance, eschome.net seems to render the data dynamically in a way that would make it hard to scrape. On closer inspection, you can reverse-engineer the HTML POST requests that render the data. Even though the actual database calls are hidden in PHP code, you can treat the resulting pages as static HTML.
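
As a rough illustration of what reverse-engineering the POST amounts to, the sketch below builds (but does not send) the request for one pair of countries using only the standard library. The URL and the form fields `land_erhalten`, `land_gegeben`, `x` and `y` are the ones used by the vote scraper in this module; `build_vote_request` is a hypothetical helper name, and the reading of `x`/`y` as image-button click coordinates is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_vote_request(receiver_code, giver_code):
    """Build, without sending, the POST request for one receiver/giver pair."""
    url = "https://eschome.net/databaseoutput403.php"
    # Field names are taken from this module's vote scraper. 'x' and 'y' are
    # presumably the click coordinates of an image submit button (assumption).
    form = {
        "land_erhalten": receiver_code,  # country receiving the points
        "land_gegeben": giver_code,      # country giving the points
        "x": "7",
        "y": "3",
    }
    body = urlencode(form).encode("ascii")
    return Request(url, data=body, method="POST")


req = build_vote_request("AL", "AD")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Printing the encoded body makes it easy to compare against what the browser's network inspector shows when the form is submitted by hand.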

The data of interest for this presentation is:

* List of every Eurovision final (reference)

* List of years, the countries that participated in the final each year, the order in which they performed, and how they placed (to allow analysis of how much the running order matters)

* List of years, participant countries, and how they voted (to allow analysis of bloc voting, like "all the Baltic countries vote together")
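
To give a feel for the kind of analysis the vote data is meant to support, here is a minimal sketch that averages the points each giver awarded each receiver. It assumes the `Giver,Receiver,Year,Type,Points` column layout that this module writes to `all_votes.csv`; the seven sample rows are copied from the module's own output (Andorra voting for Albania), and `mean_points_by_pair` is a hypothetical helper name, not part of the module.

```python
import csv
from collections import defaultdict
from io import StringIO

# Seven rows copied from this module's sample output (Andorra -> Albania).
SAMPLE = """Giver,Receiver,Year,Type,Points
AD,AL,2009,F,0
AD,AL,2008,F,0
AD,AL,2007,SF,0
AD,AL,2006,SF,0
AD,AL,2005,F,0
AD,AL,2004,F,0
AD,AL,2004,SF,6
"""


def mean_points_by_pair(csv_file):
    """Return {(giver, receiver): mean points} for rows in all_votes.csv layout."""
    totals = defaultdict(lambda: [0, 0])  # (giver, receiver) -> [sum, count]
    for row in csv.DictReader(csv_file):
        entry = totals[(row["Giver"], row["Receiver"])]
        entry[0] += int(row["Points"])
        entry[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


averages = mean_points_by_pair(StringIO(SAMPLE))
print(averages)
```

Pairs whose average sits well above a giver's overall average would be candidates for bloc voting. A real analysis would probably also filter on `Type`, since finals and semi-finals are mixed in the same file.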

## Get list of all the finals, and where they were hosted

In [12]:
import requests
import pandas as pd
from io import StringIO as SIO

# url generated by eschome.net when you click on "List of all Final Events" (no details)
url = 'https://eschome.net/databaseoutput410.php'

# get the full page text
page = requests.post(url)

# create a list of tables
list_of_tables = pd.read_html(SIO(page.text), header = 0)

# assign second table to dataframe and keep only the interesting columns
df = list_of_tables[1]
all_cols = df[['Year','Country','City','Location','Broadcaster','Date']]
print("Imported " + str(len(all_cols)) + " finals records.")
all_cols.to_csv('all_finals.csv', index = False)

Imported 67 finals records.


## Get the list of how every country voted for every other country (requires a loop)
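
The loop in this section reads `all_countries.csv`, which needs a `Name` and a `Code` column. As an alternative to cleaning that list up by hand, the sketch below pulls code/name pairs out of `<option>` tags with the standard library. It is only a sketch: the HTML snippet is a hypothetical stand-in for the country selector on the site (the real markup has not been checked here), `OptionCollector` is an illustrative name, and the three countries are the ones visible in this module's output.

```python
import csv
from html.parser import HTMLParser
from io import StringIO

# Hypothetical stand-in for the country selector in the page source.
SAMPLE_HTML = """
<select name="land_erhalten">
  <option value="">choose a country</option>
  <option value="AL">Albania</option>
  <option value="AD">Andorra</option>
  <option value="AM">Armenia</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collect (code, name) pairs from <option value="CODE">Name</option> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._code = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._flush()  # an <option> may be left unclosed in real pages
            self._code = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        if self._code is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._flush()

    def _flush(self):
        # Skip placeholder options that have an empty or missing value.
        if self._code:
            self.rows.append((self._code, "".join(self._text).strip()))
        self._code = None


parser = OptionCollector()
parser.feed(SAMPLE_HTML)

# Write the pairs in the Name,Code layout that the scraping loop expects.
out = StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Name", "Code"])
writer.writerows((name, code) for code, name in parser.rows)
print(out.getvalue())
```

If the real page lists the countries in more than one selector, the collected rows would need de-duplicating before being saved.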

In [6]:
import requests
import pandas as pd
from io import StringIO as SIO

# set base url
url = 'https://eschome.net/databaseoutput403.php'

# get list of countries
# note: the country list was pulled from the source code of https://eschome.net/index.html and cleaned up in Excel
countries = pd.read_csv('all_countries.csv')

# loop through all countries to get all votes from all other countries
all_votes = None
for receiver_index, receiver_row in countries.iterrows():
    for giver_index, giver_row in countries.iterrows():

        # countries are not allowed to vote for themselves.
        if receiver_index == giver_index:
            continue

        print("Importing ratings for " + receiver_row['Name'] + " from " + giver_row['Name'] + "...")

        # request the page listing every vote this giver cast for this receiver
        params = {'land_erhalten' : receiver_row['Code'], 'land_gegeben' : giver_row['Code'], 'x' : '7', 'y' : '3'}
        page = requests.post(url, data = params, allow_redirects=False)

        # create a list of tables; read_html raises ValueError if the page has none
        try:
            list_of_tables = pd.read_html(SIO(page.text), header = 0)
        except ValueError:
            list_of_tables = []
        if len(list_of_tables) < 2:
            print("error - no tables found on this page.")
            continue

        # keep the vote columns from the second table and label them with the country pair
        df = list_of_tables[1][['Year','Type','Points']].copy()
        df['Receiver'] = receiver_row['Code']
        df['Giver'] = giver_row['Code']
        df = df[['Giver','Receiver','Year','Type','Points']]

        # create all_votes if it doesn't exist yet, otherwise append
        if all_votes is None:
            all_votes = df.copy()
        else:
            all_votes = pd.concat([all_votes, df], ignore_index=True)

        print("Running total of " + str(len(all_votes)) + " vote records imported.")
        print(all_votes)

all_votes.to_csv('all_votes.csv', index = False)


Importing ratings for Albania from Andorra...
Running total of 7 vote records imported.
  Giver Receiver  Year Type  Points
0    AD       AL  2009    F       0
1    AD       AL  2008    F       0
2    AD       AL  2007   SF       0
3    AD       AL  2006   SF       0
4    AD       AL  2005    F       0
5    AD       AL  2004    F       0
6    AD       AL  2004   SF       6
Importing ratings for Albania from Armenia...
Running total of 25 vote records imported.
   Giver Receiver  Year Type  Points
0     AD       AL  2009    F       0
1     AD       AL  2008    F       0
2     AD       AL  2007   SF       0
3     AD       AL  2006   SF       0
4     AD       AL  2005    F       0
5     AD       AL  2004    F       0
6     AD       AL  2004   SF       6
7     AM       AL  2023    F       0
8     AM       AL  2023  SF2       7
9     AM       AL  2022  SF1       2
10    AM       AL  2019    F       0
11    AM       AL  2019  SF2       0
12    AM       AL  2018    F       0
13    AM       AL