# ST446 Distributed Computing for Big Data
## Homework
### Milan Vojnovic and Christine Yuen, LT 2018
---


## P1: Querying YAGO semantic knowledge base

YAGO is a semantic knowledge base derived from Wikipedia, WordNet and GeoNames. YAGO contains knowledge about more than 10 million entities (such as persons, organizations and cities) and more than 120 million facts about these entities.

You may find more about YAGO [here](https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/#c10444).

In this homework assignment you are asked to use this dataset to demonstrate your knowledge of Spark GraphFrames and motif queries. In particular, you are asked to **_use motif queries_** to answer the following queries, stated here in English:
1. Politicians who are also scientists
2. Companies whose founders were born in London
3. Writers who have won a Nobel Prize (in any discipline)
4. Nobel prize winners who were born in the same city as their spouses   

## 0.1 Get YAGO data

You may download the whole YAGO dataset from https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/, but its size is 170GB. Instead, we advise you to access the database hosted on an LSE server.

### Connecting to the YAGO database hosted on an LSE server

You can connect to the database if you are onsite and either have a wired connection or are logged in to the eduroam wifi service.

To connect to the database offsite, first connect to the LSE network using the Pulse Secure client. Details on how to install and use Pulse Secure are available from:

http://www.lse.ac.uk/intranet/LSEServices/IMT/guides/workingOffCampus/installing-pulse.aspx

The Pulse Secure service is known to work well for Mac and Windows users.


### Download the postgresql JDBC driver

Download the driver from https://jdbc.postgresql.org/download.html. You need to tell Spark where the driver is, in one of two ways:
* add the line `spark.jars [path_to_the_jar_file]` to the `spark-defaults.conf` file
* pass it as an argument when you start Spark: `pyspark --jars [path_to_the_jar_file]`
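
If you start Spark from a script rather than from the `pyspark` shell, the same `spark.jars` setting can be passed when the session is built. The sketch below assumes Spark 2.x or later, where `SparkSession` is available. The helper names, the application name `yago-homework` and the jar path are placeholders for illustration, not values prescribed by this assignment.

```python
def jdbc_driver_conf(jar_path):
    """Return the Spark setting that points at the JDBC driver jar."""
    return {"spark.jars": jar_path}


def build_session(jar_path, app_name="yago-homework"):
    """Create a SparkSession that can load the PostgreSQL driver.

    pyspark is imported lazily so that this module can be read and
    checked on a machine without Spark installed.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in jdbc_driver_conf(jar_path).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Placeholder path: replace it with the location of the jar you downloaded.
print(jdbc_driver_conf("/path/to/postgresql.jar"))
```

This has the same effect as the `spark-defaults.conf` entry, but only for the session created by the script.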


## 0.2 Read the data into Spark

In [1]:
from pyspark.sql import DataFrameReader
from pyspark.sql.types import *

url = 'postgresql://hpc-db-yago.lse.ac.uk:15435/yago?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory'
properties = {'user': 'yago', 'password': 'Zaeda1dah6eiTahghaeph', "driver": "org.postgresql.Driver"}
df = sqlContext.read.jdbc(
    url='jdbc:%s' % url, table='yagofacts', properties=properties
)

## 0.3 Understand the database schema

Let's look at the schema:

In [2]:
df.printSchema()

root
 |-- id: string (nullable = true)
 |-- subject: string (nullable = true)
 |-- predicate: string (nullable = true)
 |-- object: string (nullable = true)
 |-- value: double (nullable = true)



The useful information is in columns "subject", "predicate" and "object". "predicate" defines the relation between entities "subject" and "object". For example, for "Albert Einstein was born in Ulm", "Albert Einstein" is the subject, "was born in" is the predicate and "Ulm" is the object.
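
To make the triple structure concrete, here is a small pure-Python sketch that does not touch the database. The `Fact` tuple and the `objects_of` helper are illustrative names only. The two facts are taken from the outputs shown in this notebook.

```python
from collections import namedtuple

# One YAGO fact, reduced to the three columns that matter for this homework.
Fact = namedtuple("Fact", ["subject", "predicate", "object"])

facts = [
    Fact("<Albert_Einstein>", "<wasBornIn>", "<Ulm>"),
    Fact("<Harold_Pinter>", "<wasBornIn>", "<London>"),
]


def objects_of(facts, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [f.object for f in facts
            if f.subject == subject and f.predicate == predicate]


print(objects_of(facts, "<Albert_Einstein>", "<wasBornIn>"))  # ['<Ulm>']
```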

## 0.4 Simple query example

To find out where Albert Einstein was born, we first select all `<wasBornIn>` facts and then filter them by subject:

In [4]:
born_city_df = df.where("predicate == '<wasBornIn>'")
born_city_df.show(1)

+--------------------+--------------------+-----------+--------+-----+
|                  id|             subject|  predicate|  object|value|
+--------------------+--------------------+-----------+--------+-----+
|<id_qe04ts_oyl_27...|<Simon_Milton_(po...|<wasBornIn>|<London>| null|
+--------------------+--------------------+-----------+--------+-----+
only showing top 1 row



In [7]:
born_city_df.where("subject = '<Albert_Einstein>'").show()

+--------------------+-----------------+-----------+------+-----+
|                  id|          subject|  predicate|object|value|
+--------------------+-----------------+-----------+------+-----+
|<id_tiwmcu_oyl_zi...|<Albert_Einstein>|<wasBornIn>| <Ulm>| null|
+--------------------+-----------------+-----------+------+-----+



You may wonder how one would know whether to use the predicate '&lt;wasBornIn&gt;' or '&lt;was_born_in&gt;', and the subject '&lt;Albert_Einstein&gt;' or '&lt;AlbertEinstein&gt;'. YAGO subjects and objects are named after the corresponding Wikipedia articles. For example, Albert Einstein's Wikipedia page is https://en.wikipedia.org/wiki/Albert_Einstein, and the entity name is the last part of that URL, 'Albert_Einstein'.
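
The naming rule for subjects and objects can be sketched as a small helper. This is only a first guess at the entity name, assuming an English Wikipedia article URL. The function name is made up for illustration, and YAGO may encode some special characters differently, so check the result against the database.

```python
from urllib.parse import urlparse, unquote


def entity_from_wikipedia_url(url):
    """Guess the YAGO entity name from an English Wikipedia article URL."""
    title = urlparse(url).path.rsplit("/", 1)[-1]  # last path segment
    return "<" + unquote(title) + ">"


print(entity_from_wikipedia_url("https://en.wikipedia.org/wiki/Albert_Einstein"))
# <Albert_Einstein>
```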

For predicates, you can get the "property" list from the [yago web interface](https://gate.d5.mpi-inf.mpg.de/webyagospotlx/WebInterface?L01=%3Fx&L0R=%3CwasBornIn%3E&L02=%3Fc&L0T=&L03=&L0L=&L04=&L05=&L11=&L1R=&L12=&L1T=&L13=&L1L=&L14=&L15=&L21=&L2R=&L22=&L2T=&L23=&L2L=&L24=&L25=&L31=&L3R=&L32=&L3T=&L33=&L3L=&L34=&L35=&L41=&L4R=&L42=&L4T=&L43=&L4L=&L44=&L45=). Try different queries in this web interface to get a better understanding of how to query YAGO.

## 0.5 Simple motif example

In this part of the homework, you are required to use **motif queries** to answer the four questions above. Here is an example that answers "Which city was Albert Einstein born in?" with a motif query instead of a SQL query:

In [8]:
sel_df = born_city_df.select("subject", "object", "predicate")
# e = edges: GraphFrames expects the columns "src" and "dst"
born_e = sel_df.withColumnRenamed("subject","src")\
            .withColumnRenamed("object","dst") \
            .distinct()
# s = subject vertices: GraphFrames expects the column "id"
born_s = sel_df.select("subject")\
            .withColumnRenamed("subject","id")\
            .distinct()
# o = object vertices
born_o = sel_df.select("object")\
            .withColumnRenamed("object","id")\
            .distinct()

In [9]:
from graphframes import *

v0 = born_s.unionAll(born_o).distinct()
e0 = born_e.distinct()

g0 = GraphFrame(v0,e0)

# "person" and "city" name the two vertices of the motif;
# .id is the vertex column defined in born_s and born_o
q0 = g0.find("(person)-[]-> (city)")\
        .where("person.id = '<Albert_Einstein>'")
    
q0.show()

+-------------------+-------+
|             person|   city|
+-------------------+-------+
|[<Albert_Einstein>]|[<Ulm>]|
+-------------------+-------+
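
To see what a motif computes, it can help to emulate it on a toy edge list in plain Python. The sketch below is not how GraphFrames works internally, and the function names are illustrative only. The first function mirrors the single-edge motif used above. The second mirrors a two-pattern motif in which both patterns share the same source vertex, which is the shape needed for the homework questions. As far as the GraphFrames documentation describes it, differently named vertices are not required to be distinct, which is why a `where` filter on the vertex ids is usually needed.

```python
# Toy edge list of (src, dst) pairs, standing in for the edge DataFrame.
edges = [
    ("<Albert_Einstein>", "<Ulm>"),
    ("<Harold_Pinter>", "<London>"),
]


def match_person_city(edges, person_id=None):
    """Emulate find("(person)-[]->(city)") with an optional filter on person.id."""
    return [
        {"person": src, "city": dst}
        for src, dst in edges
        if person_id is None or src == person_id
    ]


def match_shared_source(edges):
    """Emulate find("(p)-[]->(a); (p)-[]->(b)"): all pairs of edges with the same source."""
    return [
        {"p": s1, "a": d1, "b": d2}
        for s1, d1 in edges
        for s2, d2 in edges
        if s1 == s2
    ]


print(match_person_city(edges, "<Albert_Einstein>"))
# [{'person': '<Albert_Einstein>', 'city': '<Ulm>'}]
print(len(match_shared_source(edges)))  # 2
```

In the second result, `a` and `b` are bound to the same vertex in each row, because each person in the toy data has only one outgoing edge.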



## 0.6 Some useful tips

### Get a subset of YAGO database
The YAGO database is large, so don't try to load the entire database into a dataframe and then query it. If you do, you will find that you cannot even execute `df.take(1)`. Instead, use Spark SQL commands or `df.where` to retrieve only the subset of the data that you need.
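
One further option, sketched here as an assumption rather than a requirement, is to hand the filter to PostgreSQL explicitly: Spark's JDBC reader accepts a parenthesised subquery with an alias in place of a table name. The helper names and the alias `facts_subset` are made up for this example. The `url` and `properties` arguments are the ones defined in section 0.2.

```python
import re


def subset_table(predicate, table="yagofacts"):
    """Build a subquery string usable as the `table` argument of read.jdbc."""
    # Accept only simple predicate names such as '<wasBornIn>' or 'rdf:type',
    # so that no quote characters end up inside the SQL string.
    if not re.fullmatch(r"[<>\w:]+", predicate):
        raise ValueError("unexpected characters in predicate: %r" % predicate)
    return ("(SELECT subject, predicate, object FROM %s "
            "WHERE predicate = '%s') AS facts_subset" % (table, predicate))


def read_subset(sql_context, url, properties, predicate):
    """Load only the rows for one predicate; needs a live Spark context."""
    return sql_context.read.jdbc(url=url, table=subset_table(predicate),
                                 properties=properties)


print(subset_table("<wasBornIn>"))
# (SELECT subject, predicate, object FROM yagofacts WHERE predicate = '<wasBornIn>') AS facts_subset
```

A call such as `read_subset(sqlContext, 'jdbc:%s' % url, properties, '<wasBornIn>')` would then return a DataFrame containing only the birth facts.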

### Try the queries in the YAGO web interface first
It is sometimes tricky to get the right "subject", "predicate" and "object". It is easier to start from the [yago web interface](https://gate.d5.mpi-inf.mpg.de/webyagospotlx/WebInterface?L01=%3Fx&L0R=%3CwasBornIn%3E&L02=%3Fc&L0T=&L03=&L0L=&L04=&L05=&L11=&L1R=&L12=&L1T=&L13=&L1L=&L14=&L15=&L21=&L2R=&L22=&L2T=&L23=&L2L=&L24=&L25=&L31=&L3R=&L32=&L3T=&L33=&L3L=&L34=&L35=&L41=&L4R=&L42=&L4T=&L43=&L4L=&L44=&L45=) than to query directly in PySpark. Once your query works there, convert it to PySpark code. Note that the web version of an object or subject code may differ from what you need to type here. For example, the company code is &lt;wordnet_company_108058098&gt; when you query here, but &lt;wordnet company 108058098&gt; in the web interface. The query results from the web may also differ from the results here, possibly because the YAGO database hosted on our LSE server differs from the one accessed via the web interface.
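
The company example suggests that the web interface shows spaces where the LSE database uses underscores. A minimal sketch of that conversion follows. It assumes this is the only difference, which may not hold for every identifier, and the function name is illustrative.

```python
def web_to_local(name):
    """Convert a web-interface identifier to the underscore form used here."""
    return name.strip().replace(" ", "_")


print(web_to_local("<wordnet company 108058098>"))
# <wordnet_company_108058098>
```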

### Be patient and don't leave this exercise to the last minute
Some trial and error is needed to get a query right, and a query may take a long time to return its result. Also, our LSE server may get overloaded if many of you try to access it at the same time. For these reasons, we advise you not to leave this exercise until just before the submission deadline.

## 1) Politicians who are also scientists

In [15]:
pol_df = df.where("predicate == 'rdf:type' AND object == '<wordnet_politician_110451263>'")
pol_df.show(5)

+--------------------+--------------------+---------+--------------------+-----+
|                  id|             subject|predicate|              object|value|
+--------------------+--------------------+---------+--------------------+-----+
|<id_jvwk4m_88c_1k...|  <William_D._Swart>| rdf:type|<wordnet_politici...| null|
|<id_rygt0w_88c_1k...| <Daniel_Zimmermann>| rdf:type|<wordnet_politici...| null|
|<id_1jhv5fn_88c_1...|<James_Fairman_Fi...| rdf:type|<wordnet_politici...| null|
|<id_17obfyw_88c_1...|   <William_A._Mott>| rdf:type|<wordnet_politici...| null|
|<id_1g0o0ur_88c_1...|<Anthony_Young,_B...| rdf:type|<wordnet_politici...| null|
+--------------------+--------------------+---------+--------------------+-----+
only showing top 5 rows



In [14]:
scientist_df = df.where("predicate == 'rdf:type' AND object == '<wordnet_scientist_110560637>'")
scientist_df.show(5)

+----+--------------------+---------+--------------------+-----+
|  id|             subject|predicate|              object|value|
+----+--------------------+---------+--------------------+-----+
|null|         <René_Thom>| rdf:type|<wordnet_scientis...| null|
|null|    <Paul_A._Catlin>| rdf:type|<wordnet_scientis...| null|
|null|<Waldemar_Christo...| rdf:type|<wordnet_scientis...| null|
|null|   <Karl_von_Frisch>| rdf:type|<wordnet_scientis...| null|
|null|<Edward_Robert_Ha...| rdf:type|<wordnet_scientis...| null|
+----+--------------------+---------+--------------------+-----+
only showing top 5 rows



In [20]:
# politician dataframe
pol_df1 = pol_df.select("subject", "object", "predicate")

pol_e = pol_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

pol_o = pol_df1.select("object").withColumnRenamed("object", "id").distinct()
pol_s = pol_df1.select("subject").withColumnRenamed("subject", "id").distinct()     


# scientist dataframe
sci_df1 = scientist_df.select("subject", "object", "predicate")

sci_e = sci_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()
sci_o = sci_df1.select("object").withColumnRenamed("object", "id").distinct()
sci_s = sci_df1.select("subject").withColumnRenamed("subject", "id").distinct()  
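
The rename-and-distinct steps above are repeated for every fact DataFrame in this notebook. They could be wrapped in two small helpers, sketched below. The names `facts_to_edges` and `facts_to_vertices` are made up for illustration, and the functions use only DataFrame methods that already appear in this notebook.

```python
def facts_to_edges(facts_df):
    """Turn a facts DataFrame into GraphFrames edges with src, dst and predicate."""
    return (facts_df.select("subject", "object", "predicate")
            .withColumnRenamed("subject", "src")
            .withColumnRenamed("object", "dst")
            .distinct())


def facts_to_vertices(facts_df):
    """Collect every subject and object of a facts DataFrame as vertex ids."""
    subjects = facts_df.select("subject").withColumnRenamed("subject", "id")
    objects = facts_df.select("object").withColumnRenamed("object", "id")
    return subjects.unionAll(objects).distinct()


# Possible use with the DataFrames defined above (needs a live Spark session):
#   v0 = facts_to_vertices(pol_df).unionAll(facts_to_vertices(scientist_df)).distinct()
#   e0 = facts_to_edges(pol_df).unionAll(facts_to_edges(scientist_df)).distinct()
```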

In [24]:
from graphframes import *

v0  = pol_s.unionAll(pol_o).unionAll(sci_s).unionAll(sci_o).distinct()
e0 = pol_e.unionAll(sci_e).distinct()

g0 = GraphFrame(v0,e0)

q0 = g0.find("(person)-[]-> (politician); (person) -[]-> (scientist)")\
        .where("politician.id = '<wordnet_politician_110451263>' AND scientist.id = '<wordnet_scientist_110560637>'")
    
q0.show()

+--------------------+--------------------+--------------------+
|              person|          politician|           scientist|
+--------------------+--------------------+--------------------+
|    [<Anton_Crihan>]|[<wordnet_politic...|[<wordnet_scienti...|
|   [<Haroldo_Rodas>]|[<wordnet_politic...|[<wordnet_scienti...|
|[<Håvard_Alstadhe...|[<wordnet_politic...|[<wordnet_scienti...|
| [<Michael_Zucchet>]|[<wordnet_politic...|[<wordnet_scienti...|
|[<de/Markus_Welser>]|[<wordnet_politic...|[<wordnet_scienti...|
|[<Auguste_Scheure...|[<wordnet_politic...|[<wordnet_scienti...|
|[<Marcus_Terentiu...|[<wordnet_politic...|[<wordnet_scienti...|
|      [<Guy_Quaden>]|[<wordnet_politic...|[<wordnet_scienti...|
|[<Sebastião_José_...|[<wordnet_politic...|[<wordnet_scienti...|
|[<Stanislao_Canni...|[<wordnet_politic...|[<wordnet_scienti...|
|[<Edward_George,_...|[<wordnet_politic...|[<wordnet_scienti...|
|[<Frederic_Mishkin>]|[<wordnet_politic...|[<wordnet_scienti...|
|[<pl/Jan_Jakub_Ko...|[<w

## 2) Companies whose founders were born in London

In [27]:
company_df = df.where("predicate == 'rdf:type' AND object == '<wordnet_company_108058098>'")
company_df.show(5)

+----+--------------------+---------+--------------------+-----+
|  id|             subject|predicate|              object|value|
+----+--------------------+---------+--------------------+-----+
|null|<Gujarat_Gas_Comp...| rdf:type|<wordnet_company_...| null|
|null|<United_States_He...| rdf:type|<wordnet_company_...| null|
|null|           <Laidlaw>| rdf:type|<wordnet_company_...| null|
|null|<de/Aastra_Deutsc...| rdf:type|<wordnet_company_...| null|
|null|  <Bridgespan_Group>| rdf:type|<wordnet_company_...| null|
+----+--------------------+---------+--------------------+-----+
only showing top 5 rows



In [28]:
founder_df = df.where("predicate == '<created>'")
founder_df.show(5)

+--------------------+------------+---------+--------------------+-----+
|                  id|     subject|predicate|              object|value|
+--------------------+------------+---------+--------------------+-----+
|<id_1wn4mz8_1gi_1...|   <Local_H>|<created>|     <'99–'00_Demos>| null|
|<id_zilvmt_1gi_1v...|       <WWE>|<created>|<The_Wrestling_Al...| null|
|<id_1qv3mb_1gi_1o...|    <Konami>|<created>|    <Frogger_Beyond>| null|
|<id_148brr_1gi_10...|<HAM_(band)>|<created>|    <Buffalo_Virgin>| null|
|<id_1b0pe0c_1gi_z...|  <Deerhoof>|<created>|          <Halfbird>| null|
+--------------------+------------+---------+--------------------+-----+
only showing top 5 rows



In [30]:
birth_df = df.where("predicate == '<wasBornIn>' AND object == '<London>'")
birth_df.show(5)

+--------------------+--------------------+-----------+--------+-----+
|                  id|             subject|  predicate|  object|value|
+--------------------+--------------------+-----------+--------+-----+
|<id_qe04ts_oyl_27...|<Simon_Milton_(po...|<wasBornIn>|<London>| null|
|<id_fybnjg_oyl_27...|     <Stuart_Rogers>|<wasBornIn>|<London>| null|
|<id_8mucii_oyl_27...|        <Nic_Sadler>|<wasBornIn>|<London>| null|
|<id_s35ycw_oyl_27...|    <Joan_Orenstein>|<wasBornIn>|<London>| null|
|<id_osz6a8_oyl_27...|      <Pen_Tennyson>|<wasBornIn>|<London>| null|
+--------------------+--------------------+-----------+--------+-----+
only showing top 5 rows



In [31]:
# founder dataframe
founder_df1 = founder_df.select("subject", "object", "predicate")

fo_e = founder_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

fo_s = founder_df1.select("subject").withColumnRenamed("subject", "id").distinct()  
fo_o = founder_df1.select("object").withColumnRenamed("object", "id").distinct()

# company dataframe
company_df1 = company_df.select("subject", "object", "predicate")

co_e = company_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

co_s = company_df1.select("subject").withColumnRenamed("subject", "id").distinct()     
co_o = company_df1.select("object").withColumnRenamed("object", "id").distinct()

# London birth dataframe
birth_df1 = birth_df.select("subject", "object", "predicate")

bi_e = birth_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

bi_s = birth_df1.select("subject").distinct().withColumnRenamed("subject", "id").distinct()  
bi_o = birth_df1.select("object").distinct().withColumnRenamed("object", "id").distinct()

In [40]:
from graphframes import *

v0 = fo_s.unionAll(fo_o).unionAll(co_s).unionAll(co_o).unionAll(bi_s).unionAll(bi_o).distinct()
e0 = fo_e.unionAll(co_e).unionAll(bi_e).distinct()

g0 = GraphFrame(v0,e0)

q0 = g0.find("(founder) -[]-> (company); (company) -[]-> (companytype); (founder) -[]-> (birthplace)")\
        .where("companytype.id = '<wordnet_company_108058098>' AND birthplace.id = '<London>'")

q0.show()

+--------------------+--------------------+--------------------+----------+
|             founder|             company|         companytype|birthplace|
+--------------------+--------------------+--------------------+----------+
|    [<Danny_Haynes>]|[<Sealy_Corporati...|[<wordnet_company...|[<London>]|
|  [<Virginia_Woolf>]|   [<Hogarth_Press>]|[<wordnet_company...|[<London>]|
|[<Henry_Herbert_C...|       [<Matchless>]|[<wordnet_company...|[<London>]|
|    [<Simon_Cowell>]|            [<Syco>]|[<wordnet_company...|[<London>]|
|    [<Simon_Cowell>]|      [<Syco_Music>]|[<wordnet_company...|[<London>]|
|  [<Gerry_Anderson>]|        [<AP_Films>]|[<wordnet_company...|[<London>]|
|  [<Paul_Oakenfold>]|[<Perfecto_Records>]|[<wordnet_company...|[<London>]|
|     [<Kit_Lambert>]|   [<Track_Records>]|[<wordnet_company...|[<London>]|
|     [<Wally_Olins>]|     [<Wolff_Olins>]|[<wordnet_company...|[<London>]|
|  [<Pete_Townshend>]|[<Eel_Pie_Publish...|[<wordnet_company...|[<London>]|
|  [<Michael

## 3) Writers who have won a Nobel Prize (in any discipline)

In [42]:
writer_df = df.where("predicate == 'rdf:type' AND object == '<wordnet_writer_110794014>'")
writer_df.show(5)

+----+-------------------+---------+--------------------+-----+
|  id|            subject|predicate|              object|value|
+----+-------------------+---------+--------------------+-----+
|null|        <Pat_Frank>| rdf:type|<wordnet_writer_1...| null|
|null| <fr/Robert_Davreu>| rdf:type|<wordnet_writer_1...| null|
|null|    <Anton_Ingolič>| rdf:type|<wordnet_writer_1...| null|
|null|<Salomon_Isacovici>| rdf:type|<wordnet_writer_1...| null|
|null|  <nl/Amand_de_Vos>| rdf:type|<wordnet_writer_1...| null|
+----+-------------------+---------+--------------------+-----+
only showing top 5 rows



In [41]:
prize_df = df.where("predicate == '<hasWonPrize>' AND object LIKE '%Nobel_Prize_in%'")
prize_df.show(5)

+--------------------+-------------+-------------+--------------------+-----+
|                  id|      subject|    predicate|              object|value|
+--------------------+-------------+-------------+--------------------+-----+
|<id_14rzeie_ab2_1...|<Gary_Becker>|<hasWonPrize>|<Nobel_Prize_in_E...| null|
+--------------------+-------------+-------------+--------------------+-----+
only showing top 1 row



In [43]:
# Nobel prize dataframe
prize_df1 = prize_df.select("subject", "object", "predicate")

pr_e = prize_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

pr_s = prize_df1.select("subject").withColumnRenamed("subject", "id").distinct()      
pr_o = prize_df1.select("object").withColumnRenamed("object", "id").distinct()  

# Writer
writer_df1 = writer_df.select("subject", "object", "predicate")

wr_e = writer_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

wr_s = writer_df1.select("subject").withColumnRenamed("subject", "id").distinct()  
wr_o = writer_df1.select("object").withColumnRenamed("object", "id").distinct()  

In [None]:
from graphframes import *

v0 = pr_s.unionAll(pr_o).unionAll(wr_o).unionAll(wr_s).distinct()
e0 = pr_e.unionAll(wr_e).distinct()

g0 = GraphFrame(v0,e0)

q0 = g0.find("(person) -[]-> (prize); (person) -[]-> (writer)")\
        .where("prize.id LIKE '%Nobel_Prize_in%' AND writer.id = '<wordnet_writer_110794014>'")
    
q0.show()
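
The motif above uses anonymous edges (`[]`), so each pattern matches an edge with any predicate and the query relies on the vertex filters alone. Because the edge DataFrame keeps the `predicate` column, one possible refinement is to name the edges and constrain their predicates as well. The sketch below only builds the query strings, so it can be checked without Spark. The edge names `won` and `isa` and both function names are arbitrary choices for this example.

```python
def writer_nobel_query():
    """Return (motif, condition) strings with the edge predicates made explicit."""
    motif = "(person)-[won]->(prize); (person)-[isa]->(writer)"
    condition = (
        "won.predicate = '<hasWonPrize>' "
        "AND prize.id LIKE '%Nobel_Prize_in%' "
        "AND isa.predicate = 'rdf:type' "
        "AND writer.id = '<wordnet_writer_110794014>'"
    )
    return motif, condition


def run_query(graph):
    """Apply the query to a GraphFrame such as g0; needs a live Spark session."""
    motif, condition = writer_nobel_query()
    return graph.find(motif).where(condition)


motif, condition = writer_nobel_query()
print(motif)
print(condition)
```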

## 4) Nobel prize winners who were born in the same city as their spouses

In [7]:
spouse_df = df.where("predicate == '<isMarriedTo>'")
spouse_df.show(1)

+--------------------+--------------+-------------+--------------+-----+
|                  id|       subject|    predicate|        object|value|
+--------------------+--------------+-------------+--------------+-----+
|<id_6pwhk4_16x_16...|<Taha_Hussein>|<isMarriedTo>|<Hubert_Burda>| null|
+--------------------+--------------+-------------+--------------+-----+
only showing top 1 row



In [8]:
birth_df = df.where("predicate == '<wasBornIn>'")
birth_df.show(1)

+--------------------+----------------+-----------+------------------+-----+
|                  id|         subject|  predicate|            object|value|
+--------------------+----------------+-----------+------------------+-----+
|<id_1u855mb_oyl_1...|<Camil_Bouchard>|<wasBornIn>|<La_Tuque,_Quebec>| null|
+--------------------+----------------+-----------+------------------+-----+
only showing top 1 row



In [6]:
prize_df = df.where("predicate == '<hasWonPrize>' AND object LIKE '%Nobel_Prize_in%'")
prize_df.show(5)

+--------------------+--------------------+-------------+--------------------+-----+
|                  id|             subject|    predicate|              object|value|
+--------------------+--------------------+-------------+--------------------+-----+
|<id_logng6_ab2_1c...|    <Hartmut_Michel>|<hasWonPrize>|<Nobel_Prize_in_C...| null|
|<id_wshmev_ab2_1r...|        <Paul_Heyse>|<hasWonPrize>|<Nobel_Prize_in_L...| null|
|<id_pqb1rd_ab2_j6...|      <John_Macleod>|<hasWonPrize>|<Nobel_Prize_in_P...| null|
|<id_1nimscd_ab2_1...|  <William_Lipscomb>|<hasWonPrize>|<Nobel_Prize_in_C...| null|
|<id_1p436y6_ab2_1...|<Arthur_Lewis_(ec...|<hasWonPrize>|<Nobel_Prize_in_E...| null|
+--------------------+--------------------+-------------+--------------------+-----+
only showing top 5 rows



In [9]:
# Nobel prize dataframe
prize_df1 = prize_df.select("subject", "object", "predicate")

pr_e = prize_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

pr_s = prize_df1.select("subject").distinct().withColumnRenamed("subject", "id").distinct()       
pr_o = prize_df1.select("object").distinct().withColumnRenamed("object", "id").distinct()  

# Birthplace
birth_df1 = birth_df.select("subject", "object", "predicate")

bi_e = birth_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

bi_s = birth_df1.select("subject").withColumnRenamed("subject", "id").distinct() 
bi_o = birth_df1.select("object").withColumnRenamed("object", "id").distinct()  

# Married to
spouse_df1 = spouse_df.select("subject", "object", "predicate")

sp_e = spouse_df1.withColumnRenamed("subject", "src")\
            .withColumnRenamed("object", "dst")\
            .distinct()

sp_s = spouse_df1.select("subject").withColumnRenamed("subject", "id").distinct() 
sp_o = spouse_df1.select("object").withColumnRenamed("object", "id").distinct()  

In [12]:
from graphframes import *

v0 = pr_s.unionAll(pr_o).unionAll(sp_o).unionAll(sp_s).unionAll(bi_s).unionAll(bi_o).distinct()
e0 = pr_e.unionAll(sp_e).unionAll(bi_e).distinct()

g0 = GraphFrame(v0,e0)

# The NOT LIKE filter excludes matches in which "birthplace" is bound to a
# prize that both spouses have won, rather than to a city.
q0 = g0.find("(winner) -[]-> (prize); (spouse1) -[]-> (spouse2); (spouse1) -[]-> (birthplace); (spouse2) -[]-> (birthplace)")\
        .where("winner.id = spouse1.id AND prize.id LIKE '%Nobel_Prize_in%' AND birthplace.id NOT LIKE '%Nobel_Prize_in%'")
    
q0.show()

+--------------------+--------------------+--------------------+--------------------+----------+
|              winner|               prize|             spouse1|             spouse2|birthplace|
+--------------------+--------------------+--------------------+--------------------+----------+
|[<Carl_Ferdinand_...|[<Nobel_Prize_in_...|[<Carl_Ferdinand_...|      [<Gerty_Cori>]|[<Prague>]|
|[<Frédéric_Joliot...|[<Nobel_Prize_in_...|[<Frédéric_Joliot...|[<Irène_Joliot-Cu...| [<Paris>]|
|[<Irène_Joliot-Cu...|[<Nobel_Prize_in_...|[<Irène_Joliot-Cu...|[<Frédéric_Joliot...| [<Paris>]|
|[<Irène_Joliot-Cu...|[<Nobel_Prize_in_...|[<Irène_Joliot-Cu...|[<Frédéric_Joliot...| [<Paris>]|
|   [<Harold_Pinter>]|[<Nobel_Prize_in_...|   [<Harold_Pinter>]|  [<Antonia_Fraser>]|[<London>]|
|    [<Václav_Havel>]|[<Nobel_Prize_in_...|    [<Václav_Havel>]|    [<Olga_Havlová>]|[<Prague>]|
|    [<Václav_Havel>]|[<Nobel_Prize_in_...|    [<Václav_Havel>]|    [<Olga_Havlová>]|[<Prague>]|
|      [<Gerty_Cori>]|[<Nobel_