# Workshop: Extracting Relations and Coreference Resolution

This workshop combines reading material, code to run, and tasks to complete.

Please follow the instructions to complete the tasks.

## 1. Extracting Relations

**Read the following information and execute the code**

In natural language processing (NLP), extracting relations refers to the process of identifying and extracting the semantic relationships between different entities in text. For example, given the sentence "John works at Microsoft", the relation between the entities "John" and "Microsoft" is "works at".

Extracting relations is a more complex NLP task than entity extraction because it involves identifying not only individual entities but also the relationships between them. This task can be useful in many NLP applications, such as question answering, information extraction, and knowledge base construction.

There are different approaches to extracting relations from text, ranging from rule-based systems to machine learning models. One popular approach is to use neural networks with attention mechanisms that can focus on relevant parts of the text when predicting relationships.

Here's a general approach to extracting relations using a machine learning model:

1. **Define the task.** Decide what kind of relations you want to extract, and what entities you want to extract them from. For example, you may want to extract relations between people and organizations, or between products and their manufacturers.
2. **Collect and preprocess data.** Gather a corpus of text that contains the entities and relations you want to extract. Preprocess the data by removing stop words, stemming, lemmatizing, and converting the text to a suitable format for the machine learning model.
3. **Train a machine learning model.** Choose a suitable machine learning algorithm for the task and train it on the preprocessed data. This may involve feature engineering, hyperparameter tuning, and cross-validation to achieve good performance.
4. **Extract relations from new text.** Once the model is trained, you can use it to extract relations from new text. This involves identifying the entities in the text using entity extraction techniques, and then using the trained model to predict the relationships between them.
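As a rough illustration of the "train" and "extract" steps, the sketch below fits a tiny multinomial Naive Bayes classifier on the words that appear between a person and an organization. The training phrases, the label names (`works_at`, `founder_of`) and the helper names are all invented for this example; a real system would use a labelled corpus and a proper machine learning library.

```python
import math
from collections import Counter, defaultdict

# Toy training set (invented for illustration): the words found between a
# PERSON mention and an ORG mention, paired with a hand-assigned label.
TRAIN = [
    ("works at", "works_at"),
    ("is employed by", "works_at"),
    ("joined", "works_at"),
    ("founded", "founder_of"),
    ("is the founder of", "founder_of"),
    ("started", "founder_of"),
]

def train(examples):
    """Count words per label for a multinomial Naive Bayes classifier."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for context, label in examples:
        tokens = context.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(context, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in context.lower().split():
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(predict("works at", model))
print(predict("is the founder of", model))
```

With only six training phrases the classifier is a toy, but it shows the shape of the pipeline: labelled contexts go in, and a function that maps a new context to a relation label comes out.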


In [1]:
import spacy

nlp = spacy.load("en_core_web_sm")

def predict_relation(entity1, entity2):
    if entity1.label_ == "PERSON" and entity2.label_ == "ORG":
        return "works at"
    elif entity1.label_ == "ORG" and entity2.label_ == "PERSON":
        return "employed by"
    else:
        return None

text = "John works at Microsoft."

doc = nlp(text)

entities = [ent for ent in doc.ents]

for i, entity in enumerate(entities):
    for j in range(i+1, len(entities)):
        relation = predict_relation(entity, entities[j])
        if relation is not None:
            print(entity.text, relation, entities[j].text)
            print(f'relation is "{relation}"')

John works at Microsoft
relation is "works at"


In this example, we define a simple predict_relation() function that checks if the first entity is a person and the second entity is an organization, and predicts a "works at" relation if this is the case. If the first entity is an organization and the second entity is a person, it predicts an "employed by" relation instead. If neither of these conditions is true, it returns None.

We then use the spaCy library to extract entities from the sentence "John works at Microsoft.", and iterate over each pair of identified entities to predict relations between them using the predict_relation() function. The output shows that the code identified the relation "John works at Microsoft" between the entities "John" and "Microsoft" in the input text. Note that this is a very simple example of a relation extraction task: the rule looks only at the entity labels, not at the words of the sentence, so more complex machine learning models may be needed for more accurate and robust predictions.
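One limitation of a label-only rule is that it predicts "works at" for any person followed by any organization, whatever the sentence says. The sketch below adds a check on the text between the two entities. It is only an illustration: the `Entity` tuple is a hand-written stand-in for a spaCy entity span, and the trigger list and function names are hypothetical.

```python
from typing import NamedTuple, Optional

class Entity(NamedTuple):
    """Hand-written stand-in for a spaCy entity span (character offsets)."""
    text: str
    label: str
    start: int
    end: int

def find_entity(sentence: str, text: str, label: str) -> Entity:
    """Build an Entity by locating the first occurrence of `text`."""
    start = sentence.index(text)
    return Entity(text, label, start, start + len(text))

# Hypothetical trigger lexicon: phrases that signal an employment relation.
EMPLOYMENT_TRIGGERS = ("works at", "works for", "is employed by", "joined")

def predict_relation_with_trigger(sentence: str, e1: Entity, e2: Entity) -> Optional[str]:
    """Predict "works at" only if the labels match AND a trigger phrase
    occurs in the text between the two entities."""
    if e1.label != "PERSON" or e2.label != "ORG":
        return None
    between = sentence[e1.end:e2.start].lower()
    if any(trigger in between for trigger in EMPLOYMENT_TRIGGERS):
        return "works at"
    return None

for sentence in ("John works at Microsoft.", "John sued Microsoft."):
    person = find_entity(sentence, "John", "PERSON")
    org = find_entity(sentence, "Microsoft", "ORG")
    print(sentence, "->", predict_relation_with_trigger(sentence, person, org))
```

The first sentence yields "works at", while the second yields None because no trigger phrase occurs between the entities.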

### Task 1: Extracting Relations from an article

In this task, you will extract relations from a news article using the spaCy library and hand-made rules in Python.
This is a very simple method that may extract incorrect relations. More work is needed to achieve higher accuracy.

Instructions:

1. Pick an article from your Assignment 2 set of articles. 
2. List entities in the article.
3. Pick a few pairs of entities and write a rule of the form entity1 relation entity2.
4. Add this rule to the rules in the code above (you can copy that code here for convenience).
5. Run the code and extract relations.

* Share your relation rules with other students and add any of their rules that differ from yours.
* As you will see, many extracted relations are wrong, either because the entities are not labelled correctly or because the actual relation differs from the one the rule suggests. **Reflect on how you would improve this code to extract relations more accurately.** Discuss your suggestions in class.



In [22]:
def predict_relation(entity1, entity2):
    if entity1.label_ == "PERSON" and entity2.label_ == "ORG":
        return "works at"
    elif entity1.label_ == "ORG" and entity2.label_ == "PERSON":
        return "employed by"
    else:
        return None


text = "When Walt Disney?s ?Bambi? opened in 1942, critics praised its spare, haunting visual style, vastly different from anything Disney had done before. But what they did not know was that the film?s striking appearance had been created by a Chinese immigrant artist, who took as his inspiration the landscape paintings of the Song dynasty. The extent of his contribution to ?Bambi,? which remains a   mark for film animation, would not be widely known for decades. Like the film?s title character, the artist, Tyrus Wong, weathered irrevocable separation from his mother  ?   and, in the hope of making a life in America, incarceration, isolation and rigorous interrogation  ?   all when he was still a child. In the years that followed, he endured poverty, discrimination and chronic lack of recognition, not only for his work at Disney but also for his fine art, before finding acclaim in his 90s. Mr. Wong died on Friday at 106. A Hollywood studio artist, painter, printmaker, calligrapher,   illustrator and, in later years, maker of fantastical kites, he was one of the most celebrated   artists of the 20th century. But because of the marginalization to which   were long subject, he passed much of his career unknown to the general public. Artistic recognition, when Mr. Wong did find it, was all the more noteworthy for the fact that among Chinese immigrant men of his generation, professional prospects were largely limited to menial jobs like houseboy and laundryman. Trained as a painter, Mr. Wong was a leading figure in the Modernist movement that flourished in California between the first and second World Wars. In 1932 and again in 1934, his work was included in group shows at the Art Institute of Chicago that also featured Picasso, Matisse and Paul Klee. 
As a staff artist for Hollywood studios from the 1930s to the 1960s, he drew storyboards and made vibrant paintings, as detailed as any architectural illustrations, that helped the director envision each scene before it was shot. Over the years his work informed the look of animated pictures for Disney and   films for Warner Brothers and other studios, among them ?The Sands of Iwo Jima? (1949) ?Rebel Without a Cause? (1955) and ?The Wild Bunch? (1969). But of the dozens of films on which he worked, it was for ?Bambi? that Mr. Wong was  ?   belatedly  ?   most renowned. ?He was truly involved with every phase of production,? John Canemaker, an   animator and a historian of animation at New York University, said in an interview for this obituary in March. ?He created an art direction that had really never been seen before in animation. ? In 2013 and 2014, Mr. Wong was the subject of ?Water to Paper, Paint to Sky,? a major retrospective at the Disney Family Museum in San Francisco. From the museum?s windows, which overlook San Francisco Bay, he could contemplate Angel Island, where more than nine decades earlier, as a lone    he had sought to gain admission to a country that adamantly did not want him. Wong Gen Yeo (the name is sometimes Romanized Wong Gaing Yoo) was born on Oct. 25, 1910, in a farming village in Guangdong Province. As a young child, he already exhibited a love of drawing and was encouraged by his father. In 1920, seeking better economic prospects, Gen Yeo and his father embarked for the United States, leaving his mother and sister behind. Gen Yeo would never see his mother again. They were obliged to travel under false identities  ?   a state of affairs known among Chinese immigrants as being a ?paper son?  ?   in the hope of circumventing the Chinese Exclusion Act of 1882. Signed into law by President Chester A. 
Arthur, the act, which drastically curtailed the number of Chinese people allowed to enter the country, was among the earliest United States laws to impose severe restrictions on immigration. But in 1906, an unforeseen loophole opened in the form of the San Francisco earthquake and fire. Because a huge number of municipal documents, including birth and immigration records, were destroyed, many newly arrived Chinese capitalized on the loss, maintaining that they had been born in San Francisco before the fire. As United States citizens, they were entitled to bring over their relatives  ?   or, in the case of Gen Yeo and his father, ?paper sons? posing as relatives. Attuned to the deception, United States immigration officials put Chinese arrivals through a formidable inquisition to ensure they were who they claimed to be. The questions came like gunfire: In which direction does your village face? How many windows are in your house? Where in the house is the rice bin? How wide is your well? How deep? Are there trees in your village? Are there lakes? What shops can you name? The sponsoring relative was interrogated separately, and the answers had to match. For the new arrival, a major mistake, or a series of smaller ones, could mean deportation. To stand a chance of passing, aspirants memorized rigorous dossiers known as coaching papers. The ensuing interrogation was hard enough for adults.    Gen Yeo would undergo it alone. On Dec. 30, 1920, after a month at sea, the Wongs landed at Angel Island Immigration Station. The elder Mr. Wong was traveling as a merchant named Look Get his son as Look Tai Yow. ?Angel Island is considered to be the Ellis Island of the West Coast,? Lisa See, the author of ?On Gold Mountain? (1995) a nonfiction chronicle of her   family, said in an interview in 2016. However, she continued: ?The goal was really very different than Ellis Island, which was supposed to be so welcoming. Angel Island opened very specifically to keep the Chinese out. ? 
Because Mr. Wong?s father had previously lived in the United States as Look Get, he was able to clear Immigration quickly. But as a new arrival, Gen Yeo was detained on the island for nearly a month, the only child among the immigrants being held there. ?I was scared half to death I just cried,? Mr. Wong recalled in ?Tyrus,? an   documentary directed by Pamela Tom, which premiered in 2015. ?Every day is just miserable  ?   miserable. I hated that place. ? On Jan. 27, 1921, in the presence of an interpreter and a stenographer, young Gen Yeo, posing as Look Tai Yow, was interrogated by three inspectors. His father had already been questioned. Gen Yeo was well prepared and answered without error. In Sacramento, where he joined his father, a schoolteacher Americanized ?Tai Yow? to ?Tyrus,? and he was known as Tyrus Wong ever after. Soon afterward, father and son were separated once more, when the elder Mr. Wong moved to Los Angeles to seek work. For reasons that have been lost to time, he could not take his son. Tyrus lived on his own in a Sacramento boardinghouse while attending elementary school. Two years later  ?   possibly more  ?   Tyrus traveled to Los Angeles to join his father, who had found work in a gambling den. They lived in a   boardinghouse sandwiched between a butcher shop and a brothel. After school, Tyrus worked as a houseboy for two Pasadena families, earning 50 cents a day. His first art teacher was his father, who trained him nightly in calligraphy by having him dip a brush in water and trace ghostly characters on newspaper: They could not afford ink or drawing paper. When Tyrus was in junior high, a teacher, noting his drawing talent, arranged a summer scholarship to the Otis Art Institute in Los Angeles. By his own account an indifferent student in public school, Tyrus found his calling at the institute, now the Otis College of Art and Design. When his scholarship ended he declined to return to junior high. 
His father scraped together the $90 tuition  ?   a small fortune  ?   to let him stay on as Otis?s youngest student. He studied there for at least five years, simultaneously working as the school janitor, before graduating in the 1930s. Not long afterward his father died, leaving young Mr. Wong entirely on his own. From 1936 to 1938, Mr. Wong was an artist for the Works Progress Administration, creating paintings for libraries and other public spaces. With friends, including the   artist Benji Okubo, he founded the Oriental Artists? Group of Los Angeles, which organized exhibitions of members? work  ?   an   level of exposure for Asian artists at the time. Mr. Wong, newly married and needing steady work, joined Disney in 1938 as an ?? creating the thousands of intermediate drawings that bring animated sequences to life. Asians were then a novelty at Hollywood studios, and Mr. Wong was made keenly aware of the fact, first at Disney and later at Warner Brothers. One   flung a racial epithet at him. Another assumed on sight that he worked in the company cafeteria. Then there was the affront of the  ?s job itself: Painstaking, repetitive and for Mr. Wong quickly   it is the   work of animation  ?   ?a terrible use of his talents as a landscape artist and a painter,? Mr. Canemaker said. A reprieve came in the late 1930s, when Mr. Wong learned that Disney was adapting ?Bambi, a Life in the Woods,? the 1923 novel by the Austrian writer Felix Salten about a fawn whose mother is killed by a hunter. In trying to animate the book, Disney had reached an impasse. The studio had enjoyed great success in 1937 with its animated film ?Snow White and the Seven Dwarfs,? a baroque production in which every detail of the backgrounds  ?   every petal on every flower, every leaf on every tree  ?   was meticulously represented. In an attempt to use a similar style for ?Bambi,? 
it found that the ornate backgrounds camouflaged the deer and other forest creatures on which the narrative centered. Mr. Wong spied his chance. ?I said, ?Gee, this is all outdoor scenery,?? he recalled in a video interview years afterward, adding: ?I said, ?Gee, I?m a landscape painter! ?? Invoking the exquisite landscape paintings of the Song dynasty (A. D. 960 ?  1279) he rendered in watercolors and pastels a series of nature scenes that were moody, lyrical and atmospheric  ?   at once lush and spare  ?   with backgrounds subtly suggested by a stroke or two of the brush. ?Walt Disney went crazy over them,? said Mr. Canemaker, who wrote about Mr. Wong in his book ?Before the Animation Begins: The Art and Lives of Disney Inspirational Sketch Artists? (1996). ?He said, ?I love this indefinite quality, the mysterious quality of the forest. ?? Mr. Wong was unofficially promoted to the rank of inspirational sketch artist. ?But he was more than that,? Mr. Canemaker explained. ?He was the designer he was the person they went to when they had questions about the color, about how to lay something out. He even influenced the music and the special effects: Just by the look of the drawings, he inspired people. ? Mr. Wong spent two years painting the illustrations that would inform every aspect of ?Bambi. ? Throughout the finished film  ?   lent a brooding quality by its stark landscapes misty, desaturated palette and figures often seen in silhouette  ?   his influence is unmistakable. But in 1941, in the wake of a bitter employees? strike that year, Disney fired Mr. Wong. Though he had chosen not to strike  ?   he felt the studio had been good to him, Mr. Canemaker said  ?   he was let go amid the lingering climate of   resentments. On ?Bambi,? Mr. Wong?s name appears, quite far down in the credits, as a mere ?background? artist. Mr. Wong joined Warner Brothers in 1942, working there  ?   and lent out on occasion to other studios  ?   until his retirement in 1968. 
The indignities he endured were not confined to the studios. Trying to buy a house, he and his wife, the former Ruth Kim, were told that each property they inquired about had just been sold. ?Then in a month you?d go back there and the sign was still there,? Mr. Wong recalled in ?Tyrus. ? After the Japanese attack on Pearl Harbor in December 1941, Mr. Wong, like many   took to wearing a lapel button proclaiming his heritage, lest an angry American beat him up on the street. The war permanently dispersed the fledgling Oriental Artists? Group. Mr. Wong?s friend Mr. Okubo was sent, with tens of thousands of other   to an internment camp. ?If World War II hadn?t happened when it did, I think these artists, even the   artists, would have more of a name than they do today,? Ms. See said. ?And that?s because this little movement that had just barely started was split apart by the war. ? Mr. Wong, who became a United States citizen in 1946, also designed Christmas cards for Hallmark and painted elegant   designs on dinnerware, now sought after by collectors. A longtime resident of Sunland, Calif. he became, in retirement, a renowned kitemaker, designing, building and hand coloring astonishing, airworthy creations  ?   butterflies, swallows, whole flocks of owls, centipedes more than 100 feet long  ?   that streaked the Southern California sky like paint on blue canvas. During the last 15 years of Ruth Wong?s life, when she was ill with dementia, Mr. Wong forsook his work to care for her. After her death in 1995, he slowly began making art again. In 2001, in formal recognition of his influence on ?Bambi,? Mr. Wong was named a Disney Legend. The honor  ?   whose previous recipients include Fred MacMurray, Julie Andrews and Annette Funicello  ?   is bestowed by the Walt Disney Company for outstanding contributions. In 2003, a retrospective of his work, curated in part by Ms. See, was the inaugural exhibition at the Chinese American Museum in Los Angeles. 
The Disney Family Museum?s retrospective, ?Water to Paper, Paint to Sky,? traveled in 2015 to the Museum of Chinese in America, in Lower Manhattan. Mr. Wong?s death, at his home in Sunland, was confirmed by the filmmaker Ms. Tom. His survivors include three daughters, Kay Fong,   Wong and Kim Wong and two grandchildren. When his daughters were small, Mr. Wong encouraged them to make art, as his father had encouraged him. Yet he would not let them have coloring books. The reason was simple: He did not want his children constrained, he said, by lines laid down by others."

doc = nlp(text)

for word in doc.ents:
    print(word.text, word.label_)

entities = doc.ents

for i, entity in enumerate(entities):
    for j in range(i+1, len(entities)):
        relation = predict_relation(entity, entities[j])
        if relation is not None:
            print(entity.text, relation, entities[j].text)
            print(f'relation is "{relation}"')
            

Walt Disney?s PERSON
Bambi PERSON
1942 DATE
Disney ORG
film?s DATE
Chinese NORP
Bambi PERSON
decades DATE
film?s DATE
Tyrus Wong PERSON
America GPE
the years DATE
Disney ORG
his 90s DATE
Wong PERSON
Friday DATE
106 CARDINAL
Hollywood GPE
later years DATE
the 20th century DATE
Wong PERSON
Chinese NORP
Wong PERSON
Modernist NORP
California GPE
first ORDINAL
second ORDINAL
1932 DATE
1934 DATE
the Art Institute of Chicago ORG
Picasso WORK_OF_ART
Matisse ORG
Paul Klee PERSON
Hollywood GPE
the 1930s DATE
the 1960s DATE
the years DATE
Disney ORG
Warner Brothers ORG
1949 DATE
1955 DATE
The Wild Bunch LOC
1969 DATE
dozens CARDINAL
Bambi PERSON
Wong PERSON
John Canemaker PERSON
New York University ORG
March DATE
2013 DATE
2014 DATE
Wong PERSON
Sky ORG
the Disney Family Museum ORG
San Francisco GPE
the museum?s DATE
San Francisco Bay LOC
Angel Island FAC
more than nine decades earlier DATE
Wong Gen Yeo PERSON
Romanized Wong Gaing Yoo ORG
Oct. 25, 1910 DATE
Guangdong Province GPE
1920 DATE
Gen Yeo

## 2. BERT for Relation and Event extraction

**Please read the following information about using BERT for information extraction**

A longer example on Kaggle, linked at the end of this section, shows an implementation. Because of its complexity, that code is not included in the workshop.

**Using BERT for relation extraction**

Relation Extraction (RE) is the task of identifying the semantic relationships between entities in a text. BERT can be used for RE by fine-tuning the pre-trained BERT model on a labeled dataset of entity pairs and their corresponding relationships. The input to the model is a sequence of tokens, and the output is a set of possible relations between the entities. BERT can also be used for joint entity and relation extraction by fine-tuning the pre-trained BERT model on a labeled dataset of entity pairs and their corresponding relationships along with the entity types. The input to the model is a sequence of tokens, and the output is a set of possible entity pairs and their corresponding relationships.
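Because the model's input is just a token sequence, it needs some way of knowing which two entities are being asked about. One common convention is to wrap the two mentions in marker tokens before tokenization. The sketch below shows only that preprocessing step in plain Python; the marker strings (`[E1]`, `[E2]`) and the function name are assumptions for illustration and are not taken from the Kaggle notebook.

```python
def mark_entities(text, head, tail):
    """Wrap the head and tail mentions in marker tokens.

    `head` and `tail` are (start, end) character offsets into `text`.
    Markers are inserted from right to left so that the offsets of the
    span still to be processed are not shifted.
    """
    spans = sorted(
        [(head, "[E1]", "[/E1]"), (tail, "[E2]", "[/E2]")],
        key=lambda item: item[0][0],
        reverse=True,
    )
    for (start, end), open_tag, close_tag in spans:
        text = f"{text[:start]}{open_tag} {text[start:end]} {close_tag}{text[end:]}"
    return text

sentence = "John works at Microsoft."
marked = mark_entities(sentence, head=(0, 4), tail=(14, 23))
print(marked)
# [E1] John [/E1] works at [E2] Microsoft [/E2].
```

The marked string would then be tokenized and passed to a fine-tuned classifier that outputs one relation label per entity pair; that part is not shown here.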

**BERT for event extraction**
    
Event Extraction (EE) is the task of identifying events and their associated arguments in a text. BERT can be used for EE by fine-tuning the pre-trained BERT model on a labeled dataset of events and their associated arguments. The input to the model is a sequence of tokens, and the output is a set of possible events and their associated arguments. BERT can also be used for joint entity, relation, and event extraction by fine-tuning the pre-trained BERT model on a labeled dataset of entity pairs, their corresponding relationships, and events and their associated arguments. The input to the model is a sequence of tokens, and the output is a set of possible entity pairs, their corresponding relationships, and events and their associated arguments.
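To make "events and their associated arguments" concrete, the sketch below shows one possible way to represent a single extracted event in Python. The class names, the event type and the role names are hypothetical, and the example event is annotated by hand from a simplified sentence; it is not the output of a model.

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    role: str  # the part this phrase plays in the event, e.g. "Employer"
    text: str  # the phrase as it appears in the sentence

@dataclass
class Event:
    event_type: str  # category of the event (hypothetical label set)
    trigger: str     # the word that signals the event
    arguments: list = field(default_factory=list)

    def argument(self, role):
        """Return the text filling `role`, or None if the role is absent."""
        for arg in self.arguments:
            if arg.role == role:
                return arg.text
        return None

# Hand-annotated example for the sentence below.
sentence = "Mr. Wong joined Disney in 1938."
event = Event(
    event_type="Start-Position",
    trigger="joined",
    arguments=[
        Argument("Person", "Mr. Wong"),
        Argument("Employer", "Disney"),
        Argument("Time", "1938"),
    ],
)

print(event.trigger, "->", event.argument("Employer"))
```

An event extraction model would be trained to fill in such a structure automatically: first find the trigger and its type, then assign a role to each argument phrase.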
    
Here is a longer example of using BERT for relation extraction:
https://www.kaggle.com/code/duongthanhhung/bert-relation-extraction



## Creating and Using a Knowledge Base


Creating a knowledge base (aka Knowledge Graph) in natural language processing (NLP) involves building a structured database of entities, their attributes, and the relationships between them. A knowledge base can be constructed using various techniques such as manual curation, data mining, or machine learning. Once a knowledge base is created, it can be used to answer questions, extract information, and provide insights.
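A simple way to picture such a structure is as a list of (subject, predicate, object) triples. The sketch below stores a few facts about the authors used later in this section and retrieves them by pattern matching. The predicate names (`wrote`, `born_in`) and the `match()` helper are choices made for this illustration only.

```python
# Each fact is a (subject, predicate, object) triple.
TRIPLES = [
    ("George Orwell", "wrote", "1984"),
    ("George Orwell", "wrote", "Animal Farm"),
    ("George Orwell", "born_in", "1903"),
    ("J.D. Salinger", "wrote", "The Catcher in the Rye"),
]

def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the given fields; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# What did George Orwell write?
print(match(subject="George Orwell", predicate="wrote"))

# Who wrote "The Catcher in the Rye"?
print(match(predicate="wrote", obj="The Catcher in the Rye"))
```

The same store answers questions in both directions because any of the three fields can be left open.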


In NLP, a knowledge base can be used to find articles relevant to a query by performing a semantic search. This involves analyzing the meaning of the query and searching the knowledge base for entities, attributes, and relationships that match the query. The results of the search can be ranked based on their relevance to the query, and the top results can be presented to the user.
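The ranking step can be sketched with an index from entity names to the articles that mention them. In the example below, the index contents and the article ids (`a1`, `a2`, `a3`) are made up, and relevance is simply the number of query entities an article mentions; real systems use richer scoring.

```python
from collections import Counter

# Hypothetical index built offline: entity name -> ids of articles mentioning it.
ENTITY_INDEX = {
    "Tyrus Wong": {"a1", "a3"},
    "Disney": {"a1", "a2"},
    "Warner Brothers": {"a1"},
}

def rank_articles(query_entities):
    """Rank article ids by how many of the query's entities they mention."""
    hits = Counter()
    for entity in query_entities:
        for article_id in ENTITY_INDEX.get(entity, ()):
            hits[article_id] += 1
    # Sort by descending hit count, then by id so that ties are deterministic.
    return sorted(hits, key=lambda article_id: (-hits[article_id], article_id))

print(rank_articles(["Tyrus Wong", "Disney"]))
# ['a1', 'a2', 'a3']
```

Article `a1` comes first because it mentions both query entities; the other two mention one each.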

### Task 2: Creating and using a Knowledge Base

**Add the knowledge base and the code to query the KB**
You can use the format given in the code, or design your own.


Here's an example of how to create a knowledge base and use it to find articles relevant to a query:

1. **Define the schema.** Define the structure of the knowledge base, including the entities, attributes, and relationships to be included. For example, a knowledge base about movies might include entities such as actors, directors, and studios, with attributes such as name, birth date, and filmography.
2. **Collect and preprocess data.** Gather a corpus of text that contains information about the entities in the knowledge base. Preprocess the data by removing stop words, stemming, lemmatizing, and converting the text to a suitable format for the knowledge base.
3. **Build the knowledge base.** Populate the knowledge base with the entities, attributes, and relationships extracted from the data. This may involve manual curation or automated methods such as data mining or machine learning.
4. **Use the knowledge base to find articles.** Given a query, analyze the meaning of the query and search the knowledge base for entities, attributes, and relationships that match the query. Rank the results based on their relevance to the query, and present the top results to the user.

Here's an example of how to query a small knowledge base using the spaCy library in Python. It looks up the entities mentioned in the query and returns the records stored for them:


In [11]:
import spacy

nlp = spacy.load("en_core_web_sm")

knowledge_base = {
    "books": {
        "The Catcher in the Rye": {"author": "J.D. Salinger", "year": "1951"},
        "1984": {"author": "George Orwell", "year": "1949"},
        # More books...
    },
    "authors": {
        "J.D. Salinger": {"books": "The Catcher in the Rye, Franny and Zooey, etc.", "year_of_birth": "1919"},
        "George Orwell": {"books": "1984, Animal Farm, etc.", "year_of_birth": "1903"},
        # More authors...
    },
    # More entities...
}

def search(query):
    results = []
    doc = nlp(query)
    entities = [ent.text for ent in doc.ents]
    for entity in entities:
        if entity in knowledge_base["books"]:
            results.append(knowledge_base["books"][entity])
        elif entity in knowledge_base["authors"]:
            results.append(knowledge_base["authors"][entity])
    return results

query = "What book did George Orwell write?"
results = search(query)
print(results)


[{'books': '1984, Animal Farm, etc.', 'year_of_birth': '1903'}]


In this example, we define a knowledge base that includes entities such as books and authors, with attributes such as author, year, and books. We then define a search() function that takes a query as input, extracts entities from the query using spaCy, and searches the knowledge base for matching entities and attributes. The function returns a list of results that match the query.

We then test the search() function by querying "What book did George Orwell write?". spaCy recognises "George Orwell" as an entity, and since that name is a key under "authors" in the knowledge base, the function returns a list containing the author's whole record: the "books" string and the year of birth. Note that search() does not interpret the question itself. It returns everything stored for each matched entity, so the book "1984" appears in the result only as part of that record.
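The output shown above is the full record stored for the author. To return only the titles, one option is to look the author up in the "books" table instead. The sketch below repeats a cut-down copy of the knowledge base so that it runs on its own; `books_by()` is a hypothetical helper, not part of the code above.

```python
# Cut-down copy of the knowledge base from the example above.
knowledge_base = {
    "books": {
        "The Catcher in the Rye": {"author": "J.D. Salinger", "year": "1951"},
        "1984": {"author": "George Orwell", "year": "1949"},
    },
}

def books_by(author):
    """Return the titles whose "author" attribute equals `author`."""
    return [
        title
        for title, attributes in knowledge_base["books"].items()
        if attributes["author"] == author
    ]

print(books_by("George Orwell"))
# ['1984']
```

Storing each fact once, on the book, and deriving the author's list from it also avoids free-text values such as "1984, Animal Farm, etc." that are hard to query.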

## 3. Finding Coreferences in Winograd Sentences

Finding coreferences in natural language sentences is important for accurately understanding their meaning. In Winograd sentences, which are designed to test a computer's ability to resolve coreferences, the challenge is to identify which words refer to the same entity or concept. You can read more about the Winograd Schema Challenge here: https://en.wikipedia.org/wiki/Winograd_schema_challenge

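Here's an example of a Winograd sentence:

"The trophy would not fit in the brown suitcase because it was too big."

In this sentence, the pronoun "it" could grammatically refer to either the trophy or the suitcase. World knowledge settles the question: something that is too big does not fit, so "it" is the trophy. If "big" is replaced with "small", the referent flips to the suitcase.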
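A Winograd schema is really a pair of sentences that differ in one word, with a different correct referent for each. The sketch below shows one possible way to store such a pair as data; the class and field names are invented for this illustration.

```python
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    template: str      # sentence with a {word} slot for the special word
    pronoun: str       # the ambiguous pronoun
    candidates: tuple  # the two possible antecedents
    answers: dict      # special word -> correct antecedent

    def sentence(self, special_word):
        """Fill the slot with one of the special words."""
        return self.template.format(word=special_word)

trophy = WinogradSchema(
    template="The trophy would not fit in the brown suitcase because it was too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answers={"big": "the trophy", "small": "the suitcase"},
)

for word, referent in trophy.answers.items():
    print(f"{trophy.sentence(word)}  ->  '{trophy.pronoun}' = {referent}")
```

Keeping both variants together makes it easy to check that a resolver is not simply always picking the first or the last candidate.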

### Task 3: Coreference resolution

Let's write a Python function to identify the coreferences in a given Winograd sentence. More Winograd sentences can be found here: https://cs.nyu.edu/~davise/papers/WinogradSchemas/WSCollection.html

Instructions:
* Complete this simple code to identify coreferences in Winograd sentences.
* Add a few more sentences and their recognised coreferences.
* Discuss which coreference was identified correctly and why.
* Suggest an improvement to increase the accuracy.


In [13]:
# Define the Winograd sentence to analyze
sentence = "The city councilmen refused the demonstrators a permit because they feared violence."

# Split the sentence into individual words
words = sentence.split()

# Initialize a dictionary to keep track of coreferences
corefs = {}

# Iterate over each word in the sentence
for i, word in enumerate(words):
    # If the word is a pronoun, try to find its antecedent
    if word in ['he', 'she', 'it', 'they']:
        # Iterate over each word before the pronoun
        for j in range(i-1, -1, -1):
            # If the word is capitalised (a crude proxy for a proper noun), treat it as a possible antecedent
            if words[j].isalpha() and (words[j].istitle() or words[j].isupper()):
                # Save the antecedent as the coreference of the pronoun
                corefs[i] = j
                break

# Print the coreferences found
for i, j in corefs.items():
    print(f"Pronoun '{words[i]}' refers to '{words[j]}'")


Pronoun 'they' refers to 'The'


In this example, we first define the Winograd sentence to analyze and split it into individual words using the split() method. We then initialize a dictionary to keep track of coreferences, where the keys are the indices of the pronouns in the sentence and the values are the indices of their antecedents.

Next, we iterate over each word in the sentence and check if it is a pronoun. If it is, we search for its antecedent by scanning backwards from the word just before the pronoun. The code has no part-of-speech information, so it treats any capitalised word as a stand-in for a proper noun and saves the first one it finds in the coreference dictionary. In the sentence above, the only capitalised word is the sentence-initial "The", which is why the output is wrong.

Finally, we print the coreferences found by iterating over the key-value pairs in the coreference dictionary and printing the pronoun and its antecedent.

This example demonstrates a first attempt at coreference resolution in Winograd sentences using basic string manipulation in Python. This approach is far less robust and accurate than one built on a dedicated natural language processing library such as spaCy, which provides part-of-speech tags and syntactic structure.
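Even with correct candidate noun phrases, a purely position-based rule cannot solve a Winograd schema. The sketch below applies a "nearest preceding candidate" heuristic to both variants of the councilmen sentence (the second variant, with "advocated", comes from the Winograd Schema Challenge description linked earlier). The candidate list is written by hand, which is an assumption: a real system would obtain it from a parser.

```python
# Hand-listed candidate noun phrases (a real system would find these with a parser).
CANDIDATES = ["city councilmen", "demonstrators"]

# Both variants of the schema, each with its correct antecedent.
SCHEMA = {
    "The city councilmen refused the demonstrators a permit because they feared violence.": "city councilmen",
    "The city councilmen refused the demonstrators a permit because they advocated violence.": "demonstrators",
}

def nearest_candidate(sentence, pronoun, candidates):
    """Pick the candidate that starts closest before the pronoun."""
    pronoun_pos = sentence.index(f" {pronoun} ")
    preceding = [c for c in candidates if 0 <= sentence.find(c) < pronoun_pos]
    return max(preceding, key=sentence.find) if preceding else None

for sentence, gold in SCHEMA.items():
    guess = nearest_candidate(sentence, "they", CANDIDATES)
    print(f"guess: {guess} | correct answer: {gold} | match: {guess == gold}")
```

The heuristic picks "demonstrators" for both sentences, so it is right for one variant and wrong for the other. Any rule that ignores the meaning of the verb will behave this way, which is exactly what the schema is designed to expose.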

## 4. Practicing converting natural language (NL) to FOL and FOL to NL 


Converting between natural language (NL) and First Order Logic (FOL) means translating NL statements into FOL formulas and FOL formulas back into NL statements.

In FOL, we can represent statements using quantifiers, predicates, and logical connectives. For example, the sentence "All dogs are mammals" can be represented in FOL as "∀x(Dog(x) → Mammal(x))", where "∀x" represents the universal quantifier ("for all x"), "Dog(x)" is the predicate that represents "x is a dog", "Mammal(x)" is the predicate that represents "x is a mammal", and "→" represents logical implication ("if...then").
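One way to get a feel for what such a formula says is to evaluate it over a small finite set of individuals. In the sketch below, the domain and the membership of each predicate are invented for illustration; the universal quantifier becomes Python's `all()` and the implication P → Q is computed as (not P) or Q.

```python
# A tiny finite model: a domain of individuals and the set of
# individuals for which each predicate is true.
domain = {"rex", "fido", "tweety"}
Dog = {"rex", "fido"}
Mammal = {"rex", "fido"}

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# ∀x(Dog(x) → Mammal(x))
all_dogs_are_mammals = all(implies(x in Dog, x in Mammal) for x in domain)
print(all_dogs_are_mammals)  # True

# Remove fido from the mammals: fido is now a counter-example.
Mammal_without_fido = {"rex"}
print(all(implies(x in Dog, x in Mammal_without_fido) for x in domain))  # False
```

Note that "tweety" does not make the formula false: for an individual that is not a dog, the implication holds regardless of whether it is a mammal.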

To convert NL to FOL, we first need to identify the predicates, quantifiers, and logical connectives in the sentence. We then need to assign variables to the quantified expressions, and translate the sentence into the appropriate FOL representation.

Natural Language (NL) is the language that humans use to communicate with each other. It is a rich and complex language that can convey a wide range of meanings and information. Formal Language (FL), on the other hand, is a language that is designed for precise communication and is used in mathematics, logic, and computer science. FL includes languages such as propositional logic and first-order logic (FOL), which is also known as first-order predicate logic.

Natural Language Processing (NLP) is a field of computer science that deals with the interaction between computers and humans using natural language. One of the important tasks in NLP is to convert natural language to formal language and vice versa. This conversion is important because computers cannot directly understand natural language, but they can understand formal languages. Therefore, by converting natural language to formal language, we can enable computers to reason about natural language text.

Converting natural language to formal language involves the process of transforming the syntax and semantics of the natural language text into a form that can be understood by a computer. One of the popular formal languages used for this purpose is First-Order Logic (FOL), which is a powerful and expressive formal language that is widely used in logic, mathematics, and computer science.

The process of converting natural language to FOL involves several steps. First, we need to parse the natural language sentence and extract its syntactic structure. This involves breaking down the sentence into its constituent parts, such as nouns, verbs, adjectives, and prepositions, and identifying the relationships between these parts. This step is typically done using a parser, which is either built from a set of grammatical rules or trained on annotated sentences.

Once we have extracted the syntactic structure of the sentence, we need to convert it to a semantic representation that can be expressed in FOL. This involves identifying the logical relationships between the parts of the sentence and representing them using FOL predicates and quantifiers. This step is typically done using a semantic parser, which may be driven by hand-written semantic rules or learned from annotated examples.
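The two steps above can be illustrated with a deliberately small sketch. Instead of a real syntactic parser, it uses a regular expression as a stand-in that recognizes only sentences of the form "All Xs are Ys" or "Some Xs are Ys". The function names `singular_predicate` and `to_fol` are hypothetical, and the singularization is naive; treat this as an illustration of the mapping, not as a general converter.

```python
import re

def singular_predicate(noun):
    """Turn a plural noun like 'dogs' into a predicate name like 'Dog'.

    Naive assumption: the plural is formed by a single trailing 's'.
    """
    base = noun[:-1] if noun.endswith("s") else noun
    return base.capitalize()

def to_fol(sentence):
    """Translate 'All Xs are Ys' or 'Some Xs are Ys' into a FOL string.

    Returns None for any sentence that does not match these two patterns.
    """
    # "Syntactic" step: a regular expression stands in for a real parser.
    match = re.fullmatch(r"(All|Some) (\w+) are (\w+)\.?", sentence.strip())
    if match is None:
        return None
    quantifier, subject, complement = match.groups()

    # "Semantic" step: map the recognized parts to predicates and a quantifier.
    p = singular_predicate(subject)
    q = singular_predicate(complement)
    if quantifier == "All":
        return f"∀x ({p}(x) → {q}(x))"
    return f"∃x ({p}(x) ∧ {q}(x))"

print(to_fol("All dogs are mammals."))  # ∀x (Dog(x) → Mammal(x))
print(to_fol("Some cats are pets"))     # ∃x (Cat(x) ∧ Pet(x))
```

The universal pattern maps to an implication and the existential pattern to a conjunction, which is the usual convention for these two sentence forms.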

Converting FOL to natural language is the reverse process: we start with a FOL formula and construct a natural language sentence that expresses its meaning. This is done by applying a set of transformation rules that map FOL formulas to natural language sentences.
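As a minimal sketch of such transformation rules, the snippet below pairs two formula patterns with English templates. The `RULES` table and the `to_english` function are hypothetical names, and only formulas written exactly in the two shapes shown are covered; anything else returns `None`.

```python
import re

# Each rule pairs a formula pattern with an English sentence template.
# Only these two formula shapes are handled (an assumption of this sketch).
RULES = [
    (re.compile(r"∀x \((\w+)\(x\) → (\w+)\(x\)\)"), "Every {0} is a {1}."),
    (re.compile(r"∃x \((\w+)\(x\) ∧ (\w+)\(x\)\)"), "Some {0} is a {1}."),
]

def to_english(formula):
    """Return an English rendering of `formula`, or None if no rule applies."""
    for pattern, template in RULES:
        match = pattern.fullmatch(formula.strip())
        if match:
            # Predicate names such as 'Dog' become lower-case nouns.
            words = [name.lower() for name in match.groups()]
            return template.format(*words)
    return None

print(to_english("∀x (Dog(x) → Mammal(x))"))  # Every dog is a mammal.
print(to_english("∃x (Dog(x) ∧ Pet(x))"))     # Some dog is a pet.
```

Nested quantifiers, relations with more than one argument, and correct article choice ("a" versus "an") would all need additional rules.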

Overall, converting natural language to formal language and vice versa is an important task in NLP that enables computers to reason about natural language text. This task involves a combination of syntactic and semantic analysis, and it requires the use of advanced parsing and semantic processing techniques.

### Task 4a: Convert the following sentences from English to FOL

"Every student loves mathematics."

"Some birds can fly."

"There is a house where every window has curtains."

"All employees who work overtime are compensated with extra pay."


### Task 4b: Convert the following sentences from FOL to English


∃x (Human(x) ∧ Parent(y, x))

∀x (Cat(x) → Mammal(x))

∃x (Cat(x) ∧ ∀y (Bird(y) → Chases(x, y)))

∀x ((Employee(x) ∧ WorksOvertime(x)) → CompensatedExtra(x))

We initialize a Doc object by parsing the news article using the spaCy model. We then iterate over each entity in the document and create a social network by identifying the people and organizations mentioned in the text and understanding the relationships between them.

To create the social network, we initialize a dictionary to store the connections between people and iterate over each token in the sentence containing the entity. If the token is a person or organization, we add it to the social network as a connection to the original person.

Finally, we print the social network by iterating over the key-value pairs in the dictionary and printing the person and their connections.

## Optional: Establishing a biography timeline
Establishing a biography timeline is a natural language processing task that involves extracting and organizing temporal information from a text. In other words, the goal is to identify the events in a person's life described in their biography and to put them in chronological order.

One approach to establishing a biography timeline involves using named entity recognition (NER) to identify entities that represent time expressions such as dates, years, and time periods. We can then use temporal relation extraction techniques to identify and order the events that these time expressions correspond to.

### Optional Task: Complete the code to print a biography timeline from a sentence

* Complete the code
* Use a few more sentences and check if the timeline and facts are correct
* Make a suggestion on how to improve the correctness of the timeline.

Here is an example of Python code that shows how we can use spaCy to establish a timeline from a person's biography:

import spacy

# Load the spaCy language model and parse a biography text
nlp = spacy.load('en_core_web_sm')
text = "John F. Kennedy was born on May 29, 1917, in Brookline, Massachusetts. He was elected president of the United States in 1960 and served until his assassination in 1963."
doc = nlp(text)

# Initialize a list to store the events in the person's life
events = []

# Iterate over the entities in the parsed text
for entity in doc.ents:
    # If the entity is a date, add it to the list of events
    if entity.label_ == 'DATE':
        events.append(entity)

# Sort the events by their date value
events = sorted(events, key=lambda event: event.text)

# Print out the timeline of events
for event in events:
    print(f"{event.text}: {event.sent}")


1960: He was elected president of the United States in 1960 and served until his assassination in 1963.
1963: He was elected president of the United States in 1960 and served until his assassination in 1963.
May 29, 1917: John F. Kennedy was born on May 29, 1917, in Brookline, Massachusetts.


In this code example, we first load the spaCy language model and parse a biography text about John F. Kennedy using the nlp() function. We then initialize an empty list to store the events in Kennedy's life.

Next, we iterate over the entities in the parsed text using a for loop. For each entity, we check if it is a date using the label_ attribute. If the entity is a date, we add it to the list of events.

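We then sort the events using the built-in sorted() function in Python, with a lambda function that uses each event's text value as the sort key. Note that this compares the date strings lexicographically, not chronologically, which is why "May 29, 1917" appears after "1960" and "1963" in the output above. The order is only correct when all dates share a format that sorts the same way as the dates themselves, such as bare four-digit years.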

Finally, we print out the timeline of events using Python's string formatting capabilities. We iterate over the sorted list of events and print out each event's text value and the sentence it appears in.

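This code example demonstrates how named entity recognition can serve as a first step towards establishing a timeline of events from a person's biography. It does not yet perform temporal relation extraction: each date is paired with the whole sentence it appears in, so two dates in the same sentence print the same text. We could extend this approach to link each date to its specific event and to handle more complex timelines with overlapping events or events that occur in different time periods.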
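The ordering issue visible in the output above can be reproduced without spaCy. The sketch below contrasts a plain string sort with a sort on the year extracted from each date string. The helper names `year_of` and `sort_key` are hypothetical, and comparing only the year ignores month and day, so this is a possible starting point for the improvement task, not a complete solution.

```python
import re

def year_of(date_text):
    """Return the first four-digit number in `date_text` as an int, or None."""
    match = re.search(r"\b(\d{4})\b", date_text)
    return int(match.group(1)) if match else None

def sort_key(date_text):
    """Sort by year; strings without a recognizable year go to the end."""
    year = year_of(date_text)
    return (year is None, year or 0)

# The three DATE strings that appear in the output above.
dates = ["1960", "1963", "May 29, 1917"]

lexicographic = sorted(dates)               # plain string comparison
chronological = sorted(dates, key=sort_key)  # comparison by extracted year

print(lexicographic)  # ['1960', '1963', 'May 29, 1917']
print(chronological)  # ['May 29, 1917', '1960', '1963']
```

In the workshop code, the same key could be applied to `event.text` for each entity in the `events` list.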