# CS 195: Natural Language Processing
## Summarization, Translation, and Question Answering

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ericmanley/f23-CS195NLP/blob/main/F2_2_SummarizationTranslationQuestionAnswering.ipynb)


## References

*Two minutes NLP — Learn the ROUGE metric by examples* by Fabio Chiusano: https://medium.com/nlplanet/two-minutes-nlp-learn-the-rouge-metric-by-examples-f179cc285499

Google's implementation of rouge_score: https://github.com/google-research/google-research/tree/master/rouge

Hugging Face's wrapper for Google's implementation: https://huggingface.co/spaces/evaluate-metric/rouge

Hugging Face Task Guide on Summarization: https://huggingface.co/docs/transformers/tasks/summarization

Hugging Face Task Guide on Translation: https://huggingface.co/docs/transformers/tasks/translation

Hugging Face Task Guide on Question Answering: https://huggingface.co/docs/transformers/tasks/question_answering


## Installing necessary modules

In [None]:
import sys
!{sys.executable} -m pip install transformers datasets evaluate rouge_score sentencepiece sacremoses bs4


## Review: Sequence-to-Sequence Models

NLP models that take one sequence as input and produce another sequence as output are called **sequence-to-sequence** (**seq2seq**) models. Common tasks include:
* summarization
* translation
* conversation

**A Challenge:** unlike classification, there is usually no single right answer, so we can't simply check whether a prediction matches a label!

**Partial Solutions:**
* Qualitative evaluation - humans judge how closely the output matches a good answer
* ROUGE metrics - statistics that measure the overlap between a predicted sequence and a reference sequence



## Review: Using Hugging Face's wrapper for ROUGE

**ROUGE:** Recall-Oriented Understudy for Gisting Evaluation

Suppose we have a **reference** sequence: one known *correct* output (there may be other correct ones).
* E.g., a translation or a summarization that a trustworthy human has produced

**Example reference:** "A broody hen sat in a nesting box all day."

**Example machine-generated prediction:** "A hen sat in every nesting box that long sunny day."



In [None]:
import evaluate

rouge = evaluate.load("rouge")

reference_sentence = "a broody hen sat in a nesting box all day"
predicted_sentence = "a hen sat in every nesting box that long sunny day"

rouge.compute(predictions=[predicted_sentence],references=[reference_sentence])

{'rouge1': 0.6666666666666666,
 'rouge2': 0.3157894736842105,
 'rougeL': 0.6666666666666666,
 'rougeLsum': 0.6666666666666666}

## Interpreting ROUGE

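Each of these scores is an F1 score, the harmonic mean of *precision* (the fraction of the prediction's words or n-grams that also appear in the reference) and *recall* (the fraction of the reference's words or n-grams that also appear in the prediction). By default, the Hugging Face wrapper reports this F1 value.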
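Written out, the precision/recall/F1 combination for ROUGE-N looks like this (standard definitions; shared n-grams are "clipped", so each one counts at most as many times as it appears in both texts, and the wrapper also tokenizes and lowercases before counting):

$$P = \frac{\text{matching } n\text{-grams}}{n\text{-grams in prediction}}, \qquad R = \frac{\text{matching } n\text{-grams}}{n\text{-grams in reference}}, \qquad F_1 = \frac{2PR}{P + R}$$

For the hen example with unigrams, the 7 matches are *a, hen, sat, in, nesting, box, day*; the prediction has 11 words and the reference has 10 (the reference's second *a* has no partner, so it is not counted):

$$P = \tfrac{7}{11}, \qquad R = \tfrac{7}{10}, \qquad F_1 = \frac{2 \cdot \tfrac{7}{11} \cdot \tfrac{7}{10}}{\tfrac{7}{11} + \tfrac{7}{10}} = \frac{14}{21} \approx 0.667$$

With bigrams there are 3 matches (*hen sat, sat in, nesting box*) out of 10 predicted and 9 reference bigrams, giving $F_1 = \frac{2 \cdot 3}{10 + 9} = \frac{6}{19} \approx 0.316$. Both agree with the `rouge1` and `rouge2` values printed above.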

`rouge1` - overlap of individual words (1-grams) between prediction and reference

`rouge2` - overlap of *bigrams* (2-grams, pairs of consecutive words)

`rougeL` - the *longest common subsequence* (LCS) between the prediction and reference. The subsequence must be in *order* but not necessarily *consecutive*

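`rougeLsum` - a summary-level version of `rougeL`: both texts are split into sentences at newlines, and longest-common-subsequence matches are combined across sentences before scoring. If neither text contains a newline, it equals `rougeL`, as in the example above.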
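To check these definitions against the numbers above, here is a from-scratch sketch of `rouge1`, `rouge2`, and `rougeL` in plain Python. It is a simplification, not Google's `rouge_score` code: it only lowercases and splits on whitespace, while the real tokenizer also strips punctuation. The helper names (`ngrams`, `rouge_n`, `lcs_length`, `rouge_l`) are our own.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision (overlap / pred_total) and recall (overlap / ref_total)."""
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n):
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter & Counter keeps the minimum count of each shared n-gram ("clipped" matches)
    overlap = sum((pred & ref).values())
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def rouge_l(prediction, reference):
    pred, ref = prediction.lower().split(), reference.lower().split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))


reference_sentence = "a broody hen sat in a nesting box all day"
predicted_sentence = "a hen sat in every nesting box that long sunny day"

print(rouge_n(predicted_sentence, reference_sentence, 1))  # about 0.667
print(rouge_n(predicted_sentence, reference_sentence, 2))  # about 0.316
print(rouge_l(predicted_sentence, reference_sentence))     # about 0.667
```

On this example all three values agree with `rouge.compute` above. On text with punctuation they can differ slightly because of the simpler tokenization.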

## Summarization in Hugging Face

Hugging Face hosts many summarization models. Here's a version of BART (https://huggingface.co/facebook/bart-large-cnn) fine-tuned on CNN/Daily Mail news articles (https://huggingface.co/datasets/cnn_dailymail), which include **reference** summaries (highlights) written by the authors of the original articles.

We'll try it out on a Times-Delphic article I found here: https://timesdelphic.com/2023/09/the-answer-has-little-to-do-with-affirmative-action-over-the-summer-the-supreme-court-ruled-against-the-admissions-programs-of-harvard-university-and-the-university-of-north-carolina-in-an-affirmat/

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn") #could also try google/pegasus-xsum

In [None]:
times_delphic_story = """
How does the Supreme Court ruling on affirmative action affect Drake?
The answer has little to do with affirmative action.
Over the summer, the Supreme Court ruled against the admissions programs of Harvard University and the University of North Carolina in an affirmative action decision. Before the decision, race already wasn’t a factor in Drake University admissions, according to Provost Sue Mattison.
“Affirmative action, with regards to admissions, only impacts those really highly selective institutions that limit the number of incoming students,” Mattison said. “So that doesn’t apply to Drake and most institutions across the country.”
She said schools like Harvard and UNC have enough applicants that they can pick and choose which applicants fill a certain number of spots.
Drake’s admissions team found that the university has “admitted all students who have a 3.0 high school GPA or [higher],” Mattison said. “Even though we’ve asked for a person’s race on the admissions form, it does not have an impact on the admissions decision, and it doesn’t displace anybody.”
Possible effects of the court’s ruling
Mark Kende, director of Drake’s Constitutional Law Center, said the Supreme Court “basically has embraced an idea that it calls colorblindness.”
“If you take their principle of colorblindness and extend it beyond universities, to other places, it could raise some problems,” Kende said. “But we don’t know yet.”
Financial aid programs that prioritize applicants of a particular race over another are more vulnerable after the court’s decision, according to Kende. He said it’s not clear what impact the decision might have on university hiring practices that consider an employee’s race, as well as corporations’ diversity programs.
Following the Supreme Court’s decision, Missouri Attorney General Andrew Bailey said Missouri institutions subject to the U.S. Constitution or Title VI must stop using race-based standards “to make decisions about things like admissions, scholarships, programs and employment.”
The University of Missouri System said that “a small number of our programs and scholarships have used race/ethnicity as a factor for admissions and scholarships,” and that “these practices will be discontinued.”
Drake is taking a different approach in the wake of the affirmative action decision. The university is monitoring maybe about forty to fifty scholarships, according to Ryan Zantingh, Drake’s director of financial aid. This is more in anticipation of a comparable case on financial aid that considers race, rather than a reaction to the affirmative action ruling.
Mattison said she thinks Drake is still trying to determine how the Supreme Court decision will impact Drake’s Crew Scholars program, which is for incoming students of color.
“There are ways that we can ensure that we continue Crew Scholars while still being compliant,” Mattison said.
Donors for some Drake scholarships specified that they wanted to support a student of color or a woman in a STEM field, Mattison said.
“And so we’re still working through what that actually means, and what we have to do to continue to achieve the values that we expect,” Mattison said. “There are ways that we can change the wording of some of the scholarships.”
Like all students, students of color may qualify for scholarships for first-generation students or students with financial need.
“There’s a lot of overlap between students of color and other areas where financial aid is directed,” Zantingh said. “Scholarship resources can be directed [to financial need or first generation status] and still reach the same students.”
Even if there is a ruling on financial aid that’s comparable to the affirmative action decision, Zantingh doesn’t expect a large impact on Drake financial aid from either decision.
“There may be some implications, but I think the overall general effect on students will be little to none,” Zantingh said.
Zantingh gave an example of scholarship language offered by legal counsel. If a scholarship is for only minority students, it might become a scholarship that gives preference to students who demonstrate a commitment to Drake’s vision for diversity on campus.
“If a white student is actively involved in anti-racist leadership here on campus, certainly they would fit that description then, wouldn’t they?” Zantingh said. “Basically, the language would not seek to exclude any particular protected class categorically.”
In some cases, a donor might be unwilling to change the scholarship’s language or be deceased, Zantingh said. If a donor is deceased, a judge might approve changes. He said he doesn’t expect Drake to cut any of the scholarships it is monitoring.
“The scholarship criteria would have to change, or the dollars would have to be repurposed in another way. Per either the donor or a court’s approval,” Zantingh said.
Race can still play a role in college admissions
The Supreme Court left at least one legal path open for race to play a role in college admissions.
When admitting students, universities are allowed to consider “an applicant’s discussion of how race affected his or her life, be it through discrimination, inspiration or otherwise,” Chief Justice John Roberts wrote in the Court’s decision. However, “the student must be treated based on his or her experiences as an individual — not on the basis of race.”
A student’s story can emerge without Drake asking for it, according to Dean of Admissions Joel Johnson.
“Especially if they’ve overcome a lot, or it’s so key to their identity… it’ll come out on its own,” Johnson said. “I don’t know if I could say the Supreme Court protected it. They couldn’t have stopped it, honestly.”
Johnson said that caring about diversity also means intentionally recruiting a diverse group of students. He said students can’t join Drake if they never apply in the first place.
In the wake of the Supreme Court’s decision on affirmative action, The Times-Delphic is publishing a series. Check next week’s paper for an article about legacy admissions and legacy financial aid with a Drake focus.

"""

In [None]:
len(times_delphic_story) #let's check how long this string is

6103

In [None]:
print(summarizer(times_delphic_story[:4000],max_length=100,min_length=50))

[{'summary_text': 'The Supreme Court ruled against the admissions programs of Harvard University and the University of North Carolina in an affirmative action decision. Before the decision, race already wasn’t a factor in Drake University admissions. Financial aid programs that prioritize applicants of a particular race over another are more vulnerable.'}]


### Group Exercise

In this example, I only use the first 4000 characters from the article.

Try using more. Why do you think I did that?

What strategies can you think of for getting summaries of longer articles?

## Let's try it on a different summarization dataset

The *BillSum* dataset contains the text of legislative bills and their summaries from both the US Federal and California State legislatures.

See more here: https://huggingface.co/datasets/billsum

This dataset has `train`, `test`, and `ca_test` splits. We can load just one of them - let's try `ca_test`, which is the smaller test set.


In [None]:
from datasets import load_dataset

billsum = load_dataset("billsum", split="ca_test")

## Let's explore the dataset

What does it look like when printed/displayed?

In [None]:
print(billsum)

Dataset({
    features: ['text', 'summary', 'title'],
    num_rows: 1237
})


What does one of the items look like?

In [None]:
billsum[0]

{'text': 'The people of the State of California do enact as follows:\n\n\nSECTION 1.\nThe Legislature finds and declares all of the following:\n(a) (1) Since 1899 congressionally chartered veterans’ organizations have provided a valuable service to our nation’s returning service members. These organizations help preserve the memories and incidents of the great hostilities fought by our nation, and preserve and strengthen comradeship among members.\n(2) These veterans’ organizations also own and manage various properties including lodges, posts, and fraternal halls. These properties act as a safe haven where veterans of all ages and their families can gather together to find camaraderie and fellowship, share stories, and seek support from people who understand their unique experiences. This aids in the healing process for these returning veterans, and ensures their health and happiness.\n(b) As a result of congressional chartering of these veterans’ organizations, the United States Inte

Let's get a summary of the first bill (first 4000 characters of the text only) using the news-article summarizer.

In [None]:
summarizer(billsum[0]["text"][:4000])

[{'summary_text': 'Since 1899 congressionally chartered veterans’ organizations have provided a valuable service to our nation’s returning service members. The U.S. Internal Revenue Service created a special tax exemption for these organizations under Section 501(c)(19) of the Internal Revenue Code.'}]

## Now let's do a batch of 5 bills

First, we need to prepare a list that contains the texts of the first 5 bills, truncated to the first 4000 characters.

In [None]:
truncated_bill_texts = []
for idx in range(5):
    curr_truncated_text = billsum[idx]["text"][:4000]
    truncated_bill_texts.append( curr_truncated_text )

Now let's get a summary of each of those texts. This might take a while.

In [None]:
prediction_summaries = summarizer(truncated_bill_texts)
actual_references = billsum["summary"][0:5]

print(prediction_summaries)
print(actual_references)


[{'summary_text': 'Since 1899 congressionally chartered veterans’ organizations have provided a valuable service to our nation’s returning service members. The U.S. Internal Revenue Service created a special tax exemption for these organizations under Section 501(c)(19) of the Internal Revenue Code.'}, {'summary_text': 'A prisoner is not eligible for resentence or recall pursuant to subdivision (e) of Section 1170 if he or she was convicted of first-degree murder if the victim was a peace officer. A prisoner sentenced to death or life in prison without possibility of parole cannot be granted medical parole.'}, {'summary_text': 'California has long been known as the land of opportunity, the republic of the future. But for too many of its residents the future is receding. Inequality continues to rise, even though California has one of the most progressive tax structures in the nation. Small businesses, like plumbing contractors, auto repair shops, and restaurants that account for over 90

Notice that `summarizer` returns a list of dictionaries, each with a single key, `'summary_text'`. To evaluate these with ROUGE, we need a flat list of the summary strings themselves, not dictionaries.

In [None]:
predictions_flat = []

for result in prediction_summaries:
    predictions_flat.append(result["summary_text"])

print(predictions_flat)

['Since 1899 congressionally chartered veterans’ organizations have provided a valuable service to our nation’s returning service members. The U.S. Internal Revenue Service created a special tax exemption for these organizations under Section 501(c)(19) of the Internal Revenue Code.', 'A prisoner is not eligible for resentence or recall pursuant to subdivision (e) of Section 1170 if he or she was convicted of first-degree murder if the victim was a peace officer. A prisoner sentenced to death or life in prison without possibility of parole cannot be granted medical parole.', 'California has long been known as the land of opportunity, the republic of the future. But for too many of its residents the future is receding. Inequality continues to rise, even though California has one of the most progressive tax structures in the nation. Small businesses, like plumbing contractors, auto repair shops, and restaurants that account for over 90 percent of the state’s businesses are a key rung on 
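The flattening loop above can also be written as a single list comprehension. This small sketch uses a hypothetical two-item list in place of the real pipeline output, just to show the shape of the data going in and coming out.

```python
# Hypothetical stand-in for the pipeline output: a list of one-key dictionaries
prediction_summaries = [
    {"summary_text": "First summary."},
    {"summary_text": "Second summary."},
]

# Pull the "summary_text" value out of each dictionary, keeping the original order
predictions_flat = [result["summary_text"] for result in prediction_summaries]
print(predictions_flat)  # ['First summary.', 'Second summary.']
```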

And now let's compute the ROUGE metrics:

In [None]:
import evaluate

rouge = evaluate.load("rouge")

rouge.compute(predictions=predictions_flat,references=actual_references)

{'rouge1': 0.17187283476504672,
 'rouge2': 0.0758329573249116,
 'rougeL': 0.12923816949402564,
 'rougeLsum': 0.14857043405751158}

These seem to indicate there isn't a lot of overlap between the reference summaries and the predictions.

Keep in mind:
* the model was trained on a different kind of dataset
* we are only using the first part of each bill

## A translation example

Here is a model that translates from Spanish (ES) to English (EN): https://huggingface.co/Helsinki-NLP/opus-mt-es-en

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

spanish_sentence = "una gallina melancólica se sentó en un nido todo el día"
reference_english_sentence = "a broody hen sat in a nesting box all day"


translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

predicted_sentence = translator(spanish_sentence)

print(predicted_sentence)

[{'translation_text': 'a melancholy hen sat in a nest all day'}]


## Applied Exploration

Go to the Hugging Face models page: https://huggingface.co/models
* Use the same model, but find two different news datasets (https://huggingface.co/datasets), and evaluate them using ROUGE metrics
* For each dataset, record
    - where did it come from?
    - where did the reference summaries come from?
    - how big is it?
    - how big are the texts? Did you have to truncate them?
* Evaluate the performance
    - use the ROUGE metrics
    - describe in your own words how it performed
    - how did they compare to each other?
    - how did they compare to the bills dataset?
    - what do you think is the reason for the difference in performance that you noticed?
    

In [None]:
from transformers import pipeline
from datasets import load_dataset
import evaluate

summarizer = pipeline("summarization", model="facebook/bart-large-cnn") #could also try google/pegasus-xsum
multi_news = load_dataset("multi_news")
news_summary = load_dataset("argilla/news-summary")
rouge = evaluate.load("rouge")

#rouge.compute(predictions=[predicted_sentence],references=[reference_sentence])

## Dataset: [Multi-News](https://huggingface.co/datasets/multi_news)
## Authors: [Alexander R. Fabbri, Irene Li, Tianwei She, Suyi Li, Dragomir R. Radev](https://arxiv.org/abs/1906.01749)
## Description:
Multi-News is a dataset of news articles and summaries. The summaries are taken from [newser.com](https://www.newser.com), and the original articles are traced back via the sources Newser cites. Multi-News contains 56,216 entries. In the training split, the average article is 11,018 characters long, and some documents are empty (the minimum length is 0).

In [None]:
# Multi-News exploration

# Character length of every training document (column access returns a plain list)
multi_lengths = [len(document) for document in multi_news["train"]["document"]]

print("Max Article Length: ", max(multi_lengths))
print("Min Article Length: ", min(multi_lengths))
print("Avg Article Length: ", sum(multi_lengths)/len(multi_lengths))

Max Article Length:  2916289
Min Article Length:  0
Avg Article Length:  11017.722071511162
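Two details matter before summarizing Multi-News: the minimum length of 0 means some documents are empty, and the Hugging Face dataset card describes each document as several source articles joined by the separator `|||||`. The sketch below (the helper name `split_sources` and the sample strings are hypothetical) drops empty documents and splits the rest into their source articles, which could then be summarized one at a time instead of cutting the concatenation off at 4000 characters.

```python
SEPARATOR = "|||||"  # separator between source articles, per the dataset card


def split_sources(document):
    """Split one Multi-News document into its non-empty source articles."""
    return [part.strip() for part in document.split(SEPARATOR) if part.strip()]


# Hypothetical documents standing in for multi_news["train"]["document"]
documents = ["First source article. ||||| Second source article.", ""]

non_empty = [doc for doc in documents if doc.strip()]
print(len(non_empty))              # 1
print(split_sources(non_empty[0]))  # ['First source article.', 'Second source article.']
```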


In [None]:
# Multi-news processing
multi_news_articles = []

samples_num = 10
for i in range(samples_num):
  multi_news_articles.append(multi_news["test"][i]["document"][:4000])

multi_news_predicted = summarizer(multi_news_articles)

multi_news_predicted_flat = []

for result in multi_news_predicted:
    multi_news_predicted_flat.append(result["summary_text"])

truth_multi_news = []
for i in range(samples_num):
  truth_multi_news.append(multi_news["test"][i]["summary"])

multi_news_rouge = rouge.compute(predictions=multi_news_predicted_flat,references=truth_multi_news)
print("Rouge 1: ", multi_news_rouge['rouge1'])
print("Rouge 2: ", multi_news_rouge['rouge2'])
print("Rouge L: ", multi_news_rouge['rougeL'])
print("Rouge Lsum: ", multi_news_rouge['rougeLsum'])

Rouge 1:  0.24693780812987912
Rouge 2:  0.07684285757969869
Rouge L:  0.14554465580373768
Rouge Lsum:  0.1460642944336868


## Dataset: [News-Summary](https://huggingface.co/datasets/argilla/news-summary)
## Author: [Argilla](https://huggingface.co/argilla)
## Description:

News-Summary is a news article/summary dataset created by Argilla, an open-source data platform for large language models. The dataset appears to be taken from an [academic dataset](https://link.springer.com/chapter/10.1007/978-3-319-69155-8_9) originally intended for fake news classification, and repurposed for summarization. As such, it includes articles from [Reuters.com](https://www.reuters.com), but also fake news articles from a Kaggle dataset that took articles and classified them as real or fake based on their [PolitiFact](https://www.politifact.com/) score. The dataset has 21,417 entries, and in the training split the average article is 2,363 characters long.

In [None]:
# News Summary exploration

# Character length of every training article (column access returns a plain list)
news_lengths = [len(text) for text in news_summary["train"]["text"]]

print("Max Article Character Length: ", max(news_lengths))
print("Min Article Character Length: ", min(news_lengths))
print("Avg Article Character Length: ", sum(news_lengths)/len(news_lengths))

Max Article Character Length:  12356
Min Article Character Length:  168
Avg Article Character Length:  2362.77


In [None]:
# News Summary processing
news_summary_articles = []

samples_num = 10
for i in range(samples_num):
  news_summary_articles.append(news_summary["test"][i]["text"][:4000])

news_summary_predicted = summarizer(news_summary_articles)

news_summary_predicted_flat = []

for result in news_summary_predicted:
    news_summary_predicted_flat.append(result["summary_text"])

truth_news_summary = []
for i in range(samples_num):
  truth_news_summary.append(news_summary["test"][i]["prediction"][0]["text"])

news_summary_rouge = rouge.compute(predictions=news_summary_predicted_flat,references=truth_news_summary)
print("Rouge 1: ", news_summary_rouge['rouge1'])
print("Rouge 2: ", news_summary_rouge['rouge2'])
print("Rouge L: ", news_summary_rouge['rougeL'])
print("Rouge Lsum: ", news_summary_rouge['rougeLsum'])

Your max_length is set to 142, but your input_length is only 129. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=64)
Your max_length is set to 142, but your input_length is only 98. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=49)
Your max_length is set to 142, but your input_length is only 110. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=55)


Rouge 1:  0.1383234086175263
Rouge 2:  0.042929390335369126
Rouge L:  0.11576156884980413
Rouge Lsum:  0.1160744034273446
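The warnings above come from short inputs: the pipeline is using a `max_length` of 142 tokens for inputs that are only 98 to 129 tokens long. In all three warnings the suggested value is half the input length, rounded down (64 for 129, 49 for 98, 55 for 110). The hypothetical helper below follows that pattern; the halving rule is inferred from those three numbers, not taken from the library's source.

```python
def suggested_max_length(input_length, default_max_length=142):
    """Cap max_length at half the input length (rounded down), never above the default."""
    return min(default_max_length, input_length // 2)


# The three input lengths reported in the warnings above
for input_length in [129, 98, 110]:
    print(input_length, suggested_max_length(input_length))
```

With the real pipeline, the input length in tokens could be measured as `len(summarizer.tokenizer(text)["input_ids"])` and the result passed as `summarizer(text, max_length=...)`. A smaller `min_length` may also be needed so that it stays below `max_length`.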


### Conclusions
Looking at the ROUGE metrics for the bart-large-cnn summarizer, it generally performed best on the Multi-News dataset. Multi-News had the highest `rouge1` score (.247, an F1 score over unigram overlap) and the highest `rougeL` score (.146, based on the longest common subsequence shared by the generated and reference summaries). Multi-News also edged out BillSum on `rouge2` (.077 vs. .076), while BillSum was slightly ahead on `rougeLsum` (.149 vs. .146). Both of these differences are tiny, and with only 5 BillSum examples and 10 examples from each news dataset they should not be over-interpreted. `rougeLsum` is a summary-level variant of `rougeL` that splits texts at newlines, so it is also sensitive to how each dataset formats its summaries. News-Summary had the lowest score on every metric. It is possible the News-Summary dataset is of lower quality than the other two because it was repurposed from a dataset designed for another task.

## Performance Evaluation

| ROUGE metric | BillSum (5 bills) | Multi-News (10 articles) | News-Summary (10 articles) |
| ----------- | :-----------: | :-----------: | :-----------: |
| Rouge 1    | .172 | .247 | .138 |
| Rouge 2    | .076 | .077 | .043 |
| Rouge L    | .129 | .146 | .116 |
| Rouge Lsum | .149 | .146 | .116 |
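Rounded to three places, several of these scores look nearly identical, so it helps to compare the unrounded values. This sketch copies the scores printed earlier in the notebook into a dictionary and reports which dataset scored highest on each metric. It only restates the results above; with 5 to 10 examples per dataset, the small gaps are not evidence of a real difference.

```python
# Unrounded scores printed earlier in the notebook
scores = {
    "BillSum": {"rouge1": 0.17187283476504672, "rouge2": 0.0758329573249116,
                "rougeL": 0.12923816949402564, "rougeLsum": 0.14857043405751158},
    "Multi-News": {"rouge1": 0.24693780812987912, "rouge2": 0.07684285757969869,
                   "rougeL": 0.14554465580373768, "rougeLsum": 0.1460642944336868},
    "News-Summary": {"rouge1": 0.1383234086175263, "rouge2": 0.042929390335369126,
                     "rougeL": 0.11576156884980413, "rougeLsum": 0.1160744034273446},
}

best_by_metric = {}
for metric in ["rouge1", "rouge2", "rougeL", "rougeLsum"]:
    # max over dataset names, comparing each dataset's score for this metric
    best = max(scores, key=lambda dataset: scores[dataset][metric])
    best_by_metric[metric] = best
    print(f"{metric}: highest on {best} ({scores[best][metric]:.3f})")
```

This prints Multi-News for `rouge1`, `rouge2`, and `rougeL`, and BillSum for `rougeLsum`.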

## An Idea for Creative Synthesis

Write some code that lets the user type in a web address (like a Wikipedia article) and generate a summary for the whole page.
* you will have to experiment with different ideas of how to get summaries for longer texts
    - come up with your own ideas
    - research how others handle it and try those
    - you might find that combining more than one kind of model can be helpful

Record your results and discuss it at the demo!

## Question Answering

A [RoBERTa-based model](https://huggingface.co/deepset/roberta-base-squad2) fine-tuned on the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) question answering dataset

Requires two inputs:
* a question
* context - where to find the answer

Returns:
* an answer (a span copied from the context)
* `start` and `end` character positions of the answer within the context
* a confidence `score`
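Because `start` and `end` are character offsets into the context, slicing the context with them gives back the answer text, and widening the slice shows the surrounding sentence. The sketch below uses a hypothetical context and a hypothetical result shaped like the pipeline's output. `show_in_context` is our own helper, not part of `transformers`.

```python
# Hypothetical context and a result shaped like the question-answering pipeline's output
context = "The hen sat in a nesting box all day."
result = {"score": 0.9, "start": 4, "end": 7, "answer": "hen"}

# start/end are character offsets, so slicing the context recovers the answer
assert context[result["start"]:result["end"]] == result["answer"]


def show_in_context(context, result, window=20):
    """Return the answer marked with [[ ]] and up to `window` characters on each side."""
    left = context[max(0, result["start"] - window):result["start"]]
    right = context[result["end"]:result["end"] + window]
    return left + "[[" + result["answer"] + "]]" + right


print(show_in_context(context, result, window=4))  # The [[hen]] sat
```

With the real model, something like `show_in_context(times_delphic_story, res, window=150)` would play the same role as the manual slices used below.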

In [None]:
from transformers import pipeline

model_name = "deepset/roberta-base-squad2"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Can colleges take race into account when making admissions decisions?',
    'context': times_delphic_story
}
res = nlp(QA_input)
print(res)

{'score': 0.1444220393896103, 'start': 1416, 'end': 1433, 'answer': 'we don’t know yet'}


In [None]:
print( times_delphic_story[1416:1433] )
print( times_delphic_story[1200:1500] )

we don’t know yet
Court “basically has embraced an idea that it calls colorblindness.”
“If you take their principle of colorblindness and extend it beyond universities, to other places, it could raise some problems,” Kende said. “But we don’t know yet.”
Financial aid programs that prioritize applicants of a particula


### Let's try another question

In [None]:
QA_input2 = {
    'question' : "Which kinds of schools are most affected by the Supreme Court's affirmative action ruling?",
    'context': times_delphic_story
}
res = nlp(QA_input2)
print(res)

{'score': 0.035478729754686356, 'start': 671, 'end': 686, 'answer': 'Harvard and UNC'}


In [None]:
print( times_delphic_story[671:686] )
print( times_delphic_story[500:800] )

Harvard and UNC
 institutions that limit the number of incoming students,” Mattison said. “So that doesn’t apply to Drake and most institutions across the country.”
She said schools like Harvard and UNC have enough applicants that they can pick and choose which applicants fill a certain number of spots.
Drake’s adm


The answer I was hoping for was `"highly selective institutions"`.
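One way to measure how far off an answer is: the scores commonly reported for SQuAD. *Exact match* checks whether the normalized strings are identical, and *token F1* measures word overlap. The sketch below is written from scratch in that style (lowercase, drop ASCII punctuation and the articles a/an/the). It is not the official SQuAD evaluation script. The second comparison uses a phrase from the article as a hypothetical model answer.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop ASCII punctuation and the articles a/an/the, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    return int(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction, gold):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # count shared tokens, each at most as often as it appears in both answers
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


hoped_for = "highly selective institutions"
print(exact_match("Harvard and UNC", hoped_for), token_f1("Harvard and UNC", hoped_for))  # 0 0.0
print(token_f1("those really highly selective institutions", hoped_for))  # about 0.75
```

The model's answer shares no words with the hoped-for answer, so both scores are 0. A longer answer that contains the hoped-for phrase gets partial credit.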

### How you ask the question seems to have an impact on the answer it finds

In [None]:
QA_input3 = {
    'question' : "Does Drake consider race when deciding to admit a student?",
    'context': times_delphic_story
}
res = nlp(QA_input3)
print(res)

{'score': 0.1436648666858673, 'start': 1416, 'end': 1433, 'answer': 'we don’t know yet'}


In [None]:
QA_input4 = {
    'question' : "At Drake, does race have an impact on the admissions decision?",
    'context': times_delphic_story
}
res = nlp(QA_input4)
print(res)

{'score': 0.10744316130876541, 'start': 995, 'end': 1048, 'answer': 'it does not have an impact on the admissions decision'}


### Discussion question:

What are some ways you can think of for evaluating question answering models?

# Creative Synthesis

In [None]:
import random
from datetime import date
from urllib.request import urlopen

from bs4 import BeautifulSoup
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Root of link for moscowtimes content
news_site_root = "https://www.themoscowtimes.com/"

html = urlopen(news_site_root).read()
soup = BeautifulSoup(html, features="html.parser")

# Collect every link on the front page, keeping only the part before any space
urls = []
for link in soup.find_all('a'):
    href = link.get('href')
    if href is not None:
        urls.append(href.split(" ", 1)[0])

# Fetch the current year/month; article links start with <root>YYYY/MM
today = date.today()
article_root = news_site_root + today.strftime("%Y") + "/" + today.strftime("%m")

# Keep article links only, dropping repeats while preserving page order
news_article_links = list(dict.fromkeys(url for url in urls if url.startswith(article_root)))

# Pick up to 4 articles (random.sample raises ValueError if asked for more than exist)
selected_articles = random.sample(news_article_links, min(4, len(news_article_links)))

weekly_summary = ""

for article_url in selected_articles:
    html = urlopen(article_url).read()
    soup = BeautifulSoup(html, features="html.parser")

    content = soup.find('div', class_="article__content")
    if content is None:  # skip pages that don't use the usual article layout
        continue
    text = content.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)

    # summarize only the first 4000 characters of each article
    weekly_summary += summarizer(text[:4000])[0]["summary_text"] + "\n"

print(weekly_summary)

Russia has wrapped up its regional elections, held over a three-day period. The results showed sweeping wins for the ruling United Russia party. But what made these elections particularly noteworthy was what the Kremlin might have learned from the process which it will take forward for the upcoming presidential elections in 2024.
Yevgeniya Baltatarova is one of at least five political activists and 12 military deserters who fled to Kazakhstan in hopes of escaping criminal prosecution in Russia. Kazakhstan’s migration service said that as many as 2.9 million Russian citizens crossed the border with Kazakhstan that year, though both official and independent reports estimate the number to be around 100,000.
Yaroslav Dronov, 31, better known by his stage name, Shaman, has become one of Russia's most well-known pop stars. Critics say the singer is acting as part of the Kremlin's propaganda machine with his "patriotic" songs. Fans appreciate his lyrics for conveying a sense of national pride
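The cell above still summarizes only the first 4000 characters of each article. One common way to cover a whole article is to split it into chunks that each fit under the limit, summarize every chunk, and join the partial summaries (which could be summarized again if the result is still long). This is a minimal sketch of that idea, not the approach used above. `chunk_text` and `summarize_long` are hypothetical helpers, and the 4000-character limit is carried over from the notebook rather than derived from the model.

```python
def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars, breaking at newlines where possible."""
    chunks, current = [], ""
    for line in text.split("\n"):
        # a single line longer than max_chars is cut into max_chars pieces
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        candidate = current + "\n" + line if current else line
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks


def summarize_long(text, summarize, max_chars=4000):
    """Summarize each chunk with the given function and join the partial summaries."""
    return "\n".join(summarize(chunk) for chunk in chunk_text(text, max_chars))


print(chunk_text("aaa\nbbb\nccc", max_chars=7))  # ['aaa\nbbb', 'ccc']
```

With the pipeline above, the `summarize` argument could be `lambda t: summarizer(t)[0]["summary_text"]`. Breaking at newlines keeps paragraphs together, which the cleaned-up article text (one paragraph per line) makes easy.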

In [None]:
# Drone forecast
from datetime import date
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Russian regions (federal subjects) to look for in drone-related articles
russian_territories = ["kursk", "belgorod", "volgograd", "oryol", "tver", "tambov", "moscow", "voronezh", "rostov", "pskov", "ryazan", "kaluga", "lipetsk", "krasnodar", "adygea", "smolensk", "tula", "bryansk"]

# Root of link for moscowtimes content
news_site_root = "https://www.themoscowtimes.com/"

html = urlopen(news_site_root + "ukraine-war").read()
soup = BeautifulSoup(html, features="html.parser")

# Collect every link on the page, keeping only the part before any space
urls = []
for link in soup.find_all('a'):
    href = link.get('href')
    if href is not None:
        urls.append(href.split(" ", 1)[0])

# Fetch the current year/month; article links start with <root>YYYY/MM
today = date.today()
article_root = news_site_root + today.strftime("%Y") + "/" + today.strftime("%m")

# Keep article links only, dropping repeats while preserving page order
news_article_links = list(dict.fromkeys(url for url in urls if url.startswith(article_root)))

drone_forecast = []
for link in news_article_links:
    html = urlopen(link).read()
    soup = BeautifulSoup(html, features="html.parser")
    content = soup.find('div', class_="article__content")
    if content is None:  # skip pages that don't use the usual article layout
        continue

    # lowercase once so "Drone", "drones", and capitalized region names all match
    text = content.get_text().lower()

    if "drone" in text:
        for territory in russian_territories:
            if territory in text and territory not in drone_forecast:
                drone_forecast.append(territory)

print(drone_forecast)
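The region check above is a plain substring test, so a region name can also match inside a longer word. For example, `tver` matches *Tverskaya* (a street in Moscow), and `moscow` matches the site name *themoscowtimes*. A stricter sketch matches whole words only, using regular-expression word boundaries. `mentioned_regions` and the sample sentences are hypothetical.

```python
import re


def mentioned_regions(text, regions):
    """Return the regions named in text as whole words (case-insensitive), in list order."""
    lowered = text.lower()
    return [region for region in regions
            if re.search(r"\b" + re.escape(region) + r"\b", lowered)]


regions = ["tver", "moscow", "kursk"]  # a short hypothetical subset of russian_territories

headline = "A drone hit a building on Tverskaya Street, says themoscowtimes.com."
print([r for r in regions if r in headline.lower()])  # substring check: ['tver', 'moscow']
print(mentioned_regions(headline, regions))            # whole words only: []

report = "Drones attacked Kursk and the Moscow region overnight."
print(mentioned_regions(report, regions))              # ['moscow', 'kursk']
```

Whole-word matching can still pick up mentions that are not about drone strikes (for example, "officials in Moscow said"), so this reduces false positives rather than eliminating them.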