# CS 195: Natural Language Processing
## Question Answering

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ericmanley/f23-CS195NLP/blob/main/F2_3_QuestionAnswering.ipynb)


## References

Hugging Face Task Guide on Question Answering: https://huggingface.co/docs/transformers/tasks/question_answering


## Installing necessary modules

In [None]:
import sys
!{sys.executable} -m pip install transformers datasets evaluate rouge_score


## Question Answering

We'll use a [RoBERTa-based model](https://huggingface.co/deepset/roberta-base-squad2) fine-tuned on the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) question answering dataset.

The model requires two inputs:
* a question
* a context: the text in which to look for the answer

It returns:
* an answer, which is a span of text copied from the context
* the location of the answer in the context (`start` and `end` character offsets)
* a confidence `score`
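Under the hood, an extractive model like this one scores every token of the context twice: once as a possible answer start and once as a possible answer end. The answer is the highest-scoring valid (start, end) pair. The sketch below shows only that selection step on made-up scores; `best_span`, the `max_answer_len` cap, and the toy numbers are illustrative assumptions. The real pipeline also masks out the question tokens, converts scores to probabilities, and maps token positions back to character offsets. Adding raw scores, as done here, ranks spans the same way as multiplying their softmax probabilities.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) for the best span with start <= end.

    A span's score is its start score plus its end score; spans longer
    than max_answer_len tokens are not considered.
    """
    best = (0, 0, float("-inf"))
    for start, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length cap
        for end in range(start, min(start + max_answer_len, len(end_scores))):
            score = s_score + end_scores[end]
            if score > best[2]:
                best = (start, end, score)
    return best


# Toy example: the tokens of a tiny context with hand-picked scores
tokens = ["Drake", "admits", "students", "with", "a", "3.0", "GPA"]
start_scores = [0.1, 0.0, 0.2, 0.1, 0.3, 2.5, 0.4]
end_scores = [0.0, 0.1, 0.1, 0.0, 0.2, 0.9, 2.0]

start, end, score = best_span(start_scores, end_scores)
print(" ".join(tokens[start:end + 1]), score)  # 3.0 GPA 4.5
```

The length cap matters because, without it, a long span with a strong start and a strong end far apart could win; the `transformers` question-answering pipeline exposes a `max_answer_len` argument for the same purpose.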

In [None]:
times_delphic_story = """
How does the Supreme Court ruling on affirmative action affect Drake?
The answer has little to do with affirmative action.
Over the summer, the Supreme Court ruled against the admissions programs of Harvard University and the University of North Carolina in an affirmative action decision. Before the decision, race already wasn’t a factor in Drake University admissions, according to Provost Sue Mattison.
“Affirmative action, with regards to admissions, only impacts those really highly selective institutions that limit the number of incoming students,” Mattison said. “So that doesn’t apply to Drake and most institutions across the country.”
She said schools like Harvard and UNC have enough applicants that they can pick and choose which applicants fill a certain number of spots.
Drake’s admissions team found that the university has “admitted all students who have a 3.0 high school GPA or [higher],” Mattison said. “Even though we’ve asked for a person’s race on the admissions form, it does not have an impact on the admissions decision, and it doesn’t displace anybody.”
Possible effects of the court’s ruling
Mark Kende, director of Drake’s Constitutional Law Center, said the Supreme Court “basically has embraced an idea that it calls colorblindness.”
“If you take their principle of colorblindness and extend it beyond universities, to other places, it could raise some problems,” Kende said. “But we don’t know yet.”
Financial aid programs that prioritize applicants of a particular race over another are more vulnerable after the court’s decision, according to Kende. He said it’s not clear what impact the decision might have on university hiring practices that consider an employee’s race, as well as corporations’ diversity programs.
Following the Supreme Court’s decision, Missouri Attorney General Andrew Bailey said Missouri institutions subject to the U.S. Constitution or Title VI must stop using race-based standards “to make decisions about things like admissions, scholarships, programs and employment.”
The University of Missouri System said that “a small number of our programs and scholarships have used race/ethnicity as a factor for admissions and scholarships,” and that “these practices will be discontinued.”
Drake is taking a different approach in the wake of the affirmative action decision. The university is monitoring maybe about forty to fifty scholarships, according to Ryan Zantingh, Drake’s director of financial aid. This is more in anticipation of a comparable case on financial aid that considers race, rather than a reaction to the affirmative action ruling.
Mattison said she thinks Drake is still trying to determine how the Supreme Court decision will impact Drake’s Crew Scholars program, which is for incoming students of color.
“There are ways that we can ensure that we continue Crew Scholars while still being compliant,” Mattison said.
Donors for some Drake scholarships specified that they wanted to support a student of color or a woman in a STEM field, Mattison said.
“And so we’re still working through what that actually means, and what we have to do to continue to achieve the values that we expect,” Mattison said. “There are ways that we can change the wording of some of the scholarships.”
Like all students, students of color may qualify for scholarships for first-generation students or students with financial need.
“There’s a lot of overlap between students of color and other areas where financial aid is directed,” Zantingh said. “Scholarship resources can be directed [to financial need or first generation status] and still reach the same students.”
Even if there is a ruling on financial aid that’s comparable to the affirmative action decision, Zantingh doesn’t expect a large impact on Drake financial aid from either decision.
“There may be some implications, but I think the overall general effect on students will be little to none,” Zantingh said.
Zantingh gave an example of scholarship language offered by legal counsel. If a scholarship is for only minority students, it might become a scholarship that gives preference to students who demonstrate a commitment to Drake’s vision for diversity on campus.
“If a white student is actively involved in anti-racist leadership here on campus, certainly they would fit that description then, wouldn’t they?” Zantingh said. “Basically, the language would not seek to exclude any particular protected class categorically.”
In some cases, a donor might be unwilling to change the scholarship’s language or be deceased, Zantingh said. If a donor is deceased, a judge might approve changes. He said he doesn’t expect Drake to cut any of the scholarships it is monitoring.
“The scholarship criteria would have to change, or the dollars would have to be repurposed in another way. Per either the donor or a court’s approval,” Zantingh said.
Race can still play a role in college admissions
The Supreme Court left at least one legal path open for race to play a role in college admissions.
When admitting students, universities are allowed to consider “an applicant’s discussion of how race affected his or her life, be it through discrimination, inspiration or otherwise,” Chief Justice John Roberts wrote in the Court’s decision. However, “the student must be treated based on his or her experiences as an individual — not on the basis of race.”
A student’s story can emerge without Drake asking for it, according to Dean of Admissions Joel Johnson.
“Especially if they’ve overcome a lot, or it’s so key to their identity… it’ll come out on its own,” Johnson said. “I don’t know if I could say the Supreme Court protected it. They couldn’t have stopped it, honestly.”
Johnson said that caring about diversity also means intentionally recruiting a diverse group of students. He said students can’t join Drake if they never apply in the first place.
In the wake of the Supreme Court’s decision on affirmative action, The Times-Delphic is publishing a series. Check next week’s paper for an article about legacy admissions and legacy financial aid with a Drake focus.

"""

In [None]:
from transformers import pipeline

model_name = "deepset/roberta-base-squad2"

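# Build a question-answering pipeline from the model and its matching tokenizer
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
# The pipeline takes the question and the context to search as a dict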
QA_input = {
    'question': 'Can colleges take race into account when making admissions decisions?',
    'context': times_delphic_story
}
res = nlp(QA_input)
print(res)

{'score': 0.10679316520690918, 'start': 1414, 'end': 1431, 'answer': 'we don’t know yet'}


In [None]:
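# The answer span, extended by two characters to include the closing punctuation
print( times_delphic_story[1414:1433] )
# A wider window of surrounding text
print( times_delphic_story[1198:1500] )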

we don’t know yet.”
Court “basically has embraced an idea that it calls colorblindness.”
“If you take their principle of colorblindness and extend it beyond universities, to other places, it could raise some problems,” Kende said. “But we don’t know yet.”
Financial aid programs that prioritize applicants of a particular 
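Rather than hard-coding character offsets, the `start` and `end` values in the pipeline's result can be used directly to slice the context. `show_in_context` below is a hypothetical helper (not part of `transformers`) that returns a window of text around the answer with the answer wrapped in `[[ ]]`; the default window of 100 characters is an arbitrary choice.

```python
def show_in_context(context, start, end, window=100):
    """Return the text around context[start:end] with the answer wrapped in [[ ]]."""
    left = context[max(0, start - window):start]
    answer = context[start:end]
    right = context[end:end + window]
    # Replace newlines with spaces so the snippet prints on a single line
    return (left + "[[" + answer + "]]" + right).replace("\n", " ")


# Toy demo; in the notebook you could instead call
# show_in_context(times_delphic_story, res["start"], res["end"])
demo = "The answer is 42, said the computer."
start = demo.index("42")
print(show_in_context(demo, start, start + 2, window=10))  # answer is [[42]], said the
```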


### Let's try another question

In [None]:
QA_input2 = {
    'question' : "Which kinds of schools are most affected by the Supreme Court's affirmative action ruling?",
    'context': times_delphic_story
}
res = nlp(QA_input2)
print(res)

{'score': 0.030723538249731064, 'start': 670, 'end': 685, 'answer': 'Harvard and UNC'}


In [None]:
# The answer span, plus one trailing character
print( times_delphic_story[670:686] )
# A wider window of surrounding text
print( times_delphic_story[500:800] )

Harvard and UNC 
institutions that limit the number of incoming students,” Mattison said. “So that doesn’t apply to Drake and most institutions across the country.”
She said schools like Harvard and UNC have enough applicants that they can pick and choose which applicants fill a certain number of spots.
Drake’s admi


The answer I was hoping for was `"highly selective institutions"`.

### How you ask the question seems to have an impact on the answer it finds

In [None]:
QA_input3 = {
    'question' : "Does Drake consider race when deciding to admit a student?",
    'context': times_delphic_story
}
res = nlp(QA_input3)
print(res)

{'score': 0.0995897501707077, 'start': 1414, 'end': 1431, 'answer': 'we don’t know yet'}


In [None]:
QA_input4 = {
    'question' : "At Drake, does race have an impact on the admissions decision?",
    'context': times_delphic_story
}
res = nlp(QA_input4)
print(res)

{'score': 0.10402798652648926, 'start': 994, 'end': 1047, 'answer': 'it does not have an impact on the admissions decision'}
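To compare several phrasings side by side, the questions can be run in a loop. `compare_phrasings` is a hypothetical helper: it accepts any callable that takes a `{'question': ..., 'context': ...}` dict and returns a dict with `answer` and `score` keys, which is how `nlp` is called in the cells above. The stand-in `fake_qa` exists only so the sketch runs without downloading a model; in the notebook you would pass `nlp` and `times_delphic_story` instead.

```python
def compare_phrasings(qa, questions, context):
    """Ask each question against the same context; return (question, answer, score) rows."""
    rows = []
    for question in questions:
        res = qa({"question": question, "context": context})
        rows.append((question, res["answer"], res["score"]))
    # Sort so the most confident answer comes first
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Stand-in for the pipeline: "answers" with the first word of the context and
# a score that shrinks with question length (purely illustrative)
def fake_qa(inputs):
    return {"answer": inputs["context"].split()[0],
            "score": 1 / len(inputs["question"])}


questions = [
    "Does Drake consider race when deciding to admit a student?",
    "At Drake, does race have an impact on the admissions decision?",
]
for question, answer, score in compare_phrasings(fake_qa, questions, "Race does not matter."):
    print(f"{score:.3f}  {answer!r}  <- {question}")
```

Keep in mind that a higher `score` only means the model is more confident, not that the answer is correct, as the first question in this section shows.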


## Discussion question:

What are some ways you can think of for evaluating question answering models?
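Two common choices, used by the SQuAD benchmark this model was trained on, are exact match (does the normalized prediction equal a reference answer?) and token-level F1 (how much do the predicted and reference words overlap?). The sketch below is a simplified re-implementation modeled on SQuAD's scoring conventions, not the official script; its normalization lowercases, strips ASCII punctuation, and drops the articles a/an/the. For real evaluations, `evaluate.load("squad")` and `evaluate.load("squad_v2")` provide the official versions.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and the articles a/an/the, squeeze whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, references):
    """1.0 if the prediction matches any reference after normalization, else 0.0."""
    return float(any(normalize(prediction) == normalize(ref) for ref in references))


def f1(prediction, references):
    """Best token-overlap F1 between the prediction and any reference."""
    def single_f1(pred, ref):
        pred_toks, ref_toks = normalize(pred).split(), normalize(ref).split()
        if not pred_toks or not ref_toks:
            # Unanswerable case: only an empty prediction for an empty reference counts
            return float(pred_toks == ref_toks)
        overlap = sum((Counter(pred_toks) & Counter(ref_toks)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_toks)
        recall = overlap / len(ref_toks)
        return 2 * precision * recall / (precision + recall)
    return max(single_f1(prediction, ref) for ref in references)


refs = ["highly selective institutions"]
print(exact_match("Harvard and UNC", refs), f1("Harvard and UNC", refs))  # 0.0 0.0
# A near miss gets partial credit from F1 but none from exact match
print(exact_match("really highly selective institutions", refs),
      f1("really highly selective institutions", refs))
```

F1 is more forgiving than exact match when the model's span is slightly too long or too short, which happens often with extractive models.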

## Group Exercise

Find a question answering *dataset* on Hugging Face. Test out some of the examples from the dataset using the metrics we decided on.

In [None]:
from transformers import pipeline
from datasets import load_dataset
import evaluate

fairy_tale = load_dataset("WorkInTheDark/FairytaleQA")


In [None]:
# The first story section of the training split serves as the context
context = fairy_tale["train"][0]['story_section']
#print(context)

questions = [
    "Where could the child not go?",
    "What would happen if the child went outside?",
    "Did the king take precautions against the old woman's words?",
]

# Ask each question about the same story section
for question in questions:
    res = nlp({'question': question, 'context': context})
    print(res)

{'score': 0.05150960385799408, 'start': 551, 'end': 569, 'answer': 'under the open sky'}
{'score': 0.05933848023414612, 'start': 629, 'end': 663, 'answer': 'the mountain troll would fetch her'}
{'score': 0.10156332701444626, 'start': 691, 'end': 717, 'answer': 'he took her words to heart'}


## Applied Exploration

Choose a Question Answering model from Hugging Face (you may use the one we used in class). Set up an experiment to answer the following question: How does the length of the context affect the performance of the model?

Answer the following questions:
* What dataset(s) did you use (provide links)?
* Describe the kinds of questions and answers that appear in this data. How do the context lengths vary? Consider providing a histogram that shows this.
* What metrics did you use? Why did you choose those?
* What were your results? Describe what you found and any additional take-aways.
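One way to set up this experiment is to group examples into buckets by context length (measured here in whitespace-separated words, a rough stand-in for model tokens) and average a metric within each bucket. The sketch below assumes each example is a dict with `question`, `context`, and `answers` keys (rename these to match your dataset's fields), that `qa` behaves like the `nlp` pipeline above, and that `metric(prediction, references)` returns a number such as exact match or F1. The bucket edges are arbitrary, and `fake_qa` and `exact` are stand-ins so the sketch runs without a model.

```python
from collections import defaultdict


def bucket_label(n_words, edges):
    """Name the bucket that n_words falls into, e.g. '0-99' or '400+'."""
    for low, high in zip(edges, edges[1:]):
        if low <= n_words < high:
            return f"{low}-{high - 1}"
    return f"{edges[-1]}+"


def score_by_context_length(examples, qa, metric, edges=(0, 100, 200, 400)):
    """Return {bucket: (number of examples, mean metric)}."""
    scores = defaultdict(list)
    for ex in examples:
        prediction = qa({"question": ex["question"], "context": ex["context"]})["answer"]
        label = bucket_label(len(ex["context"].split()), edges)
        scores[label].append(metric(prediction, ex["answers"]))
    return {label: (len(vals), sum(vals) / len(vals)) for label, vals in scores.items()}


# Stand-ins: "answer" with the last word of the context; score by exact string match
def fake_qa(inputs):
    return {"answer": inputs["context"].split()[-1], "score": 1.0}


def exact(prediction, references):
    return float(prediction in references)


examples = [
    {"question": "q?", "context": "short context answer", "answers": ["answer"]},
    {"question": "q?", "context": "word " * 150 + "answer", "answers": ["answer"]},
    {"question": "q?", "context": "word " * 150 + "wrong", "answers": ["answer"]},
]
print(score_by_context_length(examples, fake_qa, exact))
```

The per-bucket counts double as the data for a context-length histogram. Keep in mind that the pipeline splits contexts that are too long for the model into overlapping chunks (controlled by its `max_seq_len` and `doc_stride` arguments), so any length effect you observe may partly reflect how the pipeline picks among chunks.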

In [None]:
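from transformers import pipeline

# Multilingual XLM-RoBERTa QA model; the "-ru" suffix suggests it was fine-tuned for Russian
model_name = "AlexKay/xlm-roberta-large-qa-multilingual-finedtuned-ru"

nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)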
QA_input = {
    'question': 'Can colleges take race into account when making admissions decisions?',
    'context': times_delphic_story
}
res = nlp(QA_input)
print(res)


{'score': 0.24788200855255127, 'start': 1413, 'end': 1433, 'answer': ' we don’t know yet.”'}


In [None]:
Trofim_Lysenko = "Трофи́м Дени́сович Лысе́нко (17 [29] сентября 1898, Карловка, Полтавская губерния[3] — 20 ноября 1976[1][2][…], Киев[4]) — украинский и советский агроном и биолог[5]. Основатель и крупнейший представитель псевдонаучного[6][7][8][9][10][11][12] направления в биологии — мичуринской агробиологии[13], академик АН СССР (1939), академик АН УССР (1934), академик ВАСХНИЛ (1935). Герой Социалистического Труда (1945). Лауреат трёх Сталинских премий первой степени (1941, 1943, 1949). В 1934 году назначен научным руководителем, а в 1936 году — директором Всесоюзного селекционно-генетического института в Одессе. Директор Института генетики АН СССР с 1940 по 1965 год[14]. Как агроном, Трофим Лысенко предложил и пропагандировал ряд агротехнических приёмов (яровизация, чеканка хлопчатника, летние посадки картофеля)[5]. Большинство методик, предложенных Лысенко, были подвергнуты критике такими учёными, как П. Н. Константинов, А. А. Любищев, П. И. Лисицын и другие, ещё в период их широкого внедрения в советском сельском хозяйстве. Выявляя общие недостатки теорий и агрономических методик Лысенко, его научные оппоненты также осуждали его за разрыв с мировой наукой и хозяйственной практикой[15]. Некоторые методики (как, например, методика борьбы со свекловичным долгоносиком, предложенная венгерским энтомологом Яблоновским[16]) были известны ещё задолго до Лысенко, однако не оправдали ожиданий или являлись устаревшими[16]. Автор теории стадийного развития растений[5]. Отвергал менделевскую генетику и хромосомную теорию наследственности. С именем Лысенко связана кампания гонений против учёных-генетиков, а также против его оппонентов, не признававших «мичуринскую генетику»[17]. Поддерживал теорию О. Б. Лепешинской о новообразовании клеток из не имеющего клеточной структуры «живого вещества»[18][19], впоследствии признанную псевдонаучной[20][21][22]."

Q = "в каком году родился трофим?"  # "In what year was Trofim born?"

QA_input = {
    'question': Q,
    'context': Trofim_Lysenko
}

res = nlp(QA_input)
print(res)

{'score': 0.676340639591217, 'start': 27, 'end': 51, 'answer': ' (17 [29] сентября 1898,'}


In [None]:
# Triple quotes are needed because this text spans more than one line
Nikolai_Vavilov = """Никола́й Ива́нович Вави́лов (13 [25] ноября 1887, Москва, Российская империя — 26 января 1943, Саратов, СССР) — русский и советский учёный-генетик, ботаник, селекционер, химик, географ, общественный и государственный деятель. Академик АН СССР (1929), АН УССР (1929) и ВАСХНИЛ[5]. Президент (1929—1935), вице-президент (1935—1940) ВАСХНИЛ, президент Всесоюзного географического общества (1931—1940), основатель (1920) и бессменный до момента ареста директор Всесоюзного института растениеводства (1930—1940), директор Института генетики АН СССР (1930—1940), член Экспедиционной комиссии АН СССР, член коллегии Наркомзема СССР, член президиума Всесоюзной ассоциации востоковедения. В 1926—1935 годах член Центрального исполнительного комитета СССР, в 1927—1929 — член Всероссийского Центрального Исполнительного Комитета, член Императорского Православного Палестинского Общества. Организатор и участник ботанико-агрономических экспедиций, охвативших большинство континентов (кроме Австралии и Антарктиды), в ходе которых выявил древние очаги формообразования культурных растений. Создал учение о мировых центрах происхождения культурных растений[6]. Обосновал учение об иммунитете растений, открыл закон гомологических рядов в наследственной изменчивости организмов[7]. Внёс существенный вклад в разработку учения о биологическом виде. Под руководством Вавилова была создана крупнейшая в мире коллекция семян культурных растений. Он заложил основы системы государственных испытаний сортов полевых культур. Сформулировал принципы деятельности главного научного центра страны по аграрным наукам, создал сеть научных учреждений в этой области[8]. Умер в заключении от упадка сердечной деятельности на фоне воспаления лёгких и общего истощения организма[9] в годы Великой Отечественной войны. 
Учёный был арестован в 1940 году по ложному доносу и незаконно обвинён во вредительстве и связях с оппозиционными политическими группами, в 1941 году — осуждён по статьям УК СССР 58-1, 58-6, 58-11 (вредительство, помощь буржуазным организациям, подготовка или недонесение о готовящихся преступлениях) и приговорён к расстрелу, который впоследствии был заменён 20-летним сроком заключения. Умер учёный в Саратовской тюрьме от пеллагры. В 1955 году посмертно реабилитирован как жертва Сталинских репрессий в рамках кампании по развенчанию «культа личности», инициированной Н. С. Хрущёвым."""

Q = "в каком году родился николай?"  # "In what year was Nikolai born?"

QA_input = {
    'question': Q,
    'context': Nikolai_Vavilov
}

res = nlp(QA_input)
print(res)

## What about conversational models?

Some of you have already experimented with conversational models.

These are more difficult to evaluate than the models we've looked at so far.

They usually start with a pre-training step, such as "predict the next/missing word in this sequence."

They are then fine-tuned with human feedback.

Next time, we'll look at a simple model for predicting the next word in a sequence.
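As a toy illustration of the "predict the next word" objective mentioned above, the sketch below counts which word follows which in a text (a bigram model) and predicts the most frequent follower. This is far simpler than the neural pre-training that conversational models actually use; `train_bigrams`, `predict_next`, and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Most frequent next word after `word`, or None if `word` was never followed by anything."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the king took her words to heart and the king kept the child inside"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # king
```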