# CS 195: Natural Language Processing
## Question Answering

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ericmanley/f23-CS195NLP/blob/main/F2_3_QuestionAnswering.ipynb)


## References

Hugging Face Task Guide on Question Answering: https://huggingface.co/docs/transformers/tasks/question_answering


## Installing necessary modules

In [1]:
import sys
!{sys.executable} -m pip install transformers datasets evaluate rouge_score



## Question Answering

This example uses a [RoBERTa-based model](https://huggingface.co/deepset/roberta-base-squad2) fine-tuned on the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) question answering dataset.

The model requires two inputs:
* a question
* a context: the text in which to look for the answer

It returns:
* an answer, copied from the context
* a confidence score
* the start and end character positions of the answer in the context
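The location comes back as `start` and `end` character offsets into the context string (visible in the pipeline output below), so slicing the context with them should reproduce the answer text. The check below is a minimal sketch. `span_matches` is a hypothetical helper name, and the short context and result dictionary are made up, using the same keys the pipeline prints (`score`, `start`, `end`, `answer`).

```python
def span_matches(result: dict, context: str) -> bool:
    """Return True if context[start:end] reproduces the predicted answer text."""
    # start/end are ordinary Python string indices: start inclusive, end exclusive
    return context[result["start"]:result["end"]] == result["answer"]


# Hypothetical result dictionary shaped like the pipeline's output
context = "The court ruled in June."
result = {"score": 0.5, "start": 4, "end": 9, "answer": "court"}
print(span_matches(result, context))  # True
```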

In [2]:
times_delphic_story = """
How does the Supreme Court ruling on affirmative action affect Drake?
The answer has little to do with affirmative action.
Over the summer, the Supreme Court ruled against the admissions programs of Harvard University and the University of North Carolina in an affirmative action decision. Before the decision, race already wasn’t a factor in Drake University admissions, according to Provost Sue Mattison. 
“Affirmative action, with regards to admissions, only impacts those really highly selective institutions that limit the number of incoming students,” Mattison said. “So that doesn’t apply to Drake and most institutions across the country.”
She said schools like Harvard and UNC have enough applicants that they can pick and choose which applicants fill a certain number of spots.
Drake’s admissions team found that the university has “admitted all students who have a 3.0 high school GPA or [higher],” Mattison said. “Even though we’ve asked for a person’s race on the admissions form, it does not have an impact on the admissions decision, and it doesn’t displace anybody.”
Possible effects of the court’s ruling 
Mark Kende, director of Drake’s Constitutional Law Center, said the Supreme Court “basically has embraced an idea that it calls colorblindness.”
“If you take their principle of colorblindness and extend it beyond universities, to other places, it could raise some problems,” Kende said. “But we don’t know yet.”
Financial aid programs that prioritize applicants of a particular race over another are more vulnerable after the court’s decision, according to Kende. He said it’s not clear what impact the decision might have on university hiring practices that consider an employee’s race, as well as corporations’ diversity programs.
Following the Supreme Court’s decision, Missouri Attorney General Andrew Bailey said Missouri institutions subject to the U.S. Constitution or Title VI must stop using race-based standards “to make decisions about things like admissions, scholarships, programs and employment.” 
The University of Missouri System said that “a small number of our programs and scholarships have used race/ethnicity as a factor for admissions and scholarships,” and that “these practices will be discontinued.”
Drake is taking a different approach in the wake of the affirmative action decision. The university is monitoring maybe about forty to fifty scholarships, according to Ryan Zantingh, Drake’s director of financial aid. This is more in anticipation of a comparable case on financial aid that considers race, rather than a reaction to the affirmative action ruling.
Mattison said she thinks Drake is still trying to determine how the Supreme Court decision will impact Drake’s Crew Scholars program, which is for incoming students of color.  
“There are ways that we can ensure that we continue Crew Scholars while still being compliant,” Mattison said.
Donors for some Drake scholarships specified that they wanted to support a student of color or a woman in a STEM field, Mattison said.
“And so we’re still working through what that actually means, and what we have to do to continue to achieve the values that we expect,” Mattison said. “There are ways that we can change the wording of some of the scholarships.”
Like all students, students of color may qualify for scholarships for first-generation students or students with financial need. 
“There’s a lot of overlap between students of color and other areas where financial aid is directed,” Zantingh said. “Scholarship resources can be directed [to financial need or first generation status] and still reach the same students.”
Even if there is a ruling on financial aid that’s comparable to the affirmative action decision, Zantingh doesn’t expect a large impact on Drake financial aid from either decision. 
“There may be some implications, but I think the overall general effect on students will be little to none,” Zantingh said. 
Zantingh gave an example of scholarship language offered by legal counsel. If a scholarship is for only minority students, it might become a scholarship that gives preference to students who demonstrate a commitment to Drake’s vision for diversity on campus. 
“If a white student is actively involved in anti-racist leadership here on campus, certainly they would fit that description then, wouldn’t they?” Zantingh said. “Basically, the language would not seek to exclude any particular protected class categorically.”
In some cases, a donor might be unwilling to change the scholarship’s language or be deceased, Zantingh said. If a donor is deceased, a judge might approve changes. He said he doesn’t expect Drake to cut any of the scholarships it is monitoring.
“The scholarship criteria would have to change, or the dollars would have to be repurposed in another way. Per either the donor or a court’s approval,” Zantingh said. 
Race can still play a role in college admissions
The Supreme Court left at least one legal path open for race to play a role in college admissions. 
When admitting students, universities are allowed to consider “an applicant’s discussion of how race affected his or her life, be it through discrimination, inspiration or otherwise,” Chief Justice John Roberts wrote in the Court’s decision. However, “the student must be treated based on his or her experiences as an individual — not on the basis of race.” 
A student’s story can emerge without Drake asking for it, according to Dean of Admissions Joel Johnson. 
“Especially if they’ve overcome a lot, or it’s so key to their identity… it’ll come out on its own,” Johnson said. “I don’t know if I could say the Supreme Court protected it. They couldn’t have stopped it, honestly.”
Johnson said that caring about diversity also means intentionally recruiting a diverse group of students. He said students can’t join Drake if they never apply in the first place.
In the wake of the Supreme Court’s decision on affirmative action, The Times-Delphic is publishing a series. Check next week’s paper for an article about legacy admissions and legacy financial aid with a Drake focus. 

"""

In [3]:
from transformers import pipeline

model_name = "deepset/roberta-base-squad2"

# Build a question-answering pipeline and get a prediction
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Can colleges take race into account when making admissions decisions?',
    'context': times_delphic_story
}
res = nlp(QA_input)
print(res)

{'score': 0.1444220393896103, 'start': 1416, 'end': 1433, 'answer': 'we don’t know yet'}


In [4]:
print( times_delphic_story[1416:1433] )
print( times_delphic_story[1200:1500] )

we don’t know yet
Court “basically has embraced an idea that it calls colorblindness.”
“If you take their principle of colorblindness and extend it beyond universities, to other places, it could raise some problems,” Kende said. “But we don’t know yet.”
Financial aid programs that prioritize applicants of a particula
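Choosing slice boundaries like `[1200:1500]` by hand works, but a small helper can compute the surrounding window for any predicted span. This is a sketch: `show_in_context` and the `[[ ]]` markers are illustrative choices, not part of the Transformers API.

```python
def show_in_context(context: str, start: int, end: int, window: int = 150) -> str:
    """Return the predicted span wrapped in [[ ]] with up to `window` characters on each side."""
    # Clamp the left edge: a negative slice index would count from the end of the string
    left = max(0, start - window)
    # Slicing past the end is already safe in Python; clamping just keeps the intent explicit
    right = min(len(context), end + window)
    return context[left:start] + "[[" + context[start:end] + "]]" + context[end:right]


print(show_in_context("The court ruled in June.", 4, 9, window=4))  # The [[court]] rul
```

With the pipeline result from above, `print(show_in_context(times_delphic_story, res["start"], res["end"]))` would show the answer highlighted within the story.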


### Let's try another question

In [5]:
QA_input2 = {
    'question' : "Which kinds of schools are most affected by the Supreme Court's affirmative action ruling?",
    'context': times_delphic_story
}
res = nlp(QA_input2)
print(res)

{'score': 0.035478729754686356, 'start': 671, 'end': 686, 'answer': 'Harvard and UNC'}


In [6]:
print( times_delphic_story[671:686] )
print( times_delphic_story[500:800] )

Harvard and UNC
 institutions that limit the number of incoming students,” Mattison said. “So that doesn’t apply to Drake and most institutions across the country.”
She said schools like Harvard and UNC have enough applicants that they can pick and choose which applicants fill a certain number of spots.
Drake’s adm


The answer I was hoping for was `"highly selective institutions"`.
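One way to measure how far off the model was is to locate the hoped-for answer in the context and compare its character span with the predicted span. The sketch below is illustrative. `find_span` and `char_overlap` are hypothetical helper names, and `str.find` returns only the first occurrence of the text.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]


def find_span(context: str, text: str) -> Optional[Span]:
    """Return the (start, end) character span of the first occurrence of text, or None."""
    start = context.find(text)
    if start == -1:
        return None
    return start, start + len(text)


def char_overlap(pred: Span, gold: Span) -> int:
    """Number of characters shared by two half-open [start, end) spans."""
    return max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))


context = "only impacts those really highly selective institutions that limit"
gold = find_span(context, "highly selective institutions")
print(gold)                          # (26, 55)
print(char_overlap((19, 42), gold))  # 16
```

For the story above, you could compare `find_span(times_delphic_story, "highly selective institutions")` with the predicted span `(671, 686)`.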

### How you phrase the question seems to affect which answer the model finds

In [7]:
QA_input3 = {
    'question' : "Does Drake consider race when deciding to admit a student?",
    'context': times_delphic_story
}
res = nlp(QA_input3)
print(res)

{'score': 0.1436648666858673, 'start': 1416, 'end': 1433, 'answer': 'we don’t know yet'}


In [8]:
QA_input4 = {
    'question' : "At Drake, does race have an impact on the admissions decision?",
    'context': times_delphic_story
}
res = nlp(QA_input4)
print(res)

{'score': 0.10744316130876541, 'start': 995, 'end': 1048, 'answer': 'it does not have an impact on the admissions decision'}
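Because small changes in wording can change the answer, it helps to run several phrasings against the same context side by side. This is a sketch. `compare_paraphrases` is a hypothetical helper, and `fake_qa` is a stand-in for the pipeline so the example runs without downloading a model.

```python
from typing import Callable, List, Tuple


def compare_paraphrases(qa: Callable[..., dict], questions: List[str], context: str) -> List[Tuple[str, str, float]]:
    """Ask each wording of a question against the same context; return rows sorted by score."""
    rows = []
    for question in questions:
        # The question-answering pipeline accepts question= and context= keyword arguments
        result = qa(question=question, context=context)
        rows.append((question, result["answer"], result["score"]))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Stand-in "model": answers with the first word of the context, scoring shorter questions higher
def fake_qa(question: str, context: str) -> dict:
    return {"answer": context.split()[0], "score": 1 / len(question)}


rows = compare_paraphrases(fake_qa, ["Short?", "A longer question?"], "Drake admits students.")
for row in rows:
    print(row)
```

With the real model, `compare_paraphrases(nlp, [QA_input3["question"], QA_input4["question"]], times_delphic_story)` would put the two wordings above next to each other.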


## Discussion Question

What are some ways you can think of for evaluating question answering models?
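One common approach, used by the SQuAD benchmarks, is to compare the predicted text with a reference answer. The two usual scores are exact match (EM) and token-level F1, both computed after normalizing case, punctuation, articles, and whitespace. The sketch below is a simplified reimplementation of that SQuAD-style recipe, not the official evaluation script. The `evaluate` library installed above also provides a `squad_v2` metric.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty (e.g. an unanswerable question), score 1 only if both are empty
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Count tokens shared by both answers, respecting repeated tokens
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Harvard and UNC", "highly selective institutions"))  # 0.0
print(token_f1("really highly selective institutions", "highly selective institutions"))  # about 0.857
```

F1 gives partial credit when the predicted span is slightly too long or too short, which EM does not.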

## Group Exercise

Find a question answering *dataset* on Hugging Face. Test the model on some of the examples from the dataset, using the metrics we decided on.
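As a starting point, the sketch below scores any question-answering function on examples in the SQuAD/SQuAD2.0 format (`question`, `context`, and `answers["text"]`), using exact match after normalization. Both `exact_match_rate` and the `qa_fn(question, context)` calling convention are assumptions made for this sketch. SQuAD2.0 marks unanswerable questions with an empty answer list, and the sketch treats that list as the empty string.

```python
import re
import string
from typing import Callable, Iterable


def _normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_rate(examples: Iterable[dict], qa_fn: Callable[[str, str], str]) -> float:
    """Fraction of examples whose predicted answer matches any reference answer."""
    examples = list(examples)
    if not examples:
        return 0.0
    hits = 0
    for ex in examples:
        # SQuAD2.0 uses an empty list for unanswerable questions; treat that as ""
        references = list(ex["answers"]["text"]) or [""]
        prediction = qa_fn(ex["question"], ex["context"])
        if any(_normalize(prediction) == _normalize(ref) for ref in references):
            hits += 1
    return hits / len(examples)


# Tiny hand-made examples in SQuAD format; the second one is unanswerable
examples = [
    {"question": "Who is the provost?", "context": "Provost Sue Mattison said.",
     "answers": {"text": ["Sue Mattison"], "answer_start": [8]}},
    {"question": "Who is the dean?", "context": "Provost Sue Mattison said.",
     "answers": {"text": [], "answer_start": []}},
]
print(exact_match_rate(examples, lambda q, c: "Sue Mattison"))  # 0.5
```

To try this on real data, `load_dataset("squad_v2", split="validation").select(range(20))` from the `datasets` library gives a small sample. `lambda q, c: nlp(question=q, context=c, handle_impossible_answer=True)["answer"]` wraps the pipeline from earlier, and `handle_impossible_answer=True` lets it return an empty answer for unanswerable questions.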

## Applied Exploration

Choose a question answering model from Hugging Face (you may use the one we used in class). Set up an experiment to answer the following question: how does the length of the context affect the model's performance?

Answer the following questions:
* What dataset(s) did you use (provide links)?
* Describe the kinds of questions and answers that appear in this data. How do the context lengths vary? Consider including a histogram of context lengths.
* What metrics did you use? Why did you choose those?
* What were your results? Describe what you found and any additional take-aways.
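One way to set up this experiment is to bin examples by context length and report the count and mean score in each bin. The counts double as histogram data. The sketch below is illustrative. `score_by_length` is a hypothetical helper, and measuring length in whitespace-separated words is an assumption (counting the model's tokens would be another reasonable choice). The lengths and scores are made up. One reason length may matter is that the Transformers question-answering pipeline splits contexts longer than the model's maximum input into overlapping chunks, controlled by its `max_seq_len` and `doc_stride` parameters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def score_by_length(lengths: List[int], scores: List[float], bin_width: int) -> Dict[int, Tuple[int, float]]:
    """Group examples into length bins; return {bin_start: (count, mean score)} sorted by bin."""
    counts: Dict[int, int] = defaultdict(int)
    totals: Dict[int, float] = defaultdict(float)
    for length, score in zip(lengths, scores):
        bin_start = (length // bin_width) * bin_width
        counts[bin_start] += 1
        totals[bin_start] += score
    return {b: (counts[b], totals[b] / counts[b]) for b in sorted(counts)}


# Hypothetical data: made-up contexts (length measured in words) and per-example exact-match scores
contexts = ["word " * 50, "word " * 120, "word " * 130, "word " * 260]
lengths = [len(c.split()) for c in contexts]
print(score_by_length(lengths, [1, 0, 1, 1], bin_width=100))
# {0: (1, 1.0), 100: (2, 0.5), 200: (1, 1.0)}
```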

## What about conversational models?

Some of you have already experimented with the conversational models.

These are more difficult to evaluate than the other models we've looked at.

They usually start with a pre-training step such as "predict the next/missing word in this sequence" and are then fine-tuned with human feedback.
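To make the two pre-training objectives concrete, the sketch below turns one sentence into training examples for both versions: predicting the next word and predicting a missing word. Whitespace tokenization and the `[MASK]` placeholder are simplifying assumptions. Real models use subword tokenizers and model-specific mask tokens (RoBERTa, for example, uses `<mask>`).

```python
from typing import List, Tuple


def next_word_examples(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Pair every prefix of the sequence with the word that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_word_examples(tokens: List[str], mask: str = "[MASK]") -> List[Tuple[List[str], str]]:
    """Replace each position with a mask placeholder and pair it with the hidden word."""
    return [(tokens[:i] + [mask] + tokens[i + 1:], tokens[i]) for i in range(len(tokens))]


tokens = "the court ruled".split()
for context, target in next_word_examples(tokens):
    print(context, "->", target)
# ['the'] -> court
# ['the', 'court'] -> ruled
```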

Next time, we'll look at a simple model for predicting the next word in a sequence.