In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [1]:
import spacy
spacy_nlp = spacy.load('en_core_web_md')
abstract = """Developments in hybrid propulsion technology over the past several decades have made these motors attractive candidates for a variety of applications. In the past, they have been overlooked due to the low regression rate of classical hybrid fuels or in favor of the heritage and commercial availability of liquid or solid propulsion systems. The slow burning rate translates into either a reduced thrust level or the requirement for a complicated, multi-port fuel grain to increase the available burning surface area. These major disadvantages can be mitigated through the use of liquefying hybrid fuels, such as paraffin. Typically, this increase is enough to achieve desired thrust levels with a simple, single port design. Benefits unique to the paraffin-based hybrid design makes it a competitive and viable option for solar system exploration missions. Two specific examples are included to illustrate the advantages of hybrids for solar system exploration. A hybrid design for a Mars Ascent Vehicle as part of a sample return campaign takes advantage of paraffin's tolerance to low and variable temperatures. Hybrid propulsion systems are well suited for planetary orbit insertion because of their ability to throttle, stop and restart at high thrust levels. The high regression rates of liquefying hybrid fuels are due to a fuel entrainment mass transfer mechanism. The design, assembly and results of an experiment to visualize this mechanism are presented. A combustion chamber with three windows allows visual access to the combustion process. A flow conditioning system is employed to create a uniform oxidizer flow at the entrance to the combustion chamber. Experimental visualization of entrainment mass transfer will enable the improvement of combustion models and therefore future hybrid designs."""

In [3]:
import pandas as pd
fast_topics = pd.read_csv('/Users/jpnelson/2020/sul-dlss/ai-etd/data/topic_uri_label_utf8.csv', names=['URI', 'Label'])

In [26]:
# spacy_lookup expects {clean_name: [synonym, ...]}; here the FAST URI is the clean name
topic_labels = {uri: [label] for uri, label in zip(fast_topics['URI'], fast_topics['Label'])}

In [None]:
topic_labels

In [27]:
from spacy_lookup import Entity
topic_entity = Entity(keywords_dict=topic_labels, label="FAST_TOPICS")

In [28]:
spacy_nlp.add_pipe(topic_entity)

In [29]:
spacy_nlp.remove_pipe("ner")

('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x7f9ef20c9940>)

In [30]:
doc = spacy_nlp(abstract)

In [117]:
for ent in doc.ents:
    print(ent.text, topic_entity.keyword_processor.get_keyword(ent.text))

technology http://id.worldcat.org/fast/1145078
motors http://id.worldcat.org/fast/1028189
propulsion systems http://id.worldcat.org/fast/1079285
fuel http://id.worldcat.org/fast/935806
grain http://id.worldcat.org/fast/945891
design http://id.worldcat.org/fast/891253
design http://id.worldcat.org/fast/891253
missions http://id.worldcat.org/fast/1023771
design http://id.worldcat.org/fast/891253
propulsion systems http://id.worldcat.org/fast/1079285
ability http://id.worldcat.org/fast/794400
fuel http://id.worldcat.org/fast/935806
mass transfer http://id.worldcat.org/fast/1011450
design http://id.worldcat.org/fast/891253
combustion http://id.worldcat.org/fast/869027
windows http://id.worldcat.org/fast/1175789
combustion http://id.worldcat.org/fast/869027
process http://id.worldcat.org/fast/1078016
combustion http://id.worldcat.org/fast/869027
visualization http://id.worldcat.org/fast/1168121
mass transfer http://id.worldcat.org/fast/1011450
will http://id.worldcat.org/fast/1198525
combus
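The listing above has one line per match, so "design" appears four times and "combustion" three, and at least one match is not topical: "will" comes from the auxiliary verb in "will enable". A minimal sketch of a post-filter, assuming each match is available as a `(surface_text, head_pos, uri)` triple: keep only matches headed by a noun or proper noun and collapse duplicates into counts. `NOMINAL_POS` and `summarize_topic_matches` are hypothetical names, and the POS filter only catches part-of-speech mistakes; nominal false positives such as "ability" or "process" would still get through.

```python
from collections import Counter
from typing import Iterable, Tuple

# Coarse part-of-speech tags (hypothetical choice) that can head a topical noun phrase.
NOMINAL_POS = {"NOUN", "PROPN"}


def summarize_topic_matches(matches: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count matched FAST URIs, skipping matches whose head is not nominal.

    Each match is (surface_text, head_pos, uri). Repeated matches collapse
    into a single URI with a count.
    """
    return Counter(uri for _text, pos, uri in matches if pos in NOMINAL_POS)
```

With the spaCy objects above, the triples could be built as `[(ent.text, ent.root.pos_, topic_entity.keyword_processor.get_keyword(ent.text)) for ent in doc.ents]`. This assumes the tagger runs before the lookup component, which should hold here because `add_pipe` appends to the end of the pipeline by default.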

In [86]:
from spacy import displacy

In [88]:
displacy.render(doc, style='ent')

In [115]:
topic_entity.keyword_processor.get_keyword('mass transfer')

'http://id.worldcat.org/fast/1011450'

In [118]:
len(topic_entity.keyword_processor)

460098
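With roughly 460,000 keywords loaded, scanning the text once per keyword would be slow. `spacy_lookup` hands matching to flashtext's `KeywordProcessor` (reached here through `topic_entity.keyword_processor`), whose approach is trie-based, so each position in the text is checked by walking a tree instead of testing every keyword. The sketch below is a simplified, token-level illustration of that idea under stated assumptions, not flashtext's actual implementation: tokens are runs of word characters (roughly flashtext's default word boundaries), matching is case-insensitive, and the longest keyword starting at each position wins. `build_trie` and `extract_keywords` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

_END = None  # sentinel key meaning "a keyword ends here"; never a real token


def _tokens(text: str) -> List[Tuple[str, int, int]]:
    """Return (token, start, end) for each run of word characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def build_trie(keywords: Dict[str, str]) -> dict:
    """Nest lower-cased keyword tokens; the final node stores the clean name."""
    root: dict = {}
    for phrase, clean_name in keywords.items():
        node = root
        for tok, _, _ in _tokens(phrase.lower()):
            node = node.setdefault(tok, {})
        node[_END] = clean_name
    return root


def extract_keywords(trie: dict, text: str) -> List[Tuple[str, str]]:
    """Scan left to right, taking the longest keyword that starts at each token."""
    toks = _tokens(text)
    found = []
    i = 0
    while i < len(toks):
        node, j, best = trie, i, None
        # Follow the trie as far as the text allows, remembering the last complete keyword.
        while j < len(toks) and toks[j][0].lower() in node:
            node = node[toks[j][0].lower()]
            j += 1
            if _END in node:
                best = (j, node[_END])
        if best is None:
            i += 1
        else:
            end, clean_name = best
            # Slice the original text so hyphens and casing survive in the surface form.
            found.append((text[toks[i][1]:toks[end - 1][2]], clean_name))
            i = end
    return found
```

Because "mass" and "mass transfer" can both be keywords, the longest-match rule is what makes "mass transfer" resolve to a single URI rather than two overlapping hits.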

In [24]:
# topic_labels is keyed by URI, so search the label column instead
for label in fast_topics['Label']:
    if 'propulsion systems' in label.lower():
        print(label)

Artificial satellites--Propulsion systems
Aerobee rockets--Propulsion systems
Aerobee rockets--Propulsion systems--Failures
Large space structures (Astronautics)--Propulsion systems
Guided missiles--Propulsion systems
Nanosatellites--Propulsion systems
Propulsion systems
Space shuttles--Propulsion systems
Space shuttles--Propulsion systems--Design and construction
Space shuttles--Propulsion systems--Environmental aspects
Space shuttles--Propulsion systems--Testing
Space vehicles--Propulsion systems
Space vehicles--Propulsion systems--Automatic control
Space vehicles--Propulsion systems--Computer simulation
Delta launch vehicles--Propulsion systems--Failures
Delta launch vehicles--Propulsion systems
Space vehicles--Propulsion systems--Design and construction
Space vehicles--Propulsion systems--Environmental aspects
Space vehicles--Propulsion systems--Materials
Space vehicles--Propulsion systems--Mathematical models
Space vehicles--Propulsion systems--Research
Space vehicles--Propulsion 
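A substring test matches "propulsion systems" anywhere in a label, so the results mix the main heading "Propulsion systems" with subdivided headings such as "Space shuttles--Propulsion systems--Testing". A minimal sketch, assuming "--" is the subdivision separator as it appears in the output above, that matches the term only as a whole component and separates main headings from subdivided ones. `search_fast_labels` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def search_fast_labels(labels: Iterable[str], term: str) -> Tuple[List[str], List[str]]:
    """Find labels containing `term` as a whole '--'-separated component.

    Returns (main_headings, subdivided_headings): labels that are exactly the
    term, and labels where the term appears alongside other subdivisions.
    """
    term = term.strip().lower()
    main, subdivided = [], []
    for label in labels:
        parts = [part.strip().lower() for part in label.split("--")]
        if term not in parts:
            continue
        if len(parts) == 1:
            main.append(label)
        else:
            subdivided.append(label)
    return main, subdivided
```

Only the main headings are likely to occur verbatim in running text, so they are the entries a dictionary matcher can realistically hit; the subdivided forms are more useful for browsing related topics.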

In [13]:
import stanza
stanza_ner = stanza.Pipeline(lang='en', processors='tokenize,ner')
stanza_doc = stanza_ner(abstract)

2020-08-17 19:09:53 INFO: Loading these models for language: en (English):
| Processor | Package   |
-------------------------
| tokenize  | ewt       |
| ner       | ontonotes |

2020-08-17 19:09:53 INFO: Use device: cpu
2020-08-17 19:09:53 INFO: Loading: tokenize
2020-08-17 19:09:53 INFO: Loading: ner
2020-08-17 19:09:54 INFO: Done loading processors!


In [15]:
print(*[f'entity: {ent.text}\ttype: {ent.type}' for ent in stanza_doc.ents], sep='\n')

entity: the past several decades	type: DATE
entity: Two	type: CARDINAL
entity: a Mars Ascent Vehicle	type: PRODUCT
entity: three	type: CARDINAL
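Stanza's OntoNotes model finds only four entities here, and three of them (DATE, CARDINAL) are numeric and unlikely to correspond to subject headings. A minimal sketch, assuming the entities are collected as `(ent.text, ent.type)` pairs as in the cell above, that drops the OntoNotes numeric types and strips a leading article so that "a Mars Ascent Vehicle" becomes a phrase that could be looked up directly. The type set and `candidate_subject_phrases` name are illustrative assumptions.

```python
from typing import Iterable, List, Tuple

# OntoNotes entity types that denote dates, times, amounts and numbers.
NUMERIC_TYPES = {"DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL"}
LEADING_ARTICLES = ("a ", "an ", "the ")


def candidate_subject_phrases(ents: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep non-numeric entities and remove one leading article from each."""
    out = []
    for text, etype in ents:
        if etype in NUMERIC_TYPES:
            continue
        lowered = text.lower()
        for article in LEADING_ARTICLES:
            if lowered.startswith(article):
                text = text[len(article):]
                break
        out.append((text, etype))
    return out
```

Applied to the pairs printed above, only the PRODUCT entity survives.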


In [16]:
from transformers import pipeline
hugs_ner = pipeline('ner')

In [17]:
hugs_doc = hugs_ner(abstract)

In [18]:
hugs_doc

[{'word': 'Mars',
  'score': 0.9837314486503601,
  'entity': 'I-MISC',
  'index': 180},
 {'word': 'As', 'score': 0.9873371124267578, 'entity': 'I-MISC', 'index': 181},
 {'word': '##cent',
  'score': 0.9526634216308594,
  'entity': 'I-MISC',
  'index': 182},
 {'word': 'Vehicle',
  'score': 0.9836892485618591,
  'entity': 'I-MISC',
  'index': 183}]
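The Hugging Face pipeline returns one prediction per WordPiece token, so "Ascent" is split into "As" and "##cent", and the four rows together describe one MISC entity. Newer `transformers` releases can group these inside the pipeline through the `aggregation_strategy` argument; the sketch below is a standalone approximation that works on output shaped like the list above. It assumes each dict has `word`, `score`, `entity` (an IOB-style tag such as `I-MISC`) and `index`, merges tokens with consecutive indices and the same type, and averages their scores. `merge_subword_entities` is a hypothetical name.

```python
from typing import Dict, List


def merge_subword_entities(tokens: List[Dict]) -> List[Dict]:
    """Merge consecutive token-level NER predictions into whole entities."""
    merged: List[Dict] = []
    for tok in sorted(tokens, key=lambda t: t["index"]):
        prefix, _, etype = tok["entity"].partition("-")  # "I-MISC" -> ("I", "-", "MISC")
        word = tok["word"]
        is_piece = word.startswith("##")  # WordPiece continuation of the previous word
        prev = merged[-1] if merged else None
        continues = (
            prev is not None
            and tok["index"] == prev["end_index"] + 1
            and etype == prev["entity"]
            and (prefix != "B" or is_piece)
        )
        if continues:
            prev["word"] += word[2:] if is_piece else " " + word
            prev["end_index"] = tok["index"]
            prev["scores"].append(tok["score"])
        else:
            merged.append({
                "word": word[2:] if is_piece else word,
                "entity": etype,
                "start_index": tok["index"],
                "end_index": tok["index"],
                "scores": [tok["score"]],
            })
    for ent in merged:
        scores = ent.pop("scores")
        ent["score"] = sum(scores) / len(scores)
    return merged
```

Joining on token indices is approximate: two adjacent entities of the same type that are both tagged `I-` will be fused, and punctuation such as hyphens gets surrounded by spaces. Character offsets, where the pipeline version provides them, are a more reliable basis for reconstructing the surface text.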

In [29]:
topic_kb = spacy.KnowledgeBase(fast_topics)

AttributeError: module 'spacy' has no attribute 'KnowledgeBase'
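The AttributeError arises because `KnowledgeBase` is not exported at the top level of `spacy`. In spaCy v2.2 and later it is imported from `spacy.kb`, and it is built from a `Vocab` and a fixed entity-vector length rather than from a DataFrame; entities and aliases are then added one at a time with `add_entity` and `add_alias`. (From spaCy v3.5, `KnowledgeBase` is abstract and the in-memory implementation is `InMemoryLookupKB`.) A minimal sketch under several assumptions: aliases are the case-folded FAST labels, prior probabilities are uniform because the CSV carries no usage counts, each entity vector is the averaged word vector of its label, and `freq=1` is a placeholder. `build_alias_table` and `load_into_kb` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

AliasTable = Dict[str, Tuple[List[str], List[float]]]


def build_alias_table(rows: Iterable[Tuple[str, str]]) -> AliasTable:
    """Group (URI, label) rows by case-folded label.

    Each alias maps to its candidate URIs and one prior probability per URI.
    The priors are uniform, an assumption made for lack of usage counts.
    """
    candidates: Dict[str, List[str]] = defaultdict(list)
    for uri, label in rows:
        alias = label.strip().lower()
        if uri not in candidates[alias]:
            candidates[alias].append(uri)
    return {
        alias: (uris, [1.0 / len(uris)] * len(uris))
        for alias, uris in candidates.items()
    }


def load_into_kb(nlp, table: AliasTable):
    """Hypothetical loader for spaCy v2.2 through v3.4."""
    from spacy.kb import KnowledgeBase  # not available as spacy.KnowledgeBase

    kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=nlp.vocab.vectors_length)
    added = set()
    for alias, (uris, priors) in table.items():
        # Placeholder entity vector: averaged word vectors of the label text.
        vector = nlp.make_doc(alias).vector
        for uri in uris:
            if uri not in added:
                kb.add_entity(entity=uri, freq=1, entity_vector=vector)  # freq is a placeholder
                added.add(uri)
        kb.add_alias(alias=alias, entities=uris, probabilities=priors)
    return kb
```

The rows could come from `zip(fast_topics['URI'], fast_topics['Label'])`. Because the aliases are case-folded, mentions would need the same normalization before candidate lookup.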

In [119]:
doc2 = spacy_nlp("""People often engage in behaviors that benefit both themselves and others. In particular, people frequently receive something in exchange for their prosocial behavior. These self-interested benefits can take the form of tangible items, feelings of moral self-regard, or a positive image in the eyes of others. I explore how people navigate these various motives and their effects on prosocial decision making. Chapter 1 examines the inconsistency in existing research showing that appeals to self-interest sometimes increase and sometimes decrease prosocial behavior. I propose that this inconsistency is in part due to the framings of these appeals. Different framings generate different salient reference points, leading to different assessments of the appeal. Study 1 demonstrates that buying an item with the proceeds going to charity evokes a different set of alternative behaviors than donating and receiving an item in return. Studies 2 and 3a-g establish that people are more willing to act, and give more when they do, when reading the former framing than the latter. Study 4 establishes ecological validity by replicating the effect in a field experiment assessing participants' actual charitable contributions. Finally, Study 5 provides additional process evidence via moderation for the proposed mechanism. Chapter 2 further examines how the motivation to feel moral guides people's behavior. I propose that people's efforts to preserve their moral self-regard conform to a moral threshold model. This model predicts that people are primarily concerned with whether their prosocial behavior legitimates the claim that they have acted morally, a claim that often diverges from whether their behavior is in the best interests of the recipient of the prosocial behavior. 
Specifically, it predicts that for people to feel moral following a prosocial decision, that decision need not have promised the greatest benefit for the recipient but only one larger than at least one other available outcome. Moreover, this model predicts that once people produce a benefit that exceeds this threshold, their moral self-regard is relatively insensitive to the magnitude of benefit that they produce. In seven studies, I test this moral threshold model by examining people's prosocial risk decisions. I find that, compared to risky egoistic decisions, people systematically avoid making risky prosocial decisions that carry the possibility of producing the worst possible outcome in a choice set—even when those decisions are objectively superior. I further find that people's greater aversion to producing the worst possible outcome when the beneficiary is a prosocial cause leads their prosocial (vs. egoistic) risk decisions to be less sensitive to those decisions' maximum possible benefit. Finally, Chapter 3 explores the potential drawbacks that come with behaving prosocially in public. Specifically, I argue that being identified for one's prosocial behavior can sometimes crowd out feelings of moral self-regard. This in turn, leads to a preference for private acts of prosociality over public ones. Five studies provide evidence that, when given the option between engaging in prosocial behavior in public or in private, people often choose the latter—contrary to prior work. In further support of a crowding out effect, people perceived private prosocial behavior to be more moral than public prosocial behavior. However, this difference in morality between public and private behavior was malleable and depended on the salient comparison point used, providing evidence that contextual factors play a role in how the identifiability of a prosocial act affects one's moral self-regard.""")

In [122]:
doc2.ents

(exchange,
 self,
 self,
 decision making,
 research,
 self-interest,
 charity,
 reading,
 process,
 evidence,
 moderation,
 self,
 self,
 risk,
 possibility,
 aversion,
 risk,
 drawbacks,
 self,
 evidence,
 work,
 evidence,
 play,
 self)
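On this second abstract, "self" is matched six times, and those hits appear to come from the hyphenated compounds "self-interested" and "self-regard" rather than from the word used on its own; the keyword matcher treats the hyphen as a word boundary, so the single-word heading matches the first half of the compound. A minimal sketch of a filter, assuming the tokenizer emits the hyphen as its own token (spaCy's English infix rules split letter-hyphen-letter sequences) and that each match is a `[start, end)` token span such as `(ent.start, ent.end)`. `drop_hyphen_fragments` is a hypothetical helper, and it will also discard legitimate headings that happen to sit inside a compound.

```python
from typing import List, Sequence, Tuple


def drop_hyphen_fragments(
    tokens: Sequence[str], spans: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Discard [start, end) token spans that touch a hyphen token on either side."""
    kept = []
    for start, end in spans:
        before = tokens[start - 1] if start > 0 else ""
        after = tokens[end] if end < len(tokens) else ""
        if before == "-" or after == "-":
            continue  # the span is only part of a hyphenated compound
        kept.append((start, end))
    return kept
```

With spaCy, the inputs would be `[t.text for t in doc2]` and `[(ent.start, ent.end) for ent in doc2.ents]`; a whole-compound match such as "self-interest" spans the hyphen and is kept.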