<a href="https://colab.research.google.com/github/kalawinka/miscellaneous/blob/main/MinAck_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Try the NER model for acknowledgement texts in scientific articles, built with the FLAIR NLP framework**
## Our NER model is able to recognize 6 types of acknowledged entities <br>
**FUND**   Funding organization <br>
**GRNB**   Grant number<br>
**IND**   Person <br>
**COR**   Corporation <br>
**UNI**   University <br>
**MISC**   Miscellaneous

In [1]:
#install libraries
!pip3 install flair
!pip3 install wget

Collecting flair
  Downloading flair-0.12.2-py3-none-any.whl (373 kB)
Collecting segtok>=1.5.7 (from flair)
  Downloading segtok-1.5.11-py3-none-any.whl (24 kB)
Collecting mpld3==0.3 (from flair)
  Downloading mpld3-0.3.tar.gz (788 kB)
  Preparing metadata (setup.py) ... done
Collecting sqlitedict>=1.6.0 (from flair)
  Downloading sqlitedict-2.1.0.tar.gz (21 kB)
  Preparing metadata (setup.py) ... done
Collecting deprecated>=1.2.4 (from flair)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting boto3 (from flair)
  Downloading boto3-1.28.21-py3-none-any.whl (135 kB)

In [2]:
%%time
#import libraries
from flair.data import Sentence
from flair.models import SequenceTagger
import wget

CPU times: user 7.99 s, sys: 1.61 s, total: 9.61 s
Wall time: 14.7 s


In [3]:
%%time
# load the trained model
model = SequenceTagger.load(wget.download('https://zenodo.org/record/5776202/files/final-model.pt?download=1'))

2023-08-08 11:44:01,364 SequenceTagger predicts: Dictionary with 15 tags: O, B-FUND, I-FUND, B-GRNB, I-GRNB, B-MISC, I-MISC, B-UNI, I-UNI, B-IND, I-IND, B-COR, I-COR, <START>, <STOP>
CPU times: user 4.67 s, sys: 3.55 s, total: 8.22 s
Wall time: 2min 13s


In [8]:
model = SequenceTagger.load("kalawinka/flair-ner-acknowledgments")


2023-08-08 12:57:21,357 SequenceTagger predicts: Dictionary with 27 tags: O, S-IND, B-IND, E-IND, I-IND, S-FUND, B-FUND, E-FUND, I-FUND, S-GRNB, B-GRNB, E-GRNB, I-GRNB, S-UNI, B-UNI, E-UNI, I-UNI, S-MISC, B-MISC, E-MISC, I-MISC, S-COR, B-COR, E-COR, I-COR, <START>, <STOP>
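
The two log lines above report 15 and 27 tags. Both counts follow from the tagging scheme visible in the logs: the Zenodo checkpoint lists only `B-` and `I-` prefixes (BIO), while the Hugging Face checkpoint also lists `S-` and `E-` prefixes (BIOES), each applied to the same six entity types plus `O`, `<START>` and `<STOP>`. The sketch below only reconstructs that arithmetic; `build_tag_dictionary` is a hypothetical helper and not part of Flair.

```python
# Entity types listed at the top of this notebook.
ENTITY_TYPES = ["FUND", "GRNB", "IND", "COR", "UNI", "MISC"]


def build_tag_dictionary(prefixes):
    """Hypothetical helper: enumerate the tags of a prefix-based scheme."""
    tags = ["O"]  # tag for tokens outside any entity
    for entity_type in ENTITY_TYPES:
        for prefix in prefixes:
            tags.append(f"{prefix}-{entity_type}")
    # The two markers that appear at the end of both logged dictionaries.
    tags += ["<START>", "<STOP>"]
    return tags


bio_tags = build_tag_dictionary(["B", "I"])
bioes_tags = build_tag_dictionary(["S", "B", "E", "I"])

print(len(bio_tags))    # 1 + 6*2 + 2 = 15
print(len(bioes_tags))  # 1 + 6*4 + 2 = 27
```

Either way, the entity labels returned by `get_spans('ner')` are the six types without prefixes.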


# **Try the model with our example**

In [9]:
# create example sentence
sentence = Sentence("This work was supported by State Key Lab of Ocean Engineering Shanghai Jiao Tong University and financially supported by China National Scientific and Technology Major Project (Grant No. 2016ZX05028-006-009)")

In [10]:
%%time
# predict the tags
model.predict(sentence)
#print output as a tagged string
print(sentence.to_tagged_string())

Sentence[31]: "This work was supported by State Key Lab of Ocean Engineering Shanghai Jiao Tong University and financially supported by China National Scientific and Technology Major Project (Grant No. 2016ZX05028-006-009)" → ["State Key Lab of Ocean Engineering Shanghai Jiao Tong University"/UNI, "China National Scientific and Technology Major Project"/FUND, "2016ZX05028-006-009"/GRNB]
CPU times: user 3.26 s, sys: 60.4 ms, total: 3.32 s
Wall time: 3.36 s


In [11]:
%%time
# predict the tags
model.predict(sentence)
#print output as spans
for entity in sentence.get_spans('ner'):
    print(entity)

Span[5:15]: "State Key Lab of Ocean Engineering Shanghai Jiao Tong University" → UNI (0.9396)
Span[19:26]: "China National Scientific and Technology Major Project" → FUND (0.9865)
Span[29:30]: "2016ZX05028-006-009" → GRNB (0.9996)
CPU times: user 2.97 s, sys: 76.1 ms, total: 3.05 s
Wall time: 3.04 s
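
If you want the predictions as data rather than printed text, the spans can be turned into plain tuples. This is a minimal sketch: `spans_to_rows` and its `min_score` parameter are hypothetical names, and the code assumes each span exposes `.text` and `.get_label("ner")` with `.value` and `.score`, as Flair spans do in the version installed above.

```python
def spans_to_rows(spans, min_score=0.0):
    """Hypothetical helper: convert spans to (text, label, score) tuples.

    Spans whose confidence is below min_score are dropped.
    """
    rows = []
    for span in spans:
        label = span.get_label("ner")  # label object carrying value and score
        if label.score >= min_score:
            rows.append((span.text, label.value, round(label.score, 4)))
    return rows


# Usage with the sentence tagged above (requires the loaded model):
# rows = spans_to_rows(sentence.get_spans("ner"), min_score=0.9)
# for text, label, score in rows:
#     print(label, score, text)
```

A threshold is optional; with the default `min_score=0.0` every predicted span is kept.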


# **Gold standard**
State Key Lab of Ocean Engineering Shanghai Jiao Tong University &ensp; **UNI** <br>
China National Scientific and Technology Major Project &ensp;**FUND** <br>
2016ZX05028-006-009 &ensp; **GRNB**
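
To compare predictions with a gold standard programmatically, one simple option is exact matching on (text, label) pairs. The sketch below is an assumption about how such a check could look, not the evaluation used to train the model; `exact_match_scores` is a hypothetical helper, and the two lists are copied from the output and gold standard shown above.

```python
def exact_match_scores(predicted, gold):
    """Hypothetical helper: precision, recall and F1 over exact (text, label) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [
    ("State Key Lab of Ocean Engineering Shanghai Jiao Tong University", "UNI"),
    ("China National Scientific and Technology Major Project", "FUND"),
    ("2016ZX05028-006-009", "GRNB"),
]
predicted = [
    ("State Key Lab of Ocean Engineering Shanghai Jiao Tong University", "UNI"),
    ("China National Scientific and Technology Major Project", "FUND"),
    ("2016ZX05028-006-009", "GRNB"),
]

print(exact_match_scores(predicted, gold))  # (1.0, 1.0, 1.0)
```

Two caveats: using sets means an entity mentioned twice in a text is counted once, and the model returns tokenized text (for example, spaces around punctuation), so strings may need normalizing before an exact comparison is fair.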

# **You can also use your own examples**
Just type your acknowledgement text into the Sentence( ) object.

In [None]:
# create example sentence
sentence = Sentence("The original work was funded by the German Center for Higher Education Research and Science Studies (DZHW) via the project Mining Acknowledgement Texts in Web of Science (MinAck)")

In [None]:
%%time
# predict the tags
model.predict(sentence)
#print output as a tagged string
print(sentence.to_tagged_string())

In [None]:
%%time
# predict the tags
model.predict(sentence)
#print output as spans
for entity in sentence.get_spans('ner'):
    print(entity)

Span[8:19]: "National Science Foundation , Science of Science and Innovation Policy Program" → FUND (0.9116)
Span[20:22]: "NSF 09-3281" → GRNB (0.9783)
Span[25:31]: "Ewing Marion Kauffman Foundation Dissertation Fellowship" → FUND (0.9843)
Span[34:48]: "University of North Carolina ( UNC ) Chapel Hill Graduate School Dissertation Completion Fellowship" → UNI (0.9299)
Span[51:53]: "David Hsu" → IND (0.9994)
Span[54:56]: "Rosemarie Ziedonis" → IND (0.9998)
Span[57:59]: "John Hardin" → IND (0.9997)
Span[60:62]: "Virginia Gray" → IND (0.9979)
Span[63:66]: "Jeremy G. Moulton" → IND (0.9996)
Span[67:69]: "Christine Durrance" → IND (0.9993)
Span[70:74]: "Jade V.M . Jenkins" → IND (0.9924)
Span[75:77]: "Alexandra Graddy-Reed" → IND (0.9986)
Span[79:81]: "Jesse Hinde" → IND (0.9995)
Span[101:105]: "2014 Academy of Management" → MISC (0.6552)
Span[106:111]: "2014 Technology Transfer Society Meeting" → MISC (0.802)
Span[112:117]: "2014 Ewing Marion Kauffman Foundation" → MISC (0.8307)
Span[118:12

# **Here you can find some more acknowledgement texts!**
## ***Text 1:***
This work was supported partly by the National Natural Science Foundation of China (51579026 and 51079013), partly by the Program for Liaoning Excellent Talents in University (LR2015007), and partly by the Technology Foundation for Selected Overseas Chinese Scholar, the Ministry of Human Resources and Social Security of the People's Republic of China, and partly by the Fundamental Research Funds for the Central Universities (3132014332).

## **Gold standard**
National Natural Science Foundation of China &ensp; **FUND** <br>
51579026  &ensp; **GRNB** <br>
51079013 &ensp; **GRNB** <br>
Program for Liaoning Excellent Talents in University &ensp; **FUND** <br>
LR2015007 &ensp; **GRNB** <br>
Technology Foundation for Selected Overseas Chinese Scholar &ensp; **GRNB** <br>
Ministry of Human Resources and Social Security of the People's Republic of China &ensp; **FUND** <br>
Fundamental Research Funds for the Central Universities &ensp; **FUND**

## ***Text 2:***
Special thanks to Andy McLennan, with whom we are working on a follow-up article addressing algorithmic aspects of our approach.

## **Gold standard**
Andy McLennan &ensp; **IND**

## ***Text 3:***
This research was funded in part by the National Science Foundation, Science of Science and Innovation Policy Program (NSF 09-3281), the Ewing Marion Kauffman Foundation Dissertation Fellowship, and the University of North Carolina (UNC) Chapel Hill Graduate School Dissertation Completion Fellowship. We thank David Hsu, Rosemarie Ziedonis, John Hardin, Virginia Gray, Jeremy G. Moulton, Christine Durrance, Jade V.M. Jenkins, Alexandra Graddy-Reed, and Jesse Hinde for their comments on earlier versions of this paper. This paper benefited from discussions with seminar participants at the 2014 Academy of Management, 2014 Technology Transfer Society Meeting, 2014 Ewing Marion Kauffman Foundation's Emerging Scholars conference, UNC Chapel Hill Public Policy seminar, Office of Advocacy at the Small Business Administration, Center for Economic Studies at the U.S Census Bureau, School of Global Policy & Strategy at the University of California San Diego, Department of Management at the University of Oregon, Department of Geography and Earth Sciences at the University of North Carolina at Charlotte, Directorate for Engineering and the Directorate for Social & Behavioral Sciences at the National Science Foundation, 2015 Atlanta Conference on Science and Innovation Policy, and the 2015 West Coast Research Symposium at the University of Washington. In addition, we thank John Hardin and Kenneth Roland for providing project-level SBIR State Match data for North Carolina and Kentucky, respectively.

## **Gold standard**
National Science Foundation &ensp; **FUND**  <br>
Science of Science and Innovation Policy Program &ensp; **FUND**  <br>
NSF 09-3281 &ensp; **GRNB**  <br>
Ewing Marion Kauffman Foundation Dissertation Fellowship &ensp; **FUND** <br>
University of North Carolina (UNC) Chapel Hill Graduate School Dissertation Completion Fellowship &ensp; **UNI** <br>
David Hsu &ensp; **IND**  <br>
Rosemarie Ziedonis &ensp; **IND** <br>
John Hardin &ensp; **IND** <br>
Virginia Gray &ensp; **IND** <br>
Jeremy G. Moulton &ensp; **IND** <br>
Christine Durrance &ensp; **IND** <br>
Jade V.M. Jenkins &ensp; **IND** <br>
Alexandra Graddy-Reed &ensp; **IND** <br>
Jesse Hinde &ensp; **IND** <br>
2014 Academy of Management &ensp; **MISC** <br>
2014 Technology Transfer Society Meeting &ensp; **MISC** <br>
2014 Ewing Marion Kauffman Foundation's Emerging Scholars conference &ensp; **MISC** <br>
UNC Chapel Hill Public Policy seminar &ensp; **MISC** <br>
Office of Advocacy at the Small Business Administration, Center for Economic Studies at the U.S Census Bureau &ensp; **MISC** <br>
School of Global Policy & Strategy at the University of California San Diego &ensp; **MISC** <br>
Department of Management at the University of Oregon &ensp; **MISC** <br>
Department of Geography and Earth Sciences at the University of North Carolina at Charlotte &ensp; **MISC** <br>
Directorate for Engineering and the Directorate for Social & Behavioral Sciences at the National Science Foundation &ensp; **MISC** <br>
2015 Atlanta Conference on Science and Innovation Policy &ensp; **MISC** <br>
2015 West Coast Research Symposium at the University of Washington &ensp; **MISC** <br>
John Hardin &ensp; **IND** <br>
Kenneth Roland &ensp; **IND**
