Research infrastructure recognition #1085

Merged
merged 15 commits into from Feb 11, 2024

@kermitt2 kermitt2 commented Feb 11, 2024

This PR adds explicit recognition of acknowledged research infrastructures to the funding-acknowledgement model, with dedicated features and gazetteer resources.

Training data for the funding-acknowledgement model has been extended with an infrastructure class as a refinement of "institution".

The extracted research infrastructures are then given in an additional block <listOrg type="infrastructure"> (similar to the funding block) in the back matter (/TEI/text/back/listOrg[@type="infrastructure"]):

            <listOrg type="infrastructure">
                <org type="infrastructure">
                    <orgName type="extracted">CINES</orgName>
                    <orgName type="full" lang="en">National Computer Center for Higher Education</orgName>
                    <orgName type="full" lang="fr">Centre informatique national de l'enseignement supérieur</orgName>
                </org>
                <org type="infrastructure">
                    <orgName type="extracted">GENCI</orgName>
                    <orgName type="full" lang="fr">Grand Équipement National de Calcul Intensif</orgName>
                </org>
            </listOrg>
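
As an illustration of how a client might consume this block, here is a minimal Python sketch that reads the `listOrg[@type="infrastructure"]` element with the standard library. It is not part of this PR. It assumes the TEI output is in the standard TEI namespace and that the language may be carried either as `lang` (as shown above) or as `xml:lang`; the function name `list_infrastructures` and the `<TEI>/<text>/<back>` wrapper around the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the TEI output uses the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Sample copied from the block above; the TEI/text/back wrapper is added here.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><back>
  <listOrg type="infrastructure">
    <org type="infrastructure">
      <orgName type="extracted">CINES</orgName>
      <orgName type="full" lang="en">National Computer Center for Higher Education</orgName>
      <orgName type="full" lang="fr">Centre informatique national de l'enseignement supérieur</orgName>
    </org>
    <org type="infrastructure">
      <orgName type="extracted">GENCI</orgName>
      <orgName type="full" lang="fr">Grand Équipement National de Calcul Intensif</orgName>
    </org>
  </listOrg>
</back></text></TEI>"""


def list_infrastructures(tei_xml):
    """Return one dict per <org type="infrastructure"> found in the back matter."""
    root = ET.fromstring(tei_xml)
    path = "./tei:text/tei:back/tei:listOrg[@type='infrastructure']/tei:org"
    results = []
    for org in root.findall(path, TEI_NS):
        entry = {"extracted": None, "full": {}}
        for name in org.findall("tei:orgName", TEI_NS):
            text = (name.text or "").strip()
            if name.get("type") == "extracted":
                # Surface form as it appears in the document text.
                entry["extracted"] = text
            elif name.get("type") == "full":
                # Accept a plain lang attribute (as shown above) or xml:lang.
                lang = name.get("lang") or name.get(XML_LANG) or "und"
                entry["full"][lang] = text
        results.append(entry)
    return results


if __name__ == "__main__":
    for infra in list_infrastructures(SAMPLE_TEI):
        print(infra["extracted"], infra["full"])
```

Keying the full names by language keeps the multilingual expansions together with the extracted surface form, which is how the block above groups them.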

The refined mark-up is also visible in the acknowledgement and funding sections:

            <div type="acknowledgement">
                <div>
                    <head>Acknowledgment</head>
                    <p>This work was partially supported by the <rs type="funder">EIPHI Graduate School</rs> 
(contract "<rs type="grantNumber">ANR-17-EURE-0002</rs>"). This work was granted access to the AI resources 
of <rs type="institution" subtype="infrastructure">CINES</rs> under the allocation 
<rs type="grantNumber">AD010613582</rs> made by <rs type="institution" subtype="infrastructure">GENCI</rs> 
and also from the <rs type="institution">Mesocentre of Franche-Comté</rs>.
                    </p>
                </div>
            </div>
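
The inline `<rs>` annotations can be collected in a similar way. The sketch below is a hypothetical helper, not code from this PR: it groups mentions by `subtype` when present and by `type` otherwise, so infrastructure mentions end up separate from plain institutions. The sample is a shortened copy of the paragraph above, and the TEI namespace declaration on the outer `<div>` is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: elements are in the standard TEI namespace.
TEI_URI = "http://www.tei-c.org/ns/1.0"

# Shortened copy of the acknowledgement above; the xmlns declaration is added here.
SAMPLE_ACK = """<div xmlns="http://www.tei-c.org/ns/1.0" type="acknowledgement"><div>
<head>Acknowledgment</head>
<p>This work was granted access to the AI resources
of <rs type="institution" subtype="infrastructure">CINES</rs> under the allocation
<rs type="grantNumber">AD010613582</rs> made by <rs type="institution" subtype="infrastructure">GENCI</rs>
and also from the <rs type="institution">Mesocentre of Franche-Comté</rs>.</p>
</div></div>"""


def mentions_by_kind(tei_xml):
    """Group <rs> mention texts by subtype if set, otherwise by type."""
    root = ET.fromstring(tei_xml)
    grouped = {}
    for rs in root.iter(f"{{{TEI_URI}}}rs"):
        # The infrastructure class is a refinement of "institution",
        # so the subtype, when present, is the more specific label.
        kind = rs.get("subtype") or rs.get("type")
        text = "".join(rs.itertext()).strip()
        grouped.setdefault(kind, []).append(text)
    return grouped


if __name__ == "__main__":
    for kind, mentions in mentions_by_kind(SAMPLE_ACK).items():
        print(kind, mentions)
```

Because infrastructure is encoded as a `subtype` of `institution`, a consumer that only looks at `type` still sees these mentions as institutions.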

Two issues:

  • there is currently little training data covering research infrastructures
  • the definition of a research infrastructure is very loose, which makes human annotation difficult. Research infrastructures overlap with ordinary research institutions, funded projects and even funders (research infrastructures often fund competitive research experiments with application numbers, etc.). Should research infrastructures be limited to "federated" infrastructures used by various organizations, or should they also include smaller research platforms inside an institution that offer research services to several departments of that single organization?


coveralls commented Feb 11, 2024

Coverage Status

coverage: 39.954% (+0.07%) from 39.886%
when pulling 8751dcb on research-infrastructures
into cab0947 on master.

@kermitt2 kermitt2 merged commit 4daa2ce into master Feb 11, 2024
9 checks passed