<h1 style="text-align:center; text-decoration:underline">Sebastiano's Notebook</h1>
<br>
<br>
<p style="font-size: 14.5px">This is a Jupyter Notebook to keep track of thoughts and progress on the <b>Open Science</b> project (DHDK - a.y. 2022-2023). The research question to be tackled is the following:</p>
<div style="padding-left: 25px; padding-right: 43px; padding-top: 8px; display:block"><p style="text-align:justify; font-size:14px"><em>"What is the coverage of publications in Social Science and Humanities (SSH) journals (according to ERIH-PLUS) included in OpenCitations Meta? What are the disciplines that have more publications? What are countries providing the largest number of publications and journals? How many of the SSH journals are available in Open Access according to the data in DOAJ?"</em></p></div>
<hr style="border: 1pt solid #89CFF0">

<h3 style="text-decoration:underline">27/03/2023</h3>
<p style="font-size: 14.5px">First of all, let's break our research question down into smaller units:</p>
<table style="display:block; float:left; font-size: 14px; margin-bottom: 15px">
    <tr>
        <th style="border: 1px solid #dddddd; text-align: left; padding: 8px;">Question</th>
        <th style="border: 1px solid #dddddd; text-align: left; padding: 8px;">Resources to be used</th>
        <th style="border: 1px solid #dddddd; text-align: left; padding: 8px;">Possible solution</th>
      </tr>
      <tr>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;">Coverage of publications in SSH journals included in OpenCitations Meta</td>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;"><a href="https://kanalregister.hkdir.no/publiseringskanaler/erihplus/">ERIH-PLUS</a>, <a href="http://opencitations.net/meta">OpenCitations Meta</a></td>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;">Extract publications in SSH journals; use their DOIs to find the intersection of the two datasets.</td>
      </tr>
      <tr>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;">Disciplines and countries providing more publications/journals</td>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;"><a href="https://kanalregister.hkdir.no/publiseringskanaler/erihplus/">ERIH-PLUS</a></td>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;">Load the Pandas DataFrame into an SQL database; answer the question with SQL queries</td>
      </tr>
      <tr>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;">How many SSH journals are available in Open Access</td>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;"><a href="https://kanalregister.hkdir.no/publiseringskanaler/erihplus/">ERIH-PLUS</a>, <a href="https://doaj.org/">DOAJ</a></td>
        <td style="border: 1px solid #dddddd; text-align: left; padding: 8px;">Intersection of journals (maybe using ISSN)</td>
      </tr>
</table>
<p style="font-size:14.5px; text-align: justify">The analysis of the research question clarifies the <b>purpose</b> of the project. Our goal is to investigate the <b>openness</b> of <b>SSH publications</b> and to analyse how different disciplines and countries approach Open Science. At the same time, <b>citations</b> play a primary role, as they measure how widely a publication is disseminated and shared.</p>
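<p style="font-size:14.5px; text-align: justify">For the last row of the table, one possible approach is to treat a journal as Open Access when either of its ISSNs appears in DOAJ. The sketch below uses toy data: the ERIH PLUS column names come from the dump, while the DOAJ columns <code>pissn</code> and <code>eissn</code> are assumptions to be checked against the actual DOAJ export.</p>

```python
import pandas as pd

# Toy stand-ins. "Print ISSN" / "Online ISSN" match the ERIH PLUS dump;
# the DOAJ column names "pissn" / "eissn" are ASSUMED, not verified.
erih = pd.DataFrame({
    "Journal ID": [1, 2, 3],
    "Print ISSN": ["1989-3477", None, "2250-4591"],
    "Online ISSN": [None, "2341-0515", "2346-9986"],
})
doaj = pd.DataFrame({
    "pissn": ["2250-4591", None],
    "eissn": [None, "2341-0515"],
})

def normalize(series):
    """Trim and upper-case ISSNs so that a lowercase 'x' check digit matches 'X'."""
    return series.str.strip().str.upper()

# Every ISSN known to DOAJ, print or electronic, missing values dropped.
doaj_issns = set(
    pd.concat([normalize(doaj["pissn"]), normalize(doaj["eissn"])]).dropna()
)

# A journal counts as Open Access if either of its ISSNs appears in DOAJ.
erih["in_doaj"] = (
    normalize(erih["Print ISSN"]).isin(doaj_issns)
    | normalize(erih["Online ISSN"]).isin(doaj_issns)
)
print(erih["in_doaj"].sum(), "of", len(erih), "journals found in DOAJ")
```

Checking both ISSN columns matters because, as the dump's <code>info()</code> shows, many journals have only one of the two.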
<div style="display:block; background-color: #D4F1F4; border-radius: 3px; border: 1pt solid #89CFF0; padding: 5px 10px; width: 90%; margin-left:5%; margin-top: 10px"><p><b>ON THIS DATE</b>: analysis of the research question; possible draft of our project's abstract published on <a href="https://github.com/open-sci/2022-2023/commit/bcfc173e2d3e615ec3c32985fb68346a1eb201fd">GitHub</a></p></div>
<hr style="border: 1pt solid #89CFF0">

<h3 style="text-decoration:underline">28/03/2023</h3>
<p style="text-align:justify; font-size: 14.5px">Let's start working on our datasets. The first one I would like to analyze is ERIH-PLUS, an academic journal index whose data is available as a <code>.csv</code> file at the following <a href="https://kanalregister.hkdir.no/publiseringskanaler/erihplus/periodical/listApproved">link</a>. Let's try to load this file using Pandas.</p>

In [1]:
import pandas as pd

erih_plus = pd.read_csv("2023-03-27 ERIH PLUS.csv", sep=";")
erih_plus.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11072 entries, 0 to 11071
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Journal ID              11072 non-null  int64 
 1   Print ISSN              8747 non-null   object
 2   Online ISSN             9515 non-null   object
 3   Original Title          11072 non-null  object
 4   International Title     11072 non-null  object
 5   Country of Publication  10952 non-null  object
 6   ERIH PLUS Disciplines   11072 non-null  object
 7   OECD Classifications    11072 non-null  object
 8   [Last Updated]          11072 non-null  object
dtypes: int64(1), object(8)
memory usage: 778.6+ KB


In [2]:
erih_plus.head()

Unnamed: 0,Journal ID,Print ISSN,Online ISSN,Original Title,International Title,Country of Publication,ERIH PLUS Disciplines,OECD Classifications,[Last Updated]
0,486254,1989-3477,,@tic.revista d'innovació educativa,@tic.revista d'innovació educativa,Spain,Interdisciplinary research in the Social Scien...,Educational Sciences; Other Social Sciences,2015-06-25 13:48:26
1,488561,,2341-0515,[i2] Investigación e Innovación en Arquitectur...,[i2] Investigación e Innovación en Arquitectur...,Spain,"Art and Art History, Cultural Studies, Human G...","Arts (Arts, History of Arts, Performing Arts, ...",2016-04-18 17:34:55
2,504135,,2068-3472,[Inter]sections,[Inter]sections,Romania,"Gender Studies, Cultural Studies, Literature, ...",Languages and Literature; Other Humanities; So...,2022-10-18 08:40:28
3,495209,2250-4591,2346-9986,+E: Revista de Extensión Universitaria,+E: Revista de Extensión Universitaria,Argentina,"Interdisciplinary research in the Humanities, ...",Other Humanities; Other Social Sciences,2023-02-02 17:14:12
4,488332,,0719-5737,100-Cs,100-Cs,Chile,Interdisciplinary research in the Social Sciences,Other Social Sciences,2018-04-26 16:33:26
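<p style="font-size:14.5px; text-align: justify">The second row of the table suggests moving the DataFrame into SQL. A minimal sketch, assuming an in-memory SQLite database is enough for exploration: <code>DataFrame.to_sql</code> writes the table and <code>pd.read_sql_query</code> runs a query counting journals per country. The five rows mirror the <code>head()</code> output above; the table name <code>erih_plus</code> is arbitrary.</p>

```python
import sqlite3

import pandas as pd

# Toy rows taken from the head() output, reusing the ERIH PLUS column names.
erih = pd.DataFrame({
    "Journal ID": [486254, 488561, 504135, 495209, 488332],
    "Country of Publication": ["Spain", "Spain", "Romania", "Argentina", "Chile"],
})

conn = sqlite3.connect(":memory:")  # throwaway database, nothing on disk
erih.to_sql("erih_plus", conn, index=False)

# Column names with spaces must be double-quoted in SQL.
query = """
SELECT "Country of Publication" AS country, COUNT(*) AS journals
FROM erih_plus
WHERE "Country of Publication" IS NOT NULL
GROUP BY "Country of Publication"
ORDER BY journals DESC, country
"""
by_country = pd.read_sql_query(query, conn)
conn.close()
print(by_country)
```

For a single grouping like this, <code>value_counts()</code> on the <code>Country of Publication</code> column gives the same counts without SQL; the SQL route pays off when several conditions have to be combined. The <code>IS NOT NULL</code> filter matters on the real file, where 120 journals have no country.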


<p style="font-size: 14.5px">Let's check the content of a cell in the <code>ERIH PLUS Disciplines</code> column:</p>

In [3]:
erih_plus.at[1,"ERIH PLUS Disciplines"]

'Art and Art History, Cultural Studies, Human Geography and Urban Studies, Interdisciplinary research in the Humanities, Interdisciplinary research in the Social Sciences, Pedagogical & Educational Research, Science and Technology Studies'
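<p style="font-size:14.5px; text-align: justify">Because a journal's disciplines are packed into a single comma-separated string, counting journals per discipline requires splitting that string first. A sketch, assuming that no discipline name itself contains a comma (worth verifying on the full file): <code>str.split</code> followed by <code>explode</code> yields one row per (journal, discipline) pair.</p>

```python
import pandas as pd

# Toy rows: the two strings are copied from the outputs shown above.
erih = pd.DataFrame({
    "Journal ID": [488332, 488561],
    "ERIH PLUS Disciplines": [
        "Interdisciplinary research in the Social Sciences",
        "Art and Art History, Cultural Studies, Human Geography and Urban Studies, "
        "Interdisciplinary research in the Humanities, "
        "Interdisciplinary research in the Social Sciences, "
        "Pedagogical & Educational Research, Science and Technology Studies",
    ],
})

# ASSUMPTION: ", " only ever separates disciplines, never appears inside one.
pairs = (
    erih.assign(discipline=erih["ERIH PLUS Disciplines"].str.split(", "))
        .explode("discipline")
)

# Number of journals tagged with each discipline, most frequent first.
per_discipline = pairs["discipline"].str.strip().value_counts()
print(per_discipline)
```

Note that the <code>OECD Classifications</code> column uses <code>;</code> as a separator instead, since some of its labels contain commas inside parentheses.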

<p style="font-size: 14.5px; text-align: justify">As we can see, a single journal can be associated with <b>multiple disciplines</b>: to get more detailed information, we therefore need to look at individual publications. One possible source is this <a href="https://erih.dimensions.ai/discover/publication">link</a>, which lists all the publications included in ERIH-PLUS journals. However, the system seems to allow users to export at most 500 publications at a time, which makes it practically impossible to retrieve all of the 10,769,086 publications currently included. Nevertheless, let's try to analyze one of these exported files:</p>

In [8]:
erih_plus_500_publications = pd.read_csv("ERIH-PLUS 500.csv")
erih_plus_500_publications.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 33 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Rank                              500 non-null    int64  
 1   Publication ID                    500 non-null    object 
 2   DOI                               500 non-null    object 
 3   PMID                              462 non-null    float64
 4   PMCID                             343 non-null    object 
 5   Title                             500 non-null    object 
 6   Abstract                          500 non-null    object 
 7   Acknowledgements                  309 non-null    object 
 8   Funding                           267 non-null    object 
 9   Source title                      500 non-null    object 
 10  Anthology title                   0 non-null      float64
 11  MeSH terms                        233 non-null    object 
 12  Publicat

In [9]:
erih_plus_500_publications.head()

Unnamed: 0,Rank,Publication ID,DOI,PMID,PMCID,Title,Abstract,Acknowledgements,Funding,Source title,...,Corresponding Authors,Authors Affiliations,Times cited,Recent citations,RCR,FCR,Source Linkout,ERIH PLUS by Dimensions URL,Fields of Research (ANZSRC 2020),Sustainable Development Goals
0,500,pub.1155098456,10.1162/jocn_a_01972,36735619.0,PMC10024573,Time Courses of Attended and Ignored Object Re...,Selective attention prioritizes information th...,,"Sean Noah, National Eye Institute (https://dx....",Journal of Cognitive Neuroscience,...,,"Noah, Sean (University of California, Davis; U...",0,0,,,https://psyarxiv.com/2aj3n/download,https://erih.dimensions.ai/details/publication...,52 Psychology; 5202 Biological Psychology; 520...,
1,500,pub.1154380038,10.1162/jocn_a_01959,36626349.0,,Cochlear Theta Activity Oscillates in Phase Op...,It is widely established that sensory percepti...,,DOC Fellowship Programme of the Austrian Acade...,Journal of Cognitive Neuroscience,...,,"Köhler, Moritz Herbert Albrecht (University of...",0,0,,,https://doi.org/10.1101/2022.02.21.481289,https://erih.dimensions.ai/details/publication...,32 Biomedical and Clinical Sciences; 3202 Clin...,
2,500,pub.1147283528,10.1037/pspp0000420,35446080.0,,Trait-Specificity Versus Global Positivity: A ...,"For decades, a recurring question in person pe...",,,Journal of Personality and Social Psychology,...,"Thielmann, Isabel (; University of Münster; Un...","Thielmann, Isabel (University of Münster; Univ...",2,2,,,https://psyarxiv.com/z78na/download,https://erih.dimensions.ai/details/publication...,52 Psychology; 5201 Applied and Developmental ...,
3,500,pub.1153677667,10.1037/bne0000544,36521141.0,,Pair housing does not alter incubation of crav...,Evidence suggests that single housing in rats ...,,,Behavioral Neuroscience,...,,"Nett, Kelle E (); LaLumiere, Ryan T ()",0,0,,,https://doi.org/10.1101/2022.07.28.501777,https://erih.dimensions.ai/details/publication...,32 Biomedical and Clinical Sciences; 3214 Phar...,3 Good Health and Well Being
4,500,pub.1156401252,10.1371/journal.pone.0283259,36947531.0,PMC10032514,Human scent signature on cartridge case surviv...,This paper focuses on a chemical analysis of h...,,,PLOS ONE,...,,"Ladislavová, Nikola (University of Chemistry a...",0,0,,,https://journals.plos.org/plosone/article/file...,https://erih.dimensions.ai/details/publication...,34 Chemical Sciences; 3401 Analytical Chemistry,


<p style="font-size:14.5px">As we can see, the third column is named <code>DOI</code>: this DataFrame therefore contains the identifiers we need to match publications against OpenCitations Meta.</p>
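<p style="font-size:14.5px; text-align: justify">Since each export is capped at 500 rows, several exported files would have to be combined before matching. A sketch under assumptions: the two toy DataFrames stand in for separate exports (the DOIs come from the rows above, with case and prefix variants added to show the normalization), duplicates are dropped by <code>Publication ID</code>, and <code>meta_dois</code> is a hypothetical sample rather than real OpenCitations Meta data. DOIs are compared in lower case because DOI names are case-insensitive.</p>

```python
import pandas as pd

def normalize_doi(doi):
    """Lower-case a DOI and strip a leading resolver or scheme prefix, if any."""
    doi = str(doi).strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

# Stand-ins for two separate 500-row exports (in practice: pd.read_csv per file).
export_a = pd.DataFrame({
    "Publication ID": ["pub.1155098456", "pub.1154380038"],
    "DOI": ["10.1162/jocn_a_01972", "10.1162/JOCN_A_01959"],
})
export_b = pd.DataFrame({
    "Publication ID": ["pub.1154380038", "pub.1147283528"],
    "DOI": ["10.1162/jocn_a_01959", "https://doi.org/10.1037/pspp0000420"],
})

# Overlapping exports may repeat a publication: keep its first occurrence.
publications = (
    pd.concat([export_a, export_b], ignore_index=True)
      .drop_duplicates(subset="Publication ID")
)
erih_dois = set(publications["DOI"].dropna().map(normalize_doi))

# HYPOTHETICAL sample of DOIs from OpenCitations Meta, for illustration only.
meta_dois = {normalize_doi(d) for d in ["10.1162/jocn_a_01972", "10.9999/not-in-erih"]}

covered = erih_dois & meta_dois
print(f"{len(covered)} of {len(erih_dois)} ERIH publications found in Meta")
```

How DOIs are actually encoded in the OpenCitations Meta dump is not shown in this notebook, so the prefix handling in <code>normalize_doi</code> (a hypothetical helper) may need adjusting once that file is inspected.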

In [11]:
erih_plus_500_publications.at[0, "Fields of Research (ANZSRC 2020)"]

'52 Psychology; 5202 Biological Psychology; 5204 Cognitive and Computational Psychology'
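<p style="font-size:14.5px; text-align: justify">The Fields of Research value packs several ANZSRC 2020 codes into one string. A small helper (the name <code>parse_for_codes</code> is hypothetical), assuming entries are always separated by <code>;</code> and each starts with its numeric code, can split it into (code, label) pairs and keep only the two-digit divisions, which would be a natural level for comparing disciplines across publications.</p>

```python
def parse_for_codes(value):
    """Split a 'Fields of Research (ANZSRC 2020)' cell into (code, label) pairs.

    Empty or missing cells (None/NaN in pandas) return an empty list.
    """
    if not isinstance(value, str) or not value.strip():
        return []
    pairs = []
    for item in value.split(";"):
        # The code is everything before the first space, the label the rest.
        code, _, label = item.strip().partition(" ")
        pairs.append((code, label))
    return pairs

cell = "52 Psychology; 5202 Biological Psychology; 5204 Cognitive and Computational Psychology"
fields = parse_for_codes(cell)

# Two-digit codes are ANZSRC divisions; four-digit codes are groups within them.
divisions = [label for code, label in fields if len(code) == 2]
print(divisions)  # ['Psychology']
```

Applied with <code>Series.map</code> to the whole column, this would make it possible to count publications per division.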

<div style="display:block; background-color: #D4F1F4; border-radius: 3px; border: 1pt solid #89CFF0; padding: 5px 10px; width: 90%; margin-left:5%; margin-top: 10px"><p><b>ON THIS DATE</b>: analysis of the ERIH-PLUS dump file and search by DOI; abstract updated on <a href="https://github.com/open-sci/2022-2023/commit/f08398b6ad90fef436f10be3c6940bfc55d86fa8">GitHub</a></p></div>
<hr style="border: 1pt solid #89CFF0">