In [1]:
# Import usual Python and data handling stuff
import numpy as np
import pandas as pd

In [2]:
# Not yet used - No graph :-(
#import matplotlib.pyplot as plt
#%matplotlib inline

# Analysis of the participations in EU Framework Programmes

In the analysis, we cover the participation in the EU Framework Programmes for Research and Innovation between 2007 and 2017. This period is covered by FP7 (2007-2013) and Horizon 2020 (since 2014).

## FP7 datasets: 2007-2013

### Set up general coding information from Cordis

Cordis has several dictionaries for coded information. The following datasets are created on this basis and will be used to "decode" the information provided in the projects and organisations datasets for FP7.

In [3]:
# Symbols and names of countries in several langues
country_file = "FP7/cordisref-countries.xls"
countries = pd.read_excel(
    country_file,
    sheet_name="cordisref-countries",
    header=0,
)

In [4]:
# Symbols and names of FP7 sub-programmes in several langues
fp7_programmes_file = "FP7/cordisref-FP7programmes.xls"
fp7_programmes = pd.read_excel(
    fp7_programmes_file,
    sheet_name="Hoja1",
    header=0,
)

In [5]:
# Full names of funding schemes
fp7_schemes_file = "FP7/cordisref-projectFundingSchemeCategory.xls"
fp7_schemes = pd.read_excel(
    fp7_schemes_file,
    sheet_name="cordisref-projectFundingSchemeC",
    header=0,
)

### FP7 projects and organisations
The data for projects and participations in FP7 are available on the Open Data Portal of the European Union:

https://data.europa.eu/euodp/en/data/dataset/cordisfp7projects

Two datasets will be created:
* `fp7_proj`: with the descriptors of the FP7 funded projects
* `fp7_part`: with the descriptions of the participating organisations

These two datasets will be merged into one sigle `fp7` dataset where the project informations will be repeated to each participations.

In [6]:
%%time

fp7_proj_file = "FP7/cordis-fp7projects.xlsx"
fp7_part_file = "FP7/cordis-fp7organizations.xlsx"

fp7_proj = pd.read_excel(
    fp7_proj_file, sheet_name="cordis-fp7projects",
    header=0,
)
fp7_part = pd.read_excel(
    fp7_part_file,
    sheet_name="cordis-h2020organizations", # Which is a mistake in the Cordis dataset!
    header=0,
)

Wall time: 30.2 s


In [7]:
# We  rename some columns which are in both datasets:
# In projects:
fp7_proj = fp7_proj.rename(
    index=str,
    columns={
        "id": "projectID",
        "rcn": "projectRCN",
        "acronym": "projectAcronym"
    }
)
# In participations:
fp7_part = fp7_part.rename(
    index=str,
    columns={
        "id": "organizationID",
        "projectRcn": "projectRCN",
        "name": "organizationName",
    }
)

Some project acronyms are not consistent in their writing (lower case), so we force upper case writing:

In [8]:
fp7_proj["projectAcronym"] = fp7_proj["projectAcronym"].str.upper()
fp7_part["projectAcronym"] = fp7_part["projectAcronym"].str.upper()

So, let's merge these two datasets and create the `fp7` dataset.

In [9]:
fp7 = fp7_proj.merge(
    fp7_part,
    on=["projectRCN","projectID","projectAcronym"]
)

In [14]:
# Display shape of Projects, Participants and the merged DataFrame fp7
# The number of rows in the merged DataFrame FP7 must be the same as the
# number of rows in participants

fp7_proj.shape, fp7_part.shape, fp7.shape

((25778, 21), (146021, 23), (146021, 41))

### International entities located in Switzerland
A number of international organisations are located in Switzerland but are not retained when counting Swiss participation because they might not perform their R&D activities in Switzerland (see SERI report "Swiss Participation in European Research Framework Programme" Fact and figures 2015).

We store these organisations in an excel file and load it in a DataFrame

In [28]:
# DataFrame containing international organisation in Switzerland excluded from the statistics

fp7_int_org_file = "fp7_DiscartededInternationalOrgInSwitzerland.xlsx"

international_orgs_in_CH = pd.read_excel(
    fp7_int_org_file, 
    header=0,
)

In [30]:
#Let's view the internatioanl organisations that have been loaded
international_orgs_in_CH

Unnamed: 0,id,name,shortName
0,950706410,EUROPEAN SOCIETY OF INTENSIVE CARE MEDECINE,ESICM
1,950877033,INTERNATIONAL CENTRE FOR TRADE AND SUSTAINABLE...,ICTSD
2,953076217,UNITED NATIONS INTERNATIONAL STRATEGY FOR DISA...,UNISDR
3,973978165,COUNCIL ON HEALTH RESEARCH FOR DEVELOPMENT ASS...,COHRED
4,993279904,"Quaker United Nations Office, Geneva",QUNO
5,994946073,European Society for Medical Oncology - ESMO,ESMO
6,997485048,WORLD HEART FEDERATION,WHF
7,997721825,UNITED NATIONS INSTITUTE FOR TRAINING AND RESE...,UNITAR
8,997786718,EUROPEAN MOLECULAR BIOLOGY ORGANIZATION,EMBO
9,998890675,INTERNATIONAL ORGANIZATION FOR MIGRATION,IOM


In [31]:
# List of international organisations located in Switzerland (short names) and 
# their number of participations in projects


print ("International organisations in Switzerland: " + \
       ", ".join(international_orgs_in_CH['shortName'])
      )

international = {}
for org in international_orgs_in_CH['shortName']:
    international[org] = fp7.loc[fp7['shortName'] == org]
    print ("FP7: Participations for " + org + ": " + \
           str(international[org].shape[0]))

International organisations in Switzerland: ESICM, ICTSD, UNISDR, COHRED, QUNO, ESMO, WHF, UNITAR, EMBO, IOM, FT, WMO, EBU, IUCN, CERN, EMBL, ITU
FP7: Participations for ESICM: 1
FP7: Participations for ICTSD: 1
FP7: Participations for UNISDR: 1
FP7: Participations for COHRED: 4
FP7: Participations for QUNO: 1
FP7: Participations for ESMO: 1
FP7: Participations for WHF: 1
FP7: Participations for UNITAR: 3
FP7: Participations for EMBO: 4
FP7: Participations for IOM: 60
FP7: Participations for FT: 44
FP7: Participations for WMO: 8
FP7: Participations for EBU: 10
FP7: Participations for IUCN: 9
FP7: Participations for CERN: 115
FP7: Participations for EMBL: 194
FP7: Participations for ITU: 59


#### Exclude administrative contacts from ERC grants records
contactType can be: leadPrincipalInvestigator, principalInvestigator, relatedContact or NaN.

The ERC Starting Grants (fundingScheme `ERC-SG`), Advanced Grants (fundingScheme `ERC-AG`) and Consolidator Grant (`ERC-SG`) have two (or more) rows per grant: one for the beneficiary (contactType = principalInvestigator or leadPrincipalInvestigator) and one or more for an administrative contact (contactType = relatedContact).
To avoid counting these project types multiple times, the administrative contact(s) should be filtered out.

For Synergy Grant (ERC-SyG) relatedContact sould also be filtered out.

For the Proof of Concept (`CSA-SA(POC)`) there is _only one_ `contactType relatedContact`. It should **not** be filtered out.

In [32]:
ERC_relatedContact = fp7.loc[(fp7['country'] == 'CH') & 
                             (fp7['contactType'] == 'relatedContact') & 
                             (fp7['fundingScheme'].isin(['ERC-SG','ERC-AG','ERC-CG','CSA-SA(POC)','ERC-SyG']))]

In [33]:
ERC_relatedContact

Unnamed: 0,projectRCN,projectID,projectAcronym,status,programme,topics,frameworkProgramme,title,startDate,endDate,...,postCode,organizationUrl,contactType,contactTitle,contactFirstNames,contactLastNames,contactFunction,contactTelephoneNumber,contactFaxNumber,contactEmail
1537,100403,282280,ERMITO,ONG,FP7-IDEAS-ERC,ERC-SG-LS3,FP7,Molecular Anatomy and Pathophysiology of the e...,2012-01-01,2016-12-31,...,1211,www.unige.ch,relatedContact,Dr.,Alex,Waehry,,+41 22 379 75 60,+41 22 379 11 80,http://cordis.europa.eu/mailanon/form_en?addre...
15460,92797,240518,MAQD,ONG,FP7-IDEAS-ERC,ERC-SG-PE1,FP7,Mathematical Aspects of Quantum Dynamics,2009-12-01,2014-11-30,...,8006,http://www.unizh.ch,relatedContact,Prof.,Benjamin,Schlein,,+41 44 635 58 51,,http://cordis.europa.eu/mailanon/form_en?addre...
16548,102277,283617,GENOCIDE,ONG,FP7-IDEAS-ERC,ERC-SG-SH2,FP7,Corpses of Genocide and Mass Violence: Interdi...,2012-02-01,2016-01-31,...,1211,www.unige.ch,relatedContact,Dr.,Alex,Waehry,,+41 22 3797560,+41 22 3791180,http://cordis.europa.eu/mailanon/form_en?addre...
20322,111265,339941,ADAPT,ONG,FP7-IDEAS-ERC,ERC-AG-SH6,FP7,Life in a cold climate: the adaptation of cere...,2014-02-01,2019-01-31,...,8903,www.wsl.ch,relatedContact,Mrs.,Esther,Moor,,+41 44 739 2201,,http://cordis.europa.eu/mailanon/form_en?addre...
26477,185668,617588,AUGURY,ONG,FP7-IDEAS-ERC,ERC-CG-2013-PE10,FP7,Reconstructing Earth’s mantle convection,2014-03-01,2020-02-29,...,8092,https://www.ethz.ch/de.html,relatedContact,Prof.,Paul,Tackley,,+41 44 633 27 58,,http://cordis.europa.eu/mailanon/form_en?addre...
28838,185569,614964,CELLFITNESS,ONG,FP7-IDEAS-ERC,ERC-CG-2013-LS3,FP7,Active Mechanisms of Cell Selection: From Cell...,2014-06-01,2019-05-31,...,3012,http://www.unibe.ch,relatedContact,Ms.,Maddalena,Tognola,,+41 31 6314809,+41 31 6315106,http://cordis.europa.eu/mailanon/form_en?addre...
34407,110853,337703,ZEBRAHEART,ONG,FP7-IDEAS-ERC,ERC-SG-LS4,FP7,Novel insights into cardiac regeneration throu...,2014-02-01,2019-01-31,...,3012,http://www.unibe.ch,relatedContact,Ms.,Maddalena,Tognola,,+41 31 6314809,,http://cordis.europa.eu/mailanon/form_en?addre...
34473,107187,322576,REFINE,ONG,FP7-IDEAS-ERC,ERC-AG-LS8,FP7,"Phenotypic plasticity, animal welfare, and the...",2013-05-01,2018-04-30,...,3012,http://www.unibe.ch,relatedContact,Ms.,Maddalena,Tognola,,+41 31 6314809,+41 31 6315106,http://cordis.europa.eu/mailanon/form_en?addre...
34912,100955,281904,REVERSIBLEINFECTION,ONG,FP7-IDEAS-ERC,ERC-SG-LS6,FP7,Pathogen and commensal immunity compared in a ...,2011-11-01,2016-10-31,...,3012,http://www.unibe.ch,relatedContact,Ms.,Maddalena,Tognola,,+41 31 6314809,+41 31 631 5106,http://cordis.europa.eu/mailanon/form_en?addre...
35189,96329,260358,EPIGENOME,ONG,FP7-IDEAS-ERC,ERC-SG-LS1,FP7,Understanding epigenetic mechanisms of complex...,2010-11-01,2015-10-31,...,3012,http://www.unibe.ch,relatedContact,Ms.,Maddalena,Tognola,,+41 316314809,+41 31 6315106,http://cordis.europa.eu/mailanon/form_en?addre...


### Swiss participations in FP7

We select all participations where the country of the organisation's address is set to Switzerland, but where the organisation in not in the list of the international organisations listed above.

_Comment on the code:_

`~fp7['org_shortName'].isin(international_orgs_in_CH` identifies all rows in fp7 whose org_shortName is __not__ in the list of international organisations in Switzerland.

`~((fp7['contactType'] == 'relatedContact') etc...` filters out related contacts from ERC grants 

In [34]:
ch_all = fp7.loc[(fp7['country'] == 'CH')]

# Correction by removing the international organisations in Switzerland
ch = fp7.loc[
    (fp7['country'] == 'CH') & (
        # We exclude here all international organisations located in
        # Switzerland, see list above.
        # "~" with ".isin" implements a de facto "is not in"
        ~fp7['shortName'].isin(international_orgs_in_CH['shortName'])
    ) 
    & (
        #Exclude all related contacts from ERC-CG, ERC-SG and ERC-AG grants
        #Note 1: for ERC-SyG we might count twice the same participants...
        #Note 2: CSA-SA(POC) have only one contactsType of type relatedContact, therefore we don't
        #filter out relatedContact for this funding scheme
        ~((fp7['contactType'] == 'relatedContact') & (
            (fp7['fundingScheme'].isin(['ERC-SG','ERC-AG','ERC-CG','ERC-SyG']))))
    
    )
    
]

# Coordinations
# org_role == "hostInstitution" is taken as coordinator for ERC
# beneficiaries
fp7_coord = (
    (fp7['role'] == "coordinator") | (fp7['role'] == "beneficiary")
).sum()
ch_coord  = (
    (ch['role'] == "coordinator") |
    (ch['role'] == "beneficiary") |
    (ch['role'] == "hostInstitution")
).sum()


# General participation
print ("General statistics (vs. SERI publication):")
print ("Total participations in FP7: {:7d} (SERI: 133'615)".format(fp7.shape[0]))
print ("Total coordinations in FP7:  {:7d} (SERI: 25'237)".format(fp7_coord))

# Participations
print ("\nSwiss participations in FP7:")
print ("Number of Swiss participations: {:5d} (SERI: 4'269)".format(ch.shape[0]))
print ("Proportion of Swiss participations: {:4.2f}% (SERI: 3.2%)".format(
    ch.shape[0]/fp7.shape[0]*100)
)

# Coordinations
print ("\nSwiss coordinations:")
print ("Number of Swiss coordinations: {:5d} (SERI 972)".format(ch_coord))
print ("Proportion of Swiss coordinations: {:4.2f}% (SERI 3.9%)".format(
    ch_coord/fp7_coord*100)
)

# Financial contributions
print ("\nFinancial contributions:")
print ("Amount received: M€{a:10.3f}".format(
    a=ch["ecContribution"].sum()/1000000)
)
print ("Proportion of funding to Switzerland: {:4.2f}% (SERI: 4.2%)".format(
    ch["ecContribution"].sum()/fp7["ecContribution"].sum()*100)
)


General statistics (vs. SERI publication):
Total participations in FP7:  146021 (SERI: 133'615)
Total coordinations in FP7:    22816 (SERI: 25'237)

Swiss participations in FP7:
Number of Swiss participations:  4361 (SERI: 4'269)
Proportion of Swiss participations: 2.99% (SERI: 3.2%)

Swiss coordinations:
Number of Swiss coordinations:  1013 (SERI 972)
Proportion of Swiss coordinations: 4.44% (SERI 3.9%)

Financial contributions:
Amount received: M€  1850.096
Proportion of funding to Switzerland: 3.54% (SERI: 4.2%)


#### Check Differences year by year
to ceck our analysis we compare our values with the number of proposal per year pulished by the SERI (table 3).
We expect disreapancies in the last years (2014 or 2015) since the data of SERI might not ave been complete at that time (6 Oct 2014)

In [35]:
ch_year = pd.DataFrame(ch['startDate'].dt.year.value_counts(sort=False))

#Data from SERI report, table 3
ch_year['SERI'] = pd.Series([10,606,560,689,654,683,745,312,10], index=[2007,2008,2009,2010,2011,2012,2013,2014,2015])

#Difference between the SERI report and our values .
ch_year['Delta'] = ch_year['startDate']-ch_year['SERI']

In [36]:
ch_year

Unnamed: 0,startDate,SERI,Delta
2007.0,10,10,0
2008.0,609,606,3
2009.0,571,560,11
2010.0,693,689,4
2011.0,657,654,3
2012.0,700,683,17
2013.0,767,745,22
2014.0,306,312,-6
2015.0,42,10,32


In [37]:
#Optional
#Saving file in XLSX format for manual inspection on data

writer = pd.ExcelWriter('ch_fp7.xlsx')
#ch[ch['startDate'].dt.year == 2008].to_excel(writer)
ch.to_excel(writer)
writer.save()
writer.close()

#### Check Differences by programmes
we compare our values with the number of proposal per programme pulished by the SERI (table 8).
We expect disreapancies in total counts since the data from SERI are not complete at that time of the publication (Corda data from 6 Oct 2014).

In [38]:
ch_programme = pd.DataFrame(ch['programme'].value_counts(sort=False))
#Data from SERI report, table 8
ch_programme['SERI'] = pd.Series([429,212,852,422,141,176,154,43,54,79,2,189,364,758,126,155,9,0,28,1,14,3,58], 
                                 index=['FP7-HEALTH','FP7-KBBE','FP7-ICT','FP7-NMP','FP7-ENERGY','FP7-ENVIRONMENT',
                                        'FP7-TRANSPORT','FP7-SSH','FP7-SPACE','FP7-SECURITY','FP7-GA','FP7-JTI',
                                        'FP7-IDEAS-ERC','FP7-PEOPLE','FP7-INFRASTRUCTURES','FP7-SME','FP7-REGIONS',
                                        'FP7-REGPOT','FP7-SIS','FP7-COH','FP7-INCO','FP7-EURATOM-FUSION','FP7-EURATOM-FISSION'])

#Difference between the SERI report and our values .
ch_programme['Delta'] = ch_programme['programme']-ch_programme['SERI']

ch_programme

Unnamed: 0,programme,SERI,Delta
FP7-ICT,875,852,23
FP7-SECURITY,83,79,4
FP7-GA,2,2,0
FP7-SPACE,54,54,0
FP7-HEALTH,433,429,4
FP7-INFRASTRUCTURES,130,126,4
FP7-SIS,27,28,-1
FP7-EURATOM-FISSION,60,58,2
FP7-KBBE,213,212,1
FP7-TRANSPORT,156,154,2


#### Check Differences for the ERC programmes
we compare our values with the number of proposal per type of the ERC programme pulished by the SERI (derived from table 10).
We expect disreapancies in total counts since the data from SERI are not complete at that time of the publication (Corda data from 6 Oct 2014).

Note that some PI migt have change host institution in the meantime, explaining differences wrt SERI countings. Ex. for ERC Consolodator Grant, projects ONTOTRANSEVOL and NORMCOMMIT have their PI moving abroad.
These projects are marked by endOfParticipation = true

ERC Starting Grants can have more than one beneficiary. For ex. the ERC-SG prject GENOCIDE has the host institution in France but Uni Geneve is beneficiary. We count only the contactType=principalInvestigator in Switzerland.

There might be some confusion between participation and number of projects for ERC-SyG: should we count the leadPrincipalInvestigator and  principalInvestigator?

In [39]:
ch_erc = pd.DataFrame(ch[ch['fundingScheme'].isin(['ERC-SG','ERC-AG','ERC-CG','CSA-SA(POC)','ERC-SyG'])]['fundingScheme'].value_counts(sort=False))

#Data from SERI report, table 3
ch_erc['SERI'] = pd.Series([132,24,155,4,14], index=['ERC-SG','ERC-CG','ERC-AG','ERC-SyG','CSA-SA(POC)'])

#Difference between the SERI report and our values .
ch_erc['Delta'] = ch_erc['fundingScheme']-ch_erc['SERI']

ch_erc

Unnamed: 0,fundingScheme,SERI,Delta
ERC-SyG,2,4,-2
CSA-SA(POC),14,14,0
ERC-SG,149,132,17
ERC-CG,21,24,-3
ERC-AG,145,155,-10


In [11]:
# We won't use these dataframes anymore
fp7_part = 0
fp7_proj = 0

## Horizon 2020 datasets: 2014-2020

### Set up general coding information from Cordis

Cordis has several dictionaries for coded information. The following datasets are created on this basis and will be used to "decode" the information provided in the projects and organisations datasets for FP7 and Horizon 2020.

In [40]:
# Symbols and names of Horizon 2020 sub-programmes in several languages
h2020_programmes_file = "Horizon 2020/cordisref-H2020programmes.xls"
h2020_programmes = pd.read_excel(
    h2020_programmes_file,
    sheet_name="Hoja1",
    header=0,
)

In [41]:
# Symbols and names of Horizon 2020 topics in several languages
h2020_topics_file = "Horizon 2020/cordisref-H2020topics.xlsx"
h2020_topics = pd.read_excel(
    h2020_topics_file,
    sheet_name="cordisref-H2020topics",
    header=0,
)

In [42]:
# Symbols and names of Horizon 2020 research topics
h2020_sic_file = "Horizon 2020/cordisref-sicCode.xls"
h2020_sic = pd.read_excel(
    h2020_sic_file,
    sheet_name="cordisref-sicCode",
    header=0,
)

### Horizon 2020 projects and organisations
The data for projects and participations in Horizon 2020 are available on the Open Data Portal of the European Union:

https://data.europa.eu/euodp/en/data/dataset/cordisH2020projects

The cut-off date is: 2017-10-12

Two datasets will be created:
* `h2020_proj`: with the descriptors of the Horizon 2020 funded projects
* `h2020_part`: with the descriptions of the participating organisations

These two datasets will be merged into one sigle `h2020` dataset where the project informations will be repeated to each participations.


In [43]:
%%time

h2020_proj_file = "Horizon 2020/cordis-h2020projects.csv"
h2020_part_file = "Horizon 2020/cordis-h2020organizations.xlsx"

h2020_proj = pd.read_csv(
    h2020_proj_file,
    sep=";",
    header=0,
)
h2020_part = pd.read_excel(
    h2020_part_file,
    sheet_name="organisation",
    header=0,
)

Wall time: 8.8 s


In [44]:
# We  rename some columns which are in both datasets:
# In projects:
h2020_proj = h2020_proj.rename(
    index=str,
    columns={
        "id": "projectID",
        "rcn": "projectRCN",
        "acronym": "projectAcronym"
    }
)
# In participations:
h2020_part = h2020_part.rename(
    index=str,
    columns={
        "id": "organizationID",
        "projectRcn": "projectRCN",
        "name": "organizationName",
    }
)

Two projects with projectRCN have non-consistent project acronyms, so let's force them to the same writing:
* projectRCN = 208306
* projectRCN = 194607

In [45]:
h2020_part.loc[h2020_part.projectRCN.eq(208306)] =\
    h2020_proj.loc[h2020_proj.projectRCN.eq(208306)].projectAcronym[0]
h2020_part.loc[h2020_part.projectRCN.eq(194607)] =\
    h2020_proj.loc[h2020_proj.projectRCN.eq(194607)].projectAcronym[0]

So, let's merge these two datasets and create the `h2020` dataset.

In [46]:
h2020 = h2020_proj.merge(
    h2020_part,
    on=["projectRCN", "projectID", "projectAcronym"]
)

In [None]:
#TO DO: probably same issues as in FP7 projects - Acronym upper/lower case do not match
# therefore the merge DataFrame h2020 has a row mismatch compared to h2020_part

In [47]:
h2020_proj.shape, h2020_part.shape, h2020.shape

((14837, 21), (71312, 23), (62495, 41))

We can see from the fusion that not every project record number has found its corresponding record number in the other dataset. We find indeed 8806 project record numbers (projectRCN) in the `h2020_part` dataset that are _not_ matched in the `h2020_proj` dataset, which is exactly the difference between the size of `h2020_part` and `h2020`.

In [55]:
rcn_in_proj = h2020_proj["projectRCN"].unique()
missing_rcn = h2020_part.loc[~h2020_part["projectRCN"].isin(rcn_in_proj),"projectRCN"]

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

In [56]:
len(missing_rcn) + h2020.shape[0] == h2020_part.shape[0]

NameError: name 'missing_rcn' is not defined

In [57]:
# We won't use these dataframes anymore
h2020_part = 0
h2020_proj = 0

### Merge all project and participations information
We list below all the column headers of the datasets.

In [58]:
h2020.columns == fp7.columns

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True], dtype=bool)

We see that both datasets, `fp7` and `h2020`have the same set of columns. So let's merge these two datasets, `h2020` following `fp7`:

In [59]:
p = fp7.append(h2020)

# We won't use these dataframes anymore
fp7 = 0
h2020 = 0

In [60]:
p = p.set_index(["frameworkProgramme"])

In [61]:
print ("Column headers:\n- " + "\n- ".join(p.columns))

Column headers:
- projectRCN
- projectID
- projectAcronym
- status
- programme
- topics
- title
- startDate
- endDate
- projectUrl
- objective
- totalCost
- ecMaxContribution
- call
- fundingScheme
- coordinator
- coordinatorCountry
- participants
- participantCountries
- subjects
- role
- organizationID
- organizationName
- shortName
- activityType
- endOfParticipation
- ecContribution
- country
- street
- city
- postCode
- organizationUrl
- contactType
- contactTitle
- contactFirstNames
- contactLastNames
- contactFunction
- contactTelephoneNumber
- contactFaxNumber
- contactEmail


## International organisations in Switzerland

Extract the data for Switzerland, but without the following international organisations that are counted for Switzerland by the European Commission. We consider in this statistics:

- Organisation météorologique mondiale: WMO
- United Nations International Strategy for Disaster Reduction: UNISDR
- International Organization for Migration: IOM
- European Organization for Nuclear Research: CERN
- World Health Organization: FT
- Union International pour la Conservation de la Nature et de ses Ressources: IUCN
- International Centre for Trade and Sustainable Development: ICTSD
- European Society of Intensive Care Medecine: ESICM
- United Nations Institute for Training and Research: UNITAR
- Worl Heart Federation: WHF
- Quaker United Nations Office, Geneva: 
- Council on Health Research for Development Association: COHRED
- Union Européenne de Radio Télévision: EBU
- European Molecular Biology Organization: EMBO, which is sometimes located by the European Commission in Switzerland (Geneva)
- European Society for Medical Oncology: ESMO
- International Telecommunication Union: ITU


In [62]:
fp7 = p.loc["FP7"]
h2020 = p.loc["H2020"]

In [63]:
# List of international organisations located in Switzerland (short names)
international_orgs_in_CH = [
    "WMO","UNISDR","IOM", "ITU", # in 'PUB'
    "CERN", "FT", "IUCN", "ICTSD", "ESICM", "UNITAR", # in 'REC'
    "WHF", "QUNO", "COHRED", "EBU", "EMBO", "ESMO", # in 'OTH'
]
international_orgs_in_CH.sort()

print ("Participations of international organisations in Switzerland:\n")

orgs_ch = pd.DataFrame (index = international_orgs_in_CH)

for org in orgs_ch.index:
    orgs_ch.loc[org,"Part in FP7"] = fp7.loc[fp7['shortName'] == org].shape[0]
    orgs_ch.loc[org,"FP7 Funding M€"] = fp7.loc[
        fp7['shortName'] == org,
        "ecContribution"
    ].sum()/1000000
    orgs_ch.loc[org,"Part in H2020"] = h2020.loc[
        h2020['shortName'] == org
    ].shape[0]
    orgs_ch.loc[org,"H2020 Funding M€"] = h2020.loc[
        h2020['shortName'] == org,
        "ecContribution"
    ].sum()/1000000
orgs_ch

Participations of international organisations in Switzerland:



Unnamed: 0,Part in FP7,FP7 Funding M€,Part in H2020,H2020 Funding M€
CERN,115.0,126.864954,52.0,37.340125
COHRED,4.0,1.170203,1.0,
EBU,10.0,2.55262,2.0,0.414555
EMBO,4.0,24.473964,0.0,0.0
ESICM,1.0,0.232286,0.0,0.0
ESMO,1.0,0.05778,0.0,0.0
FT,44.0,11.902131,10.0,4.733969
ICTSD,1.0,0.1418,0.0,0.0
IOM,60.0,15.432281,5.0,1.711823
ITU,59.0,12.472875,21.0,8.208944


In [64]:
all_ch = p.loc[(p['country'] == 'CH')]

# Correction by removing the international organisations in Switzerland
ch = p.loc[
    (p['country'] == 'CH') & (
        # We exclude here all international organisations located in
        # Switzerland, see list above.
        # "~" with ".isin" implements a de facto "is not in"
        ~p['shortName'].isin(international_orgs_in_CH)
    )
]

In [66]:
ch_part = pd.DataFrame(
    index = [
        "All",
        "International organisations in CH",
        "Without international organisations"
    ],
    columns = ["FP7","H2020"]
)
for prog in ["FP7","H2020"]:
    ch_part.loc["All",prog] = all_ch.loc[prog].shape[0]
    ch_part.loc["International organisations in CH",prog] = \
        all_ch["shortName"].isin(international_orgs_in_CH).sum()
    ch_part.loc["Without international organisations",prog] = ch.loc[prog].shape[0]    

print("Swiss participations in the framework programmes:\n")
ch_part

Swiss participations in the framework programmes:



Unnamed: 0,FP7,H2020
All,4921,1566
International organisations in CH,258,258
Without international organisations,4726,1503


## Analysis

This parts is the general analysis of the participations dataset extracted from the Cordis project files (`fp7` and `h2020` datasets).

### Swiss participations in FP7

#### Reference
We compare our analysis to the official figures provided by the Swiss Confederation (State Secretariat for Education, Research and Innovation):

Source: [Swiss Participation in European Research Framework Programmes](https://www.sbfi.admin.ch/dam/sbfi/en/dokumente/2016/01/beteiligung_der_schweizandeneuropaeischenforschungsrahmenprogram.pdf.download.pdf/swiss_participationineuropeanresearchframeworkprogrammes.pdf)

#### Swiss participations in FP7

We select all participations where the country of the organisation's address is set to Switzerland, but where the organisation in not in the list of the international organisations listed above.

In [67]:
ch_all = fp7.loc[(fp7['country'] == 'CH')]

**Comment on the code:**
`~fp7['shortName'].isin(international_orgs_in_CH` identifies all rows in fp7 whose org_shortName is not in the list of international organisations in Switzerland.

In [32]:
# Correction by removing the international organisations in Switzerland
ch = fp7.loc[
    (fp7['country'] == 'CH') & (
        # We exclude here all international organisations located in
        # Switzerland, see list above.
        # "~" with ".isin" implements a de facto "is not in"
        ~fp7['shortName'].isin(international_orgs_in_CH)
    )
]

# Coordinations
# org_role == "hostInstitution" is taken as coordinator for ERC
# beneficiaries
fp7_coord = (
    (fp7['role'] == "coordinator") | (fp7['role'] == "beneficiary")
).sum()
ch_coord  = (
    (ch['role'] == "coordinator") |
    (ch['role'] == "beneficiary") |
    (ch['role'] == "hostInstitution")
).sum()

In [33]:
# General participation
print ("General statistics:")
print ("Total participations in FP7: {:7d}".format(fp7.shape[0]))
print ("Total coordinations in FP7:  {:7d}".format(fp7_coord))
print (
    "Total financial contributions in FP7: M€{:10.3f}".format(
        fp7["ecContribution"].sum()/1000000
    )
)

# Participations
print ("\nSwiss participations in FP7:")
print ("Number of Swiss participations: {:5d}".format(ch.shape[0]))
print ("Proportion of Swiss participations: {:4.2f}%".format(
    ch.shape[0]/fp7.shape[0]*100)
)

# Coordinations
print ("\nSwiss coordinations:")
print ("Number of Swiss coordinations: {:5d}".format(ch_coord))
print ("Proportion of Swiss coordinations: {:4.2f}%".format(
    ch_coord/fp7_coord*100)
)

# Financial contributions
print ("\nFinancial contributions to Switzerland:")
print ("Amount received: M€{a:10.3f}".format(
    a=ch["ecContribution"].sum()/1000000)
)
print ("Proportion of funding to Switzerland: {:4.2f}%".format(
    ch["ecContribution"].sum()/fp7["ecContribution"].sum()*100)
)

General statistics:
Total participations in FP7:  146021
Total coordinations in FP7:    22816
Total financial contributions in FP7: M€ 52308.999

Swiss participations in FP7:
Number of Swiss participations:  4726
Proportion of Swiss participations: 3.24%

Swiss coordinations:
Number of Swiss coordinations:  1378
Proportion of Swiss coordinations: 6.04%

Financial contributions to Switzerland:
Amount received: M€  2436.137
Proportion of funding to Switzerland: 4.66%
