# 2b: Organisation Categories

The IAC is an interesting event, not least because it brings together different actors from the space community: space agencies, governments, companies, universities, NGOs...

I'd like to make use of this, but the original data does not state whether a given organisation is a university, a space agency, or something else. We'll take care of that here.

In [None]:
import re
import json
import pickle

## 1. Inputs Unpickling

In [5]:
directory = "/content/drive/MyDrive/Colab Notebooks/ESPI_Codes/IAC_Analysis/2.Pre-Processing/"
with open(directory+"2a.cleaned_organisation_names.pickle", "rb") as f:
  raw1 = pickle.load(f)

## 2. Creating Categories

When an organisation is a university, it will often use that word in its name. Likewise, companies often include their legal form in their name (for example "AG", the German designation for a joint-stock company). The code below constructs a list of identifiers for each organisation type. If an organisation has "universi" or "institut" in its name, it's matched as a university. The categories are checked in a fixed order (space agency, university, other research institutions, other, company) and the first match wins. Then comes the manual part: every organisation that submitted at least five abstracts in any year of the period and was not matched automatically was added to the list by hand.
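The manual step can be supported by a short helper that lists frequent organisations the identifiers do not yet cover. This is a minimal sketch, not the procedure actually used: the toy data, the shortened identifier list and the function name `frequent_unmatched` are hypothetical, and it counts appearances over the whole mapping rather than per year, because the structure loaded above carries no year information.

```python
import re
from collections import Counter

# Hypothetical toy data: key -> organisation names (the shape raw1 is assumed to have).
toy_raw = {
    "a1": ["Example University", "Acme Orbital"],
    "a2": ["Acme Orbital"],
    "a3": ["Acme Orbital", "Tiny Lab"],
    "a4": ["Acme Orbital"],
    "a5": ["Acme Orbital", "Example University"],
}

# Hypothetical identifier list, much shorter than the real one.
toy_identifiers = ["universi", "Universi", "institut", "Institut"]
pattern = re.compile("|".join(toy_identifiers))


def frequent_unmatched(raw, pattern, min_count=5):
    """Return unmatched organisation names that appear at least min_count times."""
    counts = Counter(
        name
        for names in raw.values()
        for name in names
        if not pattern.search(name)
    )
    return {name: n for name, n in counts.items() if n >= min_count}


candidates = frequent_unmatched(toy_raw, pattern)
print(candidates)
```

Each name that comes back is a candidate for a new entry in one of the identifier lists.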

In [6]:
categories = {"space agency": ["space agency", "Space Agency", "DLR", "ESA", "ASI", "JAXA", "CNES", "NASA", "ISRO", "EUSPA", "ROSCOSMOS", 
                               "China Academy of Space Technology", "ESTEC", "Jet Propulsion Laboratory", "KARI", "Agence Spatiale Algrienne", "Agencia Boliviana Espacial"],
              
              "university": ["universi", "institut", "Universi", "Institut", "Politec", "politec", "Polytec", "Polytec", 
                             "College", "college", "Universty", "Univeristy","Ecole", "Universidad", "School", "Univerisity", "Uiniversity", "Hochschule",
                             "Universidad", "Facultad", "UNIVERSIDAD", "Institute of Technology", "Scuola", "Faculty", "Institue of Technology",
                             "TU Berlin", "TU Graz", "McGill", "TU Delf", "KU Leuven", "TU Braunschweig", "Royal Military Academy", "ETHZ",
                             "TU Munchen", "TU Munich", "TU Muenchen", "TU Wien", "TU Vienna", "U.C. Berkeley", "TU Clausthal", "Institute of Technology", "TU Darmstadt", 
                             "TU Dresden", "UNSW Australia", "New Mexico Tech", "RWTH", "UC Davis", "UC Berkeley", "UNAM", "Beijing Institude of technology",
                             "TU Bergakademie Freiberg", "INSA de Lyon", "Caltech", "IMT Mines Albi", "IIIT Delhi", "RNSIT Bangalore", "ISU", "NTNU",
                             "Shaanxi Engineering Laboratory for Microsatellites", "UPC-BarcelonaTECH", "Tecnolgico de Estudios Superiores de Ecatepec",
                             "MEDES - IMPS", "CTU in Prague", "BTU Cottbus", "Princeton Plasma Physics Laboratory",
                             ],
              
              "other research institutions": ["Purple Mountain Observatory", "Center", "Centre", "Chinese Academy of Sciences",
                               "Zentrum", "TNO", "Fraunhofer", "Academy", "INAF", "Centro", "Helmholtz", "Observatory", "Observatoire", "Academies",
                               "KNMI", "CSSTEAP", "U.S. Geological Survey", "LAAS-CNRS", "VITO nv", "Geoforschungszentrum", "Bundesanstalt", "Joanneum",
                               "Naval Research Laboratory", "INSERM",  "IARI", "Istituto Italiano di Tecnologia", "RIKEN", "CNRS", "Unmanned Exploration Laboratory (UEL)",
                               "National Insitute of Nuclear Physics", "Istituto Nazionale di Fisica Nucleare", "LEEM-UPM", "Liquid Propulsion System centre",
                               "CALT", "DMARS", "D-MARS", "Unidroit", "CSIRO", "GTD", "Science and Technology on Space Physics Laboratory", "Italian National Research Council (CNR)",
                               "IFAC-CNR", "ITA-DCTA", "JHU Applied Physics Laboratory", "National Nuclear Laboratory", "International Space Station",
                               "Office National dEtudes et de Recherches Arospatiales", "National Key Laboratory of Aerospace Flight Dynamics",
                               "ONERA - The French Aerospace Lab", "MBRSC", "National Technology of Mexico", "Italian National Research Council",
                               """Commissariat l'nergie atomique et aux nergies alternatives""", "National Research Council", "Consiglio Nazionale delle Ricerche",
                               "Consejo Superior de Investigaciones Cientficas", "Council for Scientific and Industrial Research", "Colegio Federado de Ingenieros y de Arquitectos de Costa Rica",
                               "Mauritius Research and Innovation Council", "Pakistan Space and Upper Atmosphere Research Commission"
                               ],
              
              "other": ["ESO", "UN Office", "EUMETSAT", "Air Force", "Society", "Museum", "Hospital", "Foundation", "e.V.", "Association", "IFALPA", "Agency", "Ministry",
                        "United Nations", "Associazione", "National Research Fund", "Initiative for Interstellar Studies", "European External Action Service", "Mars Without Borders",
                        "DreamUp","Nexus Aurora", "European Organization for Nuclear Research", "ITU", "EISMEA", "Breakthrough Initiatives", "NOAA", "EURISY",
                        "ILEWG", "City of Los Angeles", "ASTRAX KIDS", "ATOMX Education", "Fondazione E. Amaldi", "UNISEC-Global", "US DoD", "International MoonBase Alliance",
                        "Yuzhnoye State Design Office", "U R RAO SATELLITE CENTRE", "INPE", "ISTI-CNR", "Aeronautica Militare", "SGAC", "Slovak Organisation for Space Activities",
                        "Swedish National Space Board (SNSB)", "Catena Space", "FSC RF-IMBP",
                        "CAST", "BECCAL collaboration", "Reunion Island Space Initiative", "BLUECUBE Aerospace", "Space Renaissance International",
                        "Federal Aviation Administration", "European Commission", "Chamber of Commerce", "Comision Colombiana del Espacio"
              ],
              
              "company": ["inc.", "Inc.", "Company", "Co.","Airbus", "S.p.A.", "S.p.A", "SPA", "SpA", "GmbH", "gmbh", "AG", "SAS", "S.R.L.", "Ltd", "LTD", "LLC",
                          "L.L.C", "S.A.", "B.V.", "Srl", "Inc", "SRL", "srl", "INC.", "GMBH",
                          "S.r.l", "Solutions", "Industry", "Industries", "Instruments", "Systems",
                          "S.A.T.E.", "Ariane", "ArianeGroup",
                          "OHB", "Thales", "Raphael", "Dassault", "Gomspace", "Boeing", "Bryce", "Lockheed", "Euroconsult", "LeoLabs", "Planet",
                          "OneWeb", "Oneweb", "Enpulsion", "ENPULSION", "ICEYE", "Kayser", 
                          "MDA", "Pioneer Astronautics", "TECHNO SYSTEM DEV", "Technologies", "Argotec", "EADS", "IBM", "Group", "srl",
                          "Blue Horizon", "RHEA", "Air Liquide", "Bayern Chemie", "Tata", "SSC", "BEEVERYCREATIVE", "Jacobs", "SODERN", "Sodern", "SENER", "JSC Glavkosmos",
                          "BHO Legal", "Valles Marineris International Private Limited", "Thales", "THALES","Kongsberg", "Deloitte", "Caribou Digital", "Booz, Allen & Hamilton",
                          "Aerojet Rocketdyne", "Europropulsion", "Beyond Gravity", "Iridium", "Orbital Loft", "Destinus", "Roketsan", "Leap Biosystems", "Astronika",
                          "maxon motor", "LIDE", "e-GEOS", "Orbit Logic", "Scanway", "Elecnor", "ARESYS", "Rocket Factory Augsburg", "Aubert & Duval", "Orbit Fab",
                          "Almatech SA", "EnduroSat AD", "CASC", "Ghalam LLP", "Rocket Lab", "Vieira de Almeida & Associados", "Eutelsat", "S\[&]T",
                          "UC Rusal", "Beyond Gravity", "GomSpace", "Leonardo Spa", "Science \[&] Technology AS", "Sitael Spa", "Valispace", "GMV Aerospace & Defence SAU",
                          "Space Applications Services NV", "LuxSpace Sarl", "Deimos Space SLU", "SpaceForest", "Space Applications Services N.V", "KBRwyle",
                          "Space Applications Services", "Honeybee Robotics", "Deimos Space", "Telespazio", "T4i", "Unmanned Exploration Laboratory", "Spaceonova",
                          "Advanced Space", "SAFRAN", "ASTRAX", "LandSpace Technology"
                          ],

                        
              "rest": [],
              }
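Because these lists are maintained by hand, some identifiers end up in the same list twice ("Polytec" and "Universidad" in the university list, "Thales" and "srl" in the company list). Duplicates do not change the matching result, but they make the lists harder to maintain. A minimal sketch of a check follows; `toy_categories` and `find_duplicates` are hypothetical names, and the excerpt is shortened for illustration.

```python
from collections import Counter

# Hypothetical excerpt of the identifier lists, kept short for illustration.
toy_categories = {
    "university": ["Polytec", "Polytec", "Universidad", "Hochschule", "Universidad"],
    "company": ["Thales", "srl", "THALES", "Thales", "srl"],
}


def find_duplicates(categories):
    """Map each category to the identifiers that occur more than once in its list."""
    duplicates = {}
    for category, identifiers in categories.items():
        repeated = sorted(name for name, n in Counter(identifiers).items() if n > 1)
        if repeated:
            duplicates[category] = repeated
    return duplicates


duplicates = find_duplicates(toy_categories)
print(duplicates)
```

The comparison is case-sensitive on purpose: "Thales" and "THALES" are different identifiers because the matching itself is case-sensitive.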

## 3. Implementing Organisation Categories

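The loop below builds a simplified dict that maps each key of `raw1` to a list of records holding the organisation's name and its type. Names that match no identifier are labelled "Unknown".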

In [18]:
# Compile one alternation pattern per category. The order matters: the first match wins.
type_patterns = [
  ("Space Agency", re.compile("|".join(categories["space agency"]))),
  ("University", re.compile("|".join(categories["university"]))),
  ("Other Research Institution", re.compile("|".join(categories["other research institutions"]))),
  ("Other", re.compile("|".join(categories["other"]))),
  ("Company", re.compile("|".join(categories["company"]))),
]

raw2 = {}
for key, value in raw1.items():
  raw2[key] = []
  for organisation in value:
    # Take the first category whose pattern matches; fall back to "Unknown".
    org_type = next((label for label, pattern in type_patterns if pattern.search(organisation)), "Unknown")
    raw2[key].append({"Name": organisation, "Type": org_type})
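One caveat about the matching: the identifiers are joined into a regular expression as they are, so characters such as `.` and `(` keep their regex meaning. The sketch below shows the effect on two hypothetical inputs and how `re.escape` would turn the identifiers into literal substrings. It is an illustration only. Switching the real lists to escaped patterns would change which names match, so the results would need to be re-checked afterwards.

```python
import re

# Two identifiers that contain regex metacharacters.
identifiers = ["Co.", "Board (SNSB)"]

raw_pattern = re.compile("|".join(identifiers))
escaped_pattern = re.compile("|".join(re.escape(i) for i in identifiers))

# Unescaped, "." matches any character, so "Co." also matches "Cos" in "Cosmos".
raw_hit = bool(raw_pattern.search("Cosmos Lab"))
escaped_hit = bool(escaped_pattern.search("Cosmos Lab"))

# Unescaped, "(SNSB)" is a group, so the literal parentheses are never matched.
name = "Swedish National Space Board (SNSB)"
raw_paren = bool(raw_pattern.search(name))
escaped_paren = bool(escaped_pattern.search(name))

print(raw_hit, escaped_hit)      # unescaped vs. escaped on "Cosmos Lab"
print(raw_paren, escaped_paren)  # unescaped vs. escaped on the name with parentheses
```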

In [None]:
#raw2["45305"]
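Besides inspecting a single key, a quick count of the assigned types shows how much of the data is still "Unknown". This is a minimal sketch on hypothetical records that follow the shape of `raw2`; the names `toy_raw2`, `type_counts` and `unknown_share` are made up for the example.

```python
from collections import Counter

# Hypothetical records in the same shape as raw2: key -> list of {"Name", "Type"} dicts.
toy_raw2 = {
    "k1": [
        {"Name": "Example University", "Type": "University"},
        {"Name": "Acme Orbital GmbH", "Type": "Company"},
    ],
    "k2": [{"Name": "Mystery Org", "Type": "Unknown"}],
    "k3": [{"Name": "Example University", "Type": "University"}],
}

# Count every record's type across all keys.
type_counts = Counter(
    entry["Type"] for entries in toy_raw2.values() for entry in entries
)
unknown_share = type_counts["Unknown"] / sum(type_counts.values())

print(type_counts.most_common())
print(f"Unknown share: {unknown_share:.0%}")
```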

## 4. Exporting

In [23]:
with open("2b.orga_names_and_types.pickle", "wb") as handle:
  pickle.dump(raw2, handle, protocol = pickle.HIGHEST_PROTOCOL)

## 5. Conclusion

In this notebook, I added a new feature to the dataset: organisation type. This makes it possible to analyse the mix of organisation types at the IAC. In the next and final part of the pre-processing step, I'll pre-process the abstracts themselves.