# Academia–Practice Interaction Mapping Using NLP

**Notebook 05: Entity Classification**

**Author:** Kamila Lewandowska  
**Project Status:** *In Progress*  
**Last Updated:** April 2025  

---

### Notebook Overview

**Goal:** Develop a typology of non-academic organizations based on their names, using an inductive, grounded approach.

This notebook:
- Loads the cleaned list of non-academic organization names
- Randomly samples 1,000 entries for manual review and typology creation
- Prepares a CSV file for human annotation

---



## STEP 3: Categorize data

In [1]:
import pandas as pd
import random

# Load non-academic orgs
df_non_academic = pd.read_csv("../output/non_academic_org_entities.csv")
non_academic_entities = df_non_academic["ORG_Entity"].dropna().tolist()
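
Because the typology is built from a sample of this list, it may be worth confirming that the loaded names really are free of blanks and duplicates before sampling. The following is a minimal sketch, not part of the original notebook: the helper `summarize_names` and the toy input are hypothetical, and it only treats names as duplicates when they match exactly after stripping surrounding whitespace.

```python
from collections import Counter


def summarize_names(names):
    """Count total, unique, and duplicated names after trimming whitespace."""
    # Keep only non-empty strings; NaN values or other types are skipped.
    cleaned = [n.strip() for n in names if isinstance(n, str) and n.strip()]
    counts = Counter(cleaned)
    duplicates = sorted(name for name, c in counts.items() if c > 1)
    return {"total": len(cleaned), "unique": len(counts), "duplicates": duplicates}


# Toy input (hypothetical); in the notebook this would be non_academic_entities.
toy_names = ["NFZ", "UNESCO", "NFZ ", "OECD"]
summary = summarize_names(toy_names)
```

If `summary["duplicates"]` is non-empty for the real list, the names could be deduplicated before drawing the sample.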

In [2]:
# Methodology note: how the categories of non-academic entities were established

"""
To classify the non-academic entities identified in the impact case studies, I developed a typology of organization
types through a grounded, inductive review process. Specifically, I randomly sampled approximately 1,000 unique
non-academic organization names from the full list of 4,735 deduplicated entities and reviewed them manually.
From this exploratory sample, I identified recurring organizational patterns and formulated a set of categories
that reflect the functional diversity of the non-academic stakeholders mentioned in the case studies.
"""





In [5]:
# Create a random sample of 1000 entities

# Set seed for reproducibility
random.seed(42)

# Randomly sample 1,000 entries from the non-academic entities list
sample_size = 1000
non_academic_sampled = random.sample(non_academic_entities, sample_size)

# Convert to DataFrame for review
non_academic_sampled_df = pd.DataFrame(non_academic_sampled, columns=["organization_name"])
non_academic_sampled_df.to_csv("../output/non_academic_sampled.csv", index=False, encoding="utf-8-sig")
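
Notebook cells are sometimes run out of order, and the global `random.seed` call only guarantees the same draw if nothing else consumes random numbers in between. One possible alternative, sketched here under assumptions rather than taken from the original workflow, is to use a local `random.Random` instance and cap the sample size at the number of available names. The function `sample_names` and the toy list are hypothetical.

```python
import random


def sample_names(names, sample_size=1000, seed=42):
    """Draw a reproducible sample without touching the global random state."""
    # Drop exact duplicates while keeping first-seen order.
    unique_names = list(dict.fromkeys(names))
    # A local generator is unaffected by other code that uses the random module.
    rng = random.Random(seed)
    # random.sample raises ValueError if k exceeds the population size, so cap it.
    k = min(sample_size, len(unique_names))
    return rng.sample(unique_names, k)


# Toy input (hypothetical); in the notebook this would be non_academic_entities.
toy_names = ["Polskie Radio", "Teatr Wielki", "Senat RP", "Polskie Radio", "NFZ"]
toy_sample = sample_names(toy_names, sample_size=3)
```

Calling `sample_names` twice with the same input and seed returns the same list, regardless of what other cells have done with the global generator.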


In [4]:
# Establish categories of non-academic entities

"""
1. Company / Business
Commercial enterprises, corporations, startups, and private firms (e.g., Kaufland, KGHM ZANAM, Voicelab, Photon).

2. Government / Public Administration
Ministries, central/local government agencies, parliament, and other state entities (e.g., Senat RP, Urząd Miasta, Ministerstwo Rozwoju).

3. NGO / Association / Foundation
Non-profit organizations, foundations, professional associations, and social initiatives (e.g., Fundacja La Strada, Polskie Towarzystwo Psychologiczne, Stowarzyszenie Wioska Gotów).

4. Media / Publishing
News outlets, broadcasters, publishers, and cultural magazines (e.g., Polskie Radio, TVP Info, Deutsche Welle, Gazeta Lubuska).

5. Cultural Institution / Arts
Museums, theatres, orchestras, festivals, and galleries (e.g., Teatr Wielki, Muzeum Historii Polski, Galeria Arsenał).

6. Health / Hospitals / Medical
Clinics, hospitals, medical institutes, and health-related organizations (e.g., Centrum Zdrowia Szansa, NFZ, American Heart Association).

7. Religious Organization
Churches, dioceses, religious associations, and theological institutions (e.g., Kościół Katolicki, Episkopat Polski, Cerkiew).

8. Military / Defense / Security
Armed forces, police, defense industry, and military R&D (e.g., Wojsko Polskie, Żandarmeria Wojskowa, Lockheed Martin).

9. International Organization / EU
The UN, EU, NATO, OECD, and other international consortia or partnerships (e.g., European Commission, UNESCO, OECD).

10. Education (non-university)
Schools, kindergartens, vocational schools, and continuing education centers (e.g., Szkoła Podstawowa, Centrum Kształcenia Ustawicznego).

11. Other / Unclear
Anything that doesn’t clearly fall into the above categories or needs human validation.

"""

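
The typology above lives only in a string, so annotators' labels cannot be checked against it programmatically. A minimal sketch of one way to make it machine-readable follows. The category names are copied from the list above, while the dictionary `CATEGORIES`, the helper `find_invalid_labels`, and the toy labels are hypothetical and not part of the original notebook.

```python
# Category names copied from the typology above; keys mirror its numbering.
CATEGORIES = {
    1: "Company / Business",
    2: "Government / Public Administration",
    3: "NGO / Association / Foundation",
    4: "Media / Publishing",
    5: "Cultural Institution / Arts",
    6: "Health / Hospitals / Medical",
    7: "Religious Organization",
    8: "Military / Defense / Security",
    9: "International Organization / EU",
    10: "Education (non-university)",
    11: "Other / Unclear",
}


def find_invalid_labels(labels):
    """Return the labels that do not match any category name exactly."""
    valid = set(CATEGORIES.values())
    return [label for label in labels if label not in valid]


# Toy annotations (hypothetical), including one misspelled label.
toy_labels = ["Media / Publishing", "Goverment", "Other / Unclear"]
invalid = find_invalid_labels(toy_labels)
```

Running a check like this on the annotated CSV would surface typos or ad hoc labels before the categories are used downstream.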