
UCSB Faculty Directory

A tool for scraping, structuring, and enriching faculty data from UCSB departmental websites.

Project Goals

This project aims to create a structured database of faculty information from UC Santa Barbara's departmental websites, including:

  1. Basic contact information (name, title, email, office, etc.)
  2. Research specializations and expertise
  3. Structured summaries of faculty research using AI
  4. Department and inter-departmental relationships
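The four categories above imply a per-faculty record shape. A minimal sketch of such a record (field names here are illustrative assumptions, not the project's actual schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FacultyRecord:
    # 1. Basic contact information
    name: str
    title: str
    email: Optional[str] = None
    office: Optional[str] = None
    # 2. Research specializations and expertise, as scraped
    specializations: List[str] = field(default_factory=list)
    # 3. AI-generated structured summary (filled in by enrichment)
    research_summary: Optional[str] = None
    # 4. Home department plus any joint/affiliated departments
    department: str = ""
    affiliated_departments: List[str] = field(default_factory=list)

rec = FacultyRecord(
    name="Jane Doe",
    title="Professor",
    department="Geography",
    specializations=["hydrology", "ecohydrology"],
)
```

In practice the scrapers return this information as rows of a pandas DataFrame (see the Quick Start below); the dataclass is just one way to picture the fields involved.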

The project uses a notebook-driven development approach with nbdev to maintain well-documented, tested code with rich explanatory context.

Features

  • Specialized scrapers for different department website layouts (Drupal, WordPress, custom)
  • Flexible Unit class to manage department-specific scraping and enrichment
  • AI-powered faculty research summarization (using OpenAI)
  • Utilities for crawling and analyzing faculty websites
  • Modular design for easy extension to additional departments
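One way to organize layout-specific scrapers behind a modular interface is a registry keyed by site layout, which a `Unit` could consult when scraping. The sketch below uses hypothetical names (`scrape_drupal`, `SCRAPERS`, etc.); the project's actual scrapers live in `faculty_expertise.my_scrapers` and may dispatch differently:

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a department's site layout to its scraper.
SCRAPERS: Dict[str, Callable[[str], List[dict]]] = {}

def register(layout: str):
    """Decorator that registers a scraper function for a site layout."""
    def wrap(fn):
        SCRAPERS[layout] = fn
        return fn
    return wrap

@register("drupal")
def scrape_drupal(html: str) -> List[dict]:
    # Parse a Drupal-style faculty listing (stubbed for illustration)
    return []

@register("wordpress")
def scrape_wordpress(html: str) -> List[dict]:
    # Parse a WordPress-style faculty listing (stubbed for illustration)
    return []

def scrape(layout: str, html: str) -> List[dict]:
    """Dispatch to the registered scraper; fail clearly on unknown layouts."""
    if layout not in SCRAPERS:
        raise ValueError(f"No scraper registered for layout: {layout}")
    return SCRAPERS[layout](html)
```

Adding support for a new department then amounts to registering one more function, which matches the "easy extension" goal above.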

Developer Guide

If you are new to nbdev, here are some useful pointers to get you started.

Install faculty_expertise in Development mode

# make sure faculty_expertise package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to faculty_expertise
$ nbdev_prepare

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/kcaylor/faculty_expertise.git

or from conda

$ conda install -c caylor faculty_expertise

or from PyPI

$ pip install faculty_expertise

Documentation

Documentation is hosted on this repository's GitHub Pages site. Package-manager-specific installation guidelines are also available on conda and PyPI.

Project Structure

  • nbs/: Jupyter notebooks that define the code base

    • 00_core.ipynb: Core data structures (Unit class)
    • 01_scrapers.ipynb: HTML scrapers for different department layouts
    • 02_enrichment.ipynb: AI enrichment and metadata extraction
  • faculty_expertise/: Auto-generated Python modules from notebooks

    • core.py: Core data structures
    • my_scrapers.py: HTML scraping functions
    • my_enrichment.py: AI enrichment functions
  • faculty_html/: HTML files from department websites

  • faculty_screenshots/: Screenshots of department pages

Quick Start Example

# Scrape faculty from a department
from faculty_expertise.core import Unit

# Create a unit for Computer Science department
unit = Unit("Computer Science", "faculty_html/Computer_Science.html")

# Scrape faculty information
df = unit.scrape()

# Display the first few rows
print(df.head())

# Optionally enrich with AI-powered summaries (requires OpenAI API key)
from faculty_expertise.my_enrichment import enrich_faculty_row
row = df.iloc[0]
result = enrich_faculty_row(row)
print(result)
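Under the hood, an enrichment helper like `enrich_faculty_row` typically assembles a summarization prompt from the scraped fields before calling the OpenAI API. The sketch below shows only the prompt-building step; the function name, field names, and prompt wording are assumptions for illustration, not the actual `my_enrichment` code:

```python
def build_summary_prompt(row: dict) -> str:
    """Assemble a research-summarization prompt from a scraped faculty row.

    Assumes the row carries 'name' and 'specializations' keys; the real
    enrichment code may use different fields and wording.
    """
    specs = ", ".join(row.get("specializations", [])) or "unknown"
    return (
        f"Summarize the research expertise of "
        f"{row.get('name', 'this faculty member')} in 2-3 sentences. "
        f"Known specializations: {specs}."
    )

prompt = build_summary_prompt(
    {"name": "Jane Doe", "specializations": ["hydrology", "remote sensing"]}
)
```

Keeping prompt construction separate from the API call makes the enrichment step easy to test without an OpenAI API key.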

License

This project is licensed under the MIT License - see the LICENSE file for details.
