In [None]:
#| hide
from faculty_expertise.core import *

# UCSB Faculty Directory

> A tool for scraping, structuring, and enriching faculty data from UCSB departmental websites.

## Project Goals

This project aims to create a structured database of faculty information from UC Santa Barbara's departmental websites, including:

1. Basic contact information (name, title, email, office, etc.)
2. Research specializations and expertise
3. Structured summaries of faculty research using AI
4. Department and inter-departmental relationships
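The contact fields in goal 1 correspond to the columns that `Unit.scrape()` returns in the quick-start output (Name, Title(s), Specialization, Email, and so on). If you prefer a typed record over a raw DataFrame row, those columns could be modeled with a dataclass. This is a minimal sketch under that assumption: `FacultyRecord`, `COLUMN_TO_FIELD`, and `from_row` are hypothetical names and are not part of the package.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FacultyRecord:
    """Hypothetical typed view of one scraped faculty row (not the package's API)."""
    name: Optional[str] = None
    titles: Optional[str] = None
    specialization: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    office: Optional[str] = None
    website: Optional[str] = None
    photo_url: Optional[str] = None
    department: Optional[str] = None
    unit: Optional[str] = None


# Column headers as they appear in the quick-start output -> dataclass field names.
COLUMN_TO_FIELD = {
    "Name": "name",
    "Title(s)": "titles",
    "Specialization": "specialization",
    "Email": "email",
    "Phone": "phone",
    "Office": "office",
    "Website": "website",
    "Photo URL": "photo_url",
    "Department": "department",
    "Unit": "unit",
}


def from_row(row: Mapping[str, Any]) -> FacultyRecord:
    """Build a FacultyRecord from any mapping keyed by the scraped column names.

    Missing columns become None, blank strings are normalized to None,
    and columns not listed in COLUMN_TO_FIELD are ignored.
    """
    kwargs = {}
    for column, field_name in COLUMN_TO_FIELD.items():
        value = row.get(column)
        if isinstance(value, str) and not value.strip():
            value = None
        kwargs[field_name] = value
    return FacultyRecord(**kwargs)
```

Because a pandas `Series` also supports `.get`, a call such as `from_row(df.iloc[0])` should work on the rows produced in the quick-start example.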

The project uses a notebook-driven development approach with [nbdev](https://nbdev.fast.ai) to maintain well-documented, tested code with rich explanatory context.

## Features

- Specialized scrapers for different department website layouts (Drupal, WordPress, custom)
- Flexible `Unit` class to manage department-specific scraping and enrichment
- AI-powered faculty research summarization (using OpenAI)
- Utilities for crawling and analyzing faculty websites
- Modular design for easy extension to additional departments
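Each department layout needs its own scraper. To illustrate the idea, the sketch below uses only the standard-library `html.parser` and assumes a hypothetical markup in which every faculty member sits in a `<div class="faculty">` block, with the name in an `<h3>` and the email in a `mailto:` link. Real Drupal or WordPress directory pages will differ, and the package's own scrapers in `my_scrapers.py` may work quite differently; `FacultyCardParser` and `scrape_cards` are illustrative names only.

```python
from html.parser import HTMLParser


class FacultyCardParser(HTMLParser):
    """Collect Name/Email pairs from hypothetical <div class="faculty"> cards."""

    def __init__(self):
        super().__init__()
        self.records = []       # one dict per completed faculty card
        self._current = None    # record being filled while inside a card
        self._div_depth = 0     # <div> nesting depth inside the current card
        self._in_name = False   # True while inside the card's <h3>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._current is not None:
                self._div_depth += 1  # nested div inside a card
            elif "faculty" in (attrs.get("class") or "").split():
                self._current = {"Name": "", "Email": None}
                self._div_depth = 1
        elif self._current is not None:
            if tag == "h3":
                self._in_name = True
            elif tag == "a" and (attrs.get("href") or "").startswith("mailto:"):
                self._current["Email"] = attrs["href"][len("mailto:"):]

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == "h3":
            self._in_name = False
        elif tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:  # closing tag of the card itself
                self._current["Name"] = self._current["Name"].strip()
                self.records.append(self._current)
                self._current = None

    def handle_data(self, data):
        # Text can arrive in several chunks, so append rather than assign.
        if self._current is not None and self._in_name:
            self._current["Name"] += data


def scrape_cards(html: str) -> list:
    """Parse an HTML string and return one dict per faculty card found."""
    parser = FacultyCardParser()
    parser.feed(html)
    parser.close()
    return parser.records
```

Supporting another layout under this design would mean writing another small parser with different tag and class rules, which is the kind of per-site variation the feature list describes.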

## Developer Guide

If you are new to using `nbdev`, here are some useful pointers to get you started.

### Install faculty_expertise in development mode

```sh
# make sure faculty_expertise package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to faculty_expertise
$ nbdev_prepare
```

## Usage

### Installation

Install the latest version from the GitHub [repository][repo]:

```sh
$ pip install git+https://github.com/caylor/faculty_expertise.git
```

or from [conda][conda]:

```sh
$ conda install -c caylor faculty_expertise
```

or from [pypi][pypi]:

```sh
$ pip install faculty_expertise
```
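After any of the installation routes above, one way to confirm which version Python actually picks up (for example, to tell an editable install from a PyPI release) is the standard library's `importlib.metadata`. This is a minimal sketch; the helper name `installed_version` is illustrative, and it assumes the distribution is named `faculty_expertise`, as in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "faculty_expertise") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"faculty_expertise {v}" if v else "faculty_expertise is not installed")
```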


[repo]: https://github.com/caylor/faculty_expertise
[docs]: https://caylor.github.io/faculty_expertise/
[pypi]: https://pypi.org/project/faculty_expertise/
[conda]: https://anaconda.org/caylor/faculty_expertise

### Documentation

Documentation is hosted on this GitHub [repository][repo]'s [pages][docs]. Package-manager-specific pages are also available on [conda][conda] and [pypi][pypi].


## Project Structure

- `nbs/`: Jupyter notebooks that define the code base
  - `00_core.ipynb`: Core data structures (Unit class)
  - `01_scrapers.ipynb`: HTML scrapers for different department layouts
  - `02_enrichment.ipynb`: AI enrichment and metadata extraction
  
- `faculty_expertise/`: Auto-generated Python modules from notebooks
  - `core.py`: Core data structures
  - `my_scrapers.py`: HTML scraping functions
  - `my_enrichment.py`: AI enrichment functions

- `faculty_html/`: HTML files from department websites
- `faculty_screenshots/`: Screenshots of department pages

## Quick Start Example

In [None]:
# Scrape faculty from a department
from faculty_expertise.core import Unit

# Build a Unit from the department's saved faculty-directory HTML page;
# Unit applies the scraper that matches that department's website structure
unit = Unit("Computer Science", "../faculty_html/Computer_Science.html")
df = unit.scrape()
df.head(2)

| | Name | Title(s) | Specialization | Email | Phone | Office | Website | Photo URL | Department | Unit |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Divyakant Agrawal | Distinguished Professor & Chair | | agrawal@cs.ucsb.edu | (805)893-4385 | 3117 Harold Frank Hall | https://www.cs.ucsb.edu/~agrawal/ | | Computer Science | Computer Science |
| 1 | Prabhanjan Ananth | Assistant Professor | | prabhanjan@ucsb.edu | | 1119 Harold Frank Hall | https://sites.google.com/site/prabhanjanva/ | | Computer Science | Computer Science |
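The comment in the example notes that `Unit` applies a specialized scraper for each department based on its website structure. One common way to organize that kind of dispatch is a registry keyed by department name. The sketch below assumes that design purely for illustration: `SCRAPERS`, `register_scraper`, `scrape_department`, and `scrape_toy_layout` are hypothetical names, and the real `Unit` class may select scrapers differently.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
ScraperFn = Callable[[str], List[Record]]

# Hypothetical registry: department name -> scraper for that site's layout.
SCRAPERS: Dict[str, ScraperFn] = {}


def register_scraper(*departments: str):
    """Decorator that registers one scraper function for one or more departments."""
    def decorator(fn: ScraperFn) -> ScraperFn:
        for dept in departments:
            SCRAPERS[dept] = fn
        return fn
    return decorator


def scrape_department(department: str, page: str) -> List[Record]:
    """Look up the department's scraper, run it, and tag each record with the department."""
    try:
        scraper = SCRAPERS[department]
    except KeyError:
        raise ValueError(f"No scraper registered for {department!r}") from None
    records = scraper(page)
    for record in records:
        record.setdefault("Department", department)
    return records


@register_scraper("Computer Science")
def scrape_toy_layout(page: str) -> List[Record]:
    # Toy stand-in for a real HTML scraper: one "Name | Email" pair per line.
    records = []
    for line in page.splitlines():
        if "|" in line:
            name, email = (part.strip() for part in line.split("|", 1))
            records.append({"Name": name, "Email": email})
    return records
```

With this design, adding a department means writing one function and decorating it, without touching the dispatch code; an unregistered department fails loudly instead of silently using the wrong layout.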


### Enriching Faculty Data with AI

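The library can also enrich faculty data with AI-generated summaries of each person's research expertise. `enrich_faculty_row` fetches the faculty member's website and uses OpenAI to generate a structured summary: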

In [None]:
# OPTIONAL: Enrich with AI (requires OpenAI API key set in .env file)
from faculty_expertise.my_enrichment import enrich_faculty_row

# Take the first faculty member's row as an example
row = df.iloc[0]

# Fetch the faculty website and use OpenAI to generate a structured summary
# (requires an OpenAI API key)
enriched_data = enrich_faculty_row(row)

In [None]:
# Print output of enrichment in a pretty format
from pprint import pprint
pprint(row)
pprint(enriched_data)

Name                              Divyakant Agrawal
Title(s)            Distinguished Professor & Chair
Specialization                                 None
Email                           agrawal@cs.ucsb.edu
Phone                                 (805)893-4385
Office                       3117 Harold Frank Hall
Website           https://www.cs.ucsb.edu/~agrawal/
Photo URL                                      None
Department                         Computer Science
Unit                               Computer Science
Name: 0, dtype: object
{'CV URL': 'https://www.cs.ucsb.edu/~agrawal/DivyAgrawalcv.pdf',
 'Crawled URLs': ['https://www.cs.ucsb.edu/~agrawal/',
                  'https://www.cs.ucsb.edu/~agrawal/research.html',
                  'https://www.cs.ucsb.edu/~agrawal/DivyAgrawalcv.pdf',
                  'https://www.cs.ucsb.edu/~agrawal/bio.html'],
 'Disciplines': ['Computer Science', 'Information Technology', 'Data Science'],
 'Expertise': "Dr. Divyakant Agrawal's research focus
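The enrichment output lists the pages that were visited (`Crawled URLs`) and a detected `CV URL`. How `enrich_faculty_row` chooses those pages is not shown here. Purely as an illustration, the sketch below implements one simple heuristic: keep links that stay under the homepage's URL prefix, and flag a PDF whose file name contains "cv". `collect_links` and `guess_cv_url` are hypothetical helpers, and the sketch parses already-downloaded HTML so it runs without network access.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Gather the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_links(base_url: str, html: str) -> List[str]:
    """Return base_url plus unique absolute links that stay under base_url's prefix."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    urls = [base_url]
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" so anchors don't count as new pages.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(base_url) and absolute not in urls:
            urls.append(absolute)
    return urls


def guess_cv_url(urls: List[str]) -> Optional[str]:
    """Return the first PDF link whose file name mentions 'cv', or None."""
    for url in urls:
        filename = url.rsplit("/", 1)[-1].lower()
        if filename.endswith(".pdf") and "cv" in filename:
            return url
    return None
```

The prefix rule keeps a crawl from wandering off to external sites, and the "cv" substring check is deliberately crude; it can produce false positives on file names that merely contain those letters.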

## License

This project is licensed under the MIT License - see the LICENSE file for details.