# UVM Course-Faculty Network Analysis

This notebook demonstrates how to programmatically interact with the UVM course enrollment database and perform custom network analyses.

## Setup

First, let's import the necessary modules and initialize the database connection.

In [None]:
import sys
sys.path.insert(0, '..')

from src.database import Database
from src.network_analysis import NetworkAnalyzer
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd

%matplotlib inline
# The 'seaborn-v0_8-*' style names require Matplotlib 3.6 or newer
plt.style.use('seaborn-v0_8-darkgrid')

## Database Statistics

Let's explore what data we have in the database.

In [None]:
with Database() as db:
    stats = db.get_statistics()

print("Database Statistics:")
for key, value in stats.items():
    print(f"  {key}: {value}")

## Explore Departments

Let's see what departments we have and how many courses each offers.

In [None]:
with Database() as db:
    db.cursor.execute('''
        SELECT d.code, COUNT(DISTINCT c.id) as course_count
        FROM departments d
        LEFT JOIN courses c ON d.id = c.department_id
        GROUP BY d.id
        ORDER BY course_count DESC
    ''')
    dept_data = db.cursor.fetchall()

dept_df = pd.DataFrame(dept_data, columns=['Department', 'Course Count'])
print(dept_df)

# Visualize
plt.figure(figsize=(12, 6))
plt.barh(dept_df['Department'], dept_df['Course Count'])
plt.xlabel('Number of Courses')
plt.title('Courses by Department')
plt.gca().invert_yaxis()  # put the department with the most courses at the top
plt.tight_layout()
plt.show()
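
The `LEFT JOIN` in the query above matters: it keeps departments that have no courses, which then appear with a count of zero. The sketch below illustrates the difference from an inner join on an in-memory SQLite database. The tables are toy stand-ins containing only the columns the query touches, and the use of SQLite is an assumption about how `Database` is backed.

```python
import sqlite3

# Toy schema (hypothetical data): HIST has no courses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'CS'), (2, 'HIST');
    INSERT INTO courses VALUES (1, 1), (2, 1);
""")

# Same query shape as the notebook cell; only the join type varies.
query = """
    SELECT d.code, COUNT(DISTINCT c.id) AS course_count
    FROM departments d
    {join} courses c ON d.id = c.department_id
    GROUP BY d.id
    ORDER BY course_count DESC
"""
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
conn.close()

print(left_rows)   # [('CS', 2), ('HIST', 0)]
print(inner_rows)  # [('CS', 2)]
```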

## Build and Analyze Networks

Let's build a bipartite network linking courses to the faculty who taught them between 2020 and 2024.

In [None]:
with Database() as db:
    analyzer = NetworkAnalyzer(db)
    
    # Build bipartite network
    G = analyzer.build_bipartite_network(start_year=2020, end_year=2024)

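print("Network Statistics:")
print(f"  Nodes: {G.number_of_nodes()}")
print(f"  Edges: {G.number_of_edges()}")
# nx.density counts every node pair as a possible edge, so it understates
# density for a bipartite graph, where only faculty-course pairs can be linked.
print(f"  Density: {nx.density(G):.4f}")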
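
For a bipartite graph, NetworkX also offers a density measure that counts only cross-set pairs as possible edges. The sketch below compares the two on a toy graph. The node names are hypothetical, and whether `build_bipartite_network` tags its nodes in a way that lets you list one node set is an assumption to check against your own graph.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite graph: two faculty nodes and three course nodes (hypothetical).
B = nx.Graph()
faculty = ["f1", "f2"]
courses = ["c1", "c2", "c3"]
B.add_nodes_from(faculty, bipartite=0)
B.add_nodes_from(courses, bipartite=1)
B.add_edges_from([("f1", "c1"), ("f1", "c2"), ("f2", "c2"), ("f2", "c3")])

# General density: 4 edges out of 10 possible node pairs.
general_density = nx.density(B)
# Bipartite density: 4 edges out of 2 * 3 possible faculty-course pairs.
bipartite_density = bipartite.density(B, faculty)

print(f"general:   {general_density:.4f}")    # 0.4000
print(f"bipartite: {bipartite_density:.4f}")  # 0.6667
```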

## Faculty Collaboration Network

Create a network where faculty members are connected if they taught the same course.

In [None]:
with Database() as db:
    analyzer = NetworkAnalyzer(db)
    faculty_net = analyzer.build_faculty_collaboration_network(start_year=2020, end_year=2024)

print("Faculty Network Statistics:")
print(f"  Nodes: {faculty_net.number_of_nodes()}")
print(f"  Edges: {faculty_net.number_of_edges()}")
print(f"  Density: {nx.density(faculty_net):.4f}")
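
One common way to obtain such a network is to project a bipartite faculty-course graph onto the faculty nodes. The sketch below uses NetworkX's `weighted_projected_graph` on toy data; it is not necessarily how `build_faculty_collaboration_network` is implemented, and the node names are hypothetical.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite graph: three faculty, two courses (hypothetical).
B = nx.Graph()
faculty = ["f1", "f2", "f3"]
B.add_nodes_from(faculty, bipartite=0)
B.add_nodes_from(["c1", "c2"], bipartite=1)
B.add_edges_from([
    ("f1", "c1"), ("f2", "c1"),                # c1 taught by f1 and f2
    ("f1", "c2"), ("f2", "c2"), ("f3", "c2"),  # c2 taught by all three
])

# Project onto faculty: edge weight = number of courses two faculty share.
collab = bipartite.weighted_projected_graph(B, faculty)
weights = {frozenset((u, v)): d["weight"] for u, v, d in collab.edges(data=True)}

for pair, weight in sorted(weights.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), weight)
```

Here `f1` and `f2` share two courses, so their edge carries weight 2, while each pair involving `f3` carries weight 1.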

## Centrality Analysis

Calculate degree centrality for each faculty member and plot the 15 highest-scoring.

In [None]:
# Calculate degree centrality
degree_centrality = nx.degree_centrality(faculty_net)

# Get top 15 faculty by degree centrality
top_faculty = sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)[:15]

# Extract names and scores
names = [faculty_net.nodes[f[0]].get('name', f[0]) for f in top_faculty]
scores = [f[1] for f in top_faculty]

# Plot
plt.figure(figsize=(12, 8))
plt.barh(names, scores)
plt.xlabel('Degree Centrality')
plt.title('Top 15 Faculty by Degree Centrality (2020-2024)')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
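
Degree centrality only counts direct connections. Betweenness centrality instead highlights nodes that bridge otherwise separate groups, and the two rankings can disagree. The sketch below shows this on a toy graph of two triangles joined through a single bridge node; the graph is illustrative and not drawn from the UVM data.

```python
import networkx as nx

# Two triangles (a, b, c) and (e, f, g) joined through bridge node d.
toy = nx.Graph()
toy.add_edges_from([
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"), ("d", "e"),
    ("e", "f"), ("f", "g"), ("e", "g"),
])

degree = nx.degree_centrality(toy)
betweenness = nx.betweenness_centrality(toy)

top_by_degree = max(degree, key=degree.get)
top_by_betweenness = max(betweenness, key=betweenness.get)

print(f"degree(d) = {degree['d']:.3f}, degree(c) = {degree['c']:.3f}")
print(f"betweenness(d) = {betweenness['d']:.3f}, betweenness(c) = {betweenness['c']:.3f}")
```

The bridge node `d` has only two neighbors, yet every shortest path between the two triangles passes through it, so it ranks first by betweenness but not by degree. The same calls can be applied to `faculty_net`.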

## Interdisciplinary Analysis

Find faculty members who teach across multiple departments.

In [None]:
with Database() as db:
    analyzer = NetworkAnalyzer(db)
    G = analyzer.build_bipartite_network(start_year=2020, end_year=2024)
    interdisciplinary = analyzer.identify_interdisciplinary_connections(G)

# Create DataFrame
inter_df = pd.DataFrame(interdisciplinary[:20])
print("\nTop 20 Interdisciplinary Faculty:")
print(inter_df[['faculty', 'num_departments', 'num_courses']])

# Visualize distribution (one bin per department count, bars centered on the integer)
dept_counts = [f['num_departments'] for f in interdisciplinary]
plt.figure(figsize=(10, 6))
plt.hist(dept_counts, bins=range(2, max(dept_counts)+2), align='left', edgecolor='black', alpha=0.7)
plt.xlabel('Number of Departments')
plt.ylabel('Number of Faculty')
plt.title('Distribution of Interdisciplinary Teaching')
plt.tight_layout()
plt.show()
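
To make the idea concrete, the sketch below counts distinct departments per faculty member on a toy bipartite graph. The `kind` and `department` node attributes and the helper `departments_per_faculty` are hypothetical; the real graph may store this information differently, and `identify_interdisciplinary_connections` may use another approach.

```python
import networkx as nx

# Toy graph with assumed node attributes (hypothetical schema).
B = nx.Graph()
B.add_node("f1", kind="faculty")
B.add_node("f2", kind="faculty")
B.add_node("CS101", kind="course", department="CS")
B.add_node("CS202", kind="course", department="CS")
B.add_node("MATH201", kind="course", department="MATH")
B.add_edges_from([
    ("f1", "CS101"), ("f1", "MATH201"),  # f1 teaches in two departments
    ("f2", "CS101"), ("f2", "CS202"),    # f2 teaches in one department
])

def departments_per_faculty(graph):
    """Return faculty who teach in two or more departments, most departments first."""
    rows = []
    for node, attrs in graph.nodes(data=True):
        if attrs.get("kind") != "faculty":
            continue
        courses = list(graph.neighbors(node))
        depts = {graph.nodes[c].get("department") for c in courses}
        depts.discard(None)  # ignore courses without a department
        if len(depts) >= 2:
            rows.append({
                "faculty": node,
                "num_departments": len(depts),
                "num_courses": len(courses),
            })
    return sorted(rows, key=lambda r: r["num_departments"], reverse=True)

toy_result = departments_per_faculty(B)
print(toy_result)
```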

## Custom Query Example

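You can also run custom SQL queries against the database. The example below finds the courses with the highest average enrollment among those offered at least three times.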

In [None]:
with Database() as db:
    # Example: Find courses with highest average enrollment
    db.cursor.execute('''
        SELECT c.full_code, c.course_title, 
               AVG(co.enrollment) as avg_enrollment,
               COUNT(co.id) as num_offerings
        FROM courses c
        JOIN course_offerings co ON c.id = co.course_id
        WHERE co.enrollment IS NOT NULL
        GROUP BY c.id
        HAVING num_offerings >= 3
        ORDER BY avg_enrollment DESC
        LIMIT 20
    ''')
    
    results = db.cursor.fetchall()

enrollment_df = pd.DataFrame(results, 
    columns=['Course Code', 'Title', 'Avg Enrollment', 'Offerings'])
print("\nCourses with Highest Average Enrollment:")
print(enrollment_df)

# Visualize
plt.figure(figsize=(12, 8))
plt.barh(enrollment_df['Course Code'], enrollment_df['Avg Enrollment'])
plt.xlabel('Average Enrollment')
plt.title('Top 20 Courses by Average Enrollment')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
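
When a threshold such as the minimum number of offerings should vary, it is safer to bind it as a query parameter than to format it into the SQL string. The sketch below assumes a SQLite backend, which uses `?` placeholders, and runs against an in-memory toy database. The helper `top_courses` and the sample rows are hypothetical.

```python
import sqlite3

# Toy tables with only the columns the query needs (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (id INTEGER PRIMARY KEY, full_code TEXT, course_title TEXT);
    CREATE TABLE course_offerings (id INTEGER PRIMARY KEY, course_id INTEGER, enrollment INTEGER);
    INSERT INTO courses VALUES (1, 'CS 1010', 'Intro'), (2, 'MATH 2010', 'Calculus');
    INSERT INTO course_offerings (course_id, enrollment) VALUES
        (1, 100), (1, 120), (1, 110), (2, 30), (2, 40);
""")

def top_courses(connection, min_offerings, limit):
    """Return (code, average enrollment, offerings) rows, highest average first."""
    sql = """
        SELECT c.full_code, AVG(co.enrollment) AS avg_enrollment, COUNT(co.id) AS num_offerings
        FROM courses c
        JOIN course_offerings co ON c.id = co.course_id
        WHERE co.enrollment IS NOT NULL
        GROUP BY c.id
        HAVING COUNT(co.id) >= ?
        ORDER BY avg_enrollment DESC
        LIMIT ?
    """
    # Values are bound by the driver, not interpolated into the SQL text.
    return connection.execute(sql, (min_offerings, limit)).fetchall()

strict = top_courses(conn, min_offerings=3, limit=20)
loose = top_courses(conn, min_offerings=2, limit=20)
conn.close()

print(strict)  # [('CS 1010', 110.0, 3)]
print(loose)   # [('CS 1010', 110.0, 3), ('MATH 2010', 35.0, 2)]
```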

## Export Data for Further Analysis

Convert the faculty collaboration network to an edge list that can be saved as CSV for use in other tools.

In [None]:
# Build the faculty collaboration network to export
with Database() as db:
    analyzer = NetworkAnalyzer(db)
    faculty_net = analyzer.build_faculty_collaboration_network(start_year=2020, end_year=2024)

# Convert to DataFrame
edges = []
for u, v, data in faculty_net.edges(data=True):
    edges.append({
        'source': faculty_net.nodes[u].get('name', u),
        'target': faculty_net.nodes[v].get('name', v),
        'weight': data.get('weight', 1)
    })

edges_df = pd.DataFrame(edges)
print("\nFaculty Collaboration Network Edges:")
print(edges_df.head(10))

# Save to CSV if needed
# edges_df.to_csv('faculty_network_edges.csv', index=False)
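
NetworkX also has built-in helpers for this kind of export. The sketch below builds an edge table with `nx.to_pandas_edgelist` and round-trips a graph through GraphML, a format that tools such as Gephi can read. The toy graph and file name are hypothetical, and the temporary directory stands in for wherever you choose to save the file.

```python
import os
import tempfile

import networkx as nx

# Toy weighted collaboration graph (hypothetical names).
toy = nx.Graph()
toy.add_edge("A. Smith", "B. Jones", weight=2)
toy.add_edge("B. Jones", "C. Lee", weight=1)

# Edge list as a DataFrame with 'source', 'target', and edge-attribute columns.
edge_table = nx.to_pandas_edgelist(toy)
print(edge_table)

# Write GraphML to a temporary directory and read it back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "faculty_network.graphml")
    nx.write_graphml(toy, path)
    restored = nx.read_graphml(path)

print(restored.number_of_nodes(), restored.number_of_edges())
```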

## Conclusion

This notebook demonstrates the basic functionality of the UVM course enrollment analysis system. You can:

- Query the database directly using SQL
- Build and analyze different types of networks
- Calculate various network metrics
- Create custom visualizations
- Export data for further analysis

Feel free to modify and extend this notebook for your specific research questions!