# Data Extraction from Meeting Minute PDFs

The PDFs are UCSB AS F&B meeting minutes, publicly available at [AS F&B Committee Minutes](https://asfb.as.ucsb.edu/minutes2018-2019/). We use the minutes from the Fall 2024 and Winter 2025 quarters.

In [1]:
import pdfplumber
import re
import pandas as pd
import logging

## Motion Extraction

Convert the PDFs to text and collect every motion passed by the committee.
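
The two steps can be tried without any PDF files. The following is a minimal sketch, assuming the extracted page text contains an "Action Items" heading followed by blocks delimited by "Motion Language:" and "ACTION: Passed". The names `SAMPLE_PAGES`, `collect_after_heading`, and `find_passed_motions` are hypothetical, and the sample text is invented for illustration rather than taken from the minutes.

```python
import re

# Hypothetical page texts that mimic the assumed layout of the minutes.
# They are invented for illustration and not copied from a real PDF.
SAMPLE_PAGES = [
    "Call to order\nRoll call\n",
    "Action Items\nMotion Language: Motion to fully fund Example Club $100 from\n"
    "Example Fund\nACTION: Passed\n",
    "Motion Language: Motion to table Another Club for one week.\nACTION: Passed\n",
]


def collect_after_heading(pages, heading="action items"):
    """Concatenate page texts, starting at the first page that mentions the heading."""
    collecting = False
    parts = []
    for text in pages:
        text = text or ""  # treat pages without extractable text as empty
        if not collecting and heading in text.lower():
            collecting = True
        if collecting:
            parts.append(text)
    return "".join(parts)


def find_passed_motions(text):
    """Return the lowercased text between 'motion language:' and 'action: passed'."""
    pattern = r"motion language:(.*?)action: passed"
    # re.DOTALL lets '.' match newlines, so a motion may wrap across lines.
    return re.findall(pattern, text.lower(), flags=re.DOTALL)


motions = find_passed_motions(collect_after_heading(SAMPLE_PAGES))
for motion in motions:
    print(repr(motion))
```

One property of this pattern to keep in mind: the non-greedy match runs from a "motion language:" marker to the next "action: passed". If a motion were recorded with a different outcome, its text would be captured together with the passed motion that follows it.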

In [5]:
# Ignore non-critical warnings from pdfminer through pdfplumber
logging.getLogger("pdfminer").setLevel(logging.ERROR)

# Folder containing the PDFs (UCSB AS F&B meeting minutes, publicly available; see above)
pdf_folder = "meeting-mins-pdfs/"

# Convert the PDF pages from the "action items" section onward into text,
# then return the list of passed motions found in that text
def motions_text_from_pdf(pdf_path):

    collecting = False
    out = ''
    
    with pdfplumber.open(pdf_path) as pdf:
         
        for page in pdf.pages: 
            
            # Some pages (e.g. image-only ones) may yield no text; fall back to an empty string
            text = page.extract_text() or ""

            if not collecting:

                if "action items" in text.lower():
                    collecting = True
                    
            if collecting:
                out += text
                
        return find_motions(out)

# Look for and return the list of passed motions found in the text
def find_motions(text):

    pattern = r"motion language:(.*?)action: passed"
    motions = re.findall(pattern, text.lower(), flags=re.DOTALL)
    return motions
    

print(motions_text_from_pdf(pdf_folder + "10.07.2024 Finance Committee Meeting Minutes.pdf"))

[': motion to fully fund debate team at ucsb $5,000 from academic\nteams\n', ': motion to fully fund united dance company $1,225 from seal fall\nquarter fund.', ': motion to fully fund sigma alpha zeta $3,200 from culture and\ngrad\n', ': motion to table korean american student association for one\nweek.\n', ': motion to table speech forensics at ucsb for one week.\n', ': motion to fully fund association of computing machinery at ucsb\n$537.50 from seal fall quarter fund.\n', ': motion to fully fund ucsb model united nations $5,000 from\nacademic teams\n', ': motion to fully fund ucsbreakin’ $850 from seal fall quarter\nfund.\n', ': motion to fully fund pre-law society $250 from seal fall quarter fund.\n', ': motion to strike the motion to fund pre-law society $250 from\nseal fall quarter fund\n', ': motion to forward los ingenieros requesting $3,500 to cab for further\nconsideration\n', ': motion to fully fund mock trial $4,728 from academic teams\n', ': motion to reaffirm chabad ucsb