Senate Committee Video Scraper

About

Surprisingly, there is no central repository of past Senate hearings across all committees. Each committee's website has its own table of hearings, but no single table contains them all. This project intends to change that. By collecting links to the videos for hearings across committees, it should make searching for a given hearing easier than navigating an individual committee's website.

Within senateVideos, there is a CSV file for each major Senate committee, along with MasterFile.csv, which combines all of those files.
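
As a rough illustration of searching the combined file, the sketch below scans every field of each row for a keyword. It is only a sketch under stated assumptions: the function name search_hearings is hypothetical, the path to MasterFile.csv is assumed from the description above, and no column names are assumed because the file's schema is not documented here. It uses the standard library csv module rather than Pandas so that it runs without extra dependencies.

```python
import csv
from pathlib import Path


def search_hearings(csv_path, keyword):
    """Return the rows in which any field contains keyword (case-insensitive).

    Hypothetical helper: it is not part of the repository's scripts.
    """
    needle = keyword.lower()
    matches = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # No column names are assumed, so every field in the row is checked.
            if any(needle in str(value or "").lower() for value in row.values()):
                matches.append(row)
    return matches


if __name__ == "__main__":
    # Assumed location, taken from the description above; adjust as needed.
    master = Path("senateVideos") / "MasterFile.csv"
    if master.exists():
        for hit in search_hearings(master, "nomination"):
            print(hit)
```

Checking every field is slower than filtering on a known column, but it works regardless of how the CSV headers are named.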

Requirements

A requirements.txt file is included in the repository. While more than three packages are listed, the three main packages the scraper relies on are:

Beautiful Soup
Pandas
Requests

To get set up, run MergeSenateVideoFiles.py, which runs all the committee scrapers and then merges their output into a master CSV file. Alternatively, you can run an individual scraper from SenateVideoScrapers.
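
The merge step can be pictured roughly as follows. This is a minimal sketch, not the contents of MergeSenateVideoFiles.py: it does not run any scrapers, the function name merge_committee_csvs and its parameters are hypothetical, and it uses the standard library csv module in place of Pandas. It assumes the per-committee CSV files sit in one folder and that their columns may differ slightly.

```python
import csv
from pathlib import Path


def merge_committee_csvs(input_dir, output_path, skip_name="MasterFile.csv"):
    """Concatenate every CSV in input_dir into one file and return the row count.

    Hypothetical helper: the repository's own merge script may work differently.
    """
    rows, fieldnames = [], []
    for csv_file in sorted(Path(input_dir).glob("*.csv")):
        if csv_file.name == skip_name:
            continue  # do not fold an earlier master file back into itself
        with open(csv_file, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)  # keep first-seen column order
            rows.extend(reader)
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        # Columns missing from a given file are written as empty strings.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `merge_committee_csvs("senateVideos", "senateVideos/MasterFile.csv")` would rebuild the master file from whatever committee files are present, assuming that folder layout.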

Leschonander/SenateVideoScraper
