A Python application that scrapes Ultimate Guitar user contribution pages to extract song data and generate markdown lists with Hugo shortcode formatting.
- Web Scraping: Automatically scrapes Ultimate Guitar contribution pages with pagination support
- HTML File Processing: Process saved HTML files with wildcard support (e.g., *.html)
- Markdown Generation: Outputs formatted markdown with Hugo shortcodes for web display
- Duplicate Detection: Prevents duplicate entries when processing multiple pages
- Error Handling: Robust error handling for network issues and parsing problems
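
As a rough illustration of the wildcard support and duplicate detection listed above, the sketch below expands file patterns with the standard library and filters repeated entries. The function names `expand_patterns` and `dedupe_songs` are hypothetical, as is the assumption that songs are dictionaries identified by their URL; the actual logic in song_scraper.py may differ.

```python
import glob


def expand_patterns(patterns):
    """Expand shell-style wildcards such as *.html into a list of unique paths."""
    paths = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # Keep an argument that matched nothing so the caller can report it.
        paths.extend(matches or [pattern])
    # dict.fromkeys removes repeats while preserving first-seen order.
    return list(dict.fromkeys(paths))


def dedupe_songs(songs):
    """Drop entries whose URL was already seen, keeping the first occurrence.

    Assumption: each song is a dict with a "url" key that identifies it.
    """
    seen = set()
    unique = []
    for song in songs:
        if song["url"] not in seen:
            seen.add(song["url"])
            unique.append(song)
    return unique
```

Expanding patterns inside the program rather than relying on the shell is what allows a quoted argument such as "*.html" to work on every platform.
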
- Clone this repository:

      git clone https://github.com/yourusername/UGExtractPython.git
      cd UGExtractPython

- Install required dependencies:

      pip install -r requirements.txt

Scrape all songs from the web:

    python main.py

Process saved HTML files:

    python main.py "*.html"
    python main.py "page1.html" "page2.html"

Redirect output to a markdown file:

    python main.py > songs.md

The application generates markdown in this format:
107 songs I've transcribed and shared as of August 03, 2025:
* Artist Name - {{<rawhtml>}}<a href="https://tabs.ultimate-guitar.com/tab/..." target="blank">Song Title</a>{{</rawhtml>}} (chords)

Dependencies:

- requests - HTTP library for web scraping
- beautifulsoup4 - HTML parsing library

Project files:

- main.py - Entry point with command line argument processing
- song_scraper.py - Core scraping and processing logic
- song.py - Song data model
- song_type.py - File type enumeration (chords, tabs, guitar pro)
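
To make the output format and the data model more concrete, here is a minimal sketch of how a song record could be rendered as one line of the list. The class layout, field names, and the `to_markdown` and `render_list` helpers are assumptions for illustration and are not taken from song.py or song_type.py.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

# Hugo shortcode markers that wrap the raw HTML link in the output.
RAW_OPEN = "{{<rawhtml>}}"
RAW_CLOSE = "{{</rawhtml>}}"


class SongType(Enum):
    # Labels are guesses based on the types listed above.
    CHORDS = "chords"
    TABS = "tabs"
    GUITAR_PRO = "guitar pro"


@dataclass(frozen=True)
class Song:
    artist: str
    title: str
    url: str
    song_type: SongType

    def to_markdown(self) -> str:
        """Render one bullet in the format shown above."""
        link = f'<a href="{self.url}" target="blank">{self.title}</a>'
        return f"* {self.artist} - {RAW_OPEN}{link}{RAW_CLOSE} ({self.song_type.value})"


def render_list(songs, today):
    """Build the header line followed by one bullet per song."""
    header = f"{len(songs)} songs I've transcribed and shared as of {today.strftime('%B %d, %Y')}:"
    return "\n".join([header] + [song.to_markdown() for song in songs])


song = Song("Artist Name", "Song Title", "https://tabs.ultimate-guitar.com/tab/...", SongType.CHORDS)
print(render_list([song], date(2025, 8, 3)))
```

Keeping the shortcode markers in constants avoids escaping the doubled braces inside an f-string.
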
The application is currently configured to scrape user ID 6193383-gusp3r from Ultimate Guitar. To change this, edit the user ID in the import_web() method in song_scraper.py.
This project is for personal use. Please respect Ultimate Guitar's terms of service and rate limits when using this tool.
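
One way to stay within rate limits is to pause between page requests and stop cleanly on a network failure. The sketch below is only an illustration of that idea: `fetch_all_pages`, the `fetch_page` callback, the two-second delay, and the page cap are all hypothetical and do not describe how song_scraper.py currently behaves.

```python
import sys
import time


def fetch_all_pages(fetch_page, delay_seconds=2.0, max_pages=50, sleep=time.sleep):
    """Collect pages one at a time, pausing between requests.

    fetch_page(page_number) is a caller-supplied function (hypothetical here)
    that returns the page's HTML, or None when there are no more pages.
    """
    pages = []
    for page_number in range(1, max_pages + 1):
        if page_number > 1:
            sleep(delay_seconds)  # pause before every request after the first
        try:
            html = fetch_page(page_number)
        except OSError as exc:
            # Network errors, including those raised by requests, derive from OSError.
            # Report on stderr so redirected markdown output stays clean.
            print(f"Stopping at page {page_number}: {exc}", file=sys.stderr)
            break
        if html is None:
            break
        pages.append(html)
    return pages
```

Passing the fetch and sleep functions as parameters keeps the loop easy to exercise without touching the network.
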