# The Broadcastify Archive Toolkit for Python
# Based on the `broadcastify-archtk` Demo

## Supply a Webdriver Path


[Current ChromeDriver downloads (Chrome for Testing)](https://googlechromelabs.github.io/chrome-for-testing/#stable)

[User Guide: Installing the WebDriver](https://ljhopkins2.github.io/broadcastify-archtk/user-guide/installation.html#installing-the-webdriver)

If ChromeDriver is in a directory listed in your operating system's `PATH` environment variable, you can leave this cell alone.

If not, provide the path to the WebDriver executable.

We recommend downloading ChromeDriver and placing it in a `chromedriver` folder next to this notebook.
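If you are not sure whether ChromeDriver is already on your `PATH`, a check like the following can choose the path for you. It is a minimal sketch built on the standard library's `shutil.which`; `resolve_webdriver_path` is a hypothetical helper (not part of `broadcastify-archtk`), and its fallback is the recommended `.\chromedriver\chromedriver.exe` location.

```python
import shutil


def resolve_webdriver_path(fallback=r'.\chromedriver\chromedriver.exe',
                           executable='chromedriver'):
    """Return the ChromeDriver found on PATH, else the local fallback path.

    Hypothetical helper for illustration; not part of broadcastify-archtk.
    """
    found = shutil.which(executable)  # None if the executable is not on PATH
    return found if found is not None else fallback


webdriver_path = resolve_webdriver_path()
print(webdriver_path)
```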

In [1]:
webdriver_path = r'.\chromedriver\chromedriver.exe'  # recommended driver location (relative Windows path)

### Install dependencies

In [None]:
!pip install broadcastify_archtk
!pip install selenium
!pip install lxml
!pip install --upgrade jupyter ipywidgets

### Test Selenium
Running this cell should open Chrome at google.com, which confirms that Selenium and the WebDriver are working.

In [2]:
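from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path=webdriver_path))
driver.get('http://www.google.com')
# driver.quit()  # close the test browser when you are done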

## Import the Package

[User Guide: Importing the package](https://ljhopkins2.github.io/broadcastify-archtk/user-guide/creating-an-archive.html#importing-the-package)

In [2]:
from btk import BroadcastifyArchive
import datetime as dt
import configparser
import threading
import os

## Instantiate the Toolkit

[User Guide: Instantiating the toolkit](https://ljhopkins2.github.io/broadcastify-archtk/user-guide/creating-an-archive.html#instantiating-the-toolkit)

Choose a feed to test:
- from https://www.broadcastify.com/listen/
    - click through the map to a feed of your choice
    - grab the `feed_id` from the URL (`www.broadcastify.com/listen/feed/[feed_id]`)
- or pick one from the `TEST_FEED_IDs` list further down
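The `feed_id` is the last path segment of a feed URL, so it can be pulled out programmatically. This is a minimal sketch, assuming feed URLs keep the `/listen/feed/[feed_id]` form described above; `feed_id_from_url` is a hypothetical helper. It returns the ID as a string, matching how feed IDs are written elsewhere in this notebook.

```python
import re
from urllib.parse import urlparse

# Matches the numeric ID at the end of /listen/feed/<feed_id> (optional trailing slash)
_FEED_PATH = re.compile(r'/listen/feed/(\d+)/?$')


def feed_id_from_url(url):
    """Extract the feed_id from a Broadcastify feed URL (hypothetical helper)."""
    if '://' not in url:
        url = 'https://' + url  # allow URLs pasted without a scheme
    match = _FEED_PATH.search(urlparse(url).path)
    if match is None:
        raise ValueError(f'No feed_id found in {url!r}')
    return match.group(1)


print(feed_id_from_url('www.broadcastify.com/listen/feed/30659'))  # prints 30659
```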

### Supply Login Credentials

For the full demo, get a premium account for Broadcastify (see [User Guide: Getting through the paywall](https://ljhopkins2.github.io/broadcastify-archtk/user-guide/installation.html#getting-through-the-paywall)).

Without a premium account you can build the archive (Step 4), but you cannot download audio files (Step 5).


Once you have your Broadcastify account set up, you have two options:

**OPTION 1**. Put your username and password in a `config.ini` file (a `[Broadcastify]` section with `Username` and `Password` keys); the cell below reads them from there.

In [3]:
## OPTION 1: Read the username & password for a valid Broadcastify premium account from config.ini


config = configparser.ConfigParser()
config.read('config.ini')

USERNAME = config['Broadcastify']['Username']
PASSWORD = config['Broadcastify']['Password']
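The Option 1 cell reads `Username` and `Password` from a `[Broadcastify]` section of `config.ini`. The sketch below shows that file layout with placeholder values, plus a hypothetical `load_credentials` helper that gives a clearer message than the bare `KeyError` you get when `config.read` silently skips a missing file. It parses a string with `read_string` so it runs without a file on disk, and it uses its own variable names so it does not overwrite `USERNAME` and `PASSWORD`.

```python
import configparser

# Example config.ini contents (placeholder credentials, not real ones)
EXAMPLE_INI = """
[Broadcastify]
Username = your_username
Password = your_password
"""


def load_credentials(config):
    """Return (username, password) from a parsed ConfigParser, with a clear error."""
    # ConfigParser.read() skips missing files silently, so a missing section
    # usually means config.ini was not found
    if not config.has_section('Broadcastify'):
        raise KeyError('No [Broadcastify] section; is config.ini in the working directory?')
    section = config['Broadcastify']
    return section['Username'], section['Password']


example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_INI)  # in the notebook: config.read('config.ini')
example_user, example_pass = load_credentials(example_config)
print(example_user)  # prints your_username
```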

**OPTION 2**. Create a configuration file (see [User Guide: Password configuration files](https://ljhopkins2.github.io/broadcastify-archtk/user-guide/creating-an-archive.html#password-configuration-files)) and supply the absolute path to the file in the cell below.

In [4]:
## OPTION 2: Create a password configuration file, and supply its absolute path below

login_path = None

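### Instantiate a `BroadcastifyArchive` object

In this notebook the `BroadcastifyArchive` object is created inside `process_feed` further down, once per feed, using the Option 1 credentials. To use Option 2 instead, pass `login_cfg_path=login_path` in place of `username` and `password`.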

## Build the Archive

[User Guide: Building the archive](https://ljhopkins2.github.io/broadcastify-archtk/user-guide/building-the-archive.html#building-the-archive)

The code below builds the archive for the period from 300 days ago to 5 days ago; change the `timedelta` values to use a different window.
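The archive window is just two day offsets subtracted from today. The sketch below wraps that arithmetic in a hypothetical `archive_window` helper and evaluates it against a fixed date so the result is reproducible; it does not call `broadcastify-archtk`.

```python
import datetime as dt


def archive_window(days_back_start, days_back_end, today=None):
    """Return (start_date, end_date), offset back from today by the given numbers of days."""
    if days_back_start < days_back_end:
        raise ValueError('days_back_start must be >= days_back_end')
    if today is None:
        today = dt.date.today()
    return (today - dt.timedelta(days=days_back_start),
            today - dt.timedelta(days=days_back_end))


# Same offsets as the notebook (300 and 5 days back), evaluated against a fixed date
example_start, example_end = archive_window(300, 5, today=dt.date(2024, 1, 5))
print(example_start, example_end)  # prints 2023-03-11 2023-12-31
```

For a one-week window ending three days ago, call `archive_window(10, 3)`.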

## Download Audio Files

[User Guide: Downloading audio files](https://ljhopkins2.github.io/broadcastify-archtk/user-guide/downloading-audio-files.html#downloading-audio-files)

Set `mp3_path` to the absolute path of the directory where the downloaded audio files will be stored; each feed is saved in its own subfolder named after its `feed_id`.
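A long download run can fail late if the target directory is wrong. The hypothetical `check_download_dir` helper below is a sketch that checks the path is absolute, creates it if needed, and confirms it is writable by creating and discarding a temporary file. The demo uses a throwaway temporary directory in place of `mp3_path`.

```python
import os
import tempfile


def check_download_dir(path):
    """Check that path is absolute, create it if needed, and confirm it is writable."""
    if not os.path.isabs(path):
        raise ValueError(f'Expected an absolute path, got {path!r}')
    os.makedirs(path, exist_ok=True)
    # Create and immediately discard a temporary file to prove write access
    with tempfile.NamedTemporaryFile(dir=path):
        pass
    return path


# Demo with a throwaway directory standing in for mp3_path
demo_dir = check_download_dir(os.path.abspath(tempfile.mkdtemp()))
print(demo_dir)
```

In the notebook, call `check_download_dir(mp3_path)` once `mp3_path` is set.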

In [11]:
# Build the archive from 300 days ago to 5 days ago
start_date = (dt.datetime.now() - dt.timedelta(days=300)).date()
end_date = (dt.datetime.now() - dt.timedelta(days=5)).date()

print(f'start_date: {start_date}, end_date: {end_date}')

# Directory where the downloaded MP3s will be stored
# mp3_path = './downloaded_files/'
mp3_path = r'C:\Users\linoa\Downloads\audio/'

start_date: 2023-03-11, end_date: 2023-12-31


In [12]:
# TEST_FEED_ID = '30659' # Ballard Marine - Ch 13, 14 and 16
TEST_FEED_IDs = ['41152', '30659', '26694', '38117', '36119', '37404', '37460', '38764',
                 '40658', '38382', '22851', '31613', '38236', '37640',
                 '17329', '31445', '22612', '26383', '33765']  # all feeds to download

In [13]:
# Download from 22:00 on the day 300 days before end_date until 02:00 on end_date
download_start_time = dt.datetime.combine(end_date - dt.timedelta(days=300), dt.time(22, 0))
download_end_time = dt.datetime.combine(end_date, dt.time(2, 0))
print(f'Downloading archives from {download_start_time} to {download_end_time}')


Downloading archives from 2023-03-06 22:00:00 to 2023-12-31 02:00:00
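The cell above pins the download window to 22:00 on its first day and 02:00 on its last. The hypothetical `download_window` helper below reproduces that arithmetic with `dt.time` and checks it against the logged window.

```python
import datetime as dt


def download_window(end_date, days_back, start_hour=22, end_hour=2):
    """Return (start, end): start_hour on end_date - days_back, end_hour on end_date."""
    start = dt.datetime.combine(end_date - dt.timedelta(days=days_back), dt.time(start_hour, 0))
    end = dt.datetime.combine(end_date, dt.time(end_hour, 0))
    return start, end


window_start, window_end = download_window(dt.date(2023, 12, 31), days_back=300)
print(window_start, window_end)  # prints 2023-03-06 22:00:00 2023-12-31 02:00:00
```

Note that the logged download start (2023-03-06) is a few days earlier than `start_date` (2023-03-11), because the 300-day offset is counted back from `end_date` rather than from today.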


In [14]:
def process_feed(TEST_FEED_ID):
    """Build the archive for one feed, then download its MP3s."""
    try:
        print(f'Building archive for feed {TEST_FEED_ID}')
        archive = BroadcastifyArchive(TEST_FEED_ID,
                                      # login_cfg_path=login_path,
                                      username=USERNAME,
                                      password=PASSWORD,
                                      webdriver_path=webdriver_path)
        archive.build(start=start_date, end=end_date)

        # Create a per-feed output directory (no error if it already exists)
        out_path = os.path.join(mp3_path, TEST_FEED_ID)
        os.makedirs(out_path, exist_ok=True)

        # Download the mp3s for the given time range
        out_path += '/'
        archive.download(start=download_start_time, end=download_end_time, output_path=out_path)
    except Exception as e:
        # Report the error and let the other feeds continue
        print(f'Failed to download feed {TEST_FEED_ID}: {e!r}')

threads = []

# Start one thread per feed
for TEST_FEED_ID in TEST_FEED_IDs:
    thread = threading.Thread(target=process_feed, args=(TEST_FEED_ID,))
    threads.append(thread)
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()

print("All downloads completed.")


Building archive for feed 41152
Building archive for feed 30659
Building archive for feed 26694
Building archive for feed 38117
Building archive for feed 36119
Building archive for feed 37404
Building archive for feed 37460
Building archive for feed 38764
Building archive for feed 40658
Building archive for feed 38382
Building archive for feed 22851
Building archive for feed 31613
Building archive for feed 38236
Building archive for feed 37640
Building archive for feed 17329
Building archive for feed 31445
Building archive for feed 22612
Building archive for feed 26383
Building archive for feed 33765

Initializing calendar navigation for Ballard Marine - Ch 13, 14 and 16...
Initializing calendar navigation for San Francisco Bay Marine Traffic...
Initializing calendar navigation for Port Canaveral Marine Channels 16 & 12...
Initializing calendar navigation for Northern NJ and NY City Area Marine ...
Initializing calendar navigation for San Francisco Marine Ch 16...
Initializing calendar 

Exception in thread Thread-29 (process_feed):
Traceback (most recent call last):
  File "C:\Users\linoa\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\linoa\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\linoa\AppData\Local\Temp\ipykernel_35876\3483433650.py", line 4, in process_feed
  File "c:\Users\linoa\OneDrive\Documents\GitHub\MaritimeCommCollector\btk.py", line 186, in __init__
    self.feed_id = feed_id
  File "c:\Users\linoa\OneDrive\Documents\GitHub\MaritimeCommCollector\btk.py", line 515, in feed_id
    self._get_archive_dates()
  File "c:\Users\linoa\OneDrive\Documents\GitHub\MaritimeCommCollector\btk.py", line 468, in _get_archive_dates
    self.archive_calendar = ArchiveCalendar(self, browser,
  File "c:\Users\linoa\OneDrive\Documents\GitHub\MaritimeCommCollector\btk.py", line 753, in __init__
    self._att =

BroadcastifyArchive
 (Feed ID = 41152
  Feed Name = Port Canaveral Marine Channels 16 & 12
  Feed URL = "https://www.broadcastify.com/listen/feed/41152"
  Archive URL = "https://www.broadcastify.com/archives/feed/41152"
  Start Date: 2023-08-14
  End Date:   2024-01-05
  Username = "parad0xx" Password = [True]
  6,634 built archive entries between 2023-08-14 and 2023-12-31


Downloading 08-13-23 23:53 to 12-31-23 01:12
Storing at C:\Users\linoa\Downloads\audio/41152/.


Exception in thread Thread-25 (process_feed):
Traceback (most recent call last):
  File "C:\Users\linoa\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\linoa\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\linoa\AppData\Local\Temp\ipykernel_35876\3483433650.py", line 17, in process_feed
  File "c:\Users\linoa\OneDrive\Documents\GitHub\MaritimeCommCollector\btk.py", line 446, in download
    dn.get_archive_mp3s(filtered_entries, output_path)
  File "c:\Users\linoa\OneDrive\Documents\GitHub\MaritimeCommCollector\btk.py", line 638, in get_archive_mp3s
    mp3_soup = self.get_download_soup(archive_uri)
  File "c:\Users\linoa\OneDrive\Documents\GitHub\MaritimeCommCollector\btk.py", line 604, in get_download_soup
    raise ConnectionError(f'Problem connecting while getting soup from '
ConnectionError: Problem connecting while get

All downloads completed.
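The threaded cell starts all 19 feeds at once, each of which drives a browser through the WebDriver, and the log above shows threads failing, one of them with a `ConnectionError`. Capping concurrency may reduce the load on both the machine and the site. The sketch below swaps the manual threads for the standard library's `concurrent.futures.ThreadPoolExecutor`; `run_bounded` is a hypothetical helper, and `max_workers=4` is an arbitrary starting value, not a tested limit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_bounded(worker, items, max_workers=4):
    """Run worker(item) for every item using at most max_workers threads.

    Returns {item: exception or None}, so failures can be reviewed afterwards.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the item it is processing
        futures = {pool.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            results[futures[future]] = future.exception()  # None on success
    return results


# In the notebook (instead of the manual threads):
# outcomes = run_bounded(process_feed, TEST_FEED_IDs, max_workers=4)
```

Because `process_feed` catches its own exceptions, every entry would come back `None`; to collect failures in the returned dict, let `process_feed` re-raise after printing.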


In [None]:
def delete_small_files(root_directory, size_threshold=20480):
    """
    Delete files smaller than the specified size threshold.

    :param root_directory: The directory to search in.
    :param size_threshold: The size threshold in bytes. Files smaller than this will be deleted. Defaults to 20480 bytes (20 KiB).
    """
    for root, dirs, files in os.walk(root_directory):
        for file in files:
            file_path = os.path.join(root, file)
            if os.path.getsize(file_path) < size_threshold:
                os.remove(file_path)
                print(f"Deleted: {file_path}")

# Usage example
root_dir = r'C:\Users\linoa\Downloads\audio'  # Replace with your directory path
delete_small_files(root_dir)
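The cell above deletes immediately, which is hard to undo. The sketch below splits it into a hypothetical `find_small_files` lister and a `delete_small_files_safely` wrapper that only previews by default (`dry_run=True`). The 20480-byte default (20 KiB) is kept from the original; it is presumably meant to catch truncated or empty MP3 downloads.

```python
import os
import tempfile


def find_small_files(root_directory, size_threshold=20480):
    """Return sorted paths of files under root_directory smaller than size_threshold bytes."""
    small = []
    for root, _dirs, files in os.walk(root_directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) < size_threshold:
                small.append(path)
    return sorted(small)


def delete_small_files_safely(root_directory, size_threshold=20480, dry_run=True):
    """Preview (dry_run=True) or delete the small files; returns the affected paths."""
    targets = find_small_files(root_directory, size_threshold)
    for path in targets:
        if dry_run:
            print(f'Would delete: {path}')
        else:
            os.remove(path)
            print(f'Deleted: {path}')
    return targets


# Demo on a throwaway directory: a 10-byte file and a 30,000-byte file
scratch_dir = tempfile.mkdtemp()
for name, size in [('short.mp3', 10), ('full.mp3', 30000)]:
    with open(os.path.join(scratch_dir, name), 'wb') as f:
        f.write(b'\0' * size)

preview = delete_small_files_safely(scratch_dir)                 # lists short.mp3 only
deleted = delete_small_files_safely(scratch_dir, dry_run=False)  # removes short.mp3
```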


----