nog642/an-dl

an-dl

Audio Network scraper

anw_dl.py

anw_dl.py is a scraper for one song at a time.

usage: anw_dl.py <url to song> [(-p | --chromedriver-path) </path/to/chromedriver>]

It can also download secondary songs (e.g. https://www.audionetwork.com/browse/m/track/conversation-2_49932).

Depends on selenium and requires chromedriver.
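The usage line above can be read as the following argparse definition. This is only a sketch to illustrate the documented command line; it is not taken from anw_dl.py, and the function name build_anw_dl_parser is hypothetical.

```python
import argparse


def build_anw_dl_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented anw_dl.py usage line."""
    parser = argparse.ArgumentParser(prog="anw_dl.py")
    # Positional: the Audio Network track page to download.
    parser.add_argument("url", help="URL to song")
    # Optional: explicit chromedriver location for Selenium.
    parser.add_argument(
        "-p",
        "--chromedriver-path",
        dest="chromedriver_path",
        default=None,
        help="/path/to/chromedriver",
    )
    return parser
```

For example, `build_anw_dl_parser().parse_args(["https://www.audionetwork.com/browse/m/track/conversation-2_49932", "-p", "/opt/chromedriver"])` yields an object whose `chromedriver_path` is `"/opt/chromedriver"`; without `-p` it is `None`.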

an_dl.py

an_dl.py also downloads one song and has the same functionality as anw_dl.py, but it is generally slower: it only works with one Audio Network page layout, and the site's A/B testing makes it fail often.

usage: an_dl.py <url to song>

It exists because an_scraper.py depends on it.

Depends on bs4.

an_scraper.py

an_scraper.py is a scraper for several categories at a time.

usage: an_scraper.py [(-r | --redownload) (0 or 1)] [(-t | --timeout) <timeout>]

To download categories, copy each category's URL into ./data/categories.txt, one category per line.

Example categories.txt:

https://www.audionetwork.com/browse/m/musical-styles/latin/bossa-nova/results
https://www.audionetwork.com/browse/m/musical-styles/musical-styles/chill-out/results
https://www.audionetwork.com/browse/m/musical-styles/musical-styles/electronica/results
https://www.audionetwork.com/browse/m/musical-styles/musical-styles/trip-hop-downbeat/results
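Reading a file in this one-URL-per-line format could look like the sketch below. The function name read_categories is hypothetical, and skipping blank lines and stray whitespace is an assumption; an_scraper.py may handle them differently.

```python
from __future__ import annotations

from pathlib import Path


def read_categories(path: str = "./data/categories.txt") -> list[str]:
    """Return category URLs from a file with one URL per line.

    Assumption: blank lines are skipped and surrounding whitespace is
    stripped; the real script may be stricter.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```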

The -r option takes 0 or 1. The value is simply converted with Python's bool(), so no exception is raised if it is something other than 0 or 1.
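One caveat when converting command-line values with bool(): every non-empty string is truthy, so `bool("0")` is `True`. Whether this matters for an_scraper.py depends on whether it calls bool() on the raw string or on an integer, which is not stated here. The sketch below converts through int() first so that "0" really means False; note that, unlike the behaviour described above, it raises for non-numeric input such as "yes". The helper zero_or_one and the `None` defaults are hypothetical.

```python
import argparse


def zero_or_one(value: str) -> bool:
    """Convert a "0"/"1" command-line string to a bool.

    bool("0") would be True (non-empty string), so convert to int first.
    Any non-zero integer still counts as True.
    """
    return bool(int(value))


# Hypothetical parser for the documented an_scraper.py options.
parser = argparse.ArgumentParser(prog="an_scraper.py")
parser.add_argument("-r", "--redownload", type=zero_or_one, default=None)
parser.add_argument("-t", "--timeout", type=int, default=None)  # seconds
```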

./data/songs.json stores all the songs from the last run, but it does not record which category each song came from. This causes a problem if you want to download some of the categories in songs.json but not all of them: you can only use all of the JSON data or none of it.

If -r is 0, it tries to download every song in songs.json, including songs that may not belong to the categories you have selected. If the files are already in the ./songs directory, this is the faster option.

If -r is 1, it ignores songs.json and scrapes Audio Network again for the song URLs. This is the faster option if the audio files from the last run are not in the ./songs directory.
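The choice that -r makes can be sketched as follows. This is an illustration of the behaviour described above, not the script's code: get_songs is a hypothetical name, scrape_category is a hypothetical stand-in for the real scraping logic, and falling back to scraping when songs.json does not exist is an assumption.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple

Song = Tuple[str, str]  # (title, preview mp3 URL)


def get_songs(
    redownload: bool,
    categories: List[str],
    scrape_category: Callable[[str], List[Song]],
    songs_json: str = "./data/songs.json",
) -> List[Song]:
    """Pick the song list the way the -r option is described.

    redownload=False: reuse every song in songs.json (all categories from
    the last run, not only the selected ones).
    redownload=True: ignore songs.json and scrape each category again.
    Assumption: a missing songs.json also triggers scraping.
    """
    path = Path(songs_json)
    if not redownload and path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        # JSON arrays load as lists; convert each [title, url] to a tuple.
        return [tuple(song) for song in data["songs"]]
    songs: List[Song] = []
    for url in categories:
        songs.extend(scrape_category(url))
    return songs
```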

The -t option sets the timeout, given as an integer number of seconds.

Songs that are in multiple categories will not be downloaded twice.
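A simple way to merge the song lists of several categories without duplicates is to track the URLs already seen. Keying on the mp3 URL (rather than, say, the title) is an assumption about how the script decides that two songs are the same, and dedupe_songs is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Song = Tuple[str, str]  # (title, preview mp3 URL)


def dedupe_songs(per_category: Iterable[List[Song]]) -> List[Song]:
    """Merge category song lists, keeping the first occurrence of each URL."""
    seen = set()
    merged: List[Song] = []
    for songs in per_category:
        for title, url in songs:
            if url in seen:
                continue  # already queued from an earlier category
            seen.add(url)
            merged.append((title, url))
    return merged
```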

It saves songs as mp3 files in the ./songs directory.

It saves the category URLs and each song's title and URL in ./data/songs.json, in this format:

{
    "categories": [
        "https://www.audionetwork.com/browse/m/musical-styles/latin/bossa-nova/results",
        "https://www.audionetwork.com/browse/m/musical-styles/musical-styles/electronica/results",
        "https://www.audionetwork.com/browse/m/musical-styles/musical-styles/chill-out/results",
        "https://www.audionetwork.com/browse/m/musical-styles/musical-styles/trip-hop-downbeat/results"
    ],
    "songs": [
        [
            "Coral", 
            "http://content2.audionetwork.com/Preview/tracks/mp3/v5res/ANW1024/06.mp3"
        ], 
        [
            "Glider", 
            "http://content2.audionetwork.com/Preview/tracks/mp3/v5res/ANW1094/06.mp3"
        ],
        ...
        [
            "Cosmic Hustle", 
            "http://content2.audionetwork.com/Preview/tracks/mp3/v5res/ANW2078/01.mp3"
        ], 
        [
            "Storm Warning", 
            "http://content2.audionetwork.com/Preview/tracks/mp3/v5res/ANW1665/01.mp3"
        ]
    ]
}
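The "..." in the example stands for omitted entries; the real file has to be valid JSON. Writing and reading this layout with Python's json module might look like the sketch below (save_songs_json and load_songs_json are hypothetical names, and indent=4 is chosen only to resemble the example).

```python
import json
from typing import List, Tuple

Song = Tuple[str, str]  # (title, preview mp3 URL)


def save_songs_json(path: str, categories: List[str], songs: List[Song]) -> None:
    """Write categories and [title, url] pairs in the layout shown above."""
    data = {"categories": categories, "songs": [list(song) for song in songs]}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4)


def load_songs_json(path: str) -> Tuple[List[str], List[Song]]:
    """Read the file back; JSON arrays load as lists, so rebuild tuples."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["categories"], [(title, url) for title, url in data["songs"]]
```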

Before downloading a song, it checks whether the song is already in the ./songs folder; if it is, the song is not downloaded again.
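The check can be as simple as testing whether the target file exists. The file naming scheme `<title>.mp3` used here is an assumption (the real script may name files differently), and both helper names are hypothetical. Titles containing path separators would need sanitizing first.

```python
from pathlib import Path


def song_path(songs_dir: str, title: str) -> Path:
    """Hypothetical naming scheme: <songs_dir>/<title>.mp3."""
    return Path(songs_dir) / f"{title}.mp3"


def needs_download(songs_dir: str, title: str) -> bool:
    """True unless a file for this song already exists in the songs folder."""
    return not song_path(songs_dir, title).exists()
```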

If a song fails to download, it logs the song title, the URL, the error, and the error message to ./data/skipped.log.
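A download-and-log step along these lines is sketched below with the standard library. Two parts are assumptions: that the -t timeout is applied to each HTTP request, and the tab-separated log line layout (title, URL, exception type, message). download_song is a hypothetical name; the sketch catches any exception so that one bad song is logged and skipped instead of stopping the run.

```python
import urllib.request
from pathlib import Path


def download_song(title: str, url: str, dest: Path, timeout: int,
                  log_path: str = "./data/skipped.log") -> bool:
    """Download one mp3 to dest; on failure, append to skipped.log and return False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
        dest.write_bytes(data)
        return True
    except Exception as exc:  # log and skip rather than abort the whole run
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{title}\t{url}\t{type(exc).__name__}\t{exc}\n")
        return False
```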