0.5.0: Add bugfixes, improve logging + documentation, support threading
- compare changes to previous version
- major change includes changing output file name to include _reverse_chronological or _chronological
- e.g.
MyChannel_reverse_chronological_videos_list.txt
MyChannel_chronological_videos_list.txt
- compare with previous naming convention of
MyChannel_videos_list.txt
- REGARDLESS of whether the file was in reverse chronological order or chronological order
- e.g.
addresses issues:
- 'gbk' codec can't decode byte 0x80
- Can lc.create_list_for() return the csv file name?
- Temporal filename "yt_videos_list_temp.txt"
inserts "_reverse_chronological" or "_chronological" to file name:
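A minimal sketch of the naming convention, for illustration only. The helper name `videos_list_file_name` and its parameters are hypothetical and not taken from the package; only the resulting file names come from the examples in these notes.

```python
def videos_list_file_name(channel_name, reverse_chronological, extension="txt"):
    """Build an output file name that records the sort order (hypothetical helper)."""
    order = "reverse_chronological" if reverse_chronological else "chronological"
    return f"{channel_name}_{order}_videos_list.{extension}"


print(videos_list_file_name("MyChannel", reverse_chronological=True))
print(videos_list_file_name("MyChannel", reverse_chronological=False))
```

Because the sort order is part of the name, a reverse chronological file and a chronological file for the same channel no longer overwrite each other.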
significantly improves logging:
- vertically align similar messages to facilitate quick comparisons between related messages
- commit c877927: Vertically align logging output
- commit e7458d0: Make logging messages more visible
- commit 3487888: Right pad all testing log messages with ">"
- commit 60cbee3: Log thread being created (during testing)
- commit 18146c4: Rejustify program logging messages
- commit 296fc4d: log program info with custom logger helper module (↑ DRY)
- commit ab18840: Log "PROGRAM COMPLETE" instead of "PROGRAM COMPLETED"
- log datetime for every event
- LOG ALL information to corresponding LOG FILE for channel
- general logging
- commit 2b6bb4f: Enable optional logging to user specified log_file
- commit b1e784a: Log test output to "{suffix}.log" (testing)
- commit 2b63e9c: Enable INFO level logging by default
- commit 82a0129: Simplify logging via custom context manager text writer (EXTREMELY detailed!)
- commit 3b6e3fc: Pass logging_output_location to txt_writer()
- commit 6513697: Log program start & end messages instead of printing to console
- commit add1f35: Log name of driver during testing
- commit 21e6bde: Add testing info to log files during tests
- commit 9a41424: Enable logging to multiple files during testing
- commit 8bb1008: Simplify testing info logging
- commit 99d7be0: Add "*" 200x when test starts to clearly divide log file
- commit ca7f4c9: Log thread name when new thread created (during testing)
- commit 97ccc33: Log ">>>STARTING PROGRAM<<<"
- commit 2e20d8a: Log ">>>PROGRAM COMPLETED<<<"
- commit 8a3e4f2: Log write & file renaming successes separately
- commit dab5ecf: Move create_file.py & update_file.py decorator code → log_extraction_information()
- commit fb83118: Always log to log file but allow console logging muting
- commit 56bc309: Log "video" if 1 new video found, otherwise log "videos"
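The logging behaviour listed above (datetime for every event, aligned messages, always writing to a log file while allowing the console to be muted, and "video" vs "videos") could look roughly like the following. This is a sketch using the standard `logging` module, not the package's own code; `build_logger`, `found_message`, and the message wording are assumptions made for illustration.

```python
import logging
import sys


def build_logger(log_file, log_to_console=True):
    """Return a logger that always writes to log_file and optionally echoes to the console."""
    logger = logging.getLogger(f"sketch.{log_file}")
    logger.setLevel(logging.INFO)  # INFO level enabled by default
    logger.propagate = False
    # Drop handlers left over from an earlier call with the same log_file.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    # %(asctime)s stamps every event; -8s pads the level name so messages line up vertically.
    formatter = logging.Formatter("%(asctime)s : %(levelname)-8s : %(message)s")
    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    if log_to_console:
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    return logger


def found_message(count):
    """Use "video" for exactly one new video, otherwise "videos" (hypothetical wording)."""
    noun = "video" if count == 1 else "videos"
    return f"Found {count} new {noun}"
```

Calling `build_logger("MyChannel.log", log_to_console=False)` in this sketch keeps the file log intact while silencing the console.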
interesting logging (python standard library package) bug and workaround:
- commit 82a0129: Simplify logging via custom context manager text writer (also mentioned above, EXTREMELY detailed!)
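The commit message has the details of the bug itself. As a rough idea of what a context manager based text writer can look like, here is a sketch built on `contextlib`; the name `text_writer`, its parameters, and the line format are assumptions and may differ from what the commit implements.

```python
import contextlib
import datetime
import sys


@contextlib.contextmanager
def text_writer(log_file_path, log_to_console=True):
    """Yield a write(message) function; the log file is closed when the block exits."""
    with open(log_file_path, mode="a", encoding="utf-8") as log_file:
        def write(message):
            # Prefix every event with the current datetime.
            line = f"{datetime.datetime.now().isoformat()}: {message}\n"
            log_file.write(line)           # always goes to the log file
            if log_to_console:
                sys.stdout.write(line)     # console output is optional
        yield write
```

A caller would use it as `with text_writer(path, log_to_console=False) as write: write(">>>STARTING PROGRAM<<<")`, so the file handle never outlives the block.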
multi-threading bug (very detailed explanations) and workaround (just avoid using global variables):
- only occurs when
- scraping the same channel on 2 threads with reverse_chronological set to True on one thread and False on the other thread
- and starting both threads WITHIN a few tenths of a second of each other
- WITH pre-existing files for both reverse_chronological file and chronological files but DIFFERENT number of videos in the files for reverse_chronological and the chronological files
- commit 97d928f: Test pre-existing csv, txt, md files first
- commit 7bf88c1: Modify partial chronological files (catch bug more frequently)
- commit e787c3a: Return visited videos sets instead of creating global variables
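The workaround (return the visited videos set instead of storing it in a global) can be sketched as follows. The function names and the toy URLs are hypothetical; the point is only that each call builds and returns its own set, so two threads working on files with different contents cannot overwrite each other's state.

```python
from concurrent.futures import ThreadPoolExecutor


def load_visited_videos(existing_lines):
    """Return a new set of already-recorded videos instead of filling a module-level global."""
    return {line.strip() for line in existing_lines if line.strip()}


def find_new_videos(scraped_urls, existing_lines):
    # The set is local to this call, so it is private to the calling thread.
    visited = load_visited_videos(existing_lines)
    return [url for url in scraped_urls if url not in visited]


scraped = ["url_1", "url_2", "url_3"]
reverse_chronological_file = ["url_1", "url_2"]  # pre-existing file listing 2 videos
chronological_file = ["url_1"]                   # pre-existing file listing 1 video

# Both jobs start almost simultaneously, as in the scenario described above.
with ThreadPoolExecutor(max_workers=2) as pool:
    reverse_future = pool.submit(find_new_videos, scraped, reverse_chronological_file)
    chronological_future = pool.submit(find_new_videos, scraped, chronological_file)

new_for_reverse = reverse_future.result()
new_for_chronological = chronological_future.result()
print(new_for_reverse, new_for_chronological)
```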
other multi-threading bugs/challenges/changes:
- commit 3b78b0a: Delete only relevant files before testing
- commit e2e1ae9: Avoid starting new thread after last test case
- commit 7be64c3: Explicitly check which thread ends first
- commit 5aafa78: Simplify threading logic for tests
- commit 930b59c: Avoid multi-threading for safaridriver
- commit 4da6186: Remove debugging print statements ("previous commit" refers to the commit above)
- commit fcd744d: Verify variable exists before printing message
- commit 9c2529d: Ensure threads finish before proceeding
- commit 27cc6a9: Make thread checks more robust
removes deprecated create_list_for() arguments:
- commit 6bbac49: Remove deprecated create_list_for() arguments
creates custom threading.Thread subclass to store result of thread during testing:
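A minimal sketch of such a subclass, assuming the goal is simply to keep the return value of the thread's target function. The class name and attribute names below are illustrative and may not match the ones used in the test suite.

```python
import threading


class ThreadWithResult(threading.Thread):
    """Thread that stores the return value of its target in self.result (sketch)."""

    def __init__(self, target, args=(), kwargs=None, name=None):
        super().__init__(name=name)
        # Separate attribute names avoid clashing with Thread's own private fields.
        self._sketch_target = target
        self._sketch_args = args
        self._sketch_kwargs = kwargs or {}
        self.result = None

    def run(self):
        self.result = self._sketch_target(*self._sketch_args, **self._sketch_kwargs)


def add(a, b):
    return a + b


thread = ThreadWithResult(target=add, args=(2, 3))
thread.start()
thread.join()  # the result is only meaningful once the thread has finished
print(thread.result)
```

A plain `threading.Thread` discards the target's return value, which is why tests that need to compare results across threads benefit from a wrapper like this.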
points future drivers to newest available driver:
creates json file with all download commands:
- commit e0569f2: Create json file for download commands
- previously the project only provided pseudo json in the yt_videos_list/docs/dependencies_pseudo_json.txt file
fixes inability to update package due to testing module dependency on package submodule:
- started with
- addressed with
- commit 9550ca5: Update local package without yt_videos_list submodule function
- commit 878fb67: Remove duplicate import (test_cross_platform_drivers.py) (since function now imported from tests.determine module)
- commit 0204dd2: Run pip install directly from test script
- commit 829a1ae: Update local package if python test module called directly
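Running pip install directly from a test script, without importing anything from the package being updated, can be sketched like this. The function names and the default package path are assumptions; the `python -m pip install` invocation itself is standard.

```python
import subprocess
import sys


def pip_install_command(package_path="."):
    """Build the pip command for the interpreter that is running the tests."""
    return [sys.executable, "-m", "pip", "install", package_path]


def update_local_package(package_path="."):
    """Reinstall the local package; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(package_path), check=True)
```

Using `sys.executable` ties the install to the same interpreter that runs the tests, and not importing the package here means the update cannot be blocked by the package's own submodules.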
Benchmarking
# without yt_videos_list submodule function
for i in {1..10}; do (time (for i in {1..100}; do python3 minifier.py; done)); done
real 0m8.261s
user 0m5.433s
sys 0m2.259s
real 0m8.288s
user 0m5.429s
sys 0m2.247s
real 0m8.022s
user 0m5.272s
sys 0m2.164s
real 0m7.989s
user 0m5.266s
sys 0m2.165s
real 0m7.984s
user 0m5.253s
sys 0m2.163s
real 0m8.009s
user 0m5.268s
sys 0m2.164s
real 0m8.047s
user 0m5.269s
sys 0m2.175s
real 0m8.068s
user 0m5.242s
sys 0m2.182s
real 0m8.030s
user 0m5.289s
sys 0m2.164s
real 0m8.046s
user 0m5.284s
sys 0m2.176s
# with yt_videos_list submodule function
for i in {1..10}; do (time (for i in {1..100}; do python3 minifier.py; done)); done
real 1m28.987s
user 0m42.470s
sys 0m41.508s
real 1m28.921s
user 0m42.508s
sys 0m41.411s
real 1m28.753s
user 0m42.436s
sys 0m41.378s
real 1m29.467s
user 0m42.700s
sys 0m41.732s
real 1m28.672s
user 0m42.286s
sys 0m41.406s
real 1m28.415s
user 0m42.297s
sys 0m41.202s
real 1m28.629s
user 0m42.360s
sys 0m41.244s
real 1m29.088s
user 0m42.587s
sys 0m41.527s
real 1m29.392s
user 0m42.644s
sys 0m41.637s
real 1m29.345s
user 0m42.657s
sys 0m41.643s
# without yt_videos_list submodule function again
for i in {1..10}; do (time (for i in {1..100}; do python3 minifier.py; done)); done
real 0m8.488s
user 0m5.585s
sys 0m2.308s
real 0m8.293s
user 0m5.497s
sys 0m2.251s
real 0m8.115s
user 0m5.396s
sys 0m2.188s
real 0m8.116s
user 0m5.396s
sys 0m2.179s
real 0m8.145s
user 0m5.395s
sys 0m2.198s
real 0m8.066s
user 0m5.367s
sys 0m2.170s
real 0m8.042s
user 0m5.340s
sys 0m2.162s
real 0m8.029s
user 0m5.329s
sys 0m2.159s
real 0m8.170s
user 0m5.420s
sys 0m2.195s
real 0m8.154s
user 0m5.426s
sys 0m2.190s
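To compare the runs, the "real" timings can be averaged. The sketch below copies the "real" lines from the first two runs above; the helper name `real_seconds` is hypothetical, and each timing covers 100 invocations of minifier.py.

```python
import re
from statistics import mean

# Matches the "real" line printed by the shell's time, e.g. "real 1m28.987s".
REAL_LINE = re.compile(r"real\s+(\d+)m(\d+(?:\.\d+)?)s")


def real_seconds(time_output):
    """Return every "real" timing found in time_output, converted to seconds."""
    return [int(minutes) * 60 + float(seconds)
            for minutes, seconds in REAL_LINE.findall(time_output)]


# "real" lines copied from the first run (without the submodule function).
without_submodule = """
real 0m8.261s
real 0m8.288s
real 0m8.022s
real 0m7.989s
real 0m7.984s
real 0m8.009s
real 0m8.047s
real 0m8.068s
real 0m8.030s
real 0m8.046s
"""

# "real" lines copied from the second run (with the submodule function).
with_submodule = """
real 1m28.987s
real 1m28.921s
real 1m28.753s
real 1m29.467s
real 1m28.672s
real 1m28.415s
real 1m28.629s
real 1m29.088s
real 1m29.392s
real 1m29.345s
"""

mean_without = mean(real_seconds(without_submodule))
mean_with = mean(real_seconds(with_submodule))
print(f"without submodule function: {mean_without:.2f} s per 100 runs")
print(f"with submodule function:    {mean_with:.2f} s per 100 runs")
print(f"slowdown factor:            {mean_with / mean_without:.1f}x")
```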
other interesting bugs:
- commit 1cdd8f5: Revert "Make command consistent with other unix commands"
- addressed in
not a bug, but best practice:
- commit 30e9701: Make global variables local