0.5.0: Add bugfixes, improve logging + documentation, support threading
- addresses issues:
  - 'gbk' codec can't decode byte 0x80
    - #3
    - PR #5
      - commit 1fcf32a: Explicitly use utf-8 for encoding and decoding all files
      - commit 4c4cc5a: Specify name,operation,encoding params for all file IO
  - Can lc.create_list_for() return the csv file name?
    - #4
    - PR #6
      - commit a30be66: Return file name after lc.create_list_for() finishes
  - Temporal filename "yt_videos_list_temp.txt"
    - #7
    - #8
      - commit 5d7bfad: Name temp files using channel (for file creation)
      - commit c781a2f: Name temp files using channel (for file updates)
      - commit 3772d57: Indicate type of temp file being written
    - append time to end of temp file name
      - initially appended UNIX time
        - commit d0f76e2: Append UNIX timestamp to temporary file name
        - commit 110912d: Replace the dot in timestamp with a dash (should have been included in commit above)
      - then changed appended time from UNIX time to ISO 8601 datetime format to increase readability
        - commit a349c5b: Append ISO 8601 datetime instead of UNIX time to temp file name
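
The encoding and temp file fixes above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names and the exact temp file name layout are hypothetical, and replacing the colons and the dot in the ISO 8601 timestamp with dashes is an assumption made so the name is also valid on Windows.

```python
from datetime import datetime


def temp_file_name(channel, file_type, now=None):
    # Hypothetical layout: channel name + ISO 8601 datetime + extension.
    if now is None:
        now = datetime.now()
    # Assumption: swap ':' and '.' for '-' so the name is a valid file name everywhere.
    timestamp = now.isoformat().replace(':', '-').replace('.', '-')
    return f'temp_{channel}_{timestamp}.{file_type}'


def write_lines(path, lines):
    # Name the file, mode, and encoding explicitly instead of relying on the
    # platform default codec (which may be gbk, cp1252, ...).
    with open(file=path, mode='w', encoding='utf-8') as file:
        for line in lines:
            file.write(line + '\n')


def read_lines(path):
    # Decode with the same explicit encoding that was used for writing.
    with open(file=path, mode='r', encoding='utf-8') as file:
        return [line.rstrip('\n') for line in file]
```

Because the channel name and a timestamp are both part of the temp file name, two runs for different channels (or two runs for the same channel at different times) no longer write to the same temp file.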

- inserts "_reverse_chronological" or "_chronological" to file name:
  - commit 8cf2e15: Append (reverse_)?chronological to file name
  - commit 3285d93: Update location for testing file paths
  - commit dc38e2d: Modify output file naming
  - commit 90212e8: Simplify file suffix creation
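
A minimal sketch of the suffix logic described above, assuming a boolean `reverse_chronological` flag. The helper names and the overall file name layout are hypothetical and may differ from what the package actually writes.

```python
def file_suffix(reverse_chronological):
    # Pick the suffix from the sort order of the output file.
    return 'reverse_chronological' if reverse_chronological else 'chronological'


def output_file_name(channel, reverse_chronological, file_type):
    # Hypothetical layout: <channel>_<suffix>_videos_list.<extension>
    return f'{channel}_{file_suffix(reverse_chronological)}_videos_list.{file_type}'
```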

- significantly improves logging:
  - vertically align similar messages to facilitate quick comparisons between related messages
    - commit c877927: Vertically align logging output
    - commit e7458d0: Make logging messages more visible
    - commit 3487888: Right pad all testing log messages with ">"
    - commit 60cbee3: Log thread being created (during testing)
    - commit 18146c4: Rejustify program logging messages
    - commit 296fc4d: log program info with custom logger helper module (↑ DRY)
    - commit ab18840: Log "PROGRAM COMPLETE" instead of "PROGRAM COMPLETED"
  - log datetime for every event
    - commit b00b088: Print datetime during testing
    - commit e6101b8: Log datetime while running program
  - **LOG ALL information to corresponding LOG FILE for channel**
    - log file naming
      - commit 257e9e9: Name log file using ISO 8601 datetime
      - commit f53e6af: Name log file using output file name
  - general logging
    - commit 2b6bb4f: Enable optional logging to user specified log_file
    - commit b1e784a: Log test output to "{suffix}.log" (testing)
    - commit 2b63e9c: Enable INFO level logging by default
    - commit 82a0129: Simplify logging via custom context manager text writer (EXTREMELY detailed!)
      - commit 3b6e3fc: Pass logging_output_location to txt_writer()
    - commit 6513697: Log program start & end messages instead of printing to console
    - commit add1f35: Log name of driver during testing
    - commit 21e6bde: Add testing info to log files during tests
    - commit 9a41424: Enable logging to multiple files during testing
    - commit 8bb1008: Simplify testing info logging
    - commit 99d7be0: Add "*" 200x when test starts to clearly divide log file
    - commit ca7f4c9: Log thread name when new thread created (during testing)
    - commit 97ccc33: Log ">>>STARTING PROGRAM<<<"
    - commit 2e20d8a: Log ">>>PROGRAM COMPLETED<<<"
    - commit 8a3e4f2: Log write & file renaming successes separately
    - commit dab5ecf: Move create_file.py & update_file.py decorator code → log_extraction_information()
    - commit fb83118: Always log to log file but allow console logging muting
    - commit 56bc309: Log "video" if 1 new video found, otherwise log "videos"
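
The logging behavior listed above (a datetime on every event, vertically aligned messages, always writing to the log file with optional console muting, and "video" vs "videos") could look roughly like the sketch below. It is an illustration under assumptions, not the package's helper module: `logging_locations`, `log`, `videos_found_message`, and the `log_silently` flag are hypothetical names, and the sketch writes plain text to file-like objects rather than using the standard library `logging` package.

```python
import sys
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def logging_locations(log_path, log_silently=False):
    # Always include the log file; include the console unless it is muted.
    with open(file=log_path, mode='a', encoding='utf-8') as log_file:
        yield [log_file] if log_silently else [log_file, sys.stdout]


def log(message, locations, label='', width=12, now=None):
    # Prefix every event with a datetime and pad the label to a fixed width
    # so that related messages line up vertically.
    if now is None:
        now = datetime.now()
    line = f'{now.isoformat()}: {label.ljust(width)} {message}\n'
    for location in locations:
        location.write(line)
    return line


def videos_found_message(count):
    # Use the singular noun only when exactly one new video was found.
    noun = 'video' if count == 1 else 'videos'
    return f'Found {count} new {noun}'
```

A caller would wrap a run in `with logging_locations('SomeChannel.log') as locations:` and pass `locations` to every `log()` call, so the log file receives every message whether or not the console is muted.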

- interesting logging (python standard library package) bug and workaround:
  - commit 82a0129: Simplify logging via custom context manager text writer (also mentioned above, EXTREMELY detailed!)

- multi-threading bug (very detailed explanations) and workaround (just avoid using global variables):
  - only occurs when
    - scraping the same channel on 2 threads with reverse_chronological set to `True` on one thread and `False` on the other thread
    - and starting both threads WITHIN a few tenths of a second of each other
    - WITH pre-existing reverse_chronological and chronological files that contain DIFFERENT numbers of videos
  - commit 97d928f: Test pre-existing csv, txt, md files first
  - commit 7bf88c1: Modify partial chronological files (catch bug more frequently)
  - commit e787c3a: Return visited videos sets instead of creating global variables
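
The workaround above (return the visited-videos set instead of keeping it in a global variable) can be illustrated with the following sketch. The function names and the sample data are hypothetical; the point is only that each thread builds and returns its own set, so two threads working on the same channel with different pre-existing files cannot overwrite each other's state.

```python
import threading


def collect_new_videos(stored_urls, scraped_urls):
    # The visited set is local to this call and returned to the caller,
    # so concurrent calls never share (or clobber) it.
    visited = set(stored_urls)
    new_urls = []
    for url in scraped_urls:
        if url not in visited:
            visited.add(url)
            new_urls.append(url)
    return visited, new_urls


def run(name, stored_urls, scraped_urls, results):
    # Each thread stores its own result under its own key.
    results[name] = collect_new_videos(stored_urls, scraped_urls)


results = {}
scraped = ['v1', 'v2', 'v3', 'v4']
threads = [
    # Same channel, but the two pre-existing files hold different numbers of videos.
    threading.Thread(target=run, args=('reverse_chronological', ['v1', 'v2', 'v3'], scraped, results)),
    threading.Thread(target=run, args=('chronological', ['v1'], scraped, results)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

With a module-level set shared by both threads, whichever thread assigned it last would decide which videos the other thread treats as already stored. That is the kind of interference the conditions above describe.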

- other multi-threading bugs/challenges/changes:
  - commit 3b78b0a: Delete only relevant files before testing
  - commit e2e1ae9: Avoid starting new thread after last test case
  - commit 7be64c3: Explicitly check which thread ends first
  - commit 5aafa78: Simplify threading logic for tests
  - commit 930b59c: Avoid multi-threading for safaridriver
  - commit 4da6186: Remove debugging print statements ("previous commit" refers to the commit above)
  - commit fcd744d: Verify variable exists before printing message
  - commit 9c2529d: Ensure threads finish before proceeding
  - commit 27cc6a9: Make thread checks more robust

- removes deprecated create_list_for() arguments:
  - commit 6bbac49: Remove deprecated create_list_for() arguments

- **creates custom threading.Thread subclass to store result of thread during testing**:
  - commit 8fc6270: Add custom class to store thread result
  - commit f1d58f6: Make ThreadWithResult attribute names more descriptive
  - commit b10480b: Add ThreadWithResult class docstring (test_shared.py)
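
A `threading.Thread` subclass that keeps the return value of its target might look like the sketch below. This is a minimal version written for illustration: only the class name comes from the commits above, while the attribute names and constructor signature are assumptions and may not match the class in test_shared.py.

```python
import threading


class ThreadWithResult(threading.Thread):
    '''
    Thread that stores the return value of its target
    in the `result` attribute once the thread finishes.
    '''

    def __init__(self, target, args=(), kwargs=None, name=None):
        super().__init__(name=name)
        # Names chosen to avoid clashing with Thread's own private attributes.
        self._function = target
        self._function_args = args
        self._function_kwargs = kwargs or {}
        self.result = None

    def run(self):
        # Runs in the new thread after start(); keep the return value.
        self.result = self._function(*self._function_args, **self._function_kwargs)
```

Usage follows the normal thread life cycle: create the thread, call `start()`, call `join()`, and then read `thread.result`. Reading `result` before `join()` returns is not reliable, since the target may still be running.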

- points future drivers to newest available driver:
  - commit fd8ad48: Point future drivers to newest available driver ("next commit" refers to commit below)
  - commit fd878f3: Indicate failed update may be due to new driver version

- creates json file with all download commands:
  - commit e0569f2: Create json file for download commands
    - previously the project only provided pseudo json in the yt_videos_list/docs/dependencies_pseudo_json.txt file

- fixes inability to update package due to testing module dependency on package submodule:
  - started with
    - commit 6fa0deb: Run "pip" on Windows and "pip3" on Unix
      - following commit 8c73de6: Make PATH_SLASH a global variable
  - addressed with
    - commit 9550ca5: Update local package without yt_videos_list submodule function
    - commit 878fb67: Remove duplicate import (test_cross_platform_drivers.py) (since function now imported from tests.determine module)
    - commit 0204dd2: Run pip install directly from test script
    - commit 829a1ae: Update local package if python test module called directly
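
The "pip" on Windows and "pip3" on Unix choice, made directly from the test script instead of through a package submodule, can be sketched as below. The function names are hypothetical, and the sketch only builds the command rather than running it; a test script could pass the resulting list to `subprocess.run`.

```python
import platform


def pip_command(system=None):
    # platform.system() returns 'Windows', 'Linux', or 'Darwin' on common platforms.
    if system is None:
        system = platform.system()
    return 'pip' if system == 'Windows' else 'pip3'


def install_local_package_command(package_path='.', system=None):
    # Build the install command without importing anything from the
    # package that is about to be reinstalled.
    return [pip_command(system), 'install', package_path]
```

Keeping this logic in the test module means the package under test never has to be imported in order to update it.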

- other interesting bugs:
  - commit 1cdd8f5: Revert "Make command consistent with other unix commands"
    - commit 6783c40: Make command consistent with other unix commands
      - following commit 76c066f: Move repeated commands into helper functions
  - addressed in
    - commit d762b00: Remove `rm /usr/local/bin/sha512_sum` command (bravedriver)
    - commit e32f69f: Remove sha512 removal command for Windows bravedriver too

- not a bug, but best practice:
  - commit 30e9701: Make global varaibles local
shailshouryya committed Jan 5, 2021
1 parent 7bd34f1 commit a88ec79
Showing 3 changed files with 5 additions and 5 deletions.
4 changes: 2 additions & 2 deletions python/dev/__init__.py
@@ -3,7 +3,7 @@


'''
-version: 0.4.7
+version: 0.5.0
author: Shail-Shouryya
email: yt.videos.list@gmail.com
development_status: 4 - Beta
@@ -14,7 +14,7 @@
'''


-__version__ = '0.4.7'
+__version__ = '0.5.0'
__author__ = 'Shail-Shouryya'
__email__ = 'yt.videos.list@gmail.com'
__development_status__ = '4 - Beta'
2 changes: 1 addition & 1 deletion python/setup.py
@@ -11,7 +11,7 @@

setup(
name = 'yt_videos_list',
-version = '0.4.7',
+version = '0.5.0',
description = 'Extract YouTube video titles and URLs with end-to-end web scraping API + automate Selenium webdriver dependency set up',
long_description = long_description,
long_description_content_type = 'text/markdown',
4 changes: 2 additions & 2 deletions python/yt_videos_list/__init__.py
@@ -3,7 +3,7 @@


'''
-version: 0.4.7
+version: 0.5.0
author: Shail-Shouryya
email: yt.videos.list@gmail.com
development_status: 4 - Beta
@@ -14,7 +14,7 @@
'''


-__version__ = '0.4.7'
+__version__ = '0.5.0'
__author__ = 'Shail-Shouryya'
__email__ = 'yt.videos.list@gmail.com'
__development_status__ = '4 - Beta'
