Merge pull request #5536 from benjaoming/bug/video_url
Make video/thumbnail download more robust
benjaoming committed Nov 13, 2017
2 parents 5b2bf2e + 36c66ae commit 713f908
Showing 16 changed files with 359 additions and 224 deletions.
2 changes: 2 additions & 0 deletions docs/developer_docs/behave_testing.rst
@@ -1,3 +1,5 @@
.. _bdd:

Behavior-Driven Integration Tests
=================================

29 changes: 23 additions & 6 deletions docs/developer_docs/environment.rst
@@ -1,7 +1,7 @@
.. _development-environment:

Setting up your development environment
=======================================
Getting started
===============

.. warning:: These directions may be out of date! This page needs to be consolidated with the `Getting Started page on our wiki <https://github.com/learningequality/ka-lite/wiki/Getting-started>`_.

@@ -69,10 +69,10 @@ __________
You can install KA Lite in its very own separate environment that does not
interfere with other Python software on your machine like this::

    $> pip install virtualenv virtualenvwrapper
    $> mkvirtualenv my-kalite-env
    $> workon my-kalite-env
    $> pip install ka-lite
    pip install virtualenv virtualenvwrapper
    mkvirtualenv my-kalite-env
    workon my-kalite-env
    pip install ka-lite


Running tests
@@ -83,3 +83,20 @@ On Circle CI, we run Selenium 2.53.6 because it works in their environment. Howe
for more recent versions of Firefox, you need to upgrade Selenium::

    pip install selenium\<3.5 --upgrade

To run all of the tests (this is slow)::

    kalite manage test

To skip BDD tests (because they are slow)::

    kalite manage test --no-bdd

To run a specific test (not a BDD test), add an argument ``<app>.<TestClass>``::

    kalite manage test updates.TestDownload --no-bdd

To run a specific item from :ref:`bdd`, use ``<app>.<feature_module_name>``::

    kalite manage test distributed.content_rating --bdd-only

2 changes: 1 addition & 1 deletion docs/developer_docs/index.rst
@@ -4,7 +4,7 @@ Developer Docs
Useful stuff our devs think that the rest of our devs ought to know about.

.. toctree::
Setting up your development environment <environment>
Getting started <environment>
Front-End Code <front_end_code>
Javascript Unit Tests <javascript_testing>
Behavior-Driven Integration Tests <behave_testing>
6 changes: 6 additions & 0 deletions docs/installguide/release_notes.rst
@@ -15,13 +15,19 @@ to read the release notes.
Bug fixes
^^^^^^^^^

* Video download retry upon connection timeouts/errors :url-issue:`5528`
* Simplified login is now working when there are 1,000 or more users registered in a facility. :url-issue:`5523`

Developers
^^^^^^^^^^

* Do not use `npm clean`, now requires npm>=5 for building on unclean systems :url-issue:`5519`

Contents
^^^^^^^^

* Resized video torrent set for English rebuilt with missing videos


0.17.3
------
11 changes: 10 additions & 1 deletion docs/usermanual/userman_admin.rst
@@ -674,7 +674,7 @@ _________________
In order to keep local data in the ``UserLog`` model, detailing usage, you can choose the number of ``UserLog`` objects that you wish to retain. These objects are not sync'ed.


Online Synchronization
Online synchronization
______________________

* ``USER_LOG_SUMMARY_FREQUENCY = <desired frequency (number, amount of time)>``
@@ -688,6 +688,15 @@ ______________________
When you log in to our online server, you will see a *full* history of these records.


Other settings
______________

* ``KALITE_DOWNLOAD_RETRIES = <integer>``
``(default = 5)``
If you are trying to download videos with a very unstable connection, this
setting will increase the number of retries for individual video downloads.
The grace period between each attempt automatically increases.

Environment variables
_____________________
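
The ``KALITE_DOWNLOAD_RETRIES`` text in the hunk above says that the grace period between attempts grows automatically. As a rough illustration only, the sketch below computes a plain exponential backoff schedule for the values this commit passes to ``Retry`` (``max_retries=5``, ``backoff_factor=0.1``). The helper ``backoff_delays`` is hypothetical and is not part of KA Lite, requests, or urllib3; urllib3 applies its own variant of this formula (including an upper limit on the delay), so the exact waits may differ between versions.

```python
def backoff_delays(max_retries=5, backoff_factor=0.1):
    """Return an illustrative list of waits in seconds, one per retry.

    Assumption: plain exponential backoff, where the n-th retry waits
    backoff_factor * 2 ** (n - 1) seconds. This is a sketch of the idea,
    not a copy of urllib3's implementation.
    """
    return [backoff_factor * 2 ** (n - 1) for n in range(1, max_retries + 1)]


if __name__ == "__main__":
    # Print the schedule for the defaults used in this commit.
    for attempt, delay in enumerate(backoff_delays(), start=1):
        print("retry %d: wait about %.1f s" % (attempt, delay))
```

Under this assumption, raising the retry count mostly adds longer waits at the end of the schedule, which seems to be the point of the setting for very unstable connections.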

48 changes: 25 additions & 23 deletions kalite/packages/bundled/fle_utils/internet/download.py
@@ -3,15 +3,12 @@
import socket
import sys
import tempfile
from requests.utils import default_user_agent

socket.setdefaulttimeout(20)

from requests.packages.urllib3.util.retry import Retry

class DownloadCancelled(Exception):

    def __str__(self):
        return "Download has been cancelled"
socket.setdefaulttimeout(20)


class URLNotFound(Exception):
@@ -28,36 +25,33 @@ def callback_percent_proxy_inner_fn(fraction):
    return callback_percent_proxy_inner_fn


def _reporthook(numblocks, blocksize, filesize, url=None):
    base = os.path.basename(url)
    if filesize <= 0:
        filesize = blocksize
    try:
        percent = min((numblocks * blocksize * 100) / filesize, 100)
    except:
        percent = 100
    if numblocks != 0:
        sys.stdout.write("\b" * 40)
    sys.stdout.write("%-36s%3d%%" % (base, percent))
    if percent == 100:
        sys.stdout.write("\n")


def _nullhook(*args, **kwargs):
    pass


def download_file(url, dst=None, callback=None):
def download_file(url, dst=None, callback=None, max_retries=5):
    if sys.stdout.isatty():
        callback = callback or _reporthook
        callback = callback or _nullhook
    else:
        callback = callback or _nullhook
    dst = dst or tempfile.mkstemp()[1]


    from requests.adapters import HTTPAdapter

    s = requests.Session()

    retries = Retry(
        total=max_retries,
        backoff_factor=0.1,
    )

    s.mount('http://', HTTPAdapter(max_retries=retries))

    # Assuming the KA Lite version is included in user agent because of an
    # intention to create stats on learningequality.org
    from kalite.version import user_agent
    response = requests.get(
    response = s.get(
        url,
        allow_redirects=True,
        stream=True,
@@ -78,4 +72,12 @@ def download_file(url, dst=None, callback=None):
total_size = float(response.headers['content-length'])
fraction = min(float(bytes_fetched) / total_size, 1.0)
callback(fraction)
    # Verify file existence
    if (
        not os.path.isfile(dst) or
        "content-length" not in response.headers or
        not os.path.getsize(dst) == int(response.headers['content-length'])
    ):
        raise URLNotFound("URL was not found, tried: {}".format(url))

    return response
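
The check added at the end of ``download_file`` can be read in isolation as a small predicate: the file must exist, the response must carry a ``content-length`` header, and the size on disk must match it. The sketch below restates that logic as a standalone function for illustration. The name ``is_complete_download`` and the use of a plain ``dict`` for the headers are assumptions made here; the commit itself performs the check inline on ``response.headers``.

```python
import os
import tempfile


def is_complete_download(path, headers):
    """Return True if `path` exists and its size equals Content-Length.

    Assumption: `headers` is a mapping with a lowercase "content-length"
    key. A plain dict stands in for the response headers in this sketch.
    """
    if not os.path.isfile(path):
        return False
    if "content-length" not in headers:
        return False
    return os.path.getsize(path) == int(headers["content-length"])


if __name__ == "__main__":
    # Write a 10-byte temporary file and compare it against two header values.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * 10)
    print(is_complete_download(path, {"content-length": "10"}))
    print(is_complete_download(path, {"content-length": "11"}))
    os.remove(path)
```

Moving this check into ``download_file`` means every caller gets the same verification, which is what lets the duplicated video and thumbnail checks in ``videos.py`` be dropped.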
77 changes: 2 additions & 75 deletions kalite/packages/bundled/fle_utils/videos.py
@@ -1,80 +1,7 @@
"""
Legacy module, do not use
"""
import glob
import logging
import os
import socket

from general import ensure_dir
from internet.download import callback_percent_proxy, download_file, URLNotFound, DownloadCancelled

# This is used in the Central Server for redirects.
OUTSIDE_DOWNLOAD_BASE_URL = "http://s3.amazonaws.com/KA-youtube-converted/" # needed for redirects
OUTSIDE_DOWNLOAD_URL = OUTSIDE_DOWNLOAD_BASE_URL + "%s/%s" # needed for default behavior, below

logger = logging.getLogger(__name__)


def get_outside_video_urls(youtube_id, download_url=OUTSIDE_DOWNLOAD_URL, format="mp4"):

    video_filename = "%(id)s.%(format)s" % {"id": youtube_id, "format": format}
    url = download_url % (video_filename, video_filename)

    thumb_filename = "%(id)s.png" % {"id": youtube_id}
    thumb_url = download_url % (video_filename, thumb_filename)

    return (url, thumb_url)


def download_video(youtube_id, download_path="../content/", download_url=OUTSIDE_DOWNLOAD_URL, format="mp4", callback=None):
    """Downloads the video file to disk (note: this does NOT invalidate any of the cached html files in KA Lite)"""

    ensure_dir(download_path)

    url, thumb_url = get_outside_video_urls(youtube_id, download_url=download_url, format=format)
    video_filename = "%(id)s.%(format)s" % {"id": youtube_id, "format": format}
    filepath = os.path.join(download_path, video_filename)

    thumb_filename = "%(id)s.png" % {"id": youtube_id}
    thumb_filepath = os.path.join(download_path, thumb_filename)

    try:
        response = download_file(url, filepath, callback_percent_proxy(callback, end_percent=95))
        if (
                not os.path.isfile(filepath) or
                "content-length" not in response.headers or
                not os.path.getsize(filepath) == int(response.headers['content-length'])):
            raise URLNotFound("Video was not found, tried: {}".format(url))

        response = download_file(thumb_url, thumb_filepath, callback_percent_proxy(callback, start_percent=95, end_percent=100))
        if (
                not os.path.isfile(thumb_filepath) or
                "content-length" not in response.headers or
                not os.path.getsize(thumb_filepath) == int(response.headers['content-length'])):
            raise URLNotFound("Thumbnail was not found, tried: {}".format(thumb_url))

    except DownloadCancelled:
        delete_downloaded_files(youtube_id, download_path)
        raise

    except (socket.timeout, IOError) as e:
        logging.exception(e)
        logging.info("Timeout -- Network UnReachable")
        delete_downloaded_files(youtube_id, download_path)
        raise

    except Exception as e:
        logging.exception(e)
        delete_downloaded_files(youtube_id, download_path)
        raise


def delete_downloaded_files(youtube_id, download_path):
    files_deleted = 0
    for filepath in glob.glob(os.path.join(download_path, youtube_id + ".*")):
        try:
            os.remove(filepath)
            files_deleted += 1
        except OSError:
            pass
    if files_deleted:
        return True
