-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First party YouTube API client - with video info #1082
Conversation
This involves parsing the whole video JSON and consulting parsed data rather than relying so heavily on checking for the presence of strings in the text.
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know at what point we've transformed this out of being a derivative work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we aren't scraping HTML we have to be pretty close...
associated with. | ||
""" | ||
escaped_id = quote_plus(video_id) | ||
return f"https://www.youtube.com/watch?v={escaped_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Robbed from the other service. Maybe the YT parsing thing should be here too?
short_description=data["shortDescription"], | ||
author=data["author"], | ||
thumbnails=data["thumbnail"]["thumbnails"], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is the only part which isn't strictly necessary for transcripts. I put it here to show we can grab this at the same time easily.
We could use this type of info to give a very rich interface in the file picker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI we already have code for fetching these kinds of video details (name, thumbnail) from the offical V3 API, the file picker in LMS already does this and presents a rich picker interface. But that's in LMS, it's separate from the code in Via that needs to get the transcripts themselves.
Via does also need to get the video title as well as the transcript but that's quite easily done from the v3 API:
Lines 58 to 69 in a04feaa
def get_video_title(self, video_id): | |
"""Call the YouTube API and return the title for the given video_id.""" | |
# https://developers.google.com/youtube/v3/docs/videos/list | |
return self._http_service.get( | |
"https://www.googleapis.com/youtube/v3/videos", | |
params={ | |
"id": video_id, | |
"key": self._api_key, | |
"part": "snippet", | |
"maxResults": "1", | |
}, | |
).json()["items"][0]["snippet"]["title"] |
translation_languages=[ | ||
{"code": language["languageCode"], "name": language["languageName"]} | ||
for language in data.get("translationLanguages", []) | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not using it, but if:
- A transcript
is_translatable
- And you have languages here
- Then you can request a transcript be translated by appending
&tlang={language_code}
on the end
This list of languages does not belong to any specific caption track.
|
||
|
||
@dataclass | ||
class VideoDetails: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the names of these things are chosen for minimum mapping from the YouTube JSON names. I followed the format very closely.
via/services/youtube_api/models.py
Outdated
title: str | ||
short_description: str | ||
author: str | ||
thumbnails: List[dict] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other things in here which might be of interest to us. For example:
isPrivate
- Might this be an indicator a user has messed up? We might want to catch thatisLiveContent
- Doesn't seem annotatable?
This is no longer needed: it's been replaced by a series of non-WIP PRs |
For:
This is a modification of the approach taken by https://pypi.org/project/youtube-transcript-api/
The basic structure of that approach is still here:
How this approach differs:
vssId
for a transcript insteadSide effects:
This is a demo! We don't want to use it like this!
At the time of writing this is integrated into the view which loads up a YouTube video, which means we are doing the look up twice:
As it's currently written, even with caching for transcripts we'd continue to hit YouTube all the time.
We don't want this. I'm not proposing this. This is just a demo showing we can get all of the info.
As / when we get an official API we could replace the first lookup however: