paginate through latest talks? #25

Closed
AlJohri opened this issue Oct 23, 2016 · 19 comments

@AlJohri

AlJohri commented Oct 23, 2016

I've been using pyvideo quite a bit over the last few days, and sometimes I'm just looking to watch something new that catches my eye rather than going to a particular conference, tag, or speaker. I would love to be able to just paginate through some of the most recent videos.

@logston
Member

logston commented Oct 23, 2016

I agree. This really needs to be considered again. I stopped work on it a while back. I think the best way to do this would be to add an arbitrary id field to each video. This ID field would be unique for each video (some videos already have this ID). This ID would increment as new videos are added and would create a natural sorting field for "newest" videos. This field has been requested by others for other uses so we wouldn't be adding data to the pyvideo/data repo for use solely in pyvideo/pyvideo. @zerok @Daniel-at-github @codersquid @lgh2 Thoughts?
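
A minimal sketch of the ID-based ordering proposed above, assuming each video is a dict loaded from its JSON file and that a higher `id` means a more recently added video. The helper name `newest_first` and the sample data are hypothetical. Since only some videos already have an ID, videos without one are placed last here:

```python
def newest_first(videos):
    """Sort videos by descending id; videos without an integer id go last."""
    with_id = [v for v in videos if isinstance(v.get("id"), int)]
    without_id = [v for v in videos if not isinstance(v.get("id"), int)]
    return sorted(with_id, key=lambda v: v["id"], reverse=True) + without_id


# Hypothetical sample data for illustration only.
videos = [
    {"title": "Talk A", "id": 101},
    {"title": "Talk B"},  # no id assigned yet
    {"title": "Talk C", "id": 205},
]
print([v["title"] for v in newest_first(videos)])
```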

@Daniel-at-github
Contributor

@lgh2
Collaborator

lgh2 commented Oct 25, 2016

Thank you for your suggestion. Did you look at the "Latest Events" heading
on the main page? Or did you want to see the talks organized in a different
way?


@lgh2
Collaborator

lgh2 commented Oct 25, 2016

If it's not too hard to add an ID number field to each video then it sounds
like a good idea.


@zerok
Collaborator

zerok commented Oct 25, 2016

Sorry for the late response :) Personally, I'd prefer it if we could somehow get the dates and times of each talk and event. Having a global number in each category.json and video file sounds slightly off-topic to me inside the data repository, and ideally unnecessary if we can get the necessary datetime :)

@AlJohri
Author

AlJohri commented Oct 25, 2016

Can we just grab it from YouTube (when it's a ur video) for the purposes of the database or does it need to be in the exact time on the events schedule?

@AlJohri
Author

AlJohri commented Oct 25, 2016

YT*

@zerok
Collaborator

zerok commented Oct 25, 2016

The YouTube dates are probably not sufficient, as some conferences upload their content weeks after the actual event. YouTube is also not the only supported source (albeit a major one), so we probably won't get around some manual work :)

@Daniel-at-github
Contributor

@lgh2 the events page is good, but an RSS feed of the events would make pyvideo more useful to end users, who would then know when a new event is added.
Also, events are currently ordered by date, which makes it harder to notice when events from years ago get added.

@zerok date alone could be confusing if two events overlap in time. I think event + date is better. What do you think?

@willkg
Member

willkg commented Oct 26, 2016

With the "old pyvideo", we had two fields. One was the date the talk was recorded. This was sometimes wrong when we added the videos to the corpus initially, but would get fixed later on. The other was the date the talk was added to the corpus.

The "latest videos" list and feed on "old pyvideo" was sorted by the date the talk was added to the corpus figuring that these were "new to pyvideo".

"old pyvideo" was a Django app, so it wasn't hard to maintain a "date the talk was added" field. It's harder with a git-based corpus, though we could organize things by the date of the git commit the file was added. Pretty sure that's not hard to compute generally speaking. I don't know if that's hard to do with the static site generator being used.

Would that sort of thing help here?
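
A rough sketch of computing the "date added" from git history, assuming the `git` command-line tool is available and the data repository is checked out locally. The function names `parse_git_dates` and `date_added` are hypothetical. `git log --diff-filter=A` lists the commits in which the file was added, newest first, so the last line is taken as the earliest addition:

```python
import subprocess
from datetime import datetime


def parse_git_dates(output):
    """Return the oldest date from `git log --format=%aI` output, or None if empty."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    # Some git versions print "Z" for UTC; normalize for older Python versions.
    oldest = lines[-1].strip().replace("Z", "+00:00")
    return datetime.fromisoformat(oldest)


def date_added(path, repo_dir="."):
    """Author date of the earliest commit that added `path` (assumes git is installed)."""
    result = subprocess.run(
        ["git", "log", "--diff-filter=A", "--follow", "--format=%aI", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_git_dates(result.stdout)
```

Calling `date_added("pycon-se-2016/category.json", repo_dir="path/to/data")` would then return a timezone-aware `datetime`; the path shown is only an example. Running one `git log` per file is slow for thousands of videos, which is one reason to cache the result somewhere.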

@Daniel-at-github
Contributor

We could compute the event dates from git and save them outside the files, so that pyvideo-only data doesn't mix with conference data.

"Outside the files" could be:

  • A new json file:
    • metadata/events.json ?
    • metadata/pyvideo/events.json ?
  • A tinydb json file:
    • pyvideo.tinydb.json?

Tinydb is more versatile (and shiny ✨) and it takes less than half an hour to learn (I think).
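
As an illustration of the plain JSON option, here is a small sketch that writes events, newest first, to a metadata file. The function name `write_event_metadata`, the `events`/`slug`/`added` keys, and the sample dates are all assumptions made for the example, not an agreed format. It also assumes the dates are ISO 8601 strings in the same timezone, so that string order matches chronological order:

```python
import json
from pathlib import Path


def write_event_metadata(added_dates, path):
    """Write event slugs, ordered newest-first by added date, to a JSON file."""
    ordered = sorted(added_dates.items(), key=lambda item: item[1], reverse=True)
    payload = {"events": [{"slug": slug, "added": added} for slug, added in ordered]}
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create metadata/ if missing
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return payload


# Hypothetical dates, for illustration only:
# write_event_metadata(
#     {"pygotham-2016": "2016-10-01T00:00:00+00:00",
#      "pycon-se-2016": "2016-10-20T00:00:00+00:00"},
#     "metadata/events.json",
# )
```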

@zerok
Collaborator

zerok commented Oct 27, 2016

The focus here would be an "added" timestamp field, right? Personally I don't mind how it is cached either way 🙂


@codersquid
Collaborator

I vote for whatever is simplest, and adding something to the video json doesn't seem bad to me despite mixing up pyvideo data and conference data. Then whatever tool someone uses to scrape data could simply use a timestamp to get a 'good enough' estimate of recent things added to pyvideo, even if it isn't technically accurate with respect to merge time.

@logston
Member

logston commented Nov 2, 2016

Thanks everybody! Lots of great feedback here. Keeping in-line with our goal of keeping things simple, here's what I think our course should be:

I think sorting these videos by their "added to the data repo" date based on commit SHA is probably the easiest way to go. It sounds a bit more complicated but I think it will cut out tons of manual work in the long run. Plus, computers are good at doing this mindless sorting :)

Next, how to store that sorting. The sort order is only needed in the pyvideo/pyvideo repo, so I think it would be best to place it in a new JSON/CSV/etc. file that lives in the pyvideo repo. The new metadata directory that @Daniel-at-github suggested sounds perfect.

If anyone wants to work on this, feel free to assign yourself. Otherwise, I will start work on it after I finish writing my PyCon Canada talks :)
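
Once a newest-first ordering is stored, the pagination requested in this issue could look roughly like the sketch below. The function name `paginate` and the default page size of 20 are assumptions, and how the pages are rendered would depend on the static site generator in use:

```python
def paginate(items, page, per_page=20):
    """Return (items on the given 1-based page, total number of pages)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    total_pages = max(1, -(-len(items) // per_page))  # ceiling division
    start = (page - 1) * per_page
    return items[start:start + per_page], total_pages
```

An empty list still reports one (empty) page, so a "latest talks" page can always be rendered.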

@zerok
Collaborator

zerok commented Nov 2, 2016

@logston +1 on adding that kind of information into a metadata folder within the pyvideo website project 😄

@Daniel-at-github Would you mind if I still also extend the recorded field within the video.json to support a time component and the category.json with something like start and end fields?

@Daniel-at-github
Contributor

@zerok I'm a bit lost with the time component in the video.json. Can you give an example or elaborate? (maybe I'm sleepy or lost in translation).

Start and end fields in category.json seem great to me. It's more work, but not too much. Adding related_urls (webpage and schedule are common; slides and repository are less common, ...) or tags to category.json is something to think about too.

start_date already exists; I have seen it somewhere:

tools/video_statistics.py .

Category statistics

Field        Class     Empty      Count
alias        str       have data     67
description  str       have data     83
description  str       is empty      15
start_date   NoneType  have data      2
start_date   str       have data     10
title        str       have data    112
url          str       have data     12
url          str       is empty       2

Video statistics

Field           Class     Empty      Count
alias           str       have data   3457
category        str       have data   3776
copyright_text  str       have data   4365
copyright_text  str       is empty     957
description     NoneType  have data     73
description     str       have data   3517
description     str       is empty    1939
duration        NoneType  have data   2818
duration        int       have data   2546
id              int       have data   4113
language        str       have data   5353
quality_notes   str       have data     33
quality_notes   str       is empty    3753
recorded        NoneType  have data    352
recorded        str       have data   5177
related_urls    list      have data   1469
related_urls    list      is empty     694
slug            str       have data   4148
speakers        list      have data   5081
speakers        list      is empty     448
summary         str       have data   2078
summary         str       is empty    1698
tags            list      have data   2057
tags            list      is empty    3297
thumbnail_url   NoneType  have data     99
thumbnail_url   str       have data   5394
thumbnail_url   str       is empty       9
title           str       have data   5529
videos          list      have data   5529

@zerok
Collaborator

zerok commented Nov 3, 2016

@Daniel-at-github I mean moving "recorded" from being date-only to a datetime field so that we can do a more natural ordering on the event page itself, for instance.

"recorded": "2016-11-03"

vs.

"recorded": "2016-11-03T05:22:38+00:00"

using ISO8601 but trying to keep everything in UTC.
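
To illustrate how both forms could coexist during a migration, here is a sketch that parses either a date-only or a full ISO 8601 `recorded` value into a timezone-aware UTC datetime that can be used as a sort key. The function name `parse_recorded` is hypothetical, and treating values without an offset as midnight UTC is an assumption:

```python
from datetime import datetime, timezone


def parse_recorded(value):
    """Parse a date-only or full ISO 8601 string into an aware UTC datetime."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: values without an offset (e.g. date-only) are taken as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


values = ["2016-11-03T05:22:38+00:00", "2016-11-03"]
print(sorted(values, key=parse_recorded))
```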

@Daniel-at-github
Contributor

@zerok in YouTube-scraped videos we often have to settle for a good-enough date and take the upload date, so the time there will be T00:00:00+00:00.
Even so, I think that datetime is a good idea and a step towards having better data.

@Daniel-at-github
Contributor

Metadata file generated with:

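# Print "<path> <date added>" per event, sort newest first (then by path),
# and keep only the event directory name as a YAML list item.
for EVENT_FILE in */category.json
do
  printf '%s ' "$EVENT_FILE"
  # Author date of the most recent commit that added this file.
  git log --diff-filter=A --follow --format=%ai -1 -- "$EVENT_FILE"
done \
  | sort -k2,2r -k1,1r \
  | sed -e 's?/category.json??; s/ .*$//; s/^/- /' > metadata.yml
vim metadata.yml  # Add a parent node and comments.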

Looks like:

# Pyvideo metadata file

# List of pyvideo data events, ordered from recent to older
#   Useful for:
#     Events RSS
#     Event pagination
event:
  - pycon-se-2016
  - pydata-dc-2016
  - pydata-chicago-2016
  - pycon-de-2016
  - pygotham-2016
  - data-school
  - ndc-oslo-2016
  - pybay-2016
  - pycon-jp-2016
  - pycon-italia-2016
  - pycon-apac-2016
  - pycon-uk-2016
  - pycon-israel-2016
  - kiwi-pycon-2016
  - pydata-san-francisco-2016
  - pydata-berlin-2016
  - pydata-amsterdam-2016
  - writethedocs-na-2016
  - swiss-python-summit-2016
  - pydx-2015
  - pyday-galicia-2016
...
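
For the "Events RSS" use mentioned in the file comments, a minimal sketch using only the Python standard library might look like this. It assumes the ordered slugs have already been loaded from the metadata file; the function name `build_events_rss`, the feed title, and the `https://example.org/events/` base URL are placeholders, not the real site layout:

```python
import xml.etree.ElementTree as ET


def build_events_rss(slugs, base_url="https://example.org/events/"):
    """Build a minimal RSS 2.0 document with one item per event slug."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Latest events"  # placeholder title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Events, newest first"
    for slug in slugs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = slug
        ET.SubElement(item, "link").text = f"{base_url}{slug}.html"
    return ET.tostring(rss, encoding="unicode")


print(build_events_rss(["pycon-se-2016", "pydata-dc-2016"]))
```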
