machine-readable id for splits selected in the tool #1

Closed
softloud opened this issue Jun 10, 2023 · 4 comments

Comments

@softloud

Thanks for providing this python wrapper for splits.io 👍 I'm really enjoying working with it, and am learning a lot of python in doing so.

I'm investigating a friend's speedrunning stats for Super Metroid and giving some open science talks on it. Apologies if I garble the speedrunner speak; my Python skills are also new. I'm extracting dataframes with this package in Python to port into R for visualisation and dashboarding.

I have the player's name for the split, such as "ice beam", but this point is selected, if I understand correctly, from a tick box in the program. How do I get the IDs of the tick boxes, that is, the common ID across players for the split chosen? Is this doable? I'd like to do some by-route analysis.

image

@jeremander
Owner

Hi @softloud! Glad to hear you're getting some use out of the Python API.

Suppose you've retrieved a particular run of interest, e.g.

>>> run = Run.from_id('arzj', historic=True)

Each segment (AKA split) has both an id and a human-created name:

>>> run.segments[0].id
'deb041e7-433b-4254-9d61-5fb4759c1d4e'
>>> run.segments[0].name
'Ceres Escape'

If you've set historic=True, there's an easy way to get a DataFrame containing all of the segment durations, via:

>>> run.segment_durations()

which can then be exported to do whatever statistical analysis you'd like (sorry this has not been documented at all; I should add that to the README, at least). There's also some rudimentary plotting/stats stuff in this package, but it's just experimental.

Now, if your question is about whether there are "common IDs across players" for a given split, I'm pretty sure the answer is no, because each player defines their own splits and uploads them to the site independently, meaning they'd have different IDs than someone else's splits even if they have the same name. So I don't think there's an easy way to compare split times across players, other than trying to identify similar split names.
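To make "trying to identify similar split names" a bit more concrete, here is a minimal sketch that groups segments by a normalized form of their name. It does not use the splits.io API at all: the runner names, the sample data, and the helpers `normalize_split_name` and `group_by_name` are hypothetical, and the normalization rule (lowercase, strip punctuation, collapse whitespace) is only one possible choice.

```python
import re
from collections import defaultdict


def normalize_split_name(name: str) -> str:
    """Lowercase the name and replace runs of non-alphanumerics with one space."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return cleaned.strip()


def group_by_name(segments):
    """Group (runner, segment_name) pairs by the normalized segment name."""
    groups = defaultdict(list)
    for runner, name in segments:
        groups[normalize_split_name(name)].append(runner)
    return dict(groups)


# Hypothetical segment names from two runners (illustrative, not real data).
segments = [
    ("runner_a", "Ice Beam"),
    ("runner_b", "ice-beam"),
    ("runner_a", "Ceres Escape"),
    ("runner_b", "Ceres  escape!"),
    ("runner_b", "Kraid"),
]

groups = group_by_name(segments)
for key, runners in groups.items():
    print(key, runners)
```

Names that differ by more than punctuation or case (abbreviations, different languages, different route checkpoints) would not be matched by this rule and would need fuzzier matching or manual labelling.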

@softloud
Author

Oooh that's great! Thank you for getting back to me so quickly, feeling the open source love, I'm in deep with prepping for these talks. It's a bummer there's no way to identify the splits across players, but it's a delightful opportunity to explore classification algorithms. Nice demonstration of how ML pops up in the everyday life of a data scientist for a PyData talk. This work in progress is a beautiful mess.

I've extracted start, end, duration, and shortest duration. Basically every data point I can use in vis and analysis for one category for one game. I was this project old when I learned about list comprehension and nested dictionaries in Python and now I want to randomly grab people in the street and shake them by the shoulders shouting, "You need to hear about list comprehension on nested dictionaries now!".

image

I have another question: how do I differentiate a historical run from a leaderboard run, and how do I extract their current rank? (I can always do this with row labelling, but would rather extract it if it's there, 'cause I'll worry about a bug in my row labelling.)

@jeremander
Owner

@softloud Looks like a neat project you're working on... good luck with your presentation!

Unfortunately, I don't think splits.io tracks rankings, so that information may have to be cross-referenced with other sources like speedrun.com.

The way I was distinguishing between completed and aborted attempts is to look at which attempts had their final segment completed. These attempts can be accessed on a run via

>>> run.completed_attempts

The fact that attempts were stopped at different points can sometimes make things tricky to analyze, since you'll usually have more completions of earlier splits than later splits.
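As a rough illustration of that imbalance, the sketch below counts how many attempts reached each split and filters down to attempts whose final segment was completed. It uses plain dictionaries with made-up durations rather than the package's own objects; `completion_counts`, `completed`, and the sample data are hypothetical names for this example, not part of the splitsio API.

```python
# Hypothetical data: each attempt maps segment name -> duration in seconds,
# with None where the attempt was reset before finishing that segment.
segment_order = ["Ceres Escape", "Ice Beam", "Kraid"]
attempts = [
    {"Ceres Escape": 95.2, "Ice Beam": 410.0, "Kraid": 388.1},
    {"Ceres Escape": 97.8, "Ice Beam": 402.5, "Kraid": None},
    {"Ceres Escape": 101.3, "Ice Beam": None, "Kraid": None},
]


def completion_counts(attempts, segment_order):
    """Count, per segment, how many attempts have a recorded duration."""
    return {
        seg: sum(1 for attempt in attempts if attempt.get(seg) is not None)
        for seg in segment_order
    }


def completed(attempts, final_segment):
    """Keep only attempts whose final segment has a recorded duration."""
    return [a for a in attempts if a.get(final_segment) is not None]


counts = completion_counts(attempts, segment_order)
finished = completed(attempts, segment_order[-1])
print(counts)
print(len(finished), "of", len(attempts), "attempts completed")
```

Because the per-split sample sizes differ, summary statistics for later splits rest on fewer attempts than those for earlier splits, which is worth keeping in mind when comparing them.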

@softloud
Author

Such useful information, thank you so much. I'd feel like such an idiot if I were standing up in front of .... 😮 68 😮 people next week saying, well you just can't get these things, and it was obvious and right there in the API, so I really appreciate knowing what is and is not easy to figure out. So helpful (I will be singing your praises to a crowd in Copenhagen).

I've been waiting to close this so I can show you a slightly less rickety analysis site. Gallery is only... uh. somewhat broken. And colour palettes are shall we say experimental. Oh yeah, and I'm refactoring the splitsio vignette atm, so there's a lot of errors...

Anyways, thank you so much! Don't hesitate to open an issue if you find some ideas after perusing the site. It's open science, join in if you like :)
